This document will show how to calculate correlations across multiply imputed data sets.

The ‘miceadds’ package has a built-in function, ‘micombine.cor()’, that calculates a correlation in each imputed data set and combines the results.

Let’s begin by reading in the data set.

data_AcadAchiev = read.csv('/Users/jhelm/Desktop/data_AcadAchiev.csv')

Calculate a Correlation

We can start by calculating the correlation between two variables.

# cor.test() drops incomplete pairs on its own; 'use' is an argument of cor(), not cor.test()
cor.test(data_AcadAchiev$Math01, data_AcadAchiev$Math02)
## 
##  Pearson's product-moment correlation
## 
## data:  data_AcadAchiev$Math01 and data_AcadAchiev$Math02
## t = 24.486, df = 177, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8403011 0.9082913
## sample estimates:
##       cor 
## 0.8786774
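
Note that the test above reports df = 177, which means it used only the 179 rows where both scores were observed (df = n - 2). The short sketch below illustrates this behavior of cor.test() on two made-up vectors; `x` and `y` are hypothetical values chosen for illustration, not part of the data set.

```r
# Hypothetical toy vectors with missing values (not from data_AcadAchiev)
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 1, 3, NA, 6, 5)

# Number of pairs where both values are observed
n_complete <- sum(complete.cases(x, y))

# cor.test() silently drops the incomplete pairs before testing
res <- cor.test(x, y)

# The degrees of freedom equal the number of complete pairs minus 2
df <- unname(res$parameter)
print(c(n_complete = n_complete, df = df))
```

Every dropped row is information lost, which is the motivation for multiple imputation in the next section.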

Calculating Correlations across Multiply Imputed Data Sets

First, we need to create the multiply imputed data sets.

library(mice)
library(miceadds)

Create the imputed data sets

imp_data = mice(data_AcadAchiev, m = 40, seed = 142)

    # This will create 40 imputed data sets to fill in the missing
    # values from the data set 'data_AcadAchiev'

    # If we set the seed value (Jon recommends this), then we will
    # reproduce the results if we rerun the imputation 

Perform the analysis on each of the imputed data sets and combine the results with the ‘micombine.cor()’ function from the ‘miceadds’ library.

micombine.cor(mi.res = imp_data, 
                  variables = c('Math01', 'Math02'))
##   variable1 variable2         r        rse fisher_r fisher_rse       fmi
## 1    Math01    Math02 0.8843767 0.01629752 1.395508 0.07486735 0.5358853
## 2    Math02    Math01 0.8843767 0.01629752 1.395508 0.07486735 0.5358853
##          t            p   lower95   upper95
## 1 18.63973 1.529991e-77 0.8479384 0.9124968
## 2 18.63973 1.529991e-77 0.8479384 0.9124968
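
The two output rows describe the same pair of variables, listed in both orders. To give a rough idea of what the combining step involves, a common approach is to transform each imputed-data correlation to Fisher's z, pool the z values with Rubin's rules, and transform back to the correlation scale. The sketch below follows that approach in base R. The helper `pool_fisher_z()` is a hypothetical name written for illustration, it is not part of ‘miceadds’, and it may differ from ‘micombine.cor()’ in its details. The input correlations and sample size are made up.

```r
# Sketch: pool correlations from m imputed data sets on the Fisher z scale.
# r: vector of correlations, one per imputed data set
# n: number of rows in each (completed) data set
pool_fisher_z <- function(r, n) {
  m <- length(r)
  z <- atanh(r)                 # Fisher z transformation
  u <- 1 / (n - 3)              # within-imputation variance of z
  z_bar <- mean(z)              # pooled estimate on the z scale
  b <- var(z)                   # between-imputation variance
  total <- u + (1 + 1 / m) * b  # Rubin's rules: total variance
  se <- sqrt(total)
  t_stat <- z_bar / se
  crit <- qnorm(0.975)          # normal-based 95% interval (an assumption)
  c(r          = tanh(z_bar),   # back-transform to the correlation scale
    fisher_r   = z_bar,
    fisher_rse = se,
    t          = t_stat,
    p          = 2 * pnorm(-abs(t_stat)),
    lower95    = tanh(z_bar - crit * se),
    upper95    = tanh(z_bar + crit * se))
}

# Hypothetical per-imputation correlations (made-up values)
pooled <- pool_fisher_z(r = c(0.88, 0.89, 0.87, 0.885), n = 200)
print(round(pooled, 4))

# With a 'mids' object such as imp_data, the per-imputation correlations
# could be collected like this (not run here):
# r_each <- sapply(seq_len(imp_data$m), function(i) {
#   d <- mice::complete(imp_data, i)
#   cor(d$Math01, d$Math02)
# })
```

Pooling on the z scale rather than averaging the raw correlations is the usual choice because the sampling distribution of z is closer to normal, which Rubin's rules assume.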

Practice Problem

Using the salary data set:

  1. Calculate the correlation between salary and number of publications
  2. Create multiply imputed data sets (use 40 imputations, set the seed equal to 806)
  3. Calculate the correlation between salary and number of publications across all data sets
  4. Combine the results across data sets