This document will show how to calculate correlations across multiply imputed data sets.
The ‘miceadds’ package provides a function, ‘micombine.cor()’, that computes a correlation in each imputed data set and pools the results into a single estimate.
Let’s begin by reading in the data set.
data_AcadAchiev = read.csv('/Users/jhelm/Desktop/data_AcadAchiev.csv')
We can start by calculating the correlation between two variables using only the observed cases. Note that ‘cor.test()’ drops incomplete pairs on its own, so no ‘use’ argument is needed.
cor.test(data_AcadAchiev$Math01, data_AcadAchiev$Math02)
##
## Pearson's product-moment correlation
##
## data: data_AcadAchiev$Math01 and data_AcadAchiev$Math02
## t = 24.486, df = 177, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8403011 0.9082913
## sample estimates:
## cor
## 0.8786774
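To see where the numbers in this output come from, here is a minimal sketch that rebuilds the t statistic and the 95% confidence interval by hand from the printed correlation and degrees of freedom. It uses base R only; the variable names (`r_obs`, `df_obs`, and so on) are made up for this illustration, and the values are copied from the output above rather than recomputed from the data.

```r
# Values copied from the cor.test() output above
r_obs  <- 0.8786774
df_obs <- 177
n_obs  <- df_obs + 2          # number of complete pairs

# t statistic for testing that the correlation is zero
t_stat <- r_obs * sqrt(df_obs) / sqrt(1 - r_obs^2)

# Confidence interval via the Fisher z transformation
z_obs  <- atanh(r_obs)        # Fisher z
se_z   <- 1 / sqrt(n_obs - 3) # standard error on the z scale
ci     <- tanh(z_obs + c(-1, 1) * qnorm(0.975) * se_z)

t_stat  # compare with t = 24.486 above
ci      # compare with the 95 percent confidence interval above
```

The df of 177 also tells us that only 179 cases had both Math01 and Math02 observed, which is why multiple imputation is worth considering here.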
Next, we need to create the multiply imputed data sets. Load the ‘mice’ and ‘miceadds’ packages first.
library(mice)
library(miceadds)
Create the imputed data sets:
imp_data = mice(data_AcadAchiev, m = 40, seed = 142)
# This creates 40 imputed data sets that fill in the missing
# values in 'data_AcadAchiev'
# Setting the seed (Jon recommends this) means we get the same
# imputations, and so the same results, if we rerun the code
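It can help to look inside the object that ‘mice()’ returns before pooling anything. The sketch below uses a small simulated data frame (`toy`, with made-up variables `x` and `y`) as a stand-in, since the original CSV is not assumed to be available; it uses m = 5 rather than 40 only to keep it quick. It pulls out the completed data sets with ‘complete()’ and computes the correlation in each one.

```r
library(mice)

# Simulated stand-in data (hypothetical; not the data_AcadAchiev file)
set.seed(1)
n_toy <- 100
toy <- data.frame(x = rnorm(n_toy))
toy$y <- 0.8 * toy$x + rnorm(n_toy, sd = 0.6)
toy$y[sample(n_toy, 20)] <- NA      # make 20 values of y missing

# Impute; printFlag = FALSE suppresses the iteration log
imp_toy <- mice(toy, m = 5, seed = 142, printFlag = FALSE)

imp_toy$m                            # number of imputed data sets
first <- mice::complete(imp_toy, 1)  # the first completed data set
sum(is.na(first$y))                  # missing values left in y

# Correlation within each completed data set
r_by_imp <- sapply(seq_len(imp_toy$m), function(i) {
  d <- mice::complete(imp_toy, i)
  cor(d$x, d$y)
})
r_by_imp
```

The correlations differ a little from one completed data set to the next. That between-imputation variation is what the pooling step has to account for.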
Compute the correlation in each of the imputed data sets and pool the results with the ‘micombine.cor()’ function from the ‘miceadds’ library.
micombine.cor(mi.res = imp_data,
variables = c('Math01', 'Math02'))
## variable1 variable2 r rse fisher_r fisher_rse fmi
## 1 Math01 Math02 0.8843767 0.01629752 1.395508 0.07486735 0.5358853
## 2 Math02 Math01 0.8843767 0.01629752 1.395508 0.07486735 0.5358853
## t p lower95 upper95
## 1 18.63973 1.529991e-77 0.8479384 0.9124968
## 2 18.63973 1.529991e-77 0.8479384 0.9124968
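The ‘fisher_r’ and ‘fisher_rse’ columns suggest that the pooling happens on the Fisher z scale. As a rough illustration of that idea, the sketch below applies Rubin's rules to a set of per-imputation correlations by hand. This is not the code ‘miceadds’ uses: the function name `pool_fisher_z`, its arguments, and the normal-based interval are assumptions made for this example, and the input correlations are made-up numbers.

```r
# Pool correlations from m >= 2 imputed data sets on the Fisher z scale.
# r: vector of correlations, one per imputed data set
# n: sample size of each completed data set
pool_fisher_z <- function(r, n, conf_level = 0.95) {
  m       <- length(r)
  z       <- atanh(r)              # Fisher z for each imputation
  z_bar   <- mean(z)               # pooled point estimate
  within  <- 1 / (n - 3)           # within-imputation variance of z
  between <- var(z)                # between-imputation variance
  total   <- within + (1 + 1 / m) * between   # Rubin's total variance
  se      <- sqrt(total)
  crit    <- qnorm(1 - (1 - conf_level) / 2)  # normal approximation (assumed)
  data.frame(
    r          = tanh(z_bar),      # back-transform to the r scale
    fisher_r   = z_bar,
    fisher_rse = se,
    t          = z_bar / se,
    lower      = tanh(z_bar - crit * se),
    upper      = tanh(z_bar + crit * se)
  )
}

# Made-up correlations from five imputed data sets of 200 cases each
r_toy <- c(0.87, 0.89, 0.88, 0.90, 0.88)
pool_fisher_z(r_toy, n = 200)
```

In the output above, ‘r’ equals tanh(‘fisher_r’) and ‘t’ equals ‘fisher_r’ divided by ‘fisher_rse’, which is consistent with this kind of pooling.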
Using the salary data set: