This document shows how to perform a t-test across multiply imputed data sets.
t-tests are a bit tricky to analyze with MICE in R. The ‘miceadds’ package has a built-in function, ‘mi.anova()’, that performs an analysis of variance across multiply imputed data sets. We can capitalize on that approach because a two-group t-test is a special case of an ANOVA.
Let’s begin by reading in the data set.
data_AcadAchiev = read.csv('/Users/jhelm/Desktop/data_AcadAchiev.csv')
Now we can perform a t-test (via ANOVA) using some of the variables from the Academic Achievement data set. We will need the car package to calculate type 3 sums of squares.
library(car)
As an initial step, we need to make sure we are using an effect coding strategy. This detail is specific to ANOVA, not multiple imputation.
options(contrasts = c('contr.sum', 'contr.poly'))
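To see what effect coding does, here is a minimal sketch using simulated data (the variables ‘sex’ and ‘math’ are made up for illustration and are not from the Academic Achievement data set). Under ‘contr.sum’, a two-level factor is coded +1/-1, so the intercept is the unweighted average of the two group means and the slope is half the difference between them.

```r
# Simulated two-group data (hypothetical, for illustration only)
set.seed(1)
sex = factor(rep(c("F", "M"), times = c(30, 40)))
math = rnorm(70, mean = ifelse(sex == "M", 11.3, 10.6), sd = 3)

# The coding matrix for a two-level factor: first level = +1, second level = -1
contr.sum(2)

# Request effect coding for this model only, without touching the global options
fit = lm(math ~ sex, contrasts = list(sex = "contr.sum"))
group_means = tapply(math, sex, mean)

coef(fit)                                      # intercept and effect-coded slope
mean(group_means)                              # matches the intercept
(group_means[["F"]] - group_means[["M"]]) / 2  # matches the slope
```

With unequal group sizes, the intercept is the average of the group means, not the mean of all observations.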
Now let's fit the model that tests for biological sex differences in the Math scores collected in the first semester.
model.01 = lm(Math01 ~ Sex, data = data_AcadAchiev)
Anova(model.01, type = 3)
## Anova Table (Type III tests)
##
## Response: Math01
##             Sum Sq  Df   F value  Pr(>F)
## (Intercept)  31987   1 2919.2627 < 2e-16 ***
## Sex             31   1    2.8625 0.09184 .
## Residuals     2904 265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Just to follow up, we can also perform a t-test using the sample data.
t.test(Math01 ~ Sex, data = data_AcadAchiev)
##
## Welch Two Sample t-test
##
## data: Math01 by Sex
## t = -1.6896, df = 261.96, p-value = 0.09229
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.4851991 0.1134431
## sample estimates:
## mean in group F mean in group M
##        10.60870        11.29457
The results are almost identical. Any idea why the slight difference? :)
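One likely reason: ‘t.test()’ runs Welch's test by default, which does not assume equal variances, whereas the ANOVA uses a pooled variance. The sketch below uses simulated data (the data frame ‘dat’ and its values are made up for illustration) to show that setting ‘var.equal = TRUE’ gives a t statistic whose square matches the ANOVA F value.

```r
# Simulated two-group data with unequal variances (hypothetical values)
set.seed(42)
sex = factor(rep(c("F", "M"), times = c(30, 45)))
dat = data.frame(
  Sex = sex,
  Math01 = rnorm(75,
                 mean = ifelse(sex == "M", 11.3, 10.6),
                 sd = ifelse(sex == "M", 3.5, 2.5))
)

# Welch's test (the default) versus the pooled-variance t-test
welch = t.test(Math01 ~ Sex, data = dat)
pooled = t.test(Math01 ~ Sex, data = dat, var.equal = TRUE)

# One-way ANOVA F statistic for the same comparison
f_value = anova(lm(Math01 ~ Sex, data = dat))[["F value"]][1]

# The squared pooled t equals F; the squared Welch t generally does not
c(welch_t_sq = unname(welch$statistic)^2,
  pooled_t_sq = unname(pooled$statistic)^2,
  anova_F = f_value)
```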
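The analyses above use only the observed cases, because ‘lm()’ and ‘t.test()’ drop any rows with missing values by default. To use multiple imputation instead, we first need to create the multiply imputed data sets.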
library(mice)
library(miceadds)
Create the imputed data sets
imp_data = mice(data_AcadAchiev, m = 40, seed = 142)
# This will create 40 imputed data sets to fill in the missing
# values from the data set 'data_AcadAchiev'
# If we set the seed value (Jon recommends this), then we will
# reproduce the results if we rerun the imputation
Perform the analysis on each of the imputed data sets with the ‘mi.anova()’ function from the ‘miceadds’ package.
mi.anova(mi.res = imp_data,
         formula = "Math01 ~ 1 + Sex",
         type = 3)
## Univariate ANOVA for Multiply Imputed Data (Type 3)
##
## lm Formula: Math01 ~ 1 + Sex
## R^2=0.0198
## ..........................................................................
## ANOVA Table
##                 SSQ df1      df2 F value  Pr(>F)   eta2 partial.eta2
## Sex        84.69521   1 6651.509  7.0105 0.00812 0.0198       0.0198
## Residual 4192.71493  NA       NA      NA      NA     NA           NA
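As a cross-check, the group difference can also be pooled as a regression coefficient with ‘with()’ and ‘pool()’ from the ‘mice’ package. The sketch below is not part of the original analysis; it uses simulated data (‘sim_data’ and everything derived from it are hypothetical) so that it runs on its own. The same two calls should work on ‘imp_data’ with the formula ‘Math01 ~ Sex’.

```r
library(mice)

# Simulated data with missing Math01 scores (hypothetical, for illustration)
set.seed(142)
n = 200
sim_data = data.frame(Sex = factor(sample(c("F", "M"), n, replace = TRUE)))
sim_data$Math01 = rnorm(n, mean = ifelse(sim_data$Sex == "M", 11.3, 10.6), sd = 3)
sim_data$Math01[sample(n, 40)] = NA

# Impute, fit the same regression in every imputed data set, then pool
sim_imp = mice(sim_data, m = 5, seed = 142, printFlag = FALSE)
sim_fit = with(sim_imp, lm(Math01 ~ Sex))
sim_pooled = summary(pool(sim_fit))

# One row per coefficient; the Sex row holds the pooled t-test
sim_pooled
```

The two approaches combine results across imputations differently, so the p-value for Sex here need not match the one from ‘mi.anova()’ exactly.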
Using the salary data set: