This document shows how to perform a t-test across multiply imputed data sets.

t-tests are a bit tricky to run on multiply imputed data with MICE in R. However, the ‘miceadds’ package includes a function, ‘mi.anova()’, that performs an analysis of variance on multiply imputed data sets and pools the results. We can capitalize on that approach because a two-group t-test (with equal variances assumed) is a special case of a one-way ANOVA.

Let’s begin by reading in the data set.

data_AcadAchiev = read.csv('/Users/jhelm/Desktop/data_AcadAchiev.csv')

Performing a t-test (via ANOVA)

Now we can perform a t-test (via ANOVA) using some of the variables from the Academic Achievement data set. We will need the car package, whose Anova() function calculates Type III sums of squares.

library(car)

As an initial step, we need to make sure we are using an effect coding strategy. This detail is specific to ANOVA, not multiple imputation.

options(contrasts = c('contr.sum', 'contr.poly'))
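
To make the effect of this option concrete, here is a minimal sketch comparing R's default treatment (dummy) coding with effect (sum-to-zero) coding for a two-level factor. The factor `sex` is a hypothetical toy vector used only for illustration. It is not taken from the Academic Achievement data set.

```r
# Default (treatment/dummy) coding for a two-level factor:
# the first level is the reference group (coded 0)
contr.treatment(2)

# Effect (sum-to-zero) coding: the two levels are coded 1 and -1,
# so in a one-way model the intercept is the unweighted mean of the group means
contr.sum(2)

# Hypothetical toy factor, used only for illustration
sex <- factor(c("F", "M", "M", "F"))
contrasts(sex) <- contr.sum(2)
contrasts(sex)
```

Setting `options(contrasts = ...)` applies the same coding to every factor in later model fits, so it does not have to be assigned factor by factor as in this sketch.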

Now let’s fit the model that tests for biological sex differences in the Math scores collected during the first semester.

model.01 = lm(Math01 ~ Sex, data = data_AcadAchiev)

Anova(model.01, type = 3)
## Anova Table (Type III tests)
## 
## Response: Math01
##             Sum Sq  Df   F value  Pr(>F)    
## (Intercept)  31987   1 2919.2627 < 2e-16 ***
## Sex             31   1    2.8625 0.09184 .  
## Residuals     2904 265                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Just to follow up, we can also perform a t-test directly on the same data.

t.test(Math01 ~ Sex, data = data_AcadAchiev)
## 
##  Welch Two Sample t-test
## 
## data:  Math01 by Sex
## t = -1.6896, df = 261.96, p-value = 0.09229
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.4851991  0.1134431
## sample estimates:
## mean in group F mean in group M 
##        10.60870        11.29457

The results are almost identical. Any idea why the slight difference? :)
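
As a hint, one likely source of the difference is visible in the output: t.test() defaults to Welch's test, which does not pool the group variances, whereas ANOVA does. The sketch below uses hypothetical simulated data (not the Academic Achievement data set) to show that the pooled-variance t-test and a one-way ANOVA agree, with F equal to t squared.

```r
# Hypothetical simulated data; not the Academic Achievement data set
set.seed(1)
sim <- data.frame(
  Sex    = factor(rep(c("F", "M"), times = c(60, 70))),
  Math01 = c(rnorm(60, mean = 10.6, sd = 3.0),
             rnorm(70, mean = 11.3, sd = 3.5))
)

# Pooled-variance (Student) t-test: assumes equal variances, like ANOVA
t_pooled <- t.test(Math01 ~ Sex, data = sim, var.equal = TRUE)

# One-way ANOVA on the same data
fit     <- lm(Math01 ~ Sex, data = sim)
f_value <- anova(fit)["Sex", "F value"]
f_pval  <- anova(fit)["Sex", "Pr(>F)"]

# With two groups, F equals t squared and the p-values coincide
all.equal(unname(t_pooled$statistic)^2, f_value)
all.equal(t_pooled$p.value, f_pval)

# Welch's test (the t.test() default) uses separate variances
# and adjusted degrees of freedom, so its p-value can differ slightly
t_welch <- t.test(Math01 ~ Sex, data = sim)
c(pooled = t_pooled$p.value, welch = t_welch$p.value)
```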

Performing a t-test (via ANOVA) with Multiply Imputed Data Sets

First, we need to create the multiply imputed data sets.

library(mice)
library(miceadds)

Create the imputed data sets

imp_data = mice(data_AcadAchiev, m = 40, seed = 142)

    # This will create 40 imputed data sets to fill in the missing
    # values from the data set 'data_AcadAchiev'

    # If we set the seed value (Jon recommends this), then we will
    # reproduce the results if we rerun the imputation 
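
The following sketch illustrates the point about the seed. It assumes hypothetical simulated data (the data frame `sim` and its missingness pattern are made up for illustration) and uses a small `m` so that it runs quickly.

```r
library(mice)

# Hypothetical simulated data with missing Math01 values;
# this is not the Academic Achievement data set
set.seed(1)
sim <- data.frame(
  Sex    = factor(sample(c("F", "M"), 100, replace = TRUE)),
  Math01 = rnorm(100, mean = 11, sd = 3),
  Math02 = rnorm(100, mean = 12, sd = 3)
)
sim$Math01[sample(100, 20)] <- NA

# Two runs with the same seed (printFlag = FALSE hides the iteration log)
imp_a <- mice(sim, m = 5, seed = 142, printFlag = FALSE)
imp_b <- mice(sim, m = 5, seed = 142, printFlag = FALSE)

# Compare the first completed data set from each run
identical(complete(imp_a, 1), complete(imp_b, 1))

# Check whether any missing values remain after imputation
anyNA(complete(imp_a, 1))
```

The complete() function extracts a single completed data set from the imputation object, which is a convenient way to inspect what mice() produced before running any analysis.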

Perform the analysis on each of the imputed data sets and pool the results with the ‘mi.anova()’ function from the ‘miceadds’ package.

mi.anova(mi.res = imp_data, 
        formula = "Math01 ~ 1 + Sex", 
        type = 3)
## Univariate ANOVA for Multiply Imputed Data (Type 3)  
## 
## lm Formula:  Math01 ~ 1 + Sex
## R^2=0.0198 
## ..........................................................................
## ANOVA Table 
##                 SSQ df1      df2 F value  Pr(>F)   eta2 partial.eta2
## Sex        84.69521   1 6651.509  7.0105 0.00812 0.0198       0.0198
## Residual 4192.71493  NA       NA      NA      NA     NA           NA
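
As a cross-check, the same comparison can also be sketched as a regression that is fit in each imputed data set and then pooled with Rubin's rules, using with() and pool() from mice. This is an alternative route rather than the approach used above, and its results need not match mi.anova() exactly, because mi.anova() is documented as combining F values with the D2 statistic. The sketch assumes hypothetical simulated data and a recent version of mice (3.0 or later), in which summary(pool()) returns a data frame with the column names used below.

```r
library(mice)

# Hypothetical simulated data; not the Academic Achievement data set
set.seed(2)
sim <- data.frame(
  Sex    = factor(sample(c("F", "M"), 150, replace = TRUE)),
  Math02 = rnorm(150, mean = 12, sd = 3)
)
sim$Math01 <- 0.5 * sim$Math02 + rnorm(150, mean = 5, sd = 2)
sim$Math01[sample(150, 30)] <- NA

# Small m so the sketch runs quickly
imp_sim <- mice(sim, m = 5, seed = 142, printFlag = FALSE)

# Fit the same regression in every imputed data set
fits <- with(imp_sim, lm(Math01 ~ Sex))

# Combine the estimates across data sets with Rubin's rules
pooled <- summary(pool(fits))
pooled

# The second row holds the Sex term; its statistic is the pooled t value
pooled[2, c("estimate", "std.error", "statistic", "df", "p.value")]
```

Note that the size of the Sex estimate depends on the contrast coding in effect: under effect coding it is half the difference between the group means, while under the default treatment coding it is the full difference. The t statistic and p-value are the same either way.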

Practice Problem

Using the salary data set:

  1. Perform a t-test on salary using biological sex as a predictor
  2. Create multiply imputed data sets (use 40 imputations, set the seed equal to 806)
  3. Perform the t-test across all data sets
  4. Combine the results across data sets