This document shows how to perform an ANOVA across multiply imputed data sets.
The ‘miceadds’ package provides a built-in function, ‘mi.anova()’, that runs an analysis of variance on each multiply imputed data set and pools the results.
Let’s begin by reading in the data set.
data_AcadAchiev = read.csv('/Users/jhelm/Desktop/data_AcadAchiev.csv')
We will need the car package to calculate type 3 sums of squares.
library(car)
As an initial step, we need to make sure we are using an effect coding strategy (sum-to-zero contrasts). This detail is specific to ANOVA, not multiple imputation.
options(contrasts = c('contr.sum', 'contr.poly'))
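To see what this option changes, the following sketch prints the contrast matrices for a hypothetical three-level factor under R's default treatment coding and under the effect (sum-to-zero) coding selected above. The level names are made up for illustration and are not taken from the data set.

```r
# Hypothetical three-level factor (level names are illustrative only)
guardian_demo = factor(c("Mother", "Father", "Other"))

# Default treatment (dummy) coding: the first level is the reference
treatment_codes = contr.treatment(levels(guardian_demo))

# Effect (sum-to-zero) coding: each column sums to zero across levels
effect_codes = contr.sum(levels(guardian_demo))

print(treatment_codes)
print(effect_codes)

# Column sums of the effect-coded matrix
print(colSums(effect_codes))
```

With treatment coding, a type 3 test of a main effect in a model with an interaction is evaluated at the reference level of the other factor, which is usually not the hypothesis of interest. Sum-to-zero coding avoids that.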
Now let's fit the model that tests for guardian differences in the Math scores collected during the first semester. Note that lm() drops rows with missing values by default, so this fit uses complete cases only.
model.01 = lm(Math01 ~ Guardian, data = data_AcadAchiev)
Anova(model.01, type = 3)
## Anova Table (Type III tests)
##
## Response: Math01
##              Sum Sq  Df  F value  Pr(>F)
## (Intercept) 10295.6   1 948.5357 < 2e-16 ***
## Guardian       69.5   2   3.2033 0.04221 *
## Residuals    2865.5 264
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
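As a quick check on how to read this table, the sketch below recomputes the F value for Guardian as a ratio of mean squares and recovers the p-value from the F distribution. All numbers are copied from the printed output above, so the recomputed F differs slightly from the printed one because the sums of squares are rounded.

```r
# Values copied from the Anova() table above
ss_effect = 69.5
ss_residual = 2865.5
df_effect = 2
df_residual = 264
f_printed = 3.2033

# F is the effect mean square divided by the residual mean square
f_from_ss = (ss_effect / df_effect) / (ss_residual / df_residual)

# Upper-tail probability of the F distribution at the printed F value
p_value = pf(f_printed, df_effect, df_residual, lower.tail = FALSE)

print(round(f_from_ss, 2))
print(round(p_value, 5))
```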
Next, we need to create the multiply imputed data sets.
library(mice)
library(miceadds)
Create the imputed data sets
imp_data = mice(data_AcadAchiev, m = 40, seed = 142)
# This will create 40 imputed data sets to fill in the missing
# values from the data set 'data_AcadAchiev'
# If we set the seed value (Jon recommends this), then we will
# reproduce the results if we rerun the imputation
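Before running the analysis, it can help to look inside the imputed object. The sketch below is a minimal illustration that uses the small `nhanes` data set shipped with ‘mice’ instead of the academic achievement data, so it runs on its own. The object names `imp_demo`, `first_completed`, and `fits_demo` are made up for this example.

```r
library(mice)

# Small example data set from the mice package (contains missing values)
imp_demo = mice(nhanes, m = 5, seed = 142, printFlag = FALSE)

# Number of imputed data sets stored in the object
print(imp_demo$m)

# Extract the first completed data set and confirm no values are missing
first_completed = complete(imp_demo, 1)
print(anyNA(first_completed))

# Fit the same linear model on every completed data set
fits_demo = with(imp_demo, lm(bmi ~ age))
print(length(fits_demo$analyses))
```

The same calls apply to `imp_data`; for example, `complete(imp_data, 1)` returns the first of the 40 completed data sets.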
Perform the analysis on each of the imputed data sets with the ‘mi.anova()’ function from the ‘miceadds’ library.
mi.anova(mi.res = imp_data,
formula = "Math01 ~ 1 + Guardian",
type = 3)
## Univariate ANOVA for Multiply Imputed Data (Type 3)
##
## lm Formula: Math01 ~ 1 + Guardian
## R^2=0.0147
## ..........................................................................
## ANOVA Table
##                 SSQ df1     df2 F value  Pr(>F)    eta2 partial.eta2
## Guardian   62.95282   2 58409.1  2.7323 0.06508 0.01472      0.01472
## Residual 4214.45732  NA      NA      NA      NA      NA           NA
We will need the car package to calculate type 3 sums of squares.
library(car)
As an initial step, we need to make sure we are using an effect coding strategy (sum-to-zero contrasts). This detail is specific to ANOVA, not multiple imputation.
options(contrasts = c('contr.sum', 'contr.poly'))
Now let's fit the model that tests for relationship status, biological sex, and relationship status by biological sex differences in the Math scores collected during the first semester.
model.01 = lm(Math01 ~ Rel_status + Sex + Rel_status * Sex, data = data_AcadAchiev)
Anova(model.01, type = 3)
## Anova Table (Type III tests)
##
## Response: Math01
##                 Sum Sq  Df   F value  Pr(>F)
## (Intercept)    28064.8   1 2562.9967 < 2e-16 ***
## Rel_status        13.2   1    1.2056 0.27320
## Sex               35.3   1    3.2277 0.07355 .
## Rel_status:Sex     8.6   1    0.7840 0.37673
## Residuals       2879.9 263
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As before, we need to create the multiply imputed data sets.
library(mice)
library(miceadds)
Create the imputed data sets
imp_data = mice(data_AcadAchiev, m = 40, seed = 142)
# This will create 40 imputed data sets to fill in the missing
# values from the data set 'data_AcadAchiev'
# If we set the seed value (Jon recommends this), then we will
# reproduce the results if we rerun the imputation
Perform the analysis on each of the imputed data sets with the ‘mi.anova()’ function from the ‘miceadds’ library.
mi.anova(mi.res = imp_data,
formula = "Math01 ~ 1 + Rel_status + Sex + Rel_status * Sex",
type = 3)
## Univariate ANOVA for Multiply Imputed Data (Type 3)
##
## lm Formula: Math01 ~ 1 + Rel_status + Sex + Rel_status * Sex
## R^2=0.0179
## ..........................................................................
## ANOVA Table
##                       SSQ df1        df2 F value  Pr(>F)    eta2
## Rel_status        5.84013   1  19297.050  0.4583 0.49842 0.00137
## Sex              69.98155   1   9003.745  5.8390 0.01569 0.01642
## Rel_status:Sex    0.60642   1 105463.481  0.0333 0.85525 0.00014
## Residual       4186.17380  NA         NA      NA      NA      NA
##                partial.eta2
## Rel_status          0.00139
## Sex                 0.01644
## Rel_status:Sex      0.00014
## Residual                 NA
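The two effect-size columns can be reproduced from the SSQ column. The sketch below uses only the numbers printed in the table above and assumes the usual definitions: eta squared divides an effect's SSQ by the sum of all SSQ entries in the table, and partial eta squared divides it by the effect's SSQ plus the residual SSQ. These definitions reproduce the printed values, but they are inferred from the output rather than taken from the ‘miceadds’ source code.

```r
# Sums of squares copied from the mi.anova() table above
ssq = c(Rel_status = 5.84013, Sex = 69.98155, "Rel_status:Sex" = 0.60642)
ssq_residual = 4186.17380

# eta squared: effect SSQ over the sum of all SSQ entries in the table
eta2 = ssq / (sum(ssq) + ssq_residual)

# partial eta squared: effect SSQ over (effect SSQ + residual SSQ)
partial_eta2 = ssq / (ssq + ssq_residual)

print(round(cbind(eta2, partial_eta2), 5))
```

Summing the `eta2` column gives about 0.0179, which agrees with the R^2 reported at the top of the output.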