Regression with MICE

This document shows how to fit a regression model across multiply imputed data sets using the mice package in R.

Let’s begin by reading in the data set.

data_AcadAchiev = read.csv('/Users/jhelm/Desktop/data_AcadAchiev.csv')

Perform a Regression

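We can start by fitting the regression to the observed data. By default, lm() drops every row that has a missing value on any variable in the model (listwise deletion).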

model.01 = lm(Math02 ~ 1 + Math01 + Portu01, data = data_AcadAchiev)
summary(model.01)

## 
## Call:
## lm(formula = Math02 ~ 1 + Math01 + Portu01, data = data_AcadAchiev)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.4305  -0.8456  -0.0402   0.9966   4.5778 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.38681    0.77343  -0.500   0.6179    
## Math01       0.86714    0.05984  14.490   <2e-16 ***
## Portu01      0.14116    0.07807   1.808   0.0731 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.672 on 122 degrees of freedom
##   (257 observations deleted due to missingness)
## Multiple R-squared:  0.7629, Adjusted R-squared:  0.759 
## F-statistic: 196.3 on 2 and 122 DF,  p-value: < 2.2e-16
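
The output above reports 122 residual degrees of freedom and 257 deleted observations, so only 125 of the 382 rows were used (122 plus the 3 estimated coefficients). A minimal sketch of how to check this kind of loss before modeling is shown below. It uses a small simulated data frame, `toy`, with made-up values and hypothetical variable names (`y`, `x1`, `x2`), not the academic achievement data.

```r
# Toy data (hypothetical values, NOT the data_AcadAchiev file)
set.seed(1)
toy = data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
toy$y[1:5]  = NA   # rows 1-5 are missing the outcome
toy$x1[4:8] = NA   # rows 4-8 are missing a predictor

# Missing values per variable
colSums(is.na(toy))

# Rows with no missing values on the model variables
n_complete = sum(complete.cases(toy[, c("y", "x1", "x2")]))

# lm() silently drops the incomplete rows (default na.action = na.omit)
fit = lm(y ~ x1 + x2, data = toy)
n_used    = nobs(fit)            # rows actually used in the fit
n_dropped = nrow(toy) - n_used   # rows lost to listwise deletion
```

Running the same `colSums(is.na(...))` and `complete.cases()` checks on `data_AcadAchiev` would show where the 257 dropped rows come from.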

Performing Regression across Multiply Imputed Data Sets

First, we need to create the multiply imputed data sets.

library(mice)
library(miceadds)

Create the imputed data sets

imp_data = mice(data_AcadAchiev, m = 40, seed = 142)

    # This will create 40 imputed data sets to fill in the missing
    # values from the data set 'data_AcadAchiev'

    # If we set the seed value (Jon recommends this), then we will
    # reproduce the results if we rerun the imputation
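
The effect of the seed can be checked directly. The sketch below is an illustration only: it uses the small `nhanes` data set that ships with mice rather than `data_AcadAchiev`, and only 3 imputations to keep it fast. The object names `imp_a` and `imp_b` are made up for this example.

```r
library(mice)

# Run the same imputation twice with the same seed
# (printFlag = FALSE suppresses the iteration log)
imp_a = mice(nhanes, m = 3, seed = 142, printFlag = FALSE)
imp_b = mice(nhanes, m = 3, seed = 142, printFlag = FALSE)

# Extract the first completed data set from each run
first_a = mice::complete(imp_a, 1)
first_b = mice::complete(imp_b, 1)

# Same seed, same imputed values
same_values = identical(first_a, first_b)

# The completed data set should contain no missing values
any_missing = anyNA(first_a)
```

The same `complete()` call works on `imp_data`, for example `complete(imp_data, 1)` to inspect the first of the 40 imputed data sets.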

Perform the analysis on each of the imputed data sets. We can use the ‘with()’ function.

results = with(imp_data, lm(Math02 ~ Math01 + Portu01))

Now we can pool the estimates across these analyses. The pool() function combines the 40 sets of results using Rubin's rules.

summary(pool(results))

##               estimate  std.error statistic        df    p.value
## (Intercept) -0.8441735 0.53849558 -1.567652 120.98786 0.11957422
## Math01       0.9246537 0.04768383 19.391349  75.45568 0.00000000
## Portu01      0.1228634 0.05989967  2.051154  79.47602 0.04354663
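
To make the pooling step less of a black box, the sketch below recomputes the pooled estimates and standard errors by hand using Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. It is an illustration under assumptions: it uses the `nhanes` data set that ships with mice and a model (`bmi ~ age + chl`) chosen only for the example, and it does not reproduce the degrees of freedom or p-values.

```r
library(mice)

# Impute and fit the model on each imputed data set
imp  = mice(nhanes, m = 5, seed = 142, printFlag = FALSE)
fits = with(imp, lm(bmi ~ age + chl))
m    = length(fits$analyses)   # number of imputations

# One column per imputation: coefficients and their sampling variances
est = sapply(fits$analyses, coef)
u   = sapply(fits$analyses, function(f) diag(vcov(f)))

# Rubin's rules
q_bar     = rowMeans(est)              # pooled estimate
u_bar     = rowMeans(u)                # within-imputation variance
b         = apply(est, 1, var)         # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b    # total variance

by_hand = data.frame(estimate = q_bar, std.error = sqrt(total_var))

# Compare with the values reported by pool()
pooled = summary(pool(fits))
```

The `estimate` and `std.error` columns of `by_hand` should match those in `pooled`. Larger between-imputation variance (more uncertainty about the missing values) widens the pooled standard errors.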

Practice Problem

Using the salary data set:

1. Perform a regression analysis that predicts salary from number of publications and years on the job.
2. Create multiply imputed data sets (use 40 imputations and set the seed equal to 806).
3. Perform the same regression analysis on each of the imputed data sets.
4. Pool the results across the imputed data sets.

Jonathan Helm

5/17/2019
