Multiple imputation can be performed with several different packages in R. In this workshop, we will focus on the ‘mice’ package. MICE stands for ‘Multiple Imputation by Chained Equations’, a general approach that imputes each incomplete variable in turn, using a model based on the other variables in the data set. We first install the packages we need.
install.packages('mice')
install.packages('miceadds')
install.packages('car')
We can load these packages, although we may not use all of them in this document.
library(mice)
library(miceadds)
library(car)
Let’s begin by reading in one of the example data sets, the academic achievement data. Change the file path to wherever the file is saved on your computer.
data_AcadAchiev = read.csv('/Users/jhelm/Desktop/data_AcadAchiev.csv')
Let’s take a look at the first six rows of the data set:
head(data_AcadAchiev)
## ID Math01 Math02 Math03 Portu01 Portu02 Portu03 Sex Rel_status Guardian
## 1 1 7 NA NA 13 13 13 F no mother
## 2 2 8 NA 5 13 NA 11 F yes mother
## 3 3 14 13 13 NA 13 12 F no mother
## 4 4 10 9 NA 10 11 NA F no mother
## 5 5 10 10 10 13 13 13 F yes other
## 6 6 NA NA 11 11 NA NA F no mother
## Reason
## 1 home
## 2 reputation
## 3 reputation
## 4 course
## 5 reputation
## 6 course
We can see that many of the variables have missing data.
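To put a number on how much is missing, one option is to count the missing values in each column. The sketch below uses a small toy data frame (`toy_data`, a hypothetical name) that copies only the ID and Math columns from the six rows printed above, so it runs without the original file.

```r
# Hypothetical toy data frame copying a few columns from the rows shown above
toy_data <- data.frame(
  ID     = 1:6,
  Math01 = c(7, 8, 14, 10, 10, NA),
  Math02 = c(NA, NA, 13, 9, 10, NA),
  Math03 = c(NA, 5, 13, NA, 10, 11)
)

# Number of missing values in each column
n_missing <- colSums(is.na(toy_data))
print(n_missing)

# Proportion of missing values in each column
prop_missing <- colMeans(is.na(toy_data))
print(round(prop_missing, 2))
```

The same two calls should work on the full data set, for example `colSums(is.na(data_AcadAchiev))`.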
We can use the ‘mice()’ function to create imputed data sets.
imp_data = mice(data_AcadAchiev, m = 40, seed = 142)
# This will create 40 imputed data sets to fill in the missing
# values from the data set 'data_AcadAchiev'
# If we set the seed value (Jon recommends this), then we will
# reproduce the results if we rerun the imputation
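As a quick check of what the seed does, the sketch below runs the same imputation twice with the same seed and compares the results. It uses `nhanes`, a small example data set that ships with the mice package, rather than the workshop data, and the object names `imp_a` and `imp_b` are made up for illustration. The `printFlag = FALSE` argument only silences the iteration log.

```r
library(mice)

# 'nhanes' is a small example data set included with the mice package
imp_a <- mice(nhanes, m = 2, seed = 142, printFlag = FALSE)
imp_b <- mice(nhanes, m = 2, seed = 142, printFlag = FALSE)

# With the same seed, the first completed data sets should match exactly
same_result <- identical(complete(imp_a, 1), complete(imp_b, 1))
print(same_result)
```

Without a seed, two runs would generally fill in different values, which makes results harder to reproduce later.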
We can view one of the imputations using the ‘complete()’ function. I’ll nest this inside the ‘head()’ function so we only see the first six rows of the output.
head( complete(imp_data, 1))
## ID Math01 Math02 Math03 Portu01 Portu02 Portu03 Sex Rel_status Guardian
## 1 1 7 7 8 13 13 13 F no mother
## 2 2 8 7 5 13 11 11 F yes mother
## 3 3 14 13 13 14 13 12 F no mother
## 4 4 10 9 9 10 11 12 F no mother
## 5 5 10 10 10 13 13 13 F yes other
## 6 6 12 10 11 11 10 11 F no mother
## Reason
## 1 home
## 2 reputation
## 3 reputation
## 4 course
## 5 reputation
## 6 course
# The complete() function returns a completed data set
# (the observed values plus the imputed values)
# The second argument (the '1' in this case) indicates which imputed data set
# we would like to view. Since we imputed 40, we can choose values 1-40.
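Two further checks may be useful here: confirming that a completed data set has no missing values left, and stacking all of the imputed data sets into one long data frame. The sketch below again uses the `nhanes` example data from the mice package instead of the workshop data, and `imp_small`, `first_complete`, and `long_data` are hypothetical names.

```r
library(mice)

# Small illustration with the 'nhanes' data that ships with mice
imp_small <- mice(nhanes, m = 3, seed = 142, printFlag = FALSE)

# Extract the first completed data set and check for remaining NAs
first_complete <- complete(imp_small, 1)
print(anyNA(first_complete))

# Stack all completed data sets on top of each other;
# the '.imp' column records which imputation each row came from
long_data <- complete(imp_small, action = "long")
print(table(long_data$.imp))
```

The long format can be convenient for plotting or for comparing imputed values across imputations.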
Now that we have the imputed data sets, we can go ahead and perform various analyses. These will be completed in different R Markdown files.