Steps for conducting statistical tests:
1. Test assumptions
2. Run your tests
3. Interpret the results
4. Visualize (optional)
Boxplots are useful for getting an idea of the variability of your data by group. They DO NOT show you group means - the measures of central tendency captured are medians.
The outer edges of the boxes are the lower (Q1) and upper (Q3) quartiles (in other words the 25th and 75th percentiles, which enclose the middle 50% of the data), while the whiskers show the outer ends of the normal range: by default in R, they reach the most extreme observations that lie within 1.5 times the interquartile range of the box edges. Any values beyond the whiskers are drawn as individual dots.
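As a rough check on the description above, the numbers behind a single box can be computed by hand. This is a minimal sketch using the 4-cylinder cars from the built-in mtcars data; the variable names (x, q, iqr, and so on) are only illustrative. Note that base R's boxplot() uses "hinges" rather than quantile(), so the box edges can differ slightly in some samples.

```r
# mpg values for 4-cylinder cars from the built-in mtcars data
x <- mtcars$mpg[mtcars$cyl == 4]

# Box edges and center line: 25th, 50th and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.50, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Whisker ends: most extreme observations within 1.5 * IQR of the box
lower_whisker <- min(x[x >= q[["25%"]] - 1.5 * iqr])
upper_whisker <- max(x[x <= q[["75%"]] + 1.5 * iqr])

# Anything beyond the whiskers would be drawn as an individual dot
outliers <- x[x < lower_whisker | x > upper_whisker]

# boxplot.stats() returns the five numbers base R actually draws
bs <- boxplot.stats(x)
bs$stats
```

The third element of bs$stats is the median, which is the line drawn inside the box.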
Mean plots are analogous to bar charts of means, except they show the raw data around each mean. The error bars can represent the SE of the means or CIs around the estimates of each mean. It is important to know (and to state) which statistic is being represented, because 95% CI bars are roughly twice as long as bars showing plus or minus one SE.
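To make the difference between the two kinds of error bar concrete, here is a small sketch that computes both for one group (the 6-cylinder cars in mtcars). It assumes a t-based 95% CI, which is one common choice; the object names are illustrative only.

```r
# mpg values for 6-cylinder cars from the built-in mtcars data
x <- mtcars$mpg[mtcars$cyl == 6]
n <- length(x)
m <- mean(x)

# Standard error of the mean
se <- sd(x) / sqrt(n)

# Error bar showing the mean plus or minus one SE
se_bar <- c(lower = m - se, upper = m + se)

# Error bar showing a 95% CI, using the t distribution with n - 1 df
t_crit <- qt(0.975, df = n - 1)
ci_bar <- c(lower = m - t_crit * se, upper = m + t_crit * se)

se_bar
ci_bar
```

The CI computed this way should match what t.test(x)$conf.int reports for the same data.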
Use the mtcars data set as an example. It ships with base R (in the datasets package), so it is available even before ggplot2 is loaded.
Call the data:
library(ggplot2)
data("mtcars")
We can use the hist() function to do a quick and dirty visualization of mpg.
hist(mtcars$mpg)
We can also generate box plots using base R or ggplot:
boxplot(mtcars$mpg~mtcars$cyl, ylab="Miles per Gallon", xlab="Cylinder Type")
#using ggplot
#NOTE --- you need to add "group=" (or wrap the variable in factor()) if your groups are numerically coded
ggplot(mtcars, aes(group=cyl, x=cyl, y=mpg))+
geom_boxplot(fill="gray") +
labs(title ="MPG by Cylinder Number", x= "Cylinder Type", y = "Miles per Gallon")+
theme_classic()
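An alternative to supplying group= is to convert the numeric code to a factor inside aes(), so ggplot treats the cylinder counts as discrete categories. The following is a sketch of that approach, not the only way to do it; the object name p is just for illustration.

```r
library(ggplot2)

# factor(cyl) makes the x axis discrete, so no group= is needed
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(fill = "gray") +
  labs(title = "MPG by Cylinder Number",
       x = "Cylinder Type", y = "Miles per Gallon") +
  theme_classic()
p
```

With a factor on the x axis, only the three cylinder values that occur in the data (4, 6, 8) appear as tick labels.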
We can also create a Means plot:
#using base R
#pch= sets the shape of the data points in the plot (19 = solid circle)
#note: this draws the raw mpg values by cylinder, not the group means
plot(x=mtcars$cyl, y=mtcars$mpg, ylim=c(0,35), xlim=c(4,8),
pch=19)
#Getting the CIs for the means in base R is pretty brutal. Don't bother. Use ggplot.
#We need the mean and standard error of mpg within each cylinder group to compute the CIs.
#1.96 is the normal-approximation multiplier for a 95% CI.
means <- aggregate(mpg ~ cyl, data=mtcars, FUN=mean)
sds <- aggregate(mpg ~ cyl, data=mtcars, FUN=sd)
ns <- aggregate(mpg ~ cyl, data=mtcars, FUN=length)
means$se <- sds$mpg / sqrt(ns$mpg)
ggplot(means, aes(x=cyl, y=mpg)) +
geom_point(shape=19, size=3)+
geom_errorbar(aes(ymin=mpg-1.96*se, ymax=mpg+1.96*se), width=0.3)
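As an alternative sketch, ggplot2 can compute the group means and error bars itself with stat_summary(), which also makes it easy to show the raw data around each mean. This assumes the 1.96 normal-approximation multiplier applied to the SE returned by ggplot2's mean_se(); the jitter width, colors and labels are illustrative choices.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # raw observations, nudged sideways only so overlapping points are visible
  geom_jitter(width = 0.1, height = 0, color = "gray50") +
  # group mean with bars at mean +/- 1.96 * SE (approximate 95% CI)
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "pointrange", color = "black") +
  labs(title = "Mean MPG by Cylinder Number (approx. 95% CI)",
       x = "Cylinder Type", y = "Miles per Gallon") +
  theme_classic()
p
```

Changing mult to 1 would turn the bars into plus or minus one SE, which is why the title states what the bars represent.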