HW2 - General Notes and Questions:


Steps for conducting statistical tests:
1. Test assumptions
2. Run your tests
3. Interpret the results
4. Visualize (optional)
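As a rough sketch of how these steps might look in R, the snippet below walks through them with the mtcars data. The choice of a one-way ANOVA of mpg by cylinder group, and the names cyl_f, fit, normality, homogeneity and p_value, are assumptions made purely for illustration; the right test and assumption checks depend on your question and data.

```r
# Illustrative only: a one-way ANOVA is assumed here as the example test
data("mtcars")
mtcars$cyl_f <- factor(mtcars$cyl)   # treat cylinder count as a grouping factor

# 1. Test assumptions
fit <- aov(mpg ~ cyl_f, data = mtcars)
normality   <- shapiro.test(residuals(fit))               # normality of residuals
homogeneity <- bartlett.test(mpg ~ cyl_f, data = mtcars)  # equal variances across groups

# 2. Run your test
anova_table <- summary(fit)

# 3. Interpret the results: pull out the p-value for the group effect
p_value <- anova_table[[1]][["Pr(>F)"]][1]
group_effect_detected <- p_value < 0.05

# 4. Visualize (optional), e.g. boxplot(mpg ~ cyl_f, data = mtcars)
```

Printing normality, homogeneity and anova_table shows the full output for each step.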

Boxplots

Boxplots are useful for getting an idea of the variability of your data by group. They DO NOT show you group means - the measures of central tendency captured are medians.

The edges of the box are the first and third quartiles, Q1 and Q3 (the 25th and 75th percentiles), so the box spans the middle 50% of the data, and the line inside the box is the median. The whiskers extend to the most extreme observations that lie within 1.5 times the interquartile range (IQR) of the box edges; this is the default in both boxplot() and geom_boxplot(). Any values beyond the whiskers are drawn as individual dots.
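To see the numbers behind a boxplot, one option is to ask boxplot() for its statistics without drawing anything. This is a minimal sketch; the object names box_info and box_table are made up for the example.

```r
data("mtcars")

# plot = FALSE returns the statistics that would be drawn, without plotting
box_info <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group; rows are the five values drawn for each box
box_table <- box_info$stats
rownames(box_table) <- c("lower whisker", "Q1 (lower hinge)", "median",
                         "Q3 (upper hinge)", "upper whisker")
colnames(box_table) <- box_info$names

box_table      # box and whisker positions by cylinder group
box_info$out   # any points beyond the whiskers (drawn as dots)
```

Note that boxplot() uses hinges for the box edges, which can differ slightly from the values returned by quantile().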

Mean plots

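Mean plots are analogous to bar charts of means, except that they can also show the raw data around each mean. The error bars can represent either the standard error (SE) of each mean or a confidence interval (CI) around each estimate. It is important to know which statistic is being represented, because a 95% CI is roughly twice as wide as an interval of plus or minus one SE.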
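The difference between SE bars and CI bars is easier to see with numbers. The sketch below computes both for mpg within each cylinder group of mtcars; the 95% level, the use of the t distribution, and the name summary_tbl are choices made for this example.

```r
data("mtcars")

# Per-group sample size, mean, and standard error of mpg
grp_n    <- tapply(mtcars$mpg, mtcars$cyl, length)
grp_mean <- tapply(mtcars$mpg, mtcars$cyl, mean)
grp_se   <- tapply(mtcars$mpg, mtcars$cyl, function(x) sd(x) / sqrt(length(x)))

# Critical t value for a 95% CI, using each group's degrees of freedom
t_crit <- qt(0.975, df = grp_n - 1)

summary_tbl <- data.frame(
  cyl      = names(grp_mean),
  n        = as.vector(grp_n),
  mean     = as.vector(grp_mean),
  se       = as.vector(grp_se),
  ci_lower = as.vector(grp_mean - t_crit * grp_se),
  ci_upper = as.vector(grp_mean + t_crit * grp_se)
)
summary_tbl
```

Error bars drawn from mean - se to mean + se will be much narrower than bars drawn from ci_lower to ci_upper, which is why a plot should always say which one it shows.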


Examples

Use the mtcars data set as an example. It ships with base R (in the datasets package), so it is available whether or not ggplot2 is loaded.

Call the data:

library(ggplot2)
data("mtcars")

Histogram

We can use the hist() function to do a quick and dirty visualization of mpg.

hist(mtcars$mpg)

Boxplot

We can also generate box plots using base R or ggplot:

boxplot(mpg ~ cyl, data=mtcars, ylab="Miles per Gallon", xlab="Cylinder Type")

#using ggplot
#NOTE --- you need to add "group=" if your groups are numerically coded

#x is the cylinder variable and y is mpg, so the axis labels must match
ggplot(mtcars, aes(group=cyl, x=cyl, y=mpg))+ 
  geom_boxplot(fill="gray") + 
  labs(title ="MPG by Cylinder Number", x= "Cylinder Type", y = "Miles per Gallon")+
  theme_classic()

Means plot

We can also create a Means plot:

#using base R
#pch =""  <- defines the shape of the data points in the plot 
plot(x=mtcars$cyl, y=mtcars$mpg, ylim=c(0,35), xlim=c(4,8),
     pch=19)

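#Getting the CIs for the means in base R is pretty brutal. Don't bother. Use ggplot.

#We need the mean and standard error of miles per gallon within each cylinder group in order to compute CIs.
#(Using the overall mean and SE would draw the same error bar for every group.)

grp_mean <- tapply(mtcars$mpg, mtcars$cyl, mean)
grp_se <- tapply(mtcars$mpg, mtcars$cyl, function(x) sd(x) / sqrt(length(x)))
means <- data.frame(cyl=as.numeric(names(grp_mean)), mpg=as.vector(grp_mean), se=as.vector(grp_se))

#mean +/- 1.96*SE is an approximate 95% CI (normal approximation)
ggplot(means, aes(x=cyl, y=mpg)) +
  geom_point(shape=19, size=3)+
  geom_errorbar(aes(ymin=mpg-1.96*se, ymax=mpg+1.96*se), width=0.3)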