– use a multiple linear regression. You can do this stepwise if you'd like, to describe how each predictor improves the model fit (see the sketch just below).
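A minimal sketch of a stepwise fit, assuming the same built-in cherry tree data used in the lab example below and Volume as the response (your own question may use different variables):
data(trees)
full <- lm(Volume ~ Girth + Height, data = trees) #start from the full model
step(full, direction = "backward") #step() drops terms whenever doing so lowers AIC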
Model Fitting using the cherry trees example from Lab 7
How does each predictor variable influence cherry tree volume?
Calculate a series of linear regressions to explain it.
library(boot) #note: the trees data set actually ships with base R's datasets package, so boot is not strictly required here
library(ggplot2)
library(dplyr)
data(trees)
#compute a girth:height ratio
trees <- trees %>% mutate(GH_ratio = Girth/Height )
head(trees)
## Girth Height Volume GH_ratio
## 1 8.3 70 10.3 0.1185714
## 2 8.6 65 10.3 0.1323077
## 3 8.8 63 10.2 0.1396825
## 4 10.5 72 16.4 0.1458333
## 5 10.7 81 18.8 0.1320988
## 6 10.8 83 19.7 0.1301205
#before running the model, check the predictors for collinearity to decide which model to start with.
#then fit candidate models and compare their AIC values, along these lines
#(Height must be capitalized; Girth stands in for the undefined "lc" -- runnable version below):
#model1 <- lm(Volume ~ Height + GH_ratio + Girth, data = trees)
#AIC(model1)
#model2 <- lm(Volume ~ Height + GH_ratio, data = trees)
#AIC(model2)
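A runnable version of the comparison sketched above. This assumes Girth is the intended third predictor (the original "lc" is not a column in trees), and uses a simple correlation matrix as the collinearity check.
cor(trees[, c("Girth", "Height", "GH_ratio")]) #pairwise correlations among the candidate predictors
model1 <- lm(Volume ~ Height + GH_ratio + Girth, data = trees) #full model
model2 <- lm(Volume ~ Height + GH_ratio, data = trees) #reduced model
AIC(model1, model2) #lower AIC = better fit after penalizing the extra parameter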
This question involves negative log likelihood.
Check out Josh Starmer's StatQuest channel on YouTube for an explanation of maximum likelihood. Maximum likelihood begins to take us into Bayesian territory, where we calculate the probability of a model given the data (rather than the frequentist approach of finding the probability of the data given a model, typically the null model).
Maximum likelihood is similar to bootstrapping in that the analysis involves fitting many candidate models (often with multiple parameters) and seeing which one fits best, so it can use up a good deal of computing power.
What is negative log likelihood (NLL)?
Most optimization routines are set up to search for minimum values rather than maximum values, so we flip the sign: minimizing the negative log likelihood is the same as maximizing the likelihood.
Raw likelihood values also become vanishingly close to zero, too small for computers to handle reliably, so we work with the log of the likelihood, which turns a tiny product of probabilities into a manageable sum. Note that this underflow problem gets worse as your sample size increases, because you are multiplying more probabilities together (see the small demo below).
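A tiny demo of the underflow problem (the data and the standard-normal model here are made up purely for illustration):
set.seed(1)
z <- rnorm(2000) #2000 made-up observations
prod(dnorm(z, mean = 0, sd = 1)) #raw likelihood: underflows to 0
sum(dnorm(z, mean = 0, sd = 1, log = TRUE)) #log likelihood: a large negative but perfectly usable number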
library(stats4) #package containing mle() for maximum likelihood estimation
#simulate dummy data: true slope 5, true intercept 3, standard normal noise
set.seed(1001)
N <- 100
x <- runif(N)
y <- 5*x + 3 + rnorm(N)
#fit our data using OLS first for comparison
fit<- lm(y~x)
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.91541 -0.75171 0.02844 0.75250 2.17175
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0332 0.1808 16.78 <2e-16 ***
## x 4.8772 0.3117 15.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9501 on 98 degrees of freedom
## Multiple R-squared: 0.7142, Adjusted R-squared: 0.7112
## F-statistic: 244.9 on 1 and 98 DF, p-value: < 2.2e-16
#plot it.
plot(x,y)
abline(fit,col="red")
#Now do it by minimizing the negative log likelihood (NLL)
#first write the negative log likelihood as a function (it uses x and y from the workspace)
NLL <- function(a, b, sigma) {
  y.pred <- a + b*x #predicted y for a given intercept a and slope b
  LL <- dnorm(y, mean = y.pred, sd = sigma, log = TRUE) #log likelihood of each observation
  -sum(LL) #return the negative of the total log likelihood
}
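As a quick sanity check (not part of the original code), evaluating NLL at the OLS estimates should give a value close to 135.77, the negative log likelihood that mle() reaches below:
NLL(a = coef(fit)[1], b = coef(fit)[2], sigma = summary(fit)$sigma)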
#now we run it using mle
?mle
#run the MLE, but note that this can take a lot of computing power in some cases.
#Nelder-Mead is one of several optimization methods you can use to converge on a solution.
fit2 <- mle(NLL, start=list(a=mean(y), b=0, sigma=sqrt(var(y))), method="Nelder-Mead")
summary(fit2)
## Maximum likelihood estimation
##
## Call:
## mle(minuslogl = NLL, start = list(a = mean(y), b = 0, sigma = sqrt(var(y))),
## method = "Nelder-Mead")
##
## Coefficients:
## Estimate Std. Error
## a 3.0332596 0.17897472
## b 4.8768778 0.30854689
## sigma 0.9405478 0.06650392
##
## -2 log L: 271.5342
#we can look at the NLL estimate
logLik(fit2)
## 'log Lik.' -135.7671 (df=3)
#Now we can compare the two models.
fit$coefficients
## (Intercept) x
## 3.033213 4.877196
fit2@coef
## a b sigma
## 3.0332596 4.8768778 0.9405478
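We can also put the two fits on the same scale with AIC (this works because logLik() is defined for both lm and mle objects); the values should be nearly identical, since both fits estimate an intercept, a slope, and sigma on the same data.
AIC(fit)  #OLS fit
AIC(fit2) #maximum likelihood fit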
Check out datacamp.com/community/tutorials/pca-analysis-r for a walkthrough of principal component analysis (PCA) in R.
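A minimal PCA sketch on the trees data loaded above (centering and scaling the variables here is an assumption worth revisiting for your own data):
pca <- prcomp(trees[, c("Girth", "Height", "Volume")], center = TRUE, scale. = TRUE)
summary(pca) #proportion of variance explained by each principal component
biplot(pca)  #quick look at loadings and scores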
For Q4 - logistic regression is needed (see the minimal sketch below).
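A minimal logistic regression sketch on made-up data (the variables dose and alive are invented here; your Q4 data will differ): glm() with family = binomial models a 0/1 response.
set.seed(42)
dose <- runif(50, 0, 10) #made-up predictor
alive <- rbinom(50, size = 1, prob = plogis(-2 + 0.5*dose)) #made-up 0/1 response
logit_fit <- glm(alive ~ dose, family = binomial)
summary(logit_fit) #coefficients are on the log-odds scale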
Bayesian Priors example
Nothing to do with the HW, but if you want to learn more about this topic, check out: