In this week’s lab we will learn about analysing longitudinal data, that is, data collected over time for each study unit (e.g. subject or site). Standard (generalised) linear models assume all observations are independent, which is typically not reasonable when repeated measures have been taken over time: data collected for one site at times close together will be more similar than data for different sites, or for times further apart. This induces correlation in the data, which we must account for to obtain valid inference.
Normal data
# Load packages and data (BodyWeight is in nlme; MASS provides the epil data used later)
library(lme4)
library(nlme)
library(MASS)
data(BodyWeight)
# Scatterplot of weight against time, coloured by diet
plot(weight ~ Time, col = Diet, data = BodyWeight)
# Ignore all dependence: a bad analysis
Rat.lm0 <- lm(weight ~ Time, data = BodyWeight)
Rat.lm <- lm(weight ~ Time + Diet, data = BodyWeight)
summary(Rat.lm)
# F-test for the Diet effect
anova(Rat.lm0, Rat.lm)
# Residual diagnostics: default residuals-vs-fitted plot, then the same coloured by rat
plot(Rat.lm, which = 1)
plot(fitted(Rat.lm), residuals(Rat.lm), col = BodyWeight$Rat)
Problem:
The effects of ignoring dependence vary with the type of dependence, but in many cases you can expect the following (a quick visual check is sketched after this list):
- Standard errors are underestimated, giving “false confidence”
- t-statistics are overestimated, so regression coefficients that appear significant may not really be
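One informal way to see this here, a minimal sketch using the naive fit above: plot the lm residuals grouped by rat. If whole groups of residuals sit above or below zero, the observations within a rat are clearly not independent.
# residuals from the naive lm, grouped by rat
boxplot(residuals(Rat.lm) ~ BodyWeight$Rat, xlab = "Rat", ylab = "Residual")
abline(h = 0, lty = 2)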
Solution 1 – mixed effects model with random intercept and slope
# Plot the data by rat
interaction.plot(BodyWeight$Time, BodyWeight$Rat, BodyWeight$weight)
# Model with random effects for intercept and slope (by rat)
Rat.mixed0 <- lmer(weight ~ Time + (1 + Time | Rat), data = BodyWeight)
Rat.mixed <- lmer(weight ~ Time + Diet + (1 + Time | Rat), data = BodyWeight)
# Likelihood ratio test for the Diet effect (the lmer fits are refit with ML for this comparison)
anova(Rat.mixed0, Rat.mixed)
# There seems to be an effect of diet.
summary(Rat.mixed)
# Residual diagnostics: default fitted-vs-residual plot, then the same coloured by rat
plot(Rat.mixed)
plot(fitted(Rat.mixed), residuals(Rat.mixed), col = BodyWeight$Rat)
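To see what the random intercept and slope model has actually estimated for each rat, we can pull out the population-level coefficients and the per-rat deviations; a quick sketch using lme4's accessor functions:
# population-level (fixed) effects
fixef(Rat.mixed)
# per-rat deviations of intercept and slope from the population line
ranef(Rat.mixed)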
This analysis assumes that, after fitting a line for each subject, the within-subject observations are equally correlated, regardless of how far apart they are in time. This is not generally realistic; observations closer to each other in time might be more correlated than those further apart.
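One informal check, a rough sketch assuming the rows of BodyWeight are in time order within each rat: look at the autocorrelation of the within-rat residuals from the lmer fit, for a single rat.
# residuals from the random intercept and slope fit, for rat "1" only
# (only 11 observations per rat, so this is a very rough indication)
res <- residuals(Rat.mixed)
acf(res[BodyWeight$Rat == "1"], main = "Residual ACF, rat 1")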
We can’t use lme4 for an analysis that incorporates a more flexible correlation structure; for that we need the nlme package (library(nlme)). The correlation structure most commonly used for data correlated in time is the autoregressive AR(1) structure. It assumes that data points close together in time are more strongly correlated than those further apart.
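To make this concrete: under AR(1) the correlation between two observations decays as rho^|t - s| with their separation in time. A small illustration, a sketch with an arbitrary rho = 0.7 and five equally spaced time points:
rho <- 0.7
times <- 1:5
# implied AR(1) correlation matrix: entry (i, j) is rho^|i - j|
round(rho^abs(outer(times, times, "-")), 2)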
Solution 2 – mixed effects model with flexible correlation structures
# Random intercept and slope, within-rat errors uncorrelated (as in Solution 1)
Rat.lme0 <- lme(weight ~ Time + Diet, random = ~ Time | Rat, data = BodyWeight)
# Same model, but with AR(1) correlation for the within-rat errors
Rat.lme1 <- lme(weight ~ Time + Diet, random = ~ Time | Rat,
                correlation = corAR1(form = ~ Time | Rat), data = BodyWeight)
AIC(Rat.lme0, Rat.lme1)
# The AR(1) correlation structure seems to work better, as its AIC is smaller.
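To see the estimated autocorrelation parameter itself (nlme reports it as Phi under "Correlation Structure: AR(1)"), we can look at the model summary and, as a sketch, at approximate confidence intervals for the variance-covariance components:
summary(Rat.lme1)
intervals(Rat.lme1, which = "var-cov")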
Non-normal data
What about non-normal data? We can use the same set-up as before in lme4, but now using the glmer function.
# Seizure count data (epil is in the MASS package)
data(epil)
# Each subject's seizure counts across the four periods
interaction.plot(epil$period, epil$subject, epil$y)
# On the log(y + 1) scale the profiles are easier to compare
interaction.plot(epil$period, epil$subject, log(epil$y + 1), col = epil$subject)
# Model with random effects for intercept and slope (by subject)
Epil.mixed0 <- glmer(y ~ 1 + period + (1 + period | subject), data = epil, family = poisson)
Epil.mixed <- glmer(y ~ 1 + period + age + (1 + period | subject), data = epil, family = poisson)
# whoops, warning about convergence, let's not worry about it, it's not a problem in this case
# read http://stackoverflow.com/a/21370041 and http://stats.stackexchange.com/a/99719 if you have a similar problem
# Likelihood ratio test for the age effect
anova(Epil.mixed0, Epil.mixed)
# There seems to be no effect of age.
What if we want an AR(1) correlation structure instead? This is more complicated; some options are glmmPQL in the MASS package and geeglm in the geepack package. glmmPQL uses the lme function above and has very similar syntax. geeglm does not fit quite the same model as lme4, but can also be used if you would like to fit an AR(1) structure, or other more flexible correlation structures.
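For example, a sketch only, using the same epil variables as above, assuming period is the integer-valued measurement occasion and using a simpler random intercept per subject: a Poisson fit with AR(1) within-subject correlation via glmmPQL, and a GEE analogue via geeglm.
library(MASS)     # glmmPQL
library(geepack)  # geeglm
# penalised quasi-likelihood fit: random intercept per subject, AR(1) over periods
Epil.pql <- glmmPQL(y ~ period + age, random = ~ 1 | subject,
                    correlation = corAR1(form = ~ period | subject),
                    family = poisson, data = epil)
summary(Epil.pql)
# GEE with an AR(1) working correlation (rows must be grouped by subject, as in epil)
Epil.gee <- geeglm(y ~ period + age, id = subject, data = epil,
                   family = poisson, corstr = "ar1")
summary(Epil.gee)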