Hi guys,

Download R code and data here.

See you there,

Gordana

##

##

##

We want the variogram to be flat when we include a spatial random effect with the correct structure.

###

group.null <- lme(thick ~ 1, data = spdata, random = ~ 1 | species,

##

plot(Variogram(gaus.alt,resType="normalized"))

spher.alt=glmmPQL(red_maple~elev, data = trees,

random = ~ 1 | dummy,

correlation = corSpher(form = ~ x+y| dummy),

family=poisson)

plot(Variogram(spher.alt,resType="normalized"))

Both seem to make the variogram very flat! Yay! but we can't test with anova(), so visual check is the best bet :(, We should try other correlation functions too, see what's best.

What about when we have factors, well we can do a joint wald test

summary(spher.alt.soil)

#we want to test terms 2 and 3 together, so

##

__Problem__: You have data that may be spatially correlated, and you worry this might affect your inference (p-values and confidence intervals for fixed effects).

__Solution__: This Friday, 31st October at 2pm, R-lab on inference with Spatially correlated data with the nlme package.Download R code and data here.

See you there,

Gordana

Material adapted from

What we can and can’t do with
nlme:

- Continuous data only, not counts or presence-absence (for discrete data you could have a look at glmmPQL in MASS package, quick example later, but limited)

- Cannot have more than one measurement at any location ( unless you have a grouping variable, then you can’t have more than one measurement at any location within any group)

- Not too much data, or it will give an error

- No spatial prediction (so you can't draw a map which includes the spatial random effect, which is unfortunate)

## Spatial Data- its everywhere

##
__Continuous data__

We start with continuous data. We will use a dataset of tree bark thickness.

rm(list=ls())

#
install.packages("nlme")

library(nlme)

load(file="spdata.Rdata")

We will need a dummy
variable because lme requires a grouping variable, but it will be the same
for all observations, hence dummy

spdata <-
cbind(spdata, dummy=1)

First we fit a model
with no spatial correlation

nonsp.null = lme(thick
~ 1, data = spdata, random = ~ 1 | dummy, method='ML')

nonsp.alt=lme(thick ~
soil, data = spdata, random = ~ 1 | dummy, method='ML')

anova(nonsp.null,nonsp.alt)

We notice a
significant effect of soil. Now let's have a look
at a variogram of the residuals

plot(Variogram(nonsp.null,resType="normalized",form=~lat+long))

#
plot(Variogram(nonsp.alt,resType="normalized",form=~lat+long))

Variograms are weird
things, they give us similar information as an autocorrelation function, but
are upside down and backwards, so don’t use your intuition from acf()
functions. Here are how some common correlation structures look on a variogram,
but beware, there’s usually a lot of noise. If there is no upwards trend at
all, then there may not be any spatial correlation.

(from Pinheiro and Bates 2004, p. 233)

Next we will fit some
correlations functions to see what fits best

spher.null <-
lme(thick ~ 1, data = spdata, random = ~ 1 | dummy,

correlation
= corSpher(form = ~ lat+long| dummy),method='ML')

plot(Variogram(spher.null,resType="normalized"))

We want the variogram to be flat when we include a spatial random effect with the correct structure.

gauss.null <-
lme(thick ~ 1, data = spdata, random = ~ 1 | dummy,

correlation
= corGaus(form = ~ lat+long| dummy),method='ML')

plot(Variogram(gauss.null,resType="normalized"))

We could (should) also try
corExp,corLin,corRatio correlation structures.

anova(nonsp.null,spher.null,gauss.null)

The Gaussian correlation structure seems to
perform the best, so we'll use that.

gauss.alt <-
lme(thick ~ soil, data = spdata, random = ~ 1 | dummy,

correlation
= corGaus(form = ~ lat+long| dummy),method='ML')

anova(gauss.null,gauss.alt)

__Now there is no evidence of effect of soil. What happen?__

The value at a site is
best predicted by the values at sites nearby, and the value of soil does
not add any information on top of that

###
__Adding a Grouping variable__

Sometimes we may only
want to model spatial correlation for observations within a value of some
grouping variable. We will used species
as a grouping variable, but this does not take into account correlation between
species

group.null <- lme(thick ~ 1, data = spdata, random = ~ 1 | species,

correlation
= corGaus(form = ~ lat+long| species), method='ML')

group.alt <-
lme(thick ~ soil, data = spdata,

random = ~ 1 | species,

correlation
= corGaus(form = ~ lat+long| species), method='ML')

anova(group.null,group.alt )

anova(group.null,group.alt )

##
__Counts and presence absence data__

This data is of the number of red maples in a quadrat.

install.packages("aod","MASS")

library(aod)

library(MASS)

load("trees.Rdata")

trees <- cbind(trees, dummy=1)

nonsp.alt=glmmPQL(red_maple~elev, data = trees,

random = ~ 1 | dummy,family=poisson)

random = ~ 1 | dummy,family=poisson)

summary(nonsp.alt)

plot(Variogram(nonsp.alt,resType="normalized"))

Spatial correlation!

gaus.alt=glmmPQL(red_maple~elev, data = trees,

random = ~ 1 | dummy,

correlation = corGaus(form = ~ x+y| dummy),

family=poisson)

random = ~ 1 | dummy,

correlation = corGaus(form = ~ x+y| dummy),

family=poisson)

plot(Variogram(gaus.alt,resType="normalized"))

spher.alt=glmmPQL(red_maple~elev, data = trees,

random = ~ 1 | dummy,

correlation = corSpher(form = ~ x+y| dummy),

family=poisson)

plot(Variogram(spher.alt,resType="normalized"))

Both seem to make the variogram very flat! Yay! but we can't test with anova(), so visual check is the best bet :(, We should try other correlation functions too, see what's best.

summary(spher.alt)

__Factors__
trees$soil=as.factor(trees$soil)

spher.alt.soil= glmmPQL(red_maple~soil, data = trees,

random = ~ 1 | dummy,

spher.alt.soil= glmmPQL(red_maple~soil, data = trees,

random = ~ 1 | dummy,

correlation = corSpher(form = ~ x+y| dummy),

family=poisson)

family=poisson)

summary(spher.alt.soil)

wald.test(b = t(coef(spher.alt.soil)),

Sigma = vcov(spher.alt.soil),Terms=2:3)

##
__Want to do more? __

This is as far as nlme will take you, I believe. The packages I
have found that are able to do more complex spatial modelling are all Bayesian
(so no p-values) but they can do things like spatial prediction (drawing a
spatial map), and allow more than one observation per location, which nlme does not seem to do. They are a lot more difficult to
fit and use, but people seem happy with them. Some options are:

geoR and geoRglm

spGLM

R-INLA

http://www.math.ntnu.no/inla/r-inla.org/papers/jss/lindgren.pdf

spBayes

http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12189/pdf

spBayes

http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12189/pdf

Some would argue that you should try to avoid hypothesis tests of assumptions (because for example the data weren't collected to test the a priori hypothesis that data were independent, spherical, etc). Instead you could use a model selection approach, pick the autocorrelation model that best matches the data, eg by AIC. AICs are given in the anova output, but you can get them without the other stuff using something like:

ReplyDelete> AIC(nonsp.null,spher.null,gauss.null)

df AIC

nonsp.null 3 346.98685

spher.null 4 147.89675

gauss.null 4 89.55566

Francis makes the good point that the spatial structure has been compared on the null models (thick~1) rather than the alternative models (thick~soil), and does this matter. It might, and apparently different people have different ideas on this, but since the goal is to account for residual spatial autocorrelation after accounting for predictors like soil there is an argument for choosing autocorrelation structure using something like

ReplyDeleteAIC(nonsp.alt, spher.alt, gauss.alt)

where these models each have thick~soil in their formula.

There was also discussion about when including spatial correlation is and is not necessary. If you chose sites and then apply a treatment to them randomly, and you test for that treatment, and your model does not contain any predictors which are spatially correlated (like temperature, rainfall etc) then David argues it's not needed, otherwise if you have data with spatial co-ordinates I say plot a Variogram and see if you need to include a spatial term.

ReplyDelete