Eco-Stats Research Blog: R-lab: Inference with Spatially correlated data with nlme

Monday, 27 October 2014

R-lab: Inference with Spatially correlated data with nlme

Hi guys,

Problem: You have data that may be spatially correlated, and you worry this might affect your inference (p-values and confidence intervals for fixed effects).

Solution : This Friday, 31st October at 2pm, R-lab on inference with Spatially correlated data with the nlme package.

Download R code and data here.

See you there,
Gordana

Material adapted from

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/spatial_autocorr.htm

What we can and can’t do with nlme:

Continuous data only, not counts or presence-absence (for discrete data you could have a look at glmmPQL in MASS package, quick example later, but limited)

Cannot have more than one measurement at any location ( unless you have a grouping variable, then you can’t have more than one measurement at any location within any group)

Not too much data, or it will give an error

No spatial prediction (so you can't draw a map which includes the spatial random effect, which is unfortunate)

Spatial Data- its everywhere

Continuous data

We start with continuous data. We will use a dataset of tree bark thickness.

rm(list=ls())

# install.packages("nlme")

library(nlme)

load(file="spdata.Rdata")

We will need a dummy variable because lme requires a grouping variable, but it will be the same for all observations, hence dummy

spdata <- cbind(spdata, dummy=1)

First we fit a model with no spatial correlation

nonsp.null = lme(thick ~ 1, data = spdata, random = ~ 1 | dummy, method='ML')

nonsp.alt=lme(thick ~ soil, data = spdata, random = ~ 1 | dummy, method='ML')

anova(nonsp.null,nonsp.alt)

We notice a significant effect of soil. Now let's have a look at a variogram of the residuals

plot(Variogram(nonsp.null,resType="normalized",form=~lat+long))

# plot(Variogram(nonsp.alt,resType="normalized",form=~lat+long))

Variograms are weird things, they give us similar information as an autocorrelation function, but are upside down and backwards, so don’t use your intuition from acf() functions. Here are how some common correlation structures look on a variogram, but beware, there’s usually a lot of noise. If there is no upwards trend at all, then there may not be any spatial correlation.

(from Pinheiro and Bates 2004, p. 233)

Next we will fit some correlations functions to see what fits best

spher.null <- lme(thick ~ 1, data = spdata, random = ~ 1 | dummy,

correlation = corSpher(form = ~ lat+long| dummy),method='ML')

plot(Variogram(spher.null,resType="normalized"))

We want the variogram to be flat when we include a spatial random effect with the correct structure.

gauss.null <- lme(thick ~ 1, data = spdata, random = ~ 1 | dummy,

correlation = corGaus(form = ~ lat+long| dummy),method='ML')

plot(Variogram(gauss.null,resType="normalized"))

We could (should) also try corExp,corLin,corRatio correlation structures.

anova(nonsp.null,spher.null,gauss.null)

The Gaussian correlation structure seems to perform the best, so we'll use that.

gauss.alt <- lme(thick ~ soil, data = spdata, random = ~ 1 | dummy,

correlation = corGaus(form = ~ lat+long| dummy),method='ML')

anova(gauss.null,gauss.alt)

Now there is no evidence of effect of soil. What happen?

The value at a site is best predicted by the values at sites nearby, and the value of soil does not add any information on top of that

Adding a Grouping variable

Sometimes we may only want to model spatial correlation for observations within a value of some grouping variable. We will used species as a grouping variable, but this does not take into account correlation between species

group.null <- lme(thick ~ 1, data = spdata, random = ~ 1 | species,

correlation = corGaus(form = ~ lat+long| species), method='ML')

group.alt <- lme(thick ~ soil, data = spdata,

random = ~ 1 | species,

correlation = corGaus(form = ~ lat+long| species), method='ML')
anova(group.null,group.alt )

Counts and presence absence data

This data is of the number of red maples in a quadrat.

install.packages("aod","MASS")

library(aod)

library(MASS)

load("trees.Rdata")

trees <- cbind(trees, dummy=1)

nonsp.alt=glmmPQL(red_maple~elev, data = trees,
random = ~ 1 | dummy,family=poisson)

summary(nonsp.alt)

plot(Variogram(nonsp.alt,resType="normalized"))

Spatial correlation!

gaus.alt=glmmPQL(red_maple~elev, data = trees,
random = ~ 1 | dummy,
correlation = corGaus(form = ~ x+y| dummy),
family=poisson)

plot(Variogram(gaus.alt,resType="normalized"))

spher.alt=glmmPQL(red_maple~elev, data = trees,
random = ~ 1 | dummy,
correlation = corSpher(form = ~ x+y| dummy),
family=poisson)

plot(Variogram(spher.alt,resType="normalized"))

Both seem to make the variogram very flat! Yay! but we can't test with anova(), so visual check is the best bet :(, We should try other correlation functions too, see what's best.

summary(spher.alt)

Factors

What about when we have factors, well we can do a joint wald test

trees$soil=as.factor(trees$soil)
spher.alt.soil= glmmPQL(red_maple~soil, data = trees,
random = ~ 1 | dummy,

correlation = corSpher(form = ~ x+y| dummy),
family=poisson)

summary(spher.alt.soil)

#we want to test terms 2 and 3 together, so

wald.test(b = t(coef(spher.alt.soil)),

Sigma = vcov(spher.alt.soil),Terms=2:3)

Want to do more?

This is as far as nlme will take you, I believe. The packages I have found that are able to do more complex spatial modelling are all Bayesian (so no p-values) but they can do things like spatial prediction (drawing a spatial map), and allow more than one observation per location, which nlme does not seem to do. They are a lot more difficult to fit and use, but people seem happy with them. Some options are:

geoR and geoRglm

http://cran.r-project.org/web/packages/geoRglm/vignettes/geoRglmintro.pdf

spGLM

http://blue.for.msu.edu/NEON/SC/exercises/exercise-3/initial-exploration-spGLM.pdf

R-INLA

http://www.math.ntnu.no/inla/r-inla.org/papers/jss/lindgren.pdf

spBayes
http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12189/pdf