Eco-Stats Research Blog: 2014

Friday, 19 December 2014

Congratulations, farewell and good luck to Francis Hui!

Congratulations to Francis - on submitting his PhD thesis earlier this month. It is a really nice thesis with papers already accepted in Ecology, Methods in Ecology and Evolution, and the Journal of the American Statistical Association. And a few more cracking papers in review.
Farewell - because he is off to a post-doc position at ANU in January, working on mixed models with Alan Welsh and Samuel Mueller. Francis will be missed. He is a friendly, thoughtful, energetic person who has also been a big contributor in our group and beyond. While keeping his thesis kicking along, he managed a number of productive collaborations with ecologists on the side, developed and presented a few Eco-Stats labs (you can see these in his previous 2014 posts on this blog), volunteered to teach a full day of introductory statistics material in the now-annual BEES postgrad workshop on intermediate statistics, and co-ordinated Eco-Stats meetings.
Good luck - well, everyone could use a little luck every now and then. But the systematic component of the model looks good... Francis has shown himself to be a pretty talented researcher and communicator, so we reckon he is well-placed for an exciting career in statistics methodology and its applications, especially (we hope!) in ecology. Wishing you all the best, Francis!

Tuesday, 25 November 2014

(Generalised) linear (mixed) models with R2jags: an extremely brief primer...

The purpose of this lab (Friday 28th, 2pm) is to provide a very basic primer on fitting linear models (LMs), generalised linear models (GLMs) and generalised linear mixed models (GLMMs) in a Bayesian framework using the R2jags package (and, of course, JAGS). The goal is not to debate the relative merits of Bayesian vs frequentist approaches, but hopefully to demystify the fitting of Bayesian models, and more specifically to demonstrate that in a wide variety of (more basic) use cases the parameter estimates obtained from the two approaches are typically very similar.

We will be attempting to reproduce a small part of the analysis from a recently published article in the Journal of Ecology (all the data are available at datadryad.org).

Kessler, M., Salazar, L., Homeier, J., Kluge, J. (2014), Species richness-productivity relationships of tropical terrestrial ferns at regional and local scales. Journal of Ecology, 102: 1623-1633.

Code (jagstut.Rmd) and data (KessDiv.csv) are available from Dropbox or GitHub. View the compiled HTML version here.

See you Friday,
Andrew

Friday, 7 November 2014

Eco-Stats ARC Discovery grant - $295,900 for 2015-17

The ARC Discovery grant results were released yesterday. This is the main place in Australia where you can get funding for fundamental research, although it is super-competitive. I was lucky enough to get a grant up - "Advances in biodiversity modelling - analysis of high-dimensional counts", $295,900 over three years. This funding will be used to hire a post-doctoral researcher to help improve methods of multi-species modelling, thinking about questions like how to model species interactions in a parsimonious way, and how to account for measurement error in covariates. Job ad coming soon...

Nerd Nite Sydney

So I was asked to speak at Nerd Nite Sydney last night - they describe it as "a bit like the Discovery Channel... with beer" but it was heaps more fun than that.

I did the Rick Astley thing... yeah, again... and talked a little about exciting times for statistics (new technology etc.) and the hard times (low levels of statistical literacy are often a barrier to progress and informed discussion).

You can find out more about Nerd Nite Sydney at http://sydney.nerdnite.com/ or on social media

Great STATS talk @ #nerdnightsyd @ecostats I think I need to talk stats with you - HELP appreciated pic.twitter.com/wkm0DfHgCe
— John Martin (@Cockatoowingtag) November 6, 2014

Sunday, 2 November 2014

boral: Reliable construction materials for good model building...

The ecostats group are happy to introduce a new R package called boral -- Bayesian Ordination and Regression AnaLysis, for analysis of multivariate data (community composition data especially) in ecology!!!

Boral uses Bayesian MCMC estimation via JAGS (Just Another Gibbs Sampler) to fit three types of models:
1) GLMs fitted independently to each species (as in mvabund, another R package developed by us)

2) Purely latent variable models for model-based unconstrained ordination (see Hui et al., 2014, http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12236/abstract for details)

3) GLMs fitted to each species while accounting for correlation between species e.g., due to species interaction.

Check it out at: http://cran.r-project.org/web/packages/boral/index.html

And even better, check out the video (which we promise has no dancing and no singing unlike another certain video...): https://www.youtube.com/watch?v=vyMsgyytcUI

Monday, 27 October 2014

R-lab: Inference with Spatially correlated data with nlme

Hi guys,

Problem: You have data that may be spatially correlated, and you worry this might affect your inference (p-values and confidence intervals for fixed effects).

Solution: This Friday, 31st October at 2pm, an R-lab on inference with spatially correlated data using the nlme package.
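To give a flavour of what this looks like, here is a minimal sketch on simulated data (it is not the lab's code or data). It assumes an exponential spatial correlation structure, with made-up coordinates `x` and `y`, a made-up predictor `temp` and a hypothetical `response`, and compares nlme's gls() with and without corExp() on the residuals:

```r
# Hypothetical simulated example: spatially correlated errors, fitted with
# and without an exponential spatial correlation structure (nlme::gls).
library(nlme)

set.seed(31)
n <- 100
dat <- data.frame(x = runif(n, 0, 10), y = runif(n, 0, 10))

# Simulate errors with exponential spatial correlation exp(-distance / 2)
d <- as.matrix(dist(dat[, c("x", "y")]))
Sigma <- exp(-d / 2)
spatial_error <- drop(t(chol(Sigma)) %*% rnorm(n))

dat$temp <- rnorm(n)                                   # made-up predictor
dat$response <- 1 + 0.3 * dat$temp + spatial_error

# Ignores spatial correlation (independent errors)
fit_indep <- gls(response ~ temp, data = dat)
# Exponential correlation in the residuals, distance computed from x and y
fit_exp <- gls(response ~ temp, data = dat,
               correlation = corExp(form = ~ x + y))

AIC(fit_indep, fit_exp)      # same fixed effects, so REML-based AICs are comparable
summary(fit_exp)$tTable      # fixed effects with spatially adjusted standard errors
```

In real data you might also try other structures (e.g. corGaus, corSpher) or a nugget effect, and compare them by AIC.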

Download R code and data here.

See you there,
Gordana

September Eco-Stats lab: model-based multivariate analysis in ecology (mvabund package and recent additions)

What does mvabund do?
Analyses multivariate data (especially abundance or presence-absence data) using simultaneous univariate models and design-based inference.

The main functions are manyglm, which fits a GLM to each response variable, and anova/summary, which resample whole rows (sites) for valid multivariate inference (i.e. taking into account correlation between variables).
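Why resample whole rows? A minimal base-R sketch on hypothetical simulated counts (not mvabund's code) helps here. Drawing whole sites keeps each site's species together, so the correlation between species survives. Resampling each species column independently destroys that correlation. mvabund offers several more refined row-resampling schemes (see its documentation). This sketch only shows the basic idea on raw data:

```r
# Hypothetical site-by-species counts where all species respond to a shared
# site effect, so species are positively correlated across sites.
set.seed(3)
n_sites <- 30
site_effect <- rnorm(n_sites)
abund <- sapply(1:4, function(j) rpois(n_sites, exp(1 + site_effect)))
colnames(abund) <- paste0("sp", 1:4)

# Row resampling: draw whole sites (rows) with replacement
resample_rows <- function(Y) Y[sample(nrow(Y), replace = TRUE), , drop = FALSE]
# Cell-wise resampling: each species column resampled on its own
resample_cells <- function(Y) apply(Y, 2, sample, replace = TRUE)

# Average correlation between pairs of species
mean_offdiag_cor <- function(Y) {
  R <- cor(Y)
  mean(R[upper.tri(R)])
}

rows_cor <- mean(replicate(200, mean_offdiag_cor(resample_rows(abund))))
cells_cor <- mean(replicate(200, mean_offdiag_cor(resample_cells(abund))))
c(original = mean_offdiag_cor(abund), rows = rows_cor, cells = cells_cor)
```

The row-resampled datasets keep roughly the original between-species correlation, while the cell-wise ones drop to about zero.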

Designed specifically for multivariate abundance data in ecology (species-by-site data), which have two key properties that need to be dealt with:
(1) strong mean-variance relationship.
(2) correlation between response variables (e.g. due to species interaction)
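To see property (1) in action, here is a hypothetical base-R simulation (not from the mvabund materials). It assumes counts follow a negative binomial distribution, with variance mu + mu^2 / theta, and uses an arbitrary theta = 2. Species with larger mean abundance have much larger variances:

```r
# Hypothetical counts for 20 species with different mean abundances,
# simulated from a negative binomial (variance = mu + mu^2 / size).
set.seed(1)
n_sites <- 50
species_means <- exp(seq(log(0.5), log(50), length.out = 20))
counts <- sapply(species_means, function(mu) rnbinom(n_sites, mu = mu, size = 2))

sample_means <- colMeans(counts)
sample_vars <- apply(counts, 2, var)

# Variance grows much faster than the mean (roughly quadratically here)
plot(sample_means, sample_vars, log = "xy",
     xlab = "Mean abundance", ylab = "Variance")
abline(0, 1, lty = 2)   # variance = mean (Poisson); on log-log axes this is the 1:1 line
```

Points sitting well above the dashed line are the overdispersion that a single transformation or a plain Poisson model would struggle to handle.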

Why is mvabund better than using PRIMER, PC-ORD, etc?
A few reasons, see this R script for details (and code to work through):

Wednesday, 27 August 2014

R-lab on Mixed models in Ecology

Hi guys,

This Friday, 29th August at 2pm, there will be an R-lab on mixed models in ecology, a topic which many of you are interested in digging deeper into. I've set up links to Dropbox for the slides as well as a csv file which we will be playing with (dataset courtesy of Sylvia Hay =D)

Thanks.

PDF slides
Example 1: Bird counts
Example 2: Nested design dataset

(Corrected) code for bird analysis

Yours non-significantly,
FH

Monday, 4 August 2014

Sexy, unconstrained models all ready for you to play with!

When trying to visualize how sites vary in terms of species composition, for too long ecologists have been using distance-based methods of unconstrained ordination such as NMDS and CA, with little but precedent to guide them on which dissimilarity measure to use and what transformation and/or standardization to apply.

In collaboration with some folks in New Zealand and Finland, we've been working on a couple of model-based approaches to unconstrained ordination. These offer several advantages, such as explicitly accounting for key properties of the data and providing model selection tools to choose key aspects of the analysis. Simulations also show our proposed methods perform either the same as or way better than distance-based approaches at getting the ordinations correct!

Check out our manuscript, now available for early view at:

http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12236/abstract

Friday, 20 June 2014

Eco-Stats Lab June 2014: Phylogenetic model adequacy

In this lab (June 29th) I'll introduce a new package that we're developing called "arbutus", after a funny-looking tree that grows in the Pacific Northwest. The package is designed to test the adequacy of models of trait evolution on phylogenies. We've started out looking at relatively simple models, but it should be possible to test more and more complex models.

A few links:

1) the slides and code for the lab

2) the package repository

3) the pre-print of the paper

As this work is still in progress, any feedback on theory, usability or aesthetics is more than welcome!

Thursday, 12 June 2014

Andrew Letten wins Best Ecology and Evolution talk at Postgraduate Review Forum

Andrew Letten, from the Centre for Ecosystems Science, honorary Eco-Stats member, and champion of bringing more statistics into his work in ecology, picked up the award for best talk in Ecology and Evolution at the UNSW BEES two-day Postgrad Review Forum. He gave a cracker on species niche differentiation, trying to find empirical evidence for niche differentiation along soil moisture gradients. He showed the Biology department some cutting-edge methods in ecological statistics like mvabund (see video) and joint species distribution modelling.
Great work Andrew!

Thursday, 29 May 2014

Eco-Stats Lab May 2014: Longitudinal data

In this week’s lab we will learn about analysing longitudinal data, that is, data collected over time for each study unit (e.g. subject or site).

Standard (generalised) linear models assume all observations are independent, which is typically not reasonable if repeated measures have been taken over time. Data collected for one site at times close together will be more similar than data for different sites, or for times further apart. This induces correlation in the data, which we must account for to obtain valid inference.

Normal data

#Load data
library(lme4)
library(nlme)
library(MASS)
data(BodyWeight)   # rat growth data, from the nlme package

#scatterplot by Diet
plot(weight~Time,col=Diet,data=BodyWeight)

#ignore all dependence, bad analysis
Rat.lm0=lm(weight~Time,data=BodyWeight)
Rat.lm=lm(weight~Time+Diet,data=BodyWeight)
summary(Rat.lm)
anova(Rat.lm0,Rat.lm)
plot(Rat.lm,which=1)
#residuals vs fitted values, coloured by rat: note the clustering by rat
plot(fitted(Rat.lm),residuals(Rat.lm),col=BodyWeight$Rat)

Problem:

The effects of ignoring dependence vary with the type of dependence, but in many cases you can expect that:

Standard errors are underestimated, giving “false confidence”.
t-statistics are overestimated, so regression coefficients that appear significant may not be.
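A hypothetical simulation (not the rat data) makes the "false confidence" point concrete. It assumes 20 sites measured 10 times each, with a site-level predictor and a site random effect. The naive lm() treats all 200 observations as independent. As a crude benchmark only, we also regress the 20 site means on the predictor, which treats sites as the independent units. In this balanced design both fits give exactly the same slope, but the naive standard error is much smaller:

```r
# Hypothetical repeated-measures data: 20 sites, each measured 10 times.
# A site-level random effect makes observations from the same site correlated.
set.seed(1)
n_sites <- 20
n_times <- 10
site <- rep(seq_len(n_sites), each = n_times)
x_site <- rnorm(n_sites)                 # site-level predictor, constant over time
u_site <- rnorm(n_sites, sd = 2)         # site effect shared by all visits to a site
y <- 1 + 0.5 * x_site[site] + u_site[site] + rnorm(n_sites * n_times, sd = 0.5)
x <- x_site[site]

# Bad analysis: treats all 200 observations as independent
naive <- lm(y ~ x)
# Crude benchmark: one mean per site, so 20 independent units
y_bar <- as.numeric(tapply(y, site, mean))
site_means <- lm(y_bar ~ x_site)

# Same estimate, very different standard errors
rbind(naive = coef(summary(naive))["x", ],
      site_means = coef(summary(site_means))["x_site", ])
```

Averaging within sites throws information away in general. The mixed models in the solutions that follow are the better fix; the site-means fit is here only to show how optimistic the naive standard error is.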

Solution 1 – mixed effects model with random intercept and slope

#plot data by Rat
interaction.plot(BodyWeight$Time, BodyWeight$Rat, BodyWeight$weight)

#model with random effects for intercept and slope
Rat.mixed0 <- lmer(weight ~ Time + (1 + Time | Rat), data = BodyWeight)
Rat.mixed <- lmer(weight ~ Time + Diet + (1 + Time | Rat), data = BodyWeight)
#anova() refits both models by ML (rather than REML) before comparing them
anova(Rat.mixed0, Rat.mixed)

#There seems to be an effect of diet.
summary(Rat.mixed)
plot(Rat.mixed)
plot(fitted(Rat.mixed), residuals(Rat.mixed), col = BodyWeight$Rat)

This analysis assumes that, once a line has been fitted for each rat, the remaining within-rat errors are independent, regardless of how far apart in time the observations are. This is not generally realistic: observations closer to each other in time might be more strongly correlated than those further apart.

lme4 can't fit models with these more flexible correlation structures, so we will need to use the nlme package instead. The correlation structure most commonly used for data correlated in time is the first-order autoregressive structure, AR(1). It assumes data points close in time are more strongly correlated than those further apart in time.
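Under AR(1), the correlation between observations at times t and s is phi^|t - s|, so it decays geometrically with the time lag. A quick base-R sketch of the implied correlation matrix for five equally spaced times, with an arbitrary illustrative value of phi = 0.6:

```r
# AR(1) correlation between observations at times t and s: phi^|t - s|
# (phi = 0.6 is an arbitrary illustrative value)
phi <- 0.6
times <- 1:5
R_ar1 <- phi^abs(outer(times, times, "-"))
round(R_ar1, 3)
```

Neighbouring times have correlation 0.6, times two apart 0.36, and so on. In the lme fit that follows, phi is estimated from the data.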

Solution 2 – mixed effects model with flexible correlation structures

Rat.lme0 <- lme(weight ~ Time + Diet, random = ~ Time | Rat, data = BodyWeight)
Rat.lme1 <- lme(weight ~ Time + Diet, random = ~ Time | Rat,
                correlation = corAR1(form = ~ Time | Rat), data = BodyWeight)
AIC(Rat.lme0, Rat.lme1)

#We can see that the AR1 correlation structure seems to work better as the AIC is smaller for this model.

Non-normal data

What about non-normal data? Well, we can use the same set-up as before in lme4, but now using the glmer function.

data(epil)   # seizure counts, from the MASS package
interaction.plot(epil$period, epil$subject, epil$y)
interaction.plot(epil$period, epil$subject, log(epil$y+1), col=epil$subject)

#model with random effects for intercept and slope
Epil.mixed0 <- glmer(y ~ 1 + period + (1 + period | subject), data = epil, family = poisson)
Epil.mixed <- glmer(y ~ 1 + period + age + (1 + period | subject), data = epil, family = poisson)

# whoops, warning about convergence, let's not worry about it, it's not a problem in this case
# read http://stackoverflow.com/a/21370041 and http://stats.stackexchange.com/a/99719 if you have a similar problem

anova(Epil.mixed0, Epil.mixed)

#There seems to be no effect of age.

What if we want an AR(1) correlation structure instead? This is more complicated. Some options are glmmPQL in the MASS package and geeglm in the geepack package. glmmPQL calls the lme function above internally and has very similar syntax. geeglm fits a marginal (population-averaged) model using generalised estimating equations, rather than the subject-specific mixed model that lme4 fits. It can also be used if you would like to fit an AR(1) structure or other more flexible correlation structures.
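As a minimal sketch of the glmmPQL route (not code from the lab), here is a Poisson model for the epil data. It uses a random intercept per subject plus AR(1) correlation across the four periods. The fixed effects mirror the glmer example above, and the random-intercept-only structure is a simplifying choice:

```r
# Sketch: Poisson GLMM with a random intercept per subject plus AR(1)
# correlation across periods, fitted by penalized quasi-likelihood (PQL).
library(MASS)   # glmmPQL() and the epil data
library(nlme)   # corAR1(), fixef()

epil2 <- epil
epil2$subject <- factor(epil2$subject)

epil.pql <- glmmPQL(y ~ period + age, random = ~ 1 | subject,
                    family = poisson, data = epil2,
                    correlation = corAR1(form = ~ period | subject),
                    verbose = FALSE)
summary(epil.pql)
fixef(epil.pql)
```

PQL fits are not based on a true likelihood, so AIC and likelihood-ratio comparisons from this fit should not be trusted. Look at the Wald tests in summary() instead.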

Tuesday, 29 April 2014

Eco-Stats Lab April 2014: Block Bootstrap

In this week's lab we learn about the block bootstrap, a non-parametric way to deal with spatial autocorrelation in your data and still make valid inferences.

Bootstrap Recap

•Bootstrapping allows us to find the unknown distribution of a statistic by resampling the original data (with replacement) and recalculating the statistic many times.

•Hence we can calculate p-values and standard errors of things we don’t know the distribution of.

•Assumptions: observations are independent and identically distributed ("iid")

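But you can't use an iid bootstrap when data are spatially correlated. Resampling individual observations breaks up the correlation between neighbouring points, so standard errors tend to be underestimated. The block bootstrap instead resamples whole blocks of neighbouring observations, keeping the local correlation structure intact.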
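Here is a minimal sketch of the idea, simplified to one dimension (e.g. a transect) rather than a spatial grid. It is not the lab's code. It assumes a simulated AR(1) series with phi = 0.7 and an arbitrary block length of 20, and compares the iid bootstrap standard error of the mean with a moving-block bootstrap. For spatial data the blocks would be spatial tiles rather than runs of consecutive points:

```r
# Sketch of a moving-block bootstrap for the standard error of a mean,
# on a simulated autocorrelated transect (1-D for simplicity).
set.seed(42)
n <- 200
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # AR(1) series, phi = 0.7

block_boot_se <- function(y, block_len, n_boot = 1000) {
  n <- length(y)
  n_blocks <- ceiling(n / block_len)
  possible_starts <- seq_len(n - block_len + 1)
  boot_means <- replicate(n_boot, {
    starts <- sample(possible_starts, n_blocks, replace = TRUE)
    # each column of the outer() result is one block of consecutive indices;
    # glue the blocks together and trim to the original sample size
    idx <- as.vector(outer(0:(block_len - 1), starts, "+"))[seq_len(n)]
    mean(y[idx])
  })
  sd(boot_means)
}

se_iid <- block_boot_se(y, block_len = 1)     # block length 1 = ordinary iid bootstrap
se_block <- block_boot_se(y, block_len = 20)  # blocks keep neighbouring points together
c(iid = se_iid, block = se_block)
```

The block version gives a noticeably larger (more honest) standard error. In practice the block length matters: it should be long enough to capture most of the autocorrelation but short enough to leave plenty of blocks to resample.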

David Warton wins Young Investigator Award from American Statistical Association

We're all very proud that David has won another major award this year, this one the Young Investigator Award from the American Statistical Association Section on Statistics and the Environment. He's been recognized internationally for outstanding contributions to the development of methods, issues, concepts, applications, and initiatives in environmental statistics by a young statistician. And we quite agree.

Well done David!

More on the school website:
https://www.maths.unsw.edu.au/news/2014-04/david-warton-young-investigator-award

Wednesday, 26 March 2014

Eco-Stats Lab, March 2014 - Measurement Error modeling using SIMEX

Measurement Error modeling using SIMEX

Date: 28th March 2-3pm
Venue: Computer Lab Room 640

Slides:

Measurement Error Modeling

Measurement error or error-in-variables arises whenever we have imprecise measurements on our predictor variables (or covariates).
If X is the true covariate and U is the measurement error, then what we observe is
W = X + U.
In a simple regression, we usually assume that our covariates are measured precisely or they represent the true covariate values quite well. But what happens to our estimates if our covariates have measurement error?
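In a simple linear regression the answer is attenuation: the slope estimated from W is biased towards zero. SIMEX (simulation-extrapolation) estimates this bias by deliberately adding more error and then extrapolating back to no error. Below is a minimal hypothetical sketch, not the lab's code. It assumes the measurement error SD sigma_u is known and that U is independent of X. It also uses a quadratic extrapolant, which is a common choice but only an approximation:

```r
# Hypothetical SIMEX sketch with a known measurement error SD (sigma_u).
set.seed(2014)
n <- 500
sigma_u <- 0.8
x <- rnorm(n)                          # true covariate X (unobserved in practice)
w <- x + rnorm(n, sd = sigma_u)        # observed covariate W = X + U
y <- 1 + 2 * x + rnorm(n, sd = 0.5)    # true slope is 2

naive_slope <- coef(lm(y ~ w))[["w"]]  # attenuated towards zero

# Simulation step: add extra error with variance lambda * sigma_u^2,
# so the total error variance is (1 + lambda) * sigma_u^2
lambdas <- c(0, 0.5, 1, 1.5, 2)
B <- 100
mean_slopes <- sapply(lambdas, function(lambda) {
  mean(replicate(B, {
    w_extra <- w + rnorm(n, sd = sqrt(lambda) * sigma_u)
    coef(lm(y ~ w_extra))[["w_extra"]]
  }))
})

# Extrapolation step: fit a quadratic in lambda and extrapolate to lambda = -1,
# which corresponds to no measurement error at all
extrap <- lm(mean_slopes ~ lambdas + I(lambdas^2))
simex_slope <- unname(predict(extrap, newdata = data.frame(lambdas = -1)))
c(naive = naive_slope, simex = simex_slope)
```

The SIMEX estimate moves most of the way back from the attenuated naive slope towards the true value of 2. It falls a little short because the quadratic only approximates the true curve.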

Eco-Stats Lab, Feb 2014 - SMATR

The SMATR package (Standardised Major Axis estimation and Testing Routines) is designed for when you are fitting lines and:
- you are primarily interested in the slope (rather than significance or strength of association)
- the problem is symmetric, i.e. you could happily swap which variable is on which axis without changing the meaning of what you are doing. Or put another way, rather than predicting Y from X (regression), you have a pair of Y variables (Y1 and Y2) and you want to see how they are related to each other.

This situation commonly arises in allometry (the study of how one size variable scales against another), which is the main place these methods are useful in ecology.
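The standardised major axis slope is sign(r) * sd(Y2) / sd(Y1), where r is the correlation between the two variables. This is what makes it symmetric: swapping the axes simply inverts the slope. The ordinary least squares slope, r * sd(Y2) / sd(Y1), does not behave this way. Below is a base-R sketch on made-up allometric data (hypothetical log mass and log leaf area). It illustrates the formula only and is not the smatr package's code:

```r
# Standardised major axis (SMA) slope: sign(r) * sd(y) / sd(x).
# Base-R sketch for illustration, not the smatr package's code.
sma_slope <- function(y, x) sign(cor(x, y)) * sd(y) / sd(x)

set.seed(7)
log_mass <- rnorm(50, mean = 2, sd = 0.5)               # made-up size variable
log_leaf_area <- 0.75 * log_mass + rnorm(50, sd = 0.2)  # made-up second size variable

b_area_on_mass <- sma_slope(log_leaf_area, log_mass)
b_mass_on_area <- sma_slope(log_mass, log_leaf_area)
b_area_on_mass * b_mass_on_area   # = 1: swapping the axes just inverts the SMA slope

# OLS slopes are not symmetric: their product is r^2, not 1
ols_area_on_mass <- coef(lm(log_leaf_area ~ log_mass))[[2]]
ols_mass_on_area <- coef(lm(log_mass ~ log_leaf_area))[[2]]
ols_area_on_mass * ols_mass_on_area
```

Because |r| is at most 1, the OLS slope is always flatter than the SMA slope, which is why OLS tends to underestimate allometric slopes when both variables vary.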

MAXENT equivalence paper rated "Exceptional" on Faculty of 1000

The first paper from Ian Renner's PhD thesis, "Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology", has been rated by the Faculty of 1000. F1000 is a post-publication peer review website that highlights noteworthy articles from the scientific literature, especially biology and medicine articles. Renner & Warton (2013) received the top rating (three stars, "Exceptional"), which we are pretty chuffed about. For details, see the review at http://f1000.com/prime/718270492

Sunday, 2 February 2014

Eco-Stats Paper of the Year, 2013

We just had our second annual paper of the year competition, highlighting papers that made an impression on UNSW Eco-Stats researchers over the previous year. Papers were supposed to be in print in 2013, but this was interpreted generously. And the nominees are...

Ecostats Workshop - Mixture Models

To all those coming to the UNSW Ecostats workshop on Mixture models,

Date: 31st January 2-3pm
Venue: Computer Lab Room 640...somewhere in Sydney =D
Topic: A very very short introduction to mixture models with a very very short taste of how to implement them in R
MC: Francis Hui (PhD student; UNSW School of Maths and Stats)

Please note that this blog is NOT to be used for indicating that you want to attend. That should have been done via the email sent out by Richard Kingsford earlier.

Unfortunately, Blogger does not allow one to attach things that aren't videos or images, so I've provided links to the material I shall be using instead.

Slides: https://www.dropbox.com/s/0ibha7nk2u8k22u/minilecturev1.pdf?dl=0
R script: https://www.dropbox.com/s/kg3bcyt65ec6nhl/scripts_cutdown.R?dl=0

Thank you.

Yours non-significantly,
FH

Tuesday, 21 January 2014

And the Academy Award for statistical ecology goes to....

While David's acting skills may not match the likes of Leonardo DiCaprio and Christian Bale, he has already won an academy award... that is, an Australian Academy of Science prize!

David has been awarded the prestigious Christopher Heyde Medal for distinguished research in the mathematical sciences by a researcher under the age of 40. The award includes a cash prize (a random variable with a uniform distribution between one and one million dollars), which of course he will generously share amongst the members of the Eco-Stats group at UNSW =D

He will officially receive his award at the Academy’s annual meeting in Canberra in May.

Congratulations David!