The September Eco-Stats Lab (Friday
25th, 2pm, Bioscience level 6) will be on ordinal data analysis in ecology.
Ordered categorical data are commonplace in ecology when quantitative
measurement of a variable of interest is not feasible or too costly. Examples
include size of individuals, body condition and relative abundances of species.
Cumulative link models are a powerful class of models for analyzing such data
since observations are treated as categorical, the ordered nature is exploited
and the flexible regression framework allows in-depth analyses. We will use the ordinal package in R to analyse some ecological ordinal data.
The code and details can be found here.
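To give a flavour of what fitting a cumulative link model looks like, here is a minimal sketch using clm() from the ordinal package. The data are simulated and the names (rainfall, condition, birds) are made up for illustration; this is not the dataset or code used in the lab.

```r
library(ordinal)

set.seed(1)
n <- 200

# Hypothetical predictor and an unobserved continuous "true" body condition
rainfall <- runif(n, 0, 10)
latent   <- 0.5 * rainfall + rlogis(n)

# We only get to record an ordered category, not the latent value
condition <- cut(latent,
                 breaks = c(-Inf, 1, 3, 5, Inf),
                 labels = c("poor", "fair", "good", "excellent"),
                 ordered_result = TRUE)
birds <- data.frame(rainfall, condition)

# Cumulative link model (logit link by default): three thresholds plus one slope
fit <- clm(condition ~ rainfall, data = birds)
summary(fit)
```

The summary reports the threshold estimates separating adjacent categories and the rainfall coefficient on the latent (logit) scale.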
The August Eco-Stats Lab (Friday 28th, 2pm, Bioscience level 6) will be on time to event analysis in ecology.
When modelling the time taken for an event to happen (e.g. death) we often use time to event analysis (survival analysis) rather than regression models. A common feature of these data is that for some subjects the event did not happen at all during the study period, called (right) censoring, and survival analysis can elegantly incorporate the information from these subjects. Regression models on the other hand have no easy way to include these subjects. Survival analysis can often be applied to ecological data, e.g.
- Arrival of a parasite
- Survival times for animals/plants
- Germination timing
- Response to stimulus
- How long fruit remain on plants before they are eaten
We will use the survival package in R to analyse some ecological time to event data. You can find the code/explanation here, and the data here.
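The sketch below shows how right censoring might be encoded with the survival package. The germination data are simulated, and the names (seeds, treatment, study_end) and rates are assumptions made for illustration, not the lab data.

```r
library(survival)

set.seed(42)
n <- 100

# Hypothetical germination experiment with two seed treatments
treatment <- rep(c("control", "scarified"), each = n / 2)
true_time <- rexp(n, rate = ifelse(treatment == "scarified", 0.3, 0.1))

# The study stops at day 10: seeds that have not germinated are right censored
study_end <- 10
seeds <- data.frame(
  time      = pmin(true_time, study_end),
  status    = as.integer(true_time <= study_end),  # 1 = germinated, 0 = censored
  treatment = treatment
)

# Kaplan-Meier curves per treatment, then a Cox proportional hazards model
km  <- survfit(Surv(time, status) ~ treatment, data = seeds)
cox <- coxph(Surv(time, status) ~ treatment, data = seeds)
summary(cox)
```

Censored seeds still contribute the information that they had not germinated by day 10, which is exactly what an ordinary regression on germination time would have to throw away.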
We will be running a two-day workshop at UNSW in the lead-up to the Eco-Stats conference in December. This course is aimed at ecologists who recall some intro stats and want to get up to speed on more modern methods of modelling data using R.
The core idea in the course is to recognise that most statistical methods
you use can be understood under a single framework, as special cases of
(generalised) linear models - including linear regression, t-tests,
ANOVA, ANCOVA, logistic regression and chi-square tests. Learning these
methods in a systematic way, instead of as a "cookbook" of different
methods, enables a systematic approach to key steps in analysis (like
assumption checking) and extension to handle more complex situations you
might encounter in the future (random factors, multivariate analysis,
choosing between a set of competing models).
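As a small illustration of the "single framework" idea, the sketch below checks that an equal-variance two-sample t-test and a linear model with a two-level factor give the same P-value. The data and variable names (height, site) are simulated and hypothetical.

```r
set.seed(1)

# Hypothetical plant heights at two sites
height <- c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 12, sd = 2))
site   <- factor(rep(c("A", "B"), each = 20))

# Classical two-sample t-test, assuming equal variances
tt <- t.test(height ~ site, var.equal = TRUE)

# The same comparison written as a linear model
fit  <- lm(height ~ site)
p_lm <- summary(fit)$coefficients["siteB", "Pr(>|t|)"]

# The two P-values agree (up to numerical precision)
all.equal(tt$p.value, p_lm)
```

Once the test is written as a model, the usual tools for model checking, such as plot(fit), apply in the same way as for any other linear model.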
The course will be taught by the UNSW Eco-Stats group (it will be led by Francis Hui and Gordana Popovic, with contributions by David Warton and others).
###########################################################################
## Forecasting with Time Series Data.
## We will use Rob Hyndman's forecast R-package.
## For more details on the package and time-series forecasting in general,
## see https://www.otexts.org/fpp
library(forecast)
Notes: Please be aware that we will be using the MCMC package JAGS, as well as the R packages boral and mvabund. If you are using your own laptop, then you could save some time by installing those prior to coming. Thanks!
The May Eco-Stats Lab (Friday 29th, 2pm, Bioscience level 6) will be on missing data analysis, using the method of Multiple Imputation.
Missing data arise in almost all types of studies, and ecological data are no exception. However, most statistical analysis methods are designed for complete datasets. A common way to handle missing data is to remove cases with missing values in order to obtain a complete dataset, which reduces the sample size and thus the statistical power. This approach can also result in biased estimates of descriptive statistics and regression coefficients. An alternative approach is to impute (fill in) the missing data with plausible values multiple times, analyse each imputed dataset separately, and then combine the results. This method is called Multiple Imputation (MI) and was proposed by Rubin (1987).
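The three steps (impute, analyse, combine) might look roughly like the sketch below, using the mice package. It uses nhanes, a small example dataset shipped with mice, as a stand-in (it is not ecological and not the lab data), and the regression model is chosen purely for illustration.

```r
library(mice)

# Step 1: impute the missing values m = 5 times (default chained-equations settings)
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Step 2: fit the same model to each of the five completed datasets
fits <- with(imp, lm(bmi ~ age + chl))

# Step 3: combine the five sets of estimates using Rubin's rules
pooled <- pool(fits)
summary(pooled)
```

The pooled standard errors include both the within-imputation variance and the between-imputation variance, so uncertainty about the missing values is carried through to the final inference.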
In this lab we will explore the method of MI implemented in the mice package (van Buuren & Groothuis-Oudshoorn, 2011), which stands for multivariate imputation by chained equations. For more details see
The Eco-Stats Symposium led to a series of papers which make up the April Special Issue of Methods in Ecology and Evolution. Read the blog post on the topic for a little more info. This includes contributions to species distribution modelling (point process models, measurement error, jointly estimating observer bias across multiple species), multivariate analysis (a method for unconstrained ordination, trait analysis), and diversity estimation (rarefaction for phylogenetic diversity, how to weight branches in functional diversity computation).
Planning is well underway for a follow-up Eco-Stats Conference in December 2015 at UNSW, featuring Otso Ovaskainen (Helsinki), Doug Wu (East Anglia), Jay ver Hoef (Alaska), Melodie McGeoch (Monash) and plenty more. We'll send out a call for registrations and poster abstracts in the next month or so.
The March Eco-Stats Lab (Friday 27th, 2pm, Bioscience level 6) will be on incorporating species traits into multivariate analysis (or equivalently, community level models), using some functions recently added to the mvabund package.
Often ecologists collect data (especially abundance or presence-absence data) simultaneously across many taxa, with the intention of studying what occurs where (and why). This tutorial focusses on the why - methods to help us move towards a functional explanation of community abundances. McGill et al (2006) and Shipley (2010) argue passionately for the need for this.
A common strategy in any field looking at "why" is to look for predictor variables that can explain the response. In the case of studying why some taxa are abundant at a site while others are not, the relevant predictors are species traits. These come in a matrix, different traits in different columns, different taxa in different rows.
We will explore methods for using such a matrix of traits in a multivariate analysis, using the mvabund package (version 3.10.1 or later). Full details here: http://rpubs.com/dwarton/68823
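For orientation, a minimal sketch of such a fit is given below. It assumes the antTraits example data that ships with mvabund (a list with abund, env and traits components) and uses the default settings of traitglm(); see the full details linked above for the analysis actually covered in the lab.

```r
library(mvabund)

# Example data shipped with mvabund: abundances (sites x species),
# environmental variables (sites x variables), traits (species x traits)
data(antTraits)

# Model abundance as a function of environment, traits and their interaction
ft <- traitglm(antTraits$abund, antTraits$env, antTraits$traits)

# Environment-by-trait interaction ("fourth corner") coefficients
ft$fourth.corner
```

The interaction coefficients are the part that speaks to the "why": they describe how the response of abundance to each environmental variable changes with each trait.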
We will also discuss the relationship of these methods with standard
analysis of multivariate abundance data, SDMs, and Bill Shipley's CATS
*** Note you need mvabund 3.10.1 or later... ***
UPDATE (13/5/15): mvabund 3.10.4 is now available from CRAN. It has a formula argument for more control of the traitglm model you fit, and composition and col.intercepts arguments to control whether or not you include row/column effects in the model (to focus on relative abundance), and a block resampling option on anova.traitglm (useful for example if you have repeated measures).
In the last few weeks Francis Hui has pulled off two big results:
- his PhD has been awarded, with his thesis going through without changes, but with plenty of compliments.
- he was awarded the 2015 E&ERC prize for best student paper for the below article in the Journal of the American Statistical Association:
Well done Francis! Super effort. He is currently working on a post-doc at ANU with Alan Welsh and Samuel Mueller, developing new methodology for mixed models.
Time for our third annual Eco-Stats Paper of the Year awards. Basically everyone in our group nominates the paper they were most impressed by this year, across the ecology and statistics literatures (although ideally somewhere between the two). Then we have a vote for a winner. A bit of a stats focus this year as it turns out, perhaps reflecting where our thinking has been recently. Here is our shortlist:
Godoy, Kraft and Levine Phylogenetic relatedness and the determinants of competitive outcomes, Ecology Letters. An interesting mix of empirical work and theory, meticulously collecting data on vital rates in a competition experiment to parameterise a fancy Chesson mathematical model for competition and look at what this implied concerning the pairwise competitive interactions between a set of 18 Californian grassland species. Results question the widely held expectation that more distantly related species can more readily coexist.
And the two joint winners this year:
Kleiner et al A scalable bootstrap for massive data, JRSSB.
Bootstrapping target quantities in large datasets by breaking the data
into groups and bootstrapping the summary statistics for each group.
This idea makes bootstrapping doable for large datasets, if you have the
right sort of statistic.
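A rough sketch of the idea, in base R, for the standard error of a sample mean. This is a simplified reading of the approach and not the authors' code; the tuning values (b, s, r) and the simulated data are assumptions chosen for illustration.

```r
set.seed(1)

# Simulated "large" dataset; the true standard error of the mean is 2 / sqrt(n) = 0.02
n <- 10000
x <- rnorm(n, mean = 5, sd = 2)

b <- floor(n^0.6)  # size of each small subset
s <- 10            # number of subsets
r <- 100           # bootstrap resamples per subset

subset_se <- sapply(seq_len(s), function(j) {
  xs <- sample(x, b)  # one small subset, drawn without replacement
  est <- replicate(r, {
    # Resample n points from the subset via multinomial counts, so only
    # b distinct values ever need to be stored or touched
    w <- as.vector(rmultinom(1, size = n, prob = rep(1 / b, b)))
    sum(w * xs) / n   # weighted mean = mean of the size-n resample
  })
  sd(est)             # bootstrap standard error from this subset
})

# Average the per-subset answers
blb_se <- mean(subset_se)
blb_se
```

The "right sort of statistic" here is one that can be computed from weighted data, since each resample is represented by counts on b points rather than by n separate values.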
Lockhart, .., Tibshirani A significance test for the lasso, Annals of Statistics. The LASSO is a big deal these days but a sticking point has always been inference about coefficients. This paper proposes an amazingly simple significance test for coefficients entering the model. This is destined to become a citation classic.
Feel free to share your own opinions on the highlights of 2014!