Eco-Stats Research Blog: 2015

Tuesday, 22 September 2015

Analysis of ordinal data in ecology

The September Eco-Stats Lab (Friday 25th, 2pm, Bioscience level 6) will be on ordinal data analysis in ecology.

Ordered categorical data are commonplace in ecology when quantitative measurement of a variable of interest is not feasible or too costly. Examples include the size of individuals, body condition and the relative abundance of species. Cumulative link models are a powerful class of models for analysing such data: observations are treated as categorical, their ordering is exploited, and the flexible regression framework allows in-depth analyses. We will use the ordinal package in R to analyse some ecological ordinal data.

The code and details can be found here.
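As a rough illustration of a cumulative link model, here is a sketch that simulates a hypothetical ordered cover-class response from a latent logistic variable and fits a proportional-odds (cumulative logit) model. It uses MASS::polr, which fits the same cumulative logit model, so the sketch runs with a standard R installation; the lab itself uses the ordinal package, whose clm() function fits this class of models. The variable names (moisture, cover) and the class cut-points are made up for illustration.

```r
# Sketch: proportional-odds (cumulative logit) model on simulated data.
# Assumption: MASS::polr stands in for ordinal::clm; variable names are hypothetical.
library(MASS)

set.seed(1)
n <- 300
moisture <- runif(n, 0, 10)               # hypothetical soil moisture index
latent <- 0.6 * moisture + rlogis(n)      # unobserved abundance on the logit scale

# Observed response: an ordered cover class obtained by cutting the latent scale
cover <- cut(latent,
             breaks = c(-Inf, 2, 4, 6, Inf),
             labels = c("rare", "occasional", "common", "dominant"),
             ordered_result = TRUE)

# Cumulative link model: logit P(cover <= k) = zeta_k - beta * moisture
fit <- polr(cover ~ moisture, method = "logistic", Hess = TRUE)
summary(fit)

# Fitted probability of each cover class at a site with moisture = 5
p <- predict(fit, newdata = data.frame(moisture = 5), type = "probs")
p
```

Under the proportional-odds assumption a single slope for moisture shifts all the cumulative probabilities together; polr estimates that slope plus one threshold (stored in fit$zeta) per boundary between adjacent classes.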

Thursday, 10 September 2015

Welcome to Loic and Wesley!

We are very fortunate to have two new research associates joining us in Eco-Stats - Loic Thibaut and Wesley Brooks.

Loic came from a PhD at James Cook University with Sean Connolly, and he will be thinking about how to resample generalised linear mixed models.

Wesley came here from Wisconsin, where he did a PhD with Jun Zhu, and he will be looking at the issue of spatial confounding and how it affects point process models.

Tuesday, 25 August 2015

Time to event analysis (Survival analysis) in ecology

The August Eco-Stats Lab (Friday 28th, 2pm, Bioscience level 6) will be on time to event analysis in ecology.

When modelling the time taken for an event to happen (e.g. death), we often use time to event analysis (survival analysis) rather than standard regression models. A common feature of these data is that for some subjects the event did not happen at all during the study period. This is known as (right) censoring, and survival analysis can elegantly incorporate the information from these subjects (we know the event had not occurred by the end of the study). Standard regression models, on the other hand, have no easy way to include them.

Survival analysis can often be applied to ecological data, e.g.

- Arrival of a parasite

- Survival times for animals/plants

- Germination timing

- Response to stimulus

- How long fruit remain on plants before they are eaten

We will use the survival package in R to analyse some ecological time to event data. You can find the code/explanation here, and the data here.
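To show how censored observations enter the analysis, here is a minimal sketch with the survival package (which ships with R) on a small, made-up fruit-removal dataset. Fruit still on the plant when monitoring stopped are coded status = 0 (right-censored) rather than being dropped. The data, the site variable and its two levels are hypothetical.

```r
library(survival)

# Hypothetical data: days until each fruit was eaten.
# status = 1 if removal was observed, 0 if the fruit was still on the plant
# when monitoring stopped (right-censored).
fruit <- data.frame(
  days   = c(2, 5, 5, 8, 12, 14, 14, 20, 3, 6, 9, 10, 15, 20, 20, 20),
  status = c(1, 1, 1, 1, 1,  1,  0,  0, 1, 1, 1, 0,  1,  0,  0,  0),
  site   = rep(c("edge", "interior"), each = 8)
)

# Kaplan-Meier estimate of the proportion of fruit remaining over time
km <- survfit(Surv(days, status) ~ site, data = fruit)
summary(km)

# Log-rank test for a difference in removal times between sites
survdiff(Surv(days, status) ~ site, data = fruit)

# Cox proportional hazards regression with site as a predictor
cox <- coxph(Surv(days, status) ~ site, data = fruit)
summary(cox)
```

Each censored fruit counts in the number at risk up to its censoring time, which is exactly the information a regression on observed removal times alone would throw away.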

Wednesday, 19 August 2015

Workshop - Introduction to Regression Modelling in R, December 6-7

We will be running a two-day workshop at UNSW in the lead-up to the Eco-Stats conference in December. This course is aimed at ecologists who recall some intro stats and want to get up to speed on more modern methods of modelling data using R.

The core idea in the course is to recognise that most statistical methods you use can be understood under a single framework, as special cases of (generalised) linear models - including linear regression, t-tests, ANOVA, ANCOVA, logistic regression and chi-square tests. Learning these methods in a unified way, instead of as a "cookbook" of different methods, enables a systematic approach to key steps in analysis (like assumption checking) and extension to handle more complex situations you might encounter in the future (random factors, multivariate analysis, choosing between a set of competing models).
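A quick way to see the "single framework" idea: a two-sample t-test and a one-way ANOVA are the same linear model fitted with lm(). This sketch uses simulated, hypothetical plant heights; the t statistics agree up to sign (which group is subtracted from which), the p-values match, and the ANOVA F statistic is the square of t.

```r
# Sketch on simulated (hypothetical) data: a t-test is a linear model.
set.seed(42)
treatment <- factor(rep(c("control", "fertilised"), each = 15))
height <- c(rnorm(15, mean = 20, sd = 4), rnorm(15, mean = 24, sd = 4))

# Classical two-sample t-test assuming equal variances
tt <- t.test(height ~ treatment, var.equal = TRUE)

# The same comparison as a linear model with a single factor predictor
lmfit <- lm(height ~ treatment)
coefs <- summary(lmfit)$coefficients
lm_t <- coefs["treatmentfertilised", "t value"]
lm_p <- coefs["treatmentfertilised", "Pr(>|t|)"]

# Same |t| and p-value; the sign differs only in which group is subtracted
c(t_test = unname(tt$statistic), lm = lm_t)
c(t_test = tt$p.value, lm = lm_p)

# One-way ANOVA is the F test for the same model, with F = t^2
anova(lmfit)
```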

The course will be taught by the UNSW Eco-Stats group (it will be led by Francis Hui and Gordana Popovic, with contributions by David Warton and others).

Register via the conference website at http://www.eco-stats.unsw.edu.au/register.html

Tuesday, 28 July 2015

Forecasting with time series data

Day: Friday 31st July 2pm

Location: Bioscience Level 6

Topic: Forecasting with time series data

###########################################################################
## Forecasting with Time Series Data.
## We will use Rob Hyndman's forecast R-package.
## For more details on the package and time-series forecasting in general,
## see https://www.otexts.org/fpp

library(forecast)
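The lab script only loads forecast. As a self-contained illustration of the same workflow (fit a model, then forecast with prediction intervals), this sketch uses base R only: the classic seasonal "airline" ARIMA(0,1,1)(0,1,1)[12] model fitted with stats::arima() to the log of the built-in AirPassengers series. The model choice is ours, for illustration; with the forecast package, auto.arima() or ets() would select a model automatically and forecast() would return point forecasts with intervals.

```r
# Base-R sketch (assumption: the airline model is an illustrative choice).
y <- log(AirPassengers)                   # monthly totals, 1949-1960, log scale

# Seasonal ARIMA(0,1,1)(0,1,1)[12], the classic "airline" model
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 24 months ahead on the log scale, with standard errors
fc <- predict(fit, n.ahead = 24)

# Back-transform: point forecasts and approximate 95% prediction intervals
point <- exp(fc$pred)
lower <- exp(fc$pred - 1.96 * fc$se)
upper <- exp(fc$pred + 1.96 * fc$se)
round(cbind(point, lower, upper)[1:6, ])
```

Fitting on the log scale stabilises the growing seasonal swings; exponentiating the forecasts gives intervals that are asymmetric on the original scale.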

It's boral time!

Details for the next R-lab are now available:

Day: Friday 19th June 2pm (it has been brought forward a week for conference reasons)

Location: Bioscience Level 6

Topic: boral, an R package for Bayesian analysis of multivariate abundance data in ecology

Links:

1) Presentation slides: Go here

2) R script: Go here

Notes: Please be aware that we will be using the MCMC software JAGS, as well as the R packages boral and mvabund. If you are using your own laptop, you could save some time by installing these before the lab. Thanks!

Wednesday, 27 May 2015

Missing Data Analysis

The May Eco-Stats Lab (Friday 29th, 2pm, Bioscience level 6) will be on missing data analysis using multiple imputation.

Missing data are encountered in almost every type of study, and ecological data are no exception. However, most statistical methods are designed for complete datasets. A common way to handle missing data is to remove cases with missing values to obtain a complete dataset, which reduces the sample size and thus the statistical power. Unless the data are missing completely at random, this approach can also bias descriptive statistics and regression coefficients. An alternative approach is to impute (fill in) the missing data with plausible values multiple times, analyse each imputed dataset separately, and then combine the results. This method is called Multiple Imputation (MI) and was proposed by Rubin (1987).

In this lab we will explore MI as implemented in the mice package (van Buuren & Groothuis-Oudshoorn, 2011); the name stands for Multivariate Imputation by Chained Equations. For more details see

http://www.jstatsoft.org/v45/i03/

Click here
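To make the impute, analyse and pool steps concrete, here is a base-R sketch of multiple imputation for a single incomplete variable, followed by Rubin's rules for pooling. It is deliberately simpler than mice: with only one incomplete variable there is no chaining, and the imputation model is a Bayesian linear regression (similar in spirit to mice's "norm" method) rather than mice's default predictive mean matching. The data, the missingness mechanism and the helper impute_once() are all hypothetical; in practice mice(), with() and pool() handle these steps for you.

```r
set.seed(2015)

# Hypothetical data: body mass depends linearly on body length
n <- 200
len  <- rnorm(n, mean = 50, sd = 10)
mass <- 5 + 0.8 * len + rnorm(n, sd = 5)

# Lengths go missing more often for heavier animals (missing at random given mass)
p_miss  <- plogis(-1 + 0.1 * (mass - mean(mass)))
len_obs <- ifelse(runif(n) < p_miss, NA, len)

# Hypothetical helper: one draw of Bayesian linear-regression imputation of x given y.
# Drawing sigma and beta (not just plugging in estimates) propagates their uncertainty.
impute_once <- function(x, y) {
  obs <- !is.na(x)
  fit <- lm(x[obs] ~ y[obs])
  sigma <- sqrt(sum(residuals(fit)^2) / rchisq(1, df = fit$df.residual))
  v <- vcov(fit) * (sigma / summary(fit)$sigma)^2
  beta <- coef(fit) + drop(t(chol(v)) %*% rnorm(2))
  x[!obs] <- beta[1] + beta[2] * y[!obs] + rnorm(sum(!obs), sd = sigma)
  x
}

# Impute m times and fit the analysis model to each completed dataset
m <- 20
est <- numeric(m)   # slope estimate from each imputed dataset
u   <- numeric(m)   # its squared standard error
for (i in seq_len(m)) {
  len_i  <- impute_once(len_obs, mass)
  fit_i  <- lm(mass ~ len_i)
  est[i] <- coef(fit_i)[2]
  u[i]   <- vcov(fit_i)[2, 2]
}

# Rubin's rules for pooling
q_bar <- mean(est)                 # pooled estimate
w_bar <- mean(u)                   # within-imputation variance
b     <- var(est)                  # between-imputation variance
t_var <- w_bar + (1 + 1 / m) * b   # total variance
c(pooled_slope = q_bar, pooled_se = sqrt(t_var))

# Complete-case analysis for comparison (lm drops rows with NA by default)
coef(summary(lm(mass ~ len_obs)))["len_obs", ]
```

The between-imputation term b is what a single imputation would miss: treating one filled-in dataset as if it were real data understates the uncertainty.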

Wednesday, 6 May 2015

Traits, community ecology and demented accountants

I've added a post on trait modelling on the Methods blog, title as above, to coincide with the April MEE Special Issue from the 2013 Eco-Stats Symposium.

You'd be surprised how hard it is to find a Creative Commons image of people in suits in the field...

Monday, 20 April 2015

Zero inflation in ecology

The April Eco-Stats Lab (Friday 24th, 2pm, Bioscience level 6) will be on zero-inflated data in ecology.

It's very common for ecological data to contain many zeros. To account for this we may need to:

1. Use zero-inflated regression models

2. Do absolutely nothing (i.e. fit standard GLMs)

In this lab we'll talk about why many zeros may occur in ecology, and the appropriate ways to account for them in your analysis. We will mostly use the pscl package in R.
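Many zeros do not by themselves imply zero inflation. A simple check, sketched here on simulated (hypothetical) counts along an environmental gradient, is to compare the number of zeros observed with the number expected under a fitted standard Poisson GLM. Here the data really are Poisson, so the two should agree even though more than half of the counts are zero. If the observed zeros greatly exceeded the expected number, a zero-inflated model such as pscl's zeroinfl() would be worth considering, as would a negative binomial GLM, since overdispersion also produces extra zeros.

```r
set.seed(7)
n <- 200
gradient <- runif(n, 0, 5)                          # hypothetical environmental gradient
count <- rpois(n, lambda = exp(2 - 1.2 * gradient)) # plain Poisson, no zero inflation

fit <- glm(count ~ gradient, family = poisson)

obs_zeros <- sum(count == 0)
exp_zeros <- sum(dpois(0, lambda = fitted(fit)))    # zeros expected under the fitted GLM
c(prop_zero = mean(count == 0), observed = obs_zeros, expected = exp_zeros)
```

The zeros here arise because the mean count is small at one end of the gradient, which the GLM already captures through its covariate.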

Special Issue from Eco-Stats Symposium in MEE April 2015 issue

The Eco-Stats Symposium led to a series of papers which make up the April Special Issue of Methods in Ecology and Evolution. Read the blog post on the topic for a little more info. The issue includes contributions on species distribution modelling (point process models, measurement error, jointly estimating observer bias across multiple species), multivariate analysis (a method for unconstrained ordination, trait analysis) and diversity estimation (rarefaction for phylogenetic diversity, how to weight branches in functional diversity computation).

Planning is well underway for a follow-up Eco-Stats Conference in December 2015 at UNSW, featuring Otso Ovaskainen (Helsinki), Doug Yu (East Anglia), Jay Ver Hoef (Alaska), Melodie McGeoch (Monash) and plenty more. We'll send out a call for registrations and poster abstracts in the next month or so.

Monday, 23 March 2015

Species traits in multivariate analysis

The March Eco-Stats Lab (Friday 27th, 2pm, Bioscience level 6) will be on incorporating species traits into multivariate analysis (or equivalently, community level models), using some functions recently added to the mvabund package.

Often ecologists collect data (especially abundance or presence-absence data) simultaneously across many taxa, with the intention of studying what occurs where (and why). This tutorial focusses on the why - methods to help us move towards a functional explanation of community abundances. McGill et al (2006) and Shipley (2010) argue passionately for the need for this.

A common strategy in any field looking at "why" is to look for predictor variables that can explain the response. In the case of studying why some taxa are abundant at a site while others are not, the relevant predictors are species traits. These come as a matrix, with different traits in different columns and different taxa in different rows.

We will explore methods for using such a matrix of traits in a multivariate analysis, using the mvabund package (version 3.10.1 or later). Full details here:
http://rpubs.com/dwarton/68823
We will also discuss the relationship of these methods with standard analysis of multivariate abundance data, SDMs, and Bill Shipley's CATS models.

*** Note you need mvabund 3.10.1 or later... ***

UPDATE (13/5/15): mvabund 3.10.4 is now available from CRAN. It has a formula argument for more control of the traitglm model you fit, and composition and col.intercepts arguments to control whether or not you include row/column effects in the model (to focus on relative abundance), and a block resampling option on anova.traitglm (useful for example if you have repeated measures).
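Conceptually, this kind of model treats species traits as predictors by stacking the site-by-species abundance table into one long vector and fitting a single GLM in which environment-by-trait interactions (the "fourth corner" terms) describe how species' responses to the environment depend on their traits. The sketch below fits such a model with base R's glm() on simulated, hypothetical data; it is not mvabund's traitglm implementation. The site and species factors play the role of the row and column effects that traitglm's composition and col.intercepts arguments control, and the plain Wald standard errors from glm() ignore correlation among species at the same site, one reason the resampling-based anova.traitglm is preferred for inference.

```r
set.seed(3)
n_sites <- 40
n_spp   <- 25
env <- rnorm(n_sites)              # hypothetical environmental variable (one per site)
trt <- rnorm(n_spp)                # hypothetical species trait (one per species)
site_eff <- rnorm(n_sites, 0, 0.3) # row (site) effects
spp_eff  <- rnorm(n_spp, 0.5, 0.5) # column (species) intercepts
true_fc  <- 0.7                    # true environment-by-trait coefficient

# Long format: one row per site-species combination
d <- expand.grid(site = seq_len(n_sites), species = seq_len(n_spp))
d$x     <- env[d$site]
d$trait <- trt[d$species]
eta <- site_eff[d$site] + spp_eff[d$species] + true_fc * d$x * d$trait
d$count <- rpois(nrow(d), lambda = exp(eta))
d$site    <- factor(d$site)
d$species <- factor(d$species)

# Site and species main effects absorb the env and trait main effects;
# the x:trait coefficient is the "fourth corner" interaction.
fit <- glm(count ~ site + species + x:trait, family = poisson, data = d)
coef(summary(fit))["x:trait", ]
```

A positive x:trait coefficient says that species with larger trait values respond more positively to the environmental variable.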

Congratulations Francis!

In the last few weeks Francis Hui has pulled off two big results:
- his PhD has been awarded, with his thesis going through without changes, but with plenty of compliments.
- he was awarded the 2015 E&ERC prize for best student paper for the below article in the Journal of the American Statistical Association:
http://amstat.tandfonline.com/doi/abs/10.1080/01621459.2014.951444

Well done Francis! Super effort. He is currently a post-doc at ANU with Alan Welsh and Samuel Mueller, developing new methodology for mixed models.

Friday, 9 January 2015

2014 Eco-Stats paper of the year

Time for our third annual Eco-Stats Paper of the Year awards. Basically everyone in our group nominates the paper they were most impressed by this year, across the ecology and statistics literatures (although ideally somewhere between the two). Then we have a vote for a winner. A bit of a stats focus this year as it turns out, perhaps reflecting where our thinking has been recently. Here is our shortlist:

Kitzes and Harte, Beyond the species–area relationship: improving macroecological extinction estimates, Methods in Ecology and Evolution. There is something of a disconnect between broad macroecological methods (like species-area relationships) and methods for modelling individual species (SDMs etc), and this paper tries to bridge that gap a little with some nice ideas.

Godoy, Kraft and Levine, Phylogenetic relatedness and the determinants of competitive outcomes, Ecology Letters. An interesting mix of empirical work and theory, meticulously collecting data on vital rates in a competition experiment to parameterise a fancy Chesson mathematical model for competition and look at what this implied concerning the pairwise competitive interactions between a set of 18 Californian grassland species. Results question the widely held expectation that more distantly related species can more readily coexist.

Delaigle, Nonparametric Kernel Methods with Errors-in-Variables: Constructing Estimators, Computing them, and Avoiding Common Mistakes, Australian and New Zealand Journal of Statistics. An insightful review of ways to handle measurement error in kernel estimation, with a section on common mistakes seen in the literature. Aurore Delaigle (Melbourne) is one to watch on the Australian statistics scene, a Peter Hall protégé and winner of the 2012 Moran Medal.

Viladomat, .., Hastie, Assessing the significance of global and local correlations under spatial autocorrelation: A nonparametric approach, Biometrics. Not the first Stanford Stats entry, and not the last either! A permutation test for association between two spatial variables, which works by smoothing and scaling values in the permuted variable in such a way that it preserves autocorrelation. An original idea working away at the problematic area of design-based inference for spatial data.

And the two joint winners this year:
Kleiner et al., A scalable bootstrap for massive data, JRSSB. The "bag of little bootstraps": bootstrapping target quantities in large datasets by drawing many small subsets of the data, running a bootstrap within each subset (reweighting it so that each resample mimics one of full size), and averaging the results across subsets. This idea makes bootstrapping doable for large datasets, if you have the right sort of statistic.

Lockhart, .., Tibshirani, A significance test for the lasso, Annals of Statistics. The LASSO is a big deal these days, but a sticking point has always been inference about coefficients. This paper proposes an amazingly simple significance test for coefficients entering the model. This is destined to become a citation classic.
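For a feel of how the bag of little bootstraps of Kleiner et al. works, here is a small base-R sketch (our own illustration, not code from the paper) that estimates the standard error of a mean. Each of s small subsets of size b = n^gamma is resampled up to the full size n using multinomial weights, the spread of the resulting estimates is computed within each subset, and these spreads are averaged. The tuning values s, gamma and r, the function name blb_se_mean() and the simulated data are illustrative assumptions. For the mean we can check the answer against the textbook formula sd(x)/sqrt(n).

```r
set.seed(11)
x <- rexp(1e5)                 # hypothetical "large" dataset
n <- length(x)

# Hypothetical helper: bag-of-little-bootstraps estimate of SE(mean)
blb_se_mean <- function(x, s = 10, gamma = 0.6, r = 100) {
  n <- length(x)
  b <- floor(n^gamma)                        # subset size, much smaller than n
  se_by_subset <- numeric(s)
  for (j in seq_len(s)) {
    xs <- sample(x, b)                       # subset drawn without replacement
    # r full-size resamples: multinomial counts of n draws over the b points
    w <- rmultinom(r, size = n, prob = rep(1 / b, b))
    boot_means <- colSums(w * xs) / n        # weighted mean of each resample
    se_by_subset[j] <- sd(boot_means)        # quality estimate within subset j
  }
  mean(se_by_subset)                         # average over subsets
}

se_blb <- blb_se_mean(x)
se_formula <- sd(x) / sqrt(n)
c(blb = se_blb, formula = se_formula)
```

Each resample only ever touches b distinct values, so the work per resample scales with b rather than n, which is what makes the approach feasible for massive data.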

Feel free to share your own opinions on the highlights of 2014!
