Wednesday, 20 March 2019

Template Model Builder Tutorial

Many of the Eco-Stats group are using Template Model Builder (TMB) - a very flexible package in R for fitting all sorts of latent variable models quickly. For R users without any C++ coding experience, getting familiar with the package might be a little daunting so we've put together a gentle introduction with some simple examples. Follow the link below and get going with TMB:

TMB Introduction Tutorial

Note: TMB compiles C++ code, so before installing it (by your usual means of installing an R package) you will need a working development environment. On Windows you can just install the latest version of Rtools - follow the install guide here. If installing on Mac OS or Linux, following the devtools install guide will do the trick - check it out here.
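To give a flavour of what TMB code looks like before you dive into the tutorial, here is a minimal sketch (our own toy example, not taken from the tutorial) that fits a simple linear regression. The file name linreg.cpp and the simulated data are invented for illustration; it assumes TMB is installed and a C++ toolchain is working.

```r
# Sketch only: assumes TMB and a working C++ toolchain are available.
library(TMB)

# Write a minimal TMB model (normal linear regression) to a .cpp file
writeLines('
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_VECTOR(y);
  DATA_VECTOR(x);
  PARAMETER(a);
  PARAMETER(b);
  PARAMETER(logSigma);
  // Negative log-likelihood of y ~ Normal(a + b*x, sigma)
  Type nll = -sum(dnorm(y, a + b * x, exp(logSigma), true));
  return nll;
}', "linreg.cpp")

compile("linreg.cpp")        # compile the C++ template
dyn.load(dynlib("linreg"))   # load the resulting shared object

# Simulated data
set.seed(1)
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)

obj <- MakeADFun(data = list(y = y, x = x),
                 parameters = list(a = 0, b = 0, logSigma = 0),
                 DLL = "linreg")
fit <- nlminb(obj$par, obj$fn, obj$gr)
fit$par   # estimates of a, b and log(sigma)
```

The tutorial linked above covers the same workflow (write a template, compile, build the objective with MakeADFun, optimise) for much richer latent variable models.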

Thursday, 8 March 2018

Paper of the year 2017

The competition for paper of the year 2017 was heated, with the ecostatistician proposing the winning paper scoring the coveted "free coffee for a year" prize. The nominations were diverse, ranging all the way from pure ecology to very fancy stats. After much debate, the winner was:

Hallmann CA, Sorg M, Jongejans E, Siepel H, Hofland N, et al. (2017) More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLOS ONE 12(10)

This paper was nominated by John Wilshire, who summarises it as follows:

Flying insects play a very important role in ecosystems, both as pollinators and as food sources for other animals. This paper shows that their populations have massively declined over a relatively short period of time (at least in protected areas in Germany). I like this paper as it presents the results of a long-term study, and it is a pretty scary example of the impacts we are having on ecosystems. Plus it is open access and has data and code available, and the statistical analysis is presented in a clear and easy-to-follow manner.

Other nominees were (in no particular order):


Thursday, 13 April 2017

Special Feature in Methods in Ecology and Evolution on Eco-Stats '15

There is a Special Feature in the April 2017 issue of Methods in Ecology and Evolution reporting outcomes from the Eco-Stats '15 conference; there is a blog post about it here:

https://methodsblog.wordpress.com/2017/04/12/generating-new-ideas/

Friday, 21 October 2016

Three simple things you should know about interpreting linear models

What do the coefficients from linear models actually mean? What does it mean to "control for" another variable? How do I interpret coefficients in the presence of other variables vs. in a model on their own? What do coefficients mean in a generalized linear model? How do I standardise data using models? What are offsets, and how do I use them?

Learn the answers to these and many more questions this Friday, October 21 at 2pm

Materials for this lab can be found here.
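As a small taste of the sort of thing covered, here is a base-R sketch of how a coefficient's meaning changes once another predictor enters the model (using the built-in mtcars data; the lab materials may use different examples):

```r
# Coefficient of wt (car weight, per 1000 lb) in a model on its own:
coef(lm(mpg ~ wt, data = mtcars))["wt"]       # about -5.3 mpg per 1000 lb

# ...versus "controlling for" horsepower: now the coefficient is the
# effect of weight comparing cars with the same horsepower.
coef(lm(mpg ~ wt + hp, data = mtcars))["wt"]  # about -3.9

# An offset is a term whose coefficient is fixed at 1 rather than
# estimated, e.g. modelling a count per unit of sampling effort:
# glm(count ~ x + offset(log(effort)), family = poisson)
```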


Friday, 26 August 2016

Introduction to mixed models with lme4

Ever wondered what a mixed model is? Have a nested design for your experiment and don't know how to analyse it? Confused by fixed and random effects? What's the deal with lme4?

Come and find out the answer to these questions (and more) in an hour. 


Friday 2-3 PM in Biology 640.

Materials are available here.
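For a quick flavour of the syntax, a minimal sketch using lme4's built-in sleepstudy data (the lab materials may use different examples):

```r
library(lme4)

# Reaction time over days of sleep deprivation: Days is a fixed effect,
# and (Days | Subject) gives each subject their own random intercept
# and random slope.
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fit)

# Compare with a random-intercept-only model
fit0 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
anova(fit0, fit)   # likelihood ratio test of the random slope
```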

Thursday, 11 August 2016

Congrats to Gordana Popovic for winning a Student Prize for her awesome talk at ISEC in Seattle. She spoke about an algorithm for covariance modelling of discrete data using copulas (which are fun). For her efforts she got a certificate, an Amazon voucher and one of these...

Thursday, 28 April 2016

Where to submit your paper?

Today at Eco-Stats we discussed the PLoS ONE paper "Where Should I Send It? Optimizing the Submission Decision Process" which did some mathematical modelling to decide on an optimal approach to choosing the order of journals to send an ecological paper to.  The main factors considered were time to acceptance (a function of time to review and acceptance rate) and impact factor of the journal.  The authors wrote to the editorial boards of all - yes all - ISI-listed journals in ecology, and another six general journals (e.g. Science, PLoS ONE) that publish ecological papers.  They got responses from 61 journals, yielding an interesting dataset available as an appendix to their paper.  I've reformatted it as a comma-delimited file here.

The authors derived a couple of metrics (e.g. to maximise expected citations) under a host of assumptions (which made me somewhat uncomfortable, as modelling papers often do); the endpoint was metrics that could be used to evaluate different publication strategies, e.g. Science then PNAS then Ecology Letters then...

I found their results largely unsurprising - they highlighted a few target journals, of the ones they had data on, in particular Ecology Letters, Ecological Monographs and PLoS ONE, which all scored highly as compromises between impact factor and time to publication. Interestingly, Science didn't come out smelling like roses, although this may be a function of the metrics they used and their implicit assumptions as much as anything else. They didn't have data on all journals - e.g. I would like to know about Nature, Trends in Ecology and Evolution or Methods in Ecology and Evolution. They expressed surprise that a pretty good strategy seemed to be submitting to journals in order of impact factor; they had expected a loss of impact due to long times spent in review, since you can end up bouncing around between journals for years. I think in practice that strategy would do worse than their model suggested, for most of us, because the model didn't incorporate the positive correlation in outcomes from submitting the same paper to different journals (or more generally, any measure of how significant a given paper actually is).

Over time I've become more of a statistician than a modeller and so I was especially interested in the data behind this work, and I learnt the most just by looking at the raw data that was tucked away in an appendix.  Here are a few choice graphs which explain the main drivers behind their results.

First, Impact Factor vs time in review:




There is a decent negative correlation between impact factor and time in review (r = -0.5). For those of us who have submitted a few papers to journals at each end of the spectrum this won't be news. This is presumably one of the reasons why a journal has high impact - faster acceptance has a direct effect on citation metrics, and increases the incentive to submit good papers there.

The Science journal is a bit of an outlier on this graph - it has the highest impact factor but a pretty average review time, more than twice as long as Ecology Letters, so if you take these numbers at face value (are they measured the same way across journals?), and if 50 days means a lot to you, there is a case for having Ecology Letters as your plan A rather than Science.  Hmmm...

Good journals are towards the top left, and apart from Ecology Letters and Science we also have Ecological Monographs on the shortlist because it has a slightly shorter time in review than most journals with similar impact factors.  Although I wonder how large that difference is relative to sampling error (would it come out to the left of the pack next year too?)...

Next graph is Impact Factor vs Acceptance rate:
There is a slightly stronger negative association this time (r = -0.6). I vaguely remember a bulletin article a few years ago suggesting no relation between impact factor and acceptance rate - that article used a small sample size and made the classic mistake of assuming that no evidence of a relationship means no relationship. Well, given some more data, there clearly is a relationship.

This time we are looking for journals towards the top-right. The journal fitting the bill is by far the biggest outlier, PLoS ONE - a journal with a different editorial policy to most, reviewing largely for technical correctness rather than for novelty. It ends up with quite a high acceptance rate, yet nevertheless manages a pretty high impact factor. But its impact factor was calculated across all disciplines - what would it be when limited to just ecology papers?

So anyway, looking at the raw data and taking it at face value, what would your publishing strategy be? A sensible (and relatively common) strategy is to first go for a high-impact journal (or two) with relatively short turnaround times, which Ecology Letters is known for, and then, when you get tired or discouraged by lack of success, or when just trying to squeeze a paper out quickly, PLoS ONE is a good option. This is pretty much what the paper said using fancy metrics; I guess it is reassuring to get the same sort of answer from eyeballing scatterplots of the raw data.
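Those scatterplots and correlations are easy to reproduce from the journal dataset linked above. A sketch, assuming columns named ImpactFactor, ReviewTime and AcceptRate (the actual column names in the CSV may differ); data are simulated below so the code runs standalone:

```r
# With the real data you would do something like:
# journals <- read.csv("journal_data.csv")   # hypothetical file name

# Simulated stand-in for the 61 journals, so the sketch runs on its own:
set.seed(1)
n <- 61
ReviewTime   <- rnorm(n, 150, 40)                       # days in review
AcceptRate   <- rnorm(n, 0.3, 0.1)                      # acceptance rate
ImpactFactor <- 6 - 0.015 * ReviewTime - 5 * AcceptRate + rnorm(n, 0, 0.8)
journals <- data.frame(ImpactFactor, ReviewTime, AcceptRate)

# The two correlations discussed above:
cor(journals$ImpactFactor, journals$ReviewTime)   # negative
cor(journals$ImpactFactor, journals$AcceptRate)   # negative

# And the scatterplot of impact factor vs time in review:
plot(ImpactFactor ~ ReviewTime, data = journals,
     xlab = "Days in review", ylab = "Impact factor")
```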

There are a few simplifying assumptions in this discussion and in the paper itself - a key one is that all papers are treated as equal, when in fact some are more likely to be accepted than others, and some are better suited to some journals than others. There are also assumptions like citations being the be-all and end-all, and the modelling in the original paper further assumed that the citations a paper will get are a function of the journal it is published in alone, not of the quality of the paper itself. But it's all good fun and there are certainly some lessons to be learnt here.