Monday, 24 February 2014

Eco-Stats Lab, Feb 2014 - SMATR

The SMATR package (Standardised Major Axis estimation and Testing Routines) is designed for when you are fitting lines and:
- you are primarily interested in the slope (rather than significance or strength of association)
- the problem is symmetric, i.e. you could happily swap which variable is on which axis without changing the meaning of what you are doing.  Or put another way, rather than predicting Y from X (regression), you have a pair of Y variables (Y1 and Y2) and you want to see how they are related to each other.

This situation commonly arises in allometry (the study of how one size variable scales against another), this is the main place these methods are useful in ecology.

These methods were NOT designed to deal with situations where you have measurement error in X (a very widely held misconception).  That is a completely unrelated issue and measurement error should not be used to inform whether you use SMATR or not (nor does it inform which SMATR method you use).

# Example: does brain size scale as the 2/3 power of body size? (Thus brain surface area would scale against mass)
# From a linear regression of brain size against body size, the estimated slope is about 0.5, 95% CI does not cover 2/3, so we would conclude that there is some evidence that the relationshpi is flatter than 2/3 scaling.

# Reversing axes now you might expect the slope to be 1/0.5=2, and the CI to not cover 1.5, the inverse of 2/3.  Nope - the estimated slope is about 1.2 (much flatter than 2), and the 95% CI covers 1.5, so we would reach the opposite conclusion - there is no evidence that the relationship differs from 2/3 scaling.

The problem with using regression in allometry is regression to the mean, slopes are flatter than you might expect.  Regression lines are designed for predicting Y from X and are great for that situation.  But that is not what we are trying to do here - instead of minimising error predicting Y, we want to minimise the straight-line distance of points from the line, or something similiar).

The Major Axis (MA) minimises straight-line distance of points from the line, the Standardised Major Axis (SMA) first standardises data before doing so. SMA is scale invariant (doubling Y doubles the slope), MA is rotation invariant (who cares).  Some have argued SMA is better if your variables are measured on different scales (e.g. weight in grams vs energy in Joules) hence their scaling is arbitrary.  These methods are closely related to principal components analysis, and SMATR is designed for estimation and inference about (S)MA.

  ft=sma(brain~body, data=Animals, log="xy")
#what happens if you reverse the axes?  Do you get the same answer?

#to fit an MA instead:
  ma(brain~body, data=Animals, log="xy")

  ft=sma(brain~body, data=Animals,log="xy",slope.test=2/3)

  ft=ma(brain~body, data=Animals,log="xy")
  plot(ft)  #plots the data with the line
  plot(ft,which="residual")   #plots residuals
  qqnorm( residuals(ft) )
# I've got to say I'm not happy about those outliers... do they have undue influence on the fit?

# You can fit SMATR lines that are less sensitive to outliers.  These methods were developed by Sara Taskinen (U Jyvaskyla, Finland) using Huber's M estimation (fancy method which downweights outliers).
  ftRobust=sma(brain~body, data=Animals,log="xy",robust=T)
#note that while the interpretation hasn't changed here (2/3 still in interval) the fitted line is now much steeper and the CI is much narrower - it is getting thrown less by the outliers so gives a better estimate of the line
# note that the line fits the bulk of the data better, it is not dragged down as much by the outliers

#Sometimes you have multiple lines that you want to compare.  It might be of interest to compare slopes, or to assume common slope and to compare elevations, or compare locations along a common axis.  A cool graph of this, courtesy of Dan Falster (Macquarie):

#Consider leaf longevity vs Leaf Mass per Area of a bunch of species at four sites with varying soil nutrients/rainfall:
  ftComSlope=sma(longev~lma*site, log="xy", data=leaflife)
#Is there evidence that the slope of the lma-longevity SMA varies across sites?

#Multiple comparisons - which sites differ from which in SMA slope? (using Sidak adjustment)
  sma(longev~lma*site, log="xy", data=leaflife, multcomp=T, multcompmethod="adjust")

#Subset to sites 2 and 4 to compare elevation at these low soil P sites:
  leaf.low.soilp <- subset(leaflife, soilp == "low")
  ftElev <- sma(longev~lma+rain, log="xy", data=leaf.low.soilp)

#To test for evidence of shift along the SMA between sites:
  ftShift <- sma(longev~lma+rain, log="xy", type="shift", data=leaf.low.soilp)

#There are robust versions of all of these tests too.

Warton et al (2006) Bivariate line-fitting methods for allometry, Biological Reviews

Smith (2009) Use and misuse of the reduced major axis for line-fitting, American Journal of Physical Anthropology

Warton et al (2012) smatr 3 - an R package for estimation and testing of allometric lines, Methods in Ecology and Evolution

Taskinen & Warton (2013) Robust tests for one or more allometric lines, Journal of Theoretical Biology

No comments:

Post a Comment