Date: 28th March 2-3pm
Venue: Computer Lab Room 640
Slides:
Measurement Error Modeling
- Measurement error or error-in-variables arises whenever we have imprecise measurements on our predictor variables (or covariates).
- If X is the true covariate and U is the measurement error, then what we observe is
W = X + U: - In a simple regression, we usually assume that our covariates are measured precisely or they represent the true covariate values quite well. But what happens to our estimates if our covariates have measurement error?
SIMEX
- There are a number of ways to deal with measurement error we will use Simulation Extrapolation (SIMEX).
- We assume that U is distributed as N(0,u^2), and that u^2 is known.
1. We generate some random measurement error values U* using u^2.
2. We then add these to our observed covariates (W) which we use to estimate our parameters.
3. We repeat this many times and then take means. - We then keep increasing the variance of the "hypothetical" measurement error, (e.g., using say 1.5 u^2), and repeat steps 1 and 3.
- Finally, we extrapolate back as if there were no measurement error.
The simex-package
- Quadratic terms? Yes.
- Interaction terms? No.
- Heteroskedastic measurement error? Yes.
- Goodness-of- fit or AIC, BIC variable selection? No.
- How do we obtain the known u^2? Repeated measures, validation data or the instrumental knowledge.
- Carroll, R.J., Ruppert, D., Stefanski, L.A. & Crainiceanu C.M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. Chapman and Hall, London.
- Cook, J.R. & Stefanski, L.A. (1994). Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association 89, pp. 1314-1328.
- Lederer, W. and Kuchenho , H. (2006). A short introduction to the SIMEX and MCSIMEX. R News v6.4, pp. 26-31.
R-script:
library(simex)
## Real data: Girth, Height and Volume for Black Cherry Trees.
boxplot(trees)
## Let's try to predict Volume using Girth and Height.
mod_naive=lm(Volume ~ Girth + Height, x = TRUE, data=trees)
summary(mod_naive)
## Fit a SIMEX model (assuming measurement error in Girth).
sd_me = 0.25
mod_simex1 = simex(mod_naive, SIMEXvariable = "Girth", measurement.error = sd_me)
summary(mod_simex1)
plot(mod_simex1,show=c(F,T,F))
## Increase the assumed measurement error and crank up B.
sd_me = 1
mod_simex2 = simex(mod_naive, SIMEXvariable = "Girth", measurement.error = sd_me ,B=200)
summary(mod_simex2)
plot(mod_simex2,show=c(F,T,F))
## Measurement error for both Girth and Height variables.
mod_simex3 = simex(mod_naive, SIMEXvariable = c("Girth","Height"), measurement.error = cbind(sd_me,sd_me), B=200)
summary(mod_simex3)
par(mfrow=c(1,2), las=1)
plot(mod_simex3, show=c(F,T,T))
##################################################################################
## Simulated data example - count data (Poisson regression model).
set.seed(10)
n = 200 # Sample size.
x = rnorm(n, 0, sd=1) # True covariate values.
X = cbind(rep(1,n), x)
sd_me = 0.35
w = x + rnorm(n, 0, sd=sd_me) # Add some measurement error.
beta = c(0.25, 1) # True parameter values.
mu_Y = exp(X%*%beta)
Y = rpois(n, lambda=mu_Y)
mod_naive = glm(Y ~ w, x=TRUE, family = poisson)
mod_naive
mod_simex = simex(mod_naive, SIMEXvariable=c("w"), measurement.error = sd_me)
mod_simex
You would probably have quite different measurement errors for height and girth - girth you can get by wrapping tape around the trunk, height is a bot if guesswork. Let's say the sd of measurement error for girth is 1.5 inches, and for height it is 4 metres.
ReplyDeletemod_simex4 = simex(mod_naive, SIMEXvariable = c("Girth","Height"), measurement.error = cbind(1.5,4), B=200)
summary(mod_simex4)
To see the made-up data for example 2,
ReplyDeletepar(mfrow=c(1,2))
plot(Y~w)
plot(Y~x)
The relationship between Y and x is stronger (because it has been measured without error) and when we fit a measuremnt error model we are trying to recover this relationship between y and x even though we have only observed x with error (AKA w)
par(mfrow=c(1,2))
ReplyDeleteplot(Y~w,log="y")
abline(coef(mod_naive),col="purple")
plot(Y~x,log="y")
abline(coef(mod_simex),col="burlywood")
The second plot should have a steeper line on it, that one has the SIMEX fit