Statistical Experimental Design: Fitting Random Effects

Readings 3.9, Rmarkdown

In random effects models, there are three key parameters: \(\mu, \sigma^2_{\tau}, \sigma^2\). Two ways to fit them are (1) the method of moments and (2) maximum likelihood. The different approaches to estimation have different properties, so even though knowing the intricate details of the computation is not necessary, having a basic understanding of how the approaches work can help make sure we use appropriate software implementations.

Method of Moments

For estimating \(\mu\), this method uses the overall mean \(\bar{y}\).
What about the \(\sigma^2\)’s? The key identity is, \[\begin{align*} \mathbf{E}\left[MS_{\text{treatments}}\right] &= \sigma^2 + n \sigma_{\tau}^2 \\ \mathbf{E}\left[MS_{E}\right] &= \sigma^2 \end{align*}\]
We can approximate the expected values using, \[\begin{align*} MS_{\text{treatment}} \approx \sigma^2 + n \sigma_{\tau}^2 \\ MS_{E} \approx \sigma^2 \end{align*}\]
If we pretended these approximations were exact equalities, then we have two equations with two unknowns. The method of moments defines parameter estimates as the solutions to that system of equations. \[\begin{align} \hat{\sigma}^2 &= MS_{E} \\ \hat{\sigma}^2_{\tau} &= \frac{1}{n}\left[MS_{\text{treatment}} - MS_{E}\right] \end{align}\]
How can we get confidence intervals for these estimates?
- For \(\hat{\mu} = \bar{y}\), we can use \(\text{Var}\left(y\right) = \frac{1}{N} \text{Var}\left(y_{ij}\right)=\frac{1}{N}\left(\sigma^2 + n\sigma^2_{\tau}\right)\)
- For \(\hat{\sigma}^2\), we can use the fact that \(\frac{N - a}{\sigma^2}MS_{E} \sim \chi^2_{N - a}\).
- For \(\hat{\sigma}^2_{\tau}\), we’re out of luck, though some papers give approximate confidence intervals.

Maximum Likelihood Estimation

An alternative approach is to use maximum likelihood. The first step is to stack all the \(y_{ij}\)’s into one long length \(N\) vector, and observe that the data are jointly normally distributed, \[ \left(\begin{array}{c} y_{11} \\ y_{12} \\ \vdots \\ y_{a(n-1)} \\ y_{a n} \end{array}\right) \sim \mathcal{N}\left(\mu 1_{N},\left(\begin{array}{cccc} \sigma^{2} I_{n}+\sigma_{\tau}^{2} 1_{n} 1_{n}^{T} & \mathbf{0} & \ldots & \mathbf{0} \\ \mathbf{0} & \sigma^{2} I_{n}+\sigma_{\tau}^{2} 1_{n} 1_{n}^{T} & \ldots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \ldots & \sigma^{2} I_{n}+\sigma_{\tau}^{2} 1_{n} 1_{n}^{T} \end{array}\right)\right) \] Here, \(1_{n}\) refers to a vector of \(n\) 1’s stacked in a column and \(I_{n}\) refers to the \(n\times n\) identity matrix.

The specific form of the covariance isn’t important. What is important is that we can exactly evaluate the probability of \(y_{11}, y_{12},\dots, y_{an}\) under any choice of the parameters \(\mu, \sigma^2, \sigma^2_{\tau}\).
Define \(L(\mu, \sigma^2, \sigma^2_{\tau})\) to be the probability of the dataset \(y_{11}, y_{12}, \dots y_{an}\) viewed as a function of the normal distribution’s parameters. A good estimate for these parameters comes from finding the configuration that maximizes \(L\left(\mu, \sigma^2, \sigma^2_{\tau}\right)\). The maximizers can’t be found analytically, but algorithms exist to find the maximizers.

Software also gives a confidence interval for the estimates. It works by studying the curvature of the likelihood function at the maximizer.

A highly peaked likelihood surface will give a smaller confidence interval, because there is a smaller set parameters that gives a good probability to the observed data.

Figure 1: A highly peaked likelihood surface will give a smaller confidence interval, because there is a smaller set parameters that gives a good probability to the observed data.

Code Implementation

Which method should we use in practice? There are a few considerations,

If confidence intervals are necessary for more than just \(\hat{\mu}\) and \(\hat{\sigma}^2\), then maximum likelihood should be used.
It can turn out that the method of moments returns negative values for \(\hat{\sigma}^2\). It is solving an algebraic equation and has no prior knowledge that variances should always be positive. If a negative variance is likely to be problematic, then maximum likelihood should be used.
Maximum likelihood is slow, because it requires an iterative optimization. The method of moments only requires the solution to a system of equations, though, and so is very fast. It is often the method of choice in large scale problems.

The lme4 package implements the maximum likelihood approach to random effects model estimation; this package is reviewed in the previous notes.
To estimate the method of moments, we can do all the calculations by hand – we just need the results from an ANOVA table.

library(readr)
library(broom)
loom <- read_csv("https://uwmadison.box.com/shared/static/ezp3i2pflhi96si7u6rfn3dg3lb5cl3z.csv")
loom$loom <- as.factor(loom$loom)
N <- nrow(loom)
a <- nlevels(loom$loom)
n <- N / a

aov_table <- aov(lm(strength ~ loom, data = loom)) %>%
  tidy()

The code below computes point estimates for \(\sigma^2\) and \(\sigma_{\tau}^2\) based on the formulas above, using the MSE quantities available in the ANOVA table. It also computes the intraclass correlation coefficient (ICC), which is defined in the reading.

sigma_sqs <- vector(length = 2)
sigma_sqs[1] <- aov_table$meansq[2] # estimate for sigma^2
sigma_sqs[2] <- (aov_table$meansq[1] - aov_table$meansq[2]) / n # estimate for sigma^2_tau

sigma_sqs

[1] 1.895833 6.958333

sigma_sqs[2] / sum(sigma_sqs) # ICC

[1] 0.7858824

We can also use the formulas above to compute confidence intervals for \(\hat{\sigma}^2\). We use a formula from the book to build an analogous interval for the ICC.

int_bounds <- c(0.975, 0.025)
(N - a) * sigma_sqs[1] / qchisq(int_bounds, N - a) # CI for sigma^2

[1] 0.9748608 5.1660065

ratio_bounds <- 1 / n * (aov_table$statistic[1] / qf(int_bounds, a - 1, N - a) - 1)
ratio_bounds / (1 + ratio_bounds) # CI for ICC

[1] 0.3850736 0.9824420