Fitting Random Effects

Using the method of moments or maximum likelihood to estimate parameters

Kris Sankaran true
09-30-2021

Readings 3.9, Rmarkdown

  1. In random effects models, there are three key parameters: \(\mu, \sigma^2_{\tau}, \sigma^2\). Two ways to fit them are (1) the method of moments and (2) maximum likelihood. The different approaches to estimation have different properties, so even though knowing the intricate details of the computation is not necessary, having a basic understanding of how the approaches work can help make sure we use appropriate software implementations.

Method of Moments

  1. For estimating \(\mu\), this method uses the overall mean \(\bar{y}\).

  2. What about the \(\sigma^2\)’s? The key identity is, \[\begin{align*} \mathbf{E}\left[MS_{\text{treatments}}\right] &= \sigma^2 + n \sigma_{\tau}^2 \\ \mathbf{E}\left[MS_{E}\right] &= \sigma^2 \end{align*}\]

  3. We can approximate the expected values using, \[\begin{align*} MS_{\text{treatment}} \approx \sigma^2 + n \sigma_{\tau}^2 \\ MS_{E} \approx \sigma^2 \end{align*}\]

  4. If we pretended these approximations were exact equalities, then we have two equations with two unknowns. The method of moments defines parameter estimates as the solutions to that system of equations. \[\begin{align} \hat{\sigma}^2 &= MS_{E} \\ \hat{\sigma}^2_{\tau} &= \frac{1}{n}\left[MS_{\text{treatment}} - MS_{E}\right] \end{align}\]

  5. How can we get confidence intervals for these estimates?

    • For \(\hat{\mu} = \bar{y}\), we can use \(\text{Var}\left(y\right) = \frac{1}{N} \text{Var}\left(y_{ij}\right)=\frac{1}{N}\left(\sigma^2 + n\sigma^2_{\tau}\right)\)
    • For \(\hat{\sigma}^2\), we can use the fact that \(\frac{N - a}{\sigma^2}MS_{E} \sim \chi^2_{N - a}\).
    • For \(\hat{\sigma}^2_{\tau}\), we’re out of luck, though some papers give approximate confidence intervals.

Maximum Likelihood Estimation

  1. An alternative approach is to use maximum likelihood. The first step is to stack all the \(y_{ij}\)’s into one long length \(N\) vector, and observe that the data are jointly normally distributed, \[ \left(\begin{array}{c} y_{11} \\ y_{12} \\ \vdots \\ y_{a(n-1)} \\ y_{a n} \end{array}\right) \sim \mathcal{N}\left(\mu 1_{N},\left(\begin{array}{cccc} \sigma^{2} I_{n}+\sigma_{\tau}^{2} 1_{n} 1_{n}^{T} & \mathbf{0} & \ldots & \mathbf{0} \\ \mathbf{0} & \sigma^{2} I_{n}+\sigma_{\tau}^{2} 1_{n} 1_{n}^{T} & \ldots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \ldots & \sigma^{2} I_{n}+\sigma_{\tau}^{2} 1_{n} 1_{n}^{T} \end{array}\right)\right) \] Here, \(1_{n}\) refers to a vector of \(n\) 1’s stacked in a column and \(I_{n}\) refers to the \(n\times n\) identity matrix.

  1. The specific form of the covariance isn’t important. What is important is that we can exactly evaluate the probability of \(y_{11}, y_{12},\dots, y_{an}\) under any choice of the parameters \(\mu, \sigma^2, \sigma^2_{\tau}\).

  2. Define \(L(\mu, \sigma^2, \sigma^2_{\tau})\) to be the probability of the dataset \(y_{11}, y_{12}, \dots y_{an}\) viewed as a function of the normal distribution’s parameters. A good estimate for these parameters comes from finding the configuration that maximizes \(L\left(\mu, \sigma^2, \sigma^2_{\tau}\right)\). The maximizers can’t be found analytically, but algorithms exist to find the maximizers.

  1. Software also gives a confidence interval for the estimates. It works by studying the curvature of the likelihood function at the maximizer.
A highly peaked likelihood surface will give a smaller confidence interval, because there is a smaller set parameters that gives a good probability to the observed data.

Figure 1: A highly peaked likelihood surface will give a smaller confidence interval, because there is a smaller set parameters that gives a good probability to the observed data.

Code Implementation

  1. Which method should we use in practice? There are a few considerations,
  1. The lme4 package implements the maximum likelihood approach to random effects model estimation; this package is reviewed in the previous notes.

  2. To estimate the method of moments, we can do all the calculations by hand – we just need the results from an ANOVA table.

library(readr)
library(broom)
loom <- read_csv("https://uwmadison.box.com/shared/static/ezp3i2pflhi96si7u6rfn3dg3lb5cl3z.csv")
loom$loom <- as.factor(loom$loom)
N <- nrow(loom)
a <- nlevels(loom$loom)
n <- N / a

aov_table <- aov(lm(strength ~ loom, data = loom)) %>%
  tidy()
  1. The code below computes point estimates for \(\sigma^2\) and \(\sigma_{\tau}^2\) based on the formulas above, using the MSE quantities available in the ANOVA table. It also computes the intraclass correlation coefficient (ICC), which is defined in the reading.
sigma_sqs <- vector(length = 2)
sigma_sqs[1] <- aov_table$meansq[2] # estimate for sigma^2
sigma_sqs[2] <- (aov_table$meansq[1] - aov_table$meansq[2]) / n # estimate for sigma^2_tau

sigma_sqs
[1] 1.895833 6.958333
sigma_sqs[2] / sum(sigma_sqs) # ICC
[1] 0.7858824
  1. We can also use the formulas above to compute confidence intervals for \(\hat{\sigma}^2\). We use a formula from the book to build an analogous interval for the ICC.
int_bounds <- c(0.975, 0.025)
(N - a) * sigma_sqs[1] / qchisq(int_bounds, N - a) # CI for sigma^2
[1] 0.9748608 5.1660065
ratio_bounds <- 1 / n * (aov_table$statistic[1] / qf(int_bounds, a - 1, N - a) - 1)
ratio_bounds / (1 + ratio_bounds) # CI for ICC
[1] 0.3850736 0.9824420