Random Effects

An introduction to random effects models

Kris Sankaran
09-28-2021

Readings 3.9

  1. Sometimes a factor has so many levels that we can’t collect observations for all of them. Even if we could, having one parameter for each level would lead to a clumsy model.

  2. In this case, we typically settle for estimating the average effect of the factor, rather than trying to estimate the effect of every single level.

  3. Examples,

    • Is there a middle school effect on high school graduation? (instead of effects for individual schools)
    • Is there a loom effect on fiber strength? (don’t care about individual looms)
    • Is there a microbiome effect on preterm births?
    • Is there a county effect on election outcomes?
Figure 1: In random effects, the groups (schools, looms, …) we observe are assumed sampled from a larger population.

Model

  1. Random effects models have the form, \[\begin{align} y_{ij} = \mu + \tau_i + \epsilon_{ij} \end{align}\] where \(\tau_i \sim \mathcal{N}\left(0, \sigma_\tau^2\right)\) and \(\epsilon_{ij} \sim \mathcal{N}\left(0, \sigma^2\right)\). The crucial difference is that \(\tau_i\) is now thought of as random, not fixed.

  2. Notice that \[\begin{align} \text{Var}\left(y_{ij}\right) &= \text{Var}\left(\tau_i\right) + \text{Var}\left(\epsilon_{ij}\right) \\ &= \sigma_{\tau}^2 + \sigma^2 \end{align}\] More generally, the covariance matrix of the \(y_{ij}\) is block diagonal: any two observations from the same group have covariance \(\sigma_{\tau}^2\), and each observation has variance \(\sigma_{\tau}^2 + \sigma^2\).
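To make this variance decomposition concrete, here is a small simulation sketch (the group counts and variance values below are made up for illustration): we draw data from the random effects model and check that the empirical variance of \(y_{ij}\) is close to \(\sigma_\tau^2 + \sigma^2\).

```r
# Simulate y_ij = mu + tau_i + eps_ij with made-up parameter values, then
# check that the overall variance is approximately sigma_tau^2 + sigma^2.
set.seed(1)
a <- 50            # number of groups
n <- 10            # replicates per group
mu <- 95
sigma_tau <- 2     # between-group standard deviation
sigma <- 1         # within-group standard deviation

tau <- rnorm(a, 0, sigma_tau)                    # one random effect per group
y <- mu + rep(tau, each = n) + rnorm(a * n, 0, sigma)
var(y)             # close to sigma_tau^2 + sigma^2 = 5
```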

Hypothesis Testing

  1. We may want to test whether there is any variation in the response across factor levels. Formally, \[\begin{align*} H_0:& \sigma_\tau^2 = 0 \\ H_1:& \sigma_\tau^2 > 0 \end{align*}\]
Figure 2: Under the null, there is no difference between any of the groups.

  2. For our test statistic, we can use the same one as before, \(\frac{MS_{\text{treatments}}}{MS_{E}}\). For the same reasons as in fixed-effects ANOVA, this statistic is \(F\)-distributed with \((a - 1, N - a)\) degrees of freedom under the null; we reject when it is large.
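As a sketch of this test on simulated data (the group means, sample sizes, and noise level below are invented, not the loom data), we can read \(MS_{\text{treatments}} / MS_{E}\) off the ANOVA table and compare it against an \(F\) critical value.

```r
# Compute the F statistic MS_treatments / MS_E for a balanced one-way design
# with made-up group means, then compare it to the F critical value.
set.seed(2)
a <- 4             # number of groups
n <- 4             # replicates per group
group <- factor(rep(1:a, each = n))
y <- rnorm(a * n, mean = rep(c(98, 91, 96, 95), each = n), sd = 1.4)

tab <- anova(lm(y ~ group))
f_stat <- tab["group", "F value"]
f_crit <- qf(0.95, df1 = a - 1, df2 = a * n - a)
f_stat > f_crit    # reject H_0 at level 0.05 when the statistic is this large
```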

Code Example

  1. We will illustrate this method on a dataset about the strength of looms in a factory. The block below reads in the data.
library(readr)
loom <- read_csv("https://uwmadison.box.com/shared/static/ezp3i2pflhi96si7u6rfn3dg3lb5cl3z.csv")
loom$loom <- as.factor(loom$loom)
loom
# A tibble: 16 × 3
   loom  replicate strength
   <fct> <chr>        <dbl>
 1 1     Ob1             98
 2 2     Ob1             91
 3 3     Ob1             96
 4 4     Ob1             95
 5 1     Ob2             97
 6 2     Ob2             90
 7 3     Ob2             95
 8 4     Ob2             96
 9 1     Ob3             99
10 2     Ob3             93
11 3     Ob3             97
12 4     Ob3             99
13 1     Ob4             96
14 2     Ob4             92
15 3     Ob4             95
16 4     Ob4             98
  2. To fit a random effects model, we can use the lmer function in the lme4 package. The syntax (1 | variable) means that this variable should be treated as a random effect. Compare the result with Example 3.10 in the textbook.
library(lme4)
fit <- lmer(strength ~ (1 | loom), data = loom)
summary(fit)
Linear mixed model fit by REML ['lmerMod']
Formula: strength ~ (1 | loom)
   Data: loom

REML criterion at convergence: 63.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.38018 -0.57260 -0.04342  0.82574  1.52491 

Random effects:
 Groups   Name        Variance Std.Dev.
 loom     (Intercept) 6.958    2.638   
 Residual             1.896    1.377   
Number of obs: 16, groups:  loom, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept)   95.437      1.363   70.01
  3. The most important quantities in this output are the entries of the Variance column. These are the \(\sigma^2_{\tau}\) and \(\sigma^2\) quantities in the model above; they are sometimes called “variance components.” We would interpret this result as saying that there is quite a lot of variation in strength from one loom to the next. Even though the average fabric strength is about 95.4 across all looms (the Intercept field), if we drew a new loom, its mean fabric strength could easily be several points higher or lower. Precisely, the distribution of loom mean strengths is approximately \(\mathcal{N}\left(95.4, 6.96\right)\). Within any given loom, though, strength is not very variable, with a variance of only \(\approx 2\).
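As a cross-check on the REML output, the classical method-of-moments estimators recover variance components directly from the ANOVA table: in a balanced design, \(\hat{\sigma}^2 = MS_{E}\) and \(\hat{\sigma}_{\tau}^2 = \frac{MS_{\text{treatments}} - MS_{E}}{n}\). Here is a sketch on simulated data (the parameter values below are made up, and this does not reproduce the loom numbers).

```r
# Method-of-moments variance components from a balanced one-way ANOVA:
# sigma^2 is estimated by MSE, sigma_tau^2 by (MS_treatments - MSE) / n.
set.seed(3)
a <- 30            # number of groups
n <- 5             # replicates per group
sigma_tau <- 2     # true between-group SD
sigma <- 1         # true within-group SD
group <- factor(rep(1:a, each = n))
y <- rep(rnorm(a, 0, sigma_tau), each = n) + rnorm(a * n, 0, sigma)

tab <- anova(lm(y ~ group))
sigma2_hat <- tab["Residuals", "Mean Sq"]
sigma_tau2_hat <- (tab["group", "Mean Sq"] - sigma2_hat) / n
c(sigma_tau2_hat, sigma2_hat)   # should land near c(4, 1)
```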