An introduction to random effects models
Sometimes a factor has so many levels, that we can’t collect observations for all of them. Or, even if we could collect them, having one parameter for each would lead to a clumsy model.
In this case, we typically settle for saying the effect of the factor on average, rather than than trying to estimating the effects of every single level of that factor.
Examples,
library(readr)
loom <- read_csv("https://uwmadison.box.com/shared/static/ezp3i2pflhi96si7u6rfn3dg3lb5cl3z.csv")
loom$loom <- as.factor(loom$loom)
loom
# A tibble: 16 × 3
loom replicate strength
<fct> <chr> <dbl>
1 1 Ob1 98
2 2 Ob1 91
3 3 Ob1 96
4 4 Ob1 95
5 1 Ob2 97
6 2 Ob2 90
7 3 Ob2 95
8 4 Ob2 96
9 1 Ob3 99
10 2 Ob3 93
11 3 Ob3 97
12 4 Ob3 99
13 1 Ob4 96
14 2 Ob4 92
15 3 Ob4 95
16 4 Ob4 98
lmer
function in the lme4
package. The syntax (1 | variable)
means that this variable should be treated as a random effect. Compare the result with Example 3.10 in the textbook.Linear mixed model fit by REML ['lmerMod']
Formula: strength ~ (1 | loom)
Data: loom
REML criterion at convergence: 63.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.38018 -0.57260 -0.04342 0.82574 1.52491
Random effects:
Groups Name Variance Std.Dev.
loom (Intercept) 6.958 2.638
Residual 1.896 1.377
Number of obs: 16, groups: loom, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 95.437 1.363 70.01
Variance
column. These are the \(\sigma^2_{\tau}\) and \(\sigma^2\) quantities in the model above; they are sometimes called “variance components.” We would interpret this result as meaning that there is quite a large variation in strength from one loom to the next. Even though the average fabric strength is around 95.4 across all looms (the Intercept
field), if you drew a new loom, its typical fabric strength might be several points higher or lower. Precisely, the distribution of loom mean strengths is approximately \(\mathcal{N}\left(95, 7\right)\). Within any given loom, though, the strength is not too variable, with a variance of only \(\approx 2\).