Factorial designs cover the situation where effects from multiple sources are crossed with one another. In many situations, however, effects are nested within each other.
You rely on several suppliers to provide raw material and are interested in whether some suppliers provide consistently better material. The material arrives in batches, and there is variation across batches. The batch effect is nested within the supplier effect.
The hardness of different alloys is to be compared. Several ingots are cast from each alloy, and repeated measurements are taken from each ingot. The ingot effect is nested within the alloy effect.
We will consider the two-stage design here. The data are imagined to be drawn according to \[\begin{align*} y_{ijk} &= \mu + \alpha_{i} + \beta_{j\left(i\right)} + \epsilon_{ijk}. \end{align*}\]
\(\alpha_i\) is the parent-effect (e.g., supplier effect). For identifiability, assume \(\sum_{i} \alpha_i = 0\).
\(\beta_{j\left(i\right)}\) is the nested-effect associated with parent \(i\) (e.g., effect of the \(j^{th}\) batch within the \(i^{th}\) supplier). For identifiability, assume \(\sum_{j}\beta_{j\left(i\right)} = 0\) for each \(i\).
\(\epsilon_{ijk} \sim N\left(0, \sigma^{2}\right)\) is independent noise.
The key difference from the usual factorial model is that the nested effects \(\beta_{j\left(i\right)}\) are not the same across different parents \(i\).
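To make the model concrete, here is a minimal sketch that simulates data from it. The sizes (3 parents, 4 nested levels each, 3 replicates), the effect values, and the names `sim`, `alpha`, and `beta` are all hypothetical choices for illustration, not part of the example analyzed later.

```r
set.seed(1)
a <- 3  # number of parents (e.g., suppliers); assumed
b <- 4  # nested levels per parent (e.g., batches); assumed
n <- 3  # replicates per nested level; assumed

# One row per observation y_ijk
sim <- expand.grid(rep = seq_len(n), batch = seq_len(b), supplier = seq_len(a))

# Parent effects, chosen to sum to zero
alpha <- c(-1, 0, 1)

# Nested effects: one row per parent, centered so each row sums to zero
beta <- matrix(rnorm(a * b), nrow = a)
beta <- beta - rowMeans(beta)

# y_ijk = mu + alpha_i + beta_j(i) + eps_ijk, with mu = 10 and sigma = 0.5
sim$y <- 10 + alpha[sim$supplier] +
  beta[cbind(sim$supplier, sim$batch)] +
  rnorm(nrow(sim), sd = 0.5)

sim$supplier <- factor(sim$supplier)
sim$batch <- factor(sim$batch)
head(sim)
```

Note that `beta[1, 2]` and `beta[2, 2]` are unrelated numbers: batch 2 of supplier 1 has nothing in common with batch 2 of supplier 2 beyond its label.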
Fixed Effects
First, imagine treating all the effects as fixed. In this case, there are two typical hypotheses of interest.
Null parent-effect, \[\begin{align*} &H_{0}: \alpha_{i} = 0 \text{ for all } i \\ &H_{1}: \alpha_{i} \neq 0 \text{ for at least one } i \end{align*}\]
Null child-effect, \[\begin{align*} &H_{0}: \beta_{j\left(i\right)} = 0 \text{ for all } i,j \\ &H_{1}: \beta_{j\left(i\right)} \neq 0 \text{ for at least one } i,j \end{align*}\]
In either case, a test is performed using a sum-of-squares decomposition, \[\begin{align*} SS_{T} &= SS_A + SS_{B\left(A\right)} + SS_{E} \end{align*}\] which is similar in structure to those we have seen before, except that we now have a nested effect term, \[\begin{align*} SS_{B\left(A\right)} &= n\sum_{i = 1}^{a}\sum_{j = 1}^{b} \left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}\right)^{2}, \end{align*}\] where \(n\) is the number of replicates within each nested level. This term measures how much the mean of the \(j^{th}\) nested level within parent \(i\) varies from the overall mean of that parent. The distribution of the associated mean-squares can then be used to perform an ANOVA. Let’s see an implementation, based on the supplier-materials example above. (example 14.1)
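library(ggplot2)
# One panel per supplier; the replicate purity measurements of a batch share an x position
ggplot(purity) +
  geom_point(aes(x = batch, y = purity)) +
  facet_wrap(~supplier)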
Analysis of Variance Table
Response: purity
Df Sum Sq Mean Sq F value Pr(>F)
supplier 2 15.056 7.5278 2.8526 0.07736 .
supplier:batch 9 69.917 7.7685 2.9439 0.01667 *
Residuals 24 63.333 2.6389
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
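The table above has the shape produced by `anova()` applied to an `lm()` fit with a nested formula. As a rough sketch of how its sums of squares arise, the code below computes the three components by hand and compares them with R's table. It uses a simulated placeholder data frame `d` (an assumption, since the `purity` data are not reproduced here), so the numbers will not match those above.

```r
set.seed(2)
a <- 3; b <- 4; n <- 3  # assumed design sizes: parents, nested levels, replicates
d <- expand.grid(
  rep = seq_len(n),
  batch = factor(seq_len(b)),
  supplier = factor(seq_len(a))
)
d$y <- rnorm(nrow(d))  # placeholder response, not the real purity values

# Means repeated on every row, so summing over rows supplies the
# b * n and n multipliers in the sum-of-squares formulas
grand <- mean(d$y)
m_sup <- ave(d$y, d$supplier)           # parent means
m_bat <- ave(d$y, d$supplier, d$batch)  # nested-level means

ss_a  <- sum((m_sup - grand)^2)  # SS_A
ss_ba <- sum((m_bat - m_sup)^2)  # SS_B(A)
ss_e  <- sum((d$y - m_bat)^2)    # SS_E
ss_t  <- sum((d$y - grand)^2)    # SS_T

# The same decomposition from a nested fit
tab <- anova(lm(y ~ supplier / batch, data = d))
cbind(by_hand = c(ss_a, ss_ba, ss_e), from_anova = tab[["Sum Sq"]])
```

The two columns should agree, and the three hand-computed pieces should add up to `ss_t`.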
The A/B notation is just shorthand for A + A:B; i.e., a main effect for the parent A and an interaction term between the parent A and child B, as the matching table below makes clear.
Analysis of Variance Table
Response: purity
Df Sum Sq Mean Sq F value Pr(>F)
supplier 2 15.056 7.5278 2.8526 0.07736 .
supplier:batch 9 69.917 7.7685 2.9439 0.01667 *
Residuals 24 63.333 2.6389
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
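The equivalence of the two notations can also be checked without fitting anything, by asking R how it expands each formula. This is a small sketch using base R's `terms()`; the variable names are taken from the tables above.

```r
# Expand both formulas into their model terms
nested   <- terms(purity ~ supplier / batch)
expanded <- terms(purity ~ supplier + supplier:batch)

attr(nested, "term.labels")
attr(expanded, "term.labels")

# TRUE when the two formulas describe the same set of terms
identical(attr(nested, "term.labels"), attr(expanded, "term.labels"))
```

Neither expansion contains a standalone `batch` term, which is what distinguishes a nested model from a crossed one.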
Random Effects
Rather than caring about specific parent or nested effects, we may simply want to know the typical variation due to either factor. For example, we may not care about the effect of the 2nd batch in the 3rd supplier, but we may be curious about the typical size of batch-to-batch variation. In this case, it makes sense to use random effects. One option is to use a random effect for just the nested factor, keeping the parent fixed.
Error: supplier
Df Sum Sq Mean Sq
supplier 2 15.06 7.528
Error: supplier:batch
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 9 69.92 7.769
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 63.33 2.639
Alternatively, we can treat both the parent and the nested factors as random, as in the mixed model summarized below.
Linear mixed model fit by REML ['lmerMod']
Formula: purity ~ (1 | supplier/batch)
Data: purity
REML criterion at convergence: 148.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.38226 -0.75533 -0.07592 0.57348 1.71092
Random effects:
Groups Name Variance Std.Dev.
batch:supplier (Intercept) 1.696e+00 1.302285
supplier (Intercept) 9.197e-07 0.000959
Residual 2.639e+00 1.624383
Number of obs: 36, groups: batch:supplier, 12; supplier, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.3611 0.4633 0.779
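To see roughly where these variance components come from, here is a sketch of the classical method-of-moments estimates for a balanced design, computed from the mean squares in the fixed-effects table. It assumes the usual expected mean squares when both factors are random, \(E\left[MS_A\right] = \sigma^2 + n\sigma^2_{\beta} + bn\sigma^2_{\alpha}\), \(E\left[MS_{B\left(A\right)}\right] = \sigma^2 + n\sigma^2_{\beta}\), and \(E\left[MS_E\right] = \sigma^2\). This is not the REML computation that `lmer` performs, so only approximate agreement should be expected.

```r
# Mean squares copied from the ANOVA table above
ms_a  <- 7.5278  # supplier
ms_ba <- 7.7685  # batch within supplier
ms_e  <- 2.6389  # residual

b <- 4  # batches per supplier (12 batch:supplier groups / 3 suppliers)
n <- 3  # observations per batch (36 observations / 12 groups)

sigma2_resid    <- ms_e
sigma2_batch    <- (ms_ba - ms_e) / n
sigma2_supplier <- (ms_a - ms_ba) / (b * n)

c(
  residual = sigma2_resid,
  batch = sigma2_batch,
  supplier_raw = sigma2_supplier,
  supplier_truncated = max(sigma2_supplier, 0)  # variances cannot be negative
)
```

The batch estimate lands close to the 1.696 reported above, and the raw supplier estimate is slightly negative because \(MS_A < MS_{B\left(A\right)}\), which is consistent with the fitted supplier variance being essentially zero.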
However, it makes no sense to treat the parent as random, but its child as fixed.
An important detail to keep in mind is that, for nested designs, computation of \(F\)-statistics depends on which terms are treated as fixed or random. For example, suppose \(B\) is nested within \(A\). Then the \(F\) statistic for \(A\) is computed as
\(\frac{MS_A}{MS_E}\) if both \(A\) and \(B\) are fixed
\(\frac{MS_A}{MS_{B\left(A\right)}}\) if \(A\) is fixed but \(B\) is random
\(\frac{MS_A}{MS_{B\left(A\right)}}\) if both \(A\) and \(B\) are random
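As a worked illustration of how much the choice of denominator matters, the sketch below recomputes the supplier \(F\)-statistic from the mean squares and degrees of freedom in the fixed-effects table, once against \(MS_E\) (both factors fixed) and once against \(MS_{B\left(A\right)}\) (supplier fixed, batch random). The variable names are illustrative.

```r
# Mean squares and degrees of freedom copied from the ANOVA table above
ms_a  <- 7.5278; df_a  <- 2   # supplier
ms_ba <- 7.7685; df_ba <- 9   # batch within supplier
ms_e  <- 2.6389; df_e  <- 24  # residual

# Both factors fixed: compare MS_A with MS_E
f_fixed <- ms_a / ms_e
p_fixed <- pf(f_fixed, df_a, df_e, lower.tail = FALSE)

# Supplier fixed, batch random: compare MS_A with MS_B(A)
f_b_random <- ms_a / ms_ba
p_b_random <- pf(f_b_random, df_a, df_ba, lower.tail = FALSE)

rbind(
  both_fixed = c(F = f_fixed, df_denom = df_e, p = p_fixed),
  batch_random = c(F = f_b_random, df_denom = df_ba, p = p_b_random)
)
```

The first row reproduces the supplier line of the table. In the second row the ratio drops below 1, since batch-to-batch variation is at least as large as the variation between suppliers, and the test has only 9 denominator degrees of freedom instead of 24.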