Statistical Experimental Design: Nested Designs

Readings 14.1, Rmarkdown

library(dplyr)
library(ggplot2)
library(lme4)
library(readr)
library(EBImage)
library(reshape2)
theme_set(theme_bw() + theme(legend.position = "bottom"))

display(readImage("https://uwmadison.box.com/shared/static/kbs80435ds41u0qp7typvmuiaibyjlpm.png"))

Figure 1: The nested structure in the suppliers example

Factorial designs cover the situation where effects from multiple sources are crossed with one another. In many situations, however, effects are nested within each other.

You rely on several suppliers to provide raw material and are interested in whether some suppliers provide consistently better material. The material arrives in batches, and there is variation across batches. The batch effect is nested within the supplier effect.
The hardness of different alloys are to be compared. Several ingots are built from each alloy, and repeated measures are drawn from each ingot. The ingot effect is nested within the alloy effect.

Model

display(readImage("https://uwmadison.box.com/shared/static/g6yb4uu8nncnqc39rbjhh3kz8bv9jzwc.png"))

Figure 2: Notation used in nested designs. Parentheses denote the parent branch.

We will consider the two-stage design here. The data are imagined to be drawn according to \[\begin{align*} y_{ijk} &= \mu + \alpha_{i} + \beta_{j\left(i\right)} + \epsilon_{ijk}. \end{align*}\]

display(readImage("https://uwmadison.box.com/shared/static/fy3jwyeou1bu6nyo1i2lypg8juj80z8k.png"))

Figure 3: Another way to look at the nested coefficient structure.

\(\alpha_i\) is the parent-effect (e.g., supplier effect). For identifiability, assume \(\sum_{i} \alpha_i = 0\).
\(\beta_{j\left(i\right)}\) is the nested-effect associated with parent \(i\) (e.g., effect of the \(j^{th}\) batch within the \(i^{th}\) supplier). For identifiability, assume \(\sum_{j}\beta_{j\left(i\right)} = 0\) for each \(i\).
\(\epsilon_{ijk} \sim N\left(0, \sigma^{2}\right)\) is independent noise.

The key difference from the usual factorial model is that the nested effects \(\beta_{j\left(i\right)}\) are not the same across different parents \(i\).

Inference

Fixed-Effects

First, imagine treating all the effects as fixed. In this case, there are two typical hypotheses of interest.

display(readImage("https://uwmadison.box.com/shared/static/6bona7dg8dl6fbnfvetovq2cl9l5zo11.png"))

Figure 4: The nested effect term looks at the variation across child means along a single branch.

Null parent-effect, \[\begin{align*} &H_{0}: \alpha_{i} = 0 \text{ for all } i \\ &H_{1}: \alpha_{i} \neq 0 \text{ for at least one } i \end{align*}\]
Null child-effect, \[\begin{align*} &H_{0}: \beta_{j\left(i\right)} = 0 \text{ for all } i,j \\ &H_{1}: \beta_{j\left(i\right)} \neq 0 \text{ for at least one } i,j \end{align*}\]

In either case, a test is performed using a sum-of-squares decomposition, \[\begin{align*} SS_{T} &= SS_A + SS_{B\left(A\right)} + SS_{E} \end{align*}\] which is similar in structure to those we have seen before, except we now have a nested effect term, \[\begin{align*} SS_{B\left(A\right)} &= \sum_{i = 1}^{a}\sum_{j = 1}^{b} \left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}\right)^{2}. \end{align*}\]

which measures how much the \(j^{th}\) effect within parent \(i\) varies from the average in that group. The distribution of the associated mean-squares can then be used to perform an ANOVA. Let’s see an implementation, based on the supplier-materials example above. (example 14.1)

purity <- read_csv("https://uwmadison.box.com/shared/static/uub0t6lvii52rxyygb2yt4ph3vmionjz.csv") %>%
  mutate(
    supplier = as.factor(supplier),
    batch = as.factor(batch)
  )

ggplot(purity) +
  geom_point( aes(x = batch, y = purity) ) +
  facet_wrap(~supplier)

Figure 5: Data from the supplier example.

fit <- lm(purity ~ supplier + supplier / batch, data = purity)
anova(fit)

Analysis of Variance Table

Response: purity
               Df Sum Sq Mean Sq F value  Pr(>F)  
supplier        2 15.056  7.5278  2.8526 0.07736 .
supplier:batch  9 69.917  7.7685  2.9439 0.01667 *
Residuals      24 63.333  2.6389                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The A/B notation is just shorthand for A + A:B; i.e., a main effect for the parent A and an interaction term between the parent A and child B, as the code below makes clear.

fit_explicit <- lm(purity ~ supplier + supplier : batch, data = purity)
anova(fit_explicit)

Analysis of Variance Table

Response: purity
               Df Sum Sq Mean Sq F value  Pr(>F)  
supplier        2 15.056  7.5278  2.8526 0.07736 .
supplier:batch  9 69.917  7.7685  2.9439 0.01667 *
Residuals      24 63.333  2.6389                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random Effects

Rather than caring about specific parent or nested effects, we may simply want to know the typical variation due to either factor. For example, we may not care about the effect of the 2nd batch in the 3rd supplier, but we may be curious about the typical size of batch-to-batch variation. In this case, it makes sense to use random effects. We can use a random effect for just the nested factors,

display(readImage("https://uwmadison.box.com/shared/static/twey7ufa1zeaq8lgjpdg1uu66a7fi2l4.png"))

Figure 6: We might imagine the batch effects are drawn from some distribution, and do inference on the variance of that distribution.

fit <- aov(purity ~ supplier + Error(supplier/batch), data = purity)
summary(fit)


Error: supplier
         Df Sum Sq Mean Sq
supplier  2  15.06   7.528

Error: supplier:batch
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  69.92   7.769               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24  63.33   2.639

or for both the nested and parent factors,

fit <- lmer(purity ~ (1 | supplier / batch), data = purity)
summary(fit)

Linear mixed model fit by REML ['lmerMod']
Formula: purity ~ (1 | supplier/batch)
   Data: purity

REML criterion at convergence: 148.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.38226 -0.75533 -0.07592  0.57348  1.71092 

Random effects:
 Groups         Name        Variance  Std.Dev.
 batch:supplier (Intercept) 1.696e+00 1.302285
 supplier       (Intercept) 9.197e-07 0.000959
 Residual                   2.639e+00 1.624383
Number of obs: 36, groups:  batch:supplier, 12; supplier, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)   0.3611     0.4633   0.779

However, it makes no sense to treat the parent as random, but its child as fixed.

An important detail to keep in mind is that, for nested designs, computation of \(F\)-statistics depends on which terms are treated as fixed or random. For example, suppose \(B\) is nested within \(A\). Then the \(F\) statistic for \(A\) is computed as

\(\frac{MS_A}{MS_E}\) if both \(A\) and \(B\) are fixed
\(\frac{MS_A}{MS_B(A)}\) if \(A\) is fixed but \(B\) is random
\(\frac{MS_{B\left(A\right)}}{MS_{E}}\) if both \(A\) and \(B\) are random