Statistical Experimental Design: Split-Plot Designs

Readings 14.4, Rmarkdown

library("dplyr")
library("ggplot2")
library("readr")
library("EBImage")
library("reshape2")
theme_set(theme_bw() + theme(legend.position = "bottom"))

display(readImage("https://uwmadison.box.com/shared/static/qtxsckab5vcxffrhbpepj9peq28m7zwn.png"))

Figure 1: A single field from a true factorial design in the irrigation example.

display(readImage("https://uwmadison.box.com/shared/static/9s2cmkmgc45po62yzm8eltry8cy7rv42.png"))

Figure 2: An alternative split plot structure. The irrigation strategies are now rows, and each cell is a subplot.

There are times when we would like to perform a two-factor factorial experiment across blocks, but one of the factors is much more difficult to vary than the other. For example,

We want to assess the effect of different irrigation (\(A\)) and corn varieties (\(B\)) on total yield. There are three fields (Blocks) within which to gather data. It is hard to change irrigation strategy on small subplots of land, especially compared to corn variety.
We run a papermill, and want to compare pulp preparation strategies (\(A\)) and cooking temperatures (\(B\)). Samples are collected over three days (Blocks). It is hard to change the pulp preparation strategy from sample to sample — we would rather prepare a few big batches — but it’s easy to cook them at different temperatures.

This makes a true \(2^{2}\) factorial experiment impractical, because it would require randomizing over all combinations of \(A\) and \(B\) for every sample that we collect.

If we had 3 irrigation strategies and 6 corn varieties, we would need to divide each field into 18 subplots and randomize the assignment of irrigation x corn pairs.
It’s much easier to first divide each field into 3 large plots, and then randomly assign corn varieties to 6 subplots within each large plot.

Effectively, practical considerations impose a restriction on randomization.

Model

The model for a split-plot design is

\[\begin{align*} y_{ijk} &= \mu + \tau_{k} + \alpha_{i} + \beta_{j} + \left(\tau\alpha\right)_{ki} + \left(\alpha\beta\right)_{ij} + \epsilon_{ijk} \end{align*}\]

where \(\epsilon \sim N\left(0, \sigma^2\right)\) independently.

The terms can be interpreted as,

\(\mu\): The global response average.
\(\tau_{k}\): The effect of the \(k^{th}\) block (e.g., \(k^{th}\) field).
\(\alpha_{i}\): The effect of the \(i^{th}\) level of \(A\), the hard-to-granularly-randomize factor (e.g., \(i^{th}\) irrigation strategy).
\(\beta_{j}\): The effect of the \(j^{th}\) level of \(B\), the easy-to-granularly-randomize factor (e.g., \(j^{th}\) corn variety)
\(\left(\tau\alpha\right)_{ki}\): An interaction factor between the \(k^{th}\) block and the \(i^{th}\) level of \(A\) (e.g., the \(i^{th}\) irrigation strategy within the \(k^{th}\) field might be unusually good).
\(\left(\alpha\beta\right)_{ij}\): An interaction factor between the \(i^{th}\) level of \(A\) and the \(j^{th}\) level of \(B\).

We will typically not care about individual block effects, though we will care about the two different treatments. Therefore, it is common to

Use random effects for \(\tau_{k}, \left(\tau\alpha\right)_{ki}\).
Use fixed-effects for \(\alpha_{i}\) and \(\beta_{j}\).

The expected mean squares associated with each of the terms above can be derived in closed form. Here, we will simply illustrate their use through the lme4 package. The data are from the papermill experiment described above.

pulp <- read_csv("https://uwmadison.box.com/shared/static/843r3mda46is46nbb5b6393u7caz8orq.csv") %>%
  mutate_at(vars(Day, Method, Temperature), as.factor)

ggplot(pulp) +
  geom_point(
    aes(x = Temperature, y = Strength, col = Day),
    size = 3
  ) +
  scale_color_brewer(palette = "Set2") +
  facet_grid(Method ~ .)

Figure 3: Data from the papermill experiment.

fit <- aov(Strength ~ Method * Temperature + Error(Day / Method), data = pulp)
summary(fit)


Error: Day
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  2  77.56   38.78               

Error: Day:Method
          Df Sum Sq Mean Sq F value Pr(>F)  
Method     2 128.39   64.19   7.078 0.0485 *
Residuals  4  36.28    9.07                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
                   Df Sum Sq Mean Sq F value   Pr(>F)    
Temperature         3  434.1  144.69  36.427 7.45e-08 ***
Method:Temperature  6   75.2   12.53   3.154   0.0271 *  
Residuals          18   71.5    3.97                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Split-Plot is not \(2^{2}\) factorial

Fitting a \(2^{2}\) factorial model when the data were collected with restrictions on randomization can lead to misleading results. The code below fits an ordinary \(2^{2}\) factorial model to the papermill data. Note the overconfidence about an effect of Method.

fit <- aov(Strength ~ Method * Temperature, data = pulp)
summary(fit)

                   Df Sum Sq Mean Sq F value   Pr(>F)    
Method              2  128.4   64.19   8.313  0.00181 ** 
Temperature         3  434.1  144.69  18.737 1.76e-06 ***
Method:Temperature  6   75.2   12.53   1.622  0.18426    
Residuals          24  185.3    7.72                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1