There are times when we would like to perform a two-factor factorial experiment across blocks, but one of the factors is much more difficult to vary than the other. For example,
We want to assess the effect of different irrigation (\(A\)) and corn varieties (\(B\)) on total yield. There are three fields (Blocks) within which to gather data. It is hard to change irrigation strategy on small subplots of land, especially compared to corn variety.
We run a papermill, and want to compare pulp preparation strategies (\(A\)) and cooking temperatures (\(B\)). Samples are collected over three days (Blocks). It is hard to change the pulp preparation strategy from sample to sample — we would rather prepare a few big batches — but it’s easy to cook them at different temperatures.
This makes a completely randomized two-factor factorial experiment impractical, because it would require randomizing over all combinations of \(A\) and \(B\) for every sample that we collect.
If we had 3 irrigation strategies and 6 corn varieties, we would need to divide each field into 18 subplots and randomize the assignment of irrigation x corn pairs.
It’s much easier to first divide each field into 3 large plots, randomly assign one irrigation strategy to each large plot, and then randomly assign corn varieties to 6 subplots within each large plot.
Effectively, practical considerations impose a restriction on randomization.
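The two-stage randomization in the irrigation example can be sketched in base R. This is only an illustration: the level names, the seed, and the `layout` data frame are hypothetical and not taken from any real experiment.

```r
# Hypothetical layout: 3 fields (blocks), 3 irrigation strategies (the
# hard-to-vary factor), 6 corn varieties (the easy-to-vary factor).
set.seed(1)
n_fields <- 3
irrigation <- paste0("irrigation_", 1:3)
variety <- paste0("variety_", 1:6)

layout <- do.call(rbind, lapply(seq_len(n_fields), function(field) {
  # Stage 1: randomize irrigation over the 3 large plots of this field.
  plot_irrigation <- sample(irrigation)
  do.call(rbind, lapply(seq_along(plot_irrigation), function(p) {
    # Stage 2: randomize varieties over the 6 subplots of this large plot.
    data.frame(
      field = field,
      plot = p,
      subplot = seq_along(variety),
      irrigation = plot_irrigation[p],
      variety = sample(variety)
    )
  }))
}))

head(layout)
```

Every large plot receives a single irrigation strategy, so irrigation is never randomized at the subplot level. That is the restriction on randomization that the analysis has to respect.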
The model for a split-plot design is
\[\begin{align*} y_{ijk} &= \mu + \tau_{k} + \alpha_{i} + \beta_{j} + \left(\tau\alpha\right)_{ki} + \left(\alpha\beta\right)_{ij} + \epsilon_{ijk} \end{align*}\]
where \(\epsilon_{ijk} \sim N\left(0, \sigma^2\right)\) independently.
The terms can be interpreted as follows:
\(\mu\): The global response average.
\(\tau_{k}\): The effect of the \(k^{th}\) block (e.g., \(k^{th}\) field).
\(\alpha_{i}\): The effect of the \(i^{th}\) level of \(A\), the hard-to-granularly-randomize factor (e.g., \(i^{th}\) irrigation strategy).
\(\beta_{j}\): The effect of the \(j^{th}\) level of \(B\), the easy-to-granularly-randomize factor (e.g., \(j^{th}\) corn variety).
\(\left(\tau\alpha\right)_{ki}\): An interaction effect between the \(k^{th}\) block and the \(i^{th}\) level of \(A\) (e.g., the \(i^{th}\) irrigation strategy within the \(k^{th}\) field might be unusually good).
\(\left(\alpha\beta\right)_{ij}\): An interaction effect between the \(i^{th}\) level of \(A\) and the \(j^{th}\) level of \(B\).
We will typically not care about individual block effects, though we will care about the two different treatments. Therefore, it is common to
Use random effects for \(\tau_{k}, \left(\tau\alpha\right)_{ki}\).
Use fixed effects for \(\alpha_{i}\), \(\beta_{j}\), and \(\left(\alpha\beta\right)_{ij}\).
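To make this choice concrete, here is a sketch of the distributional assumptions it usually entails, together with the expected mean squares they imply. The symbols \(r\), \(a\), and \(b\) (the numbers of blocks, levels of \(A\), and levels of \(B\)) and the variance components \(\sigma_{\tau}^2\) and \(\sigma_{\tau\alpha}^2\) are notation introduced here for illustration. The fixed effects are assumed to satisfy the usual sum-to-zero constraints, and all random terms are assumed mutually independent.

\[\begin{align*} \tau_{k} &\sim N\left(0, \sigma_{\tau}^2\right), \qquad \left(\tau\alpha\right)_{ki} \sim N\left(0, \sigma_{\tau\alpha}^2\right) \\ E\left[MS_{\text{Block}}\right] &= \sigma^2 + b\sigma_{\tau\alpha}^2 + ab\sigma_{\tau}^2 \\ E\left[MS_{A}\right] &= \sigma^2 + b\sigma_{\tau\alpha}^2 + \frac{rb\sum_{i}\alpha_{i}^2}{a - 1} \\ E\left[MS_{\text{Block} \times A}\right] &= \sigma^2 + b\sigma_{\tau\alpha}^2 \\ E\left[MS_{B}\right] &= \sigma^2 + \frac{ra\sum_{j}\beta_{j}^2}{b - 1} \\ E\left[MS_{AB}\right] &= \sigma^2 + \frac{r\sum_{i}\sum_{j}\left(\alpha\beta\right)_{ij}^2}{\left(a - 1\right)\left(b - 1\right)} \\ E\left[MS_{E}\right] &= \sigma^2 \end{align*}\]

Under these assumptions, \(MS_{A}\) and \(MS_{\text{Block} \times A}\) share the term \(b\sigma_{\tau\alpha}^2\), so \(A\) is tested against the block-by-\(A\) mean square. \(B\) and the \(AB\) interaction are tested against \(MS_{E}\).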
The expected mean squares associated with each of the terms above can be derived in closed form. Here, we will simply illustrate their use through the lme4 package. The data are from the papermill experiment described above.
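library(ggplot2)

# Assumes `pulp` is already loaded, with columns Strength, Temperature,
# Method, and Day (Day must be discrete for the Set2 palette to apply).
ggplot(pulp) +
  geom_point(
    aes(x = Temperature, y = Strength, col = Day),
    size = 3
  ) +
  scale_color_brewer(palette = "Set2") +
  facet_grid(Method ~ .)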
Error: Day
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  2  77.56   38.78

Error: Day:Method
          Df Sum Sq Mean Sq F value Pr(>F)
Method     2 128.39   64.19   7.078 0.0485 *
Residuals  4  36.28    9.07
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
                   Df Sum Sq Mean Sq F value   Pr(>F)
Temperature         3  434.1  144.69  36.427 7.45e-08 ***
Method:Temperature  6   75.2   12.53   3.154   0.0271 *
Residuals          18   71.5    3.97
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
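The strata labels in the table above (`Day`, `Day:Method`, `Within`) have the layout that base R's `aov()` prints when the formula contains an `Error()` term. The sketch below shows a call of that form. It assumes, without confirmation from the post, that the table was produced this way. Since the `pulp` data are not reproduced here, it uses a simulated stand-in, `pulp_sim`, whose effect sizes and standard deviations are made up.

```r
# Simulated stand-in for the papermill data: 3 days x 3 methods x 4 temperatures.
# All numeric values below are invented for illustration only.
set.seed(424)
pulp_sim <- expand.grid(
  Day = factor(1:3),
  Method = factor(1:3),
  Temperature = factor(1:4)
)

day_effect <- rnorm(3, sd = 2)                    # tau_k: one per day
batch_effect <- matrix(rnorm(9, sd = 1.5), 3, 3)  # (tau alpha)_ki: one per day x method batch
method_effect <- c(-1, 0, 1)                      # alpha_i
temp_effect <- c(-3, -1, 1, 3)                    # beta_j (no interaction simulated)

k <- as.integer(pulp_sim$Day)
i <- as.integer(pulp_sim$Method)
j <- as.integer(pulp_sim$Temperature)
pulp_sim$Strength <- 35 + day_effect[k] + method_effect[i] + temp_effect[j] +
  batch_effect[cbind(k, i)] + rnorm(nrow(pulp_sim), sd = 1)

# Error(Day / Method) expands to Day + Day:Method, so Method is tested in the
# Day:Method stratum, while Temperature and Method:Temperature are tested in
# the Within stratum.
fit <- aov(Strength ~ Method * Temperature + Error(Day / Method), data = pulp_sim)
summary(fit)
```

The degrees of freedom match the table above (4 for the `Day:Method` residual, 18 for the `Within` residual) because they depend only on the design, not on the simulated values. A mixed-model analogue in lme4 would use random intercepts, for example `lmer(Strength ~ Method * Temperature + (1 | Day) + (1 | Day:Method), data = pulp_sim)`, though its output is organized differently.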
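Fitting an ordinary two-factor factorial model when the data were collected with restrictions on randomization can lead to misleading results. The output below comes from fitting such a model to the papermill data, ignoring the split-plot structure. Note the overconfidence about an effect of Method: its p-value drops from 0.0485 to 0.0018, because Method is now tested against a pooled residual with 24 degrees of freedom rather than the Day:Method error with 4. In contrast, the Method:Temperature interaction is no longer significant (p = 0.184, compared with 0.027 in the split-plot analysis).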
                   Df Sum Sq Mean Sq F value   Pr(>F)
Method              2  128.4   64.19   8.313  0.00181 **
Temperature         3  434.1  144.69  18.737 1.76e-06 ***
Method:Temperature  6   75.2   12.53   1.622  0.18426
Residuals          24  185.3    7.72
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
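The difference between the two Method tests can be reproduced by hand from the numbers printed in the two tables. The sketch below copies the sums of squares, mean squares, and degrees of freedom from those tables; the variable names are introduced here for illustration.

```r
# Values copied from the ANOVA tables above.
ms_method <- 64.19
df_method <- 2

# Split-plot analysis: Method is tested against the Day:Method residual.
ms_whole_plot <- 9.07
df_whole_plot <- 4

# Ordinary factorial fit: the three error strata are pooled into one residual.
ss_pooled <- 77.56 + 36.28 + 71.5
df_pooled <- 2 + 4 + 18
ms_pooled <- ss_pooled / df_pooled

f_split <- ms_method / ms_whole_plot
f_naive <- ms_method / ms_pooled

# Upper-tail probabilities of the F distribution.
p_split <- pf(f_split, df_method, df_whole_plot, lower.tail = FALSE)
p_naive <- pf(f_naive, df_method, df_pooled, lower.tail = FALSE)

round(c(f_split = f_split, p_split = p_split, f_naive = f_naive, p_naive = p_naive), 4)
```

The two F statistics are of similar size. Most of the apparent gain in significance comes from the denominator degrees of freedom rising from 4 to 24, which the design does not justify for a factor that was only randomized at the batch level.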