General Factorial Designs

Factorial designs with arbitrary numbers of factors

Kris Sankaran
10-21-2021

Readings 5.4, Rmarkdown

  1. We’ll discuss three factor factorial designs, with the hope that what we learn will generalize to arbitrary numbers of factors. In the three factor design, we use the model

\[y_{ijkl} = \mu + \tau_i + \beta_j + \gamma_k + \left(\tau \beta\right)_{ij} + \left(\tau \gamma\right)_{ik} + \left(\beta \gamma\right)_{jk} + \left(\tau \beta \gamma\right)_{ijk} + \epsilon_{ijkl}\]

where \(\epsilon_{ijkl} \sim N\left(0, \sigma^2\right)\). Suppose that the first, second, and third factors have \(a, b\), and \(c\) levels, respectively.

  2. We’re dangerously close to getting lost in index purgatory, but notice certain symmetries:
    • We have main effects for each factor
      • \(\tau_i, \beta_j, \gamma_k\)
    • We have two-way interactions for each pair of factors
      • \(\left(\tau\beta\right)_{ij}, \dots\)
    • We have a three-way interaction, between all factors
      • \(\left(\tau\beta\gamma\right)_{ijk}\)
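The same structure can be listed mechanically. The sketch below assumes hypothetical factor names `A`, `B`, and `C` (standing in for the factors behind \(\tau\), \(\beta\), and \(\gamma\)) and asks R's formula machinery to enumerate the terms and the number of factors involved in each.

```r
# Hypothetical response y and factors A, B, C; no data are needed
# to expand a formula into its terms.
f <- y ~ A * B * C

# Labels of every term in the expanded model
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Order of each term: 1 = main effect, 2 = two-way, 3 = three-way
term_order <- attr(terms(f), "order")
print(table(term_order))
```

Tabulating the orders shows how many main effects, two-way interactions, and three-way interactions the model contains.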

  3. We can calculate a sum-of-squares term for each term in the model. Notice that there are also certain symmetries in the associated degrees of freedom,

    • \(SS_A\): \(a - 1\)
    • \(SS_B\): \(b - 1\)
    • \(SS_C\): \(c - 1\)
    • \(SS_{AB}\): \((a - 1)(b - 1)\)
    • \(SS_{AC}\): \((a - 1)(c - 1)\)
    • \(SS_{BC}\): \((b - 1)(c - 1)\)
    • \(SS_{ABC}\): \((a - 1)(b - 1)(c - 1)\)

    What do you think the pattern is for an arbitrary number of factors \(K\)?
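One way to explore the pattern is to compute the degrees of freedom for every subset of factors. The helper `effect_df` below is a hypothetical name, not something from the readings. It assumes each term's degrees of freedom is the product of (levels minus one) over the factors involved, which matches the three-factor list above.

```r
# levels: named vector giving the number of levels of each factor,
# e.g. c(A = 2, B = 3, C = 4). Works for any number of factors K.
effect_df <- function(levels) {
  K <- length(levels)
  out <- numeric(0)
  for (k in seq_len(K)) {
    # all subsets of k factors define the k-way interaction terms
    subsets <- combn(names(levels), k, simplify = FALSE)
    for (s in subsets) {
      out[paste(s, collapse = ":")] <- prod(levels[s] - 1)
    }
  }
  out
}

dfs <- effect_df(c(A = 2, B = 3, C = 4))
print(dfs)

# The effect degrees of freedom add up to (number of cells) - 1
print(sum(dfs) == 2 * 3 * 4 - 1)
```

Calling the function with four or more named factors lists the corresponding higher-order terms in the same way.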

  4. For testing, we will compare these sums-of-squares to \(SS_E\), which has \(abc(n - 1)\) degrees of freedom, where \(n\) is the number of replicates per factor combination. The \(F\)-statistic for any of the terms above is found by dividing its mean square by \(MS_E\). Hence, we can test whether any of the terms is nonzero for at least one value of its index.
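To make the recipe concrete, here is a minimal sketch on simulated data. The design sizes (\(a = 2\), \(b = 3\), \(c = 4\), \(n = 2\)) and the pure-noise response are assumptions chosen only for illustration. It checks that the residual degrees of freedom equal \(abc(n - 1)\) and that dividing each mean square by \(MS_E\) reproduces the \(F\) column reported by `anova`.

```r
set.seed(1)

# Hypothetical balanced design: every combination of levels, replicated twice
design <- expand.grid(
  A = factor(1:2), B = factor(1:3), C = factor(1:4), rep = 1:2
)
design$y <- rnorm(nrow(design))  # pure noise: all true effects are zero

tab <- anova(lm(y ~ A * B * C, data = design))

# Residual degrees of freedom should be abc(n - 1) = 2 * 3 * 4 * (2 - 1)
print(tab["Residuals", "Df"])

# F statistic for each term = its mean square divided by MS_E
ms_e <- tab["Residuals", "Mean Sq"]
is_effect <- rownames(tab) != "Residuals"
f_by_hand <- tab[is_effect, "Mean Sq"] / ms_e
print(all.equal(f_by_hand, tab[is_effect, "F value"]))
```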

Data Example

  1. Let’s look at a \(2^3\) design (3 factors with two levels each). The goal is to see how the etch rate on a chip varies as we change (A) the gap between electrodes, (B) the power level, and (C) the gas flow rate.
plasma <- read.table("https://uwmadison.box.com/shared/static/f3sggiltyl5ycw1gu1vq7uv7omp4pjdg.txt", header=TRUE)
  2. Looking at the data, there seems to be a strong interaction between A (the x-axis) and C (the columns of panels): the slope of the effect of A switches when we go from one C configuration to the other.
library(ggplot2)
ggplot(plasma) +
  geom_point(aes(A, Rate)) +
  facet_grid(B ~ C)

  3. We can quantify the strength of these relationships by estimating the model and evaluating the relevant \(F\)-statistics. The * syntax expands to all main effects and all interactions among the linked variables.
fit <- lm(Rate ~ A * B * C, plasma)
summary(aov(fit))
            Df Sum Sq Mean Sq F value   Pr(>F)    
A            1  41311   41311  18.339 0.002679 ** 
B            1    218     218   0.097 0.763911    
C            1 374850  374850 166.411 1.23e-06 ***
A:B          1   2475    2475   1.099 0.325168    
A:C          1  94403   94403  41.909 0.000193 ***
B:C          1     18      18   0.008 0.930849    
A:B:C        1    127     127   0.056 0.818586    
Residuals    8  18020    2253                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
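As a sanity check on the table, the \(F\)-statistic and \(p\)-value for A can be recomputed by hand. The sketch below copies the rounded values printed above, so it should land close to, but not exactly on, the reported 18.339 and 0.002679.

```r
# Rounded values copied from the ANOVA table above
ms_a <- 41311   # mean square for A (1 degree of freedom)
ss_e <- 18020   # residual sum of squares
df_e <- 8       # residual degrees of freedom

ms_e <- ss_e / df_e
f_a <- ms_a / ms_e

# Upper tail probability of an F distribution with (1, 8) degrees of freedom
p_a <- pf(f_a, df1 = 1, df2 = df_e, lower.tail = FALSE)

print(c(F = f_a, p = p_a))
```

Repeating the calculation with the mean square for A:C or C gives the other large \(F\) values, consistent with the interaction seen in the plot.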