2^K Designs and Regression

How effect estimates can be found using linear regression.

Kris Sankaran
11-09-2021

Readings: 6.7, Rmarkdown

  1. In all the code examples, we’ve been fitting \(2^K\) designs using the lm function. Why does this work? In these notes, we’ll see how our effect estimates can be viewed through the lens of linear regression.

The Design Matrix

  2. Let’s write our contrasts summary table linear algebraically. Remember our notation,

 A  B  C  AB  label
 -  -  -  +   (1)
 +  -  -  -   a
 -  +  -  -   b
 -  -  +  +   c
 +  +  -  +   ab
 +  -  +  -   ac
 -  +  +  -   bc
 +  +  +  +   abc

We’ll translate this into

\[\begin{align*} X = \begin{pmatrix} 1 & -1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 & -1 \\ 1 & -1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 & 1 \\ 1 & 1 & 1 & -1 & 1 \\ 1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} && y = \begin{pmatrix} (1) \\ a \\ b \\ c \\ ab \\ ac \\ bc \\ abc \end{pmatrix} \end{align*}\]

For \(X\), we’ve just added an intercept column of \(1\)’s and concatenated it with the contrasts written as \(\pm 1\)’s (for brevity, only the \(AB\) interaction column is shown here; the full matrix has a column for every interaction). As before, \((1)\) means the value of the sample taken at the \((1)\) corner of the cube, \(a\) means the sample at the \(a\) corner, etc.

  3. Let’s observe some properties of this matrix (verified in the sketch after this list),
    • It has \(2^{K}\) rows, one per corner of the cube (assuming a single replicate at each corner).
    • It has \(2^{K}\) columns. To see why, notice that there are 1 intercept, \(K\) main effects, and \({K \choose 2}\) two-way interactions; if \(K > 2\), there are also \({K \choose 3}\) three-way interactions, and so on, up through the single \(K\)-way interaction. By the binomial theorem, these counts sum to \(\sum_{j \leq K} {K \choose j} 1^{j} 1^{K - j} = \left(1 + 1\right)^{K} = 2^{K}\).
    • The columns are orthogonal, and each column’s sum of squares is \(2^{K}\). Hence \(X^{T}X = 2^{K}I_{2^{K}}\).
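
To make these properties concrete, here is a minimal sketch (an addition to these notes, using only base R) that builds the full \(2^{3}\) model matrix, including all interaction columns, and confirms the claims above. The row order produced by expand.grid differs from the table earlier, but the properties are unchanged.

design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X <- model.matrix(~ A * B * C, design)  # intercept + all interactions
dim(X)      # 8 x 8: 2^3 rows and 1 + 3 + 3 + 1 = 2^3 columns
t(X) %*% X  # 8 * diag(8), i.e., 2^K times the identity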

\(\hat{\beta}\) and Effect Estimates

  4. In linear regression, the least squares solution \(\hat{\beta}\) is the vector that minimizes the squared error,

\[\begin{align*} \hat{\beta} := \arg\min_{\beta \in \mathbb{R}^{2^{K}}} \|y - X\beta\|_{2}^{2} \end{align*}\]

To minimize a quadratic like this, we can differentiate with respect to \(\beta\) (using the chain rule) and set the gradient to zero,

\[\begin{align*} \frac{\partial}{\partial \beta}\left[\|y - X\beta\|_{2}^{2}\right] &= 0 \\ \iff -2X^{T}\left(y - X\beta\right) &= 0 \end{align*}\]

which implies that \(\hat{\beta} = \left(X^{T}X\right)^{-1}X^{T}y\). Combining this with the observations above,

\[\begin{align*} \hat{\beta} &= \left(2^{K}I_{2^{K}}\right)^{-1}X^T y \\ &= \frac{1}{2^{K}}X^{T}y. \end{align*}\]

  5. But the entries of \(X^{T}y\) are exactly our contrasts (!),

\[\begin{align*} X^{T}y &= \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & -1 & 1 & 1 & -1 & 1 \\ -1 & -1 & 1 & -1 & 1 & -1 & 1 & 1 \\ \vdots & & & & & & & \vdots \end{pmatrix} \begin{pmatrix} \left(1\right) \\ a \\ b \\ c \\ \vdots \end{pmatrix} \end{align*}\]

For example, the second coordinate of \(X^{T}y\) is the \(A\) contrast, so the corresponding effect estimate is \(A = \frac{1}{2^{K - 1}}\left[-(1) + a - b - c + ab + ac - bc + abc\right]\), while \(\hat{\beta}_{A}\) scales the same contrast by \(\frac{1}{2^{K}}\). The only difference between \(\hat{\beta}\) and our effect estimates is a factor of 2 in the scaling. Therefore, to estimate the effects in a \(2^{K}\) design, it’s enough to construct the \(X\) and \(y\) matrices above and plug them into standard linear regression software.

Code Example

  6. To complete this discussion, let’s revisit the \(2^{4}\) design from the drill example. First, let’s verify that the \(X\) matrix used by lm is the same as the one in our conceptual discussion.
library(tidyverse)

code <- function(x) ifelse(x == '-', -1, 1)  # recode -/+ labels as -1/+1
drill <- read_csv("https://uwmadison.box.com/shared/static/7l8bpcu36a12a8c0chlh4id0qezdnnoe.csv") %>%
  mutate_at(vars(A:D), code)
fit <- lm(rate ~ A * B * C * D, drill)
X <- model.matrix(fit)
X
   (Intercept)  A  B  C  D A:B A:C B:C A:D B:D C:D A:B:C A:B:D A:C:D
1            1 -1 -1 -1 -1   1   1   1   1   1   1    -1    -1    -1
2            1  1 -1 -1 -1  -1  -1   1  -1   1   1     1     1     1
3            1 -1  1 -1 -1  -1   1  -1   1  -1   1     1     1    -1
4            1 -1 -1  1 -1   1  -1  -1   1   1  -1     1    -1     1
5            1 -1 -1 -1  1   1   1   1  -1  -1  -1    -1     1     1
6            1  1  1 -1 -1   1  -1  -1  -1  -1   1    -1    -1     1
7            1  1 -1  1 -1  -1   1  -1  -1   1  -1    -1     1    -1
8            1  1 -1 -1  1  -1  -1   1   1  -1  -1     1    -1    -1
9            1 -1  1  1 -1  -1  -1   1   1  -1  -1    -1     1     1
10           1 -1  1 -1  1  -1   1  -1  -1   1  -1     1    -1     1
11           1 -1 -1  1  1   1  -1  -1  -1  -1   1     1     1    -1
12           1  1  1  1 -1   1   1   1  -1  -1  -1     1    -1    -1
13           1  1 -1  1  1  -1   1  -1   1  -1   1    -1    -1     1
14           1  1  1 -1  1   1  -1  -1   1   1  -1    -1     1    -1
15           1 -1  1  1  1  -1  -1   1  -1   1   1    -1    -1    -1
16           1  1  1  1  1   1   1   1   1   1   1     1     1     1
   B:C:D A:B:C:D
1     -1       1
2     -1      -1
3      1      -1
4      1      -1
5      1      -1
6      1       1
7      1       1
8      1       1
9     -1       1
10    -1       1
11    -1       1
12    -1      -1
13    -1      -1
14    -1      -1
15     1      -1
16     1       1
attr(,"assign")
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

We can also check the dimension and orthogonality of this matrix.

dim(X)
[1] 16 16
t(X) %*% X
            (Intercept)  A  B  C  D A:B A:C B:C A:D B:D C:D A:B:C
(Intercept)          16  0  0  0  0   0   0   0   0   0   0     0
A                     0 16  0  0  0   0   0   0   0   0   0     0
B                     0  0 16  0  0   0   0   0   0   0   0     0
C                     0  0  0 16  0   0   0   0   0   0   0     0
D                     0  0  0  0 16   0   0   0   0   0   0     0
A:B                   0  0  0  0  0  16   0   0   0   0   0     0
A:C                   0  0  0  0  0   0  16   0   0   0   0     0
B:C                   0  0  0  0  0   0   0  16   0   0   0     0
A:D                   0  0  0  0  0   0   0   0  16   0   0     0
B:D                   0  0  0  0  0   0   0   0   0  16   0     0
C:D                   0  0  0  0  0   0   0   0   0   0  16     0
A:B:C                 0  0  0  0  0   0   0   0   0   0   0    16
A:B:D                 0  0  0  0  0   0   0   0   0   0   0     0
A:C:D                 0  0  0  0  0   0   0   0   0   0   0     0
B:C:D                 0  0  0  0  0   0   0   0   0   0   0     0
A:B:C:D               0  0  0  0  0   0   0   0   0   0   0     0
            A:B:D A:C:D B:C:D A:B:C:D
(Intercept)     0     0     0       0
A               0     0     0       0
B               0     0     0       0
C               0     0     0       0
D               0     0     0       0
A:B             0     0     0       0
A:C             0     0     0       0
B:C             0     0     0       0
A:D             0     0     0       0
B:D             0     0     0       0
C:D             0     0     0       0
A:B:C           0     0     0       0
A:B:D          16     0     0       0
A:C:D           0    16     0       0
B:C:D           0     0    16       0
A:B:C:D         0     0     0      16
  7. Let’s make sure that the formula we derived above, \(\hat{\beta} = \frac{1}{2^{K}}X^{T}y\), is exactly what lm is doing.
(1 / 16) * t(X) %*% drill$rate
               [,1]
(Intercept) 6.15250
A           0.45875
B           3.21875
C           1.64625
D           1.14500
A:B         0.29500
A:C         0.07750
B:C         0.75500
A:D         0.41875
B:D         0.79625
C:D         0.22375
A:B:C       0.08125
A:B:D       0.38000
A:C:D       0.29250
B:C:D       0.08750
A:B:C:D     0.27125
coef(fit) # hand computation agrees
(Intercept)           A           B           C           D 
    6.15250     0.45875     3.21875     1.64625     1.14500 
        A:B         A:C         B:C         A:D         B:D 
    0.29500     0.07750     0.75500     0.41875     0.79625 
        C:D       A:B:C       A:B:D       A:C:D       B:C:D 
    0.22375     0.08125     0.38000     0.29250     0.08750 
    A:B:C:D 
    0.27125 

Finally, let’s compare the fitted \(\hat{\beta}\) with our original effect estimates (at least, for the effect \(A\)).

drill$A
 [1] -1  1 -1 -1 -1  1  1  1 -1 -1 -1  1  1  1 -1  1
est_A <- (1 / 8) * sum(drill[drill$A == 1, ]$rate - drill[drill$A == -1, ]$rate)
est_A / 2
[1] 0.45875
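
The same comparison works for every effect at once. As a small addition to the original check, doubling all the fitted coefficients except the intercept should reproduce the classical effect estimates.

2 * coef(fit)[-1]  # each effect estimate is twice its regression coefficient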