
2^K Designs and Regression

How effect estimates can be found using linear regression.

Author
Kris Sankaran

Published
Nov. 9, 2021

Readings 6.7, Rmarkdown

  1. In all the code examples, we’ve been fitting 2^{K} designs using the lm function. Why does this work? In these notes, we’ll see how our effect estimates can be viewed through the lens of linear regression.

The Design Matrix

  2. Let’s write our contrasts summary table linear algebraically. Remember our notation,

A   B   C   AB   label
-   -   -   +    (1)
+   -   -   -    a
-   +   -   -    b
-   -   +   +    c
+   +   -   +    ab
+   -   +   -    ac
-   +   +   -    bc
+   +   +   +    abc

We’ll translate this into

\begin{align*} X = \begin{pmatrix} 1 & -1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 & -1 \\ 1 & -1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 & 1 \\ 1 & 1 & 1 & -1 & 1 \\ 1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix}, \quad y = \begin{pmatrix} \left(1\right) \\ a \\ b \\ c \\ ab \\ ac \\ bc \\ abc \end{pmatrix} \end{align*}

For X, we’ve just added an intercept column of 1’s and concatenated it with the contrasts written as ±1’s. As before, (1) means the value of the sample taken at the (1) corner of the cube, a means the sample at the a corner, etc.
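To make the construction concrete, here is a minimal R sketch (assumed code, not part of the original notes) that builds this X by hand for the 2^3 design, with rows in the table’s order (1), a, b, c, ab, ac, bc, abc.

# corners of the 2^3 cube, listed in the order of the table above
corners <- data.frame(
  A = c(-1, 1, -1, -1, 1, 1, -1, 1),
  B = c(-1, -1, 1, -1, 1, -1, 1, 1),
  C = c(-1, -1, -1, 1, -1, 1, 1, 1)
)
# intercept column of 1's, the main-effect contrasts, and the AB interaction
X <- as.matrix(cbind(intercept = 1, corners, AB = corners$A * corners$B))
X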

  3. Let’s observe some properties of this matrix,
    • It has 2^{K} rows, one per replicate on each corner of the cube.
    • It has 2^{K} columns. To see why, notice that there are 1 intercept, K main effects, {K \choose 2} two-way interactions, and (when K > 2) {K \choose 3} three-way interactions, and so on, up to the single K-way interaction. By the binomial theorem, these counts add up to \sum_{j = 0}^{K} {K \choose j} 1^{j} 1^{K - j} = \left(1 + 1\right)^{K} = 2^{K}.
    • The columns are orthogonal, and each column’s sum of squares is 2^{K}. Hence X^{T}X = 2^{K}I_{2^{K}}, as checked numerically in the sketch below.
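These properties are easy to verify numerically. The following sketch (assumed code, using K = 3) builds the full 2^{3} \times 2^{3} model matrix and confirms the orthogonality claim.

# full model matrix for K = 3: 2^3 rows, 2^3 columns
corners <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X <- model.matrix(~ A * B * C, corners)
dim(X)                          # 8 rows, 8 columns
all(t(X) %*% X == 8 * diag(8))  # TRUE: orthogonal columns, each with sum of squares 2^3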

\hat{\beta} and Effect Estimates

  4. In linear regression, the least squares solution \hat{\beta} is the vector that minimizes the residual sum of squares,

\begin{align*} \hat{\beta} := \arg\min_{\beta \in \mathbb{R}^{2^{K}}} \|y - X\beta\|_{2}^{2} \end{align*}

To minimize a quadratic like this, we can differentiate with respect to \beta (using the chain rule) and set the gradient to zero,

\begin{align*} \frac{\partial}{\partial \beta}\left[\|y - X\beta\|_{2}^{2}\right] &= 0 \\ \iff -2X^{T}\left(y - X\beta\right) &= 0, \end{align*}

which implies that \hat{\beta} = \left(X^{T}X\right)^{-1}X^{T}y. By the observations above, X^{T}X = 2^{K}I_{2^{K}}, so

\begin{align*} \hat{\beta} &= \left(2^{K}I_{2^{K}}\right)^{-1}X^T y \\ &= \frac{1}{2^{K}}X^{T}y. \end{align*}
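As a quick numerical check (a sketch with a made-up response vector, for illustration only), the generic normal equations solution and the \frac{1}{2^{K}}X^{T}y shortcut agree:

corners <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X <- model.matrix(~ A * B * C, corners)
y <- c(60, 72, 54, 68, 52, 83, 45, 80)        # hypothetical responses
beta_generic <- solve(t(X) %*% X, t(X) %*% y)  # (X^T X)^{-1} X^T y
beta_shortcut <- (1 / 8) * t(X) %*% y          # 2^{-K} X^T y
all.equal(as.numeric(beta_generic), as.numeric(beta_shortcut))  # TRUE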

  5. But the entries of X^{T}y are exactly our contrasts (!),

\begin{align*} X^{T}y &= \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & -1 & 1 & 1 & -1 & 1 \\ -1 & -1 & 1 & -1 & 1 & -1 & 1 & 1 \\ \vdots & & & & & & & \vdots \end{pmatrix} \begin{pmatrix} \left(1\right) \\ a \\ b \\ c \\ \vdots \end{pmatrix} \end{align*}

e.g., A = \frac{1}{2^{K - 1}}\left[-(1) + a - b - c + ab + ac - bc + abc\right]. The only difference between \hat{\beta} and our effect estimates is a factor of 2 in the scaling. Therefore, to estimate the effects in a 2^{K} design, it’s enough to construct the X matrix and y vector above and plug them into standard linear regression software.
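Written out for the A coefficient, the factor of 2 is explicit:

\begin{align*} \hat{\beta}_{A} &= \frac{1}{2^{K}}\left[-(1) + a - b - c + ab + ac - bc + abc\right] \\ &= \frac{1}{2} \cdot \frac{1}{2^{K - 1}}\left[-(1) + a - b - c + ab + ac - bc + abc\right] = \frac{A}{2}. \end{align*}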

Code Example

  6. To complete this discussion, let’s revisit the 2^{4} design in the drill example. First, let’s verify that the X matrix used by lm is the same as the one in our conceptual discussion.
library(readr)
library(dplyr)

# recode the -/+ factor labels to numeric -1 / +1 contrasts
code <- function(x) ifelse(x == '-', -1, 1)
drill <- read_csv("https://uwmadison.box.com/shared/static/7l8bpcu36a12a8c0chlh4id0qezdnnoe.csv") %>%
  mutate_at(vars(A:D), code)

# fit the full 2^4 factorial model and extract its design matrix
fit <- lm(rate ~ A * B * C * D, drill)
X <- model.matrix(fit)
X
   (Intercept)  A  B  C  D A:B A:C B:C A:D B:D C:D A:B:C A:B:D A:C:D
1            1 -1 -1 -1 -1   1   1   1   1   1   1    -1    -1    -1
2            1  1 -1 -1 -1  -1  -1   1  -1   1   1     1     1     1
3            1 -1  1 -1 -1  -1   1  -1   1  -1   1     1     1    -1
4            1 -1 -1  1 -1   1  -1  -1   1   1  -1     1    -1     1
5            1 -1 -1 -1  1   1   1   1  -1  -1  -1    -1     1     1
6            1  1  1 -1 -1   1  -1  -1  -1  -1   1    -1    -1     1
7            1  1 -1  1 -1  -1   1  -1  -1   1  -1    -1     1    -1
8            1  1 -1 -1  1  -1  -1   1   1  -1  -1     1    -1    -1
9            1 -1  1  1 -1  -1  -1   1   1  -1  -1    -1     1     1
10           1 -1  1 -1  1  -1   1  -1  -1   1  -1     1    -1     1
11           1 -1 -1  1  1   1  -1  -1  -1  -1   1     1     1    -1
12           1  1  1  1 -1   1   1   1  -1  -1  -1     1    -1    -1
13           1  1 -1  1  1  -1   1  -1   1  -1   1    -1    -1     1
14           1  1  1 -1  1   1  -1  -1   1   1  -1    -1     1    -1
15           1 -1  1  1  1  -1  -1   1  -1   1   1    -1    -1    -1
16           1  1  1  1  1   1   1   1   1   1   1     1     1     1
   B:C:D A:B:C:D
1     -1       1
2     -1      -1
3      1      -1
4      1      -1
5      1      -1
6      1       1
7      1       1
8      1       1
9     -1       1
10    -1       1
11    -1       1
12    -1      -1
13    -1      -1
14    -1      -1
15     1      -1
16     1       1
attr(,"assign")
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

We can also check the dimension and orthogonality of this matrix.

dim(X)
[1] 16 16
t(X) %*% X
            (Intercept)  A  B  C  D A:B A:C B:C A:D B:D C:D A:B:C
(Intercept)          16  0  0  0  0   0   0   0   0   0   0     0
A                     0 16  0  0  0   0   0   0   0   0   0     0
B                     0  0 16  0  0   0   0   0   0   0   0     0
C                     0  0  0 16  0   0   0   0   0   0   0     0
D                     0  0  0  0 16   0   0   0   0   0   0     0
A:B                   0  0  0  0  0  16   0   0   0   0   0     0
A:C                   0  0  0  0  0   0  16   0   0   0   0     0
B:C                   0  0  0  0  0   0   0  16   0   0   0     0
A:D                   0  0  0  0  0   0   0   0  16   0   0     0
B:D                   0  0  0  0  0   0   0   0   0  16   0     0
C:D                   0  0  0  0  0   0   0   0   0   0  16     0
A:B:C                 0  0  0  0  0   0   0   0   0   0   0    16
A:B:D                 0  0  0  0  0   0   0   0   0   0   0     0
A:C:D                 0  0  0  0  0   0   0   0   0   0   0     0
B:C:D                 0  0  0  0  0   0   0   0   0   0   0     0
A:B:C:D               0  0  0  0  0   0   0   0   0   0   0     0
            A:B:D A:C:D B:C:D A:B:C:D
(Intercept)     0     0     0       0
A               0     0     0       0
B               0     0     0       0
C               0     0     0       0
D               0     0     0       0
A:B             0     0     0       0
A:C             0     0     0       0
B:C             0     0     0       0
A:D             0     0     0       0
B:D             0     0     0       0
C:D             0     0     0       0
A:B:C           0     0     0       0
A:B:D          16     0     0       0
A:C:D           0    16     0       0
B:C:D           0     0    16       0
A:B:C:D         0     0     0      16
  7. Let’s make sure that the formula we derived above is exactly what lm is doing.
(1 / 16) * t(X) %*% drill$rate
               [,1]
(Intercept) 6.15250
A           0.45875
B           3.21875
C           1.64625
D           1.14500
A:B         0.29500
A:C         0.07750
B:C         0.75500
A:D         0.41875
B:D         0.79625
C:D         0.22375
A:B:C       0.08125
A:B:D       0.38000
A:C:D       0.29250
B:C:D       0.08750
A:B:C:D     0.27125
coef(fit) # hand computation agrees
(Intercept)           A           B           C           D 
    6.15250     0.45875     3.21875     1.64625     1.14500 
        A:B         A:C         B:C         A:D         B:D 
    0.29500     0.07750     0.75500     0.41875     0.79625 
        C:D       A:B:C       A:B:D       A:C:D       B:C:D 
    0.22375     0.08125     0.38000     0.29250     0.08750 
    A:B:C:D 
    0.27125 

Finally, let’s compare the fitted \hat{\beta} with our original effect estimates (at least, for the effect A).

drill$A
 [1] -1  1 -1 -1 -1  1  1  1 -1 -1 -1  1  1  1 -1  1
est_A <- (1 / 8) * sum(drill[drill$A == 1, ]$rate - drill[drill$A == -1, ]$rate) # contrast / 2^(K - 1)
est_A / 2 # halving the effect estimate recovers the lm coefficient
[1] 0.45875
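The same halving applies to every coefficient at once. As a final sketch (assumed code), doubling the non-intercept coefficients recovers all fifteen effect estimates in one step.

# each lm coefficient, other than the intercept, is half of the
# corresponding effect estimate
effect_ests <- 2 * coef(fit)[-1]
effect_ests["A"]  # 0.9175, matching est_A above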
