How effect estimates can be found using linear regression.
In our code examples, we’ve been finding effect estimates with R’s `lm` function. Why does this work? In these notes, we’ll see how our effect estimates can be viewed through the lens of linear regression.

Consider the table of contrasts for a \(2^3\) design below (for brevity, only the AB interaction column is shown).

| A | B | C | AB | label |
|---|---|---|----|-------|
| - | - | - | + | (1) |
| + | - | - | - | a |
| - | + | - | - | b |
| - | - | + | + | c |
| + | + | - | + | ab |
| + | - | + | - | ac |
| - | + | + | - | bc |
| + | + | + | + | abc |
We’ll translate this table into a design matrix \(X\) and response vector \(y\),
\[\begin{align*} X = \begin{pmatrix} 1 & -1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 & -1 \\ 1 & -1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 & 1 \\ 1 & 1 & 1 & -1 & 1 \\ 1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix} && y = \begin{pmatrix} (1) \\ a \\ b \\ c \\ ab \\ ac \\ bc \\ abc \end{pmatrix} \end{align*}\]
For \(X\), we’ve just added an intercept column of \(1\)’s and concatenated it with the contrasts written as \(\pm 1\)’s. As before, \((1)\) means the value of the sample taken at the \((1)\) corner of the cube, \(a\) means the sample at the \(a\) corner, etc. (In the full analysis, \(X\) would also have columns for the AC, BC, and ABC contrasts, giving \(2^{K}\) columns in total; we omit them here to keep the display small.)
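As a minimal sketch (not part of the original analysis), the same \(X\) can be built in R using `expand.grid` and `model.matrix`; note that the row and column ordering may differ from the hand-written matrix above.

```r
# enumerate the 2^3 corners, with A varying fastest
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X <- model.matrix(~ A + B + C + A:B, d)  # intercept, A, B, C, and AB columns
X
```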
Linear regression estimates the coefficients by minimizing the squared error between the observed and fitted responses,

\[\begin{align*} \hat{\beta} := \arg\min_{\beta \in \mathbb{R}^{2^{K}}} \|y - X\beta\|_{2}^{2} \end{align*}\]
To minimize a quadratic like this, we can differentiate and use the chain rule,
\[\begin{align*} \frac{\partial}{\partial \beta}\left[\|y - X\beta\|_{2}^{2}\right] &= 0 \\ \iff -2X^{T}\left(y - X\beta\right) &= 0 \end{align*}\]
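In more detail, expanding the squared norm makes the gradient computation transparent,

\[\begin{align*} \|y - X\beta\|_{2}^{2} &= y^{T}y - 2\beta^{T}X^{T}y + \beta^{T}X^{T}X\beta \\ \frac{\partial}{\partial \beta}\left[\|y - X\beta\|_{2}^{2}\right] &= -2X^{T}y + 2X^{T}X\beta = -2X^{T}\left(y - X\beta\right). \end{align*}\]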
Solving this first-order condition gives \(\hat{\beta} = \left(X^{T}X\right)^{-1}X^{T}y\). Since the \(\pm 1\) columns of \(X\) are mutually orthogonal, each with squared norm \(2^{K}\), we have \(X^{T}X = 2^{K}I_{2^{K}}\), and therefore,
\[\begin{align*} \hat{\beta} &= \left(2^{K}I_{2^{K}}\right)^{-1}X^T y \\ &= \frac{1}{2^{K}}X^{T}y. \end{align*}\]
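As a quick sanity check (a made-up example, not part of the original notes), we can verify this closed form on the \(2^3\) design with all \(2^{K}\) columns included; the response values here are arbitrary, chosen only for illustration.

```r
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X_full <- model.matrix(~ A * B * C, d)  # all 2^3 = 8 orthogonal columns
y <- c(5, 7, 6, 9, 4, 8, 6, 10)         # arbitrary, made-up responses
beta_hat <- (1 / 8) * t(X_full) %*% y   # the closed form (1 / 2^K) X^T y
max(abs(beta_hat - coef(lm(y ~ A * B * C, d))))  # essentially zero
```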
Writing out \(X^{T}y\), each coordinate is one of our contrasts applied to the responses,

\[\begin{align*} X^{T}y &= \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & -1 & 1 & 1 & -1 & 1 \\ -1 & -1 & 1 & -1 & 1 & -1 & 1 & 1 \\ \vdots & & & & & & & \vdots \end{pmatrix} \begin{pmatrix} \left(1\right) \\ a \\ b \\ c \\ \vdots \end{pmatrix} \end{align*}\]
e.g., \(A = \frac{1}{2^{K - 1}}\left[-(1) + a - b - c + ab + ac - bc + abc\right]\). The only difference between \(\hat{\beta}\) and our effect estimates is a factor of 2 in the scaling. Therefore, to estimate the effects in a \(2^{K}\) design, it’s enough to construct the \(X\) and \(y\) above, plug them into any standard linear regression program, and double the resulting coefficients.
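To make the factor of 2 explicit for \(A\),

\[\begin{align*} \hat{\beta}_{A} = \frac{1}{2^{K}}\left[-(1) + a - b - c + ab + ac - bc + abc\right] = \frac{1}{2} \cdot \frac{1}{2^{K - 1}}\left[-(1) + a - b - c + ab + ac - bc + abc\right] = \frac{A}{2}. \end{align*}\]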
First, let’s confirm that the design matrix used by `lm` is the same as the one in our conceptual discussion.

```r
library(dplyr)
library(readr)

# recode the -/+ labels in the raw data as -1/+1 contrasts
code <- function(x) ifelse(x == '-', -1, 1)
drill <- read_csv("https://uwmadison.box.com/shared/static/7l8bpcu36a12a8c0chlh4id0qezdnnoe.csv") %>%
mutate_at(vars(A:D), code)
# fit the full 2^4 factorial model and extract its design matrix
fit <- lm(rate ~ A * B * C * D, drill)
X <- model.matrix(fit)
X
```

```
(Intercept) A B C D A:B A:C B:C A:D B:D C:D A:B:C A:B:D A:C:D
1 1 -1 -1 -1 -1 1 1 1 1 1 1 -1 -1 -1
2 1 1 -1 -1 -1 -1 -1 1 -1 1 1 1 1 1
3 1 -1 1 -1 -1 -1 1 -1 1 -1 1 1 1 -1
4 1 -1 -1 1 -1 1 -1 -1 1 1 -1 1 -1 1
5 1 -1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 1
6 1 1 1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 1
7 1 1 -1 1 -1 -1 1 -1 -1 1 -1 -1 1 -1
8 1 1 -1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1
9 1 -1 1 1 -1 -1 -1 1 1 -1 -1 -1 1 1
10 1 -1 1 -1 1 -1 1 -1 -1 1 -1 1 -1 1
11 1 -1 -1 1 1 1 -1 -1 -1 -1 1 1 1 -1
12 1 1 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1
13 1 1 -1 1 1 -1 1 -1 1 -1 1 -1 -1 1
14 1 1 1 -1 1 1 -1 -1 1 1 -1 -1 1 -1
15 1 -1 1 1 1 -1 -1 1 -1 1 1 -1 -1 -1
16 1 1 1 1 1 1 1 1 1 1 1 1 1 1
B:C:D A:B:C:D
1 -1 1
2 -1 -1
3 1 -1
4 1 -1
5 1 -1
6 1 1
7 1 1
8 1 1
9 -1 1
10 -1 1
11 -1 1
12 -1 -1
13 -1 -1
14 -1 -1
15 1 -1
16 1 1
attr(,"assign")
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
```
We can also check the dimension and orthogonality of this matrix.
```r
dim(X)
```

```
[1] 16 16
```
```r
t(X) %*% X
```

```
(Intercept) A B C D A:B A:C B:C A:D B:D C:D A:B:C
(Intercept) 16 0 0 0 0 0 0 0 0 0 0 0
A 0 16 0 0 0 0 0 0 0 0 0 0
B 0 0 16 0 0 0 0 0 0 0 0 0
C 0 0 0 16 0 0 0 0 0 0 0 0
D 0 0 0 0 16 0 0 0 0 0 0 0
A:B 0 0 0 0 0 16 0 0 0 0 0 0
A:C 0 0 0 0 0 0 16 0 0 0 0 0
B:C 0 0 0 0 0 0 0 16 0 0 0 0
A:D 0 0 0 0 0 0 0 0 16 0 0 0
B:D 0 0 0 0 0 0 0 0 0 16 0 0
C:D 0 0 0 0 0 0 0 0 0 0 16 0
A:B:C 0 0 0 0 0 0 0 0 0 0 0 16
A:B:D 0 0 0 0 0 0 0 0 0 0 0 0
A:C:D 0 0 0 0 0 0 0 0 0 0 0 0
B:C:D 0 0 0 0 0 0 0 0 0 0 0 0
A:B:C:D 0 0 0 0 0 0 0 0 0 0 0 0
A:B:D A:C:D B:C:D A:B:C:D
(Intercept) 0 0 0 0
A 0 0 0 0
B 0 0 0 0
C 0 0 0 0
D 0 0 0 0
A:B 0 0 0 0
A:C 0 0 0 0
B:C 0 0 0 0
A:D 0 0 0 0
B:D 0 0 0 0
C:D 0 0 0 0
A:B:C 0 0 0 0
A:B:D 16 0 0 0
A:C:D 0 16 0 0
B:C:D 0 0 16 0
A:B:C:D 0 0 0 16
```
Since \(X^{T}X = 16 I_{16}\), the coefficients are just \(\frac{1}{16}X^{T}y\); computing this by hand shows exactly what `lm` is doing.

```r
(1 / 16) * t(X) %*% drill$rate
```

```
[,1]
(Intercept) 6.15250
A 0.45875
B 3.21875
C 1.64625
D 1.14500
A:B 0.29500
A:C 0.07750
B:C 0.75500
A:D 0.41875
B:D 0.79625
C:D 0.22375
A:B:C 0.08125
A:B:D 0.38000
A:C:D 0.29250
B:C:D 0.08750
A:B:C:D 0.27125
```
```r
coef(fit) # hand computation agrees
```

```
(Intercept) A B C D
6.15250 0.45875 3.21875 1.64625 1.14500
A:B A:C B:C A:D B:D
0.29500 0.07750 0.75500 0.41875 0.79625
C:D A:B:C A:B:D A:C:D B:C:D
0.22375 0.08125 0.38000 0.29250 0.08750
A:B:C:D
0.27125
```
Finally, let’s compare the fitted \(\hat{\beta}\) with our original effect estimates (at least, for the effect \(A\)).
```r
drill$A
```

```
[1] -1 1 -1 -1 -1 1 1 1 -1 -1 -1 1 1 1 -1 1
```

```r
est_A <- (1 / 8) * sum(drill[drill$A == 1, ]$rate - drill[drill$A == -1, ]$rate)
est_A / 2
```

```
[1] 0.45875
```
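More generally, every effect estimate can be read off at once by doubling the fitted coefficients; a quick check, continuing with the objects defined above:

```r
effects <- 2 * coef(fit)[-1]  # double each coefficient, dropping the intercept
effects["A"]                  # matches est_A computed by hand above
```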