Testing, uncertainty, and visualization in \(2^3\) designs.
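The ANOVA table below comes from the plasma etching example used throughout this section. As a sketch of how it can be reproduced (the formula and data names are taken from the summary(fit) output further down; treating A, B, and C as numeric ±1 codes is an assumption):

fit <- lm(Rate ~ A * B * C, data = plasma)  # assumes coded +/-1 factors, as in the summary output below
anova(fit)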
          Df Sum Sq Mean Sq F value   Pr(>F)
A          1  41311   41311  18.339 0.002679 **
B          1    218     218   0.097 0.763911
C          1 374850  374850 166.411 1.23e-06 ***
A:B        1   2475    2475   1.099 0.325168
A:C        1  94403   94403  41.909 0.000193 ***
B:C        1     18      18   0.008 0.930849
A:B:C      1    127     127   0.056 0.818586
Residuals  8  18020    2253
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Consider the regression view of this situation. The full model would be \[y_{i} = \beta_0 + \sum_{k = 1}^{3} \beta_k x_{ik} + \sum_{\text{pairs } k, k^\prime} \beta_{k k^\prime} x_{ik}x_{ik^{\prime}} + \beta_{123} x_{i1} x_{i2} x_{i3} + \epsilon_{i},\] though we will often be interested in whether a submodel (one that discards some of the main or interaction effects) can do as well.
To compare a full model with a submodel, we can use the relative sums of squares,
\[R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = 1 - \frac{SS_{E}}{SS_{\text{Total}}}\]
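For instance, as a sketch, we could compare the full model's \(R^2\) with that of a submodel keeping only the terms that were significant in the ANOVA table above (the particular choice of submodel here is an assumption for illustration):

fit_sub <- lm(Rate ~ A + C + A:C, data = plasma)  # hypothetical submodel keeping only the large effects
summary(fit)$r.squared      # 0.9661 for the full model
summary(fit_sub)$r.squared  # about 0.96, from the A, C, and A:C sums of squares in the ANOVA table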
Variance estimate for the effect of A
\[\begin{align*} \text{Var}\left(\text{Effect }A\right) &= \text{Var}\left(\frac{1}{2^{K - 1} n}\left(a - b - c + ab + \ldots\right)\right) \\ &= \left(\frac{1}{2^{K - 1} n}\right)^2\text{Var}\left(a - b - c + ab + \ldots\right) \end{align*}\]
But remember that \(a\) refers to the sum of all samples at corner \(a\), and likewise for \(b\), \(ab\), etc., \[\begin{align*} \text{Var}\left(a - b - c + ab + \ldots\right) &= \text{Var}\left(\sum_{\text{corner } a}y_{i} - \sum_{\text{corner }b}y_{i} - \sum_{\text{corner }c}y_{i} + \ldots\right) \\ &= \sum_{\text{corner } a}\text{Var}\left(y_i\right) + \sum_{\text{corner }b}\text{Var}\left(y_i\right) + \ldots \\ &= 2^K n \sigma^2 \end{align*}\]
so at the end of the day, substituting back in, we get \[\begin{align*} \text{Var}\left(\text{Effect }A\right) &= \left(\frac{1}{2^{K - 1} n}\right)^2 2^K n \sigma^2 = \frac{\sigma^2}{2^{K - 2}n} \end{align*}\]
and we can estimate \(\sigma^2\) by the mean squared error \(S^2\) (the residual sum of squares divided by its degrees of freedom). From these variance estimates, we can build confidence intervals that summarize all the effects.
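To make this concrete, here is a sketch using the fit from the output below; \(K = 3\) is given, and \(n = 2\) replicates per corner is an assumption consistent with the 8 residual degrees of freedom on 16 runs:

s2 <- 47.46^2                            # residual mean square, from the residual standard error below
K <- 3; n <- 2                           # n = 2 replicates per corner is assumed from the residual df
se_effect <- sqrt(s2 / (2^(K - 2) * n))  # about 23.7, twice the coefficient standard error below
effect_A <- 2 * coef(fit)["A"]           # effect estimate for A, about -101.6
effect_A + c(-1, 1) * qt(0.975, df = 8) * se_effect  # 95% confidence interval for the A effect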
summary(fit)
Call:
lm(formula = Rate ~ A * B * C, data = plasma)
Residuals:
   Min     1Q Median     3Q    Max
-65.50 -11.12   0.00  11.12  65.50

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  776.062     11.865  65.406 3.32e-12 ***
A            -50.812     11.865  -4.282 0.002679 **
B              3.688     11.865   0.311 0.763911
C            153.062     11.865  12.900 1.23e-06 ***
A:B          -12.437     11.865  -1.048 0.325168
A:C          -76.812     11.865  -6.474 0.000193 ***
B:C           -1.062     11.865  -0.090 0.930849
A:B:C          2.813     11.865   0.237 0.818586
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 47.46 on 8 degrees of freedom
Multiple R-squared: 0.9661, Adjusted R-squared: 0.9364
F-statistic: 32.56 on 7 and 8 DF, p-value: 2.896e-05
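Note that the p-values in summary(fit) match those in the ANOVA table above: with a single degree of freedom per term, each F statistic is just the square of the corresponding t statistic. For example, for A,

(-4.282)^2  # about 18.34, matching A's F value of 18.339 up to rounding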
Everything we’ve spoken about can be generalized to the case of arbitrary numbers of factors. For example, the table notation can be used to get the effect estimate for the interaction ABCD listed before Equation 6.22 in the book, and the sums of squares remain just the normalized squares of the contrasts.
The key observation is that the regression representation continues to hold even for large \(K\). In particular, the effect estimates (and their standard errors) are always twice the corresponding coefficients (and their standard errors) in the regression onto coded factors. For example, the code below manually estimates A’s effect and compares it with the coefficient in the regression,
mean(plasma$Rate[plasma$A == 1] - plasma$Rate[plasma$A == -1])
[1] -101.625
2 * coef(fit)["A"]
A
-101.625
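The same doubling applies to the standard errors. As a quick check against the variance formula derived above (using the coefficient standard error reported by summary(fit)),

2 * summary(fit)$coefficients["A", "Std. Error"]  # about 23.73, equal to sqrt(S^2 / (2^(K-2) * n))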
This means we can use general regression machinery, without having to manually substitute into formulas for different effects. This connection will be explored in more depth in the next few weeks.