Testing, uncertainty, and visualization in \(2^3\) designs.
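The ANOVA table below comes from the plasma etching example used throughout this section. As a sketch of how it can be reproduced (the formula and data names are taken from the summary(fit) output further down; treating A, B, and C as numeric ±1 codes is an assumption):

fit <- lm(Rate ~ A * B * C, data = plasma)  # assumes coded +/-1 factors, as in the summary output below
anova(fit)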
          Df Sum Sq Mean Sq F value   Pr(>F)
A          1  41311   41311  18.339 0.002679 **
B          1    218     218   0.097 0.763911
C          1 374850  374850 166.411 1.23e-06 ***
A:B        1   2475    2475   1.099 0.325168
A:C        1  94403   94403  41.909 0.000193 ***
B:C        1     18      18   0.008 0.930849
A:B:C      1    127     127   0.056 0.818586
Residuals  8  18020    2253
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Consider the regression view of this situation. The full model would be \[y_{i} = \beta_0 + \sum_{k = 1}^{3} \beta_k x_{ik} + \sum_{\text{pairs } k, k^\prime} \beta_{k k^\prime} x_{ik}x_{ik^{\prime}} + \beta_{123} x_{i1} x_{i2} x_{i3} + \epsilon_{i},\] though we will often be interested in whether a submodel (one that discards some of the main or interaction effects) can do as well.
To compare a full model with a submodel, we can use the relative sums of squares,
\[R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = 1 - \frac{SS_{E}}{SS_{\text{Total}}}\]
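For instance, as a sketch, we could compare the full model's \(R^2\) with that of a submodel keeping only the terms that were significant in the ANOVA table above (the particular choice of submodel here is an assumption for illustration):

fit_sub <- lm(Rate ~ A + C + A:C, data = plasma)  # hypothetical submodel keeping only the large effects
summary(fit)$r.squared      # 0.9661 for the full model
summary(fit_sub)$r.squared  # about 0.96, from the A, C, and A:C sums of squares in the ANOVA table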
Variance estimate for the effect of A
\[\begin{align*} \text{Var}\left(\text{Effect }A\right) &= \text{Var}\left(\frac{1}{2^{K - 1} n}\left(a - b - c + ab + \ldots\right)\right) \\ &= \left(\frac{1}{2^{K - 1} n}\right)^2\text{Var}\left(a - b - c + ab + \ldots\right) \end{align*}\]
But remember that \(a\) refers to the sum of all samples at corner \(a\), and likewise for \(b\), \(ab\), etc., \[\begin{align*} \text{Var}\left(a - b - c + ab + \ldots\right) &= \text{Var}\left(\sum_{\text{corner } a}y_{i} - \sum_{\text{corner }b}y_{i} - \sum_{\text{corner }c}y_{i} + \ldots\right) \\ &= \sum_{\text{corner } a}\text{Var}\left(y_i\right) + \sum_{\text{corner }b}\text{Var}\left(y_i\right) + \ldots \\ &= 2^K n \sigma^2 \end{align*}\]
so at the end of the day, substituting back in, we get \[\begin{align*} \text{Var}\left(\text{Effect }A\right) &= \left(\frac{1}{2^{K - 1} n}\right)^2 2^K n \sigma^2 = \frac{\sigma^2}{2^{K - 2}n} \end{align*}\]
and we can estimate \(\sigma^2\) by the mean squared error \(S^2\) (the residual sum of squares divided by its degrees of freedom). From these variance estimates, we can build confidence intervals that summarize all the effects.
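To make this concrete, here is a sketch using the fit from the output below; \(K = 3\) is given, and \(n = 2\) replicates per corner is an assumption consistent with the 8 residual degrees of freedom on 16 runs:

s2 <- 47.46^2                            # residual mean square, from the residual standard error below
K <- 3; n <- 2                           # n = 2 replicates per corner is assumed from the residual df
se_effect <- sqrt(s2 / (2^(K - 2) * n))  # about 23.7, twice the coefficient standard error below
effect_A <- 2 * coef(fit)["A"]           # effect estimate for A, about -101.6
effect_A + c(-1, 1) * qt(0.975, df = 8) * se_effect  # 95% confidence interval for the A effect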
summary(fit)
Call:
lm(formula = Rate ~ A * B * C, data = plasma)
Residuals:
   Min     1Q Median     3Q    Max
-65.50 -11.12   0.00  11.12  65.50

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  776.062     11.865  65.406 3.32e-12 ***
A            -50.812     11.865  -4.282 0.002679 **
B              3.688     11.865   0.311 0.763911
C            153.062     11.865  12.900 1.23e-06 ***
A:B          -12.437     11.865  -1.048 0.325168
A:C          -76.812     11.865  -6.474 0.000193 ***
B:C           -1.062     11.865  -0.090 0.930849
A:B:C          2.813     11.865   0.237 0.818586
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 47.46 on 8 degrees of freedom
Multiple R-squared: 0.9661, Adjusted R-squared: 0.9364
F-statistic: 32.56 on 7 and 8 DF, p-value: 2.896e-05
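Note that the p-values in summary(fit) match those in the ANOVA table above: with a single degree of freedom per term, each F statistic is just the square of the corresponding t statistic. For example, for A,

(-4.282)^2  # about 18.34, matching A's F value of 18.339 up to rounding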
Everything we’ve spoken about can be generalized to the case of arbitrary numbers of factors. For example, the table notation can be used to get the effect estimate for the interaction ABCD listed before Equation 6.22 in the book, and the sums of squares remain just the normalized squares of the contrasts.
The key observation is that the regression representation continues to hold even for large \(K\). In particular, the effect estimates (and their standard errors) are always twice the corresponding coefficients (and their standard errors) in the regression onto coded factors. For example, the code below manually estimates A’s effect and compares it with the coefficient in the regression,
mean(plasma$Rate[plasma$A == 1] - plasma$Rate[plasma$A == -1])
[1] -101.625
2 * coef(fit)["A"]
A
-101.625
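The same doubling applies to the standard errors. As a quick check against the variance formula derived above (using the coefficient standard error reported by summary(fit)),

2 * summary(fit)$coefficients["A", "Std. Error"]  # about 23.73, equal to sqrt(S^2 / (2^(K-2) * n))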
This means we can use general regression machinery, without having to manually substitute into formulas for different effects. This connection will be explored in more depth in the next few weeks.