Modeling and testing with two factors of interest
Recall that in a factorial design, we want to study how multiple experimental factors influence a response of interest. The simplest situation is when there are two factors of interest, and the associated designs are called two-factor factorial designs.
We will assume a model of the form \[ y_{ijk} = \mu + \tau_i + \beta_j + \left(\tau\beta\right)_{ij} + \epsilon_{ijk} \] where the errors \(\epsilon_{ijk} \sim \mathcal{N}\left(0, \sigma^2\right)\) are independent. For identifiability, we assume the effects sum to zero: \(\sum_{i} \tau_i = \sum_{j} \beta_{j} = 0\), \(\sum_{i} \left(\tau\beta\right)_{ij} = 0\) for every \(j\), and \(\sum_{j} \left(\tau\beta\right)_{ij} = 0\) for every \(i\). We’ll suppose \(i\) ranges over \(1, \dots, a\), \(j\) over \(1, \dots, b\), and \(k\) over \(1,\dots, n\).
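The sum-to-zero constraints can be made concrete with a small example. The Python sketch below assumes NumPy is available. It recovers \(\mu\), \(\tau_i\), \(\beta_j\) and \(\left(\tau\beta\right)_{ij}\) from a table of cell means \(\mu + \tau_i + \beta_j + \left(\tau\beta\right)_{ij}\). The helper name `effects_from_cell_means` and the 2 x 3 table of means are hypothetical, chosen only for illustration.

```python
import numpy as np

def effects_from_cell_means(cell_means):
    """Split an a x b table of cell means into mu, tau, beta and (tau beta)
    under the sum-to-zero constraints."""
    m = np.asarray(cell_means, dtype=float)
    mu = m.mean()                  # overall mean
    tau = m.mean(axis=1) - mu      # factor A effects (sum to zero)
    beta = m.mean(axis=0) - mu     # factor B effects (sum to zero)
    # Interaction: whatever the additive part mu + tau_i + beta_j misses
    tau_beta = m - mu - tau[:, None] - beta[None, :]
    return mu, tau, beta, tau_beta

# Hypothetical cell means with a = 2 levels of A and b = 3 levels of B
means = np.array([[10.0, 12.0, 14.0],
                  [11.0, 15.0, 19.0]])
mu, tau, beta, tau_beta = effects_from_cell_means(means)
print(mu, tau, beta)
print(tau_beta)
```

For these numbers, \(\mu = 13.5\), \(\tau = (-1.5, 1.5)\), \(\beta = (-3, 0, 3)\), and the interaction table is \(\begin{pmatrix} 1 & 0 & -1 \\ -1 & 0 & 1 \end{pmatrix}\). Every row and every column of that table sums to zero, as the constraints require.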
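What do the indices mean? \(i\) indexes the level of the first factor (A), \(j\) indexes the level of the second factor (B), and \(k\) indexes the replicate within cell \((i, j)\). Each of the \(ab\) treatment combinations is therefore observed \(n\) times.
What do the Greek letters mean? \(\mu\) is the overall mean, \(\tau_i\) is the main effect of level \(i\) of factor A, \(\beta_j\) is the main effect of level \(j\) of factor B, and \(\left(\tau\beta\right)_{ij}\) is the interaction effect specific to the combination \((i, j)\). \(\sigma^2\) is the error variance.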
(Hypothesis A) For the first factor, \[\begin{align} H_0 &: \tau_1 = \dots = \tau_a = 0 \\ H_{1} &: \tau_{i} \neq 0 \text{ for at least one } i \end{align}\]
(Hypothesis B) For the second factor, \[\begin{align} H_0 &: \beta_1 = \dots = \beta_{b} = 0 \\ H_{1} &: \beta_{j} \neq 0 \text{ for at least one } j \end{align}\]
(Hypothesis AB) Are there interaction effects? \[\begin{align} H_0 &: \left(\tau\beta\right)_{11} = \dots = \left(\tau\beta\right)_{ab} = 0 \\ H_1 &: \left(\tau\beta\right)_{ij} \neq 0 \text{ for at least one } (i, j) \text{ combination} \end{align}\]
For each of these hypothesis tests, we’ll need a test statistic that is sensitive to departures from the null, along with its reference distribution when the null is true.
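Miraculously, we have the identity \[\begin{align*} \sum_{i, j, k}\left(y_{ijk}-\bar{y}\right)^{2}=& bn \sum_{i}\left(\bar{y}_{i..}-\bar{y}\right)^{2}+\\ & an \sum_{j}\left(\bar{y}_{.j.}-\bar{y}\right)^{2}+\\ & n \sum_{i, j}\left(\bar{y}_{ij.}-\bar{y}_{i..}-\bar{y}_{.j.}+\bar{y}\right)^{2}+\\ & \sum_{i, j, k}\left(y_{ijk}-\bar{y}_{ij.}\right)^{2} \end{align*}\] Here a dot means the index has been averaged over. For example, \(\bar{y}_{i..}\) is the mean of all observations at level \(i\) of factor A, \(\bar{y}_{ij.}\) is the mean of cell \((i, j)\), and \(\bar{y}\) is the grand mean. We’ll denote this decomposition \(SS_{\text{Total}}=SS_{A}+SS_{B}+SS_{AB}+SS_{E}\).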
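The identity is easy to check numerically. The sketch below is in Python with NumPy. The helper name `two_way_ss` and the simulated data are illustrative assumptions. The identity relies on the design being balanced, meaning the same \(n\) in every cell, which the array shape `(a, b, n)` enforces.

```python
import numpy as np

def two_way_ss(y):
    """Sums of squares for a balanced two-factor design.

    y has shape (a, b, n): y[i, j, k] is replicate k at level i of
    factor A and level j of factor B."""
    a, b, n = y.shape
    grand = y.mean()                   # grand mean
    row = y.mean(axis=(1, 2))          # y-bar_{i..}
    col = y.mean(axis=(0, 2))          # y-bar_{.j.}
    cell = y.mean(axis=2)              # y-bar_{ij.}

    ss_a = b * n * np.sum((row - grand) ** 2)
    ss_b = a * n * np.sum((col - grand) ** 2)
    ss_ab = n * np.sum((cell - row[:, None] - col[None, :] + grand) ** 2)
    ss_e = np.sum((y - cell[:, :, None]) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    return ss_total, ss_a, ss_b, ss_ab, ss_e

# Simulated data with a = 3, b = 4, n = 5 (the values themselves are arbitrary)
rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=(3, 4, 5))
ss_total, ss_a, ss_b, ss_ab, ss_e = two_way_ss(y)
print(np.isclose(ss_total, ss_a + ss_b + ss_ab + ss_e))  # True
```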
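The punchline is that if \(SS_{A}\) is large, we have evidence against hypothesis A. If \(SS_{B}\) is large, we have evidence against hypothesis B. If \(SS_{AB}\) is large, we have evidence against hypothesis AB. "Large" is judged relative to the error term. Each sum of squares is divided by its degrees of freedom to give a mean square (e.g. \(MS_{A} = SS_{A}/(a-1)\) and \(MS_{E} = SS_{E}/(ab(n-1))\)). Under the corresponding null hypothesis, the ratio \(MS_{A}/MS_{E}\) (and likewise for B and AB) follows an F distribution whose degrees of freedom are those of the two mean squares involved.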
| Term | d.f. |
|---|---|
| \(SS_{A}\) | \(a - 1\) |
| \(SS_{B}\) | \(b - 1\) |
| \(SS_{AB}\) | \(\left(a - 1\right)\left(b - 1\right)\) |
| \(SS_{E}\) | \(ab\left(n - 1\right)\) |
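These degrees of freedom determine how large a ratio \(F = MS_{\text{term}}/MS_{E}\) must be before we reject. The Python sketch below assumes SciPy is installed and computes the degrees of freedom and the 5% critical values. `anova_df` and `critical_values` are hypothetical helper names.

```python
from scipy import stats

def anova_df(a, b, n):
    """Degrees of freedom for each term in a balanced a x b design with n replicates per cell."""
    return {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}

def critical_values(a, b, n, alpha=0.05):
    """Level-alpha rejection thresholds for F = MS_term / MS_E.

    Under the null for a term, the ratio follows F(df_term, df_E),
    so H0 is rejected when it exceeds the (1 - alpha) quantile."""
    df = anova_df(a, b, n)
    return {term: stats.f.ppf(1 - alpha, df[term], df["E"]) for term in ("A", "B", "AB")}

# Three levels of each factor, four replicates per cell
print(anova_df(3, 3, 4))  # {'A': 2, 'B': 2, 'AB': 4, 'E': 27}
for term, crit in critical_values(3, 3, 4).items():
    print(term, round(float(crit), 2))
```

In practice, software reports a p-value (the upper-tail probability of the observed F) rather than comparing against a fixed threshold. The two approaches lead to the same decision at a given level.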
# A tibble: 36 × 3
Material Temperature Life
<fct> <fct> <dbl>
1 1 15 130
2 1 15 155
3 1 15 74
4 1 15 180
5 1 70 34
6 1 70 40
7 1 70 80
8 1 70 75
9 1 125 20
10 1 125 70
# … with 26 more rows
We use `facet_wrap` to split the plot across the three material types. There seems to be a clear temperature effect, though the effects are not exactly the same across materials. This suggests that an interaction is present, though we will need a test to quantify the strength of this pattern.
To fit the two-factor model, we use `lm`. The syntax `Material * Temperature` fits all main effects and interactions involving those variables. It is shorthand for the more explicit notation `Material + Temperature + Material:Temperature`, where `:` denotes an interaction. In the ANOVA table, the `Material` and `Temperature` rows give the main effects, the `Material:Temperature` row gives the interaction effect, and the `Residuals` row corresponds to the \(SS_{E}\) and \(MS_{E}\) terms.
                     Df Sum Sq Mean Sq F value   Pr(>F)
Material              2  10684    5342   7.911  0.00198 **
Temperature           2  39119   19559  28.968 1.91e-07 ***
Material:Temperature  4   9614    2403   3.560  0.01861 *
Residuals            27  18231     675
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
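As a cross-check on the ANOVA table above, the Python sketch below recomputes it directly from the sums-of-squares identity, assuming NumPy and SciPy are available. The 36 observations are typed in by hand and are assumed to be the full `battery` data (the battery-life example from Montgomery's Design and Analysis of Experiments). Only the first ten rows are printed above, and those match.

```python
import numpy as np
from scipy import stats

# life[i, j, k]: material i (1, 2, 3), temperature j (15, 70, 125), replicate k.
# Assumed to be the full battery data set.
life = np.array([
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
], dtype=float)

a, b, n = life.shape
grand = life.mean()
row = life.mean(axis=(1, 2))   # material means
col = life.mean(axis=(0, 2))   # temperature means
cell = life.mean(axis=2)       # material x temperature cell means

ss = {
    "Material": b * n * np.sum((row - grand) ** 2),
    "Temperature": a * n * np.sum((col - grand) ** 2),
    "Material:Temperature": n * np.sum((cell - row[:, None] - col[None, :] + grand) ** 2),
}
df = {"Material": a - 1, "Temperature": b - 1, "Material:Temperature": (a - 1) * (b - 1)}
ss_e = np.sum((life - cell[:, :, None]) ** 2)
df_e = a * b * (n - 1)
ms_e = ss_e / df_e

for term in ss:
    ms = ss[term] / df[term]
    f_stat = ms / ms_e
    p = stats.f.sf(f_stat, df[term], df_e)  # upper-tail probability under H0
    print(f"{term:22s}{df[term]:3d}{ss[term]:10.0f}{ms:9.0f}{f_stat:8.3f}{p:11.3g}")
print(f"{'Residuals':22s}{df_e:3d}{ss_e:10.0f}{ms_e:9.0f}")
```

Up to rounding, the printed sums of squares, F values and p-values should agree with the `lm`-based ANOVA table.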
Call:
lm(formula = Life ~ Material * Temperature, data = battery)
Residuals:
Min 1Q Median 3Q Max
-60.750 -14.625 1.375 17.938 45.250
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 134.75 12.99 10.371 6.46e-11 ***
Material2 21.00 18.37 1.143 0.263107
Material3 9.25 18.37 0.503 0.618747
Temperature70 -77.50 18.37 -4.218 0.000248 ***
Temperature125 -77.25 18.37 -4.204 0.000257 ***
Material2:Temperature70 41.50 25.98 1.597 0.121886
Material3:Temperature70 79.25 25.98 3.050 0.005083 **
Material2:Temperature125 -29.00 25.98 -1.116 0.274242
Material3:Temperature125 18.75 25.98 0.722 0.476759
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.98 on 27 degrees of freedom
Multiple R-squared: 0.7652, Adjusted R-squared: 0.6956
F-statistic: 11 on 8 and 27 DF, p-value: 9.426e-07
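The coefficients in this `summary` output use R's default treatment contrasts, not the sum-to-zero constraints from the model above:

- The intercept is the mean of the baseline cell (Material 1 at temperature 15).
- Each main-effect coefficient is a shift from that baseline.
- Each interaction coefficient is the extra shift for that cell beyond the two main-effect shifts.

The Python sketch below assumes NumPy is available. It mimics how `Material * Temperature` expands: an intercept, dummy columns for each non-baseline level, and products of those dummies for the interaction. It then solves least squares. The data array is assumed to be the full `battery` data set (only its first ten rows are printed above).

```python
import numpy as np

# Assumed full battery data: life[i, j, k] for material i, temperature j, replicate k.
life = np.array([
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
], dtype=float)

rows, y = [], []
for i, j, k in np.ndindex(life.shape):
    m = [float(i == 1), float(i == 2)]         # Material2, Material3 (Material 1 is baseline)
    t = [float(j == 1), float(j == 2)]         # Temperature70, Temperature125 (15 is baseline)
    inter = [mi * tj for tj in t for mi in m]  # interaction columns: products of dummies
    rows.append([1.0] + m + t + inter)
    y.append(life[i, j, k])

X = np.array(rows)
coef, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
names = (["(Intercept)", "Material2", "Material3", "Temperature70", "Temperature125"]
         + [f"Material{m}:Temperature{t}" for t in ("70", "125") for m in ("2", "3")])
for name, c in zip(names, coef):
    print(f"{name:26s}{c:8.2f}")
```

Because the model has one parameter per cell, the fitted values are the nine cell means. The estimates should match the `Estimate` column above.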