Statistical Experimental Design: Two-Factor Factorial Design

Readings 5.3, Rmarkdown

Recall that in a factorial design, we want to study the influences of multiple experimental factors on some responses of interest. The simplest situation is when there are two factors of interest, and the associated designs are two-factor factorial designs.
We will assume a model of the form, \[ y_{ijk} = \mu + \tau_i + \beta_j + \left(\tau\beta\right)_{ij} + \epsilon_{ijk} \] where $\epsilon_{ijk} \sim \mathcal{N}\left(0, \sigma^2\right)$ independently. For identifiability, we assume the sum-to-zero constraints $\sum_{i} \tau_i = 0$ and $\sum_{j} \beta_{j} = 0$, together with $\sum_{i} \left(\tau\beta\right)_{ij} = 0$ for each $j$ and $\sum_{j} \left(\tau\beta\right)_{ij} = 0$ for each $i$. We’ll suppose $i$ ranges from $1, \dots, a$, $j$ ranges from $1, \dots, b$ and $k$ ranges from $1,\dots, n$.
What do the indices mean?
- $i$ indexes levels of the first factor (temperature).
- $j$ indexes levels of the second factor (material).
- $k$ indexes multiple replicates at a particular factor combination $ij$ (battery lifetimes, for a fixed temperature $\times$ material combination).
What do the Greek letters mean?
- $\mu$: An intercept term, representing a global mean.
- $\tau_i$: The effect of the $i^{th}$ level of factor 1 on the average response (effect of temperature setting $i$).
- $\beta_{j}$: The effect of the $j^{th}$ level of factor 2 on the average response (effect of material $j$).
- $\left(\tau\beta\right)_{ij}$: The interaction / synergy between the $i^{th}$ level of factor 1 and the $j^{th}$ level of factor 2 (e.g. lifetimes that are longer or shorter than the two main effects alone would predict, depending on the particular material and temperature settings).
- $\epsilon_{ijk}$: The random variation we’d observe if we drew many samples at a fixed setting of the two factors.

Caution: $\beta_j$ now indexes a treatment of interest. It is not just nuisance blocking variation.
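To make the indices and parameters concrete, here is a minimal simulation sketch of this model in R. Every numeric value (`mu`, `tau`, `beta`, `tau_beta`, `sigma`) and the data frame name `sim` are made up for illustration and are not taken from any real experiment; the effects are simply chosen so that they sum to zero.

```r
# Hypothetical simulation from the two-factor model; all values are made up.
set.seed(1)
a <- 3; b <- 3; n <- 4          # levels of factor 1, factor 2, replicates per cell

mu    <- 100                    # global mean
tau   <- c(30, 0, -30)          # factor 1 effects, sum to zero
beta  <- c(-10, 5, 5)           # factor 2 effects, sum to zero
# Interaction effects: every row and every column sums to zero
tau_beta <- matrix(c( 8, -4, -4,
                     -4,  2,  2,
                     -4,  2,  2), nrow = a, byrow = TRUE)
sigma <- 5                      # error standard deviation

# One row per (i, j, k) combination, with k varying fastest
sim <- expand.grid(k = 1:n, j = 1:b, i = 1:a)
sim$y <- mu + tau[sim$i] + beta[sim$j] + tau_beta[cbind(sim$i, sim$j)] +
  rnorm(nrow(sim), mean = 0, sd = sigma)

head(sim)
```

Each row of `sim` is one replicate $k$ at one factor combination $ij$, so the data frame has $abn$ rows.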

Figure 1: A graphical view of all the parameters in a factorial design.

Testing

Up until now, we’ve only asked whether one particular factor had an influence on the response, as the levels were changed. Now, we care about each of the factors, and each gets a hypothesis test.

(Hypothesis A) For the first factor, \[\begin{align} H_0 &: \tau_1 = \dots = \tau_a = 0 \\ H_{1} &: \tau_{i} \neq 0 \text{ for at least one } i \end{align}\]
(Hypothesis B) For the second factor, \[\begin{align} H_0 &: \beta_1 = \dots = \beta_{b} = 0 \\ H_{1} &: \beta_{j} \neq 0 \text{ for at least one } j \end{align}\]
(Hypothesis AB) Are there interaction effects? \[\begin{align} H_0 &: \left(\tau\beta\right)_{11}= \dots = \left(\tau\beta\right)_{ab} = 0 \\ H_1 &: \left(\tau\beta\right)_{ij} \neq 0 \text{ for at least one } ij \text{ combination} \end{align}\]

For each of these hypothesis tests, we’re going to need a test statistic that’s sensitive to departures from the null. We’re also going to need their reference distributions.
Miraculously, we have the identity, \[\begin{align*} \sum_{i j k}\left(y_{i j k}-\bar{y}\right)^{2}=& b n \sum_{i}\left(\bar{y}_{i . .}-\bar{y}\right)^{2}+\\ & a n \sum_{j}\left(\bar{y}_{.j.} -\bar{y}\right)^{2}+\\ & n \sum_{i, j}\left(\bar{y}_{i j.}-\bar{y}_{i. .}-\bar{y}_{. j.}+\bar{y}\right)^{2}+\\ & \sum_{i, j, k}\left(y_{i j k}-\bar{y}_{i j.}\right)^{2} \end{align*}\] which we’ll denote $SS_{\text{Total}}=SS_{A}+SS_{B}+SS_{AB}+SS_{E}$.
The punchline is that if $SS_{A}$ is large, we have evidence against hypothesis A, if $SS_{B}$ is large we have evidence against hypothesis B, and if $SS_{AB}$ is large, we have evidence against hypothesis AB.
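The identity can be checked numerically. The sketch below uses made-up responses on a balanced $3 \times 3$ layout with $n = 4$ (the data frame `d` and all of its values are assumptions for illustration), computes each sum of squares from the relevant averages, and compares the two sides.

```r
# Made-up balanced data: a x b cells with n replicates each
set.seed(2)
a <- 3; b <- 3; n <- 4
d <- expand.grid(k = 1:n, j = 1:b, i = 1:a)
d$y <- rnorm(nrow(d), mean = 100, sd = 20)

# Averages, each repeated so that it lines up with every observation
ybar    <- mean(d$y)              # grand mean
ybar_i  <- ave(d$y, d$i)          # mean over j and k for each i
ybar_j  <- ave(d$y, d$j)          # mean over i and k for each j
ybar_ij <- ave(d$y, d$i, d$j)     # cell mean for each (i, j)

# Summing over all observations supplies the bn, an, and n multipliers
ss_total <- sum((d$y - ybar)^2)
ss_a     <- sum((ybar_i - ybar)^2)
ss_b     <- sum((ybar_j - ybar)^2)
# Interaction: cell mean minus both marginal means, plus the grand mean
ss_ab    <- sum((ybar_ij - ybar_i - ybar_j + ybar)^2)
ss_e     <- sum((d$y - ybar_ij)^2)

all.equal(ss_total, ss_a + ss_b + ss_ab + ss_e)
```

The same four pieces should also appear in the `Sum Sq` column of `anova(lm(y ~ factor(i) * factor(j), data = d))`, since the design is balanced.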

Dividing $SS$ terms by their degrees of freedom (d.f.) gives $MS_{A}, MS_{B}, MS_{AB}$, and $MS_{E}$. The d.f. are derived from the number of levels for each factor, but a proof is beyond the scope of this course.

| term | d.f. |
| --- | --- |
| $SS_{A}$ | $a - 1$ |
| $SS_{B}$ | $b - 1$ |
| $SS_{AB}$ | $\left(a - 1\right)\left(b - 1\right)$ |
| $SS_{E}$ | $ab\left(n - 1\right)$ |

For each hypothesis, we get a corresponding $F$-statistic by dividing the relevant mean square by $MS_{E}$.
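Written out, and relying on the standard result for this fixed-effects model with normal errors, the three statistics and their reference distributions under the respective null hypotheses are as follows. The labels $F_{A}$, $F_{B}$, and $F_{AB}$ are names introduced here for convenience.

\[\begin{align*} F_{A} &= \frac{MS_{A}}{MS_{E}} \sim F_{a-1,\, ab\left(n-1\right)} \\ F_{B} &= \frac{MS_{B}}{MS_{E}} \sim F_{b-1,\, ab\left(n-1\right)} \\ F_{AB} &= \frac{MS_{AB}}{MS_{E}} \sim F_{\left(a-1\right)\left(b-1\right),\, ab\left(n-1\right)} \end{align*}\]

In each case the numerator and denominator degrees of freedom are the table entries for the tested term and for $SS_{E}$, and large values of the statistic are evidence against the corresponding null.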

Code Example

We’ll consider an experiment that studied the effect of material and temperature on battery lifetimes. These are the two factors of interest, and each is measured at 3 levels.

# A tibble: 36 × 3
   Material Temperature  Life
   <fct>    <fct>       <dbl>
 1 1        15            130
 2 1        15            155
 3 1        15             74
 4 1        15            180
 5 1        70             34
 6 1        70             40
 7 1        70             80
 8 1        70             75
 9 1        125            20
10 1        125            70
# … with 26 more rows

Before testing for effects, we can plot the influence of each factor, using facet_wrap to split the plot across the three material types. There seems to be a clear temperature effect, though the effect is not exactly the same across materials. This suggests that an interaction is present, though we will need a test to quantify the strength of this pattern.
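A sketch of how such a plot might be produced with ggplot2 follows. These notes print only the first ten rows of the data, so the remaining 26 `Life` values below are filled in from the commonly reproduced textbook version of this experiment and should be treated as an assumption. The object names `battery` and `p`, and the choice of `geom_point`, are likewise illustrative rather than the original plotting code.

```r
library(ggplot2)

# First ten Life values match the printed rows; the rest are an assumption
battery <- data.frame(
  Material    = factor(rep(1:3, each = 12)),
  Temperature = factor(rep(rep(c(15, 70, 125), each = 4), times = 3)),
  Life = c(130, 155,  74, 180,  34,  40,  80,  75,  20,  70,  82,  58,
           150, 188, 159, 126, 136, 122, 106, 115,  25,  70,  58,  45,
           138, 110, 168, 160, 174, 120, 150, 139,  96, 104,  82,  60)
)

# One panel per material, lifetime against temperature within each panel
p <- ggplot(battery, aes(x = Temperature, y = Life)) +
  geom_point() +
  facet_wrap(~ Material)
print(p)
```

If the three panels showed the same pattern shifted up or down, that would point to purely additive effects; panels with different shapes are what hint at an interaction.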

We can fit the two-factor model using lm. We use the syntax Material * Temperature to fit all main effects and interactions involving those variables; it is shorthand for the more explicit notation Material + Temperature + Material:Temperature (here, the : denotes an interaction). The Material and Temperature rows of the ANOVA table give main effects, the Material:Temperature row gives the interaction effect, and the Residuals row corresponds to the $SS_{E}$ and $MS_{E}$ terms.
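A minimal sketch of the fit, written so that it runs on its own. Only the first ten `Life` values are printed in these notes; the other 26 are an assumption, entered from the commonly reproduced textbook version of this dataset. `anova` is used here to print the ANOVA table, which for a balanced design like this one should agree with the table that follows.

```r
# First ten Life values match the printed rows; the rest are an assumption
battery <- data.frame(
  Material    = factor(rep(1:3, each = 12)),
  Temperature = factor(rep(rep(c(15, 70, 125), each = 4), times = 3)),
  Life = c(130, 155,  74, 180,  34,  40,  80,  75,  20,  70,  82,  58,
           150, 188, 159, 126, 136, 122, 106, 115,  25,  70,  58,  45,
           138, 110, 168, 160, 174, 120, 150, 139,  96, 104,  82,  60)
)

# Main effects plus interaction; same as Material + Temperature + Material:Temperature
fit <- lm(Life ~ Material * Temperature, data = battery)

anova(fit)     # one row per term, plus Residuals
summary(fit)   # coefficient-level view of the same fit
```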

                     Df Sum Sq Mean Sq F value   Pr(>F)    
Material              2  10684    5342   7.911  0.00198 ** 
Temperature           2  39119   19559  28.968 1.91e-07 ***
Material:Temperature  4   9614    2403   3.560  0.01861 *  
Residuals            27  18231     675                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
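To see how the columns of this table relate to one another, the sketch below recomputes the mean squares, $F$ values, and $p$-values from nothing but the `Df` and `Sum Sq` columns. The variable names are illustrative, and because the printed sums of squares are rounded, the results should match the table only up to rounding.

```r
# Values copied from the Df and Sum Sq columns of the table above
dof <- c(Material = 2, Temperature = 2, Interaction = 4, Residuals = 27)
ss  <- c(Material = 10684, Temperature = 39119, Interaction = 9614,
         Residuals = 18231)

ms     <- ss / dof                         # mean squares
f_stat <- ms[1:3] / ms[["Residuals"]]      # each effect's MS over MS_E
# Upper tail of the F distribution with (effect d.f., residual d.f.)
p_val  <- pf(f_stat, df1 = dof[1:3], df2 = dof[["Residuals"]],
             lower.tail = FALSE)

round(cbind(MS = ms[1:3], F = f_stat, p = p_val), 5)
```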

Based on the $p$-values, we conclude that there is a very strong temperature effect, a strong material effect, and a weaker but still significant interaction ($p \approx 0.019$). We can in fact develop more granular interpretations by looking at individual coefficients in the fitted model. The runs from the first material and lowest temperature are used as a reference point and absorbed into the intercept; every other coefficient is an expected deviation from that reference. For example, for material 1, runs at temperatures 70 and 125 both last, on average, roughly 77 life-units less than runs at temperature 15. The significant interaction between material 3 and temperature 70 suggests that this combination lives about 80 units longer than we would expect if no interaction were present.


Call:
lm(formula = Life ~ Material * Temperature, data = battery)

Residuals:
    Min      1Q  Median      3Q     Max 
-60.750 -14.625   1.375  17.938  45.250 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)                134.75      12.99  10.371 6.46e-11 ***
Material2                   21.00      18.37   1.143 0.263107    
Material3                    9.25      18.37   0.503 0.618747    
Temperature70              -77.50      18.37  -4.218 0.000248 ***
Temperature125             -77.25      18.37  -4.204 0.000257 ***
Material2:Temperature70     41.50      25.98   1.597 0.121886    
Material3:Temperature70     79.25      25.98   3.050 0.005083 ** 
Material2:Temperature125   -29.00      25.98  -1.116 0.274242    
Material3:Temperature125    18.75      25.98   0.722 0.476759    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25.98 on 27 degrees of freedom
Multiple R-squared:  0.7652,    Adjusted R-squared:  0.6956 
F-statistic:    11 on 8 and 27 DF,  p-value: 9.426e-07
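As a worked example of reading these coefficients, the sketch below adds up the estimates needed for material 3 at temperature 70, once ignoring the interaction term and once including it. The numbers are copied from the coefficient table; the vector name `est` and the two result names are illustrative.

```r
# Estimates copied from the coefficient table above
est <- c(intercept = 134.75, Material3 = 9.25, Temperature70 = -77.50,
         "Material3:Temperature70" = 79.25)

# Expected life for material 3 at temperature 70 if effects were purely additive
additive_only <- est[["intercept"]] + est[["Material3"]] + est[["Temperature70"]]

# Adding the interaction coefficient gives the fitted value for that cell
with_interaction <- additive_only + est[["Material3:Temperature70"]]

c(additive_only = additive_only, with_interaction = with_interaction)
```

This gives 66.5 without the interaction and 145.75 with it, which is the sense in which this combination lives about 80 units longer than the main effects alone would suggest.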