Interpreting Effects in 2^2 Designs

Drawing conclusions from parameter estimates.

Kris Sankaran
10-26-2021

Readings: 6.2, Rmarkdown

  1. We will often want to know whether estimated main or interaction effects are significant. We can use ANOVA, though we have to be cautious when \(n\) is small.

    • The numerators in the effect estimate expressions are called the contrasts for the corresponding effects.
    • For example, the contrast for the effect of \(A\) is \(ab + a - b - \left(1\right)\).
  2. The associated sum of squares is \[ \frac{1}{2^2 n}\left(\text{Contrast}\right)^2 \] for example, \[ SS_A = \frac{1}{2^2 n}\left[ab + a - b - (1)\right]^2 \]
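To make the contrast arithmetic concrete, here is a small sketch computing \(SS_A\) from treatment totals. The totals are taken from the yield dataset analyzed later in these notes (\(n = 3\) replicates per cell); the variable names are just for illustration.

```r
# Treatment totals from the yield data used later in these notes (n = 3)
n <- 3
t1  <- 80   # (1): both factors low  (28 + 25 + 27)
ta  <- 100  # a:   A high, B low     (36 + 32 + 32)
tb  <- 60   # b:   A low,  B high    (18 + 19 + 23)
tab <- 90   # ab:  both factors high (31 + 30 + 29)

contrast_a <- tab + ta - tb - t1  # ab + a - b - (1) = 50
ss_a <- contrast_a^2 / (2^2 * n)  # 50^2 / 12 = 208.33
```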

  3. The associated ANOVA decomposition is \[ SS_{\text{Total}} = SS_A + SS_B + SS_{AB} + SS_E \] and since the factors all have two levels, the df’s for the main and interaction terms are all 1. The df of \(SS_T\) is \(n 2^2 - 1\) (number of samples minus one). Taking the ratio between main and interaction \(SS\) terms and \(MS_{E} (= \frac{1}{4\left(n - 1\right)}SS_{E})\) gives the basis for \(F\)-statistics in the ANOVA table.
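Carrying this through for every term gives the whole decomposition by hand. The sketch below recomputes each sum of squares from the raw yield observations introduced later in these notes; it should reproduce the aov output shown further down.

```r
n <- 3
# Replicate observations for each treatment combination, from the yield data
y1  <- c(28, 25, 27)  # (1)
ya  <- c(36, 32, 32)  # a
yb  <- c(18, 19, 23)  # b
yab <- c(31, 30, 29)  # ab

ss_a  <- (sum(yab) + sum(ya) - sum(yb) - sum(y1))^2 / (2^2 * n)  # 208.33
ss_b  <- (sum(yab) + sum(yb) - sum(ya) - sum(y1))^2 / (2^2 * n)  # 75
ss_ab <- (sum(yab) + sum(y1) - sum(ya) - sum(yb))^2 / (2^2 * n)  # 8.33

y    <- c(y1, ya, yb, yab)
ss_t <- sum((y - mean(y))^2)        # total SS, df = 12 - 1 = 11
ss_e <- ss_t - ss_a - ss_b - ss_ab  # error SS by subtraction
ms_e <- ss_e / (2^2 * (n - 1))      # error df = 4(n - 1) = 8
f_a  <- ss_a / ms_e                 # F statistic for factor A
```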

Regression View

  1. Another way of summarizing the \(2^2\) model is to write a regression, \[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \] where the \(x_k\)’s take on one of two values, depending on whether or not factor \(k\) is active.
    • We’ve only included main effects. An interaction would be added via \(\beta_{12} x_{1}x_{2}\)
    • If the factors are binary (on vs. off), we can use a binary encoding.

  2. What if our factors are actually continuous?
    • We could code the variables, converting the low and high levels to \(\{-1, 1\}\).
    • The model then applies to all values in the interval \([-1, 1]\), not just the corners.
    • An added benefit is that this coding (a) makes the scales of the factors comparable and (b) induces orthogonality (in a balanced design, the coded columns are exactly uncorrelated)
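The orthogonality claim is easy to verify: in a balanced \(2^2\) design with \(\pm 1\) coding, the columns for \(A\), \(B\), and their interaction all have zero cross-products, so each coefficient can be estimated without interference from the others. A minimal sketch:

```r
# Coded design columns for one replicate of a 2^2 design
cA <- c(-1,  1, -1, 1)
cB <- c(-1, -1,  1, 1)

sum(cA * cB)         # 0: the main-effect columns are orthogonal
sum(cA * (cA * cB))  # 0: ...and each is orthogonal to the interaction column
sum(cB * (cA * cB))  # 0
```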
  3. We will illustrate these ideas on a yield dataset. There are 12 samples in total, three replicates at each corner of the square.
library(readr)
yield <- read_table2("https://uwmadison.box.com/shared/static/bfwd6us8xsii4uelzftg1azu2f7z77mk.txt")
yield
# A tibble: 12 × 4
   A     B     Rep   Yield
   <chr> <chr> <chr> <dbl>
 1 -     -     I        28
 2 +     -     I        36
 3 -     +     I        18
 4 +     +     I        31
 5 -     -     II       25
 6 +     -     II       32
 7 -     +     II       19
 8 +     +     II       30
 9 -     -     III      27
10 +     -     III      32
11 -     +     III      23
12 +     +     III      29
  4. The same lm + aov approach used in general factorial designs applies to \(2^{K}\) designs. The only difficulty is that the data were originally coded as + and -, and we need -1’s and 1’s. For this, we’ve prepared a small function called coded and used it to create new columns, cA and cB, which convert those symbols into their numeric equivalents. From the ANOVA table below, we can see that both factors have strong effects, though A’s is stronger than B’s. We can also see that there is no detectable interaction effect, which is consistent with the plot from the previous notes.
library(dplyr)
coded <- function(x) ifelse(x == '-', -1, 1)
yield <- yield %>%
  mutate(cA = coded(A), cB = coded(B))
fit <- lm(Yield ~ cA * cB, data = yield)
summary(aov(fit))
            Df Sum Sq Mean Sq F value   Pr(>F)    
cA           1 208.33  208.33  53.191 8.44e-05 ***
cB           1  75.00   75.00  19.149  0.00236 ** 
cA:cB        1   8.33    8.33   2.128  0.18278    
Residuals    8  31.33    3.92                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  5. The ANOVA table only describes whether a factor has any relationship with the response – it doesn’t say in which direction, or by how much, the response changes when a factor is moved from its low to its high level. For this, we need the full model summary. In the output below, the Intercept estimate (27.5) is the grand mean: the predicted response when both coded factors equal 0, halfway between the low and high levels. The cA and cB estimates (4.17 and -2.5) give the change in response per unit increase in the coded factors; since moving a factor from low (-1) to high (+1) is a two-unit change, each coefficient is half the corresponding effect estimate. The interaction coefficient measures the departure from additivity: when both factors are at their high levels, the fitted response is 0.833 larger than the main effects alone would predict.
summary(fit)

Call:
lm(formula = Yield ~ cA * cB, data = yield)

Residuals:
   Min     1Q Median     3Q    Max 
-2.000 -1.333 -0.500  1.083  3.000 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  27.5000     0.5713  48.135 3.84e-11 ***
cA            4.1667     0.5713   7.293 8.44e-05 ***
cB           -2.5000     0.5713  -4.376  0.00236 ** 
cA:cB         0.8333     0.5713   1.459  0.18278    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.979 on 8 degrees of freedom
Multiple R-squared:  0.903, Adjusted R-squared:  0.8666 
F-statistic: 24.82 on 3 and 8 DF,  p-value: 0.0002093
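As a consistency check on the coefficient scale, with \(\pm 1\) coding each regression coefficient is half the classical effect estimate, because the effect corresponds to a two-unit move in the coded variable. A sketch using the treatment totals from the yield data:

```r
n <- 3
t1 <- 80; ta <- 100; tb <- 60; tab <- 90  # treatment totals from the yield data

effect_a <- (tab + ta - tb - t1) / (2 * n)  # classical effect estimate: 8.33
beta_a   <- effect_a / 2                    # 4.17, matching the cA coefficient
```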
  6. We can use this fit to build a response surface as well. This is a top-down plot of the fitted yield – darker colors correspond to higher yields. If the interaction estimate were 0, the contour lines would be exactly parallel to one another.
library(rsm)
image(fit, ~ cA + cB)
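Since the coded model is defined over all of \([-1, 1]^2\), we can also evaluate the fitted surface at interior points with predict. The snippet below rebuilds the coded data inline so that it stands alone; yield_coded is just an illustrative name for the same 12 observations.

```r
# Rebuild the coded yield data inline (same 12 observations as above)
yield_coded <- data.frame(
  cA    = rep(c(-1, 1, -1, 1), times = 3),
  cB    = rep(c(-1, -1, 1, 1), times = 3),
  Yield = c(28, 36, 18, 31, 25, 32, 19, 30, 27, 32, 23, 29)
)
fit <- lm(Yield ~ cA * cB, data = yield_coded)

# Predicted yield at an interior point of the coded square
predict(fit, newdata = data.frame(cA = 0.5, cB = -0.5))  # 30.625
```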