Multiple comparisons, model checking, and other post-estimation checks.
Subtlety: If the factor under investigation interacts with the other one, its effects will depend on that other factor.
Solution: Fix a level for the other factor, and study the influence of levels for the factor of interest.
Example: Fix temperature, and use Tukey’s HSD to study pairwise difference between materials, for that fixed temperature.
Temperature = 15:
contrast estimate SE df t.ratio p.value
1 - 2 -21.00 18.4 27 -1.143 0.4967
1 - 3 -9.25 18.4 27 -0.503 0.8703
2 - 3 11.75 18.4 27 0.639 0.7998
Temperature = 70:
contrast estimate SE df t.ratio p.value
1 - 2 -62.50 18.4 27 -3.402 0.0058
1 - 3 -88.50 18.4 27 -4.817 0.0001
2 - 3 -26.00 18.4 27 -1.415 0.3475
Temperature = 125:
contrast estimate SE df t.ratio p.value
1 - 2 8.00 18.4 27 0.435 0.9012
1 - 3 -28.00 18.4 27 -1.524 0.2959
2 - 3 -36.00 18.4 27 -1.959 0.1419
P value adjustment: tukey method for comparing a family of 3 estimates
ggplot(battery) +
geom_point(aes(y_hat, resid))
How should you choose how many replicates to have at each combination of the two factors?
For example, in the block below, we simulate data from where the true \(\tau_i\)’s and \(\beta_j\)’s are \(\left(0, 1, 2\right)\) and \(\left(0, 1\right)\), respectively. We increase the number of samples from 2 to 10 and draw 50 replicates each. Fitting a linear model to each dataset allows us to see how the sample size influences our final \(p\)-values.
tau <- c(0, 1, 2)
beta <- c(0, 1)
ns <- seq(2, 10, by = 2)
b <- 1 # a counter variable
sims <- list()
for (k in seq_along(ns)) {
for (sim_rep in seq_len(50)) {
for (i in seq_along(tau)) {
for (j in seq_along(beta)) {
sims[[b]] <- data.frame(
"factor_1" = i,
"factor_2" = j,
"sample_size" = ns[k],
"replicate" = seq_len(ns[k]),
"value" = rnorm(ns[k], tau[i] + beta[j]),
"sim_rep" = sim_rep
)
b <- b + 1
}
}
}
}
sims <- bind_rows(sims)
ggplot(sims %>% filter(sim_rep == 1)) + # visualize only the first replicate
geom_point(aes(factor_1, value)) +
facet_grid(sample_size ~ factor_2)
lm
to each is somewhat complicated – the important thing to takeaway here is that we can use a simulation to see exactly how the \(p\)-values decrease as a function of the sample size (and hence allow us to draw stronger conclusions).library(broom)
library(purrr)
library(tidyr)
power <- sims %>% # get p.values and estimates from every single run
split(list(.$sample_size, .$sim_rep)) %>%
map_dfr(~ tidy(lm(value ~ factor_1 + factor_2, data = .)), .id = "group") %>%
separate(group, c("sample_size", "sim_rep"), convert = TRUE)
ggplot(power) +
geom_jitter(aes(sample_size, p.value)) +
geom_hline(yintercept = 0.05, col = "red") +
facet_wrap(~ term)
If we only have one replicate per cell, then we can’t estimate an interaction effect. If we tried, we’d be able to perfectly fit the data, so there would be no way to estimate \(\sigma^2\).
As a general recipe, to check for interactions, we can perform residual analysis on the main effects model. Alternatively, we could use Tukey’s additivity test, which checks whether a multiplicative form of the interaction is present, i.e., does \[y_{ijk} = \mu + \tau_i + \beta_j + \gamma \tau_i \beta_j\] fit significantly better than the main effects model?