Three case studies in using \(2^K\) designs.
We recode the factors A to D so that they are \(\pm 1\)’s instead of the characters + and -.
ggplot(drill) +
  geom_point(aes(B, rate, col = as.factor(C))) +
  scale_color_brewer(palette = "Set2") +
  facet_grid(A ~ D)
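The recoding of A to D from the characters + and - to \(\pm 1\) can be sketched with a small helper. The name code matches the function passed to mutate_at in this section, but its definition is not shown here, so the body below is an assumption rather than the original implementation.

```r
# Hypothetical stand-in for the `code` helper used with mutate_at(vars(A:D), code).
# Assumption: factor levels are stored as the strings "-" and "+".
code <- function(x) ifelse(x == "-", -1, 1)

# Example on a small character vector of levels
levels_chr <- c("-", "+", "+", "-")
coded <- code(levels_chr)
print(coded)
```

With numeric \(\pm 1\) columns, the coefficients from lm can be read directly as half-effects, which is why the recoding is done before fitting.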
Lessons:
* Examining residuals can motivate useful transformations of the data.
windows <- read_csv("https://uwmadison.box.com/shared/static/62phufkeprheu9gu35mu1e75x6rc2shv.csv") %>%
  mutate_at(vars(A:D), code)
ggplot(windows) +
  geom_point(aes(A, defects, col = as.factor(C))) +
  scale_color_brewer(palette = "Set2") +
  facet_grid(B ~ D)
fit <- lm(defects ~ A + C, data = windows)
windows$residual <- resid(fit)
ggplot(windows) +
  geom_point(aes(x = B, y = residual))
Lessons:
* In practice, it’s often useful to take variability into account, rather than just the average response.
* A residual plot can be directly actionable.
The code below builds the design matrix (M), computes the standard deviation of the residuals at the high and low level of each effect, and makes a Daniel plot from the log ratios of these dispersions.
M <- model.matrix(defects ~ A * B * C * D, data = windows)[, -1] # remove intercept
S <- list()
for (k in seq_len(ncol(M))) {
  S[[k]] <- data.frame(
    "effect" = colnames(M)[k],
    "sd_plus" = sd(windows$residual[M[, k] == 1]),
    "sd_minus" = sd(windows$residual[M[, k] == -1])
  )
}
S <- do.call(rbind, S)
s_ratio <- setNames(log(S$sd_plus / S$sd_minus), S$effect)
daniel_plot(s_ratio)
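In symbols, the quantity this loop computes for the effect in column \(k\) of M can be written as follows. The notation \(F_k\), \(s_{k+}\), \(s_{k-}\), and \(e_i\) is introduced here for illustration and does not appear in the code.

\[
F_k = \log\left(\frac{s_{k+}}{s_{k-}}\right), \qquad s_{k\pm} = \text{sd}\left\{e_i : M_{ik} = \pm 1\right\}
\]

Here \(e_i\) is the residual of run \(i\) from the fitted model. A value of \(F_k\) far from zero suggests that the spread of the residuals differs between the two levels of effect \(k\), which is the pattern the Daniel plot is used to pick out.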
library(tidyr)
oxide <- read_csv("https://uwmadison.box.com/shared/static/vyk6uoe3zbnonv4n6jcusbrocmt4cvru.csv") %>%
  pivot_longer(starts_with("wafer"), names_to = "variable")
ggplot(oxide) +
  geom_point(aes(A, value, col = as.factor(B))) +
  scale_color_brewer(palette = "Set2") +
  facet_grid(C ~ D)
oxide_collapse <- oxide %>%
  group_by(A, B, C, D) %>% # isolate independent configurations
  summarise(mean = mean(value), var = var(value)) # take average and var. across groups
oxide_collapse
# A tibble: 16 × 6
# Groups:   A, B, C [8]
       A     B     C     D  mean    var
   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
 1    -1    -1    -1    -1   378  2
 2    -1    -1    -1     1   380 12
 3    -1    -1     1    -1   372  6.67
 4    -1    -1     1     1   378  1.33
 5    -1     1    -1    -1   381  3.33
 6    -1     1    -1     1   371  0.667
 7    -1     1     1    -1   385  0.667
 8    -1     1     1     1   376  0.667
 9     1    -1    -1    -1   416  0.667
10     1    -1    -1     1   415 14.7
11     1    -1     1    -1   390  2
12     1    -1     1     1   392 34
13     1     1    -1    -1   448  3.33
14     1     1    -1     1   446  6
15     1     1     1    -1   430  8.67
16     1     1     1     1   429  1.33
We fit the model with the formula mean ~ A * (B + C). This formula gets expanded into A * B + A * C, and each of the products expands into main effects and interactions (e.g., A * B = A + B + A:B). This captures all the effects we mentioned in the previous point.
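The expansion described here can be checked directly in base R without fitting anything. The sketch below uses terms() from the stats package to list the terms implied by the formula; the variable names f and expanded are illustrative.

```r
# List the model terms that R derives from the compact formula.
f <- mean ~ A * (B + C)
expanded <- attr(terms(f), "term.labels")
print(expanded)
# Main effects A, B, C plus the two interactions A:B and A:C
```

Note that B:C is absent: the parentheses only cross A with each of B and C, not B with C.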
Call:
lm(formula = mean ~ A * (B + C), data = oxide_collapse)

Residuals:
   Min     1Q Median     3Q    Max
-7.125 -2.469  1.000  2.250  6.625

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  399.188      1.049 380.475  < 2e-16 ***
A             21.562      1.049  20.552 1.64e-09 ***
B              9.063      1.049   8.638 5.98e-06 ***
C             -5.187      1.049  -4.944 0.000583 ***
A:B            8.437      1.049   8.042 1.12e-05 ***
A:C           -5.313      1.049  -5.063 0.000489 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.197 on 10 degrees of freedom
Multiple R-squared: 0.9839, Adjusted R-squared: 0.9759
F-statistic: 122.3 on 5 and 10 DF, p-value: 1.237e-08
The image method in the rsm (“response surface methodology”) package directly outputs this. For example, in the first plot, we see that the response is highest when A and B both equal 1. The curvature in this surface also suggests that there is an interaction between these two terms.
(Intercept)           A           B           D         B:D
   6.125000    2.708333   -3.041667    2.708333   -3.625000
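The coefficient names printed here (intercept, A, B, D, B:D) are consistent with a linear model of the per-configuration variance. The sketch below is an assumption about how such coefficients could be obtained: it supposes the model is var ~ A + B * D and retypes the var column from the collapsed table shown in this section. Those values are rounded to three significant digits, so the fit only approximately reproduces the printed numbers. The names design and var_fit are illustrative.

```r
# Rebuild the 2^4 design in the same row order as the collapsed table
# (A varies slowest, D fastest).
design <- expand.grid(D = c(-1, 1), C = c(-1, 1), B = c(-1, 1), A = c(-1, 1))

# Variances retyped from the printed table (rounded values).
design$var <- c(2, 12, 6.67, 1.33, 3.33, 0.667, 0.667, 0.667,
                0.667, 14.7, 2, 34, 3.33, 6, 8.67, 1.33)

# Assumed dispersion model: main effects of A, B, D and the B:D interaction.
var_fit <- lm(var ~ A + B * D, data = design)
var_coef <- coef(var_fit)
print(round(var_coef, 2))
```

Under this reading, the variance is larger at the high levels of A and D and smaller at the high level of B, with a negative B:D interaction.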
Lessons:
* Don’t treat repeated measures as replicates, or we risk many false positive effects.
* It can be useful to model the variance of the response, rather than simply the mean.
Call:
lm(formula = value ~ A * B * C * D, data = oxide)

Residuals:
Min  1Q Median  3Q Max
 -6  -1      0   1   8

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)
(Intercept) 399.1875     0.3094 1290.369  < 2e-16 ***
A            21.5625     0.3094   69.701  < 2e-16 ***
B             9.0625     0.3094   29.294  < 2e-16 ***
C            -5.1875     0.3094  -16.769  < 2e-16 ***
D            -0.8125     0.3094   -2.626   0.0115 *
A:B           8.4375     0.3094   27.274  < 2e-16 ***
A:C          -5.3125     0.3094  -17.173  < 2e-16 ***
B:C           1.9375     0.3094    6.263 9.93e-08 ***
A:D           0.5625     0.3094    1.818   0.0753 .
B:D          -1.9375     0.3094   -6.263 9.93e-08 ***
C:D           0.5625     0.3094    1.818   0.0753 .
A:B:C        -0.1875     0.3094   -0.606   0.5473
A:B:D         1.4375     0.3094    4.647 2.65e-05 ***
A:C:D        -0.0625     0.3094   -0.202   0.8407
B:C:D        -0.3125     0.3094   -1.010   0.3175
A:B:C:D       0.0625     0.3094    0.202   0.8407
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.475 on 48 degrees of freedom
Multiple R-squared: 0.9933, Adjusted R-squared: 0.9912
F-statistic: 476.8 on 15 and 48 DF, p-value: < 2.2e-16
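One way to see why the fit on all 64 wafer measurements reports so many more significant effects: in a balanced design with \(\pm 1\) coding, each coefficient’s standard error is the residual standard error divided by the square root of the number of rows. The sketch below only redoes that arithmetic with the numbers printed in the two summaries in this section; the variable names are illustrative.

```r
# Standard error of a coefficient = residual SE / sqrt(number of rows),
# which holds for balanced two-level designs with -1/+1 coding.
se_collapsed <- 4.197 / sqrt(16)  # fit on the 16 configuration means
se_repeated  <- 2.475 / sqrt(64)  # fit treating all 64 wafers as replicates

print(c(collapsed = se_collapsed, repeated = se_repeated))
print(se_collapsed / se_repeated)  # how much larger the collapsed SE is
```

The coefficient estimates themselves are the same in both fits, so the smaller standard error alone inflates the t values when repeated measures are treated as independent replicates.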
The factors are drill load, flow rate, rotational speed, and drilling mud.