Characterizing effects when only one replicate is available.
1.Sometimes, only one measurement can be taken per factor configuration. This is often because when \(K\) is large, replication can increase the number of samples needed substantially. E.g., changing \(n\) from 1 to 2 when \(K = 5\) means 32 more runs. Moreover, in factor screening experiments, typically want to allocate samples to understanding new factors, rather than replicating known configurations.
library(readr)
library(dplyr)
library(ggplot2)
filtration <- read_table2("https://uwmadison.box.com/shared/static/xxh05ngikmscnddbhg2l3v268jnu4jtc.txt")
ggplot(filtration) +
geom_point(aes(A, Rate, col = C)) +
scale_color_brewer(palette = "Set2") +
facet_grid(B ~ D)
aov
function won’t give us any \(p\)-values (because it can’t estimate \(\sigma^2\)).
Call:
lm(formula = Rate ~ A * B * C * D, data = filtration)
Residuals:
ALL 16 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45 NaN NaN NaN
A+ 26 NaN NaN NaN
B+ 3 NaN NaN NaN
C+ 23 NaN NaN NaN
D+ -2 NaN NaN NaN
A+:B+ -9 NaN NaN NaN
A+:C+ -34 NaN NaN NaN
B+:C+ 9 NaN NaN NaN
A+:D+ 31 NaN NaN NaN
B+:D+ -1 NaN NaN NaN
C+:D+ 9 NaN NaN NaN
A+:B+:C+ 2 NaN NaN NaN
A+:B+:D+ 11 NaN NaN NaN
A+:C+:D+ -12 NaN NaN NaN
B+:C+:D+ -16 NaN NaN NaN
A+:B+:C+:D+ 11 NaN NaN NaN
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 15 and 0 DF, p-value: NA
Df Sum Sq Mean Sq
A 1 1870.6 1870.6
B 1 39.1 39.1
C 1 390.1 390.1
D 1 855.6 855.6
A:B 1 0.1 0.1
A:C 1 1314.1 1314.1
B:C 1 22.6 22.6
A:D 1 1105.6 1105.6
B:D 1 0.6 0.6
C:D 1 5.1 5.1
A:B:C 1 14.1 14.1
A:B:D 1 68.1 68.1
A:C:D 1 10.6 10.6
B:C:D 1 27.6 27.6
A:B:C:D 1 7.6 7.6
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.6 1870.6 73.176 0.000360 ***
B 1 39.1 39.1 1.528 0.271297
C 1 390.1 390.1 15.259 0.011337 *
D 1 855.6 855.6 33.469 0.002172 **
A:B 1 0.1 0.1 0.002 0.962478
A:C 1 1314.1 1314.1 51.406 0.000821 ***
A:D 1 1105.6 1105.6 43.249 0.001220 **
B:C 1 22.6 22.6 0.883 0.390613
B:D 1 0.6 0.6 0.022 0.887871
C:D 1 5.1 5.1 0.198 0.674909
Residuals 5 127.8 25.6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Graphical methods provide useful summaries for evaluating interactions in the \(2^K\) model.
Effect Estimation. In general, the effect estimates in a \(2^K\) design are twice the coefficients estimated by the regression onto the \(\pm 1\) coded variables. The block below first codes columns A to D (from + and - to \(1\) and \(-1\)).
code <- function(x) ifelse(x == '-', -1, 1)
filtration_coded <- filtration %>%
mutate_at(vars(A:D), code)
fit_coded <- lm(Rate~A * B * C * D, filtration_coded)
effects <- 2 * coef(fit_coded)[-1] # exclude intercept
effects
A B C D A:B A:C B:C A:D
21.625 3.125 9.875 14.625 0.125 -18.125 2.375 16.625
B:D C:D A:B:C A:B:D A:C:D B:C:D A:B:C:D
-0.375 -1.125 1.875 4.125 -1.625 -2.625 1.375
qqnorm
and qqline
functions to make a QQ plot, but there are a few differences. First, instead of drawing a line approximating all the points, we only draw a line through the points within the range specified by the probs
argument. This prevents the line from being influenced by outliers (real effects). Second, we add the names of each effect using the text
command, so we can back out which effects are likely real / null, based on the QQ plot.BsMD
package. Let \(s_0 = 1.5 \times \text{median}\left(\text{Contrast}_j\right)\). Then, the notation in the plot below refers to,
alpha PSE ME SME
0.050000 2.625000 6.747777 13.698960
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.6 1870.6 95.86 1.93e-06 ***
C 1 390.1 390.1 19.99 0.0012 **
D 1 855.6 855.6 43.85 5.92e-05 ***
A:C 1 1314.1 1314.1 67.34 9.41e-06 ***
A:D 1 1105.6 1105.6 56.66 2.00e-05 ***
Residuals 10 195.1 19.5
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1