Multiple Comparisons

The multiple comparisons problem and some solutions.

Kris Sankaran
09-23-2021

Readings: 3.5, Rmarkdown

  1. If we planned in advance which contrasts we wanted to use, we are fine. But what if we hadn’t planned any, and instead went searching for significant results using different contrasts?

  2. Imagine testing 100 hypotheses at \(\alpha = 0.05\). Suppose they are all null. We would see 5 rejected null hypotheses on average (see the simulation after this list).

  3. Therefore, if we want to allow ourselves some flexibility in searching over contrasts, we need to adapt our methodology. We should control the experimentwise error rate, the probability that at least one test results in a false positive.
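A quick simulation makes this concrete. This is a minimal sketch, assuming a two-sample design with 20 observations per group (arbitrary choices for illustration, not from the reading):

```r
set.seed(424)
alpha <- 0.05
# 100 two-sample t-tests in which every null hypothesis is true
p_values <- replicate(100, t.test(rnorm(20), rnorm(20))$p.value)
sum(p_values < alpha)  # about 5 false rejections, on average
# for independent tests, the experimentwise error rate is
1 - (1 - alpha)^100    # ~0.994: we are almost certain to see a false positive
```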

Scheffé’s Method

  1. Let’s say we’re interested in \(m\) contrasts, \(c_1, \dots, c_m\). The idea is to widen our confidence intervals slightly, to make false positives rarer.

  2. How much should the intervals be widened? It’s not obvious, but Scheffé found that we should multiply the standard error of each of our \(m\) contrast estimates by \[ \sqrt{\left(a - 1\right)F_{0.05, a - 1, N - a}} \] to get the half-width of the interval (this gives 95% simultaneous confidence intervals).
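The multiplier is easy to compute in R. A sketch, assuming a hypothetical design with \(a = 4\) groups and \(N = 40\) total observations:

```r
a <- 4; N <- 40                                    # hypothetical design
scheffe <- sqrt((a - 1) * qf(0.95, a - 1, N - a))  # Scheffé multiplier
t_unadj <- qt(0.975, N - a)                        # unadjusted t multiplier
c(scheffe = scheffe, t = t_unadj)                  # Scheffé intervals are wider
```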

Tukey’s Method

  1. If we only care about the differences between pairs of group means, we can use Tukey’s method. All the contrasts now have the form, \[ \Gamma\left(c\right) = \mu_i - \mu_j \]

  2. We’re going to make confidence intervals for these, and it’s natural to center them around, \[\begin{align*} \hat{\Gamma}\left(c\right) &= \bar{y}_i - \bar{y}_j. \end{align*}\]

  3. How wide should the intervals be? Tukey found a reference distribution (the studentized range distribution) for \[ \frac{\bar{y}_{\max} - \bar{y}_{\min}}{\hat{\sigma} / \sqrt{n}} \] where \(\bar{y}_{\max}\) is the largest of the \(a\) group averages and \(\bar{y}_{\min}\) is the smallest. From there, he tabulated the quantiles as \(q_{\alpha}(a, \text{df})\).

  4. It turns out that the appropriate width of the confidence intervals can be derived from these quantiles, \[ \left[\left(\bar{y}_{i}-\bar{y}_{j}\right)-q_{\alpha}(a, \text{df}) \frac{\hat{\sigma}}{\sqrt{n}},\left(\bar{y}_{i}-\bar{y}_{j}\right)+q_{\alpha}(a, \text{df}) \frac{\hat{\sigma}}{\sqrt{n}}\right] \]

  5. This works because the difference between the max and min group averages is the largest of all the pairwise differences, so if the intervals are wide enough to cover it, they are wide enough to cover every pair \(i, j\).
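Base R computes these intervals directly. A sketch using the built-in InsectSprays data (\(a = 6\) sprays with \(n = 12\) observations each), chosen only for illustration:

```r
fit <- aov(count ~ spray, data = InsectSprays)
TukeyHSD(fit, conf.level = 0.95)                 # simultaneous intervals for all pairs
qtukey(0.95, nmeans = 6, df = df.residual(fit))  # the quantile q_{0.05}(6, 66)
```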

Fisher’s Least Significant Difference

  1. Fisher’s LSD is used to compare pairs of means. Unlike Tukey’s method, it doesn’t control the experimentwise error rate; it only controls the error rate of each comparison separately.

  2. Notice that the variance of the difference is \[ \begin{aligned} \operatorname{Var}\left(\bar{y}_{i}-\bar{y}_{j}\right) &=\operatorname{Var}\left(\bar{y}_{i}\right)+\operatorname{Var}\left(\bar{y}_{j}\right) \\ &=\frac{\sigma^{2}}{n_{i}}+\frac{\sigma^{2}}{n_{j}} \\ & \approx \hat{\sigma}^{2}\left(\frac{1}{n_{i}}+\frac{1}{n_{j}}\right) \end{aligned} \] where the last step plugs in the pooled estimate \(\hat{\sigma}^{2}\) for \(\sigma^{2}\).

  3. Fisher’s LSD compares each difference \(\left|\bar{y}_i - \bar{y}_j\right|\) to the cutoff, \[ t_{\alpha / 2, N - a} \sqrt{\hat{\sigma}^{2}\left(\frac{1}{n_{i}}+\frac{1}{n_{j}}\right)} \] and rejects the null that the pair has equal means if the difference is larger.
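The cutoff is straightforward to compute by hand. A sketch, reusing the illustrative InsectSprays fit from above with \(\alpha = 0.05\):

```r
fit <- aov(count ~ spray, data = InsectSprays)
sigma2_hat <- sum(residuals(fit)^2) / df.residual(fit)  # pooled estimate (MSE)
n_i <- n_j <- 12                                        # balanced design
lsd <- qt(0.975, df.residual(fit)) * sqrt(sigma2_hat * (1 / n_i + 1 / n_j))
lsd  # declare means i and j different when |ybar_i - ybar_j| exceeds this
```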