Statistical Experimental Design: Diagnostics and Power

The two-sample \(t\)-test relies on three assumptions:

Samples are independent. If they aren’t, then we’re pretending we have more samples than we actually do.
The standard deviations of the two populations are equal.
The populations are normally distributed.

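We can check the last two assumptions using a normal probability plot, which plots each group’s sample quantiles against the quantiles of a theoretical normal distribution. Points that fall close to a straight line are consistent with normality, and lines with similar slopes across groups are consistent with equal standard deviations.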

library(EBImage)
display(readImage("https://uwmadison.box.com/shared/static/n1a3bdzspet06ibsd1yebc1r6w7kzf3o.png"))
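As a rough sketch of how such a plot can be drawn, the snippet below uses base R's `qqnorm()` and `qqline()`. The data are simulated purely for illustration (the group names, sample sizes, means, and standard deviations are assumptions, not values from these notes).

```r
# Simulated data for illustration only; not from the original notes.
set.seed(1)
y1 <- rnorm(20, mean = 10, sd = 2)
y2 <- rnorm(20, mean = 12, sd = 2)

# One panel per group: sample quantiles against theoretical normal quantiles.
op <- par(mfrow = c(1, 2))
qq1 <- qqnorm(y1, main = "Group 1")
qqline(y1)  # reference line through the first and third quartiles
qq2 <- qqnorm(y2, main = "Group 2")
qqline(y2)
par(op)
```

Strong curvature in either panel would cast doubt on normality, while clearly different slopes between the panels would cast doubt on the equal standard deviation assumption.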

People will often ask you what a good sample size is for their experiment. A good way to answer is to compute power curves across a range of plausible signal strengths.

knitr::include_graphics("https://uwmadison.box.com/shared/static/06qu4t1q6jemmwto01vgd95jtzd0if3e.png")

Of course, we can never know the signal strength in advance. But we can try a few plausible values, based on past experience.
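One possible way to compute such curves is with `power.t.test()` from base R's `stats` package. The sketch below assumes a two-sided, two-sample \(t\)-test at level 0.05 with a common standard deviation of 1. The grid of sample sizes and the three signal strengths are hypothetical choices for illustration.

```r
# Hypothetical settings for illustration: per-group sample sizes and
# plausible signal strengths (difference in means, in units of sd = 1).
n_grid <- seq(5, 100, by = 5)
deltas <- c(0.25, 0.5, 1)

# Power of a two-sided, two-sample t-test for each (n, delta) pair.
# Rows index sample sizes, columns index signal strengths.
power_curves <- sapply(deltas, function(d) {
  sapply(n_grid, function(n) {
    power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05)$power
  })
})

# One curve per signal strength.
matplot(n_grid, power_curves, type = "l", lty = 1, col = 1:3,
        xlab = "Sample size per group", ylab = "Power", ylim = c(0, 1))
abline(h = 0.8, lty = 2)  # a commonly used target, not a rule
legend("bottomright", legend = paste("delta =", deltas), lty = 1, col = 1:3)
```

Going the other way, calling `power.t.test()` with `power` specified and `n` left out solves for the per-group sample size needed at a given signal strength.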

What if the variances are not equal? Our test statistic used a pooled standard deviation. If the variances aren’t equal, we can instead standardize using each group’s own sample variance \(S_i^2\) and sample size \(n_i\), \[ \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}. \]

knitr::include_graphics("https://uwmadison.box.com/shared/static/uzh72egfwwp251xzac932cej6kna1woz.png")

This statistic is unfortunately not exactly \(t\)-distributed under the null. That said, its reference distribution is well approximated by a \(t\)-distribution with suitably adjusted degrees of freedom, and almost any statistical package will let you compute the corresponding \(p\)-values and confidence intervals.
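As a minimal sketch, the unequal-variance statistic can be computed by hand and compared with base R's `t.test()`, which uses this standardization when `var.equal = FALSE` (its default). The data here are simulated for illustration only, and the sample sizes and standard deviations are assumptions.

```r
# Simulated data with unequal spreads; illustration only.
set.seed(2)
y1 <- rnorm(15, mean = 0, sd = 1)
y2 <- rnorm(25, mean = 0.5, sd = 3)

# The unpooled statistic, computed by hand.
se <- sqrt(var(y1) / length(y1) + var(y2) / length(y2))
t_stat <- (mean(y1) - mean(y2)) / se

# t.test() with var.equal = FALSE uses the same statistic together with an
# approximate t reference distribution (adjusted degrees of freedom).
welch <- t.test(y1, y2, var.equal = FALSE)
welch$statistic  # same value as t_stat
welch$parameter  # approximate degrees of freedom
welch$p.value
welch$conf.int
```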
What if the variances are known? In this case, we can avoid using \(S_1\) and \(S_2\). Instead, we ought to standardize using the known standard deviations. Since there’s no additional randomness coming from estimation, the reference distribution is a standard normal, not a \(t\)-distribution.
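The known-variance case can be sketched directly in base R. Here `sigma1` and `sigma2` are hypothetical known standard deviations, and the data are simulated for illustration. The statistic has the same form as the unequal-variance one, with the known values in place of \(S_1\) and \(S_2\).

```r
# Hypothetical known standard deviations and simulated data; illustration only.
sigma1 <- 1
sigma2 <- 3
set.seed(3)
y1 <- rnorm(15, mean = 0, sd = sigma1)
y2 <- rnorm(25, mean = 0.5, sd = sigma2)

# Standardize with the known sigmas instead of the estimates S_1 and S_2.
z_stat <- (mean(y1) - mean(y2)) /
  sqrt(sigma1^2 / length(y1) + sigma2^2 / length(y2))

# Two-sided p-value from the standard normal reference distribution.
p_value <- 2 * pnorm(-abs(z_stat))
```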