Diagnostics and Power

Tricks to make sure tests aren’t applied blindly

Kris Sankaran true
09-17-2021

Readings 2.4, Rmarkdown

  1. The reference distribution depends on three assumptions,
  1. We can check the last two assumptions using something called a normal probability plot. This plots the sample quantiles against the theoretical normal distribution’s quantiles.
library(EBImage)
display(readImage("https://uwmadison.box.com/shared/static/n1a3bdzspet06ibsd1yebc1r6w7kzf3o.png"))

Power Analysis

  1. People will often call you asking about what a good sample size is for their experiment. A good way to answer this is to compute the power curves as a function of different signal strengths.
include_graphics("https://uwmadison.box.com/shared/static/06qu4t1q6jemmwto01vgd95jtzd0if3e.png")

  1. Of course, we can never know the signal strength in advance. But we can test a few different plausible ranges, based on past experience.

Important Variations

  1. What if the variances are not equal? Our test statistic used a pooled standard deviation. If the variances aren’t equal, we could standardize differently, \[ \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}. \]
include_graphics("https://uwmadison.box.com/shared/static/uzh72egfwwp251xzac932cej6kna1woz.png")

  1. This is unfortunately not exactly \(t\)-distributed under the null. That said, the reference distribution can be well approximated by one, and almost any statistical package will let you compute corresponding \(p\)-values and confidence intervals.

  2. What if the variances are known? In this case, we can avoid using \(S_1\) and \(S_2\). Instead, we ought to standardize using the known standard deviations. Since there’s no additional randomness coming from estimation, the reference distribution is a standard normal, not a \(t\)-distribution.