The basic principles of hypothesis testing can be summarised by the four possible outcomes of a test:
| | Test rejected | Test didn’t reject |
|---|---|---|
| Null is true | False alarm | Correct |
| Null is false | Correct | Missed detection |
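Under the null hypothesis, a test run at the 5% level should raise a false alarm in roughly 5% of repeated experiments. A minimal simulation sketch (with made-up normal data and scipy's pooled two-sample t-test, which is introduced below) illustrating this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n, reps = 10, 10_000

# Both groups come from the same distribution, so the null is true and
# every rejection is a false alarm (Type I error).
false_alarms = 0
for _ in range(reps):
    y1 = rng.normal(loc=5.0, scale=1.0, size=n)
    y2 = rng.normal(loc=5.0, scale=1.0, size=n)
    _, p = stats.ttest_ind(y1, y2, equal_var=True)
    false_alarms += p < alpha

print(f"observed false-alarm rate: {false_alarms / reps:.3f}")  # should be close to 0.05
```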
The alternative hypothesis is that the strengths are not equal,
\[ H_1: \mu_1 \neq \mu_2 \] The test statistic compares the observed difference in sample means to its estimated standard error,
\[ t_0 := \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \] where we define the pooled variance \(S_p^2\) by,
\[ S^2_p = \frac{\left(n_1 - 1\right)S_1^2 + \left(n_2 - 1\right)S_2^2}{n_1 + n_2 - 2} \] and \(S_1\) and \(S_2\) are the usual sample standard deviations for each group individually. (When \(n_1 = n_2 = n\), \(S_p^2\) is simply the average of the two sample variances, and the standard error in the denominator of \(t_0\) becomes \(S_p\sqrt{2/n}\).)
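As a concrete check of the formulas above, here is a short sketch (with made-up data) that computes \(S_p^2\) and \(t_0\) by hand and compares the result with scipy's equal-variance two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y1 = rng.normal(loc=10.0, scale=1.0, size=12)   # made-up group 1 measurements
y2 = rng.normal(loc=10.5, scale=1.0, size=8)    # made-up group 2 measurements
n1, n2 = len(y1), len(y2)

# Pooled variance: a weighted average of the two sample variances.
sp_sq = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)

# Test statistic t_0 as defined above.
t0 = (y1.mean() - y2.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Matches scipy's equal-variance (pooled) two-sample t-test statistic.
t_scipy, _ = stats.ttest_ind(y1, y2, equal_var=True)
print(t0, t_scipy)
```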
A 95% confidence interval for a parameter \(\theta\) is a data-dependent interval that covers the true value with probability 0.95,
\[ \mathbf{P}\left(\theta \in \left[L, U\right]\right) = 0.95 \] where the endpoints are statistics computed from the observations,
\[ \left[L\left(y_1, \dots, y_n\right), U\left(y_1, \dots, y_n\right)\right] \]
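The probability statement is over repeated sampling: the parameter is fixed, while the interval endpoints are random. A small simulation sketch (using the familiar one-sample \(t\) interval for a mean, with made-up data) showing that such an interval covers the true value about 95% of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta = 3.0          # true (but in practice unknown) population mean
n, reps = 20, 10_000
t_right = stats.t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    y = rng.normal(loc=theta, scale=2.0, size=n)
    # The endpoints L and U are functions of the data alone.
    half_width = t_right * y.std(ddof=1) / np.sqrt(n)
    L, U = y.mean() - half_width, y.mean() + half_width
    covered += L <= theta <= U

print(f"coverage: {covered / reps:.3f}")  # should be close to 0.95
```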
To construct such an interval for the difference in means, note that when \(n_1 = n_2 = n\) the standardised difference follows a \(t\) distribution with \(2\left(n - 1\right)\) degrees of freedom, so
\[ \mathbf{P}\left(\frac{\left(\bar{y}_1 - \bar{y}_2\right) - \left(\mu_1 - \mu_2\right)}{S_p\sqrt{\frac{2}{n}}} \in \left[t_{0.025, 2\left(n - 1\right)}, t_{0.975, 2\left(n - 1\right)}\right] \right) = 0.95 \]
To simplify the algebra, let
\[ T\left(y\right) := \bar{y}_1 - \bar{y}_2 \\ \theta := \mu_1 - \mu_2 \\ \hat{\sigma} := S_p\sqrt{\frac{2}{n}} \\ t_{\text{left}} := t_{0.025, 2\left(n - 1\right)} \\ t_{\text{right}} := t_{0.975, 2\left(n - 1\right)} \] so that the above expression reduces to,
\[ \mathbf{P}\left(\frac{T\left(y\right) - \theta}{\hat{\sigma}} \in \left[t_{\text{left}}, t_{\text{right}}\right]\right) = 0.95 \]
Rearranging the event inside the probability to isolate \(\theta\) gives,
\[ \mathbf{P}\left(\theta \in \left[T\left(y\right) - \hat{\sigma}t_{\text{right}}, T\left(y\right) - \hat{\sigma}t_{\text{left}}\right]\right) = 0.95 \] We can use the symmetry of the \(t\) distribution, which implies \(t_{\text{left}} = -t_{\text{right}}\), to simplify the expression further to
\[ \mathbf{P}\left(\theta \in \left[T\left(y\right) - \hat{\sigma}t_{\text{right}}, T\left(y\right) + \hat{\sigma}t_{\text{right}}\right]\right) = 0.95 \] This is exactly the property that a confidence interval has to satisfy. Plugging in the original expressions gives the confidence interval for the difference in means, assuming shared variance.
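Putting the pieces together, a short sketch (with made-up data and equal group sizes) that computes this interval directly and can be checked against standard software:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 10
y1 = rng.normal(loc=5.0, scale=1.0, size=n)   # made-up group 1 data
y2 = rng.normal(loc=4.2, scale=1.0, size=n)   # made-up group 2 data

# With equal group sizes the pooled variance is the average of the two
# sample variances, and the standard error is S_p * sqrt(2 / n).
sp = np.sqrt((y1.var(ddof=1) + y2.var(ddof=1)) / 2)
sigma_hat = sp * np.sqrt(2 / n)

t_right = stats.t.ppf(0.975, df=2 * (n - 1))
T = y1.mean() - y2.mean()

lower, upper = T - sigma_hat * t_right, T + sigma_hat * t_right
print(f"95% CI for mu1 - mu2: [{lower:.3f}, {upper:.3f}]")
```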