Readings 2.4, Rmarkdown
- Statistics is about making general conclusions based on specific evidence. One approach is based on hypothesis testing: we have a theory about the general (a null hypothesis), and we want to see whether our specific sample is consistent with that theory. This philosophy is made quantitative by following a standard recipe:
- Pose a null hypothesis about the world
- Define a test statistic that should detect departures from that null hypothesis
- Determine a reference distribution for that test statistic
- Compute the test statistic on your data, and see if it’s plausibly a draw from your reference distribution
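A minimal sketch of this recipe in R (the coin-flipping setup and the observed count are invented for illustration, not part of the notes):

```r
# 1. Null hypothesis: the coin is fair, so P(heads) = 0.5
n_flips <- 100
observed_heads <- 62  # hypothetical data

# 2. Test statistic: the number of heads in n_flips tosses

# 3. Reference distribution under the null: Binomial(n_flips, 0.5).
#    The central 95% of that distribution:
qbinom(c(0.025, 0.975), size = n_flips, prob = 0.5)

# 4. 62 heads falls outside this plausible range, so the sample is
#    hard to reconcile with the null hypothesis of a fair coin
```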
- Proceeding in this way, there are a few types of error we might make:
| | Reject \(H_0\) | Fail to reject \(H_0\) |
|---|---|---|
| Null is true | False alarm | Correct |
| Null is false | Correct | Missed detection |
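Both error rates can be estimated by simulation. Here is a sketch under assumed settings (normal data, a one-sample test of \(H_0: \mu = 0\), and an arbitrary alternative \(\mu = 0.5\), none of which come from the notes):

```r
set.seed(1)
n <- 30
n_sims <- 10000

# Rejection rule: reject H0: mu = 0 when the t statistic is too extreme
rejects <- function(y) {
  abs(sqrt(length(y)) * mean(y) / sd(y)) > qt(0.975, df = length(y) - 1)
}

# False alarm rate: how often we reject when the null is actually true
mean(replicate(n_sims, rejects(rnorm(n, mean = 0))))  # close to 0.05

# Missed detection rate: how often we fail to reject when it is false
mean(replicate(n_sims, !rejects(rnorm(n, mean = 0.5))))
```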
- \(p\)-values. “Rejected” or “Not rejected” is only a very coarse description of how the data conforms to a theory. \(p\)-values give a measure of the degree of (im)plausibility of a test statistic under a given null hypothesis. The specific measure of plausibility will depend on the form of the test – we will see a specific example in the next set of notes.
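As a small preview of the kind of calculation involved (the statistic's value and its standard normal reference distribution here are assumptions for illustration):

```r
# Two-sided p-value for a test statistic z0 whose reference
# distribution under the null is standard normal
z0 <- 2.1
2 * pnorm(-abs(z0))  # probability of a draw at least this extreme
```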
Two-sample \(t\)-test
- Motivating example: You have two ways of making concrete mortar. Is one stronger than the other? By default, you think they are equally strong. We can denote this (the null hypothesis) by,
\[
H_0: \mu_1 = \mu_2
\]
The alternative hypothesis is that the strengths are not equal,
\[
H_1: \mu_1 \neq \mu_2
\]
- Our test statistic for detecting departures from this null will be,
\[
t_0 := \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
\] where we define the pooled standard deviation by,
\[
S^2_p = \frac{\left(n_1 - 1\right)S_1^2 + \left(n_2 - 1\right)S_2^2}{n_1 + n_2 -2}
\] and \(S_1\) and \(S_2\) are the usual standard deviations for each group individually. (When \(n_1 = n_2 = n\), \(S_p^2\) reduces to the simple average \(\frac{S_1^2 + S_2^2}{2}\), and the denominator of \(t_0\) becomes \(S_p\sqrt{\frac{2}{n}}\), a form we will use below.)
- Under the null hypothesis, \(t_0\) is the ratio of a standard normal to the square root of an independent chi-square divided by its degrees of freedom, so \(t_0\) is \(t\)-distributed with \(n_1 + n_2 - 2\) d.f. This gives the reference distribution under the null.
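A sketch of the whole test in R (the strength measurements below are hypothetical): compute \(S_p\) and \(t_0\) from the formulas above, get a two-sided \(p\)-value from the \(t\) reference distribution, and check against R's built-in `t.test` with `var.equal = TRUE`, which performs this same pooled test.

```r
# Hypothetical strength measurements for the two mortar formulations
y1 <- c(16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57)
y2 <- c(16.62, 16.75, 17.37, 17.12, 16.98, 16.87, 17.34, 17.02, 17.08, 17.27)
n1 <- length(y1)
n2 <- length(y2)

# Pooled variance, then the test statistic
s2_p <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
t0 <- (mean(y1) - mean(y2)) / (sqrt(s2_p) * sqrt(1 / n1 + 1 / n2))

# Two-sided p-value from the t reference distribution
2 * pt(-abs(t0), df = n1 + n2 - 2)

# The built-in pooled two-sample test gives the same t0 and p-value
t.test(y1, y2, var.equal = TRUE)
```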
Confidence intervals
- Instead of thinking we know the mean and trying to reject it, why don’t we try to directly estimate it (with an error bar)? A 95% confidence interval is an interval \([L, U]\) satisfying,
\[
\mathbf{P}\left(\theta \in \left[L, U\right]\right) = 0.95
\]
- The randomness here is in \(L\) and \(U\). If we were being more formal, we would write those as functions of the (random) sample,
\[
\left[L\left(y_1, \dots, y_n\right), U\left(y_1, \dots, y_n\right)\right]
\]
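This coverage property can be checked by simulation: a sketch assuming normal data with a true mean difference \(\theta = 1\) (the settings are arbitrary), where about 95% of the intervals computed from independent samples should contain \(\theta\):

```r
set.seed(1)
theta <- 1  # true difference in means, known only to the simulation
n <- 20

covered <- replicate(10000, {
  y1 <- rnorm(n, mean = theta)
  y2 <- rnorm(n, mean = 0)
  ci <- t.test(y1, y2, var.equal = TRUE)$conf.int  # a random [L, U]
  ci[1] <= theta && theta <= ci[2]
})
mean(covered)  # close to 0.95
```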
- To construct one for the two-sample test, suppose for simplicity that \(n_1 = n_2 = n\), and recall that, based on the \(t\)-distribution characterization from the probability-review lecture,
\[
\mathbf{P}\left(\frac{\left(\bar{y}_1 - \bar{y}_2\right) - \left(\mu_1 - \mu_2\right)}{S_p\sqrt{\frac{2}{n}}} \in \left[t_{0.025, 2\left(n - 1\right)}, t_{0.975, 2\left(n - 1\right)}\right] \right) = 0.95
\]
To simplify the algebra, let
\[
T\left(y\right) := \bar{y}_1 - \bar{y}_2 \\
\theta := \mu_1 - \mu_2 \\
\hat{\sigma} := S_p\sqrt{\frac{2}{n}} \\
t_{\text{left}} := t_{0.025, 2\left(n - 1\right)} \\
t_{\text{right}} := t_{0.975, 2\left(n - 1\right)}
\] so that the above expression reduces to,
\[
\mathbf{P}\left(\frac{T\left(y\right) - \theta}{\hat{\sigma}} \in \left[t_{\text{left}}, t_{\text{right}}\right]\right) = 0.95
\]
- Now, rearranging \(t_{\text{left}} \leq \frac{T\left(y\right) - \theta}{\hat{\sigma}} \leq t_{\text{right}}\) to isolate \(\theta\) (note how the two endpoints swap roles), we find
\[
\mathbf{P}\left(\theta \in \left[T\left(y\right) - \hat{\sigma}t_{\text{right}}, T\left(y\right) - \hat{\sigma}t_{\text{left}}\right]\right) = 0.95
\] We can use the fact that \(t_{\text{left}} = -t_{\text{right}}\), by the symmetry of the \(t\)-distribution, to simplify the expression further to
\[
\mathbf{P}\left(\theta \in \left[T\left(y\right) - \hat{\sigma}t_{\text{right}}, T\left(y\right) + \hat{\sigma}t_{\text{right}}\right]\right) = 0.95
\] This is exactly the property that a confidence interval has to satisfy. Plugging in the original expressions gives the confidence interval for the difference in means, assuming a shared variance (and, in this derivation, equal group sizes).
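Continuing the hypothetical mortar example from above (reusing `y1`, `y2`, `n1`, `n2`, and `s2_p` defined there), the interval can be computed directly from this formula and compared with the `conf.int` element returned by `t.test`:

```r
# 95% confidence interval for mu_1 - mu_2, assuming a shared variance
t_right <- qt(0.975, df = n1 + n2 - 2)
sigma_hat <- sqrt(s2_p) * sqrt(1 / n1 + 1 / n2)
(mean(y1) - mean(y2)) + c(-1, 1) * sigma_hat * t_right

# Should match the interval from the built-in test
t.test(y1, y2, var.equal = TRUE)$conf.int
```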