Probability distributions, their properties, and relationships.
Figure 1: The expectation in a two-valued random variable is a weighted average between the values it can take on.
Figure 2: Variance measures the typical distance of an observation from the distribution’s mean.
Expectation is linear: for any constants \(c_1, c_2\) and random variables \(y_1, y_2\),
\[ \mathbf{E}\left[c_{1}y_{1} + c_{2}y_{2}\right] = c_{1}\mathbf{E}\left[y_{1}\right] + c_{2}\mathbf{E}\left[y_{2}\right]. \]
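For instance, if \(\mathbf{E}\left[y_1\right] = 1\) and \(\mathbf{E}\left[y_2\right] = 4\) (illustrative values), then \(\mathbf{E}\left[2y_1 - y_2\right] = 2 \cdot 1 - 4 = -2\). Notably, this holds with no assumptions about how \(y_1\) and \(y_2\) are related to one another.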
Variance is not linear, but the variance of a linear combination of two random variables can be found simply enough,
\[ \text{Var}\left[c_1 y_1 + c_2 y_2\right] = c_1^2 \text{Var}\left[y_1\right] + c_2^2 \text{Var}\left[y_2\right] + 2 c_1 c_2 \text{Cov}\left[y_1, y_2\right] \]
where we define the covariance as \[ \text{Cov}\left[y_1, y_2\right] = \mathbf{E}\left[\left(y_1 - \mathbf{E}\left[y_1\right]\right)\left( y_2 - \mathbf{E}\left[y_2\right]\right)\right]. \]
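The cross term comes from expanding the square inside the variance. Writing \(\tilde{y}_i = y_i - \mathbf{E}\left[y_i\right]\) for the centered variables,
\[ \text{Var}\left[c_1 y_1 + c_2 y_2\right] = \mathbf{E}\left[\left(c_1 \tilde{y}_1 + c_2 \tilde{y}_2\right)^2\right] = c_1^2\,\mathbf{E}\left[\tilde{y}_1^2\right] + c_2^2\,\mathbf{E}\left[\tilde{y}_2^2\right] + 2 c_1 c_2\,\mathbf{E}\left[\tilde{y}_1 \tilde{y}_2\right], \]
and the three expectations on the right are exactly \(\text{Var}\left[y_1\right]\), \(\text{Var}\left[y_2\right]\), and \(\text{Cov}\left[y_1, y_2\right]\).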
Figure 3: If two variables have high covariance, then whether or not they are above their means is often synchronized.
Why is probability useful in statistics? At a high level, statistics is concerned with drawing inferences from the specific to the general. Starting from a sample, we would like to say something true about the population. A typical strategy is to compute a statistic (a function of the sample) and use it to say something about the probability distribution the sample was drawn from (a property of the population).
Suppose we have observed \(n\) samples \(y_{1}, \dots, y_{n}\). Two very useful statistics are the sample mean,
\[ \bar{y} = \frac{1}{n}\sum_{i = 1}^{n}y_i \] and the sample standard deviation \[ S = \sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}\left(y_i - \bar{y}\right)^2}. \]
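As a quick computational check, here is a minimal Python sketch (assuming NumPy; the data are simulated, illustrative draws) that computes both statistics by hand and confirms them against NumPy's built-ins:

```python
import numpy as np

# Hypothetical data: n = 100 draws from a normal population (illustrative only).
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=100)

n = len(y)
ybar = y.sum() / n                              # sample mean
S = np.sqrt(((y - ybar) ** 2).sum() / (n - 1))  # sample standard deviation

# NumPy's built-ins agree; note ddof=1 gives the n - 1 denominator.
assert np.isclose(ybar, y.mean())
assert np.isclose(S, y.std(ddof=1))
```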
By the central limit theorem, if the population has mean \(\mu\) and finite standard deviation \(\sigma\), then as \(n \to \infty\) the standardized sample mean converges in distribution to a standard normal,
\[ \frac{\sqrt{n}\left(\bar{y} - \mu\right)}{\sigma} \to \mathcal{N}\left(0, 1\right). \]
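A small simulation makes this convergence visible. The following is an illustrative sketch, assuming NumPy; the gamma population and the sample sizes are arbitrary choices, not anything prescribed above:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 3.0  # population mean and SD (arbitrary choices)

# Gamma population with the desired mean and SD: mean = k*theta, var = k*theta^2.
shape, scale = (mu / sigma) ** 2, sigma**2 / mu

for n in (5, 50, 500):
    # 10,000 replications of the standardized sample mean at each n.
    samples = rng.gamma(shape, scale, size=(10_000, n))
    z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma
    # As n grows, z should look standard normal: mean near 0, SD near 1.
    print(f"n={n:4d}: mean(z)={z.mean():+.3f}, sd(z)={z.std():.3f}")
```

Any population with finite variance would do here; the gamma is deliberately skewed so that the approximately normal shape at large \(n\) is not baked into the starting point.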