Readings 2.1 - 2.2, Rmarkdown
- The most basic idea of statistics is that if you ran an experiment again, you would get different results; that is, there is randomness. Probability is the calculus of randomness.
Definitions
- If \(y\) is a discrete random variable taking on values \(y_{k}\) with probability \(p_{k}\), then its mean is defined as \(\mathbf{E}\left[y\right] = \sum_{k} p_{k}y_{k}\). If it is a continuous variable with density \(p\left(y\right)\), the corresponding quantity is \(\mathbf{E}\left[y\right] = \int_{\mathbf{R}} y p\left(y\right) dy\). Think of the integral in the continuous case as the limit of a Riemann sum from calculus.
- To build intuition about this formula, consider some special cases,
- If there are just two values with equal probability, the mean is just their midpoint
- If one value has a larger probability weight, the mean is pulled toward that value
- If there are many values, the mean is pulled toward the values with large weights
- The variance of a random variable \(y\) is defined as \(\text{Var}\left[y\right] = \mathbf{E}\left[\left(y - \mathbf{E}\left[y\right]\right)^{2}\right]\). This measures the typical squared distance of \(y\) from its mean.
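To make both definitions concrete, here is a minimal R sketch (the values and probabilities are invented for the example) that computes the mean and variance of a small discrete distribution directly from the formulas above.

```r
# A made-up discrete distribution: values y_k with probabilities p_k
y_k <- c(1, 2, 5)
p_k <- c(0.2, 0.5, 0.3)

# E[y] = sum_k p_k * y_k
mu <- sum(p_k * y_k)

# Var[y] = E[(y - E[y])^2] = sum_k p_k * (y_k - mu)^2
sigma2 <- sum(p_k * (y_k - mu)^2)

mu       # 2.7
sigma2   # 2.41
```

Note how the value 5, which carries weight 0.3, pulls the mean up to 2.7, above the most probable value of 2.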
Useful properties
- For calculations, it’s often easier to use properties of mean and variance to reduce to simpler expressions, rather than using the formulas above. For example, expectation is a linear function,
\[
\mathbf{E}\left[c_{1}y_{1} + c_{2}y_{2}\right] = c_{1}\mathbf{E}\left[y_{1}\right] + c_{2}\mathbf{E}\left[y_{2}\right].
\]
Variance is not linear, but the variance of a linear combination of two random variables can be found simply enough,
\[
\text{Var}\left[c_1 y_1 + c_2 y_2\right] = c_1^2 \text{Var}\left[y_1\right] +
c_2^2 \text{Var}\left[y_2\right] +
2 c_1 c_2 \text{Cov}\left[y_1, y_2\right]
\]
where we define the covariance as, \[
\text{Cov}\left[y_1, y_2\right] = \mathbf{E}\left[\left(y_1 - \mathbf{E}\left[y_1\right]\right)\left(
y_2 - \mathbf{E}\left[y_2\right]\right)\right]
\]
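Both identities are easy to check numerically. Below is a minimal R sketch (the constants and the way the two variables are made correlated are arbitrary choices for illustration); the sample versions of variance and covariance satisfy exactly the same identity, including the factor of 2 on the covariance term.

```r
set.seed(1)
n  <- 1e5
c1 <- 2
c2 <- -3   # arbitrary constants

# Two correlated variables (an arbitrary construction, just for illustration)
y1 <- rnorm(n, mean = 1, sd = 2)
y2 <- 0.5 * y1 + rnorm(n, mean = -1, sd = 1)

z <- c1 * y1 + c2 * y2

# Linearity of expectation
mean(z)
c1 * mean(y1) + c2 * mean(y2)

# Variance of a linear combination, including the 2 * c1 * c2 * Cov term
var(z)
c1^2 * var(y1) + c2^2 * var(y2) + 2 * c1 * c2 * cov(y1, y2)
```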
Sampling and Estimators
Why is probability useful in statistics? At a high level, statistics is concerned with drawing inferences from the specific to the general. Starting from a sample, we would like to say something true about the population. A typical strategy is to compute a statistic (a function of the sample) that says something about the probability distribution the sample was drawn from (a property of the population).
Suppose we have observed \(n\) samples \(y_{1}, \dots, y_{n}\). Two very useful statistics are the sample mean,
\[
\bar{y} = \frac{1}{n}\sum_{i = 1}^{n}y_i
\] and the sample standard deviation \[
S = \sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}\left(y_i - \bar{y}\right)^2}
\]
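When the data are at hand, both statistics are one-liners in R. Here is a minimal sketch (the sample itself is simulated, since no dataset is given at this point) computing them from the formulas and confirming they match the built-in `mean()` and `sd()`.

```r
set.seed(42)
y <- rnorm(25, mean = 10, sd = 3)   # a made-up sample
n <- length(y)

ybar <- sum(y) / n                          # sample mean
S    <- sqrt(sum((y - ybar)^2) / (n - 1))   # sample standard deviation

c(ybar, mean(y))   # identical
c(S, sd(y))        # identical: sd() also divides by n - 1
```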
- Statisticians have come up with a variety of properties that they would like their statistics to satisfy. Two common requirements are that the statistic be “unbiased” and “minimum variance.” Unbiased means it is centered around the correct value, on average. Minimum variance means it is not too far from the correct value, on average.
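To make “unbiased” concrete, the sketch below (a small simulation with arbitrary settings) draws many samples of size 20 from a distribution with known mean 5 and variance 25, and checks that the sample means are centered on the true mean, with spread roughly \(\sigma^2 / n\).

```r
set.seed(7)
true_mean <- 5
reps      <- 10000

# Sample mean from each of many repeated samples of size n = 20
ybars <- replicate(reps, mean(rexp(20, rate = 1 / true_mean)))

mean(ybars)   # close to 5: the sample mean is unbiased
var(ybars)    # roughly sigma^2 / n = 25 / 20 = 1.25
```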
Central limit theorem
- For a very wide class of distributions, an appropriately rescaled version of the sample mean converges in distribution to a normal distribution. Specifically, if all the \(y_i\) are drawn i.i.d. from some distribution with mean \(\mu\) and variance \(\sigma^2\), then
\[
\frac{\sqrt{n}\left(\bar{y} - \mu\right)}{\sigma} \to \mathcal{N}\left(0, 1\right).
\]
- This phenomenon is called the central limit theorem.
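A quick way to see the theorem in action is a simulation. The sketch below (using an Exp(1) distribution as an arbitrary, clearly non-normal starting point) draws many samples of size 50, standardizes each sample mean as above, and overlays the standard normal density on the resulting histogram.

```r
set.seed(123)
n     <- 50
mu    <- 1   # mean of Exp(1)
sigma <- 1   # standard deviation of Exp(1)

# Many standardized sample means: sqrt(n) * (ybar - mu) / sigma
z <- replicate(10000, sqrt(n) * (mean(rexp(n, rate = 1)) - mu) / sigma)

hist(z, breaks = 50, freq = FALSE,
     main = "Standardized sample means, Exp(1), n = 50")
curve(dnorm(x), add = TRUE, lwd = 2)   # standard normal density for comparison
```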