Probability Review

Probability distributions, their properties, and relationships.

Kris Sankaran
09-14-2021

Readings 2.1 - 2.2, Rmarkdown

  1. The most basic idea of statistics is that if you ran an experiment again, you would get different results; that is, there is randomness. Probability is the calculus of randomness.

Definitions

  1. If \(y\) is a discrete random variable taking on values \(y_{k}\) with probability \(p_{k}\), then its mean is defined as \(\mathbf{E}\left[y\right] = \sum_{k} p_{k}y_{k}\). If it is a continuous variable with density \(p\left(y\right)\), the corresponding quantity is \(\mathbf{E}\left[y\right] = \int_{\mathbf{R}} y p\left(y\right) dy\). Think of the integral in the continuous case as the limit of a Riemann sum from calculus.

Figure 1: The expectation in a two-valued random variable is a weighted average between the values it can take on.

  1. To build intuition about this formula, consider some special cases. For example, if \(y\) takes only two values, its expectation is the weighted average of those values shown in Figure 1.
  1. The variance of a random variable \(y\) is defined as \(\text{Var}\left[y\right] = \mathbf{E}\left[\left(y - \mathbf{E}\left[y\right]\right)^2\right]\). This measures the typical (squared) distance of \(y\) from its mean. Both definitions are checked numerically in the sketch below Figure 2.

Figure 2: Variance measures the typical distance of an observation from the distribution’s mean.
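To make these definitions concrete, here is a small R sketch (not part of the readings) that computes the mean and variance of a made-up discrete distribution and compares them against a Monte Carlo sample; the values \(y_k\) and probabilities \(p_k\) are arbitrary illustrative choices.

```r
# Expectation and variance of a discrete random variable taking values
# y_k with probabilities p_k (values chosen only for illustration).
y_k <- c(0, 1, 5)
p_k <- c(0.5, 0.3, 0.2)

mu <- sum(p_k * y_k)               # E[y] = sum_k p_k y_k
sigma2 <- sum(p_k * (y_k - mu)^2)  # Var[y] = E[(y - E[y])^2]

# Monte Carlo check: draw many samples and compare empirical moments
draws <- sample(y_k, size = 1e5, replace = TRUE, prob = p_k)
c(exact_mean = mu, empirical_mean = mean(draws))
c(exact_var = sigma2, empirical_var = var(draws))
```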

Useful properties

  1. For calculations, it’s often easier to use properties of the mean and variance to reduce a problem to simpler expressions, rather than applying the definitions above directly. For example, expectation is linear,

\[ \mathbf{E}\left[c_{1}y_{1} + c_{2}y_{2}\right] = c_{1}\mathbf{E}\left[y_{1}\right] + c_{2}\mathbf{E}\left[y_{2}\right]. \]
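As an illustration, the simulation sketch below checks this identity empirically; the exponential and Poisson distributions for \(y_1\) and \(y_2\) and the constants \(c_1, c_2\) are arbitrary assumptions made just for the example.

```r
# Checking linearity of expectation by simulation (illustrative distributions).
n <- 1e5
y1 <- rexp(n, rate = 1)       # E[y1] = 1
y2 <- rpois(n, lambda = 3)    # E[y2] = 3
c1 <- 2
c2 <- -0.5

mean(c1 * y1 + c2 * y2)           # empirical E[c1 y1 + c2 y2]
c1 * mean(y1) + c2 * mean(y2)     # c1 E[y1] + c2 E[y2]; should agree closely
```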

Variance is not linear, but the variance of a linear combination of two random variables can be found simply enough,

\[ \text{Var}\left[c_1 y_1 + c_2 y_2\right] = c_1^2 \text{Var}\left[y_1\right] + c_2^2 \text{Var}\left[y_2\right] + 2 c_1 c_2 \text{Cov}\left[y_1, y_2\right] \]

where we define the covariance as, \[ \text{Cov}\left[y_1, y_2\right] = \mathbf{E}\left[\left(y_1 - \mathbf{E}\left[y_1\right]\right)\left( y_2 - \mathbf{E}\left[y_2\right]\right)\right] \]

Figure 3: If two variables have high covariance, then whether or not they are above their means is often synchronized.
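The same kind of simulation can verify the variance formula above. In the sketch below, \(y_2\) is built from \(y_1\) plus noise (an illustrative construction, not something from the readings) so that the covariance term actually matters.

```r
# Verifying Var[c1 y1 + c2 y2] = c1^2 Var[y1] + c2^2 Var[y2] + 2 c1 c2 Cov[y1, y2]
n <- 1e5
y1 <- rnorm(n)
y2 <- 0.8 * y1 + rnorm(n, sd = 0.6)   # induces a positive covariance with y1
c1 <- 3
c2 <- -2

var(c1 * y1 + c2 * y2)                # direct computation
c1^2 * var(y1) + c2^2 * var(y2) +
  2 * c1 * c2 * cov(y1, y2)           # right-hand side; should agree closely
```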

Sampling and Estimators

  1. Why is probability useful in statistics? At a high level, statistics is concerned with drawing inferences from the specific to the general. Starting from a sample, we would like to say something true about the population. A typical strategy is to compute a statistic (a function of the sample) and use it to say something about the probability distribution the sample was drawn from (a property of the population).

  2. Suppose we have observed \(n\) samples \(y_{1}, \dots, y_{n}\). Two very useful statistics are the sample mean,

\[ \bar{y} = \frac{1}{n}\sum_{i = 1}^{n}y_i \] and the sample standard deviation \[ S = \sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}\left(y_i - \bar{y}\right)^2} \]

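In R, both statistics can be computed directly from these definitions. The sketch below uses a simulated sample purely for illustration and checks the hand computations against the built-in mean() and sd().

```r
# Sample mean and sample standard deviation for a simulated sample
y <- rnorm(20, mean = 10, sd = 2)   # any numeric vector y would work here

y_bar <- sum(y) / length(y)                        # sample mean
s <- sqrt(sum((y - y_bar)^2) / (length(y) - 1))    # sample SD (n - 1 denominator)

c(y_bar, mean(y))   # matches R's built-in mean()
c(s, sd(y))         # matches R's built-in sd()
```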

  1. Statisticians have come up with a variety of properties that they would like their statistics to satisfy. Two common requirements are that the statistic be “unbiased” and “minimum variance.” Unbiased means it’s centered around the correct value, on average. Minimum variance means it’s not too far from the correct value, on average.
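One way to see unbiasedness concretely is by simulation: across many repeated samples, the average of the sample means should sit at the true mean. The normal population and sample size below are arbitrary illustrative choices.

```r
# Illustrating unbiasedness of the sample mean by repeated sampling
mu <- 5
sigma <- 2
n <- 25

y_bars <- replicate(1e4, mean(rnorm(n, mean = mu, sd = sigma)))
mean(y_bars)   # close to mu, consistent with the sample mean being unbiased
```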

Central limit theorem

  1. For very many distributions, an appropriately rescaled version of the sample mean converges in distribution to a normal distribution. Specifically, if all the \(y_i\) are drawn i.i.d. from some distribution with mean \(\mu\) and variance \(\sigma^2\), then

\[ \frac{\sqrt{n}\left(\bar{y} - \mu\right)}{\sigma} \to \mathcal{N}\left(0, 1\right). \]

  1. This phenomenon is called the central limit theorem.
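A quick simulation sketch of this phenomenon: even when the \(y_i\) come from a skewed exponential distribution (an illustrative choice), the rescaled sample means look approximately standard normal.

```r
# Central limit theorem: rescaled sample means from a skewed distribution
mu <- 1
sigma <- 1
n <- 50                                      # Exp(1) has mean 1 and sd 1

z <- replicate(1e4, {
  y <- rexp(n, rate = 1)
  sqrt(n) * (mean(y) - mu) / sigma
})

hist(z, breaks = 50, freq = FALSE, main = "Rescaled sample means")
curve(dnorm(x), add = TRUE, col = "red")     # N(0, 1) reference density
```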