Readings 2.1 - 2.2, Rmarkdown
- The most basic idea of statistics is that if you ran an experiment again, you would get different results; that is, there is randomness. Probability is the calculus of randomness.
Definitions
- If \(y\) is a discrete random variable taking on values \(y_{k}\) with probability \(p_{k}\), then its mean is defined as \(\mathbf{E}\left[y\right] = \sum_{k} p_{k}y_{k}\). If it is a continuous variable with density \(p\left(y\right)\), the corresponding quantity is \(\mathbf{E}\left[y\right] = \int_{\mathbf{R}} y p\left(y\right) dy\). Think of the integral in the continuous case as the limit of a Riemann sum from calculus.
- To build intuition about this formula, consider some special cases,
- If there are just two values with equal probability, the mean is just their midpoint
- If one value has a larger probability weight, the mean is pulled toward that value
- If there are many values, the mean is pulled toward the values with large weights
- The variance of a random variable \(y\) is defined as \(\text{Var}\left[y\right] = \mathbf{E}\left[\left(y - \mathbf{E}\left[y\right]\right)^{2}\right]\). This measures the typical squared distance of \(y\) from its mean.
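To make both definitions concrete, here is a minimal R sketch (the values and probabilities are invented for the example) that computes the mean and variance of a small discrete distribution directly from the formulas above.

```r
# A made-up discrete distribution: values y_k with probabilities p_k
y_k <- c(1, 2, 5)
p_k <- c(0.2, 0.5, 0.3)

# E[y] = sum_k p_k * y_k
mu <- sum(p_k * y_k)

# Var[y] = E[(y - E[y])^2] = sum_k p_k * (y_k - mu)^2
sigma2 <- sum(p_k * (y_k - mu)^2)

mu       # 2.7
sigma2   # 2.41
```

Note how the value 5, which carries weight 0.3, pulls the mean up to 2.7, above the most probable value of 2.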
Useful properties
- For calculations, it’s often easier to use properties of mean and variance to reduce to simpler expressions, rather than using the formulas above. For example, expectation is a linear function,
\[
\mathbf{E}\left[c_{1}y_{1} + c_{2}y_{2}\right] = c_{1}\mathbf{E}\left[y_{1}\right] + c_{2}\mathbf{E}\left[y_{2}\right].
\]
Variance is not linear, but the variance of a linear combination of two random variables can be found simply enough,
\[
\text{Var}\left[c_1 y_1 + c_2 y_2\right] = c_1^2 \text{Var}\left[y_1\right] +
c_2^2 \text{Var}\left[y_2\right] +
2 c_1 c_2 \text{Cov}\left[y_1, y_2\right]
\]
where we define the covariance as, \[
\text{Cov}\left[y_1, y_2\right] = \mathbf{E}\left[\left(y_1 - \mathbf{E}\left[y_1\right]\right)\left(
y_2 - \mathbf{E}\left[y_2\right]\right)\right]
\]
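Both identities are easy to check numerically. Below is a minimal R sketch (the constants and the way the two variables are made correlated are arbitrary choices for illustration); the sample versions of variance and covariance satisfy exactly the same identity, including the factor of 2 on the covariance term.

```r
set.seed(1)
n  <- 1e5
c1 <- 2
c2 <- -3   # arbitrary constants

# Two correlated variables (an arbitrary construction, just for illustration)
y1 <- rnorm(n, mean = 1, sd = 2)
y2 <- 0.5 * y1 + rnorm(n, mean = -1, sd = 1)

z <- c1 * y1 + c2 * y2

# Linearity of expectation
mean(z)
c1 * mean(y1) + c2 * mean(y2)

# Variance of a linear combination, including the 2 * c1 * c2 * Cov term
var(z)
c1^2 * var(y1) + c2^2 * var(y2) + 2 * c1 * c2 * cov(y1, y2)
```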
Sampling and Estimators
Why is probability useful in statistics? At a high level, statistics is concerned with drawing inferences from the specific to the general. Starting from a sample, we would like to say something true about the population. A typical strategy is to compute a statistic (a function of the sample) that says something about the probability distribution the sample was drawn from (a property of the population).
Suppose we have observed \(n\) samples \(y_{1}, \dots, y_{n}\). Two very useful statistics are the sample mean,
\[
\bar{y} = \frac{1}{n}\sum_{i = 1}^{n}y_i
\] and the sample standard deviation \[
S = \sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}\left(y_i - \bar{y}\right)^2}
\]
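When the data are at hand, both statistics are one-liners in R. Here is a minimal sketch (the sample itself is simulated, since no dataset is given at this point) computing them from the formulas and confirming they match the built-in `mean()` and `sd()`.

```r
set.seed(42)
y <- rnorm(25, mean = 10, sd = 3)   # a made-up sample
n <- length(y)

ybar <- sum(y) / n                          # sample mean
S    <- sqrt(sum((y - ybar)^2) / (n - 1))   # sample standard deviation

c(ybar, mean(y))   # identical
c(S, sd(y))        # identical: sd() also divides by n - 1
```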
- Statisticians have come up with a variety of properties that they would like their statistics to satisfy. Two common requirements are that the statistic be “unbiased” and “minimum variance.” Unbiased means it is centered around the correct value, on average. Minimum variance means it is not too far from the correct value, on average.
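To make “unbiased” concrete, the sketch below (a small simulation with arbitrary settings) draws many samples of size 20 from a distribution with known mean 5 and variance 25, and checks that the sample means are centered on the true mean, with spread roughly \(\sigma^2 / n\).

```r
set.seed(7)
true_mean <- 5
reps      <- 10000

# Sample mean from each of many repeated samples of size n = 20
ybars <- replicate(reps, mean(rexp(20, rate = 1 / true_mean)))

mean(ybars)   # close to 5: the sample mean is unbiased
var(ybars)    # roughly sigma^2 / n = 25 / 20 = 1.25
```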
Central limit theorem
- For a very wide class of distributions, an appropriately rescaled version of the sample mean converges in distribution to a normal distribution. Specifically, if all the \(y_i\) are drawn i.i.d. from some distribution with mean \(\mu\) and variance \(\sigma^2\), then
\[
\frac{\sqrt{n}\left(\bar{y} - \mu\right)}{\sigma} \to \mathcal{N}\left(0, 1\right).
\]
- This phenomenon is called the central limit theorem.
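A quick way to see the theorem in action is a simulation. The sketch below (using an Exp(1) distribution as an arbitrary, clearly non-normal starting point) draws many samples of size 50, standardizes each sample mean as above, and overlays the standard normal density on the resulting histogram.

```r
set.seed(123)
n     <- 50
mu    <- 1   # mean of Exp(1)
sigma <- 1   # standard deviation of Exp(1)

# Many standardized sample means: sqrt(n) * (ybar - mu) / sigma
z <- replicate(10000, sqrt(n) * (mean(rexp(n, rate = 1)) - mu) / sigma)

hist(z, breaks = 50, freq = FALSE,
     main = "Standardized sample means, Exp(1), n = 50")
curve(dnorm(x), add = TRUE, lwd = 2)   # standard normal density for comparison
```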