
# Probability Theory

## Set Theory

One of the main objectives of a statistician is to draw conclusions about a

# Moments

## Variance

Suppose a variate $X$ has a distribution $P(x)$ with population mean $\mu$ and population variance $\text{var}(X)$ (also written as $\sigma^2$). Then $$\sigma^2 \equiv E\left[(X-\mu)^2\right] = \int P(x)(x-\mu)^2 \, dx$$ where $E(X)$ denotes the expectation value of $X$. The variance is therefore equal to the second central moment $\mu_2$.

Sample variance, where $\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i$ is the sample mean: $$s_N^2 \equiv \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2$$

Unbiased sample variance: $$s_{N-1}^2 \equiv \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2$$

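The reason that $s_N^2$ gives a biased estimator of the population variance is that the deviations are measured from the sample mean $\bar{x}$, which is itself estimated from the same data, rather than from the true mean $\mu$. For independent samples this gives $E(s_N^2) = \frac{N-1}{N}\sigma^2$, which underestimates $\sigma^2$; dividing by $N-1$ instead of $N$ removes the bias.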
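The size of the bias can be worked out in a few lines. The following is a sketch of the standard argument, assuming the $x_i$ are independent draws with mean $\mu$ and variance $\sigma^2$:

$$\sum_{i=1}^N (x_i - \bar{x})^2 = \sum_{i=1}^N (x_i - \mu)^2 - N(\bar{x} - \mu)^2$$

$$E\left[\sum_{i=1}^N (x_i - \mu)^2\right] = N\sigma^2, \qquad E\left[N(\bar{x} - \mu)^2\right] = N \cdot \frac{\sigma^2}{N} = \sigma^2$$

$$E(s_N^2) = \frac{1}{N}\left(N\sigma^2 - \sigma^2\right) = \frac{N-1}{N}\sigma^2, \qquad E(s_{N-1}^2) = \sigma^2$$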

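When $\sigma^2$ is unknown and is replaced by its sample estimate, the standardized sample mean $(\bar{x}-\mu)/(s_{N-1}/\sqrt{N})$ of normally distributed data follows Student's t-distribution with $N-1$ degrees of freedom. In this sense Student's t-distribution is the "best" that can be done without knowing $\sigma^2$.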

For samples drawn from a normal distribution, the quantity $Ns_N^2/\sigma^2$ has a chi-squared distribution with $N-1$ degrees of freedom.
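These statements can be checked numerically. The sketch below is only an illustration: the function name `simulate_variance_estimators`, the sample size, the seed, and the choice of a normal distribution are all assumptions made for the example. It averages both estimators over many simulated samples and also averages $Ns_N^2/\sigma^2$, whose mean should be close to $N-1$ if it is chi-squared with $N-1$ degrees of freedom.

```python
import random


def simulate_variance_estimators(n=5, mu=0.0, sigma=2.0, trials=20000, seed=0):
    """Average s_N^2, s_{N-1}^2 and N*s_N^2/sigma^2 over many normal samples.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    sum_biased = 0.0
    sum_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        # Sum of squared deviations from the sample mean.
        ss = sum((x - x_bar) ** 2 for x in xs)
        sum_biased += ss / n          # s_N^2
        sum_unbiased += ss / (n - 1)  # s_{N-1}^2
    mean_biased = sum_biased / trials
    mean_unbiased = sum_unbiased / trials
    # Mean of N*s_N^2/sigma^2; a chi-squared variable with N-1
    # degrees of freedom has mean N-1.
    mean_scaled = n * mean_biased / sigma ** 2
    return mean_biased, mean_unbiased, mean_scaled


if __name__ == "__main__":
    mean_biased, mean_unbiased, mean_scaled = simulate_variance_estimators()
    print("mean of s_N^2:          ", mean_biased)
    print("mean of s_{N-1}^2:      ", mean_unbiased)
    print("mean of N*s_N^2/sigma^2:", mean_scaled)
```

With $N = 5$ and $\sigma^2 = 4$, the first average should settle near $\frac{N-1}{N}\sigma^2 = 3.2$, the second near $4$, and the third near $N - 1 = 4$, up to simulation noise.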

# Distribution comparison

## Q-Q plot

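See http://en.wikipedia.org/wiki/Q%E2%80%93Q_plot. "Q" stands for quantile. A Q-Q plot is a graphical tool for comparing two distributions by plotting their quantiles against each other. If the two distributions are the same, the points lie approximately on the line $y = x$; if they differ only in location and scale, the points lie on some other straight line. Departures from a straight line show where the two distributions differ in shape.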
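As a rough illustration of how the points of a Q-Q plot are obtained, the sketch below pairs up matching quantiles of two samples using only the standard library. The helper name `qq_points`, the number of quantiles, and the synthetic data are assumptions for the example; a plotting library would normally be used to draw the resulting pairs.

```python
import random
import statistics


def qq_points(sample_a, sample_b, n_quantiles=20):
    """Return (quantile of a, quantile of b) pairs for a Q-Q plot.

    statistics.quantiles returns n_quantiles - 1 cut points per sample.
    """
    qa = statistics.quantiles(sample_a, n=n_quantiles)
    qb = statistics.quantiles(sample_b, n=n_quantiles)
    return list(zip(qa, qb))


# Synthetic data: b is a shifted and rescaled copy of a, so the two
# distributions differ only in location and scale.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(500)]
b = [2.0 * x + 3.0 for x in a]

points = qq_points(a, b)

if __name__ == "__main__":
    for x, y in points:
        print(f"{x:8.3f} {y:8.3f}")
```

Because `b` is an affine transformation of `a`, every pair falls on the straight line $y = 2x + 3$ rather than on $y = x$, which is the linear pattern expected when two distributions have the same shape but different location and scale.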

# Hidden Markov model

• A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
• An HMM can be represented as the simplest dynamic Bayesian network.
• The mathematics behind the HMM was developed by L. E. Baum and coworkers.
• A hidden Markov model can be considered a generalization of a mixture model in which the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than being independent of each other.

Notational conventions
$T$ = length of the sequence of observations (training set)
$N$ = number of states (we either know or guess this number)
$M$ = number of possible observations (from the training set)
$\Omega_X = \{q_1,\dots,q_N\}$ (finite set of possible states)
$\Omega_O = \{v_1,\dots,v_M\}$ (finite set of possible observations)
$X_t$ random variable denoting the state at time $t$ (state variable)
$O_t$ random variable denoting the observation at time $t$ (output variable)
$\sigma = o_1,\dots,o_T$ (sequence of actual observations)

Distributional parameters
$A = \{a_{ij}\} \text{ s.t. } a_{ij} = \Pr(X_{t+1} = q_j \mid X_t = q_i)$ (transition probabilities)
$B = \{b_i\} \text{ s.t. } b_i(k) = \Pr(O_t = v_k \mid X_t = q_i)$ (observation probabilities)
$\pi = \{\pi_i\} \text{ s.t. } \pi_i = \Pr(X_0 = q_i)$ (initial state distribution)
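
To make the roles of the three parameter sets concrete, here is a minimal sketch of a two-state, three-symbol HMM. The numerical values, the zero-based indexing, and the function names `sample_hmm` and `joint_probability` are all assumptions chosen for illustration. The sketch also assumes that the initial state emits the first observation, which is one common convention and not something the notation above fixes.

```python
import random

# Hypothetical parameters (N = 2 states, M = 3 observation symbols).
pi = [0.6, 0.4]                # initial state distribution
A = [[0.7, 0.3],               # A[i][j] = Pr(next state j | current state i)
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],          # B[i][k] = Pr(observation k | state i)
     [0.1, 0.3, 0.6]]


def sample_hmm(T, rng):
    """Draw a hidden state path and an observation sequence of length T."""
    states, obs = [], []
    x = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(T):
        states.append(x)
        # Emit an observation from the current state's row of B.
        obs.append(rng.choices(range(len(B[x])), weights=B[x])[0])
        # Move to the next state using the current state's row of A.
        x = rng.choices(range(len(A[x])), weights=A[x])[0]
    return states, obs


def joint_probability(states, obs):
    """Probability of a given state path together with its observations."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


if __name__ == "__main__":
    path, observed = sample_hmm(5, random.Random(0))
    print("hidden states:", path)
    print("observations: ", observed)
    print("joint probability:", joint_probability(path, observed))
```

In practice only the observations would be visible; the state path is returned here purely so that the generative process can be inspected.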