# Probability Theory

## Set Theory

One of the main objectives of a statistician is to draw conclusions about a population on the basis of the information contained in a sample taken from that population.

Suppose a variate $X$ has a distribution $P(x)$ with *population mean* $\mu$ and *population variance* $\text{var}(X)$ (also written as $\sigma^2$):
$$ \sigma^2 \equiv E(X-\mu)^2 = \int P(x)(x-\mu)^2 dx$$
$E(X)$ denotes the expectation value of $X$.
The variance is therefore equal to the second central moment $\mu_2$.

*Sample variance*:
$$ s_N^2 \equiv \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2 $$
*Unbiased Sample variance*:
$$ s_{N-1}^2 \equiv \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2 $$
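The two estimators above differ only in the divisor, so they can be computed from the same sum of squared deviations. A minimal sketch in Python (the data values are made up for illustration):

```python
import statistics

def sample_variances(xs):
    """Return (s_N^2, s_{N-1}^2): the biased and unbiased sample variances."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    return ss / n, ss / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up sample
s_n2, s_nm1_2 = sample_variances(data)
print(s_n2, s_nm1_2)   # s_N^2 < s_{N-1}^2, since N/(N-1) > 1
```

The standard library exposes the same pair as `statistics.pvariance` (divisor $N$) and `statistics.variance` (divisor $N-1$).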

The reason that $s_N^2$ gives a biased estimator of the population variance is that the unknown mean $\mu$ must itself be estimated from the same data: replacing $\mu$ by the sample mean $\bar{x}$ makes the squared deviations systematically too small, so one degree of freedom is used up by the estimate of the mean.
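The size of the bias can be made explicit (a standard computation for independent, identically distributed $x_i$). Using the identity $\sum_i (x_i-\bar{x})^2 = \sum_i (x_i-\mu)^2 - N(\bar{x}-\mu)^2$ and $E(\bar{x}-\mu)^2 = \sigma^2/N$,
$$ E(s_N^2) = \frac{1}{N}\sum_{i=1}^N E(x_i-\mu)^2 - E(\bar{x}-\mu)^2 = \sigma^2 - \frac{\sigma^2}{N} = \frac{N-1}{N}\,\sigma^2, $$
so dividing by $N-1$ instead of $N$ exactly removes the bias.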

Student's $t$-distribution is the "best" that can be done without knowing $\sigma^2$: for normally distributed data, the statistic $(\bar{x}-\mu)/(s_{N-1}/\sqrt{N})$ follows a $t$-distribution with $N-1$ degrees of freedom.

For normally distributed data, the quantity $Ns_N^2/\sigma^2$ has a chi-squared distribution with $N-1$ degrees of freedom.
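A quick Monte Carlo check of this claim (sample size, $\sigma$, and trial count are made up): since a chi-squared distribution with $N-1$ degrees of freedom has mean $N-1$, the average of $Ns_N^2/\sigma^2$ over many normal samples should be close to $N-1$.

```python
import random

random.seed(0)
N = 5          # sample size (made-up)
sigma = 2.0    # true population standard deviation (assumed known here)
trials = 20000

total = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(N)]
    xbar = sum(xs) / N
    s_n2 = sum((x - xbar) ** 2 for x in xs) / N   # biased sample variance
    total += N * s_n2 / sigma ** 2                # the chi-squared quantity

mean_q = total / trials
print(mean_q)   # should be close to N - 1 = 4
```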

http://en.wikipedia.org/wiki/Q%E2%80%93Q_plot. "Q" stands for quantile. A Q–Q plot is a graphical tool for comparing two distributions: the quantiles of one are plotted against the quantiles of the other. If the points fall close to a straight line, the two distributions belong to the same family, and systematic departures from linearity show where and how they differ.
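The idea can be sketched numerically without a plotting library (all sample parameters below are made up): two normal samples that differ only in location and scale give paired quantiles lying on a line whose slope and intercept recover that scale and shift.

```python
import random
import statistics

random.seed(1)
a = [random.gauss(0.0, 1.0) for _ in range(1000)]   # standard normal
b = [random.gauss(5.0, 2.0) for _ in range(1000)]   # same shape, shifted and rescaled

# Paired quantiles (19 cut points each): these are the coordinates a Q-Q plot
# would draw. For same-family distributions the points fall on a straight
# line; its slope and intercept reflect the difference in scale and location.
qa = statistics.quantiles(a, n=20)
qb = statistics.quantiles(b, n=20)

xm, ym = statistics.fmean(qa), statistics.fmean(qb)
slope = sum((x - xm) * (y - ym) for x, y in zip(qa, qb)) / sum((x - xm) ** 2 for x in qa)
intercept = ym - slope * xm
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

Here the fitted line should have slope near 2 and intercept near 5, matching the scale and shift used to generate `b`.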

- A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
- An HMM can be represented as the simplest kind of dynamic Bayesian network.
- The mathematics behind the HMM was developed by L. E. Baum and coworkers.
- A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than independent of each other.

**Notational conventions**

$T$ = length of the sequence of observations (training set)

$N$ = number of states (we either know or guess this number)

$M$ = number of possible observations (from the training set)

$\Omega_X = \{q_1, \dots, q_N\}$ (finite set of possible states)

$\Omega_O = \{v_1, \dots, v_M\}$ (finite set of possible observations)

$X_t$ = random variable denoting the state at time $t$ (state variable)

$O_t$ = random variable denoting the observation at time $t$ (output variable)

$\sigma = o_1, \dots, o_T$ (sequence of actual observations)

**Distributional parameters**

$A = \{a_{ij}\}$ s.t. $a_{ij} = \Pr(X_{t+1} = q_j \mid X_t = q_i)$ (transition probabilities)

$B = \{b_i\}$ s.t. $b_i(k) = \Pr(O_t = v_k \mid X_t = q_i)$ (observation probabilities)

$\pi = \{\pi_i\}$ s.t. $\pi_i = \Pr(X_0 = q_i)$ (initial state distribution)
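The parameters $A$, $B$, and $\pi$ are all that is needed to compute $\Pr(\sigma) = \Pr(o_1, \dots, o_T)$ via the forward algorithm. A minimal sketch with $N = 2$ states and $M = 2$ possible observations (all probability values are made up for illustration):

```python
# Toy HMM in the notation above: N = 2 states, M = 2 possible observations.
A  = [[0.7, 0.3],   # a_ij = Pr(X_{t+1} = q_j | X_t = q_i)
      [0.4, 0.6]]
B  = [[0.9, 0.1],   # b_i(k) = Pr(O_t = v_k | X_t = q_i)
      [0.2, 0.8]]
pi = [0.5, 0.5]     # pi_i = Pr(X_0 = q_i)

def forward(obs):
    """Pr(o_1, ..., o_T) via the forward algorithm: alpha[i] is the joint
    probability of the observations so far and being in state q_i."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(A))) * B[j][o]
                 for j in range(len(A))]
    return sum(alpha)

print(forward([0, 0, 1]))
```

Summing the state variable out at every step keeps the cost at $O(TN^2)$ instead of enumerating all $N^T$ hidden state sequences.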