# Compression

## Prefix code

Prefix property: there is no code word in the system that is a prefix (initial segment) of any other code word in the system.
Example: a code with code words {9, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does not, because "5" is a prefix of "59" and also of "55".

A prefix code is a type of code system in possession of the "prefix property".

A prefix code is a uniquely decodable code: a receiver can identify each word without requiring a special marker between words.
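A minimal sketch of this decoding, in Python, with an illustrative code table (the table and function name are assumptions for the example): greedy left-to-right matching recovers each word, because the prefix property guarantees the first match is the whole codeword.

```python
def decode(bits, table):
    """Decode a bit string with a prefix code; no word separators needed."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in table:  # prefix property: the first match is the whole word
            out.append(table[buf])
            buf = ""
    if buf:
        raise ValueError("input ended in the middle of a codeword")
    return "".join(out)

# Illustrative code table (assumed for this example)
table = {"0": "a", "10": "b", "110": "c", "111": "d"}
print(decode("010110111", table))  # -> abcd
```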

Prefix codes are not error-correcting codes.
In practice, a message might first be compressed with a prefix code, and then encoded again with channel coding (including error correction) before transmission.
Kraft's inequality characterizes the sets of code word lengths that are possible in a uniquely decodable code.

A suffix code is a set of words none of which is a suffix of any other.
A bifix code is a set of words which is both a prefix and a suffix code.

### Kraft's inequality

Kraft's inequality gives a necessary and sufficient condition for the existence of a prefix code for a given set of codeword lengths.

Definition. Given an alphabet $S$ of size $s$, there exists a prefix code with codewords $\sigma_1, \sigma_2, \ldots, \sigma_r$ of lengths $|\sigma_i| = n_i$, $\forall i$ if and only if $$\sum_{i=1}^r s^{-n_i} \le 1$$

Example of a prefix code over the alphabet $\{0,1\}$ with symbols $a, b, c, d$: $$\left\{ \begin{aligned} a & \rightarrow 0 \\ b & \rightarrow 10 \\ c & \rightarrow 110 \\ d & \rightarrow 111 \end{aligned} \right.$$ The codewords $0, 10, 110, 111$ satisfy $$2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} \leq 1$$ And for a non-prefix code: $$\left\{ \begin{aligned} a & \rightarrow 0 \\ b & \rightarrow 01 \\ c & \rightarrow 011 \\ d & \rightarrow 0111 \end{aligned} \right.$$ $$2^{-1} + 2^{-2} + 2^{-3} + 2^{-4} \leq 1$$ Note that the second code also satisfies the inequality: Kraft's inequality only guarantees that *some* prefix code with the given lengths exists, not that every code satisfying it is a prefix code.
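Both sums can be checked directly; `kraft_sum` is a hypothetical helper name, not a library function:

```python
def kraft_sum(lengths, s=2):
    """Sum s^(-n_i) over the codeword lengths (for checking Kraft's inequality)."""
    return sum(s ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # lengths of 0, 10, 110, 111  -> 1.0
print(kraft_sum([1, 2, 3, 4]))  # lengths of 0, 01, 011, 0111 -> 0.9375
```

Both codes pass the test, which illustrates that the inequality constrains lengths, not the actual codewords.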

Kraft's inequality limits the lengths of codewords in a prefix code: if one takes two to the power of the negative length of each valid codeword, the resulting set of values must look like a probability mass function, that is, it must have total measure less than or equal to one. Viewing the codewords as leaves of a binary code tree: $$\sum_{l \in \text{leaves}} 2^{-\text{depth}(l)} \le 1$$

Kraft's inequality can be thought of in terms of a constrained budget to be spent on codewords, with shorter codewords being more expensive.

• If Kraft's inequality holds with strict inequality, the code has some redundancy.
• If Kraft's inequality holds with equality, the code in question is a complete code.
• If Kraft's inequality does not hold, the code is not uniquely decodable.
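The sufficiency direction is constructive: given lengths that satisfy the inequality, a prefix code can be built by assigning codewords in increasing order of length. A sketch of this canonical construction (the function name is my own):

```python
def prefix_code_from_lengths(lengths):
    """Build binary codewords for lengths satisfying Kraft's inequality.

    Assign codewords in increasing order of length; after each assignment,
    increment the code value so no later codeword can extend an earlier one.
    """
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft's inequality violated"
    code, prev_len, words = 0, 0, []
    for l in sorted(lengths):
        code <<= l - prev_len          # extend to the new length with zeros
        words.append(format(code, f"0{l}b"))
        code += 1                      # skip everything below this subtree
        prev_len = l
    return words

print(prefix_code_from_lengths([1, 2, 3, 3]))  # -> ['0', '10', '110', '111']
```

Applied to the lengths 1, 2, 3, 3 from the example above, this reproduces the prefix code {0, 10, 110, 111}.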

### Forward error correction

Forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels. The central idea is that the sender encodes the message in a redundant way using an error-correcting code (ECC).
The first error-correcting code, the Hamming (7,4) code, was introduced in 1950.
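A sketch of how Hamming (7,4) adds redundancy, assuming the standard bit layout p1 p2 d1 p3 d2 d3 d4 (function names are mine): three parity bits cover overlapping subsets of the four data bits, so the XOR of the positions of set bits (the syndrome) points directly at a single flipped bit.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a Hamming(7,4) codeword: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Correct up to one flipped bit: the syndrome is the 1-based error position."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1
    return c

cw = hamming74_encode([1, 0, 1, 1])  # -> [0, 1, 1, 0, 0, 1, 1]
cw[4] ^= 1                           # flip one bit "in transit"
print(hamming74_correct(cw))         # -> [0, 1, 1, 0, 0, 1, 1]
```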

## Huffman coding

Huffman coding is a particular type of optimal prefix code that is commonly used for lossless data compression.

• The algorithm derives a variable-length code table from the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol.
• As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols.
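The two points above can be sketched with a heap-based merge (a minimal sketch in Python; representing each subtree as a dict of partial codes is a simplification of the usual node tree):

```python
import heapq

def huffman_code(weights):
    """Build a code table {symbol: bitstring} from {symbol: weight}.

    Repeatedly merge the two lowest-weight subtrees; each merge prepends
    0 to codes in one subtree and 1 to codes in the other.
    """
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)  # keeps tuple comparison away from the dicts
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_code({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
# the most frequent symbol gets the shortest code: len(codes["a"]) == 1
```

The exact codewords depend on tie-breaking, but the code lengths are optimal and the table always satisfies Kraft's inequality with equality (it is a complete code).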

The algorithm was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".