# Information Theory and Source coding

I was going through my dropbox and found some of my old notes on information theory, till this day I think I still love these notes. 🙂

## I. Introduction to information theory,

Information theory: Fundamental Limits of communication Coding Techniques: Concerned with practical techniques to realise the limits specified by information theory
Uncertainty:“More the uncertainty of the event is, more is the information content in the occurrence of the event” Mathematical measure of information should be function of probability of outcome and should satisfy the following:

The unit of I(x) is the bit (binary unit) if b = 2,Hartley or decit

if b = 10,nat (natural unit) if b = e.

## II. Entropy and its properties,

Entropy: The mean value of $latexI(x_i)$ over the alphabet of source X with m different symbols is entropy
X is an ensemble that contains {x (Random Variable), Ax (Outcome Alphabets, Px(Probabilities) }

Entropy of the source is maximum when all the messages from the sources are equally likely.
Information Rate: If the time rate at which source X emits symbols is r (symbols/s), the information rate R of thesource is given by $R = r \times H(x_i) b/s$

Mutual Information
Mutual information the uncertainty about channel input that is resolved after observing the channel output. $I(x;y) = H(x) - H(x|y)$

Properties of Mutual Information

## III. Source coding theorem

Conversation of the output of DMS into a code with the objective to minimise the average bit rate required for representation of source by reducing the redundancy of the information source to improve efficiency.

Encode DMS Minimise average bit rate Reduce Redundancy  Improve efficiency

Average Code Length $L = \sum_{i}^{m} P(x_i)n_i$

Code Efficiency $\eta = \frac{L_(min)}{L}$

*Modification due to Shannon’s first theorem.

Code redundancy $\gamma = 1 - \eta$

Source coding theorems

Shannon’s first theorem: For a DMS $L \ge H(x)$ and if $L_(min) = H(x)$ then we can rewrite the efficiency as $\eta = \frac{H(x)}{L}$

Wiki: In information theory, Shannon’s source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy.

The source coding theorem shows that it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.

Krafts inequality: For a DMS the instantaneous code exists only if

• $K \le 1$
• $K = \sum_{i}^{m} 2^{-n_i}$

## IV Shannon-Fano Coding

Wikipedia: Shannon–Fano coding is used in the IMPLODE compression method, which is part of the ZIP file format. A Shannon–Fano tree is built according to a specification designed to define an effective code table.

Shannon-Fano algorithm:

List the source symbols in order of decreasing probability. Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set. Continue this process, each time partitioning the sets with as nearly equal probabilities as possible until further partitioning is not possible.

Wikipedia: In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the lowest possible expected code word length like Huffman coding

Wikipedia: “A Mathematical Theory of Communication” is an article by mathematician Claude E. Shannon published in Bell System Technical Journal in 1948. It was renamed The Mathematical Theory of Communication in the book of the same name, a small but significant title change after realizing the generality of this work.

## V. Huffman Coding

Huffman coding is used in JPEG

It is performed in two steps –(1)Merging:

(2) Splitting:

parts of codeword.

## VI Lempel-Ziv Coding

“Universal code” –works without knowledge of source statistics.

Parse input file into unique phrases.

Encode phrases using fixed length codeword, variable to fixed length encoding (suitable for synchronous transmission.)

Decoder can uniquely decode the send sequence.

Code rate approaches the source entropy for large sequence

Algorithm inefficient for short sequences.

Used : encoding binary/text files, compress/uncompress software for PCs.

Applications

## VII. Run Length Coding

Run: a sequence of identical characters. data has many runs.

AAAABBBAABBBBBCCCCCCCCDABCB
4A3BAA5B8CDABCB3A4B3CD

References
Textbooks:

## Classification of Codes

< p class=”has-medium-font-size”>q. Identify is the code is prefix-free or uniquely decodable code.

Uniquely decodable if every finite vector x ∈ X k maps to different code vectors, i.e. the viewed as vectors the code is non-singular. Such code is always decodable in a unique way. It might require to see the complete vector before the individual codewords can be decoded.

Prefix-free or instantaneous if no codeword is a prefix of another codeword. In such code each codeword can be decoded as soon as it is received.1