Information Theory and Source coding

I was going through my dropbox and found some of my old notes on information theory, till this day I think I still love these notes. 🙂

I. Introduction to information theory,

Information theory: Fundamental Limits of communication Coding Techniques: Concerned with practical techniques to realise the limits specified by information theory 
Uncertainty:“More the uncertainty of the event is, more is the information content in the occurrence of the event” Mathematical measure of information should be function of probability of outcome and should satisfy the following: 

Information should be proportional to the outcome Information contained in the independent outcome should add. 

The unit of I(x) is the bit (binary unit) if b = 2,Hartley or decit

if b = 10,nat (natural unit) if b = e.

Thus when rarer events happen you get more information.  

II. Entropy and its properties,

Entropy: The mean value of $latexI(x_i)$ over the alphabet of source X with m different symbols is entropy
X is an ensemble that contains {x (Random Variable), Ax (Outcome Alphabets, Px(Probabilities) }

Entropy of the source is maximum when all the messages from the sources are equally likely.
Information Rate: If the time rate at which source X emits symbols is r (symbols/s), the information rate R of thesource is given by
R = r \times H(x_i) b/s

Mutual Information
Mutual information the uncertainty about channel input that is resolved after observing the channel output. 
I(x;y) = H(x) - H(x|y)

Properties of Mutual Information 

I(x;y) = I(y;x) I(x;y) = H(y)-H(y|x) I(x;y) \ge 0 I(x;y) = H(x) + H(y) - H(x,y)

III. Source coding theorem

Conversation of the output of DMS into a code with the objective to minimise the average bit rate required for representation of source by reducing the redundancy of the information source to improve efficiency. 

Encode DMS Minimise average bit rate Reduce Redundancy  Improve efficiency 

Average Code Length

L = \sum_{i}^{m} P(x_i)n_i

Code Efficiency 

\eta = \frac{L_(min)}{L}

*Modification due to Shannon’s first theorem. 

Code redundancy 

\gamma = 1 - \eta

Source coding theorems 

Shannon’s first theorem: For a DMS L \ge H(x) and if L_(min) = H(x) then we can rewrite the efficiency as

\eta = \frac{H(x)}{L}

Wiki: In information theory, Shannon’s source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy.

The source coding theorem shows that it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.

Krafts inequality: For a DMS the instantaneous code exists only if 

  • K \le 1
  • K = \sum_{i}^{m} 2^{-n_i}

IV Shannon-Fano Coding

Wikipedia: Shannon–Fano coding is used in the IMPLODE compression method, which is part of the ZIP file format.[2] A Shannon–Fano tree is built according to a specification designed to define an effective code table.

Shannon-Fano algorithm:

List the source symbols in order of decreasing probability. Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set. Continue this process, each time partitioning the sets with as nearly equal probabilities as possible until further partitioning is not possible.

Wikipedia: In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the lowest possible expected code word length like Huffman coding

Wikipedia: “A Mathematical Theory of Communication” is an article by mathematician Claude E. Shannon published in Bell System Technical Journal in 1948.[1][2][3][4] It was renamed The Mathematical Theory of Communication in the book of the same name,[5] a small but significant title change after realizing the generality of this work.

V. Huffman Coding 

Huffman coding is used in JPEG 
Advantages:1.  Easy to implement 

Lossless technique  Produces optimal and compact code. 


Relatively slow  Depends of the statistical information of the data Decoding becomes difficult because of different code lengths.  Overhead due to Huffman tree. 

It is performed in two steps –(1)Merging:

Repeat arranging the symbols in descending order of probability merging the least probable two entries into a new entry with a probability equal to the sum of their probabilities in the next column, until only two entries remain.

(2) Splitting:

 Assign 0/1 to each of the two remaining entries to make their initial

parts of codeword.

 Then repeat splitting back the merged entries into the two (previous) entries and appending another 0/1 to each of their (unfinished) codeword’s.

VI Lempel-Ziv Coding 

“Universal code” –works without knowledge of source statistics.

Parse input file into unique phrases.

Encode phrases using fixed length codeword, variable to fixed length encoding (suitable for synchronous transmission.)

Decoder can uniquely decode the send sequence.

Code rate approaches the source entropy for large sequence

Algorithm inefficient for short sequences.

Used : encoding binary/text files, compress/uncompress software for PCs.


The CCITT Recommendation V.42 bis is a data compression standard used in modems that connect computers with remote users via the GSTN. n image compression, Specifically, it is utilised in the graphic interchange format (GIF) which was created to encode graphical images. the Unix Compress command.

VII. Run Length Coding

Run: a sequence of identical characters. data has many runs.

Each character that repeats itself in a sequence three or more times is replaced by the single character and its frequency. Data to be transmitted is first scanned to find the coding information before transmission. For example, the message

Saving can be dramatic when long runs are present. A problem arises if one of the characters being transmitted is a digit, as in  Solution: instead of using a number n, a character can be used whose ASCII value is n. This method is only modestly efficient for text files and widely used for encoding fax images. The main advantage of this algorithm is simplicity and speed.

Discrete memoryless channel 

Binary Symmetric Channel 


Lossless Channel 

Deterministic Channel 

Noiseless Channel

Classification of Codes 

< p class=”has-medium-font-size”>q. Identify is the code is prefix-free or uniquely decodable code. 

Uniquely decodable if every finite vector x ∈ X k maps to different code vectors, i.e. the viewed as vectors the code is non-singular. Such code is always decodable in a unique way. It might require to see the complete vector before the individual codewords can be decoded.

Prefix-free or instantaneous if no codeword is a prefix of another codeword. In such code each codeword can be decoded as soon as it is received.1

Leave a Reply

%d bloggers like this: