Origins of Information Theory: Abundance of Entropies

  • Looking into the history of scientific inventions and the understanding gained from them
  • Entropy – Thermodynamic, Statistical, and Shannon.
  • Cross Entropy

Information Information Everywhere!

This part of the book starts off with “Men have been at odds concerning the value of history.” This is where I knew this book was something. Let’s dive into the content though!

Some believe that by studying scientific discovery in another day we can learn how to make discoveries. On the other hand, one sage observed that we learn nothing from history except that we never learn anything from history, and Henry Ford asserted that history is bunk.

John R Pierce

He then argues that many powerful discoveries of science have arisen not through the study of phenomena as they occur in nature, but through the study of phenomena in man-made devices. Our knowledge of aerodynamics and hydrodynamics exists chiefly because of airplanes, not because of birds.

Vision? The modern AI that tries to map synapses to mathematical neurons? I have so many questions! If anyone reading this can help me with this, I would really like to know what you think.

Shannon’s work originated in telegraphy.

Entropy. The mystery word that you will hear constantly in anything related to information theory. In fact, if you’ve worked with machine learning you must be really familiar with the term cross-entropy!
Though, we must all have wondered at some point: isn’t entropy used in thermodynamics?

Entropy as used in communication is different from that used in statistical mechanics or thermodynamics. The problems they are dealing with are quite different.

Entropy in Thermodynamics

Rudolf Clausius defined entropy for thermodynamics in the 1850s. The entropy of a gas is dependent on its temperature, volume, and mass. Entropy is an indicator of reversibility: when there is no change in entropy, the process is reversible.

S = K \cdot \log(W), Boltzmann’s formula for the entropy of a gas.

Irreversible physical phenomena always involve an increase of entropy. Interestingly, this was originally formulated by Ludwig Boltzmann between 1872 and 1875! Boltzmann’s formula is the most general formula for thermodynamic entropy. Does this also come into the discussion about theories which are generalised but abstract enough for wide applications? Keep the mathematical structure in mind (the log); we will come back to this soon.

Thus, an increase in entropy means a decrease in our ability to change thermal energy, the energy of heat, into mechanical energy. An increase of entropy means a decrease of available energy.

John R Pierce

Entropy in Statistical Mechanics

While thermodynamics gave us the concept of entropy, it does not give a detailed physical picture of entropy, in terms of positions and velocities of molecules, for instance. Statistical mechanics does give a detailed mechanical meaning to entropy in particular cases.

John R Pierce

Here we come to the understanding that an increase in entropy means a decrease in order, or an increase in uncertainty.

S = -K \cdot \sum_{i} p_{i} \log(p_{i}), where K is Boltzmann’s constant and p_{i} is the probability of state i being occupied.

Entropy in Communication Theory

Communication Theory Terms (we are going to generalise this later on)

  • Message source: a writer, a speaker, anything that produces messages.
  • Message: data like a text document, or anything you want to send to someone.

The amount of information conveyed by a message increases as the uncertainty about what message will actually be produced becomes larger. Let’s look at it. You have two friends. One (Person A) constantly keeps sending you memes about Counter-Strike, and the other (Person B) is a normal human being who has a wide range of topics to talk about. When you hear a beep on your phone and it’s from Person A, there is less uncertainty because you have some idea of what the message is going to be, so less entropy. Person B, on the other hand, can send you anything, so you don’t know what the message is going to be about: higher uncertainty and higher entropy.

Let’s say it’s Person A: you already expect a CS meme, and you open it to find it was indeed a CS meme. It was predictable, so you gained little information from opening it.
On the other hand, opening the message from Person B, which wasn’t predictable, gives you more information.
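The comparison above can be sketched numerically. This is a minimal Python sketch, with entirely hypothetical message distributions for the two friends, computing Shannon entropy in bits:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability events."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Person A: almost always a CS meme, so a very skewed distribution (assumed numbers)
person_a = [0.95, 0.05]           # [CS meme, anything else]

# Person B: talks about many topics with roughly equal chance (assumed numbers)
person_b = [0.25, 0.25, 0.25, 0.25]

print(entropy(person_a))  # low entropy: the message is predictable
print(entropy(person_b))  # 2.0 bits: maximum uncertainty over 4 equally likely topics
```

Person A’s skewed distribution yields much lower entropy than Person B’s uniform one, matching the intuition that the predictable friend’s messages carry less information.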

You can also look at the hangman example. You will learn more about which movie/book it is if you guess a rarer word in the name than something common like “of” or “the”.

Thus, high entropy means more information.

H(X) = \sum_{x} P(x) \log\left(\frac{1}{P(x)}\right)

H(X) = -\sum_{x} P(x) \log(P(x))

Notice how mathematically similar they are?
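The two forms of Shannon entropy really are the same thing, since log(1/p) = -log(p). A quick Python sketch (with an arbitrary example distribution) checks this numerically:

```python
import math

def h_inverse_form(p):
    # H(X) = sum_x P(x) * log2(1 / P(x))
    return sum(px * math.log2(1 / px) for px in p if px > 0)

def h_negative_form(p):
    # H(X) = -sum_x P(x) * log2(P(x))
    return -sum(px * math.log2(px) for px in p if px > 0)

# An arbitrary probability distribution for illustration
dist = [0.5, 0.25, 0.125, 0.125]

print(h_inverse_form(dist))   # 1.75 bits
print(h_negative_form(dist))  # 1.75 bits, identical
```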

Cross Entropy

In machine learning, cross-entropy is generally used as a loss function. It measures the difference between two probability distributions for a given random variable.

H(P, Q) = -\sum_{x} P(x) \log(Q(x))

Cross-entropy: the average number of bits needed to represent an event when we encode using Q instead of the true distribution P.

So when we use cross-entropy as a loss function:

Expected probability (y): the known probability of each class label for an example in the dataset (P).
Predicted probability (yhat): the probability of each class label for an example as predicted by the model (Q).

We can, therefore, estimate the cross-entropy for a single prediction using the cross-entropy calculation described above, where each x in X is a class label that could be assigned to the example, and P(x) will be 1 for the known label and 0 for all other labels.
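As a sketch of this loss in action, here is a minimal Python example with a hypothetical 3-class problem: the true label is one-hot, so the sum collapses to -log of the probability the model assigned to the correct class.

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) * log(Q(x)); P is the true distribution, Q the prediction."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical example: the true label is class 1, encoded one-hot
y = [0.0, 1.0, 0.0]

yhat_good = [0.1, 0.8, 0.1]   # confident and correct prediction
yhat_bad = [0.6, 0.2, 0.2]    # confident but wrong prediction

print(cross_entropy(y, yhat_good))  # -log(0.8), a small loss
print(cross_entropy(y, yhat_bad))   # -log(0.2), a much larger loss
```

A better prediction for the true class gives a lower loss, which is exactly why minimising cross-entropy pushes the model’s Q towards the data’s P.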

Associating Information with knowledge

Words, we humans constantly play with words. When information is associated with knowledge, there are some problems. We are desperate to associate this with statistical mechanics! But communication theory has its origins in electrical communication, not in statistical mechanics.


  • Even though some might believe that studying the history of scientific discoveries is redundant, many discoveries in science have occurred by observing phenomena in man-made devices.
  • Entropy as used in communication got its name from a mathematical analogy, and it is quite different from the entropy defined in statistical mechanics or thermodynamics. In fact, they are trying to solve different problems. While thermodynamic entropy deals with the reversibility of a process, its statistical mechanics counterpart deals with entropy in terms of the positions and velocities of molecules. Entropy in communication is about information: more entropy, more information.
  • Information here is slightly different from your traditional association with knowledge.
