Giuseppe Bonaccorso

Artificial Intelligence – Machine Learning – Data Science

ML Algorithms addendum: Mutual information in classification tasks

08/18/2017 · Machine Learning · Machine Learning Algorithms Addenda · Python · Scikit-Learn

Many classification algorithms, in both machine learning and deep learning, adopt the cross-entropy as their cost function. This is a brief explanation of why minimizing the cross-entropy increases the mutual information between the training and learned distributions.

If we call p the training set probability distribution and q the corresponding learned one, the cross-entropy is:

$$H(p, q) = -\sum_x p(x) \log q(x)$$

By manipulating this expression (adding and subtracting $\sum_x p(x) \log p(x)$), we get:

$$H(p, q) = -\sum_x p(x) \log p(x) + \sum_x p(x) \log \frac{p(x)}{q(x)} = H(p) + D_{KL}(p \| q)$$

Therefore, the cross-entropy is equal to the sum of H(p), the entropy of the training distribution (which we can't control), and the Kullback-Leibler divergence of the learned distribution from the training one. As the first term is a constant, minimizing the cross-entropy is equivalent to minimizing the Kullback-Leibler divergence.
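A quick numerical check of this decomposition (a minimal sketch with made-up distributions; scipy.stats.entropy returns the Shannon entropy when given one distribution and the KL divergence when given two):

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical discrete distributions over the same 3-point support
p = np.array([0.1, 0.4, 0.5])   # training distribution p(x)
q = np.array([0.3, 0.3, 0.4])   # learned distribution q(x)

cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
h_p = entropy(p)                         # H(p), in nats
kl_pq = entropy(p, q)                    # D_KL(p || q)

# H(p, q) = H(p) + D_KL(p || q): both print ~1.0601
print(cross_entropy, h_p + kl_pq)
```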

We know that the Kullback-Leibler divergence is always non-negative:

$$D_{KL}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \geq 0$$

with equality if and only if p(x) = q(x) for all x.

Therefore, the training process will "remodel" q(x) in order to minimize its divergence from p(x). In the following figure, there's a schematic representation of this process before the first iteration, at iteration n, and at the end of the training process.
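As a toy sketch of this remodeling (a hypothetical example, assuming q(x) is a categorical distribution parameterized by softmax logits), plain gradient descent on the cross-entropy drives the divergence toward zero:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
p = np.array([0.1, 0.4, 0.5])   # target (training) distribution p(x)
z = rng.normal(size=3)          # logits parameterizing q(x) = softmax(z)

for step in range(201):
    q = softmax(z)
    # d H(p, q) / d z = q - p  (standard softmax cross-entropy gradient)
    z -= 0.5 * (q - p)
    if step % 50 == 0:
        print(f"step {step:3d}  D_KL(p||q) = {np.sum(p * np.log(p / q)):.6f}")
```

In this sketch the printed divergence shrinks steadily toward zero as q(x) is reshaped onto p(x), mirroring the schematic above.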

If we consider the mutual information between p(x) and q(x), we obtain:

$$I(p; q) = H(p) - H(p|q)$$

The mutual information is the amount of information shared by the two distributions, and it is expressed as the entropy of the training distribution minus the "unresolved" uncertainty that q leaves when chosen instead of p. In other words, if we have modeled q perfectly, then q = p, therefore H(p|q) = H(p|p) = 0 and I(p;q) = H(p) (which represents the maximum amount of information we can learn). Therefore, when we minimize the cross-entropy, we implicitly minimize the conditional entropy H(p|q), obtaining a maximization of the mutual information.
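As an illustrative experiment (not from the original post: sklearn's mutual_info_score estimates mutual information from the contingency table of true vs. predicted labels, a discrete proxy for the I(p;q) above), a model with lower cross-entropy also shares more information with the true labels:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, mutual_info_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (hypothetical data)
X, y = make_classification(n_samples=2000, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An uninformative baseline vs. a trained model
for model in (DummyClassifier(strategy="prior"), LogisticRegression(max_iter=1000)):
    model.fit(X_tr, y_tr)
    ce = log_loss(y_te, model.predict_proba(X_te))      # cross-entropy
    mi = mutual_info_score(y_te, model.predict(X_te))   # estimated mutual information
    print(f"{model.__class__.__name__:18s} cross-entropy={ce:.4f}  MI={mi:.4f}")
```

The baseline, which ignores the input, yields MI = 0, while the trained model pairs a much lower cross-entropy with a clearly positive mutual information.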


See also:

ML Algorithms Addendum: Fisher Information – Giuseppe Bonaccorso

Fisher Information, named after the statistician Ronald Fisher, is a very powerful tool in both Bayesian statistics and machine learning. To understand its "philosophical" meaning, it's useful to think about a simple classification task, where our goal is to find a model (characterized by a set of parameters) able to reproduce the data-generating process p(x) (normally represented as a dataset) with the highest possible accuracy.



Tags: book, information theory, machine learning
