ML Algorithms addendum: Mutual information in classification tasks

Many classification algorithms, in both classical machine learning and deep learning, use the cross-entropy as their cost function. This is a brief explanation of why minimizing the cross-entropy increases the mutual information between the training and learned distributions. If we call p the training set probability distribution and q the corresponding learned…
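
As a small numerical illustration of the role played by p and q, the sketch below uses the standard identity H(p, q) = H(p) + D_KL(p || q) for discrete distributions. Because H(p) is fixed by the training data, lowering the cross-entropy can only come from moving q closer to p. The three-class distributions and the function names are illustrative assumptions, not values or code from the original text.

```python
import math

def entropy(p):
    # Shannon entropy H(p) in nats; zero-probability terms contribute nothing.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # Cross-entropy H(p, q) = -sum_i p_i * log(q_i), in nats.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # Kullback-Leibler divergence D_KL(p || q), in nats.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over three classes (assumed for illustration).
p = [0.7, 0.2, 0.1]          # training distribution
q_far = [0.4, 0.3, 0.3]      # a poor learned distribution
q_near = [0.65, 0.25, 0.10]  # a learned distribution closer to p

for name, q in (("q_far", q_far), ("q_near", q_near)):
    h_pq = cross_entropy(p, q)
    gap = kl_divergence(p, q)
    # The decomposition H(p, q) = H(p) + D_KL(p || q) holds in both cases.
    assert math.isclose(h_pq, entropy(p) + gap)
    print(f"{name}: H(p, q) = {h_pq:.4f}, D_KL(p || q) = {gap:.4f}")
```

In this sketch the cross-entropy is smallest when q equals p, where it reduces to H(p) and the divergence term vanishes.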