ML Algorithms Addendum: Fisher Information

Fisher Information, named after the statistician Ronald Fisher, is a very powerful tool in both Bayesian statistics and machine learning. To understand its “philosophical” meaning, it’s useful to think about a simple classification problem, where the goal is to find a model (characterized by a set of parameters) that reproduces the data-generating process p(x) (normally represented as a dataset) with the highest possible accuracy. This condition can be expressed using the Kullback-Leibler divergence:

$$D_{KL}\big(p(x) \,\|\, p(x|M)\big) = \mathbb{E}_{x \sim p(x)}\big[\log p(x) - \log p(x|M)\big]$$

Finding the parameters that minimize the Kullback-Leibler divergence is equivalent to maximizing the log-likelihood log p(x|M), because p(x) doesn’t contain any reference to M.

Now, a good (informal) question is: how much information can p(x) provide to simplify the model’s task? In other words, given a log-likelihood, what is its behavior near the maximum likelihood estimate (MLE)? Considering the following two plots, we can observe two different situations. The first […]
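As a minimal numerical sketch of this idea (assuming a Bernoulli model, illustrative sample sizes, and a finite-difference second derivative; these choices are mine, not part of the discussion above), the snippet below fits the MLE and estimates the observed Fisher information as the negative curvature of the log-likelihood at that point: a flat peak (few samples) yields a small value, a sharp peak (many samples) a large one.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(theta, x):
    # Bernoulli log-likelihood: sum_i [x_i log(theta) + (1 - x_i) log(1 - theta)]
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

def observed_information(theta_hat, x, eps=1e-4):
    # Negative numerical second derivative of the log-likelihood at the MLE
    # (central finite difference): the observed Fisher information
    ll = lambda t: log_likelihood(t, x)
    return -(ll(theta_hat + eps) - 2.0 * ll(theta_hat) + ll(theta_hat - eps)) / eps ** 2

true_theta = 0.7
for n in (20, 2000):  # few samples -> flat peak, many samples -> sharp peak
    x = rng.binomial(1, true_theta, size=n)
    theta_hat = x.mean()  # closed-form MLE of the Bernoulli parameter
    info = observed_information(theta_hat, x)
    print(f"n={n:5d}  MLE={theta_hat:.3f}  observed information={info:.1f}")
```

For a Bernoulli model the per-sample Fisher information is 1/(θ(1−θ)), so the printed value grows roughly linearly with n: the more data, the sharper the log-likelihood around the MLE, which mirrors the two situations the plots are meant to contrast.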