On this page you can find notes, errata, and additional information for the book “Machine Learning Algorithms”. Related posts and notes can be found in the section: Machine Learning Algorithms Addenda.
The transformation matrix W for PCA must be transposed:
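Since the corrected formula appears only as an image in the book, here is a hedged reconstruction under the usual convention that the columns of W are the top-k eigenvectors of the covariance matrix (the book's exact notation may differ):

```latex
% Projection of a zero-centered sample x onto the principal subspace,
% with W \in \mathbb{R}^{m \times k} holding the top-k eigenvectors as columns:
z = W^{T} x
% or, for a data matrix X whose rows are samples:
Z = X W
```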
As explained in the previous chapters, it’s almost always good practice to normalize the dataset. In this way it becomes zero-centered, and in the linear expression it’s possible to avoid the bias term. Otherwise, it’s necessary to rewrite the expression as:
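The expression referenced here is shown only as an image in the book; a hedged reconstruction with the usual notation (w the weight vector, b the bias) is:

```latex
\bar{y} = w^{T} x + b
```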
Both w and b are parameters to learn.
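A minimal NumPy sketch (not the book’s code) of why zero-centering lets the bias be dropped: fitting the same noiseless linear model on centered data recovers the weights without any intercept term. All names and values here are illustrative.

```python
import numpy as np

# Hypothetical illustration: zero-centering removes the need for an
# explicit bias in the linear expression y = w^T x + b.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))   # non-centered features
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 3.0                                 # true bias b = 3.0

# Fit with an explicit bias column (the augmented form y = w^T x + b)
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
w_b, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Zero-center both X and y: the bias term becomes unnecessary
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w_c, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
```

After centering, `w_c` matches `w_true` directly, while the augmented fit has to learn the extra parameter `b` as well.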
The left part of the cross-entropy formula is wrong, because its arguments must be the two distributions. The correct one is:
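For reference, the standard definition with both distributions as arguments (p the true distribution, q the approximating one) is:

```latex
H(p, q) = -\sum_{x} p(x) \log q(x)
```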
Addendum: In “stochastic” gradient descent, the batch size is often set to 1, which means that a weight update is performed after every sample is presented. However, many papers and books apply the attribute “stochastic” to any mini-batch size.
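A minimal sketch (not the book’s code, names are illustrative) of the distinction: the same loop implements strict SGD when `batch_size=1` (one update per sample) and mini-batch SGD otherwise, which is what many texts still call “stochastic”.

```python
import numpy as np

def sgd_linear(X, y, lr=0.01, epochs=200, batch_size=1, seed=0):
    """Mini-batch SGD for linear regression with squared loss.

    batch_size=1 gives the strict 'stochastic' variant: one weight
    update per presented sample.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = X.shape[0]
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle samples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(batch)
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
w_true = np.array([2.0, -1.0])
y = X @ w_true

w1 = sgd_linear(X, y, batch_size=1)    # strict SGD: update after every sample
w16 = sgd_linear(X, y, batch_size=16)  # "stochastic" in the looser, mini-batch sense
```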
Pages 238 and 268:
The singular value decomposition is intended without the computation of the full matrices, and it is therefore limited to the top t singular values and vectors. The correct formula is:
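A hedged NumPy sketch of the truncated decomposition, M ≈ U_t Σ_t V_t^T, keeping only the top t singular triplets. For simplicity this computes the thin SVD and discards the tail; a dedicated solver (e.g. scipy’s `svds` or scikit-learn’s `TruncatedSVD`) would avoid computing the full decomposition at all.

```python
import numpy as np

# Illustrative example: rank-t truncated SVD, M ≈ U_t Σ_t V_t^T
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 5))
t = 2

U, s, Vt = np.linalg.svd(M, full_matrices=False)  # thin SVD
U_t, s_t, Vt_t = U[:, :t], s[:t], Vt[:t, :]       # keep the top-t triplets

M_t = U_t @ np.diag(s_t) @ Vt_t                   # best rank-t approximation
assert np.linalg.matrix_rank(M_t) == t
```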