Machine Learning Algorithms: Errata Corrige and Additional Notes

On this page, you can find notes, errata corrige, and additional information for the book "Machine Learning Algorithms".


Page 59:

The PCA transformation matrix W must be transposed:
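
The corrected expression is not reproduced on this page. As a minimal sketch, assuming the columns of W are the eigenvectors of the covariance matrix and each sample x is a column vector, the projection would read y = Wᵀx. The toy dataset and all variable names below are illustrative assumptions, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (assumption): 200 samples as rows, 3 correlated features as columns.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)  # zero-center the data

# Eigendecomposition of the covariance matrix (eigh returns ascending eigenvalues).
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order]  # columns of W are the principal directions

# Column-vector convention: a single sample is projected with the TRANSPOSED matrix.
x = Xc[0]
y = W.T @ x

# Row-sample convention: the same projection applied to the whole dataset.
Y = Xc @ W

# After the projection the covariance matrix is diagonal (decorrelated components).
C_proj = np.cov(Y, rowvar=False)
```

With samples stored as rows, `Xc @ W` is the same operation as applying `W.T` to each sample, which is why both conventions appear in practice.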


Page 97:

As explained in the previous chapters, it is almost always good practice to normalize the dataset. The dataset then becomes zero-centered, and the bias can be omitted from the linear expression. Otherwise, the expression must be rewritten as:
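
The expression itself is not reproduced on this page. Assuming the usual notation, with w the weight vector, x the input sample, b the bias, and z a hypothetical name for the output of the linear expression, it would take the form:

```latex
z = w^{T} x + b
```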

Both w and b are parameters to learn.


Page 100:

The left-hand side of the cross-entropy formula is wrong, because cross-entropy takes two distributions as its arguments. The correct formula is:
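
The formula is not reproduced on this page. Under the standard definition for discrete distributions p and q, the cross-entropy is H(p, q) = -Σ p(x) log q(x), which makes explicit that two distributions are involved. The following is a minimal sketch of that definition; the function name and the example distributions are assumptions chosen for illustration.

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_x p(x) * log q(x), in nats.

    Terms with p(x) == 0 contribute nothing and are skipped.
    Assumes q(x) > 0 wherever p(x) > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

# Two example distributions over the same three outcomes (illustrative values).
p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.5, 0.25])

h_pq = cross_entropy(p, q)  # cross-entropy between p and q
h_pp = cross_entropy(p, p)  # reduces to the entropy of p
```

When both arguments are the same distribution, the cross-entropy reduces to the entropy of that distribution, and it is never smaller than that value for any other q.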


Page 103:

Addendum: In “stochastic” gradient descent, the batch size is often set to 1, meaning that a weight update is performed after each sample is presented. However, many papers and books apply the term “stochastic” to any mini-batch size.
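
The difference between the two usages comes down to how many weight updates are performed per epoch. The sketch below assumes a linear model trained with a mean-squared-error loss, which is only one possible setting; the function `sgd_linear` and its parameters are hypothetical names introduced for illustration.

```python
import numpy as np

def sgd_linear(X, y, batch_size=1, lr=0.01, epochs=1, seed=0):
    """Gradient descent on a linear model with MSE loss (illustrative assumption).

    Returns the learned weights and the number of weight updates performed.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    n_updates = 0
    for _ in range(epochs):
        idx = rng.permutation(n_samples)  # shuffle the samples every epoch
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the mean squared error over the current batch.
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(batch)
            w -= lr * grad
            n_updates += 1
    return w, n_updates

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -2.0])

# batch_size=1: one update per sample ("stochastic" in the strict sense).
_, updates_online = sgd_linear(X, y, batch_size=1)

# batch_size=20: one update per mini-batch (also often called "stochastic").
_, updates_minibatch = sgd_linear(X, y, batch_size=20)
```

With 100 samples, a batch size of 1 yields 100 updates per epoch, while a batch size of 20 yields 5.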


Pages 238 and 268:

The singular value decomposition is meant in its truncated form: the full matrices are not computed, and the decomposition is therefore limited to the t principal singular values and vectors. The correct formula is:
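
The formula is not reproduced on this page. In common notation, a truncated decomposition of a matrix M is written M ≈ U_t Σ_t V_tᵀ, where U_t and V_t hold the first t singular vectors and Σ_t the first t singular values; the book's own symbols may differ. A minimal NumPy sketch under these assumptions, with a random matrix standing in for the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 5))  # stand-in data matrix (assumption)
t = 2                        # number of singular values and vectors to keep

# full_matrices=False avoids computing the full square U and V matrices.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the t principal singular values and vectors.
U_t = U[:, :t]
s_t = s[:t]
Vt_t = Vt[:t, :]

# Rank-t approximation: M_t = U_t * diag(s_t) * V_t^T
M_t = U_t @ np.diag(s_t) @ Vt_t

# The Frobenius error is determined by the discarded singular values.
err = np.linalg.norm(M - M_t, ord="fro")
```

The approximation has the same shape as M but rank t, and its Frobenius error equals the square root of the sum of the squared discarded singular values.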


