Errata


Errata for Hands-On Machine Learning with Scikit-Learn and PyTorch


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Color key: Serious technical mistake · Minor technical mistake · Language or formatting error · Typo · Question · Note · Update

Version | Location | Description | Submitted by | Date submitted | Date corrected
Page Chapter 17. Advanced Transformer Techniques
End of Page

The link for reading the advanced transformer techniques chapter online is not working; there is no preview available on Google Drive.

Note from the Author or Editor:
Thanks for your feedback. The online content is being finalized now; it will be up shortly at: https://homl.info/
(the old link https://homl.info/att will just redirect to this address).

Anonymous  Sep 18, 2025  Oct 22, 2025
Page Chapter 2 and Chapter 12
In the "Popular open data repositories" list and at the end of Chapter 12

Both locations mention PapersWithCode.com, which has now been discontinued and redirects to Hugging Face's trending papers section instead.

Note from the Author or Editor:
Great catch, thanks. I changed the link to point directly to HF papers.

Anonymous  Oct 10, 2025  Oct 22, 2025
Page Chapter 19
Equation 19-7. Q-learning using an exploration function

I believe equation 19-7 has a minor formatting issue: a missing open parenthesis in the MathML: <mfenced separators="" open="" close=")">. This appears correctly in the 3rd edition.

Note from the Author or Editor:
Thanks for your feedback. I cannot see this issue right now, so I suppose it was fixed in the last stages of production.

Dylan  Oct 25, 2025 
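For reference, here is the reported MathML fragment next to what the corrected markup presumably looks like (assuming the fix is simply restoring the parenthesis in the open attribute):

```xml
<!-- Reported (open parenthesis missing): -->
<mfenced separators="" open="" close=")">
<!-- Presumably intended: -->
<mfenced separators="" open="(" close=")">
```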
Page Figure 2-6
Figure 2-6 image

In Figure 2-6, we can see housing.head() being called, whereas it should be housing_full.head(). This is correct in the Colab notebook, but the error is present on the O'Reilly platform.

Note from the Author or Editor:
Good catch, thanks! I updated the figure (a screenshot of this cell in the colab notebook).

Amit Lamba  Nov 15, 2025 
Page Chapter 15
equation 15-2

Equation 15-2 is missing the closing square bracket in the web version. In the app, both brackets are missing (i.e., the log only applies to the first term, which is incorrect).

Note from the Author or Editor:
Fixed in source files for next reprint

Bartosz  Nov 27, 2025 
Page Discrete Variational Autoencoders
2nd paragraph

Is: F.gumble_softmax()
Should be: F.gumbel_softmax()

Note from the Author or Editor:
Good catch, thanks. Indeed, it should be gumbel_softmax instead of gumble_softmax.

Bartosz  Dec 01, 2025 
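For context, here is a minimal pure-Python sketch of what the (correctly spelled) function computes. This illustrates the underlying Gumbel-softmax trick only; it is not PyTorch's implementation, and the logits and temperature below are made up:

```python
import math
import random

def gumbel_softmax(logits, tau=1.0):
    """Sketch of the Gumbel-softmax trick (what F.gumbel_softmax computes,
    modulo tensors and autograd): perturb each logit with Gumbel(0, 1)
    noise, then take a temperature-scaled softmax to get a differentiable,
    near-one-hot sample."""
    noisy = []
    for logit in logits:
        u = max(random.random(), 1e-12)            # avoid log(0)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    m = max(noisy)                                 # stabilize the softmax
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

probs = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)  # probabilities summing to 1
```

The lower the temperature tau, the closer the output gets to a one-hot vector.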
Page Don't know the page since I read it online
Chapter 12 Deep Computer Vision...

There is a sentence there saying: "We use the functools.partial() function (introduced in Chapter 11) to define DefaultConv2d..."

Well, functools.partial() was actually introduced in Chapter 10, so the correct version is: "We use the functools.partial() function (introduced in Chapter 10) to define DefaultConv2d..."

Note from the Author or Editor:
Good point, thank you, indeed the sentence should point to Chapter 10, not Chapter 11.

Vladimir Orlov  Dec 13, 2025 
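For readers who skipped the earlier chapter: the DefaultConv2d pattern is just functools.partial() freezing keyword arguments. A minimal sketch, using a plain stand-in function instead of the real nn.Conv2d (the stand-in's signature is illustrative, not the book's code):

```python
from functools import partial

# Stand-in for nn.Conv2d; only the calling pattern matters here.
def conv2d(in_channels, out_channels, kernel_size=1, padding=0):
    return (in_channels, out_channels, kernel_size, padding)

# partial() freezes the keyword arguments we always want:
DefaultConv2d = partial(conv2d, kernel_size=3, padding="same")

layer = DefaultConv2d(64, 128)                 # kernel_size=3, padding="same" applied
custom = DefaultConv2d(3, 16, kernel_size=5)   # frozen kwargs can still be overridden
```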
Page Don't know the page since I read it online
Chapter 12. Deep Computer Vision Using Convolutional Neural Networks

There is a sentence there saying: "precision/recall curve may contain a few sections where precision actually goes up when recall increases, especially at low recall values (you can see this at the top right of Figure 3-6)"

In fact it should refer to the top left corner of Figure 3-6, so that the correct version is: "precision/recall curve may contain a few sections where precision actually goes up when recall increases, especially at low recall values (you can see this at the top left of Figure 3-6)"

Note from the Author or Editor:
Good catch, thanks. Indeed, this sentence should point to the top left of Figure 3-6, not the top right.

Vladimir Orlov  Dec 18, 2025 
Page Appendix B Common Number Representations
Integers

Integers are often represented using 64 bits, with values ranging from 0 to 2^64 – 1 (about 1.8e19) for unsigned integers, or –2^32 to 2^32 – 1 (about ±4.3e9) for signed integers.
It should be –2^63 to 2^63 – 1 for signed integers.

Note from the Author or Editor:
My apologies for this error, I must have been quite tired when I wrote this! You are absolutely right, the sentence should read:

"""
Integers are often represented using 64 bits, with values ranging from 0 to 2^64 – 1 (about 1.8e19) for unsigned integers, or –2^63 to 2^63 – 1 (about ±9.2e18) for signed integers.
"""

rongjiang pan  Dec 20, 2025 
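The corrected ranges are easy to verify directly, since Python integers are arbitrary precision:

```python
# 64-bit integer ranges from the corrected sentence:
unsigned_max = 2**64 - 1                    # 18446744073709551615
signed_min, signed_max = -2**63, 2**63 - 1  # -9223372036854775808 and 9223372036854775807

print(f"{unsigned_max:.1e}")  # 1.8e+19
print(f"{signed_max:.1e}")    # 9.2e+18
```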
Page Chapter 16, The Swin Transformer
Left image in Figure 16-6

The label of the image on the left in Figure 16-6 is "S-MSA" which should be "W-MSA".

Note from the Author or Editor:
Good catch, thank you! Indeed the label of the left image in Figure 16-6 should be "W-MSA", not "S-MSA".

Ali  Dec 24, 2025 
Printed
Page P.542
The beginning of line 6 after Equation 14-1

Should the full-width quotation marks in the initial " #“# be corrected?

Note from the Author or Editor:
Yes. The goal is for this to render like this (the space before ## is intentional):
[...] just remove " ##" (as well as spaces before punctuation).

Fixed in source files for next reprint.

Joy Chan  Jan 12, 2026 
Page Chapter 18 P.733
line 2

Should "αₜ =1 – \beta_t" be written as "αₜ = 1 − βₜ"?

Note from the Author or Editor:
Ah good catch, thanks. Indeed, \beta_t should be βₜ.

Joy Chan  Jan 12, 2026 
Page Chapter 19, P.752
Third paragraph, second line

In the sentence "applying a discount factor, _γ (gamma), at each step", should "_γ" be "γ"? Should the underscore be deleted?

Note from the Author or Editor:
Fixed in source files for next reprint

Joy Chan  Jan 12, 2026 
Page Regression MLPs
In general, the mean squared error is the right loss to use for a regression tasks

In "to use for a regression tasks", the "a" should not be there. It should be, "to use for regression tasks".

Note from the Author or Editor:
Good catch, thanks. I fixed the typo so future reprints will be correct.

Anas  Jan 22, 2026 
Page Chapter 18 Stacked Autoencoders
Figure 18-3

I believe there is a typo in Figure 18-3. The label for Hidden layer 3 should read 128 units (instead of 100 units) based on the text above and the code example.

Note from the Author or Editor:
Good catch, thanks, indeed this is a typo; I just fixed it so future reprints will be correct. My code initially used 100, but I switched it to 128 because modern GPUs are typically more efficient with powers of 2; however, I failed to properly update the figure. Sorry about that.

Anonymous  Feb 01, 2026 
Page Chapter 13, "Forecasting Several Time Steps Ahead" section
3rd paragraph (the paragraph just below the code snippet)

The paragraph reads,

"... Since each prediction has a shape of [1, 1], we must use unsqueeze() again to add a batch dimension..."

Shouldn't it be,

"... to add a TIME dimension..." since the predictions' shape of [1, 1] is actually [batch, feature]?

In this specific example the result does not change because of the shape of the data, but in its current form the sentence gives a false or confusing understanding of what's going on.

Note from the Author or Editor:
Thanks for your feedback, indeed we're adding a time dimension, not a batch dimension.

cuneyt belge  Feb 14, 2026 
Page Chapter 13, "Forecasting Several Time Steps Ahead" section
the code snippet just above the 3rd paragraph

I think the line seen in the code,

>>> X = torch.cat([X, y_pred_one.unsqueeze(dim=0)], dim=1)

should be:

>>> X = torch.cat([X, y_pred_one.unsqueeze(dim=1)], dim=1)

since the prediction's (y_pred_one) shape of [1, 1] is already [batch, feature]. Thus, unsqueezing should be done along the time dimension, which is 1.

(In this specific example the result does not change because of the shape of the data, but in its current form the code gives a confusing conceptual understanding of what's going on.)

Note from the Author or Editor:
Great catch, thanks a lot. Indeed, it should have been `y_pred_one.unsqueeze(dim=1)` instead of `y_pred_one.unsqueeze(dim=0)`.
I fixed this in the book and the notebook.

cuneyt belge  Feb 15, 2026 
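The shape bookkeeping behind this fix can be traced without PyTorch. Below is a sketch using plain shape tuples; the toy unsqueeze helper and the sizes are illustrative, not the book's code:

```python
# X has shape [batch, time, features]; y_pred_one has shape [batch, features].
# unsqueeze(dim=1) inserts the missing time axis so that concatenating
# along dim=1 (the time axis) lines up.
def unsqueeze(shape, dim):
    return shape[:dim] + (1,) + shape[dim:]

X_shape = (1, 50, 1)    # [batch, time, features] (illustrative sizes)
y_pred_shape = (1, 1)   # [batch, features]

with_time = unsqueeze(y_pred_shape, dim=1)   # (1, 1, 1)
# Note: unsqueeze(y_pred_shape, dim=0) also yields (1, 1, 1) here, because
# batch == features == 1 -- exactly why the wrong dim went unnoticed.
new_X_shape = (X_shape[0], X_shape[1] + with_time[1], X_shape[2])  # (1, 51, 1)
```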
Page page 415
Question 8 e

"e. Try replacing batch-norm with SELU,.... "

I suppose that it should be:
"e. Try replacing Swish with SELU, ..."

Note from the Author or Editor:
Thanks for your feedback, indeed I wasn't clear enough: the goal is to use SELU to allow the network to self-normalize, rather than use batch-norm for this. This requires swapping Swish with SELU, and also getting rid of batch-norm, plus a few more things listed in the chapter (e.g., standardize the inputs and use LeCun initialization). See the exercise solution for more details.
I updated the book and notebook. The exercise now reads:
----
Try replacing Swish with SELU, and make the necessary adjustments to ensure the network self-normalizes (i.e., standardize the input features, use LeCun normal initialization, make sure the DNN contains only a sequence of dense layers, without batch-norm, etc.).
----

Li Bo  Feb 20, 2026 
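For reference, the two ingredients named in the corrected exercise, SELU and LeCun normal initialization, can be sketched in plain Python. This is a sketch using SELU's published constants, not the book's PyTorch code:

```python
import math
import random

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    """SELU with its standard constants; with LeCun-normal-initialized
    weights and standardized inputs, activations self-normalize."""
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1.0))

def lecun_normal(fan_in, n):
    """Draw n weights from N(0, 1/fan_in) -- LeCun normal initialization."""
    std = math.sqrt(1.0 / fan_in)
    return [random.gauss(0.0, std) for _ in range(n)]

weights = lecun_normal(fan_in=256, n=256)  # std = 1/16 for this layer
```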
Page I am reading it online
Is under "Figure 4-13. Polynomial regression model predictions"

When it says "Not bad: the model estimates ŷ = 0.56x₁² + 0.93x₁ + 1.78", it should be:
"ŷ = 0.51x₁² + 1.11x₁ + 2.01"

Note from the Author or Editor:
Great catch, thanks! That's a left-over from a previous version. I fixed the book for future releases.

Anonymous  Feb 27, 2026 
Page Chapter 4 - Regularized Linear Models
after note 7

There is an "are" missing:
"...are not colinear⁠7 and there (are) at least as many samples as parameters."

Note from the Author or Editor:
Good catch, thanks! Fixed.

Anonymous  Feb 27, 2026 
Page 19
-

The two links provided on pages 19-20 to submit errata are wrong.

"We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at homl_dot_info/oreilly-p"
&
"....if you find errors in the code examples (or just to ask questions), or to submit errata if you find errors in the text. "

Note from the Author or Editor:
Thanks for your feedback. The link is now fixed.

Sabri Hamad  Jun 25, 2025  Oct 22, 2025
Page 83
-

Equation 2-1 has a formatting problem (missing opening parenthesis) in O'Reilly's web viewer but not in O'Reilly's app!

Note from the Author or Editor:
Thanks for your feedback. The production team is looking into this and has found the cause. It will be fixed soon.
Aurélien

Sabri Hamad  Jun 25, 2025  Oct 22, 2025
Page 302
1st paragraph

"We get a test RMSE of about 0.53, which is comparable to what you would get with a random forest classifier"

"random forest classifier" should be replaced by "random forest regressor", because the task is a regression task.

Note from the Author or Editor:
Good catch, thanks! Indeed, it should be "random forest regressor" instead of "random forest classifier".

Li Bo  Feb 07, 2026 
Page 495
1st paragraph

"...For example, daily bus ridership dropped by about 2,500 in October 2017, which represents about 570 fewer passengers each week, so if we were at the end of October 2017, it would make sense to forecast tomorrow's ridership by copying the value from last week, minus 570..."

I suppose that there are mistakes with respect to the number 2500 and 570.

The daily bus ridership in September 2017 is 729859.8 and that for October 2017 is 727201.5. The drop in October is 729859.8-727201.5=2658.3 (it is not far from 2500 which is okay). On average the drop in each week is 2658.3 times 7 which is 18608. Thus, I think the correct statement should be:

"...For example, daily bus ridership dropped by about 2658 in October 2017, which represents about 18608 fewer passengers each week, so if we were at the end of October 2017, it would make sense to forecast next week's ridership by copying the value from last week, minus 18608..."

Note from the Author or Editor:
Thanks for your feedback. I rounded a bit too much indeed; it's better to write "about 2,658" rather than "about 2,500". However, since that's the number for October, which has 31 days, we can compute the number per week by calculating 2658 / 31 * 7 ≈ 600, so I'll write "which represents about 600 fewer passengers each week [...] from last week, minus 600".

Thanks again!

Li Bo  Feb 24, 2026 
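A quick check of the arithmetic, using the monthly averages quoted in the report above and following the per-week calculation from the author's note:

```python
sept_avg = 729859.8  # average daily ridership, September 2017 (from the report)
oct_avg = 727201.5   # average daily ridership, October 2017 (from the report)

drop = sept_avg - oct_avg  # ~2658.3 fewer riders per day in October
per_week = drop / 31 * 7   # October has 31 days -> ~600 per week

print(round(drop, 1))      # 2658.3
print(round(per_week))     # 600
```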
Page 685
Figure 16-18

"constrastive learning" should be "contrastive learning"

Note from the Author or Editor:
Good catch, thanks. Fixed for future reprints.

Li Bo  Feb 27, 2026 
Page 727
3rd paragraph

"Moreover, we make the discriminator untrainable by setting p.required_grad = False for each parameter p."

p.required_grad should be p.requires_grad.

Note from the Author or Editor:
Good catch, thanks! Indeed, it should be p.requires_grad, not p.required_grad. I fixed the book, so future reprints will be good.

Li Bo  Feb 24, 2026
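The freezing pattern from this erratum, illustrated with a stand-in parameter class rather than real PyTorch (in the book's code, p would come from iterating over the discriminator's parameters):

```python
class Param:
    """Stand-in for a PyTorch parameter; only the flag matters here."""
    def __init__(self):
        self.requires_grad = True

discriminator_params = [Param() for _ in range(4)]

# Make the discriminator untrainable by clearing the flag on every parameter.
# Note the attribute name: a typo like p.required_grad = False would silently
# create a brand-new attribute on this plain class and freeze nothing.
for p in discriminator_params:
    p.requires_grad = False

assert not any(p.requires_grad for p in discriminator_params)
```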