Errata


Errata for Hands-On Machine Learning with Scikit-Learn and PyTorch


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Color key: Serious technical mistake · Minor technical mistake · Language or formatting error · Typo · Question · Note · Update

Version | Location | Description | Submitted by | Date submitted | Date corrected
Page Chapter 17. Advanced Transformer Techniques
End of Page

The link for reading the advanced transformer techniques chapter online is not working; there is no preview available on Google Drive.

Note from the Author or Editor:
Thanks for your feedback. The online content is being finalized now; it will be up shortly at: https://homl.info/
(the old link https://homl.info/att will just redirect to this address).

Anonymous  Sep 18, 2025  Oct 22, 2025
Page Chapter 2 and Chapter 12
In the "Popular open data repositories" list and at the end of Chapter 12

Both locations mention PapersWithCode.com, which has now been discontinued and redirects to Hugging Face's trending papers section instead.

Note from the Author or Editor:
Great catch, thanks. I changed the link to point directly to HF papers.

Anonymous  Oct 10, 2025  Oct 22, 2025
Page Chapter 19
Equation 19-7. Q-learning using an exploration function

I believe equation 19-7 has a minor formatting issue: a missing open parenthesis in the MathML: <mfenced separators="" open="" close=")">. This appears correctly in the 3rd edition.

Note from the Author or Editor:
Thanks for your feedback. I cannot see this issue right now, so I suppose it was fixed in the last stages of production.

Dylan  Oct 25, 2025 
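For reference, here is the reported MathML fragment next to what the corrected markup presumably looks like (assuming the fix is simply restoring the parenthesis in the open attribute):

```xml
<!-- Reported (open parenthesis missing): -->
<mfenced separators="" open="" close=")">
<!-- Presumably intended: -->
<mfenced separators="" open="(" close=")">
```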
Page Figure 2-6
Figure 2-6 image

In Figure 2-6, we can see housing.head() being called, whereas it should be housing_full.head(). This is correct in the Colab notebook, but the error is present on the O'Reilly platform.

Note from the Author or Editor:
Good catch, thanks! I updated the figure (a screenshot of this cell in the colab notebook).

Amit Lamba  Nov 15, 2025 
Page Chapter 15
equation 15-2

Equation 15-2 is missing the closing square bracket in the web version. In the app, both brackets are missing (i.e., the log only applies to the first term, which is incorrect).

Note from the Author or Editor:
Fixed in source files for next reprint

Bartosz  Nov 27, 2025 
Page Discrete Variational Autoencoders
2nd paragraph

Is: F.gumble_softmax()
Should be: F.gumbel_softmax()

Note from the Author or Editor:
Good catch, thanks. Indeed, it should be gumbel_softmax instead of gumble_softmax.

Bartosz  Dec 01, 2025 
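For context, here is a minimal pure-Python sketch of what the (correctly spelled) function computes. This illustrates the underlying Gumbel-softmax trick only; it is not PyTorch's implementation, and the logits and temperature below are made up:

```python
import math
import random

def gumbel_softmax(logits, tau=1.0):
    """Sketch of the Gumbel-softmax trick (what F.gumbel_softmax computes,
    modulo tensors and autograd): perturb each logit with Gumbel(0, 1)
    noise, then take a temperature-scaled softmax to get a differentiable,
    near-one-hot sample."""
    noisy = []
    for logit in logits:
        u = max(random.random(), 1e-12)            # avoid log(0)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    m = max(noisy)                                 # stabilize the softmax
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

probs = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)  # probabilities summing to 1
```

The lower the temperature tau, the closer the output gets to a one-hot vector.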
Page Don't know the page since I read it online
Chapter 12 Deep Computer Vision...

There is a sentence there saying: "We use the functools.partial() function (introduced in Chapter 11) to define DefaultConv2d..."

Well, functools.partial() was actually introduced in Chapter 10, so the correct version is: "We use the functools.partial() function (introduced in Chapter 10) to define DefaultConv2d..."

Note from the Author or Editor:
Good point, thank you, indeed the sentence should point to Chapter 10, not Chapter 11.

Vladimir Orlov  Dec 13, 2025 
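For readers who skipped the earlier chapter: the DefaultConv2d pattern is just functools.partial() freezing keyword arguments. A minimal sketch, using a plain stand-in function instead of the real nn.Conv2d (the stand-in's signature is illustrative, not the book's code):

```python
from functools import partial

# Stand-in for nn.Conv2d; only the calling pattern matters here.
def conv2d(in_channels, out_channels, kernel_size=1, padding=0):
    return (in_channels, out_channels, kernel_size, padding)

# partial() freezes the keyword arguments we always want:
DefaultConv2d = partial(conv2d, kernel_size=3, padding="same")

layer = DefaultConv2d(64, 128)                 # kernel_size=3, padding="same" applied
custom = DefaultConv2d(3, 16, kernel_size=5)   # frozen kwargs can still be overridden
```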
Page Don't know the page since I read it online
Chapter 12. Deep Computer Vision Using Convolutional Neural Networks

There is a sentence there saying: "precision/recall curve may contain a few sections where precision actually goes up when recall increases, especially at low recall values (you can see this at the top right of Figure 3-6)"

In fact it should refer to the top left corner of Figure 3-6, so that the correct version is: "precision/recall curve may contain a few sections where precision actually goes up when recall increases, especially at low recall values (you can see this at the top left of Figure 3-6)"

Note from the Author or Editor:
Good catch, thanks. Indeed, this sentence should point to the top left of Figure 3-6, not the top right.

Vladimir Orlov  Dec 18, 2025 
Page Appendix B Common Number Representations
Integers

Integers are often represented using 64 bits, with values ranging from 0 to 2^64 – 1 (about 1.8e19) for unsigned integers, or –2^32 to 2^32 – 1 (about ±4.3e9) for signed integers.
It should be –2^63 to 2^63 – 1 for signed integers.

Note from the Author or Editor:
My apologies for this error, I must have been quite tired when I wrote this! You are absolutely right, the sentence should read:

"""
Integers are often represented using 64 bits, with values ranging from 0 to 2^64 – 1 (about 1.8e19) for unsigned integers, or –2^63 to 2^63 – 1 (about ±9.2e18) for signed integers.
"""

rongjiang pan  Dec 20, 2025 
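The corrected ranges are easy to verify directly, since Python integers are arbitrary precision:

```python
# 64-bit integer ranges from the corrected sentence:
unsigned_max = 2**64 - 1                    # 18446744073709551615
signed_min, signed_max = -2**63, 2**63 - 1  # -9223372036854775808 and 9223372036854775807

print(f"{unsigned_max:.1e}")  # 1.8e+19
print(f"{signed_max:.1e}")    # 9.2e+18
```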
Page Chapter 16, The Swin Transformer
Left image in Figure 16-6

The label of the image on the left in Figure 16-6 is "S-MSA" which should be "W-MSA".

Note from the Author or Editor:
Good catch, thank you! Indeed the label of the left image in Figure 16-6 should be "W-MSA", not "S-MSA".

Ali  Dec 24, 2025 
Printed
Page P.542
The beginning of line 6 after Equation 14-1

Should the full-width quotation marks in the initial " #“# be corrected?

Note from the Author or Editor:
Yes. The goal is for this to render like this (the space before ## is intentional):
[...] just remove " ##" (as well as spaces before punctuation).

Fixed in source files for next reprint.

Joy Chan  Jan 12, 2026 
Page Chapter 18 P.733
line 2

Should "αₜ =1 – \beta_t" be written as "αₜ = 1 − βₜ"?

Note from the Author or Editor:
Ah good catch, thanks. Indeed, \beta_t should be βₜ.

Joy Chan  Jan 12, 2026 
Page Chapter 19, P.752
Third paragraph, second line

In the sentence "applying a discount factor, _γ (gamma), at each step", should "_γ" be "γ"? Should the underscore be deleted?

Note from the Author or Editor:
Fixed in source files for next reprint

Joy Chan  Jan 12, 2026 
Page Regression MLPs
In general, the mean squared error is the right loss to use for a regression tasks

In "to use for a regression tasks", the "a" should not be there. It should be, "to use for regression tasks".

Note from the Author or Editor:
Good catch, thanks. I fixed the typo so future reprints will be correct.

Anas  Jan 22, 2026 
Page Chapter 18 Stacked Autoencoders
Figure 18-3

I believe there is a typo in Figure 18-3. The label for Hidden layer 3 should read 128 units (instead of 100 units) based on the text above and the code example.

Note from the Author or Editor:
Good catch, thanks, indeed this is a typo; I just fixed it so future reprints will be correct. My code initially used 100, but I switched it to 128 because modern GPUs are typically more efficient with powers of 2; however, I failed to properly update the figure. Sorry about that.

Anonymous  Feb 01, 2026 
Page Chapter 13, "Forecasting Several Time Steps Ahead" section
3rd paragraph (the paragraph just below the code snippet)

The paragraph reads,

"... Since each prediction has a shape of [1, 1], we must use unsqueeze() again to add a batch dimension..."

Shouldn't it be,

"... to add a TIME dimension..." since the predictions' shape of [1, 1] is actually [batch, feature]?

In this specific example the result does not change because of the shape of the data, but in its current form the sentence gives a false or confusing understanding of what's going on.

Note from the Author or Editor:
Thanks for your feedback, indeed we're adding a time dimension, not a batch dimension.

cuneyt belge  Feb 14, 2026 
Page Chapter 13, "Forecasting Several Time Steps Ahead" section
the code snippet just above the 3rd paragraph

I think the line seen in the code,

>>> X = torch.cat([X, y_pred_one.unsqueeze(dim=0)], dim=1)

should be:

>>> X = torch.cat([X, y_pred_one.unsqueeze(dim=1)], dim=1)

since the prediction's (y_pred_one) shape of [1, 1] is already [batch, feature]. Thus, unsqueezing should be done along the time dimension, which is 1.

(In this specific example the result does not change because of the shape of the data, but in its current form the code gives a confusing conceptual understanding of what's going on.)

Note from the Author or Editor:
Great catch, thanks a lot. Indeed, it should have been `y_pred_one.unsqueeze(dim=1)` instead of `y_pred_one.unsqueeze(dim=0)`.
I fixed this in the book and the notebook.

cuneyt belge  Feb 15, 2026 
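The shape bookkeeping behind this fix can be traced without PyTorch. Below is a sketch using plain shape tuples; the toy unsqueeze helper and the sizes are illustrative, not the book's code:

```python
# X has shape [batch, time, features]; y_pred_one has shape [batch, features].
# unsqueeze(dim=1) inserts the missing time axis so that concatenating
# along dim=1 (the time axis) lines up.
def unsqueeze(shape, dim):
    return shape[:dim] + (1,) + shape[dim:]

X_shape = (1, 50, 1)    # [batch, time, features] (illustrative sizes)
y_pred_shape = (1, 1)   # [batch, features]

with_time = unsqueeze(y_pred_shape, dim=1)   # (1, 1, 1)
# Note: unsqueeze(y_pred_shape, dim=0) also yields (1, 1, 1) here, because
# batch == features == 1 -- exactly why the wrong dim went unnoticed.
new_X_shape = (X_shape[0], X_shape[1] + with_time[1], X_shape[2])  # (1, 51, 1)
```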
Page page 415
Question 8 e

"e. Try replacing batch-norm with SELU,.... "

I suppose that it should be:
"e. Try replacing Swish with SELU, ..."

Note from the Author or Editor:
Thanks for your feedback, indeed I wasn't clear enough: the goal is to use SELU to allow the network to self-normalize, rather than use batch-norm for this. This requires swapping Swish with SELU, and also getting rid of batch-norm, plus a few more things listed in the chapter (e.g., standardize the inputs and use LeCun initialization). See the exercise solution for more details.
I updated the book and notebook. The exercise now reads:
----
Try replacing Swish with SELU, and make the necessary adjustments to ensure the network self-normalizes (i.e., standardize the input features, use LeCun normal initialization, make sure the DNN contains only a sequence of dense layers, without batch-norm, etc.).
----

Li Bo  Feb 20, 2026 
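For reference, the two ingredients named in the corrected exercise, SELU and LeCun normal initialization, can be sketched in plain Python. This is a sketch using SELU's published constants, not the book's PyTorch code:

```python
import math
import random

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    """SELU with its standard constants; with LeCun-normal-initialized
    weights and standardized inputs, activations self-normalize."""
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1.0))

def lecun_normal(fan_in, n):
    """Draw n weights from N(0, 1/fan_in) -- LeCun normal initialization."""
    std = math.sqrt(1.0 / fan_in)
    return [random.gauss(0.0, std) for _ in range(n)]

weights = lecun_normal(fan_in=256, n=256)  # std = 1/16 for this layer
```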
Page I am reading it online
Is under "Figure 4-13. Polynomial regression model predictions"

When it says "Not bad: the model estimates ŷ = 0.56x₁² + 0.93x₁ + 1.78", it should be:
"ŷ = 0.51x₁² + 1.11x₁ + 2.01"

Note from the Author or Editor:
Great catch, thanks! That's a left-over from a previous version. I fixed the book for future releases.

Anonymous  Feb 27, 2026 
Page Chapter 4 - Regularized Linear Models
after note 7

There is an "are" missing:
"...are not colinear⁠7 and there (are) at least as many samples as parameters."

Note from the Author or Editor:
Good catch, thanks! Fixed.

Anonymous  Feb 27, 2026 
Page 19
-

The two links provided on pages 19-20 to submit errata are wrong.

"We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at homl_dot_info/oreilly-p"
&
"....if you find errors in the code examples (or just to ask questions), or to submit errata if you find errors in the text. "

Note from the Author or Editor:
Thanks for your feedback. The link is now fixed.

Sabri Hamad  Jun 25, 2025  Oct 22, 2025
Page 83
-

Equation 2-1 has a formatting problem (missing opening parenthesis) in O'Reilly's web viewer but not in O'Reilly's app!

Note from the Author or Editor:
Thanks for your feedback. The production team is looking into this and has found the cause. It will be fixed soon.
Aurélien

Sabri Hamad  Jun 25, 2025  Oct 22, 2025
Page 302
1st paragraph

"We get a test RMSE of about 0.53, which is comparable to what you would get with a random forest classifier"

"random forest classifier" should be replaced by "random forest regressor", because the task is a regression task.

Note from the Author or Editor:
Good catch, thanks! Indeed, it should be "random forest regressor" instead of "random forest classifier".

Li Bo  Feb 07, 2026 
Page 495
1st paragraph

"...For example, daily bus ridership dropped by about 2,500 in October 2017, which represents about 570 fewer passengers each week, so if we were at the end of October 2017, it would make sense to forecast tomorrow's ridership by copying the value from last week, minus 570..."

I suppose that there are mistakes with respect to the number 2500 and 570.

The daily bus ridership in September 2017 is 729859.8 and that for October 2017 is 727201.5. The drop in October is 729859.8-727201.5=2658.3 (it is not far from 2500 which is okay). On average the drop in each week is 2658.3 times 7 which is 18608. Thus, I think the correct statement should be:

"...For example, daily bus ridership dropped by about 2658 in October 2017, which represents about 18608 fewer passengers each week, so if we were at the end of October 2017, it would make sense to forecast next week's ridership by copying the value from last week, minus 18608..."

Note from the Author or Editor:
Thanks for your feedback. I rounded a bit too much indeed; it's better to write "about 2,658" rather than "about 2,500". However, since that's the number for October, which has 31 days, we can compute the number per week by calculating 2658 / 31 * 7 ≈ 600, so I'll write "which represents about 600 fewer passengers each week [...] from last week, minus 600".

Thanks again!

Li Bo  Feb 24, 2026 
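A quick check of the arithmetic, using the monthly averages quoted in the report above and following the per-week calculation from the author's note:

```python
sept_avg = 729859.8  # average daily ridership, September 2017 (from the report)
oct_avg = 727201.5   # average daily ridership, October 2017 (from the report)

drop = sept_avg - oct_avg  # ~2658.3 fewer riders per day in October
per_week = drop / 31 * 7   # October has 31 days -> ~600 per week

print(round(drop, 1))      # 2658.3
print(round(per_week))     # 600
```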
Page 685
Figure 16-18

"constrastive learning" should be "contrastive learning"

Note from the Author or Editor:
Good catch, thanks. Fixed for future reprints.

Li Bo  Feb 27, 2026 
Page 727
3rd paragraph

"Moreover, we make the discriminator untrainable by setting p.required_grad = False for each parameter p."

p.required_grad should be p.requires_grad.

Note from the Author or Editor:
Good catch, thanks! Indeed, it should be p.requires_grad, not p.required_grad. I fixed the book, so future reprints will be good.

Li Bo  Feb 24, 2026
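The freezing pattern from this erratum, illustrated with a stand-in parameter class rather than real PyTorch (in the book's code, p would come from iterating over the discriminator's parameters):

```python
class Param:
    """Stand-in for a PyTorch parameter; only the flag matters here."""
    def __init__(self):
        self.requires_grad = True

discriminator_params = [Param() for _ in range(4)]

# Make the discriminator untrainable by clearing the flag on every parameter.
# Note the attribute name: a typo like p.required_grad = False would silently
# create a brand-new attribute on this plain class and freeze nothing.
for p in discriminator_params:
    p.requires_grad = False

assert not any(p.requires_grad for p in discriminator_params)
```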