Skip to content

FIX Work around likely SIMD issue in tree export on 32bit OS#29327

Merged
jeremiedbb merged 6 commits into
scikit-learn:mainfrom
lesteve:export-tree-32bit
Jun 21, 2024
Merged

FIX Work around likely SIMD issue in tree export on 32bit OS#29327
jeremiedbb merged 6 commits into
scikit-learn:mainfrom
lesteve:export-tree-32bit

Conversation

@lesteve

@lesteve lesteve commented Jun 21, 2024

Copy link
Copy Markdown
Member

Fix #27506.

I have put @mr-c,@gspr and @sergiopasra as co-authors, thanks for your inputs on this tricky issue 🙏!

I tested the fix in the Docker image provided by @mr-c in #27506 (comment) and it works

Note for Distro package managers the patch may not work directly because _IS_32BIT was moved from sklearn.utils to sklearn.utils.fixes . Honestly I am pretty sure that this code is not performance critical and that just using -1.0 * tree.impurity even on 64bit OS is a completely fine patch. After all this code was not SIMDed before using numpy 1.26 and nobody ever complained that it was slow ...

The main reason, I used _IS_32BIT is that it makes it more explicit and I hope that one day it can be cleaned up 🤞

lesteve and others added 2 commits June 21, 2024 06:31
Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
@github-actions

github-actions Bot commented Jun 21, 2024

Copy link
Copy Markdown

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d1c65fd. Link to the linter CI: here

@lesteve lesteve changed the title Export tree 32bit FIX Work-around likely SIMD issue on 32bit OS Jun 21, 2024
@lesteve lesteve changed the title FIX Work-around likely SIMD issue on 32bit OS FIX Work around likely SIMD issue on 32bit OS Jun 21, 2024
@lesteve lesteve changed the title FIX Work around likely SIMD issue on 32bit OS FIX Work around likely SIMD issue in tree export on 32bit OS Jun 21, 2024
@lesteve

lesteve commented Jun 21, 2024

Copy link
Copy Markdown
Member Author

The CI is 🚨 but I will merge main when #29328 is auto-merged and that should be all 💚

@lesteve lesteve added the Quick Review For PRs that are quick to review label Jun 21, 2024
@lesteve

lesteve commented Jun 21, 2024

Copy link
Copy Markdown
Member Author

@mr-c @gspr, just curious, would it make the Linux distribution packaging easier (my understanding is that you are involved in it somehow) if this fix was included in the soonish to be released bug-fix release scikit-learn 1.5.1? Maybe it would not make a big difference anyway, because you are going to package an older scikit-learn release like 1.3 or 1.4?

@mr-c

mr-c commented Jun 21, 2024

Copy link
Copy Markdown
Contributor

@mr-c @gspr, just curious, would it make the Linux distribution packaging easier (my understanding is that you are involved in it somehow) if this fix was included in the soonish to be released bug-fix release scikit-learn 1.5.1? Maybe it would not make a big difference anyway, because you are going to package an older scikit-learn release like 1.3 or 1.4?

Debian is already carrying a version of this patch on top of version 1.4.2 in our development distribution and that will soon migrate to our testing distribution

https://tracker.debian.org/pkg/scikit-learn

We are carrying other patches as well, perhaps some of them could also be upstreamed

https://salsa.debian.org/science-team/scikit-learn/-/tree/master/debian/patches?ref_type=heads

(I'm not one of the primary maintainers of scikit-learn in Debian, but I'm on the team and recently merged some fixes)

We have not yet packaged scikit-learn 1.5.x (possibly due to a bug in how we check for new versions, I will look into that now)

So, no rush needed for Debian. Thank you for asking!

@gspr

gspr commented Jun 21, 2024

Copy link
Copy Markdown
Contributor

@mr-c @gspr, just curious, would it make the Linux distribution packaging easier (my understanding is that you are involved in it somehow) if this fix was included in the soonish to be released bug-fix release scikit-learn 1.5.1? Maybe it would not make a big difference anyway, because you are going to package an older scikit-learn release like 1.3 or 1.4?

To add to what @mr-c said: Thank you for asking! This kind of thought and consideration given to distributions really gives me confidence in the scikit-learn developers ❤️

@mr-c

mr-c commented Jun 21, 2024

Copy link
Copy Markdown
Contributor

possibly due to a bug in how we check for new versions, I will look into that now)

Ah, the source distribution now has an underscore instead of a hyphen: https://pypi.org/project/scikit-learn/1.5.0/#files vs https://pypi.org/project/scikit-learn/1.4.2/#files

I have fixed our regex to account for that

@lesteve

lesteve commented Jun 21, 2024

Copy link
Copy Markdown
Member Author

Ah, the source distribution now has an underscore instead of a hyphen: pypi.org/project/scikit-learn/1.5.0/#files vs pypi.org/project/scikit-learn/1.4.2/#files

Oh well, probably a side-effect of moving to Meson (and away from setuptools). This was noticed in #28757 (comment) actually, but would I have believed that someone out there was relying on this? Certainly not 😅 ...

@jeremiedbb jeremiedbb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a changelog entry. I would put it for 1.5.1

Comment thread sklearn/tree/_export.py Outdated
Comment on lines +267 to +268
minus_impurity = -1.0 * tree.impurity if _IS_32BIT else -tree.impurity
self.colors["bounds"] = (np.min(minus_impurity), np.max(minus_impurity))

@jeremiedbb jeremiedbb Jun 21, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the following work ? It avoids 1 unnecessary array multiplication and it avoids to specialize the behavior for 32bit OS.

Suggested change
minus_impurity = -1.0 * tree.impurity if _IS_32BIT else -tree.impurity
self.colors["bounds"] = (np.min(minus_impurity), np.max(minus_impurity))
# equivalent to (np.min(-tree.impurity), np.max(-tree.impurity))
self.colors["bounds"] = (-np.max(tree.impurity), -np.min(tree.impurity))

@lesteve lesteve Jun 21, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my original thought in #27506 (comment) 😉.

I find it slightly too clever and I kind of doubt this matters that much in practice since this is not in a performance-critical part of the code (visualization/debugging)

Having said that, the min(-arr) == -max(arr) trick is now the official Debian patch for scikit-learn 1.4 apparently: https://salsa.debian.org/science-team/scikit-learn/-/blob/master/debian/patches/i386-impurity-nan.patch?ref_type=heads 😅

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my original thought [...] I find it too clever

😄

@jeremiedbb jeremiedbb Jun 21, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of taste indeed. I find it more intuitive.
One argument: the workaround should be removed when the bug is fixed upstream, while the suggestion can be let as is for ever. I updated the suggestion to add a comment to make it less surprising

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremiedbb OK you won me over by talking to my ex-physicist side 😉, I have added a comment to kind of indicate there is a reason why we write it like this.

I have also added an entry in the 1.5.1 changelog.

@thomasjpfan thomasjpfan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jeremiedbb jeremiedbb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jeremiedbb jeremiedbb merged commit a67ebbe into scikit-learn:main Jun 21, 2024
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
…learn#29327)

Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
…learn#29327)

Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
jeremiedbb pushed a commit that referenced this pull request Jul 2, 2024
Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:tree Quick Review For PRs that are quick to review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test failure in i686 with version 1.3.1

5 participants