Skip to content

Test decision tree pickle for different endianness#21539

Merged
rth merged 4 commits intoscikit-learn:mainfrom
lesteve:cross-architecture-tree-pickle
Nov 4, 2021
Merged

Test decision tree pickle for different endianness#21539
rth merged 4 commits intoscikit-learn:mainfrom
lesteve:cross-architecture-tree-pickle

Conversation

@lesteve
Copy link
Copy Markdown
Member

@lesteve lesteve commented Nov 3, 2021

Reference Issues/PRs

Close #21359

What does this implement/fix? Explain your changes.

This adds a test that checks that pickles containing big endian arrays can be loaded on a little-endian machine.

The way the pickles are created is to change the Pickle dispatch_table or to reimplement NumpyPickle.save for joblib pickles. This creates pickles with numpy arrays in big endian, which does not match exactly a pickle created on a big endian machine. The main advantage IMO is that we have some test for #17644 ...

I thought it was simpler that using qemu inside a docker image (look at #21237 (comment) for more details) as mentioned in #19602 (comment).

Doing something similar for more (or even all estimators) as mentioned in #19602 (comment) is left for a future PR.

@lesteve lesteve changed the title Cross architecture tree pickle Test decision tree pickle for different endianness Nov 3, 2021
Copy link
Copy Markdown
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

And +1 for a similar test (+fix) for #19602 in a follow-up PR to support pickling of tree structures that have system specific precision levels in their integer fields.

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Nov 4, 2021

And +1 for a similar test (+fix) for #19602 in a follow-up PR to support pickling of tree structures that have system specific precision levels in their integer fields.

Yep I was working on it in parallel since this touches the same code, I opened a draft PR with a similar approach: #21552. This is better to merge this one first.

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks a lot @lesteve !

Maybe this should be put somewhere in common tests? Or at least in some folder where it would make sense to run it parametrized for a few estimators (I'm thinking also KDTree/BallTree #21553 ). Otherwise LGTM.

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Nov 4, 2021

Maybe this should be put somewhere in common tests? Or at least in some folder where it would make sense to run it parametrized for a few estimators (I'm thinking also KDTree/BallTree #21553 ). Otherwise LGTM.

I have this in mind eventually but for now my current strategy is:

@rth
Copy link
Copy Markdown
Member

rth commented Nov 4, 2021

OK, fair enough. Thanks!

@rth rth merged commit 46485a9 into scikit-learn:main Nov 4, 2021
@lesteve lesteve deleted the cross-architecture-tree-pickle branch November 4, 2021 13:49
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove code introduced by #17644

3 participants