Skip to content

BUG Fixes cython code for ppc arch#17201

Merged
NicolasHug merged 8 commits intoscikit-learn:masterfrom
thomasjpfan:fix-cython
May 18, 2020
Merged

BUG Fixes cython code for ppc arch#17201
NicolasHug merged 8 commits intoscikit-learn:masterfrom
thomasjpfan:fix-cython

Conversation

@thomasjpfan
Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan commented May 13, 2020

Attempt to fix conda-forge/scikit-learn-feedstock#124 I rather this be const char but there were build errors in the conda-forge build.

The error in the build logs are:

python3.7/site-packages/sklearn/ensemble/_hist_gradient_boosting/grower.py:234: in __init__
    min_samples_leaf, min_gain_to_split, hessians_are_constant)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: Buffer dtype mismatch, expected 'const char' but got 'signed char'

CC @NicolasHug

(Will think about how reproduce this locally when I wake up.)

@thomasjpfan
Copy link
Copy Markdown
Member Author

As an aside, it would be kind of nice to get the conda-forge bot to run on the rc so we can fix things before we tag.

@adrinjalali adrinjalali added this to the 0.23.1 milestone May 13, 2020
@adrinjalali
Copy link
Copy Markdown
Member

isn't it only failing on ppc? Not sure how you can reproduce them locally.

@NicolasHug
Copy link
Copy Markdown
Member

Was the error only in test_splitting.py:446?

That's surprising because the grower also passes the same type of array, so all tests should fail.

I'm also just confused by the error message: expected 'const char' but got 'signed char': const and signed aren't mutually exclusive properties

@thomasjpfan
Copy link
Copy Markdown
Member Author

It was everywhere that created a splitter. Here is the link to the one of the tests https://travis-ci.com/github/conda-forge/scikit-learn-feedstock/jobs/333125503

@thomasjpfan
Copy link
Copy Markdown
Member Author

A cleaner error message:

python3.7/site-packages/sklearn/ensemble/_hist_gradient_boosting/grower.py:234: in __init__
    min_samples_leaf, min_gain_to_split, hessians_are_constant)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: Buffer dtype mismatch, expected 'const char' but got 'signed char'

@adrinjalali
Copy link
Copy Markdown
Member

adrinjalali commented May 13, 2020

I'm very confused as well, isn't signed char simply auto casted into const signed char?

@NicolasHug
Copy link
Copy Markdown
Member

what's the status on this, do we still need to fix something?

BTW @thomasjpfan there are no "changes" in this PR anymore

@adrinjalali
Copy link
Copy Markdown
Member

I think this still fails on ppc, but we're keeping releasing w/o ppc builds I think. It'd be nice to fix it.

This reverts commit d1cf3b1.
@thomasjpfan
Copy link
Copy Markdown
Member Author

whoops I reverted everything.

@thomasjpfan
Copy link
Copy Markdown
Member Author

I am guessing removing the const (sadly) should fix this. But the only way to test is with conda-forge?

@adrinjalali
Copy link
Copy Markdown
Member

I guess so. You can temporarily push the change to a non release branch on the main repo and have it tested on conda-forge

@thomasjpfan thomasjpfan marked this pull request as draft May 16, 2020 17:09
@thomasjpfan
Copy link
Copy Markdown
Member Author

thomasjpfan commented May 16, 2020

I am actively trying different patches out in conda-forge/scikit-learn-feedstock#125

I will mark this PR to be merged when I find something that works.

(We can go from PR to draft now and back again!)

@NicolasHug
Copy link
Copy Markdown
Member

Did this one not work?

If it works and a quick benchmark shows no performance drop, maybe our best bet is to go with it so @thomasjpfan doesn't waste too much time...

@thomasjpfan
Copy link
Copy Markdown
Member Author

The patch does not work. The monotonic constraints tests are failing https://travis-ci.com/github/conda-forge/scikit-learn-feedstock/jobs/335600845

@NicolasHug
Copy link
Copy Markdown
Member

cpdef enum MonotonicConstraint:
    NO_CST = 0
    POS = 1
    NEG = -1

When I was coding in C (in a previous life) I remember having some edge cases regarding some enums with values that were not 0, 1, 2... Good old memories.

May I suggest to try:

  • stop using the enum and declare global vars: MNT_CST_NONE = 0, MNT_CST_POST = 1, MNT_CST_NEG = -1

  • if it doesn't work, use an enum with values 0, 1, 2 (would need a remapping :/)

  • if it doesn't work, use global vars 0, 1, 2 and declare the array as const unsigned char. I really hope this would work since has_missing_values is a const unsigned char and there is no issue with it

  • if it doesn't work, lie down and cry

@thomasjpfan thomasjpfan marked this pull request as ready for review May 16, 2020 21:04
@thomasjpfan
Copy link
Copy Markdown
Member Author

Fun stuff. Turns out explicitly saying that the char is signed works.

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting!

@adrinjalali
Copy link
Copy Markdown
Member

@NicolasHug creating the 0.23.1 PR now, should we merge this?

@NicolasHug NicolasHug changed the title BUG Fixes cython in splitting BUG Fixes cython code for ppc arch May 18, 2020
@NicolasHug NicolasHug merged commit 43c0efa into scikit-learn:master May 18, 2020
@NicolasHug
Copy link
Copy Markdown
Member

Thanks @thomasjpfan

I guess there's something fishy going on with the compiler used of the ppc CI...

adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request May 18, 2020
* BUG Fixes cython in splitting

* REV Revert

* Revert "REV Revert"

This reverts commit d1cf3b1.

* WIP Uses signed char

* WIP Maybe use signed chars

* WIP const signed chars
adrinjalali pushed a commit that referenced this pull request May 19, 2020
* BUG Fixes cython in splitting

* REV Revert

* Revert "REV Revert"

This reverts commit d1cf3b1.

* WIP Uses signed char

* WIP Maybe use signed chars

* WIP const signed chars
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
* BUG Fixes cython in splitting

* REV Revert

* Revert "REV Revert"

This reverts commit d1cf3b1.

* WIP Uses signed char

* WIP Maybe use signed chars

* WIP const signed chars
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants