ENH Add check for non binary variables in OneHotEncoder.#16585
ENH Add check for non binary variables in OneHotEncoder.#16585rth merged 12 commits intoscikit-learn:masterfrom
Conversation
|
Thanks for the PR! I haven't entirely followed the original issue so I might be missing something. Generally it would be good to have a regression test, i.e. a test that shows that now the generated feature names are as expected with your fix. Cheers, |
|
@amueller @jnothman np.array([None, None, 10, 1], dtype=object)instead of np.array([-1, -1, 10, 10], dtype=np.int_)The reason being that So if we should agree with any solution and then we can go ahead. ping @ogrisel @jeremiedbb @NicolasHug @thomasjpfan as well. |
Indeed, see also #16593 (that I would rather label as a New Feature, but maybe I'm missing something...). If we want to use indexes in |
|
As I would like to move forward in #15706, I have implemented |
|
I'm ok with this change but since it will change the result of a public attribute, shouldn't it pass through a deprecation cycle ? unless we consider it a bug :) |
|
The convention of setting |
|
@rth @thomasjpfan this was discussed during #16245 right? Could you please remind us the pros and cons of each approach? |
The sentinel value itself shouldn't matter, I think, other than for readability. -1 allows to keep the dtype as int. I also have a slight preference for |
|
Hi @rth, hope you don't mind I've just tried the "Request for reviewers" button ... :) |
Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
|
Thanks @thomasjpfan ! |
|
In general I think arrays of mixed type are a strange beast. They're not
especially helpful as arrays, and should rather be a list. I don't really
see the problem in using -1 as a sentinel, as long as it's well tested. But
the consensus seems to be towards another option.
|
I can't really judge how many users will think |
|
@jnothman @thomasjpfan , I will not try to persuade you , just want to add a clarification. |
|
Shrug. I'm fine either way. I agree usability is likely the greater concern
here than efficiency or succinct representation. In numerical computing it
is not at all uncommon when working with positive ints to give negative
numbers special meanings. For instance, see how leaves are indicated in
decision trees. But it can indeed add confusion.
|
rth
left a comment
There was a problem hiding this comment.
hope you don't mind I've just tried the "Request for reviewers" button
Don't hesitate to use that button on reviewers @cmarmo :) The code LGTM.
In numerical computing it
is not at all uncommon when working with positive ints to give negative
numbers special meanings. For instance, see how leaves are indicated in
decision trees.
Indeed, but I guess the issue here is that array index is not necessarily a positive integer, and that we are introducing a different meaning from what negative index commonly means in python. Another possibility for the sentinel could have been np.iinfo(np.int32).max () == 2147483647. Anyway any of these would likely be OK.
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
|
@rth, @thomasjpfan , after discussion with @glemaitre I've finally understood that |
glemaitre
left a comment
There was a problem hiding this comment.
LGTM apart of this small issue
|
Oh and we will need an entry in whats new: Please add an entry to the change log at doc/whats_new/v0.23.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user: |
|
No need for the what's new the bug was only introduced in dev. |
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
|
Someone available for merging? :) Thanks a lot! |
…n#16585) Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…n#16585) Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Fixes #16552
Closes #16554 (as superseded)
What does this implement/fix? Explain your changes.
Adds a check on the value of
drop_idx_elements:Noneis the value for no dropping.