ENH Adds support for drop + handle_unknown=ignore in the OneHotEncoder #19041
Conversation
glemaitre
left a comment
I think that we should update the docstring of the OneHotEncoder as well.
doc/modules/preprocessing.rst
Outdated
All the categories in `X_test` are unknown during transform and will be mapped
to all zeros. This means that unknown categories will have the same mapping
as the dropped category.
It might be good to show the inverse_transform here.
Do you think it would be a good idea to expose an attribute containing the columns with unknown categories? I am wondering whether the warning will be too annoying. I am thinking that we could have two attributes, one containing the column indices and another the unknown categories, when applicable. In that case, we could avoid warning, and you could always check the attributes as a sanity check.
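To make the proposal concrete, here is a plain-Python sketch of the information those two attributes might carry. The helper `find_unknown_categories` is hypothetical and is not part of scikit-learn or of this PR. It only assumes that the known categories are available per column, in the same layout as a fitted encoder's `categories_` attribute.

```python
def find_unknown_categories(categories, X):
    """Hypothetical helper: report which columns of X hold unseen categories.

    `categories` holds one sequence of known categories per column.
    Returns (column indices, unknown values per reported column).
    """
    unknown_columns = []
    unknown_values = []
    for col_idx, known in enumerate(categories):
        known_set = set(known)
        # Values in this column that were not seen during fit, sorted for stable output.
        unseen = sorted({row[col_idx] for row in X} - known_set)
        if unseen:
            unknown_columns.append(col_idx)
            unknown_values.append(unseen)
    return unknown_columns, unknown_values


# Illustrative data: only column 0 contains a category unseen during fit.
categories = [["female", "male"], ["from Europe", "from US"]]
X_test = [["unknown", "from US"], ["male", "from US"]]

cols, values = find_unknown_categories(categories, X_test)
print(cols, values)
```

A user could inspect such values after `transform` as a sanity check instead of relying on a warning.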
With If the goal is to avoid warnings, we can hope that the documentation is clear enough and remove the warning.
Yep, this is true. Since it would only be rare, we should not warn so much, though.
LGTM. I would be +0 for having handle_unknown == "ignore" not warn but handle_unknown == "warn" instead, but we can always do that in a later PR.
In particular @amueller wasn't a big fan of warnings: #18072 (comment)
Reference Issues/PRs
Fixes #18072
What does this implement/fix? Explain your changes.
Adds support for the suggestions stated in #18072