Have handle_unknown="ignore" by default in OneHotEncoder #19286
Closed
Description
I propose making handle_unknown="ignore" the default in OneHotEncoder.
I believe that is what one wants in most practical cases. Real datasets often contain infrequent categories, so depending on the train/test split, the test set is likely to contain some categories that never appeared in the training set. In production systems, too, it is better to ignore unknown categories than to have the system crash because of them.
This might be blocked by a suboptimal interaction with the drop option (#18072), and I'm not sure how it would interact with a few other improvements to OHE that have been proposed lately.
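To make the failure mode concrete, here is a minimal sketch of the two settings side by side. The toy color data and the variable names are made up for illustration; only OneHotEncoder and its handle_unknown parameter come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "purple" appears only in the test split.
X_train = np.array([["red"], ["green"], ["blue"]], dtype=object)
X_test = np.array([["green"], ["purple"]], dtype=object)

# handle_unknown="error": transform raises on an unseen category.
strict = OneHotEncoder(handle_unknown="error").fit(X_train)
try:
    strict.transform(X_test)
    strict_failed = False
except ValueError:
    strict_failed = True

# handle_unknown="ignore": the unseen category becomes an all-zero row.
lenient = OneHotEncoder(handle_unknown="ignore").fit(X_train)
encoded = lenient.transform(X_test).toarray()

print(strict_failed)
print(lenient.categories_[0])  # learned categories, in sorted order
print(encoded)
```

With "ignore", the row for "purple" contains only zeros, so downstream estimators still receive an input of the expected width.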