[MRG] add drop_first option to OneHotEncoder #12884

Closed
NicolasHug wants to merge 7 commits into scikit-learn:master from NicolasHug:drop_first_one_hot_encoding


@NicolasHug (Member) commented Dec 28, 2018

Reference Issues/PRs

Closes #6488

What does this implement/fix? Explain your changes.

This PR adds a drop_first option to OneHotEncoder.
Each feature is encoded into n_unique_values - 1 columns instead of n_unique_values columns. The column for the first category is dropped, so that category is represented by all of the remaining columns being zero.
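
To illustrate the encoding described above, here is a minimal pure-Python sketch for a single feature. It is not the code from this PR: the helper name `one_hot_drop_first` and the choice to order categories by sorting are assumptions made only for the example.

```python
def one_hot_drop_first(values):
    """Encode one categorical feature into n_unique_values - 1 columns.

    Hypothetical helper for illustration only, not scikit-learn code.
    """
    # Assumption: categories are ordered by sorting the unique values.
    categories = sorted(set(values))
    # Drop the column of the first category.
    kept = categories[1:]
    # One row per sample, one column per kept category.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return categories, rows


categories, encoded = one_hot_drop_first(["red", "green", "blue", "green"])
for value, row in zip(["red", "green", "blue", "green"], encoded):
    print(value, row)
```

With three unique values, each row has two columns, and the first category in sorted order ("blue") is the one encoded as all zeros.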

Any other comments?

This is incompatible with handle_unknown='ignore' because ignored unknown categories are encoded as all zeros in the one-hot columns, which is also how the first category is encoded when drop_first=True. Allowing both would leave no way to distinguish an unknown category from the first one.
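
The collision can be seen in a small pure-Python sketch. The function name `encode_drop_first_ignore` is hypothetical and the logic is a simplified stand-in for the encoder, assuming that an ignored category simply matches none of the kept columns.

```python
def encode_drop_first_ignore(value, categories):
    """Encode one value with the first category dropped.

    Hypothetical helper: a value that is not in `categories` is
    silently ignored, so it matches no column at all.
    """
    kept = categories[1:]  # the first category has no column
    return [1 if value == c else 0 for c in kept]


categories = ["blue", "green", "red"]
first = encode_drop_first_ignore("blue", categories)      # dropped first category
unknown = encode_drop_first_ignore("purple", categories)  # never seen at fit time
collision = first == unknown
print(first, unknown, collision)
```

Both inputs produce the same all-zero row, so the original value cannot be recovered from the encoded output.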

@NicolasHug NicolasHug changed the title [WIP] add drop_first option to OneHotEncoder [MRG] add drop_first option to OneHotEncoder Dec 29, 2018
@NicolasHug (Member, Author)

Note to reviewers: #12908 is more general, so maybe review that one instead.

Development

Successfully merging this pull request may close these issues.

OneHotEncoder - add option for 1 of k-1 encoding