[MRG] Make drop_idx_ a masked array in OneHotEncoder #16554

Closed
cmarmo wants to merge 4 commits into scikit-learn:master from cmarmo:dropbinary
Conversation

@cmarmo
Contributor

@cmarmo cmarmo commented Feb 26, 2020

Reference Issues/PRs

Fixes #16552.

What does this implement/fix? Explain your changes.

Make drop_idx_ a numpy masked array in order to manage column selection.
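As a rough illustration of the idea (not the code from this PR), the sketch below builds a masked array where a masked entry means "drop nothing for this feature". The `categories` data and the `kept` helper list are hypothetical, and the binary-only rule is assumed from the linked issue about `drop='if_binary'`.

```python
import numpy as np

# Hypothetical fitted categories: features with 2, 3, and 2 categories.
categories = [["no", "yes"], ["a", "b", "c"], ["f", "m"]]

# Assumed 'if_binary' rule: drop category 0 only for binary features.
# Non-binary features are masked, meaning "no column is dropped".
n_cats = np.array([len(cats) for cats in categories])
drop_idx = np.ma.masked_array(
    np.zeros(len(categories), dtype=int),
    mask=(n_cats != 2),
)

# Keep every category of a masked feature; otherwise skip the dropped index.
kept = [
    [cat for j, cat in enumerate(cats)
     if drop_idx.mask[i] or j != drop_idx.data[i]]
    for i, cats in enumerate(categories)
]
```

Here the mask, rather than a sentinel value, carries the "nothing dropped" information, so any consumer of the attribute has to check `drop_idx.mask` before reading an index.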

@cmarmo
Contributor Author

cmarmo commented Feb 26, 2020

Sorry... I forgot to finish the tests... :(

@cmarmo cmarmo changed the title [MRG] Make drop_idx_ a masked array in OneHotEncoder [WIP] Make drop_idx_ a masked array in OneHotEncoder Feb 26, 2020
@cmarmo cmarmo changed the title [WIP] Make drop_idx_ a masked array in OneHotEncoder [MRG] Make drop_idx_ a masked array in OneHotEncoder Feb 26, 2020
@glemaitre
Member

I am not sure that we should use a masked array. If it were only used internally, without exposing the resulting array, I think it would be OK because we already use masked arrays in the SimpleImputer. The issue here is that we expose this attribute publicly, and I am worried that some of our users do not know about NumPy masked arrays.

I think it would be friendlier to have a list or a NumPy array with object dtype, where each entry is either None or the index of the category to drop.
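A minimal sketch of that alternative, assuming a plain list with one entry per feature; the function names `compute_drop_idx` and `kept_categories` are hypothetical and not part of scikit-learn:

```python
def compute_drop_idx(categories):
    """Return one entry per feature: 0 for binary features, else None."""
    return [0 if len(cats) == 2 else None for cats in categories]


def kept_categories(categories, drop_idx):
    """Remove the category at drop_idx[i] for each feature; None drops nothing."""
    return [
        [cat for j, cat in enumerate(cats) if j != idx]
        for cats, idx in zip(categories, drop_idx)
    ]


# Hypothetical fitted categories: features with 2, 3, and 2 categories.
categories = [["no", "yes"], ["a", "b", "c"], ["f", "m"]]
drop_idx = compute_drop_idx(categories)
kept = kept_categories(categories, drop_idx)
```

The same list could be wrapped as `np.array(drop_idx, dtype=object)` if an array is preferred; either way a user only needs to test an entry against `None`, with no knowledge of `numpy.ma` required.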

Development

Successfully merging this pull request may close these issues.

OneHotEncoder drop 'if_binary' drop one column from all categorical variables

2 participants