[MRG] Parameter for stacking missing indicator into imputer#12583
Conversation
|
Sweet. Can you add an entry to the 0.21 whatsnew? Thanks! |
jeremiedbb
left a comment
There was a problem hiding this comment.
This is a nice feature ! Thanks. A few comments below.
|
Hum I found a case when it will fail. When one column is full of missing value, the SimpleImputer will drop that column (except for "constant" strategy) and the MissingIndicator will have a feature mismatch at transform time. First, you can add it to the test, i.e. add one column full of missing values, to make the test cover more cases. Then, to fix this, one possibility is to not fit the MissingIndicator in fit but fit_transform it instead in transform. The MissingIndicator does not hold any information that the SimpleImputer already have, so it does not even need to be an attribute of the SimpleImputer. @amuller @jnothman what do you think ? |
|
@jeremiedbb, good catch with fully missing column! I adjusted code and added tests. What do you think about it now? |
|
I think I don't like the current solution since it adds a constant 1 column which is not useful imho. |
I agree. In the case of a column full of missing values, we should also drop the column full of 1 from the output of missingIndicator. |
|
@amueller and @jeremiedbb, I see your point and agree. But the current solution does the same if you do it like this: In theory, if we drop the column full of 1, people who now use |
|
That's a good point. We need to decide to either:
I'm somewhat leaning towards 3, though it requires a deprecation cycle. |
|
Seems, auto drop for a constant column not always good choice. Here's a small sample: For I would vote for the # 2, because, to be honest, it's hard to imagine, that users will try to work with a dataset, that contains whole column of empty values. It would be unuseful even with "constant" strategy for SimpleImputer. |
|
@DanilBaibak I think Joel meant in another PR, once this one is done |
|
Ok! Just added it in hot pursuit 😄 |
|
Thanks @banilo!! |
|
Glad to help 😊 |
…ikit-learn#12583)" This reverts commit 5f94df8.
…ikit-learn#12583)" This reverts commit 5f94df8.
Fixes #11886
A new parameter
add_indicatorwas added to SimpleImputer that allow simply stacking a MissingIndicator transform into the output of the imputer's transform.