[MRG] Fix MissingIndicator explicit zeros & output shape#13562
jnothman merged 5 commits into scikit-learn:master from
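```python
# Boolean mask sharing X's sparsity structure (one flag per stored entry),
# then drop the stored-but-False entries (non-missing values).
imputer_mask = sparse_constructor(
    (mask, X.indices.copy(), X.indptr.copy()),
    shape=X.shape, dtype=bool)
imputer_mask.eliminate_zeros()
```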
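To see why `eliminate_zeros()` is needed, here is a small sketch that rebuilds the diff's mask on a toy matrix. It assumes CSR input, that `sparse_constructor` stands for `scipy.sparse.csr_matrix` here, and that missing values are marked with `np.nan`. It compares the number of stored entries before and after the call.

```python
import numpy as np
from scipy import sparse

# Toy CSR input; NaN marks a missing value (assumed missing_values=np.nan).
X = sparse.csr_matrix(np.array([[np.nan, 1.0, 0.0],
                                [0.0, 0.0, 2.0],
                                [3.0, np.nan, np.nan]]))
mask = np.isnan(X.data)  # one flag per *stored* entry of X

imputer_mask = sparse.csr_matrix(
    (mask, X.indices.copy(), X.indptr.copy()), shape=X.shape, dtype=bool)
stored_before = imputer_mask.nnz  # also counts the explicit False entries
imputer_mask.eliminate_zeros()
stored_after = imputer_mask.nnz   # only the True (missing) entries remain
# stored_before == 6, stored_after == 3
```

Without the `eliminate_zeros()` call, every stored non-missing value of `X` would remain in the mask as an explicit `False`.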
The fix is correct and it's already an improvement, but we're still building this potentially huge imputer_mask matrix (with a lot of explicit zeros).
Do you think it'd be possible to directly build it correctly?
Presumably you'd just use `imputer_mask = sparse_constructor((mask[mask], X.indices[mask], X.indptr[mask]), shape=...)` or similar.
That's not so simple. `mask[mask]` and `X.indices[mask]` are correct, but `X.indptr` needs to be updated in a non-trivial way (essentially what `eliminate_zeros` does).
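A sketch of what "updating `indptr`" involves, under the same assumptions (CSR input, `np.nan` as the missing marker). The function names are hypothetical. `mask_built_directly` keeps only the missing entries and rebuilds the row pointers from the per-row count of kept entries. `mask_via_eliminate_zeros` follows the diff's approach, for comparison.

```python
import numpy as np
from scipy import sparse


def mask_via_eliminate_zeros(X):
    """Approach from the diff: copy X's structure, then let
    eliminate_zeros() drop the False entries."""
    X = sparse.csr_matrix(X)
    mask = np.isnan(X.data)
    out = sparse.csr_matrix(
        (mask, X.indices.copy(), X.indptr.copy()), shape=X.shape, dtype=bool)
    out.eliminate_zeros()
    return out


def mask_built_directly(X):
    """Hypothetical alternative: keep only the missing entries and
    recompute indptr from the number of kept entries per row."""
    X = sparse.csr_matrix(X)
    mask = np.isnan(X.data)
    n_rows = X.shape[0]
    # Row index of every stored entry of X.
    row_of_entry = np.repeat(np.arange(n_rows), np.diff(X.indptr))
    # How many missing entries survive in each row.
    kept_per_row = np.bincount(row_of_entry[mask], minlength=n_rows)
    # New row pointers: cumulative kept counts, starting at 0.
    indptr = np.concatenate([[0], np.cumsum(kept_per_row)])
    data = np.ones(int(mask.sum()), dtype=bool)
    return sparse.csr_matrix((data, X.indices[mask], indptr), shape=X.shape)


X = np.array([[np.nan, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [3.0, np.nan, np.nan]])
direct = mask_built_directly(X)
reference = mask_via_eliminate_zeros(X)
```

Both versions do work linear in the number of stored entries of `X`. The direct build mainly saves the temporary array of explicit `False` entries. Slicing `X.indptr[mask]` alone would not produce valid row pointers.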
Ok. Since the intermediate imputer_mask can only be as big as X, maybe it's not that much of an issue after all.
Thanks @jeremiedbb
…arn#13562)" This reverts commit ed50381.
Fixes 2 bugs in `MissingIndicator`:
- when X is sparse, all non-zero, non-missing values would become explicit `False` entries in the transformed mask;
- when there are no missing values at all and `features='missing-only'`, the transformed mask would contain all features instead of none.
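Both fixed behaviours can be illustrated with a small stand-alone sketch. This is not scikit-learn's implementation. `missing_only_indicator` is a hypothetical helper that assumes CSR input, `np.nan` as the missing marker, and `features='missing-only'` semantics. It returns the mask restricted to the selected columns together with those column indices.

```python
import numpy as np
from scipy import sparse


def missing_only_indicator(X):
    """Hypothetical sketch (not scikit-learn's code) of the fixed
    behaviour for CSR input, missing_values=np.nan and
    features='missing-only'. Returns (mask, selected feature indices)."""
    X = sparse.csr_matrix(X)
    mask = sparse.csr_matrix(
        (np.isnan(X.data), X.indices.copy(), X.indptr.copy()),
        shape=X.shape, dtype=bool)
    # Bug 1: drop stored False entries so non-missing values stay implicit.
    mask.eliminate_zeros()
    # Bug 2: select only columns holding at least one missing value.
    # Every stored entry is now True, so counting column indices suffices;
    # with no missing values at all, `features` is empty.
    per_column = np.bincount(mask.indices, minlength=X.shape[1])
    features = np.flatnonzero(per_column)
    return mask[:, features], features


with_nan, feats = missing_only_indicator(np.array([[np.nan, 1.0],
                                                   [2.0, 3.0]]))
no_nan, no_feats = missing_only_indicator(np.array([[1.0, 0.0],
                                                    [0.0, 2.0]]))
```

For the second input, the expected result is a `(2, 0)` mask with no stored entries, rather than one column per input feature.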