[MRG] Fix MissingIndicator explicit zeros & output shape #13562

Merged
jnothman merged 5 commits into scikit-learn:master from jeremiedbb:fix-missing-indicator-explicit-zeros
Apr 6, 2019

@jeremiedbb
Member

Fixes 2 bugs in MissingIndicator:

  • When X is sparse, all non-zero, non-missing values would become explicit False entries in the transformed mask.

  • When there are no missing values at all and features='missing-only', the transformed mask would contain all features instead of none.
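
As a rough illustration of the two behaviours listed above, the sketch below shows what the transformed mask is expected to look like once the fix is in place. It assumes a scikit-learn release that includes this change, and the toy arrays are made up for the example.

```python
import numpy as np
from scipy import sparse
from sklearn.impute import MissingIndicator

# Toy data (hypothetical): NaN marks a missing value.
X_dense = np.array([[1.0, np.nan, 0.0],
                    [0.0, 2.0, np.nan],
                    [3.0, 0.0, 0.0]])
X_sparse = sparse.csc_matrix(X_dense)  # stores 3 regular values and 2 NaNs

# First bug: with sparse input, only the missing entries should be stored
# in the mask, not an explicit False for every other stored value of X.
mask_sparse = MissingIndicator(features="all").fit_transform(X_sparse)
n_stored = mask_sparse.nnz
n_missing = int(np.isnan(X_dense).sum())

# Second bug: with no missing values and features='missing-only',
# no feature should be selected, so the mask has zero columns.
X_complete = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask_empty = MissingIndicator(features="missing-only").fit_transform(X_complete)
empty_shape = mask_empty.shape
```

With the fix, `n_stored` should equal `n_missing`, and `empty_shape` should have a second dimension of 0.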

imputer_mask = sparse_constructor(
    (mask, X.indices.copy(), X.indptr.copy()),
    shape=X.shape, dtype=bool)
imputer_mask.eliminate_zeros()
Member

The fix is correct and it's already an improvement, but we're still building this potentially huge imputer_mask matrix (with a lot of explicit zeros).

Do you think it'd be possible to directly build it correctly?
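
To make the concern about explicit zeros concrete, here is a minimal sketch of the construction under review. It uses `scipy.sparse.csc_matrix` in place of `sparse_constructor` and a made-up toy matrix; both are assumptions for illustration, not taken from the PR.

```python
import numpy as np
from scipy import sparse

# Toy CSC matrix (hypothetical data): 3 regular values and 2 NaNs are stored.
X = sparse.csc_matrix(np.array([[1.0, np.nan, 0.0],
                                [0.0, 2.0, np.nan],
                                [3.0, 0.0, 0.0]]))

# One boolean per stored entry of X: True where the entry is missing.
mask = np.isnan(X.data)

# Same construction as in the diff, reusing X's sparsity structure.
imputer_mask = sparse.csc_matrix(
    (mask, X.indices.copy(), X.indptr.copy()),
    shape=X.shape, dtype=bool)

# Every stored entry of X is still stored here, most of them as explicit False.
nnz_before = imputer_mask.nnz

# Dropping the explicit False entries leaves only the missing positions.
imputer_mask.eliminate_zeros()
nnz_after = imputer_mask.nnz
```

The intermediate matrix holds as many stored entries as `X` itself; only after `eliminate_zeros()` does it shrink to the number of missing values.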

Member

presumably you'd just use imputer_mask = sparse_constructor((mask[mask], X.indices[mask], X.indptr[mask]), shape=...) or similar

Member Author

That's not so simple. mask[mask] and X.indices[mask] are correct, but X.indptr needs to be updated in a non-trivial way (essentially what eliminate_zeros does).
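
For reference, one way the pointer array could be rebuilt by hand is sketched below. This is not what the PR does (it keeps the call to eliminate_zeros); the cumulative-count approach, the variable names, and the toy matrix are all assumptions made for illustration. Note that `mask` has one element per stored entry while `X.indptr` has one element per column plus one, so indexing `X.indptr` with `mask` cannot work in general.

```python
import numpy as np
from scipy import sparse

# Toy CSC matrix (hypothetical data): 3 regular values and 2 NaNs are stored.
X = sparse.csc_matrix(np.array([[1.0, np.nan, 0.0],
                                [0.0, 2.0, np.nan],
                                [3.0, 0.0, 0.0]]))
mask = np.isnan(X.data)

# kept_before[p] is the number of True values among the first p stored entries.
kept_before = np.concatenate(([0], np.cumsum(mask)))

# Each column pointer moves to the number of kept entries that precede it.
new_indptr = kept_before[X.indptr]

# Build the mask directly, without any explicit False entries.
direct = sparse.csc_matrix(
    (mask[mask], X.indices[mask], new_indptr),
    shape=X.shape, dtype=bool)

# Reference result: the construction from the diff followed by eliminate_zeros.
reference = sparse.csc_matrix(
    (mask, X.indices.copy(), X.indptr.copy()),
    shape=X.shape, dtype=bool)
reference.eliminate_zeros()
```

On this toy input both matrices hold the same entries, which is consistent with the remark that the extra bookkeeping amounts to what eliminate_zeros already does.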

Member

Ok. Since the intermediate imputer_mask can only be as big as X, maybe it's not that much of an issue after all.

Member

@jnothman left a comment

otherwise lgtm

Member

@NicolasHug left a comment

Still has a typo but LGTM

Co-Authored-By: jeremiedbb <34657725+jeremiedbb@users.noreply.github.com>
@jnothman merged commit bcf4f80 into scikit-learn:master on Apr 6, 2019
@jnothman
Member

jnothman commented Apr 6, 2019

Thanks @jeremiedbb

jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019