Skip to content

[MRG+2] Moves Imputation out of Preprocessing#10483

Merged
jnothman merged 23 commits intoscikit-learn:masterfrom
thechargedneutron:imputer
Feb 14, 2018
Merged

[MRG+2] Moves Imputation out of Preprocessing#10483
jnothman merged 23 commits intoscikit-learn:masterfrom
thechargedneutron:imputer

Conversation

@thechargedneutron
Copy link
Copy Markdown
Contributor

@thechargedneutron thechargedneutron commented Jan 16, 2018

Reference Issues/PRs

Fixes #9726

What does this implement/fix? Explain your changes.

Changes done as per suggestions

Tasks

  • copy sklearn/preprocessing/imputation.py to sklearn/impute.py, as well as the corresponding tests.
  • deprecate Imputer in sklearn/preprocessing/imputation.py to be removed in v0.22.
  • create a sklearn.impute section in doc/modules/classes.rst
  • update the deprecated section at the bottom of doc/modules/classes.rst
  • update sklearn/__init__.py's __all__
  • move imputation documentation from doc/modules/preprocessing.rst to doc/modules/impute.rst
  • we might also want to rename Imputer in the new module to SimpleImputer, DummyImputer or something
  • update any references to Imputer (in sklearn/, doc/ or examples/) to refer to the new location.

@thechargedneutron thechargedneutron changed the title [WIP] Moves Imputation out of Preprocessing [MRG] Moves Imputation out of Preprocessing Jan 16, 2018
@jnothman
Copy link
Copy Markdown
Member

jnothman commented Jan 16, 2018 via email

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add impute.rst to the user guide table of contents

Comment thread doc/glossary.rst Outdated
missing values
Most Scikit-learn estimators do not work with missing values. When they
do (e.g. in :class:`preprocessing.Imputer`), NaN is the preferred
do (e.g. in :class:`Imputer`), NaN is the preferred
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be impute.Imputer

Comment thread sklearn/preprocessing/imputation.py Outdated


@deprecated("Imputer was deprecated in version 0.20 and will be "
"removed in 0.22. Import Imputer from sklearn instead.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sklearn.impute

Comment thread doc/modules/classes.rst
isotonic.check_increasing
isotonic.isotonic_regression

.. _impute_ref:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this referenced anywhere?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with reST, I am not sure why there's _ref here. This was with other modules here so I used this :( .

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need this only if it is referenced somewhere. But I guess if it is not referenced now, it can be in the futur so it can stay here

Comment thread doc/modules/impute.rst
@@ -0,0 +1,54 @@

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove this content from the preprocessing docs. You're welcome to leave a "see also" reference to these docs in preprocessing if you think that will help users.

Comment thread sklearn/preprocessing/imputation.py Outdated

@deprecated("Imputer was deprecated in version 0.20 and will be "
"removed in 0.22. Import Imputer from sklearn instead.")
"removed in 0.22. Import import.Imputer from sklearn"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be impute.Imputer

@jnothman
Copy link
Copy Markdown
Member

Will you do the SimpleImputer rename?

@thechargedneutron
Copy link
Copy Markdown
Contributor Author

Will you do the SimpleImputer rename?

Yeah, Sure.

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

please use @ignore_warnings on all tests in sklearn.preprocessing.tests.test_imputation.

Please don't change test_cross_validation.py and test_grid_search.py. They're about to be removed and you only make merge conflicts.

Comment thread sklearn/preprocessing/imputation.py Outdated


@deprecated("Imputer was deprecated in version 0.20 and will be "
"removed in 0.22. Import impute.Imputer from sklearn"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now SimpleImputer

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread sklearn/impute.py
@@ -0,0 +1,375 @@
# Authors: Nicolas Tresegnie <nicolas.tresegnie@gmail.com>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short docstring: "Transformers for missing value imputation"

@jnothman jnothman changed the title [MRG] Moves Imputation out of Preprocessing [MRG+1] Moves Imputation out of Preprocessing Jan 17, 2018
Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't import ignore_warnings

Thanks for the efficient work

@thechargedneutron
Copy link
Copy Markdown
Contributor Author

@jnothman Thanks for the quick review. Just out of curiosity, have you moved to a new time zone? ;) It looks so from your review time.

@jnothman
Copy link
Copy Markdown
Member

No, I haven't. I'm just sneaking in an early morning review before disappearing for a few days. Hopeful we can merge this and get on with business.

Please add an API changes entry to what's new

@thechargedneutron
Copy link
Copy Markdown
Contributor Author

_____________________________ [doctest] impute.rst _____________________________
 also allows for different missing values encodings.
 
 The following snippet demonstrates how to replace missing values,
 encoded as ``np.nan``, using the mean value of the columns (axis 0)
 that contain the missing values::
 
     >>> import numpy as np
     >>> from sklearn.impute import SimpleImputer
     >>> imp = SimpleImputer(missing_values='NaN', strategy='mean', axis=0)
     >>> imp.fit([[1, 2], [np.nan, 3], [7, 6]])
Expected:
    SimpleImputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)
Got:
    SimpleImputer(axis=0, copy=True, missing_values='NaN', strategy='mean',
           verbose=0)

Why this doctest failed? The two results are same. Also, there's a failure in pep8 which says l is ambigious variable. I have not made any such changes.

@jnothman
Copy link
Copy Markdown
Member

Use the normalize whitespace flag for doctest

@jnothman
Copy link
Copy Markdown
Member

I think the ambiguous variable name just means some code was unnecessarily repeated in the test. You can fix it

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Feb 13, 2018 via email

@qinhanmin2014
Copy link
Copy Markdown
Member

I'm OK with a deprecation warning in the old version and directly dropping axis parameter in the new version.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Feb 14, 2018 via email

@qinhanmin2014
Copy link
Copy Markdown
Member

We don't exactly need a deprecation warning in the old version if the entire module is deprecated...?

Not sure, I'll revert the deprecation if you have made the decision, but I think users will expect the new version to behave the same as the old version. It seems strange to remove a parameter without any warning when moving the class, so I would prefer a deprecation warning at least in the old version (maybe also in the new version from my side).

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Feb 14, 2018 via email

@qinhanmin2014
Copy link
Copy Markdown
Member

Why not just having what's new and warning say: "sklearn.preprocessing.Imputer has been renamed to sklearn.impute.SimpleImputer, and its axis parameter is no longer available. The deprecated Imputer will be removed in version 0.22."

I think it's a better way. @jnothman revert #10558?

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Feb 14, 2018 via email

@qinhanmin2014
Copy link
Copy Markdown
Member

@jnothman I've created #10635 to revert #10558.
Finishing all the work here is fine but I'm also OK to merge without the deprecation of axis parameter (I think we can leave this to another PR).

@jnothman
Copy link
Copy Markdown
Member

Okay. Let's do it that way.

@jnothman
Copy link
Copy Markdown
Member

Thanks @thechargedneutron

@jnothman
Copy link
Copy Markdown
Member

Do you want to remove axis and update the what's new, @qinhanmin2014, or should we open an issue?

@qinhanmin2014
Copy link
Copy Markdown
Member

@jnothman Please open a new issue. I'll try to take care of it after couple of days if no one takes it (it's Chinese new year :) )

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Feb 14, 2018 via email

@sklearn-lgtm
Copy link
Copy Markdown

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Explicit export is not defined

Comment posted by lgtm.com

Copy link
Copy Markdown
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I've resolved the conflicts and checked the PR. merge? @jnothman @glemaitre

@qinhanmin2014 qinhanmin2014 changed the title [MRG+1] Moves Imputation out of Preprocessing [MRG+2] Moves Imputation out of Preprocessing Feb 14, 2018
@jnothman jnothman merged commit 71b29ac into scikit-learn:master Feb 14, 2018
@jnothman
Copy link
Copy Markdown
Member

Ah, I thought I'd already merged this. I must have been blocked by a merge conflict and not seen.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Feb 14, 2018 via email

@jnothman
Copy link
Copy Markdown
Member

We failed to remove references to Imputer from sklearn/modules/preprocessing.rst :( Needs to be done.

@thechargedneutron
Copy link
Copy Markdown
Contributor Author

I'll do this. :) Are you referring to doc/modules/preprocessing.rst ?
Also, the whole of Imputation section is to be removed, right?

@jnothman
Copy link
Copy Markdown
Member

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move imputation out of preprocessing

6 participants