[MRG] skip_complete flag for IterativeImputer by sergeyf · Pull Request #14806 · scikit-learn/scikit-learn

sergeyf · 2019-08-25T17:40:52Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Previously, IterativeImputer behaved unintuitively when a feature had no missing values at fit but did have missing values at transform: no iterative imputation model was learned for such a feature, so it was filled using only the initial imputation. This PR turns that behavior off by default and adds a flag (skip_complete) to turn it back on. A test is added and the docstring is updated.

This change also required a few minor changes to fix edge cases that were impossible before.
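
For illustration, here is a minimal sketch of the difference the flag makes. The toy data, the variable names, and the choice of a roughly linear relationship between the two columns are assumptions made for this example and are not taken from the PR. Column 1 is complete at fit time and missing only at transform time.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)

# Toy data (assumed for this sketch): column 1 is roughly 2 * column 0 + 1.
x0 = rng.uniform(0, 10, size=50)
x1 = 2 * x0 + 1 + rng.normal(scale=0.1, size=50)
X_train = np.column_stack([x0, x1])
X_train[:5, 0] = np.nan  # only column 0 has missing values at fit

# Column 1 is missing at transform time only.
X_test = np.array([[1.0, np.nan]])

results = {}
for skip_complete in (False, True):
    imputer = IterativeImputer(skip_complete=skip_complete, random_state=0)
    imputer.fit(X_train)
    results[skip_complete] = imputer.transform(X_test)[0, 1]
    print(f"skip_complete={skip_complete}: imputed value = {results[skip_complete]:.2f}")
```

With skip_complete=True, column 1 should fall back to the initial imputation (the training mean of that column with the default settings); with skip_complete=False it should be predicted from column 0.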

Paging @jnothman and @glemaitre.

jnothman · 2019-08-26T02:29:19Z

sklearn/impute/_iterative.py

        "random"
            A random order for each round.

+    skip_non_missing_features : boolean, optional (default=False)


Maybe we should have:

skip : 'none' (default), 'complete' Which features to use the initial imputation for, instead of iterative imputation. 'complete' avoids learning an iterative imputation model for features that have no missing values in fitting, so it will be efficient if you expect those features not to require imputation.

We could consider supporting a list of feature indices too.

I like boolean a little better because it's easy to remember what the options are. People could easily confuse 'none' for None.

As for supporting feature indices: I can't really think of when I'd want to do this and how exactly it would work. Let's keep it simple for now.

jnothman · 2019-08-26T03:02:03Z

When you know that some features will never have missing values is when you would specify them explicitly.

sergeyf · 2019-08-26T03:05:37Z

> When you know that some features will never have missing values is when you would specify them explicitly.

Doesn't skip_non_missing_features=True do this automatically? It finds them for you so you don't have to specify them yourself!

sergeyf · 2019-08-26T03:16:35Z

To be clear about my reluctance to add more bells and whistles: partly, I think the current version solves the problem we have, and partly I don't have a ton of time to make this more complex/fancy. We can always add more later if we need it.

jnothman · 2019-08-26T05:45:06Z

skip_non_missing_features=True is the same as skip='complete'. Part of my concern is that the name is long and hard to decode.

sergeyf · 2019-08-26T14:24:53Z

Right. How about skip_complete, but still as a boolean?

jnothman

Otherwise LGTM!

jnothman · 2019-08-27T08:56:23Z

sklearn/impute/tests/test_impute.py

+    train = [[1], [2]]
+    imputer = IterativeImputer()
+    imputer.fit(train)
+    assert imputer.n_iter_ == 0


Test it with a missing value too?
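
A minimal sketch of what such a check could look like, assuming the single-feature early exit discussed in this PR: with only one feature there is nothing to regress on, so the imputer should report zero iterations and fall back to the initial imputation. The array values are made up for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Single feature, with one missing value (illustrative data).
train = np.array([[1.0], [2.0], [np.nan]])

imputer = IterativeImputer()
imputed = imputer.fit_transform(train)

# No iterative rounds are expected when there is only one feature.
assert imputer.n_iter_ == 0
# The gap should be filled by the initial (mean) imputation.
assert not np.isnan(imputed).any()
```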

jnothman · 2019-08-27T08:58:27Z

sklearn/impute/_iterative.py

-                    for a_, b_, loc_, scale_
-                    in zip(a, b, mus, sigmas)]
+        # get posterior samples if there is at least one missing value
+        if np.sum(missing_row_mask) > 0:


This makes the function appear quite nested. Might be more readable if we handle the == 0 case with a return.
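
To illustrate the suggestion, here is a sketch of the guard-clause style. The helper `fill_missing_rows` and its `predict` callback are hypothetical names invented for this example; this is not the code in sklearn/impute/_iterative.py.

```python
import numpy as np


def fill_missing_rows(X_filled, missing_row_mask, feat_idx, predict):
    """Hypothetical helper, not the scikit-learn implementation."""
    # Guard clause: nothing is missing for this feature, so return early
    # instead of wrapping the rest of the function in an `if` block.
    if not np.any(missing_row_mask):
        return X_filled

    # Predict the target feature from the remaining columns.
    X_out = X_filled.copy()
    other_cols = np.delete(X_out[missing_row_mask], feat_idx, axis=1)
    X_out[missing_row_mask, feat_idx] = predict(other_cols)
    return X_out


X = np.array([[1.0, 2.0], [3.0, 0.0]])
double_first = lambda cols: 2.0 * cols[:, 0]  # stand-in for a fitted estimator

filled = fill_missing_rows(X, np.array([False, True]), 1, double_first)
untouched = fill_missing_rows(X, np.array([False, False]), 1, double_first)
```

The main body stays at one indentation level, and the no-missing-values case is visible at the top of the function.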

jnothman

Otherwise LGTM!

sergeyf · 2019-08-27T15:54:19Z

Thanks - I've addressed the comments!

sergeyf · 2019-09-03T22:12:09Z

@jnothman Anything else here?

glemaitre

I am fine with the feature. Just some cosmetic changes.
Could you add some entries to what's new? Maybe we should have 2 entries:

- one for skip_complete;
- one for the resolution with a single feature.

sergeyf · 2019-09-09T16:51:55Z

@glemaitre Thanks, everything you suggested/asked for is done!

glemaitre · 2019-09-09T20:48:15Z

Thanks @sergeyf

LuxMiranda · 2019-09-09T20:58:49Z

My hero! @sergeyf

sergeyf added 3 commits August 25, 2019 10:37

initial commit

7883982

removing unnecessary check

af6b8bc

test one feature edge case

a4d9d39

jnothman reviewed Aug 26, 2019

simpler name for the param

a0ad96e

sergeyf changed the title ~~[MRG] skip_non_missing_features flag for IterativeImputer~~ [MRG] skip_complete flag for IterativeImputer Aug 26, 2019

missing attribute description

a5fb216

jnothman reviewed Aug 27, 2019

jnothman approved these changes Aug 27, 2019

addressing reviewer comments

13e19bc

glemaitre requested changes Sep 9, 2019

sergeyf and others added 7 commits September 9, 2019 08:31

Update sklearn/impute/_iterative.py

5f9dcd6

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Update sklearn/impute/_iterative.py

c60a1c2

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Update sklearn/impute/tests/test_impute.py

05f1df9

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Update sklearn/impute/tests/test_impute.py

25adce1

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Apply suggestions from code review

7142274

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Merge branch 'master' into iterativeimputer_dont_skip

19f0722

docs + fix

5007090

Update v0.22.rst

4dbe415

glemaitre merged commit 9f7d3f9 into scikit-learn:master Sep 9, 2019

sergeyf deleted the iterativeimputer_dont_skip branch September 9, 2019 21:02

marah-abdin mentioned this pull request Jan 31, 2023

maabdin/imputation microsoft/responsible-ai-toolbox-mitigations#49

Merged
