[MRG] skip_complete flag for IterativeImputer#14806
[MRG] skip_complete flag for IterativeImputer#14806glemaitre merged 14 commits intoscikit-learn:masterfrom
Conversation
sklearn/impute/_iterative.py
Outdated
| "random" | ||
| A random order for each round. | ||
|
|
||
| skip_non_missing_features : boolean, optional (default=False) |
There was a problem hiding this comment.
Maybe we should have:
skip : 'none' (default), 'complete'
Which features to use the initial imputation for, instead of iterative imputation.
'complete' avoids learning an iterative imputation model for features that
have no missing values in fitting, so will be efficient if you expect those features
not require imputation.
We could consider supporting a list of feature indices too.
There was a problem hiding this comment.
I like boolean a little better because it's easy to remember what the options are. People could easily confuse 'none' for None.
As for supporting feature indices: I can't really think of when I'd want to do this and how exactly it would work. Let's keep it simple for now.
|
When you know that some features will never have missing values is when you
would specify them explicitly.
|
Doesn't |
|
To be clear about my reluctance to add more bells/whistles. Partially, I think the current version solves the problem we have, and partially I don't have a ton of time to make this more complex/fancy. We can always add more stuff later if we need it? |
|
skip_non_missing_features=True is the same as skip='complete'. Part of my
concern is about the name being long and hard to decode
|
|
Right. How about |
| train = [[1], [2]] | ||
| imputer = IterativeImputer() | ||
| imputer.fit(train) | ||
| assert imputer.n_iter_ == 0 |
There was a problem hiding this comment.
test it with missing value too?
sklearn/impute/_iterative.py
Outdated
| for a_, b_, loc_, scale_ | ||
| in zip(a, b, mus, sigmas)] | ||
| # get posterior samples if there is at least one missing value | ||
| if np.sum(missing_row_mask) > 0: |
There was a problem hiding this comment.
This makes the function appear quite nested. Might be more readable if we handle the == 0 case with a return.
|
Thanks - I've addressed the comments! |
|
@jnothman Anything else here? |
glemaitre
left a comment
There was a problem hiding this comment.
I am fine with the feature. Just some cosmetic changes.
Could you add some entries in what's new. Maybe we should have 2 entries:
- one for
skip_complete; - one for the resolution with a single feature.
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
|
@glemaitre Thanks, everything you suggested/asked for is done! |
|
Thanks @sergeyf |
|
My hero! @sergeyf |
Reference Issues/PRs
Fixes #13773 and #14383
What does this implement/fix? Explain your changes.
Before,
IterativeImputerhad some unintuitive behavior when there were features with no missing values atfitbut not attransform. This PR turns off this behavior by default, and adds a flag (skip_complete) to turn it back on. A test is added, the docstring is updated.This change also required a few minor changes to fix edge cases that were impossible before.
Paging @jnothman and @glemaitre.