Skip to content

ENH keep features with all missing values during imputation#24770

Merged
jjerphan merged 23 commits intoscikit-learn:mainfrom
glemaitre:missing_feature_imputer
Nov 17, 2022
Merged

ENH keep features with all missing values during imputation#24770
jjerphan merged 23 commits intoscikit-learn:mainfrom
glemaitre:missing_feature_imputer

Conversation

@glemaitre
Copy link
Copy Markdown
Member

@glemaitre glemaitre commented Oct 27, 2022

Fixes #16695
Fixes #16426
Fixes #16977

Pushing in the contributor branch did not seem to sync with GitHub.
Opening this PR the.

@jeremiedbb jeremiedbb changed the title Missing feature imputer ENH keep features with all missing values during imputation Oct 27, 2022
Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a first pass. As discussed, we probably don't want to let the features with all missing get out of transform unchanged, i.e still with nans. It's a surprising behavior for an imputer. As a first simple solution we can impute them by 0 (the value should never be relevant anyway). I was wondering if for the SimpleImputer with strategy=constant we would like to impute with "fill_value", otherwise the behavior might be confusing ?

@glemaitre
Copy link
Copy Markdown
Member Author

I will first add the support for imputing with a constant now that I merge main. I will update the tests as well.

Copy link
Copy Markdown
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pursuing this work, @glemaitre!

Here are a few comments.

@jeremiedbb
Copy link
Copy Markdown
Member

So it appears that your changes eventually ended up showing in the original PR. I guess github was having a hard time :)
Do you still want to continue here or go back to the original PR ?

@glemaitre
Copy link
Copy Markdown
Member Author

Do you still want to continue here or go back to the original PR ?

I will continue here and make sure that we use the original entry in the changelog and acknowledge the contributor.

@glemaitre
Copy link
Copy Markdown
Member Author

Adding the label "No change needed" to not have an error raised due to the PR number mismatch.

@glemaitre
Copy link
Copy Markdown
Member Author

@jeremiedbb @jjerphan This PR is now ready to be reviewed. I added some additional documentation.

@glemaitre
Copy link
Copy Markdown
Member Author

Uhm the bad bug :). We have a list where the tests modify a global variable.

Copy link
Copy Markdown
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few last remarks.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Copy link
Copy Markdown
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you, @vitorsrg for initiating this work and @glemaitre for superseding it.

Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a few typos and nitpicks

glemaitre and others added 2 commits November 16, 2022 18:14
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IterativeImputer automatically removes variables with all missing values Missing features removal with SimpleImputer

5 participants