[WIP] Example of multiple imputation with IterativeImputer by sergeyf · Pull Request #13025 · scikit-learn/scikit-learn

sergeyf · 2019-01-21T19:11:39Z

Adding to #11977. This PR is a restart of #11370, which got messy.

Here is a quote from #11370 that explains what this PR does:

This PR is an example that shows how to use IterativeImputer for Multiple Imputation.

As discussed in #11259, the defaults of IterativeImputer are such that single imputation is performed. Because the method is also quite powerful for Multiple Imputation, we agreed to make an example that shows the user how to use IterativeImputer to perform Multiple Imputation.

I made the document: examples/impute/plot_multiple_imputation.py and it shows 2 things:

Estimation of regression coefficients (betas) and their standard errors: compare single imputation with IterativeImputer against using IterativeImputer as a MICE Imputer.
How to use IterativeImputer as a MICE Imputer when making a prediction model (with train and test datasets).
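
As a rough illustration of the first point, the sketch below imputes the data several times with `sample_posterior=True` and a different `random_state` each time, fits ordinary least squares on every completed dataset, and pools the results with Rubin's rules. It is not the code from `plot_multiple_imputation.py`: the helper names `ols_with_variance` and `pooled_estimates`, the synthetic data, and the choice to impute only the features (not the target) are assumptions made for this sketch.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def ols_with_variance(X, y):
    """Fit OLS with an intercept; return coefficients and their variances."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (A.shape[0] - A.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return beta, np.diag(cov)


def pooled_estimates(X_missing, y, n_imputations=5):
    """Hypothetical helper: impute m times, then pool with Rubin's rules."""
    betas, variances = [], []
    for seed in range(n_imputations):
        # sample_posterior=True draws imputations instead of using the mean,
        # so each seed yields a different completed dataset.
        imputer = IterativeImputer(sample_posterior=True, random_state=seed)
        X_completed = imputer.fit_transform(X_missing)
        beta, var = ols_with_variance(X_completed, y)
        betas.append(beta)
        variances.append(var)
    betas, variances = np.array(betas), np.array(variances)
    pooled_beta = betas.mean(axis=0)
    within = variances.mean(axis=0)       # average within-imputation variance
    between = betas.var(axis=0, ddof=1)   # between-imputation variance
    total = within + (1 + 1 / n_imputations) * between
    return pooled_beta, np.sqrt(total)


# Synthetic data (an assumption for this sketch): 20% of entries set to NaN.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_missing = X.copy()
X_missing[rng.uniform(size=X.shape) < 0.2] = np.nan

beta, se = pooled_estimates(X_missing, y)
```

The pooled standard error combines the average per-imputation variance with the spread of the coefficients across imputations, which is the part a single imputation cannot capture.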

sergeyf · 2019-01-21T19:11:59Z

Paging @jnothman and @RianneSchouten.

jnothman · 2019-01-21T23:21:37Z

It might be good to amend that first commit with --author 'Rianne Schouten' etc. Thanks for this! Will look soon!

sergeyf · 2019-01-21T23:36:45Z

OK, I think that worked.

jnothman · 2019-01-22T02:40:48Z

I suspect that without examples/impute/README.txt the example won't render.

jnothman · 2019-01-22T02:46:39Z

I've created examples/impute/README.txt in the iterativeimputer branch.

jnothman · 2019-01-22T04:57:14Z

Oh, no. Did I break the doctest again?

sergeyf · 2020-09-25T17:17:31Z

@jnothman I wandered back to this PR, and now it passes tests!

Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes.

sergeyf · 2020-12-04T03:28:18Z

@jnothman @glemaitre Any thoughts on my last comment? Repeated here: "Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes."

nxorable · 2022-01-03T20:03:17Z

I found the current docs ambiguous and believe the community would value this work.

sergeyf · 2022-01-03T23:46:16Z

I'm not familiar with these build trigger checks. Can anyone please suggest how to fix it?

thomasjpfan · 2022-01-03T23:54:05Z

> Can anyone please suggest how to fix it?

Syncing with the main branch should fix the build trigger error.

sergeyf · 2022-01-04T00:08:09Z

Thank you @thomasjpfan! Any idea if we can get this merged once all the tests pass?

thomasjpfan · 2022-01-04T03:00:44Z

> Any idea if we can get this merged once all the tests pass?

This PR still needs two approvals to get merged, and I do not have a time estimate for that to happen.

At a glance, I see two big tasks for this PR:

It needs to use another dataset instead of load_boston, because load_boston has been deprecated and will be removed in 1.2.
In recent years, we have been moving to more narrative-driven examples, where text is placed together with code. For example: Common pitfalls in the interpretation of coefficients of linear models. This means moving text out of the big paragraph at the beginning and placing it alongside the code to create a narrative.

sergeyf · 2022-01-04T03:11:53Z

Thanks! I can make those changes. What regression dataset do you recommend to replace Boston? Smaller is best because this example is hefty, but I can always subsample any dataset.

nxorable · 2022-01-04T03:24:38Z

fetch_california_housing is similar to load_boston, albeit larger
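
A minimal sketch of that swap, assuming a random row subsample is acceptable for keeping the example fast. The helper names `subsample_rows` and `load_small_california`, and the default of 500 rows, are hypothetical choices for illustration; `fetch_california_housing` downloads the data on first use.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing


def subsample_rows(X, y, n_rows, seed=0):
    """Return a random subset of rows, drawn without replacement."""
    rng = np.random.RandomState(seed)
    n_rows = min(n_rows, X.shape[0])
    idx = rng.choice(X.shape[0], size=n_rows, replace=False)
    return X[idx], y[idx]


def load_small_california(n_rows=500, seed=0):
    """Hypothetical helper: California housing data cut down to n_rows rows."""
    # return_X_y=True gives NumPy arrays (features, target) instead of a Bunch.
    X, y = fetch_california_housing(return_X_y=True)
    return subsample_rows(X, y, n_rows, seed)
```

Fixing the seed keeps the subsample, and therefore the rendered example, reproducible between documentation builds.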

glemaitre · 2022-01-04T10:05:37Z

> Any idea if we can get this merged once all the tests pass?

I would like to see how this example integrates within the proposal in #21967.

sergeyf · 2022-01-04T16:56:32Z

@glemaitre MICE is in the family of multiple imputation - perform imputation multiple times, then apply your subsequent pipeline multiple times also, and then have multiple solutions. For sklearn users the subsequent pipeline will often be "train/val/test a ML alg". I read through #21967 and multiple imputation isn't mentioned at all, but it is common in the stats world as far as I understand. This example would be useful because people coming from stats might want to do what the mice R package does, but don't know how with IterativeImputer as it does not do it out of the box.

To summarize:

Documenting missing-values practices #21967 is about how to make a guide to imputation (mostly) for the purpose of ML.
This example is about multiple imputation.
The two don't have a clear bridge at the moment.
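
For the ML use case described above, the idea can be sketched as follows: build the same pipeline several times, each with an imputer that samples from the posterior under a different seed, and then combine the resulting predictions. This is only a sketch under stated assumptions, not the PR's example code: the function name `multiple_imputation_predict`, the choice of `Ridge` as the downstream model, averaging as the way of combining predictions, and the synthetic data are all illustrative.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def multiple_imputation_predict(X_train, y_train, X_test, n_imputations=5):
    """Fit one pipeline per imputation seed and combine the predictions."""
    predictions = []
    for seed in range(n_imputations):
        pipeline = make_pipeline(
            # Posterior sampling makes each fitted pipeline see a
            # different completed version of the data.
            IterativeImputer(sample_posterior=True, random_state=seed),
            StandardScaler(),
            Ridge(),
        )
        pipeline.fit(X_train, y_train)
        predictions.append(pipeline.predict(X_test))
    predictions = np.array(predictions)
    # Mean is the combined prediction; std shows disagreement across imputations.
    return predictions.mean(axis=0), predictions.std(axis=0)


# Synthetic data with missing entries (an assumption for this sketch).
rng = np.random.RandomState(0)
X_full = rng.normal(size=(150, 4))
y_full = X_full @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=150)
X_full[rng.uniform(size=X_full.shape) < 0.15] = np.nan

mean_pred, spread = multiple_imputation_predict(
    X_full[:100], y_full[:100], X_full[100:]
)
```

The per-sample spread is what single imputation does not provide: it indicates how much a prediction depends on the particular values filled in for the missing entries.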

glemaitre · 2022-01-05T10:35:45Z

I agree that having an example that defines "multiple imputation" is important to remove the confusion with the iterative procedure of IterativeImputer. In this regard, I find the example too complex.

I would prefer to have a single pipeline that performs single imputation, and then create a specific estimator to show how to perform multiple imputation. We would not even need to use an IterativeImputer in this case. This would speed up the example, which currently takes up to 3 minutes, while we usually try to keep examples running under 30 seconds.

I think it is super important to point out in the discussion that the example aims at providing a definition of "multiple imputation with code" rather than showing that multiple imputation works better. I am not sure that, in an ML setting, there is currently any evidence that multiple imputation works better than using a strong learner (@GaelVaroquaux and @marineLM have better insights than me on this).

GaelVaroquaux · 2022-01-05T14:09:23Z

Cc @A-pl (we need to put your paper on HAL)

sergeyf · 2022-01-05T16:17:00Z

I'm a bit confused. We do have a single pipeline in example 2: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L303

And it's used multiple times to do MICE: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L315

Can you please clarify what you'd like changed?

jnothman and others added 13 commits September 3, 2018 13:23

FEA Reinstate ChainedImputer

dc67ec0

This reverts commit f819704.

Fix import of time

cbf89ec

Merge branch 'master' into iterativeimputer

e4fa514

[MRG] ChainedImputer -> IterativeImputer, and documentation update (scikit-learn#11350)

a4f2a89

Towards making this more generic than MICE

[MRG] sample from a truncated normal instead of clipping samples from a normal (scikit-learn#12177)

09a9a21

Merge branch 'master' into iterativeimputer

d854b45

DOC Merge IterativeImputer what's news

caa089e

Merge branch 'master' into iterativeimputer

1550d65

Undo changes to v0.20.rst

f103c6b

Revert changes to v0.20.rst

9e10658

DOC Normalize whitespace in doctest

0aab6dc

Fix for SciPy 0.17

d34f227

Fix doctest

b44dff8

sergeyf force-pushed the iterativeimputer_mice_example branch from ecfdfc5 to 4f59d37 Compare January 21, 2019 23:35

Rianne Schouten and others added 3 commits January 21, 2019 15:35

first commit, author=Rianne

965819f

flakes

c502460

adding authors

999bfa0

sergeyf force-pushed the iterativeimputer_mice_example branch from 4f59d37 to 999bfa0 Compare January 21, 2019 23:36

jnothman changed the title ~~first commit~~ Example of multiple imputation with IterativeImputer Jan 22, 2019

Create examples/impute gallery

0453c19

jnothman and others added 3 commits January 22, 2019 13:47

Merge branch 'iterativeimputer' into HEAD

516ba51

maybe this will fix the 24/26 issue?

fe84477

adding readme

1313077

Add missing readme file

8758561

sergeyf closed this Aug 26, 2019

sergeyf reopened this Aug 26, 2019

sergeyf added 4 commits August 26, 2019 07:44

merge master

8f16172

Merge branch 'iterativeimputer_mice_example' of https://github.com/sergeyf/scikit-learn into iterativeimputer_mice_example

0c8b1fc

Merge branch 'master' into iterativeimputer_mice_example

4b988dd

undoing weird merge changes

72f62e8

skeller88 mentioned this pull request Mar 4, 2020

Any plans to stabilize IterativeImputer? What are the current roadblocks to doing so? #16638

Closed

Merge branch 'master' into iterativeimputer_mice_example

e10edc3

Base automatically changed from master to main January 22, 2021 10:50

sergeyf closed this Jan 3, 2022

sergeyf reopened this Jan 3, 2022

Merge branch 'scikit-learn:main' into iterativeimputer_mice_example

f273dd1

sergeyf added 3 commits January 3, 2022 16:15

run black

f115711

black try again

ba51bfb

final black try. somehow "black file" did not do it all...

9079411

cmarmo added the module:impute label Feb 22, 2022
