[WIP] Example of multiple imputation with IterativeImputer#13025
sergeyf wants to merge 59 commits into scikit-learn:main from
Paging @jnothman and @RianneSchouten.
It might be good to amend that first commit with
ecfdfc5 to 4f59d37
4f59d37 to 999bfa0
OK, I think that worked.
I suspect that without
I've created examples/impute/README.txt in the iterativeimputer branch.
Oh, no. Did I break the doctest again?
@jnothman I wandered back to this PR, and now it passes tests! Any interest in picking up work on this? I feel like it was in a pretty good place already, and we were just unsure about the extremely long runtimes.
@jnothman @glemaitre Any thoughts on my last comment? Repeated here: "Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes."
I found the current docs ambiguous and believe the community would value this work.
I'm not familiar with these build trigger checks. Can anyone please suggest how to fix it?
Syncing with the
Thank you @thomasjpfan! Any idea if we can get this merged once all the tests pass?
This PR still needs two approvals to get merged, and I do not have a time estimate for when that will happen. At a glance, I see two big tasks for this PR:
Thanks! I can make those changes. What regression dataset do you recommend to replace Boston? Smaller is best because this example is hefty, but I can always subsample any dataset.
fetch_california_housing is similar to load_boston, albeit larger.
(In reply to Sergey Feldman's comment of Mon, Jan 3, 2022, 10:12 PM.)
I would like to see how this example integrates with the proposal in #21967.
@glemaitre MICE belongs to the family of multiple imputation methods: perform imputation multiple times, then apply your subsequent pipeline multiple times as well (once per imputed dataset), and end up with multiple solutions. For To summarize:
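A minimal sketch of that loop, assuming a small synthetic regression dataset with about 20% of entries removed completely at random. Each pass uses `IterativeImputer(sample_posterior=True)` with a different `random_state`, so each pass yields a different completed dataset; a `LinearRegression` is fit on each, and the per-imputation coefficients are pooled with a plain mean. The data, the downstream model, and mean pooling are illustrative assumptions, not the code in this PR.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (IterativeImputer is experimental)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Assumed synthetic data: 3 independent features, linear target plus small noise
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Remove roughly 20% of the entries completely at random
X_missing = X.copy()
X_missing[rng.uniform(size=X.shape) < 0.2] = np.nan

n_imputations = 5
coefs = []
for m in range(n_imputations):
    # sample_posterior=True draws imputations from the predictive distribution,
    # so a different random_state gives a different completed dataset
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    X_imputed = imputer.fit_transform(X_missing)
    # Apply the downstream model once per imputed dataset
    coefs.append(LinearRegression().fit(X_imputed, y).coef_)

coefs = np.array(coefs)                  # one row of coefficients per imputation
pooled_coef = coefs.mean(axis=0)         # pooled point estimate
between_imputation_std = coefs.std(axis=0)  # spread caused by imputation uncertainty
```

The spread across rows of `coefs` is what a single imputation hides; a fuller treatment would pool variances too (for example with Rubin's rules), which this sketch does not attempt.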
I agree that an example defining what "multiple imputation" is would be important to remove the confusion with the iterative procedure of the In this regard, I would prefer to have a single pipeline that performs single imputation, and then a specific estimator that shows how to perform multiple imputation. We would not even need to use an I think it is very important to point out in the discussion that the example aims to provide a definition of "multiple imputation with code" rather than to show that multiple imputation works better. I am not sure there is currently any evidence, in the ML setting, that multiple imputation works better than using a strong learner (@GaelVaroquaux and @marineLM have better insight on this than I do).
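One possible reading of the "single pipeline plus a specific estimator" suggestion, sketched under assumptions: `MultipleImputationRegressor` is a hypothetical name, not a scikit-learn class. It clones a single-imputation pipeline once per imputation, reseeds the imputer step (assumed to be named `iterativeimputer`, which is the name `make_pipeline` generates), and averages the predictions of the fitted copies.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline


class MultipleImputationRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: one fitted copy of a single-imputation pipeline
    per imputation, with predictions pooled by averaging."""

    def __init__(self, pipeline, n_imputations=5, random_state=None):
        self.pipeline = pipeline
        self.n_imputations = n_imputations
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.RandomState(self.random_state)
        self.pipelines_ = []
        for _ in range(self.n_imputations):
            pipe = clone(self.pipeline)
            # Reseed the imputer so each copy draws different imputed values
            # (assumes the step is named "iterativeimputer")
            seed = rng.randint(np.iinfo(np.int32).max)
            pipe.set_params(iterativeimputer__random_state=seed)
            self.pipelines_.append(pipe.fit(X, y))
        return self

    def predict(self, X):
        # Pool the per-imputation predictions with a simple mean
        return np.mean([pipe.predict(X) for pipe in self.pipelines_], axis=0)


# Usage on assumed synthetic data with ~20% of entries missing at random
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_missing = X.copy()
X_missing[rng.uniform(size=X.shape) < 0.2] = np.nan

single = make_pipeline(IterativeImputer(sample_posterior=True), Ridge())
model = MultipleImputationRegressor(single, n_imputations=5, random_state=0)
model.fit(X_missing, y)
y_pred = model.predict(X_missing)
```

Keeping `single` as an ordinary pipeline means it can still be used on its own for single imputation, while the wrapper is the only place where "multiple" enters, which separates the definition of multiple imputation from the internal iterations of `IterativeImputer`.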
Cc @A-pl (we need to put your paper on HAL)
I'm a bit confused. We do have a single pipeline in example 2: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L303
And it's used multiple times to do MICE: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L315
Can you please clarify what you'd like changed?
Adding to #11977. This PR is a restart of #11370, which got messy.
Here is a quote from #11370 that explains what this PR does: