[MRG+2] Rename MICEImputer to ChainedImputer#11314
[MRG+2] Rename MICEImputer to ChainedImputer#11314ogrisel merged 14 commits intoscikit-learn:masterfrom
MICEImputer to ChainedImputer#11314Conversation
MICEImputer to ChainedImputerMICEImputer to ChainedImputer
MICEImputer to ChainedImputerMICEImputer to ChainedImputer
|
Getting some |
|
@sergeyf Any reference for the new name? |
|
@qinhanmin2014 thanks re: master, I'll do that. Re: name. So the point here is that we're not actually doing MICE, because the traditional way of applying the MICE technique is to get multiple imputations, and this class only returns a single one. But the full name is multivariate imputation by chained equations (MICE), hence the named |
|
Sorry, I broke master. |
|
Part of the confusion is that:
|
|
@jnothman Which paper? Stef's paper that I reference is this one: "mice: Multivariate Imputation by Chained Equations in R" https://www.jstatsoft.org/article/view/v045i03 There's also this paper, but it's not referenced in this PR: "Multiple Imputation by Chained Equations: What is it and how does it work?" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/ @qinhanmin2014 will you please take a look at my changes to impute.rst and let me know if it's sufficiently clear now. |
|
You're right. I've just been confused by people implying that the M must mean Multiple.... because of course what we're doing here is Multivariate Imputation by Chaining.... just not Multiple. |
|
I have confused that at least 5 times while writing the original PR =) |
doc/modules/impute.rst
Outdated
| A more sophisticated approach is to use the :class:`ChainedImputer` class, which | ||
| implements MICE: Multivariate Imputation by Chained Equations. MICE is usually used | ||
| to generate multiple imputations, but :class:`ChainedImputer` generates a single | ||
| (averaged) imputation, which is why it is not named `MICEImputer`. MICE models each |
There was a problem hiding this comment.
I think we need to link those terms to a glossary entry that explains:
- univariate vs multivariate imputation
- single vs multiple imputation (increasing the number of samples in the resulting transformed dataset).
There was a problem hiding this comment.
Good idea. There is no entry for imputation. I'll add one and include these points.
ogrisel
left a comment
There was a problem hiding this comment.
Besides the above comment on the missing glossary entry, +1 for merge.
|
@ogrisel I added a glossary entry and rewrote the entries in impute.rst. Please take a look and confirm that it satisfies your request. |
MICEImputer to ChainedImputerMICEImputer to ChainedImputer
|
Yeah, mice is both multivariate and multiple.
multivariate: it loops over the features and imputes all incomplete features
multiple: if you impute every incomplete value m times.
ChainedImputer in its current form is chained and multivariate (and not yet
multiple).
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
Virusvrij.
www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
2018-06-20 17:19 GMT+02:00 Sergey Feldman <notifications@github.com>:
… @ogrisel <https://github.com/ogrisel> I added a glossary entry and
rewrote the entries in impute.rst. Please take a look and confirm that it
satisfies your request.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#11314 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AVQqeyAS6OX5LzBwe2JtzlCeq2jly0RCks5t-mfqgaJpZM4UsF-9>
.
|
There was a problem hiding this comment.
The glossary entry for imputation is significantly longer than the average entry.
Maybe we should keep a short entry in the glossary and move the discussing on univariate vs multivariate vs single vs multiple imputations in a dedicated section at the end of the missing value imputation chapter of the narrative documentation.
Alternatively, we could split the entries: univariate imputation, multivariate imputation, multiple imputations and add cross-links. See for instance the entries on "inductive" vs "transductive".
| impute | ||
| imputation | ||
| Most machine learning algorithms require that their inputs have no | ||
| :term:`missing values`, and will not work if this requirement is |
There was a problem hiding this comment.
Please add a backlink to the :term:imputation entry from the :term:missing values entry.
doc/glossary.rst
Outdated
| The above practice is called multiple imputation. The | ||
| :class:`impute.ChainedImputer` class can be used for multiple | ||
| imputations by applying it to the same dataset with different random | ||
| seeds. |
There was a problem hiding this comment.
I would further refine:
... imputations by applying it repeatedly to the same dataset with different random seeds.
Note that a call to the transform method of a :term:transformer is not allowed to change the number of samples. Therefore multiple imputations cannot be achieved by a single call to transform.
|
@ogrisel Done! I moved most of the discussion to impute.rst |
|
Hmm tests are failing because of StandardScaler? |
|
|
||
| ALLOW_NAN = ['Imputer', 'SimpleImputer', 'MICEImputer', | ||
| 'MinMaxScaler', 'StandardScaler', 'QuantileTransformer'] | ||
| ALLOW_NAN = ['Imputer', 'SimpleImputer', 'ChainedImputer', |
There was a problem hiding this comment.
Add the StandardScaler back here to remove the error.
There was a problem hiding this comment.
Oh dang that was my bad on the merging. Thanks!
|
Thank you very much @sergeyf! |
This reverts commit 93382cc.
Addresses a point raised in #8478 and #11259. The
MICEImputerisn't really a MICE imputer because we're only keeping a single averaged imputation by default. This is a purely cosmetic PR so that end-users don't get as confused.CC @glemaitre @jnothman @RianneSchouten