[WIP] DOC Explain missing value mechanisms #23746
Draft
aperezlebel wants to merge 1 commit into scikit-learn:main
glemaitre
reviewed
Jun 28, 2022
|
|
||
| Missing value mechanisms | ||
| ======================== | ||
| Three mechanisms model data missingness. |
Member
Suggested change
| Three mechanisms model data missingness. | |
| Three mechanisms modeling data missingness exist: |
| ======================== | ||
| Three mechanisms model data missingness. | ||
|
|
||
| * **Missing Completely At Random (MCAR)**: the missingness does not depend on data. |
Member
What about giving a concrete example for each mechanism to illustrate it?
| :align: center | ||
| :scale: 20% | ||
|
|
||
| In the above example, X1 is always observed. In the first plot, X2 is masked |
Member
Suggested change
| In the above example, X1 is always observed. In the first plot, X2 is masked | |
| In the above example, X1 is always observed. In the left-hand side plot, X2 is masked |
| :scale: 20% | ||
|
|
||
| In the above example, X1 is always observed. In the first plot, X2 is masked | ||
| independently of the values of (X1, X2), hence MCAR. In the second, X2 is |
Member
Suggested change
| independently of the values of (X1, X2), hence MCAR. In the second, X2 is | |
| independently of the values of (X1, X2), hence MCAR. In the middle, X2 is |
|
|
||
| In the above example, X1 is always observed. In the first plot, X2 is masked | ||
| independently of the values of (X1, X2), hence MCAR. In the second, X2 is | ||
| masked when X1 (observed) reaches some threshold, hence MAR. In the last, X2 is |
Member
Suggested change
| masked when X1 (observed) reaches some threshold, hence MAR. In the last, X2 is | |
| masked when X1 (observed) reaches some threshold, hence MAR. In the right-hand side plot, X2 is |
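For readers following the thread, the three panels described in this hunk can be sketched with NumPy. This is only an illustration, not the code behind the PR's figure: the sample size, the correlation between X1 and X2, the 30% masking rate and the 0.5 threshold are all assumed values, and because the description of the last plot is cut off in the hunk, the MNAR rule used here (mask X2 when X2 itself exceeds the threshold) is an assumption as well.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Two correlated features (assumed setup); X1 is always observed,
# only X2 is ever masked.
X = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]], size=n_samples
)
X1, X2 = X[:, 0], X[:, 1]

# MCAR: every X2 entry is masked with the same probability,
# independently of the values of (X1, X2).
mask_mcar = rng.random(n_samples) < 0.3

# MAR: X2 is masked when the observed feature X1 exceeds a threshold.
mask_mar = X1 > 0.5

# MNAR (assumed rule): X2 is masked when its own value, which ends up
# unobserved, exceeds a threshold.
mask_mnar = X2 > 0.5


def apply_mask(X, mask):
    """Return a copy of X where X2 is NaN on the rows selected by mask."""
    X_missing = X.copy()
    X_missing[mask, 1] = np.nan
    return X_missing


X_mcar = apply_mask(X, mask_mcar)
X_mar = apply_mask(X, mask_mar)
X_mnar = apply_mask(X, mask_mnar)
```

In the MAR case the rows with a missing X2 can be identified from X1 alone, whereas in the MNAR case the observed X2 values are truncated at the threshold and nothing in the observed data reveals why.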
Member
@aperezlebel do you want to address the comments and resolve the conflicts so that we can merge this PR?
ogrisel
reviewed
Jul 7, 2023
|
|
||
| * **Missing Completely At Random (MCAR)**: the missingness does not depend on data. | ||
| * **Missing At Random (MAR)**: the missingness does not depend on underlying | ||
| missing values but can depend on observed ones. |
Member
Including the target variable y?
|
|
||
| Missing value mechanisms | ||
| ======================== | ||
| Three mechanisms model data missingness. |
Member
Suggested change
| Three mechanisms model data missingness. | |
| The machine learning literature typically distinguishes between the following | |
| settings. Note that the names are not necessarily very intuitive: |
| * **Missing At Random (MAR)**: the missingness does not depend on underlying | ||
| missing values but can depend on observed ones. | ||
| * **Missing Not At Random (MNAR)**: the missingness depends on underlying missing | ||
| values. |
Member
Suggested change
| values. | |
| values. Therefore, the missingness pattern can be statistically associated | |
| with `y` in a supervised classification or regression setting. |
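The association with `y` mentioned in this suggestion can be checked on synthetic data. The sketch below is a rough illustration under assumed settings: the linear target, the noise level, the masking rate and the threshold are hypothetical choices, not taken from the PR. It compares the correlation between the missingness indicator of X2 and `y` under an MCAR mask and under an MNAR mask.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5000

X1 = rng.normal(size=n_samples)
X2 = rng.normal(size=n_samples)

# Hypothetical regression target that depends on both features.
y = X1 + 2.0 * X2 + rng.normal(scale=0.1, size=n_samples)

# MCAR: the mask on X2 is drawn independently of the data.
mask_mcar = rng.random(n_samples) < 0.3

# MNAR: the mask on X2 depends on the X2 value that becomes missing.
mask_mnar = X2 > 0.5

# Pearson correlation between the 0/1 missingness indicator and y.
corr_mcar = np.corrcoef(mask_mcar.astype(float), y)[0, 1]
corr_mnar = np.corrcoef(mask_mnar.astype(float), y)[0, 1]

print(f"corr(mask, y) under MCAR: {corr_mcar:.3f}")
print(f"corr(mask, y) under MNAR: {corr_mnar:.3f}")
```

With these assumptions the MCAR correlation should stay close to zero while the MNAR one is clearly positive, which is the kind of signal a missingness indicator (for instance `SimpleImputer(add_indicator=True)` or `MissingIndicator` in scikit-learn) can expose to a downstream estimator.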
Reference Issues/PRs
Addresses task 3 of #21967.
What does this implement/fix? Explain your changes.
Add a section to the "Imputation of missing values" doc to explain the missing value mechanisms.
Any other comments?
Work in progress