FIX KNNImputer missing indicator column addition when add_indicator=True#26600
Merged
thomasjpfan merged 13 commits intoscikit-learn:mainfrom Aug 8, 2023
Merged
Conversation
added 5 commits
June 18, 2023 11:31
thomasjpfan
reviewed
Jul 26, 2023
Member
thomasjpfan
left a comment
There was a problem hiding this comment.
Thank you for the PR @Shreesha3112 !
sklearn/impute/tests/test_knn.py
Outdated
|
|
||
| @pytest.mark.parametrize("add_indicator", [True, False]) | ||
| @pytest.mark.parametrize("missing_value_test", [np.NaN, 1]) | ||
| def test_knn_imputation_adds_missing_indicator_column_if_add_indicator_is_true( |
Member
There was a problem hiding this comment.
I think this property applies to all imputers, thus I think it's better to have a test in sklearn/impute/tests/test_common.py:
@pytest.mark.parametrize("imputer", imputers(), ids=lambda x: x.__class__.__name__)
@pytest.mark.parametrize("missing_value_test", [np.nan, 1])
def test_imputation_adds_missing_indicator_if_add_indicator_is_true(
imputer, missing_value_test
):
"""Check that missing indicator always exists when add_indicator=True.
Non-regression test for gh-26590.
"""
X_train = np.array([[0, np.NaN], [1, 2]])
# Test data without missing values that can be set to np.NaN or 1.
X_test = np.array([[0, missing_value_test], [1, 2]])
imputer.set_params(add_indicator=True)
imputer.fit(X_train)
X_test_imputed_with_indicator = imputer.transform(X_test)
assert X_test_imputed_with_indicator.shape == (2, 3)
imputer.set_params(add_indicator=False)
imputer.fit(X_train)
X_test_imputed_without_indicator = imputer.transform(X_test)
assert X_test_imputed_without_indicator.shape == (2, 2)
assert_allclose(
X_test_imputed_with_indicator[:, :-1], X_test_imputed_without_indicator
)
if np.isnan(missing_value_test):
expected_missing_indicator = [1, 0]
else:
expected_missing_indicator = [0, 0]
assert_allclose(X_test_imputed_with_indicator[:, -1], expected_missing_indicator)Given the original bug report, I am okay with checking add_indicator=True, which is also consistent with the name of the test "*_if_add_indicator_is_true".
doc/whats_new/v1.4.rst
Outdated
| :mod:`sklearn.impute` | ||
| ..................... | ||
|
|
||
| - |Fix| :class:`impute.KNNImputer` now correctly adds a missing indicator column in |
Member
There was a problem hiding this comment.
Can this changelog entry be moved to doc/whats_new/1.3.rst under 1.3.1 ?
added 4 commits
July 27, 2023 20:36
…ing-indicator-behavior
…ing-indicator-behavior
thomasjpfan
approved these changes
Jul 27, 2023
Member
thomasjpfan
left a comment
There was a problem hiding this comment.
Thank you for the update! LGTM
Contributor
|
Hoping this will be merged soon! Thank you for this PR |
OmarManzoor
approved these changes
Aug 8, 2023
Contributor
OmarManzoor
left a comment
There was a problem hiding this comment.
Thanks for the PR. Looks good overall.
TamaraAtanasoska
pushed a commit
to TamaraAtanasoska/scikit-learn
that referenced
this pull request
Aug 21, 2023
…rue (scikit-learn#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
glemaitre
pushed a commit
to glemaitre/scikit-learn
that referenced
this pull request
Sep 18, 2023
…rue (scikit-learn#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
jeremiedbb
pushed a commit
that referenced
this pull request
Sep 20, 2023
…rue (#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
REDVM
pushed a commit
to REDVM/scikit-learn
that referenced
this pull request
Nov 16, 2023
…rue (scikit-learn#26600) Co-authored-by: shreesha3112 <shreesha3112.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reference Issues/PRs
Fixes #26590
What does this implement/fix? Explain your changes.
Previously, the
KNNImputerdid not add a missing indicator column intransformmethod, when theadd_indicatorparameter was set to True and no missing values are present in the input. This could lead to inconsistent outputs between training and testing datasets for example.This bug fix ensures that
KNNImputercorrectly adds a missing indicator column in itstransformoutput when theadd_indicatoris set toTrue, even if no missing values are present in the input.Any other comments?