[MRG] Adds KNNImputer#12852
Conversation
jnothman
left a comment
There was a problem hiding this comment.
This is another comment I had pending. Not sure when I'll give it another full pass
sklearn/impute/_knn.py
Outdated
| return X[:, valid_idx] | ||
|
|
||
| def _more_tags(self): | ||
| return {'allow_nan': True} |
There was a problem hiding this comment.
| return {'allow_nan': True} | |
| return {'allow_nan': is_scalar_nan(self.missing_values)} |
sklearn/impute/_knn.py
Outdated
|
|
||
| Parameters | ||
| ---------- | ||
| dist_pot_donors : array-like, shape=(n_receivers, n_train_samples) |
There was a problem hiding this comment.
Should reformat according to the recent edicts on these matters.
|
@jnothman scikit-learn/sklearn/utils/validation.py Line 927 in 35c0ca0 |
|
tests are failing... |
|
Test failure was unrelated, now that #14689 is merged, merging this with master should fix it. |
| least one neighbor with a defined distance, the weighted or unweighted average | ||
| of the remaining neighbors will be used during imputation. If a feature is | ||
| always missing in training, it is removed during `transform`. For more | ||
| information on the methodology, see ref. [OL2001]_. |
There was a problem hiding this comment.
Do they consider distance weighting? It might be worth noting the differences...
There was a problem hiding this comment.
It looks like they use distance weighting by default.
There was a problem hiding this comment.
Gerhard Tutz, Shahla Ramzan,
Improved Methods for the Imputation of Missing Data by Nearest Neighbor Methods shows in their experiments that weighted kNN performs a little better than unweighted.
There was a problem hiding this comment.
Lorenzo Beretta* and Alessandro Santaniello, Nearest neighbor imputation algorithms: a critical evaluation, shows that weighted is slightly better than unweighted.
| ]) | ||
| knn = KNNImputer(missing_values=na, n_neighbors=2).fit(X) | ||
|
|
||
| X_transform = knn.transform(X) |
There was a problem hiding this comment.
Please check with test data where the feature has values
There was a problem hiding this comment.
PR updated with this suggestion.
sklearn/impute/tests/test_knn.py
Outdated
| dist = nan_euclidean_distances(X) | ||
| r1c1_nbor_dists = dist[1, [0, 2, 3, 4, 5]] | ||
| r1c3_nbor_dists = dist[1, [0, 3, 4, 5, 6]] | ||
| r1c1_nbor_wt = (1 / r1c1_nbor_dists) |
There was a problem hiding this comment.
| r1c1_nbor_wt = (1 / r1c1_nbor_dists) | |
| r1c1_nbor_wt = 1 / r1c1_nbor_dists |
sklearn/impute/tests/test_knn.py
Outdated
| r1c1_nbor_dists = dist[1, [0, 2, 3, 4, 5]] | ||
| r1c3_nbor_dists = dist[1, [0, 3, 4, 5, 6]] | ||
| r1c1_nbor_wt = (1 / r1c1_nbor_dists) | ||
| r1c3_nbor_wt = (1 / r1c3_nbor_dists) |
There was a problem hiding this comment.
| r1c3_nbor_wt = (1 / r1c3_nbor_dists) | |
| r1c3_nbor_wt = 1 / r1c3_nbor_dists |
sklearn/impute/tests/test_knn.py
Outdated
|
|
||
|
|
||
| @pytest.mark.parametrize("na", [-1, np.nan]) | ||
| def test_knn_imputer_with_weighted_features(na): |
There was a problem hiding this comment.
"weighted features" seems to be a strange name. This also appears to overlap in purpose with test_knn_imputer_weight_distance. What's the distinction?
There was a problem hiding this comment.
There was not a distinction. test_knn_imputer_with_weighted_features was removed and the tests from there was moved into test_knn_imputer_weight_distance
|
The mask of |
jnothman
left a comment
There was a problem hiding this comment.
I think this is good to hit the road.
Congrats @ashimb9 and thanks @thomasjpfan for finishing it off!
|
yayyyy finally ;) |
|
Glad to see this one in, too!
Danilo
… Am 04.09.2019 um 6:43 PM schrieb Andreas Mueller ***@***.***>:
yayyyy finally ;)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#12852?email_source=notifications&email_token=AA5Z6RABX22LODMU7SHEIDDQH7QSZA5CNFSM4GL4N2AKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD54GS7A#issuecomment-527985020>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA5Z6RB4ZL7CGTRMJDWPFODQH7QSZANCNFSM4GL4N2AA>.
|
|
Very happy to see this. |
Reference Issues/PRs
Continues #9348, and a part of #9212
Closes #2989
Closes #9348
Closes #9212
What does this implement/fix? Explain your changes.
This PR cleans up the work done on #9348 and #9212:
Edit:
KNNImputerhas been added this to PR with the following updates: