[MRG+2] algorithm='auto' should always work for nearest neighbors (continuation)#9145
[MRG+2] algorithm='auto' should always work for nearest neighbors (continuation)#9145amueller merged 9 commits intoscikit-learn:masterfrom
Conversation
| if metric not in _metrics: | ||
| nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto', | ||
| metric=metric).fit(Xcsr) | ||
| if metric != "precomputed": |
There was a problem hiding this comment.
Is that normal that I can't call kneighbors method on sparse matrix when metric='precomputed'
There was a problem hiding this comment.
That case is not implemented: it would need to be treated a bit differently to the precomputed dense array case; and it doesn't really make sense. An array containing distances which is mostly 0s is pretty useless for nearest neighbors. (Unless we're admitting negative distances, or having a different interpretation of 0s, both of which may be useful.)
There was a problem hiding this comment.
But I don't understand why precomputed should be in VALID_METRICS_SPARSE and hence why it's relevant tot his code path.
| if metric not in _metrics: | ||
| nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto', | ||
| metric=metric).fit(Xcsr) | ||
| if metric != "precomputed": |
There was a problem hiding this comment.
That case is not implemented: it would need to be treated a bit differently to the precomputed dense array case; and it doesn't really make sense. An array containing distances which is mostly 0s is pretty useless for nearest neighbors. (Unless we're admitting negative distances, or having a different interpretation of 0s, both of which may be useful.)
| if metric not in _metrics: | ||
| nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto', | ||
| metric=metric).fit(Xcsr) | ||
| if metric != "precomputed": |
There was a problem hiding this comment.
But I don't understand why precomputed should be in VALID_METRICS_SPARSE and hence why it's relevant tot his code path.
| assert_array_almost_equal(dist1, dist2) | ||
|
|
||
|
|
||
| def test_algo_auto_metrics(): |
There was a problem hiding this comment.
a better name or comment would be appreciated
doc/modules/neighbors.rst
Outdated
| ``'effective_metric_'`` is in the ``'VALID_METRICS'`` list of | ||
| ``'ball_tree'``. It selects ``'brute'`` if :math:`k < N/2` and the | ||
| ``'effective_metric_'`` is not in the ``'VALID_METRICS'`` list of | ||
| ``'kd_tree'`` or ``'ball_tree'``. It selects ``'brute'`` if :math:`k >= N/2`. This choice is based on the assumption that the number of query points is at least the |
There was a problem hiding this comment.
could you please keep this under 80 chars
|
Thanks for the review. The change has been done! |
| assert_array_almost_equal(dist1, dist2) | ||
|
|
||
|
|
||
| def test_unsupported_metric_for_auto(): |
There was a problem hiding this comment.
I don't know what you mean by unsupported. It's clearly not unsupported if you're testing it works.
| X = rng.rand(12, 12) | ||
| Xcsr = csr_matrix(X) | ||
|
|
||
| # Metric which don't required any additional parameter |
There was a problem hiding this comment.
I find the comment a bit misleading. I would rename the variable and say only test metrics that don't require additional arguments.
And maybe assert that some strange metric is in VALID_METRICS['brute'].
Checking that the test is non-empty, and more didactic variable name
|
LGTM, I made minor minor changes to fix my nitpicks, merge on green. |
|
Thank you! |
…ntinuation) (scikit-learn#9145) * Merge neighbors.rst * issue scikit-learn#4931 * Improve test implementation * Update base.py * Remove unused import * Customize test for precomputed metric * Change test function name * rename _metrics to require_params, add set assert Checking that the test is non-empty, and more didactic variable name * Remove blank line
…ntinuation) (scikit-learn#9145) * Merge neighbors.rst * issue scikit-learn#4931 * Improve test implementation * Update base.py * Remove unused import * Customize test for precomputed metric * Change test function name * rename _metrics to require_params, add set assert Checking that the test is non-empty, and more didactic variable name * Remove blank line
…ntinuation) (scikit-learn#9145) * Merge neighbors.rst * issue scikit-learn#4931 * Improve test implementation * Update base.py * Remove unused import * Customize test for precomputed metric * Change test function name * rename _metrics to require_params, add set assert Checking that the test is non-empty, and more didactic variable name * Remove blank line
…ntinuation) (scikit-learn#9145) * Merge neighbors.rst * issue scikit-learn#4931 * Improve test implementation * Update base.py * Remove unused import * Customize test for precomputed metric * Change test function name * rename _metrics to require_params, add set assert Checking that the test is non-empty, and more didactic variable name * Remove blank line
…ntinuation) (scikit-learn#9145) * Merge neighbors.rst * issue scikit-learn#4931 * Improve test implementation * Update base.py * Remove unused import * Customize test for precomputed metric * Change test function name * rename _metrics to require_params, add set assert Checking that the test is non-empty, and more didactic variable name * Remove blank line
…ntinuation) (scikit-learn#9145) * Merge neighbors.rst * issue scikit-learn#4931 * Improve test implementation * Update base.py * Remove unused import * Customize test for precomputed metric * Change test function name * rename _metrics to require_params, add set assert Checking that the test is non-empty, and more didactic variable name * Remove blank line
…ntinuation) (scikit-learn#9145) * Merge neighbors.rst * issue scikit-learn#4931 * Improve test implementation * Update base.py * Remove unused import * Customize test for precomputed metric * Change test function name * rename _metrics to require_params, add set assert Checking that the test is non-empty, and more didactic variable name * Remove blank line
Reference Issue
Fixes #4931, continuation of #5596
What does this implement/fix? Explain your changes.
Implement test for metric in
['mahalanobis', 'wminkowski', 'seuclidean']Any other comments?
Should we warn the user when the algorithm is set into
brute? (instead ofauto)