[MRG + 1] Implement haversine metric in pairwise#4458
[MRG + 1] Implement haversine metric in pairwise#4458xuewei4d wants to merge 1 commit intoscikit-learn:masterfrom
Conversation
3dba895 to
c9a02bd
Compare
|
You could add a test here :) |
sklearn/metrics/pairwise.py
Outdated
There was a problem hiding this comment.
Ah sorry keep the y uppercased...
(X, Y=None, ) --> (X, Y=None)
|
@ragv I looked into wrong test folders :( Besides, the testing in nbrs = cls(algorithm='ball_tree', metric='haversine')
assert_raises(ValueError,
nbrs.predict,
X)
assert_raises(ValueError,
ignore_warnings(nbrs.fit),
Xsparse, y)Is there any reason that haversine metric cannot be used with ball_tree? @jakevdp ? |
sklearn/metrics/pairwise.py
Outdated
There was a problem hiding this comment.
shape information can go in the first line itself... More description, if available, can follow next :) Like here.
You haven't validated the data like done here for |
|
@ragv I don't think |
|
I suggested input (
I think since previously |
|
There is an unrelated test failure in Python 3.4 alone... :-/ Others seem to pass... Probably temporary server unavailability... |
|
Any review on this PR @amueller ? Should I need to check the data is in the domain of cos, sin? |
|
You meant scipy right? :p |
|
Also in your PR description change fix to fixes so that github can automatically close that issue when this PR gets merged :) |
|
Hey @xuewei4d. Sorry I was sick over the weekend and just catching up. It is fine that |
|
You should add the metric to |
|
Does the tree also raise an exception if the data is not 2d? Maybe that should be tested in test_neighbors_badargs? |
|
restarted. |
|
Yes, haversine distances are defined only in two dimensions. See https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/dist_metrics.pyx#L955 |
|
Just out of curiosity, could you check the performance of this algorithm vs. using the In [1]: from sklearn.neighbors import DistanceMetric
In [2]: import numpy as np
In [3]: X = np.random.random((100, 2))
In [20]: metric = DistanceMetric.get_metric('haversine')
In [21]: dist = metric.pairwise(X, X)The difference is that this doesn't do any broadcasting, so it may be faster and/or more memory efficient. Note that the |
|
I wasn't really aware of the DistanceMetric interface. Shame on me :-/ could we also use that for the jaccard? |
|
Yes, it would work for Jaccard! Actually, it will work for any of the metrics supported by BallTree. For pairwise distances, it tends to be a factor of a few slower than the equivalent scipy routines, though (if I remember correctly; I did the benchmarks a long time ago). |
|
Ok we should check for jaccard. |
|
what was the status here? Just waiting for another review? |
|
I think I need to do some benchmark. |
|
@xuewei4d Would you have some time to finish this up? |
|
Hi all, How can I finish this? |
|
Could you please benchmark how fast this is compared to an implementation based on Distance metric as suggested by jakevdp above? |
|
Closing due to being superseded by #12568 |
Fixes #4453 #4452 , add haversine distrance for pairwise_distances and document it.
I did not find any test forpairwise.py, so I did not add any test cases.