[MRG] Make OPTICS support sparse input#14736
[MRG] Make OPTICS support sparse input#14736huntzhan wants to merge 6 commits intoscikit-learn:mainfrom
Conversation
|
Please add a test
…On Fri., 23 Aug. 2019, 2:01 pm Hunt Zhan, ***@***.***> wrote:
Reference Issues/PRs
See #12028 (comment)
<#12028 (comment)>,
#11982 <#11982>
What does this implement/fix? Explain your changes.
Make OPTICS.fit(self, X, y=None) support the sparse matrix X.
Any other comments?
------------------------------
You can view, comment on, or merge this pull request online at:
#14736
Commit Summary
- Try csr support.
File Changes
- *M* sklearn/cluster/optics_.py
<https://github.com/scikit-learn/scikit-learn/pull/14736/files#diff-0>
(2)
Patch Links:
- https://github.com/scikit-learn/scikit-learn/pull/14736.patch
- https://github.com/scikit-learn/scikit-learn/pull/14736.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#14736?email_source=notifications&email_token=AAATH23GHNSYA2K22NK3TFLQF5OKVA5CNFSM4IO3NUUKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HG6DQ4Q>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAATH2YURNDHD2PGE4J5UQ3QF5OKVANCNFSM4IO3NUUA>
.
|
|
Sure. BTW, simply adding accept_sparse won't work. I'm looking into it. |
| # in the dict params | ||
| _params['p'] = p | ||
| dists = pairwise_distances(P, np.take(X, unproc, axis=0), | ||
| dists = pairwise_distances(P, X[unproc], |
There was a problem hiding this comment.
np.take doesn't support the sparse matrix
sklearn/cluster/optics_.py
Outdated
| """ | ||
|
|
||
| def __init__(self, min_samples=5, max_eps=np.inf, metric='minkowski', p=2, | ||
| def __init__(self, min_samples=5, max_eps=np.inf, metric='euclidean', p=2, |
There was a problem hiding this comment.
The default metric minkowski doesn't accept the sparse matrix.
There was a problem hiding this comment.
But this won't work if the user changes p. So I'd rather reinterpret the metric as euclidean below when using the metric, rather than changing the default
There was a problem hiding this comment.
Are you able to add commits to resolve it?
There was a problem hiding this comment.
Yeah, I will resolve this issue and add test cases this week.
| from ..neighbors import NearestNeighbors | ||
| from ..base import BaseEstimator, ClusterMixin | ||
| from ..metrics import pairwise_distances | ||
| from ..metrics.pairwise import PAIRWISE_DISTANCE_FUNCTIONS |
There was a problem hiding this comment.
Import directly without exposing as an interface.
| """ | ||
| X = check_array(X, dtype=np.float) | ||
| # TODO: Support the sparse input for metric = 'precopmuted'. | ||
| if self.metric != 'precomputed' \ |
There was a problem hiding this comment.
Will have a separated PR for this.
adrinjalali
left a comment
There was a problem hiding this comment.
Not a complete review yet, but looks good.
| X : array, shape (n_samples, n_features), or (n_samples, n_samples) \ | ||
| if metric=’precomputed’. | ||
| if metric=’precomputed’, or sparse matrix (n_samples, n_features) if metric | ||
| in ['cityblock', 'cosine', 'euclidean', 'haversine', 'l2', 'l1', |
There was a problem hiding this comment.
Please keep the first line of the X's description short and as usual syntax, and put the information in the comments of the parameter.
| # TODO: Support the sparse input for metric = 'precopmuted'. | ||
| if self.metric != 'precomputed' \ | ||
| and self.metric in PAIRWISE_DISTANCE_FUNCTIONS: | ||
| X = check_array(X, accept_sparse='csr') |
| clust = OPTICS(min_samples=3, min_cluster_size=2, | ||
| max_eps=20, cluster_method='xi', | ||
| xi=0.4, metric='euclidean').fit(sparse.lil_matrix(X)) | ||
| assert_array_equal(clust.labels_, expected_labels) |
There was a problem hiding this comment.
I think we can parameterize the test for these ones and refactor them a bit
|
It would be nice to get OPTICS to work with sparse precomputed distance matrices. The OPTICS documentation says:
But if your distance matrix doesn't fit in memory as a dense matrix you can't even use OPTICS, whereas that nicely works for DBSCAN. |
|
I see this is stalled for quite a while; I'm continuing work on this on #20802 |
Reference Issues/PRs
See #12028 (comment), #11982
What does this implement/fix? Explain your changes.
Make
OPTICS.fit(self, X, y=None)support the sparse matrixX.Any other comments?