[MRG+2] Fix sparse matrix handling in clustering#4052
[MRG+2] Fix sparse matrix handling in clustering#4052amueller merged 1 commit intoscikit-learn:masterfrom
Conversation
|
@jnothman |
77a4e13 to
de91554
Compare
|
Huh? Oh, perhaps that's the only part of the implementation I didn't test! :s |
|
You can never test enough ;) |
|
No, had nothing to do with parts I didn't test. It's an edge case. That scipy.sparse doesn't seem to handle well... Until recently there was no support for sparse matrices with a zero-dimension. Apart from that, it seems |
|
Meh :-/ Do you want to look into it / do a fix? I'm not super familiar with the DBSCAN code. |
sklearn/utils/estimator_checks.py
Outdated
|
more tests with less code :D |
|
Regrding the edge case of slicing with an empty index array: this is unsupported in scipy 0.13, but is fixed in master. So perhaps the solution is to special-case it. If we land up with no core samples, create a dense array of the correct shape (but with no data)! Or, could special-case it only if scipy raises an error. |
|
maybe just special-case it all the time and comment saying it is scipy backward compatibility? |
9925a98 to
c71ebb2
Compare
|
fixed the dbscan issue. |
ae632fc to
75f2f5e
Compare
sklearn/cluster/tests/test_dbscan.py
Outdated
There was a problem hiding this comment.
(Maybe that's where the missing s went from a very different patch)
There was a problem hiding this comment.
This comment has not been addressed.
There was a problem hiding this comment.
Damn. I forgot to push at some point or a rebase messed it up :-/
|
Except for |
|
The |
|
But I can also ditch the other one and we just use this. |
|
the problem is that I used #4058 elsewhere, too. |
04a4364 to
2345d23
Compare
|
Ah sorry, I'd forgotten this was on top of #4058 On 12 January 2015 at 05:45, Andreas Mueller notifications@github.com
|
2345d23 to
3e3fdd3
Compare
|
done |
3e3fdd3 to
93ed5d8
Compare
sklearn/cluster/tests/test_dbscan.py
Outdated
There was a problem hiding this comment.
I assume that all samples are clustered as outliers in that case? It would be great to add an assertion to check that here, first by asserting that core_sample_indices_ has shape (0,) and second that labels_ is filed with -1s.
|
Aside from this last minor comment, +1 on my side as well. |
Make sparsity check check everything. don't test everything. That would be nice but is out of scope :-/ catch special case of no core samples in DBSCAN add nonregression test for sparse dbscan with no core samples.
93ed5d8 to
494b8e5
Compare
|
will merge if travis agrees with me. |
[MRG+2] Fix sparse matrix handling in clustering
See #4051.
Needs fixing: