[MRG] support sample_weight in silhouette_score#4087
[MRG] support sample_weight in silhouette_score#4087jnothman wants to merge 3 commits intoscikit-learn:mainfrom
Conversation
ccf6cf9 to
e56d3af
Compare
There was a problem hiding this comment.
Can you maybe add the reference Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis again?
|
LGTM apart from an explanation of the strategy and possibly a reference. |
|
Btw, I don't know its value as a reference, but http://cs.au.dk/~simina/weighted.pdf intuits the notion of weighted clustering as I did (last para section 4), but I've not yet read on to see how they extend this to the real-valued case |
|
needs a rebase btw. |
|
rebased |
There was a problem hiding this comment.
This can also be viewed like,
for a given sample, if another sample belonging to the same cluster has a very high sample weight and is far away, it should reduce the silhouette score more
right?
And vice versa.
|
You need to Rebase. Again. |
There was a problem hiding this comment.
test_paper_example is not a great name for a test :P . Would be great to mention the name of the paper.
|
Is it possible #11135 should have closed this? I see the current version has |
|
@jnothman would you be able to give this a fresh update? If not, @StefanieSenger would you be able to take this over and update per our codebase these days? |
I sought
sample_weightinsilhouette_score, to account for multiple points that are merged into one when calculating average distances. Hacking it into the current implementation resulted in a very slow solution. Thus this PR also rewrites the implementation, yielding something that's a bit slower than the solution at master, but supportssample_weight.I've also added tests for correctness which I haven't otherwise found in the code.