
[MRG] support sample_weight in silhouette_score#4087

Open
jnothman wants to merge 3 commits into scikit-learn:main from jnothman:silhouette


@jnothman
Member

I sought sample_weight in silhouette_score, to account for multiple points that are merged into one when calculating average distances. Hacking it into the current implementation resulted in a very slow solution. Thus this PR also rewrites the implementation, yielding something that's a bit slower than the solution at master, but supports sample_weight.

I've also added tests for correctness which I haven't otherwise found in the code.
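To make the "multiple points merged into one" reading concrete, here is a minimal, dependency-free sketch. It is not the implementation in this PR: the function name `weighted_silhouette` is hypothetical, and it assumes that a weight `w` stands for `w` coincident copies of a point, so the intra-cluster mean divides by the cluster's total weight minus one.

```python
from math import dist


def weighted_silhouette(points, labels, weights=None):
    """Weighted mean silhouette; a weight w is read as w coincident copies.

    Hypothetical helper for illustration only, not scikit-learn API.
    """
    n = len(points)
    if weights is None:
        weights = [1.0] * n
    clusters = sorted(set(labels))

    # Total weight held by each cluster.
    total = {c: 0.0 for c in clusters}
    for lab, w in zip(labels, weights):
        total[lab] += w

    score_sum = 0.0
    for i in range(n):
        # Weighted sum of distances from point i to the members of each cluster.
        dsum = {c: 0.0 for c in clusters}
        for j in range(n):
            dsum[labels[j]] += weights[j] * dist(points[i], points[j])

        own = labels[i]
        others = total[own] - 1.0  # copies in the own cluster besides this one
        if others <= 0 or len(clusters) < 2:
            s = 0.0  # assumption: singleton clusters score 0
        else:
            a = dsum[own] / others
            b = min(dsum[c] / total[c] for c in clusters if c != own)
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        score_sum += weights[i] * s
    return score_sum / sum(weights)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
weights = [3, 1, 1, 2]

# Expanding every point into `w` unweighted copies should give the same score.
expanded_points = [p for p, w in zip(points, weights) for _ in range(w)]
expanded_labels = [lab for lab, w in zip(labels, weights) for _ in range(w)]

merged = weighted_silhouette(points, labels, weights)
expanded = weighted_silhouette(expanded_points, expanded_labels)
```

Under the duplicate-count assumption, `merged` and `expanded` agree up to floating-point rounding. How real-valued weights below 1 should be handled is left open here, since the "minus one" term only has a clear meaning for counts.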

@jnothman jnothman force-pushed the silhouette branch 2 times, most recently from ccf6cf9 to e56d3af Compare January 12, 2015 13:52
Member


Can you maybe add the reference to "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis" again?

@amueller
Member

LGTM apart from an explanation of the strategy and possibly a reference.

@jnothman
Member Author

jnothman commented Jun 3, 2015

Btw, I don't know its value as a reference, but http://cs.au.dk/~simina/weighted.pdf intuits the notion of weighted clustering as I did (last paragraph of section 4). I've not yet read on to see how they extend this to the real-valued case.

@amueller
Member

amueller commented Jun 3, 2015

needs a rebase btw.

@jnothman
Member Author

jnothman commented Jun 6, 2015

rebased

Member


This can also be viewed as follows: for a given sample, if another sample belonging to the same cluster has a very high sample weight and is far away, it should reduce the silhouette score more, right? And vice versa.
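A small numeric check of that intuition, as a sketch only. The helper `sample_silhouette` is hypothetical (not scikit-learn API) and assumes a weight `w` means `w` coincident copies of a point, with the sample's own cluster holding a total weight above 1.

```python
from math import dist


def sample_silhouette(i, points, labels, weights):
    """Silhouette of sample i, with a weight w read as w coincident copies."""
    own = labels[i]
    sums, totals = {}, {}
    for p, lab, w in zip(points, labels, weights):
        # Weighted distance sum and total weight per cluster, seen from sample i.
        sums[lab] = sums.get(lab, 0.0) + w * dist(points[i], p)
        totals[lab] = totals.get(lab, 0.0) + w
    a = sums[own] / (totals[own] - 1.0)  # exclude one copy of sample i itself
    b = min(sums[c] / totals[c] for c in sums if c != own)
    return (b - a) / max(a, b)


# Sample 0 shares cluster 0 with a nearby point (index 1) and a distant one (index 2).
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 4.0), (6.0, 0.0), (6.0, 1.0)]
labels = [0, 0, 0, 1, 1]

light = sample_silhouette(0, points, labels, [1, 1, 1, 1, 1])  # unit weights
heavy = sample_silhouette(0, points, labels, [1, 1, 5, 1, 1])  # distant point upweighted
near = sample_silhouette(0, points, labels, [1, 5, 1, 1, 1])   # nearby point upweighted
```

With these toy points, `heavy` comes out below `light`, and `near` comes out above it, which is one way to read the "vice versa".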

@MechCoder
Member

You need to rebase. Again.

Member


test_paper_example is not a great name for a test :P . Would be great to mention the name of the paper.

@haiatn
Contributor

haiatn commented Aug 26, 2023

Is it possible #11135 should have closed this? I see the current version has D_chunk that calculates sample_weights, although I must say I am not familiar with the algorithm. I also see the paper example test is in the main branch.

@adrinjalali
Member

@jnothman would you be able to give this a fresh update?

If not, @StefanieSenger would you be able to take this over and update per our codebase these days?
