[MRG] Add an example of inductive clustering #10852
jnothman merged 6 commits into scikit-learn:master
Conversation
ping @jnothman
qinhanmin2014 left a comment:
Please try to make Circle CI green first (you might refer to the Circle CI log and other examples).
    """
    ==============================================
    Inductive Clustering
    ==============================================
A blank line is missing here, I guess?

Yep, that might be it.
Thanks for this example.
I just have some nitpicks, especially concerning the bad plot rendering.
Once the nitpicks are addressed, I am +1 to merge this.
        self.classifier_.fit(X, y)
        return self

    @if_delegate_has_method(delegate='classifier')
Should be with an underscore: `@if_delegate_has_method(delegate='classifier_')`.
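For context on why the underscore matters: `classifier_` (with the trailing underscore) is only created during `fit`, and `if_delegate_has_method` exposes the delegated method only once that fitted attribute exists. Here is a plain-Python sketch of that idea, with a toy majority-vote classifier standing in for the real decorator machinery; all names below are illustrative, not from the PR:

```python
import copy


class MajorityClassifier:
    """Toy stand-in for a real classifier: always predicts the majority label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_] * len(X)


class InductiveClusterer:
    def __init__(self, classifier):
        self.classifier = classifier  # unfitted template: no underscore

    def fit(self, X, y):
        # trailing underscore marks an attribute created during fit
        self.classifier_ = copy.deepcopy(self.classifier).fit(X, y)
        return self

    def predict(self, X):
        # delegates to the *fitted* `classifier_`; before fit() is called
        # this attribute does not exist, which is exactly the condition
        # the decorator is meant to check
        return self.classifier_.predict(X)


model = InductiveClusterer(MajorityClassifier())
print(hasattr(model, "classifier_"))  # False before fit
model.fit([[0], [1], [2]], [0, 0, 1])
print(model.predict([[5], [6]]))      # [0, 0]
```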
    def predict(self, X):
        return self.classifier_.predict(X)

    @if_delegate_has_method(delegate='classifier')
        return self.classifier_.decision_function(X)


    def plot_scatter(X, color, alpha=0.5):
That was on purpose. Shouldn't we have two blank lines between two functions, as per PEP 8?
    plt.subplot(133)
    plot_scatter(X, cluster_labels)
    plot_scatter(X_new, probable_clusters)
    plt.title("Inductive inference on cluster membership \n"
The title is too long for the plot.
Please make sure the final plot is readable with the chosen figure size.
I shortened the titles.
    clusterer = AgglomerativeClustering(n_clusters=3)
    cluster_labels = clusterer.fit_predict(X)

    plt.subplot(131)
Please add `plt.figure(figsize=(12, 4))` before this line to specify a figure size (and make sure the size is good).
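A minimal sketch of that suggestion, using a non-interactive backend so it runs headless; the panel titles are placeholders, not the example's actual titles:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, for this sketch only
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))  # wide enough for three side-by-side panels
for i in range(1, 4):
    plt.subplot(1, 3, i)
    plt.title("Panel %d" % i)  # placeholder titles

print(plt.gcf().get_size_inches())
```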
    # Declare the inductive learning model that it will be used to
    # predict cluster membership for unknown instances
    classifier = RandomForestClassifier(random_state=RANDOM_STATE)
    inductiveLearner = InductiveClusterer(clusterer, classifier).fit(X)
Please use underscores, not camelCase, for local variables.
    # Generate new samples and plot them along with the original dataset
    X_new, y_new = make_blobs(n_samples=10,
Hmm... Should we be drawing samples from a completely different distribution, rather than drawing a test set from the same generation procedure (or even real-world data)?
I don't have a strong opinion about that. I think the intention of the example is clearly conveyed. If you want something more specific, I am all ears.
    Clustering is expensive, especially when our dataset contains millions of
    datapoints. Recomputing the clusters everytime we receive some new data
    is thus in many cases, intractable. With more data, there is also the
    possibility of degrading the previous clustering.
I think there is less of an issue with degrading than with identifying the clusters across two clusterings.

For that reason and others, this kind of technique is interesting regardless of the size of the dataset. An algorithm like agglomerative clustering or DBSCAN makes no hypothesis about how to divide the data in terms of features. Learning a classifier may also help us make inferences about the nature of the clustering. For this reason, I think we should aim to plot the decision boundary in the plot below.
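To make the decision-boundary suggestion concrete, here is a sketch of the usual meshgrid recipe: evaluate the classifier on a dense grid and reshape the labels so they can be shaded with `contourf`. The nearest-centroid `predict`, the centroid values, and the grid range below are made up for illustration, standing in for a real fitted classifier:

```python
import numpy as np

# hypothetical stand-in for a fitted classifier: nearest of two centroids
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])


def predict(points):
    # label each point by the index of its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


# evaluate the classifier over a dense grid spanning the data range
xx, yy = np.meshgrid(np.linspace(-2.0, 6.0, 200),
                     np.linspace(-2.0, 6.0, 200))
Z = predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# shading the regions is then a single call, e.g.:
# plt.contourf(xx, yy, Z, alpha=0.3)
print(Z.shape)  # (200, 200)
```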
> For that reason and others, this kind of technique is interesting regardless of the size of the dataset.

@jnothman I agree. I kept the docstring from the original PR.

> An algorithm like agglomerative clustering or dbscan makes no hypothesis about how to divide the data in terms of features. Learning a classifier may also help us make inferences about the nature of the clustering. For this reason, I think we should aim to plot the decision boundary in the plot below

I agree again. I have plotted the decision regions in the third plot. What do you think?
The decision regions are helpful. Please also update the description to better match real use cases.
Broadly, I like this example; I just wish it were clearer on the inferential value of such an approach, not merely its application to new data.
I've decided this is a nice example of both the technique and meta-estimator design, and would like to merge it.
@jnothman great!
Thanks @chkoar! Nice to clear out some cobwebs.
This reverts commit 534090c.
Reference Issues/PRs
Resolves #4587. Continues #6478.
What does this implement/fix? Explain your changes.
This PR adds an example showing how to implement and perform inductive inference on cluster membership, using a classifier trained on cluster labels.
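For readers landing here, the pattern the example implements can be sketched roughly as below. This is a simplified reconstruction, not the merged example itself: it omits the `if_delegate_has_method` decorator and input validation, and the dataset parameters and variable names are illustrative.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier


class InductiveClusterer(BaseEstimator):
    """Fit a clusterer, then train a classifier on its labels so that
    cluster membership can be predicted for unseen samples."""

    def __init__(self, clusterer, classifier):
        self.clusterer = clusterer
        self.classifier = classifier

    def fit(self, X, y=None):
        self.clusterer_ = clone(self.clusterer)
        self.classifier_ = clone(self.classifier)
        labels = self.clusterer_.fit_predict(X)  # unsupervised step
        self.classifier_.fit(X, labels)          # inductive step
        return self

    def predict(self, X):
        return self.classifier_.predict(X)


X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
model = InductiveClusterer(AgglomerativeClustering(n_clusters=3),
                           RandomForestClassifier(random_state=42)).fit(X)

# new samples never seen by the clusterer still get a cluster assignment
X_new, _ = make_blobs(n_samples=10, centers=3, random_state=7)
print(model.predict(X_new))  # ten labels, each in {0, 1, 2}
```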