resolve #4587 add inductive learning example #6478
chiragnagpal wants to merge 1 commit into scikit-learn:master from
Conversation
See #4587. (GitHub doesn't create links from PR headers.)
@jnothman this was your idea ;)
jnothman left a comment
Apologies for the very slow review. I'm not yet sure this is persuasive. It might be worth thinking about whether there's a use case that can motivate it clearly.
This should be in examples/cluster/plot_inductive_learning.py
Thanks.
@@ -0,0 +1,85 @@
"""
==============================================
Inductive Learning with Scikit Learn
I think you must mean 'inductive clustering'. "With scikit-learn" is redundant inside scikit-learn.
Clustering is expensive, especially when our dataset contains millions of
datapoints. Recomputing the clusters every time we receive some new data
is thus, in many cases, intractable. With more data, there is also the
"With more data, "... this comment only really makes sense if you say something more explicit about acquiring more data from a noisier source than was used to build a clustering... no?
One solution to this problem, is to first infer the target classes using
some unsupervised learning algorithm and then fit a classifier on the
inferred targets, treating it as a supervised problem. This is known as
Transductive learning.
No, this is certainly not transductive. Whether we're using "inductive" correctly is another matter.
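For context, the pattern being debated (cluster once, then learn a classifier on the inferred labels so new points can be labelled without re-clustering) can be sketched as follows. The estimator choices here are illustrative, not from the PR:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Cluster a batch of data (the expensive, one-off step).
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
clusterer = KMeans(n_clusters=3, random_state=0).fit(X)

# Fit a classifier on the inferred cluster labels, treating them
# as supervised targets.
clf = RandomForestClassifier(random_state=0).fit(X, clusterer.labels_)

# New data gets labels without re-running the clustering.
X_new, _ = make_blobs(n_samples=10, centers=3, random_state=1)
labels_new = clf.predict(X_new)
```

Whether "inductive" or "transductive" is the right word for the two halves is exactly what the review thread is arguing about.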
n_samples = 5000
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
this is just np.array(list('bgrcmyk' * 4))?
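The reviewer's suggested spelling produces the same array as the original line, just more readably:

```python
import numpy as np

# Original spelling from the diff.
a = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
# Reviewer's suggestion: repeat the 7-colour cycle four times.
b = np.array(list('bgrcmyk' * 4))

assert (a == b).all()  # identical 28-element arrays of single characters
```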
plt.scatter(X[:, 0], X[:, 1], color="black", s=2)
plt.show()
from sklearn import svm
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
blobs = datasets.make_blobs(n_samples=3*n_samples, random_state=8)
# Inferring class on a new random dataset
X_new = StandardScaler().fit_transform(np.random.rand(n_samples*2,2))
y_pred = inductiveLearner.predict(X_new)
plt.scatter(X_new[:, 0], X_new[:, 1], color=colors[y_pred].tolist(), s=5)
This overlay doesn't really work if black is one of the plotted colours.
You need more clear titling/description of the plot.
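For reference, a self-contained reconstruction of the flow the diff fragments suggest: blobs plus random points, DBSCAN to infer labels, and a classifier fit on those labels. The `inductiveLearner` object in the diff is not shown being constructed, so the SVM parameters, blob centers, and the noise-handling step here are guesses chosen to make the sketch run cleanly, not the PR's actual code:

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.cluster import DBSCAN

n_samples = 500  # smaller than the diff's 5000, for speed
rng = np.random.RandomState(8)

# Training data: three well-separated blobs.
X, _ = datasets.make_blobs(
    n_samples=3 * n_samples,
    centers=[[-5, -5], [0, 0], [5, 5]],
    cluster_std=0.8,
    random_state=8,
)

# Transductive step: DBSCAN infers cluster labels for these points only.
y_db = DBSCAN(eps=0.5).fit_predict(X)

# Inductive step: an SVM learns a decision function from the inferred
# labels (dropping DBSCAN's noise points, labelled -1).
mask = y_db != -1
clf = svm.SVC(gamma="scale").fit(X[mask], y_db[mask])

# New random points get labels without re-running the clustering.
X_new = rng.uniform(-7, 7, size=(n_samples * 2, 2))
y_pred = clf.predict(X_new)
```

This also sidesteps the reviewer's plotting complaint: with the training scatter drawn in black, the predicted colour map must not include black.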
@jnothman what is the intention here? Do we need to provide just an example of inductive inference on cluster labels, or to create a meta-estimator?
I don't mind examples containing new meta-estimators...
Closed by #10852
I added an example that uses a synthetic dataset (blobs + random), uses DBSCAN to infer labels, and then fits an SVM over it to infer labels on other data.