
DOC Rework Decision boundary of semi-supervised methods example #32024

Merged
StefanieSenger merged 13 commits into scikit-learn:main from ArturoAmorQ:rework_semi_supervised on Oct 5, 2025

Conversation

Member

@ArturoAmorQ ArturoAmorQ commented Aug 27, 2025

Reference Issues/PRs

See also #31625.

What does this implement/fix? Explain your changes.

Related to the series of examples I am reworking, this PR:

  • General clean-up (some inline comments no longer held true);
  • Implements notebook-tutorial style;
  • Uses DecisionBoundaryDisplay instead of hard-coding the decision boundary;
  • Plots predict_proba instead of hard predictions;
  • Changes the proportions of labeled data to better demonstrate that a few labeled points suffice;
  • Adds interpretation to those plots;
  • Adds section to explain how predict_proba works for both methods.

Any other comments?

I am aware that removing this example was suggested in #31499 (comment), but I think it can still provide value in terms of visualizing probabilities, which is something that cannot be done in Semi-supervised Classification on a Text Dataset, as argued in said discussion.


github-actions bot commented Aug 27, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5dc9216. Link to the linter CI: here

Member

@StefanieSenger StefanieSenger left a comment


Thank you for this awesome enhancement of the example, @ArturoAmorQ! 💟

I really learned a lot while going through it, and especially the section on "predict_proba in LabelSpreading" clicked much better with me than what we have in the user guide.

Maybe in the last paragraph, where LabelSpreading and SelfTrainingClassifier are compared, we could also mention that in LabelSpreading, predictions (including predict_proba) depend on the training set (which keeps being stored), whereas with SelfTrainingClassifier (and SVC as a base estimator) the model learns a decision rule that exists independently of the training data after fitting (and is thus more abstract / generalizes better?).
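The transductive-versus-inductive distinction raised here can be illustrated with a small sketch (synthetic data, my own settings, not code from the PR): LabelSpreading keeps the training points and predicts new samples via the kernel against that stored set, while SelfTrainingClassifier fits an SVC whose decision rule stands on its own after fitting.

```python
# Hypothetical illustration of the comparison: LabelSpreading is
# transductive, SelfTrainingClassifier (with an SVC base) is inductive.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading, SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Hide most labels: -1 marks "unlabeled" for the semi-supervised API.
labeled = np.zeros(len(y), dtype=bool)
labeled[np.where(y == 0)[0][:15]] = True
labeled[np.where(y == 1)[0][:15]] = True
y_semi = np.where(labeled, y, -1)

# Transductive: predictions for new points are computed from the kernel
# against the training set that the fitted model keeps around.
ls = LabelSpreading(kernel="rbf", gamma=20).fit(X, y_semi)

# Inductive: self-training fits an SVC (probability=True so the loop can
# threshold predict_proba); afterwards the SVC's rule is self-contained.
st = SelfTrainingClassifier(SVC(probability=True, random_state=0)).fit(X, y_semi)

X_new = np.array([[0.0, 0.5], [2.0, -0.5]])
print("LabelSpreading:", ls.predict(X_new))
print("Self-training: ", st.predict(X_new))
```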

Apart from that I have only commented on some nits.

plt.title(title)

plt.suptitle("Unlabeled points are colored white", y=0.1)
rbf_svc = (base_classifier.fit(X, y), y, "SVC with rbf kernel (100% labeled data)")
Member


Or maybe:

Suggested change
rbf_svc = (base_classifier.fit(X, y), y, "SVC with rbf kernel (100% labeled data)")
rbf_svc = (base_classifier.fit(X, y), y, "Self-training with 100% labeled data (equivalent to SVC with rbf kernel)")

Member Author


It's the other way around, right?
"SVC with rbf kernel and 100% labeled data (equivalent to self-training with no unlabeled points left)"

Member


Both are correct. In terms of guiding the user's attention, I would mention self-training first, because this way it's easier to see that we're not breaking the pattern in the 3x2 table.
If you choose to stay with your suggestion: maybe don't mention the 100% labeled data for the SVC, because it cannot use less than that, and it's clearer to mention the 100% with the self-training only.

Member Author


How about the wording in 035d443?

Member


Sure, that's fine. :)

@StefanieSenger StefanieSenger added the Waiting for Second Reviewer First reviewer is done, need a second one! label Sep 15, 2025
Member

@virchan virchan left a comment


Thanks for the PR, @ArturoAmorQ!

I have a few minor suggestions about using the :class: role more consistently throughout the example.

Otherwise, LGTM and this is ready to merge!

Co-authored-by: Virgil Chan <virchan.math@gmail.com>
Member

virchan commented Oct 3, 2025

@StefanieSenger, could we merge this?

Member

StefanieSenger commented Oct 4, 2025

I went through the changes again and I think it's all fine, except I found a problem with the legend. In the rendered docs it appeared outside of the figure:

[Screenshot: legend rendered outside the figure]

I fixed it and now it shows up.

[Screenshot: legend rendered inside the figure]

I could not make a suggestion that spans several lines on the GitHub files tab (strange bug), so I just committed the change directly. If it looks fine on the CI, this can be merged.
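For readers hitting the same clipping problem, here is a generic matplotlib sketch of the usual fix (not the PR's actual change, whose details are in the commit): give the figure a constrained layout and place the legend explicitly so it is not drawn off-canvas.

```python
# Generic sketch of keeping a legend inside the rendered figure.
# The plot content is illustrative, not the example's real data.
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots(layout="constrained")
ax.scatter([0, 1], [0, 1], label="labeled points")
ax.scatter([0.5], [0.5], c="white", edgecolor="k", label="unlabeled points")
# An explicit in-axes location avoids the legend being clipped; with
# layout="constrained", a legend placed outside the axes also gets room.
legend = ax.legend(loc="lower right")
fig.canvas.draw()  # render so the legend has its final position
```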

@StefanieSenger StefanieSenger merged commit 05b031c into scikit-learn:main Oct 5, 2025
36 checks passed
@ArturoAmorQ ArturoAmorQ deleted the rework_semi_supervised branch October 6, 2025 07:40

Labels

Documentation Waiting for Second Reviewer First reviewer is done, need a second one!


3 participants