DOC improve the cost-sensitive learning example by ogrisel · Pull Request #29149 · scikit-learn/scikit-learn

ogrisel · 2024-05-31T10:28:10Z

Here is a summary of the proposed changes:

- better names for metrics to avoid confusion between gains and costs;
- fixed the definition of the variable cost metric for the fraud transaction case to better align with the example in Elkan's paper;
- removed any reference to balanced accuracy, as the example is already long and I found it was mostly a distraction from the important message on the business metrics;
- use the new prefit=True option of FixedThresholdClassifier.

/cc @glemaitre @lorentzenchr.

github-actions · 2024-05-31T10:29:21Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5382a42.

glemaitre

It looks good to me. I think that @lorentzenchr will be happy to remove the balanced-accuracy mention :)

examples/model_selection/plot_cost_sensitive_learning.py

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>

ogrisel · 2024-05-31T15:08:52Z

@lorentzenchr thanks for the review, I think I addressed all your feedback.

adrinjalali · 2024-06-03T11:32:58Z

examples/model_selection/plot_cost_sensitive_learning.py

-    fraudulent_refuse = (mask_true_positive.sum() * 50) + amount[
-        mask_true_positive
-    ].sum()


Not sure why the amount here is removed. When I gave a talk on this, it was a nice point to make.

glemaitre · Jun 3, 2024

Because you don't gain the amount when rejecting a fraudulent case :). Indeed, you are just not losing it.

The amount itself only contributes when accepting the claim, by taking a proportional amount.
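
For readers following the thread, here is a minimal sketch of the gain computation under discussion. Only the fixed gain of 50 per refused fraud and the removed `amount` term come from the diff above; the function name `business_gain`, the flag `refuse_fraud_includes_amount`, and the other three gain values (2% commission on accepted legitimate transactions, a cost of 5 per refused legitimate transaction, and losing the full amount on an accepted fraud) are assumptions for illustration and may not match the example's actual code.

```python
import numpy as np


def business_gain(y_true, y_pred, amount, refuse_fraud_includes_amount=False):
    """Total gain of a set of decisions (label 1 = fraud, prediction 1 = refuse)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    amount = np.asarray(amount, dtype=float)

    mask_true_positive = (y_true == 1) & (y_pred == 1)   # fraud, refused
    mask_true_negative = (y_true == 0) & (y_pred == 0)   # legitimate, accepted
    mask_false_positive = (y_true == 0) & (y_pred == 1)  # legitimate, refused
    mask_false_negative = (y_true == 1) & (y_pred == 0)  # fraud, accepted

    # From the diff: a fixed gain of 50 per refused fraudulent transaction.
    fraudulent_refuse = mask_true_positive.sum() * 50
    if refuse_fraud_includes_amount:
        # Term removed by the PR: counting the avoided loss as a gain.
        fraudulent_refuse = fraudulent_refuse + amount[mask_true_positive].sum()

    # Assumed values, not taken from this thread.
    fraudulent_accept = -amount[mask_false_negative].sum()
    legitimate_refuse = mask_false_positive.sum() * -5
    legitimate_accept = (amount[mask_true_negative] * 0.02).sum()

    return float(
        fraudulent_refuse + fraudulent_accept + legitimate_refuse + legitimate_accept
    )


# One transaction per confusion-matrix cell: TP, FN, FP, TN.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
amount = [200, 300, 100, 1000]

print(business_gain(y_true, y_pred, amount))
print(business_gain(y_true, y_pred, amount, refuse_fraud_includes_amount=True))
```

On this toy input the two definitions differ by exactly the amount of the refused fraudulent transaction (200): the totals are -235 without the amount term and -35 with it.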

adrinjalali · Jun 3, 2024

Yes, but we have the false negative counterpart. Removing this puts massive pressure on making sure we have no false negatives, and makes it somewhat okay for true positives to be low, since the amounts are usually a lot larger than 50€.

ogrisel · Jun 3, 2024

The updated PR puts our example cost matrix better in line with the one proposed in Elkan's 2001 paper on cost-sensitive learning (the fixed costs and gains are not the same, but the variable components are compatible). Here is an excerpt of the relevant paragraph (part of section 1.2):

I don't understand why we would put a gain that is proportional to the amount in the fraudulent_refuse case. If you catch a fraudster, nobody will pay the bank the amount of the transaction the fraudster would have otherwise stolen.

adrinjalali · Jun 3, 2024

Fair enough. It would be nice to have this chart in the example actually; it makes things quite clear.

ogrisel · Jun 5, 2024

I'll keep that in mind for a future iteration on this example.
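
The trade-off debated in this thread can also be looked at per transaction: given a gain matrix, refusing is the better decision once the predicted probability of fraud exceeds an amount-dependent threshold. The sketch below derives that threshold by comparing expected gains. The function name `refuse_threshold` and its flag are hypothetical, and apart from the fixed gain of 50 shown in the diff, the gain values are assumptions for illustration rather than the example's confirmed settings.

```python
def refuse_threshold(amount, refuse_fraud_includes_amount=False):
    """Fraud probability above which refusing has the higher expected gain."""
    # Assumed gain matrix entries (only the fixed 50 comes from the diff).
    gain_accept_legit = 0.02 * amount
    gain_accept_fraud = -amount
    gain_refuse_legit = -5.0
    gain_refuse_fraud = 50.0
    if refuse_fraud_includes_amount:
        # Definition used before the PR: add the avoided loss as a gain.
        gain_refuse_fraud += amount

    # Refuse when the expected gain of refusing exceeds that of accepting:
    #   p * g_rf + (1 - p) * g_rl > p * g_af + (1 - p) * g_al
    # which rearranges to
    #   p > (g_al - g_rl) / ((g_al - g_rl) + (g_rf - g_af))
    numerator = gain_accept_legit - gain_refuse_legit
    return numerator / (numerator + gain_refuse_fraud - gain_accept_fraud)


for amount in (10, 100, 1000):
    print(
        amount,
        refuse_threshold(amount),
        refuse_threshold(amount, refuse_fraud_includes_amount=True),
    )
```

Under these assumed values, a 1000€ transaction would be refused above a fraud probability of about 0.023 with the fixed-gain definition. Adding the amount to the refused-fraud gain lowers that threshold further, so the two definitions lead to different decisions mainly for large transactions.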

DOC improve the cost-sensitive learning example

2c4cf73

github-actions bot added the Documentation label May 31, 2024

ogrisel mentioned this pull request May 31, 2024

WIP Elkan optimal variable threshold decision making #29150

Draft

glemaitre approved these changes May 31, 2024

lorentzenchr approved these changes May 31, 2024

ogrisel commented May 31, 2024

examples/model_selection/plot_cost_sensitive_learning.py (outdated, resolved)

ogrisel and others added 2 commits May 31, 2024 17:02

Apply suggestions from code review

f915441

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>

More phrasing fixes

5382a42

lorentzenchr merged commit 5d92c35 into scikit-learn:main Jun 3, 2024

ogrisel deleted the improve-cost-sensitive-learning-example branch June 3, 2024 07:38

adrinjalali reviewed Jun 3, 2024

jeremiedbb mentioned this pull request Jul 2, 2024

Release 1.5.1 #29382

Merged

11 tasks

StefanieSenger mentioned this pull request Mar 21, 2025

DOC add link to plot_semi_supervised_newsgroups.py example in semi_supervised.rst #30882

Closed
