DOC improve the cost-sensitive learning example #29149
lorentzenchr merged 3 commits into scikit-learn:main
glemaitre
left a comment
It looks good to me. I think that @lorentzenchr will be happy to remove the balanced-accuracy mention :)
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@lorentzenchr thanks for the review, I think I addressed all your feedback.
    fraudulent_refuse = (mask_true_positive.sum() * 50) + amount[
        mask_true_positive
    ].sum()
Not sure why the amount here is removed, when I gave a talk on this, this was a nice point to make.
Because you don't gain the amount when rejecting a fraudulent case :). Indeed, you are just not losing it.
The amount itself only contributes when accepting the claim, by taking a proportional amount.
Yes, but we have the false negative counterpart. Removing this puts massive pressure on making sure we have no false negatives, and the model will be somewhat okay with true positives being low, since the amounts are usually a lot larger than 50€.
The updated PR puts our example cost matrix better in line with the one proposed in the Elkan 2001 paper on cost-sensitive learning (the fixed costs and gains are not the same but the variable components are compatible). Here is an excerpt of the relevant paragraph (part of section 1.2):
I don't understand why we would put a gain that is proportional to the amount in the fraudulent_refuse case. If you catch a fraudster, nobody will pay the bank the amount of the transaction the fraudster would have otherwise stolen.
Fair enough. Would be nice to have this chart in the example actually, makes things quite clear.
I'll keep that in mind for a future iteration on this example.
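
To make the disagreement in this thread concrete, here is a minimal sketch comparing the two variants of the `fraudulent_refuse` term. It is not the code merged in the PR: the function name `business_gain`, the flag `refuse_gain_includes_amount`, the toy data, and the fee values for legitimate transactions (a fixed cost of 5 for a wrongly refused one, a 2% gain on an accepted one) are assumptions chosen for illustration. Only the fixed gain of 50 per caught fraud and the `amount[mask_true_positive].sum()` term come from the diff quoted above.

```python
import numpy as np


def business_gain(y_true, y_pred, amount, refuse_gain_includes_amount=False):
    """Total gain of accept/refuse decisions (label 1 = fraud, prediction 1 = refuse)."""
    y_true, y_pred, amount = map(np.asarray, (y_true, y_pred, amount))
    mask_true_positive = (y_true == 1) & (y_pred == 1)
    mask_true_negative = (y_true == 0) & (y_pred == 0)
    mask_false_positive = (y_true == 0) & (y_pred == 1)
    mask_false_negative = (y_true == 1) & (y_pred == 0)

    # Fixed gain of 50 per refused fraudulent transaction (value from the diff).
    fraudulent_refuse = mask_true_positive.sum() * 50
    if refuse_gain_includes_amount:
        # Earlier variant: also count the avoided amount as a gain.
        fraudulent_refuse = fraudulent_refuse + amount[mask_true_positive].sum()
    # Accepting a fraudulent transaction loses its full amount.
    fraudulent_accept = -amount[mask_false_negative].sum()
    # Assumed illustrative values, not confirmed by this thread:
    # fixed cost of 5 per refused legitimate transaction, 2% fee on accepted ones.
    legitimate_refuse = mask_false_positive.sum() * -5
    legitimate_accept = (amount[mask_true_negative] * 0.02).sum()
    return fraudulent_refuse + fraudulent_accept + legitimate_refuse + legitimate_accept


# Hypothetical toy data: one caught fraud, one missed fraud,
# one accepted legitimate transaction, one refused legitimate transaction.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
amount = [1000.0, 200.0, 100.0, 40.0]

gain_without_amount = business_gain(y_true, y_pred, amount)
gain_with_amount = business_gain(
    y_true, y_pred, amount, refuse_gain_includes_amount=True
)
print(gain_without_amount, gain_with_amount)
```

In the variant without the amount, the transaction amount enters the fraud side only through false negatives, which matches the argument that catching a fraudster avoids a loss rather than producing income. The gap between the two totals is exactly the sum of the amounts of the caught frauds.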

Here is a summary of the proposed changes:
prefit=True option of FixedThresholdClassifier.

/cc @glemaitre @lorentzenchr.