[MRG] Add calibration loss metric in classification #12479

Closed
aishgrt1 wants to merge 14 commits into scikit-learn:master from aishgrt1:calibration-loss

Conversation

@aishgrt1
Contributor

Reference Issues/PRs

#10971

What does this implement/fix? Explain your changes.

Added calibration loss metric for classification

Any other comments?

y_prob : array, shape (n_samples,)
    Probabilities of the positive class.
bin_size : int
    Size of the bin (samples) analysed in one iteration
Contributor


In my opinion it is better to say: Size of the bin (samples) analysed on each iteration

Contributor Author


Done

"""
pos_loss = 0.0
neg_loss = 0.0
for bin_start in range(0, len(y_true) - bin_size + 1):
Contributor


Why not just use range(len(y_true) - bin_size + 1)?

Contributor Author


Done

pos_loss /= (len(y_true) - bin_size + 1)
neg_loss /= (len(y_true) - bin_size + 1)
loss = (0.5) * (pos_loss + neg_loss)

Contributor


IMO it would be better to save len(y_true) in a variable to avoid calling len() several times.

Contributor Author


Done

- actual_per_pos_class).sum()
pos_loss += bin_error_pos
actual_per_neg_class = (bin_size - y_true[bin_start:bin_end]
.sum()) / bin_size
Contributor


If bin_size is negative this will raise an exception. Maybe a check should be added beforehand.

Contributor Author


Done
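For context, the logic under review can be pieced together from the excerpts above: a window of bin_size samples slides over the data, the predicted and observed positive rates are compared within each window, and the positive- and negative-class errors are averaged. Below is a minimal, self-contained sketch along those lines, including the bin_size check the reviewer asked for. It is only an illustration of the idea: the PR's exact per-bin error and sample ordering are not fully visible in the excerpts, and this function was never merged into scikit-learn.

import numpy as np

def calibration_loss(y_true, y_prob, bin_size=2):
    # Illustrative sliding-bin calibration loss, not the PR's final code.
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    n_samples = len(y_true)  # cache len(y_true), as suggested in the review
    if not 1 <= bin_size <= n_samples:
        raise ValueError("bin_size must be between 1 and n_samples")

    # Sort by predicted probability so each window groups similar scores
    # (a common convention; the ordering used in the PR is not shown above).
    order = np.argsort(y_prob)
    y_true, y_prob = y_true[order], y_prob[order]

    pos_loss = 0.0
    neg_loss = 0.0
    n_bins = n_samples - bin_size + 1
    for bin_start in range(n_bins):
        bin_end = bin_start + bin_size
        # Observed and predicted positive rates within the window.
        actual_per_pos_class = y_true[bin_start:bin_end].sum() / bin_size
        predicted_per_pos_class = y_prob[bin_start:bin_end].mean()
        pos_loss += abs(predicted_per_pos_class - actual_per_pos_class)
        # Same comparison for the negative class.
        actual_per_neg_class = (bin_size - y_true[bin_start:bin_end].sum()) / bin_size
        predicted_per_neg_class = 1.0 - predicted_per_pos_class
        neg_loss += abs(predicted_per_neg_class - actual_per_neg_class)

    pos_loss /= n_bins
    neg_loss /= n_bins
    return 0.5 * (pos_loss + neg_loss)

On a toy input such as calibration_loss([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], bin_size=2), the windowed gaps between predicted and observed rates are small, so the returned loss is correspondingly small.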

Contributor Author


I am getting this error:
AssertionError: Docstring Error: sklearn.metrics.classification.calibration_loss arg mismatch:

    Probabilities of the positive class.
bin_size : int
    Size of the bin (samples) analysed in each iteration
Returns
Contributor

@eamanu eamanu Oct 30, 2018


Try to add a new line below Returns and make the ------- match the length of Returns.

Suggested change:
-Returns
+Returns
+-------

-------
score : float
    Calibration loss
Examples
Contributor

@eamanu eamanu Oct 30, 2018


The same

Suggested change:
-Examples
+Examples
+--------

----------
y_true : array, shape (n_samples,)
    True targets.
y_prob : array, shape (n_samples,)
Contributor


Suggested change:
-y_prob : array, shape (n_samples,)
+y_prob : array, shape (n_samples,)

    True targets.
y_prob : array, shape (n_samples,)
    Probabilities of the positive class.
bin_size : int
Contributor


Suggested change:
-bin_size : int
+bin_size : int

@eamanu
Contributor

eamanu commented Oct 30, 2018

If you apply my comments, that should solve the Travis error.
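For reference, the suggested changes above are just the standard numpydoc layout: each section title (Parameters, Returns, Examples) sits on its own line, is underlined with dashes of the same length as the title, and each parameter description is indented under its name. Assembled from the excerpts in this thread (the summary line, the signature default, and the doctest placeholder are illustrative, not from the PR), the docstring would look roughly like this:

def calibration_loss(y_true, y_prob, bin_size=2):
    """Compute the calibration loss.

    Parameters
    ----------
    y_true : array, shape (n_samples,)
        True targets.
    y_prob : array, shape (n_samples,)
        Probabilities of the positive class.
    bin_size : int
        Size of the bin (samples) analysed in each iteration.

    Returns
    -------
    score : float
        Calibration loss.

    Examples
    --------
    >>> # doctest examples for the metric would go here
    """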

@aishgrt1
Contributor Author

@eamanu Done!!

Contributor

@eamanu eamanu left a comment


LGTM. Please edit the PR to add the [MRG] tag.

@aishgrt1 aishgrt1 changed the title from "Add calibration loss metric in classification" to "[MRG] Add calibration loss metric in classification" on Oct 30, 2018
@aishgrt1
Contributor Author

I need to add my name to the contributors list right?

@jnothman
Member

> I need to add my name to the contributors list right?

The changelog entry is sufficient, but it should be in v0.21.rst, not v0.20.rst

Member

@jnothman jnothman left a comment


I'm not doing a full review yet. This needs to be in doc/modules/{model_evaluation,classes}.rst

@amueller
Member

This needs to add
https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf
and maybe references therein.
Maybe we should also add that one to the metric docs somewhere in general?

@amueller
Member

There's an interesting discussion of debiasing the calibration error in https://arxiv.org/pdf/1909.10155.pdf

That's a current NeurIPS paper but the method they are discussing is actually already established, so it might be a good candidate. cc @thomasjpfan who has shown interest.
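As a rough illustration of the quantity that discussion starts from: the plain (non-debiased) binned expected calibration error compares average confidence with observed frequency in each probability bin. The sketch below is generic; it is neither this PR's metric nor the debiased estimator from the paper above, just the baseline those works refine.

import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    # Plain binned ECE (generic sketch, not from this PR or from scikit-learn).
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        if i == n_bins - 1:
            mask = (y_prob >= lo) & (y_prob <= hi)  # last bin includes 1.0
        else:
            mask = (y_prob >= lo) & (y_prob < hi)
        if not mask.any():
            continue
        # Gap between mean confidence and observed positive rate in this bin,
        # weighted by the fraction of samples falling in the bin.
        gap = abs(y_prob[mask].mean() - y_true[mask].mean())
        ece += mask.mean() * gap
    return ece

The debiasing discussed in the paper addresses the finite-sample bias of this plug-in estimate.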

@ogrisel
Member

ogrisel commented Nov 9, 2020

I just realized that #11096 implements the same thing plus additional variations, and has more complete documentation.

Sorry @aishgrt1 for our failure to review your work properly on time. Thanks again for your contribution.

@ogrisel ogrisel closed this Nov 9, 2020
@ogrisel ogrisel added the Superseded (PR has been replaced by a newer PR) label and removed the Stalled label on Nov 9, 2020

Labels

module:metrics, Superseded (PR has been replaced by a newer PR)

5 participants