FEA add CumulativeAccuracyDisplay #28752
JosephBARBIERDARNAL wants to merge 5 commits into scikit-learn:main from JosephBARBIERDARNAL:cap-curve
Conversation
glemaitre
left a comment
Thanks for the PR.
On a first pass, could you fix the linter CI? I would advise using pre-commit directly. It corresponds to the optional step 10 of the guidelines (https://scikit-learn.org/dev/developers/contributing.html#how-to-contribute). It is optional, but it makes life easier :)
ogrisel
left a comment
Thanks very much for the PR. Here is a quick first pass of review comments.
sklearn/metrics/_plot/cap_curve.py (outdated)

```python
CumulativeAccuracyDisplay.from_predictions(y_test, y_scores, name='Logistic Regression')
plt.title('CAP Curve using from_predictions')
plt.show()
```
For information, when running this code I get:
This looks fine, but I would also like an option to display the curve for oracle predictions and the curve for the random predictions baseline.
EDIT: we need to check the literature to see if it's common to normalize the cumulated values to 1 instead of plotting absolute numbers. Maybe an option to select between relative and absolute scales would make sense.
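One way the relative-vs-absolute option discussed above could look is sketched below. This is a minimal numpy illustration only: the `normalize` flag, the function name, and the variable names are hypothetical and not part of the PR's actual API.

```python
import numpy as np

def cap_curve_points(y_true, y_score, normalize=True):
    """Cumulative gains for a CAP curve.

    `normalize=True` puts both axes on a relative [0, 1] scale;
    `normalize=False` returns absolute cumulative counts.
    """
    y_true = np.asarray(y_true)
    # sort samples by decreasing predicted score
    order = np.argsort(y_score)[::-1]
    cum_positives = np.cumsum(y_true[order])
    cum_total = np.arange(1, len(y_true) + 1)
    if normalize:
        return cum_total / cum_total[-1], cum_positives / cum_positives[-1]
    return cum_total, cum_positives

# relative scale: both curves end at (1.0, 1.0)
x_rel, y_rel = cap_curve_points([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3])
# absolute scale: y ends at the total number of positives
x_abs, y_abs = cap_curve_points([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3], normalize=False)
```

With the relative scale the plot is naturally square, which connects to the square-axis discussion below.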
In addition, I think we should set the x- and y-limits better (as in the ROC curve display).
Also, is it common for this curve to be in a square axis? I see this is kind of different from the ROC and PR curves in this regard, where the x- and y-axes are not in the same unit.
- An option for relative vs absolute scales sounds relevant to me. I don't have an opinion on which should be the default.
- +1 for a square axis (especially with the normalized scale). This paper (Tasche 2006, "Validation of internal rating systems and PD estimates") contains an example of a CAP curve with a square axis:
It reminds me of the ROC curve, but maybe there are reasons for not doing so?
> An option for relative vs absolute scales sounds relevant to me. I don't have an opinion on which should be the default.

I would intuitively use relative as the default.
I've implemented some of the elements mentioned, but I don't have the time to do everything quickly. I also have several concerns:
The example I linked above is precisely an example of a Lorenz curve, which, based on the linked discussion, is very related to the CAP curve. EDIT: I also posted an extra reference in the parent issue: #10003 (comment).
For an example, you can have a look at the tests for the existing display classes. You can run all the tests in this module with the pytest command, and you can also run a specific test on its own, since this is a parametrized test. So for your branch, you can create a dedicated test file. More details on pytest usage in https://scikit-learn.org/dev/developers/tips.html#pytest-tips or on the pytest website itself.
Some pre-commit tasks, such as the black formatter and the ruff import statement fixer, can auto-fix the code when you run them.
```python
# compute cumulative sums for true positives and all cases
cumulative_true_positives = np.cumsum(y_true_sorted * sample_weight_sorted)
cumulative_total = np.cumsum(sample_weight_sorted)
```
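For reference, the two cumsum lines from the diff can be exercised standalone as follows. The sorting setup is an assumption reconstructed from the surrounding discussion (decreasing predicted score, as a CAP curve requires); the data is made up for illustration.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.2])
sample_weight = np.ones_like(y_true, dtype=float)

# sort samples by decreasing predicted score (assumed setup)
order = np.argsort(y_score)[::-1]
y_true_sorted = y_true[order]
sample_weight_sorted = sample_weight[order]

# compute cumulative sums for true positives and all cases
cumulative_true_positives = np.cumsum(y_true_sorted * sample_weight_sorted)
cumulative_total = np.cumsum(sample_weight_sorted)
```

With unit weights, the last entries recover the total number of positives and the total sample count, which is what the normalization options discussed above divide by.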
This is very similar to a Lorenz curve that we manually define in this example.
scikit-learn/examples/linear_model/plot_poisson_regression_non_normal_loss.py
Lines 495 to 507 in 6bf0ba5
The differences are (as far as I can see):

- `y_pred` is the estimated probability of positive class membership given by a classifier for CAP/Lift/Gain curves, while `y_pred` holds the continuous prediction of a positively valued target for a regressor in a Lorenz curve.
- `sorted_indices = np.argsort(y_pred)[::-1]` is reversed in CAP while it stays in ascending order in a Lorenz curve.
So I think we could indeed reuse the same class both for CAP/Lift/Gain curves and Lorenz curves.
To switch between both, we can probably introduce a keyword parameter such as `x_ordering_style="cap"` vs `x_ordering_style="lorenz"`, for instance.
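The ordering switch suggested above could be sketched like this. The `x_ordering_style` parameter name comes from the comment; the function itself is a hypothetical illustration, not the PR's code.

```python
import numpy as np

def cumulative_curve(y, y_pred, x_ordering_style="cap"):
    # "cap": sort by decreasing prediction; "lorenz": ascending order
    order = np.argsort(y_pred)
    if x_ordering_style == "cap":
        order = order[::-1]
    y = np.asarray(y, dtype=float)[order]
    return np.cumsum(y) / y.sum()

y = [0, 1, 1, 0]
pred = [0.1, 0.9, 0.8, 0.3]
cap = cumulative_curve(y, pred, "cap")        # steepest rise first
lorenz = cumulative_curve(y, pred, "lorenz")  # steepest rise last
```

The CAP ordering front-loads the positives (curve above the diagonal), while the Lorenz ordering back-loads them (curve below the diagonal), which is exactly the mirror relationship mentioned in this thread.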
As discussed in the main thread of the PR, let's keep that for a follow-up PR.
thanks! helps a lot actually (also for tests)
Great. Now that you have tests, you can see in the diff view that the test coverage analysis has resulted in a few automated inline comments that highlight sections of your code that are currently not tested: https://github.com/scikit-learn/scikit-learn/pull/28752/files To merge this PR we will need improved test coverage. However, I would rather put the focus on upgrading the code itself first:
To keep this PR minimal, I would postpone this to a follow-up PR.
Alright, I agree to limit the scope of this PR to the case of CAP / Lift / Gain curves on classification tasks for now.
ogrisel
left a comment
As I said above, I am OK to focus this PR on the classification case and leave regression for later. However, I think we should already choose public attribute names for the cumulated statistics that are compatible with the anticipated regression/Lorenz curve case (see inline comment below).
Furthermore, I think we should update one of the existing examples that displays a ROC curve for a binary classifier to also include a CAP curve. I think that the following example, which already displays both a DET curve and a ROC curve, is a good candidate:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_det.html
We should probably refactor it a bit to change the title and even the filename to make it explicit that it's about more than DET curves.
The alternative would be to create a new example file dedicated to CAP, but I think we all agree that the example gallery already has too many short examples, and it's better to consolidate them into larger examples that combine related components from the scikit-learn API and better contrast the commonalities and differences.
```python
if plot_chance_level:
    assert (
        cap_display.chance_level_ is not None
    ), "Chance level line should be present"
```
Let's also extend the tests to check for the presence, shape, and dtype of other public attributes such as `y_true_cumulative` (new name suggested in https://github.com/scikit-learn/scikit-learn/pull/28752/files#r1553497449).
It might also make sense to check that the values of those attributes match (with `assert_allclose`) when calling `CumulativeAccuracyDisplay.from_estimator` or `CumulativeAccuracyDisplay.from_predictions`.
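Since the display class itself is not merged yet and cannot be imported, here is only the `assert_allclose` pattern the comment refers to, demonstrated on the underlying cumulative computation: two code paths that should agree (plain cumsum vs a weighted cumsum with unit weights). The data and variable names are illustrative.

```python
import numpy as np
from numpy.testing import assert_allclose

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.9, 0.6, 0.3, 0.8])

order = np.argsort(y_score)[::-1]

# unweighted path
unweighted = np.cumsum(y_true[order])

# weighted path with unit weights should match the unweighted one
w = np.ones_like(y_true, dtype=float)
weighted = np.cumsum(y_true[order] * w[order])

# the kind of cross-check suggested for from_estimator vs from_predictions
assert_allclose(unweighted, weighted)
```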
I've made the change for `y_true_cumulative`, but I'm not sure about the shape or dtype checks. Are displays supposed to have these attributes (or just this one)?
I've just realised that I misread it. You meant checking the attributes of `y_true_cumulative`, not of the display itself.
lorentzenchr
left a comment
Review of `from_estimator`.
Current docstring:

```
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
```

Suggested change:

```
This is also known as a Gain or Lift Curve, or the mirrored Lorenz
curve of Machine Learning for regression with a positively valued target.
```
Since we agreed to leave the Lorenz curve / regression case for a follow-up PR, I would defer this change for later. Speaking about regression in a class that is designed and documented to only accept classifier predictions would be a source of confusion.
I would also not add another example. We can replace the Lorenz curve with CAP in 2 examples (Poisson and Tweedie).
But this is only for regression. I would like to illustrate CAP in a classification context, and updating an example with a ROC and DET curve would make that possible and also improve the discoverability of the plot. Furthermore, I find it interesting to visually compare the 3 plots for the same models / data.
ogrisel
left a comment
The `sklearn.metrics` module should expose the `CumulativeAccuracyDisplay` class as part of the public API and include it in its `__all__` module-level attribute.
I think the docstrings of the 2 class methods should:

- mention that this curve assesses the ranking power of classifiers, that is, the ability to correctly order the individual predictions by probability of actually being labeled by the positive class,
- make it explicit that this curve is invariant under any monotonic transformation of the predictions, and therefore cannot be used to assess the calibration of the predictions, only the ranking power,
- have a "See also" section (at the end of the docstring) and cross-link `RocCurveDisplay` as an alternative curve to assess the ranking power of classifiers.
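The invariance claim in the second bullet is easy to verify numerically: applying any strictly increasing transformation to the scores (the logit, here) leaves the sample ordering, and hence the CAP curve points, unchanged. A minimal sketch (the helper is illustrative, not the PR's code):

```python
import numpy as np

def cap_points(y_true, y_score):
    # cumulative positives when sorting by decreasing score
    order = np.argsort(y_score)[::-1]
    return np.cumsum(np.asarray(y_true)[order])

y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_score = np.array([0.1, 0.8, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6])

a = cap_points(y_true, y_score)
# logit is strictly increasing on (0, 1): same ordering, same curve
b = cap_points(y_true, np.log(y_score / (1 - y_score)))

assert np.array_equal(a, b)
```

This is also why the curve says nothing about calibration: any monotonic rescaling of the probabilities, however badly calibrated, produces the identical plot.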
I just want to avoid a new example. Therefore, either extend existing examples or postpone to a follow-up PR. But please, no new example! (Strong opinion there)
I'm more in favor of including regression from the start. That's where it adds additional value. For classification, it is kind of equivalent to ROC. On top of that, this helps to find good names.
@JosephBARBIERDARNAL Why did you close? This would be a valuable addition.
I forgot that this PR was still open when I deleted my fork, sorry! Any solution for this type of case?
I created a new PR (#28972) with the same code. I'll hopefully keep this one open. Again, sorry for the mistake!
Reference Issue
Fix for #10003.
What does this implement/fix?
Creation of a `CumulativeAccuracyDisplay` class for plots.
"The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis. The output is called a CAP curve.[1] The CAP is distinct from the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate." (wikipedia definition)
It's mainly inspired by the `RocCurveDisplay` class.
Other
It's currently a work in progress.