[MRG] Plotting API starting with ROC curve#14357
[MRG] Plotting API starting with ROC curve#14357qinhanmin2014 merged 46 commits intoscikit-learn:masterfrom
Conversation
doc/modules/classes.rst
Outdated
| Plotting metrics | ||
| ---------------- | ||
|
|
||
| .. automodule:: sklearn.metrics.plot |
There was a problem hiding this comment.
So we've decide to use sklearn.XXX.plot.plot_XXX, right?
We'll put things like plot_decision_boundary in sklearn.inspection, right?
sklearn/metrics/plot/roc_curve.py
Outdated
| (default='predict_proba') | ||
| Method to call estimator to get target scores | ||
|
|
||
| label : str or None, optional (default=None) |
|
So we've decided to: I think this is a great solution, but I'd like to see more opinions here (>=+3 maybe). Are you able to reach out to more people during the sprint, or maybe a SLEP/mail in the mailing list? You need to change our matplotlib dependency in readme and install.rst to something like We need to take care of |
|
CC @NicolasHug |
|
There's a reimplementation of I think it's worth emphasizing that the common simple use-case involves just calling the function, and the user not having to worry about the object. The main motivation for the object is to allow replotting without recomputation. Maybe if we want to make the SLEPs more lightweight it might make sense to write a slep for this, I'm not sure. I listed alternatives in #13448 (#13448 (comment)) There's some discussion there as well. |
|
Regarding plot_roc_curve, will it be useful to return the roc_auc_score? |
|
Maybe to summarize, we could:
I like being able to plot a roc curve with a single line. I think that should be our goal, so I'm strongly in favor of 2. Having to recompute, say, partial dependence, for plotting it again is not acceptable to me, though. That means we need to return or store the results of the computation somehow. There's two immediate solutions (maybe there's better ones that I can't think of): We opted for b) here because it hides the complexity somewhat from the user. The simple case still remains simple with a function call, not a class construction, which seems more natural to me (the difference is a bit academic, though, for the user it mostly manifests in whether we use |
|
docstring tests are failing @thomasjpfan |
qinhanmin2014
left a comment
There was a problem hiding this comment.
We need an example in the gallery (or we can modify existing example), at least for plot_roc_curve.
doc/developers/contributing.rst
Outdated
| return viz | ||
| ``` | ||
|
|
||
| Note that the ``__init__`` method defines attributes that are going to be used |
There was a problem hiding this comment.
this paragraph can be integrated into the previous one IMO.
|
There's two roc curve examples, one for multi-class and one for cross-validation. The cross-validation example we can pretty easily do with the new function, apart from computing the mean.
For the multi-class one, I would use the plot function and manually apply it to each label for plotting in OVR fashion, and use the newly added multi-class roc to actually compute the metric. |
sklearn/metrics/plot/roc_curve.py
Outdated
| return self | ||
|
|
||
|
|
||
| def plot_roc_curve(estimator, X, y, pos_label=None, sample_weight=None, |
There was a problem hiding this comment.
If there's only one artist, I think we don't need to have a dict as an argument and can just do **kwargs instead of having line_kw={...}
|
Naming sugestion: changer "Visualizer" to "Display": "RocCurveDisplay" |
| """ | ||
| if response_method != "auto": | ||
| prediction_method = getattr(estimator, response_method, None) | ||
| if prediction_method is None: |
There was a problem hiding this comment.
you should check that user provided prediction_method is one of the accepted values.
@qinhanmin2014 the ultimate goal here is to enable changing the style of plots without having to recompute values. We want to keep functions |
Maybe it's common for a helper function to do similar things compared with the class in scikit-learn? |
|
True, but I guess estimators are intended to be only used once. The advantage of the current design is that if you realize you want to change the style of the plot after the fact, your code can stay (almost) the same. With your proposal uses have to replace the function call by an object instantiation. |
|
@qinhanmin2014 raises the question in the dev meeting whether we need a visualizer for |
|
I kinda like the |
|
I think Andy persuaded me during the meeting. I'll vote +1 here.
I agree that we don't need a visualizer for plot_tree, but the inconsistency is indeed a problem.
+1 for either solution. I don't care much about the name. Maybe @NicolasHug can organize a vote in the mailing list (since it's a brand new API in scikit-learn) and then we can merge this one. |
|
waiting for @thomasjpfan to post an update and ping (not sure why I should be the one calling for a vote though?) |
because you organize the meeting, hmm, strange reason right :) |
|
I have updated this PR and the original post with the following summary: SummaryThe key features of this API is to run calculations once and to have the flexibility to adjust the visualizations after the fact. This logic is encapsulated into a display object where the computed data is stored and the plotting is done in a For example, the class RocCurveDisplay:
def __init__(self, fpr, tpr, roc_auc, estimator_name):
...
self.fpr = fpr
self.tpr = tpr
self.roc_auc = roc_auc
self.estimator_name = estimator_name
def plot(self, ax=None, name=None, **kwargs):
...
self.line_ = ...
self.ax_ = ax
self.figure_ = ax.figure_Together with a plotting function to create this object: def plot_roc_curve(estimator, X, y, pos_label=None, sample_weight=None,
drop_intermediate=True, response_method="auto",
name=None, ax=None, **kwargs):
# do computation
viz = RocCurveDisplay(fpr, tpr, roc_auc,
estimator.__class__.__name__)
return viz.plot(ax=ax, name=name, **kwargs) |
|
I think Display is ambiguous but much more succinct. I don't think there's
any reason these need to take the same convention as "Classifier",
"Estimator" as they are entirely different. Go with Display? Btw I've not
looked in detail but at a glance I like this API.
|
|
I'm not sure if a vote is going to help here, it'll complicate things and this PR would be merged much much later, if we go for that process, which also includes writing a SLEP. |
|
So +1 from Joel, Andy, Adrin, Nicolas and me (maybe Gael not sure, sorry if I miss someone), |
doc/visualizations.rst
Outdated
| Available Plotting Utilities | ||
| ============================ | ||
|
|
||
| Fucntions |
|
Welcome to a brave new world!
|
|
The brave new world of plotting? |
|
Yes, the brave new world of maintaining a wide ranging plotting API!
|
(Post Updated 08/05/19)
Reference Issues/PRs
Related to
What does this implement/fix? Explain your changes.
Scikit-learn defines a simple API for creating visualizations for machine learning. The key features of this API is to run calculations once and to have the flexibility to adjust the visualizations after the fact. This logic is encapsulated into a display object where the computed data is stored and the plotting is done in a
plotmethod. The display object's__init__method contains only the data needed to create the visualization. Theplotmethod takes in parameters that only have to do with visualization, such as a matplotlib axes. Theplotmethod will store the matplotlib artists as attributes allowing for style adjustments through the display object. Aplot_*helper function accepts parameters to do the computation and the parameters used for plotting. After the helper function creates the display object with the computed values, it calls the display's plot method. Note that theplotmethod defines attributes related to matplotlib, such as the line artist. This allows for customizations after calling theplotmethod.For example, the
RocCurveDisplaydefines the following methods:Together with a plotting function to create this object:
Any other comments?
Here is a notebook demonstrating a workflow for using this API.
Here are what the API implemenation looks like for other visualizations: