FEA add quantile HGBT #21800
Conversation
I suspect we would want to benchmark the new loss implementations and compare them to the existing ones. Edit: I just saw that #20811 does the loss migrations. I guess we want to merge that one first and then work on this one?
Marking as high priority according to https://scikit-learn.fondation-inria.fr/technical-committee-june-2-2021-fr/.
Force-pushed from a8bed32 to 57505de.
After merging #20811 and a git rebase, this is an enjoyable, short PR for a new feature!
loss_param : float, default=None
    If loss is "quantile", this parameter specifies which quantile is to be
    estimated and must be between 0 and 1.
I see a few paths forward with passing in more parameters for the loss:

- If we want to be fully generic, then I think `loss_param` could become a dictionary, which would be passed down into the loss directly. This way, other parameterized losses could be generically supported by this dictionary. We kind of already do this with `metric_kwargs`.
- We can copy `GradientBoostingRegressor`'s API and use a single parameter (`alpha`) that is just for the `quantile` loss. In the `HistGradientBoosting*` case, I would call it `quantile` instead of `alpha`.
- (Likely out of reach in the near term) Losses become public, and users can create a `PinballLoss(quantile=0.3)` and pass it directly into `HistGradient*`.
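To make the trade-offs concrete, here is a rough sketch of each option as it might appear in user code. None of these signatures exist in scikit-learn at this point; the parameter names and the public `PinballLoss` class are assumptions for illustration only:

```python
from sklearn.ensemble import HistGradientBoostingRegressor

# Option 1 (hypothetical): a generic dict passed through to the loss.
HistGradientBoostingRegressor(loss="quantile", loss_param={"quantile": 0.3})

# Option 2 (hypothetical): a dedicated parameter, mirroring
# GradientBoostingRegressor's alpha, but named quantile.
HistGradientBoostingRegressor(loss="quantile", quantile=0.3)

# Option 3 (hypothetical): losses become public objects.
# HistGradientBoostingRegressor(loss=PinballLoss(quantile=0.3))
```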
I'd like to add the option:

- `loss_param` is a single parameter with a different meaning depending on the loss, i.e. a quantile level for `loss="quantile"`, and a Tweedie power for `loss="tweedie"` (in a later PR).

Comments:

- `loss_param` dictionary: This makes it clearly distinguishable which loss is used with which parameter. As the loss parameter should not be used for hyperparameter tuning, this is OK.
- Special parameter per loss: If we add further losses, this would clutter the API.
- Pass a loss object instance: Not doing this, and instead using simple parameter types like string and float, was an original design decision of scikit-learn. It would, however, sometimes make life easier (meaning, I'm not opposed to this direction).

General: I really like consistency and would favor applying/adapting the choice we make here to other estimators, e.g. `GradientBoostingRegressor`, and maybe one day `DecisionTreeRegressor` as well.
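A sketch of what this option could look like (hypothetical; neither call below would run against scikit-learn as of this PR, and the Tweedie loss is only planned):

```python
from sklearn.ensemble import HistGradientBoostingRegressor

# Hypothetical: a single float whose meaning depends on the chosen loss.
HistGradientBoostingRegressor(loss="quantile", loss_param=0.5)  # quantile level
HistGradientBoostingRegressor(loss="tweedie", loss_param=1.5)   # Tweedie power (later PR)
```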
> `loss_param` is a single parameter with a different meaning depending on the loss, i.e. a quantile level for `loss="quantile"`, and a Tweedie power for `loss="tweedie"` (in a later PR).
I was thinking of losses with more than one parameter. Although, I think supporting multiple parameters can be considered a non-issue: advanced users can create their own loss object and pass it in. Kind of like how we unofficially support passing a Criterion object into the trees:
scikit-learn/sklearn/tree/_classes.py, lines 360 to 363 at e1e8c66:
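The referenced lines implement roughly the following pattern (a paraphrased sketch, not an exact quote of the source at e1e8c66):

```python
# Inside BaseDecisionTree.fit (paraphrased): criterion strings are looked up
# and turned into Criterion objects, but an instance passed by the user is
# used as-is, which is the unofficial escape hatch mentioned above.
if not isinstance(self.criterion, Criterion):
    if is_classification:
        criterion = CRITERIA_CLF[self.criterion](self.n_outputs_, self.n_classes_)
    else:
        criterion = CRITERIA_REG[self.criterion](self.n_outputs_, n_samples)
```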
As a starting point, I am okay with a single `loss_param`. I think the API discussion is worth bringing up during the next dev meeting.
I'm not sure we already have a design similar to the proposed `loss_param` in scikit-learn, do we?
Do we already have a clear idea of which future losses we want to support and what parameters they would need, apart from Tweedie? If we don't, using `loss_param` instead of the more consistent and simpler `quantile` parameter might be solving a problem we don't really have.
One reason to make `loss_param` a dictionary is to support losses with more than one parameter. Although traditional losses, e.g. the epsilon-insensitive loss and the Huber loss, also have only one parameter each, maybe a single float could become a restriction in the future?
> I'm not sure we already have a design similar to the proposed loss_param in scikit-learn, do we?

Similar to this is `param` in `GenericUnivariateSelect`, but it has always seemed quite ugly to me!
From the dev meeting, I think the most agreeable and quickest path forward is to change `loss_param` to `quantile` and only accept a single float.
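Assuming the rename lands as agreed, usage would look like this (an illustrative sketch, not code from the PR; the data here is made up):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 2))
# Heteroscedastic noise, so that different quantiles actually differ.
y = X[:, 0] + rng.normal(scale=0.1 + X[:, 1])

# One model per quantile level; each minimizes the pinball loss for its level.
q10 = HistGradientBoostingRegressor(loss="quantile", quantile=0.1).fit(X, y)
q90 = HistGradientBoostingRegressor(loss="quantile", quantile=0.9).fit(X, y)

# Roughly 80% of the training targets should fall inside the interval.
inside = (q10.predict(X) <= y) & (y <= q90.predict(X))
print(f"coverage: {inside.mean():.2f}")
```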
ogrisel left a comment:
LGTM.
Please consider including lorentzenchr#2 if you like it.
Let's wait for the decision at the dev meeting w.r.t. `loss_param`. For the record, I am fine with the way it is, unless we already know of a loss that we plan to add that would naturally require more than one parameter, in which case we can go for the dict option directly.
Note that we already have neighbors models that accept constructor params defined as `metric="minkowski"`, `metric_params={"p": 1.5, "w": [0.2, 0.8]}`, for instance.
@lorentzenchr brainstorming for a potential follow-up PR: do you think it would be possible to have a quantile loss that could predict multiple quantiles at once (similar to the multinomial loss, which can predict one output per possible class)? That would still require having n times more trees in the ensemble when predicting n quantiles instead of 1 (as for multinomial), but it could be helpful from a usability standpoint. In a follow-up to that follow-up PR, we could even extend the capabilities of the individual HGBR trees so as to predict n outputs per tree directly and have the split select the highest improvement in the aggregate loss. This would have a different inductive bias but could be computationally much more efficient, both for multi-quantile regression and for multi-class classification (and we could also support multi-label in a similar way).
@ogrisel For the (pinball) loss itself, there is no gain (that I see) in enabling several quantile levels at the same time other than user convenience: one pinball loss is tailored to exactly one quantile level.
My first thought is that a single tree/split for all quantiles could be too simplistic for the whole predictive distribution. The problem is the weighting: should we weight the median more than the 99% quantile, or the other way around? It does work for quantile regression forests, though, see Meinshausen 2006. I have to think about it more deeply.
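For reference, a minimal NumPy sketch of the pinball loss under discussion (the standard textbook definition, not code from this PR): for a quantile level q, underestimation is weighted by q and overestimation by 1 - q, so each level defines its own, distinct loss.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    # Underestimates (y_true > y_pred) are penalized with weight q,
    # overestimates with weight (1 - q); the minimizer of the expected
    # loss is the q-th conditional quantile of y_true.
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1.0) * diff))
```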
> From the dev meeting, I think the most agreeable and quickest path forward is to change `loss_param` to `quantile` and only accept a single float.

works for me
@ogrisel Do you stand by your approval?
@agramfort Thanks for your review. It is good to know that there are other people interested in quantile regression 😄
you're not alone :)
This reverts commit 539c3a2.
Status report: 2 approvals, all lights green, and no comments left.
thomasjpfan left a comment:
LGTM!
@ogrisel Your approval was from when we still had `loss_param`. Are you okay with using `quantile` instead?
Reference Issues/PRs
Follow-up of #20567 and #20811. Solves #17955.
It is based on top of #20811, so it should be merged after that PR. Edit: merged.

What does this implement/fix? Explain your changes.

This PR adds `loss="quantile"` to `HistGradientBoostingRegressor`.

Any other comments?

This also introduces the new parameter `loss_param` for specifying which quantile to model. This is in anticipation of possible other one-parameter losses, such as the Tweedie deviance.
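For illustration, a minimal sketch of the feature as described here (note that, per the review discussion above, `loss_param` was subsequently renamed to `quantile`):

```python
from sklearn.ensemble import HistGradientBoostingRegressor

# As initially proposed in this PR; the parameter was renamed to
# `quantile` during review.
model = HistGradientBoostingRegressor(loss="quantile", loss_param=0.9)
```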