Describe the workflow you want to enable
If the target y is (approximately) Poisson, Gamma or otherwise Tweedie distributed, it would be beneficial for tree-based regressors to support Tweedie deviance loss functions as splitting criterion. This partially addresses #5975.
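For concreteness, the deviance such a criterion would minimize per node is already exposed as a metric in sklearn.metrics; a minimal sketch with made-up data (not part of the proposal itself):

```python
import numpy as np
from sklearn.metrics import mean_poisson_deviance, mean_tweedie_deviance

y_true = np.array([0.0, 0.0, 2.0, 5.0])
y_pred = np.array([0.5, 1.0, 1.5, 4.0])  # must be strictly positive for power >= 1

# Poisson deviance is the Tweedie deviance with power=1
d_poisson = mean_tweedie_deviance(y_true, y_pred, power=1)
assert np.isclose(d_poisson, mean_poisson_deviance(y_true, y_pred))

# Compound Poisson-Gamma case, 1 < power < 2: y_true == 0 is allowed
d_tweedie = mean_tweedie_deviance(y_true, y_pred, power=1.5)
print(d_poisson, d_tweedie)
```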
Describe your proposed solution

Ideally, one first implements
and then adds the different loss criteria to the tree-based models:
- DecisionTreeRegressor (poisson only): [MRG] ENH add Poisson splitting criterion for single trees #17386
- RandomForestRegressor (poisson only): ENH Adds Poisson criterion in RandomForestRegressor #19304, #19836
- GradientBoostingRegressor
- HistGradientBoostingRegressor (poisson and gamma but no other tweedie cases): ENH Poisson loss for HistGradientBoostingRegressor #16692
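As a usage sketch, assuming a recent scikit-learn release in which the linked Poisson PRs are merged (parameter names follow those PRs: criterion="poisson" for trees and forests, loss="poisson" for histogram gradient boosting; the Tweedie cases are not yet available):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Count target with a log-linear mean, i.e. approximately Poisson distributed
y = rng.poisson(lam=np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1]))

# Poisson splitting criterion for a single tree (#17386)
tree = DecisionTreeRegressor(criterion="poisson", max_depth=4).fit(X, y)

# Poisson criterion in random forests (#19304)
forest = RandomForestRegressor(criterion="poisson", n_estimators=100).fit(X, y)

# Poisson deviance loss in histogram gradient boosting (#16692)
hgb = HistGradientBoostingRegressor(loss="poisson").fit(X, y)
```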
Open for Discussion

For Poisson and Tweedie deviance with 1 <= power < 2, the target y may be zero while the prediction y_pred must be strictly larger than zero. A tree might find a split where one node has y=0 for all samples in that node, resulting naively in y_pred = mean(y) = 0 for that node. I see 3 different solutions to that:

1. Use y_pred = np.exp(tree), i.e. let the tree predict on the log scale. See ENH Poisson loss for HistGradientBoostingRegressor #16692 for HistGradientBoostingRegressor. This may be no option for DecisionTreeRegressor.
2. Use a splitting rule that forbids splits where one node has sum(y)=0. One might also introduce some option like min_y_weight, such that splits with sum(sample_weight*y) < min_y_weight are forbidden.
3. Use some form of parent-child average y_pred = a * mean(y) + (1-a) * y_pred_parent and forbid further splits, see [1]. (Bayes/credibility theory motivates setting a = sum(sample_weight*y) / (gamma + sum(sample_weight*y)) for some hyperparameter gamma; a sketch follows after this list.)

There is also a dirty solution that allows y_pred=0 but uses the value max(eps, y_pred) in the loss function for some tiny value of eps.
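To make options 2 and 3 concrete, here is a minimal sketch of how a leaf value could be computed with the credibility-weighted parent-child average; the function name, the gamma default and the min_y_weight fallback are illustrative choices, not an existing scikit-learn API:

```python
import numpy as np

def leaf_value(y, sample_weight, y_pred_parent, gamma=1.0, min_y_weight=0.0):
    """Shrink the naive leaf mean towards the parent prediction.

    Computes y_pred = a * mean(y) + (1 - a) * y_pred_parent with
    a = sum(w * y) / (gamma + sum(w * y)), so a leaf with sum(y) == 0
    gets a = 0 and inherits the strictly positive parent prediction.
    """
    wy = np.sum(sample_weight * y)
    if wy < min_y_weight:
        # Option 2: such a split would be forbidden; keep the parent value here.
        return y_pred_parent
    a = wy / (gamma + wy)
    mean_y = np.average(y, weights=sample_weight)
    return a * mean_y + (1.0 - a) * y_pred_parent

# A node where all targets are zero still keeps a positive prediction.
y = np.array([0.0, 0.0, 0.0])
w = np.ones_like(y)
print(leaf_value(y, w, y_pred_parent=0.8))                      # -> 0.8
print(leaf_value(np.array([0.0, 1.0, 3.0]), w, y_pred_parent=0.8))
```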
References

[1] R rpart library, chapter 8, Poisson regression