[MRG] Monotonic constraints for GBDT #15582
Conversation
why not make it an init argument for now?

I'm open to anything, but ideally we wouldn't merge this PR if we knew for sure that the API would change in the future.
    if gain > best_gain and gain > self.min_gain_to_split:
        found_better_split = True
        best_gain = gain
        best_bin_idx = bin_idx
        best_sum_gradient_left = sum_gradient_left
        best_sum_hessian_left = sum_hessian_left
        best_n_samples_left = n_samples_left

    if found_better_split:
        split_info.gain = best_gain
        split_info.bin_idx = best_bin_idx
        # we scan from left to right so missing values go to the right
        split_info.missing_go_to_left = False
        split_info.sum_gradient_left = best_sum_gradient_left
        split_info.sum_gradient_right = sum_gradients - best_sum_gradient_left
        split_info.sum_hessian_left = best_sum_hessian_left
        split_info.sum_hessian_right = sum_hessians - best_sum_hessian_left
        split_info.n_samples_left = best_n_samples_left
        split_info.n_samples_right = n_samples - best_n_samples_left

        # We recompute best values here but it's cheap
        split_info.value_left = compute_value(
            split_info.sum_gradient_left, split_info.sum_hessian_left,
            lower_bound, upper_bound, self.l2_regularization)

        split_info.value_right = compute_value(
            split_info.sum_gradient_right, split_info.sum_hessian_right,
            lower_bound, upper_bound, self.l2_regularization)
For reviewers: apart from the leaf value computation (which is new), the logic here is actually unchanged.
It's mostly just a small optimization: instead of setting all the attributes of the split_info at each iteration, we only do it once at the end.
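As a rough sketch of what a compute_value helper like the one above could do (plain Python, not the actual Cython implementation; the epsilon is an assumption):

```python
def compute_value(sum_gradient, sum_hessian, lower_bound, upper_bound,
                  l2_regularization):
    # Regularized Newton step for the node value, as in the XGBoost paper:
    # value = -G / (H + lambda). The small epsilon avoids division by zero.
    value = -sum_gradient / (sum_hessian + l2_regularization + 1e-15)
    # With monotonic constraints, the node value must stay within the bounds
    # inherited from its ancestors, so it is clipped.
    return min(max(value, lower_bound), upper_bound)
```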
    gain = negative_loss(sum_gradient_left, sum_hessian_left,
                         l2_regularization)
    gain += negative_loss(sum_gradient_right, sum_hessian_right,
                          l2_regularization)
    gain -= negative_loss_current_node
For reviewers:
The gain computation has been updated to:
- first compute the values of the left and right children
- cap those values according to the bounds for monotonic constraints
- discard any split that does not respect left < right (INC) or right < left (DEC)
- compute the loss reduction from those previously computed (bounded) values (see the sketch after this list)
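A plain-Python sketch of those steps (illustrative only: the function name, the `-inf` sentinel for rejected splits, and the inline `bounded_value` helper are assumptions, not the actual Cython implementation):

```python
def constrained_split_gain(sum_gradient_left, sum_hessian_left,
                           sum_gradient_right, sum_hessian_right,
                           loss_current_node, monotonic_cst,
                           lower_bound, upper_bound, l2_regularization):
    def bounded_value(g, h):
        # Newton step -G / (H + lambda), clipped to the node's bounds
        return min(max(-g / (h + l2_regularization + 1e-15), lower_bound),
                   upper_bound)

    # 1. compute the (bounded) values of the left and right children
    value_left = bounded_value(sum_gradient_left, sum_hessian_left)
    value_right = bounded_value(sum_gradient_right, sum_hessian_right)

    # 2. discard splits that violate the constraint ordering
    if monotonic_cst == +1 and value_left > value_right:   # INC
        return float('-inf')
    if monotonic_cst == -1 and value_left < value_right:   # DEC
        return float('-inf')

    # 3. loss reduction, where both children losses use the bounded values
    gain = loss_current_node
    gain -= value_left * sum_gradient_left      # loss of the left child
    gain -= value_right * sum_gradient_right    # loss of the right child
    return gain
```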
    # Considering the following tree with a monotonic INC constraint, we
    # should have:
Note for reviewers: start here
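As a rough illustration of the kind of invariant such a test verifies for an INC constraint on the split feature (hypothetical helper, not the actual test code):

```python
def assert_values_monotonic_inc(node):
    # Under an increasing constraint, the left child (smaller feature
    # values) must not predict more than the right child.
    if node.left_child is None and node.right_child is None:
        return  # leaf
    assert node.left_child.value <= node.right_child.value
    assert_values_monotonic_inc(node.left_child)
    assert_values_monotonic_inc(node.right_child)
```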
NicolasHug left a comment
Thanks a lot for the review!
I addressed most comments and will try a fix tomorrow for the gain computation.
    # or all decreasing (or neither) depending on the monotonic constraint.
    nodes = predictor.nodes

    def get_leaves_values():
For me that would suggest that there is only one leaf and that one leaf has multiple values.
Thoughts @jnothman ?
        dfs(node.left_child)
        dfs(node.right_child)

    dfs(grower.root)
I'm not sure we can do that here because unlike the predictor object, the grower does not provide an array of the nodes. It only has the root, and a finalized_leaves list which isn't enough for us here.
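A simplified sketch of that recursive walk over the grower (illustrative only; it assumes leaf nodes have no children and carry a `value` attribute, which may differ from the actual grower API):

```python
def get_leaves_values(grower):
    # The grower only exposes its root (and a list of finalized leaves),
    # so we collect leaf values with a manual depth-first traversal.
    values = []

    def dfs(node):
        if node.left_child is None and node.right_child is None:
            values.append(node.value)
            return
        dfs(node.left_child)
        dfs(node.right_child)

    dfs(grower.root)
    return values
```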
    gain -= _loss_from_value(value_right, sum_gradient_right)  # with bounds
    # Note that the losses for both children are computed with bounded values,
    # while the loss of the current node isn't. It's OK since all the gain
    # comparisons will be made with the same loss_current_node anyway.
I'm reasonably confident that lightgbm does it similarly: see https://github.com/microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L160, where the gain of the current node to be split will not take constraints into account (follow BeforeNumercal and then GetLeafGain). But I could be missing something.
You're right that it doesn't work well with min_gain_to_split. I guess the correct way would be to pass the current node's value into find_node_split.
I'll try tomorrow and will report back!
…contraints.py Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>
OK, I pushed something and added a test. I'm still a little bit hesitant about what we should be doing. I think that in the end, it all boils down to how we define the loss at a given node:

1. the closed-form loss, i.e. the negated negative_loss: -sum_gradients**2 / (sum_hessians + l2_regularization),
or
2. the loss computed from the node's value, i.e. _loss_from_value: value * sum_gradients, where value is the (possibly clipped) -sum_gradients / (sum_hessians + l2_regularization).

They're both equivalent if no clipping happens, like in the XGBoost paper. In any case, I agree we should be consistent and use the same formula for all nodes. In the previous version (and in LightGBM as far as I understand), we were using 1 for the current node and 2 for the children. Now, we're using 2 for all nodes. WDYT @ogrisel, good to go?
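As a concrete illustration of the two definitions (arbitrary numbers, not taken from the PR):

```python
sum_gradients, sum_hessians, l2 = -4.0, 2.0, 1.0

# Definition 1: closed-form loss (what negative_loss computes, negated)
loss_1 = -sum_gradients ** 2 / (sum_hessians + l2)       # -16/3

# Definition 2: loss computed from the node value (what _loss_from_value does)
value = -sum_gradients / (sum_hessians + l2)              # 4/3
loss_2 = value * sum_gradients                            # -16/3, identical

# With clipping, e.g. an upper bound of 1.0 coming from a monotonic
# constraint, the two definitions diverge:
clipped_value = min(value, 1.0)                           # 1.0
loss_2_clipped = clipped_value * sum_gradients            # -4.0 != loss_1
```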
ogrisel left a comment
LGTM. Can you just do a quick benchmark to check that the latest change did not cause a significant performance regression w.r.t. master?
Thanks for the reminder. There actually was a slow-down: I get about 51 sec on the current branch and 47 sec on master. So there's still a slow-down, but as far as I can tell the Cython code is about just as fast; it seems we're spending more time in the Python part.
Maybe you can use py-spy with native code profiling enabled both on master and on this branch and compare the 2 flamegraphs to help identify the discrepancy?
Great suggestion! It turns out the grower is spending a significant amount of time setting the bounds of the children. That's where the difference came from. I'm quite surprised because there's no constraint at all in the benchmark, but I guess it adds up in the end since this is done for every single node. I added a fast path in 83dba40. Now both times are similar.

(I ran them on a different machine from #15582 (comment), in case you're wondering why it's faster in both cases)
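A hypothetical sketch of such a fast path for setting the children bounds (names, attributes, and structure are assumptions, not necessarily what 83dba40 actually does):

```python
def set_children_bounds(parent, left_child, right_child, has_constraint):
    # Skip the bookkeeping entirely when the split feature has no monotonic
    # constraint, so unconstrained fits pay (almost) nothing for it.
    if not has_constraint:
        for child in (left_child, right_child):
            child.lower_bound = float('-inf')
            child.upper_bound = float('inf')
        return
    # With an increasing constraint, the mid-point between the two child
    # values separates the ranges allowed for their own children
    # (the bounds are swapped for a decreasing constraint).
    mid = (left_child.value + right_child.value) / 2
    left_child.lower_bound = parent.lower_bound
    left_child.upper_bound = mid
    right_child.lower_bound = mid
    right_child.upper_bound = parent.upper_bound
```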
adrinjalali left a comment
Thanks @NicolasHug, other than the one concern, looks all good.
Alright, very nice work. Let's merge!

Thanks a lot @adrinjalali and @ogrisel for the reviews!

This PR adds support for monotonic constraints for the histogram GBDTs.
Addresses #6656
(see https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html)
The API is to pass e.g. HistGradientBoostingRegressor(monotonic_cst=[-1, 1]).

For reviewers: The overall logic is pretty simple, but this still involved a lot of changes because I had to refactor the splitter, since we now require all nodes to have a value (previously only leaves would have a value).
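An illustrative usage example (the experimental import reflects the estimators' status at the time of this PR; data and constraint values are arbitrary):

```python
import numpy as np
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 2))
# target decreases with the first feature and increases with the second
y = -2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.1, size=1000)

# -1: predictions constrained to be decreasing w.r.t. the first feature
# +1: predictions constrained to be increasing w.r.t. the second feature
gbdt = HistGradientBoostingRegressor(monotonic_cst=[-1, 1]).fit(X, y)
```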
@ogrisel @adrinjalali @thomasjpfan @amueller @glemaitre might be interested in this (after the release of course)