[MRG] FIX max_leaf_node and max_depth interaction in GBDT#16183
[MRG] FIX max_leaf_node and max_depth interaction in GBDT#16183adrinjalali merged 11 commits intoscikit-learn:masterfrom
Conversation
| assert depth == max_depth | ||
|
|
||
|
|
||
| def test_max_depth_max_leaf_nodes(): |
There was a problem hiding this comment.
You don't think that we should move this test in test_gradient_boosting.py instead of the test_grower.py?
There was a problem hiding this comment.
No strong opinion. It's really a test about the grower code more than the estimator but yeah it's not obvious to come up with something that would only involve the grower
There was a problem hiding this comment.
Okay this may seem a little crazy, but:
def test_max_depth_max_leaf_nodes():
n_samples = 1000
n_bins = 64
max_depth = 4
max_leaf_nodes = 9 # 2**(max_depth - 1) + 1
X, y = make_classification(n_samples=n_samples, random_state=42)
X_binned = _BinMapper(n_bins=n_bins, random_state=42).fit_transform(X)
y = y.astype(Y_DTYPE)
loss = LeastSquares()
baseline = loss.get_baseline_prediction(y, 1)
raw_predictions = np.full(shape=(1, n_samples), fill_value=baseline)
gradients, hessians = loss.init_gradients_and_hessians(
n_samples=n_samples, prediction_dim=1)
for i in range(5):
loss.update_gradients_and_hessians(gradients, hessians, y,
raw_predictions)
grower = TreeGrower(X_binned, gradients[0, :], hessians[0, :],
n_bins=n_bins, max_leaf_nodes=max_leaf_nodes,
max_depth=max_depth)
grower.grow()
_update_raw_predictions(raw_predictions[0, :], grower)
assert len(grower.finalized_leaves) <= max_leaf_nodes|
you might like this new solution better @thomasjpfan . |
|
failure is unrelated |
…x_maxleafnodes_maxdepth
…x_maxleafnodes_maxdepth
…kit-learn into fix_maxleafnodes_maxdepth
| self.n_nodes += 2 | ||
|
|
||
| if self.max_depth is not None and depth == self.max_depth: | ||
| if (self.max_leaf_nodes is not None |
There was a problem hiding this comment.
Diff might be confusing but all I did was check for max_leaf_nodes before checking for max_depth
There was a problem hiding this comment.
This reordered was noticed ;)
|
Thanks for the reviews so far. I have moved the test and simplified it to a 3-liner. I confirm it's a non-regression test that fails on master |
thomasjpfan
left a comment
There was a problem hiding this comment.
LGTM
Can confirm that test fails on master.
| self.n_nodes += 2 | ||
|
|
||
| if self.max_depth is not None and depth == self.max_depth: | ||
| if (self.max_leaf_nodes is not None |
There was a problem hiding this comment.
This reordered was noticed ;)
* fix max_leaf_node max_depth interaction * Added test * comment * what's new * simpler solution * moved and simplified test * typo
* fix max_leaf_node max_depth interaction * Added test * comment * what's new * simpler solution * moved and simplified test * typo
Fixes #16179
The fix is simply to check for max_samples_leaf before checking for max_depth.
This makes sure that
_finalize_splittable_nodesis called, which was the cause of the bug.