[MRG] FIX gradient boosting with sklearn estimator as init #12436
jeremiedbb wants to merge 11 commits into scikit-learn:master
Conversation
rth
left a comment
I'm not too familiar with GB code, but this looks good to me except for a minor comment below.
You also need to repeat "close" or "fix" in the description for every issue you want to auto-close, i.e. close #123, close #1234, etc.
I can confirm the added test fails on master.
Maybe @glemaitre would be able to do another review.
```diff
  # init predictions
- y_pred = self.init_.predict(X)
+ y_pred = self.init_.predict(X).astype(np.float64, copy=False)
  if y_pred.shape == (X.shape[0],):
```
Wouldn't `y_pred.ndim == 1` be enough?
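As a quick sanity check of this suggestion (a standalone sketch, not the PR's code): for the output of `predict(X)`, which always has `n_samples` entries, testing `ndim == 1` picks out exactly the same arrays as comparing the full shape.

```python
import numpy as np

X = np.zeros((5, 2))
y_pred = np.zeros(5)            # 1-D predict output, shape (n_samples,)
# for predict output, ndim == 1 is equivalent to the shape comparison
assert (y_pred.shape == (X.shape[0],)) == (y_pred.ndim == 1)

y_pred_2d = np.zeros((5, 3))    # 2-D (e.g. per-class) predict output
assert y_pred_2d.ndim != 1 and y_pred_2d.shape != (X.shape[0],)
```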
```diff
  # init predictions
- y_pred = self.init_.predict(X)
+ y_pred = self.init_.predict(X).astype(np.float64, copy=False)
```
What happens if you don't cast it to 64bit?
An error is raised due to unsafe casting because y_pred is updated later by the actual fit of the gradient boosting, and filled with float64 values.
```python
y_pred[:, k] += (learning_rate *
                 tree.value[:, 0, 0].take(terminal_regions, axis=0))
```
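A minimal standalone reproduction of that casting failure (illustrative stand-in arrays, not the scikit-learn code): if the init estimator's `predict` returns e.g. integer class labels, the in-place `+=` with the trees' float64 contributions raises, because in-place ufuncs use NumPy's `same_kind` casting rule.

```python
import numpy as np

y_pred = np.zeros(3, dtype=np.int64)   # stand-in for init predictions (labels)
update = np.full(3, 0.1)               # stand-in for float64 tree contributions

try:
    y_pred += update                   # same_kind casting forbids float -> int
except TypeError as exc:               # UFuncTypeError subclasses TypeError
    print("in-place casting failed:", exc)

# Casting up front, as the fix does, avoids the problem:
y_pred = y_pred.astype(np.float64, copy=False)
y_pred += update
```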
Also please add a what's new entry. I guess we could try to squeeze it into 0.20.1 since it's a long-standing issue that has been reported multiple times.
@stoddardg pointed out another related bug: when the init estimator does not support sample weights, it crashes even if one calls `fit` without passing sample weights.
force-pushed from 7d3c06e to 53abfd6
```diff
- # fit initial model - FIXME make sample_weight optional
- self.init_.fit(X, y, sample_weight)
+ # fit initial model
+ if has_fit_parameter(self.init_, "sample_weight"):
```
We should avoid `has_fit_parameter` unless try-except really won't work.
It means keyword args won't work. We should only use this kind of thing when we really need to take a different approach when sample_weight is not supported. (Feel free to document this caveat more in a separate pr)
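A minimal sketch of the try-except duck-typing approach being recommended here (the helper name `fit_init_estimator` is hypothetical, not scikit-learn's actual code): attempt the call with `sample_weight` and fall back to a clear error only when `fit` rejects the keyword.

```python
def fit_init_estimator(init_estimator, X, y, sample_weight=None):
    """Fit an init estimator, passing sample_weight only when needed."""
    if sample_weight is None:
        init_estimator.fit(X, y)
        return init_estimator
    try:
        init_estimator.fit(X, y, sample_weight=sample_weight)
    except TypeError:
        # fit() does not accept a sample_weight keyword; note this also
        # catches unrelated TypeErrors raised inside fit (the caveat
        # discussed later in this thread)
        raise ValueError(
            "The initial estimator {} does not support sample weights."
            .format(init_estimator.__class__.__name__))
    return init_estimator
```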
force-pushed from cc85fee to 12c18f4
jnothman
left a comment
I'm not certain this should go in 0.20.1, but whatever...?
Nice work!
@rth I made some changes since your last review. Maybe you want to give another look at it.
It's not a regression from 0.19 so maybe it should go in 0.21 indeed.
force-pushed from ae99693 to 92cb295
```python
# init does not support sample weights
init_est = _NoSampleWeightWrapper(init())
gb(init=init_est).fit(X, y)  # ok no sample weights
with pytest.raises(ValueError):
```
To be more specific about the error,
```diff
- with pytest.raises(ValueError):
+ with pytest.raises(
+         ValueError,
+         match=r'.*estimator.*does not support sample weights.*'):
```
```python
gb, dataset_maker, init = task

X, y = dataset_maker()
sample_weight = np.random.rand(100)
```
Let's use a RNG,
```diff
- sample_weight = np.random.rand(100)
+ sample_weight = np.random.RandomState(42).rand(100)
```
```python
# estimator.
# Check that an error is raised if trying to fit with sample weight but
# inital estimator does not support sample weight
gb, dataset_maker, init = task
```
Maybe use a more explicit name here, e.g. `init_estimator`.
```python
else:
    raise ValueError(
        "The initial estimator {} does not support sample "
        "weights yet.".format(self.init_.__class__.__name__))
```
```diff
- "weights yet.".format(self.init_.__class__.__name__))
+ "weights.".format(self.init_.__class__.__name__))
```
It was to point out that it's not a definitive statement :)
```python
if sample_weight is None:
    sample_weight = np.ones(n_samples, dtype=np.float32)
    sample_weight_was_none = True
```
```diff
- sample_weight_was_none = True
+ sample_weight_is_none = True
```
```python
    sample_weight_was_none = True
else:
    sample_weight = column_or_1d(sample_weight, warn=True)
    sample_weight_was_none = False
```
```diff
- sample_weight_was_none = False
+ sample_weight_is_none = False
```
```python
# fit initial model
try:
    self.init_.fit(X, y, sample_weight=sample_weight)
except TypeError:
```
Maybe calling `inspect.signature(self.init_.fit)` and adapting the kwargs accordingly would be cleaner? Here I wonder what happens if one gets a genuine unrelated TypeError; understanding the traceback would then be somewhat difficult.
Nevermind, I see, it was the case originally and Joel requested the opposite change in #12436 (comment)
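For reference, the signature-inspection alternative floated above could look like this minimal sketch (the helper name `accepts_sample_weight` is hypothetical, not scikit-learn's API):

```python
import inspect

def accepts_sample_weight(estimator):
    """Return True if estimator.fit() declares a sample_weight parameter."""
    # inspect.signature on a bound method already excludes `self`
    return "sample_weight" in inspect.signature(estimator.fit).parameters
```

The trade-off discussed in this thread: signature inspection gives cleaner tracebacks, but fails for estimators that accept `sample_weight` only via `**kwargs`, which is why the try-except route was kept.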
NicolasHug
left a comment
Happy to see this fixed!
The current fix does not support multiclass classification; I added some suggestions.
```python
@pytest.mark.parametrize(
    "task",
    [(GradientBoostingClassifier, make_classification, DummyClassifier),
     (GradientBoostingRegressor, make_regression, DummyRegressor)],
```
Testing for multiclass classification currently fails:

```python
def make_multiclass():
    return make_classification(n_classes=3, n_clusters_per_class=1)


@pytest.mark.parametrize(
    "task",
    [(GradientBoostingClassifier, make_classification, DummyClassifier),
     (GradientBoostingClassifier, make_multiclass, DummyClassifier),
     (GradientBoostingRegressor, make_regression, DummyRegressor)],
    ids=["classification", "multiclass", "regression"])
def test_gradient_boosting_with_init(task):
    ...
    # will fail with make_multiclass
```

```diff
- y_pred = self.init_.predict(X)
+ y_pred = self.init_.predict(X).astype(np.float64, copy=False)
  if y_pred.ndim == 1:
      y_pred = y_pred[:, np.newaxis]
```
```diff
- y_pred = y_pred[:, np.newaxis]
+ if self.loss_.K >= 2:
+     # multiclass classification needs y_pred to be of shape
+     # (n_samples, K) where K is the number of trees per
+     # iteration, i.e. the number of classes.
+     # see e.g. PriorProbabilityEstimator or ZeroEstimator
+     n_samples = y_pred.shape[0]
+     y_pred = np.tile(y_pred, self.n_classes_)
+     y_pred = y_pred.reshape(n_samples, self.loss_.K)
+ else:
+     y_pred = y_pred[:, np.newaxis]
```
Good catch. Wouldn't it be better to make a one-hot encoding of y_pred?
I pushed it, let me know which version is best.
Yes one-hot seems to be a better option. I think my version would have been equivalent to a zero initialization (or any constant value).
BTW I realized we could use `if loss.is_multi_class` to be more explicit.
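The one-hot variant discussed here could be sketched as follows (standalone illustration with assumed names, not the code that was pushed): each row of the init predictions gets a 1 in the column of the predicted class and 0 elsewhere, instead of tiling the label across all K columns.

```python
import numpy as np

def one_hot_init_predictions(raw_predictions, classes):
    """One-hot encode an init classifier's label predictions.

    classes is assumed sorted, as scikit-learn's classes_ attribute is.
    """
    n_samples = raw_predictions.shape[0]
    y_pred = np.zeros((n_samples, len(classes)), dtype=np.float64)
    # map each predicted label to its column index
    cols = np.searchsorted(classes, raw_predictions)
    y_pred[np.arange(n_samples), cols] = 1.0
    return y_pred
```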
Should we detect that this is happening and bail in the partial dependence plot? Non-constant init estimators will make the partial dependence plot misleading / wrong.
@amueller what do you mean "this is happening"? I don't think this is related to the partial dependence plot, we're simply fixing a shape issue here.
By "this" I mean "a non-constant init estimator". So far in our implementation init estimators were always constant. After this patch they will not be. |
jnothman
left a comment
LGTM again! Thanks @NicolasHug!
force-pushed from b5d1462 to 1129cb1
Thinking about this again, I don't think it makes any sense to have an init predictor in a classification setting (it's OK for regression): the output of the init estimator's predict is fed into the boosting as if it were raw decision scores, which only makes sense for regression. I would propose to either restrict what init can be for classifiers, or change which quantity is requested from the init estimator.

Slightly related: lots of the docstrings around init would need updating too. Hope it's clear, I can try to clarify if needed.
Is a classifier with decision_function appropriate? Or is it appropriate to use an initial regressor in classification? I've not thought about it, just trying to understand your proposal on the surface, @NicolasHug
That would be slightly less inappropriate than using predict. In binary classification, the boosting is seeded with decision-function-like values (raw scores), not class labels.

My proposition is to replace the constant init predictions with values derived from the loss. It does not make sense to set init to a plain classifier whose predict returns labels.
From discussing with @NicolasHug I think the solution in this PR is not correct for the classification case and we should do what @NicolasHug proposes (which actually also cleans up the GradientBoosting interface).
I propose the following: each loss function should have a method that turns a fitted init estimator's output into the initial raw predictions. For the binary classification loss, this would map the init estimator's probabilities to log-odds, while for least squares this would simply be the init estimator's predictions. The built-in constant init estimators would then become redundant. It would be similar to what we do in pygbm, except that pygbm does not accept custom init estimators. @jeremiedbb you probably didn't sign up for this so if you want, I can take it up. I was planning on refactoring the GB code anyway (putting losses in another file, renaming things).
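The per-loss idea could be sketched like this (illustrative class and method names, not the interface that was eventually merged): each loss converts a fitted init estimator's output into the raw values that seed the boosting.

```python
import numpy as np

class LeastSquaresInit:
    def baseline(self, init_estimator, X):
        # regression: the init estimator's predictions are already raw values
        return init_estimator.predict(X).astype(np.float64)

class BinomialDevianceInit:
    def baseline(self, init_estimator, X):
        # binary classification: turn probabilities into log-odds, since
        # boosting operates on raw decision scores, not labels
        proba = init_estimator.predict_proba(X)[:, 1]
        eps = np.finfo(np.float64).eps
        proba = np.clip(proba, eps, 1 - eps)  # avoid log(0)
        return np.log(proba / (1 - proba))
```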
Does it make sense to make it `get_init_probabilities` and put the sigmoid inside the method?
I don't think so, because the subsequent boosting iterations operate on raw (log-odds) values, not probabilities.
@NicolasHug feel free to take over. I won't have much time to work on it, and this is going out of my comfort zone :)
Fixes #10302, Fixes #12429, Fixes #2691
Gradient Boosting used to fail when init was a sklearn estimator, which is a bit ironic :)
The issue was that the predict output didn't have the expected shape, and apparently there was no test for the init parameter with an estimator other than the default.
Edit: also accept an initial estimator which does not support sample weights, as long as the gradient boosting is not fitted with sample weights.