-
-
Notifications
You must be signed in to change notification settings - Fork 26.9k
GradientBoosting fails when using init estimator parameter. #12429
Copy link
Copy link
Closed
Description
Description
Passing any estimator to the init parameter in GradientBoostingClassifier (or GradientBoostingRegressor) causes errors when fitting. I'm including one example below as a MWE but I've fiddled with this issue for a while and get a number of different errors even if I can get past one. My guess is that this part of the code (the init estimator part) hasn't been shown any love in quite a while (since I've found similar issues that date back to 2014: #2691). A more complete exploration can be found here: https://github.com/stoddardg/sklearn_bug_example/blob/master/Gradient%2BBoosting%2Binit%2BMWE.ipynb
Steps/Code to Reproduce
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,random_state=0)
init_clf = LogisticRegressionCV()
clf = GradientBoostingClassifier(init=init_clf)
clf.fit(X, y)Actual Results
IndexError Traceback (most recent call last)
<ipython-input-5-f94c05274ee0> in <module>()
6 init_clf = LogisticRegressionCV()
7 clf = GradientBoostingClassifier(init=init_clf)
----> 8 clf.fit(train_X, train_y)
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in fit(self, X, y, sample_weight, monitor)
1463 n_stages = self._fit_stages(X, y, y_pred, sample_weight, self._rng,
1464 X_val, y_val, sample_weight_val,
-> 1465 begin_at_stage, monitor, X_idx_sorted)
1466
1467 # change shape of arrays after fit (early-stopping or additional ests)
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stages(self, X, y, y_pred, sample_weight, random_state, X_val, y_val, sample_weight_val, begin_at_stage, monitor, X_idx_sorted)
1527 y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
1528 sample_mask, random_state, X_idx_sorted,
-> 1529 X_csc, X_csr)
1530
1531 # track deviance (= loss)
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask, random_state, X_idx_sorted, X_csc, X_csr)
1197 loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
1198 sample_weight, sample_mask,
-> 1199 self.learning_rate, k=k)
1200
1201 # add tree to ensemble
~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in update_terminal_regions(self, tree, X, y, residual, y_pred, sample_weight, sample_mask, learning_rate, k)
396 self._update_terminal_region(tree, masked_terminal_regions,
397 leaf, X, y, residual,
--> 398 y_pred[:, k], sample_weight)
399
400 # update predictions (both in-bag and out-of-bag)
IndexError: too many indices for array
Versions
System
------
python: 3.7.0 (default, Jun 28 2018, 13:15:42) [GCC 7.2.0]
executable: /home/gstoddard/anaconda3/envs/eis_env/bin/python
machine: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-redhat-7.5-Maipo
BLAS
----
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /home/gstoddard/anaconda3/envs/eis_env/lib
cblas_libs: mkl_rt, pthread
Python deps
-----------
pip: 18.0
setuptools: 39.2.0
sklearn: 0.20.0
numpy: 1.15.0
scipy: 1.1.0
Cython: None
pandas: 0.23.3
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels