Skip to content

GradientBoosting fails when using init estimator parameter. #12429

@stoddardg

Description

@stoddardg

Description

Passing any estimator to the init parameter in GradientBoostingClassifier (or GradientBoostingRegressor) causes errors when fitting. I'm including one example below as a MWE but I've fiddled with this issue for a while and get a number of different errors even if I can get past one. My guess is that this part of the code (the init estimator part) hasn't been shown any love in quite a while (since I've found similar issues that date back to 2014: #2691). A more complete exploration can be found here: https://github.com/stoddardg/sklearn_bug_example/blob/master/Gradient%2BBoosting%2Binit%2BMWE.ipynb

Steps/Code to Reproduce

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,random_state=0)
init_clf = LogisticRegressionCV()
clf = GradientBoostingClassifier(init=init_clf)
clf.fit(X, y)

Actual Results

IndexError                                Traceback (most recent call last)
<ipython-input-5-f94c05274ee0> in <module>()
      6 init_clf = LogisticRegressionCV()
      7 clf = GradientBoostingClassifier(init=init_clf)
----> 8 clf.fit(train_X, train_y)

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in fit(self, X, y, sample_weight, monitor)
   1463         n_stages = self._fit_stages(X, y, y_pred, sample_weight, self._rng,
   1464                                     X_val, y_val, sample_weight_val,
-> 1465                                     begin_at_stage, monitor, X_idx_sorted)
   1466 
   1467         # change shape of arrays after fit (early-stopping or additional ests)

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stages(self, X, y, y_pred, sample_weight, random_state, X_val, y_val, sample_weight_val, begin_at_stage, monitor, X_idx_sorted)
   1527             y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
   1528                                      sample_mask, random_state, X_idx_sorted,
-> 1529                                      X_csc, X_csr)
   1530 
   1531             # track deviance (= loss)

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask, random_state, X_idx_sorted, X_csc, X_csr)
   1197             loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
   1198                                          sample_weight, sample_mask,
-> 1199                                          self.learning_rate, k=k)
   1200 
   1201             # add tree to ensemble

~/anaconda3/envs/eis_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py in update_terminal_regions(self, tree, X, y, residual, y_pred, sample_weight, sample_mask, learning_rate, k)
    396             self._update_terminal_region(tree, masked_terminal_regions,
    397                                          leaf, X, y, residual,
--> 398                                          y_pred[:, k], sample_weight)
    399 
    400         # update predictions (both in-bag and out-of-bag)

IndexError: too many indices for array

Versions

System
------
    python: 3.7.0 (default, Jun 28 2018, 13:15:42)  [GCC 7.2.0]
executable: /home/gstoddard/anaconda3/envs/eis_env/bin/python
   machine: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-redhat-7.5-Maipo

BLAS
----
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/gstoddard/anaconda3/envs/eis_env/lib
cblas_libs: mkl_rt, pthread

Python deps
-----------
       pip: 18.0
setuptools: 39.2.0
   sklearn: 0.20.0
     numpy: 1.15.0
     scipy: 1.1.0
    Cython: None
    pandas: 0.23.3

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions