
Regression in SelectFromModel where max_features_ does not exist with prefit=True. #23267

@glemaitre

Description


Describe the bug

While testing the release candidate against xgboost, a test fails due to a regression introduced by #22356.

I assume we did not consider the case where prefit=True: max_features_ is only set in fit, which is never called for a prefit estimator.

Steps/Code to Reproduce

    import xgboost as xgb

    from sklearn.datasets import load_digits
    from sklearn.feature_selection import SelectFromModel

    digits = load_digits(n_class=2)
    y = digits['target']
    X = digits['data']
    cls = xgb.XGBClassifier()
    cls.fit(X, y)
    selector = SelectFromModel(cls, prefit=True, max_features=1)
    X_selected = selector.transform(X)
    assert X_selected.shape[1] == 1
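The failure is not specific to xgboost: any fitted estimator exposing feature_importances_ goes through the same code path. A scikit-learn-only reproduction (substituting RandomForestClassifier for the xgboost estimator) is:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

digits = load_digits(n_class=2)
X, y = digits["data"], digits["target"]

# Fit the estimator up front so that prefit=True applies.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

selector = SelectFromModel(clf, prefit=True, max_features=1)
try:
    X_selected = selector.transform(X)
    print(X_selected.shape)
except AttributeError as exc:
    # On the affected dev version this raises:
    # 'SelectFromModel' object has no attribute 'max_features_'
    print(exc)
```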

Expected Results

The test should pass: X_selected should contain exactly one column.

Actual Results

> pytest tests/python -k test_select_feature
============================================================= test session starts =============================================================
platform darwin -- Python 3.8.12, pytest-7.1.0, pluggy-1.0.0
Matplotlib: 3.4.3
Freetype: 2.10.4
rootdir: /Users/glemaitre/Documents/packages/xgboost/tests, configfile: pytest.ini
plugins: mpl-0.14.0, hypothesis-6.23.2, xdist-2.4.0, asdf-2.11.0, anyio-3.3.3, forked-1.3.0, cov-3.0.0
collected 334 items / 333 deselected / 1 selected                                                                                             

tests/python/test_with_sklearn.py F                                                                                                     [100%]

================================================================== FAILURES ===================================================================
_____________________________________________________________ test_select_feature _____________________________________________________________

    def test_select_feature():
        from sklearn.datasets import load_digits
        from sklearn.feature_selection import SelectFromModel
        digits = load_digits(n_class=2)
        y = digits['target']
        X = digits['data']
        cls = xgb.XGBClassifier()
        cls.fit(X, y)
        selector = SelectFromModel(cls, prefit=True, max_features=1)
>       X_selected = selector.transform(X)

tests/python/test_with_sklearn.py:326: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../scikit-learn/sklearn/feature_selection/_base.py:90: in transform
    return self._transform(X)
../scikit-learn/sklearn/feature_selection/_base.py:94: in _transform
    mask = self.get_support()
../scikit-learn/sklearn/feature_selection/_base.py:53: in get_support
    mask = self._get_support_mask()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = SelectFromModel(estimator=XGBClassifier(base_score=0.5, booster='gbtree',
                                        call...                                        random_state=0, reg_alpha=0, ...),
                max_features=1, prefit=True)

    def _get_support_mask(self):
        # SelectFromModel can directly call on transform.
        if self.prefit:
            estimator = self.estimator
        elif hasattr(self, "estimator_"):
            estimator = self.estimator_
        else:
            raise ValueError(
                "Either fit the model before transform or set"
                ' "prefit=True" while passing the fitted'
                " estimator to the constructor."
            )
        scores = _get_feature_importances(
            estimator=estimator,
            getter=self.importance_getter,
            transform_func="norm",
            norm_order=self.norm_order,
        )
        threshold = _calculate_threshold(estimator, scores, self.threshold)
        if self.max_features is not None:
            mask = np.zeros_like(scores, dtype=bool)
            candidate_indices = np.argsort(-scores, kind="mergesort")[
>               : self.max_features_
            ]
E           AttributeError: 'SelectFromModel' object has no attribute 'max_features_'

../scikit-learn/sklearn/feature_selection/_from_model.py:261: AttributeError
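The traceback shows that _get_support_mask relies on max_features_, which is only set in fit and therefore never exists when the estimator is prefit. One possible fix (a hypothetical sketch, not the actual patch) is to resolve max_features against the number of features on the fly instead of depending on the fitted attribute:

```python
# Hypothetical helper sketching one way _get_support_mask could resolve
# ``max_features`` without the ``max_features_`` attribute that only
# ``fit`` sets.
def resolve_max_features(max_features, n_features):
    """Return the number of features to keep, for int or callable input."""
    if callable(max_features):
        # #22356 also made ``max_features`` accept a callable of n_features.
        max_features = max_features(n_features)
    if not 0 <= max_features <= n_features:
        raise ValueError(
            f"max_features must be in [0, {n_features}]; got {max_features}"
        )
    return max_features
```

This would cover both the plain-int form used in the reproducer above and the callable form introduced by #22356.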

Versions

System:
    python: 3.8.12 | packaged by conda-forge | (default, Sep 16 2021, 01:38:21)  [Clang 11.1.0 ]
executable: /Users/glemaitre/mambaforge/envs/dev/bin/python
   machine: macOS-12.3.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.2.dev0
          pip: 21.3
   setuptools: 58.2.0
        numpy: 1.21.2
        scipy: 1.8.0.dev0+1902.b795164
       Cython: 0.29.24
       pandas: 1.3.3
   matplotlib: 3.4.3
       joblib: 1.0.1
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /Users/glemaitre/mambaforge/envs/dev/lib/libopenblas_vortexp-r0.3.18.dylib
        version: 0.3.18
threading_layer: openmp
   architecture: VORTEX
    num_threads: 8

       user_api: openmp
   internal_api: openmp
         prefix: libomp
       filepath: /Users/glemaitre/mambaforge/envs/dev/lib/libomp.dylib
        version: None
    num_threads: 8
