Better documentation for RFECV #27193

@CameronBieganek

There is almost no description in the documentation of how RFECV actually works. The user guide simply says:

> RFECV performs RFE in a cross-validation loop to find the optimal number of features.

and the API page simply says:

> Recursive feature elimination with cross-validation to select features.

My best guess for what RFECV is actually doing is the following.

  1. Start with all features.
  2. Do the following (in either order):
    a) Fit the estimator on all rows of X (restricted to the current subset of features). Use coef_, feature_importances_, or a callable to choose the feature(s) that will be removed in the next round.
    b) Run cross-validation with the estimator on X to estimate the accuracy of the estimator trained on the current subset of features.
  3. Remove the features chosen for removal in step 2a.
  4. Repeat steps 2 and 3 until the minimum number of features has been reached.
  5. Select the set of features that maximizes the CV score calculated in step 2b. (This set of features is recorded in the support_ attribute.)
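
To make the guess concrete, here is a minimal pure-Python sketch of steps 1 to 5 exactly as written above. It encodes the guess only and should not be read as confirmation of what RFECV does internally. The function name `guessed_rfecv` and the two callables `importances` and `cv_score` are hypothetical stand-ins I made up for illustration: the first plays the role of step 2a (fit on all rows, read the importances), the second plays the role of step 2b (cross-validated score for a feature subset).

```python
from typing import Callable, List, Sequence, Tuple


def guessed_rfecv(
    n_features: int,
    importances: Callable[[Sequence[int]], Sequence[float]],
    cv_score: Callable[[Sequence[int]], float],
    step: int = 1,
    min_features: int = 1,
) -> Tuple[List[int], float]:
    """Sketch of the guessed procedure; NOT scikit-learn's implementation.

    importances(features) returns one importance per entry of `features`
    (step 2a). cv_score(features) returns a CV score for that subset (step 2b).
    """
    features = list(range(n_features))  # step 1: start with all features
    best_features, best_score = list(features), cv_score(features)

    while len(features) > min_features:  # step 4: repeat until the minimum
        imp = importances(features)  # step 2a: rank the current features
        n_drop = min(step, len(features) - min_features)
        # Positions (within `features`) of the n_drop least important ones.
        weakest = sorted(range(len(features)), key=lambda i: imp[i])[:n_drop]
        drop = {features[i] for i in weakest}
        features = [f for f in features if f not in drop]  # step 3: remove

        score = cv_score(features)  # step 2b, for the reduced subset
        if score > best_score:  # step 5: keep the best-scoring subset
            best_features, best_score = list(features), score

    return best_features, best_score


# Toy stand-ins (assumptions for the demo): features 0 and 1 are useful,
# every other feature is noise that slightly hurts the score.
USEFUL = {0, 1}


def toy_importances(features: Sequence[int]) -> List[float]:
    return [1.0 if f in USEFUL else 0.1 for f in features]


def toy_cv_score(features: Sequence[int]) -> float:
    n_useful = sum(f in USEFUL for f in features)
    n_noise = len(features) - n_useful
    return n_useful - 0.05 * n_noise


selected, score = guessed_rfecv(5, toy_importances, toy_cv_score)
print(selected, score)
```

With a real estimator, `cv_score` could presumably wrap `sklearn.model_selection.cross_val_score(estimator, X[:, features], y).mean()`, and `importances` could refit a clone of the estimator on `X[:, features]` and return the absolute values of `coef_` or `feature_importances_`. Note that ties are resolved in favor of the larger feature set here, which is an arbitrary choice of this sketch.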

Is that correct? Furthermore, can a detailed explanation of what RFECV is doing be added to the documentation?
