Norm inconsistency between RFE and SelectFromModel #2121

@jnothman

In each RFE iteration, the step features with the lowest importance are discarded. A similar thing happens in SelectFromModel (but selection is by threshold rather than by number of features). There are some inconsistencies:

  • the mixin admits either est.coef_ or est.feature_importances_ as the basis of its calculation; RFE admits only coef_ (fixed in [MRG+1] Fix RFE #4496)
  • the mixin uses np.abs(est.coef_), while RFE uses safe_sqr(est.coef_). When coef_ is 1d, both give the same result when selecting features by quantity, since they rank features identically. When coef_ is 2d, the sum is taken over axis 0, and the orderings under the L1 and L2 norms may differ.
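
To make the second point concrete, here is a minimal sketch in plain Python (no NumPy, so it runs anywhere). The helpers `column_scores` and `ranking` and the coefficient values are made up for illustration; they are not scikit-learn code. They only mimic "apply abs or square elementwise, then sum over axis 0":

```python
def column_scores(coef, transform):
    """Apply `transform` elementwise, then sum over axis 0 (one score per feature)."""
    n_features = len(coef[0])
    return [sum(transform(row[j]) for row in coef) for j in range(n_features)]


def ranking(scores):
    """Feature indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def square(w):
    return w * w


# Hypothetical 2d coef_: 2 rows (e.g. classes) x 3 features.
coef_2d = [[3.0, 2.0, 0.5],
           [0.0, 2.0, 0.5]]

l1_scores = column_scores(coef_2d, abs)     # analogue of np.abs(coef_).sum(axis=0)
sq_scores = column_scores(coef_2d, square)  # analogue of safe_sqr(coef_).sum(axis=0)

l1_rank = ranking(l1_scores)
sq_rank = ranking(sq_scores)

# Hypothetical 1d case, written as a single row: squaring preserves the
# ordering of absolute values, so both rankings agree.
coef_1d = [[-3.0, 2.0, 0.5]]
l1_rank_1d = ranking(column_scores(coef_1d, abs))
sq_rank_1d = ranking(column_scores(coef_1d, square))

print("2d, abs:    ", l1_scores, l1_rank)
print("2d, squared:", sq_scores, sq_rank)
print("1d rankings agree:", l1_rank_1d == sq_rank_1d)
```

With these values, feature 1 ranks first under the summed absolute values and feature 0 ranks first under the summed squares, so a selector keeping a single feature would pick a different one depending on the norm.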

Should RFE support feature_importances_? For coef_.ndim == 2, should SelectFromModel use sqrt(sqr(coef_).sum(axis=0))?

(@glouppe, I think.)
