Norm inconsistency between RFE and SelectFromModel #2121
Closed
Labels
Easy (well-defined and straightforward way to resolve), module:feature_selection
Description
In each RFE iteration, the step features with the lowest importance are discarded. A similar thing happens in SelectFromModel (but selection is by threshold rather than by number of features). There are some inconsistencies:
- the mixin admits either est.coef_ or est.feature_importances_ as the basis of its calculation; RFE only coef_ (fixed in [MRG+1] Fix RFE #4496)
- the mixin uses np.abs(est.coef_), while RFE uses safe_sqr(est.coef_). The result is the same for selecting features by quantity, as long as coef_ is 1d, but where 2d, the sum is taken over axis 0, and the ordering under L1 and L2 norms may differ.
Should RFE support feature_importances_? (Addressed by the fix noted in the first item.) For coef_.ndim == 2, should SelectFromModel use sqrt(sqr(coef_).sum(axis=0))?
(@glouppe, I think.)
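A minimal sketch of the second inconsistency, using plain NumPy rather than scikit-learn itself. The coefficient matrix is made up for illustration, and the two score computations only mirror the aggregation described above (absolute values versus squares, each summed over axis 0); they are not the library's actual code paths.

```python
import numpy as np

# Hypothetical 2-D coefficient matrix, shape (n_classes, n_features).
coef = np.array([[3.0, 2.0],
                 [0.0, 2.0]])

# L1-style aggregation: sum of absolute values over axis 0.
l1_scores = np.abs(coef).sum(axis=0)   # per-feature scores: 3 and 4

# L2-style aggregation: sum of squares over axis 0.
l2_scores = (coef ** 2).sum(axis=0)    # per-feature scores: 9 and 8

# Features ordered from least to most important under each aggregation.
l1_order = np.argsort(l1_scores)
l2_order = np.argsort(l2_scores)

# Taking the square root is monotonic, so it does not change the L2 ordering.
l2_sqrt_order = np.argsort(np.sqrt(l2_scores))

# With a 1-D coef_ there is nothing to sum, and both orderings agree.
coef_1d = np.array([-3.0, 1.0, 2.0])
same_1d = np.array_equal(np.argsort(np.abs(coef_1d)),
                         np.argsort(coef_1d ** 2))

print(l1_order, l2_order, same_1d)
```

Here the L1-style score ranks feature 0 lowest while the L2-style score ranks feature 1 lowest, so a selector that drops the single weakest feature would drop a different one depending on the norm.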