feature_importances_ should be a method in the ideal design #9606
This issue is not meant to be very practical, just a place to share my thoughts.
I believe feature_importances_ should have been designed as get_feature_importances() (which is, perhaps, funny because I think the get_feature_names design is pretty broken too), for the following reasons:
- calculating feature importances can be costly, and should not be done unnecessarily at fit time (in some cases it already is not)
- there are often multiple ways to calculate feature importances (as simple as the choice of norm for coef_), and (as long as they depend on the same sufficient statistics) the user may reasonably not decide which is appropriate until after fit. Thus get_feature_importances could have parameters to choose its method. Meta-estimators such as SelectFromModel and RFE currently have parameters for how they should interpret coef_ as feature importances, but really these are parameters that should be passed to the linear model's get_feature_importances; the model itself should know how to summarise its coef_, and doing so gets more complicated once we have multi-output coef_.
- it is semantically different from other attributes, since it is not a sufficient statistic on which the estimator bases its predictions
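To make the proposal concrete, here is a minimal sketch of what a get_feature_importances method could look like on a linear model, assuming only NumPy. CoefImportanceMixin and ToyLinearRegression are hypothetical names for illustration, not scikit-learn API. The norm_order parameter mirrors the one SelectFromModel accepts for collapsing a 2-D coef_. Nothing is computed at fit time, and the summary method is chosen by the caller after fitting.

```python
import numpy as np


class CoefImportanceMixin:
    """Hypothetical mixin: derive feature importances from coef_ on demand."""

    def get_feature_importances(self, norm_order=1):
        coef = np.asarray(self.coef_)
        if coef.ndim == 1:
            # Single output: importance is the magnitude of each coefficient.
            return np.abs(coef)
        # Multi-output coef_ has shape (n_targets, n_features): collapse the
        # target axis with a vector norm of the chosen order, per feature.
        return np.linalg.norm(coef, ord=norm_order, axis=0)


class ToyLinearRegression(CoefImportanceMixin):
    """Hypothetical least-squares model, used only for illustration."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # lstsq returns coefficients of shape (n_features,) or
        # (n_features, n_targets); transpose the latter to follow the
        # (n_targets, n_features) convention used for coef_.
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        self.coef_ = coef.T
        return self


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.column_stack([X @ [2.0, -1.0], X @ [0.5, 3.0]])
model = ToyLinearRegression().fit(X, Y)

# The same fitted model, summarised two ways after fit.
l1_importances = model.get_feature_importances(norm_order=1)        # approximately [2.5, 4.0]
max_importances = model.get_feature_importances(norm_order=np.inf)  # approximately [2.0, 3.0]
```

Under this design a meta-estimator like SelectFromModel would only need to forward its options to the estimator rather than interpret coef_ itself.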
I don't think there is currently sufficient motivation to change, but I could be persuaded.
Ping @kmike?