This issue is not meant to be very practical, just a place to share my thoughts.
I believe `feature_importances_` should have been designed as `get_feature_importances()` (which is, perhaps, funny because I think the `get_feature_names` design is pretty broken too), for the following reasons:
- calculating feature importances can be costly, and should not be done unnecessarily at `fit` time (and in some cases it already is not)
- there are often multiple ways to calculate feature importances (as simple as the choice of norm for `coef_`), and (as long as they depend on the same sufficient statistics) the user may reasonably not decide which is appropriate until after `fit`. Thus `get_feature_importances` could have parameters to choose its method. Meta-estimators such as `SelectFromModel` and `RFE` currently have parameters for how they should interpret `coef_` as feature importances, but really these are parameters that should be passed to the linear model's `get_feature_importances`; the model itself should know how to summarise its `coef_`, and doing so gets more complicated once we have multi-output `coef_`.
- it is semantically different from other attributes: it is not a sufficient statistic on which the estimator bases its predictions
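
A rough sketch of the second point, under stated assumptions: `LinearImportancesSketch` is a hypothetical stand-in for a fitted linear model (not a scikit-learn class), and the `norm_order` parameter name is only illustrative. It shows importances being computed on demand from `coef_`, with the choice of norm made after fitting, and multi-output `coef_` collapsed to one score per feature.

```python
import numpy as np


class LinearImportancesSketch:
    """Hypothetical stand-in for a fitted linear model; not a scikit-learn class."""

    def __init__(self, coef):
        # coef_ has shape (n_features,) or (n_outputs, n_features).
        self.coef_ = np.asarray(coef, dtype=float)

    def get_feature_importances(self, norm_order=1):
        """Summarise coef_ per feature, on demand rather than at fit time."""
        coef = np.atleast_2d(self.coef_)
        # Take the vector norm across outputs (axis 0): one score per feature.
        return np.linalg.norm(coef, ord=norm_order, axis=0)


# Two outputs, three features.
model = LinearImportancesSketch([[3.0, 0.0, -1.0],
                                 [-4.0, 0.5, 1.0]])
l1 = model.get_feature_importances(norm_order=1)  # sum of absolute values
l2 = model.get_feature_importances(norm_order=2)  # Euclidean norm
```

Under this design a meta-estimator would only need to forward the user's choice to the model, instead of interpreting `coef_` itself.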
I don't think there is currently sufficient motivation to change, but I could be persuaded.
Ping @kmike?