feature_importances_ should be a method in the ideal design #9606

@jnothman

This issue is not meant to be very practical, just a place to share my thoughts.

I believe feature_importances_ should have been designed as get_feature_importances() (which is, perhaps, funny because I think the get_feature_names design is pretty broken too), for the following reasons:

  • calculating feature importances can be costly, and should not be (and in some cases is not) done unnecessarily at fit time
  • there are often multiple ways to calculate feature importances (the choice of norm applied to coef_ is a simple case), and, as long as they depend on the same sufficient statistics, the user may reasonably not decide which is appropriate until after fit. get_feature_importances could therefore take parameters that select the method. Meta-estimators such as SelectFromModel and RFE currently have parameters for how they should interpret coef_ as feature importances, but these really belong on the linear model's get_feature_importances: the model itself should know how to summarise its coef_, and doing so gets more complicated once we have multi-output coef_.
  • it is semantically different from other attributes, since it is not a sufficient statistic on which the estimator bases its predictions
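
To make the proposal concrete, here is a minimal sketch of what a method-based design might look like. Everything in it is hypothetical: `LinearModelSketch` is not a scikit-learn class, `get_feature_importances` does not exist in the library, and the `norm_order` parameter name is only borrowed from SelectFromModel for illustration. Plain Python lists stand in for the arrays a real estimator would hold.

```python
import math


class LinearModelSketch:
    """Hypothetical stand-in for a fitted linear model (not a scikit-learn class)."""

    def __init__(self, coef):
        # One row per output, one column per feature, mirroring the
        # (n_outputs, n_features) layout of a multi-output coef_.
        self.coef_ = [list(row) for row in coef]

    def get_feature_importances(self, norm_order=1):
        """Summarise coef_ per feature with the requested norm (hypothetical API).

        Nothing is computed at fit time; the work happens only when called.
        """
        n_features = len(self.coef_[0])
        importances = []
        for j in range(n_features):
            # Absolute coefficients of feature j across all outputs.
            column = [abs(row[j]) for row in self.coef_]
            if norm_order == math.inf:
                importances.append(max(column))
            else:
                total = sum(v ** norm_order for v in column)
                importances.append(total ** (1.0 / norm_order))
        return importances


# Two outputs, three features; the values are made up for illustration.
model = LinearModelSketch([[3.0, 0.0, -1.0], [4.0, 0.5, 1.0]])
l1 = model.get_feature_importances(norm_order=1)
l2 = model.get_feature_importances(norm_order=2)
linf = model.get_feature_importances(norm_order=math.inf)
```

With this shape, the choice of norm is made after fit and lives with the model, so a meta-estimator would only need to forward the parameter rather than interpret coef_ itself.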

I don't think there is currently sufficient motivation to change, but I could be persuaded.

Ping @kmike?
