Improvement on Permutation importance example in release highlights #17313

@venkyyuvy

Describe the issue linked to the documentation

When I looked at the example given here, I was confused about why the feature names are not sorted by importance.

Suggest a potential alternative/fix

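import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(random_state=0, n_features=5,
                           n_informative=3)
rf = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0,
                                n_jobs=-1)

feature_names = np.array([f'x_{i}' for i in range(X.shape[1])])

fig, ax = plt.subplots()
# Sort features by mean importance so the most important one is plotted on top.
sorted_idx = result.importances_mean.argsort()
ax.boxplot(result.importances[sorted_idx].T,
           vert=False, labels=feature_names[sorted_idx])
ax.set_title("Permutation Importance of each feature")
ax.set_ylabel("Features")
fig.tight_layout()
plt.show()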


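Also, for clarity, maybe we can set n_redundant=0 to emphasise that permutation_importance identifies the 3 informative features precisely. With the default n_redundant=2, the two remaining features are linear combinations of the informative ones rather than pure noise.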

X, y = make_classification(random_state=0, n_features=5,
                           n_informative=3, n_redundant=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0,
                                n_jobs=-1)

feature_names = np.array([f'x_{i}' for i in range(X.shape[1])])

fig, ax = plt.subplots()
sorted_idx = result.importances_mean.argsort()
ax.boxplot(result.importances[sorted_idx].T,
           vert=False, labels=feature_names[sorted_idx])
ax.set_title("Permutation Importance of each feature")
ax.set_ylabel("Features")
fig.tight_layout()
plt.show()
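
To check the "identifies the 3 informative features" claim without reading it off a plot, one option is a text-only sketch like the one below. It is not part of the release highlights example. It assumes shuffle=False is passed to make_classification, which keeps the informative features in the first n_informative columns, so the ground truth is known; the names n_informative, ranking and top are only illustrative. Whether the top-ranked features actually match the informative ones depends on the data and the model, so the last line prints the result of the comparison rather than assuming it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Assumption: shuffle=False keeps the informative features in the first
# n_informative columns, so we know which columns carry signal.
n_informative = 3
X, y = make_classification(random_state=0, n_features=5,
                           n_informative=n_informative, n_redundant=0,
                           shuffle=False)
rf = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)

feature_names = np.array([f'x_{i}' for i in range(X.shape[1])])

# Sort from most to least important and print mean +/- std per feature.
ranking = result.importances_mean.argsort()[::-1]
for i in ranking:
    print(f"{feature_names[i]}: "
          f"{result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")

# Compare the top-ranked columns with the known informative columns.
top = set(ranking[:n_informative].tolist())
print("top features are the informative ones:",
      top == set(range(n_informative)))
```

The same sorted index (reversed) could be reused for the box plot, so the printed ranking and the figure stay consistent.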

