[v0.5.0] Implement Feature Importance for Random Forests #32

@noahgift

Description

Feature Description

Implement feature importance calculation for both RandomForestClassifier and RandomForestRegressor. Feature importance helps users understand which features contribute most to predictions.

Implementation Requirements

Core Methods:

  • feature_importances(&self) -> Option<Vec<f32>> - Returns importance scores for each feature
  • Based on Gini impurity decrease (classification) or variance reduction (regression)
  • Aggregate importances across all trees in the forest

Algorithm:

  1. For each tree, track impurity decrease at each split
  2. Accumulate importance by feature index
  3. Normalize within each tree: importance_i = total_decrease_i / sum(all_decreases)
  4. Average the normalized importances across all trees
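
The four steps above can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's implementation: the `Split` struct and the functions `tree_importances` and `forest_importances` are hypothetical names, and the sketch assumes each tree can expose its splits as a flat list of (feature index, impurity decrease) records. Skipping trees whose total decrease is zero is also an assumption, made here to avoid dividing by zero.

```rust
/// One recorded split in a fitted tree (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct Split {
    /// Index of the feature the node splits on.
    pub feature: usize,
    /// Weighted impurity decrease achieved by the split (step 1).
    pub impurity_decrease: f32,
}

/// Steps 2-3: accumulate decreases by feature index, then normalize.
/// Returns `None` when the tree has no useful splits (total decrease is zero).
/// Panics if a split refers to a feature index >= `n_features`.
pub fn tree_importances(splits: &[Split], n_features: usize) -> Option<Vec<f32>> {
    let mut totals = vec![0.0_f32; n_features];
    for s in splits {
        totals[s.feature] += s.impurity_decrease;
    }
    let sum: f32 = totals.iter().sum();
    if sum <= 0.0 {
        return None;
    }
    Some(totals.into_iter().map(|t| t / sum).collect())
}

/// Step 4: average the normalized per-tree vectors across the forest.
/// Trees without useful splits are skipped (an assumption of this sketch).
pub fn forest_importances(trees: &[Vec<Split>], n_features: usize) -> Option<Vec<f32>> {
    let per_tree: Vec<Vec<f32>> = trees
        .iter()
        .filter_map(|splits| tree_importances(splits, n_features))
        .collect();
    if per_tree.is_empty() {
        return None;
    }
    let n = per_tree.len() as f32;
    let mut avg = vec![0.0_f32; n_features];
    for imp in &per_tree {
        for (a, v) in avg.iter_mut().zip(imp) {
            *a += v / n;
        }
    }
    Some(avg)
}
```

Because every per-tree vector sums to 1, their average does too, which is what makes the "sum to 1.0" acceptance criterion reachable without a second normalization pass.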

Testing:

  • Comprehensive tests for both classifier and regressor
  • Verify importances sum to 1.0 (within floating-point tolerance)
  • Test with known feature relationships
  • Verify reproducibility with random_state
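
Several of the tests listed above share the same invariants, so a small checker could be reused across classifier and regressor tests. The helper below is a sketch only: `check_importances` is a hypothetical name, and the tolerance is left to the caller because `f32` sums rarely equal 1.0 exactly.

```rust
/// Checks the basic invariants of an importance vector:
/// one score per feature, every score finite and non-negative,
/// and the scores summing to 1.0 within `tol`.
/// (Hypothetical test helper, not part of any existing API.)
pub fn check_importances(
    importances: &[f32],
    n_features: usize,
    tol: f32,
) -> Result<(), String> {
    if importances.len() != n_features {
        return Err(format!(
            "expected {} scores, got {}",
            n_features,
            importances.len()
        ));
    }
    if let Some(bad) = importances.iter().find(|v| !v.is_finite() || **v < 0.0) {
        return Err(format!("invalid importance value: {bad}"));
    }
    let sum: f32 = importances.iter().sum();
    if (sum - 1.0).abs() > tol {
        return Err(format!("importances sum to {sum}, expected 1.0"));
    }
    Ok(())
}
```

A reproducibility test could then fit two forests with the same random_state, run this check on both results, and compare the two vectors element by element.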

Examples:

  • Update examples/random_forest_iris.rs to show feature importance
  • Update examples/random_forest_regression.rs to show feature importance
  • Add visualization of top features
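
For the examples, the visualization of top features could be as simple as a text bar chart printed to the terminal. The sketch below assumes that approach; `top_features` and `render_bars` are hypothetical helpers, and the feature names and scores in the usage note are illustrative values, not real model output.

```rust
/// Pairs feature names with scores and returns the `k` highest-scoring pairs,
/// sorted from most to least important. (Hypothetical helper.)
pub fn top_features<'a>(
    names: &[&'a str],
    importances: &[f32],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut ranked: Vec<(&str, f32)> = names
        .iter()
        .copied()
        .zip(importances.iter().copied())
        .collect();
    // Descending order by score; total_cmp gives a total order on f32.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(k);
    ranked
}

/// Renders one line per feature: name, score, and a bar of `#` characters
/// whose length is the score scaled to `width`. (Hypothetical helper.)
pub fn render_bars(ranked: &[(&str, f32)], width: usize) -> String {
    let mut out = String::new();
    for (name, score) in ranked {
        let filled = (score.clamp(0.0, 1.0) * width as f32).round() as usize;
        out.push_str(&format!("{name:<14} {score:.3} {}\n", "#".repeat(filled)));
    }
    out
}
```

For instance, calling `top_features(&["sepal_len", "sepal_wid", "petal_len", "petal_wid"], &[0.10, 0.05, 0.45, 0.40], 2)` and passing the result to `render_bars` with a width of 20 yields two lines, with `petal_len` first.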

Documentation:

  • Add theory section to book/src/ml-fundamentals/ensemble-methods.md
  • Explain Gini importance vs permutation importance
  • Describe when to use feature importance for model interpretation

Acceptance Criteria

  • feature_importances() method for RandomForestClassifier
  • feature_importances() method for RandomForestRegressor
  • Importance scores sum to 1.0 (within floating-point tolerance)
  • 8+ comprehensive tests
  • Updated examples showing feature importance
  • Documentation with theory and usage
  • Zero clippy warnings
  • All tests pass

References

  • Breiman (2001): Random Forests - Section on variable importance
  • sklearn RandomForestClassifier.feature_importances_

Labels: enhancement
