Overview
Implement Decision Tree Regressor using the CART (Classification and Regression Trees) algorithm for regression tasks.
Background
We currently have DecisionTreeClassifier for classification. This issue extends the decision tree implementation to support regression by predicting continuous values instead of discrete classes.
Implementation Details
Core Algorithm
- Splitting criterion: Mean Squared Error (MSE) or Mean Absolute Error (MAE)
- Prediction: Mean of target values in leaf nodes
- Tree structure: Reuse existing
TreeNode structure or extend for regression
API Design
pub struct DecisionTreeRegressor {
max_depth: Option<usize>,
min_samples_split: usize,
min_samples_leaf: usize,
criterion: RegressionCriterion, // MSE or MAE
random_state: Option<u64>,
}
pub enum RegressionCriterion {
MeanSquaredError,
MeanAbsoluteError,
}
impl Estimator for DecisionTreeRegressor {
fn fit(&mut self, x: &Matrix, y: &Vector) -> Result<(), &'static str>;
fn predict(&self, x: &Matrix) -> Vector;
fn score(&self, x: &Matrix, y: &Vector) -> f32; // R² score
}
Key Features
- Configurable max_depth (prevent overfitting)
- min_samples_split and min_samples_leaf (pruning parameters)
- Support for MSE and MAE splitting criteria
- Reproducible with random_state for tie-breaking
- feature_importances() method
Acceptance Criteria (EXTREME TDD)
Tests (15+ comprehensive tests)
Examples
Documentation
Technical Design
MSE Criterion
For a split at value v on feature f:
MSE(split) = (N_left / N) * Var(y_left) + (N_right / N) * Var(y_right)
Choose split that minimizes MSE.
Prediction
For a new sample, traverse tree to leaf node and return:
prediction = mean(y_samples_in_leaf)
Feature Importance
importance(f) = Σ (weighted_samples * improvement_in_MSE) for all splits on feature f
Complexity Analysis
- Training: O(n · d · n log n) - n samples, d features
- Prediction: O(log n) average, O(n) worst case (unbalanced tree)
- Space: O(n) for tree structure
Benefits
- Non-linear regression: Captures non-linear relationships without feature engineering
- Interpretability: Tree structure is human-readable
- Feature importance: Automatically identifies important features
- No assumptions: Works without assuming data distribution
- Foundation for RF: Required for Random Forest Regression
Testing Strategy
- Unit tests for splitting criteria
- Integration tests with known datasets
- Property tests: predictions within range of training data
- Comparison tests: matches sklearn on simple examples
Related Work
- Existing:
DecisionTreeClassifier (src/tree/mod.rs)
- Future:
RandomForestRegressor (Issue #TBD)
- Depends on: Regression metrics (R², MSE) - already implemented
Priority
High - First step toward v0.5.0 "Regression Trees & Advanced Ensemble Methods"
Overview
Implement Decision Tree Regressor using the CART (Classification and Regression Trees) algorithm for regression tasks.
Background
We currently have
DecisionTreeClassifierfor classification. This issue extends the decision tree implementation to support regression by predicting continuous values instead of discrete classes.Implementation Details
Core Algorithm
TreeNodestructure or extend for regressionAPI Design
Key Features
Acceptance Criteria (EXTREME TDD)
Tests (15+ comprehensive tests)
Examples
Documentation
Technical Design
MSE Criterion
For a split at value
von featuref:Choose split that minimizes MSE.
Prediction
For a new sample, traverse tree to leaf node and return:
Feature Importance
Complexity Analysis
Benefits
Testing Strategy
Related Work
DecisionTreeClassifier(src/tree/mod.rs)RandomForestRegressor(Issue #TBD)Priority
High - First step toward v0.5.0 "Regression Trees & Advanced Ensemble Methods"