Problem
When training LinearRegression with fewer samples than features (underdetermined system), the model fails with:
Matrix is not positive definite
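This happens because the normal equations are solved by Cholesky decomposition of X^T X, which requires that matrix to be positive definite. With n_samples < n_features, X^T X (n_features × n_features) is rank-deficient and therefore singular, so the factorization fails. Having n_samples ≥ n_features is necessary but not sufficient: the columns of X must also be linearly independent.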
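The rank argument behind this can be written out briefly. Here n and p are shorthand (not from the report) for n_samples and n_features:

```latex
X \in \mathbb{R}^{n \times p}, \qquad
\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le \min(n, p).

n < p \;\Longrightarrow\; \operatorname{rank}(X^\top X) \le n < p
\;\Longrightarrow\; X^\top X \in \mathbb{R}^{p \times p}
\text{ is singular (positive semidefinite, not positive definite).}
```

In the example from this report, n = 3 and p = 18, so X^T X is an 18 × 18 matrix of rank at most 3.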
Current Behavior
let x = Matrix::from_vec(3, 18, features)?; // 3 samples, 18 features
let y = Vector::from_vec(vec![1.0, 0.0, 1.0]);
let mut model = LinearRegression::new();
model.fit(&x, &y)?; // ERROR: Matrix is not positive definite
Desired Behavior
Option 1: Ridge Regression (L2 Regularization)
Add regularization to make the system solvable:
let model = LinearRegression::new().with_regularization(0.01);
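As a rough illustration of why regularization helps, the following is a minimal dependency-free sketch of a ridge solve. It is not aprender's implementation: `ridge_fit`, its row-major slice layout, and its `String` errors are hypothetical names chosen for this example, and `with_regularization` above is only the proposed API. The sketch factors X^T X + lambda * I with Cholesky, which is positive definite for any lambda > 0.

```rust
/// Hypothetical helper (not part of aprender): solves (X^T X + lambda * I) w = X^T y.
/// `x` is row-major with `n` samples (rows) and `p` features (columns).
fn ridge_fit(x: &[f64], n: usize, p: usize, y: &[f64], lambda: f64) -> Result<Vec<f64>, String> {
    if x.len() != n * p || y.len() != n {
        return Err("dimension mismatch".to_string());
    }
    // Accumulate A = X^T X + lambda * I (p x p) and b = X^T y.
    let mut a = vec![0.0_f64; p * p];
    let mut b = vec![0.0_f64; p];
    for i in 0..n {
        let row = &x[i * p..(i + 1) * p];
        for j in 0..p {
            b[j] += row[j] * y[i];
            for k in 0..p {
                a[j * p + k] += row[j] * row[k];
            }
        }
    }
    for j in 0..p {
        a[j * p + j] += lambda;
    }
    // Cholesky factorization A = L L^T; fails when a pivot is not positive.
    let mut l = vec![0.0_f64; p * p];
    for i in 0..p {
        for j in 0..=i {
            let mut sum = a[i * p + j];
            for k in 0..j {
                sum -= l[i * p + k] * l[j * p + k];
            }
            if i == j {
                if sum <= 0.0 {
                    return Err("Matrix is not positive definite".to_string());
                }
                l[i * p + i] = sum.sqrt();
            } else {
                l[i * p + j] = sum / l[j * p + j];
            }
        }
    }
    // Forward substitution: L z = b.
    let mut z = vec![0.0_f64; p];
    for i in 0..p {
        let mut sum = b[i];
        for k in 0..i {
            sum -= l[i * p + k] * z[k];
        }
        z[i] = sum / l[i * p + i];
    }
    // Back substitution: L^T w = z.
    let mut w = vec![0.0_f64; p];
    for i in (0..p).rev() {
        let mut sum = z[i];
        for k in (i + 1)..p {
            sum -= l[k * p + i] * w[k];
        }
        w[i] = sum / l[i * p + i];
    }
    Ok(w)
}
```

With a single sample x = [1, 1] and y = [3], calling `ridge_fit` with lambda = 0.0 hits the non-positive pivot and returns the error, while lambda = 1.0 solves the regularized system. The choice of lambda trades bias for stability, so it would likely need tuning per dataset.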
Option 2: Graceful Error Messages
Return a clear error explaining the constraint:
Err("LinearRegression requires n_samples >= n_features (got 3 samples, 18 features). Consider using Ridge regression or collecting more training data.")
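One possible shape for this check is sketched below. The `UnderdeterminedError` type and `check_shape` function are hypothetical names for illustration and do not reflect aprender's actual error types. A typed error keeps the sample and feature counts available to callers instead of only embedding them in a string.

```rust
use std::fmt;

/// Hypothetical error type carrying the offending dimensions.
#[derive(Debug, PartialEq)]
struct UnderdeterminedError {
    n_samples: usize,
    n_features: usize,
}

impl fmt::Display for UnderdeterminedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "LinearRegression requires n_samples >= n_features (got {} samples, {} features). \
             Consider using Ridge regression or collecting more training data.",
            self.n_samples, self.n_features
        )
    }
}

impl std::error::Error for UnderdeterminedError {}

/// Hypothetical pre-fit check: rejects underdetermined systems before any factorization.
fn check_shape(n_samples: usize, n_features: usize) -> Result<(), UnderdeterminedError> {
    if n_samples < n_features {
        return Err(UnderdeterminedError { n_samples, n_features });
    }
    Ok(())
}
```

A check like this only catches the shape constraint. A matrix with enough samples but linearly dependent columns would still need the existing positive-definiteness error.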
Option 3: Pseudo-inverse (SVD-based)
Use SVD-based Moore-Penrose pseudo-inverse instead of Cholesky decomposition for underdetermined systems.
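For reference, the standard definition of the pseudo-inverse solution is given below, using the thin SVD of X. The symbols U, Σ, V, and the hat on w are notation introduced here, not taken from aprender.

```latex
X = U \Sigma V^\top, \qquad
X^{+} = V \Sigma^{+} U^\top, \qquad
\hat{w} = X^{+} y,

\text{where } \Sigma^{+} \text{ inverts the nonzero singular values and leaves the zeros in place.}

\text{If } X \text{ has full row rank } (n < p): \quad
X^{+} = X^\top (X X^\top)^{-1}, \qquad X \hat{w} = y .
```

Among all least-squares minimizers, this choice of weights has the smallest Euclidean norm. In the underdetermined full-row-rank case it fits the training data exactly, so it gives a unique answer but does not regularize against overfitting the way Option 1 does.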
Use Case
Real-world ML applications often start with small datasets and need graceful handling:
- Early-stage training with limited data
- Cross-validation with small folds
- Incremental learning scenarios
Impact
This affects the migration of PMAT's mutation testing ML predictor from linfa to aprender. PMAT currently falls back to a statistical baseline when model training fails.
References
- scikit-learn handles this via the Ridge estimator with its alpha parameter
- linfa-linear has a ridge parameter for regularization
- A pure Rust implementation could use nalgebra's SVD for the pseudo-inverse