LinearRegression: Support for small sample sizes or graceful degradation #4

## Problem

When training LinearRegression with fewer samples than features (underdetermined system), the model fails with:
```
Matrix is not positive definite
```

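This happens because the normal equations are solved via Cholesky decomposition of X^T X, which requires that matrix to be positive definite. With n_samples < n_features, X^T X has rank at most n_samples, so it is singular and the factorization fails. Note that n_samples ≥ n_features is necessary but not sufficient: collinear features also make X^T X singular.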
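
The failure can be reproduced without any library. The following is a minimal sketch using plain `Vec<Vec<f64>>` matrices; the helper names `gram` and `cholesky` and the `1e-12` pivot tolerance are assumptions made for illustration and are not aprender's internals.

```rust
/// Compute the Gram matrix X^T X for a row-major matrix
/// (one inner Vec per sample, one entry per feature).
fn gram(x: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let p = x.first().map_or(0, |row| row.len());
    let mut g = vec![vec![0.0_f64; p]; p];
    for row in x {
        for i in 0..p {
            for j in 0..p {
                g[i][j] += row[i] * row[j];
            }
        }
    }
    g
}

/// Attempt the factorization A = L L^T.
/// Fails as soon as a diagonal pivot is not strictly positive.
fn cholesky(a: &[Vec<f64>]) -> Result<Vec<Vec<f64>>, String> {
    let n = a.len();
    let mut l = vec![vec![0.0_f64; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let mut sum = a[i][j];
            for k in 0..j {
                sum -= l[i][k] * l[j][k];
            }
            if i == j {
                // Illustrative absolute tolerance for a vanishing pivot.
                if sum <= 1e-12 {
                    return Err("Matrix is not positive definite".to_string());
                }
                l[i][j] = sum.sqrt();
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    Ok(l)
}
```

For a single sample with two features, `cholesky(&gram(&[vec![1.0, 2.0]]))` returns the error, because the second pivot is 4 - 2·2 = 0.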

## Current Behavior

```rust
let x = Matrix::from_vec(3, 18, features)?; // 3 samples, 18 features
let y = Vector::from_vec(vec![1.0, 0.0, 1.0]);
let mut model = LinearRegression::new();
model.fit(&x, &y)?; // ERROR: Matrix is not positive definite
```

## Desired Behavior

### Option 1: Ridge Regression (L2 Regularization)
Add regularization to make the system solvable:
```rust
let model = LinearRegression::new().with_regularization(0.01);
```
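
To show why regularization makes the system solvable, here is a rough sketch of what a ridge fit could do internally: add `lambda` to the diagonal of X^T X before the Cholesky solve. The function name `ridge_fit` is hypothetical, the sketch fits no intercept, and it does not reflect how aprender is actually implemented.

```rust
/// Solve (X^T X + lambda I) w = X^T y by Cholesky factorization.
/// `x` is row-major: one inner Vec per sample. No intercept is fitted.
fn ridge_fit(x: &[Vec<f64>], y: &[f64], lambda: f64) -> Result<Vec<f64>, String> {
    if x.len() != y.len() {
        return Err("x and y must have the same number of samples".to_string());
    }
    let p = x.first().map_or(0, |row| row.len());

    // Build A = X^T X + lambda I and b = X^T y.
    let mut a = vec![vec![0.0_f64; p]; p];
    let mut b = vec![0.0_f64; p];
    for (row, &target) in x.iter().zip(y) {
        for i in 0..p {
            b[i] += row[i] * target;
            for j in 0..p {
                a[i][j] += row[i] * row[j];
            }
        }
    }
    for i in 0..p {
        a[i][i] += lambda;
    }

    // Cholesky factorization A = L L^T.
    let mut l = vec![vec![0.0_f64; p]; p];
    for i in 0..p {
        for j in 0..=i {
            let mut sum = a[i][j];
            for k in 0..j {
                sum -= l[i][k] * l[j][k];
            }
            if i == j {
                if sum <= 1e-12 {
                    return Err("Matrix is not positive definite".to_string());
                }
                l[i][j] = sum.sqrt();
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }

    // Forward substitution: L z = b.
    let mut z = vec![0.0_f64; p];
    for i in 0..p {
        let mut sum = b[i];
        for k in 0..i {
            sum -= l[i][k] * z[k];
        }
        z[i] = sum / l[i][i];
    }

    // Back substitution: L^T w = z.
    let mut w = vec![0.0_f64; p];
    for i in (0..p).rev() {
        let mut sum = z[i];
        for k in (i + 1)..p {
            sum -= l[k][i] * w[k];
        }
        w[i] = sum / l[i][i];
    }
    Ok(w)
}
```

With `lambda = 0.0` this reduces to the current failing path for wide inputs; any `lambda > 0` makes X^T X + lambda I positive definite in exact arithmetic, although a very small value can still be numerically fragile.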

### Option 2: Graceful Error Messages
Return a clear error explaining the constraint:
```rust
Err("LinearRegression requires n_samples >= n_features (got 3 samples, 18 features). Consider using Ridge regression or collecting more training data.")
```
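
One possible shape for this check is a typed error raised before any factorization is attempted. The names `FitError` and `check_dimensions` below are hypothetical and only illustrate the idea; aprender's real error type may look different.

```rust
/// Hypothetical error type for illustration; not an existing aprender type.
#[derive(Debug, PartialEq)]
enum FitError {
    Underdetermined { n_samples: usize, n_features: usize },
}

impl std::fmt::Display for FitError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            FitError::Underdetermined { n_samples, n_features } => write!(
                f,
                "LinearRegression requires n_samples >= n_features \
                 (got {} samples, {} features). Consider using Ridge \
                 regression or collecting more training data.",
                n_samples, n_features
            ),
        }
    }
}

impl std::error::Error for FitError {}

/// Reject underdetermined inputs up front, before the Cholesky solve.
fn check_dimensions(n_samples: usize, n_features: usize) -> Result<(), FitError> {
    if n_samples < n_features {
        return Err(FitError::Underdetermined { n_samples, n_features });
    }
    Ok(())
}
```

A structured variant like this lets callers match on the failure (for example, to fall back to a baseline) instead of parsing an error string.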

### Option 3: Pseudo-inverse (SVD-based)
Use SVD-based Moore-Penrose pseudo-inverse instead of Cholesky decomposition for underdetermined systems.
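
A full SVD is too long to sketch here, so the example below swaps in a simpler technique: when the rows of X are linearly independent, the pseudo-inverse solution has the closed form w = X^T (X X^T)^{-1} y, the minimum-norm weights that fit the samples exactly. This is not SVD-based and fails on rank-deficient input, which an SVD would handle. The name `min_norm_fit` and the `1e-12` tolerance are illustrative assumptions.

```rust
/// Minimum-norm solution for n_samples < n_features:
/// w = X^T (X X^T)^{-1} y. Assumes linearly independent rows.
fn min_norm_fit(x: &[Vec<f64>], y: &[f64]) -> Result<Vec<f64>, String> {
    let n = x.len();
    if n != y.len() {
        return Err("x and y must have the same number of samples".to_string());
    }
    let p = x.first().map_or(0, |row| row.len());

    // K = X X^T is only n x n, which is small when samples are scarce.
    let mut k = vec![vec![0.0_f64; n]; n];
    for i in 0..n {
        for j in 0..n {
            k[i][j] = x[i].iter().zip(&x[j]).map(|(a, b)| a * b).sum::<f64>();
        }
    }

    // Solve K alpha = y by Gaussian elimination with partial pivoting.
    let mut alpha = y.to_vec();
    for col in 0..n {
        let mut pivot = col;
        for r in (col + 1)..n {
            if k[r][col].abs() > k[pivot][col].abs() {
                pivot = r;
            }
        }
        // Illustrative absolute tolerance for a vanishing pivot.
        if k[pivot][col].abs() < 1e-12 {
            return Err("rows of X are linearly dependent".to_string());
        }
        k.swap(col, pivot);
        alpha.swap(col, pivot);
        for r in (col + 1)..n {
            let factor = k[r][col] / k[col][col];
            for c in col..n {
                let pivot_row_value = k[col][c];
                k[r][c] -= factor * pivot_row_value;
            }
            let alpha_col = alpha[col];
            alpha[r] -= factor * alpha_col;
        }
    }
    // Back substitution on the upper-triangular system.
    for i in (0..n).rev() {
        let mut sum = alpha[i];
        for c in (i + 1)..n {
            sum -= k[i][c] * alpha[c];
        }
        alpha[i] = sum / k[i][i];
    }

    // w = X^T alpha.
    let mut w = vec![0.0_f64; p];
    for (row, &a) in x.iter().zip(&alpha) {
        for j in 0..p {
            w[j] += a * row[j];
        }
    }
    Ok(w)
}
```

Unlike ridge, this interpolates the training targets exactly, which may overfit badly with as few as 3 samples and 18 features.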

## Use Case

Real-world ML applications often start with small datasets and need graceful handling:
- Early-stage training with limited data
- Cross-validation with small folds
- Incremental learning scenarios

## Impact

This affects the migration of PMAT's mutation testing ML predictor from linfa to aprender. The predictor currently falls back to a statistical baseline whenever model training fails.

## References

- scikit-learn handles this via the `Ridge` estimator and its `alpha` parameter
- linfa-linear has `ridge` parameter for regularization
- Pure Rust implementation could use nalgebra's SVD for pseudo-inverse
