Problem Statement
LOF (Local Outlier Factor) detects anomalies by measuring how much a point's local density deviates from that of its neighbors. Unlike global methods, it can find outliers in data whose density varies from region to region. aprender does not currently implement it.
Advantages:
- Detects local outliers (not just global)
- Works with varying density clusters
- No global distance or density threshold needed
- Intuitive score interpretation
Use Cases:
- Fraud detection in transactions
- Network intrusion detection
- Sensor anomaly detection
- Quality control with varying densities
Proposed Solution
Implement LOF following the sklearn API (sklearn.neighbors.LocalOutlierFactor), using EXTREME TDD.
Algorithm
Core Idea: Compare each point's local density to the local densities of its neighbors
Steps:
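- For each point p:
  - Find its k nearest neighbors N_k(p)
  - Compute the reachability distance to each neighbor o: reach_dist(p, o) = max(k_distance(o), d(p, o))
  - Compute the local reachability density: LRD(p) = 1 / mean(reach_dist(p, o) for o in N_k(p))
- Compute the LOF score:
  - LOF(p) = mean(LRD(o) for o in N_k(p)) / LRD(p)
  - LOF ≈ 1: similar density to neighbors (normal)
  - LOF < 1: higher density than neighbors (inlier)
  - LOF >> 1: lower density than neighbors (outlier)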
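The steps above can be sketched as a brute-force computation. This is only an illustration under stated assumptions: it works on plain `Vec<Vec<f32>>` rows rather than aprender's `Matrix<f32>`, uses Euclidean distance, and the names `euclidean` and `lof_scores` are hypothetical, not part of the proposed API. A real implementation would likely replace the O(n²) neighbor search.

```rust
/// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Brute-force LOF score for every row of `data`, using `k` neighbors.
/// Hypothetical helper: assumes 1 <= k < data.len() and equal-length rows.
fn lof_scores(data: &[Vec<f32>], k: usize) -> Vec<f32> {
    let n = data.len();
    assert!(k >= 1 && k < n, "k must satisfy 1 <= k < n");

    // Step 1: the k nearest neighbors of each point as (index, distance),
    // sorted by increasing distance.
    let neighbors: Vec<Vec<(usize, f32)>> = (0..n)
        .map(|i| {
            let mut d: Vec<(usize, f32)> = (0..n)
                .filter(|&j| j != i)
                .map(|j| (j, euclidean(&data[i], &data[j])))
                .collect();
            d.sort_by(|a, b| a.1.total_cmp(&b.1));
            d.truncate(k);
            d
        })
        .collect();

    // k-distance of each point: distance to its k-th nearest neighbor.
    let k_dist: Vec<f32> = neighbors.iter().map(|nb| nb[k - 1].1).collect();

    // Steps 2-3: LRD(p) = 1 / mean over neighbors o of max(k_dist(o), d(p, o)).
    let lrd: Vec<f32> = neighbors
        .iter()
        .map(|nb| {
            let sum: f32 = nb.iter().map(|&(o, d)| d.max(k_dist[o])).sum();
            // Guard against division by zero when points coincide.
            k as f32 / sum.max(f32::EPSILON)
        })
        .collect();

    // Step 4: LOF(p) = mean LRD of p's neighbors divided by LRD(p).
    (0..n)
        .map(|i| {
            let mean_nb =
                neighbors[i].iter().map(|&(o, _)| lrd[o]).sum::<f32>() / k as f32;
            mean_nb / lrd[i]
        })
        .collect()
}
```

For example, calling `lof_scores(&data, 2)` on the four corners of a unit square plus one point at (10, 10) gives a score of 1.0 for each corner and roughly 13 for the isolated point.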
Implementation
API Design:
pub struct LocalOutlierFactor {
    n_neighbors: usize,
    contamination: f32,           // Expected proportion of outliers
    lof_scores: Option<Vec<f32>>, // Set by fit(); None before fitting
}

impl LocalOutlierFactor {
    pub fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str>;
    pub fn predict(&self, x: &Matrix<f32>) -> Vec<i32>;       // 1 = normal, -1 = anomaly
    pub fn score_samples(&self, x: &Matrix<f32>) -> Vec<f32>; // LOF scores
    pub fn negative_outlier_factor(&self) -> &[f32];          // Negated LOF scores (-LOF), as in sklearn's negative_outlier_factor_
}
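One way `contamination` could turn LOF scores into the 1 / -1 labels returned by `predict` is sketched below. The function name `labels_from_contamination`, the (0, 0.5] range check, and the rule for handling ties are assumptions made for illustration; the issue does not specify them.

```rust
/// Labels each LOF score as 1 (normal) or -1 (anomaly) so that roughly a
/// `contamination` fraction of points, those with the highest scores, is flagged.
/// Hypothetical helper, not part of the proposed API.
fn labels_from_contamination(scores: &[f32], contamination: f32) -> Vec<i32> {
    // Assumed valid range, mirroring sklearn's (0, 0.5].
    assert!(
        contamination > 0.0 && contamination <= 0.5,
        "contamination must be in (0, 0.5]"
    );
    if scores.is_empty() {
        return Vec::new();
    }

    // Number of points to flag; at least 1 because contamination > 0.
    let n_outliers = ((scores.len() as f32) * contamination).ceil() as usize;

    // The threshold is the n_outliers-th largest score.
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a)); // descending order
    let threshold = sorted[n_outliers - 1];

    // Scores tied with the threshold are also flagged.
    scores
        .iter()
        .map(|&s| if s >= threshold { -1 } else { 1 })
        .collect()
}
```

With scores `[1.0, 1.1, 0.9, 5.0]` and `contamination = 0.25`, one point is flagged and the result is `[1, 1, 1, -1]`.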
Success Criteria
- ✅ LOF with k-NN and reachability distance
- ✅ fit/predict/score_samples methods
- ✅ contamination parameter for threshold
- ✅ 12+ tests (including varying density clusters)
- ✅ Zero clippy warnings
- ✅ Example: examples/lof_anomaly.rs
Estimated Effort
Timeline: 3-4 days
Complexity: Medium (k-NN search, density calculation)