Implement Local Outlier Factor (LOF) for Anomaly Detection #20

@noahgift

Problem Statement

Local Outlier Factor (LOF) detects anomalies by measuring how much a point's local density deviates from the density around its neighbors. Unlike global methods, it can find outliers in data whose density varies from region to region. aprender does not currently implement it.

Advantages:

  • Detects local outliers (not just global)
  • Works with varying density clusters
  • No global distance threshold needed (scores are relative to each point's neighborhood)
  • Intuitive score interpretation

Use Cases:

  • Fraud detection in transactions
  • Network intrusion detection
  • Sensor anomaly detection
  • Quality control with varying densities

Proposed Solution

Implement LOF following the sklearn API, developed with EXTREME TDD.

Algorithm

Core Idea: Compare local density to neighbors' densities

Steps:

  1. For each point p:
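    • Find its k nearest neighbors N_k(p)
    • Compute the reachability distance to each neighbor o: reach-dist(p, o) = max(k-distance(o), d(p, o)), where k-distance(o) is the distance from o to its k-th nearest neighbor
    • Compute local reachability density: LRD(p) = 1 / avg(reach-dist(p, o)) over o in N_k(p)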
  2. Compute LOF score:
    • LOF(p) = avg(LRD(neighbor)) / LRD(p)
    • LOF ≈ 1: similar density to neighbors (normal)
    • LOF >> 1: lower density than neighbors (outlier)
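
As a rough illustration of the steps above, here is a minimal brute-force sketch. It assumes plain `Vec<Vec<f32>>` rows and Euclidean distance instead of aprender's `Matrix<f32>`, and the names `euclidean` and `lof_scores` are hypothetical, not existing aprender functions.

```rust
/// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Brute-force LOF score for every row of `data` (hypothetical helper).
/// Computes all pairwise distances, so it is O(n^2); requires 1 <= k < n.
fn lof_scores(data: &[Vec<f32>], k: usize) -> Vec<f32> {
    let n = data.len();
    assert!(k >= 1 && k < n, "k must satisfy 1 <= k < n");

    // Step 1a: k nearest neighbors of each point as (index, distance), nearest first.
    let neighbors: Vec<Vec<(usize, f32)>> = (0..n)
        .map(|i| {
            let mut d: Vec<(usize, f32)> = (0..n)
                .filter(|&j| j != i)
                .map(|j| (j, euclidean(&data[i], &data[j])))
                .collect();
            d.sort_by(|a, b| a.1.total_cmp(&b.1));
            d.truncate(k);
            d
        })
        .collect();

    // k-distance(o): distance from o to its k-th nearest neighbor.
    let k_dist: Vec<f32> = neighbors.iter().map(|nb| nb[k - 1].1).collect();

    // Steps 1b and 1c: LRD(p) = 1 / mean over neighbors o of max(k-distance(o), d(p, o)).
    let lrd: Vec<f32> = neighbors
        .iter()
        .map(|nb| {
            let mean_reach =
                nb.iter().map(|&(o, d)| d.max(k_dist[o])).sum::<f32>() / k as f32;
            // Guard against division by zero when points are duplicated.
            1.0 / mean_reach.max(f32::EPSILON)
        })
        .collect();

    // Step 2: LOF(p) = mean(LRD of p's neighbors) / LRD(p).
    (0..n)
        .map(|p| {
            let mean_nb_lrd =
                neighbors[p].iter().map(|&(o, _)| lrd[o]).sum::<f32>() / k as f32;
            mean_nb_lrd / lrd[p]
        })
        .collect()
}
```

For four points on a unit square plus one point far away, with k = 2, this sketch gives scores of about 1 for the square's corners and a much larger score for the distant point. A real implementation would likely replace the brute-force neighbor search with a spatial index.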

Implementation

API Design:

pub struct LocalOutlierFactor {
    n_neighbors: usize,
    contamination: f32,  // Expected proportion of outliers
    lof_scores: Option<Vec<f32>>,
}

// Signatures only; method bodies are omitted in this design sketch.
impl LocalOutlierFactor {
    pub fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str>;
    pub fn predict(&self, x: &Matrix<f32>) -> Vec<i32>;  // 1=normal, -1=anomaly
    pub fn score_samples(&self, x: &Matrix<f32>) -> Vec<f32>;  // LOF scores
    pub fn negative_outlier_factor(&self) -> &[f32];  // Negated LOF scores of the fitted samples (lower = more abnormal)
}
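
One way the `contamination` parameter could turn LOF scores into the 1 / -1 labels returned by `predict` is sketched below. This is only an assumption about the intended behavior: the helper names `labels_from_contamination` and `negate_scores` are hypothetical, and the rounding and tie handling may differ from what sklearn or the final aprender implementation does.

```rust
/// Label each score: 1 = normal, -1 = anomaly (hypothetical helper).
/// Roughly the `contamination` fraction of points with the highest LOF
/// scores is flagged; ties at the threshold are all flagged.
fn labels_from_contamination(lof: &[f32], contamination: f32) -> Vec<i32> {
    assert!(
        (0.0..=0.5).contains(&contamination),
        "contamination must be in [0, 0.5]"
    );
    // Number of points expected to be outliers.
    let n_outliers = (lof.len() as f32 * contamination).round() as usize;
    if n_outliers == 0 {
        return vec![1; lof.len()];
    }
    // Sort a copy in descending order and take the n-th largest score as threshold.
    let mut sorted = lof.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a));
    let threshold = sorted[n_outliers - 1];
    lof.iter().map(|&s| if s >= threshold { -1 } else { 1 }).collect()
}

/// Negated LOF scores, so that lower values mean more abnormal (hypothetical helper).
fn negate_scores(lof: &[f32]) -> Vec<f32> {
    lof.iter().map(|s| -s).collect()
}
```

With five scores and `contamination = 0.2`, exactly one point (the highest score) is labeled -1, assuming no ties at the threshold.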

Success Criteria

  • ✅ LOF with k-NN and reachability distance
  • ✅ fit/predict/score_samples methods
  • ✅ contamination parameter for threshold
  • ✅ 12+ tests (including varying density clusters)
  • ✅ Zero clippy warnings
  • ✅ Example: examples/lof_anomaly.rs
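
For the varying-density test case, a deterministic fixture along the following lines might be useful. It is a sketch only: `varying_density_fixture` and `nearest_distance` are hypothetical names, and the grid sizes and spacings are arbitrary choices. The fixture places one point 1.0 away from a dense cluster whose spacing is 0.1, while a separate sparse cluster has spacing 2.0, so a single global distance cutoff cannot flag the outlier without also flagging the whole sparse cluster.

```rust
/// Hypothetical test fixture: a dense cluster, a sparse cluster, and one
/// local outlier just outside the dense cluster.
/// Returns the points and the index of the local outlier.
fn varying_density_fixture() -> (Vec<Vec<f32>>, usize) {
    let mut points = Vec::new();
    // Dense 5x5 grid near the origin, spacing 0.1.
    for i in 0..5 {
        for j in 0..5 {
            points.push(vec![i as f32 * 0.1, j as f32 * 0.1]);
        }
    }
    // Sparse 5x5 grid far away, spacing 2.0.
    for i in 0..5 {
        for j in 0..5 {
            points.push(vec![20.0 + i as f32 * 2.0, 20.0 + j as f32 * 2.0]);
        }
    }
    // Local outlier: 1.0 away from the dense cluster's corner at (0.4, 0.4).
    points.push(vec![1.4, 0.4]);
    let outlier_idx = points.len() - 1;
    (points, outlier_idx)
}

/// Euclidean distance from point `i` to its nearest other point.
fn nearest_distance(points: &[Vec<f32>], i: usize) -> f32 {
    points
        .iter()
        .enumerate()
        .filter(|&(j, _)| j != i)
        .map(|(_, q)| {
            points[i]
                .iter()
                .zip(q)
                .map(|(a, b)| (a - b).powi(2))
                .sum::<f32>()
                .sqrt()
        })
        .fold(f32::INFINITY, f32::min)
}
```

A test built on this fixture could then assert that the LOF score at the returned index is well above 1 while the sparse-cluster points stay close to 1.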

Estimated Effort

Timeline: 3-4 days
Complexity: Medium (k-NN search, density calculation)

Labels: enhancement (New feature or request)
