Problem Statement
Spectral Clustering uses graph theory and eigenvalue decomposition to find clusters in non-convex shapes. Currently missing from aprender.
Advantages over K-Means:
- Handles non-convex clusters
- Based on graph connectivity
- Works with similarity matrices
- Captures manifold structure
Use Cases:
- Image segmentation
- Community detection in networks
- Clustering on graphs
- Non-linear cluster shapes
Proposed Solution
Implement Spectral Clustering following sklearn API with EXTREME TDD.
Algorithm
Steps:
- Construct similarity graph (k-NN or ε-neighborhood)
- Build affinity matrix W (Gaussian similarity)
- Compute graph Laplacian: L = D - W
- Eigendecomposition: find k smallest eigenvectors
- Cluster in eigenspace using K-Means
Implementation
Trait: UnsupervisedEstimator
API Design:
pub struct SpectralClustering {
n_clusters: usize,
affinity: Affinity, // RBF, KNN, Precomputed
gamma: f32, // Kernel coefficient for RBF
labels: Option<Vec<usize>>,
}
pub enum Affinity {
RBF, // Gaussian kernel
KNN, // k-nearest neighbors
Precomputed, // User-provided affinity matrix
}
impl UnsupervisedEstimator for SpectralClustering {
type Labels = Vec<usize>;
fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str>;
fn predict(&self, x: &Matrix<f32>) -> Vec<usize>;
}
Success Criteria
- ✅ SpectralClustering with graph Laplacian
- ✅ Multiple affinity types (RBF, KNN)
- ✅ Eigenvalue solver integration
- ✅ 12+ tests (including non-convex shapes)
- ✅ Zero clippy warnings
- ✅ Example: examples/spectral_clustering.rs
Estimated Effort
Timeline: 4-5 days
Complexity: High (eigendecomposition, graph construction)
Problem Statement
Spectral Clustering uses graph theory and eigenvalue decomposition to find clusters in non-convex shapes. Currently missing from aprender.
Advantages over K-Means:
Use Cases:
Proposed Solution
Implement Spectral Clustering following sklearn API with EXTREME TDD.
Algorithm
Steps:
Implementation
Trait:
UnsupervisedEstimatorAPI Design:
Success Criteria
Estimated Effort
Timeline: 4-5 days
Complexity: High (eigendecomposition, graph construction)