Problem Statement
t-SNE (t-Distributed Stochastic Neighbor Embedding) is widely used for visualizing high-dimensional data in 2D or 3D. It is currently missing from aprender.
Use Cases:
- Visualize MNIST digits
- Explore embeddings (word vectors, image features)
- Cluster visualization
- Exploratory data analysis
Advantages:
- Preserves local structure better than PCA
- Reveals clusters visually
- Non-linear dimensionality reduction
Proposed Solution
Implement t-SNE following the sklearn API, using EXTREME TDD.
Algorithm
Core Idea: Preserve pairwise similarities in lower dimensions
Steps:
- Compute pairwise similarities in high-D (Gaussian)
- Initialize low-D embedding (random or PCA)
- Compute pairwise similarities in low-D (Student's t)
- Minimize KL divergence via gradient descent
- Use early exaggeration for better separation
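The steps above can be sketched in plain Rust as follows. This is a minimal illustration, not the proposed aprender implementation: it uses `Vec<Vec<f32>>` instead of `Matrix<f32>`, a single fixed `sigma` instead of a per-point bandwidth tuned to the perplexity, and plain gradient descent without momentum. All function names here are hypothetical, and the input is assumed to contain at least two distinct points.

```rust
/// Squared Euclidean distance between two points.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Step 1 (simplified): symmetric Gaussian affinities P with one fixed sigma.
/// A full implementation would tune sigma per point to match the perplexity.
fn high_dim_affinities(x: &[Vec<f32>], sigma: f32) -> Vec<Vec<f32>> {
    let n = x.len();
    let mut cond = vec![vec![0.0f32; n]; n];
    for i in 0..n {
        let mut row_sum = 0.0f32;
        for j in 0..n {
            if i != j {
                cond[i][j] = (-sq_dist(&x[i], &x[j]) / (2.0 * sigma * sigma)).exp();
                row_sum += cond[i][j];
            }
        }
        for j in 0..n {
            cond[i][j] /= row_sum; // conditional p_{j|i}; each row sums to 1
        }
    }
    // Symmetrize: p_ij = (p_{j|i} + p_{i|j}) / (2n), so the whole matrix sums to 1.
    let mut p = vec![vec![0.0f32; n]; n];
    for i in 0..n {
        for j in 0..n {
            p[i][j] = (cond[i][j] + cond[j][i]) / (2.0 * n as f32);
        }
    }
    p
}

/// Step 3: Student-t (one degree of freedom) similarities in the embedding.
/// Returns the unnormalized kernel weights W and the normalized matrix Q.
fn low_dim_affinities(y: &[Vec<f32>]) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
    let n = y.len();
    let mut w = vec![vec![0.0f32; n]; n];
    let mut total = 0.0f32;
    for i in 0..n {
        for j in 0..n {
            if i != j {
                w[i][j] = 1.0 / (1.0 + sq_dist(&y[i], &y[j]));
                total += w[i][j];
            }
        }
    }
    let q: Vec<Vec<f32>> = w
        .iter()
        .map(|row| row.iter().map(|v| v / total).collect())
        .collect();
    (w, q)
}

/// KL(P || Q), the objective minimized in step 4.
fn kl_divergence(p: &[Vec<f32>], q: &[Vec<f32>]) -> f32 {
    let mut kl = 0.0f32;
    for (p_row, q_row) in p.iter().zip(q) {
        for (&pij, &qij) in p_row.iter().zip(q_row) {
            if pij > 0.0 && qij > 0.0 {
                kl += pij * (pij / qij).ln();
            }
        }
    }
    kl
}

/// Step 4: one plain gradient-descent update of the embedding `y`.
/// Step 5: `exaggeration` > 1 scales P, as done during early exaggeration.
fn gradient_step(p: &[Vec<f32>], y: &mut [Vec<f32>], lr: f32, exaggeration: f32) {
    let n = y.len();
    let dim = y[0].len();
    let (w, q) = low_dim_affinities(y);
    let mut grad = vec![vec![0.0f32; dim]; n];
    for i in 0..n {
        for j in 0..n {
            // dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
            let coeff = 4.0 * (exaggeration * p[i][j] - q[i][j]) * w[i][j];
            for d in 0..dim {
                grad[i][d] += coeff * (y[i][d] - y[j][d]);
            }
        }
    }
    for i in 0..n {
        for d in 0..dim {
            y[i][d] -= lr * grad[i][d];
        }
    }
}
```

Calling `gradient_step` in a loop, with `exaggeration` above 1 for the first iterations and 1.0 afterwards, gives the basic optimization schedule. Both affinity functions are O(n²) in time and memory, which is the cost noted under Estimated Effort.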
Implementation
Trait: Transformer
API Design:
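pub struct TSNE {
    n_components: usize,            // Output dimensionality, usually 2 or 3
    perplexity: f32,                // Balances local vs. global structure (typically 5-50)
    learning_rate: f32,             // Gradient descent step size
    n_iter: usize,                  // Number of optimization iterations
    embedding: Option<Matrix<f32>>, // Populated by fit
}
impl Transformer for TSNE {
    fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str> {
        todo!() // body omitted in this design sketch
    }
    fn transform(&self, x: &Matrix<f32>) -> Result<Matrix<f32>, &'static str> {
        todo!() // body omitted in this design sketch
    }
}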
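To illustrate what the `perplexity` field controls: in standard t-SNE, each point's Gaussian bandwidth is chosen by binary search so that the perplexity of its conditional neighbor distribution matches the requested value. The sketch below shows that search for a single point. The function names, the `beta = 1 / (2 * sigma^2)` parameterization, and the stopping parameters are assumptions made for illustration, not part of the proposed API.

```rust
/// Perplexity (2^H, with H the Shannon entropy in bits) of the conditional
/// distribution induced by precision `beta` = 1 / (2 * sigma^2) over the
/// squared distances from one point to all other points.
fn perplexity_for_beta(sq_dists: &[f32], beta: f32) -> f32 {
    let weights: Vec<f32> = sq_dists.iter().map(|d| (-beta * d).exp()).collect();
    let sum: f32 = weights.iter().sum();
    let entropy: f32 = weights
        .iter()
        .map(|w| w / sum)
        .filter(|p| *p > 0.0) // skip zero (or NaN) terms; 0 * log 0 is taken as 0
        .map(|p| -p * p.log2())
        .sum();
    entropy.exp2()
}

/// Binary search for the precision whose perplexity matches `target`.
/// Perplexity decreases as `beta` grows, which makes bisection applicable.
fn find_beta(sq_dists: &[f32], target: f32, max_iter: usize, tol: f32) -> f32 {
    let (mut lo, mut hi) = (0.0f32, f32::INFINITY);
    let mut beta = 1.0f32;
    for _ in 0..max_iter {
        let perp = perplexity_for_beta(sq_dists, beta);
        if (perp - target).abs() < tol {
            break;
        }
        if perp > target {
            // Distribution too flat: raise the precision (smaller sigma).
            lo = beta;
            beta = if hi.is_finite() { (lo + hi) / 2.0 } else { beta * 2.0 };
        } else {
            // Distribution too peaked: lower the precision (larger sigma).
            hi = beta;
            beta = (lo + hi) / 2.0;
        }
    }
    beta
}
```

A `fit` implementation would typically run this search once per row of the squared-distance matrix, excluding the point's distance to itself, before building the symmetric high-dimensional affinities.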
Success Criteria
- ✅ t-SNE with gradient descent optimization
- ✅ Perplexity parameter
- ✅ Early exaggeration
- ✅ 10+ tests
- ✅ Zero clippy warnings
- ✅ Example: examples/tsne_mnist.rs (visualize digits)
Estimated Effort
Timeline: 5-7 days
Complexity: High (non-convex optimization; pairwise distances take O(n²) time and memory)
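As a rough illustration of the O(n²) cost, the helper below (a hypothetical name, not part of aprender) computes the size of one dense n x n matrix of `f32` values. For the full 70,000-image MNIST set this comes to about 19.6 GB per matrix, so the example would likely need to run on a subsample unless a sparse or approximate variant is used.

```rust
/// Bytes needed to store one dense n x n matrix of f32 values,
/// e.g. the pairwise affinity matrix P or Q.
fn dense_pairwise_bytes(n: u64) -> u64 {
    n * n * std::mem::size_of::<f32>() as u64
}
```

For comparison, 10,000 points need 400 MB per matrix and 2,000 points need 16 MB.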