Implement t-SNE for Dimensionality Reduction and Visualization #18

@noahgift

Problem Statement

t-SNE (t-Distributed Stochastic Neighbor Embedding) is widely used for visualizing high-dimensional data in 2D or 3D. It is currently missing from aprender.

Use Cases:

  • Visualize MNIST digits
  • Explore embeddings (word vectors, image features)
  • Cluster visualization
  • Exploratory data analysis

Advantages:

  • Preserves local structure better than PCA
  • Reveals clusters visually
  • Non-linear dimensionality reduction

Proposed Solution

Implement t-SNE following the scikit-learn API, developed with EXTREME TDD.

Algorithm

Core Idea: Preserve pairwise similarities in lower dimensions
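
For reference, the standard t-SNE formulation can be written as below. The symbols are notation introduced here, not names from aprender: x_i are the input points, y_i their low-dimensional images, n the number of points, and sigma_i the per-point Gaussian bandwidth chosen to match the perplexity.

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
\frac{\partial C}{\partial y_i}
  = 4 \sum_{j} (p_{ij} - q_{ij}) (y_i - y_j) \left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}
```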

Steps:

  1. Compute pairwise similarities in high-D (Gaussian)
  2. Initialize low-D embedding (random or PCA)
  3. Compute pairwise similarities in low-D (Student's t)
  4. Minimize KL divergence via gradient descent
  5. Use early exaggeration for better separation
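
Steps 3 to 5 can be sketched in plain Rust as follows. This is a minimal illustration, not the proposed aprender implementation: it uses `Vec<Vec<f32>>` instead of aprender's `Matrix<f32>`, all function names are hypothetical, and it assumes at least two points and a symmetric `p` that sums to 1 with a zero diagonal. Early exaggeration is modeled as a factor that multiplies `p` in the gradient and is expected to be greater than 1 only during the first iterations.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Step 3: low-dimensional affinities from a Student's t kernel (1 degree of freedom).
/// Returns (q, w), where w[i][j] = 1 / (1 + ||y_i - y_j||^2) is the unnormalized
/// kernel and q is w normalized over all pairs i != j.
fn low_dim_affinities(y: &[Vec<f32>]) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
    let n = y.len();
    let mut w = vec![vec![0.0f32; n]; n];
    let mut total = 0.0f32;
    for i in 0..n {
        for j in 0..n {
            if i != j {
                w[i][j] = 1.0 / (1.0 + sq_dist(&y[i], &y[j]));
                total += w[i][j];
            }
        }
    }
    let q: Vec<Vec<f32>> = w
        .iter()
        .map(|row| row.iter().map(|v| v / total).collect())
        .collect();
    (q, w)
}

/// KL(P || Q) for the current embedding; entries with p = 0 contribute nothing.
fn kl_divergence(p: &[Vec<f32>], y: &[Vec<f32>]) -> f32 {
    let (q, _) = low_dim_affinities(y);
    let mut kl = 0.0f32;
    for (p_row, q_row) in p.iter().zip(&q) {
        for (&pij, &qij) in p_row.iter().zip(q_row) {
            if pij > 0.0 {
                kl += pij * (pij / qij.max(1e-12)).ln();
            }
        }
    }
    kl
}

/// Step 4: gradient of the KL divergence with respect to each embedded point.
/// Step 5: `exaggeration` scales p (use 1.0 for no exaggeration).
fn kl_gradient(p: &[Vec<f32>], y: &[Vec<f32>], exaggeration: f32) -> Vec<Vec<f32>> {
    let (q, w) = low_dim_affinities(y);
    let n = y.len();
    let dim = y[0].len();
    let mut grad = vec![vec![0.0f32; dim]; n];
    for i in 0..n {
        for j in 0..n {
            if i == j {
                continue;
            }
            let coeff = 4.0 * (exaggeration * p[i][j] - q[i][j]) * w[i][j];
            for (g, (a, b)) in grad[i].iter_mut().zip(y[i].iter().zip(&y[j])) {
                *g += coeff * (a - b);
            }
        }
    }
    grad
}

/// One plain gradient descent update (no momentum, which a full implementation
/// would probably add).
fn gradient_step(p: &[Vec<f32>], y: &mut [Vec<f32>], learning_rate: f32, exaggeration: f32) {
    let grad = kl_gradient(p, y, exaggeration);
    for (yi, gi) in y.iter_mut().zip(&grad) {
        for (v, g) in yi.iter_mut().zip(gi) {
            *v -= learning_rate * g;
        }
    }
}
```

Calling `gradient_step` in a loop, with `exaggeration` above 1.0 for an initial phase and 1.0 afterwards, gives the basic optimization loop. The O(n²) pair loops here are the cost noted under Estimated Effort.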

Implementation

Trait: Transformer

API Design:

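pub struct TSNE {
    n_components: usize,  // Output dimensionality, usually 2 or 3
    perplexity: f32,      // Effective neighbor count; balances local vs global structure (typically 5-50)
    learning_rate: f32,   // Gradient descent step size
    n_iter: usize,        // Number of optimization iterations
    embedding: Option<Matrix<f32>>,  // Low-dimensional embedding, populated by fit
}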

impl Transformer for TSNE {
    fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str>;
    fn transform(&self, x: &Matrix<f32>) -> Result<Matrix<f32>, &'static str>;
}
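
To show how the `perplexity` field would typically drive step 1, here is a hedged sketch of the usual calibration: for each point, a binary search finds the Gaussian precision `beta = 1 / (2 * sigma^2)` whose conditional distribution has entropy `ln(perplexity)`. The function names, the 50-iteration cap, and the tolerance are assumptions for illustration, not part of the proposed API. The input is the slice of squared distances from one point to every other point.

```rust
/// Conditional probabilities p_{j|i} for one point, given squared distances to
/// all other points and a precision beta. Returns (probabilities, entropy in nats).
fn row_probabilities(sq_dists: &[f32], beta: f32) -> (Vec<f32>, f32) {
    // Shift by the smallest distance so the largest weight is exp(0) = 1;
    // the shift cancels in the normalization and avoids underflow.
    let min_d = sq_dists.iter().cloned().fold(f32::INFINITY, f32::min);
    let weights: Vec<f32> = sq_dists
        .iter()
        .map(|d| (-(d - min_d) * beta).exp())
        .collect();
    let sum: f32 = weights.iter().sum();
    let probs: Vec<f32> = weights.iter().map(|w| w / sum).collect();
    let entropy = -probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|p| p * p.ln())
        .sum::<f32>();
    (probs, entropy)
}

/// Binary search on beta so that exp(entropy) of the row matches `perplexity`.
/// Assumes 1 < perplexity < sq_dists.len(), otherwise no beta can match.
fn calibrate_row(sq_dists: &[f32], perplexity: f32) -> Vec<f32> {
    let target = perplexity.ln();
    let (mut lo, mut hi) = (0.0f32, f32::INFINITY);
    let mut beta = 1.0f32;
    let mut probs = Vec::new();
    for _ in 0..50 {
        let (p, entropy) = row_probabilities(sq_dists, beta);
        probs = p;
        let diff = entropy - target;
        if diff.abs() < 1e-5 {
            break;
        }
        if diff > 0.0 {
            // Distribution too flat: narrow the Gaussian by raising beta.
            lo = beta;
            beta = if hi.is_finite() { (beta + hi) / 2.0 } else { beta * 2.0 };
        } else {
            // Distribution too peaked: widen the Gaussian by lowering beta.
            hi = beta;
            beta = (beta + lo) / 2.0;
        }
    }
    probs
}

/// Perplexity of a probability row: exp of its Shannon entropy.
fn perplexity_of(probs: &[f32]) -> f32 {
    let entropy = -probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|p| p * p.ln())
        .sum::<f32>();
    entropy.exp()
}
```

Running `calibrate_row` once per point and then symmetrizing the rows would yield the high-dimensional similarity matrix that `fit` optimizes against.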

Success Criteria

  • ✅ t-SNE with gradient descent optimization
  • ✅ Perplexity parameter
  • ✅ Early exaggeration
  • ✅ 10+ tests
  • ✅ Zero clippy warnings
  • ✅ Example: examples/tsne_mnist.rs (visualize digits)

Estimated Effort

Timeline: 5-7 days
Complexity: High (complex optimization, pairwise distances O(n²))

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
