Implement Content-Based Recommender with HNSW (Phase 1) #71

@noahgift

Overview

Implement Phase 1 (the content-based recommender) of the collaborative filtering recommendation system, as specified in docs/specs/collab-filter-spec.md v1.1.

Scope

Components to Implement:

  1. HNSW Index (src/index/hnsw.rs)

    • Hierarchical Navigable Small World graph structure
    • O(d × log n) approximate nearest neighbor search (n items, d dimensions)
    • Multi-layer, skip-list-style navigation from sparse upper layers down to the dense base layer
    • Configurable M (max connections) and efConstruction parameters
  2. Incremental IDF Tracker (src/text/incremental_idf.rs)

    • Streaming IDF updates with exponential decay
    • Avoids IDF drift in production (Toyota Way Jidoka requirement)
    • Integration with existing TF-IDF vectorizer
  3. Content-Based Recommender (src/recommend/content_based.rs)

    • Item-to-item similarity using HNSW index
    • Integration with incremental IDF for TF-IDF features
    • <100ms query latency for 1M items (benchmark requirement)
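
The issue does not pin down how "streaming IDF updates with exponential decay" should work, so the following is only a minimal sketch of one possible reading. The type name `IncrementalIdf`, the `decay` parameter, and the smoothed formula `ln((1 + N) / (1 + df)) + 1` are assumptions made for illustration, not the API of the existing TF-IDF vectorizer.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical streaming IDF tracker (names and formula are assumptions).
pub struct IncrementalIdf {
    /// Multiplier in (0, 1] applied to old statistics on every update;
    /// 1.0 disables decay.
    decay: f64,
    /// Decayed count of documents seen so far.
    total_docs: f64,
    /// Decayed document frequency per term.
    doc_freq: HashMap<String, f64>,
}

impl IncrementalIdf {
    pub fn new(decay: f64) -> Self {
        assert!(decay > 0.0 && decay <= 1.0, "decay must be in (0, 1]");
        Self { decay, total_docs: 0.0, doc_freq: HashMap::new() }
    }

    /// Observe one document, given as a list of its terms.
    pub fn update(&mut self, terms: &[&str]) {
        // Age the existing statistics, then count the new document.
        self.total_docs = self.total_docs * self.decay + 1.0;
        for df in self.doc_freq.values_mut() {
            *df *= self.decay;
        }
        // Each distinct term counts once per document.
        let mut seen = HashSet::new();
        for &term in terms {
            if seen.insert(term) {
                *self.doc_freq.entry(term.to_string()).or_insert(0.0) += 1.0;
            }
        }
    }

    /// Smoothed IDF; terms never seen get the largest value.
    pub fn idf(&self, term: &str) -> f64 {
        let df = self.doc_freq.get(term).copied().unwrap_or(0.0);
        ((1.0 + self.total_docs) / (1.0 + df)).ln() + 1.0
    }
}
```

Decaying every term on each update costs time proportional to the vocabulary size; a production version would more likely store a per-term timestamp and apply the decay lazily on read.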

Quality Requirements

  • Test Coverage: ≥95% (property tests + unit tests)
  • Property Tests:
    • HNSW graph connectivity invariants
    • Search quality (recall ≥95% vs brute force)
    • IDF monotonicity (document frequency increases → IDF decreases)
  • Benchmarks: benches/recommend.rs
    • Query time <100ms for 1M items
    • Indexing time O(n × d × log n)
  • Example: examples/recommend_content.rs
  • Documentation: Book chapter book/src/examples/content-recommender.md
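
The "recall ≥95% vs brute force" property needs an exact baseline to compare against. A minimal sketch of that baseline follows, assuming cosine similarity over dense `f32` vectors; the function names are hypothetical and the similarity metric is an assumption, since the issue does not state one.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact top-k item ids by cosine similarity (brute force, O(n × d)).
fn brute_force_top_k(items: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = items
        .iter()
        .enumerate()
        .map(|(id, v)| (id, cosine(v, query)))
        .collect();
    // Sort by descending similarity; the stable sort keeps ties in id order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

/// Recall@k: fraction of the exact neighbors present in the approximate result.
fn recall_at_k(approx: &[usize], exact: &[usize]) -> f64 {
    if exact.is_empty() {
        return 1.0;
    }
    let hits = exact.iter().filter(|id| approx.contains(id)).count();
    hits as f64 / exact.len() as f64
}
```

A property test could generate random item sets and queries, run both the HNSW search and `brute_force_top_k`, and assert that the mean `recall_at_k` across queries stays at or above 0.95.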

Timeline

Estimate: 3-5 days (HIGH priority, Week 1)

References

  • [11] Indyk & Motwani (1998) - Locality-Sensitive Hashing
  • [12] Malkov & Yashunin (2018) - Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
  • [15] Sculley et al. (2015) - Hidden Technical Debt in Machine Learning Systems

Acceptance Criteria

  • HNSW index implemented with property tests
  • Incremental IDF tracker with decay mechanism
  • Content-Based Recommender with <100ms latency benchmark
  • Example code demonstrating movie/article recommendations
  • Book chapter with usage tutorial
  • All quality gates pass (make tier3)
  • Test coverage ≥95%

Labels: enhancement (New feature or request)