Skip to content

Implement codebook construction for PolarQuant (codebook.py) #3

Description

@TheTom

Summary

Implement optimal scalar quantization codebooks for the post-rotation Beta/Gaussian distribution. These centroids are the quantization levels that PolarQuant snaps each rotated coordinate to.

Paper Reference

  • Paper: arXiv 2504.19874, Section 1.3, Theorem 1
  • After random rotation, each coordinate follows Beta(d/2, d/2) which converges to N(0, 1/d) for large d
  • Paper provides closed-form optimal centroids for 1-bit and 2-bit
  • For higher bit-widths, use Lloyd-Max algorithm on the Gaussian approximation

Requirements

Closed-form centroids

  • 1-bit (2 centroids): ±√(2/πd) — verify against paper page 5, Theorem 1
  • 2-bit (4 centroids): {±0.453/√d, ±1.51/√d} — verify against paper Table 1
  • Cross-check these exact values against the PDF at ~/Downloads/2504.19874v1.pdf

Lloyd-Max algorithm for b ≥ 3

  • Optimal scalar quantization of N(0, σ²) where σ = 1/√d
  • Initialize with uniform quantiles of the Gaussian
  • Iterate: update boundaries (midpoints of centroids) → update centroids (conditional expectations)
  • _gaussian_conditional_expectation(σ, a, b) = E[X | a < X < b] for X ~ N(0, σ²)
    • Formula: σ · (φ(a/σ) - φ(b/σ)) / (Φ(b/σ) - Φ(a/σ))
    • Must handle a = -∞ and b = +∞ (tail regions)
  • Convergence: 100 iterations should be sufficient (verify)

Nearest centroid lookup

  • nearest_centroid_indices(values, centroids) — vectorized
  • Use np.searchsorted on midpoints between sorted centroids
  • Must handle values outside centroid range (clamp to nearest)

Tests Required (write FIRST)

  1. 1-bit centroids correct: verify ±√(2/πd) for d = {64, 128, 256, 1024}
  2. 2-bit centroids correct: verify {±0.453/√d, ±1.51/√d}
  3. Centroids are sorted: for all bit-widths
  4. Correct number of centroids: 2^b centroids for b-bit
  5. Lloyd convergence: for b=3,4 — centroids should be symmetric around 0
  6. Nearest centroid correctness: hand-crafted examples with known answers
  7. Nearest centroid vectorized: batch matches single-element
  8. Edge case — value at midpoint: should consistently go to one side
  9. Edge case — value far outside range: should clamp to nearest endpoint
  10. Gaussian conditional expectation: verify against known integrals
    • E[X | X > 0] for X ~ N(0,1) = √(2/π) ≈ 0.7979
    • E[X | -1 < X < 1] for X ~ N(0,1) ≈ 0.0 (symmetric)
  11. Numerical stability: extreme tail regions (a=5σ, b=∞) should not produce NaN/inf

Acceptance Criteria

  • Tests written and reviewed with codex BEFORE implementation
  • 1-bit and 2-bit centroids verified against paper PDF
  • Lloyd algorithm produces symmetric codebooks
  • Implementation passes all tests
  • codex-review on implementation
  • Roast review on implementation
  • Coverage >95% for codebook.py

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1Core product worktype:algorithmCore algorithm implementation

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions