Summary
Implement optimal scalar quantization codebooks for the post-rotation Beta/Gaussian distribution. These centroids are the quantization levels that PolarQuant snaps each rotated coordinate to.
Paper Reference
- Paper: arXiv 2504.19874, Section 1.3, Theorem 1
- After random rotation, each coordinate follows Beta(d/2, d/2) which converges to N(0, 1/d) for large d
- Paper provides closed-form optimal centroids for 1-bit and 2-bit
- For higher bit-widths, use Lloyd-Max algorithm on the Gaussian approximation
Requirements
Closed-form centroids
- 1-bit (2 centroids):
±√(2/πd) — verify against paper page 5, Theorem 1
- 2-bit (4 centroids):
{±0.453/√d, ±1.51/√d} — verify against paper Table 1
- Cross-check these exact values against the PDF at ~/Downloads/2504.19874v1.pdf
Lloyd-Max algorithm for b ≥ 3
- Optimal scalar quantization of N(0, σ²) where σ = 1/√d
- Initialize with uniform quantiles of the Gaussian
- Iterate: update boundaries (midpoints of centroids) → update centroids (conditional expectations)
_gaussian_conditional_expectation(σ, a, b) = E[X | a < X < b] for X ~ N(0, σ²)
- Formula: σ · (φ(a/σ) - φ(b/σ)) / (Φ(b/σ) - Φ(a/σ))
- Must handle a = -∞ and b = +∞ (tail regions)
- Convergence: 100 iterations should be sufficient (verify)
Nearest centroid lookup
nearest_centroid_indices(values, centroids) — vectorized
- Use
np.searchsorted on midpoints between sorted centroids
- Must handle values outside centroid range (clamp to nearest)
Tests Required (write FIRST)
- 1-bit centroids correct: verify ±√(2/πd) for d = {64, 128, 256, 1024}
- 2-bit centroids correct: verify {±0.453/√d, ±1.51/√d}
- Centroids are sorted: for all bit-widths
- Correct number of centroids: 2^b centroids for b-bit
- Lloyd convergence: for b=3,4 — centroids should be symmetric around 0
- Nearest centroid correctness: hand-crafted examples with known answers
- Nearest centroid vectorized: batch matches single-element
- Edge case — value at midpoint: should consistently go to one side
- Edge case — value far outside range: should clamp to nearest endpoint
- Gaussian conditional expectation: verify against known integrals
- E[X | X > 0] for X ~ N(0,1) = √(2/π) ≈ 0.7979
- E[X | -1 < X < 1] for X ~ N(0,1) ≈ 0.0 (symmetric)
- Numerical stability: extreme tail regions (a=5σ, b=∞) should not produce NaN/inf
Acceptance Criteria
Summary
Implement optimal scalar quantization codebooks for the post-rotation Beta/Gaussian distribution. These centroids are the quantization levels that PolarQuant snaps each rotated coordinate to.
Paper Reference
Requirements
Closed-form centroids
±√(2/πd)— verify against paper page 5, Theorem 1{±0.453/√d, ±1.51/√d}— verify against paper Table 1Lloyd-Max algorithm for b ≥ 3
_gaussian_conditional_expectation(σ, a, b)= E[X | a < X < b] for X ~ N(0, σ²)Nearest centroid lookup
nearest_centroid_indices(values, centroids)— vectorizednp.searchsortedon midpoints between sorted centroidsTests Required (write FIRST)
Acceptance Criteria