Skip to content

Implement full TurboQuant (turboquant.py) — Algorithm 2 #6

Description

@TheTom

Summary

Implement the full TurboQuant algorithm (Algorithm 2) that combines PolarQuant at (b-1) bits with QJL at 1 bit for near-optimal inner product quantization.

Depends on: #4 (PolarQuant), #5 (QJL)

Paper Reference

  • Paper: arXiv 2504.19874, Algorithm 2 (page 5), Theorem 2
  • Two-stage: MSE quantize at (b-1) bits → QJL on residual → total b bits
  • Eliminates the multiplicative bias of 2/π that MSE-only quantizers have for inner products

Algorithm

Setup:
  1. Instantiate PolarQuant_mse with bit-width b-1
  2. Generate random projection matrix S ∈ R^(d×d), S_ij ~ N(0,1)

Quantize(x):
  1. idx ← PolarQuant.quantize(x)              # MSE quantize at b-1 bits
  2. r ← x - PolarQuant.dequantize(idx)        # Residual
  3. signs ← sign(S @ r)                       # QJL on residual
  4. γ ← ||r||_2                               # Residual norm
  5. Return CompressedVector(idx, signs, γ)

Dequantize(compressed):
  1. x̃_mse ← PolarQuant.dequantize(idx)
  2. x̃_qjl ← √(π/2) / d · γ · S^T @ signs
  3. Return x̃_mse + x̃_qjl

Requirements

CompressedVector dataclass

  • mse_indices: np.ndarray — (b-1)-bit integer indices
  • qjl_signs: np.ndarray — int8 {+1, -1}
  • residual_norms: np.ndarray — float64
  • bit_width: int

TurboQuant class

  • __init__(d, bit_width, seed) — creates PolarQuant(b-1) + QJL
  • quantize(x) → CompressedVector
  • dequantize(compressed) → reconstructed vector
  • compressed_size_bits(n_vectors) → total storage in bits
  • compression_ratio(original_bits_per_value=16) → float
  • Requires bit_width >= 2 (1 bit PolarQuant + 1 bit QJL minimum)

TurboQuantMSE class (simpler variant)

  • MSE-only, no QJL stage. Use all b bits for PolarQuant.
  • For V cache compression where MSE matters more than inner product.

Tests Required (write FIRST)

  1. MSE within paper bounds (parametrized bit_width × dimension):
    • b=2: avg MSE < 0.117 × 3 (3× slack — both stages introduce error)
    • b=3: avg MSE < 0.03 × 3
    • b=4: avg MSE < 0.009 × 3
    • At d = {64, 128, 256}, 500+ unit vectors each
  2. Inner product preservation (single-side):
    • Quantize only x, keep y exact: E[|⟨y,x⟩ - ⟨y, x̂⟩|²] within paper bound
    • Paper bound: √(3π²)·||y||²/d · 1/4^b (Theorem 2)
  3. IP error decreases with bit-width: error(b=2) > error(b=3) > error(b=4)
  4. bit_width=1 raises ValueError: TurboQuant needs at least 2 bits
  5. Zero vector: small reconstruction norm
  6. Deterministic: same seed → same output
  7. Batch matches single: batch quantize/dequantize matches per-vector
  8. Compression ratio: 3-bit vs fp16 should be ~4-5×, 4-bit should be ~3-4×
  9. TurboQuantMSE round-trip: basic MSE test for the simpler variant

Acceptance Criteria

  • Tests written and reviewed with codex BEFORE implementation
  • MSE within 3× paper bounds
  • IP error monotonically decreases with bit-width
  • Single-side IP squared error within paper's theoretical bound
  • codex-review on implementation
  • Roast review on implementation
  • Coverage >95% for turboquant.py

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1Core product worktype:algorithmCore algorithm implementation

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions