Summary
Implement the full TurboQuant algorithm (Algorithm 2) that combines PolarQuant at (b-1) bits with QJL at 1 bit for near-optimal inner product quantization.
Depends on: #4 (PolarQuant), #5 (QJL)
Paper Reference
- Paper: arXiv 2504.19874, Algorithm 2 (page 5), Theorem 2
- Two-stage: MSE quantize at (b-1) bits → QJL on residual → total b bits
- Eliminates the multiplicative bias of 2/π that MSE-only quantizers have for inner products
Algorithm
Setup:
1. Instantiate PolarQuant_mse with bit-width b-1
2. Generate random projection matrix S ∈ R^(d×d), S_ij ~ N(0,1)
Quantize(x):
1. idx ← PolarQuant.quantize(x) # MSE quantize at b-1 bits
2. r ← x - PolarQuant.dequantize(idx) # Residual
3. signs ← sign(S @ r) # QJL on residual
4. γ ← ||r||_2 # Residual norm
5. Return CompressedVector(idx, signs, γ)
Dequantize(compressed):
1. x̃_mse ← PolarQuant.dequantize(idx)
2. x̃_qjl ← √(π/2) / d · γ · S^T @ signs
3. Return x̃_mse + x̃_qjl
Requirements
CompressedVector dataclass
mse_indices: np.ndarray — (b-1)-bit integer indices
qjl_signs: np.ndarray — int8 {+1, -1}
residual_norms: np.ndarray — float64
bit_width: int
TurboQuant class
__init__(d, bit_width, seed) — creates PolarQuant(b-1) + QJL
quantize(x) → CompressedVector
dequantize(compressed) → reconstructed vector
compressed_size_bits(n_vectors) → total storage in bits
compression_ratio(original_bits_per_value=16) → float
- Requires
bit_width >= 2 (1 bit PolarQuant + 1 bit QJL minimum)
TurboQuantMSE class (simpler variant)
- MSE-only, no QJL stage. Use all b bits for PolarQuant.
- For V cache compression where MSE matters more than inner product.
Tests Required (write FIRST)
- MSE within paper bounds (parametrized bit_width × dimension):
- b=2: avg MSE < 0.117 × 3 (3× slack — both stages introduce error)
- b=3: avg MSE < 0.03 × 3
- b=4: avg MSE < 0.009 × 3
- At d = {64, 128, 256}, 500+ unit vectors each
- Inner product preservation (single-side):
- Quantize only x, keep y exact: E[|⟨y,x⟩ - ⟨y, x̂⟩|²] within paper bound
- Paper bound: √(3π²)·||y||²/d · 1/4^b (Theorem 2)
- IP error decreases with bit-width: error(b=2) > error(b=3) > error(b=4)
- bit_width=1 raises ValueError: TurboQuant needs at least 2 bits
- Zero vector: small reconstruction norm
- Deterministic: same seed → same output
- Batch matches single: batch quantize/dequantize matches per-vector
- Compression ratio: 3-bit vs fp16 should be ~4-5×, 4-bit should be ~3-4×
- TurboQuantMSE round-trip: basic MSE test for the simpler variant
Acceptance Criteria
Summary
Implement the full TurboQuant algorithm (Algorithm 2) that combines PolarQuant at (b-1) bits with QJL at 1 bit for near-optimal inner product quantization.
Depends on: #4 (PolarQuant), #5 (QJL)
Paper Reference
Algorithm
Requirements
CompressedVector dataclass
mse_indices: np.ndarray — (b-1)-bit integer indicesqjl_signs: np.ndarray — int8 {+1, -1}residual_norms: np.ndarray — float64bit_width: intTurboQuant class
__init__(d, bit_width, seed)— creates PolarQuant(b-1) + QJLquantize(x)→ CompressedVectordequantize(compressed)→ reconstructed vectorcompressed_size_bits(n_vectors)→ total storage in bitscompression_ratio(original_bits_per_value=16)→ floatbit_width >= 2(1 bit PolarQuant + 1 bit QJL minimum)TurboQuantMSE class (simpler variant)
Tests Required (write FIRST)
Acceptance Criteria