Implement full TurboQuant (turboquant.py) — Algorithm 2

## Summary

Implement the full TurboQuant algorithm (Algorithm 2) that combines PolarQuant at (b-1) bits with QJL at 1 bit for near-optimal inner product quantization.

**Depends on:** #4 (PolarQuant), #5 (QJL)

## Paper Reference

- Paper: arXiv 2504.19874, Algorithm 2 (page 5), Theorem 2
- Two-stage: MSE quantize at (b-1) bits → QJL on residual → total b bits
- Eliminates the multiplicative bias of 2/π that MSE-only quantizers have for inner products

## Algorithm

```
Setup:
  1. Instantiate PolarQuant_mse with bit-width b-1
  2. Generate random projection matrix S ∈ R^(d×d), S_ij ~ N(0,1)

Quantize(x):
  1. idx ← PolarQuant.quantize(x)              # MSE quantize at b-1 bits
  2. r ← x - PolarQuant.dequantize(idx)        # Residual
  3. signs ← sign(S @ r)                       # QJL on residual
  4. γ ← ||r||_2                               # Residual norm
  5. Return CompressedVector(idx, signs, γ)

Dequantize(compressed):
  1. x̃_mse ← PolarQuant.dequantize(idx)
  2. x̃_qjl ← √(π/2) / d · γ · S^T @ signs
  3. Return x̃_mse + x̃_qjl
```

## Requirements

### CompressedVector dataclass
- `mse_indices`: np.ndarray — (b-1)-bit integer indices
- `qjl_signs`: np.ndarray — int8 {+1, -1}
- `residual_norms`: np.ndarray — float64
- `bit_width`: int

### TurboQuant class
- `__init__(d, bit_width, seed)` — creates PolarQuant(b-1) + QJL
- `quantize(x)` → CompressedVector
- `dequantize(compressed)` → reconstructed vector
- `compressed_size_bits(n_vectors)` → total storage in bits
- `compression_ratio(original_bits_per_value=16)` → float
- Requires `bit_width >= 2` (1 bit PolarQuant + 1 bit QJL minimum)

### TurboQuantMSE class (simpler variant)
- MSE-only, no QJL stage. Use all b bits for PolarQuant.
- For V cache compression where MSE matters more than inner product.

## Tests Required (write FIRST)

1. **MSE within paper bounds** (parametrized bit_width × dimension):
   - b=2: avg MSE < 0.117 × 3 (3× slack — both stages introduce error)
   - b=3: avg MSE < 0.03 × 3
   - b=4: avg MSE < 0.009 × 3
   - At d = {64, 128, 256}, 500+ unit vectors each
2. **Inner product preservation (single-side)**: 
   - Quantize only x, keep y exact: E[|⟨y,x⟩ - ⟨y, x̂⟩|²] within paper bound
   - Paper bound: √(3π²)·||y||²/d · 1/4^b (Theorem 2)
3. **IP error decreases with bit-width**: error(b=2) > error(b=3) > error(b=4)
4. **bit_width=1 raises ValueError**: TurboQuant needs at least 2 bits
5. **Zero vector**: small reconstruction norm
6. **Deterministic**: same seed → same output
7. **Batch matches single**: batch quantize/dequantize matches per-vector
8. **Compression ratio**: 3-bit vs fp16 should be ~4-5×, 4-bit should be ~3-4×
9. **TurboQuantMSE round-trip**: basic MSE test for the simpler variant

## Acceptance Criteria

- [ ] Tests written and reviewed with codex BEFORE implementation
- [ ] MSE within 3× paper bounds
- [ ] IP error monotonically decreases with bit-width
- [ ] Single-side IP squared error within paper's theoretical bound
- [ ] codex-review on implementation
- [ ] Roast review on implementation
- [ ] Coverage >95% for turboquant.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement full TurboQuant (turboquant.py) — Algorithm 2 #6

Summary

Paper Reference

Algorithm

Requirements

CompressedVector dataclass

TurboQuant class

TurboQuantMSE class (simpler variant)

Tests Required (write FIRST)

Acceptance Criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Implement full TurboQuant (turboquant.py) — Algorithm 2 #6

Description

Summary

Paper Reference

Algorithm

Requirements

CompressedVector dataclass

TurboQuant class

TurboQuantMSE class (simpler variant)

Tests Required (write FIRST)

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions