forked from ggml-org/llama.cpp
Experiment: turbo2 (2-bit) format on Metal #8
Closed
Description
Hypothesis
2-bit quantization (4 centroids) trades quality for maximum compression, enabling 70B+ models on consumer hardware.
Background
Use 4 centroids instead of 8. On Metal's constrained constant cache, fewer lookup-table reads (2 vs. 4) should yield a bigger relative win. Quality will degrade, but the format may still be useful under extreme memory pressure.
What to test
- Implement 2-bit codebook (4 centroids + norm)
- Perplexity (PPL) comparison vs. turbo3 and q4_0
- Decode speed (fewer LUT reads should help on M2 Pro especially)
- Needle-in-a-haystack (NIAH) retrieval at 2-bit
- Memory savings at 128K context
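To make the proposed format concrete, here is a minimal CPU-side sketch of a 2-bit codebook block: 4 centroids, a per-block norm, and packed 2-bit indices. The block size (32), struct layout, and codebook values are illustrative assumptions, not the actual turbo2 format, and a real implementation would decode in a Metal kernel rather than on the CPU.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical 2-bit block: 32 weights -> 8 bytes of indices + one norm.
 * Codebook values are placeholders, not actual turbo2 centroids. */
#define T2_BLOCK 32

static const float t2_codebook[4] = { -1.0f, -0.25f, 0.25f, 1.0f };

typedef struct {
    float scale;               /* per-block norm (fp16 in a real format) */
    uint8_t idx[T2_BLOCK / 4]; /* four 2-bit indices packed per byte */
} t2_block;

static void t2_quantize(const float *w, t2_block *b) {
    float amax = 0.0f;
    for (int i = 0; i < T2_BLOCK; i++) {
        float a = fabsf(w[i]);
        if (a > amax) amax = a;
    }
    b->scale = amax;
    const float inv = amax > 0.0f ? 1.0f / amax : 0.0f;
    for (int i = 0; i < T2_BLOCK / 4; i++) b->idx[i] = 0;
    for (int i = 0; i < T2_BLOCK; i++) {
        const float x = w[i] * inv;
        /* pick the nearest of the 4 centroids */
        int best = 0;
        float bestd = fabsf(x - t2_codebook[0]);
        for (int j = 1; j < 4; j++) {
            const float d = fabsf(x - t2_codebook[j]);
            if (d < bestd) { bestd = d; best = j; }
        }
        b->idx[i / 4] |= (uint8_t)(best << (2 * (i % 4)));
    }
}

static void t2_dequantize(const t2_block *b, float *w) {
    for (int i = 0; i < T2_BLOCK; i++) {
        const int j = (b->idx[i / 4] >> (2 * (i % 4))) & 3;
        w[i] = b->scale * t2_codebook[j];
    }
}
```

Decode touches only the 4-entry table plus one norm per block, which is where the hoped-for constant-cache win on Metal would come from.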
Expected outcome
~7x compression; significant PPL degradation, but potentially usable for large models under tight memory budgets.
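The ~7x figure is roughly consistent with the storage arithmetic, assuming 2 bits per weight plus one fp16 norm per block (the block size B is an assumption; the issue does not specify one):

```latex
\text{bits/weight} = 2 + \frac{16}{B}
\quad\Rightarrow\quad
\frac{16}{2 + 16/32} = 6.4\times, \qquad
\frac{16}{2 + 16/64} \approx 7.1\times
```

relative to fp16 weights, so a block size around 64 would be needed to reach the stated ~7x.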
Priority
Low — exploratory.
Source
AutoRepl: TODO-010 (buun, fork_dc582a)