forked from ggml-org/llama.cpp
TurboQuant quality degradation on Q4_K_M base weight models + turbo4-V anomaly #27
Description
Update: Asymmetric K/V rescue and turbo4-V anomaly
Tested asymmetric K/V on Qwen2.5-7B-Instruct-Q4_K_M:
| K | V | PPL | Notes |
|---|---|---|---|
| q8_0 | q8_0 | 6.58 | baseline |
| q8_0 | turbo3 | 6.68 | +1.6% — quality rescued |
| turbo4 | turbo3 | 257 | K precision matters |
| turbo4 | turbo4 | 218 | symmetric turbo4 |
| turbo3 | turbo3 | 3556 | symmetric turbo3 |
| q8_0 | turbo4 | 45393 | turbo4-V catastrophic — possible bug |
Key findings:
- q8_0-K + turbo3-V recovers quality to within 1.6% of baseline even on Q4_K_M
- K quantization precision is the dominant quality factor
- turbo4-V is anomalously worse than turbo3-V (45393 vs 6.68 with same q8_0-K), suggesting a turbo4-specific V path issue
- Verifying turbo4-V on phi-4-Q8_0 now to determine if this is model-specific or a turbo4 bug
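The percentage deltas quoted above can be recomputed from the raw PPLs. A quick sketch (note: the two-decimal 6.68 figure yields +1.5% here, while the table's +1.6% presumably came from unrounded values):

```shell
# Percent PPL increase over the q8_0/q8_0 baseline (6.58) for each K/V
# configuration in the table above.
baseline=6.58
report=$(printf '%s\n' \
    'q8_0/turbo3   6.68' \
    'turbo4/turbo3 257' \
    'turbo4/turbo4 218' \
    'turbo3/turbo3 3556' \
    'q8_0/turbo4   45393' |
  awk -v base="$baseline" '{ printf "%-14s +%.1f%%\n", $1, ($2 - base) / base * 100 }')
printf '%s\n' "$report"
```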
BELOW IS FROM THE ORIGINAL ISSUE, KEPT FOR POSTERITY. NOW INVESTIGATING QUALITY DEGRADATION ON Q4_K_M BASE WEIGHT MODELS.
Description
turbo3 and turbo4 KV cache quantization produce catastrophic perplexity on M2 Pro (Apple8) hardware while working correctly on M5 Max (Apple10).
Reproduction
```sh
# M2 Pro — PPL 5321 (BROKEN)
./build/bin/llama-perplexity -m Qwen2.5-7B-Instruct-Q4_K_M.gguf \
    -ngl 99 -fa 1 -ctk turbo3 -ctv turbo3 -c 512 -f wiki.test.raw

# M5 Max — PPL 6.60 (correct)
# Same command, same commit, same model

# q8_0 on M2 Pro — PPL 8.03 (correct)
./build/bin/llama-perplexity -m Qwen2.5-7B-Instruct-Q4_K_M.gguf \
    -ngl 99 -fa 1 -ctk q8_0 -ctv q8_0 -c 512 -f wiki.test.raw
```
Environment
- M2 Pro 32GB (Mac Mini), MTLGPUFamilyApple8
- M5 Max 128GB, MTLGPUFamilyApple10
- Branch: feature/turboquant-kv-cache
- Tested on commits 43f7d3d, 3380d3c, 172fc85, 4cf7145 — all produce same bad PPL on M2
- Models tested: Qwen2.5-7B-Instruct-Q4_K_M (PPL 5321), Qwen2.5-1.5B-Instruct-Q4_K_M (PPL 8643)
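A small driver for collecting PPL across K/V cache-type pairs, as used for the asymmetric table in the update. This is a sketch: the `final_ppl`/`sweep` names are mine, and the parsing assumes llama-perplexity's usual `Final estimate: PPL = <value> +/- <err>` output line (verify against this branch):

```shell
#!/bin/sh
# Sweep -ctk/-ctv pairs and report one PPL per line. Only functions are
# defined here; the actual run needs the model and wiki.test.raw locally.

final_ppl() {
  # Extract the PPL value from llama-perplexity output on stdin.
  # Assumes the "Final estimate: PPL = ..." line format.
  sed -n 's/.*Final estimate: PPL = \([0-9.]*\).*/\1/p'
}

sweep() {
  model=$1; shift
  for pair in "$@"; do                  # pairs given as K:V, e.g. q8_0:turbo3
    k=${pair%%:*}; v=${pair##*:}
    ppl=$(./build/bin/llama-perplexity -m "$model" -ngl 99 -fa 1 \
            -ctk "$k" -ctv "$v" -c 512 -f wiki.test.raw 2>/dev/null | final_ppl)
    printf '%s\t%s\t%s\n' "$k" "$v" "$ppl"
  done
}

# Example invocation (not run here):
# sweep Qwen2.5-7B-Instruct-Q4_K_M.gguf q8_0:q8_0 q8_0:turbo3 turbo4:turbo3
```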
Investigation so far
- q8_0 PPL is fine on M2 — rules out general Metal FA bug
- Same commit gives correct PPL on M5 — M2-specific
- 4-mag LUT math verified correct — both 4-mag and 8-entry paths produce identical values
- Pre-signalnine-merge commit also broken — not introduced by PR #3 ("feat: CUDA port of TurboQuant3 KV cache — 3.47x compression, 98.5% of F16 decode speed on RTX 5090")
- Binary search in progress — going back through commit history to find introduction point
- Decode speed benchmarks were unaffected (speed was real, just computing on wrong attention scores)
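The commit binary search can be driven by `git bisect run` with a small classifier. A sketch (the 100-PPL cutoff is my choice; good runs sit near 6.6, broken ones in the hundreds to thousands):

```shell
#!/bin/sh
# Classifier for a `git bisect run` hunt over the M2 regression. Only the
# function runs here; the bisect wiring is shown in comments below.

ppl_is_sane() {
  # Anything under 100 counts as good: exit 0 = good commit, 1 = bad.
  awk -v ppl="$1" 'BEGIN { exit !(ppl + 0 < 100) }'
}

# Intended wiring (needs the model and wiki.test.raw; not executed here):
#   git bisect start <bad-commit> <good-commit>
#   git bisect run sh -c '
#     cmake --build build -j >/dev/null || exit 125   # 125 = skip unbuildable
#     ppl=$(./build/bin/llama-perplexity -m Qwen2.5-7B-Instruct-Q4_K_M.gguf \
#             -ngl 99 -fa 1 -ctk turbo3 -ctv turbo3 -c 512 -f wiki.test.raw |
#           sed -n "s/.*Final estimate: PPL = \([0-9.]*\).*/\1/p")
#     awk -v ppl="$ppl" "BEGIN { exit !(ppl + 0 < 100) }"'
```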
Possible causes under investigation
- M2 uses the `TURBO_USE_4MAG=1` path (pre-M5 hardware detection); M5 uses `TURBO_USE_4MAG=0`. While the dequant math is verified identical, there could be a precision or ordering issue in the FA kernel's inner loop that manifests differently.
- Metal WHT kernel hardcodes 128-element groups, ignoring `group_size` from op_params. On head_dim=128 models this should be fine, but needs verification.
- Some interaction between the 4-mag dequant and the Q pre-rotation or V inverse rotation that produces subtly wrong scores.
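On the WHT group-size hypothesis: both failing models should be unaffected by the hardcoded 128-element grouping, since each has head_dim = 128. Quick arithmetic check (hidden_size and head counts are assumptions from the published Qwen2.5 configs, 3584/28 for 7B and 1536/12 for 1.5B; verify against the GGUF metadata):

```shell
# head_dim = hidden_size / num_attention_heads; config values are assumed
# from the published Qwen2.5 configs, not read from the GGUF files.
hd7=$(awk 'BEGIN { print 3584 / 28 }')
hd15=$(awk 'BEGIN { print 1536 / 12 }')
printf 'Qwen2.5-7B-Instruct:   head_dim = %s\n' "$hd7"
printf 'Qwen2.5-1.5B-Instruct: head_dim = %s\n' "$hd15"
```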
Impact
- All turbo KV cache types are broken on M2/M1/M3/M4 (pre-M5) hardware
- Decode speed benchmarks were real but computed on wrong attention scores
- Asymmetric K/V work cannot be validated on M2 until this is fixed
Priority
High — blocks M2 turbo quality validation and any pre-M5 deployment.
🤖 Generated with Claude Code