Commit 965a6ca
feat: asymmetric K/V support + q8_0 × turbo FA kernel instantiations
Add full asymmetric K/V quantization support for Metal flash attention:
- Pipeline naming uses k{type}_v{type} format for all FA kernels (335 total),
eliminating underscore ambiguity in type names
- 90 turbo × turbo asymmetric instantiations (turbo2/3/4 all combinations)
- 150 q8_0 × turbo asymmetric instantiations (both directions, all head dims)
- Gatekeeper and assertion updated to allow turbo × turbo and q8_0 × turbo pairs
- Zero regression on existing symmetric paths (validated across 4 models, 2 machines)
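The k{type}_v{type} naming above can be sketched as follows. This is a minimal illustration, not the code from this commit: `pipeline_suffix` and `parse_suffix` are hypothetical helpers, and the real pipeline names carry more components than this suffix. It shows why a bare underscore join is hard to split when a type name such as q8_0 itself contains an underscore.

```python
import re

def pipeline_suffix(k_type: str, v_type: str) -> str:
    """Build a k{type}_v{type} suffix (hypothetical helper, for illustration)."""
    return f"k{k_type}_v{v_type}"

# The 'k' and '_v' markers give the parser fixed anchors, so underscores
# inside a type name (q8_0) no longer matter.
_SUFFIX_RE = re.compile(r"^k(?P<k>.+?)_v(?P<v>.+)$")

def parse_suffix(suffix: str) -> tuple[str, str]:
    """Recover (k_type, v_type) from a k{type}_v{type} suffix."""
    m = _SUFFIX_RE.match(suffix)
    if m is None:
        raise ValueError(f"not a k/v suffix: {suffix!r}")
    return m["k"], m["v"]

if __name__ == "__main__":
    # A plain underscore join yields three pieces; the K/V boundary
    # cannot be found without knowing every valid type name.
    print("q8_0_turbo3".split("_"))
    # The prefixed form round-trips without that knowledge.
    print(parse_suffix(pipeline_suffix("q8_0", "turbo3")))
```

The prefixed form stays parseable as long as no type name contains the sequence `_v`, which holds for the types named in this message.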
The q8_0 × turbo kernels fix a silent dispatch failure where mixed q8_0-K + turbo-V
configs would produce NaNs (turbo4-V) or fall through to undefined paths (turbo3-V).
This enables the asymmetric quality rescue: q8_0-K + turbo-V recovers near-baseline
PPL on low-bit models where symmetric turbo-K degrades.
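The gatekeeper idea can be sketched as a predicate that rejects unsupported K/V pairs up front instead of letting them reach a missing kernel. This is an assumption-laden illustration: `is_supported_kv_pair` and `select_kernel` are hypothetical names, symmetric pairs are treated as always supported, and any mixed pair not mentioned in this message is rejected here even though the real gatekeeper may allow more.

```python
# Type names taken from the commit message; the real set may be larger.
TURBO_TYPES = frozenset({"turbo2", "turbo3", "turbo4"})

def is_supported_kv_pair(k_type: str, v_type: str) -> bool:
    """Return True if a flash-attention kernel is assumed to exist for the pair."""
    if k_type == v_type:
        return True  # symmetric path (simplification: assumed always present)
    if k_type in TURBO_TYPES and v_type in TURBO_TYPES:
        return True  # turbo x turbo, any combination
    if "q8_0" in (k_type, v_type) and (k_type in TURBO_TYPES or v_type in TURBO_TYPES):
        return True  # q8_0 x turbo, either direction
    return False

def select_kernel(k_type: str, v_type: str) -> str:
    """Fail loudly on an unsupported pair rather than dispatching silently."""
    if not is_supported_kv_pair(k_type, v_type):
        raise ValueError(f"no FA kernel for K={k_type}, V={v_type}")
    return f"k{k_type}_v{v_type}"

if __name__ == "__main__":
    print(select_kernel("q8_0", "turbo4"))
```

Raising at selection time turns a NaN or undefined-path result into an immediate, diagnosable error.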
Validated on Metal (M2 Pro + M5 Max):
- phi-4-Q8_0: symmetric turbo3 +4.2%, turbo4 +1.7% (no regression)
- Qwen2.5-7B Q4_K_M: q8_0-K + turbo4-V +1.0%, q8_0-K + turbo3-V +2.0% (rescued)
- Qwen3.5-35B MoE, 27B Dense, Mistral-24B: all healthy (no regression)
- Cross-hardware M2/M5 parity confirmed on all tested configs
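Assuming the percentages above are perplexity increases relative to an unquantized-cache baseline (the message does not spell this out), they could be computed as below. The input values are made-up placeholders, not measurements from this commit.

```python
def ppl_delta_percent(baseline_ppl: float, candidate_ppl: float) -> float:
    """Relative perplexity change in percent; positive means worse than baseline."""
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (candidate_ppl - baseline_ppl) / baseline_ppl * 100.0

if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(f"{ppl_delta_percent(10.0, 10.1):+.1f}%")
```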
Closes #27
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
1 parent 43f7d3d commit 965a6ca
4 files changed
Lines changed: 194 additions & 4 deletions
File tree
- ggml/src/ggml-metal
| Original file lines | Diff lines | Diff line change |
|---|---|---|
| 1348-1353 | 1348-1354 | 1 line added at diff line 1351 |
| 1414-1419 | 1415-1421 | 1 line added at diff line 1418 |
| Original file lines | Diff lines | Diff line change |
|---|---|---|
| 1196-1209 | 1196-1216 | line 1199 replaced by diff lines 1199-1201; line 1206 replaced by diff lines 1208-1213 |
| Original file lines | Diff lines | Diff line change |
|---|---|---|
| 2682-2695 | 2682-2700 | line 2685 replaced by diff line 2685; line 2692 replaced by diff lines 2692-2697 |