forked from ggml-org/llama.cpp
PR #4 turbo3 regression: PPL 6.18 → 181.6 after kernel split #6
Closed
Description
Bug
PR #4 (merged in 065ef53, reverted in a52586e) introduced a turbo3 quality regression on Metal.
| State | turbo3 PPL (c=512, 8ch, MoE 35B) |
|---|---|
| Pre-merge (7d1bd95) | 6.1756 ✅ |
| Post-merge (065ef53) | 181.5955 ❌ |
| Post-revert (a52586e) | 6.1756 ✅ |
The q8_0 baseline is unaffected at all context lengths, so the regression is turbo3-specific.
Root cause
The kernel split replaced the shared kernel_set_rows_turbo template with separate kernel_set_rows_turbo3 / kernel_set_rows_turbo4 functions, and in doing so changed the turbo3 quantization path. The exact difference has not been pinned down yet: the shared template worked correctly, but the dedicated turbo3 function does not produce equivalent output.
Likely candidates:
- Template parameter difference (block type, QK constant)
- Quantization loop bounds
- Norm correction logic
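
One way to narrow down the candidates above is a block-by-block comparison of the two code paths on identical input, reporting the first position where the quantized codes diverge. The sketch below is a CPU-side illustration only. `quantize_reference` and `quantize_candidate` are hypothetical stand-ins, not the actual Metal kernels, and the candidate contains a deliberately injected loop-bound bug so the harness has something to find. The block size of 32 and the 3-bit code range are also assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Quantizer = Callable[[List[float]], List[int]]


def quantize_reference(block: List[float]) -> List[int]:
    """Hypothetical stand-in for the shared-template path.

    Scales the block by its largest magnitude and rounds to signed
    3-bit codes in [-4, 3]. This is NOT the real turbo3 scheme.
    """
    amax = max((abs(v) for v in block), default=0.0)
    scale = amax / 3.0 if amax > 0 else 1.0
    return [max(-4, min(3, round(v / scale))) for v in block]


def quantize_candidate(block: List[float]) -> List[int]:
    """Hypothetical stand-in for the dedicated function.

    Injected bug (for illustration): the loop only covers the first
    half of the block, leaving the second half zeroed.
    """
    codes = quantize_reference(block)
    half = len(block) // 2
    return codes[:half] + [0] * (len(block) - half)


def first_mismatch(
    ref: Quantizer, cand: Quantizer, data: List[float], block_size: int
) -> Optional[Tuple[int, int]]:
    """Return (block index, offset in block) of the first differing code,
    or None if both quantizers agree on every block."""
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        a, b = ref(block), cand(block)
        for i, (x, y) in enumerate(zip(a, b)):
            if x != y:
                return (start // block_size, i)
        if len(a) != len(b):
            # One path emitted fewer codes than the other.
            return (start // block_size, min(len(a), len(b)))
    return None


# Deterministic test input: values in [-1.0, 1.0], two blocks of 32.
data = [((i * 37) % 17 - 8) / 8.0 for i in range(64)]
mismatch = first_mismatch(quantize_reference, quantize_candidate, data, 32)
print("first mismatch (block, offset):", mismatch)
```

A mismatch offset that lands exactly on a block fraction would point at loop bounds or a QK constant, while scattered small differences would be more consistent with a norm or scale discrepancy. Applying the same idea to the real kernels would mean dumping the quantized buffers at 7d1bd95 and 065ef53 and diffing them.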
Impact
With PR #4 applied, turbo3 is unusable on Metal (PPL 181.6 instead of 6.18). The PR has been reverted in a52586e.