PR #4 turbo3 regression: PPL 6.18 → 181.6 after kernel split

## Bug

PR #4 (merged in `065ef53`, reverted in `a52586e`) introduced a turbo3 quality regression on Metal.

| State | turbo3 PPL (c=512, 8ch, MoE 35B) |
|-------|----------------------------------|
| Pre-merge (`7d1bd95`) | **6.1756** ✅ |
| Post-merge (`065ef53`) | **181.5955** ❌ |
| Post-revert (`a52586e`) | **6.1756** ✅ |

The q8_0 baseline is unaffected at all context lengths, so the regression is turbo3-specific.
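A check like the one below could catch this kind of jump before merge. It is only a sketch: `ppl_regressed` and the 1% tolerance are hypothetical choices, not part of the repository, and the PPL values are copied from the table above.

```python
def ppl_regressed(baseline: float, candidate: float, rel_tol: float = 0.01) -> bool:
    """Return True when candidate PPL exceeds baseline by more than rel_tol.

    rel_tol is an assumed threshold; pick one that matches run-to-run noise.
    """
    return candidate > baseline * (1.0 + rel_tol)


# turbo3 PPL per commit (c=512, 8ch, MoE 35B), taken from the table above.
runs = {
    "7d1bd95": 6.1756,    # pre-merge
    "065ef53": 181.5955,  # post-merge
    "a52586e": 6.1756,    # post-revert
}

baseline = runs["7d1bd95"]
flagged = [commit for commit, ppl in runs.items() if ppl_regressed(baseline, ppl)]
print(flagged)  # ['065ef53']
```

A relative tolerance is used rather than an absolute one so the same gate can be reused for other models and quant types with different baseline PPL.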

## Root cause

PR #4 split the shared `kernel_set_rows_turbo` template into separate `kernel_set_rows_turbo3` / `kernel_set_rows_turbo4` functions, and that split changed the turbo3 quantization path. The shared template produced correct output; the dedicated turbo3 function does not produce equivalent output. The specific difference has not been pinned down yet.

Likely candidates:
- Template parameter difference (block type, QK constant)
- Quantization loop bounds
- Norm correction logic
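One way to narrow down the candidates above is to dump the quantized output of both kernel paths for the same input and find the first block that differs. The helper below is a minimal sketch under assumptions: `first_divergent_block` is a hypothetical name, how the two buffers are captured is left open, and `block_bytes` must be set to the actual turbo3 block size, which is not stated in this issue.

```python
from typing import Optional


def first_divergent_block(ref: bytes, new: bytes, block_bytes: int) -> Optional[int]:
    """Return the index of the first block where the buffers differ, or None.

    ref: output of the shared template (known good).
    new: output of the dedicated kernel (suspect).
    block_bytes: size of one quantized block in bytes (assumed by the caller).
    """
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")

    common = min(len(ref), len(new))
    for start in range(0, common, block_bytes):
        end = min(start + block_bytes, common)
        if ref[start:end] != new[start:end]:
            return start // block_bytes

    # Identical over the shared prefix: a length mismatch is itself a divergence
    # and would point toward block size or loop bounds.
    if len(ref) != len(new):
        return common // block_bytes
    return None


# Tiny synthetic example: flip one byte inside the third 8-byte block.
ref = bytes(range(32))
corrupted = bytearray(ref)
corrupted[17] ^= 0xFF
print(first_divergent_block(ref, bytes(corrupted), 8))  # 2
```

If every block differs, the block type or QK constant is the more likely culprit; if only a per-block scale field differs, the norm correction logic is the more likely one.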

## Impact

With PR #4 applied, turbo3 output on Metal is unusable (PPL 181.6 versus 6.18 before the merge). The PR has been reverted in `a52586e`.

## References

- PR #4: https://github.com/TheTom/llama-cpp-turboquant/pull/4
- Revert commit: `a52586e`

