Skip to content

PR #4 turbo3 regression: PPL 6.18 → 181.6 after kernel split #6

@TheTom

Description

@TheTom

Bug

PR #4 (merged in 065ef53, reverted in a52586e) introduced a turbo3 quality regression on Metal.

State turbo3 PPL (c=512, 8ch, MoE 35B)
Pre-merge (7d1bd95) 6.1756
Post-merge (065ef53) 181.5955
Post-revert (a52586e) 6.1756

q8_0 baseline is unaffected at all context lengths — this is turbo3-specific.

Root cause

The kernel split from the shared kernel_set_rows_turbo template into separate kernel_set_rows_turbo3 / kernel_set_rows_turbo4 functions changed something in the turbo3 quantization path. The shared template worked correctly; the dedicated turbo3 function does not produce equivalent output.

Likely candidates:

  • Template parameter difference (block type, QK constant)
  • Quantization loop bounds
  • Norm correction logic

Impact

turbo3 is completely broken on Metal when PR #4 is applied. Reverted.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions