Segfault: ggml_can_mul_mat assertion with MXFP4 weights + turbo KV #42

@TheTom

Description

Bug Report

Priority: P1
Reporter: @NigelTufnel12345 (via X)
Status: Confirmed crash, awaiting logs

Environment

  • Hardware: M3 Ultra, 512GB RAM
  • Model: GPT-OSS-120B, MXFP4 quantization
  • Config: -ctk q8_0 -ctv turbo4 -fa on
  • Branch: feature/turboquant-kv-cache

Symptom

Immediate segfault on startup. The backtrace points to:

ggml.c:3246: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed

Model runs fine with regular q8_0 KV (no turbo). Crash is specific to the turbo code path.

Probable Cause

The ggml_can_mul_mat assertion indicates a tensor shape mismatch. MXFP4 is a newer weight quantization format, and the turbo KV insertion or flash-attention path likely has an unhandled shape interaction with MXFP4 weight tensors.
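For reference, the check behind that assert is narrow: a mul_mat is legal only when the inner dimensions of the two tensors match and the batch dims of the second broadcast over the first. A simplified sketch follows; the struct is a stand-in for ggml_tensor and the exact conditions may differ slightly across ggml versions:

```c
#include <stdbool.h>
#include <stdint.h>

// Simplified stand-in for ggml_tensor's shape: ne[0] is the row
// length (inner dim), ne[1] the row count, ne[2]/ne[3] batch dims.
struct tensor_shape {
    int64_t ne[4];
};

// Approximation of the condition behind
// GGML_ASSERT(ggml_can_mul_mat(a, b)): inner dims must match, and
// b's batch dims must be integer multiples of a's (broadcasting).
static bool can_mul_mat(const struct tensor_shape *t0,
                        const struct tensor_shape *t1) {
    return (t0->ne[0] == t1->ne[0]) &&
           (t1->ne[2] % t0->ne[2] == 0) &&
           (t1->ne[3] % t0->ne[3] == 0);
}
```

So any path that alters ne[0] on only one side of the product (e.g. padding the KV cache rows but not the weight tensor) will trip the assert.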

Possible areas to investigate:

  • Turbo SET_ROWS kernel shape assumptions with MXFP4 input tensors
  • Flash attention tile/vec kernel dispatch with non-standard weight types
  • head_dim detection or zero-padding logic with MXFP4 metadata
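On the last point, one way a padding bug would manifest: flash-attention kernels commonly round head_dim up to a tile multiple. If the turbo KV path rounds ne[0] up while the MXFP4 weight path does not (a hypothesis, not confirmed from the logs), the two sides of the mul_mat diverge and the assert fires. A toy illustration of the rounding (pad_to_multiple is illustrative, not a ggml function):

```c
#include <stdint.h>

// Round n up to the next multiple of mult, e.g. for tiling head_dim
// to a kernel's block size. Hypothetical helper for illustration.
static int64_t pad_to_multiple(int64_t n, int64_t mult) {
    return ((n + mult - 1) / mult) * mult;
}
```

For instance, a head_dim of 100 padded to a 32-wide tile becomes 128; if the weight tensor still has ne[0] == 100, the inner-dimension check fails.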

Logs Needed

@NigelTufnel12345 — please upload:

  1. Full backtrace (the complete crash output, not just the assertion line)
  2. Model startup log (the lines showing n_embd, n_head, n_embd_head_k, and any turbo-related log lines)
  3. Exact command used to launch the server/cli

Workaround

Use -ctk q8_0 -ctv q8_0 (no turbo compression) until this is resolved.
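For example (binary name and model filename below are placeholders; substitute your actual paths):

```shell
# Placeholder binary/model names -- adjust to your setup.
./llama-server -m gpt-oss-120b-mxfp4.gguf \
  -ctk q8_0 -ctv q8_0 -fa on
```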
