forked from ggml-org/llama.cpp
Segfault: ggml_can_mul_mat assertion with MXFP4 weights + turbo KV #42
Bug Report
Priority: P1
Reporter: @NigelTufnel12345 (via X)
Status: Confirmed crash, awaiting logs
Environment
- Hardware: M3 Ultra, 512GB RAM
- Model: GPT-OSS-120B, MXFP4 quantization
- Config: -ctk q8_0 -ctv turbo4 -fa on
- Branch: feature/turboquant-kv-cache
Symptom
Immediate segfault on startup. The backtrace points to:
ggml.c:3246: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
Model runs fine with regular q8_0 KV (no turbo). Crash is specific to the turbo code path.
Probable Cause
The ggml_can_mul_mat assertion indicates a tensor shape mismatch. MXFP4 is a newer weight quantization format, and the turbo KV insertion or flash-attention path likely has an unhandled shape interaction with MXFP4 weight tensors.
Possible areas to investigate:
- Turbo SET_ROWS kernel shape assumptions with MXFP4 input tensors
- Flash attention tile/vec kernel dispatch with non-standard weight types
- head_dim detection or zero-padding logic with MXFP4 metadata
Logs Needed
@NigelTufnel12345 — please upload:
- Full backtrace (the complete crash output, not just the assertion line)
- Model startup log (the lines showing n_embd, n_head, n_embd_head_k, and any turbo-related log lines)
- Exact command used to launch the server/CLI
Workaround
Use -ctk q8_0 -ctv q8_0 (no turbo compression) until this is resolved.
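For reference, a full invocation with the workaround applied might look like the following (the model path is a placeholder, not from this report; only the -ctk/-ctv/-fa flags come from the config above):

```shell
# Workaround invocation: hypothetical model path; q8_0 KV on both K and V,
# flash attention left on as in the original config.
./llama-server -m models/gpt-oss-120b-mxfp4.gguf \
    -ctk q8_0 -ctv q8_0 -fa on
```
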