Summary
The DFlash block-diffusion speculative decoder crashes on the first decode, and has since ~2026-05-27. Confirmed still broken after the 2026-06-13 upstream sync (ht @ 1aec77684): upstream's v_cells_impl shared_ptr KV refactor — which mainlined ht's dual-context cell-sharing — did not fix it.
Repro (titan deployment-shape smoke, 2026-06-13)
- Image:
zot.ht.local/unified-llm:stockui-1aec7768 (llama-server 9628/1aec77684)
- Model:
gemma-4-31b-dflash-Q6_K, --spec-type dflash
- Loads clean (slot + speculative ctx init succeed)
- Child worker dies on the first decode → router returns
500 "Failed to read connection"
- Same failure family as the pre-sync
GGML_ASSERT n_outputs_max on first decode
Impact
- Non-blocking for deploy. The deployed speculative path is Gemma4 MTP (
--spec-type draft-mtp), which is healthy post-sync (0.834 acceptance, dual-context KV active). DFlash is a separate experimental decoder.
- Crash is isolated — the router survived and MTP kept serving; only the DFlash child died.
Likely area
First-decode output-buffer sizing (n_outputs_max) in the DFlash decode path — distinct from the KV cell-sharing the sync refactored, which is why the v_cells_impl change didn't address it.
Evidence: titan post-sync rollout smoke (snoop-kube), 2026-06-13.
Summary
The DFlash block-diffusion speculative decoder crashes on the first decode, and has since ~2026-05-27. Confirmed still broken after the 2026-06-13 upstream sync (
ht@1aec77684): upstream'sv_cells_implshared_ptr KV refactor — which mainlined ht's dual-context cell-sharing — did not fix it.Repro (titan deployment-shape smoke, 2026-06-13)
zot.ht.local/unified-llm:stockui-1aec7768(llama-server9628/1aec77684)gemma-4-31b-dflash-Q6_K,--spec-type dflash500 "Failed to read connection"GGML_ASSERT n_outputs_maxon first decodeImpact
--spec-type draft-mtp), which is healthy post-sync (0.834 acceptance, dual-context KV active). DFlash is a separate experimental decoder.Likely area
First-decode output-buffer sizing (
n_outputs_max) in the DFlash decode path — distinct from the KV cell-sharing the sync refactored, which is why thev_cells_implchange didn't address it.Evidence: titan post-sync rollout smoke (snoop-kube), 2026-06-13.