
DFlash: child worker crashes on first decode (n_outputs_max family) — still broken after upstream sync #108

@marksverdhei


Summary

The DFlash block-diffusion speculative decoder crashes on the first decode, and has done so since roughly 2026-05-27. It is confirmed still broken after the 2026-06-13 upstream sync (ht @ 1aec77684). Upstream's v_cells_impl shared_ptr KV refactor, which mainlined ht's dual-context cell-sharing, did not fix it.

Repro (titan deployment-shape smoke, 2026-06-13)

  • Image: zot.ht.local/unified-llm:stockui-1aec7768 (llama-server 9628/1aec77684)
  • Model: gemma-4-31b-dflash-Q6_K, --spec-type dflash
  • Loads clean (slot + speculative ctx init succeed)
  • Child worker dies on the first decode → the router returns HTTP 500 "Failed to read connection"
  • Same failure family as the pre-sync GGML_ASSERT on n_outputs_max during the first decode

Impact

  • Not a deploy blocker. The deployed speculative path is Gemma4 MTP (--spec-type draft-mtp), which is healthy post-sync (0.834 acceptance rate, dual-context KV active). DFlash is a separate, experimental decoder.
  • Crash is isolated — the router survived and MTP kept serving; only the DFlash child died.

Likely area

First-decode output-buffer sizing (n_outputs_max) in the DFlash decode path. This is distinct from the KV cell-sharing that the sync refactored, which would explain why the v_cells_impl change didn't address it.
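To make the suspected failure mode concrete, here is a minimal Python sketch of the kind of invariant an n_outputs_max assert guards: the number of output rows a decode batch requests must not exceed what the context reserved. The assumption (not confirmed in this issue) is that a block-diffusion drafter requests logits for every position in a draft block on its first decode, while the buffer was sized for ordinary one-output-per-step decoding. All names here (OutputBuffer, reserve, decode, block_draft_batch, OutputOverflow) are hypothetical and do not mirror llama.cpp's actual code; the block size of 16 is an arbitrary illustrative value.

```python
from dataclasses import dataclass, field


class OutputOverflow(RuntimeError):
    """Hypothetical stand-in for the GGML_ASSERT abort that kills the child worker."""


@dataclass
class OutputBuffer:
    n_outputs_max: int = 0                      # rows reserved for logits
    rows: list = field(default_factory=list)    # the reserved storage itself

    def reserve(self, n_outputs: int) -> None:
        # Size the buffer up front (e.g. at context init).
        self.n_outputs_max = n_outputs
        self.rows = [None] * n_outputs

    def decode(self, output_flags: list[bool]) -> int:
        # Each batch position flags whether it needs an output row.
        n_outputs = sum(output_flags)
        if n_outputs > self.n_outputs_max:
            raise OutputOverflow(
                f"n_outputs={n_outputs} > n_outputs_max={self.n_outputs_max}"
            )
        return n_outputs


def autoregressive_batch() -> list[bool]:
    # Ordinary decode step: only the last token needs logits.
    return [True]


def block_draft_batch(block_size: int) -> list[bool]:
    # Hypothetical block-diffusion step: every position in the block needs logits.
    return [True] * block_size


if __name__ == "__main__":
    buf = OutputBuffer()
    buf.reserve(1)                        # sized for plain autoregressive decode
    print(buf.decode(autoregressive_batch()))
    try:
        buf.decode(block_draft_batch(16))  # first block decode overflows
    except OutputOverflow as err:
        print("crash:", err)
    buf.reserve(16)                       # reserve for the full block instead
    print(buf.decode(block_draft_batch(16)))
```

Under these assumptions the script prints 1, then "crash: n_outputs=16 > n_outputs_max=1", then 16. The point of the sketch is only that a buffer sized at init for single-token outputs fails on the first multi-output decode, which would match a crash that happens on first decode rather than at load time.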

Evidence: titan post-sync rollout smoke (snoop-kube), 2026-06-13.
