
docs(readme): inventory DFlash + Gemma4 MTP under HT Fork Changes #96

Merged: marksverdhei merged 1 commit into ht from chore/readme-ht-inventory-dflash-mtp on Jun 12, 2026


@marksverdhei


Summary

Two HT-specific speculative-decoding features were shipped to ht but never inventoried in the README "HT Fork Changes / Backend & quantization" table. This PR adds both as table rows.

| Change | Description | Tracked upstream |
| --- | --- | --- |
| DFlash speculative decoding | Block-diffusion drafter integration (LLM_ARCH_DFLASH, --spec-type dflash, custom CUDA kernels). Landed via PR #62 (b0daec5). | No |
| Gemma4 MTP speculative | Vendored upstream PR ggml-org#23398 (gemma4-assistant arch + --spec-type draft-mtp) via PR #93 (4c09765). | ggml-org#23398 |

Why

  • The inventory is meant to be the canonical "what's in this fork that isn't upstream" reference — per AGENTS.md "consult it before assuming a behaviour is upstream stock." Omissions defeat the purpose.
  • Found during a §7 documentation freshness sweep after the §11 / §1 / etc. cycle.

Test plan

  • Docs-only, no code touched
  • Both flags verified to exist in code (--spec-type accepts both dflash and draft-mtp via common/arg.cpp)
  • Both features are in the deployed image unified-llm:mtp-pr23398-5e6dff22 on titan
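
The flag check in the test plan could be reproduced with a small script along these lines. This is a minimal sketch, not the procedure actually used for this PR: the helper name missing_spec_types is hypothetical, and it assumes the accepted --spec-type values appear as double-quoted string literals in common/arg.cpp, which the PR does not confirm.

```python
from pathlib import Path


def missing_spec_types(source_text: str, expected: list[str]) -> list[str]:
    """Return the expected --spec-type values that never appear as a
    double-quoted string literal in the given source text.

    Hypothetical helper: assumes the values are spelled as plain string
    literals in the argument parser source.
    """
    return [value for value in expected if f'"{value}"' not in source_text]


if __name__ == "__main__":
    # Path taken from the test plan; run from the repository root.
    arg_cpp = Path("common/arg.cpp")
    if arg_cpp.is_file():
        text = arg_cpp.read_text(encoding="utf-8", errors="replace")
        missing = missing_spec_types(text, ["dflash", "draft-mtp"])
        print("missing:", missing if missing else "none")
    else:
        print("common/arg.cpp not found; run from the repository root")
```

A literal-string match only shows that each value is spelled somewhere in the file. It does not prove the parser accepts the value, so it complements rather than replaces running the binary with each flag.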

Related

The Backend & quantization table omitted two HT-specific speculative
decoding features that have shipped to ht:

- DFlash (LLM_ARCH_DFLASH, --spec-type dflash, custom CUDA kernels for
  partial-accept feature extraction) — landed via PR #62 (b0daec5),
  integrates the z-lab DFlash block-diffusion drafter against Gemma4
  31B targets.

- Gemma4 MTP (gemma4-assistant arch + --spec-type draft-mtp) — vendored
  via PR #93 (4c09765) before upstream PR ggml-org#23398 has merged,
  so that the gemma-4-12b-qat-mtp preset can ship on titan. Marked with
  Tracked-upstream=ggml-org#23398 because the vendored copy retires once
  that PR merges and flows in through a normal master sync.

Found during a §7 documentation freshness sweep — the inventory exists
to be authoritative ("consult it before assuming a behaviour is
upstream stock" per AGENTS.md), so omissions defeat the purpose.

Docs-only, no code touched.
@marksverdhei marksverdhei merged commit efcdfb4 into ht Jun 12, 2026
@marksverdhei marksverdhei deleted the chore/readme-ht-inventory-dflash-mtp branch June 12, 2026 18:31