docs(readme): inventory DFlash + Gemma4 MTP under HT Fork Changes by marksverdhei · Pull Request #96 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-07T18:34:29Z

Summary

Two HT-specific speculative-decoding features were shipped to ht but never inventoried in the README "HT Fork Changes / Backend & quantization" table. This PR adds both as table rows.

Change	Description	Tracked upstream
DFlash speculative decoding	Block-diffusion drafter integration (`LLM_ARCH_DFLASH`, `--spec-type dflash`, custom CUDA kernels). Landed via PR #62 (`b0daec5`).	No
Gemma4 MTP speculative	Vendored upstream PR ggml-org#23398 (`gemma4-assistant` arch + `--spec-type draft-mtp`) via PR #93 (`4c09765`).	ggml-org#23398

Why

The inventory is meant to be the canonical "what's in this fork that isn't upstream" reference — per AGENTS.md "consult it before assuming a behaviour is upstream stock." Omissions defeat the purpose.
Found during a §7 documentation freshness sweep after the §11 / §1 / etc. cycle.

Test plan

Docs-only, no code touched
Both flags verified to exist in code (`--spec-type` accepts both `dflash` and `draft-mtp` via `common/arg.cpp`)
Both features are in the deployed image `unified-llm:mtp-pr23398-5e6dff22` on titan
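
The flag check in the test plan can be repeated with a small script. The following is a minimal sketch, not the method actually used for this PR: the helper names `missing_tokens` and `check_file` are hypothetical, and a plain substring match is a deliberately weak check that only confirms the strings occur somewhere in the file.

```python
from pathlib import Path


def missing_tokens(text: str, tokens: list[str]) -> list[str]:
    """Return the tokens that do not occur anywhere in text (substring match)."""
    return [t for t in tokens if t not in text]


def check_file(path: str, tokens: list[str]) -> list[str]:
    """Read a source file and report which tokens are absent from it."""
    # errors="replace" keeps the check from failing on stray non-UTF-8 bytes.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return missing_tokens(text, tokens)
```

Run from the repository root, `check_file("common/arg.cpp", ["--spec-type", "dflash", "draft-mtp"])` would return an empty list if all three strings are present. It does not prove that the option parser actually accepts those values.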

Related

PR #93: feat(sync): upstream master sync (42 commits) + Gemma4 MTP via PR #23398 vendor (γ master sync + Gemma4 MTP vendor)
PR #62: feat(dflash): integrate DFlash block-diffusion speculative decoder (rebased on post-rewrite ht) (DFlash integration, b0daec5)
PR #95: chore(docs): AGENTS.md webui section reflects post-rewrite reality (AGENTS.md drift fix)

The Backend & quantization table omitted two HT-specific speculative decoding features that have shipped to ht:

- DFlash (LLM_ARCH_DFLASH, --spec-type dflash, custom CUDA kernels for partial-accept feature extraction): landed via PR #62 (b0daec5); integrates the z-lab DFlash block-diffusion drafter against Gemma4 31B targets.
- Gemma4 MTP (gemma4-assistant arch + --spec-type draft-mtp): vendored via PR #93 (4c09765) ahead of the upstream PR ggml-org#23398 merge so the gemma-4-12b-qat-mtp preset can ship on titan. Marked with Tracked-upstream=ggml-org#23398 since it retires when that PR merges and flows through a normal master sync.

Found during a §7 documentation freshness sweep. The inventory exists to be authoritative ("consult it before assuming a behaviour is upstream stock" per AGENTS.md), so omissions defeat the purpose.

Docs-only, no code touched.

marksverdhei merged commit efcdfb4 into ht Jun 12, 2026

marksverdhei deleted the chore/readme-ht-inventory-dflash-mtp branch June 12, 2026 18:31

marksverdhei mentioned this pull request Jun 12, 2026

docs(readme): complete HT Fork Changes inventory with per-change justifications #106

Merged

marksverdhei merged 1 commit into ht from chore/readme-ht-inventory-dflash-mtp
