scripts(dflash): switch default bench target to Q8_0 + --target flag by marksverdhei · Pull Request #65 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-04T17:39:34Z

Why

Per Markus 2026-06-04: DFlash quality measurement should use a Q8_0 target rather than Q4_K_M. The Q4_K_M target introduces enough quantization noise that it confounds DFlash's own accept-rate signal — we want a higher-quality reference for the speculative-decoding evaluation.

Changes

Default TARGET changed from gemma-4-31B-it-Q4_K_M.gguf to gemma-4-31B-it-Q8_0.gguf.
Added --target PATH flag for explicit per-run override.
Added DFLASH_BENCH_TARGET and DFLASH_BENCH_DRAFTER_DIR env vars (env-first, then CLI flag, then default).
Updated VRAM math in the comment block:
- Q4_K_M ~22 GB total (single 24 GB card)
- Q8_0 ~38 GB total (titan A100 80 GB only)
- BF16 ~67 GB total (titan A100 80 GB only)
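
The override order described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual contents of scripts/bench-dflash.sh: the helper names `resolve_target` and `parse_target_flag` are invented here, and the sketch takes "env-first, then CLI flag, then default" literally (the env var wins if set). The real script may resolve conflicts differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the precedence described in the PR text;
# function names are illustrative and not taken from bench-dflash.sh.

DEFAULT_TARGET="gemma-4-31B-it-Q8_0.gguf"

# resolve_target FLAG_VALUE
# Prints the target to use: env var first, then the --target value, then the default.
resolve_target() {
    local flag_value="${1:-}"
    if [ -n "${DFLASH_BENCH_TARGET:-}" ]; then
        printf '%s\n' "$DFLASH_BENCH_TARGET"
    elif [ -n "$flag_value" ]; then
        printf '%s\n' "$flag_value"
    else
        printf '%s\n' "$DEFAULT_TARGET"
    fi
}

# parse_target_flag ARGS...
# Prints the value following --target, or nothing if the flag is absent.
parse_target_flag() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --target)
                [ "$#" -ge 2 ] || { echo "--target needs a PATH" >&2; return 1; }
                printf '%s\n' "$2"
                return 0
                ;;
        esac
        shift
    done
}
```

Under these assumptions a script would call `TARGET="$(resolve_target "$(parse_target_flag "$@")")"` once, near the top, so every later step reads a single resolved value.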

Verified

bash -n scripts/bench-dflash.sh — syntax OK
--help renders the updated docblock correctly
No other scripts depend on the old default (grepped Q4_K_M.gguf across the tree)

Follow-up

Task #110 already updated to reflect this. Next concrete step is the titan re-bake against b0daec55b (Task #109), then this bench script can run with its new default.

Per Markus 2026-06-04: DFlash quality measurement should use a Q8_0 target rather than Q4_K_M, since Q4_K_M introduces enough target-side quantization noise to confound DFlash's own accept-rate signal. Q8_0 fits in 38 GB total, well within titan A100 80 GB.

* Default `TARGET` is now `gemma-4-31B-it-Q8_0.gguf`. Override via `--target PATH` or `DFLASH_BENCH_TARGET` env var.
* Also added `DFLASH_BENCH_DRAFTER_DIR` env var for consistency.
* Comment block documents VRAM math for Q4_K_M / Q8_0 / BF16 targets so future runs can pick the right card.
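
As a rough illustration of how the VRAM totals from the comment block map to a card choice, the sketch below compares each approximate total against a card's capacity. The function names are hypothetical, the figures are the approximate totals quoted in this PR, and the check ignores any extra headroom a real run might need.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the GB figures are the approximate totals from the PR text.

# vram_needed_gb QUANT: prints the approximate total VRAM in GB.
vram_needed_gb() {
    case "$1" in
        Q4_K_M) echo 22 ;;
        Q8_0)   echo 38 ;;
        BF16)   echo 67 ;;
        *)      echo "unknown quant: $1" >&2; return 1 ;;
    esac
}

# fits_on_card QUANT CARD_GB: succeeds if the estimated total fits on the card.
fits_on_card() {
    local need
    need="$(vram_needed_gb "$1")" || return 1
    [ "$need" -le "$2" ]
}
```

For example, `fits_on_card Q8_0 24` fails while `fits_on_card Q8_0 80` succeeds, which matches the note that Q8_0 and BF16 need the 80 GB card.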

… (#71)

Measured perplexity on Qwen3.5-0.8B-BF16 / wikitext-2 / ctx=512:

| cache-type | PPL     | vs f16    |
|------------|---------|-----------|
| f16        | 19.08   | baseline  |
| q8_0       | 19.08   | lossless  |
| tbq3_0     | 1252.30 | 65x worse |
| tbq4_0     | 1393.00 | 73x worse |

TBQ KV-cache produces near-random output. Likely root cause is statistical: TBQ's rotated-domain codebook was calibrated for weight distributions, not the K/V tensor distributions seen during inference. The encoding scheme itself cannot faithfully represent KV values.

Snoop-kube's cluster audit confirms zero deployments use tbq* KV-cache (every host uses q8_0 or q4_0). DFlash also defaults to q8_0 (PR #65). No production consumer exists.

This PR adds a one-line experimental note to the --cache-type-k/v and --cache-type-k-draft/v-draft help text, referencing issue #70 for the full data + recommendation. Code path stays in place: Markus may have roadmap intent I'm not aware of; this just stops anyone reading --help from assuming tbq* is a usable choice without checking.

Follow-ups if Markus prefers full removal:

* drop tbq3_0/tbq4_0 from common/arg.cpp's kv_cache_types list
* keep the ftypes (TBQ weight quantization is separate from KV use)
* close issues ggml-org#124 + ggml-org#125 as wont-fix
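
The "vs f16" multipliers are each cache type's PPL divided by the f16 baseline. A minimal sketch of that arithmetic, using a hypothetical `ppl_ratio` helper and the values from the table:

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratio of a measured perplexity to the f16 baseline.
# ppl_ratio PPL BASELINE: prints PPL / BASELINE to one decimal place.
ppl_ratio() {
    awk -v ppl="$1" -v base="$2" 'BEGIN { printf "%.1f\n", ppl / base }'
}

ppl_ratio 1252.30 19.08   # tbq3_0 vs f16
ppl_ratio 1393.00 19.08   # tbq4_0 vs f16
```

The two ratios come out at roughly 65.6 and 73.0, consistent with the "65x" and "73x" figures quoted in the commit message.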

marksverdhei merged commit 09b2124 into ht Jun 4, 2026

marksverdhei deleted the chore/bench-dflash-q8-target branch June 4, 2026 17:52

This was referenced Jun 4, 2026

TBQ KV-cache (tbq3_0 / tbq4_0): 65-73x perplexity regression — recommend mark experimental #70

Closed

Hivemind Maintenance Tasks Epoch 4 #86

Closed

chore(scripts): restore PR #65 (Q8_0 default + --target flag) lost in ht rewrite #88

Closed

marksverdhei mentioned this pull request Jun 12, 2026

docs(readme): complete HT Fork Changes inventory with per-change justifications #106

Merged
