scripts(dflash): restore Round-12 bench + parity scaffolds to ht by marksverdhei · Pull Request #97 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-08T09:42:21Z

Summary

Restores three scripts from the Round-12 DFlash investigation (June 1, originally on the unpushed chore/dflash-bench-scripts branch at HEAD 4d21baca4). They were never landed on ht and went out of reach when the 2026-06-04 ht history rewrite removed that branch from the visible refs. The commits (84a0da7bc, 5f4598ee9) are still in the object store; this PR cherry-picks them onto current ht (f6feddb49).

The scripts are needed to start the DFlash parity workstream. Phase 0 verified that the γ master-sync fixed the n_outputs_max crash, so the next bottleneck is the ~20× acceptance-quality gap vs the z-lab reference, and the logit-parity harness is the tool for localizing it.

Scripts restored

| File | Purpose |
| --- | --- |
| `scripts/gguf-meta.py` | Numpy-free GGUF header reader with `--check-instruct` guard. Rejects base fine-tune and truncated/stub GGUFs before they reach a bench. |
| `scripts/bench-dflash-target-sweep.sh` | Sweeps the TARGET quant (drafter fixed) to isolate target-side quant noise. Uses raw `n_accept`/`n_drafted` counts and reports mean ± stddev with a noise-aware delta verdict. |
| `scripts/dflash-logit-parity.py` | Per-position logit-parity harness scaffold (z-lab PyTorch reference vs our llama.cpp drafter). The reference-forward `TODO(zlab)` is the next implementation step. |

All three are pure additions under scripts/ — no source code is touched.
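
For readers unfamiliar with the format, here is a minimal sketch of how the fixed-size part of a GGUF header can be read with only the standard library, which is what makes a numpy-free reader feasible. It is not the code in `scripts/gguf-meta.py`: the function name `read_gguf_header` is hypothetical, and the sketch assumes a little-endian GGUF file of version 2 or later (64-bit counts). It stops before the metadata key/value section, which a `--check-instruct` style guard would also need to parse.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 magic + 4 version + 8 tensor count + 8 metadata KV count


def read_gguf_header(path):
    """Return (version, tensor_count, kv_count) from the fixed GGUF header.

    Hypothetical helper for illustration; assumes GGUF v2+ and little-endian.
    """
    with open(path, "rb") as f:
        head = f.read(HEADER_SIZE)
    if len(head) < HEADER_SIZE:
        raise ValueError("file too short to contain a GGUF header")
    if head[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # "<IQQ": uint32 version, uint64 tensor count, uint64 metadata KV count
    version, tensor_count, kv_count = struct.unpack("<IQQ", head[4:])
    return version, tensor_count, kv_count
```

A stub file that is shorter than the header, or that does not start with the `GGUF` magic, raises `ValueError` here before anything else is attempted.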

Why this and not the originals

The originals were on a feature branch that no longer exists in any visible ref after the 2026-06-04 force-push rewrite. Cherry-picking onto ht (a) preserves Markus's authorship and original commit messages, (b) removes the "shared-worktree hazard" called out in feedback_shared_path_writes (scripts disappearing on branch switch), and (c) makes them tracked artifacts for the long-term parity work rather than orphan-branch holdovers.

Test plan

- `scripts/gguf-meta.py --check-instruct models/gemma-4-31B-it-IQ4_XS.gguf` → OK ... instruct gemma4
- `scripts/bench-dflash-target-sweep.sh --help` → renders cleanly
- `scripts/dflash-logit-parity.py` reference path runs (deferred; needs `TODO(zlab)` wired in next PR)

…gguf guard

Three additive scripts for the DFlash accept-rate investigation (Round-12), none touching tracked source so they sit cleanly alongside the PR #53 squash:

- gguf-meta.py: numpy-free GGUF header reader with --check-instruct, which refuses base-fine-tune and truncated/stub GGUFs. Prevents the base-vs-instruct confound (an -it-trained DFlash drafter benched against a base target).
- bench-dflash-target-sweep.sh: sweeps the TARGET quant (drafter fixed) to test whether target-side quant noise off the drafter's bf16 training distribution drives the 8% vs ~21% accept gap. Accept recomputed from raw n_accept/n_drafted counts; mean +/- sample stddev over N runs; REAL(>1sigma)/within-noise deltas.
- dflash-logit-parity.py: scaffold for FORWARD logit parity vs the z-lab PyTorch drafter (Round-7b only did weight parity). Constants read data-driven from the drafter config.json; reference forward marked TODO(zlab) pending the z-lab modeling code (HF repo ships weights only).
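
The statistics named in this commit message (accept rate recomputed from raw counts, mean and sample stddev over N runs, and a REAL vs within-noise verdict) can be sketched as follows. This is an illustration, not the logic in `bench-dflash-target-sweep.sh`. All function names are hypothetical, and the choice of sigma for the 1-sigma threshold (the larger of the two sample stddevs) is an assumption, since the message does not say which sigma the script uses.

```python
from statistics import mean, stdev


def accept_rate(n_accept, n_drafted):
    """Accept rate from raw counts; 0.0 if nothing was drafted."""
    return n_accept / n_drafted if n_drafted else 0.0


def summarize(runs):
    """runs: list of (n_accept, n_drafted) pairs -> (mean, sample stddev)."""
    rates = [accept_rate(a, d) for a, d in runs]
    # Sample stddev needs at least two runs; report 0.0 for a single run.
    sd = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), sd


def verdict(base, other):
    """Compare two (mean, stddev) summaries and label the delta."""
    delta = other[0] - base[0]
    # ASSUMPTION: threshold is the larger of the two sample stddevs.
    sigma = max(base[1], other[1])
    label = "REAL" if abs(delta) > sigma else "within-noise"
    return delta, label
```

For example, `verdict(summarize(runs_q4), summarize(runs_bf16))` would label the difference between two target quants, where `runs_q4` and `runs_bf16` are hypothetical lists of per-run counts.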

…data

The guard validated the GGUF header but not that the tensor DATA was present, so a file truncated mid-write (valid header, missing weights) passed --check-instruct and would have been benched, loading garbage or crashing mid-run. Caught empirically: the corrupt gemma-4-31B-it-Q5_K_M.gguf (1.5GB, header intact) slipped through.

read_meta() now walks the tensor-info section, computes the minimum file size implied by the tensor offsets + alignment, and sets _data_complete. --check-instruct rejects when actual size < implied minimum. Same failure class as the HF-xet silent shard drop the download step hit.

Verified: corrupt Q5 (1.5GB < 21.7GB) REFUSED; Q8_0/BF16/Q4_K_M/IQ4_XS all complete and ACCEPT.
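
The completeness check described in this commit message comes down to a small calculation, sketched below under stated assumptions. It is not the body of `read_meta()`. The helper names are hypothetical, the per-tensor byte sizes are taken as already-computed inputs (deriving them from shape and quant type is omitted), and the default alignment of 32 follows the GGUF convention for `general.alignment`. Tensor offsets are treated as relative to the start of the aligned data section.

```python
import os


def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def implied_min_size(tensor_info_end, tensors, alignment=32):
    """Smallest file size consistent with the tensor-info section.

    tensor_info_end: byte position where the tensor-info section ends.
    tensors: iterable of (offset, nbytes), offsets relative to the data section.
    """
    data_start = align_up(tensor_info_end, alignment)
    data_end = max((off + nbytes for off, nbytes in tensors), default=0)
    return data_start + data_end


def data_complete(path, tensor_info_end, tensors, alignment=32):
    """True if the file is at least as large as its header implies."""
    needed = implied_min_size(tensor_info_end, tensors, alignment)
    return os.path.getsize(path) >= needed
```

A file truncated mid-write keeps an intact header, so its tensor offsets still imply the full size, and the comparison against the actual size on disk fails.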

marksverdhei added 2 commits June 8, 2026 11:41

marksverdhei mentioned this pull request Jun 8, 2026

scripts(dflash): deployment-parity prompt suite + bench harness #98

Merged

4 tasks

marksverdhei merged commit 49e6d41 into ht Jun 12, 2026
1 check failed

marksverdhei deleted the chore/dflash-round12-scripts-restore branch June 12, 2026 18:36

marksverdhei mentioned this pull request Jun 12, 2026

docs(readme): complete HT Fork Changes inventory with per-change justifications #106

Merged
