scripts(dflash): restore Round-12 bench + parity scaffolds to ht #97
Merged
…gguf guard

Three additive scripts for the DFlash accept-rate investigation (Round-12). None of them touch tracked source, so they sit cleanly alongside the PR #53 squash:

- gguf-meta.py: numpy-free GGUF header reader with --check-instruct, which refuses base fine-tunes and truncated/stub GGUFs. This prevents the base-vs-instruct confound (an -it-trained DFlash drafter benched against a base target).
- bench-dflash-target-sweep.sh: sweeps the TARGET quant (drafter fixed) to test whether target-side quant noise, which moves the target off the drafter's bf16 training distribution, drives the 8% vs ~21% accept gap. Accept is recomputed from raw n_accept/n_drafted counts and reported as mean +/- sample stddev over N runs, with each delta labeled REAL (>1 sigma) or within-noise.
- dflash-logit-parity.py: scaffold for FORWARD logit parity vs the z-lab PyTorch drafter (Round-7b only did weight parity). Constants are read data-driven from the drafter config.json; the reference forward is marked TODO(zlab) pending the z-lab modeling code (the HF repo ships weights only).
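The accept-rate bookkeeping described for bench-dflash-target-sweep.sh can be sketched in Python as follows. This is not the script's code (the script is shell); the function names, the (n_accept, n_drafted) tuple layout, and using the larger of the two sample stddevs as the 1-sigma threshold are all assumptions made for illustration.

```python
import statistics


def accept_rates(runs):
    """Per-run accept rate recomputed from raw (n_accept, n_drafted) counts.

    Hypothetical helper: runs with n_drafted == 0 are skipped.
    """
    return [n_acc / n_dr for n_acc, n_dr in runs if n_dr > 0]


def summarize(runs):
    """Mean and sample stddev (n - 1 denominator) of per-run accept rates."""
    rates = accept_rates(runs)
    if len(rates) < 2:
        raise ValueError("need at least 2 runs with n_drafted > 0")
    return statistics.mean(rates), statistics.stdev(rates)


def delta_verdict(baseline, candidate, k=1.0):
    """Label (candidate - baseline) REAL if it exceeds k sigma, else within-noise.

    Assumption: sigma is the larger of the two sample stddevs.
    """
    m0, s0 = summarize(baseline)
    m1, s1 = summarize(candidate)
    delta = m1 - m0
    sigma = max(s0, s1)
    return delta, ("REAL" if abs(delta) > k * sigma else "within-noise")


if __name__ == "__main__":
    # Made-up counts purely for illustration, not measured results
    base = [(8, 100), (9, 100), (7, 100)]
    cand = [(21, 100), (20, 100), (22, 100)]
    mean, sd = summarize(cand)
    print(f"accept {mean:.3f} +/- {sd:.3f}")
    print(delta_verdict(base, cand))
```

Recomputing each run's rate from the raw counts avoids averaging already-rounded percentages, and the sample stddev (n - 1) gives a less optimistic noise estimate when N is small.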
…data

The guard validated the GGUF header but not that the tensor DATA was present, so a file truncated mid-write (valid header, missing weights) passed --check-instruct and would have been benched, loading garbage or crashing mid-run. Caught empirically: the corrupt gemma-4-31B-it-Q5_K_M.gguf (1.5GB, header intact) slipped through.

read_meta() now walks the tensor-info section, computes the minimum file size implied by the tensor offsets + alignment, and sets _data_complete. --check-instruct rejects when the actual size is below the implied minimum. Same failure class as the HF-xet silent shard drop the download step hit.

Verified: corrupt Q5 (1.5GB < 21.7GB) REFUSED; Q8_0/BF16/Q4_K_M/IQ4_XS all complete and ACCEPT.
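A minimal numpy-free sketch of the header walk and truncation check described above, assuming the GGUF v2+ layout (64-bit counts, 32-byte default alignment when general.alignment is absent). read_meta() and _data_complete mirror the names in the commit, but the body is illustrative, not the script's implementation: it computes only a lower bound (aligned data start plus the largest tensor offset plus one byte) rather than exact per-type tensor sizes, and the instruct test in check_instruct() (looking for "it"/"instruct"/"chat" in general.name or general.finetune) is a hypothetical heuristic.

```python
import os
import struct

# GGUF metadata value types, per the GGUF spec (little-endian)
_SCALAR = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


class _Reader:
    """Cursor over an in-memory prefix of the file; no numpy needed."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def unpack(self, fmt: str):
        size = struct.calcsize(fmt)
        if self.pos + size > len(self.data):
            raise ValueError("header truncated (or larger than the read window)")
        (val,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += size
        return val

    def string(self) -> str:
        n = self.unpack("<Q")
        raw = self.data[self.pos:self.pos + n]
        if len(raw) != n:
            raise ValueError("header truncated inside a string")
        self.pos += n
        return raw.decode("utf-8", errors="replace")

    def value(self, vtype: int):
        if vtype in _SCALAR:
            return self.unpack(_SCALAR[vtype])
        if vtype == _STRING:
            return self.string()
        if vtype == _ARRAY:
            etype, count = self.unpack("<I"), self.unpack("<Q")
            return [self.value(etype) for _ in range(count)]
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_meta(path: str, max_header_bytes: int = 64 << 20) -> dict:
    with open(path, "rb") as f:
        r = _Reader(f.read(max_header_bytes))
    if r.unpack("<4s") != b"GGUF":
        raise ValueError("not a GGUF file")
    if r.unpack("<I") < 2:
        raise ValueError("GGUF v1 (32-bit counts) not handled in this sketch")
    n_tensors, n_kv = r.unpack("<Q"), r.unpack("<Q")
    meta = {}
    for _ in range(n_kv):
        key = r.string()
        meta[key] = r.value(r.unpack("<I"))
    # Tensor-info section: name, n_dims, dims[n_dims], ggml type, data offset
    max_offset = 0
    for _ in range(n_tensors):
        r.string()
        for _dim in range(r.unpack("<I")):
            r.unpack("<Q")
        r.unpack("<I")  # ggml type (unused in this lower-bound sketch)
        max_offset = max(max_offset, r.unpack("<Q"))
    align = meta.get("general.alignment", 32)
    data_start = -(-r.pos // align) * align  # round up to the alignment
    # Lower bound: the last tensor starts at data_start + max_offset and
    # holds at least one byte (an exact end needs per-type byte sizes)
    min_size = data_start + max_offset + (1 if n_tensors else 0)
    size = os.path.getsize(path)
    return {"meta": meta, "size": size, "min_size": min_size,
            "_data_complete": size >= min_size}


def check_instruct(info: dict):
    """Return (accepted, message). The instruct test is a hypothetical heuristic."""
    if not info["_data_complete"]:
        return False, f"REFUSED truncated: {info['size']} < {info['min_size']} bytes"
    meta = info["meta"]
    label = f"{meta.get('general.name', '')} {meta.get('general.finetune', '')}"
    words = set(label.lower().replace("-", " ").replace("_", " ").split())
    if not words & {"it", "instruct", "chat"}:
        return False, "REFUSED base: no instruct marker in general.name/finetune"
    return True, f"OK instruct {meta.get('general.architecture', '?')}"
```

Even a lower bound is enough for the failure in the commit, where the file on disk was more than ten times smaller than the size its tensor offsets imply.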
Summary
Restores three scripts from the Round-12 DFlash investigation (June 1, originally on the unpushed chore/dflash-bench-scripts branch at HEAD 4d21baca4). They were never landed on ht and went out of reach when the 2026-06-04 ht history rewrite removed that branch from the visible refs. The commits (84a0da7bc, 5f4598ee9) are still in the object store; this PR cherry-picks them onto current ht (f6feddb49).

The scripts are needed to start the DFlash parity workstream: Phase 0 verified that γ master-sync fixed the n_outputs_max crash, so the next bottleneck is the ~20× acceptance-quality gap vs the z-lab reference, and the logit-parity harness is the tool for localizing it.
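One way a logit-parity harness can localize a gap is to compare the two forwards' logits position by position and report the first position that fails. The sketch below assumes both forwards have already produced plain Python lists of logits per position; the metrics (max absolute difference, top-1 agreement, KL divergence of the candidate distribution from the reference), the atol default, and all function names are illustrative assumptions, not what dflash-logit-parity.py implements.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocab row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def parity_report(ref, ours, atol=1e-3):
    """Compare one position's logits: reference forward vs candidate forward."""
    if len(ref) != len(ours):
        raise ValueError(f"vocab mismatch: {len(ref)} vs {len(ours)}")
    max_abs = max(abs(a - b) for a, b in zip(ref, ours))
    top1_ref = max(range(len(ref)), key=ref.__getitem__)
    top1_ours = max(range(len(ours)), key=ours.__getitem__)
    lp, lq = log_softmax(ref), log_softmax(ours)
    # KL(P_ref || P_ours) in nats; 0 when the two distributions match
    kl = sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
    return {
        "max_abs_diff": max_abs,
        "top1_match": top1_ref == top1_ours,
        "kl": kl,
        "pass": max_abs <= atol and top1_ref == top1_ours,
    }


def first_divergence(ref_rows, our_rows, atol=1e-3):
    """Index of the first position whose logits fail parity, or None."""
    for i, (r, o) in enumerate(zip(ref_rows, our_rows)):
        if not parity_report(r, o, atol)["pass"]:
            return i
    return None
```

The same first-failure pattern could be applied per layer instead of per position if both forwards expose intermediate hidden states.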
Scripts restored
- scripts/gguf-meta.py: --check-instruct guard. Rejects base fine-tunes and truncated/stub GGUFs before they reach a bench.
- scripts/bench-dflash-target-sweep.sh: sweeps the target quant with the drafter fixed; recomputes accept from raw n_accept/n_drafted counts and reports mean ± stddev with a noise-aware delta verdict.
- scripts/dflash-logit-parity.py: forward logit-parity scaffold vs the z-lab PyTorch drafter; wiring in the reference forward marked TODO(zlab) is the next implementation step.

All three are pure additions under scripts/; no source code is touched.

Why this and not the originals
The originals were on a feature branch that no longer exists in any visible ref after the 2026-06-04 force-push rewrite. Cherry-picking onto ht (a) preserves Markus's authorship and original commit messages, (b) removes the "shared-worktree hazard" called out in feedback_shared_path_writes (scripts disappearing on branch switch), and (c) makes them tracked artifacts for the long-term parity work rather than orphan-branch holdovers.

Test plan
- scripts/gguf-meta.py --check-instruct models/gemma-4-31B-it-IQ4_XS.gguf → OK ... instruct gemma4
- scripts/bench-dflash-target-sweep.sh --help → renders cleanly
- scripts/dflash-logit-parity.py reference path runs (deferred: needs TODO(zlab) wired in next PR)