scripts(dflash): restore Round-12 bench + parity scaffolds to ht #97

Merged: marksverdhei merged 2 commits into ht from chore/dflash-round12-scripts-restore on Jun 12, 2026

@marksverdhei

Summary

Restores three scripts from the Round-12 DFlash investigation (June 1, originally on the unpushed chore/dflash-bench-scripts branch at HEAD 4d21baca4). They were never landed on ht and went out of reach when the 2026-06-04 ht history rewrite removed that branch from the visible refs. The commits (84a0da7bc, 5f4598ee9) are still in the object store; this PR cherry-picks them onto current ht (f6feddb49).

The scripts are needed to start the DFlash parity workstream: Phase 0 verified that the γ master-sync fixed the n_outputs_max crash, so the next bottleneck is the ~20× acceptance-quality gap vs the z-lab reference, and the logit-parity harness is the tool for localizing it.
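As a rough illustration of what "localizing" with per-position logit parity could mean, the sketch below scans two sequences of logit vectors and reports the first position where they disagree. It is not taken from scripts/dflash-logit-parity.py: the function names, the tolerance `atol`, and the "max absolute difference or argmax mismatch" criterion are all assumptions chosen for the example.

```python
def compare_position(ref_logits, test_logits):
    """Return (max_abs_diff, argmax_match) for one token position."""
    max_abs = max(abs(r - t) for r, t in zip(ref_logits, test_logits))
    ref_top = max(range(len(ref_logits)), key=ref_logits.__getitem__)
    test_top = max(range(len(test_logits)), key=test_logits.__getitem__)
    return max_abs, ref_top == test_top


def first_divergence(ref, test, atol=1e-2):
    """Index of the first position whose logits diverge, or None.

    ref and test are lists of per-position logit vectors (hypothetical
    layout). A position counts as divergent when the largest absolute
    difference exceeds atol or the top-1 token differs.
    """
    for pos, (r, t) in enumerate(zip(ref, test)):
        max_abs, same_top = compare_position(r, t)
        if max_abs > atol or not same_top:
            return pos
    return None


# Made-up logits: position 0 agrees within tolerance, position 1 does not.
ref = [[1.0, 2.0, 0.5], [0.1, 0.2, 3.0]]
test = [[1.0, 2.001, 0.5], [0.1, 3.5, 3.0]]
print(first_divergence(ref, test))
```

Reporting the first divergent position, rather than a single aggregate error, is what would let a harness like this point at where the two forward passes start to disagree.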

Scripts restored

| File | Purpose |
| --- | --- |
| scripts/gguf-meta.py | Numpy-free GGUF header reader with --check-instruct guard. Rejects base fine-tune and truncated/stub GGUFs before they reach a bench. |
| scripts/bench-dflash-target-sweep.sh | Sweeps the TARGET quant (drafter fixed) to isolate target-side quant noise. Uses raw n_accept/n_drafted counts and reports mean ± stddev with a noise-aware delta verdict. |
| scripts/dflash-logit-parity.py | Per-position logit-parity harness scaffold (z-lab PyTorch reference vs our llama.cpp drafter). The reference-forward TODO(zlab) is the next implementation step. |

All three are pure additions under scripts/ — no source code is touched.
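For readers unfamiliar with what a numpy-free GGUF header read involves, here is a minimal sketch using only the standard library. It is not the code in scripts/gguf-meta.py; `read_fixed_header` is a hypothetical name, and the layout assumed is the GGUF v2+ fixed header (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count, little-endian). How --check-instruct decides what counts as an instruct model is not shown here.

```python
import struct

GGUF_MAGIC = b"GGUF"
FIXED_HEADER = struct.Struct("<4sIQQ")  # magic, version, n_tensors, n_kv


def read_fixed_header(buf: bytes) -> dict:
    """Parse the fixed-size GGUF header from the start of buf.

    Raises ValueError for files too short to hold the header or with
    the wrong magic, which is the cheapest way to catch stub files.
    """
    if len(buf) < FIXED_HEADER.size:
        raise ValueError("truncated: fixed header is incomplete")
    magic, version, n_tensors, n_kv = FIXED_HEADER.unpack_from(buf, 0)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}


# Synthetic header bytes for illustration; not read from a real model file.
sample = FIXED_HEADER.pack(GGUF_MAGIC, 3, 2, 5)
print(read_fixed_header(sample))
```

In practice the caller would pass the first bytes of the file, for example `open(path, "rb").read(FIXED_HEADER.size)`, and then continue parsing the metadata key/value section that follows.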

Why this and not the originals

The originals were on a feature branch that no longer exists in any visible ref after the 2026-06-04 force-push rewrite. Cherry-picking onto ht (a) preserves Markus's authorship and original commit messages, (b) removes the "shared-worktree hazard" called out in feedback_shared_path_writes (scripts disappearing on branch switch), and (c) makes them tracked artifacts for the long-term parity work rather than orphan-branch holdovers.

Test plan

  • scripts/gguf-meta.py --check-instruct models/gemma-4-31B-it-IQ4_XS.gguf → OK ... instruct gemma4
  • scripts/bench-dflash-target-sweep.sh --help → renders cleanly
  • scripts/dflash-logit-parity.py reference path runs (deferred — needs TODO(zlab) wired in next PR)

…gguf guard

Three additive scripts for the DFlash accept-rate investigation (Round-12),
none touching tracked source so they sit cleanly alongside the PR #53 squash:

- gguf-meta.py: numpy-free GGUF header reader with --check-instruct, which
  refuses base-fine-tune and truncated/stub GGUFs. Prevents the base-vs-instruct
  confound (an -it-trained DFlash drafter benched against a base target).
- bench-dflash-target-sweep.sh: sweeps the TARGET quant (drafter fixed) to test
  whether target-side quant noise off the drafter's bf16 training distribution
  drives the 8% vs ~21% accept gap. Accept recomputed from raw n_accept/n_drafted
  counts; mean +/- sample stddev over N runs; REAL(>1sigma)/within-noise deltas.
- dflash-logit-parity.py: scaffold for FORWARD logit parity vs the z-lab PyTorch
  drafter (Round-7b only did weight parity). Constants read data-driven from the
  drafter config.json; reference forward marked TODO(zlab) pending the z-lab
  modeling code (HF repo ships weights only).
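The accept-rate statistics described for bench-dflash-target-sweep.sh can be sketched in a few lines of Python. This is an illustration only: the real script is a shell script, the function names are hypothetical, and the exact rule behind REAL(>1sigma) is not spelled out in the commit message, so the sketch assumes a delta is REAL when it exceeds the larger of the two sample standard deviations.

```python
import statistics


def accept_rate(n_accept: int, n_drafted: int) -> float:
    """Accept rate recomputed from raw counts for a single run."""
    if n_drafted == 0:
        raise ValueError("n_drafted must be > 0")
    return n_accept / n_drafted


def summarize(runs):
    """Return (mean, sample stddev) over a list of (n_accept, n_drafted)."""
    rates = [accept_rate(a, d) for a, d in runs]
    mean = statistics.mean(rates)
    # Sample stddev (n - 1 denominator) needs at least two runs.
    sd = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return mean, sd


def verdict(base, other) -> str:
    """Classify the delta between two (mean, stddev) summaries.

    Assumed rule: REAL when |delta| exceeds the larger stddev.
    """
    delta = other[0] - base[0]
    sigma = max(base[1], other[1])
    return "REAL" if abs(delta) > sigma else "within-noise"


# Made-up counts for two target quants, three runs each.
quant_a = summarize([(8, 100), (10, 100), (9, 100)])
quant_b = summarize([(21, 100), (20, 100), (22, 100)])
print(quant_a, quant_b, verdict(quant_a, quant_b))
```

Recomputing the rate from raw counts, rather than averaging pre-rounded percentages, keeps the per-run values exact before the mean and stddev are taken.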
…data

The guard validated the GGUF header but not that the tensor DATA was present, so
a file truncated mid-write (valid header, missing weights) passed --check-instruct
and would have been benched — loading garbage or crashing mid-run. Caught
empirically: the corrupt gemma-4-31B-it-Q5_K_M.gguf (1.5GB, header intact) slipped
through.

read_meta() now walks the tensor-info section, computes the minimum file size
implied by the tensor offsets + alignment, and sets _data_complete. --check-instruct
rejects when actual size < implied minimum. Same failure class as the HF-xet silent
shard drop the download step hit. Verified: corrupt Q5 (1.5GB < 21.7GB) REFUSED;
Q8_0/BF16/Q4_K_M/IQ4_XS all complete and ACCEPT.
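The size check described above can be sketched as follows. This is not the read_meta() implementation: the function names are hypothetical, and the sketch assumes the usual GGUF layout in which tensor offsets are relative to the start of the data section, and the data section begins at the end of the tensor-info section rounded up to the alignment (32 bytes unless the file overrides it). Because it ignores the size of the last tensor, the result is only a lower bound on the real file size.

```python
def align_up(n: int, alignment: int) -> int:
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def implied_min_size(tensor_info_end: int, tensor_offsets, alignment: int = 32) -> int:
    """Lower bound on file size implied by the header.

    tensor_info_end: byte position where the tensor-info section ends.
    tensor_offsets: per-tensor offsets relative to the data section start.
    """
    data_start = align_up(tensor_info_end, alignment)
    if not tensor_offsets:
        return data_start
    return data_start + max(tensor_offsets)


def data_complete(actual_size: int, tensor_info_end: int, tensor_offsets,
                  alignment: int = 32) -> bool:
    """True when the file is at least as large as the header implies."""
    return actual_size >= implied_min_size(tensor_info_end, tensor_offsets, alignment)


# Made-up numbers: header ends at byte 100, last tensor starts at offset 4096.
print(implied_min_size(100, [0, 64, 4096]))
print(data_complete(1000, 100, [0, 64, 4096]))
```

A file truncated mid-write keeps its header intact, so only a comparison like this against the actual size on disk (for example from `os.path.getsize`) would flag it.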
@marksverdhei merged commit 49e6d41 into ht on Jun 12, 2026
1 check failed
@marksverdhei deleted the chore/dflash-round12-scripts-restore branch on June 12, 2026 18:36