docs(audit): AUDIT-Q4K-SHAPE-001 — shape-swap is benign on 256-divisible artifacts (PMAT-690 defect 4) #1784
Merged
…ivisible-both-dims artifacts (PMAT-690 defect 4)

Closes task #110 by empirically falsifying the a-priori concern that pre-v0.34.0 Q4_K artifacts (paiml/qwen2.5-coder-7b-apache-q4k-v1 etc.) might be silently degraded.

Central finding
===============

When BOTH weight-tensor dims are 256-divisible (Qwen2 1.5B hidden=1536, intermediate=8960; Qwen2 7B hidden=3584, intermediate=18944), the pre-fix `quantize_q4_k_matrix(data, [in, out])` and the post-fix `quantize_q4_k_matrix(data, [out, in])` produce **byte-identical** output.

Mechanism: the function consumes data in linear order, chunking into fixed-size 256-element super-blocks via `quantize_q4_k`. When both dims are 256-multiples, the per-iteration row boundary sits on a super-block boundary anyway, so the swap is a no-op for the data consumed and the output emitted.

The bug only manifests when `cols % 256 != 0` (e.g., Qwen2 0.5B hidden=896 → cols=896 → pads to 1024, shifting subsequent super-blocks off-stride). That case is independently caught by the defect-2 K-divisibility fallback, which forces F32 instead of quantizing. So either way, no shipped artifact is degraded by the shape-swap path.

Falsification
=============

New in-tree test `audit_q4k_shape_swap_byte_identical_when_both_dims_divisible` in gguf_export_config.rs::q4k_divisibility_tests. Generates a 256×512 matrix with heterogeneous per-row statistics (so any layout-sensitive divergence would be amplified) and asserts `assert_eq!(quantize_q4_k_matrix(data, [256, 512]), quantize_q4_k_matrix(data, [512, 256]))`. Passes 2026-05-18.

Impact on MODEL-1
=================

`paiml/qwen2.5-coder-7b-apache-q4k-v1` at AC-SHIP1-005 = 86.59% HumanEval pass@1. The ~4.4pp gap vs upstream Qwen2.5-Coder-7B-Instruct Q4_K_M (~91%) is NOT defect 3; it is the difference between pure Q4_K and Q4_K_M (mixed precision, which keeps sensitive tensors at Q6_K). Recovery path is to add Q4_K_M export support (filed as Q4K-AUDIT-004, non-blocking).

Doc
===

docs/specifications/audits/q4k-shape-swap-impact.md v1.1.0: full math, empirical evidence, action items closed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
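The row-boundary argument in the commit message can be illustrated with a small index model. This is only a sketch under stated assumptions: `padded_cols` and `slot_sources` are hypothetical helpers, not functions from the repository, and the model assumes the quantizer walks the data row by row and zero-pads each row up to the next multiple of 256 before cutting super-blocks.

```rust
/// Q4_K super-block length in elements (256, as stated in the commit message).
const SUPER_BLOCK_LEN: usize = 256;

/// Hypothetical helper: row length after rounding up to a super-block multiple.
fn padded_cols(cols: usize) -> usize {
    cols.div_ceil(SUPER_BLOCK_LEN) * SUPER_BLOCK_LEN
}

/// Hypothetical model of the row-wise walk: for a `[rows, cols]` shape, returns
/// the flat source index feeding each quantizer input slot. `None` marks a
/// padding slot that carries no source data.
fn slot_sources(shape: [usize; 2]) -> Vec<Option<usize>> {
    let [rows, cols] = shape;
    let padded = padded_cols(cols);
    let mut slots = Vec::with_capacity(rows * padded);
    for r in 0..rows {
        for c in 0..padded {
            slots.push(if c < cols { Some(r * cols + c) } else { None });
        }
    }
    slots
}

#[cfg(test)]
mod layout_tests {
    use super::*;

    #[test]
    fn swap_is_a_no_op_when_both_dims_are_divisible() {
        // Same slots in the same order, so every super-block sees the same data.
        assert_eq!(slot_sources([256, 512]), slot_sources([512, 256]));
    }

    #[test]
    fn swap_changes_layout_when_cols_is_not_divisible() {
        // 896 columns pad to 1024, pushing later super-blocks off-stride.
        assert_eq!(padded_cols(896), 1024);
        assert_ne!(slot_sources([512, 896]), slot_sources([896, 512]));
    }
}
```

Under this model the swapped shapes differ only through the padding term, which vanishes when `cols` is a multiple of 256. The shapes `[512, 896]` and `[896, 512]` are illustrative and not taken from a shipped model.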
noahgift added a commit that referenced this pull request May 18, 2026
#1798)

Adds a chown step BEFORE the cargo step that runs `docker run --rm` as root and chowns the per-RUN target dir + cargo registry to noah:1000.

## Why

Docker's bind-mount creates missing host directories with the daemon's uid (root). Since #1693 switched to per-RUN target dirs (`/mnt/nvme-raid0/targets/aprender-ci/<PR>/run-<RUN_ID>`), every fresh run gets a root-owned target dir. Cargo (running as uid 1000 inside the container) cannot write to it and fails with:

    error: failed to create directory `/workspace/target/debug`: No such file or directory (os error 2)

The existing post-job chown (line 245) was meant to fix this for the NEXT run's git-clean, but per-RUN paths invalidate that since each run gets a brand-new root-owned dir. First-runs always fail.

This was observed across 6+ in-flight PRs (#1784, #1791-#1797) on 2026-05-18: every "infrastructure flake" turned out to be the same ownership bug at different cargo entry points.

## Fix

Pre-cargo chown step. Idempotent (`|| true`). Runs the existing sovereign-ci image as root for the chown, then exits; adds maybe 2s to runs.

Matches the pattern of the post-job chown step that already exists; just moves it to BEFORE cargo as well.

## Manual one-shot

The 6 currently-stuck PRs were unblocked by manually chowning their per-RUN dirs on the runner host:

    ssh intel sudo chown -R 1000:1000 \
      /mnt/nvme-raid0/targets/aprender-ci/{1792,1793,1794,1796,1797,main}/run-*

After this PR lands, future runs will fix themselves.

Co-authored-by: Noah Gift <claude@noahgift.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary

Closes task #110 (P3-C-prep defect 4) by empirically falsifying the a-priori concern that pre-v0.34.0 Q4_K artifacts might have been silently degraded by the `quantize_q4_k_matrix` shape-swap.

Finding

When BOTH weight-tensor dims are 256-divisible (Qwen2 1.5B `hidden=1536, intermediate=8960`; Qwen2 7B `hidden=3584, intermediate=18944`), the pre-fix `quantize_q4_k_matrix(data, [in, out])` and post-fix `quantize_q4_k_matrix(data, [out, in])` produce byte-identical output.

Mechanism: the function consumes data linearly, chunking into fixed-size 256-element super-blocks. When both dims are 256-multiples, the per-iteration row boundary lands on a super-block boundary anyway, so the swap is a no-op for the output bytes.

The shape-swap bug only manifests when `cols % 256 != 0` (e.g., Qwen2 0.5B `hidden=896`). That case is already caught by the defect-2 K-divisibility fallback (forces F32 instead of quantizing). So no shipped artifact is degraded by defect 3.

Falsifier

New unit test `audit_q4k_shape_swap_byte_identical_when_both_dims_divisible` in `gguf_export_config.rs::q4k_divisibility_tests`. Generates a 256×512 matrix with heterogeneous per-row statistics (so any layout-sensitive divergence would be amplified) and asserts byte-identity of the two quantization calls. Passes 2026-05-18.

Impact

- `paiml/qwen2.5-coder-7b-apache-q4k-v1`: pre-v0.34.0 artifact is bit-equivalent to a hypothetical post-fix re-export. No re-export needed.
- `paiml/albor-370m-v1`: uses the post-fix path AND defect-2 F32 fallback (hidden=896). Already correct.
- `apr export`: unaffected.

Why MODEL-1's ~4.4pp HumanEval gap vs upstream is NOT this bug

`paiml/qwen2.5-coder-7b-apache-q4k-v1` at AC-SHIP1-005 = 86.59% vs upstream `Qwen/Qwen2.5-Coder-7B-Instruct` Q4_K_M ≈ 91%. Post-finding, the gap is attributable to:

- Pure Q4_K vs Q4_K_M (mixed precision): the `_M` variant keeps sensitive tensors (attn output, ffn down) at Q6_K. Aprender's Q4_K is pure throughout.

Recovery path is to add Q4_K_M export support to `apr export` (filed as Q4K-AUDIT-004, non-blocking, separate ticket).

Test plan

- `cargo test -p aprender-core --lib audit_q4k_shape_swap` passes

🤖 Generated with Claude Code
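For readers without the repository at hand, the shape of the falsifier test can be sketched against a toy quantizer. Everything below is an assumption made for illustration: `toy_quantize_block`, `toy_quantize_matrix`, and `heterogeneous_matrix` are hypothetical stand-ins, the byte layout is NOT the real Q4_K format, and the real `quantize_q4_k_matrix` may differ in signature and padding details. The sketch only reproduces the property under test: a row-wise, pad-to-256 quantizer emits the same bytes for `[256, 512]` and `[512, 256]`.

```rust
const SUPER_BLOCK: usize = 256;

/// Toy stand-in for a super-block quantizer (not the real Q4_K layout):
/// writes the block's min and max as little-endian f32, then packs one
/// 4-bit code per element, two codes per byte.
fn toy_quantize_block(block: &[f32], out: &mut Vec<u8>) {
    let min = block.iter().copied().fold(f32::INFINITY, f32::min);
    let max = block.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 15.0 / (max - min) } else { 0.0 };
    out.extend_from_slice(&min.to_le_bytes());
    out.extend_from_slice(&max.to_le_bytes());
    for pair in block.chunks(2) {
        let lo = ((pair[0] - min) * scale).round() as u8;
        let hi = ((pair[1] - min) * scale).round() as u8;
        out.push(lo | (hi << 4));
    }
}

/// Hypothetical row-wise matrix quantizer: zero-pads each row of `shape[1]`
/// elements up to a multiple of 256, then quantizes super-block by super-block.
fn toy_quantize_matrix(data: &[f32], shape: [usize; 2]) -> Vec<u8> {
    let [rows, cols] = shape;
    assert_eq!(data.len(), rows * cols, "shape must match data length");
    let padded_cols = cols.div_ceil(SUPER_BLOCK) * SUPER_BLOCK;
    let mut row_buf = vec![0.0f32; padded_cols];
    let mut out = Vec::new();
    for row in data.chunks(cols) {
        // Only the first `cols` slots are overwritten; the tail stays zero.
        row_buf[..cols].copy_from_slice(row);
        for block in row_buf.chunks(SUPER_BLOCK) {
            toy_quantize_block(block, &mut out);
        }
    }
    out
}

/// Deterministic test matrix whose value range grows with the row index,
/// so blocks taken from different rows have different statistics.
fn heterogeneous_matrix(rows: usize, cols: usize) -> Vec<f32> {
    (0..rows * cols)
        .map(|i| {
            let row_scale = 1.0 + (i / cols) as f32 * 0.25;
            ((i * 31 % 97) as f32 - 48.0) * row_scale
        })
        .collect()
}

#[cfg(test)]
mod falsifier_sketch_tests {
    use super::*;

    #[test]
    fn byte_identical_when_both_dims_divisible() {
        let data = heterogeneous_matrix(256, 512);
        assert_eq!(
            toy_quantize_matrix(&data, [256, 512]),
            toy_quantize_matrix(&data, [512, 256])
        );
    }

    #[test]
    fn diverges_when_cols_not_divisible() {
        let data = heterogeneous_matrix(512, 896);
        assert_ne!(
            toy_quantize_matrix(&data, [512, 896]),
            toy_quantize_matrix(&data, [896, 512])
        );
    }
}
```

The second test is a control and is not part of the falsifier described above: it checks that the toy quantizer is layout-sensitive at all, so the byte-identity assertion is not passing vacuously.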