docs(audit): AUDIT-Q4K-SHAPE-001 — shape-swap is benign on 256-divisible artifacts (PMAT-690 defect 4) by noahgift · Pull Request #1784 · paiml/aprender

noahgift · 2026-05-18T07:34:51Z

Summary

Closes task #110 (P3-C-prep defect 4) by empirically falsifying the a-priori concern that pre-v0.34.0 Q4_K artifacts might have been silently degraded by the quantize_q4_k_matrix shape-swap.

Finding

When BOTH weight-tensor dims are 256-divisible (Qwen2 1.5B hidden=1536, intermediate=8960; Qwen2 7B hidden=3584, intermediate=18944), the pre-fix quantize_q4_k_matrix(data, [in, out]) and post-fix quantize_q4_k_matrix(data, [out, in]) produce byte-identical output.

Mechanism: the function consumes data linearly, chunking into fixed-size 256-element super-blocks. When both dims are 256-multiples, the per-iteration row boundary lands on a super-block boundary anyway — the swap is a no-op for the output bytes.

The shape-swap bug only manifests when cols % 256 != 0 (e.g., Qwen2 0.5B hidden=896). That case is already caught by the defect-2 K-divisibility fallback (forces F32 instead of quantizing). So no shipped artifact is degraded by defect 3.

Falsifier

New unit test audit_q4k_shape_swap_byte_identical_when_both_dims_divisible in gguf_export_config.rs::q4k_divisibility_tests. Generates a 256×512 matrix with heterogeneous-per-row statistics (so any layout-sensitive divergence would be amplified) and asserts byte-identity of the two quantization calls. Passes 2026-05-18.

Impact

paiml/qwen2.5-coder-7b-apache-q4k-v1 — pre-v0.34.0 artifact is bit-equivalent to a hypothetical post-fix re-export. No re-export needed.
paiml/albor-370m-v1 — uses the post-fix path AND defect-2 F32 fallback (hidden=896). Already correct.
Any other Qwen2 1.5B / 7B Q4_K GGUF previously exported with apr export — unaffected.

Why MODEL-1's ~4.4pp HumanEval gap vs upstream is NOT this bug

paiml/qwen2.5-coder-7b-apache-q4k-v1 at AC-SHIP1-005 = 86.59% vs upstream Qwen/Qwen2.5-Coder-7B-Instruct Q4_K_M ≈ 91%. Post-finding, the gap is attributable to:

Q4_K vs Q4_K_M (mixed precision) — llama.cpp's _M variant keeps sensitive tensors (attn output, ffn down) at Q6_K. Aprender's Q4_K is pure throughout.
Different cumulative rounding paths from different f32→Q4_K implementations.

Recovery path is to add Q4_K_M export support to apr export (filed as Q4K-AUDIT-004, non-blocking, separate ticket).

Test plan

cargo test -p aprender-core --lib audit_q4k_shape_swap passes
All 8 q4k_divisibility_tests pass (1 new + 7 pre-existing)
Audit doc explains math + cites empirical falsifier + closes action items

🤖 Generated with Claude Code

…ivisible-both-dims artifacts (PMAT-690 defect 4) Closes task #110 by empirically falsifying the a-priori concern that pre-v0.34.0 Q4_K artifacts (paiml/qwen2.5-coder-7b-apache-q4k-v1 etc.) might be silently degraded. Central finding =============== When BOTH weight-tensor dims are 256-divisible (Qwen2 1.5B hidden=1536, intermediate=8960; Qwen2 7B hidden=3584, intermediate=18944), the pre-fix `quantize_q4_k_matrix(data, [in, out])` and the post-fix `quantize_q4_k_matrix(data, [out, in])` produce **byte-identical** output. Mechanism: the function consumes data in linear order, chunking into fixed-size 256-element super-blocks via `quantize_q4_k`. When both dims are 256-multiples, the per-iteration row boundary sits on a super-block boundary anyway, so the swap is a no-op for the data consumed and the output emitted. The bug only manifests when `cols % 256 != 0` (e.g., Qwen2 0.5B hidden=896 → cols=896 → pads to 1024, shifting subsequent super-blocks off-stride). That case is independently caught by the defect-2 K-divisibility fallback which forces F32 instead of quantizing. So either way, no shipped artifact is degraded by the shape-swap path. Falsification ============= New in-tree test `audit_q4k_shape_swap_byte_identical_when_both_dims_divisible` in gguf_export_config.rs::q4k_divisibility_tests. Generates a 256×512 matrix with heterogeneous per-row statistics (so any layout-sensitive divergence would be amplified) and asserts `assert_eq!(quantize_q4_k_matrix(data, [256, 512]), quantize_q4_k_matrix(data, [512, 256]))`. Passes 2026-05-18. Impact on MODEL-1 ================= `paiml/qwen2.5-coder-7b-apache-q4k-v1` at AC-SHIP1-005 = 86.59% HumanEval pass@1. The ~4.4pp gap vs upstream Qwen2.5-Coder-7B-Instruct Q4_K_M (~91%) is NOT defect 3 — it's pure Q4_K vs Q4_K_M (mixed precision, which keeps sensitive tensors at Q6_K). Recovery path is to add Q4_K_M export support (filed as Q4K-AUDIT-004, non-blocking). Doc === docs/specifications/audits/q4k-shape-swap-impact.md v1.1.0 — full math, empirical evidence, action items closed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

#1798) Adds a chown step BEFORE the cargo step that runs `docker run --rm` as root and chowns the per-RUN target dir + cargo registry to noah:1000. ## Why Docker's bind-mount creates missing host directories with the daemon's uid (root). Since #1693 switched to per-RUN target dirs (`/mnt/nvme-raid0/targets/aprender-ci/<PR>/run-<RUN_ID>`), every fresh run gets a root-owned target dir. Cargo (running as uid 1000 inside the container) cannot write to it and fails with: error: failed to create directory `/workspace/target/debug`: No such file or directory (os error 2) The existing post-job chown (line 245) was meant to fix this for the NEXT run's git-clean — but per-RUN paths invalidate that since each run gets a brand-new root-owned dir. First-runs always fail. This was observed across 6+ in-flight PRs (#1784, #1791-#1797) on 2026-05-18 — every "infrastructure flake" turned out to be the same ownership bug at different cargo entry points. ## Fix Pre-cargo chown step. Idempotent (`|| true`). Runs the existing sovereign-ci image as root for the chown, then exits — adds maybe 2s to runs. Matches the pattern of the post-job chown step that already exists; just moves it to BEFORE cargo as well. ## Manual one-shot The 6 currently-stuck PRs were unblocked by manually chowning their per-RUN dirs on the runner host: ssh intel sudo chown -R 1000:1000 \ /mnt/nvme-raid0/targets/aprender-ci/{1792,1793,1794,1796,1797,main}/run-* After this PR lands, future runs will fix themselves. Co-authored-by: Noah Gift <claude@noahgift.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 18, 2026 07:34

noahgift added 7 commits May 18, 2026 10:12

Merge branch 'main' into docs/q4k-shape-swap-impact-audit-defect-4

81cd0e0

Merge branch 'main' into docs/q4k-shape-swap-impact-audit-defect-4

92e6dc6

Merge branch 'main' into docs/q4k-shape-swap-impact-audit-defect-4

9f77902

Merge branch 'main' into docs/q4k-shape-swap-impact-audit-defect-4

408242b

Merge branch 'main' into docs/q4k-shape-swap-impact-audit-defect-4

1f2f69e

Merge branch 'main' into docs/q4k-shape-swap-impact-audit-defect-4

caffd67

Merge branch 'main' into docs/q4k-shape-swap-impact-audit-defect-4

03956c8

noahgift merged commit a133e12 into main May 18, 2026
14 of 20 checks passed

noahgift deleted the docs/q4k-shape-swap-impact-audit-defect-4 branch May 18, 2026 14:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(audit): AUDIT-Q4K-SHAPE-001 — shape-swap is benign on 256-divisible artifacts (PMAT-690 defect 4)#1784

docs(audit): AUDIT-Q4K-SHAPE-001 — shape-swap is benign on 256-divisible artifacts (PMAT-690 defect 4)#1784
noahgift merged 8 commits into
mainfrom
docs/q4k-shape-swap-impact-audit-defect-4

noahgift commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 18, 2026

Summary

Finding

Falsifier

Impact

Why MODEL-1's ~4.4pp HumanEval gap vs upstream is NOT this bug

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant