
docs(audit): AUDIT-Q4K-SHAPE-001 — shape-swap is benign on 256-divisible artifacts (PMAT-690 defect 4)#1784

Merged: noahgift merged 8 commits into main from docs/q4k-shape-swap-impact-audit-defect-4 on May 18, 2026
@noahgift

Summary

Closes task #110 (P3-C-prep defect 4) by empirically falsifying the a-priori concern that pre-v0.34.0 Q4_K artifacts might have been silently degraded by the quantize_q4_k_matrix shape-swap.

Finding

When BOTH weight-tensor dims are 256-divisible (Qwen2 1.5B hidden=1536, intermediate=8960; Qwen2 7B hidden=3584, intermediate=18944), the pre-fix quantize_q4_k_matrix(data, [in, out]) and post-fix quantize_q4_k_matrix(data, [out, in]) produce byte-identical output.

Mechanism: the function consumes data linearly, chunking into fixed-size 256-element super-blocks. When both dims are 256-multiples, the per-iteration row boundary lands on a super-block boundary anyway — the swap is a no-op for the output bytes.
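The mechanism can be illustrated with a small layout model. This is a sketch under stated assumptions, not the aprender implementation: `super_block_ranges` is a hypothetical helper that assumes rows are processed one at a time and records which source elements each 256-element super-block reads.

```rust
/// Super-block size used by Q4_K.
const QK_K: usize = 256;

/// Hypothetical layout model (NOT the aprender code): for a row-major buffer
/// interpreted as `rows x cols`, return the half-open range of source indices
/// read by each super-block. A row whose length is not a multiple of 256
/// ends in a short block (the remainder would be padded in a real quantizer).
fn super_block_ranges(shape: [usize; 2]) -> Vec<(usize, usize)> {
    let [rows, cols] = shape;
    let mut ranges = Vec::new();
    for r in 0..rows {
        let row_start = r * cols;
        let row_end = row_start + cols;
        let mut start = row_start;
        while start < row_end {
            // A block never crosses a row boundary.
            let end = (start + QK_K).min(row_end);
            ranges.push((start, end));
            start = end;
        }
    }
    ranges
}
```

Under this model, `super_block_ranges([256, 512])` and `super_block_ranges([512, 256])` return the same list, because every row boundary already coincides with a block boundary. With `[256, 896]` versus `[896, 256]` the lists differ, which is the case the shape-swap fix addresses.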

The shape-swap bug only manifests when cols % 256 != 0 (e.g., Qwen2 0.5B hidden=896). That case is already caught by the defect-2 K-divisibility fallback (forces F32 instead of quantizing). So no shipped artifact is degraded by defect 3.
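A guard of the kind described here could look like the following. This is a minimal sketch, assuming the fallback keys only on the per-row element count; `TensorStorage` and `choose_storage` are illustrative names, not the actual defect-2 code.

```rust
/// Super-block size used by Q4_K.
const QK_K: usize = 256;

/// Hypothetical storage choice for one tensor (names are illustrative).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum TensorStorage {
    Q4K,
    F32,
}

/// Sketch of a K-divisibility guard: quantize only when each row is a whole
/// number of super-blocks, otherwise keep the tensor in F32.
fn choose_storage(cols: usize) -> TensorStorage {
    if cols > 0 && cols % QK_K == 0 {
        TensorStorage::Q4K
    } else {
        TensorStorage::F32
    }
}
```

With the dimensions quoted above, 1536, 8960, 3584 and 18944 all select `Q4K`, while 896 selects `F32`.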

Falsifier

New unit test audit_q4k_shape_swap_byte_identical_when_both_dims_divisible in gguf_export_config.rs::q4k_divisibility_tests. It generates a 256×512 matrix with heterogeneous per-row statistics (so any layout-sensitive divergence would be amplified) and asserts byte-identity of the two quantization calls. Passes as of 2026-05-18.
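The shape of such a falsifier can be sketched with a toy quantizer. Everything below is an assumption made for illustration: `toy_quantize_block` is not the Q4_K format (it stores one f32 scale plus packed 4-bit codes), `toy_quantize_matrix` assumes row-wise processing with zero padding, and `heterogeneous_matrix` is a hypothetical data generator, not the one used in the real test.

```rust
const QK_K: usize = 256;

/// Toy stand-in for a block quantizer (NOT real Q4_K): a little-endian f32
/// scale followed by 128 bytes of packed 4-bit codes.
fn toy_quantize_block(block: &[f32], out: &mut Vec<u8>) {
    let max_abs = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    out.extend_from_slice(&scale.to_le_bytes());
    // Map each value to a code in 0..=15 and pack two codes per byte.
    let q = |v: f32| ((v / scale).round().clamp(-8.0, 7.0) as i8 + 8) as u8;
    for pair in block.chunks_exact(2) {
        out.push(q(pair[0]) | (q(pair[1]) << 4));
    }
}

/// Row-wise matrix quantizer; each row is zero-padded to a multiple of 256.
/// Assumes `cols > 0`.
fn toy_quantize_matrix(data: &[f32], shape: [usize; 2]) -> Vec<u8> {
    let [rows, cols] = shape;
    assert_eq!(data.len(), rows * cols);
    let padded = (cols + QK_K - 1) / QK_K * QK_K;
    let mut row_buf = vec![0.0f32; padded];
    let mut out = Vec::new();
    for row in data.chunks(cols) {
        // Only the first `cols` slots are overwritten; the tail stays zero.
        row_buf[..cols].copy_from_slice(row);
        for block in row_buf.chunks(QK_K) {
            toy_quantize_block(block, &mut out);
        }
    }
    out
}

/// Test data whose magnitude grows from row to row.
fn heterogeneous_matrix(rows: usize, cols: usize) -> Vec<f32> {
    (0..rows * cols)
        .map(|i| {
            let (r, c) = (i / cols, i % cols);
            (1.0 + r as f32 * 0.25) * (c as f32 * 0.37).sin()
        })
        .collect()
}
```

Quantizing `heterogeneous_matrix(256, 512)` as `[256, 512]` and as `[512, 256]` yields identical bytes in this toy, since both interpretations cut the buffer into the same 256-element blocks. A 256×896 buffer produces different output under the two shapes because row padding changes the block layout.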

Impact

  • paiml/qwen2.5-coder-7b-apache-q4k-v1 — pre-v0.34.0 artifact is bit-equivalent to a hypothetical post-fix re-export. No re-export needed.
  • paiml/albor-370m-v1 — uses the post-fix path AND defect-2 F32 fallback (hidden=896). Already correct.
  • Any other Qwen2 1.5B / 7B Q4_K GGUF previously exported with apr export — unaffected.

Why MODEL-1's ~4.4pp HumanEval gap vs upstream is NOT this bug

paiml/qwen2.5-coder-7b-apache-q4k-v1 at AC-SHIP1-005 = 86.59% vs upstream Qwen/Qwen2.5-Coder-7B-Instruct Q4_K_M ≈ 91%. Post-finding, the gap is attributable to:

  1. Q4_K vs Q4_K_M (mixed precision): llama.cpp's _M variant keeps a subset of sensitive tensors (attn V, ffn down) at Q6_K. Aprender's Q4_K is pure throughout.
  2. Different cumulative rounding paths from different f32→Q4_K implementations.

Recovery path is to add Q4_K_M export support to apr export (filed as Q4K-AUDIT-004, non-blocking, separate ticket).

Test plan

  • cargo test -p aprender-core --lib audit_q4k_shape_swap passes
  • All 8 q4k_divisibility_tests pass (1 new + 7 pre-existing)
  • Audit doc explains math + cites empirical falsifier + closes action items

🤖 Generated with Claude Code

…ivisible-both-dims artifacts (PMAT-690 defect 4)

Closes task #110 by empirically falsifying the a-priori concern that
pre-v0.34.0 Q4_K artifacts (paiml/qwen2.5-coder-7b-apache-q4k-v1 etc.)
might be silently degraded.

Central finding
===============

When BOTH weight-tensor dims are 256-divisible (Qwen2 1.5B hidden=1536,
intermediate=8960; Qwen2 7B hidden=3584, intermediate=18944), the
pre-fix `quantize_q4_k_matrix(data, [in, out])` and the post-fix
`quantize_q4_k_matrix(data, [out, in])` produce **byte-identical**
output.

Mechanism: the function consumes data in linear order, chunking into
fixed-size 256-element super-blocks via `quantize_q4_k`. When both
dims are 256-multiples, the per-iteration row boundary sits on a
super-block boundary anyway, so the swap is a no-op for the data
consumed and the output emitted.

The bug only manifests when `cols % 256 != 0` (e.g., Qwen2 0.5B
hidden=896 → cols=896 → pads to 1024, shifting subsequent
super-blocks off-stride). That case is independently caught by the
defect-2 K-divisibility fallback which forces F32 instead of
quantizing. So either way, no shipped artifact is degraded by the
shape-swap path.
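
The padding arithmetic behind the 896 → 1024 case can be written out as follows. This is a sketch assuming each row is padded independently to the next multiple of 256; `padded_cols` and `super_blocks_row_padded` are hypothetical helpers, not functions from the codebase.

```rust
const QK_K: usize = 256;

/// Round a row length up to the next multiple of the super-block size.
fn padded_cols(cols: usize) -> usize {
    (cols + QK_K - 1) / QK_K * QK_K
}

/// Number of super-blocks emitted when every row is padded independently.
fn super_blocks_row_padded(rows: usize, cols: usize) -> usize {
    rows * (padded_cols(cols) / QK_K)
}
```

For cols=896 each row occupies 4 super-blocks (1024 slots) rather than 3.5, so a 4-row tensor takes 16 super-blocks where a linear walk over the same 3584 elements would take 14. For a 256-multiple such as 1536 the padded length equals the original and no shift occurs.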

Falsification
=============

New in-tree test
`audit_q4k_shape_swap_byte_identical_when_both_dims_divisible` in
gguf_export_config.rs::q4k_divisibility_tests. Generates a 256×512
matrix with heterogeneous per-row statistics (so any layout-sensitive
divergence would be amplified) and asserts
`assert_eq!(quantize_q4_k_matrix(data, [256, 512]), quantize_q4_k_matrix(data, [512, 256]))`.
Passes 2026-05-18.

Impact on MODEL-1
=================

`paiml/qwen2.5-coder-7b-apache-q4k-v1` at AC-SHIP1-005 = 86.59% HumanEval
pass@1. The ~4.4pp gap vs upstream Qwen2.5-Coder-7B-Instruct Q4_K_M
(~91%) is NOT defect 3 — it's pure Q4_K vs Q4_K_M (mixed precision,
which keeps sensitive tensors at Q6_K). Recovery path is to add Q4_K_M
export support (filed as Q4K-AUDIT-004, non-blocking).

Doc
===

docs/specifications/audits/q4k-shape-swap-impact.md v1.1.0 — full math,
empirical evidence, action items closed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 18, 2026 07:34
@noahgift noahgift merged commit a133e12 into main May 18, 2026
14 of 20 checks passed
@noahgift noahgift deleted the docs/q4k-shape-swap-impact-audit-defect-4 branch May 18, 2026 14:31
noahgift added a commit that referenced this pull request May 18, 2026
#1798)

Adds a chown step BEFORE the cargo step that runs `docker run --rm` as
root and chowns the per-RUN target dir + cargo registry to noah:1000.

## Why

Docker's bind-mount creates missing host directories with the daemon's
uid (root). Since #1693 switched to per-RUN target dirs
(`/mnt/nvme-raid0/targets/aprender-ci/<PR>/run-<RUN_ID>`), every fresh
run gets a root-owned target dir. Cargo (running as uid 1000 inside the
container) cannot write to it and fails with:

    error: failed to create directory `/workspace/target/debug`:
    No such file or directory (os error 2)

The existing post-job chown (line 245) was meant to fix this for the
NEXT run's git-clean — but per-RUN paths invalidate that since each
run gets a brand-new root-owned dir. First-runs always fail.

This was observed across 6+ in-flight PRs (#1784, #1791-#1797) on
2026-05-18 — every "infrastructure flake" turned out to be the same
ownership bug at different cargo entry points.

## Fix

Pre-cargo chown step. Idempotent (`|| true`). Runs the existing
sovereign-ci image as root for the chown, then exits; adds roughly 2s
per run. Matches the pattern of the post-job chown step that already
exists, but also runs it BEFORE cargo.

## Manual one-shot

The 6 currently-stuck PRs were unblocked by manually chowning their
per-RUN dirs on the runner host:

    ssh intel sudo chown -R 1000:1000 \
        /mnt/nvme-raid0/targets/aprender-ci/{1792,1793,1794,1796,1797,main}/run-*

After this PR lands, future runs will fix themselves.

Co-authored-by: Noah Gift <claude@noahgift.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>