contract(tensor-names-v1): v1.0.0 → v1.1.0 — qwen3_moe coverage + F-TNV-002 falsifier #1103
Merged
886c5bc to 222b6d6
…NV-002 falsifier
Five-whys analysis of the Qwen3-Coder-30B-A3B-Instruct.gguf load failure:
Symptom: `apr code -p '...'` against the 17.3 GB GGUF fails with
"Invalid shape: Tensor 'blk.0.ffn_up.weight' not found".
Why 1: tensor_names_fallback.rs:368 hardcodes the dense-FFN GGUF
name `blk.{n}.ffn_up.weight` for the FfnUpWeight role,
regardless of the model's `general.architecture` metadata.
Why 2: The HuggingFace-side tensor naming branches on architecture
(per-arch templates exist for llama, qwen2, qwen3, qwen3_moe…)
but the GGUF-side `_fallback` is a single architecture-
agnostic string per role.
Why 3: qwen3_moe stores per-expert weights as 3D tensors with
different llama.cpp names — `blk.{n}.ffn_gate_exps.weight`,
`blk.{n}.ffn_up_exps.weight`, `blk.{n}.ffn_down_exps.weight`,
plus a router `blk.{n}.ffn_gate_inp.weight`. None of these
are `blk.{n}.ffn_up.weight`, so the lookup fails.
Why 4: No falsification test in the contract framework asserted
"for every architecture A in architecture_map, every required
role R has at least one template that resolves against a
representative .gguf file for A". Without that, `pv validate`
passes a contract whose GGUF templates are silently
incomplete.
Why 5 (root cause): The contract treated "GGUF tensor naming" as a
flat fallback, not as an architecture-aware namespace. Every
new architecture lands as a code patch in
tensor_names_fallback.rs without a paired contract gate. v1.1.0
adds qwen3_moe as a first-class architecture key with its own
GGUF templates AND adds an F-TNV-002 falsification gate
against a real qwen3_moe.gguf tensor inventory.
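The root cause above can be illustrated with a minimal sketch of an architecture-aware lookup. The `Role` enum and the `gguf_template` and `resolve` functions are hypothetical names invented for this illustration; they are not taken from tensor_names_fallback.rs. Only the tensor-name strings come from the analysis above.

```rust
/// Hypothetical role enum for illustration; the real crate's type may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    FfnUp,
    FfnUpExps,
    FfnGateInp,
}

/// Returns the GGUF name template for `role` under `arch`, or `None`
/// when that role does not exist for the architecture.
pub fn gguf_template(arch: &str, role: Role) -> Option<&'static str> {
    match (arch, role) {
        // MoE layers carry no dense FFN up-projection.
        ("qwen3_moe", Role::FfnUp) => None,
        ("qwen3_moe", Role::FfnUpExps) => Some("blk.{n}.ffn_up_exps.weight"),
        ("qwen3_moe", Role::FfnGateInp) => Some("blk.{n}.ffn_gate_inp.weight"),
        // Every other architecture keeps the flat dense name.
        (_, Role::FfnUp) => Some("blk.{n}.ffn_up.weight"),
        // Dense architectures have no expert or router tensors.
        (_, Role::FfnUpExps) | (_, Role::FfnGateInp) => None,
    }
}

/// Substitutes the layer index into a template.
pub fn resolve(template: &str, layer: usize) -> String {
    template.replace("{n}", &layer.to_string())
}
```

With a lookup keyed on both architecture and role, a request for the dense `FfnUp` role on qwen3_moe yields `None` instead of a name that the file does not contain.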
What ships:
contracts/tensor-names-v1.yaml (v1.0.0 → v1.1.0):
- metadata.version 1.0.0 → 1.1.0; added `updated: 2026-04-28`
- metadata.five_whys_qwen3_moe_gap full transcript embedded
- architecture_map: 6 new entries pointing to qwen3_moe key
(Qwen3MoeForCausalLM, Qwen3MoEForCausalLM, Qwen3CoderForCausalLM,
Qwen3_5MoeForCausalLM, qwen3_moe, qwen3moe)
- layer_roles.ffn_gate_weight / ffn_up_weight / ffn_down_weight:
added `required_per_arch: { qwen3_moe: false }` and
`templates.qwen3_moe: []` so dense-FFN expectations don't fire
on MoE
- 4 NEW layer_roles for the MoE namespace:
ffn_gate_inp_weight — router projection (hidden → experts)
ffn_gate_exps_weight — per-expert gate (3D)
ffn_up_exps_weight — per-expert up (3D)
ffn_down_exps_weight — per-expert down (3D)
Each carries arch templates for qwen3_moe + a GGUF _fallback
that matches llama.cpp's actual tensor names.
- falsification_tests: new entry F-TNV-002 with the prediction
"templates[qwen3_moe] for required MoE roles must resolve against
a real qwen3_moe.gguf header byte-for-byte" + the cross-check
command + a falsification oracle anchored to the
Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf inventory captured by
`apr inspect` on 2026-04-28.
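As a rough illustration of the F-TNV-002 prediction, the check can be phrased as a predicate over a tensor inventory. This is a sketch under assumptions: `unresolved_names` and `QWEN3_MOE_TEMPLATES` are hypothetical names, and the inventory is modelled as a plain set of strings rather than whatever `apr inspect` actually emits.

```rust
use std::collections::HashSet;

/// GGUF templates for the four MoE roles, as named in this commit message.
pub const QWEN3_MOE_TEMPLATES: [&str; 4] = [
    "blk.{n}.ffn_gate_inp.weight",
    "blk.{n}.ffn_gate_exps.weight",
    "blk.{n}.ffn_up_exps.weight",
    "blk.{n}.ffn_down_exps.weight",
];

/// Returns every resolved tensor name that is missing from `inventory`.
/// An empty result means the prediction survived; any entry falsifies it.
pub fn unresolved_names(
    inventory: &HashSet<String>,
    templates: &[&str],
    n_layers: usize,
) -> Vec<String> {
    let mut missing = Vec::new();
    for layer in 0..n_layers {
        for template in templates {
            let name = template.replace("{n}", &layer.to_string());
            if !inventory.contains(&name) {
                missing.push(name);
            }
        }
    }
    missing
}
```

Returning the missing names, rather than a bare boolean, lets a failing gate report exactly which template did not resolve.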
crates/aprender-serve/src/tensor_names_fallback.rs:
- normalize_architecture: added cases for the 4 new HF class names
that the contract architecture_map declares
(Qwen3MoEForCausalLM uppercase MoE, Qwen3CoderForCausalLM,
Qwen3_5MoeForCausalLM) plus lowercase canonical keys
(qwen3_moe, qwen3moe).
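The normalization described above might look roughly like the following. This is a sketch, not the crate's code: the function is deliberately named `normalize_architecture_sketch`, the dense `Qwen3ForCausalLM` arm is an assumption about how dense Qwen3 is keyed, and the real function presumably handles many more architectures.

```rust
/// Illustrative only: maps an HF class name or GGUF architecture key
/// to the contract's canonical architecture key.
pub fn normalize_architecture_sketch(name: &str) -> &'static str {
    match name {
        // The six architecture_map entries listed in this commit message.
        "Qwen3MoeForCausalLM"
        | "Qwen3MoEForCausalLM"
        | "Qwen3CoderForCausalLM"
        | "Qwen3_5MoeForCausalLM"
        | "qwen3_moe"
        | "qwen3moe" => "qwen3_moe",
        // Assumed dense Qwen3 keys; they must not be captured by the MoE arm.
        "Qwen3ForCausalLM" | "qwen3" => "qwen3",
        // Unknown architectures fall back to llama.
        _ => "llama",
    }
}
```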
crates/aprender-serve/tests/qwen3_moe_tensor_inventory.rs (NEW, ~150 LOC):
- 4 F-TNV-002 falsification tests:
a) qwen3_moe_architecture_keys_normalize_correctly — every HF
class name routes to "qwen3_moe"
b) dense_qwen3_unchanged_after_v1_1_0 — regression guard:
dense Qwen3 still maps to "qwen3"
c) unknown_architecture_still_falls_back_to_llama — invariant
from contract.proof_obligations
d) live_gguf_inventory_check_when_present — opens the real
Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf and asserts the 4
load-bearing MoE tensor names appear at byte level in the
header (skipped gracefully when the 17 GB file isn't present
in ~/.apr/models/, so CI doesn't fail; runs locally after
`apr pull qwen3-coder`)
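The live inventory check in (d) can be sketched as a byte-level scan that reports "skipped" when the file is absent. Everything here is an assumption for illustration: `contains_bytes`, `missing_in_header`, and the `header_len` parameter are hypothetical, and the layer-0 names stand in for whatever the real test asserts. The scan relies on GGUF storing tensor names as raw UTF-8 bytes in the file header.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Layer-0 instances of the four MoE tensor names from the contract.
pub const MOE_NAMES: [&str; 4] = [
    "blk.0.ffn_gate_inp.weight",
    "blk.0.ffn_gate_exps.weight",
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
];

/// True when `needle` occurs as a contiguous byte run in `haystack`.
pub fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` panics, so an empty needle is rejected up front.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Returns `Ok(None)` when the file is absent (the caller treats this as a
/// skip), otherwise the names not found in the first `header_len` bytes.
pub fn missing_in_header(
    path: &Path,
    header_len: u64,
) -> std::io::Result<Option<Vec<&'static str>>> {
    if !path.exists() {
        return Ok(None);
    }
    let mut buf = Vec::new();
    File::open(path)?.take(header_len).read_to_end(&mut buf)?;
    let missing = MOE_NAMES
        .iter()
        .copied()
        .filter(|name| !contains_bytes(&buf, name.as_bytes()))
        .collect();
    Ok(Some(missing))
}
```

Reading only a bounded prefix keeps the check cheap on a 17 GB file; the caller has to choose a `header_len` large enough to cover the tensor-info section.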
What this PR does NOT do (intentionally):
- This PR does NOT implement the MoE forward pass. Adding
expert routing, per-expert dispatch, and weighted aggregation
is a separate workstream. v1.1.0's job is the contract +
falsifier so future implementation can compose against a
declarative spec rather than reverse-engineering llama.cpp.
- This PR does NOT regenerate `tensor_names_generated.rs` from
the YAML — that's done by build.rs at compile time, and the
F-TNV-002 falsifier in this PR works against the in-tree
tensor_names_fallback.rs which is the source of truth when the
YAML isn't present at build time.
Verification (local, this PR):
$ pv validate contracts/tensor-names-v1.yaml
0 error(s), 0 warning(s)
Contract is valid.
$ pv lint contracts/tensor-names-v1.yaml
Result: PASS
$ cargo test -p aprender-serve --test qwen3_moe_tensor_inventory
test result: ok. 4 passed; 0 failed (incl. live GGUF inventory check)
Refs:
- Five-whys transcript embedded in contract metadata
- tensor-names-v1.yaml § falsification_tests F-TNV-002
- Hugging Face: unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF (sha256
01b5fec0b9d789c2, 17.3 GB, downloaded via `apr pull qwen3-coder`)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
222b6d6 to c96c637
noahgift added a commit to paiml/claude-code-parity-apr that referenced this pull request on Apr 28, 2026
Companion-side bookkeeping for the M29 cross-repo fix. The technical work itself shipped at paiml/aprender#1103: a five-whys analysis of `apr code` against Qwen3-Coder-30B-A3B-Instruct.gguf failing with "Tensor 'blk.0.ffn_up.weight' not found", traced to the contract treating GGUF tensor naming as a flat fallback rather than an architecture-aware namespace.
Fix:
contracts/tensor-names-v1.yaml v1.0.0 → v1.1.0
- 6 new architecture_map entries → qwen3_moe
- dense FFN roles marked required_per_arch.qwen3_moe = false
- 4 new MoE-specific layer roles
- F-TNV-002 falsifier validated against the real 17.3 GB GGUF
crates/aprender-serve/{src,tests}/...:
- normalize_architecture extended for the 6 new architecture_map keys
- 4 new falsification tests including a live-GGUF-inventory check
Spec relevance: the M28 ccpa measure → apr code --emit-trace measurement path cannot produce a non-tautological FALSIFY-CCPA-013 discharge against tool-dispatching fixtures until apr-code can actually run a capable model. M28 + M29 are the two cleanly separable enabling steps. Full MoE forward-pass implementation remains a separate larger workstream.
Contract bump v1.16.0 → v1.17.0 with full five-whys transcript + the cross-repo fix narrative; aprender contract-mirror at byte-identical commit 499f8b978; pin.lock refreshed via the M22 4-step ritual.
Gates (all green locally):
pv validate / pv lint             PASS
pmat comply check                 (is_compliant) true, 0 Fail, 12 advisory Warn
cargo test --workspace            all pass (0 new tests companion-side)
scripts/pin-check.sh              sha256 matches
scripts/pin-check-roundtrip.sh    byte-identical to aprender@499f8b978
Refs:
- paiml/aprender#1103 (M29 upstream contract PR)
- contracts/claude-code-parity-apr-v1.yaml § status_history (M29)
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
c96c637 to 6239d8b
10 files across 4 crates had accumulated rustfmt drift on main that was failing `cargo fmt --all -- --check` in CI for any new PR.
Affected files (none touched in this PR's contract / qwen3_moe work):
crates/aprender-core/src/format/ship_010.rs
crates/aprender-core/src/format/v2/stamp.rs
crates/aprender-gpu/src/kernels/backward/mod.rs
crates/aprender-serve/src/gguf/inference/forward/traced.rs
crates/aprender-serve/tests/qwen2_gqa_7_1_attention_parity.rs
crates/aprender-train/src/autograd/cuda_backward/structured.rs
crates/aprender-train/src/train/gputrain_006.rs
crates/aprender-train/src/train/pretrain.rs
crates/aprender-train/src/train/shard_reader.rs
crates/aprender-train/tests/ship_two_001_const_pinning.rs
Bundled here as the minimum-friction unblock for the qwen3_moe tensor-names contract PR's CI.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
6239d8b to 7727ad7
noahgift added a commit to paiml/claude-code-parity-apr that referenced this pull request on Apr 28, 2026
…oseout (#35)
The companion-side spec markdown's milestone table stopped at M27. M28 (apr code --emit-trace + Qwen3-Coder default + qwen3-coder short-name alias) and M29 (five-whys + tensor-names-v1 v1.1.0 contract amendment + F-TNV-002 falsifier) both landed at aprender main but their narrative hadn't reached the spec. This PR closes that gap:
- status snapshot bumped: M0–M30 all SHIPPED, contract v1.18.0
- new line on the M28+M29 cross-repo enabling chain
- sub-milestones table extended through M30:
  M28: cross-repo apr code --emit-trace + default model
  M29: qwen3_moe contract amendment v1.1.0 + F-TNV-002 falsifier (paiml/aprender#1103 merged at 15d504cfe)
  M30: this spec-table refresh (closeout)
- outstanding next-goal reframed: MoE forward-pass implementation is the only piece remaining for a measured tool-dispatch parity score. That's realizar/aprender-serve engineering, not a CCPA POC scope item. The contract namespace, falsifier, model availability, and emit-trace plumbing are all in place.
State at M30 close:
- Companion-side spec POC: complete (M0–M30 all SHIPPED)
- Aprender-side enabling chain (M28+M29): complete
- Both repos byte-identical at sha256 7b1d79db710a91786033792a68b32a3cc7396472f7f7a61413c3e87728f88752
- 13/13 falsification gates green
- Corpus complete (30/30 fixtures, 15/15 reachable)
- 100% mutation coverage workspace-wide
- Companion ↔ aprender drift guard mechanically enforced
- Contributor onramp documented (CONTRIBUTING.md)
- Cross-repo audit trail intact across status_history
Contract bump v1.17.0 → v1.18.0 with the M30 status_history entry documenting the doc closeout. Aprender mirror pushed in paired commit b7f42619d. pin.lock refreshed via the M22 4-step ritual.
Gates (all green locally):
pv validate / pv lint             PASS
pmat comply check                 (is_compliant) true, 0 Fail, 12 advisory Warn
cargo test --workspace            all pass (0 new tests)
scripts/pin-check.sh              sha256 matches
scripts/pin-check-roundtrip.sh    byte-identical to aprender@b7f42619d
Refs:
- paiml/aprender#1103 (M29 contract, merged 15d504cfe)
- contracts/claude-code-parity-apr-v1.yaml § status_history (M30)
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on Apr 28, 2026
…oE forward gap
M32a: first slice of the MoE forward-pass implementation chain that the companion claude-code-parity-apr POC named as the "Outstanding next-goal (in-scope, M32)" in v1.19.0 (M31 spec).
WHY THIS CONTRACT EXISTS
========================
`apr run <qwen3-coder>.gguf` currently fails with:
  Invalid shape: Tensor 'blk.0.ffn_up.weight' not found
at the FFN load step. The M29 contract amendment (tensor-names-v1 v1.1.0, #1103) declared the qwen3_moe tensor namespace but explicitly deferred the forward-pass implementation. This contract discharges that deferral with a 4-stage staged plan.
WHAT THIS PR SHIPS
==================
A KernelContract `qwen3-moe-forward-v1.yaml` (DRAFT status) that:
* Composes existing kernels: tensor-names-v1 v1.1.0 + moe-router-v1 + moe-expert-dispatch-v1 + qwen3moe-shapes-v1 + swiglu-kernel-v1 + silu-kernel-v1 + rmsnorm-kernel-v1 + rope-kernel-v1
* Names 5 acceptance criteria (AC_QW3_MOE_001 .. _005)
* Names 4 implementation stages (M32a SHIPPED, M32b/c/d PENDING)
* Names 4 falsification tests (F-QW3-MOE-FORWARD-001 REPRODUCED at commit 15d504c = end of M29; the other three are PENDING and each maps to one stage)
* Names the Qwen3-Coder-30B-A3B-Instruct shape algebra explicitly (L=48, d=2048, d_ff=6144, N_experts=128, k=8, n_heads=32, n_kv=4, vocab=151936, RoPE θ=1e7) so the contract is testable on the live cached GGUF (~/.cache/pacha/models/2b88b180a790988f.gguf, 17.3 GB)
WHAT M32b/c/d WILL SHIP (in subsequent PRs)
============================================
M32b: Architecture-aware FFN load. Branch transformer_loader.rs (line ~145) on tensor_names_fallback::normalize_architecture(...). For arch == "qwen3_moe", load the 4 contract-named tensors per layer (ffn_gate_inp/ffn_gate_exps/ffn_up_exps/ffn_down_exps) into a new MoeLayerWeights field. Forward emits structured UnsupportedOperation containing this contract's id.
M32c: Wire CPU MoE forward. The pure-Rust moe_forward_token in gpu/scheduler/moe_dispatch.rs already implements the full router + per-expert SwiGLU + weighted aggregation kernel. Populate MoeExpertWeights from M32b-loaded tensors and call it from the FFN dispatch site. After M32c, `apr run` emits tokens.
M32d: Numerical parity vs llama.cpp Q4_K (primary) + HF FP16 (secondary) per CLAUDE.md ground-truth checklist. Discharges AC_QW3_MOE_001 and AC_QW3_MOE_005. Flips this contract from DRAFT to ACTIVE_RUNTIME and unblocks companion-repo FALSIFY-CCPA-013 measured tool-dispatch parity score.
CROSS-REPO LINKS
================
This contract is the aprender-side spine of:
* paiml/claude-code-parity-apr v1.19.0 (M31 spec, 2026-04-28): "Outstanding next-goal (in-scope, M32)" was created exactly for this 4-stage plan; the user clarified at M31 that aprender and claude-code-parity-apr are the same monorepo, so this work IS in-scope companion-repo work, not "upstream realizar engineering"
* paiml/aprender contracts/tensor-names-v1.yaml v1.1.0 (M29): declared the namespace this contract operates over
VALIDATION
==========
$ pv validate contracts/qwen3-moe-forward-v1.yaml
0 error(s), 0 warning(s)
Contract is valid.
NO CODE CHANGE in this PR. M32a is contract-only by design; M32b is where Rust changes start. Authoring contract before code per CLAUDE.md rule 1 (CB-1400).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
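For readers unfamiliar with the router + weighted aggregation step named in M32c, here is a generic top-k routing sketch. It is not the `moe_forward_token` implementation and shares no code with it: `route_top_k` and `aggregate` are hypothetical names, and renormalizing the kept probabilities to sum to 1 is an assumption about the model's routing convention.

```rust
/// Softmax over router logits, keep the `k` most probable experts, and
/// renormalize the kept probabilities so they sum to 1.
pub fn route_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Subtract the max logit for numerical stability.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut ranked: Vec<(usize, f32)> =
        exps.iter().map(|&e| e / sum).enumerate().collect();
    // Sort by probability, highest first.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(k);
    let kept: f32 = ranked.iter().map(|&(_, p)| p).sum();
    for entry in ranked.iter_mut() {
        entry.1 /= kept;
    }
    ranked
}

/// Weighted sum of the selected experts' outputs. `expert` stands in for
/// the per-expert SwiGLU FFN and returns a vector of length `dim`.
pub fn aggregate<F>(routing: &[(usize, f32)], dim: usize, expert: F) -> Vec<f32>
where
    F: Fn(usize) -> Vec<f32>,
{
    let mut out = vec![0.0f32; dim];
    for &(idx, weight) in routing {
        let y = expert(idx);
        for (o, v) in out.iter_mut().zip(y.iter()) {
            *o += weight * v;
        }
    }
    out
}
```

With N_experts=128 and k=8 as stated above, only 8 of the 128 expert FFNs would run per token under this scheme.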
Five-whys analysis
Symptom: `apr code -p '<prompt>'` against Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf (17.3 GB) fails with:
  Invalid shape: Tensor 'blk.0.ffn_up.weight' not found
1. `tensor_names_fallback.rs:368` hardcodes the dense-FFN GGUF name `blk.{n}.ffn_up.weight` for the FfnUpWeight role, regardless of architecture.
2. HuggingFace-side tensor naming branches on architecture, but the GGUF-side `_fallback` is a single architecture-agnostic string per role.
3. qwen3_moe stores per-expert weights as 3D tensors (`ffn_gate_exps`, `ffn_up_exps`, `ffn_down_exps`) plus a router (`ffn_gate_inp`). None of these are `ffn_up`.
4. No falsification test asserted that every required role resolves against a representative .gguf for each architecture, so `pv validate` was passing a silently-incomplete contract.
5. Root cause: the contract treated GGUF tensor naming as a flat fallback, not as an architecture-aware namespace.
Fix (contract-first, per realizar's CLAUDE.md "NEVER write code before writing a provable contract")
contracts/tensor-names-v1.yaml v1.0.0 → v1.1.0
- `metadata.five_whys_qwen3_moe_gap`: full transcript embedded for audit
- `architecture_map`: 6 new entries → `qwen3_moe` (Qwen3MoeForCausalLM, Qwen3MoEForCausalLM, Qwen3CoderForCausalLM, Qwen3_5MoeForCausalLM, qwen3_moe, qwen3moe)
- `layer_roles.ffn_gate_weight / ffn_up_weight / ffn_down_weight`: added `required_per_arch: { qwen3_moe: false }` and `templates.qwen3_moe: []`, so dense-FFN expectations no longer fire on MoE
- 4 new layer_roles for the MoE namespace, each with `templates.qwen3_moe` + `_fallback` matching llama.cpp's actual GGUF names:
  `ffn_gate_inp_weight`: router projection (hidden → experts)
  `ffn_gate_exps_weight`: per-expert gate (3D)
  `ffn_up_exps_weight`: per-expert up (3D)
  `ffn_down_exps_weight`: per-expert down (3D)
- `falsification_tests.F-TNV-002` predicting `templates[qwen3_moe]` resolves byte-for-byte against a real qwen3_moe.gguf header
crates/aprender-serve/src/tensor_names_fallback.rs
- `normalize_architecture` extended to cover all 6 new `architecture_map` keys.
crates/aprender-serve/tests/qwen3_moe_tensor_inventory.rs (NEW)
4 F-TNV-002 falsification tests:
- `qwen3_moe_architecture_keys_normalize_correctly`: every HF class name routes to `qwen3_moe`
- `dense_qwen3_unchanged_after_v1_1_0`: regression guard
- `unknown_architecture_still_falls_back_to_llama`: invariant from `proof_obligations`
- `live_gguf_inventory_check_when_present`: opens the real Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf and asserts the 4 load-bearing MoE tensor names appear byte-for-byte in the header. Skipped gracefully when the 17 GB file isn't present (so CI passes); runs locally after `apr pull qwen3-coder`.
What this PR does NOT do (intentionally)
- Implement the MoE forward pass; that is a separate workstream.
- Regenerate `tensor_names_generated.rs` from YAML; `build.rs` does that at compile time.
This PR's job is the contract + falsifier so future MoE-implementation work composes against a declarative spec rather than reverse-engineering llama.cpp.
Verification (all green locally)
🤖 Generated with Claude Code