contract(tensor-names-v1): v1.0.0 → v1.1.0 — qwen3_moe coverage + F-TNV-002 falsifier #1103

Merged
noahgift merged 2 commits into main from feat/qwen3-moe-tensor-names-contract on Apr 28, 2026


@noahgift


Five-whys analysis

Symptom: apr code -p '<prompt>' against Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf (17.3 GB) fails with:

Invalid shape: Tensor 'blk.0.ffn_up.weight' not found

| Why | Answer |
|-----|--------|
| 1 | tensor_names_fallback.rs:368 hardcodes the dense-FFN GGUF name blk.{n}.ffn_up.weight for the FfnUpWeight role, regardless of architecture. |
| 2 | The HuggingFace-side tensor naming branches on architecture, but the GGUF-side _fallback is a single architecture-agnostic string per role. |
| 3 | qwen3_moe stores per-expert weights as 3D tensors with different llama.cpp names (ffn_gate_exps, ffn_up_exps, ffn_down_exps) plus a router (ffn_gate_inp). None of these are ffn_up. |
| 4 | No falsification test asserted "for every architecture A, every required role R has a template that resolves against a representative .gguf for A". pv validate was passing a silently-incomplete contract. |
| 5 (root) | The contract treated GGUF tensor naming as a flat fallback, not as an architecture-aware namespace. New architectures landed as code patches without paired contract gates. |
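
To make the gap in whys 1 to 3 concrete, here is a minimal sketch of an architecture-aware lookup, as opposed to one flat fallback string per role. The `Role` enum and the `gguf_name` function are hypothetical names invented for this illustration; they are not the types in tensor_names_fallback.rs, and only the up-projection roles are shown.

```rust
/// Hypothetical role enum for illustration; the real crate has its own role type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    /// Dense FFN up-projection (one 2D tensor per layer).
    FfnUp,
    /// Stacked per-expert up-projection (one 3D tensor per layer).
    FfnUpExps,
}

/// Resolve the GGUF tensor name for `role` in layer `n`, branching on the
/// normalized architecture key. Returns `None` when the role does not apply
/// to that architecture, instead of guessing a dense name.
pub fn gguf_name(arch: &str, role: Role, n: usize) -> Option<String> {
    let template = match (arch, role) {
        // MoE checkpoints carry no dense `ffn_up` tensor.
        ("qwen3_moe", Role::FfnUp) => return None,
        ("qwen3_moe", Role::FfnUpExps) => "blk.{n}.ffn_up_exps.weight",
        // Dense architectures carry no stacked expert tensor.
        (_, Role::FfnUpExps) => return None,
        // Every other architecture keeps the dense name.
        (_, Role::FfnUp) => "blk.{n}.ffn_up.weight",
    };
    Some(template.replace("{n}", &n.to_string()))
}
```

With a lookup shaped like this, asking for the dense `FfnUp` role on `qwen3_moe` yields `None` at resolution time rather than a "not found" error at load time for a name that was never going to exist.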

Fix (contract-first, per realizar's CLAUDE.md "NEVER write code before writing a provable contract")

contracts/tensor-names-v1.yaml v1.0.0 → v1.1.0

  • metadata.five_whys_qwen3_moe_gap — full transcript embedded for audit
  • architecture_map: 6 new entries → qwen3_moe (Qwen3MoeForCausalLM, Qwen3MoEForCausalLM, Qwen3CoderForCausalLM, Qwen3_5MoeForCausalLM, qwen3_moe, qwen3moe)
  • layer_roles.ffn_gate_weight / ffn_up_weight / ffn_down_weight: added required_per_arch: { qwen3_moe: false } and templates.qwen3_moe: [] — dense-FFN expectations no longer fire on MoE
  • 4 new layer roles for the MoE namespace:
    • ffn_gate_inp_weight — router projection (hidden → experts)
    • ffn_gate_exps_weight — per-expert gate (3D)
    • ffn_up_exps_weight — per-expert up (3D)
    • ffn_down_exps_weight — per-expert down (3D)
    • Each carries templates.qwen3_moe + _fallback matching llama.cpp's actual GGUF names
  • New falsification_tests.F-TNV-002 predicting templates[qwen3_moe] resolves byte-for-byte against a real qwen3_moe.gguf header
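
The F-TNV-002 prediction can be read as a simple set check: every role marked required for an architecture must have at least one template that resolves against the tensor inventory of a representative file. The sketch below models that check in Rust. `RoleSpec`, `QWEN3_MOE_ROLES`, and `unresolved_roles` are hypothetical names, and the table only mirrors the role list in this description; it is not the actual YAML schema or the generated code.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory view of one contract role for one architecture.
pub struct RoleSpec {
    pub role: &'static str,
    pub required: bool,
    pub templates: &'static [&'static str],
}

/// Roles for qwen3_moe as described above: dense FFN roles are optional
/// with no templates, the four MoE roles are required.
pub const QWEN3_MOE_ROLES: &[RoleSpec] = &[
    RoleSpec { role: "ffn_gate_weight", required: false, templates: &[] },
    RoleSpec { role: "ffn_up_weight", required: false, templates: &[] },
    RoleSpec { role: "ffn_down_weight", required: false, templates: &[] },
    RoleSpec { role: "ffn_gate_inp_weight", required: true, templates: &["blk.{n}.ffn_gate_inp.weight"] },
    RoleSpec { role: "ffn_gate_exps_weight", required: true, templates: &["blk.{n}.ffn_gate_exps.weight"] },
    RoleSpec { role: "ffn_up_exps_weight", required: true, templates: &["blk.{n}.ffn_up_exps.weight"] },
    RoleSpec { role: "ffn_down_exps_weight", required: true, templates: &["blk.{n}.ffn_down_exps.weight"] },
];

/// Return the required roles for which no template resolves to a tensor
/// name present in `inventory` for the given layer. An empty result means
/// the prediction holds for that layer.
pub fn unresolved_roles(
    roles: &[RoleSpec],
    inventory: &HashSet<&str>,
    layer: usize,
) -> Vec<&'static str> {
    roles
        .iter()
        .filter(|spec| spec.required)
        .filter(|spec| {
            !spec.templates.iter().any(|t| {
                let name = t.replace("{n}", &layer.to_string());
                inventory.contains(name.as_str())
            })
        })
        .map(|spec| spec.role)
        .collect()
}
```

Run against an inventory that only contains the dense name blk.0.ffn_up.weight, this returns all four MoE roles as unresolved, which is the shape of failure the falsifier is meant to surface.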

crates/aprender-serve/src/tensor_names_fallback.rs

  • normalize_architecture extended to cover all 6 new architecture_map keys.
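
As a rough picture of what that extension amounts to, the sketch below maps the six keys listed in architecture_map to the qwen3_moe key, keeps dense Qwen3 on qwen3, and falls back to llama for anything unknown. The function name `normalize_architecture_sketch` is hypothetical and its signature is an assumption; the real normalize_architecture may differ. The dense class name `Qwen3ForCausalLM` is also an assumption, since it does not appear in this PR.

```rust
/// Illustrative stand-in for the architecture normalizer (hypothetical name
/// and signature). Only the Qwen3 family and the llama fallback are modeled.
pub fn normalize_architecture_sketch(raw: &str) -> &'static str {
    match raw {
        // The six architecture_map keys added in v1.1.0.
        "Qwen3MoeForCausalLM"
        | "Qwen3MoEForCausalLM"
        | "Qwen3CoderForCausalLM"
        | "Qwen3_5MoeForCausalLM"
        | "qwen3_moe"
        | "qwen3moe" => "qwen3_moe",
        // Dense Qwen3 must keep routing to its own key (regression guard).
        // "Qwen3ForCausalLM" is an assumed class name, not taken from this PR.
        "Qwen3ForCausalLM" | "qwen3" => "qwen3",
        // Unknown architectures fall back to llama.
        _ => "llama",
    }
}
```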

crates/aprender-serve/tests/qwen3_moe_tensor_inventory.rs (NEW)

4 F-TNV-002 falsification tests:

  • (a) qwen3_moe_architecture_keys_normalize_correctly — every HF class name routes to qwen3_moe
  • (b) dense_qwen3_unchanged_after_v1_1_0 — regression guard
  • (c) unknown_architecture_still_falls_back_to_llama — invariant from proof_obligations
  • (d) live_gguf_inventory_check_when_present — opens the real Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf and asserts the 4 load-bearing MoE tensor names appear byte-for-byte in the header. Skipped gracefully when the 17 GB file isn't present (so CI passes); runs locally after apr pull qwen3-coder.
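
Test (d) combines two ideas: skip when the large model file is absent, and otherwise look for the tensor names as raw byte runs. A minimal sketch of that pattern follows. `contains_bytes`, `Outcome`, and `check_inventory` are hypothetical helpers, and reading a fixed-size prefix of the file is an assumption made to keep the example short; the actual test may parse the GGUF header properly.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// True if `needle` occurs as a contiguous byte run in `haystack`.
pub fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` panics, so an empty needle is treated as "not found".
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Result of the optional live check.
pub enum Outcome {
    /// The model file is not on disk; the caller should skip, not fail.
    Skipped,
    /// The file was read; `missing` lists names that were not found.
    Checked { missing: Vec<&'static str> },
}

/// Read up to `prefix_len` bytes from `path` and report which of `names`
/// do not appear byte-for-byte in that prefix.
pub fn check_inventory(
    path: &Path,
    names: &[&'static str],
    prefix_len: u64,
) -> std::io::Result<Outcome> {
    if !path.exists() {
        return Ok(Outcome::Skipped);
    }
    let mut buf = Vec::new();
    File::open(path)?.take(prefix_len).read_to_end(&mut buf)?;
    let missing: Vec<&'static str> = names
        .iter()
        .copied()
        .filter(|n| !contains_bytes(&buf, n.as_bytes()))
        .collect();
    Ok(Outcome::Checked { missing })
}
```

A test built on this would return early on `Outcome::Skipped` and assert that `missing` is empty on `Outcome::Checked`, which keeps CI green on machines without the 17 GB file while still exercising the real inventory locally.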

What this PR does NOT do (intentionally)

  • Does not implement the MoE forward pass — that's a larger workstream.
  • Does not regenerate tensor_names_generated.rs from YAML — build.rs does that at compile time.

This PR's job is the contract + falsifier so future MoE-implementation work composes against a declarative spec rather than reverse-engineering llama.cpp.

Verification (all green locally)

$ pv validate contracts/tensor-names-v1.yaml
  0 error(s), 0 warning(s)
  Contract is valid.

$ pv lint contracts/tensor-names-v1.yaml
  Result: PASS

$ cargo test -p aprender-serve --test qwen3_moe_tensor_inventory
  test result: ok. 4 passed; 0 failed
  (incl. live GGUF inventory check against 17.3 GB Qwen3-Coder file)

🤖 Generated with Claude Code

…NV-002 falsifier

Five-whys analysis of the Qwen3-Coder-30B-A3B-Instruct.gguf load failure:

  Symptom: `apr code -p '...'` against the 17.3 GB GGUF fails with
           "Invalid shape: Tensor 'blk.0.ffn_up.weight' not found".

  Why 1: tensor_names_fallback.rs:368 hardcodes the dense-FFN GGUF
         name `blk.{n}.ffn_up.weight` for the FfnUpWeight role,
         regardless of the model's `general.architecture` metadata.

  Why 2: The HuggingFace-side tensor naming branches on architecture
         (per-arch templates exist for llama, qwen2, qwen3, qwen3_moe…)
         but the GGUF-side `_fallback` is a single architecture-
         agnostic string per role.

  Why 3: qwen3_moe stores per-expert weights as 3D tensors with
         different llama.cpp names — `blk.{n}.ffn_gate_exps.weight`,
         `blk.{n}.ffn_up_exps.weight`, `blk.{n}.ffn_down_exps.weight`,
         plus a router `blk.{n}.ffn_gate_inp.weight`. None of these
         are `blk.{n}.ffn_up.weight`, so the lookup fails.

  Why 4: No falsification test in the contract framework asserted
         "for every architecture A in architecture_map, every required
         role R has at least one template that resolves against a
         representative .gguf file for A". Without that, `pv validate`
         passes a contract whose GGUF templates are silently
         incomplete.

  Why 5 (root cause): The contract treated "GGUF tensor naming" as a
         flat fallback, not as an architecture-aware namespace. Every
         new architecture lands as a code patch in
         tensor_names_fallback.rs without a paired contract gate. v1.1.0
         adds qwen3_moe as a first-class architecture key with its own
         GGUF templates AND adds an F-TNV-002 falsification gate
         against a real qwen3_moe.gguf tensor inventory.

What ships:

  contracts/tensor-names-v1.yaml (v1.0.0 → v1.1.0):
    - metadata.version 1.0.0 → 1.1.0; added `updated: 2026-04-28`
    - metadata.five_whys_qwen3_moe_gap full transcript embedded
    - architecture_map: 6 new entries pointing to qwen3_moe key
      (Qwen3MoeForCausalLM, Qwen3MoEForCausalLM, Qwen3CoderForCausalLM,
       Qwen3_5MoeForCausalLM, qwen3_moe, qwen3moe)
    - layer_roles.ffn_gate_weight / ffn_up_weight / ffn_down_weight:
      added `required_per_arch: { qwen3_moe: false }` and
      `templates.qwen3_moe: []` so dense-FFN expectations don't fire
      on MoE
    - 4 NEW layer_roles for the MoE namespace:
        ffn_gate_inp_weight  — router projection (hidden → experts)
        ffn_gate_exps_weight — per-expert gate (3D)
        ffn_up_exps_weight   — per-expert up (3D)
        ffn_down_exps_weight — per-expert down (3D)
      Each carries arch templates for qwen3_moe + a GGUF _fallback
      that matches llama.cpp's actual tensor names.
    - falsification_tests: new entry F-TNV-002 with the prediction
      "templates[qwen3_moe] for required MoE roles must resolve against
      a real qwen3_moe.gguf header byte-for-byte" + the cross-check
      command + a falsification oracle anchored to the
      Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf inventory captured by
      `apr inspect` on 2026-04-28.

  crates/aprender-serve/src/tensor_names_fallback.rs:
    - normalize_architecture: added cases for the new HF class names
      that the contract architecture_map declares
      (Qwen3MoEForCausalLM uppercase MoE, Qwen3CoderForCausalLM,
       Qwen3_5MoeForCausalLM) plus lowercase canonical keys
      (qwen3_moe, qwen3moe).

  crates/aprender-serve/tests/qwen3_moe_tensor_inventory.rs (NEW, ~150 LOC):
    - 4 F-TNV-002 falsification tests:
      a) qwen3_moe_architecture_keys_normalize_correctly — every HF
         class name routes to "qwen3_moe"
      b) dense_qwen3_unchanged_after_v1_1_0 — regression guard:
         dense Qwen3 still maps to "qwen3"
      c) unknown_architecture_still_falls_back_to_llama — invariant
         from contract.proof_obligations
      d) live_gguf_inventory_check_when_present — opens the real
         Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf and asserts the 4
         load-bearing MoE tensor names appear byte-for-byte in the
         header (skipped gracefully when the 17 GB file isn't present
         in ~/.apr/models/, so CI doesn't fail; runs locally after
         `apr pull qwen3-coder`)

What this PR does NOT do (intentionally):

  - This PR does NOT implement the MoE forward pass. Adding
    expert routing, per-expert dispatch, and weighted aggregation
    is a separate workstream. v1.1.0's job is the contract +
    falsifier so future implementation can compose against a
    declarative spec rather than reverse-engineering llama.cpp.

  - This PR does NOT regenerate `tensor_names_generated.rs` from
    the YAML — that's done by build.rs at compile time, and the
    F-TNV-002 falsifier in this PR works against the in-tree
    tensor_names_fallback.rs which is the source of truth when the
    YAML isn't present at build time.

Verification (local, this PR):

  $ pv validate contracts/tensor-names-v1.yaml
    0 error(s), 0 warning(s)
    Contract is valid.

  $ pv lint contracts/tensor-names-v1.yaml
    Result: PASS

  $ cargo test -p aprender-serve --test qwen3_moe_tensor_inventory
    test result: ok. 4 passed; 0 failed (incl. live GGUF inventory check)

Refs:
  - Five-whys transcript embedded in contract metadata
  - tensor-names-v1.yaml § falsification_tests F-TNV-002
  - Hugging Face: unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF (sha256
    01b5fec0b9d789c2, 17.3 GB, downloaded via `apr pull qwen3-coder`)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/qwen3-moe-tensor-names-contract branch from 222b6d6 to c96c637 Compare April 28, 2026 10:05
noahgift added a commit to paiml/claude-code-parity-apr that referenced this pull request Apr 28, 2026
Companion-side bookkeeping for the M29 cross-repo fix.

The technical work itself shipped at paiml/aprender#1103 — five-whys
analysis of `apr code` against Qwen3-Coder-30B-A3B-Instruct.gguf
failing with "Tensor 'blk.0.ffn_up.weight' not found", traced to
the contract treating GGUF tensor naming as a flat fallback rather
than an architecture-aware namespace. Fix:

  contracts/tensor-names-v1.yaml v1.0.0 → v1.1.0
    - 6 new architecture_map entries → qwen3_moe
    - dense FFN roles marked required_per_arch.qwen3_moe = false
    - 4 new MoE-specific layer roles
    - F-TNV-002 falsifier validated against the real 17.3 GB GGUF

  crates/aprender-serve/{src,tests}/...:
    - normalize_architecture extended for 6 new architecture_map keys
    - 4 new falsification tests including a live-GGUF-inventory check

Spec relevance: the M28 ccpa measure → apr code --emit-trace
measurement path cannot produce a non-tautological FALSIFY-CCPA-013
discharge against tool-dispatching fixtures until apr-code can
actually run a capable model. M28 + M29 are the two cleanly-
separable enabling steps. Full MoE forward-pass implementation
remains a separate larger workstream.

Contract bump v1.16.0 → v1.17.0 with full five-whys transcript +
the cross-repo fix narrative; aprender contract-mirror at byte-
identical commit 499f8b978; pin.lock refreshed via the M22 4-step
ritual.

Gates (all green locally):
  pv validate / pv lint                       PASS
  pmat comply check (is_compliant)            true, 0 Fail, 12 advisory Warn
  cargo test --workspace                      all pass (0 new tests companion-side)
  scripts/pin-check.sh                        sha256 matches
  scripts/pin-check-roundtrip.sh              byte-identical to aprender@499f8b978

Refs: paiml/aprender#1103 (M29 upstream contract PR)
      contracts/claude-code-parity-apr-v1.yaml § status_history (M29)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/qwen3-moe-tensor-names-contract branch from c96c637 to 6239d8b Compare April 28, 2026 10:15
10 files across 4 crates had accumulated rustfmt drift on main that
was failing `cargo fmt --all -- --check` in CI for any new PR.
Affected files (none touched in this PR's contract / qwen3_moe work):

  crates/aprender-core/src/format/ship_010.rs
  crates/aprender-core/src/format/v2/stamp.rs
  crates/aprender-gpu/src/kernels/backward/mod.rs
  crates/aprender-serve/src/gguf/inference/forward/traced.rs
  crates/aprender-serve/tests/qwen2_gqa_7_1_attention_parity.rs
  crates/aprender-train/src/autograd/cuda_backward/structured.rs
  crates/aprender-train/src/train/gputrain_006.rs
  crates/aprender-train/src/train/pretrain.rs
  crates/aprender-train/src/train/shard_reader.rs
  crates/aprender-train/tests/ship_two_001_const_pinning.rs

Bundled here as the minimum-friction unblock for the qwen3_moe
tensor-names contract PR's CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/qwen3-moe-tensor-names-contract branch from 6239d8b to 7727ad7 Compare April 28, 2026 10:37
@noahgift noahgift merged commit 15d504c into main Apr 28, 2026
10 checks passed
@noahgift noahgift deleted the feat/qwen3-moe-tensor-names-contract branch April 28, 2026 10:54
noahgift added a commit to paiml/claude-code-parity-apr that referenced this pull request Apr 28, 2026
…oseout (#35)

The companion-side spec markdown's milestone table stopped at M27.
M28 (apr code --emit-trace + Qwen3-Coder default + qwen3-coder
short-name alias) and M29 (five-whys + tensor-names-v1 v1.1.0
contract amendment + F-TNV-002 falsifier) both landed at aprender
main but their narrative hadn't reached the spec.

This PR closes that gap:

  - status snapshot bumped: M0–M30 all SHIPPED, contract v1.18.0
  - new line on the M28+M29 cross-repo enabling chain
  - sub-milestones table extended through M30:
      M28 — cross-repo apr code --emit-trace + default model
      M29 — qwen3_moe contract amendment v1.1.0 + F-TNV-002 falsifier
            (paiml/aprender#1103 merged at 15d504cfe)
      M30 — this spec-table refresh (closeout)
  - outstanding next-goal reframed: MoE forward-pass implementation
    is the only piece remaining for a measured tool-dispatch parity
    score. That's realizar/aprender-serve engineering — not a CCPA
    POC scope item. The contract namespace, falsifier, model
    availability, and emit-trace plumbing are all in place.

State at M30 close:
  - Companion-side spec POC: complete (M0–M30 all SHIPPED)
  - Aprender-side enabling chain (M28+M29): complete
  - Both repos byte-identical at sha256
    7b1d79db710a91786033792a68b32a3cc7396472f7f7a61413c3e87728f88752
  - 13/13 falsification gates green
  - Corpus complete (30/30 fixtures, 15/15 reachable)
  - 100% mutation coverage workspace-wide
  - Companion ↔ aprender drift guard mechanically enforced
  - Contributor onramp documented (CONTRIBUTING.md)
  - Cross-repo audit trail intact across status_history

Contract bump v1.17.0 → v1.18.0 with the M30 status_history entry
documenting the doc closeout. Aprender mirror pushed in paired
commit b7f42619d. pin.lock refreshed via the M22 4-step ritual.

Gates (all green locally):
  pv validate / pv lint                       PASS
  pmat comply check (is_compliant)            true, 0 Fail, 12 advisory Warn
  cargo test --workspace                      all pass (0 new tests)
  scripts/pin-check.sh                        sha256 matches
  scripts/pin-check-roundtrip.sh              byte-identical to aprender@b7f42619d

Refs: paiml/aprender#1103 (M29 contract — merged 15d504cfe)
      contracts/claude-code-parity-apr-v1.yaml § status_history (M30)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 28, 2026
…oE forward gap

M32a — first slice of the MoE forward-pass implementation chain that the
companion claude-code-parity-apr POC named as the "Outstanding next-goal
(in-scope, M32)" in v1.19.0 (M31 spec).

WHY THIS CONTRACT EXISTS
========================
`apr run <qwen3-coder>.gguf` currently fails with:

  Invalid shape: Tensor 'blk.0.ffn_up.weight' not found

at the FFN load step. The M29 contract amendment (tensor-names-v1
v1.1.0, #1103) declared the qwen3_moe tensor namespace
but explicitly deferred the forward-pass implementation. This contract
discharges that deferral with a 4-stage staged plan.

WHAT THIS PR SHIPS
==================
A KernelContract `qwen3-moe-forward-v1.yaml` (DRAFT status) that:

  * Composes existing kernels: tensor-names-v1 v1.1.0 + moe-router-v1 +
    moe-expert-dispatch-v1 + qwen3moe-shapes-v1 + swiglu-kernel-v1 +
    silu-kernel-v1 + rmsnorm-kernel-v1 + rope-kernel-v1
  * Names 5 acceptance criteria (AC_QW3_MOE_001 .. _005)
  * Names 4 implementation stages (M32a SHIPPED, M32b/c/d PENDING)
  * Names 4 falsification tests (F-QW3-MOE-FORWARD-001 REPRODUCED at
    commit 15d504c = end of M29; the other three are PENDING and
    each maps to one stage)
  * Names the Qwen3-Coder-30B-A3B-Instruct shape algebra explicitly
    (L=48, d=2048, d_ff=6144, N_experts=128, k=8, n_heads=32, n_kv=4,
    vocab=151936, RoPE θ=1e7) so the contract is testable on the live
    cached GGUF (~/.cache/pacha/models/2b88b180a790988f.gguf, 17.3 GB)

WHAT M32b/c/d WILL SHIP (in subsequent PRs)
============================================
M32b: Architecture-aware FFN load. Branch transformer_loader.rs (line
      ~145) on tensor_names_fallback::normalize_architecture(...).
      For arch == "qwen3_moe", load the 4 contract-named tensors per
      layer (ffn_gate_inp/ffn_gate_exps/ffn_up_exps/ffn_down_exps)
      into a new MoeLayerWeights field. Forward emits structured
      UnsupportedOperation containing this contract's id.

M32c: Wire CPU MoE forward. The pure-Rust moe_forward_token in
      gpu/scheduler/moe_dispatch.rs already implements the full
      router + per-expert SwiGLU + weighted aggregation kernel.
      Populate MoeExpertWeights from M32b-loaded tensors and call
      it from the FFN dispatch site. After M32c, `apr run` emits
      tokens.

M32d: Numerical parity vs llama.cpp Q4_K (primary) + HF FP16
      (secondary) per CLAUDE.md ground-truth checklist. Discharges
      AC_QW3_MOE_001 and AC_QW3_MOE_005. Flips this contract from
      DRAFT to ACTIVE_RUNTIME and unblocks companion-repo
      FALSIFY-CCPA-013 measured tool-dispatch parity score.
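
For readers unfamiliar with the kernel that M32c refers to, the following is a minimal single-token sketch of router, per-expert SwiGLU, and weighted aggregation. It is not the moe_forward_token implementation in gpu/scheduler/moe_dispatch.rs: `Mat`, `Expert`, and `moe_forward_sketch` are hypothetical names, weights are plain unquantized f32, and renormalizing the top-k router probabilities so they sum to 1 is an assumption.

```rust
/// Dense row-major matrix with `rows` x `cols` entries (cols must be > 0).
pub struct Mat {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f32>,
}

impl Mat {
    /// Matrix-vector product y = W x.
    pub fn matvec(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.cols);
        assert_eq!(self.data.len(), self.rows * self.cols);
        self.data
            .chunks(self.cols)
            .map(|row| row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>())
            .collect()
    }
}

/// One expert's SwiGLU weights: gate and up map d -> d_ff, down maps d_ff -> d.
pub struct Expert {
    pub gate: Mat,
    pub up: Mat,
    pub down: Mat,
}

fn silu(v: f32) -> f32 {
    v / (1.0 + (-v).exp())
}

/// Route one token through the top-`k` experts and sum their outputs,
/// weighted by router probabilities renormalized over the selected experts.
pub fn moe_forward_sketch(x: &[f32], router: &Mat, experts: &[Expert], k: usize) -> Vec<f32> {
    // 1. Router logits, then unnormalized softmax terms (max-subtracted for stability).
    let logits = router.matvec(x);
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();

    // 2. Pick the k experts with the largest router score.
    let mut order: Vec<usize> = (0..experts.len()).collect();
    order.sort_by(|&a, &b| exps[b].total_cmp(&exps[a]));
    order.truncate(k);
    // The full softmax denominator cancels when renormalizing over the top-k.
    let top_sum: f32 = order.iter().map(|&e| exps[e]).sum();

    // 3. Per-expert SwiGLU, then weighted aggregation.
    let mut out = vec![0.0; x.len()];
    for &e in &order {
        let weight = exps[e] / top_sum;
        let ex = &experts[e];
        let g = ex.gate.matvec(x);
        let u = ex.up.matvec(x);
        let h: Vec<f32> = g.iter().zip(&u).map(|(g, u)| silu(*g) * u).collect();
        let y = ex.down.matvec(&h);
        for (o, v) in out.iter_mut().zip(&y) {
            *o += weight * v;
        }
    }
    out
}
```

In the contract's terms, `router` corresponds to the ffn_gate_inp tensor and each `Expert` to one slice of the stacked ffn_gate_exps / ffn_up_exps / ffn_down_exps tensors; slicing the 3D tensors and dequantizing Q4_K are deliberately left out of this sketch.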

CROSS-REPO LINKS
================
This contract is the aprender-side spine of:

  * paiml/claude-code-parity-apr v1.19.0 (M31 spec, 2026-04-28) —
    "Outstanding next-goal (in-scope, M32)" was created exactly for
    this 4-stage plan; the user clarified at M31 that aprender and
    claude-code-parity-apr are the same monorepo, so this work IS
    in-scope companion-repo work, not "upstream realizar engineering"

  * paiml/aprender contracts/tensor-names-v1.yaml v1.1.0 (M29) —
    declared the namespace this contract operates over

VALIDATION
==========
$ pv validate contracts/qwen3-moe-forward-v1.yaml
0 error(s), 0 warning(s)
Contract is valid.

NO CODE CHANGE in this PR. M32a is contract-only by design; M32b is
where Rust changes start. Authoring contract before code per CLAUDE.md
rule 1 (CB-1400).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>