contract(trace-moe-gpu-sub-stages-v1): v1.0.0 → v1.1.0 — clarify M-MOE-SUB-2 traced target by noahgift · Pull Request #1503 · paiml/aprender

noahgift · 2026-05-05T06:03:47Z

Summary

Code archaeology found a pre-existing `forward_qwen3_moe_traced` (M32d Step 2 work). v1.0.0's M-MOE-SUB-2 description was ambiguous about whether the production hot paths should be modified. v1.1.0 clarifies: extend the traced sibling functions, NOT the production hot paths.

Changes

| Field | Change |
| --- | --- |
| metadata.version | 1.0.0 → 1.1.0 |
| metadata.description | + 23-line v1.1.0 amendment block |
| M-MOE-SUB-2 | Expanded into 3-part (a) extend `forward_qwen3_moe_traced` + (b) NEW `forward_qwen3_moe_cuda_traced` + (c) sibling `moe_ffn_forward_layer_with_router`; explicit blocker "MUST NOT modify production hot paths" |
| M-MOE-SUB-3 | Clarified that it targets the traced forward bodies, not production |

Validation

pv validate contracts/trace-moe-gpu-sub-stages-v1.yaml → 0 errors, 0 warnings

Test plan

pv validate clean
CI ci/gate green

🤖 Generated with Claude Code

…E-SUB-2 traced target

Code archaeology found pre-existing `forward_qwen3_moe_traced` (M32d Step 2 work), the natural extension point for SaveTensorPlan capture. v1.0.0's M-MOE-SUB-2 description was ambiguous: "wire into both forward_qwen3_moe AND forward_qwen3_moe_cuda" could be read as modifying the production hot paths, which would force every dense caller to plumb None and add a per-token branch.

v1.1.0 clarifies:

(a) Extend `forward_qwen3_moe_traced` (CPU traced sibling, pre-existing) to accept Option<&SaveTensorPlan>.
(b) Author NEW `forward_qwen3_moe_cuda_traced` (GPU traced sibling, ~150-300 LOC mirroring CPU).
(c) Add sibling `moe_ffn_forward_layer_with_router` to qwen3_moe_load.rs (production sibling stays byte-identical).
(d) Production hot paths (forward_qwen3_moe + forward_qwen3_moe_cuda) MUST NOT be modified. This preserves the "additive purity" invariant.

UPDATED FIELDS:

* metadata.version: 1.0.0 → 1.1.0
* metadata.description: 23-line v1.1.0 amendment block at top
* implementation_stages.M-MOE-SUB-2: expanded into 3-part (a)/(b)/(c) description; added explicit blockers ("must NOT modify production")
* implementation_stages.M-MOE-SUB-3: clarified that it targets the **traced** forward bodies (NOT the production hot paths)

VALIDATION: pv validate exits 0 errors, 0 warnings.

Per CLAUDE.md "NEVER write code before writing a provable contract": this amendment pins the architectural decision (don't touch production hot paths) BEFORE the M-MOE-SUB-2 implementation lands.

Refs: M-GPU-MOE-1.4 step (a) instrumentation, R10, qwen3-moe-forward-gpu-v1 v1.4.0, forward_qwen3_moe_traced (M32d Step 2 pre-existing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
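The "traced sibling" decision in this commit message can be illustrated with a minimal Rust sketch. Everything here is a hypothetical stand-in, not the aprender code: the `SaveTensorPlan` struct, the `toy_layer` math, and the function names `forward_production` / `forward_traced` are assumptions chosen only to show the shape of the pattern. The production-style function takes no plan and has no branch; the traced sibling accepts `Option<&SaveTensorPlan>` and captures a stage when asked.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real SaveTensorPlan: the stage names to capture.
struct SaveTensorPlan {
    stages: Vec<&'static str>,
}

impl SaveTensorPlan {
    /// True when the plan asks for the given stage.
    fn wants(&self, stage: &str) -> bool {
        self.stages.iter().any(|s| *s == stage)
    }
}

/// Toy layer math (doubles every element); stands in for the real MoE FFN.
fn toy_layer(hidden: &[f32]) -> Vec<f32> {
    hidden.iter().map(|x| x * 2.0).collect()
}

/// Production-style hot path: no plan parameter, no capture branch.
fn forward_production(hidden: &[f32]) -> Vec<f32> {
    toy_layer(hidden)
}

/// Traced sibling: same math, plus optional capture driven by the plan.
fn forward_traced(
    hidden: &[f32],
    plan: Option<&SaveTensorPlan>,
) -> (Vec<f32>, BTreeMap<String, Vec<f32>>) {
    let out = toy_layer(hidden);
    let mut captured = BTreeMap::new();
    if let Some(plan) = plan {
        if plan.wants("MoeFfnOut") {
            captured.insert("MoeFfnOut".to_string(), out.clone());
        }
    }
    (out, captured)
}

fn main() {
    let hidden = [1.0_f32, -0.5, 3.0];
    let plan = SaveTensorPlan { stages: vec!["MoeFfnOut"] };

    let prod = forward_production(&hidden);
    let (traced, captured) = forward_traced(&hidden, Some(&plan));

    // The traced sibling must agree with production on the output.
    assert_eq!(prod, traced);
    assert_eq!(captured["MoeFfnOut"], prod);

    // Without a plan, nothing is captured.
    let (untraced, empty) = forward_traced(&hidden, None);
    assert_eq!(untraced, prod);
    assert!(empty.is_empty());
    println!("traced sibling matches production: {:?}", traced);
}
```

In this sketch, callers of `forward_production` never see the plan type, which is the property the amendment calls additive purity: tracing is added next to the hot path rather than inside it.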

… step (c) (#1507)

Author additive-pure sibling of `moe_ffn_forward_layer` per `contracts/trace-moe-gpu-sub-stages-v1.yaml` v1.1.0 step (c).

## What ships

`moe_ffn_forward_layer_with_router` returns `(output, router_top_k_weights)`:

- `output: Vec<f32>`: `[hidden_dim]` aggregated MoE FFN output (the MoeFfnOut SaveTensorStage capture target).
- `router_top_k_weights: Vec<f32>`: `[num_experts_per_tok]` post-softmax + renormalize top-k expert weights (the MoeRouter SaveTensorStage capture target).

The helper enables traced forward bodies (M-MOE-SUB-2 steps a/b upcoming) to capture both `MoeRouter` and `MoeFfnOut` stages without a second router computation.

## Hot path safety

Production `moe_ffn_forward_layer` is unchanged byte-for-byte. The helper duplicates the router/softmax/top-k logic to satisfy the v1.1.0 amendment's additive-purity invariant: "MUST NOT modify production forward_qwen3_moe / forward_qwen3_moe_cuda hot paths".

Drift between sibling functions is mitigated by:

1. Two new unit tests asserting the helper's input-validation error messages match the production sibling's error class for the same shape/qtype boundary violations (`hidden.len() != hidden_dim` and `router qtype != F32`).
2. End-to-end byte-identity for realistic GGUF inputs is exercised by the heavy parity tests at `qwen3_moe_gpu_parity.rs` (out of scope for unit tests since they require the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct GGUF on lambda-vector RTX 4090).

## What this discharges

- FALSIFY-MOE-SUB-002 (byte-identity preservation): partial. The helper exists and validates inputs symmetrically with production. Full discharge needs M-MOE-SUB-2 steps (a)+(b) to wire it into traced CPU + GPU forward paths.

## Verification

    $ cargo test -p aprender-serve --release --lib gguf::qwen3_moe_load::tests
    8 passed (including 2 new helper tests)
    $ cargo clippy -p aprender-serve --lib --release -- -D warnings
    clean
    $ rustfmt --check crates/aprender-serve/src/gguf/qwen3_moe_load.rs
    clean

## What this does NOT ship

- M-MOE-SUB-2 step (a): extending `forward_qwen3_moe_traced` to call this helper (CPU-side traced wireup). Separate PR.
- M-MOE-SUB-2 step (b): NEW `forward_qwen3_moe_cuda_traced.rs` GPU sibling. Separate PR.
- Real-GGUF byte-identity test: exercised end-to-end via heavy parity tests, not unit tests.

Refs: contracts/trace-moe-gpu-sub-stages-v1.yaml v1.1.0 (M64 companion-spec record, aprender PR #1503)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
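The "post-softmax + renormalize top-k" weights and the `(output, router_top_k_weights)` return shape described above can be sketched on a toy layer. This is an illustration under stated assumptions, not the `qwen3_moe_load.rs` code: `ToyMoe`, its fields, and the scale-only "experts" are hypothetical, and unlike the PR (which duplicates the router logic so production stays byte-identical) the toy shares private helpers between the two entry points for brevity.

```rust
/// Softmax over router logits, stabilized by subtracting the maximum.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Picks the k largest probabilities and rescales them to sum to 1.
fn top_k_renormalized(probs: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].total_cmp(&probs[a])); // descending
    idx.truncate(k);
    let sum: f32 = idx.iter().map(|&i| probs[i]).sum();
    idx.into_iter().map(|i| (i, probs[i] / sum)).collect()
}

/// Hypothetical toy MoE layer: each "expert" only scales the hidden vector.
struct ToyMoe {
    hidden_dim: usize,
    router: Vec<Vec<f32>>,  // [num_experts][hidden_dim]
    expert_scale: Vec<f32>, // [num_experts]
    top_k: usize,
}

impl ToyMoe {
    /// Validates the input, then returns (expert index, weight) for the top-k experts.
    fn route(&self, hidden: &[f32]) -> Result<Vec<(usize, f32)>, String> {
        if hidden.len() != self.hidden_dim {
            return Err(format!(
                "hidden.len()={} != hidden_dim={}",
                hidden.len(),
                self.hidden_dim
            ));
        }
        let logits: Vec<f32> = self
            .router
            .iter()
            .map(|row| row.iter().zip(hidden).map(|(w, h)| w * h).sum::<f32>())
            .collect();
        Ok(top_k_renormalized(&softmax(&logits), self.top_k))
    }

    /// Weighted sum of the selected experts' outputs.
    fn combine(&self, hidden: &[f32], routed: &[(usize, f32)]) -> Vec<f32> {
        let mut out = vec![0.0_f32; self.hidden_dim];
        for &(e, w) in routed {
            for (o, h) in out.iter_mut().zip(hidden) {
                *o += w * self.expert_scale[e] * h;
            }
        }
        out
    }

    /// Production-style entry point: output only.
    fn forward(&self, hidden: &[f32]) -> Result<Vec<f32>, String> {
        let routed = self.route(hidden)?;
        Ok(self.combine(hidden, &routed))
    }

    /// Sibling entry point: output plus top-k weights from a single router pass.
    fn forward_with_router(&self, hidden: &[f32]) -> Result<(Vec<f32>, Vec<f32>), String> {
        let routed = self.route(hidden)?;
        let weights = routed.iter().map(|&(_, w)| w).collect();
        Ok((self.combine(hidden, &routed), weights))
    }
}

fn main() {
    let moe = ToyMoe {
        hidden_dim: 2,
        router: vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![-1.0, -1.0]],
        expert_scale: vec![1.0, 2.0, 3.0],
        top_k: 2,
    };
    let hidden = [0.5_f32, 1.5];
    let out = moe.forward(&hidden).unwrap();
    let (out2, weights) = moe.forward_with_router(&hidden).unwrap();
    assert_eq!(out, out2); // sibling output matches production-style output
    assert_eq!(weights.len(), 2);
    assert!((weights.iter().sum::<f32>() - 1.0).abs() < 1e-6);
    // Both entry points reject a wrong-length input with the same error.
    assert_eq!(
        moe.forward(&[1.0_f32]).unwrap_err(),
        moe.forward_with_router(&[1.0_f32]).unwrap_err()
    );
}
```

The last assertion mirrors the drift-mitigation idea in the commit message: for the same boundary violation, the helper and its production sibling should fail in the same way.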

noahgift enabled auto-merge (squash) May 5, 2026 06:03

noahgift force-pushed the contract/trace-moe-gpu-sub-stages-v1-1-0-traced-target branch from cae4641 to 087a52b Compare May 5, 2026 06:38

noahgift merged commit 8c4c6d5 into main May 5, 2026
10 checks passed

noahgift deleted the contract/trace-moe-gpu-sub-stages-v1-1-0-traced-target branch May 5, 2026 06:59

This was referenced May 5, 2026

docs(M64): record trace-moe-gpu-sub-stages-v1 v1.0.0 → v1.1.0 amendment SHIPPED paiml/claude-code-parity-apr#50

Merged

feat(aprender-serve): moe_ffn_forward_layer_with_router — M-MOE-SUB-2 step (c) #1507

Merged

noahgift merged 1 commit into main from contract/trace-moe-gpu-sub-stages-v1-1-0-traced-target