contract(trace-moe-gpu-sub-stages-v1): v1.0.0 → v1.1.0 — clarify M-MOE-SUB-2 traced target#1503

Merged
noahgift merged 1 commit into main from contract/trace-moe-gpu-sub-stages-v1-1-0-traced-target on May 5, 2026

@noahgift noahgift commented May 5, 2026

Summary

Code archaeology found a pre-existing `forward_qwen3_moe_traced` (M32d Step 2 work). The v1.0.0 description of M-MOE-SUB-2 was ambiguous about whether to modify the production hot paths. v1.1.0 clarifies: extend the traced sibling functions, NOT the production ones.

Changes

| Field | Change |
|---|---|
| `metadata.version` | 1.0.0 → 1.1.0 |
| `metadata.description` | + 23-line v1.1.0 amendment block |
| M-MOE-SUB-2 | Expanded into 3 parts: (a) extend `forward_qwen3_moe_traced`, (b) NEW `forward_qwen3_moe_cuda_traced`, (c) sibling `moe_ffn_forward_layer_with_router`; explicit blocker "MUST NOT modify production hot paths" |
| M-MOE-SUB-3 | Clarified: targets traced forward bodies, not production |

Validation

pv validate contracts/trace-moe-gpu-sub-stages-v1.yaml → 0 errors, 0 warnings

Test plan

  • pv validate clean
  • CI ci/gate green

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) May 5, 2026 06:03
…E-SUB-2 traced target

Code archaeology found pre-existing `forward_qwen3_moe_traced` (M32d
Step 2 work) — the natural extension point for SaveTensorPlan
capture. v1.0.0's M-MOE-SUB-2 description was ambiguous: "wire into
both forward_qwen3_moe AND forward_qwen3_moe_cuda" could be read as
modifying the production hot paths, which would force every dense
caller to plumb None and add a per-token branch.

v1.1.0 clarifies:

  (a) Extend `forward_qwen3_moe_traced` (CPU traced sibling,
      pre-existing) to accept Option<&SaveTensorPlan>.
  (b) Author NEW `forward_qwen3_moe_cuda_traced` (GPU traced
      sibling, ~150-300 LOC mirroring CPU).
  (c) Add sibling `moe_ffn_forward_layer_with_router` to
      qwen3_moe_load.rs (production sibling stays byte-identical).
  (d) Production hot paths (forward_qwen3_moe + forward_qwen3_moe_cuda)
      MUST NOT be modified — preserves "additive purity" invariant.
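
The split in (a) through (d) can be illustrated with a minimal sketch. It is not the repository's code: `forward_production`, `forward_traced`, `Captures`, and the `stages` field on `SaveTensorPlan` are hypothetical stand-ins, and the doubling stands in for the real forward math. The point it tries to show is that the production function keeps its signature and body, while only the traced sibling takes `Option<&SaveTensorPlan>`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the contract's SaveTensorPlan:
/// the names of the stages the caller wants captured.
pub struct SaveTensorPlan {
    pub stages: Vec<String>,
}

/// Hypothetical capture sink filled by the traced sibling.
pub type Captures = HashMap<String, Vec<f32>>;

/// Stand-in for a production hot path: no plan parameter and no
/// per-token branch, so dense callers never have to pass `None`.
pub fn forward_production(hidden: &[f32]) -> Vec<f32> {
    hidden.iter().map(|x| x * 2.0).collect()
}

/// Traced sibling: duplicates the same math and adds optional capture.
/// The production function above is not touched by this addition.
pub fn forward_traced(
    hidden: &[f32],
    plan: Option<&SaveTensorPlan>,
    captures: &mut Captures,
) -> Vec<f32> {
    let out: Vec<f32> = hidden.iter().map(|x| x * 2.0).collect();
    if let Some(plan) = plan {
        // Capture only when the plan asks for this stage.
        if plan.stages.iter().any(|s| s == "MoeFfnOut") {
            captures.insert("MoeFfnOut".to_string(), out.clone());
        }
    }
    out
}
```

Under these assumptions, calling `forward_traced` with `None` yields the same values as `forward_production` and leaves the capture map empty, which is the property the "additive purity" invariant is meant to protect.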

UPDATED FIELDS:

  * metadata.version: 1.0.0 → 1.1.0
  * metadata.description: 23-line v1.1.0 amendment block at top
  * implementation_stages.M-MOE-SUB-2: expanded into 3-part (a)/(b)/(c)
    description; added explicit blockers ("must NOT modify production")
  * implementation_stages.M-MOE-SUB-3: clarified targets the **traced**
    forward bodies (NOT the production hot paths)

VALIDATION: pv validate exits 0 errors, 0 warnings.

Per CLAUDE.md "NEVER write code before writing a provable contract"
— this amendment pins the architectural decision (don't touch
production hot paths) BEFORE M-MOE-SUB-2 implementation lands.

Refs: M-GPU-MOE-1.4 step (a) instrumentation, R10,
      qwen3-moe-forward-gpu-v1 v1.4.0,
      forward_qwen3_moe_traced (M32d Step 2 pre-existing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the contract/trace-moe-gpu-sub-stages-v1-1-0-traced-target branch from cae4641 to 087a52b Compare May 5, 2026 06:38
@noahgift noahgift merged commit 8c4c6d5 into main May 5, 2026
10 checks passed
@noahgift noahgift deleted the contract/trace-moe-gpu-sub-stages-v1-1-0-traced-target branch May 5, 2026 06:59
noahgift added a commit that referenced this pull request May 5, 2026
… step (c) (#1507)

Author additive-pure sibling of `moe_ffn_forward_layer` per
`contracts/trace-moe-gpu-sub-stages-v1.yaml` v1.1.0 step (c).

## What ships

`moe_ffn_forward_layer_with_router` returns `(output, router_top_k_weights)`:

  - `output: Vec<f32>` — `[hidden_dim]` aggregated MoE FFN output
    (the MoeFfnOut SaveTensorStage capture target).
  - `router_top_k_weights: Vec<f32>` — `[num_experts_per_tok]`
    post-softmax + renormalize top-k expert weights (the MoeRouter
    SaveTensorStage capture target).

The helper enables traced forward bodies (M-MOE-SUB-2 steps a/b
upcoming) to capture both `MoeRouter` and `MoeFfnOut` stages without a
second router computation.
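
As a rough illustration of the return shape described above, the sketch below computes softmax over router logits, keeps the top-k entries, renormalizes them, and returns both the aggregated output and the weights from one pass. It is an assumption-laden toy: `route_top_k`, `moe_layer_with_router_sketch`, and `expert_scales` are hypothetical names, each "expert" is a plain scale factor rather than a real FFN, and the actual helper's signature and ordering of operations may differ.

```rust
/// Softmax over router logits, keep the top-k entries, then renormalize
/// the kept weights so they sum to 1. Assumes `logits` is non-empty and
/// `k >= 1`. Returns (expert_index, weight) pairs, largest weight first.
pub fn route_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Subtract the max logit for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> =
        exps.iter().map(|e| e / sum).enumerate().collect();
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(k);
    let top_sum: f32 = probs.iter().map(|p| p.1).sum();
    for p in probs.iter_mut() {
        p.1 /= top_sum;
    }
    probs
}

/// Hypothetical with-router layer: returns (output, router_top_k_weights)
/// so a traced caller gets both capture targets from one router pass.
/// Assumes `expert_scales.len() == logits.len()`.
pub fn moe_layer_with_router_sketch(
    hidden: &[f32],
    logits: &[f32],
    expert_scales: &[f32],
    k: usize,
) -> (Vec<f32>, Vec<f32>) {
    let routed = route_top_k(logits, k);
    let mut output = vec![0.0f32; hidden.len()];
    for &(idx, w) in &routed {
        // Toy expert: scale the hidden state, weighted by the router.
        for (o, h) in output.iter_mut().zip(hidden) {
            *o += w * expert_scales[idx] * h;
        }
    }
    let weights: Vec<f32> = routed.iter().map(|p| p.1).collect();
    (output, weights)
}
```

Returning the weights alongside the output is what lets a caller record both stages without re-running the router, at the cost of one extra small `Vec` per call.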

## Hot path safety

Production `moe_ffn_forward_layer` is unchanged byte-for-byte. The
helper duplicates the router/softmax/top-k logic to satisfy the
v1.1.0 amendment's additive-purity invariant: "MUST NOT modify
production forward_qwen3_moe / forward_qwen3_moe_cuda hot paths".

Drift between sibling functions is mitigated by:
1. Two new unit tests asserting the helper's input-validation error
   messages match the production sibling's error class for the same
   shape/qtype boundary violations (`hidden.len() != hidden_dim` and
   `router qtype != F32`).
2. End-to-end byte-identity for realistic GGUF inputs is exercised by
   the heavy parity tests at `qwen3_moe_gpu_parity.rs` (out of scope
   for unit tests since they require the cached 17.3 GB
   Qwen3-Coder-30B-A3B-Instruct GGUF on lambda-vector RTX 4090).
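
The first mitigation can be pictured with a small sketch, assuming a typed error enum. The real tests compare error messages and error classes; `MoeInputError`, `QType`, `production_layer`, and `with_router_layer` here are hypothetical stand-ins. Each sibling duplicates its own validation, and a test can then assert that both reject the same boundary violations in the same way.

```rust
/// Hypothetical error class shared by both siblings.
#[derive(Debug, PartialEq)]
pub enum MoeInputError {
    HiddenLenMismatch { expected: usize, got: usize },
    RouterNotF32,
}

/// Hypothetical quantization tag for the router tensor.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum QType {
    F32,
    Q4K,
}

/// Stand-in for the production layer (body reduced to validation).
pub fn production_layer(
    hidden: &[f32],
    hidden_dim: usize,
    router_qtype: QType,
) -> Result<Vec<f32>, MoeInputError> {
    if hidden.len() != hidden_dim {
        return Err(MoeInputError::HiddenLenMismatch {
            expected: hidden_dim,
            got: hidden.len(),
        });
    }
    if router_qtype != QType::F32 {
        return Err(MoeInputError::RouterNotF32);
    }
    Ok(hidden.to_vec())
}

/// Stand-in for the with-router sibling. The checks are duplicated on
/// purpose, so the production function stays untouched.
pub fn with_router_layer(
    hidden: &[f32],
    hidden_dim: usize,
    router_qtype: QType,
) -> Result<(Vec<f32>, Vec<f32>), MoeInputError> {
    if hidden.len() != hidden_dim {
        return Err(MoeInputError::HiddenLenMismatch {
            expected: hidden_dim,
            got: hidden.len(),
        });
    }
    if router_qtype != QType::F32 {
        return Err(MoeInputError::RouterNotF32);
    }
    Ok((hidden.to_vec(), vec![1.0]))
}
```

A drift test under these assumptions would call both functions with the same invalid input and compare the `Err` values; if someone later changes one sibling's validation but not the other's, the comparison fails.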

## What this discharges

- FALSIFY-MOE-SUB-002 (byte-identity preservation): partial — the
  helper exists and validates inputs symmetrically with production.
  Full discharge needs M-MOE-SUB-2 steps (a)+(b) to wire it into
  traced CPU + GPU forward paths.

## Verification

  $ cargo test -p aprender-serve --release --lib gguf::qwen3_moe_load::tests
    8 passed (including 2 new helper tests)
  $ cargo clippy -p aprender-serve --lib --release -- -D warnings
    clean
  $ rustfmt --check crates/aprender-serve/src/gguf/qwen3_moe_load.rs
    clean

## What this does NOT ship

- M-MOE-SUB-2 step (a): extending `forward_qwen3_moe_traced` to call
  this helper (CPU-side traced wireup) — separate PR.
- M-MOE-SUB-2 step (b): NEW `forward_qwen3_moe_cuda_traced.rs` GPU
  sibling — separate PR.
- Real-GGUF byte-identity test — exercised end-to-end via heavy
  parity tests, not unit tests.

Refs: contracts/trace-moe-gpu-sub-stages-v1.yaml v1.1.0
      (M64 companion-spec record, aprender PR #1503)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>