
feat(aprender-serve): SaveTensorStage gains MoeRouter + MoeFfnOut — M-MOE-SUB-1 #1499

Merged: noahgift merged 1 commit into main from feat/save-tensor-stage-moe-router-and-ffn-out-m-moe-sub-1 on May 5, 2026.


@noahgift noahgift commented May 5, 2026


Summary

Per trace-moe-gpu-sub-stages-v1.yaml v1.0.0 (PR #1498), this PR implements M-MOE-SUB-1: it extends the parent SaveTensorStage enum with the two new mandatory MoE-GPU bisection variants needed for M-GPU-MOE-1.4.

Changes

| Element | Change |
|---|---|
| SaveTensorStage enum | 2 new variants MoeRouter + MoeFfnOut |
| ALL array | 20 → 22 entries, in canonical order |
| canonical_name() | 2 new arms (moe_router, moe_ffn_out) |
| FromStr parser | 2 new accepted strings |
| Existing tests | counts updated 20 → 22 / 18 → 20 |
| New tests | 5 falsify_moe_sub_001_* tests |
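
The shape of the change can be sketched as follows. This is a trimmed illustration, not the crate's actual code: the real enum has 22 variants, only five are shown here, and the strings `post_ffn_residual` and `final_norm` are assumed snake_case names (only `ffn_norm`, `moe_router`, and `moe_ffn_out` appear in this PR).

```rust
use std::str::FromStr;

// Trimmed sketch: the real enum has 22 variants; only the neighbours of the
// two new MoE variants are shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SaveTensorStage {
    FfnNorm,
    PostFfnResidual,
    MoeRouter, // new in this PR
    MoeFfnOut, // new in this PR
    FinalNorm,
}

impl SaveTensorStage {
    // Canonical order: the new variants sit between PostFfnResidual and FinalNorm.
    const ALL: [SaveTensorStage; 5] = [
        SaveTensorStage::FfnNorm,
        SaveTensorStage::PostFfnResidual,
        SaveTensorStage::MoeRouter,
        SaveTensorStage::MoeFfnOut,
        SaveTensorStage::FinalNorm,
    ];

    fn canonical_name(self) -> &'static str {
        match self {
            SaveTensorStage::FfnNorm => "ffn_norm",
            SaveTensorStage::PostFfnResidual => "post_ffn_residual", // assumed name
            SaveTensorStage::MoeRouter => "moe_router",
            SaveTensorStage::MoeFfnOut => "moe_ffn_out",
            SaveTensorStage::FinalNorm => "final_norm", // assumed name
        }
    }
}

impl FromStr for SaveTensorStage {
    type Err = String;

    // Lowercase the input, then look it up by canonical name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let lowered = s.trim().to_ascii_lowercase();
        Self::ALL
            .iter()
            .copied()
            .find(|stage| stage.canonical_name() == lowered)
            .ok_or_else(|| format!("unknown stage: {}", s))
    }
}
```

Deriving the parser from `ALL` and `canonical_name()` keeps the round trip consistent by construction. The PR describes a lowercase `match` in `from_str`, so the real parser may be written differently.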

Discharges

  • FALSIFY-MOE-SUB-001 ✓ (per contract trace-moe-gpu-sub-stages-v1.yaml)

NOT yet discharged (separate scope)

  • FALSIFY-MOE-SUB-002 (byte-identity preservation — needs M-MOE-SUB-2/3 wiring)
  • FALSIFY-MOE-SUB-003 (live bisection on lambda-vector RTX 4090)
  • FALSIFY-MOE-SUB-004 (fix PR cites bisected stage by name)

Verification

cargo test -p aprender-serve --lib save_tensor_stage
→ 29 passed; 0 failed; 0 ignored
rustfmt --check → exit 0

Stacks on

PR #1498 (trace-moe-gpu-sub-stages-v1 v1.0.0 contract scaffold).

Test plan

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) May 5, 2026 03:44
@noahgift noahgift force-pushed the feat/save-tensor-stage-moe-router-and-ffn-out-m-moe-sub-1 branch from c59abb0 to 264c22d Compare May 5, 2026 04:11
…-MOE-SUB-1

Per `contracts/trace-moe-gpu-sub-stages-v1.yaml` v1.0.0 (PR #1498),
M-MOE-SUB-1 implementation: extend the parent SaveTensorStage enum
with the 2 new mandatory MoE-GPU bisection variants.

WHAT THIS PR ADDS:

  * `MoeRouter` — top-k expert weights post-softmax + renormalize,
    per-layer, captured AFTER FfnNorm and BEFORE per-expert SwiGLU.
  * `MoeFfnOut` — aggregated MoE FFN output Σ w_e * expert_out_e,
    per-layer, captured AFTER all per-expert computations.
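
To make the two capture points concrete, here is a minimal CPU sketch of the quantities they would hold for one token. It assumes the common "softmax over all experts, keep top-k, renormalize" routing. The function names `route_top_k` and `aggregate` are hypothetical and are not taken from aprender-serve.

```rust
/// What a `MoeRouter` capture would contain under the stated assumption:
/// (expert index, weight) pairs after softmax, top-k selection, and
/// renormalization so the kept weights sum to 1.
fn route_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Numerically stable softmax over all router logits.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> =
        exps.iter().map(|&e| e / sum).enumerate().collect();

    // Keep the k most probable experts (stable sort, descending).
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(k);

    // Renormalize the surviving weights.
    let top_sum: f32 = probs.iter().map(|p| p.1).sum();
    probs.iter().map(|&(i, p)| (i, p / top_sum)).collect()
}

/// What a `MoeFfnOut` capture would contain: sum over selected experts e of
/// w_e * expert_out_e. `expert_out[e]` is expert e's output vector.
fn aggregate(weights: &[(usize, f32)], expert_out: &[Vec<f32>]) -> Vec<f32> {
    let dim = expert_out.first().map_or(0, Vec::len);
    let mut out = vec![0.0f32; dim];
    for &(e, w) in weights {
        for (o, x) in out.iter_mut().zip(&expert_out[e]) {
            *o += w * x;
        }
    }
    out
}
```

Comparing these two tensors between the CPU and GPU paths separates a routing divergence (wrong experts or weights) from a divergence inside the expert computations or their weighted sum.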

CHANGES:

  * `SaveTensorStage` enum: 2 new variants between `PostFfnResidual`
    and `FinalNorm`.
  * `ALL` array: 20 → 22 entries; new variants in canonical position.
  * `canonical_name()`: 2 new arms returning "moe_router" + "moe_ffn_out".
  * `FromStr::from_str()` lowercase match: 2 new accepted strings.
  * `is_per_layer_count_matches_contract` test: per_layer 18 → 20,
    total 20 → 22.
  * `canonical_names_match_contract_enumeration` test: expected[]
    grows to 22 with new entries in canonical order.
  * 5 new `falsify_moe_sub_001_*` tests:
      - `moe_router_round_trip`
      - `moe_ffn_out_round_trip`
      - `2_new_stages_in_canonical_order` (ALL position assertions)
      - `parse_list_accepts_2_new_stages_together`
      - `parse_list_accepts_full_moe_block_chain` (3-stage bisection
        chain `ffn_norm,moe_router,moe_ffn_out`)
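
The list-parsing behaviour those last two tests exercise might look like the sketch below. The `Stage` enum is a three-variant stand-in and `parse_list` is a hypothetical signature inferred from the test names; the real function may differ in its error type and in how it treats empty tokens.

```rust
use std::str::FromStr;

// Three-variant stand-in for the stages in the MoE block bisection chain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    FfnNorm,
    MoeRouter,
    MoeFfnOut,
}

impl FromStr for Stage {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "ffn_norm" => Ok(Stage::FfnNorm),
            "moe_router" => Ok(Stage::MoeRouter),
            "moe_ffn_out" => Ok(Stage::MoeFfnOut),
            other => Err(format!("unknown stage: {}", other)),
        }
    }
}

// Hypothetical helper: parse a comma-separated stage list, failing on the
// first unknown name. Empty tokens are skipped (an assumption).
fn parse_list(csv: &str) -> Result<Vec<Stage>, String> {
    csv.split(',')
        .map(str::trim)
        .filter(|tok| !tok.is_empty())
        .map(Stage::from_str)
        .collect()
}
```

Collecting into `Result<Vec<_>, _>` stops at the first bad token, so a typo in a stage list is reported rather than silently dropped.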

DISCHARGES: FALSIFY-MOE-SUB-001 (per contract).

NOT DISCHARGED YET (separate follow-up scope per contract):

  * FALSIFY-MOE-SUB-002 (byte-identity preservation) — needs the
    instrumentation PR (M-MOE-SUB-2 + 3) that wires MoeRouter +
    MoeFfnOut capture into both forward bodies.
  * FALSIFY-MOE-SUB-003 (live bisection on lambda-vector RTX 4090).
  * FALSIFY-MOE-SUB-004 (fix PR cites bisected stage by name).

VERIFICATION:

  cargo test -p aprender-serve --lib save_tensor_stage
  → 29 passed; 0 failed; 0 ignored

  rustfmt --check  → exit 0

DEPENDS ON: PR #1498 (`trace-moe-gpu-sub-stages-v1` v1.0.0 contract
scaffold) being on aprender main first. The contract pinned the
new variants' names + semantics; this PR implements them.

Refs: M-GPU-MOE-1.4 step (a) instrumentation, R10,
      qwen3-moe-forward-gpu-v1 v1.4.0,
      trace-moe-gpu-sub-stages-v1 v1.0.0,
      FALSIFY-MOE-SUB-001 (DISCHARGED).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/save-tensor-stage-moe-router-and-ffn-out-m-moe-sub-1 branch from 264c22d to e645913 Compare May 5, 2026 04:16
@noahgift noahgift merged commit b519866 into main May 5, 2026
10 checks passed
@noahgift noahgift deleted the feat/save-tensor-stage-moe-router-and-ffn-out-m-moe-sub-1 branch May 5, 2026 04:35