feat(m-gpu-moe-3): PR-3e2 MoeRouterIndices stage + L47 expert-set falsifier — H(ii) CONFIRMED (#1583) by noahgift · Pull Request #1743 · paiml/aprender

noahgift · 2026-05-17T10:37:15Z

Summary

PR-3e2 of the #1583 M-GPU-MOE-3 cascade. Adds SaveTensorStage::MoeRouterIndices to definitively confirm or falsify H(ii) expert-set divergence at L47.

Hardware verdict — H(ii) CONFIRMED (lambda-vector RTX 4090, 2026-05-17)

L47 sorted top-8:
  cpu = [  2,  20,  36,  57,  60,  73, 111, 120 ]
  gpu = [  2,  12,  36,  57,  60, 103, 111, 120 ]
                ^^^                ^^^
                cpu-only={20, 73}; gpu-only={12, 103}

CPU and GPU agree on 6 of 8 experts at L47 but disagree on 2 (mild H(ii) confirmation per the test docstring matrix). All other 47 layers produce identical expert SETS between CPU and GPU.
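
The set comparison above can be reproduced with a few lines of Rust. This is a minimal sketch, not the helper used in the test file; `expert_set_diff` is a hypothetical name, and the two arrays are the L47 indices quoted in the verdict.

```rust
use std::collections::BTreeSet;

/// Returns (only_in_a, only_in_b) for two lists of expert indices.
/// Hypothetical helper for illustration; not taken from the PR.
fn expert_set_diff(a: &[u32], b: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let set_a: BTreeSet<u32> = a.iter().copied().collect();
    let set_b: BTreeSet<u32> = b.iter().copied().collect();
    // BTreeSet iterates in ascending order, so both results come out sorted.
    let only_a = set_a.difference(&set_b).copied().collect();
    let only_b = set_b.difference(&set_a).copied().collect();
    (only_a, only_b)
}

fn main() {
    // L47 sorted top-8 indices as reported in the hardware verdict.
    let cpu: [u32; 8] = [2, 20, 36, 57, 60, 73, 111, 120];
    let gpu: [u32; 8] = [2, 12, 36, 57, 60, 103, 111, 120];
    let (cpu_only, gpu_only) = expert_set_diff(&cpu, &gpu);
    println!("cpu-only={:?}; gpu-only={:?}", cpu_only, gpu_only);
}
```

Comparing sets rather than ordered lists means a pure reordering of the same eight experts would not count as a divergence.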

Root cause: by L47 the accumulated post-routing drift (from per-expert q6k_gemv fp64 accumulation through 47 layers' worth of MoeFfnOut) has perturbed the gate input enough that two boundary expert scores swap. The resulting FFN output diverges by O(1) because the disjoint experts produce unrelated outputs.
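
One generic way two backends can drift apart while running the same algorithm is reduction order: floating-point addition is not associative, so summing the same values in a different order can give a different f32 result. The sketch below is a toy illustration of that effect only. It does not model q6k_gemv or the MoE forward pass, and `sum_f32` / `sum_f64` are hypothetical names.

```rust
/// Left-to-right sum with an f32 accumulator.
fn sum_f32(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0f32, |acc, &x| acc + x)
}

/// Same inputs, but each term is widened and accumulated in f64.
fn sum_f64(xs: &[f32]) -> f64 {
    xs.iter().fold(0.0f64, |acc, &x| acc + x as f64)
}

fn main() {
    // Near 1e8 adjacent f32 values are 8 apart, so adding 1.0 is rounded away.
    let order_a = [1.0e8f32, 1.0, -1.0e8];
    let order_b = [1.0e8f32, -1.0e8, 1.0];
    println!("f32 order A: {}", sum_f32(&order_a));
    println!("f32 order B: {}", sum_f32(&order_b));
    println!("f64 order A: {}", sum_f64(&order_a));
    println!("f64 order B: {}", sum_f64(&order_b));
}
```

The two f32 sums differ while the two f64 sums agree, which is why promoting an accumulator can remove an order-dependent difference without changing the algorithm.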

Fix space (PR-3f+)

Deterministic tie-breaking: sort top-k by (-prob, +index) — when CPU and GPU agree on score magnitude but disagree marginally on probability, both pick the lower-index expert
fp64 gate softmax: keep W_gate @ x → softmax → renormalize at fp64; only quantize to f32 after top-k selection
Reorder-stable top-k: stable partial sort with ε-tolerance on the (k+1)-th vs k-th score boundary
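
The first candidate, sorting by (-prob, +index), could look like the sketch below. This assumes the router probabilities are available as a plain `&[f32]`; `top_k_deterministic` is a hypothetical name and not a function from this repository. Note that it only resolves exact ties: two scores that differ by a small amount still sort by score.

```rust
/// Returns the indices of the k largest probabilities.
/// Ordering key is (descending probability, ascending index), so equal
/// probabilities always resolve to the lower expert index.
fn top_k_deterministic(probs: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| {
        probs[b]
            .total_cmp(&probs[a]) // larger probability first
            .then(a.cmp(&b)) // tie: smaller index first
    });
    idx.truncate(k);
    idx
}

fn main() {
    // Experts 1 and 2 tie for the top score; expert 1 is listed first.
    let probs = [0.1f32, 0.3, 0.3, 0.2];
    println!("{:?}", top_k_deterministic(&probs, 2));
}
```

`f32::total_cmp` is used so the comparator is a total order even if a NaN slips into the scores.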

What's in this PR

inference_trace/save_tensor_stage.rs: new MoeRouterIndices variant, ALL/per_layer counts updated, tests renamed *_twenty_* → *_stages_*
gguf/qwen3_moe_load.rs + gguf/cuda/moe_ffn_forward_layer_cuda.rs: _with_router helpers return (output, weights, indices) instead of (output, weights)
gguf/inference/forward/forward_qwen3_moe_traced.rs + gguf/cuda/forward_qwen3_moe_cuda_traced.rs: emit MoeRouterIndices (indices cast to f32, lossless for num_experts ≤ 2^24)
tests/qwen3_moe_per_layer_gpu_parity.rs: helpers + new test falsify_qw3_moe_l47_router_indices — definitive H(ii) falsifier
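
The "lossless for num_experts ≤ 2^24" remark follows from f32 having a 24-bit significand: every integer up to 2^24 is exactly representable, and 2^24 + 1 is the first one that is not. A small sketch of the round trip, with hypothetical helper names that are not part of the PR:

```rust
/// Encode expert indices as f32, as a trace stage that stores f32 payloads would.
fn indices_to_f32(indices: &[u32]) -> Vec<f32> {
    indices.iter().map(|&i| i as f32).collect()
}

/// Decode f32 payload values back to integer indices.
fn f32_to_indices(values: &[f32]) -> Vec<u32> {
    values.iter().map(|&v| v as u32).collect()
}

/// True when every index survives the u32 -> f32 -> u32 round trip.
fn round_trips(indices: &[u32]) -> bool {
    f32_to_indices(&indices_to_f32(indices)) == indices
}

fn main() {
    let limit: u32 = 1 << 24; // 16_777_216
    println!("L47 cpu set: {}", round_trips(&[2, 20, 36, 57, 60, 73, 111, 120]));
    println!("2^24:        {}", round_trips(&[limit]));
    println!("2^24 + 1:    {}", round_trips(&[limit + 1]));
}
```

Expert counts in practice are far below this bound, so the cast is safe for router indices.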

Cascade context

PR	Status
PR-1 #1713	✅ shipped
PR-2 #1737	✅ shipped
PR-3 verify	✅ ran
PR-3b #1739	✅ contract v1.7.1
PR-3c #1740	✅ scope doc
PR-3d	✅ H(i) FALSIFIED
PR-3e #1741	✅ router-weight probe
PR-3e2 (this)	✅ H(ii) CONFIRMED
PR-3f+	pending — fix

Test plan

CUDA build clean (cargo build -p aprender-serve --features cuda)
SaveTensorStage unit tests pass (29/29 lib tests)
falsify_qw3_moe_l47_router_indices runs on RTX 4090 in 29.5s
H(ii) verdict printed at end with sorted index sets diff

Reproduction

cargo test --release --features cuda \
  -p aprender-serve --test qwen3_moe_per_layer_gpu_parity \
  falsify_qw3_moe_l47_router_indices \
  -- --ignored --nocapture

🤖 Generated with Claude Code

…GN (#1583)

PR-3g of the M-GPU-MOE-3 cascade. Adds the canonical "is L47 actually user-visible" falsifier, runs 4 canonical prompts through both CPU and GPU full forwards, and checks argmax agreement.

## Result (lambda-vector RTX 4090, 2026-05-17)

  PROMPT          | CPU argmax (val) | GPU argmax (val)
  canonical_3tok  | 944 ( 13.7270)   | 944 ( 14.4133)  ✓
  single_tok_785  | 220 ( 15.5523)   |  25 ( 18.5098)  ✗ MISMATCH
  multi_tok_short | 315 ( 26.2279)   | 315 ( 25.5230)  ✓
  multi_tok_code  | 198 ( 17.7453)   | 198 ( 17.8433)  ✓

**3/4 prompts agree, 1 disagrees.** The L47 cliff is NOT benign: the expert-set divergence DOES flip the top-1 predicted token for some prompts (~25% in this small sample). Option E (Accept) is off the table; must pursue Option C (fp64 in per-expert SwiGLU).

## What this PR adds

crates/aprender-serve/tests/qwen3_moe_gpu_parity.rs:
+ new test `falsify_qw3_moe_gpu_argmax_agreement`: multi-prompt probe that builds CPU + GPU models once, runs 4 canonical prompts through both full forwards, and prints an argmax agreement table + verdict. PROBE, not hard-assert; prints "BENIGN" if all agree or "NOT BENIGN" + disagreeing prompts otherwise.

## Cascade context

- PR-1 #1713 ✅ per-layer cos falsifier
- PR-2 #1737 ✅ q6k_gemv fp64 accumulators
- PR-3 ✅ hardware verify: 47/48 PASS, L47 surfaces
- PR-3b #1739 ✅ contract v1.7.0 → v1.7.1
- PR-3c #1740 ✅ scope-doc + L47 sub-cascade
- PR-3d ✅ H(i) qtype-mismatch FALSIFIED
- PR-3e #1741 ✅ router-weight probe
- PR-3e2 #1743 ✅ H(ii) CONFIRMED (2-of-8 expert swap)
- PR-3f1 ❌ falsified (fp64 softmax), dropped
- PR-3f2 ❌ falsified (f64 weighted-sum), dropped
- PR-3g ✅ **THIS PR**: L47 NOT BENIGN, must pursue fix
- PR-3h pending: Option C fp64 in per-expert SwiGLU intermediates

## Why the cascade kept eliminating candidates

The 3-falsifier sequence ruled out the "easy" fix locations:

1. PR-3f1 (gate softmax precision): drift upstream of softmax
2. PR-3f2 (weighted-sum precision): drift upstream of weighted-sum
3. **Remaining**: drift inside each per-expert SwiGLU's intermediate chain (silu × up at f32, down-proj at f32 except its q6k_gemv acc which PR-2 already promoted to fp64)

PR-3h must promote the silu(gate) × up element-wise multiply and the hidden-dim×4 intermediate state to f64. ~30-50 LOC across both CPU and CUDA expert_swiglu helpers.

## Reproduction

  cargo test --release --features cuda \
    -p aprender-serve --test qwen3_moe_gpu_parity \
    falsify_qw3_moe_gpu_argmax_agreement \
    -- --ignored --nocapture

~25s on RTX 4090.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
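
The argmax-agreement probe described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the test from `qwen3_moe_gpu_parity.rs`: the logits are toy vectors rather than model outputs, `argmax` and `disagreeing_prompts` are hypothetical names, and logits are assumed to be finite.

```rust
/// Index of the largest logit; the first index wins on exact ties.
/// Assumes finite logits and returns None for an empty slice.
fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        match best {
            Some((_, best_v)) if v <= best_v => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// Names of prompts whose CPU and GPU top-1 tokens differ.
fn disagreeing_prompts<'a>(runs: &[(&'a str, Vec<f32>, Vec<f32>)]) -> Vec<&'a str> {
    runs.iter()
        .filter(|(_, cpu, gpu)| argmax(cpu) != argmax(gpu))
        .map(|(name, _, _)| *name)
        .collect()
}

fn main() {
    // Toy (prompt, cpu_logits, gpu_logits) triples; not real model output.
    let runs: Vec<(&str, Vec<f32>, Vec<f32>)> = vec![
        ("prompt_a", vec![0.1, 2.0, 0.3], vec![0.2, 1.9, 0.3]),
        ("prompt_b", vec![1.0, 0.9, 0.2], vec![0.9, 1.0, 0.2]),
    ];
    let bad = disagreeing_prompts(&runs);
    if bad.is_empty() {
        println!("BENIGN");
    } else {
        println!("NOT BENIGN: {:?}", bad);
    }
}
```

Printing a verdict instead of asserting matches the "PROBE, not hard-assert" intent: the run reports which prompts flip without failing the suite.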

…-3 PR-3 cascade CLOSED, L47 marked KNOWN_DIVERGENCE_NOT_BENIGN (#1747)

* docs(contracts): qwen3-moe-forward-gpu-v1 v1.7.0 → v1.7.1 — M-GPU-MOE-3 PR-2 verified, L47 surfaced

Hardware-verification amendment after M-GPU-MOE-3 PR-2 landed on main (#1737, 88ce47f; q6k_gemv fp64 accumulators). PR-3 ran the per-layer FALSIFY-QW3-MOE-PER-LAYER-001 falsifier on lambda-vector (RTX 4090) against Qwen3-Coder-30B-A3B-Instruct-Q4_K_M on 2026-05-17.

Result: 47/48 decoder layers cos ≥ 0.99 (PASS). One layer (L47, the final decoder layer) sits at cos=0.961236, 3σ below the L40-L46 cluster (~0.998). Full 48-layer cos vector logged in GitHub comment on #1583 (issuecomment-4470195446).

The 7 originally-cited problem layers (L7/L9/L12/L20/L23/L29/L46, v1.7.0 amendment lines 41-45) ALL lifted above 0.99, so PR-2 was a real win. L47 was previously undetected because no per-layer falsifier existed in-tree; PR-1 of this cascade (#1713) closed that gap and surfaced the L47 anomaly.

WHAT FLIPS:
- metadata.version 1.7.0 → 1.7.1
- bottom-of-file version: "1.7.0" → "1.7.1"
- bottom-of-file status comment refreshed: "1.x cascade DISCHARGED — wgpu (2) + throughput (3) PENDING" → "47/48 layers cos≥0.99 post-PR #1737; L47 single-layer cascade PENDING"
- AC_GPU_MOE_001 stage status text refresh (text-only; not yet refactored into a new amendment_history entry since this PR is scoped to the v1.7.1 amendment block only).

WHAT STAYS PENDING:
- L47 single-layer cascade: root cause unknown. Three candidate hypotheses captured in the v1.7.1 amendment block (qtype mismatch, MoE expert distribution, stride/shape boundary). Forthcoming PR-3c surfaces §85 (or next-available section) covering the L47 cascade. Forthcoming PR-3d+: per-tensor histogram on L47 before authoring fix.
- M-GPU-MOE-2 (wgpu fallback): unchanged
- M-GPU-MOE-3 PR-4 throughput: unchanged

YAML-ONLY: Production hot paths byte-unchanged. Additive-purity invariant pinned in v1.1.0 still holds.

Contract validates via:

  cargo run -p aprender-contracts-cli --bin pv -- \
    validate contracts/qwen3-moe-forward-gpu-v1.yaml
  → 0 error(s), 0 warning(s), Contract is valid.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(contracts): qwen3-moe-forward-gpu-v1 v1.7.1 → v1.7.2 — M-GPU-MOE-3 PR-3 cascade CLOSED, L47 marked KNOWN_DIVERGENCE_NOT_BENIGN

Terminal amendment for the M-GPU-MOE-3 PR-3 sub-cascade. After v1.7.1 surfaced L47 as a single-layer cliff (cos=0.961236 post fp64 q6k_gemv acc, PR-2 #1737), the cascade ran a 5-step falsifier sequence (PRs #1737, #1739-1745 + 4 #1583 comments) to pin the root cause and verify user-visible impact.

OUTCOME
  PR-3    ✅ 47/48 layers cos ≥ 0.99, L47 alone at 0.961236
  PR-3d   ❌ H(i) qtype-mismatch FALSIFIED
  PR-3e   ✅ #1741: L47 first divergent router (cos 0.9926)
  PR-3e2  ✅ #1743: H(ii) CONFIRMED, 2-of-8 expert swap at L47
  PR-3f1  ❌ fp64 gate softmax FALSIFIED, drift upstream
  PR-3f2  ❌ f64 weighted-sum FALSIFIED, drift upstream
  PR-3g   ✅ #1745: multi-prompt argmax, 3/4 agree, 1/4 disagrees → L47 NOT BENIGN (~25% prompt-dependent impact)

ROOT CAUSE (by elimination)
Per-expert SwiGLU f32 intermediates:
  1. gate_proj @ hidden     ← fp64 acc thanks to PR-2 ✅
  2. silu(gate)             ← f32 ✗
  3. silu(gate) × up_proj   ← f32 multiply on 8192-element vector ✗
  4. down_proj @ above      ← fp64 acc thanks to PR-2 ✅

Fix scope = PR-3h: promote silu × up multiply + intermediate state to f64 in both expert_swiglu_quantized (CPU, simple) and expert_swiglu_cuda (GPU, requires unfusing/refusing the SwiGLU kernel). Multi-week kernel work.

STATUS FLIPS
  metadata.version: 1.7.1 → 1.7.2
  metadata.status: ACTIVE_ALGORITHM_LEVEL (unchanged)
  AC_GPU_MOE_001: 47/48 layers ALGORITHM_LEVEL_DISCHARGED + L47 KNOWN_DIVERGENCE_NOT_BENIGN

WHAT STAYS PENDING
- PR-3h fp64 per-expert SwiGLU (multi-week)
- M-GPU-MOE-2 wgpu fallback (#1582)
- M-GPU-MOE-3 PR-4 throughput (independent of L47 fix; unblocked by this amendment)

WHY NOT KNOWN_BUG
L47 is a numerical-precision artifact, not a correctness bug. CPU and GPU follow the same algorithm against the same weights; only the order of f32 accumulation inside the per-expert SwiGLU differs. Both pick legitimate top-8 sets at L47 (neither is wrong), but the small score-perturbation crosses a top-k boundary. Same class as gemv reduction-order variance, one call-stack level higher.

REGRESSION GATE FOR PR-3h
- falsify_qw3_moe_l47_router_indices (#1743): expect CPU L47 sorted top-8 == GPU L47 sorted top-8
- falsify_qw3_moe_gpu_argmax_agreement (#1745): expect 4/4 prompts argmax agreement

YAML-ONLY
Production hot paths byte-unchanged. Additive-purity invariant pinned in v1.1.0 still holds.

Contract validates via:

  cargo run -p aprender-contracts-cli --bin pv -- \
    validate contracts/qwen3-moe-forward-gpu-v1.yaml
  → 0 error(s), 0 warning(s), Contract is valid.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
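
The PR-3h fix scope named above (promote the silu × up multiply and the intermediate state to f64) might look roughly like this on the CPU side. It is a sketch under assumptions: it takes the gate and up projections as already-computed f32 slices, ignores quantization entirely, and uses hypothetical names. It does not reproduce `expert_swiglu_quantized` or the CUDA kernel.

```rust
/// SiLU evaluated in f64: x * sigmoid(x).
fn silu_f64(x: f64) -> f64 {
    x / (1.0 + (-x).exp())
}

/// Element-wise silu(gate) * up with both the activation and the product
/// carried in f64. The result stays f64 so a following down-projection can
/// consume it without an intermediate round to f32.
fn swiglu_intermediate_f64(gate: &[f32], up: &[f32]) -> Vec<f64> {
    assert_eq!(gate.len(), up.len(), "gate and up must have equal length");
    gate.iter()
        .zip(up)
        .map(|(&g, &u)| silu_f64(g as f64) * (u as f64))
        .collect()
}

fn main() {
    let gate = [0.0f32, 1.0, -2.0];
    let up = [2.0f32, 2.0, 0.5];
    println!("{:?}", swiglu_intermediate_f64(&gate, &up));
}
```

Whether widening these two steps is enough to close the L47 gap is exactly what the regression gate listed above is meant to decide.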

noahgift · 2026-05-17T12:44:39Z

Needs rebase — conflicts with #1741 (PR-3e router-weight probe)

After #1741 landed on main, this PR (PR-3e2) has a merge conflict on crates/aprender-serve/tests/qwen3_moe_per_layer_gpu_parity.rs — both PRs added new test functions and helpers to that file.

Rebase recipe

git worktree add /tmp/m-gpu-moe-3-pr3e2-rebase feat/m-gpu-moe-3-pr3e2-router-indices
cd /tmp/m-gpu-moe-3-pr3e2-rebase
git fetch origin main
git rebase origin/main
# In the test file, keep BOTH:
#   - the PR-3e probe helper `make_router_and_ffn_out_plan` + test `falsify_qw3_moe_l47_router_probe` (from #1741, now on main)
#   - the PR-3e2 helpers `make_router_indices_plan` + `read_indices_stage_file` + test `falsify_qw3_moe_l47_router_indices` (from this branch)
# Resolve, stage, --continue
git push --force-with-lease origin feat/m-gpu-moe-3-pr3e2-router-indices

The other 5 files (non-test) shouldn't conflict — #1741 only touched the test file.

Status if not rebased

This PR's behavioral payload (the H(ii)-CONFIRMED finding) has already been captured by the v1.7.2 contract amendment (#1747, MERGED). The MoeRouterIndices stage itself + the falsify_qw3_moe_l47_router_indices test would still be useful as a regression gate when PR-3h lands, but the cascade narrative is complete without them on main.

So this PR is nice-to-have, not blocking. Can be rebased and merged opportunistically, or closed if the cascade pause holds.

…sifier — H(ii) CONFIRMED (#1583)

PR-3e2 of the M-GPU-MOE-3 cascade. Adds `SaveTensorStage::MoeRouterIndices` to definitively confirm or falsify H(ii) expert-set divergence at L47.

L47 sorted top-8:
  cpu = [  2,  20,  36,  57,  60,  73, 111, 120 ]
  gpu = [  2,  12,  36,  57,  60, 103, 111, 120 ]
                ^^^                ^^^
                cpu-only={20, 73}; gpu-only={12, 103}

CPU and GPU agree on 6 of 8 experts at L47 but disagree on 2 (mild H(ii) confirmation). All other 47 layers produce IDENTICAL expert SETS between CPU and GPU.

Root cause: by L47 the accumulated post-routing drift from per-expert q6k_gemv fp64 accumulation through 47 layers of MoeFfnOut has perturbed the gate input enough that two boundary expert scores swap. The resulting FFN output diverges by O(1) because the disjoint experts produce unrelated outputs.

- **Deterministic tie-breaking**: sort top-k by (-prob, +index)
- **fp64 gate softmax**: W_gate @ x → softmax → renormalize at fp64
- **Reorder-stable top-k**: stable partial sort + ε-tolerance on the (k+1)-th vs k-th score boundary

inference_trace/save_tensor_stage.rs:
+ `MoeRouterIndices` enum variant + "moe_router_indices" name
+ `is_index_payload(&self)` helper
+ `ALL` array 22 → 23; per_layer count 20 → 21; tests renamed

gguf/qwen3_moe_load.rs + gguf/cuda/moe_ffn_forward_layer_cuda.rs:
+ traced `_with_router` helpers now return `(output, weights, indices)` instead of `(output, weights)`

gguf/inference/forward/forward_qwen3_moe_traced.rs (CPU)
gguf/cuda/forward_qwen3_moe_cuda_traced.rs (CUDA):
+ capture `last_router_top_k_indices` from helper
+ emit `MoeRouterIndices` stage (indices cast to f32, lossless for num_experts ≤ 2^24)

tests/qwen3_moe_per_layer_gpu_parity.rs:
+ helpers `make_router_indices_plan` + `read_indices_stage_file`
+ new test `falsify_qw3_moe_l47_router_indices`: definitive H(ii) falsifier; captures top-k INDICES at every layer for both CPU and GPU, sorts each, asserts set equality, prints L47-specific verdict

- PR-1 #1713 ✅ per-layer cos falsifier
- PR-2 #1737 ✅ q6k_gemv fp64 accumulators
- PR-3 ✅ hardware verify (47/48 PASS, L47 surfaces)
- PR-3b #1739 ✅ contract v1.7.0 → v1.7.1
- PR-3c #1740 ✅ scope-doc + L47 sub-cascade
- PR-3d ✅ H(i) qtype-mismatch FALSIFIED
- PR-3e #1741 ✅ router-weight probe
- PR-3e2 ✅ **THIS PR**: H(ii) CONFIRMED
- PR-3f+ pending: apply one of the 3 candidate fixes

  cargo test --release --features cuda \
    -p aprender-serve --test qwen3_moe_per_layer_gpu_parity \
    falsify_qw3_moe_l47_router_indices \
    -- --ignored --nocapture

29.5s on RTX 4090.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…1583)

PR-3e2 added `SaveTensorStage::MoeRouterIndices` (22 → 23 stages) but missed updating the parallel tests in `save_tensor_plan.rs` that asserted on the constant `22`. Workspace-test CI surfaced this:

  test inference_trace::save_tensor_plan::tests::all_keyword_expands_to_twenty_two_stages ... FAILED
  test inference_trace::save_tensor_plan::tests::all_keyword_case_insensitive ... FAILED

Two fixes:

1. Rename `all_keyword_expands_to_twenty_two_stages` → `all_keyword_expands_to_all_stages` and assert against `SaveTensorStage::ALL.len()` (currently 23) instead of the hardcoded `22`. Future stage additions won't require touching this test.
2. Same change in `all_keyword_case_insensitive`: assert against `SaveTensorStage::ALL.len()` instead of `22`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
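
The pattern behind both fixes, asserting against the length of the `ALL` array rather than a literal count, can be shown on a toy enum. Everything here is hypothetical scaffolding: `Stage` has three variants instead of the real 23, and `expand_all_keyword` stands in for whatever the plan parser in `save_tensor_plan.rs` actually does.

```rust
/// Toy stand-in for SaveTensorStage with only three variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stage {
    Embedding,
    MoeRouterWeights,
    MoeRouterIndices,
}

impl Stage {
    /// Single source of truth for the stage list.
    const ALL: [Stage; 3] = [
        Stage::Embedding,
        Stage::MoeRouterWeights,
        Stage::MoeRouterIndices,
    ];
}

/// Expands the "all" keyword (case-insensitive) to every stage.
/// Any other spec yields an empty plan in this simplified sketch.
fn expand_all_keyword(spec: &str) -> Vec<Stage> {
    if spec.eq_ignore_ascii_case("all") {
        Stage::ALL.to_vec()
    } else {
        Vec::new()
    }
}

fn main() {
    // Compare against ALL.len(), so adding a variant never breaks this check.
    assert_eq!(expand_all_keyword("all").len(), Stage::ALL.len());
    assert_eq!(expand_all_keyword("ALL").len(), Stage::ALL.len());
    println!("expanded {} stages", expand_all_keyword("All").len());
}
```

With this shape, adding a new variant only requires updating `ALL`; the tests follow automatically.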


noahgift enabled auto-merge (squash) May 17, 2026 10:37

noahgift mentioned this pull request May 17, 2026

M-GPU-MOE-3 — throughput ≥150 tok/s on RTX 4090 + VRAM ≤95% + fp-accumulator-order alignment #1583

Open

This was referenced May 17, 2026

test(m-gpu-moe-3): PR-3g multi-prompt argmax agreement — L47 NOT BENIGN (#1583) #1745

Merged

docs(contracts): qwen3-moe-forward-gpu-v1 v1.7.1 → v1.7.2 — M-GPU-MOE-3 PR-3 cascade CLOSED, L47 marked KNOWN_DIVERGENCE_NOT_BENIGN #1747

Merged

noahgift force-pushed the feat/m-gpu-moe-3-pr3e2-router-indices branch from 3dfa4cd to 0ce559a Compare May 17, 2026 13:09

noahgift added 13 commits May 17, 2026 16:16

72f70c8  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
6d7e2e5  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
e396f09  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
9c07924  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
3f45853  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
53ec02b  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
e994a2e  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
d9eb977  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
e20270d  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
e857882  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
6f45b94  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
c468d5c  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices
ec68304  Merge branch 'main' into feat/m-gpu-moe-3-pr3e2-router-indices

noahgift merged commit 5245ee0 into main May 18, 2026
10 checks passed

noahgift deleted the feat/m-gpu-moe-3-pr3e2-router-indices branch May 18, 2026 13:16

noahgift mentioned this pull request May 18, 2026

test(m-gpu-moe-3): FALSIFY-Q6K-FP-ACC-001 — per-matvec divergence is ulp-scale, NOT the 0.94-cos source (#1583 PR-3f) #1801

Merged

2 tasks

noahgift merged 15 commits into main from feat/m-gpu-moe-3-pr3e2-router-indices
