test(aprender-serve): qwen3_moe_gpu_parity — M-GPU-MOE-1.2 cosine ≥0.99 falsifier by noahgift · Pull Request #1484 · paiml/aprender

noahgift · 2026-05-04T19:57:41Z

Summary

Authors the FALSIFY-QW3-MOE-GPU-PARITY-001 test scaffold from qwen3-moe-forward-gpu-v1 v1.1.0 implementation_stages M-GPU-MOE-1.2.
New test file crates/aprender-serve/tests/qwen3_moe_gpu_parity.rs. Follows the M32d.2 CPU-vs-HF-FP16 template (qwen3_moe_parity.rs) line-for-line.
#[cfg(feature = "cuda")] + #[ignore] on the heavy test (CI skips it by default; an explicit --include-ignored runs it on an RTX 4090).
Three helper unit tests (cosine_similarity sanity coverage) DO run by default.
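
The gating described in this summary can be sketched as follows. This is a minimal illustration rather than the file's actual contents: `l2_norm` and both test names are hypothetical stand-ins, and the heavy test body is elided.

```rust
/// Cheap numeric helper (hypothetical) that default-run unit tests can exercise.
pub fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

#[cfg(test)]
mod tests {
    use super::l2_norm;

    // Runs under a plain `cargo test`: no model file or GPU required.
    #[test]
    fn l2_norm_of_3_4_is_5() {
        assert!((l2_norm(&[3.0, 4.0]) - 5.0).abs() < 1e-6);
    }

    // Compiled only with `--features cuda`, and even then skipped unless the
    // test binary is invoked with `--include-ignored`.
    #[cfg(feature = "cuda")]
    #[test]
    #[ignore = "needs a CUDA GPU and the cached GGUF"]
    fn heavy_gpu_parity() {
        // Load the model, run both forward passes, compare logits (elided).
    }
}
```

With this layout a default CI run still compiles and runs the helper test, while the expensive test stays opt-in.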

What the test does (when invoked with `--include-ignored`)

Loads the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf once (mmap).
Builds CPU forward_qwen3_moe reference logits (LAZY-FUSED-MATVEC ground truth).
Builds GPU OwnedQuantizedModelCuda::forward_qwen3_moe_cuda logits.
Computes cosine similarity over the full 151936-dim vocab.
Asserts cos_sim ≥ 0.99 per the contract's formal bound.
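
The comparison in the last two steps can be sketched as below. The function and constant names are illustrative, and returning 0.0 for a zero-norm input is an assumption about how the real helper treats that case; only the 0.99 bound comes from the PR text.

```rust
/// Threshold from the contract's formal bound.
pub const COSINE_THRESHOLD: f32 = 0.99;

/// Cosine similarity of two equal-length logit vectors.
/// Accumulates in f64 to limit rounding error over a large vocabulary.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "logit vectors must have equal length");
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    // Assumption: a zero vector has no direction, so report 0.0 instead of NaN.
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    (dot / (norm_a.sqrt() * norm_b.sqrt())) as f32
}

/// True when the GPU logits are within the parity bound of the CPU reference.
pub fn passes_parity(cpu_logits: &[f32], gpu_logits: &[f32]) -> bool {
    cosine_similarity(cpu_logits, gpu_logits) >= COSINE_THRESHOLD
}
```

Cosine similarity ignores overall scale, so this check catches a wrong direction in logit space but not a uniform rescaling of the logits.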

Dependency

When the heavy test is run on lambda-vector, the M-GPU-MOE-1.1.2 full forward integration (PR #1477) must be on main first. Currently main has the v1.0-redo stub which returns UnsupportedOperation — running the heavy test against the stub will panic (correct behaviour for a falsifier against an incomplete impl).
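
A small sketch of why the heavy test panics against the stub. `SketchError`, `forward_stub`, and `gpu_logits_or_panic` are hypothetical stand-ins; only the `UnsupportedOperation` variant name is taken from the description above.

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug)]
pub enum SketchError {
    UnsupportedOperation { operation: String, reason: String },
}

/// Stub forward pass: always reports that the operation is not implemented.
pub fn forward_stub(_token_ids: &[u32]) -> Result<Vec<f32>, SketchError> {
    Err(SketchError::UnsupportedOperation {
        operation: "forward_qwen3_moe_cuda".to_string(),
        reason: "stub: full forward integration not merged".to_string(),
    })
}

/// What a parity test effectively does with the result: it needs real logits,
/// so an `Err` from the stub turns into a panic and the test fails.
pub fn gpu_logits_or_panic(token_ids: &[u32]) -> Vec<f32> {
    forward_stub(token_ids).expect("GPU forward must succeed for the parity check")
}
```

Failing loudly here is the point of a falsifier: a stub cannot pass the gate by accident.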

The mut gpu_model binding carries a #[allow(unused_mut)] because PR #1477 changes the receiver &self → &mut self.
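
The forward-looking `mut` can be illustrated with a toy type. `SketchGpuModel` and `run_forward` are hypothetical; the real wrapper and its signature are not reproduced here.

```rust
/// Hypothetical stand-in for the GPU model wrapper.
pub struct SketchGpuModel;

impl SketchGpuModel {
    /// Receiver is `&self` today; the dependency PR is described as
    /// changing it to `&mut self`.
    pub fn forward(&self, token_ids: &[u32]) -> usize {
        token_ids.len()
    }
}

pub fn run_forward() -> usize {
    // While `forward` takes `&self`, `mut` is unused and would trigger the
    // `unused_mut` lint, so it is silenced. Once the receiver becomes
    // `&mut self`, the binding needs `mut` and this call site is unchanged.
    #[allow(unused_mut)]
    let mut gpu_model = SketchGpuModel;
    gpu_model.forward(&[1, 2, 3])
}
```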

Test plan

cargo check -p aprender-serve --test qwen3_moe_gpu_parity --features cuda — clean
cargo test -p aprender-serve --test qwen3_moe_gpu_parity --features cuda — 3 helpers pass, heavy test ignored
rustfmt --check — clean
CI ci/gate green
After PR feat(aprender-serve): forward_qwen3_moe_cuda full integration — M-GPU-MOE-1.1.2 #1477 merges: re-run with --include-ignored on lambda-vector

🤖 Generated with Claude Code

…99 falsifier

Authors the FALSIFY-QW3-MOE-GPU-PARITY-001 test scaffold from contract qwen3-moe-forward-gpu-v1 v1.1.0 implementation_stages M-GPU-MOE-1.2.

WHAT THE TEST DOES (when run with `--include-ignored` against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf on RTX 4090):

1. Loads the GGUF once (single mmap).
2. Builds moe_layers: Vec<Qwen3MoeQuantizedLayer> once.
3. Builds CPU OwnedQuantizedModel #1 → runs forward_qwen3_moe on a fixed prompt → cpu_logits (the LAZY-FUSED-MATVEC ground truth).
4. Builds CPU OwnedQuantizedModel #2 → wraps into OwnedQuantizedModelCuda → runs forward_qwen3_moe_cuda on the same prompt → gpu_logits.
5. Computes cosine_similarity(cpu_logits, gpu_logits) over the full 151936-dim vocab.
6. Asserts cos_sim ≥ 0.99 per the contract's formal bound.

The test follows the qwen3_moe_parity.rs (M32d.2 CPU-vs-HF-FP16) template line-for-line: same canonical GGUF paths array, same fixture-skip pattern, same cosine_similarity helper. The only difference is the second forward pass dispatches to forward_qwen3_moe_cuda instead of treating an FP32 fixture as truth.

CI WIRING:
- #[cfg(feature = "cuda")] gates the entire file (no GPU host = no compile)
- #[ignore] on the heavy test (CI default skips; explicit `--include-ignored` runs it)
- 3 helper unit tests (cosine_similarity_unit_vectors / handles_zero / within_threshold) DO run by default; they cover the cosine helper itself

WHEN THE TEST PASSES:
- The aprender PR #1477 (M-GPU-MOE-1.1.2 full forward integration) must be on main first. Currently main has the v1.0-redo stub; running this test against the stub returns UnsupportedOperation error and the test panics (correct behaviour for a falsifier against an incomplete impl).
- Once #1477 lands, run the test on lambda-vector with:
      cargo test -p aprender-serve --test qwen3_moe_gpu_parity \
        --features cuda -- --include-ignored
- On PASS, the contract's M-GPU-MOE-1.2 stage flips PENDING → SHIPPED and (with PARITY-002 from the v1 sibling) the gate discharges qwen3-moe-forward-gpu-v1 v1.1.0 DRAFT → ACTIVE_ALGORITHM_LEVEL.

PR #1477 changes forward_qwen3_moe_cuda's receiver from `&self` to `&mut self` (kernel cache mutation). The `mut gpu_model` binding here carries a forward-looking #[allow(unused_mut)] note for that reason.

Refs: qwen3-moe-forward-gpu-v1 v1.1.0 :: M-GPU-MOE-1.2 + FALSIFY-QW3-MOE-GPU-PARITY-001 + companion-spec M51 + R10.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
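
The "canonical GGUF paths array" and "fixture-skip pattern" mentioned in this commit message might look roughly like the sketch below. The candidate paths and the function name are hypothetical; the real test's paths are not shown in this PR.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical candidate locations for the cached model file.
pub const CANDIDATE_GGUF_PATHS: &[&str] = &[
    "/hypothetical/cache/model.gguf",
    "models/model.gguf",
];

/// Returns the first candidate that exists as a regular file, if any.
pub fn first_existing(candidates: &[&str]) -> Option<PathBuf> {
    for candidate in candidates {
        let path = Path::new(candidate);
        if path.is_file() {
            return Some(path.to_path_buf());
        }
    }
    None
}
```

A heavy test would call `first_existing(CANDIDATE_GGUF_PATHS)` first and return early with a printed skip message when it yields `None`, so a host without the 17.3 GB file does not report a false failure.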

…QuantizedModelWgpu)

Pre-implementation architecture amendment for M-GPU-MOE-2 (wgpu fallback). Mirrors the v1.1.0 option D amendment that pinned the CUDA substrate before M-GPU-MOE-1.0 implementation; this one pins the wgpu substrate before any wgpu code lands.

Why now: M-GPU-MOE-1 is in flight (1.0-redo SHIPPED, 1.1.1 SHIPPED, 1.1.2 OPEN as PR #1477, 1.2 test scaffold OPEN as PR #1484). Choosing the wgpu seam early prevents the wrong-type-stub waste that bit M-GPU-MOE-1.0 (PR #1460 placed forward_qwen3_moe_gpu on OwnedQuantizedModel; one cycle later #1464 redo'd it on OwnedQuantizedModelCuda, i.e. option D).

FOUR options considered:
  (I)   OwnedQuantizedModelWgpu wrapper type (analog of v1.1.0 option D): CHOSEN
  (II)  GpuExecutor trait abstracting CUDA + wgpu: REJECTED (over-engineered)
  (III) Backend enum inside renamed OwnedQuantizedModelGpu: REJECTED (invasive)
  (IV)  Defer wgpu indefinitely: REJECTED (violates CLAUDE.md backend-agnostic mandate)

Option I picks wgpu by code-path symmetry, not by trait abstraction: new file tree at `crates/aprender-serve/src/gguf/wgpu/` mirrors `crates/aprender-serve/src/gguf/cuda/` line-for-line. Maintenance-mode reviewer can verify a parity bug by diff, not by elaborate test infrastructure.

M-GPU-MOE-2 decomposed into four substages mirroring M-GPU-MOE-1.x:
  M-GPU-MOE-2.0  stub on OwnedQuantizedModelWgpu
  M-GPU-MOE-2.1  per-expert wgpu dispatch helpers (expert_swiglu_wgpu, moe_ffn_forward_layer_wgpu)
  M-GPU-MOE-2.2  full forward integration (replaces 2.0 stub body)
  M-GPU-MOE-2.3  cosine-vs-CPU parity test on hardware with wgpu

Two new blockers documented:
- wgpu adapter selection probe for non-NVIDIA hardware
- trueno-gpu Q6_K QuantizeKernel coverage check before 2.1

Companion-spec records this as M52 (no companion contract bump).

Validation: pv validate contracts/qwen3-moe-forward-gpu-v1.yaml → 0 error(s), 0 warning(s). Contract is valid.

Refs: M52, R10, qwen3-moe-forward-gpu-v1 v1.2.0 option I.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

….99 falsifier (wgpu) (#1488)

wgpu sibling of `qwen3_moe_gpu_parity.rs` (M-GPU-MOE-1.2, PR #1484). Asserts cosine ≥ 0.99 between APR's CPU `forward_qwen3_moe` reference and the wgpu `OwnedQuantizedModelWgpu::forward_qwen3_moe_wgpu` integration on the same prompt.

Same falsifier ID as the cuda sibling (FALSIFY-QW3-MOE-GPU-PARITY-001): wgpu is a SECOND backend implementing the same contract gate, not a different gate. Same threshold (≥ 0.99), same canonical 17.3 GB Qwen3-Coder GGUF, same 3-token canonical prompt as the cuda test.

CI WIRING:
- #[cfg(feature = "gpu")] gates the file (matches the gate on OwnedQuantizedModelWgpu in gguf/mod.rs)
- #[ignore] on the heavy test (CI default skips; explicit `--include-ignored` runs it on a wgpu-capable adapter: Apple Silicon Metal, AMD Vulkan, Intel ARC Vulkan)
- 2 helper unit tests (cosine_similarity sanity coverage) DO run by default

WHEN THE TEST PASSES:
- M-GPU-MOE-2.0 stub returns UnsupportedOperation, so this test currently panics at the wgpu forward call (correct behaviour for a falsifier against an incomplete impl).
- M-GPU-MOE-2.1 (per-expert wgpu helpers via trueno-gpu QuantizeKernel + GemmKernel compute pipelines) + M-GPU-MOE-2.2 (full forward integration analog of forward_qwen3_moe_cuda) must both land before this test passes on hardware.
- On hardware with wgpu support, run with --include-ignored to exercise. PASS discharges FALSIFY-QW3-MOE-GPU-PARITY-001 for the wgpu backend (cuda backend discharged by sibling test).

DEPENDS ON: PR #1485 (v1.2.0 amendment + M-GPU-MOE-2.0 stub). Branch is stacked on the v1.2.0 contract branch; once #1485 lands on main, this PR's base flips to main automatically.

Refs: M52, M53, R10, qwen3-moe-forward-gpu-v1 v1.2.0 :: M-GPU-MOE-2.3 + FALSIFY-QW3-MOE-GPU-PARITY-001 (wgpu).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>


…arity test (#1485)

* contract(qwen3-moe-forward-gpu-v1): v1.1.0 → v1.2.0 — option I (OwnedQuantizedModelWgpu)

* feat(aprender-serve): OwnedQuantizedModelWgpu stub — M-GPU-MOE-2.0 (#1487)

  Implements M-GPU-MOE-2.0 per qwen3-moe-forward-gpu-v1 v1.2.0 option I (see PR #1485 amendment). Analog of M-GPU-MOE-1.0-redo (PR #1464) for the wgpu backend.

  WHAT THIS PR ADDS:
  * crates/aprender-serve/src/gguf/wgpu_backend/mod.rs: new module with OwnedQuantizedModelWgpu struct + new() + stub method forward_qwen3_moe_wgpu(). Mirrors cuda/mod.rs structure.
  * crates/aprender-serve/src/gguf/wgpu_model.rs: re-export shim `pub use super::wgpu_backend::OwnedQuantizedModelWgpu`. Mirrors cuda_model.rs.
  * crates/aprender-serve/src/gguf/mod.rs: adds the two new modules behind `#[cfg(feature = "gpu")]` (the existing wgpu feature flag, `gpu = ["trueno/gpu"]` per Cargo.toml line 208).

  WHY MODULE NAMED `wgpu_backend`:
  The Rust ecosystem already has a `wgpu` crate. A module named `wgpu` inside the same crate would shadow it inside the file's body. The public re-export still presents `OwnedQuantizedModelWgpu` (no ugly suffix) thanks to wgpu_model.rs.

  WHY THIS IS A STUB:
  Same staging discipline as M-GPU-MOE-1.0-redo: contract first, scaffold second, implementation third. The body of forward_qwen3_moe_wgpu validates preconditions (mirroring the cuda sibling's boundary) then returns RealizarError::UnsupportedOperation whose reason points at the v1.2.0 amendment block for the M-GPU-MOE-2 staging plan. Until M-GPU-MOE-2.2 lands, callers on non-CUDA hardware fall back to OwnedQuantizedModel::forward_qwen3_moe (CPU LAZY-FUSED-MATVEC, ~30 tok/s).

  VERIFICATION:
      cargo check -p aprender-serve                    → 0 errors (default)
      cargo check -p aprender-serve --features cuda    → 0 errors (cuda)
      cargo check -p aprender-serve --features gpu     → 0 errors (wgpu)
      cargo test -p aprender-serve --lib --features gpu \
        owned_quantized_model_wgpu_tests               → 1 passed

  Lib unit test asserts the function signature exists and matches the cuda sibling step-for-step (compile-time checks via fn pointer coercion; no runtime model construction needed at the stub stage).

  DEPENDS ON: PR #1485 (qwen3-moe-forward-gpu-v1 v1.2.0 option I amendment). Branch is stacked on the v1.2.0 contract branch; once #1485 lands on main, this PR rebases onto main directly.

  NEXT STAGES per v1.2.0:
    M-GPU-MOE-2.1  per-expert wgpu dispatch helpers (expert_swiglu_wgpu, moe_ffn_forward_layer_wgpu)
    M-GPU-MOE-2.2  full forward integration mirror of cuda sibling
    M-GPU-MOE-2.3  cosine-vs-CPU parity test on wgpu hardware

  Refs: M52, R10, qwen3-moe-forward-gpu-v1 v1.2.0 :: M-GPU-MOE-2.0.

  Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* test(aprender-serve): qwen3_moe_wgpu_parity — M-GPU-MOE-2.3 cosine ≥0.99 falsifier (wgpu) (#1488)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
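
The "compile-time checks via fn pointer coercion" mentioned for the stub's lib test can be sketched like this. All types and signatures below are hypothetical stand-ins; the real method signatures are not shown in this PR.

```rust
/// Hypothetical stand-ins for the two backend wrapper types.
pub struct SketchCudaModel;
pub struct SketchWgpuModel;

#[derive(Debug, PartialEq)]
pub enum SketchError {
    Unsupported(&'static str),
}

impl SketchCudaModel {
    pub fn forward_cuda(&mut self, _token_ids: &[u32]) -> Result<Vec<f32>, SketchError> {
        Err(SketchError::Unsupported("cuda stub"))
    }
}

impl SketchWgpuModel {
    pub fn forward_wgpu(&mut self, _token_ids: &[u32]) -> Result<Vec<f32>, SketchError> {
        Err(SketchError::Unsupported("wgpu stub"))
    }
}

// The two aliases differ only in the receiver type, so the sibling methods
// are pinned to the same argument list and return type.
type CudaSig = fn(&mut SketchCudaModel, &[u32]) -> Result<Vec<f32>, SketchError>;
type WgpuSig = fn(&mut SketchWgpuModel, &[u32]) -> Result<Vec<f32>, SketchError>;

/// Compiles only if both methods coerce to their expected fn pointer types.
/// No model is constructed, so the check costs nothing at runtime.
pub fn signatures_line_up() -> bool {
    let _cuda: CudaSig = SketchCudaModel::forward_cuda;
    let _wgpu: WgpuSig = SketchWgpuModel::forward_wgpu;
    true
}
```

If either method's parameters or return type drift, the coercion stops compiling, which is how a stub can be guarded without loading a model.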


…d-bug fix plan (#1490)

Live-dogfood finding 2026-05-04 on lambda-vector RTX 4090: the M-GPU-MOE-1.2 heavy `qwen3_moe_gpu_parity` test (FALSIFY-QW3-MOE-GPU-PARITY-001) cannot run on the cached 17.3 GB Qwen3-Coder GGUF because `OwnedQuantizedModelCuda::new` itself fails:

    UnsupportedOperation {
      operation: "preload_weights_gpu",
      reason: "PAR-043: Failed to build indexed weights: Invalid launch config: Quantized weight 'blk.0.ffn_gate.weight' not cached"
    }

ROOT CAUSE (5-whys in evidence file):
`executor.build_indexed_weights` at `crates/aprender-serve/src/cuda/executor/weights.rs:325-373` unconditionally requires `blk.{i}.ffn_gate.weight`, `.ffn_up.weight`, `.ffn_down.weight` to be cached for every layer. For MoE these names DO NOT EXIST: MoE has 128 expert gates per layer (`blk.{i}.ffn_gate_exps.weight`) loaded into the `moe_layers` parameter at forward-time. M-GPU-MOE-1.1.2 (PR #1477)'s forward body sidesteps the indexed weights for FFN, but the wrapper construction goes through `preload_weights_gpu` BEFORE forward is ever called. Wrapper construction fails first.

WHY DEFAULT CI DIDN'T CATCH IT:
Lib-only stub test (PR #1464) only checks signature at compile time. Heavy `qwen3_moe_gpu_parity.rs` (PR #1484) is `#[ignore]`d + needs RTX 4090 + 17.3 GB GGUF. First `--include-ignored` dogfood on lambda-vector found this 2026-05-04.

THIS PR ADDS:
(1) Evidence file `evidence/m-gpu-moe-1-2-blocked-by-preload-bug-2026-05-04/findings.md` documenting the live failure + 5-whys + fix architecture.
(2) Contract `qwen3-moe-forward-gpu-v1` v1.2.0 → v1.3.0:
    * New v1.3.0 amendment_history block (~110 lines) describing the bug, root cause, and three-step fix architecture
    * New implementation_stage `M-GPU-MOE-1.3` between 1.2 and 2 with status PENDING
    * New falsification_test FALSIFY-QW3-MOE-GPU-PRELOAD-001 (hardware test + lib-only sibling)
    * Top-level version "1.2.0" → "1.3.0"
    * Status comment expanded to mention M-GPU-MOE-1.3 as a precondition for ACTIVE_ALGORITHM_LEVEL flip

VALIDATION: pv validate contracts/qwen3-moe-forward-gpu-v1.yaml → 0 errors, 0 warnings. Contract is valid.

WHAT THIS PR DOES NOT DO:
Does NOT implement the fix. Per CLAUDE.md "NEVER write code before writing a provable contract", this PR pins the contract first. The fix lands in a separate PR (M-GPU-MOE-1.3 stage): ~30 LOC in weights.rs + 1-2 callers + ArchConstraints field + drift-prevention test.

Does NOT block PR #1485's already-shipped 3-commit cascade (M52/M54). The cascade is correct; M-GPU-MOE-1.3 is a sibling bug-fix.

Refs: M52, M53, M54, R10, qwen3-moe-forward-gpu-v1 v1.3.0, FALSIFY-QW3-MOE-GPU-PRELOAD-001 (new).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
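
The root cause can be illustrated with a toy version of the weight-name check. This is a sketch under stated assumptions: `dense_ffn_names`, `first_missing`, and the `is_moe` switch are hypothetical, and skipping the dense FFN names for MoE is only one plausible shape of a fix, not the fix that the contract pins. Only the tensor names come from the commit message.

```rust
use std::collections::HashSet;

/// Dense FFN tensor names that the preload step expects for one layer.
pub fn dense_ffn_names(layer: usize) -> [String; 3] {
    [
        format!("blk.{layer}.ffn_gate.weight"),
        format!("blk.{layer}.ffn_up.weight"),
        format!("blk.{layer}.ffn_down.weight"),
    ]
}

/// Returns the first required tensor name absent from the cache, if any.
/// With `is_moe == false` this mirrors the unconditional requirement that
/// fails on an MoE checkpoint; with `is_moe == true` (hypothetical switch)
/// the dense FFN names are not required at all.
pub fn first_missing(cached: &HashSet<String>, num_layers: usize, is_moe: bool) -> Option<String> {
    if is_moe {
        return None;
    }
    (0..num_layers)
        .flat_map(dense_ffn_names)
        .find(|name| !cached.contains(name))
}
```

For a cache that holds only `blk.0.ffn_gate_exps.weight`, the dense path reports `blk.0.ffn_gate.weight` as missing, which matches the tensor named in the error above.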

noahgift force-pushed the feat/qwen3-moe-gpu-parity-test-m-1-2 branch from 5b4ddfe to 6302645 May 4, 2026 20:05

noahgift mentioned this pull request May 4, 2026

contract+feat+test: v1.2.0 wgpu cascade — option I + 2.0 stub + 2.3 parity test #1485

Merged

6 tasks

noahgift force-pushed the feat/qwen3-moe-gpu-parity-test-m-1-2 branch from 6302645 to 0a0d7b3 May 4, 2026 20:37

noahgift enabled auto-merge (squash) May 4, 2026 20:37

noahgift mentioned this pull request May 4, 2026

test(aprender-serve): qwen3_moe_wgpu_parity — M-GPU-MOE-2.3 cosine ≥0.99 falsifier (wgpu) #1488

Merged

4 tasks

noahgift merged commit 8cbb7b5 into main May 4, 2026
10 checks passed

noahgift deleted the feat/qwen3-moe-gpu-parity-test-m-1-2 branch May 4, 2026 21:02

noahgift mentioned this pull request May 6, 2026

contract(qwen3-moe-forward-gpu-v1): v1.6.0 → v1.7.0 — DRAFT → ACTIVE_ALGORITHM_LEVEL #1530

Merged

3 tasks
