test(aprender-serve): qwen3_moe_wgpu_parity — M-GPU-MOE-2.3 cosine ≥0.99 falsifier (wgpu) by noahgift · Pull Request #1488 · paiml/aprender

noahgift · 2026-05-04T20:57:00Z

Summary

wgpu sibling of qwen3_moe_gpu_parity.rs (M-GPU-MOE-1.2, PR #1484)
Same falsifier ID FALSIFY-QW3-MOE-GPU-PARITY-001, same threshold ≥0.99
1 heavy test marked #[cfg(feature = "gpu")] #[ignore]; 2 helper unit tests run by default
Stacked on PR #1485 (v1.2.0 amendment + M-GPU-MOE-2.0 stub)
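The cosine gate named above can be sketched roughly as follows. This is an illustration only, not the code in qwen3_moe_wgpu_parity.rs: the signature of `cosine_similarity`, the `COSINE_THRESHOLD` constant, and the `passes_parity_gate` helper are assumptions made for the example.

```rust
/// Hypothetical helper: cosine similarity between two logit vectors.
/// Returns None for mismatched lengths, empty input, or a zero-norm vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        // Accumulate in f64 to limit rounding error over long vectors.
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Threshold taken from the PR text (cosine >= 0.99); the constant name is assumed.
const COSINE_THRESHOLD: f64 = 0.99;

/// Hypothetical gate: true when CPU and GPU logits agree to within the threshold.
fn passes_parity_gate(cpu_logits: &[f32], gpu_logits: &[f32]) -> bool {
    matches!(cosine_similarity(cpu_logits, gpu_logits), Some(c) if c >= COSINE_THRESHOLD)
}
```

Returning `Option` rather than a bare float keeps degenerate inputs (empty or all-zero logits) from silently passing the gate.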

When the test runs

Default CI: heavy test ignored, helpers pass (2 passed; 1 ignored)
--include-ignored on wgpu-capable hardware (Apple Silicon Metal, AMD Vulkan, Intel ARC Vulkan): exercises the full CPU-vs-wgpu cosine gate
Currently the wgpu forward returns UnsupportedOperation (M-GPU-MOE-2.0 stub) — heavy test will panic until M-GPU-MOE-2.1 + 2.2 land
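A minimal sketch of why the heavy test panics today, assuming a stand-in error type and a free function in place of the real `OwnedQuantizedModelWgpu::forward_qwen3_moe_wgpu` (whose actual signature is not shown in this PR). `ForwardError`, `forward_wgpu_stub`, and the test name are hypothetical; the real file is additionally gated by `#[cfg(feature = "gpu")]`, which is omitted here so the snippet compiles on its own.

```rust
/// Hypothetical stand-in for the error type returned by the 2.0 stub.
#[derive(Debug, PartialEq)]
enum ForwardError {
    InvalidInput(String),
    UnsupportedOperation { reason: String },
}

/// Hypothetical stand-in for the M-GPU-MOE-2.0 stub: validate preconditions,
/// then report that the wgpu forward pass is not implemented yet.
fn forward_wgpu_stub(token_ids: &[u32]) -> Result<Vec<f32>, ForwardError> {
    if token_ids.is_empty() {
        return Err(ForwardError::InvalidInput("empty prompt".to_string()));
    }
    Err(ForwardError::UnsupportedOperation {
        reason: "wgpu forward lands with M-GPU-MOE-2.2".to_string(),
    })
}

/// Heavy test: skipped by a plain `cargo test`, run with `-- --include-ignored`.
/// While the stub is in place, `expect` panics here, which is the intended
/// behaviour for a falsifier run against an incomplete implementation.
#[test]
#[ignore = "needs a wgpu-capable adapter and the model file"]
fn wgpu_forward_matches_cpu_reference() {
    let gpu_logits = forward_wgpu_stub(&[1, 2, 3]).expect("wgpu forward failed");
    assert!(!gpu_logits.is_empty());
}
```

With this shape, the default run reports the test as ignored rather than failed, and the panic only surfaces when someone opts in on suitable hardware.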

Verification

cargo check -p aprender-serve --test qwen3_moe_wgpu_parity --features gpu
   → 0 errors
cargo test -p aprender-serve --test qwen3_moe_wgpu_parity --features gpu
   → 2 passed; 1 ignored
rustfmt --check  → exit 0

Test plan

Compiles clean
Helpers pass
Heavy test correctly #[ignore]d
CI ci/gate green

🤖 Generated with Claude Code

….99 falsifier (wgpu)

wgpu sibling of `qwen3_moe_gpu_parity.rs` (M-GPU-MOE-1.2, PR #1484). Asserts cosine ≥ 0.99 between APR's CPU `forward_qwen3_moe` reference and the wgpu `OwnedQuantizedModelWgpu::forward_qwen3_moe_wgpu` integration on the same prompt.

Same falsifier ID as the cuda sibling (FALSIFY-QW3-MOE-GPU-PARITY-001): wgpu is a SECOND backend implementing the same contract gate, not a different gate. Same threshold (≥ 0.99), same canonical 17.3 GB Qwen3-Coder GGUF, same 3-token canonical prompt as the cuda test.

CI WIRING:
- #[cfg(feature = "gpu")] gates the file (matches the gate on OwnedQuantizedModelWgpu in gguf/mod.rs)
- #[ignore] on the heavy test (CI default skips; explicit `--include-ignored` runs it on a wgpu-capable adapter: Apple Silicon Metal, AMD Vulkan, Intel ARC Vulkan)
- 2 helper unit tests (cosine_similarity sanity coverage) DO run by default

WHEN THE TEST PASSES:
- M-GPU-MOE-2.0 stub returns UnsupportedOperation, so this test currently panics at the wgpu forward call (correct behaviour for a falsifier against an incomplete impl).
- M-GPU-MOE-2.1 (per-expert wgpu helpers via trueno-gpu QuantizeKernel + GemmKernel compute pipelines) + M-GPU-MOE-2.2 (full forward integration analog of forward_qwen3_moe_cuda) must both land before this test passes on hardware.
- On hardware with wgpu support, run with --include-ignored to exercise. PASS discharges FALSIFY-QW3-MOE-GPU-PARITY-001 for the wgpu backend (cuda backend discharged by sibling test).

DEPENDS ON: PR #1485 (v1.2.0 amendment + M-GPU-MOE-2.0 stub). Branch is stacked on the v1.2.0 contract branch; once #1485 lands on main, this PR's base flips to main automatically.

Refs: M52, M53, R10, qwen3-moe-forward-gpu-v1 v1.2.0 :: M-GPU-MOE-2.3 + FALSIFY-QW3-MOE-GPU-PARITY-001 (wgpu).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


…arity test (#1485)

* contract(qwen3-moe-forward-gpu-v1): v1.1.0 → v1.2.0 — option I (OwnedQuantizedModelWgpu)

Pre-implementation architecture amendment for M-GPU-MOE-2 (wgpu fallback). Mirrors the v1.1.0 option D amendment that pinned the CUDA substrate before M-GPU-MOE-1.0 implementation; this one pins the wgpu substrate before any wgpu code lands.

Why now: M-GPU-MOE-1 is in flight (1.0-redo SHIPPED, 1.1.1 SHIPPED, 1.1.2 OPEN as PR #1477, 1.2 test scaffold OPEN as PR #1484). Choosing the wgpu seam early prevents the wrong-type-stub waste that bit M-GPU-MOE-1.0 (PR #1460 placed forward_qwen3_moe_gpu on OwnedQuantizedModel; one cycle later #1464 redo'd it on OwnedQuantizedModelCuda, option D).

FOUR options considered:
(I) OwnedQuantizedModelWgpu wrapper type (analog of v1.1.0 option D): CHOSEN
(II) GpuExecutor trait abstracting CUDA + wgpu: REJECTED (over-engineered)
(III) Backend enum inside renamed OwnedQuantizedModelGpu: REJECTED (invasive)
(IV) Defer wgpu indefinitely: REJECTED (violates CLAUDE.md backend-agnostic mandate)

Option I picks wgpu by code-path symmetry, not by trait abstraction: new file tree at `crates/aprender-serve/src/gguf/wgpu/` mirrors `crates/aprender-serve/src/gguf/cuda/` line-for-line. Maintenance-mode reviewer can verify a parity bug by diff, not by elaborate test infrastructure.

M-GPU-MOE-2 decomposed into four substages mirroring M-GPU-MOE-1.x:
M-GPU-MOE-2.0 stub on OwnedQuantizedModelWgpu
M-GPU-MOE-2.1 per-expert wgpu dispatch helpers (expert_swiglu_wgpu, moe_ffn_forward_layer_wgpu)
M-GPU-MOE-2.2 full forward integration (replaces 2.0 stub body)
M-GPU-MOE-2.3 cosine-vs-CPU parity test on hardware with wgpu

Two new blockers documented:
- wgpu adapter selection probe for non-NVIDIA hardware
- trueno-gpu Q6_K QuantizeKernel coverage check before 2.1

Companion-spec records this as M52 (no companion contract bump).

Validation: pv validate contracts/qwen3-moe-forward-gpu-v1.yaml → 0 error(s), 0 warning(s). Contract is valid.

Refs: M52, R10, qwen3-moe-forward-gpu-v1 v1.2.0 option I.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(aprender-serve): OwnedQuantizedModelWgpu stub — M-GPU-MOE-2.0 (#1487)

Implements M-GPU-MOE-2.0 per qwen3-moe-forward-gpu-v1 v1.2.0 option I (see PR #1485 amendment). Analog of M-GPU-MOE-1.0-redo (PR #1464) for the wgpu backend.

WHAT THIS PR ADDS:
  * crates/aprender-serve/src/gguf/wgpu_backend/mod.rs: new module with OwnedQuantizedModelWgpu struct + new() + stub method forward_qwen3_moe_wgpu(). Mirrors cuda/mod.rs structure.
  * crates/aprender-serve/src/gguf/wgpu_model.rs: re-export shim `pub use super::wgpu_backend::OwnedQuantizedModelWgpu`. Mirrors cuda_model.rs.
  * crates/aprender-serve/src/gguf/mod.rs: adds the two new modules behind `#[cfg(feature = "gpu")]` (the existing wgpu feature flag, `gpu = ["trueno/gpu"]` per Cargo.toml line 208).

WHY MODULE NAMED `wgpu_backend`: The Rust ecosystem already has a `wgpu` crate. A module named `wgpu` inside the same crate would shadow it inside the file's body. The public re-export still presents `OwnedQuantizedModelWgpu` (no ugly suffix) thanks to wgpu_model.rs.

WHY THIS IS A STUB: Same staging discipline as M-GPU-MOE-1.0-redo: contract first, scaffold second, implementation third. The body of forward_qwen3_moe_wgpu validates preconditions (mirroring the cuda sibling's boundary) then returns RealizarError::UnsupportedOperation whose reason points at the v1.2.0 amendment block for the M-GPU-MOE-2 staging plan. Until M-GPU-MOE-2.2 lands, callers on non-CUDA hardware fall back to OwnedQuantizedModel::forward_qwen3_moe (CPU LAZY-FUSED-MATVEC, ~30 tok/s).

VERIFICATION:
  cargo check -p aprender-serve → 0 errors (default)
  cargo check -p aprender-serve --features cuda → 0 errors (cuda)
  cargo check -p aprender-serve --features gpu → 0 errors (wgpu)
  cargo test -p aprender-serve --lib --features gpu \
      owned_quantized_model_wgpu_tests → 1 passed

Lib unit test asserts the function signature exists and matches the cuda sibling step-for-step (compile-time checks via fn pointer coercion; no runtime model construction needed at the stub stage).

DEPENDS ON: PR #1485 (qwen3-moe-forward-gpu-v1 v1.2.0 option I amendment). Branch is stacked on the v1.2.0 contract branch; once #1485 lands on main, this PR rebases onto main directly.

NEXT STAGES per v1.2.0:
M-GPU-MOE-2.1 per-expert wgpu dispatch helpers (expert_swiglu_wgpu, moe_ffn_forward_layer_wgpu)
M-GPU-MOE-2.2 full forward integration mirror of cuda sibling
M-GPU-MOE-2.3 cosine-vs-CPU parity test on wgpu hardware

Refs: M52, R10, qwen3-moe-forward-gpu-v1 v1.2.0 :: M-GPU-MOE-2.0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* test(aprender-serve): qwen3_moe_wgpu_parity — M-GPU-MOE-2.3 cosine ≥0.99 falsifier (wgpu) (#1488)

wgpu sibling of `qwen3_moe_gpu_parity.rs` (M-GPU-MOE-1.2, PR #1484). Asserts cosine ≥ 0.99 between APR's CPU `forward_qwen3_moe` reference and the wgpu `OwnedQuantizedModelWgpu::forward_qwen3_moe_wgpu` integration on the same prompt.

Same falsifier ID as the cuda sibling (FALSIFY-QW3-MOE-GPU-PARITY-001): wgpu is a SECOND backend implementing the same contract gate, not a different gate. Same threshold (≥ 0.99), same canonical 17.3 GB Qwen3-Coder GGUF, same 3-token canonical prompt as the cuda test.

CI WIRING:
- #[cfg(feature = "gpu")] gates the file (matches the gate on OwnedQuantizedModelWgpu in gguf/mod.rs)
- #[ignore] on the heavy test (CI default skips; explicit `--include-ignored` runs it on a wgpu-capable adapter: Apple Silicon Metal, AMD Vulkan, Intel ARC Vulkan)
- 2 helper unit tests (cosine_similarity sanity coverage) DO run by default

WHEN THE TEST PASSES:
- M-GPU-MOE-2.0 stub returns UnsupportedOperation, so this test currently panics at the wgpu forward call (correct behaviour for a falsifier against an incomplete impl).
- M-GPU-MOE-2.1 (per-expert wgpu helpers via trueno-gpu QuantizeKernel + GemmKernel compute pipelines) + M-GPU-MOE-2.2 (full forward integration analog of forward_qwen3_moe_cuda) must both land before this test passes on hardware.
- On hardware with wgpu support, run with --include-ignored to exercise. PASS discharges FALSIFY-QW3-MOE-GPU-PARITY-001 for the wgpu backend (cuda backend discharged by sibling test).

DEPENDS ON: PR #1485 (v1.2.0 amendment + M-GPU-MOE-2.0 stub). Branch is stacked on the v1.2.0 contract branch; once #1485 lands on main, this PR's base flips to main automatically.

Refs: M52, M53, R10, qwen3-moe-forward-gpu-v1 v1.2.0 :: M-GPU-MOE-2.3 + FALSIFY-QW3-MOE-GPU-PARITY-001 (wgpu).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
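The stub commit above mentions compile-time signature checks via fn pointer coercion. A rough sketch of that technique follows, using placeholder types: `CudaModel`, `WgpuModel`, the `ForwardFn` alias, and the parameter and return types are all assumptions, since the real signatures of `forward_qwen3_moe_cuda` and `forward_qwen3_moe_wgpu` are not shown in this PR.

```rust
/// Placeholder model types; the real ones are OwnedQuantizedModelCuda and
/// OwnedQuantizedModelWgpu, whose fields are not reproduced here.
struct CudaModel;
struct WgpuModel;

impl CudaModel {
    /// Assumed signature, for illustration only.
    fn forward_qwen3_moe_cuda(&mut self, token_ids: &[u32]) -> Result<Vec<f32>, String> {
        Ok(vec![0.0; token_ids.len()])
    }
}

impl WgpuModel {
    /// Assumed signature mirroring the cuda sibling; the body mimics a stub.
    fn forward_qwen3_moe_wgpu(&mut self, _token_ids: &[u32]) -> Result<Vec<f32>, String> {
        Err("UnsupportedOperation".to_string())
    }
}

/// Shared shape both backends are expected to satisfy (hypothetical alias).
type ForwardFn<M> = fn(&mut M, &[u32]) -> Result<Vec<f32>, String>;

/// Compiles only if both methods coerce to the same fn pointer shape.
/// No model is constructed and neither method is called.
fn signatures_match() -> bool {
    let _cuda: ForwardFn<CudaModel> = CudaModel::forward_qwen3_moe_cuda;
    let _wgpu: ForwardFn<WgpuModel> = WgpuModel::forward_qwen3_moe_wgpu;
    true
}
```

If either method changed its parameter or return types, the corresponding `let` binding would fail to compile, so drift between the two backends shows up at build time rather than on GPU hardware.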

noahgift merged commit 10cc7ad into contract/qwen3-moe-forward-gpu-v1-2-0-wgpu-fallback May 4, 2026
1 check passed

noahgift deleted the feat/qwen3-moe-wgpu-parity-test-m-2-3 branch May 4, 2026 20:57

noahgift mentioned this pull request May 4, 2026

contract+feat+test: v1.2.0 wgpu cascade — option I + 2.0 stub + 2.3 parity test #1485

Merged

6 tasks

noahgift mentioned this pull request May 9, 2026

M-GPU-MOE-2.x — wgpu helpers + integration + parity test for qwen3-moe-forward-gpu-v1 #1582

Open

noahgift merged 1 commit into contract/qwen3-moe-forward-gpu-v1-2-0-wgpu-fallback from feat/qwen3-moe-wgpu-parity-test-m-2-3
