feat(aprender-serve): forward_qwen3_moe_cuda full integration — M-GPU-MOE-1.1.2 by noahgift · Pull Request #1477 · paiml/aprender

noahgift · 2026-05-04T15:40:04Z

Summary

Replaces M-GPU-MOE-1.0-redo stub body with full forward integration
Mirrors CPU sibling `forward_qwen3_moe` line-for-line
FFN section routes through `moe_ffn_forward_layer_cuda` (#1469, M-GPU-MOE-1.1.1) → `expert_swiglu_cuda` → `self.executor.q4k_matvec` / `q6k_gemv`
Attention path stays CPU (existing `forward_cuda` pattern)
1 unit test passes

Signature changes

`&self → &mut self` (the executor needs mutable access for its kernel cache)
`_data → data` (now used: passed through to the per-expert byte slicer)
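
The receiver change follows from Rust's borrowing rules: a cache that is filled lazily on first use has to be mutated, so any method that may trigger a kernel load needs `&mut self` unless interior mutability is used. A minimal sketch of that constraint, in which `ToyExecutor`, `ToyModel`, and `kernel_handle` are hypothetical names and not the real `CudaExecutor` API:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an executor that prepares kernels lazily.
struct ToyExecutor {
    kernel_cache: HashMap<String, u64>,
    compiles: usize,
}

impl ToyExecutor {
    fn new() -> Self {
        Self { kernel_cache: HashMap::new(), compiles: 0 }
    }

    /// Returns a cached handle, "compiling" on first use. Inserting into the
    /// cache mutates the executor, hence `&mut self`.
    fn kernel_handle(&mut self, name: &str) -> u64 {
        if let Some(&handle) = self.kernel_cache.get(name) {
            return handle;
        }
        self.compiles += 1;
        let handle = self.compiles as u64;
        self.kernel_cache.insert(name.to_string(), handle);
        handle
    }
}

/// Hypothetical model wrapper that owns its executor.
struct ToyModel {
    executor: ToyExecutor,
}

impl ToyModel {
    /// Must take `&mut self` because it calls a `&mut self` executor method.
    fn forward(&mut self) -> u64 {
        self.executor.kernel_handle("q4k_matvec")
    }
}
```

Callers therefore need a `mut` binding for the model, which is the knock-on effect the parity-test scaffold anticipates.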

Per qwen3-moe-forward-gpu-v1 v1.1.0 option D

Extends OwnedQuantizedModelCuda's CPU-attention + CUDA-FFN pattern. The actual GPU compute happens at the per-expert SwiGLU dispatch (q4k_matvec × 2 + q6k_gemv per top-k expert per token).
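
For orientation, the per-expert SwiGLU computation can be written as an unquantized f32 reference: two projections of the input (gate and up), an element-wise `silu(gate) * up`, and one down projection, i.e. three matrix-vector products per expert. This is a sketch under assumptions: `matvec`, `silu`, and `expert_swiglu_ref` are illustrative names, weights are plain row-major f32 rather than Q4_K/Q6_K blocks, and the mapping of the two `q4k_matvec` calls to gate/up and `q6k_gemv` to down is inferred from the call counts above, not confirmed by the source.

```rust
/// Dense row-major matrix-vector product; `w` has shape `rows x cols`.
fn matvec(w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(a, b)| a * b)
                .sum::<f32>()
        })
        .collect()
}

/// SiLU (swish) activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// One expert: down(silu(gate(x)) * up(x)). Three matvecs in total.
fn expert_swiglu_ref(
    gate: &[f32],
    up: &[f32],
    down: &[f32],
    hidden: usize,
    inter: usize,
    x: &[f32],
) -> Vec<f32> {
    let g = matvec(gate, inter, hidden, x); // gate projection
    let u = matvec(up, inter, hidden, x); // up projection
    let h: Vec<f32> = g.iter().zip(&u).map(|(g, u)| silu(*g) * u).collect();
    matvec(down, hidden, inter, &h) // down projection
}
```

A reference like this is also a convenient oracle when comparing a single expert's GPU output against the CPU.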

Test plan

`cargo check -p aprender-serve --features cuda` — compiles
`cargo test -p aprender-serve --features cuda --lib forward_qwen3_moe_cuda` — passes
`pv validate contracts/qwen3-moe-forward-gpu-v1.yaml` — 0 errors, 0 warnings
Deferred to the M-GPU-MOE-1.2 PR: cosine-vs-CPU parity gate against real Qwen3-Coder GGUF (`#[ignore]` test bearing FALSIFY-QW3-MOE-GPU-PARITY-001)
Deferred to the M-GPU-MOE-3 PR: throughput target ≥150 tok/s

🤖 Generated with Claude Code

…-MOE-1.1.2

Replaces the M-GPU-MOE-1.0-redo stub body with the full forward integration. forward_qwen3_moe_cuda now mirrors the CPU sibling OwnedQuantizedModel::forward_qwen3_moe (forward_qwen3_moe.rs) line-for-line, with one difference: the per-layer FFN section routes through moe_ffn_forward_layer_cuda, which dispatches per-expert matmuls to self.executor (CudaExecutor) via the expert_swiglu_cuda helper.

Per qwen3-moe-forward-gpu-v1 v1.1.0 option D — extends the existing OwnedQuantizedModelCuda CPU-attention + CUDA-FFN pattern (forward_cuda in cuda.rs:18). Attention path stays on CPU; only FFN matmuls go to GPU. M-GPU-MOE-3 fuses dispatch into a single sparse-expert kernel for ~5× throughput.

Signature changes
=================
- &self → &mut self (executor needs mutable for kernel cache)
- _data → data (passed to moe_ffn_forward_layer_cuda for expert_byte_slice)

Forward body structure (mirrors CPU sibling step-for-step):
1. Embed (CPU) — self.model.embed
2. Per-layer:
   2a. Attention norm (CPU) — ops::rms_norm
   2b. QKV projection (CPU) — self.model.qkv_matmul
   2c. Per-head Q/K RMSNorm + RoPE (M32d Step 5/5b) — ops::apply_per_head_rms_norm
   2d. Causal attention + output proj (CPU) — self.model.causal_attention
   2e. Residual — element-wise CPU
   2f. Pre-FFN norm (CPU) — ops::rms_norm
   2g. **MoE FFN on GPU** — moe_ffn_forward_layer_cuda → expert_swiglu_cuda → self.executor.q4k_matvec / q6k_gemv
   2h. Residual — element-wise CPU
3. Final norm (CPU)
4. LM head — last token (CPU)

Implementation stages updated
=============================
M-GPU-MOE-0         Contract scaffold v1.0.0            SHIPPED ✓
M-GPU-MOE-0.5       v1.1.0 option D amendment           SHIPPED ✓
M-GPU-MOE-1.0-redo  Stub on OwnedQuantizedModelCuda     SHIPPED ✓ (#1464)
M-GPU-MOE-1.1.0     expert_swiglu_cuda helper           SHIPPED ✓ (via #1469 squash)
M-GPU-MOE-1.1.1     moe_ffn_forward_layer_cuda          SHIPPED ✓ (#1469)
M-GPU-MOE-1.1.2     forward_qwen3_moe_cuda full integ   SHIPPED ✓ (THIS PR)
M-GPU-MOE-1.2       Cosine-vs-CPU parity gate ≥0.99     PENDING (FALSIFY-QW3-MOE-GPU-PARITY-001)
M-GPU-MOE-2         wgpu fallback                       PENDING
M-GPU-MOE-3         Throughput ≥150 + VRAM ≤ 95%        PENDING

Verification
============
$ cargo check -p aprender-serve --features cuda
✓ Compiles
$ cargo test -p aprender-serve --features cuda --lib forward_qwen3_moe_cuda
test ... ok. 1 passed

Refs PR #1469 squash 77b9f0d (helpers landed)
Refs PR #1462 squash 4495407 (v1.1.0 option D amendment)
Refs claude-code-parity-apr POC M49 / R10 (P0 elevation)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
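
Step 2g is where a mixture-of-experts layer differs from a dense FFN: a router scores all experts, the top-k are selected, and their outputs are combined with normalized weights. The following is a generic sketch of that pattern, not the body of `moe_ffn_forward_layer_cuda`; `top_k_route` and `moe_combine` are hypothetical names, and renormalizing the softmax over only the selected experts is an assumption about the routing convention.

```rust
/// Pick the `k` largest router logits, softmax over those only, and return
/// (expert_index, weight) pairs whose weights sum to 1.
fn top_k_route(router_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    assert!(k >= 1 && k <= router_logits.len());
    let mut idx: Vec<usize> = (0..router_logits.len()).collect();
    // Stable sort, descending by logit.
    idx.sort_by(|&a, &b| router_logits[b].total_cmp(&router_logits[a]));
    idx.truncate(k);
    // Subtract the max so the largest exponent is exp(0) = 1; the sum is
    // then at least 1 and the division below cannot hit zero.
    let max = router_logits[idx[0]];
    let exps: Vec<f32> = idx.iter().map(|&i| (router_logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    idx.into_iter().zip(exps).map(|(i, e)| (i, e / sum)).collect()
}

/// Weighted sum of the selected experts' outputs. `expert_fn` stands in for
/// the per-expert SwiGLU dispatch.
fn moe_combine<F>(routes: &[(usize, f32)], hidden: usize, mut expert_fn: F) -> Vec<f32>
where
    F: FnMut(usize) -> Vec<f32>,
{
    let mut out = vec![0.0f32; hidden];
    for &(expert, weight) in routes {
        let y = expert_fn(expert);
        for (o, v) in out.iter_mut().zip(&y) {
            *o += weight * v;
        }
    }
    out
}
```

Softmax over the selected logits gives the same weights as a full softmax followed by renormalization over the top-k, since the ratios between selected experts are unchanged.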

…99 falsifier

Authors the FALSIFY-QW3-MOE-GPU-PARITY-001 test scaffold from contract qwen3-moe-forward-gpu-v1 v1.1.0 implementation_stages M-GPU-MOE-1.2.

WHAT THE TEST DOES (when run with `--include-ignored` against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf on RTX 4090):
1. Loads the GGUF once (single mmap).
2. Builds moe_layers: Vec<Qwen3MoeQuantizedLayer> once.
3. Builds CPU OwnedQuantizedModel #1 → runs forward_qwen3_moe on a fixed prompt → cpu_logits (the LAZY-FUSED-MATVEC ground truth).
4. Builds CPU OwnedQuantizedModel #2 → wraps into OwnedQuantizedModelCuda → runs forward_qwen3_moe_cuda on the same prompt → gpu_logits.
5. Computes cosine_similarity(cpu_logits, gpu_logits) over the full 151936-dim vocab.
6. Asserts cos_sim ≥ 0.99 per the contract's formal bound.

The test follows the qwen3_moe_parity.rs (M32d.2 CPU-vs-HF-FP16) template line-for-line — same canonical GGUF paths array, same fixture-skip pattern, same cosine_similarity helper. The only difference is the second forward pass dispatches to forward_qwen3_moe_cuda instead of treating an FP32 fixture as truth.

CI WIRING:
- #[cfg(feature = "cuda")] gates the entire file (no GPU host = no compile)
- #[ignore] on the heavy test (CI default skips; explicit `--include-ignored` runs it)
- 3 helper unit tests (cosine_similarity_unit_vectors / handles_zero / within_threshold) DO run by default — they cover the cosine helper itself

WHEN THE TEST PASSES:
- The aprender PR #1477 (M-GPU-MOE-1.1.2 full forward integration) must be on main first. Currently main has the v1.0-redo stub; running this test against the stub returns UnsupportedOperation error and the test panics (correct behaviour for a falsifier against an incomplete impl).
- Once #1477 lands, run the test on lambda-vector with:
    cargo test -p aprender-serve --test qwen3_moe_gpu_parity \
      --features cuda -- --include-ignored
- On PASS, the contract's M-GPU-MOE-1.2 stage flips PENDING → SHIPPED and (with PARITY-002 from the v1 sibling) the gate discharges qwen3-moe-forward-gpu-v1 v1.1.0 DRAFT → ACTIVE_ALGORITHM_LEVEL.

PR #1477 changes forward_qwen3_moe_cuda's receiver from `&self` to `&mut self` (kernel cache mutation). The `mut gpu_model` binding here carries a forward-looking #[allow(unused_mut)] note for that reason.

Refs: qwen3-moe-forward-gpu-v1 v1.1.0 :: M-GPU-MOE-1.2 + FALSIFY-QW3-MOE-GPU-PARITY-001 + companion-spec M51 + R10.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
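
The parity metric in steps 5 and 6 is ordinary cosine similarity between the two logit vectors. A minimal sketch of such a helper follows; the name matches the `cosine_similarity` helper mentioned above, but the body, the f64 accumulation, and the choice to return 0.0 for a zero-norm input are assumptions, not the repository's implementation.

```rust
/// Cosine similarity of two equal-length vectors, accumulated in f64 to
/// limit rounding error over a large vocabulary. Returns 0.0 when either
/// vector has zero norm (assumed convention).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 {
        0.0
    } else {
        (dot / denom) as f32
    }
}
```

Note that cosine similarity is scale-invariant, so a gate of this kind detects direction drift in the logits but not a uniform scaling error.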

…QuantizedModelWgpu)

Pre-implementation architecture amendment for M-GPU-MOE-2 (wgpu fallback). Mirrors the v1.1.0 option D amendment that pinned the CUDA substrate before M-GPU-MOE-1.0 implementation; this one pins the wgpu substrate before any wgpu code lands.

Why now: M-GPU-MOE-1 is in flight (1.0-redo SHIPPED, 1.1.1 SHIPPED, 1.1.2 OPEN as PR #1477, 1.2 test scaffold OPEN as PR #1484). Choosing the wgpu seam early prevents the wrong-type-stub waste that bit M-GPU-MOE-1.0 (PR #1460 placed forward_qwen3_moe_gpu on OwnedQuantizedModel; one cycle later #1464 redo'd it on OwnedQuantizedModelCuda — option D).

FOUR options considered:
(I)   OwnedQuantizedModelWgpu wrapper type (analog of v1.1.0 option D) — CHOSEN
(II)  GpuExecutor trait abstracting CUDA + wgpu — REJECTED (over-engineered)
(III) Backend enum inside renamed OwnedQuantizedModelGpu — REJECTED (invasive)
(IV)  Defer wgpu indefinitely — REJECTED (violates CLAUDE.md backend-agnostic mandate)

Option I picks wgpu by code-path symmetry, not by trait abstraction: new file tree at `crates/aprender-serve/src/gguf/wgpu/` mirrors `crates/aprender-serve/src/gguf/cuda/` line-for-line. Maintenance-mode reviewer can verify a parity bug by diff, not by elaborate test infrastructure.

M-GPU-MOE-2 decomposed into four substages mirroring M-GPU-MOE-1.x:
M-GPU-MOE-2.0  stub on OwnedQuantizedModelWgpu
M-GPU-MOE-2.1  per-expert wgpu dispatch helpers (expert_swiglu_wgpu, moe_ffn_forward_layer_wgpu)
M-GPU-MOE-2.2  full forward integration (replaces 2.0 stub body)
M-GPU-MOE-2.3  cosine-vs-CPU parity test on hardware with wgpu

Two new blockers documented:
- wgpu adapter selection probe for non-NVIDIA hardware
- trueno-gpu Q6_K QuantizeKernel coverage check before 2.1

Companion-spec records this as M52 (no companion contract bump).

Validation: pv validate contracts/qwen3-moe-forward-gpu-v1.yaml → 0 error(s), 0 warning(s). Contract is valid.

Refs: M52, R10, qwen3-moe-forward-gpu-v1 v1.2.0 option I.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…arity test (#1485)

* contract(qwen3-moe-forward-gpu-v1): v1.1.0 → v1.2.0 — option I (OwnedQuantizedModelWgpu)

Pre-implementation architecture amendment for M-GPU-MOE-2 (wgpu fallback). Mirrors the v1.1.0 option D amendment that pinned the CUDA substrate before M-GPU-MOE-1.0 implementation; this one pins the wgpu substrate before any wgpu code lands.

Why now: M-GPU-MOE-1 is in flight (1.0-redo SHIPPED, 1.1.1 SHIPPED, 1.1.2 OPEN as PR #1477, 1.2 test scaffold OPEN as PR #1484). Choosing the wgpu seam early prevents the wrong-type-stub waste that bit M-GPU-MOE-1.0 (PR #1460 placed forward_qwen3_moe_gpu on OwnedQuantizedModel; one cycle later #1464 redo'd it on OwnedQuantizedModelCuda — option D).

FOUR options considered:
(I)   OwnedQuantizedModelWgpu wrapper type (analog of v1.1.0 option D) — CHOSEN
(II)  GpuExecutor trait abstracting CUDA + wgpu — REJECTED (over-engineered)
(III) Backend enum inside renamed OwnedQuantizedModelGpu — REJECTED (invasive)
(IV)  Defer wgpu indefinitely — REJECTED (violates CLAUDE.md backend-agnostic mandate)

Option I picks wgpu by code-path symmetry, not by trait abstraction: new file tree at `crates/aprender-serve/src/gguf/wgpu/` mirrors `crates/aprender-serve/src/gguf/cuda/` line-for-line. Maintenance-mode reviewer can verify a parity bug by diff, not by elaborate test infrastructure.

M-GPU-MOE-2 decomposed into four substages mirroring M-GPU-MOE-1.x:
M-GPU-MOE-2.0  stub on OwnedQuantizedModelWgpu
M-GPU-MOE-2.1  per-expert wgpu dispatch helpers (expert_swiglu_wgpu, moe_ffn_forward_layer_wgpu)
M-GPU-MOE-2.2  full forward integration (replaces 2.0 stub body)
M-GPU-MOE-2.3  cosine-vs-CPU parity test on hardware with wgpu

Two new blockers documented:
- wgpu adapter selection probe for non-NVIDIA hardware
- trueno-gpu Q6_K QuantizeKernel coverage check before 2.1

Companion-spec records this as M52 (no companion contract bump).

Validation: pv validate contracts/qwen3-moe-forward-gpu-v1.yaml → 0 error(s), 0 warning(s). Contract is valid.

Refs: M52, R10, qwen3-moe-forward-gpu-v1 v1.2.0 option I.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(aprender-serve): OwnedQuantizedModelWgpu stub — M-GPU-MOE-2.0 (#1487)

Implements M-GPU-MOE-2.0 per qwen3-moe-forward-gpu-v1 v1.2.0 option I (see PR #1485 amendment). Analog of M-GPU-MOE-1.0-redo (PR #1464) for the wgpu backend.

WHAT THIS PR ADDS:
* crates/aprender-serve/src/gguf/wgpu_backend/mod.rs — new module with OwnedQuantizedModelWgpu struct + new() + stub method forward_qwen3_moe_wgpu(). Mirrors cuda/mod.rs structure.
* crates/aprender-serve/src/gguf/wgpu_model.rs — re-export shim `pub use super::wgpu_backend::OwnedQuantizedModelWgpu`. Mirrors cuda_model.rs.
* crates/aprender-serve/src/gguf/mod.rs — adds the two new modules behind `#[cfg(feature = "gpu")]` (the existing wgpu feature flag — `gpu = ["trueno/gpu"]` per Cargo.toml line 208).

WHY MODULE NAMED `wgpu_backend`:
The Rust ecosystem already has a `wgpu` crate. A module named `wgpu` inside the same crate would shadow it inside the file's body. The public re-export still presents `OwnedQuantizedModelWgpu` (no ugly suffix) thanks to wgpu_model.rs.

WHY THIS IS A STUB:
Same staging discipline as M-GPU-MOE-1.0-redo — contract first, scaffold second, implementation third. The body of forward_qwen3_moe_wgpu validates preconditions (mirroring the cuda sibling's boundary) then returns RealizarError::UnsupportedOperation whose reason points at the v1.2.0 amendment block for the M-GPU-MOE-2 staging plan. Until M-GPU-MOE-2.2 lands, callers on non-CUDA hardware fall back to OwnedQuantizedModel::forward_qwen3_moe (CPU LAZY-FUSED-MATVEC, ~30 tok/s).

VERIFICATION:
  cargo check -p aprender-serve                  → 0 errors (default)
  cargo check -p aprender-serve --features cuda  → 0 errors (cuda)
  cargo check -p aprender-serve --features gpu   → 0 errors (wgpu)
  cargo test -p aprender-serve --lib --features gpu \
    owned_quantized_model_wgpu_tests             → 1 passed

Lib unit test asserts the function signature exists and matches the cuda sibling step-for-step (compile-time checks via fn pointer coercion — no runtime model construction needed at the stub stage).

DEPENDS ON:
PR #1485 (qwen3-moe-forward-gpu-v1 v1.2.0 option I amendment). Branch is stacked on the v1.2.0 contract branch; once #1485 lands on main, this PR rebases onto main directly.

NEXT STAGES per v1.2.0:
M-GPU-MOE-2.1  per-expert wgpu dispatch helpers (expert_swiglu_wgpu, moe_ffn_forward_layer_wgpu)
M-GPU-MOE-2.2  full forward integration mirror of cuda sibling
M-GPU-MOE-2.3  cosine-vs-CPU parity test on wgpu hardware

Refs: M52, R10, qwen3-moe-forward-gpu-v1 v1.2.0 :: M-GPU-MOE-2.0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* test(aprender-serve): qwen3_moe_wgpu_parity — M-GPU-MOE-2.3 cosine ≥0.99 falsifier (wgpu) (#1488)

wgpu sibling of `qwen3_moe_gpu_parity.rs` (M-GPU-MOE-1.2, PR #1484). Asserts cosine ≥ 0.99 between APR's CPU `forward_qwen3_moe` reference and the wgpu `OwnedQuantizedModelWgpu::forward_qwen3_moe_wgpu` integration on the same prompt.

Same falsifier ID as the cuda sibling (FALSIFY-QW3-MOE-GPU-PARITY-001) — wgpu is a SECOND backend implementing the same contract gate, not a different gate. Same threshold (≥ 0.99), same canonical 17.3 GB Qwen3-Coder GGUF, same 3-token canonical prompt as the cuda test.

CI WIRING:
- #[cfg(feature = "gpu")] gates the file (matches the gate on OwnedQuantizedModelWgpu in gguf/mod.rs)
- #[ignore] on the heavy test (CI default skips; explicit `--include-ignored` runs it on a wgpu-capable adapter — Apple Silicon Metal, AMD Vulkan, Intel ARC Vulkan)
- 2 helper unit tests (cosine_similarity sanity coverage) DO run by default

WHEN THE TEST PASSES:
- M-GPU-MOE-2.0 stub returns UnsupportedOperation, so this test currently panics at the wgpu forward call (correct behaviour for a falsifier against an incomplete impl).
- M-GPU-MOE-2.1 (per-expert wgpu helpers via trueno-gpu QuantizeKernel + GemmKernel compute pipelines) + M-GPU-MOE-2.2 (full forward integration analog of forward_qwen3_moe_cuda) must both land before this test passes on hardware.
- On hardware with wgpu support, run with --include-ignored to exercise. PASS discharges FALSIFY-QW3-MOE-GPU-PARITY-001 for the wgpu backend (cuda backend discharged by sibling test).

DEPENDS ON:
PR #1485 (v1.2.0 amendment + M-GPU-MOE-2.0 stub). Branch is stacked on the v1.2.0 contract branch; once #1485 lands on main, this PR's base flips to main automatically.

Refs: M52, M53, R10, qwen3-moe-forward-gpu-v1 v1.2.0 :: M-GPU-MOE-2.3 + FALSIFY-QW3-MOE-GPU-PARITY-001 (wgpu).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
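
The stub-plus-signature-test pattern described for M-GPU-MOE-2.0 can be illustrated in isolation. In this sketch `ToyError`, `ToyWgpuModel`, `forward_moe_wgpu`, and `signature_matches` are hypothetical stand-ins (the real code uses `RealizarError` and `OwnedQuantizedModelWgpu`), and the specific preconditions checked here are assumptions. The point is that coercing a method to an explicit `fn` pointer type makes the compiler verify the signature without constructing a real model.

```rust
#[derive(Debug, PartialEq)]
enum ToyError {
    InvalidInput(String),
    UnsupportedOperation { operation: String, reason: String },
}

/// Hypothetical wrapper type standing in for a backend-specific model.
struct ToyWgpuModel {
    vocab_size: u32,
}

impl ToyWgpuModel {
    /// Stub: validate preconditions first, then report that the body is
    /// not implemented yet.
    fn forward_moe_wgpu(&mut self, token_ids: &[u32]) -> Result<Vec<f32>, ToyError> {
        if token_ids.is_empty() {
            return Err(ToyError::InvalidInput("empty token_ids".into()));
        }
        if let Some(&bad) = token_ids.iter().find(|&&t| t >= self.vocab_size) {
            return Err(ToyError::InvalidInput(format!("token {bad} out of range")));
        }
        Err(ToyError::UnsupportedOperation {
            operation: "forward_moe_wgpu".into(),
            reason: "stub; see the staging plan".into(),
        })
    }
}

/// Compile-time signature check: this coercion stops compiling if the
/// method's receiver, parameter, or return types drift.
fn signature_matches() -> fn(&mut ToyWgpuModel, &[u32]) -> Result<Vec<f32>, ToyError> {
    ToyWgpuModel::forward_moe_wgpu
}
```

Because the check happens at compile time, a lib-only test built this way runs on hosts without a GPU, which is also why it cannot catch runtime construction failures.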

…d-bug fix plan

Live-dogfood finding 2026-05-04 on lambda-vector RTX 4090: the M-GPU-MOE-1.2 heavy `qwen3_moe_gpu_parity` test (FALSIFY-QW3-MOE-GPU-PARITY-001) cannot run on the cached 17.3 GB Qwen3-Coder GGUF because `OwnedQuantizedModelCuda::new` itself fails:

  UnsupportedOperation { operation: "preload_weights_gpu", reason: "PAR-043: Failed to build indexed weights: Invalid launch config: Quantized weight 'blk.0.ffn_gate.weight' not cached" }

ROOT CAUSE (5-whys in evidence file):
`executor.build_indexed_weights` at `crates/aprender-serve/src/cuda/executor/weights.rs:325-373` unconditionally requires `blk.{i}.ffn_gate.weight`, `.ffn_up.weight`, `.ffn_down.weight` to be cached for every layer. For MoE these names DO NOT EXIST — MoE has 128 expert gates per layer (`blk.{i}.ffn_gate_exps.weight`) loaded into the `moe_layers` parameter at forward-time.

M-GPU-MOE-1.1.2 (PR #1477)'s forward body sidesteps the indexed weights for FFN, but the wrapper construction goes through `preload_weights_gpu` BEFORE forward is ever called. Wrapper construction fails first.

WHY DEFAULT CI DIDN'T CATCH IT:
Lib-only stub test (PR #1464) only checks signature at compile time. Heavy `qwen3_moe_gpu_parity.rs` (PR #1484) is `#[ignore]`d + needs RTX 4090 + 17.3 GB GGUF. First `--include-ignored` dogfood on lambda-vector found this 2026-05-04.

THIS PR ADDS:
(1) Evidence file `evidence/m-gpu-moe-1-2-blocked-by-preload-bug-2026-05-04/findings.md` documenting the live failure + 5-whys + fix architecture.
(2) Contract `qwen3-moe-forward-gpu-v1` v1.2.0 → v1.3.0:
    * New v1.3.0 amendment_history block (~110 lines) describing the bug, root cause, and three-step fix architecture
    * New implementation_stage `M-GPU-MOE-1.3` between 1.2 and 2 with status PENDING
    * New falsification_test FALSIFY-QW3-MOE-GPU-PRELOAD-001 (hardware test + lib-only sibling)
    * Top-level version "1.2.0" → "1.3.0"
    * Status comment expanded to mention M-GPU-MOE-1.3 as a precondition for ACTIVE_ALGORITHM_LEVEL flip

VALIDATION:
pv validate contracts/qwen3-moe-forward-gpu-v1.yaml → 0 errors, 0 warnings. Contract is valid.

WHAT THIS PR DOES NOT DO:
Does NOT implement the fix. Per CLAUDE.md "NEVER write code before writing a provable contract", this PR pins the contract first. The fix lands in a separate PR (M-GPU-MOE-1.3 stage): ~30 LOC in weights.rs + 1-2 callers + ArchConstraints field + drift-prevention test.

Does NOT block PR #1485's already-shipped 3-commit cascade (M52/M54). The cascade is correct; M-GPU-MOE-1.3 is a sibling bug-fix.

Refs: M52, M53, M54, R10, qwen3-moe-forward-gpu-v1 v1.3.0, FALSIFY-QW3-MOE-GPU-PRELOAD-001 (new).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
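
The root cause can be reproduced in miniature: a loader that demands the dense per-layer FFN tensor names will report the first one missing when given an MoE checkpoint, which only carries expert tensors. The tensor-name strings below come from the error message and root-cause text; `dense_ffn_names` and `first_missing_dense_ffn` are hypothetical helpers, not functions in `weights.rs`.

```rust
use std::collections::HashSet;

/// The three FFN tensor names the dense path expects for one layer.
fn dense_ffn_names(layer: usize) -> [String; 3] {
    [
        format!("blk.{layer}.ffn_gate.weight"),
        format!("blk.{layer}.ffn_up.weight"),
        format!("blk.{layer}.ffn_down.weight"),
    ]
}

/// Returns the first required dense FFN tensor that is absent from
/// `cached`, scanning layers in order, or `None` if all are present.
fn first_missing_dense_ffn(cached: &HashSet<String>, num_layers: usize) -> Option<String> {
    (0..num_layers)
        .flat_map(dense_ffn_names)
        .find(|name| !cached.contains(name))
}
```

With a cache that holds only `blk.0.ffn_gate_exps.weight`, the first missing name is `blk.0.ffn_gate.weight`, matching the tensor named in the failure above.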

Followup to the previous M-GPU-MOE-1.3 commit. The parity_gate (Jidoka stop-the-line in `OwnedQuantizedModelCuda::with_max_seq_len`) also runs the dense forward paths (`forward_single_with_cache` CPU + `forward_gpu_resident` GPU) on construction. For MoE these dispatch to `fused_matmul_f32` against the `dense_ffn_placeholder` (byte_size=0), causing rayon-parallel panics in `matmul_fused.rs:211`.

Fix: skip parity_gate when `arch.is_moe`, mirroring the rationale already in v1.3.0's amendment_history block.
- The parity gate's purpose is "stop the line if GPU diverges from CPU" — for dense models, it's load-time safety.
- For MoE, the equivalent gate is FALSIFY-QW3-MOE-GPU-PARITY-001 (qwen3_moe_gpu_parity.rs), which exercises the MoE-specific forward paths and bypasses the dense path the gate runs.
- Net: MoE models lose load-time parity but gain test-time parity via the qwen3_moe_gpu_parity test.

VERIFICATION ON LAMBDA-VECTOR RTX 4090:
Test progresses much further now:
  BEFORE: panic at OwnedQuantizedModelCuda::new build_indexed_weights (FALSIFY-QW3-MOE-GPU-PRELOAD-001 falsifier)
  AFTER previous commit: panic at parity_gate matmul_fused.rs:211 (downstream bug — exposed but not yet fixed)
  AFTER this commit: CPU forward succeeds, GPU forward executes, then asserts at gpu_logits.iter().all(|v| v.is_finite()) because the GPU produces NaN/Inf logits.

Test output:
  [GH-129] Early kernel preload: 49 modules compiled
  [PMAT-082] cuBLASLt FP8 JIT warmed (2048x16x2048)
  [PMAT-053] FP8 weight cache: 193 matrices cached (728.8 MB)
  FALSIFY-QW3-MOE-GPU-PARITY-001: running GPU forward...
  panicked at qwen3_moe_gpu_parity.rs:168: all GPU logits must be finite (no NaN/Inf)

PARTIAL DISCHARGE:
FALSIFY-QW3-MOE-GPU-PRELOAD-001 — wrapper construction succeeds.
FALSIFY-QW3-MOE-GPU-INVARIANTS-001 — partial (output length OK implicitly; finiteness FAILS).
FALSIFY-QW3-MOE-GPU-PARITY-001 — blocked by NaN/Inf bug.

NEW DOWNSTREAM BUG:
GPU forward (forward_qwen3_moe_cuda body, M-GPU-MOE-1.1.2 PR #1477) produces NaN/Inf for at least the canonical 3-token Qwen3-Coder prompt. This is the NEXT bug to investigate (M-GPU-MOE-1.5 follow-up). Likely candidates:
- Q4K matmul accumulator overflow in expert_swiglu_cuda
- Per-expert SwiGLU silu activation produces Inf for large inputs
- Top-k router weight renormalization division by zero
- missing per-head Q/K RMSNorm path for MoE (qk_norm tensors loaded but not applied)

Bisection via `apr trace --json --payload` per the M32d Step 2 surface methodology (per qwen3-moe-forward-gpu-v1 v1.1.0 PARITY-001 if_fails).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
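
When narrowing down a NaN/Inf like this, it helps to know which stage first produces a non-finite value rather than only that the final logits are bad. The sketch below is a generic illustration of that idea and is unrelated to the `apr trace` tooling; `first_non_finite` and `first_bad_stage` are hypothetical helpers, and the stage names are placeholders supplied by the caller.

```rust
/// Index and value of the first NaN/Inf entry, or `None` if all are finite.
fn first_non_finite(values: &[f32]) -> Option<(usize, f32)> {
    values
        .iter()
        .copied()
        .enumerate()
        .find(|(_, v)| !v.is_finite())
}

/// Given per-stage activations in forward order, returns the name of the
/// earliest stage that contains a non-finite value.
fn first_bad_stage<'a>(stages: &[(&'a str, Vec<f32>)]) -> Option<&'a str> {
    stages
        .iter()
        .find(|(_, acts)| first_non_finite(acts).is_some())
        .map(|(name, _)| *name)
}
```

Since NaN propagates through every later stage, only the earliest offending stage is informative; everything downstream of it will look broken as well.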

Followup to the previous M-GPU-MOE-1.3 commit. The parity_gate (Jidoka stop-the-line in `OwnedQuantizedModelCuda::with_max_seq_len`) also runs the dense forward paths (`forward_single_with_cache` CPU + `forward_gpu_resident` GPU) on construction. For MoE these dispatch to `fused_matmul_f32` against the `dense_ffn_placeholder` (byte_size=0), causing rayon-parallel panics in `matmul_fused.rs:211`. Fix: skip parity_gate when `arch.is_moe`, mirroring the rationale already in v1.3.0's amendment_history block. - The parity gate's purpose is "stop the line if GPU diverges from CPU" — for dense models, it's load-time safety. - For MoE, the equivalent gate is FALSIFY-QW3-MOE-GPU-PARITY-001 (qwen3_moe_gpu_parity.rs), which exercises the MoE-specific forward paths and bypasses the dense path the gate runs. - Net: MoE models lose load-time parity but gain test-time parity via the qwen3_moe_gpu_parity test. VERIFICATION ON LAMBDA-VECTOR RTX 4090: Test progresses much further now: BEFORE: panic at OwnedQuantizedModelCuda::new build_indexed_weights (FALSIFY-QW3-MOE-GPU-PRELOAD-001 falsifier) AFTER previous commit: panic at parity_gate matmul_fused.rs:211 (downstream bug — exposed but not yet fixed) AFTER this commit: CPU forward succeeds, GPU forward executes, then asserts at gpu_logits.iter().all(|v| v.is_finite()) because the GPU produces NaN/Inf logits. Test output: [GH-129] Early kernel preload: 49 modules compiled [PMAT-082] cuBLASLt FP8 JIT warmed (2048x16x2048) [PMAT-053] FP8 weight cache: 193 matrices cached (728.8 MB) FALSIFY-QW3-MOE-GPU-PARITY-001: running GPU forward... panicked at qwen3_moe_gpu_parity.rs:168: all GPU logits must be finite (no NaN/Inf) PARTIAL DISCHARGE: FALSIFY-QW3-MOE-GPU-PRELOAD-001 — wrapper construction succeeds. FALSIFY-QW3-MOE-GPU-INVARIANTS-001 — partial (output length OK implicitly; finiteness FAILS). FALSIFY-QW3-MOE-GPU-PARITY-001 — blocked by NaN/Inf bug. 
NEW DOWNSTREAM BUG:

GPU forward (forward_qwen3_moe_cuda body, M-GPU-MOE-1.1.2 PR #1477) produces NaN/Inf for at least the canonical 3-token Qwen3-Coder prompt. This is the NEXT bug to investigate (M-GPU-MOE-1.5 follow-up).

Likely candidates:

- Q4K matmul accumulator overflow in expert_swiglu_cuda
- Per-expert SwiGLU silu activation produces Inf for large inputs
- Top-k router weight renormalization division by zero
- Missing per-head Q/K RMSNorm path for MoE (qk_norm tensors loaded but not applied)

Bisection via `apr trace --json --payload` per the M32d Step 2 surface methodology (per qwen3-moe-forward-gpu-v1 v1.1.0 PARITY-001 if_fails).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
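Two of the candidates above lend themselves to cheap host-side checks. The sketch below is illustrative only: `first_non_finite` and `renormalize_top_k` are hypothetical helper names, and the uniform fallback for a degenerate weight sum is an assumption, not the behaviour of the real router. It shows how a finiteness probe can report where the first bad value appears, and how a guarded renormalization avoids dividing by zero.

```rust
/// Returns the index of the first NaN/Inf value, or `None` if every value is finite.
/// Reporting the index (rather than a bare bool) helps when bisecting layer by layer.
pub fn first_non_finite(values: &[f32]) -> Option<usize> {
    values.iter().position(|v| !v.is_finite())
}

/// Renormalizes the selected top-k router weights so they sum to 1.
///
/// Assumption: if the sum is zero, subnormal, negative, or non-finite, fall back to
/// uniform weights instead of dividing (a plain `w / sum` would yield NaN or Inf).
pub fn renormalize_top_k(weights: &[f32]) -> Vec<f32> {
    let sum: f32 = weights.iter().sum();
    if !sum.is_finite() || sum < f32::MIN_POSITIVE {
        let uniform = 1.0 / weights.len() as f32;
        return vec![uniform; weights.len()];
    }
    weights.iter().map(|w| w / sum).collect()
}
```

Running `first_non_finite` on the output of each stage (router weights, per-expert SwiGLU output, post-residual hidden state) narrows the search to the first stage that returns `Some(_)`.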

…partial discharge) (#1491)

* feat(aprender-serve): M-GPU-MOE-1.3 - preload_weights_gpu MoE-aware (partial discharge)

Per qwen3-moe-forward-gpu-v1 v1.3.0 amendment (PR #1490).

WHAT THIS PR FIXES:

ArchConstraints + build_indexed_weights + ValidatedLayerWeights all made MoE-aware via new `is_moe: bool` field on ArchConstraints.

(1) `crates/aprender-serve/src/gguf/config.rs`: adds `is_moe: bool` field to `ArchConstraints` struct.

(2) `crates/aprender-serve/src/gguf/arch_constraints_fallback.rs`: sets `is_moe: false` on all 19 dense arch entries; sets `is_moe: true` on the qwen3_moe arm. Also adds the raw GGUF arch strings `qwen3moe` (no underscore) and `qwen3_5moe` to the same arm; these reach `from_architecture` from `ValidatedModelConfig::from_apr` without going through `normalize_architecture`.

(3) `crates/aprender-serve/src/cuda/executor/weights.rs`: `build_indexed_weights` gates the 3 FFN-related quant lookups (ffn_gate.weight, ffn_up.weight, ffn_down.weight) on `arch.is_moe`; uses (0u64, 0usize) sentinels for MoE. Same gating for the 3 qtype resolutions.

(4) `crates/aprender-serve/src/cuda/types.rs`: `ValidatedLayerWeights::validate` skips the FfnGate/FfnUp/FfnDown role checks when `arch.is_moe`. The MoE forward path (`forward_qwen3_moe_cuda`) routes FFN through the `moe_layers` parameter, never reading these from the indexed weights.

WHAT THIS PR PARTIALLY DISCHARGES:

FALSIFY-QW3-MOE-GPU-PRELOAD-001 (new in v1.3.0): wrapper construction now succeeds for qwen3_moe GGUFs. Before this PR, `OwnedQuantizedModelCuda::new(model, 0)` panicked at:

  UnsupportedOperation { operation: "preload_weights_gpu", reason: "PAR-043: Failed to build indexed weights: Invalid launch config: Quantized weight 'blk.0.ffn_gate.weight' not cached" }

After this PR, that specific path no longer fails. Verified by re-running the M-GPU-MOE-1.2 heavy test; it now progresses past `OwnedQuantizedModelCuda::new`.
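A minimal sketch of the `is_moe` gating from item (3), under stated assumptions: the `ArchConstraints` struct here is a one-field stand-in, and `lookup_ffn_weight` together with the `HashMap` cache are hypothetical names chosen for illustration, not the actual `build_indexed_weights` code. Only the `(0u64, 0usize)` sentinel and the error text are taken from the commit message.

```rust
use std::collections::HashMap;

/// One-field stand-in for the real `ArchConstraints`.
pub struct ArchConstraints {
    pub is_moe: bool,
}

/// Hypothetical cache: tensor name -> (device pointer, byte length).
pub type QuantCache = HashMap<String, (u64, usize)>;

/// Resolves a dense FFN weight for `layer`, e.g. suffix `"ffn_gate.weight"`.
///
/// For MoE architectures the dense FFN tensors do not exist, so the lookup is
/// skipped and a `(0, 0)` sentinel is returned instead of an error.
pub fn lookup_ffn_weight(
    arch: &ArchConstraints,
    cache: &QuantCache,
    layer: usize,
    suffix: &str,
) -> Result<(u64, usize), String> {
    if arch.is_moe {
        // Sentinel: the MoE forward path never reads this entry.
        return Ok((0u64, 0usize));
    }
    let name = format!("blk.{layer}.{suffix}");
    cache
        .get(&name)
        .copied()
        .ok_or_else(|| format!("Quantized weight '{name}' not cached"))
}
```

With an empty cache, a dense architecture reproduces the "not cached" failure quoted above, while an MoE architecture returns the sentinel and lets construction continue.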
NEW DOWNSTREAM BUG (not blocking this PR):

After the wrapper construction fix, the heavy test now panics in CPU forward `matmul_fused.rs:211` with `index out of bounds: the len is 0 but the index is N`. This is a separate bug class: something in the CPU forward path dereferences `layer.ffn_up_weight.data` (or similar), which is the `dense_ffn_placeholder` (byte_size=0) for MoE layers per `transformer.rs:348-353`.

Likely root cause: the CPU `forward_qwen3_moe` does NOT touch the dense placeholders directly, but some preload/validation/init step does. Needs a follow-up PR (M-GPU-MOE-1.4) to either (a) skip dense-FFN-data access for MoE layers, or (b) replace the placeholder with a proper sentinel.

This PR DOES NOT regress the previous behaviour: the previous state was "wrapper construction fails", which masked the downstream bug. M-GPU-MOE-1.4 will surface and fix it.

VERIFICATION:

  cargo check -p aprender-serve → 0 errors
  cargo check -p aprender-serve --features cuda → 0 errors
  cargo test -p aprender-serve --test qwen3_moe_gpu_parity \
    --features cuda → 3 helpers pass

Heavy test on lambda-vector RTX 4090:

  BEFORE this PR: panic at OwnedQuantizedModelCuda::new (preload_weights_gpu / build_indexed_weights)
  AFTER this PR: panic moved to CPU forward matmul_fused.rs:211 (downstream bug, separate PR scope)

Net: progress by one bug class. M-GPU-MOE-1.3 stage is FUNCTIONALLY DISCHARGED as defined; M-GPU-MOE-1.4 follow-up needed for full PARITY-001 discharge.

NOTE ON PR STACKING:

This PR depends on PR #1490 (contract v1.2.0 → v1.3.0 amendment + evidence file) being on aprender main first. The contract pinned the architectural decision; this PR implements it.

Refs: M52, M53, M54, R10, qwen3-moe-forward-gpu-v1 v1.3.0, FALSIFY-QW3-MOE-GPU-PRELOAD-001 (partial discharge)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(aprender-serve): M-GPU-MOE-1.3 - also skip parity_gate for MoE

Followup to the previous M-GPU-MOE-1.3 commit.

noahgift enabled auto-merge (squash) May 4, 2026 15:40

noahgift force-pushed the feat/forward-qwen3-moe-cuda-integration-m-1-1-2 branch from 6d6603f to 275f246 Compare May 4, 2026 19:47

This was referenced May 4, 2026

docs(M51): M-GPU-MOE-1.0 → 1.1.1 cascade SHIPPED + 1.1.2 OPEN paiml/claude-code-parity-apr#39

Merged

test(aprender-serve): qwen3_moe_gpu_parity — M-GPU-MOE-1.2 cosine ≥0.99 falsifier #1484

Merged

noahgift force-pushed the feat/forward-qwen3-moe-cuda-integration-m-1-1-2 branch from 275f246 to 1f49eac Compare May 4, 2026 20:05

noahgift mentioned this pull request May 4, 2026

contract+feat+test: v1.2.0 wgpu cascade — option I + 2.0 stub + 2.3 parity test #1485

Merged

6 tasks

noahgift merged commit dc6f94d into main May 4, 2026
10 checks passed

noahgift deleted the feat/forward-qwen3-moe-cuda-integration-m-1-1-2 branch May 4, 2026 20:35

This was referenced May 6, 2026

contract(qwen3-moe-forward-gpu-v1): v1.6.0 → v1.7.0 — DRAFT → ACTIVE_ALGORITHM_LEVEL #1530

Merged

M-GPU-MOE-2.x — wgpu helpers + integration + parity test for qwen3-moe-forward-gpu-v1 #1582

Open

noahgift merged 1 commit into main from feat/forward-qwen3-moe-cuda-integration-m-1-1-2
