Context
M32d FUNCTIONALLY DISCHARGED 2026-05-02 via #1228 squash 5235aaeb9 (Step 5 + 5b + 6 + 7 fix bundle: per-head Q/K RMSNorm + rope_theta default 10K → 1M for qwen3_moe + chat template no-think + traced sync). Output transitioned on lambda-vector RTX 4090 against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M GGUF:
- pre-fix: %%%%%%%% (gibberish, repeated argmax)
- post-fix: "2 + 2 = 4", plus multi-domain coherent answers (math/geography/translation/code)
Remaining gate
The formal flip of qwen3-moe-forward-v1 v1.4.0 ACTIVE_ALGORITHM_LEVEL → ACTIVE_RUNTIME is gated on the cosine ≥ 0.99 vs HF FP16 measurement at LM-head logits (FALSIFY-QW3-MOE-PARITY-001, contract v1.3.0).
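For illustration, the cosine gate can be sketched as below. This is a minimal sketch rather than the code in the Rust test: the function names are hypothetical, and it assumes the comparison runs over logit vectors flattened across all positions (a per-position variant would apply the same function row by row).

```python
import math

# Threshold taken from the gate described above (cosine >= 0.99).
COSINE_THRESHOLD = 0.99


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes_parity_gate(candidate, reference, threshold=COSINE_THRESHOLD):
    """Hypothetical helper: True when the candidate logits clear the gate."""
    return cosine_similarity(candidate, reference) >= threshold
```

Cosine similarity ignores a uniform rescaling of the logits, so the gate checks the direction of the logit vector rather than its absolute magnitude.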
Operator-confirm dependency
scripts/generate_qwen3_moe_fp16_logits.py (#1129 squash 87a2a61c1) downloads Qwen/Qwen3-Coder-30B-A3B-Instruct (~60 GB) and dumps [batch, seq, vocab] logits to JSON. Multi-device offload via device_map="auto". ~30 min runtime on a 30B-A3B model.
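A reference dump of this kind can be sketched as follows. This is not the script from #1129: the JSON layout, the helper names, and the prompt handling are assumptions made for illustration, and device_map="auto" additionally requires the accelerate package to be installed.

```python
import json


def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [batch, seq, vocab]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def build_payload(model_id, prompt, logits):
    """Hypothetical JSON layout; the real fixture schema may differ."""
    return {
        "model": model_id,
        "prompt": prompt,
        "shape": shape_of(logits),
        "logits": logits,
    }


def dump_reference_logits(model_id, prompt, out_path):
    """Run one FP16 forward pass and write [batch, seq, vocab] logits to JSON."""
    # Heavy dependencies are imported lazily so the helpers above work without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    model.eval()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: [batch, seq, vocab]
    payload = build_payload(model_id, prompt, logits.float().cpu().tolist())
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    return payload["shape"]
```

Under these assumptions, calling dump_reference_logits("Qwen/Qwen3-Coder-30B-A3B-Instruct", prompt, out_path) with an operator-chosen prompt and output path would trigger the ~60 GB download, which is why the step needs operator confirmation.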
Test ready
- crates/aprender-serve/tests/qwen3_moe_parity.rs::f_qw3_moe_parity_001 (#1130 squash ce6ca4bb4, M32d.2, F-QW3-MOE-PARITY-001 cosine gate): #[ignore]-gated; runs with --include-ignored once the fixture lands.
- f_qw3_moe_argmax_parity_002 in qwen3_moe_argmax_parity.rs for llama.cpp argmax sanity (#1131 squash 9f93d02d9, M32d.3, FALSIFY-QW3-MOE-PARITY-002, independent of HF fixture).
Acceptance
- HF FP16 fixture present on disk
- cargo test -p aprender-serve --test qwen3_moe_parity -- --include-ignored f_qw3_moe_parity_001 PASS
- Cosine ≥ 0.99 of APR forward_qwen3_moe LM-head logits vs HF FP16 reference
- Contract qwen3-moe-forward-v1 v1.4.0 → v1.5.0 with status promoted to ACTIVE_RUNTIME
Cross-refs
- Companion-repo: paiml/claude-code-parity-apr § Sub-extension 1 + § R9 risk
- Academic basis: arXiv:2210.17323 (GPTQ: quantization-aware reference comparison framework)