qwen3-moe-forward-v1 ACTIVE_RUNTIME flip — operator-confirm cosine ≥ 0.99 vs HF FP16 reference (~60 GB download) #1584

@noahgift

Context

M32d FUNCTIONALLY DISCHARGED 2026-05-02 via #1228 squash 5235aaeb9 (Step 5 + 5b + 6 + 7 fix bundle: per-head Q/K RMSNorm + rope_theta default 10K → 1M for qwen3_moe + chat template no-think + traced sync). Output transitioned on lambda-vector RTX 4090 against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M GGUF:

  • pre-fix: %%%%%%%% (gibberish, repeated argmax)
  • post-fix: "2 + 2 = 4", plus coherent multi-domain answers (math/geography/translation/code)

Remaining gate

The formal flip of qwen3-moe-forward-v1 v1.4.0 ACTIVE_ALGORITHM_LEVEL → ACTIVE_RUNTIME is gated on the cosine ≥ 0.99 vs HF FP16 measurement at LM-head logits (FALSIFY-QW3-MOE-PARITY-001, contract v1.3.0).

Operator-confirm dependency

scripts/generate_qwen3_moe_fp16_logits.py (#1129 squash 87a2a61c1) downloads Qwen/Qwen3-Coder-30B-A3B-Instruct (~60 GB) and dumps [batch, seq, vocab] logits to JSON. Multi-device offload via device_map="auto". ~30 min runtime on a 30B-A3B model.
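
For orientation, a fixture generator of this kind could look roughly like the sketch below. It is not the contents of scripts/generate_qwen3_moe_fp16_logits.py: the function names and the JSON layout (a "shape" field next to the nested "logits" list) are assumptions made for illustration. Only the model id, the [batch, seq, vocab] shape, and device_map="auto" come from the description above.

```python
import json

MODEL_ID = "Qwen/Qwen3-Coder-30B-A3B-Instruct"  # model id named in this issue


def compute_fp16_logits(prompt, model_id=MODEL_ID):
    """Run one forward pass; return logits as nested lists [batch][seq][vocab]."""
    # Heavy imports are deferred so the JSON helpers below work without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Widen to float32 before serializing so JSON holds plain Python floats.
    return logits.float().cpu().tolist()


def dump_fixture(logits, path):
    """Write logits plus their shape to JSON (hypothetical layout)."""
    shape = [len(logits), len(logits[0]), len(logits[0][0])]
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"shape": shape, "logits": logits}, f)
    return shape


def load_fixture(path):
    """Read the fixture back and check it matches its recorded shape."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    batch, seq, vocab = data["shape"]
    logits = data["logits"]
    ok = (
        len(logits) == batch
        and all(len(s) == seq for s in logits)
        and all(len(v) == vocab for s in logits for v in s)
    )
    if not ok:
        raise ValueError("fixture does not match its recorded shape")
    return logits
```

Recording the shape alongside the data lets a consumer reject a truncated or mismatched fixture before any comparison runs.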

Test ready

Acceptance

  • HF FP16 fixture present on disk
  • cargo test -p aprender-serve --test qwen3_moe_parity -- --include-ignored f_qw3_moe_parity_001 PASS
  • Cosine ≥ 0.99 of APR forward_qwen3_moe LM-head logits vs HF FP16 reference
  • Contract qwen3-moe-forward-v1 v1.4.0 → v1.5.0 with status promoted to ACTIVE_RUNTIME
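
The cosine criterion can be illustrated with a minimal stdlib sketch. It assumes both sides are nested lists shaped [batch][seq][vocab] and that the gate applies to the worst position; whether the real f_qw3_moe_parity_001 test aggregates per position, per sequence, or over the flattened tensor is not stated here, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def min_position_cosine(candidate, reference):
    """Lowest cosine over all [batch][seq] positions (assumed aggregation)."""
    if len(candidate) != len(reference):
        raise ValueError("batch size mismatch")
    worst = 1.0
    for cand_seq, ref_seq in zip(candidate, reference):
        if len(cand_seq) != len(ref_seq):
            raise ValueError("sequence length mismatch")
        for cand_vec, ref_vec in zip(cand_seq, ref_seq):
            worst = min(worst, cosine_similarity(cand_vec, ref_vec))
    return worst


def passes_gate(candidate, reference, threshold=0.99):
    """True when every position meets the cosine threshold."""
    return min_position_cosine(candidate, reference) >= threshold
```

Taking the minimum is the stricter reading of "cosine ≥ 0.99": a single badly diverging token position fails the gate even if the average over the sequence looks healthy.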

Cross-refs

  • Companion-repo: paiml/claude-code-parity-apr § Sub-extension 1 + § R9 risk
  • Academic basis: arXiv:2210.17323 (GPTQ — quantization-aware reference comparison framework)
