qwen3-moe-forward-v1 ACTIVE_RUNTIME flip — operator-confirm cosine ≥ 0.99 vs HF FP16 reference (~60 GB download) #1584

## Context

M32d was **FUNCTIONALLY DISCHARGED** on 2026-05-02 via #1228 squash `5235aaeb9` (Step 5 + 5b + 6 + 7 fix bundle: per-head Q/K RMSNorm + rope_theta default 10K → 1M for qwen3_moe + chat template no-think + traced sync). On the lambda-vector RTX 4090, running against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M GGUF, the output changed as follows:
- pre-fix: `%%%%%%%%` (gibberish, repeated argmax)
- post-fix: `2 + 2 = 4` + multi-domain coherent answers (math/geography/translation/code)

## Remaining gate

The formal flip of `qwen3-moe-forward-v1` v1.4.0 **ACTIVE_ALGORITHM_LEVEL** → **ACTIVE_RUNTIME** is gated on the cosine ≥ 0.99 vs HF FP16 measurement at LM-head logits (**FALSIFY-QW3-MOE-PARITY-001**, contract v1.3.0).
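For orientation, the gate condition can be sketched in a few lines of Python. This is an illustration only: the issue does not say whether the contract computes cosine per position or over the flattened logits, so the sketch assumes a single flat vector, and the names `cosine_similarity` and `passes_parity_gate` are hypothetical, not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes_parity_gate(candidate, reference, threshold=0.99):
    """True when candidate logits meet the cosine threshold (assumed flat vectors)."""
    return cosine_similarity(candidate, reference) >= threshold
```

Under this assumption, a `[batch, seq, vocab]` fixture would be flattened (or compared row by row) before the call; which of the two the real test does is not stated here.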

## Operator-confirm dependency

`scripts/generate_qwen3_moe_fp16_logits.py` (#1129 squash `87a2a61c1`) downloads `Qwen/Qwen3-Coder-30B-A3B-Instruct` (~60 GB) and dumps `[batch, seq, vocab]` logits to JSON. It offloads across devices via `device_map="auto"` and takes ~30 min to run on the 30B-A3B model.
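Before running the parity test, an operator may want to sanity-check that the dumped fixture really has a rectangular `[batch, seq, vocab]` shape. The sketch below assumes the JSON top level is the bare nested array; the script's actual schema is not shown in this issue, and `logits_shape` is a hypothetical helper.

```python
import json


def logits_shape(logits):
    """Return (batch, seq, vocab) for a rectangular nested list, else raise."""
    if not logits or not logits[0] or not logits[0][0]:
        raise ValueError("logits must be non-empty in every dimension")
    batch, seq, vocab = len(logits), len(logits[0]), len(logits[0][0])
    for sample in logits:
        if len(sample) != seq:
            raise ValueError("ragged seq dimension")
        for row in sample:
            if len(row) != vocab:
                raise ValueError("ragged vocab dimension")
    return batch, seq, vocab


# Tiny stand-in for the real fixture: 1 batch, 2 positions, 3 vocab entries.
example = json.loads("[[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]")
shape = logits_shape(example)
```

If the real fixture wraps the array in an object with extra metadata, the array would need to be extracted first.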

## Test ready

- `crates/aprender-serve/tests/qwen3_moe_parity.rs::f_qw3_moe_parity_001` (#1130 squash `ce6ca4bb4`) — `#[ignore]`-gated; runs with `--include-ignored` once the fixture lands.
- Sibling `f_qw3_moe_argmax_parity_002` for llama.cpp argmax sanity (#1131 squash `9f93d02d9`, FALSIFY-QW3-MOE-PARITY-002, independent of HF fixture).
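The argmax sanity check is a coarser comparison than cosine: it only asks whether both implementations pick the same top token at each position. A minimal sketch follows, assuming each input is a `[seq][vocab]` list of logits; how the Rust test actually compares against llama.cpp is not described in this issue, and the function names are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a non-empty sequence (first on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def argmax_agreement(candidate, reference):
    """Fraction of positions where both sides select the same top token."""
    if not candidate or len(candidate) != len(reference):
        raise ValueError("inputs must be non-empty and cover the same positions")
    matches = sum(argmax(c) == argmax(r) for c, r in zip(candidate, reference))
    return matches / len(candidate)
```

A degenerate output such as the pre-fix repeated argmax would typically show up here as low agreement without needing the HF fixture.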

## Acceptance

- HF FP16 fixture present on disk
- `cargo test -p aprender-serve --test qwen3_moe_parity -- --include-ignored f_qw3_moe_parity_001` PASS
- Cosine ≥ 0.99 of APR `forward_qwen3_moe` LM-head logits vs HF FP16 reference
- Contract `qwen3-moe-forward-v1` v1.4.0 → v1.5.0 with status promoted to **ACTIVE_RUNTIME**

## Cross-refs

- Companion-repo: paiml/claude-code-parity-apr § Sub-extension 1 + § R9 risk
- Academic basis: arXiv:2210.17323 (GPTQ — quantization-aware reference comparison framework)
