test(aprender-serve): M32d.2 — qwen3_moe_parity.rs F-QW3-MOE-PARITY-001 cosine gate by noahgift · Pull Request #1130 · paiml/aprender

noahgift · 2026-04-29T10:52:54Z

Summary

Per qwen3-moe-forward-v1 v1.3.0 staged plan, M32d.2 authors the cosine-similarity parity test that consumes the JSON fixture from M32d.1 (#1129) and exercises `OwnedQuantizedModel::forward_qwen3_moe` end-to-end on the canonical 17.3 GB Qwen3-Coder-30B-A3B-Instruct GGUF.

This is the primary axis of `FALSIFY-QW3-MOE-FORWARD-004`: `cosine_similarity(apr_logits, hf_fp16_logits) > 0.99` per `AC_QW3_MOE_005`. Axis (b) (argmax vs llama.cpp top-1) is M32d.3 (next slice).

Tests

`f_qw3_moe_parity_001_cosine_vs_hf_fp16` (`#[ignore]`, heavy):

- Skips with `eprintln!` if no cached Qwen3-Coder GGUF or no FP16 fixture is present (both are operator-confirm-gated; the fixture alone is multi-GB).
- Loads the fixture via `serde_json`: `model_name`, `prompt`, `tokens`, `vocab_size`, `logits[151936]`, `argmax_token`.
- Asserts `vocab_size == 151936` and `logits.len() == 151936` to catch fixture drift.
- Loads the GGUF via `MappedGGUFModel` + `OwnedQuantizedModel::from_mapped`, loads all 48 MoE layer descriptors, and runs ONE forward pass on the fixture's prompt tokens.
- Computes `cosine_similarity(apr_logits, hf_fp16_logits)`.
- Asserts `cos_sim > 0.99` per `AC_QW3_MOE_005`.
- On a miss, reports diagnostics in the `if_fails` order given by `FALSIFY-QW3-MOE-FORWARD-004`.
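The skip-on-missing and fixture-drift steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in `qwen3_moe_parity.rs`: `load_fixture_text`, `check_fixture_shape`, and `EXPECTED_VOCAB` are hypothetical names, and JSON parsing (done with `serde_json` in the PR) is left out so the sketch has no external dependencies.

```rust
use std::fs;
use std::path::Path;

/// Vocabulary size the fixture is expected to carry (value taken from the PR text).
const EXPECTED_VOCAB: usize = 151_936;

/// Hypothetical helper: returns the raw fixture text, or `None` when the file
/// is absent or unreadable, so the caller can skip instead of panicking.
fn load_fixture_text(path: &Path) -> Option<String> {
    fs::read_to_string(path).ok()
}

/// Hypothetical drift check: both the declared vocab size and the actual
/// logits length must match the expected value.
fn check_fixture_shape(vocab_size: usize, logits_len: usize) -> Result<(), String> {
    if vocab_size != EXPECTED_VOCAB {
        return Err(format!("vocab_size {} != {}", vocab_size, EXPECTED_VOCAB));
    }
    if logits_len != EXPECTED_VOCAB {
        return Err(format!("logits.len() {} != {}", logits_len, EXPECTED_VOCAB));
    }
    Ok(())
}
```

In a heavy test, a `None` from the loader would typically lead to an `eprintln!` and an early `return`, while an `Err` from the shape check would fail the test before the expensive forward pass runs.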

Three sibling unit tests run in default CI (not `#[ignore]`):

- `fixture_loader_handles_missing_path`: `load_fixture` returns `None` on absent path (no panic).
- `cosine_similarity_unit_vectors`: parallel / orthogonal / anti-parallel unit-vector cases.
- `cosine_similarity_handles_zero_vector`: zero-vector edge case returns 0.0 (no NaN from divide-by-zero).
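For reference, a cosine similarity with the zero-vector behavior described above might look like the following. It is a sketch under assumptions rather than the function in the PR: the `f32` slice signature, the `f64` accumulation, and returning 0.0 for mismatched or empty inputs are choices made here for illustration.

```rust
/// Cosine similarity between two logit vectors.
/// Assumption: returns 0.0 for mismatched lengths, empty input, or a zero norm,
/// so the caller never sees NaN from a divide-by-zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    // Accumulate in f64 to limit rounding error over a ~150k-element vocabulary.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 {
        return 0.0;
    }
    (dot / denom) as f32
}
```

The three unit-vector cases correspond to results near 1.0 (parallel), 0.0 (orthogonal), and -1.0 (anti-parallel); comparing with a small tolerance rather than exact equality is the safer way to assert them.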

Live verification

```
$ cargo test -p aprender-serve --test qwen3_moe_parity
running 4 tests
test f_qw3_moe_parity_001_cosine_vs_hf_fp16 ... ignored
test cosine_similarity_handles_zero_vector ... ok
test cosine_similarity_unit_vectors ... ok
test fixture_loader_handles_missing_path ... ok

test result: ok. 3 passed; 0 failed; 1 ignored
```

Why this is small

This PR is tight: 1 new test file (~250 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.3 (`llama-cli` argmax sanity) and M32d.4 (DRAFT → ACTIVE_RUNTIME bump) follow.

Test plan

`cargo test -p aprender-serve --test qwen3_moe_parity` — 3 sibling tests pass, heavy test ignored
`cargo fmt -p aprender-serve` — formatted
Pre-commit quality gates passed
Operator runs `cargo test ... -- --ignored` on lambda-vector after M32d.1 fixture is generated (deferred to M32d.4 DRAFT→ACTIVE flip)

🤖 Generated with Claude Code

…01 cosine gate

Per qwen3-moe-forward-v1 v1.3.0 staged plan: M32d.2 authors the cosine-similarity parity test that consumes the JSON fixture from M32d.1 (PR #1129) and exercises OwnedQuantizedModel::forward_qwen3_moe end-to-end on the canonical 17.3 GB Qwen3-Coder-30B-A3B-Instruct GGUF.

Test: f_qw3_moe_parity_001_cosine_vs_hf_fp16 (#[ignore])
- Skips with eprintln if no cached Qwen3-Coder GGUF or no FP16 fixture file present (operator-confirm-gated; FP16 fixture is multi-GB).
- Loads fixture (model_name, prompt, tokens, vocab_size, logits[151936], argmax_token) via serde_json.
- Asserts vocab_size == 151936 and logits.len() == 151936 to catch fixture drift.
- Loads GGUF via MappedGGUFModel + OwnedQuantizedModel::from_mapped, loads all 48 MoE layer descriptors, runs ONE forward pass on the fixture's prompt tokens.
- Computes cosine_similarity(apr_logits, hf_fp16_logits).
- Asserts cos_sim > 0.99 per AC_QW3_MOE_005.
- Reports per FALSIFY-QW3-MOE-FORWARD-004 if_fails diagnostic order.

Three sibling unit tests run in default CI (not #[ignore]):
- fixture_loader_handles_missing_path: load_fixture returns None on absent path (no panic).
- cosine_similarity_unit_vectors: parallel/orthogonal/anti-parallel unit-vector cases.
- cosine_similarity_handles_zero_vector: zero-vector edge case returns 0.0 (no NaN from divide-by-zero).

Live results from cargo test -p aprender-serve --test qwen3_moe_parity:
test result: ok. 3 passed; 0 failed; 1 ignored

This is a tight one-PR slice: 1 new test file (~230 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.3 (llama-cli argmax sanity) and M32d.4 (DRAFT → ACTIVE_RUNTIME bump) follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

M33 audit-trail bump on companion side.

Records:
* #1127 (M32c.2.2.2.1.4) live regression test on aprender main
* #1128 #1129 #1130 #1131 (M32d.0/.1/.2/.3) parity scaffolding

No code change beyond this contract mirror.

M22 4-step ritual: mirror push (this commit) → companion pin.lock refresh → companion spec PR.

Contract sha256 f4ea18b1acaea56ef8ef40fc857e5057e06e0627232be5b248dad6389b68e846 byte-identical with companion side.

Refs: claude-code-parity-apr-v1 § companion_repo.contract_pin

… — closes sweep

Algorithm-level PARTIAL discharge for FALSIFY-QW3-MOE-FORWARD-001 + 002 + 003 + 004 per `contracts/qwen3-moe-forward-v1.yaml`. Closes 4/4 sweep on the M32d MoE forward parity contract.

## ✅ Closes 4/4 qwen3-moe-forward sweep

**Thirteen contract families now fully algorithm-bound at PARTIAL:**
- All 11 prior families (dataset/tokenizer/apr-cli-* + apr-vs-gguf-forward-parity-v1)
- `qwen3-moe-forward-v1` (4/4) ← this PR

## What this binds (M32 milestone state machine)

The four gates encode a milestone state machine for the Qwen3-MoE forward path:
- **001 (M32a-precursor)**: regression sentinel pinning the "dense-FFN tensor lookup is reached" pre-M32b error string. Pass at this level proves the bug exists; flips polarity once M32b lands.
- **002 (M32b)**: arch-aware load wired but forward not yet implemented; expects `RealizarError::UnsupportedOperation` with `moe_forward_pass`.
- **003 (M32c)**: CPU forward wired; `apr run` exits 0 and emits at least one non-whitespace byte (correctness not yet asserted).
- **004 (M32d)**: numerical parity vs HuggingFace FP16 reference; cosine similarity > 0.99 strict.

## Verdict shapes

- 001: substring contains (regression-sentinel).
- 002: substring conjunction (NOT dense-FFN AND HAS unsupported).
- 003: conjunctive (exit 0 AND non-whitespace stdout).
- 004: bounded-threshold (finite + in [-1, 1] + > 0.99 strict).

## Five-Whys

1. Why bind these now? — Closes 4/4 sweep on a milestone-tracking contract; pins the M32d acceptance criterion at algorithm level.
2. Why one module? — Bundle precedent.
3. Why distinct verdicts per gate? — Each represents a distinct milestone state; substring/conjunctive/threshold shapes match.
4. Why strict `> 0.99` for cosine? — Contract-literal `> 0.99`.
5. Why 19 tests across 4 verdict sections? — Mutation-survey coverage per gate.

## Cross-reference

Per memory `2026-04-28 session distillation track complete`: M32d.0-M32d.3 already shipped (PRs #1129/#1130/#1131); M32d.4 fixture-gen + actual cosine measurement remain. This verdict gives the M32d.4 work an algorithm-level acceptance criterion.

## Tests

19 unit tests, all green.
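The verdict shapes for gates 002, 003, and 004 can be illustrated as small predicates. This is a sketch of the shapes as the commit message describes them, not the verdict module itself: the function names and signatures are hypothetical, and the marker strings for gate 002 are passed in as parameters because the exact pinned error text is not quoted here.

```rust
/// Gate 002 shape: substring conjunction.
/// Passes when the dense-FFN marker is absent AND the unsupported marker is present.
/// Both marker strings are supplied by the caller (hypothetical parameters).
fn verdict_002(stderr: &str, dense_ffn_marker: &str, unsupported_marker: &str) -> bool {
    !stderr.contains(dense_ffn_marker) && stderr.contains(unsupported_marker)
}

/// Gate 003 shape: conjunctive.
/// Passes when the process exited 0 AND stdout has at least one non-whitespace byte.
fn verdict_003(exit_code: i32, stdout: &[u8]) -> bool {
    exit_code == 0 && stdout.iter().any(|b| !b.is_ascii_whitespace())
}

/// Gate 004 shape: bounded threshold.
/// Passes only for a finite value inside [-1, 1] that is strictly above 0.99.
fn verdict_004(cos_sim: f32) -> bool {
    cos_sim.is_finite() && (-1.0..=1.0).contains(&cos_sim) && cos_sim > 0.99
}
```

The finiteness and range checks in `verdict_004` are not redundant with the threshold: a value such as 1.5 exceeds 0.99 yet cannot be a valid cosine, so rejecting it guards against an upstream computation bug being read as a pass.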

noahgift enabled auto-merge (squash) April 29, 2026 10:52

noahgift mentioned this pull request Apr 29, 2026

test(aprender-serve): M32d.3 — qwen3_moe_argmax_parity.rs F-QW3-MOE-PARITY-002 llama.cpp argmax sanity #1131

Merged

4 tasks

noahgift merged commit ce6ca4b into main Apr 29, 2026
11 checks passed

noahgift deleted the feat/m32d-2-qwen3-moe-parity-test branch April 29, 2026 11:17

noahgift mentioned this pull request May 9, 2026

qwen3-moe-forward-v1 ACTIVE_RUNTIME flip — operator-confirm cosine ≥ 0.99 vs HF FP16 reference (~60 GB download) #1584

Closed

noahgift merged 1 commit into main from feat/m32d-2-qwen3-moe-parity-test
