test(aprender-serve): M32d.2 — qwen3_moe_parity.rs F-QW3-MOE-PARITY-001 cosine gate #1130
Merged
…01 cosine gate

Per qwen3-moe-forward-v1 v1.3.0 staged plan: M32d.2 authors the cosine-similarity parity test that consumes the JSON fixture from M32d.1 (PR #1129) and exercises OwnedQuantizedModel::forward_qwen3_moe end-to-end on the canonical 17.3 GB Qwen3-Coder-30B-A3B-Instruct GGUF.

Test: f_qw3_moe_parity_001_cosine_vs_hf_fp16 (#[ignore])
- Skips with eprintln if no cached Qwen3-Coder GGUF or no FP16 fixture file is present (operator-confirm-gated; FP16 fixture is multi-GB).
- Loads fixture (model_name, prompt, tokens, vocab_size, logits[151936], argmax_token) via serde_json.
- Asserts vocab_size == 151936 and logits.len() == 151936 to catch fixture drift.
- Loads GGUF via MappedGGUFModel + OwnedQuantizedModel::from_mapped, loads all 48 MoE layer descriptors, runs ONE forward pass on the fixture's prompt tokens.
- Computes cosine_similarity(apr_logits, hf_fp16_logits).
- Asserts cos_sim > 0.99 per AC_QW3_MOE_005.
- Reports per FALSIFY-QW3-MOE-FORWARD-004 if_fails diagnostic order.

Three sibling unit tests run in default CI (not #[ignore]):
- fixture_loader_handles_missing_path: load_fixture returns None on absent path (no panic).
- cosine_similarity_unit_vectors: parallel/orthogonal/anti-parallel unit-vector cases.
- cosine_similarity_handles_zero_vector: zero-vector edge case returns 0.0 (no NaN from divide-by-zero).

Live results from cargo test -p aprender-serve --test qwen3_moe_parity:
test result: ok. 3 passed; 0 failed; 1 ignored

This is a tight one-PR slice: 1 new test file (~230 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.3 (llama-cli argmax sanity) and M32d.4 (DRAFT → ACTIVE_RUNTIME bump) follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 1, 2026:

M33 audit-trail bump on companion side. Records:
* #1127 (M32c.2.2.2.1.4) live regression test on aprender main
* #1128 #1129 #1130 #1131 (M32d.0/.1/.2/.3) parity scaffolding

No code change beyond this contract mirror. M22 4-step ritual: mirror push (this commit) → companion pin.lock refresh → companion spec PR.

Contract sha256 f4ea18b1acaea56ef8ef40fc857e5057e06e0627232be5b248dad6389b68e846 byte-identical with companion side.

Refs: claude-code-parity-apr-v1 § companion_repo.contract_pin
noahgift added a commit that referenced this pull request on May 11, 2026:

… — closes sweep

Algorithm-level PARTIAL discharge for FALSIFY-QW3-MOE-FORWARD-001 + 002 + 003 + 004 per `contracts/qwen3-moe-forward-v1.yaml`. Closes 4/4 sweep on the M32d MoE forward parity contract.

## ✅ Closes 4/4 qwen3-moe-forward sweep

**Thirteen contract families now fully algorithm-bound at PARTIAL:**
- All 11 prior families (dataset/tokenizer/apr-cli-* + apr-vs-gguf-forward-parity-v1)
- `qwen3-moe-forward-v1` (4/4) ← this PR

## What this binds (M32 milestone state machine)

The four gates encode a milestone state machine for the Qwen3-MoE forward path:
- **001 (M32a-precursor)**: regression sentinel pinning the "dense-FFN tensor lookup is reached" pre-M32b error string. Pass at this level proves the bug exists; flips polarity once M32b lands.
- **002 (M32b)**: arch-aware load wired but forward not yet implemented; expects `RealizarError::UnsupportedOperation` with `moe_forward_pass`.
- **003 (M32c)**: CPU forward wired; `apr run` exits 0 and emits at least one non-whitespace byte (correctness not yet asserted).
- **004 (M32d)**: numerical parity vs HuggingFace FP16 reference; cosine similarity > 0.99 strict.

## Verdict shapes

- 001: substring contains (regression-sentinel).
- 002: substring conjunction (NOT dense-FFN AND HAS unsupported).
- 003: conjunctive (exit 0 AND non-whitespace stdout).
- 004: bounded-threshold (finite + in [-1, 1] + > 0.99 strict).

## Five-Whys

1. Why bind these now? — Closes 4/4 sweep on a milestone-tracking contract; pins the M32d acceptance criterion at algorithm level.
2. Why one module? — Bundle precedent.
3. Why distinct verdicts per gate? — Each represents a distinct milestone state; substring/conjunctive/threshold shapes match.
4. Why strict `> 0.99` for cosine? — Contract-literal `> 0.99`.
5. Why 19 tests across 4 verdict sections? — Mutation-survey coverage per gate.

## Cross-reference

Per memory `2026-04-28 session distillation track complete`: M32d.0-M32d.3 already shipped (PRs #1129/#1130/#1131); M32d.4 fixture-gen + actual cosine measurement remain. This verdict gives the M32d.4 work an algorithm-level acceptance criterion.

## Tests

19 unit tests, all green.
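The conjunctive (003) and bounded-threshold (004) verdict shapes listed above can be illustrated with a minimal Rust sketch. The `Verdict` enum and both function names are hypothetical, chosen for illustration only; the actual verdict module in the repository may be structured differently.

```rust
/// Hypothetical two-state verdict; the real contract code may carry more detail.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Pass,
    Fail,
}

/// Gate 003 shape (conjunctive): exit code 0 AND at least one
/// non-whitespace byte on stdout. Correctness of the text is not checked.
fn verdict_cpu_forward(exit_code: i32, stdout: &[u8]) -> Verdict {
    let has_non_whitespace = stdout.iter().any(|b| !b.is_ascii_whitespace());
    if exit_code == 0 && has_non_whitespace {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

/// Gate 004 shape (bounded-threshold): the value must be finite,
/// lie in [-1, 1], and be strictly greater than 0.99.
fn verdict_cosine_parity(cos_sim: f32) -> Verdict {
    let finite = cos_sim.is_finite();
    let in_range = (-1.0..=1.0).contains(&cos_sim);
    let above_threshold = cos_sim > 0.99; // strict: exactly 0.99 fails
    if finite && in_range && above_threshold {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

Checking finiteness and range separately from the threshold means a NaN or an out-of-range value (for example 1.5 from a buggy similarity routine) fails the gate instead of slipping through a bare `> 0.99` comparison.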
Summary
Per `qwen3-moe-forward-v1` v1.3.0 staged plan, M32d.2 authors the cosine-similarity parity test that consumes the JSON fixture from M32d.1 (#1129) and exercises `OwnedQuantizedModel::forward_qwen3_moe` end-to-end on the canonical 17.3 GB Qwen3-Coder-30B-A3B-Instruct GGUF.

This is the primary axis of `FALSIFY-QW3-MOE-FORWARD-004`: `cosine_similarity(apr_logits, hf_fp16_logits) > 0.99` per `AC_QW3_MOE_005`. Axis (b) (argmax vs llama.cpp top-1) is M32d.3 (next slice).
Tests
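`f_qw3_moe_parity_001_cosine_vs_hf_fp16` (`#[ignore]`, heavy):
- Skips with `eprintln` if no cached Qwen3-Coder GGUF or no FP16 fixture file is present (operator-confirm-gated; FP16 fixture is multi-GB).
- Loads fixture (`model_name`, `prompt`, `tokens`, `vocab_size`, `logits[151936]`, `argmax_token`) via `serde_json`.
- Asserts `vocab_size == 151936` and `logits.len() == 151936` to catch fixture drift.
- Loads GGUF via `MappedGGUFModel` + `OwnedQuantizedModel::from_mapped`, loads all 48 MoE layer descriptors, runs ONE forward pass on the fixture's prompt tokens.
- Computes `cosine_similarity(apr_logits, hf_fp16_logits)`.
- Asserts `cos_sim > 0.99` per `AC_QW3_MOE_005`.
- Reports per `FALSIFY-QW3-MOE-FORWARD-004` `if_fails` diagnostic order.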
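The fixture-drift guard described in the commit message (`vocab_size` and `logits.len()` both pinned to 151936, and a loader that returns `None` on a missing path) could look roughly like the sketch below. This is a std-only illustration: the `Fixture` field types, `check_fixture_shape`, and `read_fixture_text` are assumptions, and the JSON parsing that the real test does via `serde_json` is deliberately left out.

```rust
use std::path::Path;

/// Vocabulary size the fixture is pinned to, per the PR description.
const EXPECTED_VOCAB: usize = 151_936;

/// Hypothetical in-memory mirror of the fixture fields named in the PR.
/// Field types are assumptions; the real struct is deserialized from JSON.
#[allow(dead_code)]
struct Fixture {
    model_name: String,
    prompt: String,
    tokens: Vec<u32>,
    vocab_size: usize,
    logits: Vec<f32>,
    argmax_token: u32,
}

/// Reads the raw fixture text, returning None (no panic) when the file is
/// absent or unreadable. Parsing is out of scope for this sketch.
fn read_fixture_text(path: &Path) -> Option<String> {
    std::fs::read_to_string(path).ok()
}

/// Rejects fixtures whose declared or actual logit width has drifted.
fn check_fixture_shape(f: &Fixture) -> Result<(), String> {
    if f.vocab_size != EXPECTED_VOCAB {
        return Err(format!(
            "vocab_size drift: expected {EXPECTED_VOCAB}, got {}",
            f.vocab_size
        ));
    }
    if f.logits.len() != EXPECTED_VOCAB {
        return Err(format!(
            "logits length drift: expected {EXPECTED_VOCAB}, got {}",
            f.logits.len()
        ));
    }
    Ok(())
}
```

Checking both the declared `vocab_size` and the actual vector length catches a fixture that was regenerated against a different tokenizer as well as one that was truncated on disk.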
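Three sibling unit tests run in default CI (not `#[ignore]`):
- `fixture_loader_handles_missing_path`: `load_fixture` returns `None` on absent path (no panic).
- `cosine_similarity_unit_vectors`: parallel/orthogonal/anti-parallel unit-vector cases.
- `cosine_similarity_handles_zero_vector`: zero-vector edge case returns 0.0 (no NaN from divide-by-zero).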
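For reference, a cosine-similarity helper with the behavior the two `cosine_similarity_*` unit tests exercise (unit-vector cases, and 0.0 instead of NaN for a zero vector) might be written as follows. This is a sketch, not the code in `qwen3_moe_parity.rs`: the `f64` accumulation and the choice to return 0.0 on a length mismatch are assumptions made here.

```rust
/// Cosine similarity of two logit vectors.
/// Assumed conventions: returns 0.0 for empty input, mismatched lengths,
/// or a zero-norm vector, so the result is never NaN from a 0/0 division.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }
    // Accumulate in f64 to limit rounding error over ~150k elements.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 {
        return 0.0; // zero-vector edge case
    }
    (dot / denom) as f32
}
```

With this helper, parallel unit vectors give 1.0, orthogonal ones give 0.0, and anti-parallel ones give -1.0, which are the three cases `cosine_similarity_unit_vectors` is described as covering.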
Live verification
```
$ cargo test -p aprender-serve --test qwen3_moe_parity
running 4 tests
test f_qw3_moe_parity_001_cosine_vs_hf_fp16 ... ignored
test cosine_similarity_handles_zero_vector ... ok
test cosine_similarity_unit_vectors ... ok
test fixture_loader_handles_missing_path ... ok
test result: ok. 3 passed; 0 failed; 1 ignored
```
Why this is small
This PR is tight: 1 new test file (~250 LOC), no behavior change to any binary, no contract-rev (M32d.0 already shipped at v1.3.0). M32d.3 (`llama-cli` argmax sanity) and M32d.4 (DRAFT → ACTIVE_RUNTIME bump) follow.
Test plan
🤖 Generated with Claude Code