contract(qwen3-moe-forward-v1): v1.4.0 → v1.5.0 ACTIVE_RUNTIME: F-QW3-MOE-PARITY-001 DISCHARGED (cos 0.995384) #1597
Merged
…3-MOE-PARITY-001 DISCHARGED

Closes #1584.

## Discharge summary

Live test result on lambda-vector RTX 4090 (2026-05-09), via `cargo test --release -p aprender-serve --test qwen3_moe_parity -- --include-ignored f_qw3_moe_parity_001_cosine_vs_hf_fp16`:

- `F-QW3-MOE-PARITY-001: elapsed = 555.52868ms`
- `cos_sim = 0.995384`
- `threshold = 0.99`
- `apr_argmax = 3555 (val = 22.3671)`
- `hf_argmax = 3555 (" What")`
- `test result: ok. 4 passed; 0 failed; 0 ignored`

cos_sim 0.995384 ≥ 0.99, a margin of 0.0054. apr_argmax = hf_argmax = 3555 (exact agreement, so no near-tie hypothesis is needed).

## Five-whys for "60 GB HF download was stale claim"

(1) v1.4.0 (2026-05-02) elevated R9 to "operator-confirm pending ~60 GB HF download". Issue #1584 (filed 2026-05-09 from the companion repo) inherited the stale claim.
(2) Companion M109 (2026-05-09) discovered that the FP16 weights had been on lambda-vector at /mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/ (57 GB across 16 safetensors shards, Mar 8 timestamps) for ~62 days.
(3) The "operator-confirm pending X" claim aged silently because no detector class checks "is the X-pending claim still mechanically true?" The companion's 12-assert detector covers M-count / gate-count / contract-version / fixture-count / status-anchor consistency, but not the filesystem state behind a pending claim.
(4) Future M110-class kaizen on the companion side: a detector for mechanically checkable "pending" claims.
(5) Root cause: a kaizen blind spot for filesystem-state drift. M109 leaves M110 as a future detector extension.
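The pass criterion reported in the discharge summary (cosine similarity between aprender's and HF's end-of-prompt logits at or above 0.99, with each side's argmax reported alongside) can be sketched as follows. This is a minimal illustration, not the code in the `qwen3_moe_parity` test: `cosine_similarity`, `argmax`, `ParityReport`, and `check_parity` are hypothetical names, and the sketch assumes the cosine threshold alone decides pass/fail.

```rust
/// Cosine similarity of two equal-length logit vectors. Sums are accumulated
/// in f64 because the real vectors have ~150k entries.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "logit vectors must have the same length");
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    // A zero-norm vector yields NaN, which fails any `>=` threshold check.
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

/// Index of the largest logit; the first index wins on exact ties.
fn argmax(v: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in v.iter().enumerate() {
        if x > v[best] {
            best = i;
        }
    }
    best
}

/// Hypothetical report mirroring the fields printed by the test above.
struct ParityReport {
    cos_sim: f64,
    apr_argmax: usize,
    hf_argmax: usize,
    pass: bool,
}

fn check_parity(apr: &[f32], hf: &[f32], threshold: f64) -> ParityReport {
    let cos_sim = cosine_similarity(apr, hf);
    ParityReport {
        cos_sim,
        apr_argmax: argmax(apr),
        hf_argmax: argmax(hf),
        // Assumption: only the cosine threshold gates pass/fail here;
        // argmax agreement is reported alongside it.
        pass: cos_sim >= threshold,
    }
}

fn main() {
    // Toy 4-entry "vocabulary"; the committed fixture has 151936 logits.
    let apr = [0.1f32, 2.0, 22.3, -1.0];
    let hf = [0.0f32, 2.1, 22.4, -0.9];
    let r = check_parity(&apr, &hf, 0.99);
    println!(
        "cos_sim = {:.6} apr_argmax = {} hf_argmax = {} pass = {}",
        r.cos_sim, r.apr_argmax, r.hf_argmax, r.pass
    );
}
```

Accumulating in f64 keeps rounding error in the ~150k-term sums far below the ~0.005 margin that separates the measured 0.995384 from the 0.99 threshold.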
## Edits

- `contracts/qwen3-moe-forward-v1.yaml`:
  - metadata.version: 1.4.0 → 1.5.0
  - top-level version: "1.4.0" → "1.5.0"
  - status: ACTIVE_ALGORITHM_LEVEL → ACTIVE_RUNTIME (with a new comment block citing the M109 discharge evidence)
  - amendment_history: prepended v1.5.0 entry with full discharge evidence, reproducibility recipe, and cross-repo refs
  - implementation_stages.M32d: PENDING → DISCHARGED, with discharge evidence in the description
- `crates/aprender-serve/tests/fixtures/qwen3_moe_fp16_logits_pos0.json`: NEW (2.06 MiB). 151936-dim FP32 logit vector for the prompt "What is 2+2?" at position 6 (end of prompt). Generated 2026-05-09 in 52 s wall time via `uv run --with torch --with transformers --with accelerate scripts/generate_qwen3_moe_fp16_logits.py --model /mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct --output crates/aprender-serve/tests/fixtures/...`. Committed verbatim per the script docstring ("this fixture is captured once and committed"), which makes the discharge reproducible.

## F-QW3-MOE-PARITY-002 sibling status

Deferred. CPU-only `llama-cli` on Qwen3-Coder-30B-A3B-Instruct hung at 99.9% single-CPU for 2 hours without producing output, even with the `-ngl 999` GPU-offload flag (suspected cause: MoE expert dispatch is CPU-bound in llama.cpp build #7746, an upstream issue, not aprender-side). Not load-bearing for the ACTIVE_RUNTIME flip: axis (a) directly proved apr_argmax = hf_argmax = 3555. The test stays `#[ignore]`-gated so it can provide regression coverage once llama.cpp's MoE-on-CPU performance improves.
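The reasoning for flipping to ACTIVE_RUNTIME while the llama.cpp sibling is deferred can be written as a small decision rule. This is a hypothetical sketch: `Outcome`, `ContractStatus`, and `next_status` are illustrative names, not types from aprender or the `pv` tool, and treating a recorded F-QW3-MOE-PARITY-002 failure as blocking is an assumption the text above does not state.

```rust
/// Outcome of one falsification test (hypothetical model, not aprender's types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Pass,
    Fail,
    Deferred,
}

#[derive(Debug, PartialEq)]
enum ContractStatus {
    ActiveAlgorithmLevel,
    ActiveRuntime,
}

/// Assumed flip rule: F-QW3-MOE-PARITY-001 (HF FP16 parity) is load-bearing
/// and must pass; the llama.cpp sibling F-QW3-MOE-PARITY-002 may be
/// deferred, but (by assumption) a recorded failure keeps the old status.
fn next_status(parity_001: Outcome, parity_002: Outcome) -> ContractStatus {
    match (parity_001, parity_002) {
        (Outcome::Pass, Outcome::Pass | Outcome::Deferred) => ContractStatus::ActiveRuntime,
        _ => ContractStatus::ActiveAlgorithmLevel,
    }
}

fn main() {
    // The v1.5.0 situation described above: 001 passed, 002 deferred.
    println!("{:?}", next_status(Outcome::Pass, Outcome::Deferred));
    // Under the assumed rule, a failing sibling blocks the flip.
    println!("{:?}", next_status(Outcome::Pass, Outcome::Fail));
}
```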
## Cross-repo refs

- Companion-repo M109 milestone: paiml/claude-code-parity-apr#95, squash 9c2833334 (2026-05-09T15:02:33Z)
- Companion-repo M108 ticketing: paiml/claude-code-parity-apr#94 (filed #1584 with the stale "60 GB pending" claim; corrected by M109 the same day)

## Verification

- `pv validate contracts/qwen3-moe-forward-v1.yaml`: 0 errors / 0 warnings
- F-QW3-MOE-PARITY-001 cosine test: PASS at cos 0.995384
- HF FP16 fixture: 2.06 MiB committed at the canonical path
- apr forward: 555 ms on a 7-token prefill (no regression vs v1.4.0)

Refs PMAT-CODE-QWEN3-MOE-PARITY-FLIP-001.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 10, 2026
… main + M109 discharge integrated (#1613)

## Summary

Authors `contracts/claude-code-parity-apr-v1.yaml` v1.24.0 directly on aprender main as the FIRST canonical landing of this contract. Replaces the closed PR #1078 (M0 mirror, closed 2026-05-10 due to a workspace-test failure on the rebased branch unrelated to contract content).

v1.24.0 amendments to the v1.23.0 baseline:

1. Status prose at line 67: "Cosine vs HF FP16 ... operator-confirm pending ~60 GB HF download" → "DISCHARGED 2026-05-09 at companion-repo M109 (cos_sim 0.995384, lambda-vector RTX 4090)".
2. "What is NOT in this discharge" list item at line 808: the cosine measurement is now DISCHARGED, with cross-references to aprender PR #1597 squash 3fb04ef (v1.4.0 → v1.5.0 ACTIVE_RUNTIME flip).
3. Inline narrative at line 888: the "~60 GB HF download" claim is annotated as stale by ~62 days, the length of time the FP16 weights had already been on lambda-vector at /mnt/nvme-raid0/models/.
4. New v1.23.0 → v1.24.0 status_history entry recording the discharge evidence.

## Why this lived as "PR-pinned canonical" until v1.24.0

The v1.23.0 contract was authored on aprender PR #1078 (M0 mirror PR, never merged to main). Companion-repo M130 identified that the contract did NOT exist on aprender main, only on PR #1078's feature branch. PR #1078 was closed 2026-05-10 (companion-repo M131) due to a workspace-test failure unrelated to contract content (`agent::auto_memory::tests::root_uses_config_dir_when_env_unset`, a pre-existing aprender-side flake on the rebased state). v1.24.0 is authored fresh from aprender main, removing the "PR-pinned canonical" anomaly.

## Companion-repo follow-up

After this PR merges, the companion repo will refresh `contracts/pin.lock` with the squash commit hash + content sha256 and execute the M22 5-step ritual (4 cross-reference surface bumps + a new M-row).
## Verification

- `pv validate contracts/claude-code-parity-apr-v1.yaml` → 0 errors, 0 warnings
- Contract is byte-identical to the companion's v1.23.0 except for the v1.24.0 amendments listed above

No falsification gates added or modified. 13/13 gates remain green; 30/30 fixtures remain at aggregate parity 1.0000.

Refs PMAT-037, paiml/claude-code-parity-apr#117, #1597 (M109 discharge).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Promotes `qwen3-moe-forward-v1` v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME based on the live discharge of FALSIFY-QW3-MOE-FORWARD-004 axis (a), measured 2026-05-09 on lambda-vector RTX 4090.

Closes #1584.
Live discharge evidence
What lands
- `contracts/qwen3-moe-forward-v1.yaml`: `amendment_history` entry; M32d implementation_stage PENDING → DISCHARGED
- `crates/aprender-serve/tests/fixtures/qwen3_moe_fp16_logits_pos0.json`

Five-whys for "60 GB HF download was a stale claim"
(Full text in commit body)
The "operator-confirm pending ~60 GB HF download" claim in v1.4.0 / R9 / #1584 was stale by ~62 days: the FP16 weights had been on lambda-vector at `/mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/` since well before v1.4.0 was authored. Companion-repo M109 (paiml/claude-code-parity-apr#95) discovered this simply via `find /mnt -name "*Qwen3-Coder-30B*"` at the start of the discharge session.

Future M110-class kaizen opportunity in the companion repo: a detector for mechanically checkable "pending" claims (the existing 12-assert detector doesn't query filesystem state).
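The proposed M110-class detector could start as simply as checking whether the artifact a "pending" claim is waiting for already exists, which is the same fact the one-line `find` surfaced. A minimal sketch, assuming each pending claim can be mapped to a single filesystem path; `PendingClaim` and `stale_claims` are hypothetical names and do not describe the companion repo's existing 12-assert detector.

```rust
use std::path::PathBuf;

/// A "pending" claim that can be checked mechanically: it asserts that an
/// artifact is still missing, so it is stale as soon as the path exists.
/// (Hypothetical type; not the companion repo's actual detector.)
struct PendingClaim {
    id: &'static str,
    awaited_path: PathBuf,
}

/// Ids of claims whose awaited artifact is already present on disk.
fn stale_claims(claims: &[PendingClaim]) -> Vec<&'static str> {
    claims
        .iter()
        .filter(|c| c.awaited_path.exists())
        .map(|c| c.id)
        .collect()
}

fn main() {
    let claims = vec![PendingClaim {
        id: "R9: operator-confirm pending ~60 GB HF download",
        awaited_path: PathBuf::from("/mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct"),
    }];
    let stale = stale_claims(&claims);
    if stale.is_empty() {
        println!("no stale pending claims");
    }
    for id in stale {
        println!("STALE: {id}");
    }
}
```

Run on lambda-vector at any point after Mar 8, a check like this would have flagged R9, because the model directory it was waiting for already existed.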
F-QW3-MOE-PARITY-002 sibling status
Deferred. CPU-only `llama-cli` on Qwen3-Coder-30B-A3B-Instruct hung at 99.9% single-CPU for 2 hours without producing output, even with the `-ngl 999` GPU-offload flag (suspected cause: MoE expert dispatch is CPU-bound in llama.cpp build #7746, an upstream issue). Not load-bearing for the ACTIVE_RUNTIME flip because axis (a) directly proved apr_argmax = hf_argmax. The test stays `#[ignore]`-gated for regression coverage.

Cross-repo refs
`9c2833334` (2026-05-09T15:02:33Z)

Test plan
- `pv validate contracts/qwen3-moe-forward-v1.yaml`: 0 errors / 0 warnings

🤖 Generated with Claude Code