contract(qwen3-moe-forward-v1): v1.3.0 → v1.4.0 — M32d FUNCTIONAL DISCHARGE — DRAFT → ACTIVE_ALGORITHM_LEVEL #1409
Merged
…CHARGE

Status flips DRAFT → ACTIVE_ALGORITHM_LEVEL. M32d numerical parity is functionally discharged on aprender main as of PR #1228 squash 5235aae (2026-05-02 13:42 UTC).

Output transition on lambda-vector RTX 4090 against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf:

  pre-fix    "%%%%%%%%"             (gibberish, repeated argmax)
  + Step 5   "Human: What is 2+"    (coherent English, partial)
  + Step 5b  "Human: What is 2+2?"  (full prompt reproduced)
  + Step 6   "2 + 2 = 4"            (correct answer)

Multi-domain dogfood (math/geography/translation/code) all correct.

Why ACTIVE_ALGORITHM_LEVEL not ACTIVE_RUNTIME
==============================================

Per the v1.3.0 (M32d.0) parity-strategy decision, full ACTIVE_RUNTIME discharge requires:

  1. F-QW3-MOE-PARITY-001: cosine ≥ 0.99 vs HF FP16 reference logits
  2. F-QW3-MOE-PARITY-002: argmax matches llama.cpp top-1

#1 requires running scripts/generate_qwen3_moe_fp16_logits.py which is operator-confirm pending (~60 GB HF download + ~30 min on 30B-A3B multi-device offload).

ACTIVE_ALGORITHM_LEVEL is the right intermediate state: forward path is functionally correct (verified by output quality across diverse prompts), but the formal cosine-vs-HF gate hasn't fired yet.

Component priors verified empirically (M34 FAST PATH plan)
==========================================================

  rank-3 Q/K norm (15%)  FIXED #1228 Step 5
  rank-4 RoPE θ (10%)    FIXED #1228 Step 5b
  outside-priors         FIXED #1228 Step 6 (chat template wrapping)

The diagnostic surface from PRs #1222 (Step 2) + #1226 (Step 2.5) + #1401 (Step 2 JSON wire) named rank-3 directly via the 40× std-growth signature without needing the HF FP16 fixture. Step 1 of the original plan was bypassed.

M34 FAST PATH cost
==================

  Outcome         PRs    Wall-clock
  ACTUAL          5      ~6 hours
  Lucky estimate  4-6    2-3 days
  Realistic       8-10   4-6 days
  Pessimistic     12-15  1-2 weeks

Came in at the lucky-case bound.
Refs aprender PR #1228 commit 5235aae
Refs companion `paiml/claude-code-parity-apr` M35 status_history
Refs `project_m32d_discharge_2026_05_02.md` (memory)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
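
The two ACTIVE_RUNTIME gates named in the commit message can be illustrated with a minimal sketch. It assumes the logits for one token position are available as flat lists of floats and that the llama.cpp top-1 token id is already known. The function names and the dict return shape are hypothetical, not taken from the contract or from scripts/generate_qwen3_moe_fp16_logits.py.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def runtime_gates(
    ours: Sequence[float],
    hf_fp16: Sequence[float],
    llama_cpp_top1: int,
    min_cosine: float = 0.99,  # threshold quoted for F-QW3-MOE-PARITY-001
) -> Dict[str, bool]:
    """Evaluate both parity gates for a single token position.

    Hypothetical helper: the real gates may aggregate over many
    positions or prompts; this checks one position only.
    """
    return {
        "F-QW3-MOE-PARITY-001": cosine_similarity(ours, hf_fp16) >= min_cosine,
        "F-QW3-MOE-PARITY-002": argmax(ours) == llama_cpp_top1,
    }
```

Under this sketch, ACTIVE_RUNTIME would correspond to both entries being true. The first gate cannot be evaluated until the HF FP16 reference logits exist.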
TL;DR
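M32d numerical parity is functionally discharged on aprender main as of PR #1228 squash 5235aae (2026-05-02 13:42 UTC). Status flips DRAFT → ACTIVE_ALGORITHM_LEVEL.
Output transition
pre-fix: "%%%%%%%%" (gibberish, repeated argmax)
Step 5: "Human: What is 2+" (coherent English, partial)
Step 5b: "Human: What is 2+2?" (full prompt reproduced)
Step 6: "2 + 2 = 4" (correct answer)
Multi-domain dogfood (math/geography/translation/code) all correct on lambda-vector RTX 4090 against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf.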
Why ACTIVE_ALGORITHM_LEVEL not full ACTIVE_RUNTIME
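Per v1.3.0's parity strategy, ACTIVE_RUNTIME requires:
1. F-QW3-MOE-PARITY-001: cosine ≥ 0.99 vs HF FP16 reference logits
2. F-QW3-MOE-PARITY-002: argmax matches llama.cpp top-1
Item 1 requires running scripts/generate_qwen3_moe_fp16_logits.py, which is operator-confirm pending (~60 GB HF download + ~30 min on 30B-A3B). ACTIVE_ALGORITHM_LEVEL is the right intermediate state: the forward path is functionally correct, but the formal cosine-vs-HF gate hasn't fired yet.
Component priors verified
rank-3 Q/K norm (15%): FIXED in #1228 Step 5
rank-4 RoPE θ (10%): FIXED in #1228 Step 5b
outside-priors: FIXED in #1228 Step 6 (chat template wrapping)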
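
The commit message credits a 40× std-growth signature with naming the rank-3 (Q/K norm) prior without the HF FP16 fixture. The sketch below shows one way such a signature could be detected. It assumes the signature means a layer-to-layer jump in activation standard deviation, which the PR does not spell out. The function name, the input layout, and the use of consecutive layers are all hypothetical and are not the diagnostic shipped in #1222, #1226, or #1401.

```python
import statistics
from typing import Optional, Sequence


def first_std_blowup(
    layer_activations: Sequence[Sequence[float]],
    max_growth: float = 40.0,  # growth factor quoted in the commit message
) -> Optional[int]:
    """Return the index of the first layer whose activation std is at
    least `max_growth` times the previous layer's std, or None.

    Assumption: each inner sequence holds one layer's (non-empty)
    activations, and growth is measured between consecutive layers.
    """
    prev_std: Optional[float] = None
    for idx, acts in enumerate(layer_activations):
        std = statistics.pstdev(acts)  # population standard deviation
        if prev_std is not None and prev_std > 0.0 and std / prev_std >= max_growth:
            return idx
        prev_std = std
    return None
```

A healthy forward pass would return None under this check. A non-None index points at the layer where the activation scale first diverges, which narrows the search to the components feeding that layer.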
M34 FAST PATH cost
5 PRs in ~6 hours, against a lucky-case estimate of 4-6 PRs over 2-3 days.
Test plan
pv validate contracts/qwen3-moe-forward-v1.yaml: clean
Refs
paiml/claude-code-parity-apr M35 status_history (already merged)
project_m32d_discharge_2026_05_02.md
🤖 Generated with Claude Code