feat(apr-cli): apr trace --json --payload — wire JSON output for FAST PATH Step 2 exit criterion#1401

Merged
noahgift merged 2 commits into
mainfrom
feat/apr-trace-json-payload-output
May 2, 2026

@noahgift noahgift commented May 2, 2026

TL;DR

apr trace --json --payload <gguf> was silently ignoring --json when --payload was set. The FAST PATH Step 2 exit criterion in paiml/claude-code-parity-apr explicitly calls for JSON output. This PR wires up that JSON path, so the criterion is now mechanically satisfied.

Schema

{
  "format": "GGUF (qwen3moe)",
  "architecture": "qwen3moe",
  "num_layers": 48, "hidden_dim": 2048, "vocab_size": 151936,
  "prompt": "What is 2+2?",
  "encoded_tokens": [...],
  "embedding": { "min": ..., "max": ..., "mean": ..., "std_dev": ..., "count": 2048 },
  "layers": [
    { "layer_idx": 0,
      "attn_norm": {...}, "qkv": {...}, "attn_out": {...},
      "ffn_norm": {...}, "ffn_out": {...}, "output": {...} },
    /* 47 more */
  ],
  "final_norm": {...},
  "logits": { "vocab_size": 151936, "l2_norm": 1025.75, "top_k": [...] }
}

Live verification

$ apr trace --json --payload <17.3 GB Qwen3-Coder GGUF> 2>/dev/null | jq '
  {arch: .architecture, layers: (.layers | length),
   all_finite: ([.layers[].output.std_dev] | all(isfinite))}'
{
  "arch": "qwen3moe",
  "layers": 48,
  "all_finite": true
}

Implementation notes

  • New handle_special_modes_with_json — old handle_special_modes preserved as thin wrapper (existing test callers unbroken).
  • New run_traced_inference_json — mirrors text-mode trace but emits JSON via serde_json::to_string_pretty.
  • Skips the Model: ... / Contract: ... preamble that would break | jq consumers.

Hot-path safety

  • apr trace --payload (no --json): text mode UNCHANGED
  • apr trace --json (no --payload): static-layer JSON UNCHANGED
  • Only branches when both --json && --payload set
  • All 5 callers of handle_special_modes continue to work

What this PR does NOT ship

  • Custom prompt flag
  • Per-token-position trace (LAST token only, per existing semantics)
  • Sub-FFN MoE breakdown
  • SafeTensors JSON path (same encoding, different format string)

Test plan

  • cargo check -p apr-cli --features inference — clean
  • cargo clippy -p apr-cli --features inference --lib -- -D warnings — clean
  • cargo fmt -p apr-cli --check — clean
  • Live apr trace --json --payload <Qwen3-Coder GGUF> produces valid JSON, parses with python json.load + jq
  • All 48 transformer_block entries have finite L2 norms (FAST PATH Step 2 exit criterion)

Refs

🤖 Generated with Claude Code

… PATH Step 2 exit criterion

`apr trace --json --payload <gguf>` was silently ignoring `--json` when
`--payload` was set, falling back to the human-readable text format. The
FAST PATH Step 2 exit criterion in
paiml/claude-code-parity-apr docs/specifications/claude-code-parity-apr-poc.md
explicitly says:

  "apr trace --json --payload <gguf> --prompt 'What is 2+2?' returns
   non-null output_stats for every transformer_block_N entry, with
   finite L2 norms."

Now mechanically satisfied. Schema:

```jsonc
{
  "format": "GGUF (qwen3moe)",
  "architecture": "qwen3moe",
  "num_layers": 48, "hidden_dim": 2048, "vocab_size": 151936,
  "num_heads": 32, "num_kv_heads": 4,
  "prompt": "What is 2+2?",
  "encoded_tokens": [3838, 374, 220, 17, 10, 17, 30],
  "embedding": { "min": ..., "max": ..., "mean": ..., "std_dev": ...,
                 "nan_count": 0, "inf_count": 0, "zero_count": 0,
                 "count": 2048 },
  "layers": [
    { "layer_idx": 0,
      "attn_norm": {...stats...}, "qkv": {...stats...}, "attn_out": {...},
      "ffn_norm": {...}, "ffn_out": {...}, "output": {...} },
    /* 47 more, one per decoder layer */
  ],
  "final_norm": {...stats...},
  "logits_stats": {...stats...},
  "logits": {
    "vocab_size": 151936,
    "l2_norm": 1025.7530,
    "top_k": [{"token_id": 3555, "logit": 16.96}, ...]
  }
}
```

Implementation
==============

  - `handle_special_modes_with_json` (new) — JSON-aware variant of
    `handle_special_modes`. Old function preserved as a thin wrapper
    so existing test callers don't break.
  - `run_traced_inference_json` (new) — JSON output path. Mirrors
    `run_traced_inference_gguf` for trace computation but emits one
    JSON object via serde_json::to_string_pretty.
  - Skips the human-readable "Model: ..." / "Contract: ..." preamble
    that `resolve_model_path` + `preflight_contract_check` would print
    — those would break `apr trace --json --payload | jq` consumers.
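
To illustrate why the preamble has to go, here is a minimal Python sketch. The
captured strings are hypothetical stand-ins, not real `apr trace` output; the
point is only that any text ahead of the JSON object makes the whole stream
unparseable for strict consumers.

```python
import json

# Hypothetical stdout captures; the payload is a stand-in, not real trace output.
payload = json.dumps({"architecture": "qwen3moe", "num_layers": 48})
with_preamble = "Model: example.gguf\nContract: ok\n" + payload


def parses_as_json(text: str) -> bool:
    """Return True when the whole string is exactly one valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(payload))        # clean stdout
print(parses_as_json(with_preamble))  # preamble lines ahead of the object
```

The same failure mode applies to `jq`, which is why the JSON path keeps stdout
free of human-readable lines.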

Live verification on lambda-vector RTX 4090
============================================

  $ apr trace --json --payload ~/.cache/pacha/models/2b88b180a790988f.gguf \
      2>/dev/null | python3 -c '
  import sys, json, math
  d = json.load(sys.stdin)
  print("arch:", d["architecture"])
  print("num_layers:", d["num_layers"])
  print("layers in payload:", len(d["layers"]))
  # Require every layer output std_dev to be a real, finite number.
  all_finite = all(
      isinstance(la["output"]["std_dev"], (int, float))
      and math.isfinite(la["output"]["std_dev"])
      for la in d["layers"]
  )
  print("all 48 layers finite:", all_finite)
  '

  arch: qwen3moe
  num_layers: 48
  layers in payload: 48
  all 48 layers finite: True

Stdout is now strictly valid JSON; `2>/dev/null` discards BOS-FALLBACK
warnings on stderr.

What this PR does NOT ship
==========================

  - Custom prompt via `--prompt <str>` flag (test prompt is hardcoded
    "What is 2+2?"; matches text-mode default).
  - Per-token-position trace (only LAST token captured per the GGUF
    forward_traced + forward_qwen3_moe_traced semantics).
  - Sub-FFN MoE breakdown (router output, per-expert contribution) —
    those are zero in qwen3_moe traced forward; left for Step 4 work.
  - SafeTensors JSON output — same encoding works there, just the
    format string differs.

Hot-path safety
===============

  - Existing `apr trace --payload` (no --json) text mode unchanged.
  - Existing `apr trace --json` (no --payload) static-layer JSON mode
    unchanged — `handle_special_modes_with_json` only branches when
    BOTH json && payload are set.
  - All 5 callers of `handle_special_modes` continue to work
    (signature preserved via thin wrapper).
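
The branching rule above can be sketched as a small truth table. This is a
Python illustration only, not the Rust code in `handle_special_modes_with_json`;
the `TraceMode` names and `select_mode` are hypothetical, and the behaviour when
neither flag is set is an assumption.

```python
from enum import Enum


class TraceMode(Enum):
    STATIC_TEXT = "static text"          # neither flag (assumed default)
    STATIC_JSON = "static-layer JSON"    # --json only, unchanged
    PAYLOAD_TEXT = "payload text"        # --payload only, unchanged
    PAYLOAD_JSON = "payload JSON"        # both flags, the new path


def select_mode(json_flag: bool, payload_flag: bool) -> TraceMode:
    """Pick the output mode; only the both-flags case takes the new branch."""
    if json_flag and payload_flag:
        return TraceMode.PAYLOAD_JSON
    if payload_flag:
        return TraceMode.PAYLOAD_TEXT
    if json_flag:
        return TraceMode.STATIC_JSON
    return TraceMode.STATIC_TEXT


for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, "->", select_mode(*flags).value)
```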

Refs M32d Step 2 exit criterion (M34 FAST PATH plan)
Refs PR #1226 (Step 2.5: apr trace dispatch — wired qwen3_moe arch)
Refs PR #1222 (Step 2: forward_qwen3_moe_traced — supplies the data)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 2, 2026 10:02
Locks in the JSON output schema that satisfies the M34 FAST PATH Step 2
exit criterion. Asserts:

  - stdout is valid JSON (no preamble lines breaking jq consumers)
  - all 11 top-level fields present (format, architecture, num_layers,
    hidden_dim, vocab_size, prompt, encoded_tokens, embedding, layers,
    final_norm, logits)
  - 48 layers (Qwen3-Coder-30B-A3B-Instruct)
  - per-layer 7 fields present (layer_idx + 6 stat slots)
  - every layer.output.std_dev is finite
  - every layer.output.nan_count == 0 && inf_count == 0
  - logits.l2_norm is finite and > 0

Skipped when GGUF or apr binary absent (fixture-absent ≠ defect, per
M32c.2.2.2.1.4 convention).

Live PASS on lambda-vector RTX 4090 in 5.38s.
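
The assertions listed above can be approximated outside the Rust test suite. The
following is a hedged Python sketch, not the regression test itself: the
function names are hypothetical, and it assumes each stat slot carries the
`std_dev`, `nan_count`, and `inf_count` keys shown in the schema.

```python
import json
import math

TOP_LEVEL = ["format", "architecture", "num_layers", "hidden_dim", "vocab_size",
             "prompt", "encoded_tokens", "embedding", "layers", "final_norm",
             "logits"]
STAT_SLOTS = ["attn_norm", "qkv", "attn_out", "ffn_norm", "ffn_out", "output"]


def is_finite_number(value) -> bool:
    """True for real, finite ints/floats (None, bool, NaN and Inf are rejected)."""
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and math.isfinite(value))


def check_trace(stdout: str, expected_layers: int = 48) -> list:
    """Return a list of violations; an empty list means every check passed."""
    try:
        doc = json.loads(stdout)
    except json.JSONDecodeError as err:
        return [f"stdout is not valid JSON: {err}"]
    if not isinstance(doc, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing top-level field: {k}" for k in TOP_LEVEL if k not in doc]
    layers = doc.get("layers") or []
    if len(layers) != expected_layers:
        problems.append(f"expected {expected_layers} layers, found {len(layers)}")
    for pos, layer in enumerate(layers):
        for key in ["layer_idx"] + STAT_SLOTS:
            if key not in layer:
                problems.append(f"layer {pos}: missing {key}")
        out = layer.get("output") or {}
        if not is_finite_number(out.get("std_dev")):
            problems.append(f"layer {pos}: output.std_dev is not finite")
        if out.get("nan_count") != 0 or out.get("inf_count") != 0:
            problems.append(f"layer {pos}: output has NaN or Inf entries")
    l2 = (doc.get("logits") or {}).get("l2_norm")
    if not (is_finite_number(l2) and l2 > 0):
        problems.append("logits.l2_norm is not a finite positive number")
    return problems
```

A caller would pass the captured stdout of `apr trace --json --payload` and
treat a non-empty result as a failure.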

Refs M32d Step 2 exit criterion (M34 FAST PATH plan)
Refs the JSON schema in this PR's main commit

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit be396c6 into main May 2, 2026
10 checks passed
@noahgift noahgift deleted the feat/apr-trace-json-payload-output branch May 2, 2026 11:58
noahgift added a commit that referenced this pull request May 2, 2026
…e_theta + chat template

Squashes 4 substantive M32d FAST PATH fixes (Step 5 + 5b + 6 + 7) +
regression test + evidence into a single commit on top of fresh main.
Replaces the original messy stacked-PR chain that conflicted on rebase
after sibling PRs (#1401, #1405) landed.

Live verification on lambda-vector RTX 4090 (post-rebuild):

  $ apr run <Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf> \
       --prompt "What is 2+2?" --max-tokens 8
  Output: 2 + 2 = 4
  Completed in 40.24s (cached)

Step 5 — per-head Q/K RMSNorm in forward_qwen3_moe (rank-3 prior, 15%)
====================================================================

Qwen3 GH-279 per-head Q/K RMSNorm was wired into the dense path
(adaptive_ffn.rs:174-179) but missing from forward_qwen3_moe.rs. Now
applied AFTER bias, BEFORE RoPE — same code as adaptive_ffn.

Pre-fix: layer std-dev grew 40× over 48 layers (signature of attention
scores compounding without per-head Q/K norm). Output `%%%%%%%%`.
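
As a rough illustration of what per-head Q/K RMSNorm does, here is a minimal
Python sketch. It is not the code in forward_qwen3_moe.rs: the function names,
the `eps` value, and the toy dimensions are assumptions, and it assumes one
norm weight vector of length `head_dim` shared by all heads.

```python
import math


def rms_norm(vec, weight, eps=1e-6):
    """Scale vec to unit root-mean-square, then apply the learned weight."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale * w for x, w in zip(vec, weight)]


def per_head_rms_norm(flat, num_heads, head_dim, weight):
    """Normalize each head's slice independently.

    In the ordering the commit describes, this runs after the bias is added
    and before RoPE is applied.
    """
    assert len(flat) == num_heads * head_dim
    out = []
    for h in range(num_heads):
        head = flat[h * head_dim:(h + 1) * head_dim]
        out.extend(rms_norm(head, weight))
    return out


# Two toy heads with the same direction but a 10x difference in magnitude.
q = [3.0, 4.0, 30.0, 40.0]
normed = per_head_rms_norm(q, num_heads=2, head_dim=2, weight=[1.0, 1.0])
print(normed)
```

Both heads come out with nearly identical values despite the 10× gap in input
magnitude, which is the property that keeps attention scores from compounding
across layers.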

Step 5b — rope_theta default 10K → 1M for qwen3_moe (rank-4 prior, 10%)
=======================================================================

GGUF for Qwen3-Coder-30B-A3B-Instruct-Q4_K_M ships WITHOUT
`qwen3moe.rope.freq_base` metadata. config.rs's default lookup had
`"qwen2" | "qwen3" => 1_000_000.0` but no qwen3_moe entry — fell to
catch-all 10K. Off by 100×. Added qwen3_moe to the 1M arm.
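
To make the impact of the wrong default concrete, here is a small Python sketch
of the lookup and of the standard RoPE inverse-frequency formula. The function
names are hypothetical, the arm contents follow the description above, and
`HEAD_DIM = 128` is an assumption for illustration.

```python
def default_rope_theta(arch: str) -> float:
    """Fallback used when the GGUF carries no rope.freq_base metadata."""
    if arch in ("qwen2", "qwen3", "qwen3_moe"):  # qwen3_moe is the added entry
        return 1_000_000.0
    return 10_000.0  # catch-all


def rope_inv_freq(theta: float, head_dim: int) -> list:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim)."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # assumed for illustration

wrong = rope_inv_freq(10_000.0, HEAD_DIM)
right = rope_inv_freq(default_rope_theta("qwen3_moe"), HEAD_DIM)

# The fastest-rotating pair is identical; the slowest pair differs by
# almost two orders of magnitude, so long-range positions are mis-encoded.
print("first pair :", wrong[0], right[0])
print("last pair  :", wrong[-1], right[-1])
print("ratio      :", wrong[-1] / right[-1])
```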

Step 6 — chat template (qwen3_moe → ChatML, no <think>)
========================================================

`detect_format_from_name` routed any "qwen3" name to Qwen3NoThink
(PMAT-181), which pre-injects empty `<think>\n</think>\n` into the
assistant turn. Qwen3-Coder does NOT have thinking mode (verified via
the Jinja `tokenizer.chat_template` in the GGUF) — empty think block
caused the model to emit `<|endoftext|>` immediately. Route qwen3_moe
to plain ChatML before the generic qwen3 → NoThink rule. PMAT-181
preserved for thinking-mode dense Qwen3.
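
The routing-order fix can be sketched in Python as follows. This is an
illustration under assumptions: `pick_template` and its architecture strings are
hypothetical (the real `detect_format_from_name` matches on model names), and
the think block text is copied from the description above.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def chatml_prompt(user_text: str) -> str:
    """Plain ChatML: the assistant turn is left open with nothing injected."""
    return f"{IM_START}user\n{user_text}{IM_END}\n{IM_START}assistant\n"


def no_think_prompt(user_text: str) -> str:
    """ChatML plus an empty think block pre-injected into the assistant turn."""
    return chatml_prompt(user_text) + "<think>\n</think>\n"


def pick_template(arch: str):
    """The specific qwen3_moe rule must be tested before the generic qwen3 rule."""
    if arch == "qwen3_moe":
        return chatml_prompt
    if arch.startswith("qwen3"):
        return no_think_prompt
    return chatml_prompt


print(repr(pick_template("qwen3_moe")("What is 2+2?")))
print(repr(pick_template("qwen3")("What is 2+2?")))
```

Swapping the first two rules would send qwen3_moe through the generic prefix
match again, which is the routing the fix replaces.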

Step 7 — sync forward_qwen3_moe_traced with Step 5 Q/K norm
============================================================

forward_qwen3_moe_traced (created in PR #1222 on main) was authored
mirroring the OLD pre-Q/K-norm forward_qwen3_moe. Without sync, `apr
trace --payload` shows DIFFERENT numerics from `apr run` — silent
diagnostic-vs-production drift. Mirror the same Q/K norm into the
traced variant.

Component priors discharge status (M34 FAST PATH)

  | Rank | Component | Prior | Status              |
  |------|-----------|-------|---------------------|
  | 1    | LAYOUT    | 30%   | not at issue        |
  | 2    | Q4_K_M    | 20%   | not at issue        |
  | 3    | Q/K norm  | 15%   | FIXED (this commit) |
  | 4    | RoPE θ    | 10%   | FIXED (this commit) |
  | 5    | router sm | 10%   | not at issue        |
  | 6    | token emb | 10%   | not at issue        |
  | 7    | other     | 5%    | n/a                 |
  | n/a  | chat tpl  | n/a   | FIXED (this commit) |

Output transition

  pre-fix         → "%%%%%%%%"               (gibberish)
  + Step 5        → "Human: What is 2+"      (coherent English, partial)
  + Step 5b       → "Human: What is 2+2?"    (full prompt reproduced)
  + Step 6        → "2 + 2 = 4"              (correct answer)
  + Step 7        → diagnostic trace matches production

Multi-domain verification (also passes):
  "Capital of France:"   → "The capital of France is Paris."
  "Translate to Spanish: Hello world" → "¡Hola mundo!"
  "Count to 5:"          → "1, 2, 3, 4, 5"
  "Solve x^2 - 5x + 6 = 0:" → "I need to solve the quadratic equation x² - 5x + 6 = 0..."

Hot-path safety

  - Production text-generation path (`apr run` → run_qwen3_moe_generate
    → forward_qwen3_moe) now applies the norm.
  - `apr trace --payload` (forward_qwen3_moe_traced) syncs the same fix.
  - Sibling tests pass unchanged.
  - `forward_qwen3_moe_traced` reads `self.config.rope_theta` which is set
    at model load from the default lookup — Step 5b auto-applies via config.
  - Dense Qwen3 path UNCHANGED (Qwen3NoThink preserved for thinking-mode
    variants).

Regression test

  `crates/aprender-serve/tests/qwen3_moe_qk_norm_regression.rs`
  F-QW3-MOE-STEP5-001 asserts the context-awareness invariant: two
  distinct prompts must produce distinct argmax tokens, top-2 logit
  gap < 50.

  Live PASS on lambda-vector RTX 4090 in 6.60s.
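
  The invariant can be sketched in Python as below. This is not the Rust test:
  the function names and toy logits are hypothetical, and applying the gap
  bound to each prompt separately is an assumption about how the test reads it.

```python
def top2(logits):
    """Return (argmax index, gap between the best and second-best logit)."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[0], logits[order[0]] - logits[order[1]]


def context_aware(logits_a, logits_b, max_gap=50.0):
    """Distinct prompts must yield distinct argmax tokens with sane top-2 gaps."""
    arg_a, gap_a = top2(logits_a)
    arg_b, gap_b = top2(logits_b)
    return arg_a != arg_b and gap_a < max_gap and gap_b < max_gap


# Toy logits standing in for the final-position outputs of two prompts.
healthy_a = [1.0, 9.5, 3.0, 2.0]
healthy_b = [8.0, 1.0, 2.5, 0.5]
# A collapsed model picks the same token regardless of the prompt.
collapsed = [0.0, 0.0, 120.0, 0.0]

print(context_aware(healthy_a, healthy_b))
print(context_aware(collapsed, collapsed))
```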

Stack research

  - HuggingFace transformers Qwen3MoeForCausalLM applies per-head
    q_norm/k_norm in Qwen3MoeAttention.forward
  - llama.cpp ggml_qwen3_moe_kv_norm in llama-arch.cpp does the same
    (attn_q_norm.weight / attn_k_norm.weight)
  - HF Qwen3MoeConfig.rope_theta default = 1_000_000.0
  - Qwen3-Coder Jinja chat_template generation prompt is plain
    `<|im_start|>assistant\n` (no thinking)

Refs M32d FAST PATH plan (M34, paiml/claude-code-parity-apr)
Refs GH-279 (Qwen3 per-head Q/K RMSNorm)
Refs PMAT-181 (Qwen3NoThink preserved for thinking variants)
Refs FALSIFY-QW3-MOE-FORWARD-003

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 2, 2026
…e_theta + chat template (#1228)

noahgift added a commit that referenced this pull request May 2, 2026
…d discharge audit-trail bump (#1078)

Source-of-truth bytes pushed by the companion repo. M22 paired-mirror
guard via pin.lock (sha256 byte-identity, will be refreshed in companion
PR).
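
A byte-identity check of this kind can be sketched in a few lines of Python.
The layout of pin.lock is not shown here, so the sketch assumes only that a
hex SHA-256 digest is pinned somewhere; `mirror_matches` and the sample bytes
are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def mirror_matches(pinned_digest: str, mirrored_bytes: bytes) -> bool:
    """True when the mirrored copy is byte-identical to the pinned source."""
    return sha256_hex(mirrored_bytes) == pinned_digest.strip().lower()


source = b"version: 1.23.0\n"       # stand-in for the source-of-truth bytes
pinned = sha256_hex(source)          # what a lock file would record

print(mirror_matches(pinned, b"version: 1.23.0\n"))    # identical bytes
print(mirror_matches(pinned, b"version: 1.23.0\r\n"))  # one byte differs
```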

Net change: bumps top-level contract YAML from v1.22.0 to v1.23.0 with
one new status_history entry (M35) recording M32d's functional discharge
on aprender main as commit 5235aae (#1228 squash).

What M35 records
================

  M32d numerical-parity bundle landed across multiple aprender PRs:
    #1222 (Step 2)        forward_qwen3_moe_traced diagnostic surface
    #1226 (Step 2.5)      `apr trace --payload` qwen3_moe dispatch
                          (squashed into #1222)
    #1242                 RUSTSEC-2026-0114 audit unblocker
    #1401 (Step 2 JSON)   `apr trace --json --payload` JSON output
                          (FAST PATH Step 2 exit-criterion shape)
    #1228 (THE BUNDLE)    Step 5 + 5b + 6 + 7 + regression test +
                          evidence — squashed into one commit on main:
                          - per-head Q/K RMSNorm in
                            forward_qwen3_moe (rank-3 prior, 15%)
                          - rope_theta 10K → 1M for qwen3_moe (rank-4
                            prior, 10%)
                          - chat template: qwen3_moe → ChatML
                            (no `<think>` injection)
                          - sync forward_qwen3_moe_traced with Step 5
                          - F-QW3-MOE-STEP5-001 regression test
                          - evidence/m32d-discharge-2026-05-01/

Live evidence on lambda-vector RTX 4090 against the 17.3 GB
Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf:

  $ apr run --prompt "What is 2+2?" --max-tokens 8
  Output: 2 + 2 = 4

  $ apr run --prompt "Capital of France:" --max-tokens 30
  Output: The capital of France is Paris.

  $ apr run --prompt "Translate to Spanish: Hello world" --max-tokens 30
  Output: ¡Hola mundo!

  $ apr run --prompt "Solve x^2 - 5x + 6 = 0:" --max-tokens 30
  Output: I need to solve the quadratic equation x² - 5x + 6 = 0.
          I can solve this by factoring.

Output transition timeline:
  pre-fix         "%%%%%%%%"
  + Step 5        "Human: What is 2+"
  + Step 5b       "Human: What is 2+2?"
  + Step 6        "2 + 2 = 4"

M34 FAST PATH actual cost: 5 PRs / ~6 hours wall — **lucky-case bound**
of the 4-6 PRs / 2-3 days estimate.

What M35 does NOT discharge
============================

  - Cosine vs HF FP16 measurement (operator-confirm — ~60 GB download).
    The formal flip of `qwen3-moe-forward-v1` v1.3.0 DRAFT → v1.4.0
    ACTIVE_RUNTIME waits on that measurement.
  - GPU MoE path (no forward_qwen3_moe_gpu; CUDA/wgpu kernels TBD).
  - Other Qwen3-MoE variants.

Refs aprender commit 5235aae (#1228)
Refs companion M34 (v1.21.0 → v1.22.0 plan)
Refs PMAT-CCPA-PARITY-001
Refs M22 paired-mirror invariant

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 2, 2026
…CHARGE (#1409)

Status flips DRAFT → ACTIVE_ALGORITHM_LEVEL.

M32d numerical parity is functionally discharged on aprender main as of
PR #1228 squash 5235aae (2026-05-02 13:42 UTC). Output transition on
lambda-vector RTX 4090 against the cached 17.3 GB Qwen3-Coder-30B-A3B-
Instruct-Q4_K_M.gguf:

  pre-fix         "%%%%%%%%"               (gibberish, repeated argmax)
  + Step 5        "Human: What is 2+"      (coherent English, partial)
  + Step 5b       "Human: What is 2+2?"    (full prompt reproduced)
  + Step 6        "2 + 2 = 4"              (correct answer)

Multi-domain dogfood (math/geography/translation/code) all correct.

Why ACTIVE_ALGORITHM_LEVEL not ACTIVE_RUNTIME
==============================================

Per the v1.3.0 (M32d.0) parity-strategy decision, full ACTIVE_RUNTIME
discharge requires:
  1. F-QW3-MOE-PARITY-001: cosine ≥ 0.99 vs HF FP16 reference logits
  2. F-QW3-MOE-PARITY-002: argmax matches llama.cpp top-1

Gate 1 requires running scripts/generate_qwen3_moe_fp16_logits.py, which is
pending operator confirmation (~60 GB HF download + ~30 min on 30B-A3B
multi-device offload).

ACTIVE_ALGORITHM_LEVEL is the right intermediate state: forward path is
functionally correct (verified by output quality across diverse
prompts), but the formal cosine-vs-HF gate hasn't fired yet.
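
For reference, the two outstanding gates reduce to simple vector checks. The
Python sketch below is an illustration only: `parity_gates` and the toy logits
are hypothetical, and real runs would compare full-vocabulary logit vectors
from the HF FP16 reference and llama.cpp.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


def parity_gates(ours, hf_ref, llama_cpp_top1, min_cos=0.99):
    """Evaluate both gates named above for one prompt."""
    return {
        "F-QW3-MOE-PARITY-001": cosine(ours, hf_ref) >= min_cos,
        "F-QW3-MOE-PARITY-002": argmax(ours) == llama_cpp_top1,
    }


ours = [0.5, 2.0, 16.9, 1.0]     # toy logits from the implementation under test
hf_ref = [0.6, 2.1, 17.0, 0.9]   # toy FP16 reference logits
print(parity_gates(ours, hf_ref, llama_cpp_top1=2))
```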

Component priors verified empirically (M34 FAST PATH plan)
==========================================================

  rank-3 Q/K norm (15%)      FIXED #1228 Step 5
  rank-4 RoPE θ (10%)        FIXED #1228 Step 5b
  outside-priors             FIXED #1228 Step 6 (chat template wrapping)

The diagnostic surface from PRs #1222 (Step 2) + #1226 (Step 2.5) +
#1401 (Step 2 JSON wire) named rank-3 directly via the 40× std-growth
signature without needing the HF FP16 fixture. Step 1 of the original
plan was bypassed.

M34 FAST PATH cost
==================

  Outcome          PRs     Wall-clock
  ACTUAL           5       ~6 hours
  Lucky estimate   4-6     2-3 days
  Realistic        8-10    4-6 days
  Pessimistic      12-15   1-2 weeks

Came in at the lucky-case bound.

Refs aprender PR #1228 commit 5235aae
Refs companion `paiml/claude-code-parity-apr` M35 status_history
Refs `project_m32d_discharge_2026_05_02.md` (memory)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>