docs(p3): apr-cli-trace-save-tensor-v1 — contract that unblocks SHIP-007 layer-0 bisection (ships MODEL-1) by noahgift · Pull Request #1102 · paiml/aprender

noahgift · 2026-04-28T09:14:30Z

Summary

SHIP-007's hypothesis space has been narrowed by five hypotheses falsified during this session (§28 matmul kernel, §28.4(a) q4k_layers populated, §31 qkv_bias values, §32 layer-3 weights byte-identical, #1101 parallel-reduction nondeterminism). The remaining bug surface is per-element divergence at some specific stage of the layer-0 forward pass.

Aggregate statistics, already emitted by `apr trace --payload`, are insufficient: two tensors can have near-identical mean and std while individual elements drift apart. This contract defines the missing infrastructure for capturing raw tensors at each forward-pass stage.
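To make the point concrete: two tensors can share exactly the same mean and standard deviation while disagreeing element by element. The sketch below is plain Rust with no aprender APIs (`mean_std` and `max_abs_diff` are illustrative helper names). It compares a vector with its reversal: the aggregates match, but the largest element-wise difference is large.

```rust
/// Mean and population standard deviation of a slice.
fn mean_std(x: &[f32]) -> (f32, f32) {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    (mean, var.sqrt())
}

/// Largest element-wise absolute difference between two equal-length slices.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "tensors must have the same length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

fn main() {
    let apr = [1.0_f32, 2.0, 3.0, 4.0];
    let gguf = [4.0_f32, 3.0, 2.0, 1.0]; // same values, different positions
    println!("apr  mean/std = {:?}", mean_std(&apr));
    println!("gguf mean/std = {:?}", mean_std(&gguf));
    println!("max |apr - gguf| = {}", max_abs_diff(&apr, &gguf));
}
```

Only a per-element comparison exposes the drift, which is why raw captures are needed rather than more summary statistics.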

Linkage to shipping MODEL-1

paiml/qwen2.5-coder-7b-apache-q4k-v1 is published but blocked on five PARTIAL gates (SHIP-002/005/006/007/008), all of which depend on the SHIP-007 fix. Once this contract's implementation lands, the layer-0 bisection should complete in one debug session. The steps are: run save-tensor against both the APR and GGUF formats, run apr diff at each of the 19 stages, and identify the first divergent stage as the actual bug surface. Fixing that root cause flips the 5 PARTIALs to DISCHARGED, and MODEL-1 ships cleanly through both backends.
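The bisection step amounts to walking the stages in forward-pass order and stopping at the first one whose APR and GGUF captures disagree beyond a tolerance. Here is a minimal sketch. It assumes the captures have already been loaded into `f32` vectors keyed by stage name. `first_divergent_stage` and the tolerance value are hypothetical and are not part of the contract.

```rust
use std::collections::HashMap;

/// Returns the first stage (in forward-pass order) whose APR and GGUF captures
/// differ element-wise by more than `tol`, or whose lengths disagree.
/// Stages missing from either capture are skipped.
fn first_divergent_stage<'a>(
    order: &[&'a str],
    apr: &HashMap<&str, Vec<f32>>,
    gguf: &HashMap<&str, Vec<f32>>,
    tol: f32,
) -> Option<&'a str> {
    for &stage in order {
        let (Some(a), Some(g)) = (apr.get(stage), gguf.get(stage)) else {
            continue; // stage not captured in one of the runs
        };
        if a.len() != g.len() {
            return Some(stage);
        }
        // NaN compares false with `>`, so a NaN on only one side is checked explicitly;
        // NaN on both sides counts as a match.
        let diverges = a
            .iter()
            .zip(g)
            .any(|(x, y)| x.is_nan() != y.is_nan() || (x - y).abs() > tol);
        if diverges {
            return Some(stage);
        }
    }
    None
}

fn main() {
    let order = ["embedding", "attn_norm", "qkv_matmul"];
    let apr: HashMap<&str, Vec<f32>> = HashMap::from([
        ("embedding", vec![0.1, 0.2]),
        ("attn_norm", vec![1.0, 2.0]),
        ("qkv_matmul", vec![3.0, 4.0]),
    ]);
    let mut gguf = apr.clone();
    gguf.insert("attn_norm", vec![1.0, 2.5]); // injected drift for the demo
    println!("{:?}", first_divergent_stage(&order, &apr, &gguf, 1e-4)); // Some("attn_norm")
}
```

Because every later stage consumes the earlier ones, any divergence after the first one is downstream noise. Only the first hit identifies the bug surface.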

Contract structure

- 4 equations: cli_signature, byte_format (APRT magic), determinism, apr_diff_values_compat
- 8 falsification tests covering the CLI surface, determinism, the expected APR-vs-GGUF diff at ffn_gate, header format, multi-stage capture, NaN preservation, --layer subset, and pv validation
- 19 named stages enumerated (embedding → lm_head)
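The contract names an APRT magic prefix and a self-describing header, but this PR text does not spell out the byte layout. Purely as an illustration, the sketch below assumes a hypothetical layout:

- a 4-byte `APRT` magic
- a little-endian `u32` version
- a `u32` rank
- one `u64` per dimension
- the raw little-endian `f32` payload

Writing each float with `to_le_bytes` copies its exact bit pattern. That property makes the determinism and NaN-preservation checks straightforward.

```rust
use std::convert::TryInto;

const MAGIC: &[u8; 4] = b"APRT";
const VERSION: u32 = 1; // hypothetical format version, not from the contract

/// Serialize a tensor as: magic | version | rank | dims... | f32 data (all LE).
fn encode(dims: &[u64], data: &[f32]) -> Vec<u8> {
    let expected: u64 = dims.iter().product();
    assert_eq!(expected as usize, data.len(), "dims do not match data length");
    let mut out = Vec::with_capacity(12 + 8 * dims.len() + 4 * data.len());
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&VERSION.to_le_bytes());
    out.extend_from_slice(&(dims.len() as u32).to_le_bytes());
    for d in dims {
        out.extend_from_slice(&d.to_le_bytes());
    }
    for v in data {
        // to_le_bytes copies the exact bit pattern, so NaN payloads and -0.0 survive.
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Parse the layout written by `encode`; returns None on any malformed input.
fn decode(buf: &[u8]) -> Option<(Vec<u64>, Vec<f32>)> {
    if buf.get(0..4)? != &MAGIC[..] {
        return None;
    }
    let u32_at = |off: usize| -> Option<u32> {
        Some(u32::from_le_bytes(buf.get(off..off + 4)?.try_into().ok()?))
    };
    if u32_at(4)? != VERSION {
        return None;
    }
    let rank = u32_at(8)? as usize;
    let mut off = 12;
    let mut dims = Vec::with_capacity(rank);
    for _ in 0..rank {
        dims.push(u64::from_le_bytes(buf.get(off..off + 8)?.try_into().ok()?));
        off += 8;
    }
    // Checked arithmetic so a corrupt header cannot cause an overflow panic.
    let n = dims.iter().try_fold(1u64, |acc, &d| acc.checked_mul(d))? as usize;
    let end = off.checked_add(n.checked_mul(4)?)?;
    if end != buf.len() {
        return None;
    }
    let data = buf[off..end]
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes(c.try_into().unwrap()))
        .collect();
    Some((dims, data))
}

fn main() {
    let bytes = encode(&[2, 2], &[1.0, f32::NAN, -0.0, 3.5]);
    let (dims, data) = decode(&bytes).expect("round trip");
    println!("{} bytes, dims {:?}, data {:?}", bytes.len(), dims, data);
}
```

A real implementation would pin the layout in the contract's byte_format equation. This sketch only shows why bit-level encoding makes the determinism and NaN-preservation tests easy to assert: compare bytes, or compare `to_bits()` values, rather than floats.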

Status

PROPOSED. Estimated implementation cost: ~400-600 LOC plus 8 tests (a multi-day Rust task).

Test plan

pv validate contracts/apr-cli-trace-save-tensor-v1.yaml exits 0 (verified live)

🤖 Generated with Claude Code

Adds an `apr code --emit-trace <path>` flag. When the flag is set, the runtime writes a 4-record `ccpa-trace.jsonl` file to `<path>` after the agent loop completes. The file describes the run. Its format mirrors the schema at https://github.com/paiml/claude-code-parity-apr/blob/main/contracts/claude-code-parity-apr-v1.yaml § trace_schema. The companion repo's `ccpa measure` subcommand (M26) consumes this file to score apr-code against canonical Claude Code reference fixtures.

Records emitted:
1. session_start: a synthetic UUIDv7-shaped session_id derived from the start ts. ts is a timestamp string. cwd_sha256 is a 64-char placeholder (the companion-repo differ normalizes these at compare time).
2. user_prompt: turn 0, verbatim text.
3. assistant_turn: turn 1, a single Block::Text carrying the agent's final response text. Tool dispatch / hook / skill records are M29+ enrichment follow-ups.
4. session_end: real elapsed_ms and token counts from AgentLoopResult.usage (input_tokens / output_tokens). Real metadata, not stubbed.

Plumbing:
- commands_enum.rs: new `emit_trace: Option<PathBuf>` field on the Code variant.
- dispatch.rs: threads it into batuta::agent::code::cmd_code.
- code.rs cmd_code: accepts the new param and plumbs it to run_single_prompt.
- code.rs run_single_prompt: captures `Instant::now()` at start. After the agent loop returns Ok(r), it calls the new emit_ccpa_trace helper if the caller passed --emit-trace. On write failure it prints a warning with eprintln! but does NOT fail the agent run.
- code.rs emit_ccpa_trace: new helper (~85 LOC) that hand-rolls JSONL via serde_json::json! macros (no new dependency on ccpa_trace types).

Tests (4 new in code_tests.rs::emit_trace_tests):
- emit_writes_4_jsonl_records_with_correct_kinds
- emit_carries_prompt_and_response_text
- emit_carries_token_counts_and_elapsed
- emit_each_record_has_v1_envelope (per-record back-compat invariant from the ccpa-trace v2 schema)

Total in agent::code: 50 → 54 tests passing.
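Here is a dependency-free sketch of the write path. It emits four JSONL records, each carrying the `"v":1` envelope. Write failures are reported to stderr rather than propagated. The real helper uses `serde_json::json!`; this sketch swaps in hand-written string escaping so it needs only `std`.

Only the session_end field names are taken from the dogfood output. The `text` field on the prompt and response records, the `Usage` struct, and the hard-coded stop_reason are assumptions. The real assistant_turn carries a Block::Text rather than a flat string.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;
use std::time::Duration;

/// Hypothetical stand-in for the token counts in AgentLoopResult.usage.
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the four records, one JSON object per line.
fn trace_lines(session_id: &str, ts: &str, prompt: &str, response: &str,
               elapsed: Duration, usage: &Usage) -> Vec<String> {
    vec![
        format!(r#"{{"v":1,"kind":"session_start","session_id":{},"ts":{},"cwd_sha256":{}}}"#,
                json_str(session_id), json_str(ts), json_str(&"0".repeat(64))),
        format!(r#"{{"v":1,"kind":"user_prompt","turn":0,"text":{}}}"#, json_str(prompt)),
        format!(r#"{{"v":1,"kind":"assistant_turn","turn":1,"text":{}}}"#, json_str(response)),
        // stop_reason is hard-coded here for brevity; a real run would pass it through.
        format!(r#"{{"v":1,"kind":"session_end","turn":1,"stop_reason":"end_turn","elapsed_ms":{},"tokens_in":{},"tokens_out":{}}}"#,
                elapsed.as_millis(), usage.input_tokens, usage.output_tokens),
    ]
}

/// Write one record per line, returning any I/O error to the caller.
fn write_trace(path: &Path, lines: &[String]) -> io::Result<()> {
    let mut f = File::create(path)?;
    for line in lines {
        writeln!(f, "{}", line)?;
    }
    f.flush()
}

fn main() {
    let usage = Usage { input_tokens: 44, output_tokens: 1024 };
    let lines = trace_lines("00000000-0000-7000-8000-000000000000", "2026-04-28T09:14:30Z",
                            "ping", "pong", Duration::from_millis(3295), &usage);
    let path = std::env::temp_dir().join("measured.jsonl");
    if let Err(e) = write_trace(&path, &lines) {
        // Mirror the PR's policy: warn, but never fail the agent run.
        eprintln!("warning: --emit-trace write failed: {}", e);
    }
}
```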
Live dogfood:

$ apr code --emit-trace /tmp/measured.jsonl \
    -p "Show me which CLAUDE.md takes precedence right now"
$ cat /tmp/measured.jsonl | jq -r '.kind'
session_start
user_prompt
assistant_turn
session_end
$ cat /tmp/measured.jsonl | jq -r 'select(.kind=="session_end")'
{"v":1,"kind":"session_end","turn":1,"stop_reason":"end_turn", "elapsed_ms":3295,"tokens_in":44,"tokens_out":1024}

Real elapsed_ms and token counts are populated correctly.

Note: the response text from Qwen3-1.7B in the dogfood was gibberish (a pre-existing aprender <think>-loop concern, see PMAT-190). The trace format is correct; model behavior is a separate workstream. The emit-trace flag works regardless of model quality.

Refs:
- paiml/claude-code-parity-apr#31 (M26: the ccpa measure subcommand that consumes this file)
- paiml/claude-code-parity-apr/contracts/claude-code-parity-apr-v1.yaml § trace_schema (the canonical schema)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Recommends and auto-discovers Qwen3-Coder-30B-A3B-Instruct as the default model for `apr code` when present. Aligned with the research write-up at paiml/claude-code-parity-apr / 2026-04-28.

What ships:

configs/aliases.yaml
+ new short name `qwen3-coder` → hf://unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF. Now `apr pull qwen3-coder` works.

crates/aprender-registry/src/aliases.rs
+ matching entry in the in-memory AliasRegistry (kept in sync with configs/aliases.yaml).

crates/aprender-orchestrate/src/agent/manifest.rs
+ `~/.cache/pacha/models/` added to model_search_dirs so that files cached by `apr pull` (content-hashed names) are visible to discovery. Pair them with a friendly symlink in `~/.apr/models/` so the preferred-name filter can recognize them.
+ new module-level helper `is_preferred_default_model(path)`: case-insensitive substring match against a short list of recommended default model names. Order:
  1. qwen3-coder-30b-a3b
  2. qwen3-coder-next
  3. qwen2.5-coder-32b
  4. qwen2.5-coder-14b
+ discover_model and sort_candidates updated to insert the preferred-name check as a sort key BETWEEN validity (which still wins overall) and newest mtime. When a small, recently pulled model exists alongside the recommended default, the recommended default is now selected. This fixes the failure mode where Qwen3-1.7B (PMAT-190 thinking-loop bug, emits gibberish) was auto-picked over a known-good 30B model.
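The three-level ordering can be expressed as a single sort key. Here is a minimal sketch, assuming a hypothetical `Candidate` struct (the real one in manifest.rs may differ) and assuming matching is done on the file name, which is why the symlink matters for content-hashed pacha files. It treats the preferred list as a flat set; the PR text does not say whether the real helper also ranks within the list.

```rust
use std::cmp::Reverse;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Recommended-default name fragments, matched case-insensitively.
const PREFERRED: &[&str] = &[
    "qwen3-coder-30b-a3b",
    "qwen3-coder-next",
    "qwen2.5-coder-32b",
    "qwen2.5-coder-14b",
];

/// Case-insensitive substring match on the file name (an assumption of this sketch).
fn is_preferred_default_model(path: &Path) -> bool {
    let name = path
        .file_name()
        .map(|n| n.to_string_lossy().to_lowercase())
        .unwrap_or_default();
    PREFERRED.iter().any(|p| name.contains(*p))
}

/// Hypothetical candidate record for illustration.
struct Candidate {
    path: PathBuf,
    valid: bool,
    mtime: SystemTime,
}

/// Best candidate first: valid beats invalid, then preferred beats
/// non-preferred, then newer mtime beats older.
fn sort_candidates(c: &mut [Candidate]) {
    c.sort_by_key(|x| {
        (Reverse(x.valid), Reverse(is_preferred_default_model(&x.path)), Reverse(x.mtime))
    });
}

fn main() {
    let now = SystemTime::now();
    let mut c = vec![
        Candidate { path: PathBuf::from("Qwen3-1.7B-Q4_K_M.gguf"), valid: true, mtime: now },
        Candidate {
            path: PathBuf::from("Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf"),
            valid: true,
            mtime: now - Duration::from_secs(86_400), // pulled a day earlier
        },
    ];
    sort_candidates(&mut c);
    println!("selected: {}", c[0].path.display()); // selected: Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
}
```

Putting validity first in the tuple encodes the Jidoka rule from the tests: preference can never outvote an invalid file.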
Tests (5 new in manifest_tests_discovery.rs; 49 → 54 in agent::manifest):
- preferred_default_recognises_qwen3_coder_30b_a3b (any-case, any-quant matching)
- preferred_default_rejects_small_fallbacks (1.7B / 1.5B / 1.1B / 7B all rejected; the 7B Qwen2.5-Coder is still useful, but we don't anchor it as the recommended-default family for 24 GB GPUs)
- sort_candidates_promotes_preferred_over_newer (a preferred name beats a newer but smaller model)
- sort_candidates_newer_preferred_beats_older_preferred (among preferred names, mtime still breaks ties)
- sort_candidates_validity_outranks_preference (Jidoka: an invalid preferred model loses to a valid non-preferred one)

Live verification (this PR):

$ apr pull qwen3-coder
✓ Downloaded successfully
  Path: /home/noah/.cache/pacha/models/2b88b180a790988f.gguf
  Size: 17.3 GB
$ ln -s /home/noah/.cache/pacha/models/2b88b180a790988f.gguf \
    /home/noah/.apr/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
$ apr code -p "ping" --max-turns 1
Model: Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf (auto-discovered)
↑ default-model preference picked correctly.

Known gap (NOT addressed by this PR): after auto-discovery picks the model, both the apr-serve subprocess and embedded inference fail with:

Error: driver error: inference failed: Invalid shape: Tensor 'blk.0.ffn_up.weight' not found

Qwen3-Coder-30B-A3B is a Mixture-of-Experts model. It uses per-expert tensor names (`ffn_up_exps`, `ffn_gate_exps`, etc.), not the dense `ffn_up.weight` that the current realizar GGUF loader expects. qwen3moe architecture support is upstream realizar work, separate from this PR. The discovery / alias / preferred-name selection mechanism is fully ready for when that lands. In the interim, users hitting the inference error should fall back to a dense model: either Qwen2.5-Coder-32B-Instruct (also recognized by is_preferred_default_model) or Qwen2.5-Coder-7B.
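The loader error above could be turned into a clearer diagnostic by classifying the FFN layout from tensor names before loading. Below is a hypothetical pre-flight check, assuming the GGUF tensor names have already been read into a list. The `FfnLayout` enum and `classify_ffn` function are illustrative, not realizar APIs. The exact per-expert name `blk.0.ffn_up_exps.weight` is an assumption based on the `ffn_up_exps` naming mentioned above.

```rust
#[derive(Debug, PartialEq)]
enum FfnLayout {
    Dense,
    MixtureOfExperts,
    Unknown,
}

/// Classify layer 0's FFN by tensor name: dense models carry
/// `blk.0.ffn_up.weight`; MoE models carry per-expert `blk.0.ffn_up_exps.weight`.
fn classify_ffn<'a>(tensor_names: impl IntoIterator<Item = &'a str>) -> FfnLayout {
    let mut dense = false;
    let mut moe = false;
    for name in tensor_names {
        match name {
            "blk.0.ffn_up.weight" => dense = true,
            "blk.0.ffn_up_exps.weight" => moe = true,
            _ => {}
        }
    }
    match (dense, moe) {
        (true, false) => FfnLayout::Dense,
        (false, true) => FfnLayout::MixtureOfExperts,
        _ => FfnLayout::Unknown, // neither, or an unexpected mix of both
    }
}

fn main() {
    let names = ["token_embd.weight", "blk.0.ffn_gate_exps.weight", "blk.0.ffn_up_exps.weight"];
    match classify_ffn(names.iter().copied()) {
        FfnLayout::MixtureOfExperts => {
            eprintln!("MoE (per-expert) FFN weights detected: not yet supported; try a dense model")
        }
        other => println!("layout: {:?}", other),
    }
}
```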
Refs:
- Research write-up: paiml/claude-code-parity-apr / chat 2026-04-28
- Hugging Face: unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
- aprender CLAUDE.md § Claude Messages-API proxy spec: the same model is already declared as the default for `apr serve anthropic`

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…r capture (unblocks SHIP-007 layer-0 bisection)

Triggering observation 2026-04-28: SHIP-007's hypothesis space has been narrowed by 5 falsified hypotheses (§28, §28.4(a), §31, §32, #1101). The remaining bug surface is per-element divergence at some specific stage of the layer-0 forward pass. Aggregate stats, already emitted by `apr trace --payload`, are insufficient because they can hide per-element drift behind similar std values.

This contract defines the missing infrastructure: a `--save-tensor <stage>` flag that captures raw F32 tensor values at chosen forward-pass stages. The values are written as APRT-magic-prefixed binaries that `apr diff --values` can load directly.

## Stages enumerated (19 total)

embedding, attn_norm, qkv_matmul, qkv_bias, q_post_rope, k_post_rope, attention, attn_out, post_attn_residual, ffn_norm, ffn_gate, ffn_up, ffn_silu, ffn_swigl, ffn_out, post_ffn_residual, layer_output, final_norm, lm_head

## Falsification tests (8)

- 001: --save-tensor flag recognized
- 002: determinism (byte-identical across runs)
- 003: ffn_gate stage produces expected APR-vs-GGUF diff (corroborates #1099)
- 004: APRT header format self-describing
- 005: multi-stage comma-list works
- 006: NaN preservation
- 007: --layer subset compatible
- 008: pv validates

`pv validate` exits 0 (verified).

## Implementation cost

400-600 LOC + 8 tests, multi-day Rust task.

## Linkage to shipping MODEL-1

Once shipped, the SHIP-007 layer-0 bisection completes in one debug session. The steps are: run save-tensor in both APR and GGUF formats, run apr diff at each stage, and pinpoint the first divergent stage as the actual bug surface. SHIP-002/005/006/007/008 (5 PARTIALs) all depend on the SHIP-007 fix, and this tooling unblocks that fix. paiml/qwen2.5-coder-7b-apache-q4k-v1 then ships cleanly through both APR and GGUF backends → MODEL-1 completes.

Status: PROPOSED. Implementation deferred to a multi-day Rust task.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
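The fixed stage list and the comma-list form of `--save-tensor` (falsification test 005) suggest a small validating parser. Here is a minimal sketch. It assumes stages are checked against the 19 names above and returned in forward-pass order without duplicates. `parse_save_tensor` is a hypothetical name, not the implemented CLI code.

```rust
/// The 19 stages enumerated by the contract, in forward-pass order.
const STAGES: [&str; 19] = [
    "embedding", "attn_norm", "qkv_matmul", "qkv_bias", "q_post_rope",
    "k_post_rope", "attention", "attn_out", "post_attn_residual", "ffn_norm",
    "ffn_gate", "ffn_up", "ffn_silu", "ffn_swigl", "ffn_out",
    "post_ffn_residual", "layer_output", "final_norm", "lm_head",
];

/// Parse a `--save-tensor` value such as "ffn_gate,attn_norm" into stage
/// indices sorted in forward-pass order; unknown names are an error.
fn parse_save_tensor(arg: &str) -> Result<Vec<usize>, String> {
    let mut idx = Vec::new();
    for raw in arg.split(',') {
        let name = raw.trim();
        if name.is_empty() {
            continue; // tolerate trailing or doubled commas
        }
        match STAGES.iter().position(|s| *s == name) {
            Some(i) => idx.push(i),
            None => return Err(format!("unknown stage '{}'", name)),
        }
    }
    idx.sort_unstable();
    idx.dedup();
    if idx.is_empty() {
        return Err("no stages given".to_string());
    }
    Ok(idx)
}

fn main() {
    let stages = parse_save_tensor("ffn_gate, attn_norm,ffn_gate").unwrap();
    let names: Vec<&str> = stages.iter().map(|&i| STAGES[i]).collect();
    println!("{:?}", names); // ["attn_norm", "ffn_gate"]
}
```

Returning indices in forward-pass order means the capture hook can be checked with a cheap comparison at each stage, and the output files line up directly with the bisection order.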

noahgift enabled auto-merge (squash) April 28, 2026 09:14

noahgift and others added 3 commits April 28, 2026 11:40

noahgift force-pushed the docs/apr-trace-save-tensor-contract branch from 9174fbf to 84fe408 Compare April 28, 2026 09:40

noahgift merged commit 2e003ac into main Apr 28, 2026
10 checks passed

noahgift deleted the docs/apr-trace-save-tensor-contract branch April 28, 2026 09:56

noahgift mentioned this pull request Apr 28, 2026

feat(apr-code): add --emit-trace flag (M28 — ccpa-trace.jsonl emission) #1100

Closed

noahgift merged 3 commits into main from docs/apr-trace-save-tensor-contract
