feat(apr-code): add --emit-trace flag (M28 — ccpa-trace.jsonl emission) by noahgift · Pull Request #1100 · paiml/aprender

noahgift · 2026-04-28T08:28:56Z

Summary

Adds an apr code --emit-trace <path> flag. When it is set, the runtime writes a 4-record ccpa-trace.jsonl file describing the run to <path> once the agent loop completes.

Format mirrors the schema at paiml/claude-code-parity-apr / contracts/claude-code-parity-apr-v1.yaml § trace_schema. The companion-repo ccpa measure subcommand (paiml/claude-code-parity-apr#31, M26) consumes this file to score apr-code against canonical Claude Code reference fixtures.

Records emitted

#	kind	payload
1	`session_start`	synthetic UUIDv7-shaped session_id, ts, actor=apr-code, model path, cwd_sha256 placeholder
2	`user_prompt`	turn 0, verbatim text
3	`assistant_turn`	turn 1, single Block::Text with agent's final response
4	`session_end`	real elapsed_ms + tokens_in/tokens_out from AgentLoopResult.usage

Tool dispatch / hook / skill records are M29+ enrichment follow-ups.
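To make the record shapes concrete, here is a minimal sketch of the four lines, using only the standard library so it stays dependency-free (the PR itself builds them with serde_json::json!). The session_end field names come from the dogfood output in this description. The `RunSummary` struct, the `text` and `blocks` field names, and the omission of session_id, ts, model, and cwd_sha256 are assumptions made for illustration, not the schema.

```rust
use std::time::Duration;

/// Minimal JSON string escaper: quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical input type; the real helper reads usage from AgentLoopResult.
struct RunSummary<'a> {
    prompt: &'a str,
    response: &'a str,
    elapsed: Duration,
    tokens_in: u64,
    tokens_out: u64,
}

/// Builds the four JSONL lines in emission order.
fn trace_lines(run: &RunSummary) -> Vec<String> {
    vec![
        // Assumption: session_id, ts, model, and cwd_sha256 are left out of this sketch.
        r#"{"v":1,"kind":"session_start","actor":"apr-code"}"#.to_string(),
        // Assumption: the prompt field is called "text".
        format!(
            r#"{{"v":1,"kind":"user_prompt","turn":0,"text":"{}"}}"#,
            json_escape(run.prompt)
        ),
        // Assumption: a single text block is serialized under "blocks".
        format!(
            r#"{{"v":1,"kind":"assistant_turn","turn":1,"blocks":[{{"type":"text","text":"{}"}}]}}"#,
            json_escape(run.response)
        ),
        // Field names here match the session_end record shown in the dogfood output.
        format!(
            r#"{{"v":1,"kind":"session_end","turn":1,"stop_reason":"end_turn","elapsed_ms":{},"tokens_in":{},"tokens_out":{}}}"#,
            run.elapsed.as_millis(),
            run.tokens_in,
            run.tokens_out
        ),
    ]
}
```

Joining the returned lines with newlines gives a file body in which every record carries the "v":1 envelope, which is the property emit_each_record_has_v1_envelope is described as checking.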

Plumbing

commands_enum.rs — new emit_trace: Option<PathBuf> field on the Code variant
dispatch.rs — threads it into batuta::agent::code::cmd_code
code.rs cmd_code — accepts the new param + plumbs to run_single_prompt
code.rs run_single_prompt — captures Instant::now() at start; after the agent loop returns Ok(r), if --emit-trace was set, calls the new emit_ccpa_trace helper. A write failure prints a warning via eprintln! but does NOT fail the agent run.
code.rs emit_ccpa_trace — new ~85 LOC helper that hand-rolls JSONL via serde_json::json! macros (no new dependency on ccpa_trace types).
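The warn-but-continue behaviour described for run_single_prompt can be sketched as follows. This is an illustration under stated assumptions, not the code in code.rs: `emit_trace_best_effort` and `run_once` are hypothetical names, and the agent loop is replaced by a placeholder that echoes the prompt.

```rust
use std::fs;
use std::path::Path;
use std::time::Instant;

/// Writes `lines` as JSONL to `path` when a path was requested.
/// Returns true if a file was written; a failure is reported on stderr and swallowed.
fn emit_trace_best_effort(path: Option<&Path>, lines: &[String]) -> bool {
    let Some(path) = path else { return false };
    let mut body = lines.join("\n");
    body.push('\n');
    match fs::write(path, body) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("warning: failed to write trace to {}: {e}", path.display());
            false
        }
    }
}

/// Hypothetical stand-in for run_single_prompt.
fn run_once(prompt: &str, emit_trace: Option<&Path>) -> Result<String, String> {
    // Capture the start time before the (placeholder) agent loop runs.
    let started = Instant::now();
    let response = format!("echo: {prompt}");
    let elapsed_ms = started.elapsed().as_millis();

    // The trace write happens after the loop succeeded and cannot change its result.
    let lines = vec![format!(
        r#"{{"v":1,"kind":"session_end","elapsed_ms":{elapsed_ms}}}"#
    )];
    let _ = emit_trace_best_effort(emit_trace, &lines);
    Ok(response)
}
```

Keeping the write outside the `Result` path means an unwritable trace path costs the user a warning rather than the agent's answer.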

Tests

4 new in code_tests.rs::emit_trace_tests (50 → 54 passing in agent::code):

emit_writes_4_jsonl_records_with_correct_kinds
emit_carries_prompt_and_response_text
emit_carries_token_counts_and_elapsed
emit_each_record_has_v1_envelope (per-record back-compat invariant from ccpa-trace v2)

Live dogfood

$ apr code --emit-trace /tmp/measured.jsonl \
    -p "Show me which CLAUDE.md takes precedence right now"
$ jq -r '.kind' /tmp/measured.jsonl
session_start
user_prompt
assistant_turn
session_end
$ jq 'select(.kind=="session_end")' /tmp/measured.jsonl
{"v":1,"kind":"session_end","turn":1,"stop_reason":"end_turn",
 "elapsed_ms":3295,"tokens_in":44,"tokens_out":1024}

Real elapsed_ms / token counts populated correctly. The response text was gibberish in the dogfood because Qwen3-1.7B is hitting <think>-loop issues (PMAT-190 — pre-existing aprender concern, separate workstream). The trace format is correct; the model behavior is unrelated.

What I checked

cargo build -p apr-cli --features code — clean
cargo fmt --check — clean on changed crates
cargo test -p aprender-orchestrate --lib agent::code — 54 passing
Live dogfood produces a valid 4-record JSONL

Why now

paiml/claude-code-parity-apr#31 (M26) ships a ccpa measure subcommand that drives apr code -p and synthesizes a student trace from stdout. That synthesis is text-only — tool dispatch, hooks, and skill invocations are invisible. M28 establishes the API contract for faithful trace emission so future M29+ enrichment can fill in tool/hook/skill records and produce a non-tautological FALSIFY-CCPA-013 discharge.

🤖 Generated with Claude Code

Records the M28 launch in status_history. Apr-side feature lives on a separate branch (feat/apr-code-emit-trace-m28, #1100).

Refs: #1100 (M28 — apr code --emit-trace)
paiml/claude-code-parity-apr@feat/m28-record-aprender-pr

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Companion-side bookkeeping for the M28 upstream feature. The apr-cli feature itself lives on a separate aprender branch (feat/apr-code-emit-trace-m28, paiml/aprender#1100). This PR records the launch in the companion contract's status_history audit trail.

What landed upstream (paiml/aprender#1100):
- new `--emit-trace <path>` flag on `apr code`
- 4-record ccpa-trace.jsonl emission after every -p run (session_start + user_prompt + assistant_turn + session_end)
- real elapsed_ms + token counts from AgentLoopResult.usage
- 4 new unit tests; 50 → 54 passing in agent::code

Live dogfood (verified before this PR):
$ apr code --emit-trace /tmp/measured.jsonl -p "..."
→ 4-line valid ccpa-trace.jsonl
→ elapsed_ms=3295, tokens_in=44, tokens_out=1024 populated correctly

Tool dispatch / hook event / skill invocation records remain M29+ enrichment follow-ups (text-only path is what M28 ships).

Contract bump v1.15.0 → v1.16.0:
- status field annotated with the M28 launch
- status_history M28 entry detailing what shipped, dogfood result, and what remains for M29+
- aprender contract-mirror at byte-identical commit 8549cdc69
- pin.lock refreshed (sha256 e979ddfd...)

Gates (all green locally):
pv validate / pv lint	PASS
pmat comply check	(is_compliant) true, 0 Fail, 12 advisory Warn
cargo test --workspace	all pass (0 new tests companion-side)
scripts/pin-check.sh	sha256 matches
scripts/pin-check-roundtrip.sh	byte-identical to aprender@8549cdc69

Refs:
paiml/aprender#1100 (M28 upstream PR)
contracts/claude-code-parity-apr-v1.yaml § status_history (M28)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Adds `apr code --emit-trace <path>` flag — when set, after the agent loop completes the runtime writes a 4-record `ccpa-trace.jsonl` file to `<path>` describing the run.

Format mirrors the schema at https://github.com/paiml/claude-code-parity-apr/blob/main/contracts/claude-code-parity-apr-v1.yaml § trace_schema. The companion-repo `ccpa measure` subcommand (M26) consumes this file to score apr-code against canonical Claude Code reference fixtures.

Records emitted:
1. session_start — synthetic UUIDv7-shaped session_id derived from the start ts; ts is a timestamp string; cwd_sha256 is a 64-char placeholder (the companion-repo differ normalizes these at compare time).
2. user_prompt — turn 0, verbatim text.
3. assistant_turn — turn 1, single Block::Text carrying the agent's final response text. Tool dispatch / hook / skill records are M29+ enrichment follow-ups.
4. session_end — real elapsed_ms + token counts from AgentLoopResult.usage (input_tokens / output_tokens). Real metadata, not stubbed.

Plumbing:
- commands_enum.rs — new `emit_trace: Option<PathBuf>` field on the Code variant.
- dispatch.rs — threads it into batuta::agent::code::cmd_code.
- code.rs cmd_code — accepts the new param + plumbs to run_single_prompt.
- code.rs run_single_prompt — captures `Instant::now()` at start; after the agent loop returns Ok(r), if the caller passed --emit-trace, calls the new emit_ccpa_trace helper. On write-failure eprintln! a warning but DO NOT fail the agent run.
- code.rs emit_ccpa_trace — new helper (~85 LOC) that hand-rolls JSONL via serde_json::json! macros (no new dependency on ccpa_trace types).

Tests (4 new in code_tests.rs::emit_trace_tests):
- emit_writes_4_jsonl_records_with_correct_kinds
- emit_carries_prompt_and_response_text
- emit_carries_token_counts_and_elapsed
- emit_each_record_has_v1_envelope (per-record back-compat invariant from the ccpa-trace v2 schema)

Total in agent::code: 50 → 54 tests passing.

Live dogfood:
$ apr code --emit-trace /tmp/measured.jsonl \
    -p "Show me which CLAUDE.md takes precedence right now"
$ cat /tmp/measured.jsonl | jq -r '.kind'
session_start
user_prompt
assistant_turn
session_end
$ cat /tmp/measured.jsonl | jq -r 'select(.kind=="session_end")'
{"v":1,"kind":"session_end","turn":1,"stop_reason":"end_turn",
 "elapsed_ms":3295,"tokens_in":44,"tokens_out":1024}

Real elapsed_ms / token counts populated correctly.

Note: the response text from Qwen3-1.7B in the dogfood was gibberish (<think>-loop pre-existing aprender concern, see PMAT-190). The trace format is correct; the model behavior is a separate workstream. The emit-trace flag works regardless of model quality.

Refs:
- paiml/claude-code-parity-apr#31 (M26 — ccpa measure subcommand that consumes this file)
- paiml/claude-code-parity-apr/contracts/claude-code-parity-apr-v1.yaml § trace_schema (the canonical schema)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Recommends and auto-discovers Qwen3-Coder-30B-A3B-Instruct as the default model for `apr code` when present. Aligned with the research write-up at paiml/claude-code-parity-apr / 2026-04-28.

What ships:

configs/aliases.yaml
+ new short name `qwen3-coder` → hf://unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
  Now `apr pull qwen3-coder` works.

crates/aprender-registry/src/aliases.rs
+ matching entry in the in-memory AliasRegistry (kept in sync with configs/aliases.yaml).

crates/aprender-orchestrate/src/agent/manifest.rs
+ `~/.cache/pacha/models/` added to model_search_dirs so `apr pull`-cached files (content-hashed names) are visible to discovery; pair with a friendly symlink in `~/.apr/models/` for the preferred-name filter to recognize.
+ new module-level helper `is_preferred_default_model(path)`: case-insensitive substring match against a short list of recommended-default model names. Order:
  1. qwen3-coder-30b-a3b
  2. qwen3-coder-next
  3. qwen2.5-coder-32b
  4. qwen2.5-coder-14b
+ discover_model + sort_candidates updated to insert preferred-name as a sort key BETWEEN validity (still wins overall) and newest-mtime. So when a small recently-pulled model exists alongside the recommended default, the recommended default is selected — fixing the failure mode where Qwen3-1.7B (PMAT-190 thinking-loop bug, emits gibberish) was being auto-picked over a known-good 30B model.
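The three-level ordering described above (validity, then preferred name, then newest mtime) can be sketched with a tuple sort key. This is an illustrative sketch, not the code in manifest.rs: the `Candidate` struct and its fields are hypothetical, and matching on the file name rather than the full path is an assumption.

```rust
use std::cmp::Reverse;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Recommended-default name fragments, in the order listed in the commit message.
const PREFERRED: [&str; 4] = [
    "qwen3-coder-30b-a3b",
    "qwen3-coder-next",
    "qwen2.5-coder-32b",
    "qwen2.5-coder-14b",
];

/// Case-insensitive substring match (assumption: matched against the file name).
fn is_preferred_default_model(path: &Path) -> bool {
    let name = path
        .file_name()
        .map(|n| n.to_string_lossy().to_lowercase())
        .unwrap_or_default();
    PREFERRED.iter().any(|p| name.contains(*p))
}

/// Hypothetical candidate record for this sketch.
struct Candidate {
    path: PathBuf,
    valid: bool,
    mtime: SystemTime,
}

/// Sorts best-first: valid before invalid, preferred before non-preferred, newer before older.
fn sort_candidates(candidates: &mut [Candidate]) {
    candidates.sort_by_key(|c| {
        (
            Reverse(c.valid),
            Reverse(is_preferred_default_model(&c.path)),
            Reverse(c.mtime),
        )
    });
}
```

Because tuples compare lexicographically, an invalid preferred model can never outrank a valid non-preferred one, and mtime only breaks ties between candidates that agree on the first two keys.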
Tests (5 new in manifest_tests_discovery.rs, 49 → 54 in agent::manifest):
- preferred_default_recognises_qwen3_coder_30b_a3b (any-case, any-quant matching)
- preferred_default_rejects_small_fallbacks (1.7B / 1.5B / 1.1B / 7B all rejected — the 7B Qwen2.5-Coder is still useful but we don't anchor it as the recommended-default family for 24 GB GPUs)
- sort_candidates_promotes_preferred_over_newer (preferred-name beats newer-but-smaller mtime)
- sort_candidates_newer_preferred_beats_older_preferred (within preferred-names, mtime still tiebreaks)
- sort_candidates_validity_outranks_preference (Jidoka — invalid preferred loses to valid non-preferred)

Live verification (this PR):
$ apr pull qwen3-coder
✓ Downloaded successfully
  Path: /home/noah/.cache/pacha/models/2b88b180a790988f.gguf
  Size: 17.3 GB
$ ln -s /home/noah/.cache/pacha/models/2b88b180a790988f.gguf \
    /home/noah/.apr/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
$ apr code -p "ping" --max-turns 1
  Model: Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf (auto-discovered)
  ↑ default-model preference picked correctly.

Known gap (NOT addressed by this PR):

After auto-discovery picks the model, both apr-serve subprocess and embedded inference fail with:

  Error: driver error: inference failed: Invalid shape: Tensor 'blk.0.ffn_up.weight' not found

Qwen3-Coder-30B-A3B is a Mixture-of-Experts model that uses per-expert tensor names (`ffn_up_exps`, `ffn_gate_exps`, etc.), not the dense `ffn_up.weight` the current realizar GGUF loader expects. qwen3moe architecture support is upstream realizar work — separate from this PR. The discovery / alias / preferred-name selection mechanism is fully ready for when that lands.

In the interim users hitting the inference error should fall back to a dense model — either Qwen2.5-Coder-32B-Instruct (also recognized by is_preferred_default_model) or Qwen2.5-Coder-7B.
Refs:
- Research write-up: paiml/claude-code-parity-apr / chat 2026-04-28
- Hugging Face: unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
- aprender CLAUDE.md § Claude Messages-API proxy spec — same model is already declared as the default for `apr serve anthropic`

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift · 2026-04-28T10:07:10Z

Superseded by #1102 which landed the same emit-trace + default-model code. Closing to consolidate.

…SIFY-QW3-MOE-FORWARD-003 (#1127)

## What ships

Adds `crates/apr-cli/tests/qwen3_moe_apr_run_live_falsifier.rs` — F-QW3-MOE-C22214-001, an integration test that invokes the user-facing `apr` binary as a subprocess and asserts:

1. exit 0
2. stdout contains ≥1 non-whitespace character

against the cached 17.3 GB Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf with a fresh date-tagged prompt.

This pins the M32c.2.2.2.1.3 dispatch flip (PR #1126, squash a902eea) in CI / regression-prevention. Without it, a future regression that re-routed qwen3_moe back to the dense `run_gguf_generate` path (which produces garbage on MoE weights) would slip through CI silently — there'd be no signal at the `apr run` user-facing surface.

## Live evidence (lambda-vector RTX 4090, 2026-04-29)

```
running 1 test
test f_qw3_moe_c22214_001_apr_run_emits_at_least_one_non_whitespace_char ... F-QW3-MOE-C22214-001: live `apr run` against /home/noah/.cache/pacha/models/2b88b180a790988f.gguf
F-QW3-MOE-C22214-001: elapsed = 130.945370974s
stdout (first 200B):
=== APR Run ===
Source: /home/noah/.cache/pacha/models/2b88b180a790988f.gguf
Output: .
Completed in 130.83s (cached)
stderr (first 200B):
[BOS-FALLBACK] No tokenizer.ggml.bos_token_id in GGUF — using architecture default for 'qwen3moe'
[BOS-FALLBACK] No tokenizer.ggml.bos_token_id in GGUF — using architecture default for 'qwen3moe'
F-QW3-MOE-C22214-001: PASS
ok

test result: ok. 1 passed; 0 failed; 0 ignored
```

Token quality vs llama.cpp Q4_K (cosine on logits) is M32d. This test asserts ONLY emit/exit-0 — the discharge gate for FALSIFY-QW3-MOE-FORWARD-003.

## Skip path

CI runners (and any host without the cached GGUF) print:

    F-QW3-MOE-C22214-001: SKIP — no cached Qwen3-Coder GGUF at any of [...]

and return success. Same skip pattern as `crates/aprender-serve/tests/qwen3_moe_forward_one_token.rs` (M32c.2.2.2.1.1 in-process forward primitive).
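The skip-or-assert shape of such a test might look like the sketch below. It is not the contents of qwen3_moe_apr_run_live_falsifier.rs: the function names are hypothetical, and the subprocess call is abstracted into a closure that returns an exit code and captured stdout so the logic can be exercised without the `apr` binary or the 17.3 GB model.

```rust
use std::path::{Path, PathBuf};

/// Returns the first candidate path that exists as a regular file.
fn first_existing(candidates: &[PathBuf]) -> Option<&Path> {
    candidates.iter().map(PathBuf::as_path).find(|p| p.is_file())
}

/// True when stdout contains at least one non-whitespace character.
fn emitted_something(stdout: &str) -> bool {
    stdout.chars().any(|c| !c.is_whitespace())
}

/// Ok(true) = ran and passed, Ok(false) = skipped, Err = falsified.
/// `run` stands in for spawning the CLI and returns (exit code, stdout).
fn falsifier(
    candidates: &[PathBuf],
    run: impl Fn(&Path) -> (i32, String),
) -> Result<bool, String> {
    let Some(model) = first_existing(candidates) else {
        // No cached model on this host: report the skip and succeed.
        println!("SKIP: no cached GGUF at any of {candidates:?}");
        return Ok(false);
    };
    let (code, stdout) = run(model);
    if code != 0 {
        return Err(format!("exit code {code}"));
    }
    if !emitted_something(&stdout) {
        return Err("stdout had no non-whitespace output".to_string());
    }
    Ok(true)
}
```

Returning success on the skip path keeps CI green on hosts without the model, at the cost that the assertion only provides signal on machines where the file is cached.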
## Contract chain status

M32a	qwen3-moe-forward-v1 contract scaffold	SHIPPED (#1099)
M32b	arch-aware FFN load refuses qwen3_moe	SHIPPED (#1100)
M32c.1+	MoE descriptor load + per-expert byte slicer	SHIPPED
M32c.2.2.2.1.1	forward_qwen3_moe method	SHIPPED (#1124)
M32c.2.2.2.1.2	run_qwen3_moe_generate function	SHIPPED (#1125)
M32c.2.2.2.1.3	dispatch flip + Q4_K_M qtype dispatch	SHIPPED (#1126)
M32c.2.2.2.1.4	live `apr run` falsifier	THIS PR
M32d	numerical parity vs llama.cpp	PENDING

After M32d the contract flips DRAFT → ACTIVE_RUNTIME, which unblocks the companion-repo FALSIFY-CCPA-013 measured tool-dispatch parity gate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) April 28, 2026 08:29

noahgift mentioned this pull request Apr 28, 2026

M28: record apr code --emit-trace launch (paiml/aprender#1100) paiml/claude-code-parity-apr#33

Merged

noahgift and others added 2 commits April 28, 2026 12:00

noahgift force-pushed the feat/apr-code-emit-trace-m28 branch from da4ada6 to b8a6495 Compare April 28, 2026 10:01

noahgift closed this Apr 28, 2026

auto-merge was automatically disabled April 28, 2026 10:07
Pull request was closed

noahgift mentioned this pull request Apr 29, 2026

test(realizar): M32c.2.2.2.1.4 — live apr run falsifier pinning FALSIFY-QW3-MOE-FORWARD-003 #1127

Merged

noahgift wants to merge 2 commits into main from feat/apr-code-emit-trace-m28
