feat(apr-cli): wire apr trace --save-tensor end-to-end for .apr files #1417
Merged
Before this PR, `apr trace --save-tensor <model>.apr` printed a stub message and never invoked the underlying wrapper, so the existing Embedding (PR #1408 step 1) and LmHead (PR #1414 step 2) capture surface was UNREACHABLE from the CLI. The wrapper, plan-builder, and APRT byte-format machinery had all merged but produced no files.

This PR closes the dispatch gap. When `apr trace --save-tensor <stages>` is invoked on a `.apr` model, `dispatch.rs` now routes to a new `commands::trace_save_tensor::run_save_tensor_apr` function that:

1. Builds a `SaveTensorPlan` from `--save-tensor`/`--save-tensor-dir`/`--save-tensor-layers`. Default output dir is `<model-stem>-trace/` next to the input.
2. Loads the APR model + embedded BPE tokenizer.
3. Encodes a fixed test prompt (`"What is 2+2?"`, the same as `vector_stats.rs::run_traced_inference_apr`, for consistency).
4. Calls `AprTransformer::forward_traced_with_save_tensor(&tokens, &plan)`.
5. Walks the output directory and prints every `*.bin` file with its size, plus the forward-pass success summary.

`.gguf` and `.safetensors` paths still print the stub for now. SHIP-007 PR-E live diagnostics convert GGUF→APR at the import boundary so the canonical 7B teacher bisection runs through this code path.

## Five Whys

1. **Why was this missing?** Per the SHIP-007 PR-A commit message, the clap surface was shipped first as a contract pin so `apr-cli-trace-save-tensor-v1.yaml::cli_signature` was bound at the binary boundary. Subsequent PRs (B/C-prep/C-step1/C-step2) wired the library-side machinery. The dispatch glue was the missing final hop, easy to overlook because the contract test `apr trace --save-tensor --help | grep save-tensor` passed all along.
2. **Why is `.apr` only?** `forward_traced_with_save_tensor` is a method on `AprTransformer`; GGUF inference goes through a different `OwnedQuantizedModel` path. Adding GGUF support means either porting the wrapper to that type or converting GGUF→APR at import. The latter is what SHIP-007 PR-E already plans, so it's not blocking.
3. **Why a fixed prompt instead of `--prompt`?** SHIP-007 bisection needs the SAME prompt across APR and GGUF runs to make `apr diff --values` byte comparison meaningful. Hardcoding to `"What is 2+2?"` matches the existing `run_traced_inference_apr` in `vector_stats.rs`. A future `--prompt` flag is a small follow-up.
4. **Why a new module instead of extending `trace.rs`?** `trace.rs` is 722 lines already; adding the save-tensor branch via a new module (52 lines + 4 unit tests) keeps the existing 4-format dispatch intact and isolates the wrapper-specific imports (`realizar::inference_trace::save_tensor_plan::SaveTensorPlan`).
5. **Why now?** Operator's standing /loop directive is "select next best recommended choice". With PR #1414 (step 2 LmHead) merged today and PR #1416 (refactor prep) auto-merge queued, the highest-leverage move is making the existing capture surface actually work end-to-end. SHIP-007 PR-E (the live bisection) is gated on `apr trace --save-tensor` producing files; that gate is what this PR opens.

## Test plan

- [x] `cargo test -p apr-cli --lib commands::trace_save_tensor` → 5/5 PASS
  - default_output_dir_uses_model_stem
  - default_output_dir_handles_bare_filename
  - default_output_dir_handles_no_extension
  - collect_bin_files_recurses_per_layer_subdirs
  - collect_bin_files_missing_dir_is_ok
- [x] `cargo check -p apr-cli --lib` clean
- [ ] Live smoke on canonical 7B teacher (operator-pre-authorized lambda-labs lane): `apr trace --payload /mnt/nvme-raid0/.../qwen2.5-coder-7b-instruct-q4k.apr --save-tensor embedding,lm_head` produces `<dir>/layer-0/embedding.bin` + `<dir>/lm_head.bin`. Deferred to a follow-up commit since the operator can run this any time after merge.
- [ ] CI required checks (`ci / gate`, `workspace-test`)

## Ship % update

- **MODEL-1**: ~68% → **~70%** (the wrapper surface that's been merged for 2 days is now actually invocable from the CLI, which means SHIP-007 PR-E live diagnostics are no longer blocked on this dispatch gap).
- **MODEL-2**: corpus tokenization at ~96M tokens / 119 min (steady ~14K tok/s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
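The two helpers named in the test plan (`default_output_dir` and `collect_bin_files`) are described only by their behaviour: a `<model-stem>-trace/` directory next to the input, and a recursive `*.bin` walk that tolerates a missing directory. A minimal std-only sketch of that behaviour follows. The signatures, the `(path, size)` return shape, the sorted output, and the `"model"` fallback name are assumptions for illustration, not the code merged in this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sketch: derive `<model-stem>-trace` next to the input model.
pub fn default_output_dir(model: &Path) -> PathBuf {
    // `file_stem` strips only the final extension. The "model" fallback for
    // paths without a file name is an assumption of this sketch.
    let stem = model
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_else(|| "model".to_string());
    let dir_name = format!("{}-trace", stem);
    match model.parent() {
        // For a bare filename the parent is the empty path, so the result
        // is just `<stem>-trace` relative to the current directory.
        Some(parent) => parent.join(dir_name),
        None => PathBuf::from(dir_name),
    }
}

/// Hypothetical sketch: collect every `*.bin` file under `dir`, recursing
/// into per-layer subdirectories. A missing directory yields an empty list.
pub fn collect_bin_files(dir: &Path) -> io::Result<Vec<(PathBuf, u64)>> {
    let mut found = Vec::new();
    if !dir.is_dir() {
        return Ok(found);
    }
    // Iterative depth-first walk with an explicit stack of directories.
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                pending.push(path);
            } else if path.extension().and_then(|e| e.to_str()) == Some("bin") {
                let size = entry.metadata()?.len();
                found.push((path, size));
            }
        }
    }
    // Sort so the printed listing is stable across runs and filesystems.
    found.sort();
    Ok(found)
}
```

Returning the size alongside each path lets the caller print the listing without a second metadata lookup per file.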
noahgift added a commit that referenced this pull request May 3, 2026
… records CLI dispatch wire-up PARTIAL discharge

Follow-up paperwork to PR #1417 (`apr trace --save-tensor` end-to-end dispatch for .apr files). Adds FALSIFY-APR-TRACE-SAVE-011 binding the new dispatch wire-up at PARTIAL_ALGORITHM_LEVEL with `binds_to: cli_signature`.

Before PR #1417, `apr trace --save-tensor` only printed a stub and never invoked `forward_traced_with_save_tensor`. The contract test `apr trace --save-tensor --help | grep save-tensor` (FALSIFY-001) was already passing at the binary-boundary level, but the dispatch glue was missing, leaving the Embedding + LmHead capture surface unreachable from the CLI for 2 days post-step-2 merge. FALSIFY-011 extends the existing `cli_signature` invariant from "the flag is recognized" to "the flag actually produces files".

## Five Whys

1. **Why a separate contract bump?** Avoids file-conflict with the in-flight refactor PR #1416 (which only touches `crates/aprender-serve/`). My contract change is isolated to `contracts/apr-cli-trace-save-tensor-v1.yaml`.
2. **Why `binds_to: cli_signature`?** PR #1417 doesn't change the byte format or determinism; it makes the CLI surface that the `cli_signature` equation already specified actually invocable. Same equation, expanded discharge level.
3. **Why PARTIAL_ALGORITHM_LEVEL?** The 5 unit tests cover path resolution (3) and recursive `*.bin` walking (2), which is algorithm-level. A live discharge against the canonical 7B teacher is operator-gated by post-merge smoke (~30s for a 7B forward + 2 file writes).
4. **Why bump v1.2.0 → v1.3.0?** Adding a new falsification test that binds an existing invariant is a minor schema change per semver. v1.0.0 → v1.1.0 → v1.2.0 → v1.3.0 records each step's discharge timeline:
   - v1.1.0 (PR #1413): apr_diff_values_compat → APRT-aware diff
   - v1.2.0 (PR #1415): byte_format → LmHead capture (step 2)
   - v1.3.0 (this PR): cli_signature → end-to-end dispatch
5. **Why now?** Records the algorithm-level discharge so when the operator runs the live smoke post-#1417-merge, the contract ledger doesn't lag the code. Same paperwork pattern as #1415 (which followed #1414).

## Verification

- `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors, 0 warnings

## Ship % update

- MODEL-1: ~70% (unchanged; pure paperwork, code is in PR #1417).
- MODEL-2: corpus tokenization at ~115M tokens / 143 min (steady ~14K tok/s; ~33h ETA total).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
… records CLI dispatch wire-up PARTIAL discharge (#1418)
## Summary

Closes the dispatch gap that left `apr trace --save-tensor <model>.apr` printing a stub message and never invoking the wrapper. The Embedding (PR #1408) + LmHead (PR #1414) capture surface is now actually reachable from the CLI.

## What changed

- `crates/apr-cli/src/commands/trace_save_tensor.rs` (~150 LOC):
  - `pub fn run_save_tensor_apr(path, stages, dir, layers)` builds a `SaveTensorPlan`, loads the APR model, encodes a fixed test prompt, calls `forward_traced_with_save_tensor`, walks the output dir, and prints every `*.bin` file + size + forward-pass summary
  - `default_output_dir(path)` → `<model-stem>-trace/` next to input
  - `collect_bin_files(dir)`: recursive `*.bin` walker
- `crates/apr-cli/src/commands/mod.rs`: register the new module behind `feature = "inference"`
- `crates/apr-cli/src/dispatch.rs`: when `--save-tensor` is set AND extension is `.apr`, route to the new function INSTEAD of the existing trace path. `.gguf`/`.safetensors` print a stub explaining they'll be wired in PR-E (post-import-conversion).

## Five Whys

1. The contract test `apr trace --save-tensor --help | grep save-tensor` passed all along.
2. `forward_traced_with_save_tensor` is a method on `AprTransformer`; GGUF goes through `OwnedQuantizedModel`. SHIP-007 PR-E already plans GGUF→APR conversion at import, so the bisection runs through this code path.
3. `--prompt` is a follow-up.
4. `trace.rs` is 722 lines; adding the branch in a 150-line module isolates `realizar::inference_trace::save_tensor_plan::SaveTensorPlan` imports.

## Test plan

- `cargo test -p apr-cli --lib commands::trace_save_tensor` → 5/5 PASS
- `cargo check -p apr-cli --lib` clean
- CI required checks (`ci / gate`, `workspace-test`)
- Live smoke: `<dir>/layer-0/embedding.bin` + `<dir>/lm_head.bin`

## Ship % update
🤖 Generated with Claude Code
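
The routing rule described for `dispatch.rs` (save-tensor path for `.apr`, stub for `.gguf`/`.safetensors`, existing trace path otherwise) can be sketched as a small pure function. This is an illustrative sketch only: the `SaveTensorRoute` enum, the `route_trace` name, the case-sensitive extension match, and the choice to send every other extension to the stub are assumptions, not the code in this PR.

```rust
use std::path::Path;

/// Hypothetical routing outcome for `apr trace` (names are illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum SaveTensorRoute {
    /// `--save-tensor` set and the model is `.apr`: run the save-tensor path.
    SaveTensorApr,
    /// `--save-tensor` set on any other format: print the stub message.
    Stub,
    /// `--save-tensor` not set: fall through to the existing trace path.
    ExistingTrace,
}

/// Decide which path a trace invocation takes. `save_tensor` stands in for
/// the raw `--save-tensor <stages>` value, e.g. `Some("embedding,lm_head")`.
pub fn route_trace(model: &Path, save_tensor: Option<&str>) -> SaveTensorRoute {
    if save_tensor.is_none() {
        return SaveTensorRoute::ExistingTrace;
    }
    // Assumption: the extension match is exact and case-sensitive.
    match model.extension().and_then(|e| e.to_str()) {
        Some("apr") => SaveTensorRoute::SaveTensorApr,
        // `.gguf`, `.safetensors`, and (by assumption) anything else.
        _ => SaveTensorRoute::Stub,
    }
}
```

Keeping the decision separate from the side effects makes the "flag is recognized" versus "flag actually produces files" distinction testable without loading a model.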