
feat(apr-cli): wire apr trace --save-tensor end-to-end for .apr files #1417

Merged
noahgift merged 1 commit into main from feat/apr-trace-save-tensor-dispatch-wire-up
May 3, 2026


noahgift commented May 3, 2026


Summary

Closes the dispatch gap that left apr trace --save-tensor <model>.apr printing a stub message and never invoking the wrapper. The Embedding (PR #1408) + LmHead (PR #1414) capture surface is now actually reachable from the CLI.

What changed

  • New crates/apr-cli/src/commands/trace_save_tensor.rs (~150 LOC):
    • pub fn run_save_tensor_apr(path, stages, dir, layers) builds a SaveTensorPlan, loads the APR model, encodes a fixed test prompt, calls forward_traced_with_save_tensor, walks the output dir, prints every *.bin file + size + forward-pass summary
    • default_output_dir(path) → <model-stem>-trace/ next to the input
    • collect_bin_files(dir): recursive *.bin walker
    • 5 unit tests covering model-stem derivation, bare filename, missing extension, layer-N subdir recursion, and a graceful empty result for a missing dir
  • crates/apr-cli/src/commands/mod.rs: register the new module behind feature = "inference"
  • crates/apr-cli/src/dispatch.rs: when --save-tensor is set AND extension is .apr, route to the new function INSTEAD of the existing trace path. .gguf/.safetensors print a stub explaining they'll be wired in PR-E (post-import-conversion).
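The two path helpers listed above can be sketched with only the standard library. This is a hypothetical reconstruction from the descriptions, not the code in trace_save_tensor.rs: the fallback stem "model" and the sorted `(path, size)` return type are assumptions of this sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// `<model-stem>-trace/` next to the input file.
fn default_output_dir(model: &Path) -> PathBuf {
    // file_stem() drops the final extension; "model" is a hypothetical fallback.
    let stem = model
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_else(|| "model".to_string());
    // For a bare filename, parent() is Some(""), so the result stays relative.
    let parent = model.parent().unwrap_or_else(|| Path::new(""));
    parent.join(format!("{stem}-trace"))
}

/// Recursively collect `*.bin` files with their sizes.
/// A missing directory yields an empty list instead of an error.
fn collect_bin_files(dir: &Path) -> io::Result<Vec<(PathBuf, u64)>> {
    let mut out = Vec::new();
    if !dir.exists() {
        return Ok(out);
    }
    // Explicit stack instead of recursion, so layer-N/ subdirs are visited too.
    let mut stack = vec![dir.to_path_buf()];
    while let Some(current) = stack.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            let meta = entry.metadata()?;
            if meta.is_dir() {
                stack.push(path);
            } else if path.extension().map_or(false, |e| e == "bin") {
                out.push((path, meta.len()));
            }
        }
    }
    out.sort(); // deterministic order for printing
    Ok(out)
}
```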

Five Whys

  1. Why was this missing? PR-A shipped clap surface as a contract pin; PR-B/B-prep/C-step1/C-step2 wired library machinery. The dispatch glue was easy to overlook because the contract test apr trace --save-tensor --help | grep save-tensor passed all along.
  2. Why .apr only? forward_traced_with_save_tensor is a method on AprTransformer; GGUF goes through OwnedQuantizedModel. SHIP-007 PR-E already plans GGUF→APR conversion at import, so the bisection will run through this code path.
  3. Why fixed prompt? SHIP-007 bisection needs the SAME prompt across APR and GGUF runs for byte comparison. --prompt is a follow-up.
  4. Why new module? trace.rs is already 722 lines; putting the save-tensor branch in a separate ~150-line module keeps the existing dispatch intact and isolates the realizar::inference_trace::save_tensor_plan::SaveTensorPlan imports.
  5. Why now? SHIP-007 PR-E live diagnostics were silently blocked on this gap. With #1414 (SHIP-007 PR-C-real step 2: LmHead capture in the forward_traced wrapper) merged and #1416 (maybe_save_stage helper refactor) queued for auto-merge, opening this gate is the highest-leverage next move.

Test plan

  • cargo test -p apr-cli --lib commands::trace_save_tensor → 5/5 PASS
  • cargo check -p apr-cli --lib clean
  • CI required checks (ci / gate, workspace-test)
  • Live smoke on canonical 7B teacher (deferred to operator post-merge):
    apr trace --payload /mnt/nvme-raid0/.../qwen2.5-coder-7b-instruct-q4k.apr \
      --save-tensor embedding,lm_head
    
    expected: <dir>/layer-0/embedding.bin + <dir>/lm_head.bin
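A hypothetical helper for that post-merge smoke could check the expected layout directly. The function name and the requirement that each file be non-empty are assumptions of this sketch, not part of apr-cli.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical smoke check: which expected capture files are missing or empty
/// under the trace output directory?
fn missing_expected_files(dir: &Path) -> Vec<PathBuf> {
    // Expected layout from the test plan: per-layer embedding + top-level lm_head.
    let expected = vec![
        dir.join("layer-0").join("embedding.bin"),
        dir.join("lm_head.bin"),
    ];
    expected
        .into_iter()
        // Keep a path if it cannot be stat'ed, is not a regular file, or is empty.
        .filter(|p| p.metadata().map(|m| !m.is_file() || m.len() == 0).unwrap_or(true))
        .collect()
}
```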

Ship % update

  • MODEL-1: ~68% → ~70% — the wrapper surface that's been merged for 2 days is now actually invocable from the CLI; SHIP-007 PR-E live diagnostics are no longer blocked on this dispatch gap.
  • MODEL-2: tokenization steady (~96M tokens / 119 min).

🤖 Generated with Claude Code

Before this PR, `apr trace --save-tensor <model>.apr` printed a stub
message and never invoked the underlying wrapper — so the existing
Embedding (PR #1408 step 1) and LmHead (PR #1414 step 2) capture
surface was UNREACHABLE from the CLI. The wrapper, plan-builder, and
APRT byte-format machinery had all merged but produced no files.

This PR closes the dispatch gap. When `apr trace --save-tensor
<stages>` is invoked on a `.apr` model, dispatch.rs now routes to a
new `commands::trace_save_tensor::run_save_tensor_apr` function that:

1. Builds a `SaveTensorPlan` from `--save-tensor`/`--save-tensor-dir`/
   `--save-tensor-layers`. Default output dir is `<model-stem>-trace/`
   next to the input.
2. Loads the APR model + embedded BPE tokenizer.
3. Encodes a fixed test prompt (`"What is 2+2?"` — same as
   `vector_stats.rs::run_traced_inference_apr` for consistency).
4. Calls `AprTransformer::forward_traced_with_save_tensor(&tokens,
   &plan)`.
5. Walks the output directory and prints every `*.bin` file with its
   size, plus the forward-pass success summary.
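Step 1 can be pictured as parsing the three flags into a plan. The sketch below uses hypothetical `Stage`/`PlanSketch` stand-ins rather than realizar's actual `SaveTensorPlan`, whose constructor is not shown in this PR; the comma-separated `--save-tensor-layers` syntax is likewise an assumption.

```rust
use std::path::PathBuf;

/// Hypothetical stand-ins for the real SaveTensorPlan types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Embedding,
    LmHead,
}

#[derive(Debug, PartialEq)]
struct PlanSketch {
    stages: Vec<Stage>,
    dir: PathBuf,
    layers: Option<Vec<usize>>, // None = all layers (assumption)
}

fn parse_stage(name: &str) -> Result<Stage, String> {
    match name.trim() {
        "embedding" => Ok(Stage::Embedding),
        "lm_head" => Ok(Stage::LmHead),
        other => Err(format!("unknown stage: {other}")),
    }
}

/// Build a plan from raw `--save-tensor`, `--save-tensor-dir`, and
/// `--save-tensor-layers` values; falls back to `default_dir` if no dir is given.
fn build_plan(
    stages: &str,
    dir: Option<PathBuf>,
    layers: Option<&str>,
    default_dir: PathBuf,
) -> Result<PlanSketch, String> {
    let stages = stages
        .split(',')
        .filter(|s| !s.trim().is_empty())
        .map(parse_stage)
        .collect::<Result<Vec<_>, _>>()?;
    if stages.is_empty() {
        return Err("--save-tensor needs at least one stage".to_string());
    }
    // Assumed layer syntax: comma-separated indices such as "0,1,27".
    let layers = layers
        .map(|spec| {
            spec.split(',')
                .map(|n| {
                    n.trim()
                        .parse::<usize>()
                        .map_err(|e| format!("bad layer {n:?}: {e}"))
                })
                .collect::<Result<Vec<_>, _>>()
        })
        .transpose()?;
    Ok(PlanSketch { stages, dir: dir.unwrap_or(default_dir), layers })
}
```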

`.gguf` and `.safetensors` paths still print the stub for now. This is
not blocking: SHIP-007 PR-E plans to convert GGUF→APR at the import
boundary, so the canonical 7B teacher bisection will run through this
`.apr` code path.
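The routing rule (`.apr` goes to the new function, other formats keep the stub, and no `--save-tensor` leaves the existing trace path untouched) reduces to an extension match. A minimal sketch with illustrative names, not the actual dispatch.rs code; treating every non-`.apr` extension as a stub and matching case-insensitively are assumptions.

```rust
use std::path::Path;

/// Hypothetical routing outcome for `apr trace`.
#[derive(Debug, PartialEq, Eq)]
enum TraceRoute {
    SaveTensorApr,          // new run_save_tensor_apr path
    SaveTensorStub(String), // e.g. .gguf / .safetensors: stub until PR-E
    ExistingTrace,          // --save-tensor absent: unchanged trace path
}

fn route_trace(model: &Path, save_tensor: Option<&str>) -> TraceRoute {
    if save_tensor.is_none() {
        return TraceRoute::ExistingTrace;
    }
    // Lowercased extension; empty string when the path has none.
    let ext = model
        .extension()
        .map(|e| e.to_string_lossy().to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "apr" => TraceRoute::SaveTensorApr,
        other => TraceRoute::SaveTensorStub(other.to_string()),
    }
}
```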

## Five Whys

1. **Why was this missing?** Per the SHIP-007 PR-A commit message,
   the clap surface was shipped first as a contract pin so
   `apr-cli-trace-save-tensor-v1.yaml::cli_signature` was bound at the
   binary boundary. Subsequent PRs (B/C-prep/C-step1/C-step2) wired
   the library-side machinery. The dispatch glue was the missing
   final hop — easy to overlook because the contract test
   `apr trace --save-tensor --help | grep save-tensor` passed all along.
2. **Why is `.apr` only?** `forward_traced_with_save_tensor` is a
   method on `AprTransformer`; GGUF inference goes through a
   different `OwnedQuantizedModel` path. Adding GGUF support means
   either porting the wrapper to that type or converting GGUF→APR at
   import — the latter is what SHIP-007 PR-E already plans, so it's
   not blocking.
3. **Why a fixed prompt instead of `--prompt`?** SHIP-007 bisection
   needs the SAME prompt across APR and GGUF runs to make
   `apr diff --values` byte comparison meaningful. Hardcoding to
   `"What is 2+2?"` matches the existing `run_traced_inference_apr`
   in `vector_stats.rs`. A future `--prompt` flag is a small follow-up.
4. **Why a new module instead of extending `trace.rs`?** `trace.rs`
   is 722 lines already; adding the save-tensor branch via a new
   module (52 lines + 5 unit tests) keeps the existing 4-format
   dispatch intact and isolates the wrapper-specific imports
   (`realizar::inference_trace::save_tensor_plan::SaveTensorPlan`).
5. **Why now?** Operator's standing /loop directive is "select next
   best recommended choice". With PR #1414 (step 2 LmHead) merged
   today and PR #1416 (refactor prep) auto-merge queued, the
   highest-leverage move is making the existing capture surface
   actually work end-to-end. SHIP-007 PR-E (the live bisection) is
   gated on `apr trace --save-tensor` producing files; that gate is
   what this PR opens.
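Item 3's insistence on an identical prompt is easiest to see with the crudest possible comparison of two captured dumps: a raw byte diff that reports the first mismatching offset. This is a hypothetical illustration only; `apr diff --values` is APRT-aware and its algorithm is not shown in this PR. With different prompts the activations differ everywhere, so a mismatch would carry no bisection signal.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Offset of the first differing byte, or None if the buffers are identical.
/// A length difference counts as a mismatch at the shorter length.
fn first_byte_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    if let Some(i) = a.iter().zip(b).position(|(x, y)| x != y) {
        return Some(i);
    }
    (a.len() != b.len()).then(|| a.len().min(b.len()))
}

/// Hypothetical wrapper: compare two `.bin` captures (e.g. APR vs converted-GGUF runs).
fn compare_captures(lhs: &Path, rhs: &Path) -> io::Result<Option<usize>> {
    Ok(first_byte_mismatch(&fs::read(lhs)?, &fs::read(rhs)?))
}
```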

## Test plan

- [x] `cargo test -p apr-cli --lib commands::trace_save_tensor` →
  5/5 PASS
  - default_output_dir_uses_model_stem
  - default_output_dir_handles_bare_filename
  - default_output_dir_handles_no_extension
  - collect_bin_files_recurses_per_layer_subdirs
  - collect_bin_files_missing_dir_is_ok
- [x] `cargo check -p apr-cli --lib` clean
- [ ] Live smoke on canonical 7B teacher (operator-pre-authorized
  lambda-labs lane): `apr trace --payload
  /mnt/nvme-raid0/.../qwen2.5-coder-7b-instruct-q4k.apr
  --save-tensor embedding,lm_head` produces
  `<dir>/layer-0/embedding.bin` + `<dir>/lm_head.bin`. Deferred to a
  follow-up commit since the operator can run this any time after
  merge.
- [ ] CI required checks (`ci / gate`, `workspace-test`)

## Ship % update

- **MODEL-1**: ~68% → **~70%** (the wrapper surface that's been
  merged for 2 days is now actually invocable from the CLI, which
  means SHIP-007 PR-E live diagnostics are no longer blocked on
  this dispatch gap).
- **MODEL-2**: corpus tokenization at ~96M tokens / 119 min
  (steady ~14K tok/s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift enabled auto-merge (squash) May 3, 2026 09:34
noahgift merged commit 420eabc into main May 3, 2026
11 checks passed
noahgift deleted the feat/apr-trace-save-tensor-dispatch-wire-up branch May 3, 2026 09:57
noahgift added a commit that referenced this pull request May 3, 2026
… records CLI dispatch wire-up PARTIAL discharge

Follow-up paperwork to PR #1417 (`apr trace --save-tensor` end-to-end
dispatch for .apr files). Adds FALSIFY-APR-TRACE-SAVE-011 binding the
new dispatch wire-up at PARTIAL_ALGORITHM_LEVEL with `binds_to:
cli_signature`.

Before PR #1417, `apr trace --save-tensor` only printed a stub and
never invoked `forward_traced_with_save_tensor`. The contract test
`apr trace --save-tensor --help | grep save-tensor` (FALSIFY-001) was
already passing at the binary-boundary level — but the dispatch glue
was missing, leaving Embedding + LmHead capture surface unreachable
from the CLI for 2 days post-step-2 merge.

FALSIFY-011 extends the existing `cli_signature` invariant from
"the flag is recognized" to "the flag actually produces files".

## Five Whys

1. **Why a separate contract bump?** Avoids file-conflict with the
   in-flight refactor PR #1416 (which only touches
   `crates/aprender-serve/`). My contract change is isolated to
   `contracts/apr-cli-trace-save-tensor-v1.yaml`.
2. **Why `binds_to: cli_signature`?** PR #1417 doesn't change the
   byte format or determinism — it makes the CLI surface that the
   `cli_signature` equation already specified actually invocable.
   Same equation, expanded discharge level.
3. **Why PARTIAL_ALGORITHM_LEVEL?** The 5 unit tests cover path
   resolution (3) and recursive *.bin walking (2) — algorithm-level.
   A live discharge against the canonical 7B teacher is operator-
   gated by post-merge smoke (~30s for a 7B forward + 2 file writes).
4. **Why bump v1.2.0 → v1.3.0?** Adding a new falsification test
   that binds an existing invariant is a minor schema change per
   semver. v1.0.0 → v1.1.0 → v1.2.0 → v1.3.0 records each step's
   discharge timeline:
     - v1.1.0 (PR #1413): apr_diff_values_compat → APRT-aware diff
     - v1.2.0 (PR #1415): byte_format → LmHead capture (step 2)
     - v1.3.0 (this PR): cli_signature → end-to-end dispatch
5. **Why now?** Records the algorithm-level discharge so when the
   operator runs the live smoke post-#1417-merge, the contract
   ledger doesn't lag the code. Same paperwork pattern as #1415
   (which followed #1414).

## Verification

- `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` →
  0 errors, 0 warnings

## Ship % update

- MODEL-1: ~70% (unchanged — pure paperwork; code is in PR #1417).
- MODEL-2: corpus tokenization at ~115M tokens / 143 min (steady
  ~14K tok/s; ~33h ETA total).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
… records CLI dispatch wire-up PARTIAL discharge (#1418)
