feat(aprender-serve): SHIP-007 PR-B — `SaveTensorPlan` plan-builder by noahgift · Pull Request #1406 · paiml/aprender

noahgift · 2026-05-02T11:20:27Z

Summary

Add inference_trace::save_tensor_plan that converts the three apr trace --save-tensor* argument strings into a validated, ready-to-execute plan exposing should_save(stage, layer) and stage_path(stage, layer).

Pure module — no I/O, no transformer state, no apr-cli coupling. PR-C will thread the plan through AprTransformer::forward_traced as a side-channel.
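The two queries can be pictured with a minimal sketch. Every name here is hypothetical (`PlanSketch`, its fields, the `layer-N/` directory segment, the `.bin` extension); the only behaviour taken from this PR is that whole-model stages (`final_norm`, `lm_head`) ignore the layer range and skip the layer segment in their path.

```rust
use std::ops::Range;
use std::path::PathBuf;

/// Hypothetical stand-in for `SaveTensorPlan`; field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PlanSketch {
    stages: Vec<String>,
    dir: PathBuf,
    layers: Range<usize>, // END exclusive, as in Rust's `START..END`
}

/// Stages the PR describes as whole-model: they ignore the layer range.
fn is_whole_model(stage: &str) -> bool {
    matches!(stage, "final_norm" | "lm_head")
}

impl PlanSketch {
    /// True when the stage was selected and, for per-layer stages,
    /// the layer falls inside the configured range.
    fn should_save(&self, stage: &str, layer: usize) -> bool {
        let selected = self.stages.iter().any(|s| s == stage);
        selected && (is_whole_model(stage) || self.layers.contains(&layer))
    }

    /// Pure path computation; nothing is created on disk.
    /// The `layer-N` segment and `.bin` extension are assumed names.
    fn stage_path(&self, stage: &str, layer: usize) -> PathBuf {
        if is_whole_model(stage) {
            self.dir.join(format!("{stage}.bin"))
        } else {
            self.dir
                .join(format!("layer-{layer}"))
                .join(format!("{stage}.bin"))
        }
    }
}

fn main() {
    let plan = PlanSketch {
        stages: vec!["qkv_matmul".to_string(), "lm_head".to_string()],
        dir: PathBuf::from("out"),
        layers: 0..1,
    };
    for layer in 0..2 {
        println!("qkv_matmul@{layer}: {}", plan.should_save("qkv_matmul", layer));
    }
    println!("{}", plan.stage_path("qkv_matmul", 0).display());
    println!("{}", plan.stage_path("lm_head", 0).display());
}
```

Because both methods are pure, a caller can validate arguments and predict every output path before any model weights are loaded.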

SHIP-007 cascade position

This is PR-B of a 4-PR cascade authorized by the operator on 2026-05-02:

| PR | Status | Scope |
|----|--------|-------|
| A (#1405) | OPEN | clap surface + dispatch stub |
| B (this PR) | OPEN | plan-builder (validates strings → struct) |
| C | TBD | wire plan into `forward_traced` + `write_stage_file` |
| D | TBD | `apr diff --stage` consuming saved tensors |

Independence from PR-A

PR-B does NOT modify apr-cli. The branch is off origin/main (which does not contain PR-A's clap fields). PR-A and PR-B can land in either order; PR-C will rebase off whichever lands first and add the dispatch wiring.

Five Whys

Why split plan-builder from forward-pass wiring? Argument validation must fail BEFORE multi-second model load.
Why pure (no I/O)? ensure_layer_dir/file writes are stateful side effects best owned by PR-C's writer. Tests stay fast: 28 cases in 0.00s.
Why 0..1 default? Matches PR-A clap default. SHIP-007 root cause pinned at layer-0 qkv matmul.
Why all keyword? Contract §invariants line 63 specifies multiple stages per run; all is the obvious shorthand for the 18-stage exhaustive bisection.
Why now? Operator directive 2026-05-02 — "Path forward to ship MODEL-1 is SHIP-007 layer-0 stage diff (extend apr trace --save-tensor, multi-PR work)".
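As a rough illustration of the `all` shorthand and the comma-list rules, the sketch below parses a stage argument. `parse_stages` and `KNOWN_STAGES` are hypothetical names, and the stage list is only a subset of names mentioned in this thread; the real 18-stage list is not reproduced here.

```rust
/// Illustrative subset of stage names mentioned in this PR thread.
/// ASSUMPTION: the real list is longer and is not enumerated here.
const KNOWN_STAGES: &[&str] = &["embedding", "qkv_matmul", "ffn_gate", "final_norm", "lm_head"];

/// Hypothetical parser for the `--save-tensor <STAGES>` argument.
fn parse_stages(arg: &str) -> Result<Vec<String>, String> {
    // `all` is matched case-insensitively and expands to every known stage.
    if arg.trim().eq_ignore_ascii_case("all") {
        return Ok(KNOWN_STAGES.iter().map(|s| s.to_string()).collect());
    }
    let mut out = Vec::new();
    for token in arg.split(',') {
        let name = token.trim(); // whitespace around tokens is tolerated
        if name.is_empty() {
            return Err(format!("empty stage token in {arg:?}"));
        }
        if !KNOWN_STAGES.contains(&name) {
            return Err(format!("unknown stage {name:?}"));
        }
        out.push(name.to_string()); // duplicates are kept, not merged
    }
    Ok(out)
}

fn main() {
    println!("{:?}", parse_stages("ALL"));
    println!("{:?}", parse_stages(" qkv_matmul , ffn_gate "));
    println!("{:?}", parse_stages("qkv_matmul,,ffn_gate"));
}
```

Rejecting empty and unknown tokens at parse time is what lets a bad argument fail before the model load begins.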

Test coverage (28 cases, all pass)

Provenance pin (1)
Pass band (6): realistic 3-stage, all expansion, case-insensitive all, whitespace tolerance in stages + range, wide range
Fail band (7): unknown stage, empty token, malformed range, negative start, END=START, END<START, garbage END
should_save (5): in-range, out-of-range, unselected, whole-model bypass, default 0..1
stage_path (4): per-layer 0, per-layer 3, whole-model skips layer segment, sentinel layer
Edge (5): layer_output alias, duplicate preservation, min range, high layer index, Clone+Eq
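The range cases in the pass and fail bands can be reproduced with a small parser sketch. `parse_layer_range` is a hypothetical name and the error strings are invented for illustration; the accepted syntax (Rust-style `START..END`, END exclusive, surrounding whitespace tolerated) follows the description in this PR.

```rust
use std::ops::Range;

/// Hypothetical parser for `--save-tensor-layers <RANGE>`.
fn parse_layer_range(s: &str) -> Result<Range<usize>, String> {
    let (start, end) = s
        .split_once("..")
        .ok_or_else(|| format!("malformed range {s:?}: expected START..END"))?;
    // Parsing as usize rejects negative numbers and non-numeric text alike.
    let start = start
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("bad START in {s:?}: {e}"))?;
    let end = end
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("bad END in {s:?}: {e}"))?;
    // END is exclusive, so END == START would select no layers at all.
    if end <= start {
        return Err(format!("empty range {start}..{end}: END must exceed START"));
    }
    Ok(start..end)
}

fn main() {
    for arg in ["0..1", " 2 .. 5 ", "3", "-1..2", "2..2", "5..2", "0..x"] {
        println!("{arg:?} -> {:?}", parse_layer_range(arg));
    }
}
```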

Ship % update

MODEL-1: ~64% (unchanged — plan-builder is preparatory; element-wise diff still pending PR-C/D)
MODEL-2: blocked on Stack v2 corpus access (operator action — accept terms at https://huggingface.co/datasets/bigcode/starcoderdata)

Test plan

cargo check -p aprender-serve --lib clean
cargo test -p aprender-serve --lib save_tensor_plan — 28/28 pass
cargo clippy -p aprender-serve --lib -- -D warnings clean (test-target clippy errors are pre-existing in unrelated files)
cargo fmt --check clean
CI green (gate + workspace-test)

🤖 Generated with Claude Code

Add `inference_trace::save_tensor_plan` (`SaveTensorPlan` + `PlanParseError`) that converts the three `apr trace --save-tensor*` argument strings into a validated, ready-to-execute plan:

- `--save-tensor <STAGES>` comma-list, or the literal `all`
- `--save-tensor-dir <DIR>` output root
- `--save-tensor-layers <RANGE>` Rust `START..END`, END exclusive

The plan exposes `should_save(stage, layer)` and `stage_path(stage, layer)` queries. Per-layer stages obey the layer range; whole-model stages (`final_norm`, `lm_head`) ignore the range and are saved iff selected.

Pure module: no I/O, no transformer state, no `apr-cli` coupling. PR-C threads the plan into `AprTransformer::forward_traced` as a side-channel.

28 unit tests covering: provenance pin, pass band (whitespace, `all`, `ALL`/`aLL`, wide ranges), fail band (unknown stage, empty token, malformed range, negative start, END<=START, garbage END), should_save semantics (in-range, out-of-range, unselected, whole-model bypass, default `0..1`), stage_path semantics (per-layer 0/3, whole-model with sentinel), edge cases (`layer_output` alias, duplicate preservation, min range, high layer index, Clone+Eq).

## Five Whys

1. Why split plan-builder from forward-pass wiring? Argument validation should fail BEFORE multi-second model load; the only way to keep the error surface tight is to parse eagerly. Splitting also lets PR-C touch `forward_traced` internals without touching argument parsing.
2. Why pure (no I/O)? `ensure_layer_dir`/file open are stateful side effects best owned by the writer (PR-C). Tests stay fast and deterministic; `cargo test --lib` runs the 28 cases in 0.00s.
3. Why `0..1` default? Matches the clap default in PR-A (#1405). SHIP-007 root cause is pinned at layer-0 qkv matmul; default keeps disk-write blast radius bounded; users opt in to wider ranges.
4. Why `all` keyword? Contract §invariants line 63 specifies "comma-delimited; multiple stages MAY be saved in one run". `all` is the obvious shorthand for the 18-stage exhaustive bisection the SHIP-007 hypothesis chain (§24-§32) is heading toward.
5. Why now in the SHIP-TWO loop? Operator directive 2026-05-02: "Path forward to ship MODEL-1 is SHIP-007 layer-0 stage diff (extend apr trace --save-tensor, multi-PR work)". PR-B is step 2/4.

## Independence from PR-A

PR-B does NOT modify `apr-cli`. The branch is off `origin/main` (which does not yet contain PR-A's clap fields). PR-A and PR-B can land in either order; PR-C will rebase off whichever is already in main and introduce the dispatch wiring.

## Ship % update

MODEL-1: ~64% (unchanged: plan-builder is preparatory; element-wise diff still pending PR-C/D forward_traced wiring).
MODEL-2: blocked on Stack v2 corpus access (operator action).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…tion (#1407)

Add 6 integration tests in `tests/save_tensor_plan_integration.rs` that drive `SaveTensorPlan` (PR-B, #1406) end-to-end through the public `save_tensor::*` writer + `save_tensor_paths::*` directory helpers. These pin the plan ↔ writer contract from the OUTSIDE so the upcoming `forward_traced` wiring (PR-C-real) has a stable target.

The tests mirror the future dispatch flow: build plan from CLI strings → iterate forward-pass-style `(stage, layer, data)` tuples → assert files appear exactly where the plan predicted, and only for selected stages.

## Tests (6/6 pass)

| Test | What it pins |
|------|-------------|
| `plan_three_stages_layer_zero_writes_three_files` | multi-stage selection + skip unselected |
| `plan_layer_range_filter_excludes_out_of_range` | END-exclusive range honoured end-to-end |
| `plan_whole_model_stage_writes_to_root_not_layer_dir` | final_norm/lm_head bypass layer-N segment |
| `plan_unselected_stage_produces_no_file` | should_save==false → zero I/O |
| `plan_byte_determinism_across_two_runs` | FALSIFY-APR-TRACE-SAVE-002 at integration boundary |
| `plan_all_keyword_writes_18_per_layer_files_for_one_layer` | `all` keyword → 18 stages routed correctly |

## Five Whys

1. **Why integration tests now, not in PR-B?** PR-B unit tests verify `should_save`/`stage_path` in isolation. These integration tests verify the plan ACTUALLY drives the writer correctly when used the way `forward_traced` will use it.
2. **Why ship before PR-C-real?** Locks down the plan ↔ writer contract so PR-C-real (forward_traced surgery) only has to plumb the call sites; if a future PR breaks the contract, these tests fail BEFORE the model loads.
3. **Why a separate file from `save_tensor_integration.rs`?** That file exercises the writer in isolation; this file exercises the plan-driven flow. Mixing would obscure which level of the contract regressed when a test fails.
4. **Why include the `all` keyword test?** SHIP-007 layer-0 stage diff may run `--save-tensor all --save-tensor-layers 0..1` for an exhaustive bisection; routing 18 stages through the plan is the single most stressed path and deserves its own pin.
5. **Why now in the SHIP-TWO loop?** Ship-cadence move that closes the contract surface PR-C-real depends on, while PR-A (#1405) and PR-B (#1406) CI is still in flight. Stack-friendly: branched off PR-B, merges cleanly when both land.

## Stack relationship

This PR depends on PR-B (#1406): SaveTensorPlan must be in the tree. Branched off `feat/ship-007-pr-b-save-tensor-plan`. GitHub will set PR-B as the base; once PR-B merges to main, this PR auto-rebases.

## Ship % update

MODEL-1: ~64% (unchanged: integration tests are preparatory; the forward_traced surgery in PR-C-real and the `apr trace --save-tensor` dispatch wiring in PR-D are still pending).
MODEL-2: blocked on Stack v2 corpus access (operator action).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…h_save_tensor` wrapper (#1408)

* feat(aprender-serve): SHIP-007 PR-B — SaveTensorPlan plan-builder

* test(aprender-serve): SHIP-007 PR-C-prep — SaveTensorPlan I/O integration (#1407)

* feat(aprender-serve): SHIP-007 PR-C-real step 1 — `forward_traced_with_save_tensor` wrapper

Add public method `AprTransformer::forward_traced_with_save_tensor(tokens, plan)` that delegates to existing `forward_traced` and additionally writes the **embedding** stage to disk if the supplied [`SaveTensorPlan`] (PR-B, #1406) selects it.

This is the first of several SHIP-007 forward_traced wiring steps. Step 1 covers only the embedding stage, the one stage that can be re-extracted by calling `self.embed(token_ids)` a second time, so we ship working save-tensor I/O without modifying the 360-line `forward_traced` body. Subsequent steps will thread `Option<&SaveTensorPlan>` through `forward_traced` itself so per-layer stages (qkv_matmul, ffn_gate, …) emit during the single forward pass without re-runs.

## Five Whys

1. **Why a wrapper instead of modifying forward_traced directly?** Keeps the high-risk forward-pass surgery out of step 1. The wrapper compiles in 4.5 s and connects cleanly to `forward_traced`'s existing return value; modifying the 360-line body would be a large, risky diff for a single ship-cadence iteration.
2. **Why only the embedding stage?** Embedding is the only stage that can be cheaply re-extracted post-hoc via `self.embed()` (token-table lookup, no matmuls). Other stages live inside `forward_traced` and require threading the plan through to capture.
3. **Why error-typed as `RealizarError::IoError`?** The existing project error type already has an `IoError { message }` variant; using it keeps the `Result<ForwardTrace>` signature compatible with `forward_traced` so callers don't need a `From<>` impl.
4. **Why no integration test on a real model in this PR?** Building a live `AprTransformer` requires loading a real .apr file, which is appropriate for PR-D (`apr trace --save-tensor` on canonical 7B teacher), not for a step-1 unit-level wrapper. The plan ↔ writer contract is already pinned by PR-C-prep (#1407).
5. **Why now in the SHIP-TWO loop?** Operator directive 2026-05-02: "Path forward to ship MODEL-1 is SHIP-007 layer-0 stage diff (extend apr trace --save-tensor, multi-PR work)". Step 1 establishes the public method that PR-D's dispatch will call.

## Stack relationship

This PR depends on PR-B (#1406): `SaveTensorPlan` must be in the tree. Branched off `feat/ship-007-pr-b-save-tensor-plan`. GitHub will auto-rebase once PR-B merges to main.

## Ship % update

MODEL-1: ~64% (unchanged: wrapper is preparatory; effective tensor capture for any layer-0 stage other than embedding still requires follow-up PRs threading the plan through forward_traced).
MODEL-2: blocked on Stack v2 corpus access (operator action).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(aprender-serve): SHIP-007 PR-C-real-step1 — clippy doc list overindentation

CI lint failed with `error: doc list item overindented` on the "## Role in the cascade" bulleted list (4 occurrences). The continuation lines inside the **This file** bullet were indented 25 spaces; clippy expects 2-space continuation indent.

Reformat the bullet so the continuation text uses 2-space indent and the file body stays inside one bullet. Also update PR-B/PR-B-prep status from OPEN → MERGED (both landed this session).

No code changes, doc-only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…age tensors (#1413)

Closes the `apr_diff_values_compat` invariant of `apr-cli-trace-save-tensor-v1` at PARTIAL_ALGORITHM_LEVEL via a new `diff_05_aprt_stage.rs` include slot.

When both inputs to `apr diff --values` start with magic bytes `APRT` (the 12-byte header written by `apr trace --save-tensor`), the dispatch now bypasses the RosettaStone whole-model walker and runs an element-wise stage-tensor diff:

- max|diff| with index
- RMS diff
- Cosine similarity (f64-accumulated for numerical stability)
- Top-K divergences sorted by |a - b|

Both JSON and pretty text output are supported. Mismatched dim_product or layer fields fail-fast with a diagnostic error so callers don't silently compare incompatible stages.

## Five Whys (why now, why this scope)

1. **Why is this needed?** `apr trace --save-tensor` (PR-A #1405, PR-B #1406, PR-C-prep #1407) writes per-stage f32 tensors as `APRT`-prefixed files. Without an APRT-aware diff, layer-0 stage-by-stage element-wise bisection per `feedback_model_1_ships_gpu_only.md` is gated on external tooling, which is exactly the kind of muda the APR-MONO §26.8 rule forbids.
2. **Why extend `apr diff` and not write a new subcommand?** The `apr_diff_values_compat` invariant in `apr-cli-trace-save-tensor-v1` already names `apr diff --values` as the verifier. Extending the existing flag keeps the contract surface stable.
3. **Why an include!() file instead of inlining into diff.rs?** diff.rs already follows that pattern (diff_accumulator, diff_output_json_text, diff_04). Keeping APRT logic in `diff_05_aprt_stage.rs` lets it be audited / removed independently and doesn't grow the parent file.
4. **Why pin via `provenance_pin_pr_d_rev1`?** Future renames of either `is_aprt_stage_file` or the file path break the include!() chain; the pin makes that visible at test-time and forces a contract bump.
5. **Why now?** Tokenization of the 27 GB Stack v1.2 Python corpus is running in the background for MODEL-2 (PR #1412 merged). The SHIP-007 PR-C-real cascade for MODEL-1 needs PR-D infrastructure ready when step 2 (forward_traced threading) lands. PR-D is independent and can merge in parallel with #1408.

## Verification

- `cargo test -p apr-cli --lib commands::diff::aprt` → 11/11 PASS
  - is_aprt_stage_file: detects/rejects/truncated/missing (4 tests)
  - compute_aprt_stage_stats: identical=zero, known max/RMS, top-K sort (3)
  - run_aprt_stage_diff: dim/layer mismatch errors, identical succeeds (3)
  - provenance_pin_pr_d_rev1 (1)
- `cargo clippy -p apr-cli --lib --no-deps -- -D warnings` clean
- `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors

## Contract update

`apr-cli-trace-save-tensor-v1` v1.0.0 → v1.1.0:

- New FALSIFY-APR-TRACE-SAVE-009 binding `apr_diff_values_compat` at PARTIAL_ALGORITHM_LEVEL with 4-line `algorithm_evidence` block citing this PR's unit tests.

## Ship % update

MODEL-1: ~64% → ~66% (PR-D is small but discharges 1 PARTIAL invariant and clears infrastructure blocker for SHIP-007 step E).
MODEL-2: corpus tokenization in progress (~33h ETA).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
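The four statistics listed in this commit message can be sketched over two equal-length `f32` slices. This is an illustration under stated assumptions, not the code in `diff_05_aprt_stage.rs`: `StageStats`, `stage_stats`, the `Option` return and the zero-norm cosine convention are all hypothetical, and APRT header parsing is left out entirely.

```rust
/// Hypothetical result type; field names are illustrative only.
#[derive(Debug)]
struct StageStats {
    max_abs_diff: f32,
    max_index: usize,
    rms_diff: f64,
    cosine: f64,
    top_k: Vec<(usize, f32)>, // (index, |a - b|), largest first
}

/// Element-wise comparison of two tensors flattened to slices.
/// Returns None on length mismatch or empty input (an assumed convention).
fn stage_stats(a: &[f32], b: &[f32], k: usize) -> Option<StageStats> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    // Sums are accumulated in f64 to limit rounding error on long tensors.
    let (mut sq, mut dot, mut na, mut nb) = (0.0_f64, 0.0_f64, 0.0_f64, 0.0_f64);
    let mut diffs: Vec<(usize, f32)> = Vec::with_capacity(a.len());
    for (i, (&x, &y)) in a.iter().zip(b).enumerate() {
        diffs.push((i, (x - y).abs()));
        let (xf, yf) = (f64::from(x), f64::from(y));
        sq += (xf - yf).powi(2);
        dot += xf * yf;
        na += xf * xf;
        nb += yf * yf;
    }
    // Stable sort, descending by |a - b|; ties keep the lower index first.
    diffs.sort_by(|l, r| r.1.total_cmp(&l.1));
    let (max_index, max_abs_diff) = diffs[0];
    diffs.truncate(k);
    let denom = (na * nb).sqrt();
    Some(StageStats {
        max_abs_diff,
        max_index,
        rms_diff: (sq / a.len() as f64).sqrt(),
        // Cosine is undefined for a zero vector; 0.0 is an arbitrary choice here.
        cosine: if denom > 0.0 { dot / denom } else { 0.0 },
        top_k: diffs,
    })
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0];
    let b = [1.0_f32, 2.5, 1.0];
    println!("{:?}", stage_stats(&a, &b, 2));
}
```

Sorting the full difference vector keeps the sketch short; for very large tensors a bounded heap of size K would avoid the O(n log n) sort.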

noahgift mentioned this pull request May 2, 2026

test(aprender-serve): SHIP-007 PR-C-prep — SaveTensorPlan I/O integration tests #1407

Merged

4 tasks

Merge branch 'main' into feat/ship-007-pr-b-save-tensor-plan

91afdf8

noahgift enabled auto-merge (squash) May 2, 2026 12:17

noahgift and others added 2 commits May 2, 2026 14:46

Merge branch 'main' into feat/ship-007-pr-b-save-tensor-plan

c3de3a2

noahgift mentioned this pull request May 2, 2026

feat(aprender-serve): SHIP-007 PR-C-real step 1 — forward_traced_with_save_tensor wrapper #1408

Merged

4 tasks

noahgift merged commit d1b284d into main May 2, 2026
10 checks passed

noahgift deleted the feat/ship-007-pr-b-save-tensor-plan branch May 2, 2026 13:03

noahgift mentioned this pull request May 3, 2026

contract(apr-cli-trace-save-tensor-v1): v1.2.0 → v1.3.0 — FALSIFY-011 records CLI dispatch wire-up PARTIAL discharge #1418

Merged

4 tasks

noahgift merged 4 commits into main from feat/ship-007-pr-b-save-tensor-plan
