contract(apr-cli-trace-save-tensor-v1): v1.1.0 → v1.2.0 — FALSIFY-010 records LmHead step-2 PARTIAL by noahgift · Pull Request #1415 · paiml/aprender

noahgift · 2026-05-03T08:26:47Z

Summary

Follow-up paperwork to PR #1414 (SHIP-007 PR-C-real step 2 — LmHead capture). Adds FALSIFY-APR-TRACE-SAVE-010 binding the LmHead branch at PARTIAL_ALGORITHM_LEVEL with explicit algorithm-level evidence pointing at the four new pin tests in `traced_save_tensor_step2_tests`.

The bump was deferred from PR #1414 itself to avoid file-conflict with PR #1413 (which independently bumped v1.0.0 → v1.1.0 with FALSIFY-009). With #1413 now merged on main, this is the natural follow-up.

What changed

`contracts/apr-cli-trace-save-tensor-v1.yaml` v1.1.0 → v1.2.0
New FALSIFY-APR-TRACE-SAVE-010 binding `byte_format` (the equation that already specifies WHOLE_MODEL_LAYER + f32 LE + NaN preservation — exactly what step 2's LmHead branch invokes via `write_tensor_file`).
6-line `algorithm_evidence` block citing the 4 unit tests + impl path + deferred-live-discharge note pointing at SHIP-007 PR-E.

Test plan

`pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors
CI required checks (`ci / gate`, `workspace-test`)

Five Whys

Why a separate contract follow-up? The version bump was held out of PR #1414 (SHIP-007 PR-C-real step 2, LmHead capture in the `forward_traced` wrapper) until PR #1413 (SHIP-007 PR-D, `apr diff --values` recognizes APRT stage tensors) had landed, to avoid a file conflict on `metadata.version`.
Why `binds_to: byte_format`? Step 2 doesn't add a new clap surface; it invokes the same write path with `WHOLE_MODEL_LAYER` sentinel that `byte_format` invariants already specify (NaN-bit preservation, f32 LE, 12-byte header).
Why PARTIAL_ALGORITHM_LEVEL not full? Pin tests use synthetic plans + fake logits; live discharge against canonical 7B teacher is deferred to SHIP-007 PR-E (layer-0 bisection).
Why v1.2.0? Adding a new falsification test that binds an existing invariant is a minor schema change per semver.
Why now? Records the algorithm-level discharge while the operator is still building MODEL-2 corpus context, which keeps the contract ledger in sync with what is already in code (`forward_traced_with_save_tensor` step 2, from PR #1414).
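The `byte_format` properties named above (f32 little-endian, NaN-bit preservation) can be illustrated with a minimal sketch. This is not the project's `write_tensor_file`; the function names below are hypothetical, and the 12-byte header is left out because its layout is not given in this thread.

```rust
/// Encode f32 values as little-endian bytes, 4 bytes per value.
/// `to_le_bytes` copies the raw IEEE 754 bit pattern, so NaN payloads survive.
/// (Hypothetical helper, not part of the aprender API.)
pub fn encode_f32_le(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Decode little-endian bytes back into f32 values.
/// Trailing bytes that do not form a full 4-byte group are ignored.
pub fn decode_f32_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Compare by bit pattern, because `NaN == NaN` is false under float equality.
pub fn bits_equal(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

A round-trip check has to compare with `to_bits` rather than `==`; otherwise any buffer containing a NaN would fail even when every byte was preserved.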

Ship % update

MODEL-1: ~68% (unchanged; this is paperwork recording the algorithm-level discharge from PR #1414).
MODEL-2: tokenization ~46.5M tokens / 56 min; ~33h ETA.

🤖 Generated with Claude Code

… records LmHead step-2 PARTIAL discharge

Follow-up to PR #1414 (`forward_traced_with_save_tensor` step 2). Adds FALSIFY-APR-TRACE-SAVE-010 binding the LmHead branch at PARTIAL_ALGORITHM_LEVEL; the algorithm-level evidence cites the four new pin tests in `traced_save_tensor_step2_tests`:

- step2_lm_head_writes_to_output_root_not_per_layer_dir
- step2_lm_head_header_uses_whole_model_sentinel
- step2_lm_head_skipped_when_plan_does_not_select_it
- step2_lm_head_writes_logits_bytes_verbatim (NaN-bit preserving)

`binds_to: byte_format` because step 2 invokes the same `write_tensor_file` path with the `WHOLE_MODEL_LAYER` sentinel that the existing `byte_format` equation specifies. Live discharge against the canonical 7B teacher is deferred to SHIP-007 PR-E (layer-0 bisection).

## Five Whys

1. **Why a separate contract follow-up?** The PR #1414 commit needed to land before this bump to avoid file-conflict with PR #1413 (which independently bumped v1.0.0 → v1.1.0 with FALSIFY-009).
2. **Why `binds_to: byte_format` and not `cli_signature`?** The wrapper doesn't add a new clap surface (PR-A already did that); it adds a new branch that emits files conforming to the existing byte-format equation. The new branch's verbatim f32 LE round-trip + NaN preservation is exactly the property `byte_format` invariants pin.
3. **Why PARTIAL_ALGORITHM_LEVEL not full discharge?** The 4 unit tests simulate the wrapper's byte-flow at the contract level using synthetic plans and fake logits; they do NOT instantiate a full AprTransformer or load a real APR model. Live discharge requires SHIP-007 PR-E.
4. **Why bump to v1.2.0?** Adding a new falsification test (FALSIFY-010) that binds an additional invariant is a minor schema change. Per semver, that's a minor bump.
5. **Why `pv validate` clean even with two new falsifiers in 24h?** The contract uses metadata.kind=schema, so falsification_tests entries are flexible; pv validates structure, IDs are unique, and binds_to references are valid.

## Verification

- `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors
- v1.0.0 → v1.1.0 (PR #1413, FALSIFY-009 binding apr_diff_values_compat)
- v1.1.0 → v1.2.0 (this PR, FALSIFY-010 binding LmHead step-2 capture)

## Ship % update

- MODEL-1: ~68% (unchanged; this is paperwork that records yesterday's algorithm-level discharge of step 2; the actual capture surface expansion happened in PR #1414).
- MODEL-2: corpus tokenization at ~46.5M tokens / 56 min (steady ~14K tok/s); ~33h ETA for full 27 GB Stack v1.2 corpus.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… records CLI dispatch wire-up PARTIAL discharge

Follow-up paperwork to PR #1417 (`apr trace --save-tensor` end-to-end dispatch for .apr files). Adds FALSIFY-APR-TRACE-SAVE-011 binding the new dispatch wire-up at PARTIAL_ALGORITHM_LEVEL with `binds_to: cli_signature`.

Before PR #1417, `apr trace --save-tensor` only printed a stub and never invoked `forward_traced_with_save_tensor`. The contract test `apr trace --save-tensor --help | grep save-tensor` (FALSIFY-001) was already passing at the binary-boundary level, but the dispatch glue was missing, leaving the Embedding + LmHead capture surface unreachable from the CLI for 2 days post-step-2 merge. FALSIFY-011 extends the existing `cli_signature` invariant from "the flag is recognized" to "the flag actually produces files".

## Five Whys

1. **Why a separate contract bump?** Avoids file-conflict with the in-flight refactor PR #1416 (which only touches `crates/aprender-serve/`). My contract change is isolated to `contracts/apr-cli-trace-save-tensor-v1.yaml`.
2. **Why `binds_to: cli_signature`?** PR #1417 doesn't change the byte format or determinism; it makes the CLI surface that the `cli_signature` equation already specified actually invocable. Same equation, expanded discharge level.
3. **Why PARTIAL_ALGORITHM_LEVEL?** The 5 unit tests cover path resolution (3) and recursive *.bin walking (2), all algorithm-level. A live discharge against the canonical 7B teacher is operator-gated by post-merge smoke (~30s for a 7B forward + 2 file writes).
4. **Why bump v1.2.0 → v1.3.0?** Adding a new falsification test that binds an existing invariant is a minor schema change per semver. v1.0.0 → v1.1.0 → v1.2.0 → v1.3.0 records each step's discharge timeline:
   - v1.1.0 (PR #1413): apr_diff_values_compat → APRT-aware diff
   - v1.2.0 (PR #1415): byte_format → LmHead capture (step 2)
   - v1.3.0 (this PR): cli_signature → end-to-end dispatch
5. **Why now?** Records the algorithm-level discharge so when the operator runs the live smoke post-#1417-merge, the contract ledger doesn't lag the code. Same paperwork pattern as #1415 (which followed #1414).

## Verification

- `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors, 0 warnings

## Ship % update

- MODEL-1: ~70% (unchanged: pure paperwork; code is in PR #1417).
- MODEL-2: corpus tokenization at ~115M tokens / 143 min (steady ~14K tok/s; ~33h ETA total).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
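The "recursive *.bin walking" covered by the unit tests for PR #1417 is not shown in this thread. As a rough illustration of the idea only, the sketch below collects every `.bin` file under an output root using the standard library; `collect_bin_files` is a hypothetical name, not the function in `apr-cli`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every `*.bin` file under `root`, sorted for stable output.
/// Hypothetical helper: the real walker in PR #1417 is not shown in this thread.
pub fn collect_bin_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack instead of recursion, so deep trees cannot overflow the call stack.
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if path.extension().map_or(false, |ext| ext == "bin") {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Sorting the result makes the output independent of directory iteration order, which is what a determinism-minded test would want to assert against.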

…7 forward_traced threading

Centralizes the boilerplate that `forward_traced_with_save_tensor` inlines twice (Embedding step 1, LmHead step 2). When the SHIP-007 PR-C-real step-3+ surgery threads `Option<&SaveTensorPlan>` through `AprTransformer::forward_traced` itself, every per-layer stage will call this single function instead of repeating the `should_save → ensure_layer_dir → File::create → write_tensor_file → flush` chain at each capture point.

## What changed

- New `crates/aprender-serve/src/inference_trace/save_tensor_emit.rs`:
  - `pub fn maybe_save_stage(plan: Option<&SaveTensorPlan>, stage, layer, values) -> io::Result<()>`: the gated entry point. Cheap no-op when plan is None or stage/layer not selected. Forwards to `write_stage_file` when the gate passes.
  - `pub fn write_stage_file(output_dir, stage, layer, values) -> io::Result<()>`: the unconditional write, exposed separately for tests and any future callers that have already gated.
  - 7 unit tests pinning: None=no-op, unselected-stage=no-op, per-layer→layer-N/<stage>.bin, whole-model→<root>/<stage>.bin with WHOLE_MODEL_LAYER sentinel, layer-range filter excludes out-of-range, NaN-bit-preserving f32 LE round-trip, missing parent dirs auto-created.
- `crates/aprender-serve/src/inference_trace/mod.rs`: register the new module.
- `crates/aprender-serve/src/apr_transformer/traced_save_tensor.rs`:
  - Replace 60-line Embedding+LmHead inline blocks with two calls to `maybe_save_stage`. Net 50-line shrink.
  - Wrapper's behavior is byte-identical: same API surface, same file layout, same NaN preservation. Existing 4 `traced_save_tensor_step2_tests` tests still PASS.

## Five Whys

1. **Why now?** PR #1414 (step 2), merged earlier today, landed the second copy of the inline block. Pre-step-3 is the right time to factor, before 15 more capture points get added inside the 360-line `forward_traced` body.
2. **Why a new module instead of inlining in `apr_transformer/`?** The helper has zero coupling to `AprTransformer` (it takes a plan + stage + values). Living next to `save_tensor`, `save_tensor_paths`, `save_tensor_plan` matches the existing `inference_trace::save_tensor_*` family pattern.
3. **Why `Option<&SaveTensorPlan>` instead of `&SaveTensorPlan`?** Step 3 will thread this through `forward_traced`, which is also called from non-instrumented contexts (HTTP serving, training evals). The `Option` lets a single `forward_traced` body serve both: `maybe_save_stage(None, ...)` is a single discriminant compare in hot paths.
4. **Why expose `write_stage_file` separately from `maybe_save_stage`?** Test ergonomics (the unit tests need to verify the unconditional write path, not just the gated path) and forward-compatibility for a future `forward_traced_inner` that does its own `should_save` filtering inside a hot loop and wants to skip the option indirection.
5. **Why no contract bump in this PR?** This is a pure refactor: no behavior change, no new invariants. The existing `byte_format`, `determinism`, and `apr_diff_values_compat` invariants in `apr-cli-trace-save-tensor-v1.yaml` flow through exactly one function instead of two now, which makes future contract obligations easier to satisfy. PR #1415 (already in flight) bumps the contract for step 2; step 3 will bump again to v1.3.0.

## Test plan

- [x] `cargo test -p aprender-serve --lib save_tensor_emit::tests` → 7/7 PASS
- [x] `cargo test -p aprender-serve --lib traced_save_tensor_step2_tests` → 4/4 PASS (existing tests unchanged behavior verified)
- [x] `cargo test -p aprender-serve --test save_tensor_plan_integration` → 6/6 PASS (contract-level integration unchanged)
- [x] `cargo clippy -p aprender-serve --lib --no-deps -- -D warnings` clean

## Ship % update

- MODEL-1: ~68% (unchanged: pure refactor; no new capture surface). Step 3 (per-layer threading inside `forward_traced`) is now a trivial follow-up: each capture point becomes one line: `maybe_save_stage(plan, SaveTensorStage::FfnGate, layer_idx, &gate)?;`.
- MODEL-2: corpus tokenization still running (~83 min elapsed, 46.5M tokens, ~33h ETA).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
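The gated-versus-unconditional split described in this commit message can be sketched roughly as follows. Everything here is an assumption made for illustration: `Stage`, `Plan`, the file stems, and the `u32::MAX` sentinel value are stand-ins for the real `SaveTensorStage`, `SaveTensorPlan`, and `WHOLE_MODEL_LAYER`, and the 12-byte header is omitted because its layout is not given in this thread.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::ops::Range;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real stage enum; only two variants shown.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Stage {
    FfnGate,
    LmHead,
}

impl Stage {
    /// Assumed file stems; the real naming scheme may differ.
    fn file_stem(self) -> &'static str {
        match self {
            Stage::FfnGate => "ffn_gate",
            Stage::LmHead => "lm_head",
        }
    }
}

/// Assumed sentinel meaning "whole model, not tied to a layer".
pub const WHOLE_MODEL_LAYER: u32 = u32::MAX;

/// Hypothetical plan: which stages and layers to capture, and where to write.
pub struct Plan {
    pub output_dir: PathBuf,
    pub stages: Vec<Stage>,
    pub layers: Range<u32>,
}

impl Plan {
    fn should_save(&self, stage: Stage, layer: u32) -> bool {
        self.stages.contains(&stage)
            && (layer == WHOLE_MODEL_LAYER || self.layers.contains(&layer))
    }
}

/// Unconditional write: per-layer files go under `layer-N/`, whole-model files at the root.
pub fn write_stage_file(out: &Path, stage: Stage, layer: u32, values: &[f32]) -> io::Result<()> {
    let dir = if layer == WHOLE_MODEL_LAYER {
        out.to_path_buf()
    } else {
        out.join(format!("layer-{}", layer))
    };
    fs::create_dir_all(&dir)?; // missing parent directories are created on demand
    let file = File::create(dir.join(format!("{}.bin", stage.file_stem())))?;
    let mut writer = BufWriter::new(file);
    for v in values {
        // Payload only: raw f32 little-endian. The real 12-byte header is omitted here.
        writer.write_all(&v.to_le_bytes())?;
    }
    writer.flush()
}

/// Gated entry point: returns at once when there is no plan or the stage/layer is not selected.
pub fn maybe_save_stage(
    plan: Option<&Plan>,
    stage: Stage,
    layer: u32,
    values: &[f32],
) -> io::Result<()> {
    match plan {
        Some(p) if p.should_save(stage, layer) => {
            write_stage_file(&p.output_dir, stage, layer, values)
        }
        _ => Ok(()),
    }
}
```

With this shape, a non-instrumented caller passes `None` and pays only for the `match`, while tests can call `write_stage_file` directly to exercise the write path without constructing a plan.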

…7 forward_traced threading (#1416)

* refactor(aprender-serve): extract maybe_save_stage helper for SHIP-007 forward_traced threading

* feat(aprender-serve): SHIP-007 PR-C-real step 3: per-layer SaveTensorPlan threading through forward_traced (#1421)

End-to-end live smoke on canonical Qwen2.5-Coder-7B-Instruct-Q4K teacher captures all 16 stages in a single forward pass:

- 14 per-layer: Embedding, AttnNorm, QkvMatmul, QkvBias, Attention, AttnOut, PostAttnResidual, FfnNorm, FfnGate, FfnUp, FfnSilu, FfnSwigl, FfnOut, PostFfnResidual
- 2 whole-model: FinalNorm, LmHead

Buffer sizes verified against Qwen2.5-Coder-7B config:

- 100,364 B = 7 tokens × 3,584 hidden_dim × 4 + 12-byte APRT header
- 530,444 B = 7 tokens × 18,944 intermediate_dim × 4 + 12 (FFN intermediates)
- 129,036 B = 7 tokens × 4,608 qkv_dim × 4 + 12 (qkv_dim = hidden + 2*kv_dim)
- 608,268 B = 152,064 vocab × 4 + 12 (whole-model lm_head)

## What changed

`AprTransformer::forward_traced_with_plan(tokens, plan: Option<&SaveTensorPlan>)` is the new private impl that threads the plan through every natural capture point. `forward_traced(tokens)` becomes a 1-line wrapper that calls it with `None` (zero-overhead: `maybe_save_stage` early-returns on `None`). `forward_traced_with_save_tensor` is now a pure delegator: no more double-embed or post-loop re-emission of LmHead. All 6 forward_traced regression tests pass.

## Five Whys

1. Why thread plan through forward_traced? Step 3 of SHIP-007 PR-C-real per `apr-cli-trace-save-tensor-v1.yaml`.
2. Why all 16 stages, not subset? The bisection target (root cause of MODEL-1 `apr run` divergence) is unknown; capturing every stage lets the operator element-wise diff against a reference at any of 16 points.
3. Why single-pass via `Option<&SaveTensorPlan>` (not separate method)? Avoids double-compute of embeddings and the wrapper's double-bookkeeping. DRY.
4. Why `Option<&SaveTensorPlan>` (not always-required)? Existing `forward_traced(tokens)` is called from many test sites and the trait `TracedForward`. Optional preserves the public API.
5. Why only inserted `maybe_save_stage` calls (no helpers refactor)? Each capture is a bare `emit(Stage, layer, &buf)?`; adding helpers would mask the natural buffer-name → stage-name pairing and make audit harder.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
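The buffer sizes reported for the step 3 smoke follow one formula: element count times 4 bytes per f32, plus the 12-byte APRT header. The sketch below rechecks that arithmetic. The function and constant names are hypothetical, and the `kv_dim` of 512 is not stated in the notes; it is derived from the reported figures as (4,608 − 3,584) / 2.

```rust
/// Size of the APRT header in bytes, as stated in the smoke-test notes.
pub const HEADER_BYTES: usize = 12;
/// Each captured value is stored as a 4-byte f32.
pub const F32_BYTES: usize = 4;

/// Expected file size for a capture of `rows` x `cols` f32 values plus the header.
/// (Hypothetical helper for checking the reported numbers.)
pub fn expected_file_bytes(rows: usize, cols: usize) -> usize {
    rows * cols * F32_BYTES + HEADER_BYTES
}

/// Fused QKV width: query width (hidden_dim) plus key and value widths (kv_dim each).
pub fn qkv_dim(hidden_dim: usize, kv_dim: usize) -> usize {
    hidden_dim + 2 * kv_dim
}
```

For the 7-token prompt, `expected_file_bytes(7, 3_584)` gives 100,364 and `expected_file_bytes(7, 18_944)` gives 530,444, matching the hidden-width and FFN-intermediate files. The LmHead file corresponds to a single row of 152,064 logits, so `expected_file_bytes(1, 152_064)` gives 608,268.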

noahgift enabled auto-merge (squash) May 3, 2026 08:26

Merge branch 'main' into chore/contract-trace-save-tensor-v1.2.0-step2

a34335f

noahgift merged commit 5f88d88 into main May 3, 2026
10 checks passed

noahgift deleted the chore/contract-trace-save-tensor-v1.2.0-step2 branch May 3, 2026 09:08

noahgift mentioned this pull request May 3, 2026

contract(apr-cli-trace-save-tensor-v1): v1.2.0 → v1.3.0 — FALSIFY-011 records CLI dispatch wire-up PARTIAL discharge #1418

Merged

4 tasks

noahgift merged 2 commits into main from chore/contract-trace-save-tensor-v1.2.0-step2
