contract(apr-cli-trace-save-tensor-v1): v1.1.0 → v1.2.0 — FALSIFY-010 records LmHead step-2 PARTIAL#1415
Merged
Conversation
… records LmHead step-2 PARTIAL discharge Follow-up to PR #1414 (`forward_traced_with_save_tensor` step 2). Adds FALSIFY-APR-TRACE-SAVE-010 binding the LmHead branch at PARTIAL_ALGORITHM_LEVEL; the algorithm-level evidence cites the four new pin tests in `traced_save_tensor_step2_tests`: - step2_lm_head_writes_to_output_root_not_per_layer_dir - step2_lm_head_header_uses_whole_model_sentinel - step2_lm_head_skipped_when_plan_does_not_select_it - step2_lm_head_writes_logits_bytes_verbatim (NaN-bit preserving) `binds_to: byte_format` because step 2 invokes the same write_tensor_file path with `WHOLE_MODEL_LAYER` sentinel as the existing `byte_format` equation specifies. Live discharge against the canonical 7B teacher is deferred to SHIP-007 PR-E (layer-0 bisection). ## Five Whys 1. **Why a separate contract follow-up?** The PR #1414 commit needed to land before this bump to avoid file-conflict with PR #1413 (which independently bumped v1.0.0 → v1.1.0 with FALSIFY-009). 2. **Why `binds_to: byte_format` and not `cli_signature`?** The wrapper doesn't add a new clap surface (PR-A already did that); it adds a new branch that emits files conforming to the existing byte-format equation. The new branch's verbatim f32 LE round-trip + NaN preservation is exactly the property `byte_format` invariants pin. 3. **Why PARTIAL_ALGORITHM_LEVEL not full discharge?** The 4 unit tests simulate the wrapper's byte-flow at the contract level using synthetic plans and fake logits — they do NOT instantiate a full AprTransformer or load a real APR model. Live discharge requires SHIP-007 PR-E. 4. **Why bump to v1.2.0?** Adding a new falsification test (FALSIFY-010) that binds an additional invariant is a minor schema change. Per semver, that's a minor bump. 5. **Why `pv validate` clean even with two new falsifiers in 24h?** The contract uses metadata.kind=schema, so falsification_tests entries are flexible; pv validates structure, IDs are unique, and binds_to references are valid. ## Verification - `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors - v1.0.0 → v1.1.0 (PR #1413, FALSIFY-009 binding apr_diff_values_compat) - v1.1.0 → v1.2.0 (this PR, FALSIFY-010 binding LmHead step-2 capture) ## Ship % update - MODEL-1: ~68% (unchanged — this is paperwork that records yesterday's algorithm-level discharge of step 2; the actual capture surface expansion happened in PR #1414). - MODEL-2: corpus tokenization at ~46.5M tokens / 56 min (steady ~14K tok/s); ~33h ETA for full 27 GB Stack v1.2 corpus. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4 tasks
noahgift
added a commit
that referenced
this pull request
May 3, 2026
… records CLI dispatch wire-up PARTIAL discharge Follow-up paperwork to PR #1417 (`apr trace --save-tensor` end-to-end dispatch for .apr files). Adds FALSIFY-APR-TRACE-SAVE-011 binding the new dispatch wire-up at PARTIAL_ALGORITHM_LEVEL with `binds_to: cli_signature`. Before PR #1417, `apr trace --save-tensor` only printed a stub and never invoked `forward_traced_with_save_tensor`. The contract test `apr trace --save-tensor --help | grep save-tensor` (FALSIFY-001) was already passing at the binary-boundary level — but the dispatch glue was missing, leaving Embedding + LmHead capture surface unreachable from the CLI for 2 days post-step-2 merge. FALSIFY-011 extends the existing `cli_signature` invariant from "the flag is recognized" to "the flag actually produces files". ## Five Whys 1. **Why a separate contract bump?** Avoids file-conflict with the in-flight refactor PR #1416 (which only touches `crates/aprender-serve/`). My contract change is isolated to `contracts/apr-cli-trace-save-tensor-v1.yaml`. 2. **Why `binds_to: cli_signature`?** PR #1417 doesn't change the byte format or determinism — it makes the CLI surface that the `cli_signature` equation already specified actually invocable. Same equation, expanded discharge level. 3. **Why PARTIAL_ALGORITHM_LEVEL?** The 5 unit tests cover path resolution (3) and recursive *.bin walking (2) — algorithm-level. A live discharge against the canonical 7B teacher is operator- gated by post-merge smoke (~30s for a 7B forward + 2 file writes). 4. **Why bump v1.2.0 → v1.3.0?** Adding a new falsification test that binds an existing invariant is a minor schema change per semver. v1.0.0 → v1.1.0 → v1.2.0 → v1.3.0 records each step's discharge timeline: - v1.1.0 (PR #1413): apr_diff_values_compat → APRT-aware diff - v1.2.0 (PR #1415): byte_format → LmHead capture (step 2) - v1.3.0 (this PR): cli_signature → end-to-end dispatch 5. **Why now?** Records the algorithm-level discharge so when the operator runs the live smoke post-#1417-merge, the contract ledger doesn't lag the code. Same paperwork pattern as #1415 (which followed #1414). ## Verification - `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors, 0 warnings ## Ship % update - MODEL-1: ~70% (unchanged — pure paperwork; code is in PR #1417). - MODEL-2: corpus tokenization at ~115M tokens / 143 min (steady ~14K tok/s; ~33h ETA total). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 3, 2026
… records CLI dispatch wire-up PARTIAL discharge Follow-up paperwork to PR #1417 (`apr trace --save-tensor` end-to-end dispatch for .apr files). Adds FALSIFY-APR-TRACE-SAVE-011 binding the new dispatch wire-up at PARTIAL_ALGORITHM_LEVEL with `binds_to: cli_signature`. Before PR #1417, `apr trace --save-tensor` only printed a stub and never invoked `forward_traced_with_save_tensor`. The contract test `apr trace --save-tensor --help | grep save-tensor` (FALSIFY-001) was already passing at the binary-boundary level — but the dispatch glue was missing, leaving Embedding + LmHead capture surface unreachable from the CLI for 2 days post-step-2 merge. FALSIFY-011 extends the existing `cli_signature` invariant from "the flag is recognized" to "the flag actually produces files". ## Five Whys 1. **Why a separate contract bump?** Avoids file-conflict with the in-flight refactor PR #1416 (which only touches `crates/aprender-serve/`). My contract change is isolated to `contracts/apr-cli-trace-save-tensor-v1.yaml`. 2. **Why `binds_to: cli_signature`?** PR #1417 doesn't change the byte format or determinism — it makes the CLI surface that the `cli_signature` equation already specified actually invocable. Same equation, expanded discharge level. 3. **Why PARTIAL_ALGORITHM_LEVEL?** The 5 unit tests cover path resolution (3) and recursive *.bin walking (2) — algorithm-level. A live discharge against the canonical 7B teacher is operator- gated by post-merge smoke (~30s for a 7B forward + 2 file writes). 4. **Why bump v1.2.0 → v1.3.0?** Adding a new falsification test that binds an existing invariant is a minor schema change per semver. v1.0.0 → v1.1.0 → v1.2.0 → v1.3.0 records each step's discharge timeline: - v1.1.0 (PR #1413): apr_diff_values_compat → APRT-aware diff - v1.2.0 (PR #1415): byte_format → LmHead capture (step 2) - v1.3.0 (this PR): cli_signature → end-to-end dispatch 5. **Why now?** Records the algorithm-level discharge so when the operator runs the live smoke post-#1417-merge, the contract ledger doesn't lag the code. Same paperwork pattern as #1415 (which followed #1414). ## Verification - `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors, 0 warnings ## Ship % update - MODEL-1: ~70% (unchanged — pure paperwork; code is in PR #1417). - MODEL-2: corpus tokenization at ~115M tokens / 143 min (steady ~14K tok/s; ~33h ETA total). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 3, 2026
…7 forward_traced threading
Centralizes the boilerplate that `forward_traced_with_save_tensor`
inlines twice (Embedding step 1, LmHead step 2). When the SHIP-007
PR-C-real step-3+ surgery threads `Option<&SaveTensorPlan>` through
`AprTransformer::forward_traced` itself, every per-layer stage will
call this single function instead of repeating the
`should_save → ensure_layer_dir → File::create → write_tensor_file →
flush` chain at each capture point.
## What changed
- New `crates/aprender-serve/src/inference_trace/save_tensor_emit.rs`:
- `pub fn maybe_save_stage(plan: Option<&SaveTensorPlan>, stage,
layer, values) -> io::Result<()>` — the gated entry point. Cheap
no-op when plan is None or stage/layer not selected. Forwards to
`write_stage_file` when the gate passes.
- `pub fn write_stage_file(output_dir, stage, layer, values) ->
io::Result<()>` — the unconditional write, exposed separately
for tests and any future callers that have already gated.
- 7 unit tests pinning: None=no-op, unselected-stage=no-op,
per-layer→layer-N/<stage>.bin, whole-model→<root>/<stage>.bin
with WHOLE_MODEL_LAYER sentinel, layer-range filter excludes
out-of-range, NaN-bit-preserving f32 LE round-trip, missing
parent dirs auto-created.
- `crates/aprender-serve/src/inference_trace/mod.rs`: register the
new module.
- `crates/aprender-serve/src/apr_transformer/traced_save_tensor.rs`:
- Replace 60-line Embedding+LmHead inline blocks with two calls
to `maybe_save_stage`. Net 50-line shrink.
- Wrapper's behavior is byte-identical: same API surface, same
file layout, same NaN preservation. Existing 4
`traced_save_tensor_step2_tests` tests still PASS.
## Five Whys
1. **Why now?** PR #1414 (step 2) merged earlier today landed the
second copy of the inline block. Pre-step-3 is the right time to
factor — before 15 more capture points get added inside the
360-line `forward_traced` body.
2. **Why a new module instead of inlining in
`apr_transformer/`?** The helper has zero coupling to
`AprTransformer` (it takes a plan + stage + values). Living next
to `save_tensor`, `save_tensor_paths`, `save_tensor_plan` matches
the existing `inference_trace::save_tensor_*` family pattern.
3. **Why `Option<&SaveTensorPlan>` instead of `&SaveTensorPlan`?**
Step 3 will thread this through `forward_traced`, which is also
called from non-instrumented contexts (HTTP serving, training
evals). The `Option` lets a single `forward_traced` body serve
both — `maybe_save_stage(None, ...)` is a single discriminant
compare in hot paths.
4. **Why expose `write_stage_file` separately from
`maybe_save_stage`?** Test ergonomics (the unit tests need to
verify the unconditional write path, not just the gated path)
and forward-compatibility for a future `forward_traced_inner`
that does its own `should_save` filtering inside a hot loop and
wants to skip the option indirection.
5. **Why no contract bump in this PR?** This is a pure refactor —
no behavior change, no new invariants. The existing
`byte_format`, `determinism`, and `apr_diff_values_compat`
invariants in `apr-cli-trace-save-tensor-v1.yaml` flow through
exactly one function instead of two now, which makes future
contract obligations easier to satisfy. PR #1415 (already in
flight) bumps the contract for step 2; step 3 will bump again
to v1.3.0.
## Test plan
- [x] `cargo test -p aprender-serve --lib save_tensor_emit::tests` →
7/7 PASS
- [x] `cargo test -p aprender-serve --lib
traced_save_tensor_step2_tests` → 4/4 PASS (existing tests
unchanged behavior verified)
- [x] `cargo test -p aprender-serve --test
save_tensor_plan_integration` → 6/6 PASS (contract-level
integration unchanged)
- [x] `cargo clippy -p aprender-serve --lib --no-deps -- -D
warnings` clean
## Ship % update
- MODEL-1: ~68% (unchanged — pure refactor; no new capture surface).
Step 3 (per-layer threading inside `forward_traced`) is now a
trivial follow-up: each capture point becomes one line:
`maybe_save_stage(plan, SaveTensorStage::FfnGate, layer_idx,
&gate)?;`.
- MODEL-2: corpus tokenization still running (~83 min elapsed,
46.5M tokens, ~33h ETA).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 3, 2026
… records CLI dispatch wire-up PARTIAL discharge (#1418) Follow-up paperwork to PR #1417 (`apr trace --save-tensor` end-to-end dispatch for .apr files). Adds FALSIFY-APR-TRACE-SAVE-011 binding the new dispatch wire-up at PARTIAL_ALGORITHM_LEVEL with `binds_to: cli_signature`. Before PR #1417, `apr trace --save-tensor` only printed a stub and never invoked `forward_traced_with_save_tensor`. The contract test `apr trace --save-tensor --help | grep save-tensor` (FALSIFY-001) was already passing at the binary-boundary level — but the dispatch glue was missing, leaving Embedding + LmHead capture surface unreachable from the CLI for 2 days post-step-2 merge. FALSIFY-011 extends the existing `cli_signature` invariant from "the flag is recognized" to "the flag actually produces files". ## Five Whys 1. **Why a separate contract bump?** Avoids file-conflict with the in-flight refactor PR #1416 (which only touches `crates/aprender-serve/`). My contract change is isolated to `contracts/apr-cli-trace-save-tensor-v1.yaml`. 2. **Why `binds_to: cli_signature`?** PR #1417 doesn't change the byte format or determinism — it makes the CLI surface that the `cli_signature` equation already specified actually invocable. Same equation, expanded discharge level. 3. **Why PARTIAL_ALGORITHM_LEVEL?** The 5 unit tests cover path resolution (3) and recursive *.bin walking (2) — algorithm-level. A live discharge against the canonical 7B teacher is operator- gated by post-merge smoke (~30s for a 7B forward + 2 file writes). 4. **Why bump v1.2.0 → v1.3.0?** Adding a new falsification test that binds an existing invariant is a minor schema change per semver. v1.0.0 → v1.1.0 → v1.2.0 → v1.3.0 records each step's discharge timeline: - v1.1.0 (PR #1413): apr_diff_values_compat → APRT-aware diff - v1.2.0 (PR #1415): byte_format → LmHead capture (step 2) - v1.3.0 (this PR): cli_signature → end-to-end dispatch 5. **Why now?** Records the algorithm-level discharge so when the operator runs the live smoke post-#1417-merge, the contract ledger doesn't lag the code. Same paperwork pattern as #1415 (which followed #1414). ## Verification - `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` → 0 errors, 0 warnings ## Ship % update - MODEL-1: ~70% (unchanged — pure paperwork; code is in PR #1417). - MODEL-2: corpus tokenization at ~115M tokens / 143 min (steady ~14K tok/s; ~33h ETA total). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 3, 2026
…7 forward_traced threading
Centralizes the boilerplate that `forward_traced_with_save_tensor`
inlines twice (Embedding step 1, LmHead step 2). When the SHIP-007
PR-C-real step-3+ surgery threads `Option<&SaveTensorPlan>` through
`AprTransformer::forward_traced` itself, every per-layer stage will
call this single function instead of repeating the
`should_save → ensure_layer_dir → File::create → write_tensor_file →
flush` chain at each capture point.
## What changed
- New `crates/aprender-serve/src/inference_trace/save_tensor_emit.rs`:
- `pub fn maybe_save_stage(plan: Option<&SaveTensorPlan>, stage,
layer, values) -> io::Result<()>` — the gated entry point. Cheap
no-op when plan is None or stage/layer not selected. Forwards to
`write_stage_file` when the gate passes.
- `pub fn write_stage_file(output_dir, stage, layer, values) ->
io::Result<()>` — the unconditional write, exposed separately
for tests and any future callers that have already gated.
- 7 unit tests pinning: None=no-op, unselected-stage=no-op,
per-layer→layer-N/<stage>.bin, whole-model→<root>/<stage>.bin
with WHOLE_MODEL_LAYER sentinel, layer-range filter excludes
out-of-range, NaN-bit-preserving f32 LE round-trip, missing
parent dirs auto-created.
- `crates/aprender-serve/src/inference_trace/mod.rs`: register the
new module.
- `crates/aprender-serve/src/apr_transformer/traced_save_tensor.rs`:
- Replace 60-line Embedding+LmHead inline blocks with two calls
to `maybe_save_stage`. Net 50-line shrink.
- Wrapper's behavior is byte-identical: same API surface, same
file layout, same NaN preservation. Existing 4
`traced_save_tensor_step2_tests` tests still PASS.
## Five Whys
1. **Why now?** PR #1414 (step 2) merged earlier today landed the
second copy of the inline block. Pre-step-3 is the right time to
factor — before 15 more capture points get added inside the
360-line `forward_traced` body.
2. **Why a new module instead of inlining in
`apr_transformer/`?** The helper has zero coupling to
`AprTransformer` (it takes a plan + stage + values). Living next
to `save_tensor`, `save_tensor_paths`, `save_tensor_plan` matches
the existing `inference_trace::save_tensor_*` family pattern.
3. **Why `Option<&SaveTensorPlan>` instead of `&SaveTensorPlan`?**
Step 3 will thread this through `forward_traced`, which is also
called from non-instrumented contexts (HTTP serving, training
evals). The `Option` lets a single `forward_traced` body serve
both — `maybe_save_stage(None, ...)` is a single discriminant
compare in hot paths.
4. **Why expose `write_stage_file` separately from
`maybe_save_stage`?** Test ergonomics (the unit tests need to
verify the unconditional write path, not just the gated path)
and forward-compatibility for a future `forward_traced_inner`
that does its own `should_save` filtering inside a hot loop and
wants to skip the option indirection.
5. **Why no contract bump in this PR?** This is a pure refactor —
no behavior change, no new invariants. The existing
`byte_format`, `determinism`, and `apr_diff_values_compat`
invariants in `apr-cli-trace-save-tensor-v1.yaml` flow through
exactly one function instead of two now, which makes future
contract obligations easier to satisfy. PR #1415 (already in
flight) bumps the contract for step 2; step 3 will bump again
to v1.3.0.
## Test plan
- [x] `cargo test -p aprender-serve --lib save_tensor_emit::tests` →
7/7 PASS
- [x] `cargo test -p aprender-serve --lib
traced_save_tensor_step2_tests` → 4/4 PASS (existing tests
unchanged behavior verified)
- [x] `cargo test -p aprender-serve --test
save_tensor_plan_integration` → 6/6 PASS (contract-level
integration unchanged)
- [x] `cargo clippy -p aprender-serve --lib --no-deps -- -D
warnings` clean
## Ship % update
- MODEL-1: ~68% (unchanged — pure refactor; no new capture surface).
Step 3 (per-layer threading inside `forward_traced`) is now a
trivial follow-up: each capture point becomes one line:
`maybe_save_stage(plan, SaveTensorStage::FfnGate, layer_idx,
&gate)?;`.
- MODEL-2: corpus tokenization still running (~83 min elapsed,
46.5M tokens, ~33h ETA).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 3, 2026
…7 forward_traced threading (#1416) * refactor(aprender-serve): extract maybe_save_stage helper for SHIP-007 forward_traced threading Centralizes the boilerplate that `forward_traced_with_save_tensor` inlines twice (Embedding step 1, LmHead step 2). When the SHIP-007 PR-C-real step-3+ surgery threads `Option<&SaveTensorPlan>` through `AprTransformer::forward_traced` itself, every per-layer stage will call this single function instead of repeating the `should_save → ensure_layer_dir → File::create → write_tensor_file → flush` chain at each capture point. ## What changed - New `crates/aprender-serve/src/inference_trace/save_tensor_emit.rs`: - `pub fn maybe_save_stage(plan: Option<&SaveTensorPlan>, stage, layer, values) -> io::Result<()>` — the gated entry point. Cheap no-op when plan is None or stage/layer not selected. Forwards to `write_stage_file` when the gate passes. - `pub fn write_stage_file(output_dir, stage, layer, values) -> io::Result<()>` — the unconditional write, exposed separately for tests and any future callers that have already gated. - 7 unit tests pinning: None=no-op, unselected-stage=no-op, per-layer→layer-N/<stage>.bin, whole-model→<root>/<stage>.bin with WHOLE_MODEL_LAYER sentinel, layer-range filter excludes out-of-range, NaN-bit-preserving f32 LE round-trip, missing parent dirs auto-created. - `crates/aprender-serve/src/inference_trace/mod.rs`: register the new module. - `crates/aprender-serve/src/apr_transformer/traced_save_tensor.rs`: - Replace 60-line Embedding+LmHead inline blocks with two calls to `maybe_save_stage`. Net 50-line shrink. - Wrapper's behavior is byte-identical: same API surface, same file layout, same NaN preservation. Existing 4 `traced_save_tensor_step2_tests` tests still PASS. ## Five Whys 1. **Why now?** PR #1414 (step 2) merged earlier today landed the second copy of the inline block. Pre-step-3 is the right time to factor — before 15 more capture points get added inside the 360-line `forward_traced` body. 2. **Why a new module instead of inlining in `apr_transformer/`?** The helper has zero coupling to `AprTransformer` (it takes a plan + stage + values). Living next to `save_tensor`, `save_tensor_paths`, `save_tensor_plan` matches the existing `inference_trace::save_tensor_*` family pattern. 3. **Why `Option<&SaveTensorPlan>` instead of `&SaveTensorPlan`?** Step 3 will thread this through `forward_traced`, which is also called from non-instrumented contexts (HTTP serving, training evals). The `Option` lets a single `forward_traced` body serve both — `maybe_save_stage(None, ...)` is a single discriminant compare in hot paths. 4. **Why expose `write_stage_file` separately from `maybe_save_stage`?** Test ergonomics (the unit tests need to verify the unconditional write path, not just the gated path) and forward-compatibility for a future `forward_traced_inner` that does its own `should_save` filtering inside a hot loop and wants to skip the option indirection. 5. **Why no contract bump in this PR?** This is a pure refactor — no behavior change, no new invariants. The existing `byte_format`, `determinism`, and `apr_diff_values_compat` invariants in `apr-cli-trace-save-tensor-v1.yaml` flow through exactly one function instead of two now, which makes future contract obligations easier to satisfy. PR #1415 (already in flight) bumps the contract for step 2; step 3 will bump again to v1.3.0. ## Test plan - [x] `cargo test -p aprender-serve --lib save_tensor_emit::tests` → 7/7 PASS - [x] `cargo test -p aprender-serve --lib traced_save_tensor_step2_tests` → 4/4 PASS (existing tests unchanged behavior verified) - [x] `cargo test -p aprender-serve --test save_tensor_plan_integration` → 6/6 PASS (contract-level integration unchanged) - [x] `cargo clippy -p aprender-serve --lib --no-deps -- -D warnings` clean ## Ship % update - MODEL-1: ~68% (unchanged — pure refactor; no new capture surface). Step 3 (per-layer threading inside `forward_traced`) is now a trivial follow-up: each capture point becomes one line: `maybe_save_stage(plan, SaveTensorStage::FfnGate, layer_idx, &gate)?;`. - MODEL-2: corpus tokenization still running (~83 min elapsed, 46.5M tokens, ~33h ETA). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(aprender-serve): SHIP-007 PR-C-real step 3 — per-layer SaveTensorPlan threading through forward_traced (#1421) End-to-end live smoke on canonical Qwen2.5-Coder-7B-Instruct-Q4K teacher captures all 16 stages in a single forward pass: - 14 per-layer: Embedding, AttnNorm, QkvMatmul, QkvBias, Attention, AttnOut, PostAttnResidual, FfnNorm, FfnGate, FfnUp, FfnSilu, FfnSwigl, FfnOut, PostFfnResidual - 2 whole-model: FinalNorm, LmHead Buffer sizes verified against Qwen2.5-Coder-7B config: - 100,364 B = 7 tokens × 3,584 hidden_dim × 4 + 12-byte APRT header - 530,444 B = 7 tokens × 18,944 intermediate_dim × 4 + 12 (FFN intermediates) - 129,036 B = 7 tokens × 4,608 qkv_dim × 4 + 12 (qkv_dim = hidden + 2*kv_dim) - 608,268 B = 152,064 vocab × 4 + 12 (whole-model lm_head) ## What changed `AprTransformer::forward_traced_with_plan(tokens, plan: Option<&SaveTensorPlan>)` is the new private impl that threads the plan through every natural capture point. `forward_traced(tokens)` becomes a 1-line wrapper that calls it with `None` (zero-overhead — `maybe_save_stage` early-returns on `None`). `forward_traced_with_save_tensor` is now a pure delegator: no more double-embed or post-loop re-emission of LmHead. All 6 forward_traced regression tests pass. ## Five Whys 1. Why thread plan through forward_traced? Step 3 of SHIP-007 PR-C-real per `apr-cli-trace-save-tensor-v1.yaml`. 2. Why all 16 stages, not subset? The bisection target (root cause of MODEL-1 `apr run` divergence) is unknown; capturing every stage lets the operator element-wise diff against a reference at any of 16 points. 3. Why single-pass via `Option<&SaveTensorPlan>` (not separate method)? Avoids double-compute of embeddings and the wrapper's double-bookkeeping. DRY. 4. Why `Option<&SaveTensorPlan>` (not always-required)? Existing `forward_traced(tokens)` is called from many test sites and the trait `TracedForward`. Optional preserves the public API. 5. Why only inserted `maybe_save_stage` calls (no helpers refactor)? Each capture is a bare `emit(Stage, layer, &buf)?` — adding helpers would mask the natural buffer-name → stage-name pairing and make audit harder. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up paperwork to PR #1414 (SHIP-007 PR-C-real step 2 — LmHead capture). Adds FALSIFY-APR-TRACE-SAVE-010 binding the LmHead branch at PARTIAL_ALGORITHM_LEVEL with explicit algorithm-level evidence pointing at the four new pin tests in `traced_save_tensor_step2_tests`.
The bump was deferred from PR #1414 itself to avoid file-conflict with PR #1413 (which independently bumped v1.0.0 → v1.1.0 with FALSIFY-009). With #1413 now merged on main, this is the natural follow-up.
What changed
Test plan
Five Whys
Ship % update
🤖 Generated with Claude Code