feat(aprender-serve): apr-cli-trace-save-tensor-v1 byte format helpers#1133

Merged
noahgift merged 1 commit into main from feat/apr-trace-save-tensor-byte-format
Apr 29, 2026

Conversation

@noahgift

Contributor

Summary

Per `apr-cli-trace-save-tensor-v1.yaml` v1.0.0 PROPOSED: contract has 5 falsifiers, 0 tested at authoring time. FALSIFY-APR-TRACE-SAVE-004 (header self-describing format) is partial-dischargeable today as a pure-byte-format module without waiting on the full `apr trace --save-tensor` CLI implementation.

This unblocks the next-session SHIP-007 priority per project memory (`project_2026_04_28_ship_007_state_machine.md`): "extend apr trace with --save-tensor flag for per-stage element-wise comparison."

What's added

New module `crates/aprender-serve/src/inference_trace/save_tensor.rs` (~250 LOC):

  • `pub const MAGIC = *b"APRT"`, `HEADER_SIZE = 12`, `WHOLE_MODEL_LAYER = 0xFFFFFFFF`
  • `pub struct TensorHeader { layer: u32, dim_product: u32 }` + `is_whole_model()` + `total_file_size()`
  • `pub fn write_header(layer, dim_product) -> [u8; 12]`
  • `pub fn parse_header(&[u8]) -> Result<TensorHeader, HeaderError>`
  • `pub fn write_tensor_file<W: Write>(w, layer, &[f32]) -> io::Result<()>`
  • `pub fn read_tensor_file<R: Read>(r) -> Result<(TensorHeader, Vec<f32>)>`
  • `thiserror`-derived `HeaderError` (`Truncated`, `BadMagic`) and `ReadError` (`Io`, `Header`, `BodyLengthMismatch`)

File format (per contract `byte_format` equation)

```
offset 0-3: magic "APRT" (b"APRT")
offset 4-7: u32 LE — layer index (0..num_layers-1; or WHOLE_MODEL_LAYER for whole-model stages)
offset 8-11: u32 LE — dim_product (number of f32 elements following)
offset 12+: f32 LE × dim_product values
```

Total file size = `12 + dim_product × 4` bytes.
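
The layout above can be sketched as a small encode/decode pair. This is a minimal illustration, not the merged code: the constant, struct, and function names come from the list under "What's added", while the function bodies, the `u64` return type of `total_file_size()`, and the `got` fields on the error variants are assumptions. The PR derives its errors with `thiserror`; the sketch uses a plain enum so that it needs only the standard library.

```rust
// Sketch of the 12-byte APRT header. Names follow the PR summary; the
// bodies and the error payloads are assumptions.
pub const MAGIC: [u8; 4] = *b"APRT";
pub const HEADER_SIZE: usize = 12;
pub const WHOLE_MODEL_LAYER: u32 = 0xFFFF_FFFF;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TensorHeader {
    pub layer: u32,
    pub dim_product: u32,
}

impl TensorHeader {
    /// True when the layer field holds the whole-model sentinel.
    pub fn is_whole_model(&self) -> bool {
        self.layer == WHOLE_MODEL_LAYER
    }

    /// 12 header bytes plus 4 bytes per f32 element. Widened to u64
    /// (an assumption) so a large dim_product cannot overflow.
    pub fn total_file_size(&self) -> u64 {
        HEADER_SIZE as u64 + u64::from(self.dim_product) * 4
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated { got: usize },
    BadMagic { got: [u8; 4] },
}

/// Encode magic, layer, and dim_product at offsets 0, 4, and 8.
pub fn write_header(layer: u32, dim_product: u32) -> [u8; HEADER_SIZE] {
    let mut out = [0u8; HEADER_SIZE];
    out[0..4].copy_from_slice(&MAGIC);
    out[4..8].copy_from_slice(&layer.to_le_bytes());
    out[8..12].copy_from_slice(&dim_product.to_le_bytes());
    out
}

/// Decode the first 12 bytes; any trailing body bytes are ignored.
pub fn parse_header(bytes: &[u8]) -> Result<TensorHeader, HeaderError> {
    if bytes.len() < HEADER_SIZE {
        return Err(HeaderError::Truncated { got: bytes.len() });
    }
    let magic = [bytes[0], bytes[1], bytes[2], bytes[3]];
    if magic != MAGIC {
        return Err(HeaderError::BadMagic { got: magic });
    }
    let layer = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let dim_product = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    Ok(TensorHeader { layer, dim_product })
}
```

With these definitions, a header written for layer 3 with 5 elements reports a total file size of 12 + 5 × 4 = 32 bytes, and an 8-byte prefix of that header fails to parse as `Truncated`.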

Tests (16 unit tests, all green)

  • `magic_is_aprt`, `header_size_is_twelve` (constant pinning)
  • `falsify_apr_trace_save_004_header_format_layer_zero`
  • `falsify_apr_trace_save_004_header_format_arbitrary_layer`
  • `header_roundtrip`, `header_roundtrip_whole_model`
  • `parse_header_rejects_short_input`, `parse_header_rejects_bad_magic`
  • `parse_header_ignores_trailing_body`
  • `total_file_size_per_layer`, `total_file_size_empty`
  • `write_and_read_tensor_file_roundtrip`, `write_tensor_file_empty`
  • `write_preserves_nan_verbatim` (per contract `byte_format` invariant)
  • `read_tensor_file_truncated_body`
  • `write_layer_index_max_u32_minus_one` (boundary at WHOLE_MODEL_LAYER)
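
The roundtrip, NaN-preservation, and truncated-body tests can be illustrated with a body writer and reader like the following. This is a sketch under assumptions: the signatures follow the summary above, but the bodies, the read-everything-then-validate strategy, and the payloads on `ReadError` are illustrative guesses rather than the merged implementation.

```rust
use std::io::{self, Read, Write};

const MAGIC: [u8; 4] = *b"APRT";
const HEADER_SIZE: usize = 12;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TensorHeader {
    pub layer: u32,
    pub dim_product: u32,
}

// Simplified stand-in for the PR's thiserror-derived ReadError; the
// payload types are assumptions.
#[derive(Debug)]
pub enum ReadError {
    Io(io::Error),
    Header(&'static str),
    BodyLengthMismatch { expected: u64, got: u64 },
}

pub fn write_tensor_file<W: Write>(mut w: W, layer: u32, values: &[f32]) -> io::Result<()> {
    if values.len() > u32::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "tensor exceeds u32 element count",
        ));
    }
    w.write_all(&MAGIC)?;
    w.write_all(&layer.to_le_bytes())?;
    w.write_all(&(values.len() as u32).to_le_bytes())?;
    for v in values {
        // to_le_bytes serializes the raw bit pattern, so NaN payloads,
        // infinities, and signed zeros are written without normalization.
        w.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

pub fn read_tensor_file<R: Read>(mut r: R) -> Result<(TensorHeader, Vec<f32>), ReadError> {
    let mut bytes = Vec::new();
    r.read_to_end(&mut bytes).map_err(ReadError::Io)?;
    if bytes.len() < HEADER_SIZE {
        return Err(ReadError::Header("truncated header"));
    }
    if bytes[0..4] != MAGIC {
        return Err(ReadError::Header("bad magic"));
    }
    let layer = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let dim_product = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);

    // The body must hold exactly dim_product f32 values.
    let body = &bytes[HEADER_SIZE..];
    let expected = u64::from(dim_product) * 4;
    let got = body.len() as u64;
    if got != expected {
        return Err(ReadError::BodyLengthMismatch { expected, got });
    }
    let values = body
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok((TensorHeader { layer, dim_product }, values))
}
```

Comparing values with `to_bits()` rather than `==` is what makes a NaN roundtrip checkable, since NaN never compares equal to itself.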

Live verification

```
$ cargo test -p aprender-serve --lib inference_trace::save_tensor
test result: ok. 16 passed; 0 failed; 0 ignored
```

Lib-only clippy clean. Pre-existing test-target clippy errors in unrelated files (`residual_add_parity` etc.) are out of scope for this slice.

Why this is small

This PR is tight: 1 new file (~365 LOC including tests), 1 line added to `mod.rs` (`pub mod save_tensor;`). No CLI surface change, no behavior change to existing binaries. The full `apr trace --save-tensor` wiring (CLI flag + capture-point hooks across `forward.rs`) is the next slice; that work is multi-day and outside this PR's scope.

Five-Whys (Toyota Way)

  1. Project memory says SHIP-007 next-session priority is per-stage element-wise diff via `apr trace --save-tensor`.
  2. Contract is PROPOSED; full impl needs CLI flag + capture-point hooks across forward.rs (multi-day).
  3. The byte format is self-contained — pure serializer/parser with no model state. Authorable today as a tight slice with full test coverage.
  4. Pinning the format NOW means the future CLI-wiring PR cannot drift the on-disk schema.
  5. §26.8 stack-tool-extension methodology — extend apr in falsifier-sized slices, never one big PR.

Test plan

  • `cargo test -p aprender-serve --lib inference_trace::save_tensor` — 16 pass green
  • `cargo clippy -p aprender-serve --lib -- -D warnings` — clean
  • `cargo fmt -p aprender-serve` — formatted
  • Pre-commit quality gates passed

🤖 Generated with Claude Code

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED: contract has 5
falsifiers, 0 tested at authoring time. FALSIFY-APR-TRACE-SAVE-004
(header self-describing format) is partial-dischargeable today as a
pure-byte-format module without waiting on the full apr trace
--save-tensor CLI implementation.

New module crates/aprender-serve/src/inference_trace/save_tensor.rs:
- pub const MAGIC = *b"APRT", HEADER_SIZE = 12, WHOLE_MODEL_LAYER = 0xFFFFFFFF
- pub struct TensorHeader { layer: u32, dim_product: u32 }
  + is_whole_model() + total_file_size()
- pub fn write_header(layer, dim_product) -> [u8; 12]
- pub fn parse_header(&[u8]) -> Result<TensorHeader, HeaderError>
- pub fn write_tensor_file<W: Write>(w, layer, &[f32]) -> io::Result<()>
- pub fn read_tensor_file<R: Read>(r) -> Result<(TensorHeader, Vec<f32>)>
- thiserror-derived HeaderError (Truncated, BadMagic) and ReadError
  (Io, Header, BodyLengthMismatch)

16 unit tests cover:
- magic_is_aprt, header_size_is_twelve (constant pinning)
- falsify_apr_trace_save_004_header_format_layer_zero
- falsify_apr_trace_save_004_header_format_arbitrary_layer
- header_roundtrip, header_roundtrip_whole_model
- parse_header_rejects_short_input, parse_header_rejects_bad_magic
- parse_header_ignores_trailing_body
- total_file_size_per_layer, total_file_size_empty
- write_and_read_tensor_file_roundtrip, write_tensor_file_empty
- write_preserves_nan_verbatim (per contract byte_format invariant)
- read_tensor_file_truncated_body
- write_layer_index_max_u32_minus_one (boundary at WHOLE_MODEL_LAYER)

All 16 pass green. Lib-only clippy clean. Pre-existing test-target
clippy errors in unrelated files (residual_add_parity etc.) are out
of scope for this slice.

Five-Whys (Toyota Way):
  Why 1: Project memory says SHIP-007 next-session priority is
         per-stage element-wise diff via apr trace --save-tensor.
  Why 2: Contract is PROPOSED; full impl needs CLI flag + capture-
         point hooks across forward.rs (multi-day Rust work).
  Why 3: The byte format is self-contained — pure serializer/parser
         with no model state. Authorable today as a tight ~250 LOC
         slice with full test coverage.
  Why 4: Pinning the format NOW means the future CLI-wiring PR
         cannot drift the on-disk schema (12-byte header, b"APRT"
         magic, u32 LE layer + dim_product, f32 LE body).
  Why 5: §26.8 stack-tool-extension methodology — extend apr in
         falsifier-sized slices, never one big PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) April 29, 2026 12:00
@noahgift noahgift merged commit 025f604 into main Apr 29, 2026
11 checks passed
@noahgift noahgift deleted the feat/apr-trace-save-tensor-byte-format branch April 29, 2026 12:22
noahgift added a commit that referenced this pull request Apr 29, 2026
#1135)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED `cli_signature`
invariants: per-layer stages MUST be written under
`<DIR>/layer-<N>/<STAGE>.bin`; whole-model stages (final_norm, lm_head)
go directly under `<DIR>/<STAGE>.bin` with no `layer-N` segment.

This module is the layout primitive that the future apr trace
--save-tensor CLI implementation calls before invoking the byte writer
in inference_trace::save_tensor (PR #1133). Pure-path / minimal-fs slice
keeps the writer and reader (apr diff --values) from drifting apart on
the on-disk layout.

New module crates/aprender-serve/src/inference_trace/save_tensor_paths.rs:
- pub fn output_path(dir, layer, stage_name) -> PathBuf
  * Per-layer: <dir>/layer-<N>/<stage>.bin
  * Whole-model (layer == WHOLE_MODEL_LAYER): <dir>/<stage>.bin
- pub fn ensure_layer_dir(dir, layer) -> io::Result<()>
  * Creates the appropriate parent directory; idempotent
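
The layout rule above could look roughly like this. The sketch assumes `&Path` and `&str` parameter types, which the commit message does not spell out, and `parent_dir` is a hypothetical private helper introduced only for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

pub const WHOLE_MODEL_LAYER: u32 = 0xFFFF_FFFF;

/// Hypothetical helper: the directory that holds a stage file.
/// Per-layer stages live in `<dir>/layer-<N>`; whole-model stages live
/// directly in `<dir>`.
fn parent_dir(dir: &Path, layer: u32) -> PathBuf {
    if layer == WHOLE_MODEL_LAYER {
        dir.to_path_buf()
    } else {
        dir.join(format!("layer-{}", layer))
    }
}

/// Build the output path. The stage name is used verbatim, with only
/// the `.bin` extension appended.
pub fn output_path(dir: &Path, layer: u32, stage_name: &str) -> PathBuf {
    parent_dir(dir, layer).join(format!("{}.bin", stage_name))
}

/// Create the parent directory. `create_dir_all` also creates missing
/// ancestors and succeeds if the directory already exists, which gives
/// the idempotence the tests check for.
pub fn ensure_layer_dir(dir: &Path, layer: u32) -> io::Result<()> {
    fs::create_dir_all(parent_dir(dir, layer))
}
```

Routing both functions through one helper is a design choice of this sketch: it keeps the directory that gets created and the directory that appears in the returned path from diverging.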

13 unit tests + 1 doctest cover:
- output_path_per_layer_layer_zero, _arbitrary, _absolute_dir
- output_path_whole_model_no_layer_segment (final_norm + lm_head)
- output_path_layer_max_minus_one_is_per_layer (boundary at WHOLE_MODEL_LAYER)
- output_path_nested_relative_dir
- output_path_appends_bin_extension
- output_path_preserves_stage_name_verbatim (no implicit case-fold;
  canonicalisation is the caller's responsibility via SaveTensorStage)
- ensure_layer_dir_creates_per_layer_dir (tempfile)
- ensure_layer_dir_creates_whole_model_dir (no layer-* subdir)
- ensure_layer_dir_is_idempotent
- ensure_layer_dir_creates_nested_parents
- ensure_layer_dir_no_collision_between_per_layer_and_whole_model

Live results: 13 unit + 1 doctest, all green.

Independent of #1134 (stage enum) and depends on #1133 (just merged) for
WHOLE_MODEL_LAYER. Composes with both: the future CLI-wiring PR will
combine SaveTensorStage::canonical_name() + output_path + write_tensor_file.

Five-Whys (Toyota Way):
  Why 1: SHIP-007 next-session priority is per-stage element-wise diff.
  Why 2: Contract `cli_signature` mandates a precise filesystem layout.
  Why 3: Path building is pure formatter logic — fully testable today.
  Why 4: Pinning the layout NOW means writer and reader cannot drift
         (fewer surprises when apr diff --values reads these files).
  Why 5: §26.8 stack-tool-extension methodology — extend apr in
         falsifier-sized slices.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 29, 2026
…omposition (#1136)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED: combine #1133's
byte-format primitives with #1135's directory-layout helpers into a
single ergonomic API. The future apr trace --save-tensor CLI calls
write_stage_file(dir, layer, stage, values) once per (layer, stage)
without managing file handles or paths separately.

New module crates/aprender-serve/src/inference_trace/save_tensor_compose.rs:
- pub fn write_stage_file(dir, layer, stage_name, values) -> Result<PathBuf>
  * ensure_layer_dir → File::create → BufWriter → write_tensor_file → flush
  * Returns resolved path so callers can log it / pass to apr diff
- pub fn read_stage_file(path) -> Result<(TensorHeader, Vec<f32>)>
  * Symmetric one-shot reader for apr diff --values consumers
- thiserror-derived WriteStageError (Io)
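
A self-contained sketch of the compose step follows. It inlines the path and byte-format logic so that it compiles on its own, whereas the module described above delegates to `ensure_layer_dir`, `output_path`, and `write_tensor_file`. The parameter types, the use of `io::Result` in place of `WriteStageError`, and the `(u32, Vec<f32>)` return of the reader (instead of `TensorHeader`) are simplifying assumptions.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};

pub const WHOLE_MODEL_LAYER: u32 = 0xFFFF_FFFF;

pub fn write_stage_file(
    dir: &Path,
    layer: u32,
    stage_name: &str,
    values: &[f32],
) -> io::Result<PathBuf> {
    if values.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "tensor too large"));
    }
    // Layout: per-layer stages get a layer-<N> directory.
    let parent = if layer == WHOLE_MODEL_LAYER {
        dir.to_path_buf()
    } else {
        dir.join(format!("layer-{}", layer))
    };
    fs::create_dir_all(&parent)?;
    let path = parent.join(format!("{}.bin", stage_name));

    // File::create truncates an existing file, so a rerun never appends.
    let mut w = BufWriter::new(File::create(&path)?);
    w.write_all(b"APRT")?;
    w.write_all(&layer.to_le_bytes())?;
    w.write_all(&(values.len() as u32).to_le_bytes())?;
    for v in values {
        w.write_all(&v.to_le_bytes())?;
    }
    // An explicit flush reports write errors that BufWriter's Drop
    // implementation would otherwise discard.
    w.flush()?;
    Ok(path)
}

pub fn read_stage_file(path: &Path) -> io::Result<(u32, Vec<f32>)> {
    let bytes = fs::read(path)?;
    let bad = |msg: &'static str| io::Error::new(io::ErrorKind::InvalidData, msg);
    if bytes.len() < 12 || &bytes[0..4] != b"APRT" {
        return Err(bad("missing or invalid APRT header"));
    }
    let layer = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let count = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    if (bytes.len() - 12) as u64 != u64::from(count) * 4 {
        return Err(bad("body length does not match dim_product"));
    }
    let values = bytes[12..]
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok((layer, values))
}
```

Note that `flush()` hands the buffered bytes to the operating system; it does not by itself force them to stable storage, which would take an additional `File::sync_all()` call.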

10 unit tests cover:
- write_stage_file_per_layer_roundtrip (canonical case)
- write_stage_file_whole_model_roundtrip (no layer-* segment)
- write_stage_file_creates_missing_parent (mkdir -p)
- write_stage_file_truncates_existing (no append behavior)
- write_stage_file_zero_length_tensor (12-byte file)
- write_stage_file_preserves_nan_inf (sign-bit roundtrip via to_bits())
- write_stage_file_header_has_expected_magic_and_layer (raw byte check)
- read_stage_file_propagates_missing_path
- write_stage_file_returns_resolved_path_for_logging (matches output_path)
- write_then_read_three_stages_in_one_layer
  (mirrors FALSIFY-APR-TRACE-SAVE-005 multi-stage scenario at the
   filesystem level: 3 distinct .bin files under same layer-N/)

Live results: 10 passed; 0 failed.

Save-tensor contract progress: 3 modules now in main:
  save_tensor (#1133) + save_tensor_paths (#1135) + save_tensor_compose
  (this PR), plus a 4th, save_tensor_stage, in flight (#1134). The CLI-wiring
  PR now needs only to parse stages via SaveTensorStage::from_str and call
  write_stage_file at each capture point in the forward pass.

Five-Whys (Toyota Way):
  Why 1: SHIP-007 next-session priority is per-stage element-wise diff.
  Why 2: 2 building blocks already merged (byte format + paths) but no
         single ergonomic API. CLI authors would have to re-derive the
         compose pattern, which invites drift.
  Why 3: A 60-LOC wrapper + 10 tests pins the writer ↔ reader ↔ layout
         invariants in one place, including BufWriter::flush() durability
         + truncating-not-appending semantics.
  Why 4: write_stage_file_returns_resolved_path_for_logging asserts the
         returned path matches output_path() — downstream tooling
         (apr diff --values) relies on this invariant.
  Why 5: §26.8 stack-tool-extension methodology — extend apr in
         falsifier-sized slices.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 29, 2026
… parser (#1134)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED `cli_signature`
equation: stages MUST be one of 18 distinct names (plus `layer_output`
alias for `post_ffn_residual`), passable as a comma-delimited list.
This module pre-commits to the typed enum + parser so the future
CLI-wiring PR cannot drift the canonical name set or alias mapping.

New module crates/aprender-serve/src/inference_trace/save_tensor_stage.rs:
- pub enum SaveTensorStage with 18 variants matching the contract
- pub const ALL: [SaveTensorStage; 18] in canonical contract order
- canonical_name() -> &'static str (file naming + CLI help)
- is_per_layer() -> bool (16 per-layer, 2 whole-model)
- impl FromStr for SaveTensorStage (case-insensitive, trimmed,
  layer_output → PostFfnResidual alias)
- pub fn parse_stage_list(s) -> Result<Vec<SaveTensorStage>, _>
  (comma-delim with whitespace tolerance)
- thiserror-derived StageParseError (Empty, Unknown { got, valid })

19 unit tests + 1 doctest cover:
- Uniqueness: all_eighteen_stages_have_unique_canonical_names
- Schema parity: canonical_names_match_contract_enumeration
- Roundtrip: from_str_round_trip_for_every_canonical_name
- Case insensitivity + trimming
- layer_output alias canonicalisation
- Empty + unknown rejection
- is_per_layer correctness for each stage
- 16/2 partition counts match contract topology
- FALSIFY-APR-TRACE-SAVE-005 multi-stage parser:
  * 3-element list parses
  * whitespace-around-commas tolerated
  * empty list returns Ok(vec![])
  * double-comma rejected
  * trailing comma rejected
  * unknown token in middle rejected
  * duplicates preserved
  * single stage parses
  * all 18 in one call parses
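
The parser semantics listed above can be sketched with a reduced enum. Only four variants are shown, using stage names that appear in these commit messages; the real enum has 18, and their full list is not reproduced here. The function bodies are assumptions, `StageParseError::Unknown` omits the `valid` field for brevity, and treating a whitespace-only list as empty is a guess.

```rust
use std::str::FromStr;

// Reduced stand-in: the contract defines 18 stages, not 4.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SaveTensorStage {
    FfnGate,
    PostFfnResidual,
    FinalNorm,
    LmHead,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StageParseError {
    Empty,
    Unknown { got: String },
}

impl SaveTensorStage {
    pub fn canonical_name(self) -> &'static str {
        match self {
            SaveTensorStage::FfnGate => "ffn_gate",
            SaveTensorStage::PostFfnResidual => "post_ffn_residual",
            SaveTensorStage::FinalNorm => "final_norm",
            SaveTensorStage::LmHead => "lm_head",
        }
    }

    /// final_norm and lm_head are the whole-model stages.
    pub fn is_per_layer(self) -> bool {
        !matches!(self, SaveTensorStage::FinalNorm | SaveTensorStage::LmHead)
    }
}

impl FromStr for SaveTensorStage {
    type Err = StageParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Trim, then compare case-insensitively.
        let token = s.trim().to_ascii_lowercase();
        match token.as_str() {
            "" => Err(StageParseError::Empty),
            "ffn_gate" => Ok(SaveTensorStage::FfnGate),
            // layer_output is an alias for post_ffn_residual.
            "post_ffn_residual" | "layer_output" => Ok(SaveTensorStage::PostFfnResidual),
            "final_norm" => Ok(SaveTensorStage::FinalNorm),
            "lm_head" => Ok(SaveTensorStage::LmHead),
            _ => Err(StageParseError::Unknown { got: s.trim().to_string() }),
        }
    }
}

/// Parse a comma-delimited list. An empty input yields an empty Vec;
/// an empty token (double or trailing comma) is rejected; duplicates
/// are kept in order.
pub fn parse_stage_list(s: &str) -> Result<Vec<SaveTensorStage>, StageParseError> {
    if s.trim().is_empty() {
        return Ok(Vec::new());
    }
    s.split(',')
        .map(|tok| tok.parse::<SaveTensorStage>())
        .collect()
}
```

Collecting into `Result<Vec<_>, _>` stops at the first failing token, so an unknown name in the middle of a list rejects the whole list rather than producing a partial result.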

Live results from cargo test -p aprender-serve --lib inference_trace::save_tensor_stage:
  test result: ok. 19 passed; 0 failed; 0 ignored
Plus 1 passing doctest.

Independent of PR #1133 (byte format helpers); both can be merged in
either order. Future CLI-wiring PR composes both modules.

Five-Whys (Toyota Way):
  Why 1: SHIP-007 next-session priority is per-stage element-wise diff
         via apr trace --save-tensor.
  Why 2: Contract has 5 falsifiers (1 partial-discharged by #1133);
         FALSIFY-APR-TRACE-SAVE-005 (multi-stage parser) is also
         partial-dischargeable as a pure module today.
  Why 3: Stage names + parse semantics are pure-string-in-typed-enum-out.
         No model state, no I/O. Authorable as a tight ~370 LOC slice.
  Why 4: Pinning the canonical name set + the layer_output alias
         mapping NOW means the future CLI wiring cannot accidentally
         drop or rename a stage.
  Why 5: §26.8 stack-tool-extension methodology — extend apr in
         falsifier-sized slices, never one big PR.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 29, 2026
…1137)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED: integration tests
that exercise the public API of #1133 byte format + #1135 path helpers
exactly as a future apr trace --save-tensor CLI implementation will, and
as apr diff --values will when reading the produced files. These complement
the unit tests in those modules with public-API-surface assertions, catching
regressions that internal tests can miss.

New file crates/aprender-serve/tests/save_tensor_integration.rs (5 tests):

- falsify_apr_trace_save_002_byte_determinism_two_writes
  Two writes with identical inputs MUST produce byte-identical files.
  Partial-discharge of FALSIFY-APR-TRACE-SAVE-002 at the library level.

- falsify_apr_trace_save_004_header_format_via_public_api
  Reads raw file bytes, verifies APRT magic, decodes header via
  parse_header, asserts header.total_file_size() == actual file size,
  decodes f32 LE body element-wise. Partial-discharge of FALSIFY-APR-
  TRACE-SAVE-004 at the public-API surface (complements unit tests).

- falsify_apr_trace_save_005_three_stages_one_layer_independent_files
  Three writes (embedding, ffn_gate, ffn_swigl) at layer 0 produce
  exactly 3 distinct .bin files under layer-0/, each with its own
  correct dim_product. Partial-discharge of FALSIFY-APR-TRACE-SAVE-005
  at the filesystem level (complements parser-level test in #1134).

- whole_model_stages_dont_collide_with_per_layer_zero
  Writes lm_head at WHOLE_MODEL_LAYER and at layer=0; verifies both files
  exist at distinct paths and preserve their own dim_product values.
  Defends against future bugs where the WHOLE_MODEL_LAYER sentinel is
  miscompared at the path-builder layer.

- parse_header_on_truncated_file_errors_via_public_api
  Writes 8 bytes of an APRT header (truncated below the 12-byte minimum);
  parse_header MUST error cleanly. Defends against silent zero-fill on
  filesystem corruption.

Live results from cargo test -p aprender-serve --test save_tensor_integration:
  test result: ok. 5 passed; 0 failed.

Save-tensor contract progress:
- 4 lib modules merged/in-flight (#1133 + #1134 + #1135 + #1136)
- 5 public-API integration tests added (this PR)
- Independent of all in-flight save-tensor PRs (#1134, #1136); compiles
  against the modules already in main from #1133 + #1135.

Five-Whys (Toyota Way):
  Why 1: SHIP-007 next-session priority is per-stage element-wise diff.
  Why 2: Lib-level unit tests cover internal-state invariants well; but
         a public-API caller can violate invariants the unit tests can't
         see (e.g., header offsets at the byte level).
  Why 3: Integration tests against the same public surface that the
         future CLI uses catch a different regression class.
  Why 4: 3 of the 5 falsifiers (002, 004, 005) are partial-dischargeable
         at the integration level today without waiting on the CLI.
  Why 5: §26.8 stack-tool-extension methodology — extend apr in
         falsifier-sized slices.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>