feat(aprender-serve): apr-cli-trace-save-tensor-v1 byte format helpers by noahgift · Pull Request #1133 · paiml/aprender

noahgift · 2026-04-29T12:00:45Z

Summary

Per `apr-cli-trace-save-tensor-v1.yaml` v1.0.0 PROPOSED: contract has 5 falsifiers, 0 tested at authoring time. FALSIFY-APR-TRACE-SAVE-004 (header self-describing format) is partial-dischargeable today as a pure-byte-format module without waiting on the full `apr trace --save-tensor` CLI implementation.

This unblocks the next-session SHIP-007 priority per project memory (`project_2026_04_28_ship_007_state_machine.md`): "extend apr trace with --save-tensor flag for per-stage element-wise comparison."

What's added

New module `crates/aprender-serve/src/inference_trace/save_tensor.rs` (~250 LOC):

`pub const MAGIC = *b"APRT"`, `HEADER_SIZE = 12`, `WHOLE_MODEL_LAYER = 0xFFFFFFFF`
`pub struct TensorHeader { layer: u32, dim_product: u32 }` + `is_whole_model()` + `total_file_size()`
`pub fn write_header(layer, dim_product) -> [u8; 12]`
`pub fn parse_header(&[u8]) -> Result<TensorHeader, HeaderError>`
`pub fn write_tensor_file<W: Write>(w, layer, &[f32]) -> io::Result<()>`
`pub fn read_tensor_file<R: Read>(r) -> Result<(TensorHeader, Vec<f32>)>`
`thiserror`-derived `HeaderError` (`Truncated`, `BadMagic`) and `ReadError` (`Io`, `Header`, `BodyLengthMismatch`)
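
The header helpers listed above could look roughly like the following. This is a minimal sketch, not the merged code: constant names and function signatures follow the list, while the error payloads (`got`), the `u64` return type of `total_file_size()`, and the hand-written error enum (in place of the `thiserror` derive) are assumptions made to keep the example dependency-free.

```rust
// Sketch only: names follow the PR description; payload fields and the
// u64 size type are assumptions, not the merged implementation.
pub const MAGIC: [u8; 4] = *b"APRT";
pub const HEADER_SIZE: usize = 12;
pub const WHOLE_MODEL_LAYER: u32 = 0xFFFF_FFFF;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TensorHeader {
    pub layer: u32,
    pub dim_product: u32,
}

impl TensorHeader {
    /// True when the layer field holds the whole-model sentinel.
    pub fn is_whole_model(&self) -> bool {
        self.layer == WHOLE_MODEL_LAYER
    }

    /// Header bytes plus 4 bytes per f32 element.
    pub fn total_file_size(&self) -> u64 {
        HEADER_SIZE as u64 + u64::from(self.dim_product) * 4
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated { got: usize },
    BadMagic { got: [u8; 4] },
}

/// Serialize magic, layer, and dim_product as 12 little-endian bytes.
pub fn write_header(layer: u32, dim_product: u32) -> [u8; HEADER_SIZE] {
    let mut out = [0u8; HEADER_SIZE];
    out[0..4].copy_from_slice(&MAGIC);
    out[4..8].copy_from_slice(&layer.to_le_bytes());
    out[8..12].copy_from_slice(&dim_product.to_le_bytes());
    out
}

/// Parse the first 12 bytes; any bytes after the header are ignored.
pub fn parse_header(bytes: &[u8]) -> Result<TensorHeader, HeaderError> {
    if bytes.len() < HEADER_SIZE {
        return Err(HeaderError::Truncated { got: bytes.len() });
    }
    let magic = [bytes[0], bytes[1], bytes[2], bytes[3]];
    if magic != MAGIC {
        return Err(HeaderError::BadMagic { got: magic });
    }
    let layer = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let dim_product = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    Ok(TensorHeader { layer, dim_product })
}
```

Returning a fixed-size array from `write_header` keeps the header allocation-free and lets the type system pin the 12-byte length.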

File format (per contract `byte_format` equation)

```
offset 0-3: magic "APRT" (b"APRT")
offset 4-7: u32 LE — layer index (0..num_layers-1; or WHOLE_MODEL_LAYER for whole-model stages)
offset 8-11: u32 LE — dim_product (number of f32 elements following)
offset 12+: f32 LE × dim_product values
```

Total file size = `12 + dim_product × 4` bytes.
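
As a worked instance of the layout: a tensor with 2 elements at layer 3 occupies 12 + 2 × 4 = 20 bytes. The sketch below writes and reads such a file over generic `Write`/`Read` handles. It is illustrative only: it collapses all failures into `io::Error` rather than the PR's `ReadError`, and returns a bare `(layer, values)` tuple instead of a `TensorHeader`; both simplifications are assumptions of this example.

```rust
use std::io::{self, Read, Write};

const MAGIC: [u8; 4] = *b"APRT";
const HEADER_SIZE: usize = 12;

/// Write the 12-byte header followed by each value as f32 little-endian.
pub fn write_tensor_file<W: Write>(mut w: W, layer: u32, values: &[f32]) -> io::Result<()> {
    // dim_product is a u32 on disk, so longer slices cannot be represented.
    let dim_product = u32::try_from(values.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "too many elements"))?;
    w.write_all(&MAGIC)?;
    w.write_all(&layer.to_le_bytes())?;
    w.write_all(&dim_product.to_le_bytes())?;
    for v in values {
        w.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

/// Read a whole file back, rejecting a bad magic or a body whose length
/// disagrees with the header's dim_product.
pub fn read_tensor_file<R: Read>(mut r: R) -> io::Result<(u32, Vec<f32>)> {
    let mut header = [0u8; HEADER_SIZE];
    r.read_exact(&mut header)?;
    if header[0..4] != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    let layer = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);
    let dim_product = u32::from_le_bytes([header[8], header[9], header[10], header[11]]) as usize;

    let mut body = Vec::new();
    r.read_to_end(&mut body)?;
    if dim_product.checked_mul(4) != Some(body.len()) {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "body length mismatch"));
    }
    let values = body
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok((layer, values))
}
```

Because the header carries `dim_product`, a reader can detect a truncated body without any out-of-band shape information.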

Tests (16 unit tests, all green)

`magic_is_aprt`, `header_size_is_twelve` (constant pinning)
`falsify_apr_trace_save_004_header_format_layer_zero`
`falsify_apr_trace_save_004_header_format_arbitrary_layer`
`header_roundtrip`, `header_roundtrip_whole_model`
`parse_header_rejects_short_input`, `parse_header_rejects_bad_magic`
`parse_header_ignores_trailing_body`
`total_file_size_per_layer`, `total_file_size_empty`
`write_and_read_tensor_file_roundtrip`, `write_tensor_file_empty`
`write_preserves_nan_verbatim` (per contract `byte_format` invariant)
`read_tensor_file_truncated_body`
`write_layer_index_max_u32_minus_one` (boundary at WHOLE_MODEL_LAYER)
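
A test such as `write_preserves_nan_verbatim` has to compare bit patterns, because `NaN != NaN` under float equality. The helper below is a hypothetical illustration of that idea (the name `roundtrip_le` is not from the PR): it sends one value through its little-endian bytes and back, so a caller can check `to_bits()` on either side.

```rust
/// Encode a single f32 as little-endian bytes and decode it again.
/// On mainstream targets this is a pure bit copy, so NaN payloads and
/// the sign of zero are expected to survive unchanged.
pub fn roundtrip_le(v: f32) -> f32 {
    f32::from_le_bytes(v.to_le_bytes())
}
```

Comparing `roundtrip_le(x).to_bits()` with `x.to_bits()` covers NaN, infinities, and negative zero in one assertion style.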

Live verification

```
$ cargo test -p aprender-serve --lib inference_trace::save_tensor
test result: ok. 16 passed; 0 failed; 0 ignored
```

Lib-only clippy clean. Pre-existing test-target clippy errors in unrelated files (`residual_add_parity` etc.) are out of scope for this slice.

Why this is small

This PR is tight: 1 new file (~365 LOC including tests), 1 line added to `mod.rs` (`pub mod save_tensor;`). No CLI surface change, no behavior change to existing binaries. The full `apr trace --save-tensor` wiring (CLI flag + capture-point hooks across `forward.rs`) is the next slice; that work is multi-day and outside this PR's scope.

Five-Whys (Toyota Way)

Why 1: Project memory says SHIP-007 next-session priority is per-stage element-wise diff via `apr trace --save-tensor`.
Why 2: Contract is PROPOSED; full impl needs CLI flag + capture-point hooks across forward.rs (multi-day).
Why 3: The byte format is self-contained: a pure serializer/parser with no model state. Authorable today as a tight slice with full test coverage.
Why 4: Pinning the format NOW means the future CLI-wiring PR cannot drift the on-disk schema.
Why 5: §26.8 stack-tool-extension methodology: extend apr in falsifier-sized slices, never one big PR.

Test plan

`cargo test -p aprender-serve --lib inference_trace::save_tensor` — 16 pass green
`cargo clippy -p aprender-serve --lib -- -D warnings` — clean
`cargo fmt -p aprender-serve` — formatted
Pre-commit quality gates passed

🤖 Generated with Claude Code

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED: contract has 5 falsifiers, 0 tested at authoring time. FALSIFY-APR-TRACE-SAVE-004 (header self-describing format) is partial-dischargeable today as a pure-byte-format module without waiting on the full apr trace --save-tensor CLI implementation.

New module crates/aprender-serve/src/inference_trace/save_tensor.rs:
- pub const MAGIC = *b"APRT", HEADER_SIZE = 12, WHOLE_MODEL_LAYER = 0xFFFFFFFF
- pub struct TensorHeader { layer: u32, dim_product: u32 } + is_whole_model() + total_file_size()
- pub fn write_header(layer, dim_product) -> [u8; 12]
- pub fn parse_header(&[u8]) -> Result<TensorHeader, HeaderError>
- pub fn write_tensor_file<W: Write>(w, layer, &[f32]) -> io::Result<()>
- pub fn read_tensor_file<R: Read>(r) -> Result<(TensorHeader, Vec<f32>)>
- thiserror-derived HeaderError (Truncated, BadMagic) and ReadError (Io, Header, BodyLengthMismatch)

16 unit tests cover:
- magic_is_aprt, header_size_is_twelve (constant pinning)
- falsify_apr_trace_save_004_header_format_layer_zero
- falsify_apr_trace_save_004_header_format_arbitrary_layer
- header_roundtrip, header_roundtrip_whole_model
- parse_header_rejects_short_input, parse_header_rejects_bad_magic
- parse_header_ignores_trailing_body
- total_file_size_per_layer, total_file_size_empty
- write_and_read_tensor_file_roundtrip, write_tensor_file_empty
- write_preserves_nan_verbatim (per contract byte_format invariant)
- read_tensor_file_truncated_body
- write_layer_index_max_u32_minus_one (boundary at WHOLE_MODEL_LAYER)

All 16 pass green. Lib-only clippy clean. Pre-existing test-target clippy errors in unrelated files (residual_add_parity etc.) are out of scope for this slice.

Five-Whys (Toyota Way):
Why 1: Project memory says SHIP-007 next-session priority is per-stage element-wise diff via apr trace --save-tensor.
Why 2: Contract is PROPOSED; full impl needs CLI flag + capture-point hooks across forward.rs (multi-day Rust work).
Why 3: The byte format is self-contained: a pure serializer/parser with no model state. Authorable today as a tight ~250 LOC slice with full test coverage.
Why 4: Pinning the format NOW means the future CLI-wiring PR cannot drift the on-disk schema (12-byte header, b"APRT" magic, u32 LE layer + dim_product, f32 LE body).
Why 5: §26.8 stack-tool-extension methodology: extend apr in falsifier-sized slices, never one big PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

#1135)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED `cli_signature` invariants: per-layer stages MUST be written under `<DIR>/layer-<N>/<STAGE>.bin`; whole-model stages (final_norm, lm_head) go directly under `<DIR>/<STAGE>.bin` with no `layer-N` segment. This module is the layout primitive that the future apr trace --save-tensor CLI implementation calls before invoking the byte writer in inference_trace::save_tensor (PR #1133). Pure-path / minimal-fs slice keeps the writer and reader (apr diff --values) from drifting apart on the on-disk layout.

New module crates/aprender-serve/src/inference_trace/save_tensor_paths.rs:
- pub fn output_path(dir, layer, stage_name) -> PathBuf
  * Per-layer: <dir>/layer-<N>/<stage>.bin
  * Whole-model (layer == WHOLE_MODEL_LAYER): <dir>/<stage>.bin
- pub fn ensure_layer_dir(dir, layer) -> io::Result<()>
  * Creates the appropriate parent directory; idempotent

13 unit tests + 1 doctest cover:
- output_path_per_layer_layer_zero, _arbitrary, _absolute_dir
- output_path_whole_model_no_layer_segment (final_norm + lm_head)
- output_path_layer_max_minus_one_is_per_layer (boundary at WHOLE_MODEL_LAYER)
- output_path_nested_relative_dir
- output_path_appends_bin_extension
- output_path_preserves_stage_name_verbatim (no implicit case-fold; canonicalisation is the caller's responsibility via SaveTensorStage)
- ensure_layer_dir_creates_per_layer_dir (tempfile)
- ensure_layer_dir_creates_whole_model_dir (no layer-* subdir)
- ensure_layer_dir_is_idempotent
- ensure_layer_dir_creates_nested_parents
- ensure_layer_dir_no_collision_between_per_layer_and_whole_model

Live results: 13 unit + 1 doctest, all green. Independent of #1134 (stage enum) and depends on #1133 (just merged) for WHOLE_MODEL_LAYER. Composes with both: the future CLI-wiring PR will combine SaveTensorStage::canonical_name() + output_path + write_tensor_file.

Five-Whys (Toyota Way):
Why 1: SHIP-007 next-session priority is per-stage element-wise diff.
Why 2: Contract `cli_signature` mandates a precise filesystem layout.
Why 3: Path building is pure formatter logic, fully testable today.
Why 4: Pinning the layout NOW means writer and reader cannot drift (fewer surprises when apr diff --values reads these files).
Why 5: §26.8 stack-tool-extension methodology: extend apr in falsifier-sized slices.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
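
The layout rule described in this commit message can be sketched as a pure path builder. This is an illustration under assumptions: the real `output_path` may accept a different `dir` type (for example `impl AsRef<Path>`), and the constant is redefined here only to keep the example self-contained.

```rust
use std::path::{Path, PathBuf};

/// Sentinel marking whole-model stages (redefined here for the sketch).
pub const WHOLE_MODEL_LAYER: u32 = 0xFFFF_FFFF;

/// Build `<dir>/layer-<N>/<stage>.bin` for per-layer stages and
/// `<dir>/<stage>.bin` for whole-model stages. The stage name is used
/// verbatim; no case folding or validation happens here.
pub fn output_path(dir: &Path, layer: u32, stage_name: &str) -> PathBuf {
    let file_name = format!("{stage_name}.bin");
    if layer == WHOLE_MODEL_LAYER {
        dir.join(file_name)
    } else {
        dir.join(format!("layer-{layer}")).join(file_name)
    }
}
```

Keeping this function free of filesystem access is what makes the boundary case (`WHOLE_MODEL_LAYER - 1` still being a per-layer path) cheap to test.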

…omposition (#1136)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED: combine #1133's byte-format primitives with #1135's directory-layout helpers into a single ergonomic API. The future apr trace --save-tensor CLI calls write_stage_file(dir, layer, stage, values) once per (layer, stage) without managing file handles or paths separately.

New module crates/aprender-serve/src/inference_trace/save_tensor_compose.rs:
- pub fn write_stage_file(dir, layer, stage_name, values) -> Result<PathBuf>
  * ensure_layer_dir → File::create → BufWriter → write_tensor_file → flush
  * Returns resolved path so callers can log it / pass to apr diff
- pub fn read_stage_file(path) -> Result<(TensorHeader, Vec<f32>)>
  * Symmetric one-shot reader for apr diff --values consumers
- thiserror-derived WriteStageError (Io)

10 unit tests cover:
- write_stage_file_per_layer_roundtrip (canonical case)
- write_stage_file_whole_model_roundtrip (no layer-* segment)
- write_stage_file_creates_missing_parent (mkdir -p)
- write_stage_file_truncates_existing (no append behavior)
- write_stage_file_zero_length_tensor (12-byte file)
- write_stage_file_preserves_nan_inf (sign-bit roundtrip via to_bits())
- write_stage_file_header_has_expected_magic_and_layer (raw byte check)
- read_stage_file_propagates_missing_path
- write_stage_file_returns_resolved_path_for_logging (matches output_path)
- write_then_read_three_stages_in_one_layer (mirrors FALSIFY-APR-TRACE-SAVE-005 multi-stage scenario at the filesystem level: 3 distinct .bin files under same layer-N/)

Live results: 10 passed; 0 failed.

Save-tensor contract progress: 4 modules now in main: save_tensor (#1133) + save_tensor_paths (#1135) + save_tensor_compose (this PR), plus save_tensor_stage in flight (#1134). The CLI-wiring PR now needs only to: parse stages via SaveTensorStage::from_str + call write_stage_file at each capture point in the forward pass.

Five-Whys (Toyota Way):
Why 1: SHIP-007 next-session priority is per-stage element-wise diff.
Why 2: 3 building blocks already merged (byte format + paths) but no single ergonomic API. CLI authors would have to re-derive the compose pattern, which invites drift.
Why 3: A 60-LOC wrapper + 10 tests pins the writer ↔ reader ↔ layout invariants in one place, including BufWriter::flush() durability + truncating-not-appending semantics.
Why 4: write_stage_file_returns_resolved_path_for_logging asserts the returned path matches output_path(); downstream tooling (apr diff --values) relies on this invariant.
Why 5: §26.8 stack-tool-extension methodology: extend apr in falsifier-sized slices.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
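
The composition chain named above (create directory, `File::create`, `BufWriter`, write, flush) might be sketched as follows. To stay self-contained the example inlines the path and byte-format logic instead of calling the sibling modules, and it returns `io::Result` rather than the PR's `WriteStageError`; both are assumptions of this illustration.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};

const WHOLE_MODEL_LAYER: u32 = 0xFFFF_FFFF;

/// Write one stage tensor to its resolved path and return that path.
pub fn write_stage_file(
    dir: &Path,
    layer: u32,
    stage_name: &str,
    values: &[f32],
) -> io::Result<PathBuf> {
    // Per-layer stages live under layer-<N>/; whole-model stages sit in dir.
    let parent = if layer == WHOLE_MODEL_LAYER {
        dir.to_path_buf()
    } else {
        dir.join(format!("layer-{layer}"))
    };
    fs::create_dir_all(&parent)?; // succeeds if the directory already exists
    let path = parent.join(format!("{stage_name}.bin"));

    let dim_product = u32::try_from(values.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "too many elements"))?;

    // File::create truncates an existing file, so rewrites never append.
    let mut w = BufWriter::new(File::create(&path)?);
    w.write_all(b"APRT")?;
    w.write_all(&layer.to_le_bytes())?;
    w.write_all(&dim_product.to_le_bytes())?;
    for v in values {
        w.write_all(&v.to_le_bytes())?;
    }
    // Flush explicitly so write errors surface here instead of in Drop.
    w.flush()?;
    Ok(path)
}
```

Returning the resolved path lets the caller log it or hand it straight to a reader without rebuilding the layout rule.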

… parser (#1134)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED `cli_signature` equation: stages MUST be one of 18 distinct names (plus `layer_output` alias for `post_ffn_residual`), passable as a comma-delimited list. This module pre-commits to the typed enum + parser so the future CLI-wiring PR cannot drift the canonical name set or alias mapping.

New module crates/aprender-serve/src/inference_trace/save_tensor_stage.rs:
- pub enum SaveTensorStage with 18 variants matching the contract
- pub const ALL: [SaveTensorStage; 18] in canonical contract order
- canonical_name() -> &'static str (file naming + CLI help)
- is_per_layer() -> bool (16 per-layer, 2 whole-model)
- impl FromStr for SaveTensorStage (case-insensitive, trimmed, layer_output → PostFfnResidual alias)
- pub fn parse_stage_list(s) -> Result<Vec<SaveTensorStage>, _> (comma-delim with whitespace tolerance)
- thiserror-derived StageParseError (Empty, Unknown { got, valid })

19 unit tests + 1 doctest cover:
- Uniqueness: all_eighteen_stages_have_unique_canonical_names
- Schema parity: canonical_names_match_contract_enumeration
- Roundtrip: from_str_round_trip_for_every_canonical_name
- Case insensitivity + trimming
- layer_output alias canonicalisation
- Empty + unknown rejection
- is_per_layer correctness for each stage
- 16/2 partition counts match contract topology
- FALSIFY-APR-TRACE-SAVE-005 multi-stage parser:
  * 3-element list parses
  * whitespace-around-commas tolerated
  * empty list returns Ok(vec![])
  * double-comma rejected
  * trailing comma rejected
  * unknown token in middle rejected
  * duplicates preserved
  * single stage parses
  * all 18 in one call parses

Live results from cargo test -p aprender-serve --lib inference_trace::save_tensor_stage: test result: ok. 19 passed; 0 failed; 0 ignored. Plus 1 passing doctest.

Independent of PR #1133 (byte format helpers); both can be merged in either order. Future CLI-wiring PR composes both modules.

Five-Whys (Toyota Way):
Why 1: SHIP-007 next-session priority is per-stage element-wise diff via apr trace --save-tensor.
Why 2: Contract has 5 falsifiers (1 partial-discharged by #1133); FALSIFY-APR-TRACE-SAVE-005 (multi-stage parser) is also partial-dischargeable as a pure module today.
Why 3: Stage names + parse semantics are pure-string-in-typed-enum-out. No model state, no I/O. Authorable as a tight ~370 LOC slice.
Why 4: Pinning the canonical name set + the layer_output alias mapping NOW means the future CLI wiring cannot accidentally drop or rename a stage.
Why 5: §26.8 stack-tool-extension methodology: extend apr in falsifier-sized slices, never one big PR.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
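
The parse semantics listed in this commit message (case-insensitive, trimmed, alias mapping, empty list accepted, empty tokens rejected, duplicates kept) can be illustrated with a cut-down enum. This is a sketch under assumptions: it models only 4 of the 18 stages, the type is named `Stage` rather than `SaveTensorStage`, and `Unknown` omits the `valid` field.

```rust
use std::str::FromStr;

/// Hypothetical 4-variant subset of the contract's 18 stages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    FfnGate,
    PostFfnResidual,
    FinalNorm,
    LmHead,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StageParseError {
    Empty,
    Unknown { got: String },
}

impl Stage {
    pub fn canonical_name(self) -> &'static str {
        match self {
            Stage::FfnGate => "ffn_gate",
            Stage::PostFfnResidual => "post_ffn_residual",
            Stage::FinalNorm => "final_norm",
            Stage::LmHead => "lm_head",
        }
    }

    /// final_norm and lm_head are the whole-model stages.
    pub fn is_per_layer(self) -> bool {
        !matches!(self, Stage::FinalNorm | Stage::LmHead)
    }
}

impl FromStr for Stage {
    type Err = StageParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let trimmed = s.trim();
        match trimmed.to_ascii_lowercase().as_str() {
            "" => Err(StageParseError::Empty),
            "ffn_gate" => Ok(Stage::FfnGate),
            // layer_output is an alias for post_ffn_residual.
            "post_ffn_residual" | "layer_output" => Ok(Stage::PostFfnResidual),
            "final_norm" => Ok(Stage::FinalNorm),
            "lm_head" => Ok(Stage::LmHead),
            _ => Err(StageParseError::Unknown { got: trimmed.to_string() }),
        }
    }
}

/// Parse a comma-delimited list. A blank input is an empty list, but an
/// empty token between or after commas is an error.
pub fn parse_stage_list(s: &str) -> Result<Vec<Stage>, StageParseError> {
    if s.trim().is_empty() {
        return Ok(Vec::new());
    }
    s.split(',').map(Stage::from_str).collect()
}
```

Collecting an iterator of `Result` values into `Result<Vec<_>, _>` stops at the first bad token, which gives the "unknown token in middle rejected" behaviour without extra bookkeeping.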

…1137)

Per apr-cli-trace-save-tensor-v1.yaml v1.0.0 PROPOSED: integration tests that exercise the public API of #1133 byte format + #1135 path helpers exactly as a future apr trace --save-tensor CLI implementation will, and as apr diff --values will when reading the produced files. These complement the unit tests in those modules with public-API-surface assertions, catching regressions that internal tests can miss.

New file crates/aprender-serve/tests/save_tensor_integration.rs (5 tests):
- falsify_apr_trace_save_002_byte_determinism_two_writes
  Two writes with identical inputs MUST produce byte-identical files. Partial-discharge of FALSIFY-APR-TRACE-SAVE-002 at the library level.
- falsify_apr_trace_save_004_header_format_via_public_api
  Reads raw file bytes, verifies APRT magic, decodes header via parse_header, asserts header.total_file_size() == actual file size, decodes f32 LE body element-wise. Partial-discharge of FALSIFY-APR-TRACE-SAVE-004 at the public-API surface (complements unit tests).
- falsify_apr_trace_save_005_three_stages_one_layer_independent_files
  Three writes (embedding, ffn_gate, ffn_swigl) at layer 0 produce exactly 3 distinct .bin files under layer-0/, each with its own correct dim_product. Partial-discharge of FALSIFY-APR-TRACE-SAVE-005 at the filesystem level (complements parser-level test in #1134).
- whole_model_stages_dont_collide_with_per_layer_zero
  Writes lm_head at WHOLE_MODEL_LAYER and at layer=0; verifies both files exist at distinct paths and preserve their own dim_product values. Defends against future bugs where the WHOLE_MODEL_LAYER sentinel is miscompared at the path-builder layer.
- parse_header_on_truncated_file_errors_via_public_api
  Writes 8 bytes of an APRT header (truncated below the 12-byte minimum); parse_header MUST error cleanly. Defends against silent zero-fill on filesystem corruption.

Live results from cargo test -p aprender-serve --test save_tensor_integration: test result: ok. 5 passed; 0 failed.

Save-tensor contract progress:
- 4 lib modules merged/in-flight (#1133 + #1134 + #1135 + #1136)
- 2 public-API integration tests added (this PR)
- Independent of all in-flight save-tensor PRs (#1134, #1136); compiles against the modules already in main from #1133 + #1135.

Five-Whys (Toyota Way):
Why 1: SHIP-007 next-session priority is per-stage element-wise diff.
Why 2: Lib-level unit tests cover internal-state invariants well; but a public-API caller can violate invariants the unit tests can't see (e.g., header offsets at the byte level).
Why 3: Integration tests against the same public surface that the future CLI uses catch a different regression class.
Why 4: 3 of the 5 falsifiers (002, 004, 005) are partial-dischargeable at the integration level today without waiting on the CLI.
Why 5: §26.8 stack-tool-extension methodology: extend apr in falsifier-sized slices.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) April 29, 2026 12:00

noahgift merged commit 025f604 into main Apr 29, 2026
11 checks passed

noahgift deleted the feat/apr-trace-save-tensor-byte-format branch April 29, 2026 12:22

This was referenced Apr 29, 2026

feat(aprender-serve): apr-cli-trace-save-tensor-v1 stage enum + comma parser #1134

Merged

feat(aprender-serve): apr-cli-trace-save-tensor-v1 output path helpers #1135

Merged

This was referenced Apr 29, 2026

feat(aprender-serve): apr-cli-trace-save-tensor-v1 write_stage_file composition #1136

Merged

test(aprender-serve): apr-cli-trace-save-tensor-v1 integration tests #1137

Merged

noahgift mentioned this pull request May 2, 2026

feat(apr-cli): SHIP-007 PR-A — apr trace --save-tensor clap surface #1405

Merged

6 tasks
