
test(aprender-serve): apr-cli-trace-save-tensor-v1 integration tests #1137

Merged: noahgift merged 4 commits into `main` from `feat/apr-trace-save-tensor-integration-tests` on Apr 29, 2026


@noahgift


Summary

Per `apr-cli-trace-save-tensor-v1.yaml` v1.0.0 (PROPOSED), this PR adds integration tests that exercise the public API of the #1133 byte format and the #1135 path helpers exactly as a future `apr trace --save-tensor` CLI implementation will, and as `apr diff --values` will when reading the produced files. These complement the unit tests in those modules with public-API-surface assertions, catching regressions that internal tests can miss (e.g., byte-level header offsets).

Discharge map

| Falsifier | Discharge level | Test |
|---|---|---|
| FALSIFY-APR-TRACE-SAVE-002 (determinism) | partial | `falsify_apr_trace_save_002_byte_determinism_two_writes` |
| FALSIFY-APR-TRACE-SAVE-004 (header self-describing) | partial | `falsify_apr_trace_save_004_header_format_via_public_api` |
| FALSIFY-APR-TRACE-SAVE-005 (multi-stage in one run) | partial | `falsify_apr_trace_save_005_three_stages_one_layer_independent_files` |

"Partial" because full discharge requires the live CLI implementation that calls these helpers from inside the forward pass.

Tests added

New file `crates/aprender-serve/tests/save_tensor_integration.rs` (5 tests):

  1. `falsify_apr_trace_save_002_byte_determinism_two_writes` — two writes with identical inputs MUST produce byte-identical files.

  2. `falsify_apr_trace_save_004_header_format_via_public_api` — reads the raw file bytes, verifies the APRT magic, decodes the header via `parse_header`, asserts that `header.total_file_size()` equals the actual file size, and decodes the f32 LE body element-wise.

  3. `falsify_apr_trace_save_005_three_stages_one_layer_independent_files` — three writes (`embedding`, `ffn_gate`, `ffn_swigl`) at layer 0 produce exactly 3 distinct `.bin` files under `layer-0/`, each with its own correct dim_product.

  4. `whole_model_stages_dont_collide_with_per_layer_zero` — writes `lm_head` at `WHOLE_MODEL_LAYER` and at `layer=0`; verifies both files exist at distinct paths and preserve their own dim_product values.

  5. `parse_header_on_truncated_file_errors_via_public_api` — writes 8 bytes of an APRT header (truncated below the 12-byte minimum); `parse_header` MUST error cleanly.
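To make the byte-level checks in tests 1, 2, and 5 concrete, here is a minimal Rust sketch of a self-describing tensor dump. The layout is an assumption for illustration only, not the #1133 format: a 12-byte fixed prefix (`APRT` magic, a `u32` LE version, a `u32` LE rank), one `u32` LE per dimension, then the `f32` LE body. `Header`, `encode`, `parse_header`, and `decode_body` here are hypothetical stand-ins, not the crate's actual public API.

```rust
/// Magic bytes at the start of every dump (name taken from the PR text).
const MAGIC: &[u8; 4] = b"APRT";
/// Assumed fixed prefix: magic (4) + version u32 (4) + rank u32 (4) = 12 bytes.
const FIXED_PREFIX: usize = 12;

/// Hypothetical decoded header; the real #1133 type may differ.
#[derive(Debug, PartialEq)]
struct Header {
    version: u32,
    dims: Vec<u32>,
}

impl Header {
    /// Number of f32 elements in the body.
    fn dim_product(&self) -> usize {
        self.dims.iter().map(|&d| d as usize).product()
    }
    /// Bytes before the body: fixed prefix plus one u32 per dimension.
    fn header_len(&self) -> usize {
        FIXED_PREFIX + 4 * self.dims.len()
    }
    /// Expected size of the whole file: header plus f32 body.
    fn total_file_size(&self) -> usize {
        self.header_len() + 4 * self.dim_product()
    }
}

/// Serializes dims and values; the output depends only on the inputs.
fn encode(dims: &[u32], values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(FIXED_PREFIX + 4 * (dims.len() + values.len()));
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&1u32.to_le_bytes()); // version
    out.extend_from_slice(&(dims.len() as u32).to_le_bytes()); // rank
    for d in dims {
        out.extend_from_slice(&d.to_le_bytes());
    }
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Reads a little-endian u32 at `at`, erroring instead of zero-filling.
fn read_u32(bytes: &[u8], at: usize) -> Result<u32, String> {
    let b = bytes
        .get(at..at + 4)
        .ok_or_else(|| format!("truncated: no u32 at byte {}", at))?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Rejects short input and bad magic before decoding any field.
fn parse_header(bytes: &[u8]) -> Result<Header, String> {
    if bytes.len() < FIXED_PREFIX {
        return Err(format!("need at least {} bytes, got {}", FIXED_PREFIX, bytes.len()));
    }
    if bytes[..4] != MAGIC[..] {
        return Err("missing APRT magic".to_string());
    }
    let version = read_u32(bytes, 4)?;
    let rank = read_u32(bytes, 8)? as usize;
    let dims = (0..rank)
        .map(|i| read_u32(bytes, FIXED_PREFIX + 4 * i))
        .collect::<Result<Vec<u32>, String>>()?;
    Ok(Header { version, dims })
}

/// Decodes the f32 LE body; returns None if the file is shorter than the header claims.
fn decode_body(bytes: &[u8], h: &Header) -> Option<Vec<f32>> {
    let body = bytes.get(h.header_len()..h.total_file_size())?;
    Some(
        body.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because `encode` is a pure function of its inputs, two calls with the same dims and values yield identical bytes, which is the property test 1 checks at the file level. `parse_header` checks the length before reading any field, so an 8-byte truncated header yields `Err` rather than a zero-filled header.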

Live verification

```
$ cargo test -p aprender-serve --test save_tensor_integration
test result: ok. 5 passed; 0 failed; 0 ignored
```

Why this is small

This PR is tight: 1 new file (~200 LOC). No CLI surface change. No behavior change to existing binaries. Independent of all in-flight save-tensor PRs (#1134, #1136); compiles against the modules already in main from #1133 + #1135.

Five-Whys (Toyota Way)

  1. SHIP-007 next-session priority is per-stage element-wise diff via `apr trace --save-tensor`.
  2. Lib-level unit tests cover internal-state invariants well, but a public-API caller can violate invariants that the unit tests can't see (e.g., header offsets at the byte level).
  3. Integration tests against the same public surface that the future CLI uses catch a different regression class.
  4. 3 of the 5 falsifiers (002, 004, 005) are partial-dischargeable at the integration level today without waiting on the CLI.
  5. §26.8 stack-tool-extension methodology — extend apr in falsifier-sized slices.

Test plan

  • `cargo test -p aprender-serve --test save_tensor_integration` — 5 pass green
  • `cargo fmt -p aprender-serve` — formatted
  • Pre-commit quality gates passed

🤖 Generated with Claude Code


New file crates/aprender-serve/tests/save_tensor_integration.rs (5 tests):

- falsify_apr_trace_save_002_byte_determinism_two_writes
  Two writes with identical inputs MUST produce byte-identical files.
  Partial-discharge of FALSIFY-APR-TRACE-SAVE-002 at the library level.

- falsify_apr_trace_save_004_header_format_via_public_api
  Reads raw file bytes, verifies APRT magic, decodes header via
  parse_header, asserts header.total_file_size() == actual file size,
  decodes f32 LE body element-wise. Partial-discharge of
  FALSIFY-APR-TRACE-SAVE-004 at the public-API surface (complements unit tests).

- falsify_apr_trace_save_005_three_stages_one_layer_independent_files
  Three writes (embedding, ffn_gate, ffn_swigl) at layer 0 produce
  exactly 3 distinct .bin files under layer-0/, each with its own
  correct dim_product. Partial-discharge of FALSIFY-APR-TRACE-SAVE-005
  at the filesystem level (complements parser-level test in #1134).

- whole_model_stages_dont_collide_with_per_layer_zero
  Writes lm_head at WHOLE_MODEL_LAYER and at layer=0; verifies both files
  exist at distinct paths and preserve their own dim_product values.
  Defends against future bugs where the WHOLE_MODEL_LAYER sentinel is
  miscompared at the path-builder layer.

- parse_header_on_truncated_file_errors_via_public_api
  Writes 8 bytes of an APRT header (truncated below the 12-byte minimum);
  parse_header MUST error cleanly. Defends against silent zero-fill on
  filesystem corruption.
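The `WHOLE_MODEL_LAYER` collision check can be sketched as a path builder. This is a minimal sketch, assuming a `layer-<N>/<stage>.bin` layout (consistent with the `layer-0/` directory described above) and a hypothetical `whole-model/` directory for the sentinel; the sentinel value `u32::MAX` and the name `stage_path` are illustrative assumptions, not the #1135 helpers.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sentinel for stages not tied to a single layer (e.g. lm_head).
/// The real value in #1135 may differ; u32::MAX is an assumption.
const WHOLE_MODEL_LAYER: u32 = u32::MAX;

/// Hypothetical path builder: <root>/layer-<N>/<stage>.bin for per-layer
/// stages, <root>/whole-model/<stage>.bin for the sentinel.
fn stage_path(root: &Path, layer: u32, stage: &str) -> PathBuf {
    // Compare against the sentinel first so it never formats as a layer index.
    let dir = if layer == WHOLE_MODEL_LAYER {
        "whole-model".to_string()
    } else {
        format!("layer-{}", layer)
    };
    root.join(dir).join(format!("{}.bin", stage))
}
```

Branching on the sentinel before formatting the directory name keeps whole-model stages such as `lm_head` out of `layer-0/`. A builder that mapped the sentinel to layer 0 would silently overwrite the per-layer file, which is the class of bug `whole_model_stages_dont_collide_with_per_layer_zero` is meant to catch.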


Save-tensor contract progress:
- 4 lib modules merged/in-flight (#1133 + #1134 + #1135 + #1136)
- 5 public-API integration tests added (this PR)
- Independent of all in-flight save-tensor PRs (#1134, #1136); compiles
  against the modules already in main from #1133 + #1135.


Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift enabled auto-merge (squash) April 29, 2026 13:33
noahgift merged commit 06ca28d into main Apr 29, 2026
10 checks passed
noahgift deleted the feat/apr-trace-save-tensor-integration-tests branch April 29, 2026 23:28