feat(apr-cli): apr stamp subcommand — wire stamp_provenance_bytes helper to CLI #1051
Merged
…per to CLI

Wraps `aprender::format::v2::stamp_provenance_bytes` (PR #1050) so that the shipped MODEL-1 teacher, and any other `.apr` written before `GATE-APR-PROV-001..003`, can have its `license` / `data_source` / `data_license` fields populated post hoc directly from the shell.

## Surface

```
apr stamp <FILE> \
  --license <SPDX> \
  --data-source <URL_OR_IDENTIFIER> \
  --data-license <SPDX> \
  --output <OUT.apr> \
  [--force] \
  [--json]
```

At least one of the three provenance flags must be set. An empty patch is rejected up front with a clear CLI error, so callers cannot accidentally produce a no-op rewrite. After writing, the output is re-read and parsed to confirm round-trip integrity. A stamped file that no longer parses is a hard ship-blocker, so the command fails fast.

## Live dogfood (RTX 4090, noah-Lambda-Vector)

```
$ apr stamp /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr \
    --license "Apache-2.0" \
    --data-source "huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct" \
    --data-license "Apache-2.0" \
    --output /tmp/stamped.apr \
    --json
{
  "command": "stamp",
  "input_bytes": 8035635524,
  "output_bytes": 8035635652,
  "tensor_count": 339,
  "stamped": {
    "license": "Apache-2.0",
    "data_source": "huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct",
    "data_license": "Apache-2.0"
  }
}

$ apr inspect /tmp/stamped.apr | grep -A4 Provenance
Provenance:
  license: Apache-2.0
  data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
  data_license: Apache-2.0
```

Compared to the input artifact:

```
$ apr inspect <input> | grep -A4 Provenance
Provenance:
  license: (missing)
  data_source: (missing)
  data_license: (missing)
```

This is exactly the gap that §v2.52.0 atomic next action (2) called out. The output grew by 128 bytes (metadata JSON expansion) on a 7.48 GiB file. All 339 tensors are preserved, the header LAYOUT_ROW_MAJOR flag is retained, and the checksum validates.

## Tests (5/5 PASS)

- `stamp_cli_populates_all_three_fields`: end-to-end happy path.
- `stamp_cli_rejects_empty_patch`: explicit "at least one" CLI error; the output file is NOT created on failure.
- `stamp_cli_rejects_missing_input`: surfaces `CliError::FileNotFound`.
- `stamp_cli_rejects_existing_output_without_force`: the error mentions `--force`; pre-existing content is untouched.
- `stamp_cli_overwrites_existing_output_with_force`: `--force` works, and the resulting file parses as a valid APR with the patched license.

## What this does NOT include

- Re-stamping the published teacher artifact, re-uploading to HF, refreshing the publish-manifest sha256, and retriggering EX-04..EX-07. That is the release-cycle portion of SHIP-009 full discharge; this PR is the tooling.

## Stacking

Stacked on `feat/apr-stamp-provenance` (PR #1050). Auto-merge of #1050 will cascade into this branch's base.

Spec reference: §v2.52.0 atomic next action (2) "Teacher provenance gap".

Closes the CLI portion of task #142 / task #141 follow-up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
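The pre-write guards described above are: at least one provenance flag, an input that exists, and no silent overwrite without `--force`. The following is a minimal sketch of those guards under stated assumptions. `StampArgs`, `StampArgError`, and `validate_stamp_args` are hypothetical names used only for illustration. They are not the apr-cli types; the real command reports a missing input as `CliError::FileNotFound`.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical mirror of the `apr stamp` flags (not the real apr-cli struct).
pub struct StampArgs<'a> {
    pub input: &'a Path,
    pub output: &'a Path,
    pub license: Option<&'a str>,
    pub data_source: Option<&'a str>,
    pub data_license: Option<&'a str>,
    pub force: bool,
}

#[derive(Debug, PartialEq)]
pub enum StampArgError {
    /// None of --license / --data-source / --data-license was given.
    EmptyPatch,
    /// The input path does not point to an existing file.
    FileNotFound(PathBuf),
    /// The output already exists and --force was not passed.
    OutputExists(PathBuf),
}

impl fmt::Display for StampArgError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StampArgError::EmptyPatch => write!(
                f,
                "at least one of --license, --data-source, --data-license is required"
            ),
            StampArgError::FileNotFound(p) => write!(f, "file not found: {}", p.display()),
            StampArgError::OutputExists(p) => {
                write!(f, "{} already exists; pass --force to overwrite", p.display())
            }
        }
    }
}

/// Runs every check before any byte is written, so a rejected call
/// never creates or modifies the output file.
pub fn validate_stamp_args(args: &StampArgs<'_>) -> Result<(), StampArgError> {
    if args.license.is_none() && args.data_source.is_none() && args.data_license.is_none() {
        return Err(StampArgError::EmptyPatch);
    }
    if !args.input.is_file() {
        return Err(StampArgError::FileNotFound(args.input.to_path_buf()));
    }
    if args.output.exists() && !args.force {
        return Err(StampArgError::OutputExists(args.output.to_path_buf()));
    }
    Ok(())
}
```

Keeping validation as a pure function that only inspects the filesystem makes the "output NOT created" and "pre-existing content untouched" properties hold by construction.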
noahgift added a commit that referenced this pull request on Apr 25, 2026
…scharge enabler (#1050)

* feat(format): apr_v2 stamp_provenance_bytes helper — SHIP-009 full-discharge enabler

Adds a pure Rust helper that patches the `license` / `data_source` / `data_license` fields of an existing APR v2 buffer. It returns a re-serialized buffer with the same tensor bytes and the same header flags (QUANTIZED, HAS_VOCAB, HAS_MODEL_CARD, etc.), with the LAYOUT-002 jidoka guard always engaged.

## Why now

The shipped MODEL-1 teacher (`paiml/qwen2.5-coder-7b-apache-q4k-v1`, `/mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr`) was built at commit `06a3eae38` (spec v2.11.0). That predates the `GATE-APR-PROV-001/002/003` provenance-writing gates, which shipped at commit `8f0607d42` (post-v2.19 evidence branch, task #113). As a consequence, `apr inspect` reports `Provenance: license: (missing), data_source: (missing), data_license: (missing)`.

This means `GATE-APR-PROV-004` (the algorithm gate for `AC-SHIP1-009`) fails at full-discharge time: the `(None, None, None)` triple trips the "at least one None" counter-example class. SHIP-009 is stuck at PARTIAL_ALGORITHM_LEVEL with no tooling path to close it.

This commit closes the tooling gap. The release-cycle portion (re-stamp the teacher → re-upload to HF → refresh publish-manifest sha256 → retrigger EX-04..EX-07) is a separate follow-up. That work is full discharge; this commit is only the scaffolding.

## Design

- Public surface: `pub fn stamp_provenance_bytes(input: &[u8], patch: &ProvenancePatch) -> Result<Vec<u8>, V2FormatError>` and `pub struct ProvenancePatch { license, data_source, data_license }`.
- An empty patch (`!patch.has_any()`) is rejected up front, so callers cannot accidentally rewrite the file without changing the artifact.
- Header flags are carried across via the new `AprV2Writer::set_header_flags` (a 2-LOC addition). LAYOUT_ROW_MAJOR is always OR-ed in regardless of the input, so the LAYOUT-002 jidoka never disengages.
- Tensor bytes are copied verbatim, with no quantize/dequantize round-trip.
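The patch semantics listed under Design, and exercised by the unit tests, can be sketched on plain structs. Partial patches keep the existing fields, empty patches are rejected, the operation is idempotent, and LAYOUT_ROW_MAJOR is always OR-ed in. This is an illustration under assumptions, not aprender's code. Only the names `ProvenancePatch` and `has_any` come from the commit message. `Provenance`, `apply_patch`, `carry_header_flags`, the `Option<String>` field types, and the flag bit values are hypothetical.

```rust
/// Hypothetical stand-in for the provenance fields in APR v2 metadata.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Provenance {
    pub license: Option<String>,
    pub data_source: Option<String>,
    pub data_license: Option<String>,
}

/// Field names follow the commit message; the types are assumed.
#[derive(Debug, Clone, Default)]
pub struct ProvenancePatch {
    pub license: Option<String>,
    pub data_source: Option<String>,
    pub data_license: Option<String>,
}

impl ProvenancePatch {
    /// True when the patch would change at least one field.
    pub fn has_any(&self) -> bool {
        self.license.is_some() || self.data_source.is_some() || self.data_license.is_some()
    }
}

// Hypothetical bit values; aprender defines its own header-flag constants.
pub const QUANTIZED: u16 = 1 << 0;
pub const HAS_VOCAB: u16 = 1 << 1;
pub const LAYOUT_ROW_MAJOR: u16 = 1 << 4;

/// Fields set in the patch win; unset fields keep their current value.
/// Applying the same patch twice therefore yields the same result.
pub fn apply_patch(current: &Provenance, patch: &ProvenancePatch) -> Result<Provenance, String> {
    if !patch.has_any() {
        return Err("empty ProvenancePatch: set at least one field".to_string());
    }
    Ok(Provenance {
        license: patch.license.clone().or_else(|| current.license.clone()),
        data_source: patch.data_source.clone().or_else(|| current.data_source.clone()),
        data_license: patch.data_license.clone().or_else(|| current.data_license.clone()),
    })
}

/// Input flags are carried over unchanged, and the row-major layout bit
/// is forced on so the layout guard can never be disengaged by stamping.
pub fn carry_header_flags(input_flags: u16) -> u16 {
    input_flags | LAYOUT_ROW_MAJOR
}
```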
## Tests (all PASS, 6/6)

- `stamp_populates_all_three_fields_when_source_is_unpopulated`: happy path for the exact teacher-gap scenario.
- `stamp_preserves_tensor_data_byte_for_byte`: regression guard against accidental f32/bytes round-tripping.
- `stamp_preserves_header_flags`: QUANTIZED | HAS_VOCAB survive; also asserts that LAYOUT_ROW_MAJOR stays engaged.
- `stamp_rejects_empty_patch`: `ProvenancePatch::default()` is rejected with an explicit error message.
- `stamp_allows_partial_patch_leaving_other_fields_unchanged`: patching only `data_source` preserves an already-set `license` and leaves `data_license` at `None`.
- `stamp_is_idempotent_under_identical_patch`: applying the same patch twice yields byte-identical output.

## What this commit does NOT do

- No CLI subcommand yet. Wiring `apr stamp <input.apr> --license X --data-source Y --data-license Z --output <out.apr>` is a follow-up under task #141.
- No dogfood run on the 7.48 GiB teacher. The 6 unit tests prove the logic on synthetic buffers; a live run is release-cycle work.
- No contract extension (no new `GATE-APR-PROV-005` or similar). The existing `GATE-APR-PROV-004` in `apr-provenance-v1` v1.1.0 is what gates full discharge. The stamp tool enables the evidence; it is not itself a new contract rule.

Spec reference: `docs/specifications/aprender-train/ship-two-models-spec.md` §v2.52.0 atomic next action (2) "Teacher provenance gap".

Closes the scaffolding portion of task #141.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(apr-cli): apr stamp subcommand — wire stamp_provenance_bytes helper to CLI (#1051)

* fix(apr-cli): register `stamp` in CLI contract + test list (FALSIFY-CLI-002/005)

The previous commit added the `apr stamp` subcommand but missed two sister entries that the contract-coverage tests require:

1. `contracts/apr-cli-commands-v1.yaml`: added a `stamp` entry under `model_ops` with `requires_model: true` and `side_effects: [filesystem]`, mirroring `convert`.
2. `crates/apr-cli/tests/cli_commands.rs::registered_commands()`: added `"stamp"` next to `"convert"` so that FALSIFY-CLI-002 (no-unregistered-commands) and FALSIFY-CLI-005 (count-matches) pass.

CI on PR #1050 caught both via `apr-cli/tests/cli_commands.rs`:

- thread 'test_no_unregistered_commands' panicked: FALSIFY-CLI-002: Commands in `apr --help` but not in contract: ["stamp"]
- thread 'test_command_count_matches' panicked: FALSIFY-CLI-005: Command count mismatch. `apr --help` has 79 commands, contract has 78.

Both now PASS locally (`cargo test -p apr-cli --test cli_commands` → 6/6 PASS).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
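The two contract-coverage checks named in the fix commit reduce to set comparisons. Each compares the commands listed by `apr --help` against those declared in the contract. The following is a hypothetical sketch. The function names are illustrative and not taken from `cli_commands.rs`. Counting distinct names on each side is an assumption about how the real test counts.

```rust
use std::collections::BTreeSet;

/// Names present in `apr --help` but absent from the contract
/// (the condition FALSIFY-CLI-002 guards against). Sorted and deduplicated.
pub fn unregistered_commands(help: &[&str], contract: &[&str]) -> Vec<String> {
    let declared: BTreeSet<&str> = contract.iter().copied().collect();
    let missing: BTreeSet<&str> = help
        .iter()
        .copied()
        .filter(|cmd| !declared.contains(cmd))
        .collect();
    missing.into_iter().map(String::from).collect()
}

/// Compares distinct command counts (assumed) and reports a mismatch
/// in the style of the FALSIFY-CLI-005 message.
pub fn count_mismatch(help: &[&str], contract: &[&str]) -> Option<String> {
    let n_help = help.iter().collect::<BTreeSet<_>>().len();
    let n_contract = contract.iter().collect::<BTreeSet<_>>().len();
    if n_help == n_contract {
        None
    } else {
        Some(format!(
            "Command count mismatch. `apr --help` has {n_help} commands, contract has {n_contract}."
        ))
    }
}
```

Both checks fire for a command added to the CLI alone. That is why the fix touches both the contract YAML and `registered_commands()`.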
## Summary

Wires `apr stamp <input.apr> --license X --data-source Y --data-license Z --output <out.apr>` over the `aprender::format::v2::stamp_provenance_bytes` helper from PR #1050.

Stacked on PR #1050. Auto-merge of #1050 will cascade into this branch's base.
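Wrapping a pure bytes-to-bytes helper as a file-level command follows a read, transform, write, re-read pattern. The following is a hypothetical sketch of that pattern, not the apr-cli implementation. `stamp_file` is an illustrative name. The `transform` and `verify` closures stand in for `stamp_provenance_bytes` and for re-parsing the output as APR, respectively. Because the transform runs before anything is written, a rejected patch leaves no output file behind.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical wrapper: `transform` plays the role of a bytes-to-bytes helper
/// such as `stamp_provenance_bytes`; `verify` plays the role of re-parsing.
/// Returns the size of the file actually written to disk.
pub fn stamp_file<T, V>(input: &Path, output: &Path, transform: T, verify: V) -> io::Result<usize>
where
    T: Fn(&[u8]) -> Result<Vec<u8>, String>,
    V: Fn(&[u8]) -> Result<(), String>,
{
    let src = fs::read(input)?;
    // Fail before touching the filesystem if the helper rejects the input.
    let stamped = transform(&src).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    fs::write(output, &stamped)?;
    // Re-read from disk rather than trusting the in-memory buffer.
    let written = fs::read(output)?;
    verify(&written).map_err(|e| {
        io::Error::new(io::ErrorKind::InvalidData, format!("round-trip check failed: {e}"))
    })?;
    Ok(written.len())
}
```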
## Live dogfood on the actual MODEL-1 teacher (RTX 4090 host)

Verified via `apr inspect`: the shipped teacher reported `(missing)` for `license`, `data_source`, and `data_license`; the stamped output has all three populated. 339/339 tensors preserved, +128 bytes (JSON metadata expansion on a 7.48 GiB file), checksum validates.

## Tests (5/5 PASS)

- `stamp_cli_populates_all_three_fields`
- `stamp_cli_rejects_empty_patch`: explicit "at least one" error; output NOT created on failure
- `stamp_cli_rejects_missing_input`: surfaces `CliError::FileNotFound`
- `stamp_cli_rejects_existing_output_without_force`: error mentions `--force`; pre-existing content untouched
- `stamp_cli_overwrites_existing_output_with_force`: `--force` works correctly

## What this PR does NOT include

- Re-stamping the published teacher artifact, re-uploading to HF, refreshing the publish-manifest sha256, and retriggering EX-04..EX-07. That is the release-cycle portion of SHIP-009 full discharge; this PR is the tooling.
## Spec reference

`docs/specifications/aprender-train/ship-two-models-spec.md` §v2.52.0 atomic next action (2) "Teacher provenance gap".

Closes the CLI portion of task #142 / task #141 follow-up.
🤖 Generated with Claude Code