feat(apr-cli): apr stamp subcommand — wire stamp_provenance_bytes helper to CLI #1051

Merged: noahgift merged 1 commit into feat/apr-stamp-provenance from feat/apr-stamp-cli on Apr 25, 2026

@noahgift

Summary

Wires apr stamp <input.apr> --license X --data-source Y --data-license Z --output <out.apr> over the aprender::format::v2::stamp_provenance_bytes helper from PR #1050.

Stacked on PR #1050. Auto-merge of #1050 will cascade into this branch's base.

Live dogfood on the actual MODEL-1 teacher (RTX 4090 host)

$ apr stamp /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr \
    --license "Apache-2.0" \
    --data-source "huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct" \
    --data-license "Apache-2.0" \
    --output /tmp/stamped.apr \
    --json
{
  "command":      "stamp",
  "input_bytes":  8035635524,
  "output_bytes": 8035635652,
  "tensor_count": 339,
  "stamped": {
    "license":      "Apache-2.0",
    "data_source":  "huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct",
    "data_license": "Apache-2.0"
  }
}

Verified via apr inspect: shipped teacher had license/data_source/data_license: (missing); stamped output has all three populated. 339/339 tensors preserved, +128 bytes (JSON metadata expansion on 7.48 GiB file), checksum validates.

Tests (5/5 PASS)

  • stamp_cli_populates_all_three_fields
  • stamp_cli_rejects_empty_patch — explicit "at least one" error; output NOT created on failure
  • stamp_cli_rejects_missing_input — surfaces CliError::FileNotFound
  • stamp_cli_rejects_existing_output_without_force — error mentions --force; pre-existing content untouched
  • stamp_cli_overwrites_existing_output_with_force: --force works correctly

What this PR does NOT include

  • Re-stamping the published teacher artifact + re-uploading to HF + refreshing publish-manifest sha256 + retriggering EX-04..EX-07. That is the release-cycle portion of SHIP-009 full discharge — this PR is the tooling.

Spec reference

docs/specifications/aprender-train/ship-two-models-spec.md §v2.52.0 atomic next action (2) "Teacher provenance gap".

Closes

CLI portion of task #142 / task #141 follow-up.

🤖 Generated with Claude Code

…per to CLI

Wraps `aprender::format::v2::stamp_provenance_bytes` (PR #1050) so the
shipped MODEL-1 teacher and any other pre-`GATE-APR-PROV-001..003` `.apr`
can have its `license` / `data_source` / `data_license` populated
post-hoc directly from the shell.

## Surface

```
apr stamp <FILE>                                          \
    --license      <SPDX>                                 \
    --data-source  <URL_OR_IDENTIFIER>                    \
    --data-license <SPDX>                                 \
    --output       <OUT.apr>                              \
    [--force]                                             \
    [--json]
```

At least one of the three provenance flags must be set; an empty patch
is rejected up-front with a clear CLI error so callers cannot
accidentally produce a no-op rewrite. After writing, the output is
re-read and parsed to confirm round-trip integrity (a stamped file that
no longer parses is a hard ship-blocker — fail fast).
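
The pre-flight rules described above can be sketched as a pure function. This is a minimal sketch, not the code in `apr-cli`: `Patch`, `StampError`, and `preflight` are hypothetical stand-ins, and the order of the checks is an assumption. Only the three behaviours (empty patch rejected, missing input rejected, existing output requires `--force`) come from this PR.

```rust
/// Hypothetical stand-in for the three optional provenance fields.
#[derive(Default)]
struct Patch {
    license: Option<String>,
    data_source: Option<String>,
    data_license: Option<String>,
}

impl Patch {
    /// True when at least one provenance flag was supplied.
    fn has_any(&self) -> bool {
        self.license.is_some() || self.data_source.is_some() || self.data_license.is_some()
    }
}

/// Hypothetical error type; the real CLI reports `CliError` variants.
#[derive(Debug, PartialEq)]
enum StampError {
    EmptyPatch,
    InputNotFound,
    OutputExists,
}

/// Decide whether a stamp run may proceed, before any bytes are written.
/// The order of the checks is an assumption made for this sketch.
fn preflight(
    patch: &Patch,
    input_exists: bool,
    output_exists: bool,
    force: bool,
) -> Result<(), StampError> {
    if !patch.has_any() {
        return Err(StampError::EmptyPatch);
    }
    if !input_exists {
        return Err(StampError::InputNotFound);
    }
    if output_exists && !force {
        return Err(StampError::OutputExists);
    }
    Ok(())
}
```

Because every check runs before any write, a rejected run leaves no output file behind and never touches an existing one. The post-write re-parse step is not modelled here.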

## Live dogfood (RTX 4090, noah-Lambda-Vector)

```
$ apr stamp /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr \
    --license "Apache-2.0" \
    --data-source "huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct" \
    --data-license "Apache-2.0" \
    --output /tmp/stamped.apr \
    --json
{
  "command": "stamp",
  "input_bytes": 8035635524,
  "output_bytes": 8035635652,
  "tensor_count": 339,
  "stamped": {
    "license": "Apache-2.0",
    "data_source": "huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct",
    "data_license": "Apache-2.0"
  }
}

$ apr inspect /tmp/stamped.apr | grep -A4 Provenance
  Provenance:
    license: Apache-2.0
    data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
    data_license: Apache-2.0
```

Compared to the input artifact:
```
$ apr inspect <input> | grep -A4 Provenance
  Provenance:
    license:      (missing)
    data_source:  (missing)
    data_license: (missing)
```

— exactly the gap §v2.52.0 atomic next action (2) called out.

Output size grew by 128 bytes (metadata JSON expansion) on a 7.48 GiB
file, all 339 tensors preserved, header LAYOUT_ROW_MAJOR flag retained,
checksum validates.
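
As a quick sanity check, the size delta and the GiB figure can be reproduced from the two byte counts in the JSON output. The function names below are illustrative only and are not part of `apr`.

```rust
/// Bytes added by stamping: output size minus input size.
fn size_delta(input_bytes: u64, output_bytes: u64) -> u64 {
    output_bytes - input_bytes
}

/// Convert a byte count to GiB (1 GiB = 2^30 bytes).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

With `input_bytes = 8035635524` and `output_bytes = 8035635652`, `size_delta` gives 128 and `to_gib(8035635524)` is roughly 7.48.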

## Tests (5/5 PASS)

- `stamp_cli_populates_all_three_fields` — end-to-end happy path.
- `stamp_cli_rejects_empty_patch` — explicit "at least one" CLI error;
  output file is NOT created on failure.
- `stamp_cli_rejects_missing_input` — surfaces `CliError::FileNotFound`.
- `stamp_cli_rejects_existing_output_without_force` — error mentions
  `--force`; pre-existing content is untouched.
- `stamp_cli_overwrites_existing_output_with_force` — `--force` works,
  resulting file parses as valid APR with patched license.

## What this does NOT include

- Re-stamping the published teacher artifact + re-uploading to HF +
  refreshing the publish-manifest sha256 + retriggering EX-04..EX-07.
  That is the release-cycle portion of SHIP-009 full discharge — this
  PR is the tooling.

## Stacking

Stacked on `feat/apr-stamp-provenance` (PR #1050). Auto-merge of #1050
will cascade into this branch's base.

Spec reference: §v2.52.0 atomic next action (2) "Teacher provenance gap".
Closes the CLI portion of task #142 / task #141 follow-up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift merged commit f4ca457 into feat/apr-stamp-provenance on Apr 25, 2026 (1 check passed)
noahgift deleted the feat/apr-stamp-cli branch on April 25, 2026 at 07:13
noahgift added a commit that referenced this pull request on Apr 25, 2026
…scharge enabler (#1050)

* feat(format): apr_v2 stamp_provenance_bytes helper — SHIP-009 full-discharge enabler

Adds a pure Rust helper to patch `license` / `data_source` / `data_license`
fields on an existing APR v2 buffer, returning a re-serialized buffer
with the same tensor bytes, the same header flags (QUANTIZED, HAS_VOCAB,
HAS_MODEL_CARD, etc.), and the LAYOUT-002 jidoka guard always engaged.

## Why now

The shipped MODEL-1 teacher (`paiml/qwen2.5-coder-7b-apache-q4k-v1`,
`/mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr`)
was built at commit `06a3eae38` (spec v2.11.0) — before the
`GATE-APR-PROV-001/002/003` provenance-writing gates shipped at commit
`8f0607d42` (post-v2.19 evidence branch, task #113).

Consequence: `apr inspect` reports
  `Provenance: license: (missing), data_source: (missing), data_license: (missing)`

This means `GATE-APR-PROV-004` (the algorithm gate for `AC-SHIP1-009`)
fails at full-discharge time — the `(None, None, None)` triple trips the
"at least one None" counter-example class. SHIP-009 is stuck at
PARTIAL_ALGORITHM_LEVEL with no tooling path to close it.

This commit closes the tooling gap. The release-cycle portion
(re-stamp the teacher → re-upload to HF → refresh publish-manifest
sha256 → retrigger EX-04..EX-07) is a separate follow-up; that is
full-discharge, not this scaffolding.

## Design

- Public surface: `pub fn stamp_provenance_bytes(input: &[u8], patch:
  &ProvenancePatch) -> Result<Vec<u8>, V2FormatError>` and
  `pub struct ProvenancePatch { license, data_source, data_license }`.
- Empty patch (`!patch.has_any()`) is rejected up-front so callers
  cannot accidentally rewrite without changing the artifact.
- Header flags are carried across via new `AprV2Writer::set_header_flags`
  (2-LOC addition; LAYOUT_ROW_MAJOR is always OR-ed in regardless of
  input so the LAYOUT-002 jidoka never disengages).
- Tensor bytes are copied verbatim — no quantize/dequantize round-trip.
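
The flag carry-over rule can be illustrated with plain bit operations. The bit positions and the `u16` width below are placeholders chosen for this sketch, not the real APR v2 header layout, and `carry_flags` is a hypothetical name. Only the rule that LAYOUT_ROW_MAJOR is always OR-ed in comes from the design notes above.

```rust
// Placeholder bit assignments; the real APR v2 values are not given here.
const QUANTIZED: u16 = 1 << 0;
const HAS_VOCAB: u16 = 1 << 1;
const LAYOUT_ROW_MAJOR: u16 = 1 << 2;

/// Copy the input header flags and force LAYOUT_ROW_MAJOR on,
/// whether or not the input had it set.
fn carry_flags(input_flags: u16) -> u16 {
    input_flags | LAYOUT_ROW_MAJOR
}
```

OR-ing the bit in unconditionally means a source buffer that lacks the flag still produces a row-major-marked output, and applying the rule twice changes nothing.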

## Tests (all PASS, 6/6)

- `stamp_populates_all_three_fields_when_source_is_unpopulated` —
  happy path for the exact teacher-gap scenario.
- `stamp_preserves_tensor_data_byte_for_byte` — regression guard against
  accidental f32/bytes round-tripping.
- `stamp_preserves_header_flags` — QUANTIZED | HAS_VOCAB survive; also
  asserts LAYOUT_ROW_MAJOR stays engaged.
- `stamp_rejects_empty_patch` — `ProvenancePatch::default()` is rejected
  with an explicit error message.
- `stamp_allows_partial_patch_leaving_other_fields_unchanged` —
  patching only `data_source` preserves an already-set `license` and
  leaves `data_license` at `None`.
- `stamp_is_idempotent_under_identical_patch` — applying the same
  patch twice yields byte-identical output.
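
The partial-patch and idempotence properties tested above can be shown on the metadata fields alone. This sketch uses a hypothetical `Provenance` struct and `apply` function and does not touch APR serialization; the real helper operates on whole byte buffers.

```rust
/// Hypothetical view of the three provenance fields in the metadata.
#[derive(Clone, Debug, Default, PartialEq)]
struct Provenance {
    license: Option<String>,
    data_source: Option<String>,
    data_license: Option<String>,
}

/// Overlay `patch` on `current`: fields set in the patch win,
/// fields left as `None` in the patch keep their current value.
fn apply(current: &Provenance, patch: &Provenance) -> Provenance {
    Provenance {
        license: patch.license.clone().or_else(|| current.license.clone()),
        data_source: patch.data_source.clone().or_else(|| current.data_source.clone()),
        data_license: patch.data_license.clone().or_else(|| current.data_license.clone()),
    }
}
```

Patching only `data_source` on a record that already has a `license` keeps the license, leaves `data_license` at `None`, and applying the same patch a second time returns an equal record.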

## What this commit does NOT do

- No CLI subcommand yet. Wiring `apr stamp <input.apr>
  --license X --data-source Y --data-license Z --output <out.apr>`
  is a follow-up under task #141.
- No dogfood run on the 7.48 GiB teacher. The 6 unit tests prove the
  logic on synthetic buffers; a live run is release-cycle work.
- No contract extension (no new `GATE-APR-PROV-005` or similar).
  `apr-provenance-v1` v1.1.0's existing `GATE-APR-PROV-004` is what
  gates full discharge; the stamp tool enables the evidence, it is
  not itself a new contract rule.

Spec reference: `docs/specifications/aprender-train/ship-two-models-spec.md`
§v2.52.0 atomic next action (2) "Teacher provenance gap".

Closes scaffolding portion of task #141.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(apr-cli): apr stamp subcommand — wire stamp_provenance_bytes helper to CLI (#1051)


* fix(apr-cli): register `stamp` in CLI contract + test list (FALSIFY-CLI-002/005)

Previous commit added the `apr stamp` subcommand but missed two
sister entries that the contract-coverage tests require:

1. `contracts/apr-cli-commands-v1.yaml` — added a `stamp` entry under
   `model_ops` with `requires_model: true` and `side_effects:
   [filesystem]` to mirror `convert`.
2. `crates/apr-cli/tests/cli_commands.rs::registered_commands()` —
   added `"stamp"` next to `"convert"` so FALSIFY-CLI-002
   (no-unregistered-commands) and FALSIFY-CLI-005 (count-matches) pass.

CI on PR #1050 caught both via `apr-cli/tests/cli_commands.rs`:

  thread 'test_no_unregistered_commands' panicked:
    FALSIFY-CLI-002: Commands in `apr --help` but not in contract: ["stamp"]

  thread 'test_command_count_matches' panicked:
    FALSIFY-CLI-005: Command count mismatch.
    `apr --help` has 79 commands, contract has 78.

Both now PASS locally (`cargo test -p apr-cli --test cli_commands` →
6/6 PASS).
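
The two checks that failed can be approximated with set operations. This is a sketch under assumptions: `unregistered` and `counts_match` are hypothetical helpers, and the real tests in `cli_commands.rs` read the command list from `apr --help` and the contract YAML instead of in-memory slices.

```rust
use std::collections::BTreeSet;

/// Commands that appear in the help output but not in the contract
/// (the FALSIFY-CLI-002 style check), returned in sorted order.
fn unregistered(help: &[&str], contract: &[&str]) -> Vec<String> {
    let contract: BTreeSet<&str> = contract.iter().copied().collect();
    let help: BTreeSet<&str> = help.iter().copied().collect();
    help.difference(&contract).map(|s| s.to_string()).collect()
}

/// True when both sides list the same number of distinct commands
/// (the FALSIFY-CLI-005 style check).
fn counts_match(help: &[&str], contract: &[&str]) -> bool {
    let help: BTreeSet<&str> = help.iter().copied().collect();
    let contract: BTreeSet<&str> = contract.iter().copied().collect();
    help.len() == contract.len()
}
```

With `stamp` present in the help list but absent from the contract list, `unregistered` returns `["stamp"]` and `counts_match` is false; adding the contract entry clears both.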

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>