chore(deps): Bump entrenar from 0.2.6 to 0.2.9 in the production-dependencies group #117

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/cargo/production-dependencies-1fee42db50

Conversation

dependabot Bot commented on behalf of github Dec 15, 2025


Bumps the production-dependencies group with 1 update: entrenar.

Updates entrenar from 0.2.6 to 0.2.9

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore <dependency name> major version will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself)
  • @dependabot ignore <dependency name> minor version will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself)
  • @dependabot ignore <dependency name> will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself)
  • @dependabot unignore <dependency name> will remove all of the ignore conditions of the specified dependency
  • @dependabot unignore <dependency name> <ignore condition> will remove the ignore condition of the specified dependency and ignore conditions

Bumps the production-dependencies group with 1 update: [entrenar](https://github.com/paiml/entrenar).


Updates `entrenar` from 0.2.6 to 0.2.9
- [Release notes](https://github.com/paiml/entrenar/releases)
- [Changelog](https://github.com/paiml/entrenar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/paiml/entrenar/commits)

---
updated-dependencies:
- dependency-name: entrenar
  dependency-version: 0.2.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: production-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot Bot commented on behalf of github Dec 15, 2025

Labels

The following labels could not be found: dependencies, rust. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

dependabot Bot commented on behalf of github Dec 16, 2025

Looks like entrenar is no longer updatable, so this is no longer needed.

dependabot Bot closed this Dec 16, 2025
dependabot Bot deleted the dependabot/cargo/production-dependencies-1fee42db50 branch December 16, 2025 18:43
noahgift added a commit that referenced this pull request Feb 10, 2026
…efs GH-219)

Round 24: Zero SATD across all 3 projects (36 violations eliminated).
F-PROFILE-010 fixed: apr qa now reports Ollama parity letter grade.
Bugs #117-120. Popperian Score: 192/206 (93.2%), 12 FALSIFIED.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 19, 2026
…-SHIP2-009)

GATE-ARCH-370M-004 gains evidence_discharged_by + discharge_status:
PARTIAL_ALGORITHM_LEVEL. Three algorithm-level invariants wired without
training:

  1. Coverage — every 370M tensor (219 entries: 1 embed + 1 lm_head +
     9 per-layer × 24 layers + 1 final norm) resolves to a
     TensorContract entry in LayoutContract::new(). Pattern-normalises
     per-layer names; any uncovered tensor would be silently skipped
     by GGUF export.

  2. Row-major ordering (INV-ARCH-370M-009) — every 2D shape is
     [out_dim, in_dim]. Pinned lm_head/embed/q_proj/k_proj shapes
     verify GQA (k_proj = [kv_heads*head_dim, hidden]) and bind the
     370M architecture to the GH-202-regression-proof layout.

  3. Critical-tensor enforcement — validate_apr_shape accepts
     [vocab, hidden] AND rejects reversed [hidden, vocab] on
     lm_head.weight. Proves the validator catches layout bugs, not
     just passes silently.
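
The row-major rule in items 2 and 3 can be illustrated with a minimal sketch. `validate_shape`, `ShapeError`, and `expected_tensor_count` are hypothetical names for this example and are not the `validate_apr_shape` or `LayoutContract` API referenced above; the dimensions used to exercise them are arbitrary.

```rust
/// Hypothetical error type for this sketch (not the aprender API).
#[derive(Debug, PartialEq)]
pub enum ShapeError {
    /// The tensor is not 2D; carries the rank that was seen.
    NotTwoDimensional(usize),
    /// The tensor is 2D but not `[out_dim, in_dim]`.
    Mismatch { expected: [usize; 2], actual: [usize; 2] },
}

/// Accepts only a row-major `[out_dim, in_dim]` shape.
/// A reversed `[in_dim, out_dim]` shape is rejected as a mismatch.
pub fn validate_shape(
    shape: &[usize],
    out_dim: usize,
    in_dim: usize,
) -> Result<(), ShapeError> {
    let actual: [usize; 2] = match shape {
        [rows, cols] => [*rows, *cols],
        other => return Err(ShapeError::NotTwoDimensional(other.len())),
    };
    let expected = [out_dim, in_dim];
    if actual == expected {
        Ok(())
    } else {
        Err(ShapeError::Mismatch { expected, actual })
    }
}

/// Tensor count as itemised in the commit message:
/// embed + lm_head + (per-layer tensors x layers) + final norm.
pub const fn expected_tensor_count(per_layer: usize, layers: usize) -> usize {
    1 + 1 + per_layer * layers + 1
}
```

With 9 per-layer tensors and 24 layers, `expected_tensor_count(9, 24)` reproduces the 219 figure. Note that a shape check of this kind cannot tell a reversed layout apart when `out_dim == in_dim`, which is presumably why the commit pins non-square tensors such as `lm_head`.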

Full discharge (GGUF cosine-parity on trained 370M, max_logit_cosine
≤ 1e-3 over 100 canary prompts) blocks on compute-dispatch
(AC-SHIP2-003/004). Harness is fixture-swap-ready once a trained .apr
exists — no test rewrite needed. Spec §9 Risk #2 names this exact
mitigation path.

Contract: llama-370m-sovereign-v1.yaml v1.2.0 → v1.3.0, stays ACTIVE.
Tests: 2 new test fns in crates/aprender-train/src/models/llama_370m.rs
(8/8 pass). `pv validate` = 0 errors, 0 warnings.

Closes #117. Binds to AC-SHIP2-009 / FALSIFY-SHIP-019.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 19, 2026
…tone

Records the SHIP-019 algorithm-level PARTIAL discharge (task #117,
commit 846cc1d) in the authoritative spec:

- Version bump 2.21.0 → 2.22.0
- Full amendment block #4 under post-v2.19 evidence window documenting
  GATE-ARCH-370M-004 wired to `layout_contract.rs` algorithm proofs
  (219-tensor coverage + row-major ordering + GH-202 rejection)
- New "counter-example hunting" pattern lesson: prior "exhausted
  PARTIAL levers" verdict was ~86% correct; re-running the 7-gate
  FALSIFY-SHIP survey with explicit counter-example hunting found
  exactly one genuine lever (SHIP-019). SHIP-017/018/020 need compute;
  SHIP-013/014/016 collapse into SHIP-011 wiring.
- Combined MODEL-2 ledger: 3/12 fully ACTIVE + 3/12 PARTIAL = 6/12
  touched (50%). Remaining 6 (003/004/006/007/008/010) all require
  real 370M compute, trained .apr + eval harness, or RTX 4090
  wall-clock benchmark. Genuine algorithm-level PARTIAL harvesting for
  MODEL-2 is now exhausted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 19, 2026
…/019/021/022) + spec v2.19→v2.22 (#898)

## MODEL-2 evidence burst (post-v2.19)

Six SHIP-TWO-001 ship-gate discharges on branch `chore/post-v2.19-evidence`:

| Gate | AC | Status | Commit | Task |
|------|----|--------|--------|------|
| FALSIFY-SHIP-021 | AC-SHIP2-011 | DISCHARGED | `0b8ca8c84` | #112 |
| FALSIFY-SHIP-022 | AC-SHIP2-012 (provenance) | DISCHARGED | `8f0607d42` | #113 |
| FALSIFY-SHIP-011 | AC-SHIP2-001 | DISCHARGED | `338c6eb3c` | #114 |
| FALSIFY-SHIP-012 | AC-SHIP2-002 | PARTIAL_ALGORITHM_LEVEL | `2e8b8b8e2` | #115 |
| FALSIFY-SHIP-015 | AC-SHIP2-005 | PARTIAL_ALGORITHM_LEVEL | `bfb883199` | #116 |
| FALSIFY-SHIP-019 | AC-SHIP2-009 | PARTIAL_ALGORITHM_LEVEL | `846cc1dbb` | #117 |

**Spec:** v2.19.0 → v2.22.0 (4 amendments recorded).

**MODEL-2 ledger after this PR:** 3/12 fully ACTIVE (001, 011, 012) + 3/12 PARTIAL_ALGORITHM_LEVEL (002, 005, 009) = 6/12 touched (50%). Remaining 6 (003/004/006/007/008/010) all require real 370M compute-dispatch, a trained on-disk `.apr` with eval harness, or RTX 4090 wall-clock benchmark — genuine algorithm-level PARTIAL harvesting for MODEL-2 is now exhausted.

**Pattern lessons codified:**
- **PARTIAL-inside-ACTIVE nesting** (SHIP-012/015/019): gates can carry `discharge_status: PARTIAL_ALGORITHM_LEVEL` + `ship_blocking: true` inside contracts that stay ACTIVE via their primary binding gate. Auditors must read both `status:` AND `gates[].discharge_status:`.
- **Counter-example hunting** (SHIP-019): re-run search surveys with explicit counter-example hunting before declaring a space exhausted. Spec §9 Risk mitigations are the highest-leverage hint source.
- **Parallel-safe stdout** (SHIP-022): pure formatter helper (`format_provenance_block`) instead of direct `println!()` so harness tests run in parallel without `gag` races.
- **Seed-mutex for reproducibility** (SHIP-021): `lock_init_seed` mutex fixes global `INIT_SEED` race in parallel tests.
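
The PARTIAL-inside-ACTIVE lesson can be read as an audit rule. The following is a minimal sketch under assumed types: `Contract`, `Gate`, and `DischargeStatus` are hypothetical Rust mirrors of the YAML fields named above (`status`, `gates[].discharge_status`, `ship_blocking`), not types from the repository.

```rust
/// Hypothetical mirror of `gates[].discharge_status`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DischargeStatus {
    Discharged,
    PartialAlgorithmLevel,
}

/// Hypothetical mirror of one gate entry in a contract.
pub struct Gate {
    pub id: &'static str,
    pub discharge_status: DischargeStatus,
    pub ship_blocking: bool,
}

/// Hypothetical mirror of a contract: top-level status plus its gates.
pub struct Contract {
    pub status: &'static str,
    pub gates: Vec<Gate>,
}

/// Returns the ids of gates that still block a ship:
/// ship-blocking gates whose discharge is only partial.
pub fn open_ship_blockers(contract: &Contract) -> Vec<&'static str> {
    contract
        .gates
        .iter()
        .filter(|g| g.ship_blocking && g.discharge_status == DischargeStatus::PartialAlgorithmLevel)
        .map(|g| g.id)
        .collect()
}

/// Reading `status` alone is not enough: the contract must be ACTIVE
/// and have no open ship-blocking gate.
pub fn is_ship_ready(contract: &Contract) -> bool {
    contract.status == "ACTIVE" && open_ship_blockers(contract).is_empty()
}
```

An ACTIVE contract carrying one `PartialAlgorithmLevel` gate with `ship_blocking: true` reports that gate from `open_ship_blockers` and fails `is_ship_ready`, which is the situation the lesson warns auditors about.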

🤖 Generated with [Claude Code](https://claude.com/claude-code)
noahgift added a commit that referenced this pull request Apr 19, 2026
…olicy (#901)

* evidence(ship-two-001): MODEL-2 pretrain smoke test — task #105 discharge

Records the end-to-end synthetic drive of `apr pretrain` on commit 1e7cf53
(now landed on main at 9209383 via PR #882 merge). Verifies task #105
deliverable: GATE-TRAIN-005 / INV-TRAIN-007 / GATE-TRAIN-008 wiring is
functional end-to-end.

Run: 20 steps, 4 epochs, batch=4, seq=128 — val_loss monotone 3.96 → 2.64.

Synthetic drive caveat: no real 370M forward pass, no real corpus read, no
checkpoint artifacts written yet. Real corpus + checkpoint wiring tracked as
task #111.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* spec(model-2): MVP plan for task #111 (pretrain real corpus + checkpoint)

7-step edit list from Plan agent afd391d1eb1395d30 against post-#882-merge commit
9209383. Identifies 5 critical files (pretrain.rs, apr-cli/commands/pretrain.rs,
trainer.rs, transformer/model.rs, io/save.rs) and 5 binary acceptance criteria
(AC-111-001..005). Host assignment: lambda-labs (impl), yoga (8GB smoke),
gx10 (parity).

Non-goals explicitly deferred: async H2D streaming, full corpus-ingest pipeline,
mixed-precision scaler tuning, distributed training, convergence budget, resume
round-trip, nvml telemetry, apr qa post-hoc validators.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* evidence(ship-two-001): yoga parity smoke — GATE-TRAIN-006 discharged

Cross-host byte-identical loss history on yoga RTX 4060 Laptop (8GB):
  lambda-labs: [3.96, 3.52, 3.08, 2.64]
  yoga:        [3.96, 3.52, 3.08, 2.64]

Discharges GATE-TRAIN-006 (seed=42 deterministic) across x86_64 RTX 4090 ↔
x86_64 RTX 4060 Laptop. Same synthetic drive — task #111 MVP will add the
real 370M forward pass; yoga stays as 8GB smoke-test host per MVP plan's
host assignment table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): RealStepFn/RealValFn + shard reader (task #111 steps 1-3)

Implements MODEL-2 pretrain MVP plan steps 1-3: the model-agnostic
PretrainLoop now has a real-corpus driver that runs a full forward +
backward + AdamW step through TransformerTrainer against the 370M
Llama scaffold — replacing the LinearDecaySynthetic/ScriptedVal pair
used for GATE-TRAIN-005/006/007/008 wiring verification in task #105.

**New modules**

- `train::shard_reader::ShardBatchIter`
  Streaming iterator over .bin token shards (little-endian u32).
  Reads seq_length+1 sequences, chunks into LMBatch of batch_size.
  Empty-dir errors; lexical shard ordering; EOF auto-advances to next
  shard. No MinHash dedup / PII scrub / license filter — those belong
  to `apr-corpus-ingest run`.

- `train::pretrain_real::{RealStepFn, RealValFn, build_shared_trainer}`
  - `llama_370m_transformer_config()` field-for-field from the frozen
    Llama370MConfig constants (INV-ARCH-370M-001..008 source of truth)
  - `llama_370m_train_config(lr, seq_length, seed)` builds
    TransformerTrainConfig with MODEL-2 v2-remedy defaults
  - `SharedTrainer = Rc<RefCell<TransformerTrainer>>` so both the
    mutable StepFn and the forward-only ValFn own the same model
  - `RealStepFn::step` pulls one LMBatch, runs train_batch, returns
    (loss, grad_norm=1.0 placeholder). Exhausted iterator returns a
    finite (1.0, 1.0) so GATE-TRAIN-007 (NaN/Inf) does not mis-fire
    on shard-stream EOF before the loop plans to stop.
  - `RealValFn::validate` runs forward-only across a held-out Vec,
    returns mean cross-entropy loss (or NaN if held-out is empty).
  - `build_shared_trainer` runs INV-ARCH-370M-001 as a debug_assert
    (param count must land in [366M, 374M]) so any drift in the
    Llama370MConfig constants fails the instant a dev build compiles.
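
The shard format described for `ShardBatchIter` (little-endian u32 tokens, `seq_length + 1` tokens per sequence, `batch_size` sequences per batch) can be sketched with the standard library alone. The function names below are hypothetical, and the handling of incomplete tails (error on a partial token, drop a partial sequence) is an assumption of this sketch, not confirmed behavior of the real reader.

```rust
/// Decodes a raw shard of little-endian u32 tokens.
/// Assumption: a byte length that is not a multiple of 4 is an error.
pub fn decode_shard(bytes: &[u8]) -> Result<Vec<u32>, String> {
    if bytes.len() % 4 != 0 {
        return Err(format!("shard length {} is not a multiple of 4", bytes.len()));
    }
    Ok(bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// Splits tokens into sequences of `seq_length + 1` tokens
/// (inputs plus the next-token target). Assumption: an incomplete
/// trailing sequence is dropped.
pub fn to_sequences(tokens: &[u32], seq_length: usize) -> Vec<Vec<u32>> {
    tokens
        .chunks_exact(seq_length + 1)
        .map(|c| c.to_vec())
        .collect()
}

/// Groups sequences into batches of `batch_size`; the final batch
/// may be smaller. `batch_size` must be greater than zero.
pub fn to_batches(sequences: Vec<Vec<u32>>, batch_size: usize) -> Vec<Vec<Vec<u32>>> {
    sequences.chunks(batch_size).map(|b| b.to_vec()).collect()
}
```

Ten tokens with `seq_length = 2` yield three sequences of three tokens each, and `batch_size = 2` then gives one full batch and one batch holding the remaining sequence.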

**Contract coverage**

Existing `contracts/training-loop-pretrain-v1.yaml` covers all MVP
obligations already; no new contract needed. Task #111 follow-up will
add per-epoch APR checkpoint hooks (C-TRAIN-PRETRAIN INV-TRAIN-002)
and real optimizer-state sha256 (INV-TRAIN-003).

**Tests**

- shard_reader: single_shard_yields_expected_batch_count,
  empty_dir_errors, multi_shard_ordering_is_lexical
- pretrain_real: transformer_config_matches_llama_370m_constants,
  real_step_fn_exhausted_iterator_returns_finite_placeholder,
  real_val_fn_empty_held_out_returns_nan

All 6 new tests PASS. Steps 4-7 (SafeTensors→APR swap, `apr pretrain`
CLI wiring, real grad_norm, checkpoint hook) to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): wire real-corpus drive into apr pretrain (task #111 step 5)

Replaces the `if !synthetic { return Err(...) }` guard with a real
branch: build a shared 370M `TransformerTrainer`, split the first
`HELD_OUT_BATCHES` batches off the head of the shard stream as a
validation set, and drive the `PretrainLoop` with `RealStepFn`/`RealValFn`
(from `entrenar::train::pretrain_real`) against a `ShardBatchIter`.

**Structure**

- `run` is now a 2-branch dispatcher. `drive_synthetic` preserves the
  deterministic decay drive used for GATE-TRAIN-005/006/007/008 wiring
  verification (task #105). `drive_real` is the new real-corpus path.
- Both branches funnel into `run_and_report<S, V>` which owns the
  `PretrainLoop::new` + `run` + `report` sequence so the terminal
  status propagation (→ exit code) stays single-sourced.

**MVP invariants (documented)**

- `HELD_OUT_BATCHES = 2` — small constant; follow-up will plumb an
  explicit `--val-shards` flag so training and held-out shards are
  disjoint.
- `pad_id = eos_id = 0` — uniform-length sequences take the shared
  layout in `LMBatch::from_sequences`, so pad_id is never used; the
  real tokenizer's special-token ids plumb through in a follow-up.
- Empty dataset dir → `CliError::ValidationFailed` (shard iterator
  init failure), covered by the new test
  `real_mode_empty_dataset_dir_errors`.

**Test changes**

- `real_mode_empty_dataset_dir_errors` replaces the now-obsolete
  `synthetic_mode_false_rejected` test. Both synthetic and validation
  tests continue to pass (3/3 in `commands::pretrain::tests`).

**Remaining MVP steps (task #111)**

- Step 4: swap SafeTensors → APR in `trainer.rs` checkpoint writer.
- Step 6: real optimizer-state sha256 over AdamW m/v/t (INV-TRAIN-003).
- Step 7: per-epoch checkpoint hook in `PretrainLoop::run_epoch`
  post-gate-pass (C-TRAIN-PRETRAIN INV-TRAIN-002).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): CPU save_apr + per-epoch checkpoint hook (task #111 steps 4+7)

Steps 4 and 7 of the MODEL-2 pretrain MVP (SHIP-TWO-001 v2.19.0):

Step 4 — CPU save_apr
- Add `TransformerTrainer::save_apr(path, name, arch)` in
  crates/aprender-train/src/train/transformer_trainer/trainer.rs,
  mirroring the existing CudaTransformerTrainer::save_apr. Emits a
  sovereign row-major .apr via aprender's Model + SaveConfig::Apr.
- Existing `save()` (SafeTensors) left unchanged — three tests at
  trainer/core.rs:388,409 and tests.rs:423 still round-trip via
  safetensors for backward compat.
- Test `save_apr_writes_readable_apr_file`: write a tiny-config
  trainer, open with `AprReader`, assert APR magic (APR\0 / APRN),
  assert `architecture` metadata round-trips, assert
  `model.embed_tokens.weight` readable as f32. PASSES.

Step 7 — per-epoch APR checkpoint hook
- Add `pub trait CheckpointFn` in train/pretrain.rs:
    `fn save(&mut self, epoch, &EpochArtifact) -> Result<(), String>`
- Add `Option<Box<dyn CheckpointFn>>` field to `PretrainLoop` +
  builder method `with_checkpoint_fn`. Keeps PretrainLoop<S,V>
  at two generics (synthetic + real call-sites unify).
- Wire into `run_epoch` AFTER `check_non_divergence(...)?` passes,
  BEFORE `epoch_artifacts.push()`. Aborted epochs never produce
  checkpoint files (per contract `per_epoch_artifacts` invariant).
  Write failures are logged via eprintln but are non-fatal, so a flaky
  disk cannot abort an otherwise healthy training run.
- Emit companion `metadata.json` (contract path_template).
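
The hook placement in Step 7 can be sketched as follows. Only the `CheckpointFn::save` shape and the `with_checkpoint_fn` builder name come from the text above; `PretrainLoopSketch`, `EpochArtifact`'s fields, the finite-loss check standing in for `check_non_divergence`, and `CountingCheckpoint` are assumptions made for illustration.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical artifact; the real struct's fields are not shown in the source.
pub struct EpochArtifact {
    pub epoch: usize,
    pub val_loss: f32,
}

pub trait CheckpointFn {
    fn save(&mut self, epoch: usize, artifact: &EpochArtifact) -> Result<(), String>;
}

/// Simplified stand-in for `PretrainLoop` with an optional checkpoint hook.
pub struct PretrainLoopSketch {
    checkpoint_fn: Option<Box<dyn CheckpointFn>>,
    pub epoch_artifacts: Vec<EpochArtifact>,
}

impl PretrainLoopSketch {
    pub fn new() -> Self {
        Self { checkpoint_fn: None, epoch_artifacts: Vec::new() }
    }

    pub fn with_checkpoint_fn(mut self, f: Box<dyn CheckpointFn>) -> Self {
        self.checkpoint_fn = Some(f);
        self
    }

    pub fn run_epoch(&mut self, epoch: usize, val_loss: f32) -> Result<(), String> {
        // Divergence guard runs first, so an aborted epoch never reaches the hook.
        if !val_loss.is_finite() {
            return Err(format!("epoch {}: non-finite validation loss", epoch));
        }
        let artifact = EpochArtifact { epoch, val_loss };
        // Hook runs after the guard and before the artifact is recorded.
        if let Some(hook) = self.checkpoint_fn.as_mut() {
            if let Err(e) = hook.save(epoch, &artifact) {
                // Non-fatal: log and keep training.
                eprintln!("checkpoint write failed for epoch {}: {}", epoch, e);
            }
        }
        self.epoch_artifacts.push(artifact);
        Ok(())
    }
}

/// Example mock that only counts how often it was called.
pub struct CountingCheckpoint {
    pub calls: Rc<Cell<usize>>,
}

impl CheckpointFn for CountingCheckpoint {
    fn save(&mut self, _epoch: usize, _artifact: &EpochArtifact) -> Result<(), String> {
        self.calls.set(self.calls.get() + 1);
        Ok(())
    }
}
```

Running one passing epoch followed by one NaN epoch leaves the counter at one, mirroring the two hook tests listed in the commit. Boxing the hook behind `Option<Box<dyn CheckpointFn>>` is what lets the loop keep its two generic parameters.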

Real-corpus wiring
- Add `AprCheckpointFn` in train/pretrain_real.rs holding the shared
  `Rc<RefCell<TransformerTrainer>>`; its `save()` delegates to
  `trainer.save_apr()` so the three hooks (RealStepFn, RealValFn,
  AprCheckpointFn) see the same in-memory weights.
- Re-export `CheckpointFn` from train/mod.rs.
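
A minimal sketch of why the three hooks see the same weights, assuming a toy trainer in place of `TransformerTrainer`. Everything here except the `Rc<RefCell<...>>` alias shape is hypothetical (`ToyTrainer`, `StepHook`, `ValHook`, `CheckpointHook`, `build_hooks`).

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy stand-in for the real trainer: one weight and a step counter.
pub struct ToyTrainer {
    pub weight: f32,
    pub steps: u64,
}

pub type SharedTrainer = Rc<RefCell<ToyTrainer>>;

/// Mutating hook: applies one plain gradient step.
pub struct StepHook {
    trainer: SharedTrainer,
}

/// Read-only hook: evaluates a squared error against a target.
pub struct ValHook {
    trainer: SharedTrainer,
}

/// Read-only hook: reads the state a checkpoint would persist.
pub struct CheckpointHook {
    trainer: SharedTrainer,
}

impl StepHook {
    pub fn step(&mut self, grad: f32, lr: f32) {
        // The mutable borrow is scoped to this call, so later reads succeed.
        let mut t = self.trainer.borrow_mut();
        t.weight -= lr * grad;
        t.steps += 1;
    }
}

impl ValHook {
    pub fn validate(&self, target: f32) -> f32 {
        let t = self.trainer.borrow();
        (t.weight - target).powi(2)
    }
}

impl CheckpointHook {
    pub fn snapshot(&self) -> (f32, u64) {
        let t = self.trainer.borrow();
        (t.weight, t.steps)
    }
}

/// Builds the trainer once and clones the `Rc` into each hook.
pub fn build_hooks(initial_weight: f32) -> (StepHook, ValHook, CheckpointHook) {
    let shared: SharedTrainer = Rc::new(RefCell::new(ToyTrainer { weight: initial_weight, steps: 0 }));
    (
        StepHook { trainer: Rc::clone(&shared) },
        ValHook { trainer: Rc::clone(&shared) },
        CheckpointHook { trainer: shared },
    )
}
```

After a step through `StepHook`, both `ValHook::validate` and `CheckpointHook::snapshot` observe the updated weight, because all three hold the same allocation. `Rc<RefCell<...>>` is single-threaded by construction, which fits a loop that calls the hooks sequentially.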

CLI
- `apr pretrain` --real path (drive_real): construct
  `build_shared_trainer` once, clone Rc into RealStepFn +
  RealValFn + AprCheckpointFn, pass to `run_and_report`.
- `run_and_report` takes `Option<Box<dyn CheckpointFn>>`; synthetic
  branch passes `None` (no real weights to save).

Tests (all green, 21 pretrain + 4 pretrain_real/save_apr + 3 CLI)
- `pretrain_loop_calls_checkpoint_fn_once_per_passing_epoch`:
  mock `CheckpointFn` counts calls. Every successful epoch fires
  exactly one call; companion metadata.json written to disk.
- `pretrain_loop_skips_checkpoint_on_abort`: NaN step forces
  abort; mock hook recorded zero calls.
- `save_apr_writes_readable_apr_file`: magic + metadata + tensor
  round-trip via AprReader.

Contract discharge
- GATE-TRAIN-005 invariant preserved: checkpoint placement AFTER
  divergence guard means aborted epochs never touch disk.
- training-loop-pretrain-v1 `per_epoch_artifacts.path_template`
  honored: `{run_dir}/ckpt/epoch-{N:03d}.apr` + `.metadata.json`.

Deferred (Step 6)
- `fake_optimizer_sha(epoch)` at pretrain.rs:680 still returns a
  placeholder. INV-TRAIN-003 discharge needs TransformerTrainer
  to expose AdamW m/v/t buffers for a real sha256. Separate step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): real AdamW optimizer-state sha256 (task #111 step 6)

INV-TRAIN-003 discharge for the MODEL-2 pretrain MVP.

TransformerTrainer::optimizer_state_sha256()
- New accessor in crates/aprender-train/src/train/transformer_trainer/trainer.rs
  that hashes (t, m_buffers, v_buffers) in fixed order.
- Uses sha2::Sha256 + bytemuck::cast_slice over each Array1<f32>.
- Versioned tag "aprender-train:adamw:optstate:v1" prefixes the
  digest so schema changes are loud, not silent.
- Uninitialized slots hash to the literal "none" so missing m[i]
  is semantically distinct from an all-zeros m[i].
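
The digest layout (version tag, then `t`, then `m` buffers, then `v` buffers, with a literal `"none"` for uninitialized slots) can be sketched without external crates. This example deliberately swaps SHA-256 for a 64-bit FNV-1a hash so it stays dependency-free; it is not cryptographic and produces 16 hex characters, not 64. `AdamWState`, the tag string, and the function names are hypothetical.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over `bytes`, continuing from `hash`. Stand-in for SHA-256.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical optimizer state: step count plus per-parameter moment slots.
pub struct AdamWState {
    pub t: u64,
    pub m: Vec<Option<Vec<f32>>>,
    pub v: Vec<Option<Vec<f32>>>,
}

/// Hashes slots in index order; an uninitialized slot contributes "none".
fn hash_slots(mut hash: u64, slots: &[Option<Vec<f32>>]) -> u64 {
    for slot in slots {
        match slot {
            Some(values) => {
                for x in values {
                    hash = fnv1a(hash, &x.to_le_bytes());
                }
            }
            None => hash = fnv1a(hash, b"none"),
        }
    }
    hash
}

/// Fixed order: version tag, t, m buffers, v buffers.
pub fn optimizer_state_digest(state: &AdamWState) -> String {
    let mut hash = fnv1a(FNV_OFFSET, b"sketch:adamw:optstate:v1");
    hash = fnv1a(hash, &state.t.to_le_bytes());
    hash = hash_slots(hash, &state.m);
    hash = hash_slots(hash, &state.v);
    format!("{:016x}", hash)
}
```

Two states with identical contents produce the same digest, while a `None` slot and a zero-filled slot feed different bytes into the hash. Hashing the version tag first means a change to the layout can be made visible by bumping the tag.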

StepFn trait extension
- Add `fn optimizer_state_sha256(&self) -> Option<String>` with
  default `None`. Synthetic harnesses keep returning None and
  continue using the `fake_optimizer_sha` epoch/seed fallback.
- `PretrainLoop::run_epoch` now reads `step_fn.optimizer_state_sha256()`
  and falls back to the fake fingerprint only when None.
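
The default-method fallback can be sketched as below. The trait method name matches the text; `SyntheticStep`, `RealStep`, `epoch_optimizer_fingerprint`, and the output format of this `fake_optimizer_sha` are assumptions for illustration (the real fallback is described as a 64-character hex digest).

```rust
pub trait StepFn {
    fn step(&mut self) -> f32;

    /// Default: no real optimizer state to fingerprint.
    fn optimizer_state_sha256(&self) -> Option<String> {
        None
    }
}

/// Synthetic harness: keeps the default `None`.
pub struct SyntheticStep;

impl StepFn for SyntheticStep {
    fn step(&mut self) -> f32 {
        1.0
    }
}

/// Real-corpus harness: overrides the hook with an actual digest.
pub struct RealStep {
    pub digest: String,
}

impl StepFn for RealStep {
    fn step(&mut self) -> f32 {
        1.0
    }

    fn optimizer_state_sha256(&self) -> Option<String> {
        Some(self.digest.clone())
    }
}

/// Placeholder fingerprint derived from epoch and seed only.
/// The format here is made up for this sketch.
pub fn fake_optimizer_sha(epoch: usize, seed: u64) -> String {
    format!("fake-{:08x}-{:016x}", epoch, seed)
}

/// Prefers the step function's digest; falls back only when it is `None`.
pub fn epoch_optimizer_fingerprint<S: StepFn>(step_fn: &S, epoch: usize, seed: u64) -> String {
    step_fn
        .optimizer_state_sha256()
        .unwrap_or_else(|| fake_optimizer_sha(epoch, seed))
}
```

Adding the method with a default body means existing synthetic `StepFn` implementations compile unchanged, and only the real-corpus implementation needs to opt in.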

RealStepFn override
- RealStepFn in pretrain_real.rs implements the new hook by
  delegating to `trainer.borrow().optimizer_state_sha256()`, so
  the real-corpus path records the actual AdamW digest.

Tests (all 25 + 3 green)
- `optimizer_state_sha256_is_hex_digest_on_fresh_trainer`: 64-char
  lowercase hex shape check on an un-stepped trainer.
- `optimizer_state_sha256_is_stable_across_fresh_trainers`: two
  fresh trainers hash to the same digest (reproducibility).
- `pretrain_loop_uses_step_fn_optimizer_sha_when_available`:
  a StepFn with override wins over fake_optimizer_sha.
- `pretrain_loop_falls_back_to_fake_optimizer_sha_for_synthetic`:
  default impl still produces a 64-char hex digest via fallback.

Task #111 MVP status
- Steps 1-3 shipped in commit b2b0329
- Step 5 shipped in commit e5a2f02
- Steps 4+7 shipped in commit 89db4b3
- Step 6 shipped in this commit
- All 7 steps of the task #111 plan are now committed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-021 seed=0 × 100-step reproducibility harness

Discharges GATE-TRAIN-006 / INV-TRAIN-006 from training-loop-pretrain-v1
(bumped 1.0.0 → 1.1.0 PROPOSED → ACTIVE).

Two new Rust tests in crates/aprender-train/src/train/transformer_trainer/tests.rs:
- falsify_ship_021_seed_0_100_step_reproducibility: two trainers built with
  seed=0 produce identical finite losses for 100 consecutive train_batch
  calls (|Δ| ≤ 1e-6) AND identical AdamW optimizer_state_sha256 digests.
- falsify_ship_021_different_seeds_do_diverge: seed=0 vs seed=1 counter-test
  must diverge > 1e-4 within 10 steps (guards against degenerate "always
  equal" implementations).

Seed plumbing fixes:
- TransformerTrainer::new now calls lock_init_seed(config.seed) before
  Transformer::new so direct (non-YAML) callers honor the configured seed
  instead of silently inheriting the global default of 42.
- transformer::init::INIT_SEED_LOCK (std::sync::Mutex) + lock_init_seed
  helper returning a #[must_use] MutexGuard. Held across the full
  Transformer::new call so cargo test's default parallel runner cannot
  clobber the global atomic INIT_SEED between one test's set_init_seed
  and another test's weight-init reads. Poisoned mutex is recovered
  transparently (seed itself is atomic; poison only signals prior panic).
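
A minimal sketch of the seed-lock pattern, using only the standard library. The names `INIT_SEED`, `INIT_SEED_LOCK`, and `lock_init_seed` follow the text; `init_weights` and `build_model` are hypothetical stand-ins for the weight initialisation inside `Transformer::new`, and the toy generator is not the project's initialiser.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Global seed read by weight initialisation (default 42, as in the text).
static INIT_SEED: AtomicU64 = AtomicU64::new(42);
/// Serialises "set seed, then initialise" across parallel tests.
static INIT_SEED_LOCK: Mutex<()> = Mutex::new(());

/// Takes the lock, then sets the seed. The caller must hold the guard
/// for as long as initialisation reads `INIT_SEED`.
#[must_use = "dropping the guard releases the seed lock immediately"]
pub fn lock_init_seed(seed: u64) -> MutexGuard<'static, ()> {
    // A poisoned lock only means an earlier holder panicked; the seed is
    // an atomic and is overwritten below, so recovering is safe here.
    let guard = INIT_SEED_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    INIT_SEED.store(seed, Ordering::SeqCst);
    guard
}

/// Toy weight init: a linear congruential generator seeded from the global.
pub fn init_weights(n: usize) -> Vec<u64> {
    let mut state = INIT_SEED.load(Ordering::SeqCst);
    (0..n)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            state >> 33
        })
        .collect()
}

/// Holds the guard across the whole initialisation, so no other
/// caller can change the seed in between.
pub fn build_model(seed: u64, n: usize) -> Vec<u64> {
    let _guard = lock_init_seed(seed);
    init_weights(n)
}
```

Two calls to `build_model(0, 4)` return identical values, and `build_model(1, 4)` differs, which is the same pair of properties the two `falsify_ship_021_*` tests check at trainer scale. Binding the guard to `_guard` rather than `_` matters: `let _ = lock_init_seed(seed);` would drop the guard at once.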

Contract uplift (contracts/training-loop-pretrain-v1.yaml v1.1.0):
- status PROPOSED → ACTIVE
- INV-TRAIN-006 gains harness: block naming both test paths + assertions
- GATE-TRAIN-006 gains evidence_discharged_by: pointing to both tests
- metadata.changelog entry recording the discharge

Verification:
  cargo test -p aprender-train --lib falsify_ship_021 → 2 passed
  cargo clippy -p aprender-train --lib --no-deps -- -D warnings → clean
  pv validate contracts/training-loop-pretrain-v1.yaml → 0 errors

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ship-two): FALSIFY-SHIP-022 apr inspect provenance (AC-SHIP2-012)

Discharges FALSIFY-SHIP-022: apr inspect surfaces license + data_source
+ data_license on every .apr, with "(missing)" / null rendering when a
field is absent rather than silent skip. Makes a .apr binary a
sufficient provenance-audit artifact (no sidecar manifest required).

Contract: contracts/apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0,
ACTIVE, kind: schema). 3 invariants + 3 gates + 3 failure modes, all
bound to AC-SHIP2-012 / FALSIFY-SHIP-022. pv validate PASS.

Code changes:
- AprV2Metadata: add data_source + data_license as named Option<String>
  fields (not buried in custom HashMap). No skip_serializing_if, so JSON
  round-trips them as null when None (FM-APR-PROV-SILENT-SKIP).
- apr inspect MetadataInfo: mirror all 3 provenance fields, also with
  no skip_serializing_if.
- apr inspect text output: new "Provenance:" block via pure helper
  format_provenance_block() — always emits all 3 keys, renders None as
  literal "(missing)".
- Two struct-literal construction sites updated for new fields.
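
The always-emit rule for the provenance block can be sketched as a pure function. The helper name `format_provenance_block` and the three keys come from the text; the `Provenance` struct, the signature, and the exact indentation are assumptions of this sketch.

```rust
/// Hypothetical holder for the three provenance fields.
pub struct Provenance {
    pub license: Option<String>,
    pub data_source: Option<String>,
    pub data_license: Option<String>,
}

/// Renders all three keys unconditionally; `None` becomes "(missing)".
/// Returning a String (instead of printing) keeps tests free of
/// stdout capture.
pub fn format_provenance_block(p: &Provenance) -> String {
    fn render(value: &Option<String>) -> &str {
        value.as_deref().unwrap_or("(missing)")
    }
    format!(
        "Provenance:\n  license: {}\n  data_source: {}\n  data_license: {}\n",
        render(&p.license),
        render(&p.data_source),
        render(&p.data_license),
    )
}
```

Because the function never skips a key, a missing field shows up in the output as `(missing)` rather than as an absent line, so a reader of the output can distinguish "not recorded" from "not printed".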

Harness tests (5 passing):
- aprender-core:
  - falsify_ship_022_apr_metadata_provenance_round_trip
  - falsify_ship_022_inspect_emits_provenance_keys (JSON null half)
  - falsify_ship_022_partial_provenance_round_trip
- apr-cli:
  - falsify_ship_022_inspect_emits_provenance_keys (MetadataInfo JSON)
  - falsify_ship_022_inspect_missing_renders_as_missing (text half)
  - falsify_ship_022_inspect_populated_renders_values

Smoke test: apr inspect on existing .apr (no provenance stored)
correctly emits:
  Provenance:
    license: (missing)
    data_source: (missing)
    data_license: (missing)

cargo fmt + cargo clippy (aprender-core, apr-cli) clean.
3239 aprender-core format tests PASS, 85 apr-cli inspect tests PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ship-two): v2.20.0 amendment — FALSIFY-SHIP-021 + FALSIFY-SHIP-022 DISCHARGED

Documents two MODEL-2 ship gates closed in the post-v2.19 evidence window:

1. FALSIFY-SHIP-021 (AC-SHIP2-011) — seed=0 × 100-step reproducibility
   harness + counter-test seed=0 vs seed=1 divergence proof. Root cause
   of original flake (sibling test racing on global INIT_SEED atomic)
   fixed via lock_init_seed(seed) -> MutexGuard. Contract
   training-loop-pretrain-v1.yaml bumped 1.0.0 → 1.1.0 ACTIVE.
   Commit 0b8ca8c, task #112.

2. FALSIFY-SHIP-022 (AC-SHIP2-012) — apr inspect provenance block
   (license + data_source + data_license) shipped. AprV2Metadata
   extended with 2 named Option<String> fields; no skip_serializing_if
   (FM-APR-PROV-SILENT-SKIP guard). Pure helper format_provenance_block
   replaces stdout-capture in tests (gag is NOT parallel-safe).
   New contract apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0
   ACTIVE, kind: schema). pv validate PASS. Commit 8f0607d,
   task #113.

Combined status: 2/12 AC-SHIP2 gates DISCHARGED. Remaining 10 block
on 370M compute-dispatch (the long-pole from v2.19.0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-011 llama-370m sovereign contract ACTIVE (AC-SHIP2-001)

Discharges FALSIFY-SHIP-011 / AC-SHIP2-001 — MODEL-2 370M architectural
contract registered AND byte-equally bound to the Rust scaffold that
aprender-train consumes.

Contract lift:
- contracts/model-families/llama-370m-sovereign-v1.yaml
  - version 1.0.0 → 1.1.0
  - status PROPOSED → ACTIVE
  - GATE-ARCH-370M-001 gains evidence_discharged_by (4 entries) and
    ship_blocking: true
  - changelog block added documenting the v1.1.0 discharge

Harness tests (crates/aprender-train/src/models/llama_370m.rs):
- `falsify_ship_011_rust_scaffold_matches_yaml_contract` — loads the
  contract via include_str! (compile-time-embedded, no path deps at
  runtime) and asserts every architecture.* and constraints.* key
  matches the corresponding Llama370MConfig::* const byte-equally
- `falsify_ship_011_sovereign_contract_is_active` — asserts status ==
  ACTIVE (a PROPOSED contract cannot gate a ship)

Test run: 6/6 aprender-train::models::llama_370m tests PASS (4 pre-
existing + 2 new). pv validate on contract: 0 errors, 0 warnings.

Why this discharge is strong:
- Rust scaffold already encodes INV-ARCH-370M-002..008 as compile-time
  `const _: () = Llama370MConfig::validate();` — a drift of any value
  fails `cargo build`, not just `cargo test`
- The new YAML-vs-Rust binding test adds the missing half: drift of a
  YAML key that the Rust scaffold doesn't mirror is now also caught at
  test time, preventing the MODEL-1-v2 QLoRA class of recipe/artifact
  drift (rank=16 actual vs rank=32 recipe — see
  project_ship_two_001_model1_qlora_divergence.md)
- INV-ARCH-370M-001 (param count band) is discharged by the existing
  `estimated_param_count_within_contract_band` test
- INV-ARCH-370M-009 (row-major layout) is discharged by
  aprender::format::layout_contract at APR load time
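
The compile-time guard mentioned in the first bullet works because a `const` item forces evaluation of a `const fn` during compilation. A minimal sketch with a toy config follows; `ToyConfig` and its values are hypothetical and are not the `Llama370MConfig` constants.

```rust
/// Toy configuration; the numbers are illustrative only.
pub struct ToyConfig;

impl ToyConfig {
    pub const HIDDEN_DIM: usize = 64;
    pub const NUM_HEADS: usize = 8;
    pub const NUM_KV_HEADS: usize = 2;
    pub const HEAD_DIM: usize = 8;

    /// Invariants checked in a const context. A failed assert here is a
    /// compile error, not a test failure.
    pub const fn validate() {
        assert!(
            Self::HIDDEN_DIM == Self::NUM_HEADS * Self::HEAD_DIM,
            "hidden dim must equal heads * head_dim"
        );
        assert!(
            Self::NUM_HEADS % Self::NUM_KV_HEADS == 0,
            "query heads must be a multiple of kv heads"
        );
    }
}

// Forces `validate` to run at compile time.
const _: () = ToyConfig::validate();
```

Changing `HEAD_DIM` to 7 in this sketch makes the crate fail to build, which is the "fails `cargo build`, not just `cargo test`" property described above.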

Combined MODEL-2 status after this commit: 3/12 AC-SHIP2 gates
DISCHARGED (001, 011, 012). Remaining 9 (002–010) still block on
actual 370M training compute-dispatch — the pretrain loop driver from
v2.19.0 is ready to exercise them once the weights exist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-012 algorithm-level PARTIAL discharge (AC-SHIP2-002)

Bumps C-TOK-BPE to v1.1.0 and wires evidence_discharged_by into
GATE-BPE-003 pointing at 3 existing harness tests in
crates/apr-cli/tests/falsify_ship_012_tokenizer_roundtrip.rs and
the emitted evidence JSON at
evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json.

Status intentionally stays PROPOSED. The gate requires 10K-doc
byte-exact round-trip on The Stack v2 Python holdout; task #91 shipped
the ingest scaffold (corpus-ingest dry-run CLI) but the 10K fixture
itself is not yet materialized — so this lands as PARTIAL_ALGORITHM_LEVEL
discharge with full_discharge_blocks_on: task #91 data.

What passes algorithm-level today (all 3 tests green at commit time):
- falsify_ship_012_tokenizer_roundtrip_byte_exact — decode(encode(nfc(doc)))
  byte-equals nfc(doc) on every doc in a 20-doc synthetic Python-like
  holdout (ASCII keywords + Unicode identifiers + docstrings + emoji +
  combining marks). Hard-asserts evidence.docs_failed == 0, so a regression
  that reintroduces whitespace splitting or drops the byte encoder panics.
- falsify_ship_012_nfc_idempotence_only — INV-BPE-005 standalone: nfc(nfc(x))
  byte-equals nfc(x) on every holdout doc.
- falsify_ship_012_train_corpus_sanity — train/holdout set disjointness
  plus minimum corpus sizes (>=20 docs each).

When task #91's 10K Stack-v2 Python holdout lands the fixture swap is
data-only: the harness module doc-comment already flagged this path so
no test rewrite will be required.

Evidence: evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json
(20/20 passed, nfc_idempotent: true, vocab_size_trained: 489/512).

Verification:
- pv validate contracts/tokenizer-bpe-v1.yaml -> 0 errors, 0 warnings
- cargo test -p apr-cli --test falsify_ship_012_tokenizer_roundtrip -> 3/3 passed

Bound to: AC-SHIP2-002 (ship-two-models-spec §5).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-015 algorithm-level PARTIAL discharge (AC-SHIP2-005)

Bumps C-LLAMA-370M-SOVEREIGN v1.1.0 → v1.2.0 and wires
evidence_discharged_by into GATE-ARCH-370M-003 (the param-count gate that
binds AC-SHIP2-005 via FALSIFY-SHIP-015). Contract stays ACTIVE — the
FALSIFY-SHIP-011 discharge (v1.1.0) is what gates the ACTIVE promotion,
not SHIP-015.

GATE-ARCH-370M-003's evidence_required asks for
  apr inspect --json model.apr | jq '.param_count' ∈ [366M, 374M]
on a real 370M `.apr` checkpoint. That file does not exist yet — it
blocks on AC-SHIP2-003/004 pretraining compute-dispatch. Rather than
leave the gate's evidence blank, this commit wires the algorithm-level
proof that already exists:

- estimated_param_count() / estimated_stored_param_count() — const fn
  over Llama370MConfig::*, so the count is computed at compile time.
- estimated_param_count_within_contract_band (unit test) hard-asserts:
    * p ∈ [PARAMETERS_MIN=366M, PARAMETERS_MAX=374M]  (INV-ARCH-370M-001)
    * |p − 370M| / 370M < 5%                          (coarse sanity; looser than the band)
    * p − stored == VOCAB_SIZE × HIDDEN_DIM           (tied embeddings)

Any edit to Llama370MConfig that moves the count out of the
INV-ARCH-370M-001 band fails `cargo test -p aprender-train --lib
llama_370m` — before any compute runs.
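
The three assertions can be illustrated on a toy model. `TinyLlamaSketch`, its dimensions, its per-layer formula (four square attention projections, three MLP matrices, two norms), and the band constants are all assumptions chosen so the arithmetic is easy to follow; they are not the 370M configuration or its real counting rules.

```rust
/// Toy configuration with made-up dimensions.
pub struct TinyLlamaSketch;

impl TinyLlamaSketch {
    pub const VOCAB_SIZE: u64 = 1_000;
    pub const HIDDEN_DIM: u64 = 64;
    pub const INTERMEDIATE_DIM: u64 = 256;
    pub const NUM_LAYERS: u64 = 2;

    /// Attention (q, k, v, o) + MLP (gate, up, down) + two norm vectors.
    pub const fn per_layer_params() -> u64 {
        4 * Self::HIDDEN_DIM * Self::HIDDEN_DIM
            + 3 * Self::HIDDEN_DIM * Self::INTERMEDIATE_DIM
            + 2 * Self::HIDDEN_DIM
    }

    /// Parameters stored on disk when lm_head shares the embedding matrix.
    pub const fn estimated_stored_param_count() -> u64 {
        Self::VOCAB_SIZE * Self::HIDDEN_DIM
            + Self::NUM_LAYERS * Self::per_layer_params()
            + Self::HIDDEN_DIM
    }

    /// Logical parameter count: the tied lm_head is counted separately.
    pub const fn estimated_param_count() -> u64 {
        Self::estimated_stored_param_count() + Self::VOCAB_SIZE * Self::HIDDEN_DIM
    }
}

/// Toy band for this sketch only.
pub const PARAMETERS_MIN: u64 = 250_000;
pub const PARAMETERS_MAX: u64 = 270_000;

// Band check evaluated at compile time.
const _: () = assert!(
    TinyLlamaSketch::estimated_param_count() >= PARAMETERS_MIN
        && TinyLlamaSketch::estimated_param_count() <= PARAMETERS_MAX
);
```

For these toy numbers each layer has 65,664 parameters, the stored count is 195,392, and the logical count is 259,392; the difference is `VOCAB_SIZE * HIDDEN_DIM = 64,000`, matching the tied-embedding identity in the third assertion.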

The gate now carries:
  discharge_status: PARTIAL_ALGORITHM_LEVEL
  full_discharge_blocks_on: "real 370M .apr checkpoint from pretraining
                             compute-dispatch (AC-SHIP2-003/004)"
  ship_blocking: true

so the data-scale gap is first-class contract state, not an unspoken
assumption.

Verification:
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml
  -> 0 errors, 0 warnings
- cargo test -p aprender-train --lib models::llama_370m
  -> 6/6 passed (including the newly-cited
     estimated_param_count_within_contract_band and the pre-existing
     falsify_ship_011_* pair)

MODEL-2 AC-SHIP2 ledger after this: 3/12 fully ACTIVE (001, 011, 012)
+ 2/12 PARTIAL (002 via SHIP-012, 005 via SHIP-015) = 5/12 touched.
Remaining 7 (003/004/006/007/008/009/010) block on 370M compute.

Bound to: AC-SHIP2-005 (ship-two-models-spec §5).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ship-two-001): spec v2.21.0 — FALSIFY-SHIP-011 DISCHARGED + SHIP-012/015 PARTIAL

Captures the three evidence-wiring commits landed on
chore/post-v2.19-evidence since v2.20.0:

1. FALSIFY-SHIP-011 (AC-SHIP2-001) DISCHARGED at 338c6eb (task #114)
   C-LLAMA-370M-SOVEREIGN v1.0.0 PROPOSED -> v1.1.0 ACTIVE.
   Rust-YAML byte-equality binding via include_str! + serde_yaml::Value.

2. FALSIFY-SHIP-012 (AC-SHIP2-002) PARTIAL_ALGORITHM_LEVEL at 2e8b8b8
   (task #115). C-TOK-BPE v1.0.0 -> v1.1.0 stays PROPOSED.
   3 tokenizer harness tests wired; full discharge blocks on task #91
   10K Stack-v2 Python holdout (fixture-swap is data-only).

3. FALSIFY-SHIP-015 (AC-SHIP2-005) PARTIAL_ALGORITHM_LEVEL at bfb8831
   (task #116). Sovereign contract v1.1.0 -> v1.2.0 stays ACTIVE.
   estimated_param_count_within_contract_band + const fns wired;
   full discharge blocks on real 370M .apr from compute-dispatch.

Also codifies the PARTIAL_ALGORITHM_LEVEL pattern as a first-class
spec concept: when a gate's evidence_required describes a
production-scale check that is not yet runnable but the underlying
invariant is provable today at algorithm/compile/unit-test level,
wire the algorithm proofs and carry discharge_status +
partial_discharge_note + full_discharge_blocks_on + ship_blocking=true
to make the data gap first-class contract state.
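
One way to picture the pattern is as typed gate state. The sketch below is a hypothetical model, not the contract schema itself: the type and function names are invented, and it assumes that a ship-blocking gate keeps blocking until it is fully discharged.

```rust
/// How far a gate's evidence has progressed (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum DischargeStatus {
    Undischarged,
    /// Algorithm-level proofs are wired; production-scale evidence is pending.
    PartialAlgorithmLevel { full_discharge_blocks_on: String },
    Discharged,
}

#[derive(Debug, Clone)]
pub struct Gate {
    pub id: String,
    pub ship_blocking: bool,
    pub status: DischargeStatus,
}

/// Returns the ids of ship-blocking gates that are not fully discharged.
/// Assumption: a PARTIAL_ALGORITHM_LEVEL gate with ship_blocking = true
/// still blocks the ship.
pub fn blocking_gates(gates: &[Gate]) -> Vec<&str> {
    gates
        .iter()
        .filter(|g| g.ship_blocking && g.status != DischargeStatus::Discharged)
        .map(|g| g.id.as_str())
        .collect()
}
```

Carrying the blocker as data inside the partial variant means a partial discharge can never be mistaken for a full one by a status comparison.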

MODEL-2 ship-gate status after v2.21.0: 3/12 fully ACTIVE (001, 011,
012) + 2/12 PARTIAL_ALGORITHM_LEVEL (002, 005) = 5/12 touched (~42%).
Remaining 7 block on real 370M compute-dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-019 algorithm-level PARTIAL discharge (AC-SHIP2-009)

GATE-ARCH-370M-004 gains evidence_discharged_by + discharge_status:
PARTIAL_ALGORITHM_LEVEL. Three algorithm-level invariants wired without
training:

  1. Coverage — every 370M tensor (219 entries: 1 embed + 1 lm_head +
     9 per-layer × 24 layers + 1 final norm) resolves to a
     TensorContract entry in LayoutContract::new(). Pattern-normalises
     per-layer names; any uncovered tensor would be silently skipped
     by GGUF export.

  2. Row-major ordering (INV-ARCH-370M-009) — every 2D shape is
     [out_dim, in_dim]. Pinned lm_head/embed/q_proj/k_proj shapes
     verify GQA (k_proj = [kv_heads*head_dim, hidden]) and bind the
     370M architecture to the GH-202-regression-proof layout.

  3. Critical-tensor enforcement — validate_apr_shape accepts
     [vocab, hidden] AND rejects reversed [hidden, vocab] on
     lm_head.weight. Proves the validator catches layout bugs, not
     just passes silently.

Full discharge (GGUF cosine-parity on trained 370M, max_logit_cosine
≤ 1e-3 over 100 canary prompts) blocks on compute-dispatch
(AC-SHIP2-003/004). Harness is fixture-swap-ready once a trained .apr
exists — no test rewrite needed. Spec §9 Risk #2 names this exact
mitigation path.
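
A rough sketch of the coverage count and the reversed-shape rejection follows. Only `model.embed_tokens.weight` and `lm_head.weight` are names taken from these messages; the per-layer suffixes and both function names are assumptions made for illustration, not the `LayoutContract` API.

```rust
/// Enumerates tensor names for a Llama-style model.
/// The per-layer suffixes are hypothetical; real names may differ.
pub fn tensor_names(num_layers: usize) -> Vec<String> {
    const PER_LAYER: [&str; 9] = [
        "self_attn.q_proj.weight",
        "self_attn.k_proj.weight",
        "self_attn.v_proj.weight",
        "self_attn.o_proj.weight",
        "mlp.gate_proj.weight",
        "mlp.up_proj.weight",
        "mlp.down_proj.weight",
        "input_layernorm.weight",
        "post_attention_layernorm.weight",
    ];
    let mut names = vec![
        "model.embed_tokens.weight".to_string(),
        "lm_head.weight".to_string(),
    ];
    for layer in 0..num_layers {
        for suffix in PER_LAYER {
            names.push(format!("model.layers.{layer}.{suffix}"));
        }
    }
    names.push("model.norm.weight".to_string());
    names
}

/// Row-major [out_dim, in_dim] check for lm_head.weight:
/// accepts [vocab, hidden] and rejects anything else, including the reverse.
pub fn validate_lm_head_shape(
    shape: &[usize],
    vocab: usize,
    hidden: usize,
) -> Result<(), String> {
    if shape == [vocab, hidden].as_slice() {
        Ok(())
    } else {
        Err(format!(
            "lm_head.weight: expected [{vocab}, {hidden}], got {shape:?}"
        ))
    }
}
```

For 24 layers the enumeration yields 1 + 1 + 9 × 24 + 1 = 219 names, matching the count in item 1.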

Contract: llama-370m-sovereign-v1.yaml v1.2.0 → v1.3.0, stays ACTIVE.
Tests: 2 new test fns in crates/aprender-train/src/models/llama_370m.rs
(8/8 pass). `pv validate` = 0 errors, 0 warnings.

Closes #117. Binds to AC-SHIP2-009 / FALSIFY-SHIP-019.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ship-two-001): v2.22.0 — FALSIFY-SHIP-019 PARTIAL discharge capstone

Records the SHIP-019 algorithm-level PARTIAL discharge (task #117,
commit 846cc1d) in the authoritative spec:

- Version bump 2.21.0 → 2.22.0
- Full amendment block #4 under post-v2.19 evidence window documenting
  GATE-ARCH-370M-004 wired to `layout_contract.rs` algorithm proofs
  (219-tensor coverage + row-major ordering + GH-202 rejection)
- New "counter-example hunting" pattern lesson: prior "exhausted
  PARTIAL levers" verdict was ~86% correct; re-running the 7-gate
  FALSIFY-SHIP survey with explicit counter-example hunting found
  exactly one genuine lever (SHIP-019). SHIP-017/018/020 need compute;
  SHIP-013/014/016 collapse into SHIP-011 wiring.
- Combined MODEL-2 ledger: 3/12 fully ACTIVE + 3/12 PARTIAL = 6/12
  touched (50%). Remaining 6 (003/004/006/007/008/010) all require
  real 370M compute, trained .apr + eval harness, or RTX 4090
  wall-clock benchmark. Genuine algorithm-level PARTIAL harvesting for
  MODEL-2 is now exhausted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(publish): mark 5 QA harness crates publish = false + document policy

Evidence: aprender-qa-{cli,gen,runner,report,certify} have never been
published to crates.io (verified against crates.io API 2026-04-19).
They are reached through `apr qa` (the user-facing binary), not through
`cargo add`, so marking them publish = false prevents accidental
version-bump-with-no-publish drift across the workspace.

Spec §A.12 rewritten from the stale "63 crates (49 published + 14 internal)"
snapshot to the real 80-crate layout: 9 publish = false (4 benchmarks/xtask
+ 5 QA harness) plus 71 publishable. §A.12.1 codifies publishing policy:
three opt-out categories (benchmarks, xtask, QA harness), and the rule
that a v0.31.0-style release does NOT require cargo publish across all
80 crates: the tag/release is workspace-wide, but crates.io publish is
selective (via cargo workspaces publish --from-git or cargo publish -p <name>).

Verified: cargo check --workspace clean after the flip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(mcp-spec): refresh header — M1–M3 SHIPPED in v0.31.0, M4 in flight

Five-whys on the stale 2026-04-17 draft status:
1. Why stale? Spec said "DRAFT (pre-implementation)" + target "v0.32.0"
   but M1–M3 actually shipped in v0.31.0 on 2026-04-19 (tag 62893da).
2. Why not refreshed? M1–M3 landed across multiple PRs without a
   spec-header refresh pass.
3. Why is that a problem? New contributors reading the spec think MCP
   is unshipped — contradicted by `cargo install aprender` already
   exposing `apr mcp` with 9 tools.
4. Root cause: spec headers are not on the release checklist.
5. Fix here: update status to ACTIVE, version to 1.2.0, delivery line
   to "v0.31.0 M1–M3 SHIPPED / M4 in flight (PRs #886-892)". No body
   changes — architecture/tool-surface/protocol sections are still
   accurate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(publish): mark aprender-viz-ttop publish = false + 4th category

Evidence: `aprender-viz-ttop` has never been published to crates.io
(release workflow explicitly never invokes `cargo publish` for it).
Its `description` field calls it a "Terminal Top: 10X better than btop"
system monitor — ships as a binary subcommand inside the `apr` facade,
not as a library dependency.

Five-whys:
1. Why flip it? Because it's a bundled binary, not a library.
2. Why does that matter? `cargo add aprender-viz-ttop` would mislead
   library authors into taking a user-facing TUI as a dep.
3. Why wasn't it already flipped? It predated the A.12 policy audit
   performed in 42907db.
4. Why a 4th category? Benchmarks / xtask / QA harness all leave
   outputs as artifacts; this one ships a runnable subcommand. The
   distinction matters because `apr cbtop` dispatches to it.
5. Why document it? To prevent a future reader from re-opening the
   "publish all 80 crates" question when we only publish ~70.

Changes:
- crates/aprender-viz-ttop/Cargo.toml: add `publish = false`
- docs/specifications/aprender-monorepo-consolidation.md:
  - §A.12: add viz-ttop to internal-crates table (10 rows)
  - §A.12.1: add 4th category (Bundled binaries); update total to
    "10 opted out / 70 publishable"; remove stale "Candidates to
    migrate" paragraph (superseded by 42907db + this commit)

Refs: APR-MONO, PR #901

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 19, 2026
#902)

* evidence(ship-two-001): MODEL-2 pretrain smoke test — task #105 discharge

Records the end-to-end synthetic drive of `apr pretrain` on commit 1e7cf53
(now landed on main at 9209383 via PR #882 merge). Verifies task #105
deliverable: GATE-TRAIN-005 / INV-TRAIN-007 / GATE-TRAIN-008 wiring is
functional end-to-end.

Run: 20 steps, 4 epochs, batch=4, seq=128 — val_loss monotone 3.96 → 2.64.

Synthetic drive caveat: no real 370M forward pass, no real corpus read, no
checkpoint artifacts written yet. Real corpus + checkpoint wiring tracked as
task #111.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* spec(model-2): MVP plan for task #111 (pretrain real corpus + checkpoint)

7-step edit list from Plan agent afd391d1eb1395d30 against post-#882-merge commit
9209383. Identifies 5 critical files (pretrain.rs, apr-cli/commands/pretrain.rs,
trainer.rs, transformer/model.rs, io/save.rs) and 5 binary acceptance criteria
(AC-111-001..005). Host assignment: lambda-labs (impl), yoga (8GB smoke),
gx10 (parity).

Non-goals explicitly deferred: async H2D streaming, full corpus-ingest pipeline,
mixed-precision scaler tuning, distributed training, convergence budget, resume
round-trip, nvml telemetry, apr qa post-hoc validators.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* evidence(ship-two-001): yoga parity smoke — GATE-TRAIN-006 discharged

Cross-host byte-identical loss history on yoga RTX 4060 Laptop (8GB):
  lambda-labs: [3.96, 3.52, 3.08, 2.64]
  yoga:        [3.96, 3.52, 3.08, 2.64]

Discharges GATE-TRAIN-006 (seed=42 deterministic) across x86_64 RTX 4090 ↔
x86_64 RTX 4060 Laptop. Same synthetic drive — task #111 MVP will add the
real 370M forward pass; yoga stays as 8GB smoke-test host per MVP plan's
host assignment table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): RealStepFn/RealValFn + shard reader (task #111 steps 1-3)

Implements MODEL-2 pretrain MVP plan steps 1-3: the model-agnostic
PretrainLoop now has a real-corpus driver that runs a full forward +
backward + AdamW step through TransformerTrainer against the 370M
Llama scaffold — replacing the LinearDecaySynthetic/ScriptedVal pair
used for GATE-TRAIN-005/006/007/008 wiring verification in task #105.

**New modules**

- `train::shard_reader::ShardBatchIter`
  Streaming iterator over .bin token shards (little-endian u32).
  Reads seq_length+1 sequences, chunks into LMBatch of batch_size.
  Empty-dir errors; lexical shard ordering; EOF auto-advances to next
  shard. No MinHash dedup / PII scrub / license filter — those belong
  to `apr-corpus-ingest run`.

- `train::pretrain_real::{RealStepFn, RealValFn, build_shared_trainer}`
  - `llama_370m_transformer_config()` field-for-field from the frozen
    Llama370MConfig constants (INV-ARCH-370M-001..008 source of truth)
  - `llama_370m_train_config(lr, seq_length, seed)` builds
    TransformerTrainConfig with MODEL-2 v2-remedy defaults
  - `SharedTrainer = Rc<RefCell<TransformerTrainer>>` so both the
    mutable StepFn and the forward-only ValFn own the same model
  - `RealStepFn::step` pulls one LMBatch, runs train_batch, returns
    (loss, grad_norm=1.0 placeholder). Exhausted iterator returns a
    finite (1.0, 1.0) so GATE-TRAIN-007 (NaN/Inf) does not mis-fire
    on shard-stream EOF before the loop plans to stop.
  - `RealValFn::validate` runs forward-only across a held-out Vec,
    returns mean cross-entropy loss (or NaN if held-out is empty).
  - `build_shared_trainer` runs INV-ARCH-370M-001 as a debug_assert
    (param count must land in [366M, 374M]) so any drift in the
    Llama370MConfig constants panics the first time a debug build calls it.
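
The shard decoding and batching described for `ShardBatchIter` can be sketched with std only. This is a simplified, eager version under stated assumptions: the function names are hypothetical, the real reader streams rather than collecting, and dropping incomplete trailing sequences and batches is a choice made here, not something the message specifies.

```rust
/// Decodes a shard's raw bytes as little-endian u32 token ids.
/// Trailing bytes that do not form a full u32 are ignored in this sketch.
pub fn decode_tokens(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Splits tokens into sequences of seq_length + 1 (inputs plus the shifted
/// next-token target), then groups them into batches of `batch_size`.
/// Incomplete trailing sequences and batches are dropped.
/// Panics if `batch_size` is 0.
pub fn batches(tokens: &[u32], seq_length: usize, batch_size: usize) -> Vec<Vec<Vec<u32>>> {
    let sequences: Vec<Vec<u32>> = tokens
        .chunks_exact(seq_length + 1)
        .map(|s| s.to_vec())
        .collect();
    sequences
        .chunks_exact(batch_size)
        .map(|b| b.to_vec())
        .collect()
}
```

Each sequence carries one extra token so that inputs `seq[..seq_length]` and targets `seq[1..]` can be sliced from the same buffer.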

**Contract coverage**

Existing `contracts/training-loop-pretrain-v1.yaml` covers all MVP
obligations already; no new contract needed. Task #111 follow-up will
add per-epoch APR checkpoint hooks (C-TRAIN-PRETRAIN INV-TRAIN-002)
and real optimizer-state sha256 (INV-TRAIN-003).

**Tests**

- shard_reader: single_shard_yields_expected_batch_count,
  empty_dir_errors, multi_shard_ordering_is_lexical
- pretrain_real: transformer_config_matches_llama_370m_constants,
  real_step_fn_exhausted_iterator_returns_finite_placeholder,
  real_val_fn_empty_held_out_returns_nan

All 6 new tests PASS. Steps 4-7 (SafeTensors→APR swap, `apr pretrain`
CLI wiring, real grad_norm, checkpoint hook) to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): wire real-corpus drive into apr pretrain (task #111 step 5)

Replaces the `if !synthetic { return Err(...) }` guard with a real
branch: build a shared 370M `TransformerTrainer`, take the first
`HELD_OUT_BATCHES` batches off the shard stream as the validation set, and
drive the `PretrainLoop` with `RealStepFn`/`RealValFn` (from
`entrenar::train::pretrain_real`) against a `ShardBatchIter`.

**Structure**

- `run` is now a 2-branch dispatcher. `drive_synthetic` preserves the
  deterministic decay drive used for GATE-TRAIN-005/006/007/008 wiring
  verification (task #105). `drive_real` is the new real-corpus path.
- Both branches funnel into `run_and_report<S, V>` which owns the
  `PretrainLoop::new` + `run` + `report` sequence so the terminal
  status propagation (→ exit code) stays single-sourced.

**MVP invariants (documented)**

- `HELD_OUT_BATCHES = 2` — small constant; follow-up will plumb an
  explicit `--val-shards` flag so training and held-out shards are
  disjoint.
- `pad_id = eos_id = 0` — uniform-length sequences take the shared
  layout in `LMBatch::from_sequences`, so pad_id is never used; the
  real tokenizer's special-token ids will be plumbed through in a follow-up.
- Empty dataset dir → `CliError::ValidationFailed` (shard iterator
  init failure), covered by the new test
  `real_mode_empty_dataset_dir_errors`.

**Test changes**

- `real_mode_empty_dataset_dir_errors` replaces the now-obsolete
  `synthetic_mode_false_rejected` test. Both synthetic and validation
  tests continue to pass (3/3 in `commands::pretrain::tests`).

**Remaining MVP steps (task #111)**

- Step 4: swap SafeTensors → APR in `trainer.rs` checkpoint writer.
- Step 6: real optimizer-state sha256 over AdamW m/v/t (INV-TRAIN-003).
- Step 7: per-epoch checkpoint hook in `PretrainLoop::run_epoch`
  post-gate-pass (C-TRAIN-PRETRAIN INV-TRAIN-002).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): CPU save_apr + per-epoch checkpoint hook (task #111 steps 4+7)

Steps 4 and 7 of the MODEL-2 pretrain MVP (SHIP-TWO-001 v2.19.0):

Step 4 — CPU save_apr
- Add `TransformerTrainer::save_apr(path, name, arch)` in
  crates/aprender-train/src/train/transformer_trainer/trainer.rs,
  mirroring the existing CudaTransformerTrainer::save_apr. Emits a
  sovereign row-major .apr via aprender's Model + SaveConfig::Apr.
- Existing `save()` (SafeTensors) left unchanged — three tests at
  trainer/core.rs:388,409 and tests.rs:423 still round-trip via
  safetensors for backward compat.
- Test `save_apr_writes_readable_apr_file`: write a tiny-config
  trainer, open with `AprReader`, assert APR magic (APR\0 / APRN),
  assert `architecture` metadata round-trips, assert
  `model.embed_tokens.weight` readable as f32. PASSES.

Step 7 — per-epoch APR checkpoint hook
- Add `pub trait CheckpointFn` in train/pretrain.rs:
    `fn save(&mut self, epoch, &EpochArtifact) -> Result<(), String>`
- Add `Option<Box<dyn CheckpointFn>>` field to `PretrainLoop` +
  builder method `with_checkpoint_fn`. Keeps PretrainLoop<S,V>
  at two generics (synthetic + real call-sites unify).
- Wire into `run_epoch` AFTER `check_non_divergence(...)?` passes,
  BEFORE `epoch_artifacts.push()`. Aborted epochs never produce
  checkpoint files (per contract `per_epoch_artifacts` invariant).
  Write failures are logged via eprintln! but are non-fatal, so a
  flaky disk cannot abort the run and lose training progress.
- Emit companion `metadata.json` (contract path_template).
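
The ordering in Step 7 can be sketched as follows. The `CheckpointFn::save` shape follows the signature quoted above; `EpochArtifact`'s fields, the free-standing `run_epoch`, and `CountingHook` are hypothetical stand-ins, so treat this as an outline of the control flow rather than the actual `PretrainLoop` code.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for the real EpochArtifact.
pub struct EpochArtifact {
    pub epoch: usize,
    pub val_loss: f32,
}

pub trait CheckpointFn {
    fn save(&mut self, epoch: usize, artifact: &EpochArtifact) -> Result<(), String>;
}

/// Mock hook that counts calls and can simulate a failing disk.
pub struct CountingHook {
    pub calls: Rc<Cell<usize>>,
    pub fail: bool,
}

impl CheckpointFn for CountingHook {
    fn save(&mut self, _epoch: usize, _artifact: &EpochArtifact) -> Result<(), String> {
        self.calls.set(self.calls.get() + 1);
        if self.fail {
            Err("simulated disk error".to_string())
        } else {
            Ok(())
        }
    }
}

/// Divergence guard: a non-finite loss aborts the epoch.
fn check_non_divergence(loss: f32) -> Result<(), String> {
    if loss.is_finite() {
        Ok(())
    } else {
        Err(format!("non-finite loss: {loss}"))
    }
}

pub fn run_epoch(
    epoch: usize,
    val_loss: f32,
    checkpoint_fn: &mut Option<Box<dyn CheckpointFn>>,
    epoch_artifacts: &mut Vec<EpochArtifact>,
) -> Result<(), String> {
    // Guard first: an aborted epoch returns here and never reaches the hook.
    check_non_divergence(val_loss)?;
    let artifact = EpochArtifact { epoch, val_loss };
    // Hook runs after the guard and before the artifact is recorded.
    if let Some(hook) = checkpoint_fn.as_mut() {
        if let Err(e) = hook.save(epoch, &artifact) {
            // Non-fatal: log and keep training.
            eprintln!("checkpoint write failed at epoch {epoch}: {e}");
        }
    }
    epoch_artifacts.push(artifact);
    Ok(())
}
```

Passing `None` for the hook gives the synthetic behaviour: the guard and the artifact push still run, and nothing is written.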

Real-corpus wiring
- Add `AprCheckpointFn` in train/pretrain_real.rs holding the shared
  `Rc<RefCell<TransformerTrainer>>`; its `save()` delegates to
  `trainer.save_apr()` so the three hooks (RealStepFn, RealValFn,
  AprCheckpointFn) see the same in-memory weights.
- Re-export `CheckpointFn` from train/mod.rs.

CLI
- `apr pretrain` --real path (drive_real): construct
  `build_shared_trainer` once, clone Rc into RealStepFn +
  RealValFn + AprCheckpointFn, pass to `run_and_report`.
- `run_and_report` takes `Option<Box<dyn CheckpointFn>>`; synthetic
  branch passes `None` (no real weights to save).
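
The shared-ownership wiring can be sketched like this. `Trainer` is a toy stand-in for `TransformerTrainer`, and the three hook types and `build_hooks` are hypothetical names; only the `Rc<RefCell<...>>` sharing pattern is taken from the description above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy stand-in for TransformerTrainer: one weight and a step counter.
pub struct Trainer {
    pub weight: f32,
    pub steps: u64,
}

pub type SharedTrainer = Rc<RefCell<Trainer>>;

pub struct StepHook(pub SharedTrainer);
pub struct ValHook(pub SharedTrainer);
pub struct CheckpointHook(pub SharedTrainer);

impl StepHook {
    /// Mutates the shared trainer (stands in for train_batch).
    pub fn step(&mut self, lr: f32) {
        let mut t = self.0.borrow_mut();
        t.weight -= lr;
        t.steps += 1;
    }
}

impl ValHook {
    /// Read-only view (stands in for a forward-only validation pass).
    pub fn validate(&self) -> f32 {
        self.0.borrow().weight
    }
}

impl CheckpointHook {
    /// Snapshots what a checkpoint would capture.
    pub fn snapshot(&self) -> (f32, u64) {
        let t = self.0.borrow();
        (t.weight, t.steps)
    }
}

/// Builds the trainer once and clones the Rc into each hook.
pub fn build_hooks(initial: f32) -> (StepHook, ValHook, CheckpointHook) {
    let shared: SharedTrainer = Rc::new(RefCell::new(Trainer {
        weight: initial,
        steps: 0,
    }));
    (
        StepHook(Rc::clone(&shared)),
        ValHook(Rc::clone(&shared)),
        CheckpointHook(shared),
    )
}
```

Because every hook borrows only for the duration of its own call, the `RefCell` borrow rules are satisfied as long as the loop invokes the hooks one at a time.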

Tests (all green, 21 pretrain + 4 pretrain_real/save_apr + 3 CLI)
- `pretrain_loop_calls_checkpoint_fn_once_per_passing_epoch`:
  mock `CheckpointFn` counts calls. Every successful epoch fires
  exactly one call; companion metadata.json written to disk.
- `pretrain_loop_skips_checkpoint_on_abort`: NaN step forces
  abort; mock hook recorded zero calls.
- `save_apr_writes_readable_apr_file`: magic + metadata + tensor
  round-trip via AprReader.

Contract discharge
- GATE-TRAIN-005 invariant preserved: checkpoint placement AFTER
  divergence guard means aborted epochs never touch disk.
- training-loop-pretrain-v1 `per_epoch_artifacts.path_template`
  honored: `{run_dir}/ckpt/epoch-{N:03d}.apr` + `.metadata.json`.

Deferred (Step 6)
- `fake_optimizer_sha(epoch)` at pretrain.rs:680 still returns a
  placeholder. INV-TRAIN-003 discharge needs TransformerTrainer
  to expose AdamW m/v/t buffers for a real sha256. Separate step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): real AdamW optimizer-state sha256 (task #111 step 6)

INV-TRAIN-003 discharge for the MODEL-2 pretrain MVP.

TransformerTrainer::optimizer_state_sha256()
- New accessor in crates/aprender-train/src/train/transformer_trainer/trainer.rs
  that hashes (t, m_buffers, v_buffers) in fixed order.
- Uses sha2::Sha256 + bytemuck::cast_slice over each Array1<f32>.
- Versioned tag "aprender-train:adamw:optstate:v1" prefixes the
  digest so schema changes are loud, not silent.
- Uninitialized slots hash to the literal "none" so missing m[i]
  is semantically distinct from an all-zeros m[i].
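
The digest layout can be sketched with std only. Note the substitution: std has no SHA-256, so this example swaps in 64-bit FNV-1a purely to show the tagged, fixed-order, "none"-sentinel structure. It is not cryptographic and is not what the commit uses; the function name and the length prefix are additions made for this sketch.

```rust
/// FNV-1a 64-bit, used here only as a std-only stand-in for SHA-256.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, &b| (h ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3))
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const SCHEMA_TAG: &str = "aprender-train:adamw:optstate:v1";

/// Digest over (t, m buffers, v buffers) in fixed order.
pub fn optimizer_state_digest(
    t: u64,
    m: &[Option<Vec<f32>>],
    v: &[Option<Vec<f32>>],
) -> String {
    // The versioned tag goes in first, so a schema change alters every digest.
    let mut h = fnv1a(FNV_OFFSET, SCHEMA_TAG.as_bytes());
    h = fnv1a(h, &t.to_le_bytes());
    for slot in m.iter().chain(v.iter()) {
        match slot {
            // Uninitialized slot: hash the literal "none", not zero bytes.
            None => h = fnv1a(h, b"none"),
            Some(buf) => {
                // Length prefix (an addition in this sketch) keeps slot
                // boundaries unambiguous.
                h = fnv1a(h, &(buf.len() as u64).to_le_bytes());
                for x in buf {
                    h = fnv1a(h, &x.to_le_bytes());
                }
            }
        }
    }
    format!("{h:016x}")
}
```

Hashing the raw little-endian bytes of each `f32` makes the digest sensitive to bit-level differences, which is what a reproducibility check needs.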

StepFn trait extension
- Add `fn optimizer_state_sha256(&self) -> Option<String>` with
  default `None`. Synthetic harnesses keep returning None and
  continue using the `fake_optimizer_sha` epoch/seed fallback.
- `PretrainLoop::run_epoch` now reads `step_fn.optimizer_state_sha256()`
  and falls back to the fake fingerprint only when None.

RealStepFn override
- RealStepFn in pretrain_real.rs implements the new hook by
  delegating to `trainer.borrow().optimizer_state_sha256()`, so
  the real-corpus path records the actual AdamW digest.
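
A minimal sketch of the default-method fallback, assuming simplified types: the trait here has only the one method, `SyntheticStep`, `RealStep`, and `epoch_fingerprint` are hypothetical names, and the body of `fake_optimizer_sha` is a placeholder since the real one is not shown.

```rust
pub trait StepFn {
    /// Default: no real optimizer state to fingerprint.
    fn optimizer_state_sha256(&self) -> Option<String> {
        None
    }
}

/// Synthetic harness: keeps the default and therefore returns None.
pub struct SyntheticStep;
impl StepFn for SyntheticStep {}

/// Real-corpus driver: overrides the hook with an actual digest.
pub struct RealStep {
    pub digest: String,
}

impl StepFn for RealStep {
    fn optimizer_state_sha256(&self) -> Option<String> {
        Some(self.digest.clone())
    }
}

/// Placeholder fingerprint; the real fake_optimizer_sha is not shown here.
fn fake_optimizer_sha(epoch: usize) -> String {
    format!("fake-epoch-{epoch}")
}

/// Prefers the step function's digest and falls back only when it is None.
pub fn epoch_fingerprint<S: StepFn>(step_fn: &S, epoch: usize) -> String {
    step_fn
        .optimizer_state_sha256()
        .unwrap_or_else(|| fake_optimizer_sha(epoch))
}
```

Adding the method with a default body keeps existing `StepFn` implementors compiling unchanged.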

Tests (all 25 + 3 green)
- `optimizer_state_sha256_is_hex_digest_on_fresh_trainer`: 64-char
  lowercase hex shape check on an un-stepped trainer.
- `optimizer_state_sha256_is_stable_across_fresh_trainers`: two
  fresh trainers hash to the same digest (reproducibility).
- `pretrain_loop_uses_step_fn_optimizer_sha_when_available`:
  a StepFn with override wins over fake_optimizer_sha.
- `pretrain_loop_falls_back_to_fake_optimizer_sha_for_synthetic`:
  default impl still produces a 64-char hex digest via fallback.

Task #111 MVP status
- Steps 1-3 shipped in commit b2b0329
- Step 5 shipped in commit e5a2f02
- Steps 4+7 shipped in commit 89db4b3
- Step 6 shipped in this commit
- All 7 steps of the task #111 plan are now committed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-021 seed=0 × 100-step reproducibility harness

Discharges GATE-TRAIN-006 / INV-TRAIN-006 from training-loop-pretrain-v1
(bumped 1.0.0 → 1.1.0; status PROPOSED → ACTIVE).

Two new Rust tests in crates/aprender-train/src/train/transformer_trainer/tests.rs:
- falsify_ship_021_seed_0_100_step_reproducibility: two trainers built with
  seed=0 produce identical finite losses for 100 consecutive train_batch
  calls (|Δ| ≤ 1e-6) AND identical AdamW optimizer_state_sha256 digests.
- falsify_ship_021_different_seeds_do_diverge: seed=0 vs seed=1 counter-test
  must diverge > 1e-4 within 10 steps (guards against degenerate "always
  equal" implementations).

Seed plumbing fixes:
- TransformerTrainer::new now calls lock_init_seed(config.seed) before
  Transformer::new so direct (non-YAML) callers honor the configured seed
  instead of silently inheriting the global default of 42.
- transformer::init::INIT_SEED_LOCK (std::sync::Mutex) + lock_init_seed
  helper returning a #[must_use] MutexGuard. Held across the full
  Transformer::new call so cargo test's default parallel runner cannot
  clobber the global atomic INIT_SEED between one test's set_init_seed
  and another test's weight-init reads. Poisoned mutex is recovered
  transparently (seed itself is atomic; poison only signals prior panic).
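
The lock-then-set pattern might look roughly like this. The `u64` seed type, the `SeqCst` ordering, and `init_weights` (a stand-in for `Transformer::new`) are assumptions; the names `INIT_SEED`, `INIT_SEED_LOCK`, and `lock_init_seed` come from the message above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Process-wide init seed (42 is the default named in the commit message).
pub static INIT_SEED: AtomicU64 = AtomicU64::new(42);
/// Serialises "set seed, then initialise weights" across threads.
pub static INIT_SEED_LOCK: Mutex<()> = Mutex::new(());

/// Takes the lock, then sets the seed. Hold the guard until weight init is done.
#[must_use = "dropping the guard immediately releases the seed lock"]
pub fn lock_init_seed(seed: u64) -> MutexGuard<'static, ()> {
    // A poisoned lock only means a previous holder panicked; the seed is a
    // plain atomic, so recovering the guard is safe here.
    let guard = INIT_SEED_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    INIT_SEED.store(seed, Ordering::SeqCst);
    guard
}

/// Stand-in for Transformer::new: derives "weights" from the current seed.
pub fn init_weights(n: usize) -> Vec<u64> {
    let seed = INIT_SEED.load(Ordering::SeqCst);
    (0..n as u64)
        .map(|i| seed.wrapping_mul(6364136223846793005).wrapping_add(i))
        .collect()
}
```

A caller would write `let _guard = lock_init_seed(seed);` and build the model before `_guard` goes out of scope, so no other thread can change the seed in between.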

Contract uplift (contracts/training-loop-pretrain-v1.yaml v1.1.0):
- status PROPOSED → ACTIVE
- INV-TRAIN-006 gains harness: block naming both test paths + assertions
- GATE-TRAIN-006 gains evidence_discharged_by: pointing to both tests
- metadata.changelog entry recording the discharge

Verification:
  cargo test -p aprender-train --lib falsify_ship_021 → 2 passed
  cargo clippy -p aprender-train --lib --no-deps -- -D warnings → clean
  pv validate contracts/training-loop-pretrain-v1.yaml → 0 errors

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ship-two): FALSIFY-SHIP-022 apr inspect provenance (AC-SHIP2-012)

Discharges FALSIFY-SHIP-022: apr inspect surfaces license + data_source
+ data_license on every .apr, with "(missing)" / null rendering when a
field is absent rather than silent skip. Makes a .apr binary a
sufficient provenance-audit artifact (no sidecar manifest required).

Contract: contracts/apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0,
ACTIVE, kind: schema). 3 invariants + 3 gates + 3 failure modes, all
bound to AC-SHIP2-012 / FALSIFY-SHIP-022. pv validate PASS.

Code changes:
- AprV2Metadata: add data_source + data_license as named Option<String>
  fields (not buried in custom HashMap). No skip_serializing_if, so JSON
  round-trips them as null when None (FM-APR-PROV-SILENT-SKIP).
- apr inspect MetadataInfo: mirror all 3 provenance fields, also with
  no skip_serializing_if.
- apr inspect text output: new "Provenance:" block via pure helper
  format_provenance_block() — always emits all 3 keys, renders None as
  literal "(missing)".
- Two struct-literal construction sites updated for new fields.
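
A pure formatting helper of this kind could be sketched as below. The `Provenance` struct and the function signature are hypothetical; the three key names and the "(missing)" rendering follow the description above.

```rust
/// Hypothetical subset of the metadata; field names follow the commit message.
#[derive(Debug, Default, Clone)]
pub struct Provenance {
    pub license: Option<String>,
    pub data_source: Option<String>,
    pub data_license: Option<String>,
}

/// Pure helper: always emits all three keys, rendering None as "(missing)".
pub fn format_provenance_block(p: &Provenance) -> String {
    fn show(v: &Option<String>) -> &str {
        v.as_deref().unwrap_or("(missing)")
    }
    format!(
        "Provenance:\n  license: {}\n  data_source: {}\n  data_license: {}\n",
        show(&p.license),
        show(&p.data_source),
        show(&p.data_license)
    )
}
```

Returning a `String` instead of printing lets tests assert on the text directly, without capturing stdout.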

Harness tests (5 passing):
- aprender-core:
  - falsify_ship_022_apr_metadata_provenance_round_trip
  - falsify_ship_022_inspect_emits_provenance_keys (JSON null half)
  - falsify_ship_022_partial_provenance_round_trip
- apr-cli:
  - falsify_ship_022_inspect_emits_provenance_keys (MetadataInfo JSON)
  - falsify_ship_022_inspect_missing_renders_as_missing (text half)
  - falsify_ship_022_inspect_populated_renders_values

Smoke test: apr inspect on existing .apr (no provenance stored)
correctly emits:
  Provenance:
    license: (missing)
    data_source: (missing)
    data_license: (missing)

cargo fmt + cargo clippy (aprender-core, apr-cli) clean.
3239 aprender-core format tests PASS, 85 apr-cli inspect tests PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ship-two): v2.20.0 amendment — FALSIFY-SHIP-021 + FALSIFY-SHIP-022 DISCHARGED

Documents two MODEL-2 ship gates closed in the post-v2.19 evidence window:

1. FALSIFY-SHIP-021 (AC-SHIP2-011) — seed=0 × 100-step reproducibility
   harness + counter-test seed=0 vs seed=1 divergence proof. Root cause
   of original flake (sibling test racing on global INIT_SEED atomic)
   fixed via lock_init_seed(seed) -> MutexGuard. Contract
   training-loop-pretrain-v1.yaml bumped 1.0.0 → 1.1.0 ACTIVE.
   Commit 0b8ca8c, task #112.

2. FALSIFY-SHIP-022 (AC-SHIP2-012) — apr inspect provenance block
   (license + data_source + data_license) shipped. AprV2Metadata
   extended with 2 named Option<String> fields; no skip_serializing_if
   (FM-APR-PROV-SILENT-SKIP guard). Pure helper format_provenance_block
   replaces stdout-capture in tests (gag is NOT parallel-safe).
   New contract apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0
   ACTIVE, kind: schema). pv validate PASS. Commit 8f0607d,
   task #113.

Combined status: 2/12 AC-SHIP2 gates DISCHARGED. Remaining 10 block
on 370M compute-dispatch (the long-pole from v2.19.0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-011 llama-370m sovereign contract ACTIVE (AC-SHIP2-001)

Discharges FALSIFY-SHIP-011 / AC-SHIP2-001 — MODEL-2 370M architectural
contract registered AND byte-equally bound to the Rust scaffold that
aprender-train consumes.

Contract lift:
- contracts/model-families/llama-370m-sovereign-v1.yaml
  - version 1.0.0 → 1.1.0
  - status PROPOSED → ACTIVE
  - GATE-ARCH-370M-001 gains evidence_discharged_by (4 entries) and
    ship_blocking: true
  - changelog block added documenting the v1.1.0 discharge

Harness tests (crates/aprender-train/src/models/llama_370m.rs):
- `falsify_ship_011_rust_scaffold_matches_yaml_contract` — loads the
  contract via include_str! (compile-time-embedded, no path deps at
  runtime) and asserts every architecture.* and constraints.* key
  matches the corresponding Llama370MConfig::* const byte-equally
- `falsify_ship_011_sovereign_contract_is_active` — asserts status ==
  ACTIVE (a PROPOSED contract cannot gate a ship)

Test run: 6/6 aprender-train::models::llama_370m tests PASS (4 pre-
existing + 2 new). pv validate on contract: 0 errors, 0 warnings.

Why this discharge is strong:
- Rust scaffold already encodes INV-ARCH-370M-002..008 as compile-time
  `const _: () = Llama370MConfig::validate();` — a drift of any value
  fails `cargo build`, not just `cargo test`
- The new YAML-vs-Rust binding test adds the missing half: drift of a
  YAML key that the Rust scaffold doesn't mirror is now also caught at
  test time, preventing the MODEL-1-v2 QLoRA class of recipe/artifact
  drift (rank=16 actual vs rank=32 recipe — see
  project_ship_two_001_model1_qlora_divergence.md)
- INV-ARCH-370M-001 (param count band) is discharged by the existing
  `estimated_param_count_within_contract_band` test
- INV-ARCH-370M-009 (row-major layout) is discharged by
  aprender::format::layout_contract at APR load time
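
The compile-time validation idiom from the first bullet can be sketched as follows. The constant values and the two specific checks are hypothetical; only the `const _: () = Llama370MConfig::validate();` form is quoted from the message.

```rust
pub struct Llama370MConfig;

impl Llama370MConfig {
    // Hypothetical values; the real constants are not shown in this message.
    pub const HIDDEN_DIM: usize = 1_024;
    pub const NUM_HEADS: usize = 16;
    pub const NUM_KV_HEADS: usize = 4;
    pub const HEAD_DIM: usize = 64;

    /// Panics during const evaluation if an architectural invariant is broken.
    pub const fn validate() {
        // Attention heads must tile the hidden dimension exactly.
        assert!(Self::NUM_HEADS * Self::HEAD_DIM == Self::HIDDEN_DIM);
        // GQA: query heads must divide evenly across KV heads.
        assert!(Self::NUM_HEADS % Self::NUM_KV_HEADS == 0);
    }
}

// Forces the checks to run at compile time, so drift fails the build itself.
const _: () = Llama370MConfig::validate();
```

Changing `NUM_KV_HEADS` to 5 in this sketch would turn the second assertion into a compile error rather than a test failure.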

Combined MODEL-2 status after this commit: 3/12 AC-SHIP2 gates
DISCHARGED (001, 011, 012). Remaining 9 (002–010) still block on
actual 370M training compute-dispatch — the pretrain loop driver from
v2.19.0 is ready to exercise them once the weights exist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-012 algorithm-level PARTIAL discharge (AC-SHIP2-002)

Bumps C-TOK-BPE to v1.1.0 and wires evidence_discharged_by into
GATE-BPE-003 pointing at 3 existing harness tests in
crates/apr-cli/tests/falsify_ship_012_tokenizer_roundtrip.rs and
the emitted evidence JSON at
evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json.

Status intentionally stays PROPOSED. The gate requires 10K-doc
byte-exact round-trip on The Stack v2 Python holdout; task #91 shipped
the ingest scaffold (corpus-ingest dry-run CLI) but the 10K fixture
itself is not yet materialized — so this lands as PARTIAL_ALGORITHM_LEVEL
discharge with full_discharge_blocks_on: task #91 data.

What passes algorithm-level today (all 3 tests green at commit time):
- falsify_ship_012_tokenizer_roundtrip_byte_exact — decode(encode(nfc(doc)))
  byte-equals nfc(doc) on every doc in a 20-doc synthetic Python-like
  holdout (ASCII keywords + Unicode identifiers + docstrings + emoji +
  combining marks). Hard-asserts evidence.docs_failed == 0, so a regression
  that reintroduces whitespace splitting or drops the byte encoder panics.
- falsify_ship_012_nfc_idempotence_only — INV-BPE-005 standalone: nfc(nfc(x))
  byte-equals nfc(x) on every holdout doc.
- falsify_ship_012_train_corpus_sanity — train/holdout set disjointness
  plus minimum corpus sizes (>=20 docs each).
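
The shape of the round-trip check can be sketched generically. Everything here is a stand-in: `byte_encode`/`byte_decode` form a trivial one-token-per-byte tokenizer rather than a BPE, NFC normalisation is omitted because std does not provide it, and `docs_failed` is a hypothetical helper named after the evidence field.

```rust
/// Counts documents whose decode(encode(doc)) differs byte-for-byte from doc.
pub fn docs_failed<E, D>(docs: &[&str], encode: E, decode: D) -> usize
where
    E: Fn(&str) -> Vec<u32>,
    D: Fn(&[u32]) -> String,
{
    docs.iter()
        .filter(|&&doc| decode(&encode(doc)[..]).as_bytes() != doc.as_bytes())
        .count()
}

/// Stand-in tokenizer: one token per UTF-8 byte (not a real BPE).
pub fn byte_encode(text: &str) -> Vec<u32> {
    text.bytes().map(u32::from).collect()
}

/// Inverse of byte_encode for tokens in the 0..=255 range.
pub fn byte_decode(tokens: &[u32]) -> String {
    let bytes: Vec<u8> = tokens.iter().map(|&t| t as u8).collect();
    String::from_utf8_lossy(&bytes).into_owned()
}
```

Swapping in an encoder that drops whitespace makes the count non-zero, which is the kind of regression the hard assertion is meant to catch.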

When task #91's 10K Stack-v2 Python holdout lands the fixture swap is
data-only: the harness module doc-comment already flagged this path so
no test rewrite will be required.

Evidence: evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json
(20/20 passed, nfc_idempotent: true, vocab_size_trained: 489/512).

Verification:
- pv validate contracts/tokenizer-bpe-v1.yaml -> 0 errors, 0 warnings
- cargo test -p apr-cli --test falsify_ship_012_tokenizer_roundtrip -> 3/3 passed

Bound to: AC-SHIP2-002 (ship-two-models-spec §5).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-015 algorithm-level PARTIAL discharge (AC-SHIP2-005)

Bumps C-LLAMA-370M-SOVEREIGN v1.1.0 → v1.2.0 and wires
evidence_discharged_by into GATE-ARCH-370M-003 (the param-count gate that
binds AC-SHIP2-005 via FALSIFY-SHIP-015). Contract stays ACTIVE — the
FALSIFY-SHIP-011 discharge (v1.1.0) is what gates the ACTIVE promotion,
not SHIP-015.

GATE-ARCH-370M-003's evidence_required asks for
  apr inspect --json model.apr | jq '.param_count' ∈ [366M, 374M]
on a real 370M `.apr` checkpoint. That file does not exist yet — it
blocks on AC-SHIP2-003/004 pretraining compute-dispatch. Rather than
leave the gate's evidence blank, this commit wires the algorithm-level
proof that already exists:

- estimated_param_count() / estimated_stored_param_count() — const fn
  over Llama370MConfig::*, so the count is computed at compile time.
- estimated_param_count_within_contract_band (unit test) hard-asserts:
    * p ∈ [PARAMETERS_MIN=366M, PARAMETERS_MAX=374M]  (INV-ARCH-370M-001)
    * |p − 370M| / 370M < 5%                          (coarse sanity; looser than the band)
    * p − stored == VOCAB_SIZE × HIDDEN_DIM           (tied embeddings)

Any edit to Llama370MConfig that moves the count out of the
INV-ARCH-370M-001 band fails `cargo test -p aprender-train --lib
llama_370m` — before any compute runs.

The gate now carries:
  discharge_status: PARTIAL_ALGORITHM_LEVEL
  full_discharge_blocks_on: "real 370M .apr checkpoint from pretraining
                             compute-dispatch (AC-SHIP2-003/004)"
  ship_blocking: true

so the data-scale gap is first-class contract state, not an unspoken
assumption.

Verification:
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml
  -> 0 errors, 0 warnings
- cargo test -p aprender-train --lib models::llama_370m
  -> 6/6 passed (including the newly-cited
     estimated_param_count_within_contract_band and the pre-existing
     falsify_ship_011_* pair)

MODEL-2 AC-SHIP2 ledger after this: 3/12 fully ACTIVE (001, 011, 012)
+ 2/12 PARTIAL (002 via SHIP-012, 005 via SHIP-015) = 5/12 touched.
Remaining 7 (003/004/006/007/008/009/010) block on 370M compute.

Bound to: AC-SHIP2-005 (ship-two-models-spec §5).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ship-two-001): spec v2.21.0 — FALSIFY-SHIP-011 DISCHARGED + SHIP-012/015 PARTIAL

Captures the three evidence-wiring commits landed on
chore/post-v2.19-evidence since v2.20.0:

1. FALSIFY-SHIP-011 (AC-SHIP2-001) DISCHARGED at 338c6eb (task #114)
   C-LLAMA-370M-SOVEREIGN v1.0.0 PROPOSED -> v1.1.0 ACTIVE.
   Rust-YAML byte-equality binding via include_str! + serde_yaml::Value.

2. FALSIFY-SHIP-012 (AC-SHIP2-002) PARTIAL_ALGORITHM_LEVEL at 2e8b8b8
   (task #115). C-TOK-BPE v1.0.0 -> v1.1.0 stays PROPOSED.
   3 tokenizer harness tests wired; full discharge blocks on task #91
   10K Stack-v2 Python holdout (fixture-swap is data-only).

3. FALSIFY-SHIP-015 (AC-SHIP2-005) PARTIAL_ALGORITHM_LEVEL at bfb8831
   (task #116). Sovereign contract v1.1.0 -> v1.2.0 stays ACTIVE.
   estimated_param_count_within_contract_band + const fns wired;
   full discharge blocks on real 370M .apr from compute-dispatch.

Also codifies the PARTIAL_ALGORITHM_LEVEL pattern as a first-class
spec concept: when a gate's evidence_required describes a
production-scale check that is not yet runnable but the underlying
invariant is provable today at algorithm/compile/unit-test level,
wire the algorithm proofs and carry discharge_status +
partial_discharge_note + full_discharge_blocks_on + ship_blocking=true
to make the data gap first-class contract state.

MODEL-2 ship-gate status after v2.21.0: 3/12 fully ACTIVE (001, 011,
012) + 2/12 PARTIAL_ALGORITHM_LEVEL (002, 005) = 5/12 touched (~42%).
Remaining 7 block on real 370M compute-dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-2): FALSIFY-SHIP-019 algorithm-level PARTIAL discharge (AC-SHIP2-009)

GATE-ARCH-370M-004 gains evidence_discharged_by + discharge_status:
PARTIAL_ALGORITHM_LEVEL. Three algorithm-level invariants wired without
training:

  1. Coverage — every 370M tensor (219 entries: 1 embed + 1 lm_head +
     9 per-layer × 24 layers + 1 final norm) resolves to a
     TensorContract entry in LayoutContract::new(). Pattern-normalises
     per-layer names; any uncovered tensor would be silently skipped
     by GGUF export.

  2. Row-major ordering (INV-ARCH-370M-009) — every 2D shape is
     [out_dim, in_dim]. Pinned lm_head/embed/q_proj/k_proj shapes
     verify GQA (k_proj = [kv_heads*head_dim, hidden]) and bind the
     370M architecture to the GH-202-regression-proof layout.

  3. Critical-tensor enforcement — validate_apr_shape accepts
     [vocab, hidden] AND rejects reversed [hidden, vocab] on
     lm_head.weight. Proves the validator catches layout bugs rather
     than passing silently.
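The three invariants above can be sketched in isolation as follows. This is not the code in `layout_contract.rs`: the tensor names follow the common Hugging Face Llama convention (an assumption), and `normalise` and `validate_row_major` are hypothetical stand-ins for the real pattern matching and `validate_apr_shape`. Only the 24-layer count and the 1 + 1 + 9 × 24 + 1 breakdown come from the text.

```rust
/// Layer count taken from the commit text above.
const NUM_LAYERS: usize = 24;

/// Nine per-layer tensors of a Llama-style block (names are assumed).
const PER_LAYER: [&str; 9] = [
    "self_attn.q_proj.weight",
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.o_proj.weight",
    "mlp.gate_proj.weight",
    "mlp.up_proj.weight",
    "mlp.down_proj.weight",
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
];

/// Enumerate every tensor name: embed + lm_head + final norm + per-layer.
pub fn all_tensor_names() -> Vec<String> {
    let mut names = vec![
        "model.embed_tokens.weight".to_string(),
        "lm_head.weight".to_string(),
        "model.norm.weight".to_string(),
    ];
    for layer in 0..NUM_LAYERS {
        for suffix in PER_LAYER.iter() {
            names.push(format!("model.layers.{}.{}", layer, suffix));
        }
    }
    names
}

/// Replace the numeric layer index with `{n}` so all layers share a pattern.
pub fn normalise(name: &str) -> String {
    let parts: Vec<&str> = name.split('.').collect();
    let mut out: Vec<&str> = Vec::with_capacity(parts.len());
    for (i, part) in parts.iter().enumerate() {
        let follows_layers = i > 0 && parts[i - 1] == "layers";
        let is_index = !part.is_empty() && part.chars().all(|c| c.is_ascii_digit());
        out.push(if follows_layers && is_index { "{n}" } else { *part });
    }
    out.join(".")
}

/// Accept only the row-major `[out_dim, in_dim]` shape for a 2D tensor.
pub fn validate_row_major(shape: &[usize], out_dim: usize, in_dim: usize) -> Result<(), String> {
    if shape.len() == 2 && shape[0] == out_dim && shape[1] == in_dim {
        Ok(())
    } else {
        Err(format!("expected [{}, {}], got {:?}", out_dim, in_dim, shape))
    }
}
```

With these definitions `all_tensor_names()` yields 219 names that collapse to 12 normalised patterns, and a reversed `[hidden, vocab]` shape for `lm_head.weight` is rejected whenever `vocab != hidden`.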

Full discharge (GGUF cosine-parity on trained 370M, max_logit_cosine
≤ 1e-3 over 100 canary prompts) blocks on compute-dispatch
(AC-SHIP2-003/004). Harness is fixture-swap-ready once a trained .apr
exists — no test rewrite needed. Spec §9 Risk #2 names this exact
mitigation path.

Contract: llama-370m-sovereign-v1.yaml v1.2.0 → v1.3.0, stays ACTIVE.
Tests: 2 new test fns in crates/aprender-train/src/models/llama_370m.rs
(8/8 pass). `pv validate` = 0 errors, 0 warnings.

Closes #117. Binds to AC-SHIP2-009 / FALSIFY-SHIP-019.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ship-two-001): v2.22.0 — FALSIFY-SHIP-019 PARTIAL discharge capstone

Records the SHIP-019 algorithm-level PARTIAL discharge (task #117,
commit 846cc1d) in the authoritative spec:

- Version bump 2.21.0 → 2.22.0
- Full amendment block #4 under post-v2.19 evidence window documenting
  GATE-ARCH-370M-004 wired to `layout_contract.rs` algorithm proofs
  (219-tensor coverage + row-major ordering + GH-202 rejection)
- New "counter-example hunting" pattern lesson: prior "exhausted
  PARTIAL levers" verdict was ~86% correct; re-running the 7-gate
  FALSIFY-SHIP survey with explicit counter-example hunting found
  exactly one genuine lever (SHIP-019). SHIP-017/018/020 need compute;
  SHIP-013/014/016 collapse into SHIP-011 wiring.
- Combined MODEL-2 ledger: 3/12 fully ACTIVE + 3/12 PARTIAL = 6/12
  touched (50%). Remaining 6 (003/004/006/007/008/010) all require
  real 370M compute, trained .apr + eval harness, or RTX 4090
  wall-clock benchmark. Genuine algorithm-level PARTIAL harvesting for
  MODEL-2 is now exhausted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(publish): mark 5 QA harness crates publish = false + document policy

Evidence: aprender-qa-{cli,gen,runner,report,certify} have never been
published to crates.io (verified against crates.io API 2026-04-19).
They are reached through `apr qa` (the user-facing binary), not through
`cargo add`, so marking them publish = false prevents accidental
version-bump-with-no-publish drift across the workspace.

Spec §A.12 rewritten from the stale "63 crates (49 published + 14 internal)"
snapshot to the real 80-crate layout: 9 publish = false (4 benchmarks/xtask
+ 5 QA harness) plus 71 publishable. §A.12.1 codifies publishing policy:
three opt-out categories (benchmarks, xtask, QA harness), and the rule
that a v0.31.0-style release does NOT require cargo publish across all
80 crates — crates.io publish is selective (via cargo workspaces publish
--from-git or cargo publish -p <name>), workspace-wide tag/release is not.

Verified: cargo check --workspace clean after the flip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(mcp-spec): refresh header — M1–M3 SHIPPED in v0.31.0, M4 in flight

Five-whys on the stale 2026-04-17 draft status:
1. Why stale? Spec said "DRAFT (pre-implementation)" + target "v0.32.0"
   but M1–M3 actually shipped in v0.31.0 on 2026-04-19 (tag 62893da).
2. Why not refreshed? M1–M3 landed across multiple PRs without a
   spec-header refresh pass.
3. Why is that a problem? New contributors reading the spec think MCP
   is unshipped — contradicted by `cargo install aprender` already
   exposing `apr mcp` with 9 tools.
4. Root cause: spec headers are not on the release checklist.
5. Fix here: update status to ACTIVE, version to 1.2.0, delivery line
   to "v0.31.0 M1–M3 SHIPPED / M4 in flight (PRs #886-892)". No body
   changes — architecture/tool-surface/protocol sections are still
   accurate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(publish): mark aprender-viz-ttop publish = false + 4th category

Evidence: `aprender-viz-ttop` has never been published to crates.io
(release workflow explicitly never invokes `cargo publish` for it).
Its `description` field calls it a "Terminal Top: 10X better than btop"
system monitor — ships as a binary subcommand inside the `apr` facade,
not as a library dependency.

Five-whys:
1. Why flip it? Because it's a bundled binary, not a library.
2. Why does that matter? `cargo add aprender-viz-ttop` would mislead
   library authors into taking a user-facing TUI as a dep.
3. Why wasn't it already flipped? It predated the A.12 policy audit
   performed in 42907db.
4. Why a 4th category? Benchmarks / xtask / QA harness all leave
   outputs as artifacts; this one ships a runnable subcommand. The
   distinction matters because `apr cbtop` dispatches to it.
5. Why document it? To prevent a future reader from re-opening the
   "publish all 80 crates" question when we only publish ~70.

Changes:
- crates/aprender-viz-ttop/Cargo.toml: add `publish = false`
- docs/specifications/aprender-monorepo-consolidation.md:
  - §A.12: add viz-ttop to internal-crates table (10 rows)
  - §A.12.1: add 4th category (Bundled binaries); update total to
    "10 opted out / 70 publishable"; remove stale "Candidates to
    migrate" paragraph (superseded by 42907db + this commit)

Refs: APR-MONO, PR #901

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(task-123): native Rust pretokenize CLI — close MODEL-2 corpus gap

Root-cause fix for the pretokenize-to-.bin gap that was blocking task #119
(MODEL-2 370M real-compute pretrain smoke). The user's 2026-04-19 callout,
"why not fix root cause vs 'hack'", rejected the Python shim path.

What ships (uncommitted WIP in `pretrain.rs`/`llama_370m.rs` left out):

- `contracts/pretokenize-bin-v1.yaml` v1.0.0 PROPOSED
  * `pv validate` PASS (0 errors / 0 warnings)
  * GATE-PRETOK-003 ship-blocking round-trip gate gains
    `evidence_discharged_by` (4 tests) + `discharge_status:
    PARTIAL_ALGORITHM_LEVEL`. Full discharge still blocks on
    cross-host byte-identical test (task #119 lambda-labs dispatch).

- `BPETokenizer::from_vocab_merges(vocab, merges, cfg)` loader
  (crates/aprender-train/src/tokenizer/bpe.rs)
  * Reads HEX-encoded vocab.json + merges.txt
  * Detects id collisions, rejects orphan merges
  * 2 new round-trip tests PASS

- `apr tokenize encode-corpus` CLI subcommand
  (crates/apr-cli/src/commands/tokenize.rs::run_encode_corpus,
   crates/apr-cli/src/tokenize_commands.rs,
   crates/apr-cli/src/dispatch_analysis.rs)
  * Gated `#[cfg(feature = "training")]`
  * Writes `shard-NNNNN.bin` (u32 LE) + `manifest.json` (schema
    `pretokenize-bin-v1`)
  * Flags: --corpus --tokenizer --output --shard-tokens
    --content-field --normalization --eos-policy
  * EOS lookup order: `</s>`, `<|endoftext|>`, `<eos>`, `<|eos|>`
  * "between" policy fix: emit EOS BEFORE each doc except the
    first (N-1 separators for N docs)

- `tests/pretokenize_shard_roundtrip.rs`
  * `cli_shard_layout_is_read_by_shard_batch_iter`
    — INV-PRETOK-002 + INV-PRETOK-007
  * `multi_shard_names_preserve_order` — INV-PRETOK-004

- `evidence/ship-two-001/pretokenize-bin-v1-partial-discharge.json`
  documents algorithm-level partial discharge.
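The two loader checks listed for `from_vocab_merges` (id collisions and orphan merges) can be sketched as below. This is not the real loader: `check_vocab_merges` and `LoadError` are hypothetical, HEX decoding is omitted, and "orphan merge" is assumed to mean a merge whose left part, right part, or concatenated result is missing from the vocab.

```rust
use std::collections::HashMap;

/// Hypothetical error type for the two checks.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    IdCollision { id: u32, first: String, second: String },
    OrphanMerge { left: String, right: String },
}

/// Validate already-decoded (token, id) pairs and (left, right) merges.
pub fn check_vocab_merges(
    vocab: &[(String, u32)],
    merges: &[(String, String)],
) -> Result<HashMap<String, u32>, LoadError> {
    let mut by_id: HashMap<u32, &str> = HashMap::new();
    let mut by_token: HashMap<String, u32> = HashMap::new();

    for (token, id) in vocab {
        // Two different tokens claiming the same id is a collision.
        if let Some(first) = by_id.insert(*id, token.as_str()) {
            if first != token.as_str() {
                return Err(LoadError::IdCollision {
                    id: *id,
                    first: first.to_string(),
                    second: token.clone(),
                });
            }
        }
        by_token.insert(token.clone(), *id);
    }

    for (left, right) in merges {
        // Assumed definition of "orphan": any of the three tokens is unknown.
        let merged = format!("{}{}", left, right);
        let known = by_token.contains_key(left.as_str())
            && by_token.contains_key(right.as_str())
            && by_token.contains_key(merged.as_str());
        if !known {
            return Err(LoadError::OrphanMerge {
                left: left.clone(),
                right: right.clone(),
            });
        }
    }
    Ok(by_token)
}
```

Failing fast on the first violation keeps the error specific; a real loader might instead collect all violations before reporting.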

Manual dogfood: 5-doc fixture → 78 tokens / 1 shard / 312 bytes /
4 EOS separators (N-1 for between-policy) / EOS id = 2 (`</s>`).
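The dogfood numbers follow from the shard format: 4 bytes per `u32` token, so 78 tokens give 312 bytes, and 5 docs under the "between" policy give 4 separators. A minimal sketch under stated assumptions: the function names are hypothetical, and the shard index is assumed to be zero-padded to five digits to match `shard-NNNNN.bin`.

```rust
/// Concatenate tokenised docs, emitting `eos` before every doc except the
/// first ("between" policy: N-1 separators for N docs).
pub fn join_between(docs: &[Vec<u32>], eos: u32) -> Vec<u32> {
    let mut out = Vec::new();
    for (i, doc) in docs.iter().enumerate() {
        if i > 0 {
            out.push(eos);
        }
        out.extend_from_slice(doc);
    }
    out
}

/// Serialise tokens as little-endian u32, four bytes per token.
pub fn to_le_bytes(tokens: &[u32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(tokens.len() * 4);
    for t in tokens {
        bytes.extend_from_slice(&t.to_le_bytes());
    }
    bytes
}

/// Zero-padded shard file name, so lexicographic order equals numeric order
/// for indices below 100000.
pub fn shard_name(index: usize) -> String {
    format!("shard-{:05}.bin", index)
}
```

For example, five documents totalling 74 content tokens (illustrative lengths, not the real fixture) joined with EOS id 2 produce 78 tokens and a 312-byte payload.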

Next session: wait on task #118 (50257-vocab tokenizer training,
PID 2832743, 79min+) then run `apr tokenize encode-corpus` on
CSN-Python train split and dispatch to lambda-labs RTX 4090.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>