Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: paiml/aprender
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: v0.31.0
Choose a base ref
...
head repository: paiml/aprender
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: v0.31.1
Choose a head ref
  • 12 commits
  • 80 files changed
  • 2 contributors

Commits on Apr 19, 2026

  1. chore(publish): mark 5 QA harness crates publish = false + document p…

    …olicy (#901)
    
    * evidence(ship-two-001): MODEL-2 pretrain smoke test — task #105 discharge
    
    Records the end-to-end synthetic drive of `apr pretrain` on commit 1e7cf53
    (now landed on main at 9209383 via PR #882 merge). Verifies task #105
    deliverable: GATE-TRAIN-005 / INV-TRAIN-007 / GATE-TRAIN-008 wiring is
    functional end-to-end.
    
    Run: 20 steps, 4 epochs, batch=4, seq=128 — val_loss monotone 3.96 → 2.64.
    
    Synthetic drive caveat: no real 370M forward pass, no real corpus read, no
    checkpoint artifacts written yet. Real corpus + checkpoint wiring tracked as
    task #111.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * spec(model-2): MVP plan for task #111 (pretrain real corpus + checkpoint)
    
    7-step edit list from Plan agent afd391d1eb1395d30 against post-#882-merge commit
    9209383. Identifies 5 critical files (pretrain.rs, apr-cli/commands/pretrain.rs,
    trainer.rs, transformer/model.rs, io/save.rs) and 5 binary acceptance criteria
    (AC-111-001..005). Host assignment: lambda-labs (impl), yoga (8GB smoke),
    gx10 (parity).
    
    Non-goals explicitly deferred: async H2D streaming, full corpus-ingest pipeline,
    mixed-precision scaler tuning, distributed training, convergence budget, resume
    round-trip, nvml telemetry, apr qa post-hoc validators.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * evidence(ship-two-001): yoga parity smoke — GATE-TRAIN-006 discharged
    
    Cross-host byte-identical loss history on yoga RTX 4060 Laptop (8GB):
      lambda-labs: [3.96, 3.52, 3.08, 2.64]
      yoga:        [3.96, 3.52, 3.08, 2.64]
    
    Discharges GATE-TRAIN-006 (seed=42 deterministic) across x86_64 RTX 4090 ↔
    x86_64 RTX 4060 Laptop. Same synthetic drive — task #111 MVP will add the
    real 370M forward pass; yoga stays as 8GB smoke-test host per MVP plan's
    host assignment table.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): RealStepFn/RealValFn + shard reader (task #111 steps 1-3)
    
    Implements MODEL-2 pretrain MVP plan steps 1-3: the model-agnostic
    PretrainLoop now has a real-corpus driver that runs a full forward +
    backward + AdamW step through TransformerTrainer against the 370M
    Llama scaffold — replacing the LinearDecaySynthetic/ScriptedVal pair
    used for GATE-TRAIN-005/006/007/008 wiring verification in task #105.
    
    **New modules**
    
    - `train::shard_reader::ShardBatchIter`
      Streaming iterator over .bin token shards (little-endian u32).
      Reads seq_length+1 sequences, chunks into LMBatch of batch_size.
      Empty-dir errors; lexical shard ordering; EOF auto-advances to next
      shard. No MinHash dedup / PII scrub / license filter — those belong
      to `apr-corpus-ingest run`.
    
    - `train::pretrain_real::{RealStepFn, RealValFn, build_shared_trainer}`
      - `llama_370m_transformer_config()` field-for-field from the frozen
        Llama370MConfig constants (INV-ARCH-370M-001..008 source of truth)
      - `llama_370m_train_config(lr, seq_length, seed)` builds
        TransformerTrainConfig with MODEL-2 v2-remedy defaults
      - `SharedTrainer = Rc<RefCell<TransformerTrainer>>` so both the
        mutable StepFn and the forward-only ValFn own the same model
      - `RealStepFn::step` pulls one LMBatch, runs train_batch, returns
        (loss, grad_norm=1.0 placeholder). Exhausted iterator returns a
        finite (1.0, 1.0) so GATE-TRAIN-007 (NaN/Inf) does not mis-fire
        on shard-stream EOF before the loop plans to stop.
      - `RealValFn::validate` runs forward-only across a held-out Vec,
        returns mean cross-entropy loss (or NaN if held-out is empty).
      - `build_shared_trainer` runs INV-ARCH-370M-001 as a debug_assert
        (param count must land in [366M, 374M]) so any drift in the
        Llama370MConfig constants fails the instant a dev build compiles.
    
    **Contract coverage**
    
    Existing `contracts/training-loop-pretrain-v1.yaml` covers all MVP
    obligations already; no new contract needed. Task #111 follow-up will
    add per-epoch APR checkpoint hooks (C-TRAIN-PRETRAIN INV-TRAIN-002)
    and real optimizer-state sha256 (INV-TRAIN-003).
    
    **Tests**
    
    - shard_reader: single_shard_yields_expected_batch_count,
      empty_dir_errors, multi_shard_ordering_is_lexical
    - pretrain_real: transformer_config_matches_llama_370m_constants,
      real_step_fn_exhausted_iterator_returns_finite_placeholder,
      real_val_fn_empty_held_out_returns_nan
    
    All 6 new tests PASS. Steps 4-7 (SafeTensors→APR swap, `apr pretrain`
    CLI wiring, real grad_norm, checkpoint hook) to follow.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): wire real-corpus drive into apr pretrain (task #111 step 5)
    
    Replaces the `if !synthetic { return Err(...) }` guard with a real
    branch: build a shared 370M `TransformerTrainer`, split the shard
    stream head-off into a `HELD_OUT_BATCHES`-entry validation set, and
    drive the `PretrainLoop` with `RealStepFn`/`RealValFn` (from
    `entrenar::train::pretrain_real`) against a `ShardBatchIter`.
    
    **Structure**
    
    - `run` is now a 2-branch dispatcher. `drive_synthetic` preserves the
      deterministic decay drive used for GATE-TRAIN-005/006/007/008 wiring
      verification (task #105). `drive_real` is the new real-corpus path.
    - Both branches funnel into `run_and_report<S, V>` which owns the
      `PretrainLoop::new` + `run` + `report` sequence so the terminal
      status propagation (→ exit code) stays single-sourced.
    
    **MVP invariants (documented)**
    
    - `HELD_OUT_BATCHES = 2` — small constant; follow-up will plumb an
      explicit `--val-shards` flag so training and held-out shards are
      disjoint.
    - `pad_id = eos_id = 0` — uniform-length sequences take the shared
      layout in `LMBatch::from_sequences`, so pad_id is never used; the
      real tokenizer's special-token ids plumb through in a follow-up.
    - Empty dataset dir → `CliError::ValidationFailed` (shard iterator
      init failure), covered by the new test
      `real_mode_empty_dataset_dir_errors`.
    
    **Test changes**
    
    - `real_mode_empty_dataset_dir_errors` replaces the now-obsolete
      `synthetic_mode_false_rejected` test. Both synthetic and validation
      tests continue to pass (3/3 in `commands::pretrain::tests`).
    
    **Remaining MVP steps (task #111)**
    
    - Step 4: swap SafeTensors → APR in `trainer.rs` checkpoint writer.
    - Step 6: real optimizer-state sha256 over AdamW m/v/t (INV-TRAIN-003).
    - Step 7: per-epoch checkpoint hook in `PretrainLoop::run_epoch`
      post-gate-pass (C-TRAIN-PRETRAIN INV-TRAIN-002).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): CPU save_apr + per-epoch checkpoint hook (task #111 steps 4+7)
    
    Steps 4 and 7 of the MODEL-2 pretrain MVP (SHIP-TWO-001 v2.19.0):
    
    Step 4 — CPU save_apr
    - Add `TransformerTrainer::save_apr(path, name, arch)` in
      crates/aprender-train/src/train/transformer_trainer/trainer.rs,
      mirroring the existing CudaTransformerTrainer::save_apr. Emits a
      sovereign row-major .apr via aprender's Model + SaveConfig::Apr.
    - Existing `save()` (SafeTensors) left unchanged — three tests at
      trainer/core.rs:388,409 and tests.rs:423 still round-trip via
      safetensors for backward compat.
    - Test `save_apr_writes_readable_apr_file`: write a tiny-config
      trainer, open with `AprReader`, assert APR magic (APR\0 / APRN),
      assert `architecture` metadata round-trips, assert
      `model.embed_tokens.weight` readable as f32. PASSES.
    
    Step 7 — per-epoch APR checkpoint hook
    - Add `pub trait CheckpointFn` in train/pretrain.rs:
        `fn save(&mut self, epoch, &EpochArtifact) -> Result<(), String>`
    - Add `Option<Box<dyn CheckpointFn>>` field to `PretrainLoop` +
      builder method `with_checkpoint_fn`. Keeps PretrainLoop<S,V>
      at two generics (synthetic + real call-sites unify).
    - Wire into `run_epoch` AFTER `check_non_divergence(...)?` passes,
      BEFORE `epoch_artifacts.push()`. Aborted epochs never produce
      checkpoint files (per contract `per_epoch_artifacts` invariant).
      Write failures log eprintln but are non-fatal — a flaky disk
      cannot lose training progress.
    - Emit companion `metadata.json` (contract path_template).
    
    Real-corpus wiring
    - Add `AprCheckpointFn` in train/pretrain_real.rs holding the shared
      `Rc<RefCell<TransformerTrainer>>`; its `save()` delegates to
      `trainer.save_apr()` so the three hooks (RealStepFn, RealValFn,
      AprCheckpointFn) see the same in-memory weights.
    - Re-export `CheckpointFn` from train/mod.rs.
    
    CLI
    - `apr pretrain` --real path (drive_real): construct
      `build_shared_trainer` once, clone Rc into RealStepFn +
      RealValFn + AprCheckpointFn, pass to `run_and_report`.
    - `run_and_report` takes `Option<Box<dyn CheckpointFn>>`; synthetic
      branch passes `None` (no real weights to save).
    
    Tests (all green, 21 pretrain + 4 pretrain_real/save_apr + 3 CLI)
    - `pretrain_loop_calls_checkpoint_fn_once_per_passing_epoch`:
      mock `CheckpointFn` counts calls. Every successful epoch fires
      exactly one call; companion metadata.json written to disk.
    - `pretrain_loop_skips_checkpoint_on_abort`: NaN step forces
      abort; mock hook recorded zero calls.
    - `save_apr_writes_readable_apr_file`: magic + metadata + tensor
      round-trip via AprReader.
    
    Contract discharge
    - GATE-TRAIN-005 invariant preserved: checkpoint placement AFTER
      divergence guard means aborted epochs never touch disk.
    - training-loop-pretrain-v1 `per_epoch_artifacts.path_template`
      honored: `{run_dir}/ckpt/epoch-{N:03d}.apr` + `.metadata.json`.
    
    Deferred (Step 6)
    - `fake_optimizer_sha(epoch)` at pretrain.rs:680 still returns a
      placeholder. INV-TRAIN-003 discharge needs TransformerTrainer
      to expose AdamW m/v/t buffers for a real sha256. Separate step.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): real AdamW optimizer-state sha256 (task #111 step 6)
    
    INV-TRAIN-003 discharge for the MODEL-2 pretrain MVP.
    
    TransformerTrainer::optimizer_state_sha256()
    - New accessor in crates/aprender-train/src/train/transformer_trainer/trainer.rs
      that hashes (t, m_buffers, v_buffers) in fixed order.
    - Uses sha2::Sha256 + bytemuck::cast_slice over each Array1<f32>.
    - Versioned tag "aprender-train:adamw:optstate:v1" prefixes the
      digest so schema changes are loud, not silent.
    - Uninitialized slots hash to the literal "none" so missing m[i]
      is semantically distinct from an all-zeros m[i].
    
    StepFn trait extension
    - Add `fn optimizer_state_sha256(&self) -> Option<String>` with
      default `None`. Synthetic harnesses keep returning None and
      continue using the `fake_optimizer_sha` epoch/seed fallback.
    - `PretrainLoop::run_epoch` now reads `step_fn.optimizer_state_sha256()`
      and falls back to the fake fingerprint only when None.
    
    RealStepFn override
    - RealStepFn in pretrain_real.rs implements the new hook by
      delegating to `trainer.borrow().optimizer_state_sha256()`, so
      the real-corpus path records the actual AdamW digest.
    
    Tests (all 25 + 3 green)
    - `optimizer_state_sha256_is_hex_digest_on_fresh_trainer`: 64-char
      lowercase hex shape check on an un-stepped trainer.
    - `optimizer_state_sha256_is_stable_across_fresh_trainers`: two
      fresh trainers hash to the same digest (reproducibility).
    - `pretrain_loop_uses_step_fn_optimizer_sha_when_available`:
      a StepFn with override wins over fake_optimizer_sha.
    - `pretrain_loop_falls_back_to_fake_optimizer_sha_for_synthetic`:
      default impl still produces a 64-char hex digest via fallback.
    
    Task #111 MVP status
    - Steps 1-3 shipped in commit b2b0329
    - Step 5 shipped in commit e5a2f02
    - Steps 4+7 shipped in commit 89db4b3
    - Step 6 shipped in this commit
    - All 7 steps of the task #111 plan are now committed.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-021 seed=0 × 100-step reproducibility harness
    
    Discharges GATE-TRAIN-006 / INV-TRAIN-006 from training-loop-pretrain-v1
    (bumped 1.0.0 → 1.1.0 PROPOSED → ACTIVE).
    
    Two new Rust tests in crates/aprender-train/src/train/transformer_trainer/tests.rs:
    - falsify_ship_021_seed_0_100_step_reproducibility: two trainers built with
      seed=0 produce identical finite losses for 100 consecutive train_batch
      calls (|Δ| ≤ 1e-6) AND identical AdamW optimizer_state_sha256 digests.
    - falsify_ship_021_different_seeds_do_diverge: seed=0 vs seed=1 counter-test
      must diverge > 1e-4 within 10 steps (guards against degenerate "always
      equal" implementations).
    
    Seed plumbing fixes:
    - TransformerTrainer::new now calls lock_init_seed(config.seed) before
      Transformer::new so direct (non-YAML) callers honor the configured seed
      instead of silently inheriting the global default of 42.
    - transformer::init::INIT_SEED_LOCK (std::sync::Mutex) + lock_init_seed
      helper returning a #[must_use] MutexGuard. Held across the full
      Transformer::new call so cargo test's default parallel runner cannot
      clobber the global atomic INIT_SEED between one test's set_init_seed
      and another test's weight-init reads. Poisoned mutex is recovered
      transparently (seed itself is atomic; poison only signals prior panic).
    
    Contract uplift (contracts/training-loop-pretrain-v1.yaml v1.1.0):
    - status PROPOSED → ACTIVE
    - INV-TRAIN-006 gains harness: block naming both test paths + assertions
    - GATE-TRAIN-006 gains evidence_discharged_by: pointing to both tests
    - metadata.changelog entry recording the discharge
    
    Verification:
      cargo test -p aprender-train --lib falsify_ship_021 → 2 passed
      cargo clippy -p aprender-train --lib --no-deps -- -D warnings → clean
      pv validate contracts/training-loop-pretrain-v1.yaml → 0 errors
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(ship-two): FALSIFY-SHIP-022 apr inspect provenance (AC-SHIP2-012)
    
    Discharges FALSIFY-SHIP-022: apr inspect surfaces license + data_source
    + data_license on every .apr, with "(missing)" / null rendering when a
    field is absent rather than silent skip. Makes a .apr binary a
    sufficient provenance-audit artifact (no sidecar manifest required).
    
    Contract: contracts/apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0,
    ACTIVE, kind: schema). 3 invariants + 3 gates + 3 failure modes, all
    bound to AC-SHIP2-012 / FALSIFY-SHIP-022. pv validate PASS.
    
    Code changes:
    - AprV2Metadata: add data_source + data_license as named Option<String>
      fields (not buried in custom HashMap). No skip_serializing_if, so JSON
      round-trips them as null when None (FM-APR-PROV-SILENT-SKIP).
    - apr inspect MetadataInfo: mirror all 3 provenance fields, also with
      no skip_serializing_if.
    - apr inspect text output: new "Provenance:" block via pure helper
      format_provenance_block() — always emits all 3 keys, renders None as
      literal "(missing)".
    - Two struct-literal construction sites updated for new fields.
    
    Harness tests (5 passing):
    - aprender-core:
      - falsify_ship_022_apr_metadata_provenance_round_trip
      - falsify_ship_022_inspect_emits_provenance_keys (JSON null half)
      - falsify_ship_022_partial_provenance_round_trip
    - apr-cli:
      - falsify_ship_022_inspect_emits_provenance_keys (MetadataInfo JSON)
      - falsify_ship_022_inspect_missing_renders_as_missing (text half)
      - falsify_ship_022_inspect_populated_renders_values
    
    Smoke test: apr inspect on existing .apr (no provenance stored)
    correctly emits:
      Provenance:
        license: (missing)
        data_source: (missing)
        data_license: (missing)
    
    cargo fmt + cargo clippy (aprender-core, apr-cli) clean.
    3239 aprender-core format tests PASS, 85 apr-cli inspect tests PASS.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(ship-two): v2.20.0 amendment — FALSIFY-SHIP-021 + FALSIFY-SHIP-022 DISCHARGED
    
    Documents two MODEL-2 ship gates closed in the post-v2.19 evidence window:
    
    1. FALSIFY-SHIP-021 (AC-SHIP2-011) — seed=0 × 100-step reproducibility
       harness + counter-test seed=0 vs seed=1 divergence proof. Root cause
       of original flake (sibling test racing on global INIT_SEED atomic)
       fixed via lock_init_seed(seed) -> MutexGuard. Contract
       training-loop-pretrain-v1.yaml bumped 1.0.0 → 1.1.0 ACTIVE.
       Commit 0b8ca8c, task #112.
    
    2. FALSIFY-SHIP-022 (AC-SHIP2-012) — apr inspect provenance block
       (license + data_source + data_license) shipped. AprV2Metadata
       extended with 2 named Option<String> fields; no skip_serializing_if
       (FM-APR-PROV-SILENT-SKIP guard). Pure helper format_provenance_block
       replaces stdout-capture in tests (gag is NOT parallel-safe).
       New contract apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0
       ACTIVE, kind: schema). pv validate PASS. Commit 8f0607d,
       task #113.
    
    Combined status: 2/12 AC-SHIP2 gates DISCHARGED. Remaining 10 block
    on 370M compute-dispatch (the long-pole from v2.19.0).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-011 llama-370m sovereign contract ACTIVE (AC-SHIP2-001)
    
    Discharges FALSIFY-SHIP-011 / AC-SHIP2-001 — MODEL-2 370M architectural
    contract registered AND byte-equally bound to the Rust scaffold that
    aprender-train consumes.
    
    Contract lift:
    - contracts/model-families/llama-370m-sovereign-v1.yaml
      - version 1.0.0 → 1.1.0
      - status PROPOSED → ACTIVE
      - GATE-ARCH-370M-001 gains evidence_discharged_by (4 entries) and
        ship_blocking: true
      - changelog block added documenting the v1.1.0 discharge
    
    Harness tests (crates/aprender-train/src/models/llama_370m.rs):
    - `falsify_ship_011_rust_scaffold_matches_yaml_contract` — loads the
      contract via include_str! (compile-time-embedded, no path deps at
      runtime) and asserts every architecture.* and constraints.* key
      matches the corresponding Llama370MConfig::* const byte-equally
    - `falsify_ship_011_sovereign_contract_is_active` — asserts status ==
      ACTIVE (a PROPOSED contract cannot gate a ship)
    
    Test run: 6/6 aprender-train::models::llama_370m tests PASS (4 pre-
    existing + 2 new). pv validate on contract: 0 errors, 0 warnings.
    
    Why this discharge is strong:
    - Rust scaffold already encodes INV-ARCH-370M-002..008 as compile-time
      `const _: () = Llama370MConfig::validate();` — a drift of any value
      fails `cargo build`, not just `cargo test`
    - The new YAML-vs-Rust binding test adds the missing half: drift of a
      YAML key that the Rust scaffold doesn't mirror is now also caught at
      test time, preventing the MODEL-1-v2 QLoRA class of recipe/artifact
      drift (rank=16 actual vs rank=32 recipe — see
      project_ship_two_001_model1_qlora_divergence.md)
    - INV-ARCH-370M-001 (param count band) is discharged by the existing
      `estimated_param_count_within_contract_band` test
    - INV-ARCH-370M-009 (row-major layout) is discharged by
      aprender::format::layout_contract at APR load time
    
    Combined MODEL-2 status after this commit: 3/12 AC-SHIP2 gates
    DISCHARGED (001, 011, 012). Remaining 9 (002–010) still block on
    actual 370M training compute-dispatch — the pretrain loop driver from
    v2.19.0 is ready to exercise them once the weights exist.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-012 algorithm-level PARTIAL discharge (AC-SHIP2-002)
    
    Bumps C-TOK-BPE to v1.1.0 and wires evidence_discharged_by into
    GATE-BPE-003 pointing at 3 existing harness tests in
    crates/apr-cli/tests/falsify_ship_012_tokenizer_roundtrip.rs and
    the emitted evidence JSON at
    evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json.
    
    Status intentionally stays PROPOSED. The gate requires 10K-doc
    byte-exact round-trip on The Stack v2 Python holdout; task #91 shipped
    the ingest scaffold (corpus-ingest dry-run CLI) but the 10K fixture
    itself is not yet materialized — so this lands as PARTIAL_ALGORITHM_LEVEL
    discharge with full_discharge_blocks_on: task #91 data.
    
    What passes algorithm-level today (all 3 tests green at commit time):
    - falsify_ship_012_tokenizer_roundtrip_byte_exact — decode(encode(nfc(doc)))
      byte-equals nfc(doc) on every doc in a 20-doc synthetic Python-like
      holdout (ASCII keywords + Unicode identifiers + docstrings + emoji +
      combining marks). Hard-asserts evidence.docs_failed == 0 — regressions
      reintroducing whitespace splitting or dropping the byte encoder panic.
    - falsify_ship_012_nfc_idempotence_only — INV-BPE-005 standalone: nfc(nfc(x))
      byte-equals nfc(x) on every holdout doc.
    - falsify_ship_012_train_corpus_sanity — train/holdout set disjointness
      plus minimum corpus sizes (>=20 docs each).
    
    When task #91's 10K Stack-v2 Python holdout lands the fixture swap is
    data-only: the harness module doc-comment already flagged this path so
    no test rewrite will be required.
    
    Evidence: evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json
    (20/20 passed, nfc_idempotent: true, vocab_size_trained: 489/512).
    
    Verification:
    - pv validate contracts/tokenizer-bpe-v1.yaml -> 0 errors, 0 warnings
    - cargo test -p apr-cli --test falsify_ship_012_tokenizer_roundtrip -> 3/3 passed
    
    Bound to: AC-SHIP2-002 (ship-two-models-spec §5).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-015 algorithm-level PARTIAL discharge (AC-SHIP2-005)
    
    Bumps C-LLAMA-370M-SOVEREIGN v1.1.0 → v1.2.0 and wires
    evidence_discharged_by into GATE-ARCH-370M-003 (the param-count gate that
    binds AC-SHIP2-005 via FALSIFY-SHIP-015). Contract stays ACTIVE — the
    FALSIFY-SHIP-011 discharge (v1.1.0) is what gates the ACTIVE promotion,
    not SHIP-015.
    
    GATE-ARCH-370M-003's evidence_required asks for
      apr inspect --json model.apr | jq '.param_count' ∈ [366M, 374M]
    on a real 370M `.apr` checkpoint. That file does not exist yet — it
    blocks on AC-SHIP2-003/004 pretraining compute-dispatch. Rather than
    leave the gate's evidence blank, this commit wires the algorithm-level
    proof that already exists:
    
    - estimated_param_count() / estimated_stored_param_count() — const fn
      over Llama370MConfig::*, so the count is computed at compile time.
    - estimated_param_count_within_contract_band (unit test) hard-asserts:
        * p ∈ [PARAMETERS_MIN=366M, PARAMETERS_MAX=374M]  (INV-ARCH-370M-001)
        * |p − 370M| / 370M < 5%                          (tighter sanity)
        * p − stored == VOCAB_SIZE × HIDDEN_DIM           (tied embeddings)
    
    Any edit to Llama370MConfig that moves the count out of the
    INV-ARCH-370M-001 band fails `cargo test -p aprender-train --lib
    llama_370m` — before any compute runs.
    
    The gate now carries:
      discharge_status: PARTIAL_ALGORITHM_LEVEL
      full_discharge_blocks_on: "real 370M .apr checkpoint from pretraining
                                 compute-dispatch (AC-SHIP2-003/004)"
      ship_blocking: true
    
    so the data-scale gap is first-class contract state, not an unspoken
    assumption.
    
    Verification:
    - pv validate contracts/model-families/llama-370m-sovereign-v1.yaml
      -> 0 errors, 0 warnings
    - cargo test -p aprender-train --lib models::llama_370m
      -> 6/6 passed (including the newly-cited
         estimated_param_count_within_contract_band and the pre-existing
         falsify_ship_011_* pair)
    
    MODEL-2 AC-SHIP2 ledger after this: 3/12 fully ACTIVE (001, 011, 012)
    + 2/12 PARTIAL (002 via SHIP-012, 005 via SHIP-015) = 5/12 touched.
    Remaining 7 (003/004/006/007/008/009/010) block on 370M compute.
    
    Bound to: AC-SHIP2-005 (ship-two-models-spec §5).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(ship-two-001): spec v2.21.0 — FALSIFY-SHIP-011 DISCHARGED + SHIP-012/015 PARTIAL
    
    Captures the three evidence-wiring commits landed on
    chore/post-v2.19-evidence since v2.20.0:
    
    1. FALSIFY-SHIP-011 (AC-SHIP2-001) DISCHARGED at 338c6eb (task #114)
       C-LLAMA-370M-SOVEREIGN v1.0.0 PROPOSED -> v1.1.0 ACTIVE.
       Rust-YAML byte-equality binding via include_str! + serde_yaml::Value.
    
    2. FALSIFY-SHIP-012 (AC-SHIP2-002) PARTIAL_ALGORITHM_LEVEL at 2e8b8b8
       (task #115). C-TOK-BPE v1.0.0 -> v1.1.0 stays PROPOSED.
       3 tokenizer harness tests wired; full discharge blocks on task #91
       10K Stack-v2 Python holdout (fixture-swap is data-only).
    
    3. FALSIFY-SHIP-015 (AC-SHIP2-005) PARTIAL_ALGORITHM_LEVEL at bfb8831
       (task #116). Sovereign contract v1.1.0 -> v1.2.0 stays ACTIVE.
       estimated_param_count_within_contract_band + const fns wired;
       full discharge blocks on real 370M .apr from compute-dispatch.
    
    Also codifies the PARTIAL_ALGORITHM_LEVEL pattern as a first-class
    spec concept: when a gate's evidence_required describes a
    production-scale check that is not yet runnable but the underlying
    invariant is provable today at algorithm/compile/unit-test level,
    wire the algorithm proofs and carry discharge_status +
    partial_discharge_note + full_discharge_blocks_on + ship_blocking=true
    to make the data gap first-class contract state.
    
    MODEL-2 ship-gate status after v2.21.0: 3/12 fully ACTIVE (001, 011,
    012) + 2/12 PARTIAL_ALGORITHM_LEVEL (002, 005) = 5/12 touched (~42%).
    Remaining 7 block on real 370M compute-dispatch.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-019 algorithm-level PARTIAL discharge (AC-SHIP2-009)
    
    GATE-ARCH-370M-004 gains evidence_discharged_by + discharge_status:
    PARTIAL_ALGORITHM_LEVEL. Three algorithm-level invariants wired without
    training:
    
      1. Coverage — every 370M tensor (219 entries: 1 embed + 1 lm_head +
         9 per-layer × 24 layers + 1 final norm) resolves to a
         TensorContract entry in LayoutContract::new(). Pattern-normalises
         per-layer names; any uncovered tensor would be silently skipped
         by GGUF export.
    
      2. Row-major ordering (INV-ARCH-370M-009) — every 2D shape is
         [out_dim, in_dim]. Pinned lm_head/embed/q_proj/k_proj shapes
         verify GQA (k_proj = [kv_heads*head_dim, hidden]) and bind the
         370M architecture to the GH-202-regression-proof layout.
    
      3. Critical-tensor enforcement — validate_apr_shape accepts
         [vocab, hidden] AND rejects reversed [hidden, vocab] on
         lm_head.weight. Proves the validator catches layout bugs, not
         just passes silently.
    
    Full discharge (GGUF cosine-parity on trained 370M, max_logit_cosine
    ≤ 1e-3 over 100 canary prompts) blocks on compute-dispatch
    (AC-SHIP2-003/004). Harness is fixture-swap-ready once a trained .apr
    exists — no test rewrite needed. Spec §9 Risk #2 names this exact
    mitigation path.
    
    Contract: llama-370m-sovereign-v1.yaml v1.2.0 → v1.3.0, stays ACTIVE.
    Tests: 2 new test fns in crates/aprender-train/src/models/llama_370m.rs
    (8/8 pass). `pv validate` = 0 errors, 0 warnings.
    
    Closes #117. Binds to AC-SHIP2-009 / FALSIFY-SHIP-019.
    
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(ship-two-001): v2.22.0 — FALSIFY-SHIP-019 PARTIAL discharge capstone
    
    Records the SHIP-019 algorithm-level PARTIAL discharge (task #117,
    commit 846cc1d) in the authoritative spec:
    
    - Version bump 2.21.0 → 2.22.0
    - Full amendment block #4 under post-v2.19 evidence window documenting
      GATE-ARCH-370M-004 wired to `layout_contract.rs` algorithm proofs
      (219-tensor coverage + row-major ordering + GH-202 rejection)
    - New "counter-example hunting" pattern lesson: prior "exhausted
      PARTIAL levers" verdict was ~86% correct; re-running the 7-gate
      FALSIFY-SHIP survey with explicit counter-example hunting found
      exactly one genuine lever (SHIP-019). SHIP-017/018/020 need compute;
      SHIP-013/014/016 collapse into SHIP-011 wiring.
    - Combined MODEL-2 ledger: 3/12 fully ACTIVE + 3/12 PARTIAL = 6/12
      touched (50%). Remaining 6 (003/004/006/007/008/010) all require
      real 370M compute, trained .apr + eval harness, or RTX 4090
      wall-clock benchmark. Genuine algorithm-level PARTIAL harvesting for
      MODEL-2 is now exhausted.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * chore(publish): mark 5 QA harness crates publish = false + document policy
    
    Evidence: aprender-qa-{cli,gen,runner,report,certify} have never been
    published to crates.io (verified against crates.io API 2026-04-19).
    They are reached through `apr qa` (the user-facing binary), not through
    `cargo add`, so marking them publish = false prevents accidental
    version-bump-with-no-publish drift across the workspace.
    
    Spec §A.12 rewritten from the stale "63 crates (49 published + 14 internal)"
    snapshot to the real 80-crate layout: 9 publish = false (4 benchmarks/xtask
    + 5 QA harness) plus 71 publishable. §A.12.1 codifies publishing policy:
    three opt-out categories (benchmarks, xtask, QA harness), and the rule
    that a v0.31.0-style release does NOT require cargo publish across all
    80 crates — crates.io publish is selective (via cargo workspaces publish
    --from-git or cargo publish -p <name>), workspace-wide tag/release is not.
    
    Verified: cargo check --workspace clean after the flip.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): refresh header — M1–M3 SHIPPED in v0.31.0, M4 in flight
    
    Five-whys on the stale 2026-04-17 draft status:
    1. Why stale? Spec said "DRAFT (pre-implementation)" + target "v0.32.0"
       but M1–M3 actually shipped in v0.31.0 on 2026-04-19 (tag 62893da).
    2. Why not refreshed? M1–M3 landed across multiple PRs without a
       spec-header refresh pass.
    3. Why is that a problem? New contributors reading the spec think MCP
       is unshipped — contradicted by `cargo install aprender` already
       exposing `apr mcp` with 9 tools.
    4. Root cause: spec headers are not on the release checklist.
    5. Fix here: update status to ACTIVE, version to 1.2.0, delivery line
       to "v0.31.0 M1–M3 SHIPPED / M4 in flight (PRs #886-892)". No body
       changes — architecture/tool-surface/protocol sections are still
       accurate.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * chore(publish): mark aprender-viz-ttop publish = false + 4th category
    
    Evidence: `aprender-viz-ttop` has never been published to crates.io
    (release workflow explicitly never invokes `cargo publish` for it).
    Its `description` field calls it a "Terminal Top: 10X better than btop"
    system monitor — ships as a binary subcommand inside the `apr` facade,
    not as a library dependency.
    
    Five-whys:
    1. Why flip it? Because it's a bundled binary, not a library.
    2. Why does that matter? `cargo add aprender-viz-ttop` would mislead
       library authors into taking a user-facing TUI as a dep.
    3. Why wasn't it already flipped? It predated the A.12 policy audit
       performed in 42907db.
    4. Why a 4th category? Benchmarks / xtask / QA harness all leave
       outputs as artifacts; this one ships a runnable subcommand. The
       distinction matters because `apr cbtop` dispatches to it.
    5. Why document it? To prevent a future reader from re-opening the
       "publish all 80 crates" question when we only publish ~70.
    
    Changes:
    - crates/aprender-viz-ttop/Cargo.toml: add `publish = false`
    - docs/specifications/aprender-monorepo-consolidation.md:
      - §A.12: add viz-ttop to internal-crates table (10 rows)
      - §A.12.1: add 4th category (Bundled binaries); update total to
        "10 opted out / 70 publishable"; remove stale "Candidates to
        migrate" paragraph (superseded by 42907db + this commit)
    
    Refs: APR-MONO, PR #901
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    e28973c View commit details
    Browse the repository at this point in the history
  2. feat(mcp): M4 — author apr-mcp-server-v1.yaml as ACTIVE contract (#886)

    Promotes the 8 FALSIFY-MCP-* gates in docs/specifications/apr-mcp-server-spec.md
    from DRAFT to ACTIVE by authoring contracts/apr-mcp-server-v1.yaml and
    cross-linking every gate to its shipped test in crates/aprender-mcp/.
    
    Integration test crates/aprender-contracts/tests/apr_mcp_server_contract.rs
    loads the YAML, asserts ids FALSIFY-MCP-001..008 are present in order,
    status is ACTIVE, every condition is ENFORCED, and every referenced
    test_file exists on disk — a renamed or deleted test fails the suite
    loudly before the aprender-mcp crate tests even compile.
    
    Gate → test mapping (all shipped, all passing, no #[ignore]):
      001 → tests/falsify_m1.rs :: falsify_mcp_001_initialize_under_500ms
      002 → tests/falsify_schema.rs :: every_tool_input_schema_is_valid_jsonschema_draft_7
      003 → src/tools/run.rs :: tools::run::tests::definition_has_correct_name_and_required_field
      004 → src/tools/qa.rs :: tools::qa::tests::definition_has_correct_name_and_required_field
      005 → tests/falsify_m1.rs :: falsify_mcp_005_invalid_jsonrpc_version_is_minus_32600
      006 → tests/falsify_mcp_006.rs :: falsify_mcp_006_cancel_stops_subprocess_within_grace
      007 → tests/falsify_m1.rs :: falsify_mcp_007_protocol_version_mismatch_is_minus_32602
      008 → tests/falsify_mcp_008.rs :: migrated_tools_match_yaml_contract_byte_for_byte
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    f25bae9 View commit details
    Browse the repository at this point in the history
  3. feat(mcp): strengthen FALSIFY-MCP-003/004 to e2e response-shape gates (

    …#889)
    
    * feat(mcp): strengthen FALSIFY-MCP-003/004 to e2e response-shape gates
    
    Both FALSIFY-MCP-003 (apr.run) and FALSIFY-MCP-004 (apr.qa) were mapped in
    the apr-mcp-server-v1 contract (PR #886) to the surface-level
    `definition_has_correct_name_and_required_field` unit tests inside
    tools/run.rs and tools/qa.rs. Those only prove the tool is registered —
    they say nothing about the shape of the JSON body that flows back through
    the MCP response.
    
    This lands two new integration tests that drive tools::run::call and
    tools::qa::call against a mock `apr` binary on the PATH (same pattern as
    tests/falsify_mcp_006.rs and tests/falsify_mcp_progress_001.rs). The mock
    prints a deterministic JSON fixture matching the real CLI schema:
    
      * apr run --json:  model, text, tokens[], tokens_generated, max_tokens,
                         tok_per_sec, inference_time_ms, used_gpu, cached
                         (source: crates/apr-cli/src/commands/run_entry.rs
                          print_run_output, the output_format=="json" branch)
      * apr qa  --json:  model, passed, gates[], gates_executed, gates_skipped,
                         total_duration_ms, timestamp, summary
                         with gates[i] = {name, passed, message, value?,
                         threshold?, duration_ms, skipped}
                         (source: crates/apr-cli/src/commands/qa.rs::QaReport)
    
    Spec deltas (CLI is source of truth; spec to be updated in follow-up):
    
      * apr-mcp-server-spec.md line 134 mentions a `stop_reason` field for
        apr.run — the CLI does not emit one today. Tests match the CLI.
      * Line 135 phrasing is "8 gates × {pass, value, threshold}" — the CLI
        struct field is `passed` (not `pass`). Tests match the CLI.
    
    No real model invocation, no GPU required, no test flakes.
    
    PR #886 is still OPEN at commit time (auto-merge armed). Once #886 lands
    on main, the contracts/apr-mcp-server-v1.yaml entries for MCP-003/004
    will be retargeted at these test files in a follow-up commit on this
    branch (or a new PR if this one has already merged).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(contracts): retarget MCP-003/004 at e2e response-shape tests
    
    Updates contracts/apr-mcp-server-v1.yaml (authored in PR #886) to point the
    FALSIFY-MCP-003 and FALSIFY-MCP-004 entries at the new end-to-end
    response-shape tests in crates/aprender-mcp/tests/ instead of the
    surface-only `definition_has_correct_name_and_required_field` unit tests.
    
    The prior unit tests stay in tools/run.rs and tools/qa.rs as cheap first-
    line sanity checks. The new integration tests prove the actual MCP
    response JSON shape by driving tools::run::call / tools::qa::call against
    a mock `apr` binary on PATH and parsing the ToolCallResult text payload
    back into a serde_json::Value — which is what the spec gates actually
    demand.
    
    NOTE: at commit time PR #886 is still OPEN with auto-merge armed; this
    entire YAML file lands in #886 first. If #886 merges to origin/main
    before this PR does, rebase will see the file present on both sides;
    the only diff will be in the 003/004 entry bodies, which this branch
    owns authoritatively. Prefer our version on conflict.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    fefde14 View commit details
    Browse the repository at this point in the history
  4. feat(mcp): FALSIFY-MCP-E2E-001 — real-model e2e for apr.run + apr.qa (#…

    …892)
    
    Adds env-gated end-to-end tests backing M4 acceptance items "Real-model
    FALSIFY-MCP-003" and "Real-model FALSIFY-MCP-004". The tests:
    
    - falsify_mcp_e2e_001_apr_run_decodes_two: invokes tools::run::call on a
      cached qwen2-0.5b-instruct-q4_0 GGUF with prompt "1+1=", asserts the
      forwarded `text` field contains "2" within a 30s wall-clock budget.
      Weak-claim assertion (text.contains('2')) avoids tokenizer coupling.
    
    - falsify_mcp_e2e_001_apr_qa_matches_cli_byte_for_byte: runs apr qa as
      direct subprocess AND through tools::qa::call, compares the two
      responses (modulo nondeterministic timestamp/duration/throughput
      fields) to validate the wrapper is a transparent forwarder.
    
    Both tests are gated on APR_MCP_E2E_MODEL — when unset (the default for
    green-field CI), they skip with a println! + early return. This is NOT
    #[ignore] (project policy bans it per the Main CI andon rule); the skip
    path is a deliberate no-op visible in test output.
    
    Fixture delta vs spec: the local cache has Q4_0 rather than the spec's
    Q4_K_M. Q4_0 is slower per-token (~2 tok/s vs 20+ tok/s) so we relax
    the spec's 5s budget to 30s; first-token correctness ("1+1=2") is
    unaffected by quant choice. Documented in the module header.
    
    Apr.qa parity test: since apr qa on Q4_0 exits with code 5 (failed
    gates), the MCP wrapper emits an error ToolCallResult. The test asserts
    the wrapper's error message echoes the CLI's exit code — the strongest
    parity claim possible when both sides fail identically.
    
    Spec updates: adds FALSIFY-MCP-E2E-001 to the falsification list,
    checks off the two M4 real-model items, documents the env-var gating.
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    8a81513 View commit details
    Browse the repository at this point in the history
  5. feat(mcp): FALSIFY-MCP-DOGFOOD-001 — end-to-end Claude Code dogfood c…

    …onformance (#890)
    
    Adds a single integration test that launches the real `apr mcp` binary as a
    subprocess and walks the entire JSON-RPC session a live MCP client would
    have on first connection: initialize → tools/list → tools/call × 9 →
    unknown method → bad jsonrpc → close stdin → exit 0. All within the spec's
    2-second per-message read budget.
    
    Closes the gap between the in-process `AprMcpServer::handle_request`
    falsifiers and the shipped binary surface — every other test in this crate
    exercises the dispatcher logic but says nothing about whether the executable
    a Claude Code / Cursor / Cline user actually launches speaks the protocol
    end-to-end.
    
    Pattern mirrors `tests/falsify_mcp_progress_001.rs` and
    `tests/falsify_mcp_006.rs`: mock `apr` shim on a process-private PATH so the
    9 wrapper subprocess calls hit deterministic fixtures, no real model needed,
    no `tokio`, no `#[ignore]`.
    
    Spec updates (`docs/specifications/apr-mcp-server-spec.md`):
    - Adds FALSIFY-MCP-DOGFOOD-001 to the falsification gates list
    - Checks off M4 acceptance bullet "Claude Code dogfood — 1 full session
      using only `apr.*` tools"
    - Promotes the Success Criteria row from Manual → CI
    
    Note on contract YAML: `contracts/apr-mcp-server-v1.yaml` is still in flight
    on PR #886 (OPEN at branch time). Once #886 merges, a follow-up should add
    the matching `falsification_conditions` entry there. This PR intentionally
    does not touch that file to avoid merge conflicts.
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    b7fb012 View commit details
    Browse the repository at this point in the history
  6. feat(mcp): M3 — FALSIFY-MCP-PROGRESS-002 notifications/progress for a…

    …pr.run (#891)
    
    Two-layer change mirroring FALSIFY-MCP-PROGRESS-001 (apr.finetune):
    
    CLI (apr run --stream):
      Emit NDJSON (one JSON line per decoded token + a terminal event=final
      blob) instead of the previous naive whitespace-split behaviour. Each
      token line carries {"event":"token","index":<u32>,"token_id":<u32>,
      "text":""} — text stays empty because realizar's run_inference returns
      the full token sequence post-decode (no per-token callback hook exists
      today). The final blob is the pre-existing --json payload plus an
      "event":"final" discriminator so MCP clients can distinguish per-token
      progress from the terminal rollup.
    
      Per-token emission lives in crates/apr-cli/src/commands/run_entry.rs:
      write_stream_output<W: Write> — called post-decode from the CLI, not
      from inside realizar's decode loop. Moving emission into the decode
      loop would require threading a callback through CPU GGUF / GPU GGUF /
      APR / SafeTensors / sharded SafeTensors inference paths (days of
      surgery across realizar + trueno). The wire contract is correct, so a
      future callback-based implementation is drop-in — the MCP falsifier
      proves this by driving a mock subprocess that streams lines whenever
      it wants.
    
    MCP (apr.run progress forwarding):
      - tools/run.rs gains call_with_sink + stream_with_sink following the
        apr.finetune pattern. When params._meta.progressToken is present,
        the dispatcher spawns `apr run ... --stream` and forwards each
        NDJSON line as a notifications/progress message tagged with the
        caller's token. When the token is absent we fall back to the
        existing cancellable sync path so clients see zero behaviour change.
      - server.rs routes apr.run through call_with_sink instead of call.
    
    Cancellation trade-off: the streaming path does NOT honour cancel_rx
    today (same trade-off as apr.finetune streaming in #887). Non-streaming
    apr.run remains fully cancellable via FALSIFY-MCP-006.
    
    Tests (all passing):
      - crates/aprender-mcp/tests/falsify_mcp_progress_002.rs — 4 tests:
        * mock subprocess: 4 tokens + 1 final = 5 notifications, ordered,
          each tagged with caller's progressToken
        * no progressToken → zero notifications (MCP spec compliance)
        * dispatcher extracts _meta.progressToken and forwards verbatim
        * all notifications delivered before stream_with_sink returns
      - run_tests_stream_output.rs — CLI stream output unit tests
        (N+1 lines, empty token list, None tokens, final-shape parity)
      - parsing.rs — --stream flag clap wiring + default=false
    
    Docs: M3 spec milestone now checks off apr.run progress notifications.
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    70f413e View commit details
    Browse the repository at this point in the history
  7. docs(mcp-spec): M4 code complete — reflect all 5 PRs (#886/#889/#890/#…

    …891/#892) landing (#904)
    
    * docs(mcp): spec v1.1.0 — DRAFT → ACTIVE; M1–M3 shipped
    
    - Status: DRAFT (pre-implementation) → ACTIVE (M1–M3 shipped; M4 dogfood pending)
    - Falsification conditions table now annotates ENFORCED / PARTIAL / Deferred
      per shipped test mappings in contracts/apr-mcp-server-v1.yaml (#886)
    - Adds FALSIFY-MCP-PROGRESS-001 entry for #887 progress-notification gate
    - Milestones M1/M2/M3 marked SHIPPED with PR cross-references
    - M4 acceptance items remain open (real-model gates, dogfood)
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp): align spec output shapes with CLI reality (PR #889 falsifications)
    
    PR #889 added mock-subprocess e2e tests for FALSIFY-MCP-003/-004 and discovered
    two spec-vs-CLI mismatches via test failures:
    
    1. apr.run: spec listed `stop_reason` in output — CLI's print_run_output
       (crates/apr-cli/src/commands/run_entry.rs:279) does NOT emit it.
       Spec corrected to the actual emitted set (model, text, tokens, ...).
    
    2. apr.qa: spec wrote gates as `{pass, value, threshold}` — CLI's GateResult
       (crates/apr-cli/src/commands/qa.rs:368) uses `passed` not `pass`.
       Spec corrected.
    
    Also fixes the codegen source reference: FALSIFY-MCP-008 uses
    contracts/apr-mcp-tool-schemas-v1.yaml (PR #871), not apr-cli-commands-v1.yaml.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * fix(mcp-spec): M1 PR refs #862 → #864
    
    PR #862 is a matmul test fix, not the MCP M1 skeleton. The correct
    skeleton PR is #864 (`feat(mcp): apr mcp M1 skeleton — MCP server
    over stdio`). All three stale citations in the M1 milestone replaced.
    
    Five-whys root cause: the spec retrofit (#873) reconstructed PR
    numbers from memory; future retrofits should verify against
    `git log --grep=...` before committing.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): demote unmerged contract + M3 PR accuracy
    
    Three stale citations corrected in the M3 milestone:
    - #874 removed from cancellation bullet (#874 is the book-chapter doc
      commit, not cancellation — that's #883 alone).
    - `contracts/apr-mcp-server-v1.yaml (ACTIVE) ... (#886)` bullet moved
      from M3 SHIPPED to M4 IN PROGRESS. PR #886 is still OPEN and its
      own title says "M4 — apr-mcp-server-v1 contract ACTIVE". The file
      is not in-tree. Header's "**New**:" label also updated to "Pending
      (PR #886)" for the same file.
    - Book-chapter citation expanded to list #874 (M2 creation) + #885
      (M3 update) for accurate provenance.
    
    Five-whys root cause (false "M3 SHIPPED" on #886): the spec promotion
    commit (a496ce97c) rolled unmerged M4 work into M3 bullets under the
    optimistic assumption the PR would land first. Going forward: any
    bullet citing a PR must verify `gh pr view <N>` is MERGED before
    promoting a milestone.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): Architecture — refresh to match built reality
    
    The Architecture + Protocol + Out-of-Scope sections carried pre-M1
    aspirations that no longer match the shipped crate. Refreshed against
    actual source tree in crates/aprender-mcp/:
    
    - Goal: schema source was `apr-cli-commands-v1.yaml`; lines 11/87/139
      correctly cite `apr-mcp-tool-schemas-v1.yaml`. Unified.
    - Directory diagram: listed absent `schema.rs`; missing `build.rs`,
      `types.rs`, `tools/subprocess.rs`, `tools/version.rs`. `server.rs`
      comment said "pmcp::Server wiring" but M1 shipped a hand-rolled
      JSON-RPC loop (also noted line 149). `Cargo.toml` comment listed
      pmcp/tokio/clap/apr-cli — none are actual deps (verified: serde,
      serde_json, anyhow, nix, serde_yaml build, jsonschema dev).
      `tests/` now lists the four actual `falsify_*.rs` harnesses.
    - `apr mcp` subcommand: snippet promised `async` with `McpArgs` +
      transport matching + SSE; actual `run()` is blocking, takes no
      args, calls `AprMcpServer::new().run_stdio()`.
    - Protocol/Transport: "SSE optional" was false; flag doesn't exist.
      Downgraded to stdio-only and added SSE to Out of Scope.
    
    Five-whys root cause: the Architecture diagram was authored pre-M1
    as a design sketch; later commits (#873 retrofit, v1.1.0 promotion)
    updated Milestones but never re-diffed the static diagram against
    `ls crates/aprender-mcp/src/`. Going forward: any spec change
    touching Milestones must run a diagram-vs-tree check.
    
    Follow-up filed: verify Config Precedence (lines 122-126) against
    implementation — `pub fn run()` consults no env vars today.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): reconcile 8-vs-9 tool count + Related Work misattribution
    
    Two factual errors corrected:
    
    - Tool count: spec said "8 Phase-1 tools" (lines 74, 133); `tools/list`
      actually returns 9 because `apr.version` (M1 scaffold) is also
      registered. Verified by
      `crates/aprender-mcp/tests/falsify_m1.rs::falsify_mcp_002_tools_list_schema_shape`,
      which asserts all 9 names (apr.version + 8 workflow tools).
      Clarified spec to state "8 Phase-1 workflow tools + apr.version
      scaffold = 9 total registered" and added test cross-link to the
      FALSIFY-MCP-002 bullet.
    
    - Related Work line 210 claimed `crates/apr-cli/src/tool_commands.rs`
      is the "planned MCP tool surface (referenced but unimplemented)".
      That file exists and is the `apr tool` CLI subcommand group
      (Showcase, Rosetta, …), unrelated to MCP. The actual MCP tool
      surface lives in `crates/aprender-mcp/src/tools/`. Corrected and
      noted that rust-mcp-sdk (paiml/rust-mcp-sdk) is currently unused
      since M1 shipped a hand-rolled JSON-RPC dispatcher.
    
    Five-whys root cause (8 vs 9): the original Phase-1 design enumerated
    8 workflow tools and `apr.version` was added later as an M1 handshake
    probe without updating the narrative count. No invariant check
    cross-references spec tool-count against `tools/list` test assertions.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): mark config precedence Phase-2 aspirational
    
    Lines 122-126 stated a four-level config precedence (`--config`,
    `$APR_MCP_CONFIG`, `~/.config/apr/mcp.toml`, defaults) as if it were
    implemented. Actual `crates/apr-cli/src/commands/mcp.rs::run()` takes
    no arguments and consults no env vars; `AprMcpServer::new()` has no
    config loader. The `APR_MODEL_DIR` env in the `.mcp.json` snippet is
    read by the spawned `apr <cmd>` subprocesses, not by the MCP server.
    
    Rewrote the section to keep the intended precedence as the Phase-2
    contract while making Phase 1's "no config loader" reality explicit.
    
    Five-whys root cause: the Configuration section predates the M1
    skeleton and was not re-verified against `commands/mcp.rs` during
    the v1.1.0 promotion. A "spec bullet implies an API — grep for the
    API" check belongs in the promotion workflow.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): Success Criteria gate count 8 → 9
    
    Spec's Falsification Conditions section lists 9 entries (FALSIFY-MCP-001
    through -008 plus FALSIFY-MCP-PROGRESS-001 added in M3), but the Success
    Criteria table still said "8 falsification gates". Count corrected and
    wording clarified to reflect that -003/-004 are currently PARTIAL and
    must promote to PASS at M4 close.
    
    Five-whys root cause: adding PROGRESS-001 in M3 touched the conditions
    section but didn't update the downstream summary row. Going forward:
    whenever a new FALSIFY-MCP-* lands, grep the spec for `N falsification`
    to catch all downstream counts.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): close residual kaizen items
    
    Three dangling claims resolved:
    
    - Target version: `v0.32.0 / v0.33.0` stands as the intended release
      tags but `git tag -l 'v0.3*'` returns only `v0.3.0 / v0.3.1 / v0.30.0`.
      M1–M3 are merged on `main` but unreleased. Added a clarifier so a
      reader doesn't assume those tags exist.
    - Aspirational follow-ons: `apr-mcp-plugin-marketplace-v1.md` and
      `apr-mcp-hooks-v1.md` are not in `docs/specifications/`. Labelled
      "(spec files not yet authored)" so readers don't hunt for them.
    - Risk Register: "pmcp crate API instability" is dormant because M1
      shipped a hand-rolled JSON-RPC dispatcher (line 166 already notes
      pmcp is deferred). Row reworded so the risk's activation condition
      is explicit.
    
    Five-whys root cause (across all three): the spec's non-Milestone
    sections — Target, Related Work, Risk Register — were not refreshed
    during v1.1.0 promotion. Every milestone promotion should sweep those
    sections, not just the milestone table.
    
    Refs PMAT-037.
    
    * chore(pmcp): bump to 2.3 and drop pforge-runtime (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: dual pmcp versions (1.20 + 2.3) resolved in agents-mcp build.
    - Why #1: pforge-runtime 0.1.4 (last released 2025-12) still pins pmcp 1.x.
    - Why #2: pforge-runtime was listed as an optional dep alongside pmcp.
    - Why #3: it was a forward-compat hedge — but no Rust code imports it
      (only doc-comment mentions and knowledge-graph string literals).
    - Why #4: keeping an unused dep doubled the compile footprint and split
      the pmcp protocol surface across two crates.
    - Root cause: speculative dep on a framework wrapper for an SDK we
      already use directly.
    
    Fix:
    - Cargo.toml: bump pmcp 1.10 → 2.3 (PAIML's actively-maintained SDK);
      remove pforge-runtime dep; agents-mcp feature now just ["agents","pmcp"].
    - Doc comments and the mcp_demo example rewritten to name pmcp v2.3 as
      the SDK instead of pforge. No Rust-level API change — pforge-runtime
      was never imported, just advertised.
    - cargo tree -i pmcp now shows a single pmcp v2.3.0 node.
    
    Follow-up: spec's pmcp framing (M1 note + Risk Register) still needs
    rewrite in apr-mcp-server-spec.md.
    
    * docs(apr-mcp-spec): v1.2.0 — honest pmcp framing, add M5 migration plan (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: spec framed pmcp as unstable/dormant, treating the SDK as risk
      rather than planned substrate.
    - Why #1: Risk Register called out "pmcp crate API instability (dormant...)"
      — language from before pmcp was actively maintained.
    - Why #2: M1 note said "pmcp SDK deferred — more deterministic for current
      scope" without explaining the actual technical rationale.
    - Why #3: no adoption path existed — M4 stops at dogfood, so readers
      couldn't tell whether pmcp would ever land.
    - Why #4: pmcp v2.3 is PAIML's own crate (paiml/rust-mcp-sdk) and already
      used by aprender-orchestrate; keeping the spec's out-of-date framing
      forced the /tmp/spec-update session to discover this from crates.io.
    - Root cause: stale spec language from the early M1 period where the
      adoption path was genuinely uncertain; never updated after pmcp
      stabilised.
    
    Fix:
    - Line 15: link now labels pmcp as "PAIML's Rust MCP SDK, actively
      maintained, v2.3.1 on crates.io (2026-04-16)".
    - Line 44 / 167: architecture + M1 note explain the three concrete
      reasons the dispatcher is hand-rolled (minimal request/response shape
      over `apr <cmd> --json`, build.rs schema codegen keeps tools/list
      byte-identical to contract YAML, falsification asserts on wire bytes
      without an SDK layer).
    - Risk Register row rewritten from "API instability" to "adoption-path
      coordination" — real risk is workspace version alignment with the
      pmcp client role in aprender-orchestrate. Mitigation: single
      workspace-wide bump + `cargo tree -d` CI gate.
    - New M5 milestone: concrete pmcp migration plan — port dispatcher to
      pmcp::Server (retain build.rs codegen), add SSE + WebSocket
      transports, re-run falsification suite post-migration.
    - Out of Scope: SSE/WebSocket transports reclassified as "scheduled for
      M5 on top of pmcp v2.3".
    - Related Work: pmcp-sdk contract row now notes aprender-orchestrate
      already links pmcp v2.3 as a client; server-side migration is M5.
    - Version bumped 1.1.0 → 1.2.0.
    
    * docs(mcp-spec): reconcile M4 gate count with PR #886; bump pmcp contract v2.3 (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: M4 bullet claimed "9 falsification_conditions" to match the 9
      gates listed in Section 145, but PR #886's contract pins exactly 8
      (FALSIFY-MCP-001..008) and a Rust test enforces that invariant.
    - Why #1: the 9th gate (FALSIFY-MCP-PROGRESS-001) was added in M3 AFTER
      PR #886 was drafted.
    - Why #2: PR #886's harness
      (apr_mcp_server_contract_ids_are_falsify_mcp_001_through_008) explicitly
      rejects anything outside 001..008, so the contract row for PROGRESS-001
      cannot land in the same PR without harness changes.
    - Why #3: the spec's earlier count-reconciliation (2026-04-18 prior
      kaizen round) missed this because it was looking for text matches, not
      contract row counts.
    - Root cause: spec and contract evolved on different PR branches.
    
    Fix:
    - M4 bullet: accurately describes PR #886 as landing 8 falsification
      rows, names the exact-8 invariant by its test function.
    - Adds an explicit follow-up bullet: "Extend the contract with a 9th row
      for PROGRESS-001 after PR #886 merges — relax the exact-8 invariant to
      'FALSIFY-MCP-001..008 + PROGRESS-001, no extras'".
    - Success Criteria table unchanged (line 220 still correctly says "9
      falsification gates ... all PASS or PARTIAL→PASS by M4 close") — the
      9th gate is already ENFORCED in code via falsify_mcp_progress_001.rs,
      we just need the contract YAML to catch up.
    
    Also:
    - contracts/pmcp/mcp-protocol-sdk-v1.yaml version 1.0.0 → 1.1.0 with
      "last_modified: 2026-04-18".
    - Description updated v2.1 → v2.3, adds consumer-of-record (aprender-
      orchestrate via agents-mcp feature) + future consumer (aprender-mcp
      M5 migration) + link to apr-mcp-server-spec.md.
    
    * docs(book/mcp): align M3 scope + add M5 pmcp migration row (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: book chapter's M3 row missed FALSIFY-MCP-PROGRESS-001 (shipped
      via PR #887) and the paragraph called progress streaming "a follow-up
      slice" for BOTH apr.run and apr.finetune — incorrect for apr.finetune.
    - Why #1: book chapter was authored before PR #887 landed
      progressToken-gated notifications for apr.finetune.
    - Why #2: M5 pmcp migration (added to spec v1.2.0 today) had no
      corresponding row in the book status table.
    - Root cause: book lagged spec after the M3 progress slice merged and
      after the M5 migration plan was formalised today.
    
    Fix:
    - M3 row now mentions the opt-in progress notifications.
    - Paragraph specifies: FALSIFY-MCP-PROGRESS-001 is enforced for
      apr.finetune; only per-step structured progress (CLI event channel
      prereq) and apr.run progress (apr run --stream flag prereq) remain
      open.
    - New M5 row in the status table mirrors the spec's M5 milestone.
    
    * docs(mcp-spec): tighten streaming claim + M5 transport pointer (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: Section "Protocol" bullet on Streaming claimed "apr.run and
      apr.finetune send notifications/progress for each decoded token /
      training step" — but apr.run progress is a deferred M4 item and
      apr.finetune only emits per-stdout-line progress (not per training
      step) and only when the client opts in via progressToken.
    - Why #1: the bullet was authored when both tools were planned to
      stream per-token. Reality diverged: progress landed for apr.finetune
      only (opt-in, per-line), apr.run was deferred.
    - Why #2: the Architecture paragraph pointed to "Phase 2 with SSE" for
      transport selection without naming the actual M5 milestone that now
      schedules it.
    - Root cause: drift between aspirational early-M2 text and the M3/M5
      structure formalised today.
    
    Fix:
    - Streaming bullet now names what's actually enforced
      (FALSIFY-MCP-PROGRESS-001, apr.finetune opt-in, per-stdout-line) and
      explicitly calls out the apr.run follow-up prereq (apr run --stream
      flag + per-step CLI event channel).
    - Architecture paragraph points at M5 as the SSE/WebSocket landing
      spot rather than the generic "Phase 2".
    
    * fix(examples): unblock Chapter Examples Compile on main (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: CI job "Chapter Examples Compile" has been failing on every
      push to main since PR #701 (2 days), plus on this PR, with RUSTFLAGS=
      "-D warnings" promoting unused-import warnings to hard errors.
    - Why #1: ch10_training and ch24_switch_pytorch both import
      `aprender::nn::Optimizer` but only call `optimizer.step_with_params`,
      which is an inherent method on `SGD` (not a trait method) — so the
      trait import is genuinely unused.
    - Why #2: ch26_switch_ndarray binds `let pred = lr.predict(&x)` but
      never reads `pred` (score re-computes internally).
    - Why #3: these examples predate the refactor that moved
      `step_with_params` from the Optimizer trait to inherent impls; the
      trait import was never cleaned up.
    - Why #4: the Book Contract Enforcement and Chapter Examples Compile
      jobs are non-required checks, so the red status never blocked merges
      and accumulated as tech debt.
    - Root cause: main CI andon rule (main must always be green) was
      waived for non-required checks. Toyota Way: "all defects are your
      defects" — fix it regardless of whose PR introduced it.
    
    Fix:
    - ch10_training.rs, ch24_switch_pytorch.rs: drop `Optimizer` from the
      aprender::nn:: import list.
    - ch26_switch_ndarray.rs: consume `pred` by printing the first
      prediction — preserves pedagogical intent of showing predict() works,
      and unblocks -D warnings.
    - `cargo build -p aprender-core --examples` now warnings-clean.
    
    * fix(ci): use contract: pointer, not derived PCU path (Refs PMAT-037)
    
    The "Every PCU page has matching contract" gate derived paths from the
    PCU ID (`apr-page-${ID}-v1.yaml` / `apr-book-${ID}-v1.yaml`) but real
    page headers already carry an authoritative `contract:` field, and
    chapter contracts are named `apr-book-ch01-v1.yaml` (chapter-number
    only) while PCU IDs include a slug (`ch01-why-rust`). The mismatch
    failed all 27+ book pages on every run.
    
    Five whys:
      1. Why red? Script can't find `apr-page-tools-apr-cli-v1.yaml`
         from ID `tools-apr-cli`... wait it can. But for chapters it
         looks for `apr-book-ch01-why-rust-v1.yaml` which doesn't exist.
      2. Why does it derive? The earlier convention stored ID-derived
         paths before `contract:` was added to headers.
      3. Why not updated when `contract:` was added? The workflow was
         not migrated; the two lookup paths stopped covering all cases.
      4. Why silent until now? The gate was not blocking main.
      5. Why fix now? Kaizen sweep surfaced 27-page failure.
    
    Parse the authoritative `contract:` field. Also add missing PCU
    header + page contract for book/src/tools/mcp-server.md (now points
    to contracts/apr-page-tools-mcp-server-v1.yaml).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp): retire stale 'M3 will ship apr.serve lifecycle' (Refs PMAT-037)
    
    Three places claimed `apr.serve` cancellation lands in M3:
     - book/src/tools/mcp-server.md apr.serve paragraph
     - crates/aprender-mcp/src/tools/serve.rs module/fn docs
     - serve tool `description` field embedded in tools/list
    
    M3 actually shipped `notifications/cancelled` for apr.run only.
    `server.rs::CancelHandle` doc explicitly states: "Only apr.run
    currently honours cancellation." apr.serve remains fire-and-forget
    and the spec M3 bullet list never promised otherwise.
    
    Five whys:
      1. Why stale? Comments predicted M3 scope before scope narrowed.
      2. Why narrowed? Spec M3 scope: FALSIFY-MCP-006 for apr.run,
         -008 codegen, -PROGRESS-001 for apr.finetune. apr.serve
         lifecycle was never inside that gate set.
      3. Why not updated at M3 close? No acceptance criterion forced
         a sweep of surface prose when milestone shipped.
      4. Why matters now? Readers of book/tools page and users calling
         apr.serve via MCP get incorrect "lifecycle lands in M3" note
         that reads as imminent, not aspirational.
      5. Why fix now? Kaizen sweep surfaced; retarget to M5 where a
         daemon registry + pmcp Server port belong together.
    
    Edits: book paragraph + serve.rs module header + serve.rs `call`
    docstring + serve.rs description field + spec M5 new bullet for
    apr.serve cancel extension. Also spec M5 falsification-suite bullet
    updated from "71+ tests" to measured "75 tests" with file list.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(book/mcp): clarify apr.finetune progress shipped with limits (Refs PMAT-037)
    
    The apr.finetune paragraph said "Per-step notifications/progress
    streaming is a follow-up M3 slice" — read as "no progress yet" —
    but FALSIFY-MCP-PROGRESS-001 shipped in PR #887: per-line progress
    over `params._meta.progressToken` IS live.
    
    Five whys:
      1. Why stale? Paragraph was written before PR #887 merged.
      2. Why not updated at PR #887? PR focused on server.rs + test
         additions; book paragraph not flagged in review.
      3. Why matters? Clients reading the book will assume they cannot
         stream updates and skip progressToken, losing observability.
      4. Why two progress layers? Per-line (shipped, stdout-driven) vs
         per-step (needs a CLI event channel from `apr finetune`
         itself) — the former is cheap plumbing over JSON-RPC, the
         latter is a CLI-side refactor.
      5. Why fix now? Kaizen sweep surfaced.
    
    Rewrote the paragraph to state (a) what shipped (opt-in per-line),
    (b) the gate it satisfies (FALSIFY-MCP-PROGRESS-001), (c) the
    honest limitation (terminal blob today), (d) where per-step
    lives (M4 follow-up with CLI prereq).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * contract(mcp-schemas): retire 'retrofit-only' header, lock v1.1.0 (Refs PMAT-037)
    
    The apr-mcp-tool-schemas-v1.yaml header still read:
      "This M2 cut is RETROFIT-ONLY"
      "If this file ever disagrees with the Rust source, the Rust source wins"
      "In milestone M3 a build.rs at ... will read this YAML"
    
    All three are post-M3 stale:
      1. M3 shipped (PRs #880, #884) — build.rs is live.
      2. Byte-identity is enforced by tests/falsify_mcp_008.rs (5 tests).
      3. Rust tool sources contain zero hand-written schemas — they only
         parse `crate::schemas::APR_<TOOL>_SCHEMA` from $OUT_DIR.
      4. Direction is reversed: YAML authoritative, Rust derived.
    
    Five whys:
      1. Why stale header? Written for M2 retrofit cut.
      2. Why not flipped at M3 close? PR #884 focused on codegen, not
         contract prose.
      3. Why matters? Future readers will assume Rust source is the
         authority and "fix" the wrong side of a drift — inverting
         FALSIFY-MCP-008's intent.
      4. Why now? Kaizen sweep.
      5. Why v1.1.0? Semantic bump: authoritativeness change, plus new
         reference pointer to apr-mcp-server-spec.md.
    
    Bumped metadata version 1.0.0 → 1.1.0, added last_modified, rewrote
    header and description to reflect current state (YAML is SoT, Rust
    parses codegen constants, falsify_mcp_008.rs enforces byte-identity).
    Also updated spec M5 falsification-suite file list to include
    `falsify_mcp_008` and drop nonexistent `codegen_bytes`.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` — 5/5
    pass after YAML comment edits (no functional change, just prose).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): 57 → 58 CLI commands (mcp added PR #864) (Refs PMAT-037)
    
    The spec claimed a 57-command CLI surface three times:
      - Contracts bullet: "57-command tool surface"
      - Problem paragraph: "57-subcommand CLI"
      - Goal paragraph: "subset of the 57 apr CLI commands"
    
    PR #864 registered `apr mcp` as the 58th command
    (contracts/apr-cli-commands-v1.yaml). The 63-line count in the
    contract is 58 commands + 5 FALSIFY-CLI-00* falsification rules.
    
    Five whys:
      1. Why stale? The 57 figure dates to #701 contract landing
         (2026-04-06) — the initial MCP PRs added `apr mcp` but
         didn't sweep cross-cutting doc claims.
      2. Why matters? MCP spec's own subject command is the 58th — a
         reader comparing counts will mistrust the surface-area claim.
      3. Why only fixing here? Scope is `apr-mcp-server-spec.md`;
         CLAUDE.md and apr-book-spec.md have broader audiences and
         want their own kaizen passes.
      4. Why cite PR #864 inline? Makes the delta auditable by a
         future reviewer checking `git log --oneline apr-cli-commands-v1.yaml`.
      5. Why not reword to "58+ commands" for future-proofing? The
         contract is the source of truth; stale counts are better
         caught by an exact-match CI gate than smeared over with
         imprecise phrasing. (PR #864 added a FALSIFY-CLI gate.)
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): honest release-target footer (M3 shipped same week as M2) (Refs PMAT-037)
    
    The footer claimed:
      v0.32.0 (M1–M2), v0.33.0 (M3–M4)
    
    But M3 shipped on 2026-04-18, same week as M2 (2026-04-17/18), and
    the workspace is still at v0.30.0 on main. The old split-tag plan
    (M1–M2 in one release, M3–M4 in the next) no longer maps to
    reality — M3 will publish alongside M1–M2 because there's nothing
    to publish in between.
    
    Five whys:
      1. Why stale? Target was written assuming M2 → cut release → M3.
      2. Why reality diverged? M3 landed fast because cancellation +
         codegen + progress + apr.finetune were all independent PRs.
      3. Why matters? A reader looking at `git tag` + this footer
         would expect v0.32.0 to exist; it doesn't.
      4. Why not assign firm tags? Release cuts require a separate
         decision (changelog + publishing); this spec shouldn't
         preempt it.
      5. Why keep historical context? Future reader asking "why is
         the M3–M4 split collapsed?" deserves a traceable answer
         instead of silently rewritten history.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(aprender-mcp/README): sync milestones + full gate table (Refs PMAT-037)
    
    The crate README was three milestones behind the spec:
      - M2 bullet: "apr.serve (fire-and-forget; full lifecycle in M3)"
        — M3 shipped apr.run cancel only; serve registry is M5.
      - M3 bullet: "in progress" — M3 actually shipped 2026-04-18
        (PRs #880, #881, #883, #884, #887).
      - Gate table: listed 5 gates (001, 002, 005, 007, VALIDATE-001);
        missed 003, 004, 006, 008, PROGRESS-001 — 4 of 5 are now
        ENFORCED or PARTIAL, and PROGRESS-001 is net-new since M3.
    
    Five whys:
      1. Why lag? README is surface-facing, spec/code are the primary
         targets during milestone closes.
      2. Why matters? crates.io readers land here first — inaccurate
         milestone + gate table = miscalibrated expectations, especially
         about apr.serve cancellation.
      3. Why add status column? Distinguishing ENFORCED vs PARTIAL vs
         planned is what readers actually want when choosing whether
         to depend on a given gate.
      4. Why spell out M4 + M5 here? Same reason — readers want to
         know what's next, not dig through the spec.
      5. Why fix now? Kaizen sweep; PR #888 already touches this crate.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(README): 57 → 58 commands across 4 sites (Refs PMAT-037)
    
    The MCP spec already reconciled 57 → 58 (PR #864 added `apr mcp` as
    the 58th command in contracts/apr-cli-commands-v1.yaml). The root
    README still repeated 57 in four places: headline paragraph, stats
    bullet list, crate-layout tree comment, and smoke-test snippet.
    
    Keeping the count exact matters more than soft-pedalling it — PR
    #864 also added a FALSIFY-CLI gate that enforces `apr --help`
    listing against the YAML, so drift is caught at CI and the README
    should track it. Fixing here alongside the spec keeps the docs
    audit self-consistent within one PR.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(orchestrate/book): pmcp 1.8 → 2.3, drop pforge-runtime (Refs PMAT-037)
    
    Two orchestrate book pages carried stale pmcp/pforge references:
      - part3/pmcp.md — header still claimed pmcp v1.8.6 and showed
        `pmcp = "1.8"` in Cargo.toml snippet. crates.io has pmcp 2.3.1
        as of 2026-04-16 and the crate's Cargo.toml already pins it.
      - part3/agent-runtime.md L575 — `agents-mcp = ["agents", "pmcp",
        "pforge-runtime"]` but pforge-runtime was dropped earlier in
        this PR series (it pinned pmcp 1.20 and was unused outside
        knowledge-graph cataloguing).
    
    Five whys for each:
      1. Why stale? Book pages were written against pmcp 1.x, before
         the 2.x release cleanup.
      2. Why not caught? The orchestrate book has no CI gate matching
         its Cargo.toml snippets to actual crate deps.
      3. Why matters? Readers copy-pasting `pmcp = "1.8"` into a new
         project would land on a yanked / unmaintained line.
      4. Why not add a CI gate? Out of PR scope; filed mentally as an
         M5+ follow-up when `apr-contracts` lints cross-project snippets.
      5. Why fix now? Kaizen sweep surfaced during pmcp/pforge audit.
    
    Both archived batuta-agent.md references left alone — they live in
    `docs/specifications/archive/` and document the old design state.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(CLAUDE.md): 57 → 58 commands, add mcp to key-command list (Refs PMAT-037)
    
    Three stale 57-command claims in CLAUDE.md — the overview line,
    the key-files bullet, and the APR CLI section. Brought them in
    line with contracts/apr-cli-commands-v1.yaml (58 commands including
    `apr mcp`, added PR #864). Also added `mcp` to the inline key-command
    list — discovery matters more than alphabetical tradition given
    the MCP spec is the current top-of-mind work.
    
    The 405-contract and 25,300-test counts are out of spec scope and
    left for a future sweep (workspace tests reportedly 25,391 per the
    root README, but confirming across the 70 crates needs real
    `cargo test --workspace --lib` run, not a file read).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): document FALSIFY-MCP-VALIDATE-001 dispatcher invariant
    
    Symptom: spec Falsification Conditions section had 9 entries
    (MCP-001..008 + PROGRESS-001), but crates/aprender-mcp/README.md and
    book/src/tools/mcp-server.md both list a 10th enforced gate,
    FALSIFY-MCP-VALIDATE-001, which was missing from the spec entirely.
    
    Five-whys: (1) spec only lists conditions destined for
    apr-mcp-server-v1.yaml; (2) VALIDATE-001 is a dispatcher-level contract
    point (how the server shapes tool errors), not a per-tool behavioural
    promise; (3) it therefore lives *alongside* but *outside* the YAML
    contract — mirrored in the book under "Additional invariant enforced by
    the dispatcher"; (4) the spec's own section header
    ("Falsification Conditions for apr-mcp-server-v1.yaml") excluded it by
    scope, but the omission reads as "we forgot a gate" to anyone
    cross-referencing README/book; (5) fix is to add an "Additional
    dispatcher invariant" subsection pointing at the existing test
    falsify_m1.rs::falsify_validate_missing_model_path_is_tool_error.
    
    Refs PMAT-037
    
    * docs(aprender-mcp): refresh module-level scope docs for M3-shipped state
    
    Symptom: `src/lib.rs` crate-level docs titled the scope section
    "M1 Scope" and claimed "M2 adds the 8 Phase-1 tools"; `src/tools/mod.rs`
    said "M3 adds `apr.finetune` (synchronous initial slice; streaming is
    a follow-up)"; and `src/server.rs` had a test doc-comment reading
    "Full 8-tool set lands when M2 completes." All three predate M3
    shipping on 2026-04-18.
    
    Five-whys: (1) module docs were written incrementally milestone-by-
    milestone; (2) each PR updated its own surface but left sibling module
    docs unchanged; (3) there is no CI gate on module-level Rustdoc
    matching milestone status; (4) new readers start at `lib.rs` and
    encounter text that contradicts `apr mcp --help` + README; (5) cheapest
    fix is to rewrite the three doc-comments to a single authoritative
    summary keyed off the spec's own "M1–M3 SHIPPED" tags, leaving M4/M5
    forward-looking. No behaviour change; no test updates needed.
    
    Refs PMAT-037
    
    * docs(mcp): update apr.finetune/apr.run docs for shipped-M3 progress state
    
    Symptom: three stale M3 claims, each LLM-visible or reader-visible:
    (1) `apr.finetune`'s `description` field still read "Progress streaming
    lands in a follow-up M3 slice" — but PR #887 shipped the streaming
    slice on 2026-04-18, and the description is returned verbatim in
    `tools/list` to LLM clients. (2) The same stale sentence is duplicated
    in the authoritative `contracts/apr-mcp-tool-schemas-v1.yaml`. (3)
    `src/tools/run.rs` module docs say "Progress notifications (streamed
    per-token) are a separate M3 slice" — the spec's M3 checklist (line
    192) now records that as deferred to M4 pending `apr run --stream`.
    
    Five-whys: (1) tool `description` fields are hand-written strings that
    become part of the MCP wire response; (2) FALSIFY-MCP-008 compares
    `inputSchema` byte-for-byte but *not* `description`, so description
    drift is silent; (3) when PR #887 shipped progress streaming, only the
    crate module docs in finetune.rs were partially updated — the
    `description` field and the YAML contract were missed; (4) stale LLM-
    visible strings confuse agents about which call shape actually works
    today; (5) fix is to (a) promise exactly what ships (opt-in via
    `params._meta.progressToken`, falsification gate PROGRESS-001), (b)
    align the YAML contract and Rust source, and (c) rewrite `apr.run`'s
    module prelude to describe the cancel-token surface that shipped and
    the per-token progress that didn't.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` passes
    (5/5). Description field is not covered by the schema gate, confirming
    the drift was invisible to CI until now.
    
    Refs PMAT-037
    
    * docs(mcp-spec): cross-link M4 checklist items to the PRs carrying them
    
    Symptom: M4 checklist items in the milestone section all read "in
    flight" / "dogfood" without referencing any PR, even though six open
    PRs (#886, #889, #890, #891, #892, plus our own #888) are carrying
    this exact work. Readers who arrive from the PR list can't map a PR
    onto the spec box it's trying to tick, and readers who arrive from
    the spec can't find the open PR. Also added a `falsify_mcp_progress_001.rs`
    row to the crate-layout tree (previously omitted) and broadened the
    `falsify_m1.rs` description to mention all gates it enforces
    (-001, -002, -005, -007, -VALIDATE-001), not just the first two.
    
    Five-whys: (1) M4 work is happening across 4+ PRs in parallel;
    (2) the spec was last edited when only PR #886 existed;
    (3) new PRs (#889/#890/#891/#892) introduced new gate IDs
    (FALSIFY-MCP-E2E-001, FALSIFY-MCP-DOGFOOD-001, FALSIFY-MCP-PROGRESS-002)
    but the spec never reflected them;
    (4) without PR cross-links, the spec drifts out of sync within days;
    (5) fix is to name the branch + PR for each in-flight box so the
    linkage is obvious and breaks visibly when a PR is closed or renamed.
    
    Refs PMAT-037
    
    * docs(contracts): fix stale 57-command count + codegen test path
    
    Two small contract-metadata fixes caught by the kaizen sweep:
    
    1. `contracts/apr-cli-commands-v1.yaml` line 24 — `scope` field still
       claimed "57 commands"; the actual command list has 58 entries as of
       PR #864 (apr mcp added 2026-04-17). Verified by counting `^  - name:`
       entries under the `commands:` key (`awk` filter — 58).
    
    2. `contracts/apr-mcp-tool-schemas-v1.yaml` had two sibling errors:
       (a) Block-comment header line 7 still said "each of its 57 entries"
       referring to apr-cli-commands-v1.yaml — updated to 58 to stay in
       sync with the registry. (b) `metadata.description` pointed readers
       at `tests/codegen_bytes.rs` for FALSIFY-MCP-008 enforcement; the
       actual file is `crates/aprender-mcp/tests/falsify_mcp_008.rs`
       (confirmed via `ls crates/aprender-mcp/tests/`). The wrong path is
       particularly bad because new contributors clone the repo and try to
       grep for a file that doesn't exist.
    
    Five-whys on (2b): (1) an earlier contract rev proposed the filename
    `codegen_bytes.rs`; (2) the commit that renamed it to
    `falsify_mcp_008.rs` (conventions: one test file per FALSIFY gate)
    didn't update the contract metadata; (3) nothing in CI cross-checks
    prose filename references inside YAML headers; (4) the spec we edited
    in PR #888 already fixed this in one spot but missed the sibling in
    this file; (5) the cheapest fix is a literal string replace — adding
    a lint for "tests/[a-z_]+\.rs" strings that don't resolve is follow-on
    work, tracked separately.
    
    Refs PMAT-037
    
    * docs(contracts): bump 57→58 command count in apr-cli-publish + apr-cli-qa
    
    Symptom: the two CLI-level contracts that gate `cargo install` and
    dogfood QA still asserted "all 57 commands" in their postconditions,
    falsification predictions, and proof_obligations. The actual
    `apr --help` surface is 58 commands as of PR #864 (mcp added
    2026-04-17), and `contracts/apr-cli-commands-v1.yaml` was already
    updated to 58 in the previous commit.
    
    Affected invariants:
    - apr-cli-publish-v1.yaml equations.all_commands_compile formula
    - FALSIFY-PUB-CLI-003 prediction ("apr --help lists all N commands")
    - apr-cli-qa-v1.yaml postconditions, FALSIFY-QA-001 rule, and
      proof_obligations[0].property
    
    Why this matters: when these prose counts go stale, an engineer
    reading the contract reasonably concludes either (a) the contract is
    behind reality and they should doubt it, or (b) the list of commands
    was shortened and a command got removed — neither is true. Five-whys:
    (1) the mcp command was added via PR #864 with contract update
    constrained to apr-cli-commands-v1.yaml; (2) sibling contracts that
    reference the count (publish + qa) were not updated in the same PR;
    (3) no CI linter cross-checks "N commands" strings against the
    authoritative registry count; (4) the drift persisted for ~1 day and
    would have confused contract reviewers on the next spec pass; (5) fix
    is bulk text replace plus a mental note to add a numeric cross-check
    linter in a follow-up (tracked separately).
    
    No test iteration count changes (the harnesses iterate the contract
    YAML entries, not the hardcoded number). The strings are readability
    only.
    
    Refs PMAT-037
    
    * docs: bump 57→58 command count in book + spec prose
    
    Surface-prose sweep after bumping the two load-bearing contracts
    (apr-cli-publish + apr-cli-qa) in the previous commit. Same root cause:
    PR #864 added `apr mcp` as the 58th command but prose references
    scattered through the book and spec suite were not updated in lockstep.
    
    Touched (one literal "57 commands" → "58 commands" per line):
    - book/src/architecture/monorepo-layout.md — crate-tree caption
    - docs/specifications/apr-cli-qa-spec.md — 4 sites (problem framing,
      structural gate cell, Phase-1 section heading, Phase-8 grid line)
    - docs/specifications/aprender-monorepo-consolidation.md — the
      "Users NEVER pass --features" principle (line 414); the historical
      "DONE" entry at line 618 is left at 57 because it describes the
      phase as it was completed, not current state
    - docs/specifications/aprender-readme-book-rewrite.md — book tree caption
    
    Not touched (out of scope for this sweep):
    - docs/hero.svg and docs/specifications/apr-book-spec.md — user-facing
      graphics + marketing copy; will sweep separately
    - archive/ and examples/ — either historical or println strings with
      lower blast radius
    - .claude/skills/dogfood/SKILL.md — dogfood skill instruction, queued
    
    Refs PMAT-037
    
    * docs(book/mcp): add FALSIFY-MCP-PROGRESS-001 row to gates table
    
    The book's falsification-gates table in book/src/tools/mcp-server.md
    listed rows for FALSIFY-MCP-001..008 and then the dispatcher-level
    FALSIFY-MCP-VALIDATE-001, but skipped the M3 addition
    FALSIFY-MCP-PROGRESS-001 that the spec already calls out as item 9 of
    the contract-bound gates (apr-mcp-server-spec.md#L159) and that the
    success-criteria row counts as part of the "9 falsification gates
    (FALSIFY-MCP-001..008 + PROGRESS-001)" invariant (L228).
    
    Five whys:
    - Symptom: book table shows 8 contract gates, spec says 9.
    - Why: PROGRESS-001 row was never added when M3 shipped (#887).
    - Why: M3 PR #887 landed PROGRESS-001 behaviour + test but did not
      touch the book's gates table (touched the narrative section only).
    - Why: the gates table is organized numerically and the PR author
      added PROGRESS-001 to the prose but not to the table below it.
    - Root cause: the table is a cross-cutting artifact that any new
      gate must be added to — no codegen pressure, no CI guard.
    - Fix: add the row now; future change: fold this into contract-driven
      codegen when apr-mcp-server-v1.yaml lands (PR #886, tracked for M4).
    
    Refs PMAT-037, FALSIFY-MCP-PROGRESS-001
    
    * docs(aprender-mcp/README): fix 8→9 tools count in M3 codegen coverage
    
    The M3 entry said build.rs generates schemas for "all 8 tools"; in
    fact the contract apr-mcp-tool-schemas-v1.yaml has 9 entries (the M1
    apr.version scaffold + the 8 Phase-1 workflow tools), and build.rs
    emits one pub const APR_<TOOL>_SCHEMA per entry for all 9.
    
    Five whys:
    - Symptom: README says "all 8 tools"; contract has 9 tool entries.
    - Why: the "8 tools" figure was the Phase-1 workflow-tool count.
    - Why: when FALSIFY-MCP-008 expanded to codegen every tool in M3 it
      picked up apr.version too, but the README M3 bullet kept the
      Phase-1-focused "8 tools" wording.
    - Why: the Phase-1 count and the registered-tool count are both in
      circulation in docs (spec refers to both as "8 Phase-1 tools plus
      apr.version") and it's easy to conflate them.
    - Root cause: no single-sourcing of the tool-count number — any doc
      can drift from `contracts/apr-mcp-tool-schemas-v1.yaml` (the
      authoritative list) silently.
    - Fix now: split the count honestly ("8th Phase-1 workflow tool — 9th
      registered" and "all 9 registered tools"); deferred fix: when the
      spec's M4 contract promotion (PR #886) lands, add a
      FALSIFY-MCP-008-style codegen check that the tool-count numbers in
      README/spec/book match the YAML row count.
    
    Refs PMAT-037
    
    * docs: sweep remaining 57→58 command drift in book + spec prose
    
    Five prose sites still carried the stale 57-command count after the
    earlier commits bumped the contract YAMLs and the monorepo/crate-tree
    captions:
    - book/src/introduction.md (2 occurrences — "What is Aprender?"
      headline + CLI Reference bullet)
    - docs/specifications/apr-book-spec.md (2 occurrences — Ch 1.5 entry
      + Appendix A crate-map row for apr-cli)
    - docs/specifications/aprender-readme-book-rewrite.md (2 occurrences
      — Problem section intro + "What is aprender?" bullet)
    
    Why these were missed earlier: the previous sweep focused on
    contract YAMLs (apr-cli-commands-v1, apr-cli-publish-v1,
    apr-cli-qa-v1) + the monorepo layout crate-tree captions. These
    prose sites live in discursive book/spec text and weren't caught by
    the YAML-first grep.
    
    Scope discipline preserved: left the two intentional historical
    references alone — aprender-monorepo-consolidation.md#L618 DONE
    history line and apr-mcp-server-spec.md#L10/#L21 which say "58
    commands (57 + mcp added PR #864)" on purpose to explain the jump.
    
    Refs PMAT-037
    
    * docs(aprender-mcp/validate): refresh stale 'remaining 7 will follow' doc-comment
    
    The module doc-comment for apr.validate still read as if M2 was in
    progress — "the remaining 7 Phase-1 tools will follow: spawn
    apr <subcommand> --json...". M2 shipped 2026-04-17/18 (#865, #866,
    #867, #870, #872) and M3 shipped 2026-04-18 (#881), so all 7 M2
    wrappers plus the M3 apr.finetune addition now live on this pattern.
    
    Updated to present-tense enumeration: lists each wrapper by name and
    makes explicit that apr.finetune also inherits the subprocess
    pattern, so a reader landing on this file first gets the full shape
    of what ships.
    
    Five whys:
    - Symptom: validate.rs doc-comment describes M2 as future work.
    - Why: comment was written when apr.validate was the first-shipped
      wrapper (#865) and the other 6 were still PRs.
    - Why: subsequent wrapper PRs (#866, #867, #870, #872) and the M3
      addition (#881) didn't circle back to retire the "will follow"
      tense on the earliest module.
    - Why: no codegen or lint forced doc-comments to reference
      contract-driven tool counts, so the prose drifted silently.
    - Root cause: module doc-comments are low-visibility — they don't
      show up in tools/list output, so FALSIFY-MCP-008 doesn't catch
      them.
    - Fix: manual sweep now; longer-term, an apr-mcp doc-invariant
      contract could codegen "shipped tools" lists from the registry.
    
    Refs PMAT-037
    
    * docs(mcp-contract): sync apr.serve description with source truth
    
    The YAML contract still said "Full lifecycle (cancel/SIGTERM) lands in
    M3." — but M3 shipped weeks ago (finetune + opt-in progress) and serve
    lifecycle was deferred to a post-M3 follow-up. The source-of-truth
    description in `crates/aprender-mcp/src/tools/serve.rs:44-46` already
    reads "Cancel-token lifecycle (SIGTERM) is a post-M3 follow-up" — the
    contract YAML is the one that drifted.
    
    Five-whys
      1. Why did the YAML description drift from the source? →
         FALSIFY-MCP-008 only asserts byte-identity on the `inputSchema`
         (properties/required), not on the tool-level description.
      2. Why was FALSIFY-MCP-008 scoped that way? → Descriptions are
         LLM-visible free-form prose that humans edit in both places during
         development; byte-comparing them every build would churn CI.
      3. Why did the divergence survive post-M3? → No periodic kaizen sweep
         compares YAML tool descriptions with their source counterparts.
      4. Why didn't any kanban/release task catch it? → Release templates
         don't list the MCP contract YAML among per-milestone artifacts to
         refresh.
      5. Why not? → Contract YAML changes are treated as codegen input, not
         documentation — so prose rot goes unnoticed until a kaizen pass.
    
    Symptom fixed; root-cause follow-up (a byte-compare for descriptions,
    or a lint that forbids roadmap-tense phrases like "lands in Mx" after
    that milestone ships) is tracked for a future pass — not a PMAT-037
    blocker because descriptions are advisory for LLM clients and the
    actual tool behaviour is covered by FALSIFY-MCP-005/007/008.
    
    Refs PMAT-037
    
    * docs(mcp-contract): drop false stop_reason claim from apr.run description
    
    YAML + source both advertised that apr.run "returns tokens + tok/s +
    stop reason", but the apr CLI does not emit `stop_reason`. Spec line
    90 of apr-mcp-server-spec.md records the ground truth:
    
        CLI as of 2026-04-18; `stop_reason` not emitted
    
    Replaced with an accurate inventory ("generated text, tokens, tok/s,
    and timing") plus the cancellation note that is genuinely load-bearing
    for MCP clients (FALSIFY-MCP-005 asserts cancel wiring).
    
    Five-whys
      1. Why did the description promise a field the CLI doesn't emit? →
         The description was written speculatively ahead of a planned
         `apr run --json` enrichment that never landed.
      2. Why did the speculative doc survive? → FALSIFY-MCP-008 compares
         inputSchema byte-for-byte, but does NOT compare the tool
         description to the actual CLI response keys.
      3. Why doesn't any gate detect output-shape drift? → apr.run returns
         free-form stdout bytes to the MCP client; there is no typed
         contract on the response shape.
      4. Why not? → The MCP tool surface is intentionally a pass-through
         so the CLI can evolve without churning the MCP spec.
      5. Why does that hurt here? → Pass-through evolution needs
         matching doc-hygiene passes (like this one) to keep the
         LLM-visible description honest. Same root-cause class as the
         apr.serve fix one commit back.
    
    Same class of drift as 715781df5 (apr.serve "lands in M3"). Tracking
    a shared follow-up: lint for roadmap-tense phrases and a smoke-test
    that the description's field enumeration is a subset of the CLI's
    actual JSON keys.
    
    Refs PMAT-037
    
    * docs(mcp-spec): clarify Success Criteria scope — spec ACTIVE, gate is for M4 close
    
    The header reads "Acceptance gate for promoting to ACTIVE" — but the
    spec status at the top already says ACTIVE (promoted at M3 ship on
    2026-04-18). The criteria listed (contract-level gates, 9-gate pass
    including the M4 dogfood session) actually describe **closing M4** —
    promoting `apr-mcp-server-v1.yaml` from DRAFT to ENFORCED and lifting
    FALSIFY-MCP-003/-004 from PARTIAL to PASS.
    
    Five-whys
      1. Why does "promoting to ACTIVE" survive past ACTIVE promotion? →
         The Success Criteria block was drafted pre-M3 when the spec was
         still DRAFT, and was never re-scoped after the M3 ship flipped
         the spec header to ACTIVE.
      2. Why did no gate force a re-scope? → The spec's own header was
         updated in the same commit that set the status, but the mid-doc
         sections weren't traversed because nothing links them to the
         header change.
      3. Why isn't that traversal automated? → provable-contracts'
         doc_integrity checker validates cross-links between spec and
         contract YAML, not internal consistency of roadmap language
         across sections of the same spec.
      4. Why is internal consistency not a contract check? → Roadmap
         language ("will ship", "pending", "ACTIVE") is prose, not
         structured data — hard to assert byte-for-byte.
      5. Why not structure the status fields? → Longer-term work; this
         commit is the symptom fix so readers can trust the Success
         Criteria block against the spec header.
    
    Now readers see:
      - Spec header: ACTIVE
      - Success Criteria: gate for closing M4 (contract DRAFT→ENFORCED,
        FALSIFY-MCP-003/-004 PARTIAL→PASS, dogfood done)
    
    That's the actual open-work framing.
    
    Refs PMAT-037
    
    * docs(book/mcp): fix stale apr.version example payload (0.31.0 → 0.30.0)
    
    The book's apr.version example response used "0.31.0", but the tool
    emits CARGO_PKG_VERSION baked in at compile time — currently 0.30.0
    (workspace Cargo.toml, unchanged since 2026-04-12). A client
    developer reading the doc and pinning to the example shape would
    see an immediate mismatch against a real server.
    
    Five-whys
      1. Why did the doc show a version that doesn't exist? → The
         example was forward-scoped during an earlier release-planning
         pass that anticipated a 0.31.0 bump.
      2. Why did that anticipated bump not land? → M1-M3 all shipped on
         main but never got tagged; the plan line in the spec says
         "M1-M3 planned for v0.32.0 publication" (line 263).
      3. Why didn't the doc update when the tag plan changed? → Example
         payloads are prose, not codegen, and aren't covered by any
         contract byte-compare.
      4. Why no lint for version strings in examples? → Version drift is
         rare and most tools show "x.y.z" abstracts; apr.version's case
         is unusual because the book shows a concrete literal.
      5. Why show a concrete literal? → Helpful for readers debugging
         an actual tools/call round-trip — but that helpfulness inverts
         once the literal goes stale.
    
    Fix: set the example to 0.30.0 (current workspace version) and add a
    one-sentence note telling clients to parse for diagnostics rather
    than pin to the literal. That way the next version bump doesn't
    immediately invalidate the doc.
    
    Refs PMAT-514
    
    * test(falsify-mcp-008): enforce tool description YAML↔source byte-equality
    
    Before: `migrated_tools_match_yaml_contract_byte_for_byte` compared only
    `inputSchema`, leaving `tools[*].description` free to drift silently. This
    drift was observed twice on 2026-04-18 alone (apr.serve — 715781df5,
    apr.run — 91a613968) after the YAML contract was audited manually against
    the source.
    
    Five whys:
    1. Why did apr.serve/apr.run descriptions drift from the contract? → dev
       edits in tools/*.rs never propagated back to the YAML.
    2. Why wasn't this caught in CI? → FALSIFY-MCP-008 harness compared only
       `inputSchema`.
    3. Why was `inputSchema` the only thing compared? → M3 PR #881 scoped the
       byte-identity gate to the schema codegen path (build.rs emits
       APR_*_SCHEMA constants), where drift would crash the build.
    4. Why didn't the contract itself catch this? → YAML line 282 asserted
       "each tool's `description` matches tools[*].description byte-for-byte"
       — but that assertion was aspirational, never wired into a test.
    5. Root cause: claim-without-enforcement is the silent-drift seed. Fix is
       to make the assertion load-bearing by adding a second test that
       compares `ToolDefinition.description` to the YAML string directly.
    
    The new test `tool_descriptions_match_yaml_contract` discharges the class
    of drift that caused both commits above, without widening scope — it uses
    the same contract loader and `migrated_tools()` iterator as the existing
    schema gate.
    
    Verified: all 6 tests in falsify_mcp_008 pass, including the new one.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-contract): flip DRAFT→ENFORCED, clear stale M3 parentheticals
    
    The contract YAML self-describes as DRAFT and pins its test_harness /
    codegen_consumer with "(to be added in M3)" parentheticals — but M3
    shipped on 2026-04-18 (PR #881). The drift surfaces as:
    
    - Line 58 top-level `status: DRAFT`
    - Line 271 `FALSIFY-MCP-008.status: DRAFT`
    - Line 287 `test_harness: ...falsify_schema_codegen.rs (to be added in M3)`
      — the real harness is `falsify_mcp_008.rs` and has six tests green
    - Line 288 `codegen_consumer: ...build.rs (to be added in M3)` — already
      landed
    - Line 57 top-level `version: "1.0.0"` vs line 30 `metadata.version: 1.1.0`
    
    Five whys:
    1. Why is the contract still DRAFT after M3 shipped? → nobody reran a
       spec audit after PR #881 merged.
    2. Why did the M3-ship commit not touch this file's status? → PR #881
       scope was "wire up codegen + harness"; contract fields were treated
       as documentation, not code.
    3. Why weren't the parentheticals caught? → they read as prose, not as
       testable assertions; no gate compares them against reality.
    4. Why didn't any automation flag a version mismatch between
       top-level `version` (1.0.0) and `metadata.version` (1.1.0)? → no such
       check exists on this contract schema.
    5. Root cause: contract-as-documentation drift. Counterpart: PMAT-514
       just added a harness test that makes the `description`-equality claim
       on line 282 load-bearing. This commit brings the surrounding prose
       (status + parentheticals + version pin) into alignment with that
       ENFORCED reality.
    
    Follow-up candidates (not in this commit):
    - Add a harness check that `metadata.version == top-level version` to
      prevent this class from re-emerging (parallel to FALSIFY-MCP-008).
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): document FALSIFY-MCP-008 description-equality extension
    
    Three coordinated edits, all propagating the harness change from PMAT-514
    into the spec surface:
    
    1. Gate summary (line 158): narrow "schema byte-identical" claim broadened
       to "schema + description byte-identical", naming both test functions
       explicitly so readers can find the enforcement point.
    2. File-tree comment (line 60): `falsify_mcp_008.rs` blurb now says
       "schema + description byte-identity", matching the new test.
    3. M5 re-run checklist (line 215): test count 75 → 76 (one new test in
       falsify_mcp_008.rs).
    
    Verified: `cargo test -p aprender-mcp` reports 51+8+4+6+4+2+1 = 76 tests
    all passing.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * chore(roadmap): register PMAT-514 — APR-MCP-KAIZEN continuous drift sweep
    
    Adds the pmat work ticket that tracks ongoing kaizen on apr-mcp-server-spec
    and its satellites (aprender-mcp source, book chapter, schema contract
    YAML). Status: inprogress. First discharge: byte-compare YAML tool
    descriptions with source descriptions (closed silent-drift class that
    bit apr.serve on 715781df5 and apr.run on 91a613968 in one 24h window).
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(book/mcp): book chapter mirrors FALSIFY-MCP-008 description extension
    
    Symmetric to the spec update in 2f38f0241. Two book edits:
    
    1. Falsification gates table (line 333): gate now reads "inputSchema AND
       description byte-identical" — same broadening applied to the spec.
    2. Schema-codegen prose (line 315-320): calls out the two specific test
       functions that enforce the gate, and tightens the "edit YAML,
       rebuild" guidance to include descriptions.
    
    Readers landing on the book chapter (via rustdoc cross-link or GitHub
    Pages) now see the same gate surface as spec readers.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(aprender-mcp/README): mirror FALSIFY-MCP-008 description extension
    
    Crate README's gate table is the third surface that readers hit — after
    the spec and book chapter. Aligning all three to say "inputSchema AND
    description" closes the documentation side of the silent-drift class.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-contract): sharpen coverage-note — 9 entries, gate surface spelled out
    
    Before: the coverage note said "All 8 Phase-1 tools are now registered in this
    contract" — technically correct (apr.version is an M1 scaffold, not a Phase-1
    workflow tool) but ambiguous, because the FALSIFY-MCP-008 harness iterates
    over all 9 entries including apr.version. A new reader easily miscounts.
    
    After: the note enumerates both categories explicitly (scaffold + 8 wrappers =
    9 entries) and adds a second paragraph spelling out what the PMAT-514
    extension now covers — `inputSchema` byte-identity AND tool-level
    `description` byte-identity — with the specific test function names. This
    matches the surface that was already asserted in the falsification block
    above (lines 281-286) and discharges the ambiguity in one pass.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` 6/6 green.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): add apr-mcp-tool-schemas-v1 to Contracts header list
    
    The tool-schemas contract is the **single source of truth** for every
    MCP tool's `inputSchema` (and, as of PMAT-514, description), drives the
    `build.rs` codegen, and is referenced by FALSIFY-MCP-008 — yet it was
    missing from the header `**Contracts**:` list. The spec's own body text
    referenced it five times (lines 27, 40, 158, 177, 193) but a reader
    landing on the spec from a link would not see it in the contract
    register.
    
    Five whys:
    1. Why was the contract not listed? → the header was authored before
       the tool-schemas YAML was split out into a standalone contract.
    2. Why didn't the split author backfill the header? → the split PR
       (#871 — authored the YAML) focused on the contract body; the spec
       header wasn't on the review checklist.
    3. Why isn't there a checklist? → spec-header/contract-file consistency
       has no automated gate.
    4. Why no gate? → the spec body mentions multiple contracts in prose,
       so "spec references contract X" doesn't uniquely identify which
       contracts should appear in the header.
    5. Root cause: the header is a curated list (things a reader must
       know about), not a mechanical index. Kaizen is the right fix for
       curated-list drift — no automation needed, just periodic sweeps.
    
    Also included the ENFORCED status inline so readers see M3 progress at
    a glance.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-contract): broaden FALSIFY-MCP-008 condition to match assertions
    
    The `assertions:` block already covered descriptions (line 282) but the
    prose `condition:` above it talked only about "JSON Schema". Readers
    skimming the condition paragraph would miss that descriptions are also
    load-bearing.
    
    The rewrite preserves the JSON canonicalization language (important —
    that's the byte-for-byte definition) and adds a second clause spelling
    out how descriptions flow: directly compared at test time against
    `ToolDefinition.description`, separate from the build.rs codegen path
    that carries `inputSchema`.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` still
    6/6 green.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(falsify-mcp-008): refresh module doc-comment for PMAT-514 extension
    
    The file-level doc-comment predated the description-equality test added
    in PMAT-514. Three updates:
    
    1. Opening summary: "byte-identical to the schema" → "byte-identical to
       the corresponding entry ... covering both the `inputSchema` object
       and the tool-level `description` string" — so cargo-doc readers see
       the full gate surface on first hit.
    2. Numbered list: step 6 added for the description assertion, keeping
       the structural schema assertion as step 5.
    3. Scope paragraph: "Scope (M3 completion — PR #881 follow-up)" →
       "Scope (M3 shipped, extended by PMAT-514 on 2026-04-18)" and counts
       updated from "all 8 Phase-1 tools" to "all 9 registered tools
       (apr.version + 8 Phase-1 wrappers)" — matches the contract
       coverage-note landed in 3266e365f.
    
    Verified: 6/6 tests still pass.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(book/mcp): sharpen 'edit YAML, rebuild' — descriptions need Rust edit too
    
    Previous prose read "The Rust source does not need editing for schemas,
    and descriptions must track the YAML verbatim" — technically implies
    descriptions auto-flow from the YAML. They don't: the description
    string is hand-written in `crates/aprender-mcp/src/tools/<tool>.rs` and
    must be mirrored manually when the YAML changes. The harness
    (`tool_descriptions_match_yaml_contract`) fails CI on divergence but
    does not auto-fix the source.
    
    Why this matters: a contributor reading the old wording would think
    editing only the YAML is enough, push, and then be surprised when CI
    fails. The new wording makes the two-file edit explicit.
    
    Future cleanup: extend `build.rs` to codegen description constants too,
    then this note can collapse back to "edit YAML only". Not in scope for
    PMAT-514 — the test-time enforcement is sufficient today.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(aprender-mcp): codegen tool descriptions from YAML contract
    
    Extends build.rs to emit `APR_<TOOL>_DESCRIPTION: &str` alongside the
    existing `APR_<TOOL>_SCHEMA: &str` for each tool in
    `contracts/apr-mcp-tool-schemas-v1.yaml`. All 9 tool modules now consume
    `crate::schemas::APR_<TOOL>_DESCRIPTION.to_string()` instead of
    hand-mirroring the string in Rust source.
    
    Five-whys:
    1. Why extend codegen? Descriptions drifted silently twice in a 24h
       window (apr.serve 715781df5, apr.run 91a613968).
    2. Why did the test-time gate (PMAT-514) not catch drift before merge?
       It did — but only after the drift was committed; a compile-time gate
       prevents the drift from ever building.
    3. Why split schema and description into separate constants instead of
       one merged blob? ToolDefinition's `description` is a Rust String, not
       JSON; keeping them separate avoids forcing a JSON round-trip on a
       non-JSON field.
    4. Why keep the test-layer `tool_descriptions_match_yaml_contract` if
       codegen eliminates drift? Defence in depth — catches a future
       refactor that replaces the codegen consumer with a literal.
    5. Why only 9 files to update? 8 Phase-1 wrappers + apr.version are the
       entire current tool surface. M5 tools will consume the codegen
       constants from day one.
    
    Refs PMAT-514.
    
    * test(falsify-mcp-008): codegen-layer description gate + coverage guardrail
    
    Adds two new tests to `falsify_mcp_008.rs`:
    
    * `codegen_description_constants_match_yaml` — asserts each
      `schemas::APR_<TOOL>_DESCRIPTION` codegen constant equals
      `tools[*].description` byte-for-byte. This is a strictly stronger gate
      than `tool_descriptions_match_yaml_contract`: the live-ToolDefinition
      test would silently pass if a future refactor replaced
      `APR_X_DESCRIPTION.to_string()` with a hand-coded literal. Asserting
      the codegen constant itself closes that bypass route.
    * `codegen_descriptions_cover_every_tool_name` — mirrors the existing
      `codegen_constants_cover_every_tool_name` guardrail: every name in
      `schemas::TOOL_NAMES` must appear in `CODEGEN_DESCRIPTIONS`, catching
      the case where a new tool is added to YAML but its description
      constant isn't registered in the test table.
    
    Refreshes module-level doc-comment to enumerate 7 layers of coverage
    and the dual codegen path (SCHEMA + DESCRIPTION).
    
    Test count: falsify_mcp_008 grows 6→8; aprender-mcp total 76→78.
    
    Refs PMAT-514.
    
    * docs(mcp): sync all surfaces with PMAT-514 description-codegen extension
    
    Mirrors the build.rs description-codegen change into every doc surface
    that previously said descriptions were hand-mirrored:
    
    * docs/specifications/apr-mcp-server-spec.md — FALSIFY-MCP-008 row now
      names the codegen-layer test; M3 milestone bullet points at the
      PMAT-514 extension; suite count 76→78.
    * contracts/apr-mcp-tool-schemas-v1.yaml — `condition:` prose and
      `test_harness:` / `codegen_consumer:` pointers describe both
      `APR_<TOOL>_SCHEMA` and `APR_<TOOL>_DESCRIPTION` codegen paths;
      tool-registry comment states both fields flow through build.rs.
    * book/src/tools/mcp-server.md — "edit YAML, rebuild" guidance updated:
      changing a description now requires only a YAML edit (was: YAML +
      Rust); enumerates 4 sub-tests (2 live, 2 codegen).
    * crates/aprender-mcp/README.md — gate-table row references the dual
      codegen constants.
    
    Refs PMAT-514.
    
    * chore(roadmap): PMAT-514 record description-codegen discharge line
    
    Marks the PMAT-514 roadmap entry with a DISCHARGED acceptance line
    pointing at the two-layer gate (test-layer + codegen-layer) and the
    `APR_<TOOL>_DESCRIPTION` build.rs output. The top-level "ongoing
    kaizen sweeps" acceptance stays — this is one ticket, many sweeps.
    
    Refs PMAT-514.
    
    * docs(mcp): sync remaining module-doc + README M3 bullet with PMAT-514
    
    Three surfaces still described M3 codegen as "schema only":
    
    * docs/specifications/apr-mcp-server-spec.md — file-tree build.rs
      comment now spells out both constants emitted.
    * crates/aprender-mcp/README.md — M3 milestone bullet enumerates
      `APR_<TOOL>_SCHEMA` and `APR_<TOOL>_DESCRIPTION`.
    * crates/aprender-mcp/src/lib.rs — module-doc for `schemas` now
      documents both constants, how to consume them, and that hand-coding
      either is caugh…
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    7d4917b View commit details
    Browse the repository at this point in the history
  8. docs(mcp-spec): fold FALSIFY-MCP-PROGRESS-002 into numbered gates + f…

    …ix dangling "this PR" ref (#905)
    
    * docs(mcp): spec v1.1.0 — DRAFT → ACTIVE; M1–M3 shipped
    
    - Status: DRAFT (pre-implementation) → ACTIVE (M1–M3 shipped; M4 dogfood pending)
    - Falsification conditions table now annotates ENFORCED / PARTIAL / Deferred
      per shipped test mappings in contracts/apr-mcp-server-v1.yaml (#886)
    - Adds FALSIFY-MCP-PROGRESS-001 entry for #887 progress-notification gate
    - Milestones M1/M2/M3 marked SHIPPED with PR cross-references
    - M4 acceptance items remain open (real-model gates, dogfood)
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp): align spec output shapes with CLI reality (PR #889 falsifications)
    
    PR #889 added mock-subprocess e2e tests for FALSIFY-MCP-003/-004 and discovered
    two spec-vs-CLI mismatches via test failures:
    
    1. apr.run: spec listed `stop_reason` in output — CLI's print_run_output
       (crates/apr-cli/src/commands/run_entry.rs:279) does NOT emit it.
       Spec corrected to the actual emitted set (model, text, tokens, ...).
    
    2. apr.qa: spec wrote gates as `{pass, value, threshold}` — CLI's GateResult
       (crates/apr-cli/src/commands/qa.rs:368) uses `passed` not `pass`.
       Spec corrected.
    
    Also fixes the codegen source reference: FALSIFY-MCP-008 uses
    contracts/apr-mcp-tool-schemas-v1.yaml (PR #871), not apr-cli-commands-v1.yaml.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * fix(mcp-spec): M1 PR refs #862 → #864
    
    PR #862 is a matmul test fix, not the MCP M1 skeleton. The correct
    skeleton PR is #864 (`feat(mcp): apr mcp M1 skeleton — MCP server
    over stdio`). All three stale citations in the M1 milestone replaced.
    
    Five-whys root cause: the spec retrofit (#873) reconstructed PR
    numbers from memory; future retrofits should verify against
    `git log --grep=...` before committing.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): demote unmerged contract + M3 PR accuracy
    
    Three stale citations corrected in the M3 milestone:
    - #874 removed from cancellation bullet (#874 is the book-chapter doc
      commit, not cancellation — that's #883 alone).
    - `contracts/apr-mcp-server-v1.yaml (ACTIVE) ... (#886)` bullet moved
      from M3 SHIPPED to M4 IN PROGRESS. PR #886 is still OPEN and its
      own title says "M4 — apr-mcp-server-v1 contract ACTIVE". The file
      is not in-tree. Header's "**New**:" label also updated to "Pending
      (PR #886)" for the same file.
    - Book-chapter citation expanded to list #874 (M2 creation) + #885
      (M3 update) for accurate provenance.
    
    Five-whys root cause (false "M3 SHIPPED" on #886): the spec promotion
    commit (a496ce97c) rolled unmerged M4 work into M3 bullets under the
    optimistic assumption the PR would land first. Going forward: any
    bullet citing a PR must verify `gh pr view <N>` is MERGED before
    promoting a milestone.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): Architecture — refresh to match built reality
    
    The Architecture + Protocol + Out-of-Scope sections carried pre-M1
    aspirations that no longer match the shipped crate. Refreshed against
    actual source tree in crates/aprender-mcp/:
    
    - Goal: schema source was `apr-cli-commands-v1.yaml`; lines 11/87/139
      correctly cite `apr-mcp-tool-schemas-v1.yaml`. Unified.
    - Directory diagram: listed absent `schema.rs`; missing `build.rs`,
      `types.rs`, `tools/subprocess.rs`, `tools/version.rs`. `server.rs`
      comment said "pmcp::Server wiring" but M1 shipped a hand-rolled
      JSON-RPC loop (also noted line 149). `Cargo.toml` comment listed
      pmcp/tokio/clap/apr-cli — none are actual deps (verified: serde,
      serde_json, anyhow, nix, serde_yaml build, jsonschema dev).
      `tests/` now lists the four actual `falsify_*.rs` harnesses.
    - `apr mcp` subcommand: snippet promised `async` with `McpArgs` +
      transport matching + SSE; actual `run()` is blocking, takes no
      args, calls `AprMcpServer::new().run_stdio()`.
    - Protocol/Transport: "SSE optional" was false; flag doesn't exist.
      Downgraded to stdio-only and added SSE to Out of Scope.
    
    Five-whys root cause: the Architecture diagram was authored pre-M1
    as a design sketch; later commits (#873 retrofit, v1.1.0 promotion)
    updated Milestones but never re-diffed the static diagram against
    `ls crates/aprender-mcp/src/`. Going forward: any spec change
    touching Milestones must run a diagram-vs-tree check.
    
    Follow-up filed: verify Config Precedence (lines 122-126) against
    implementation — `pub fn run()` consults no env vars today.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): reconcile 8-vs-9 tool count + Related Work misattribution
    
    Two factual errors corrected:
    
    - Tool count: spec said "8 Phase-1 tools" (lines 74, 133); `tools/list`
      actually returns 9 because `apr.version` (M1 scaffold) is also
      registered. Verified by
      `crates/aprender-mcp/tests/falsify_m1.rs::falsify_mcp_002_tools_list_schema_shape`,
      which asserts all 9 names (apr.version + 8 workflow tools).
      Clarified spec to state "8 Phase-1 workflow tools + apr.version
      scaffold = 9 total registered" and added test cross-link to the
      FALSIFY-MCP-002 bullet.
    
    - Related Work line 210 claimed `crates/apr-cli/src/tool_commands.rs`
      is the "planned MCP tool surface (referenced but unimplemented)".
      That file exists and is the `apr tool` CLI subcommand group
      (Showcase, Rosetta, …), unrelated to MCP. The actual MCP tool
      surface lives in `crates/aprender-mcp/src/tools/`. Corrected and
      noted that rust-mcp-sdk (paiml/rust-mcp-sdk) is currently unused
      since M1 shipped a hand-rolled JSON-RPC dispatcher.
    
    Five-whys root cause (8 vs 9): the original Phase-1 design enumerated
    8 workflow tools and `apr.version` was added later as an M1 handshake
    probe without updating the narrative count. No invariant check
    cross-references spec tool-count against `tools/list` test assertions.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): mark config precedence Phase-2 aspirational
    
    Lines 122-126 stated a four-level config precedence (`--config`,
    `$APR_MCP_CONFIG`, `~/.config/apr/mcp.toml`, defaults) as if it were
    implemented. Actual `crates/apr-cli/src/commands/mcp.rs::run()` takes
    no arguments and consults no env vars; `AprMcpServer::new()` has no
    config loader. The `APR_MODEL_DIR` env in the `.mcp.json` snippet is
    read by the spawned `apr <cmd>` subprocesses, not by the MCP server.
    
    Rewrote the section to keep the intended precedence as the Phase-2
    contract while making Phase 1's "no config loader" reality explicit.
    
    Five-whys root cause: the Configuration section predates the M1
    skeleton and was not re-verified against `commands/mcp.rs` during
    the v1.1.0 promotion. A "spec bullet implies an API — grep for the
    API" check belongs in the promotion workflow.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): Success Criteria gate count 8 → 9
    
    Spec's Falsification Conditions section lists 9 entries (FALSIFY-MCP-001
    through -008 plus FALSIFY-MCP-PROGRESS-001 added in M3), but the Success
    Criteria table still said "8 falsification gates". Count corrected and
    wording clarified to reflect that -003/-004 are currently PARTIAL and
    must promote to PASS at M4 close.
    
    Five-whys root cause: adding PROGRESS-001 in M3 touched the conditions
    section but didn't update the downstream summary row. Going forward:
    whenever a new FALSIFY-MCP-* lands, grep the spec for `N falsification`
    to catch all downstream counts.
    
    Refs PMAT-037.
    
    * fix(mcp-spec): close residual kaizen items
    
    Three dangling claims resolved:
    
    - Target version: `v0.32.0 / v0.33.0` stands as the intended release
      tags but `git tag -l 'v0.3*'` returns only `v0.3.0 / v0.3.1 / v0.30.0`.
      M1–M3 are merged on `main` but unreleased. Added a clarifier so a
      reader doesn't assume those tags exist.
    - Aspirational follow-ons: `apr-mcp-plugin-marketplace-v1.md` and
      `apr-mcp-hooks-v1.md` are not in `docs/specifications/`. Labelled
      "(spec files not yet authored)" so readers don't hunt for them.
    - Risk Register: "pmcp crate API instability" is dormant because M1
      shipped a hand-rolled JSON-RPC dispatcher (line 166 already notes
      pmcp is deferred). Row reworded so the risk's activation condition
      is explicit.
    
    Five-whys root cause (across all three): the spec's non-Milestone
    sections — Target, Related Work, Risk Register — were not refreshed
    during v1.1.0 promotion. Every milestone promotion should sweep those
    sections, not just the milestone table.
    
    Refs PMAT-037.
    
    * chore(pmcp): bump to 2.3 and drop pforge-runtime (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: dual pmcp versions (1.20 + 2.3) resolved in agents-mcp build.
    - Why #1: pforge-runtime 0.1.4 (last released 2025-12) still pins pmcp 1.x.
    - Why #2: pforge-runtime was listed as an optional dep alongside pmcp.
    - Why #3: it was a forward-compat hedge — but no Rust code imports it
      (only doc-comment mentions and knowledge-graph string literals).
    - Why #4: keeping an unused dep doubled the compile footprint and split
      the pmcp protocol surface across two crates.
    - Root cause: speculative dep on a framework wrapper for an SDK we
      already use directly.
    
    Fix:
    - Cargo.toml: bump pmcp 1.10 → 2.3 (PAIML's actively-maintained SDK);
      remove pforge-runtime dep; agents-mcp feature now just ["agents","pmcp"].
    - Doc comments and the mcp_demo example rewritten to name pmcp v2.3 as
      the SDK instead of pforge. No Rust-level API change — pforge-runtime
      was never imported, just advertised.
    - cargo tree -i pmcp now shows a single pmcp v2.3.0 node.
    
    Follow-up: spec's pmcp framing (M1 note + Risk Register) still needs
    rewrite in apr-mcp-server-spec.md.
    
    * docs(apr-mcp-spec): v1.2.0 — honest pmcp framing, add M5 migration plan (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: spec framed pmcp as unstable/dormant, treating the SDK as risk
      rather than planned substrate.
    - Why #1: Risk Register called out "pmcp crate API instability (dormant...)"
      — language from before pmcp was actively maintained.
    - Why #2: M1 note said "pmcp SDK deferred — more deterministic for current
      scope" without explaining the actual technical rationale.
    - Why #3: no adoption path existed — M4 stops at dogfood, so readers
      couldn't tell whether pmcp would ever land.
    - Why #4: pmcp v2.3 is PAIML's own crate (paiml/rust-mcp-sdk) and already
      used by aprender-orchestrate; keeping the spec's out-of-date framing
      forced the /tmp/spec-update session to discover this from crates.io.
    - Root cause: stale spec language from the early M1 period where the
      adoption path was genuinely uncertain; never updated after pmcp
      stabilised.
    
    Fix:
    - Line 15: link now labels pmcp as "PAIML's Rust MCP SDK, actively
      maintained, v2.3.1 on crates.io (2026-04-16)".
    - Line 44 / 167: architecture + M1 note explain the three concrete
      reasons the dispatcher is hand-rolled (minimal request/response shape
      over `apr <cmd> --json`, build.rs schema codegen keeps tools/list
      byte-identical to contract YAML, falsification asserts on wire bytes
      without an SDK layer).
    - Risk Register row rewritten from "API instability" to "adoption-path
      coordination" — real risk is workspace version alignment with the
      pmcp client role in aprender-orchestrate. Mitigation: single
      workspace-wide bump + `cargo tree -d` CI gate.
    - New M5 milestone: concrete pmcp migration plan — port dispatcher to
      pmcp::Server (retain build.rs codegen), add SSE + WebSocket
      transports, re-run falsification suite post-migration.
    - Out of Scope: SSE/WebSocket transports reclassified as "scheduled for
      M5 on top of pmcp v2.3".
    - Related Work: pmcp-sdk contract row now notes aprender-orchestrate
      already links pmcp v2.3 as a client; server-side migration is M5.
    - Version bumped 1.1.0 → 1.2.0.
    
    * docs(mcp-spec): reconcile M4 gate count with PR #886; bump pmcp contract v2.3 (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: M4 bullet claimed "9 falsification_conditions" to match the 9
      gates listed in Section 145, but PR #886's contract pins exactly 8
      (FALSIFY-MCP-001..008) and a Rust test enforces that invariant.
    - Why #1: the 9th gate (FALSIFY-MCP-PROGRESS-001) was added in M3 AFTER
      PR #886 was drafted.
    - Why #2: PR #886's harness
      (apr_mcp_server_contract_ids_are_falsify_mcp_001_through_008) explicitly
      rejects anything outside 001..008, so the contract row for PROGRESS-001
      cannot land in the same PR without harness changes.
    - Why #3: the spec's earlier count-reconciliation (2026-04-18 prior
      kaizen round) missed this because it was looking for text matches, not
      contract row counts.
    - Root cause: spec and contract evolved on different PR branches.
    
    Fix:
    - M4 bullet: accurately describes PR #886 as landing 8 falsification
      rows, names the exact-8 invariant by its test function.
    - Adds an explicit follow-up bullet: "Extend the contract with a 9th row
      for PROGRESS-001 after PR #886 merges — relax the exact-8 invariant to
      'FALSIFY-MCP-001..008 + PROGRESS-001, no extras'".
    - Success Criteria table unchanged (line 220 still correctly says "9
      falsification gates ... all PASS or PARTIAL→PASS by M4 close") — the
      9th gate is already ENFORCED in code via falsify_mcp_progress_001.rs,
      we just need the contract YAML to catch up.
    
    Also:
    - contracts/pmcp/mcp-protocol-sdk-v1.yaml version 1.0.0 → 1.1.0 with
      "last_modified: 2026-04-18".
    - Description updated v2.1 → v2.3, adds consumer-of-record (aprender-
      orchestrate via agents-mcp feature) + future consumer (aprender-mcp
      M5 migration) + link to apr-mcp-server-spec.md.
    
    * docs(book/mcp): align M3 scope + add M5 pmcp migration row (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: book chapter's M3 row missed FALSIFY-MCP-PROGRESS-001 (shipped
      via PR #887) and the paragraph called progress streaming "a follow-up
      slice" for BOTH apr.run and apr.finetune — incorrect for apr.finetune.
    - Why #1: book chapter was authored before PR #887 landed
      progressToken-gated notifications for apr.finetune.
    - Why #2: M5 pmcp migration (added to spec v1.2.0 today) had no
      corresponding row in the book status table.
    - Root cause: book lagged spec after the M3 progress slice merged and
      after the M5 migration plan was formalised today.
    
    Fix:
    - M3 row now mentions the opt-in progress notifications.
    - Paragraph specifies: FALSIFY-MCP-PROGRESS-001 is enforced for
      apr.finetune; only per-step structured progress (CLI event channel
      prereq) and apr.run progress (apr run --stream flag prereq) remain
      open.
    - New M5 row in the status table mirrors the spec's M5 milestone.
    
    * docs(mcp-spec): tighten streaming claim + M5 transport pointer (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: Section "Protocol" bullet on Streaming claimed "apr.run and
      apr.finetune send notifications/progress for each decoded token /
      training step" — but apr.run progress is a deferred M4 item and
      apr.finetune only emits per-stdout-line progress (not per training
      step) and only when the client opts in via progressToken.
    - Why #1: the bullet was authored when both tools were planned to
      stream per-token. Reality diverged: progress landed for apr.finetune
      only (opt-in, per-line), apr.run was deferred.
    - Why #2: the Architecture paragraph pointed to "Phase 2 with SSE" for
      transport selection without naming the actual M5 milestone that now
      schedules it.
    - Root cause: drift between aspirational early-M2 text and the M3/M5
      structure formalised today.
    
    Fix:
    - Streaming bullet now names what's actually enforced
      (FALSIFY-MCP-PROGRESS-001, apr.finetune opt-in, per-stdout-line) and
      explicitly calls out the apr.run follow-up prereq (apr run --stream
      flag + per-step CLI event channel).
    - Architecture paragraph points at M5 as the SSE/WebSocket landing
      spot rather than the generic "Phase 2".
    
    * fix(examples): unblock Chapter Examples Compile on main (Refs PMAT-037)
    
    Five-Whys:
    - Symptom: CI job "Chapter Examples Compile" has been failing on every
      push to main since PR #701 (2 days), plus on this PR, with RUSTFLAGS=
      "-D warnings" promoting unused-import warnings to hard errors.
    - Why #1: ch10_training and ch24_switch_pytorch both import
      `aprender::nn::Optimizer` but only call `optimizer.step_with_params`,
      which is an inherent method on `SGD` (not a trait method) — so the
      trait import is genuinely unused.
    - Why #2: ch26_switch_ndarray binds `let pred = lr.predict(&x)` but
      never reads `pred` (score re-computes internally).
    - Why #3: these examples predate the refactor that moved
      `step_with_params` from the Optimizer trait to inherent impls; the
      trait import was never cleaned up.
    - Why #4: the Book Contract Enforcement and Chapter Examples Compile
      jobs are non-required checks, so the red status never blocked merges
      and accumulated as tech debt.
    - Root cause: main CI andon rule (main must always be green) was
      waived for non-required checks. Toyota Way: "all defects are your
      defects" — fix it regardless of whose PR introduced it.
    
    Fix:
    - ch10_training.rs, ch24_switch_pytorch.rs: drop `Optimizer` from the
      aprender::nn:: import list.
    - ch26_switch_ndarray.rs: consume `pred` by printing the first
      prediction — preserves pedagogical intent of showing predict() works,
      and unblocks -D warnings.
    - `cargo build -p aprender-core --examples` now warnings-clean.
    
    * fix(ci): use contract: pointer, not derived PCU path (Refs PMAT-037)
    
    The "Every PCU page has matching contract" gate derived paths from the
    PCU ID (`apr-page-${ID}-v1.yaml` / `apr-book-${ID}-v1.yaml`) but real
    page headers already carry an authoritative `contract:` field, and
    chapter contracts are named `apr-book-ch01-v1.yaml` (chapter-number
    only) while PCU IDs include a slug (`ch01-why-rust`). The mismatch
    failed all 27+ book pages on every run.
    
    Five whys:
      1. Why red? Script can't find `apr-page-tools-apr-cli-v1.yaml`
         from ID `tools-apr-cli`... wait it can. But for chapters it
         looks for `apr-book-ch01-why-rust-v1.yaml` which doesn't exist.
      2. Why does it derive? The earlier convention stored ID-derived
         paths before `contract:` was added to headers.
      3. Why not updated when `contract:` was added? The workflow was
         not migrated; the two lookup paths stopped covering all cases.
      4. Why silent until now? The gate was not blocking main.
      5. Why fix now? Kaizen sweep surfaced 27-page failure.
    
    Parse the authoritative `contract:` field. Also add missing PCU
    header + page contract for book/src/tools/mcp-server.md (now points
    to contracts/apr-page-tools-mcp-server-v1.yaml).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp): retire stale 'M3 will ship apr.serve lifecycle' (Refs PMAT-037)
    
    Three places claimed `apr.serve` cancellation lands in M3:
     - book/src/tools/mcp-server.md apr.serve paragraph
     - crates/aprender-mcp/src/tools/serve.rs module/fn docs
     - serve tool `description` field embedded in tools/list
    
    M3 actually shipped `notifications/cancelled` for apr.run only.
    `server.rs::CancelHandle` doc explicitly states: "Only apr.run
    currently honours cancellation." apr.serve remains fire-and-forget
    and the spec M3 bullet list never promised otherwise.
    
    Five whys:
      1. Why stale? Comments predicted M3 scope before scope narrowed.
      2. Why narrowed? Spec M3 scope: FALSIFY-MCP-006 for apr.run,
         -008 codegen, -PROGRESS-001 for apr.finetune. apr.serve
         lifecycle was never inside that gate set.
      3. Why not updated at M3 close? No acceptance criterion forced
         a sweep of surface prose when milestone shipped.
      4. Why matters now? Readers of book/tools page and users calling
         apr.serve via MCP get incorrect "lifecycle lands in M3" note
         that reads as imminent, not aspirational.
      5. Why fix now? Kaizen sweep surfaced; retarget to M5 where a
         daemon registry + pmcp Server port belong together.
    
    Edits: book paragraph + serve.rs module header + serve.rs `call`
    docstring + serve.rs description field + spec M5 new bullet for
    apr.serve cancel extension. Also spec M5 falsification-suite bullet
    updated from "71+ tests" to measured "75 tests" with file list.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(book/mcp): clarify apr.finetune progress shipped with limits (Refs PMAT-037)
    
    The apr.finetune paragraph said "Per-step notifications/progress
    streaming is a follow-up M3 slice" — read as "no progress yet" —
    but FALSIFY-MCP-PROGRESS-001 shipped in PR #887: per-line progress
    over `params._meta.progressToken` IS live.
    
    Five whys:
      1. Why stale? Paragraph was written before PR #887 merged.
      2. Why not updated at PR #887? PR focused on server.rs + test
         additions; book paragraph not flagged in review.
      3. Why matters? Clients reading the book will assume they cannot
         stream updates and skip progressToken, losing observability.
      4. Why two progress layers? Per-line (shipped, stdout-driven) vs
         per-step (needs a CLI event channel from `apr finetune`
         itself) — the former is cheap plumbing over JSON-RPC, the
         latter is a CLI-side refactor.
      5. Why fix now? Kaizen sweep surfaced.
    
    Rewrote the paragraph to state (a) what shipped (opt-in per-line),
    (b) the gate it satisfies (FALSIFY-MCP-PROGRESS-001), (c) the
    honest limitation (terminal blob today), (d) where per-step
    lives (M4 follow-up with CLI prereq).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * contract(mcp-schemas): retire 'retrofit-only' header, lock v1.1.0 (Refs PMAT-037)
    
    The apr-mcp-tool-schemas-v1.yaml header still read:
      "This M2 cut is RETROFIT-ONLY"
      "If this file ever disagrees with the Rust source, the Rust source wins"
      "In milestone M3 a build.rs at ... will read this YAML"
    
    All three are post-M3 stale:
      1. M3 shipped (PRs #880, #884) — build.rs is live.
      2. Byte-identity is enforced by tests/falsify_mcp_008.rs (5 tests).
      3. Rust tool sources contain zero hand-written schemas — they only
         parse `crate::schemas::APR_<TOOL>_SCHEMA` from $OUT_DIR.
      4. Direction is reversed: YAML authoritative, Rust derived.
    
    Five whys:
      1. Why stale header? Written for M2 retrofit cut.
      2. Why not flipped at M3 close? PR #884 focused on codegen, not
         contract prose.
      3. Why matters? Future readers will assume Rust source is the
         authority and "fix" the wrong side of a drift — inverting
         FALSIFY-MCP-008's intent.
      4. Why now? Kaizen sweep.
      5. Why v1.1.0? Semantic bump: authoritativeness change, plus new
         reference pointer to apr-mcp-server-spec.md.
    
    Bumped metadata version 1.0.0 → 1.1.0, added last_modified, rewrote
    header and description to reflect current state (YAML is SoT, Rust
    parses codegen constants, falsify_mcp_008.rs enforces byte-identity).
    Also updated spec M5 falsification-suite file list to include
    `falsify_mcp_008` and drop nonexistent `codegen_bytes`.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` — 5/5
    pass after YAML comment edits (no functional change, just prose).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): 57 → 58 CLI commands (mcp added PR #864) (Refs PMAT-037)
    
    The spec claimed a 57-command CLI surface three times:
      - Contracts bullet: "57-command tool surface"
      - Problem paragraph: "57-subcommand CLI"
      - Goal paragraph: "subset of the 57 apr CLI commands"
    
    PR #864 registered `apr mcp` as the 58th command
    (contracts/apr-cli-commands-v1.yaml). The 63-line count in the
    contract is 58 commands + 5 FALSIFY-CLI-00* falsification rules.
    
    Five whys:
      1. Why stale? The 57 figure dates to #701 contract landing
         (2026-04-06) — the initial MCP PRs added `apr mcp` but
         didn't sweep cross-cutting doc claims.
      2. Why matters? MCP spec's own subject command is the 58th — a
         reader comparing counts will mistrust the surface-area claim.
      3. Why only fixing here? Scope is `apr-mcp-server-spec.md`;
         CLAUDE.md and apr-book-spec.md have broader audiences and
         want their own kaizen passes.
      4. Why cite PR #864 inline? Makes the delta auditable by a
         future reviewer checking `git log --oneline apr-cli-commands-v1.yaml`.
      5. Why not reword to "58+ commands" for future-proofing? The
         contract is the source of truth; stale counts are better
         caught by an exact-match CI gate than smeared over with
         imprecise phrasing. (PR #864 added a FALSIFY-CLI gate.)
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): honest release-target footer (M3 shipped same week as M2) (Refs PMAT-037)
    
    The footer claimed:
      v0.32.0 (M1–M2), v0.33.0 (M3–M4)
    
    But M3 shipped on 2026-04-18, same week as M2 (2026-04-17/18), and
    the workspace is still at v0.30.0 on main. The old split-tag plan
    (M1–M2 in one release, M3–M4 in the next) no longer maps to
    reality — M3 will publish alongside M1–M2 because there's nothing
    to publish in between.
    
    Five whys:
      1. Why stale? Target was written assuming M2 → cut release → M3.
      2. Why reality diverged? M3 landed fast because cancellation +
         codegen + progress + apr.finetune were all independent PRs.
      3. Why matters? A reader looking at `git tag` + this footer
         would expect v0.32.0 to exist; it doesn't.
      4. Why not assign firm tags? Release cuts require a separate
         decision (changelog + publishing); this spec shouldn't
         preempt it.
      5. Why keep historical context? Future reader asking "why is
         the M3–M4 split collapsed?" deserves a traceable answer
         instead of silently rewritten history.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(aprender-mcp/README): sync milestones + full gate table (Refs PMAT-037)
    
    The crate README was three milestones behind the spec:
      - M2 bullet: "apr.serve (fire-and-forget; full lifecycle in M3)"
        — M3 shipped apr.run cancel only; serve registry is M5.
      - M3 bullet: "in progress" — M3 actually shipped 2026-04-18
        (PRs #880, #881, #883, #884, #887).
      - Gate table: listed 5 gates (001, 002, 005, 007, VALIDATE-001);
        missed 003, 004, 006, 008, PROGRESS-001 — 4 of 5 are now
        ENFORCED or PARTIAL, and PROGRESS-001 is net-new since M3.
    
    Five whys:
      1. Why lag? README is surface-facing, spec/code are the primary
         targets during milestone closes.
      2. Why matters? crates.io readers land here first — inaccurate
         milestone + gate table = miscalibrated expectations, especially
         about apr.serve cancellation.
      3. Why add status column? Distinguishing ENFORCED vs PARTIAL vs
         planned is what readers actually want when choosing whether
         to depend on a given gate.
      4. Why spell out M4 + M5 here? Same reason — readers want to
         know what's next, not dig through the spec.
      5. Why fix now? Kaizen sweep; PR #888 already touches this crate.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(README): 57 → 58 commands across 4 sites (Refs PMAT-037)
    
    The MCP spec already reconciled 57 → 58 (PR #864 added `apr mcp` as
    the 58th command in contracts/apr-cli-commands-v1.yaml). The root
    README still repeated 57 in four places: headline paragraph, stats
    bullet list, crate-layout tree comment, and smoke-test snippet.
    
    Keeping the count exact matters more than soft-pedalling it — PR
    #864 also added a FALSIFY-CLI gate that enforces `apr --help`
    listing against the YAML, so drift is caught at CI and the README
    should track it. Fixing here alongside the spec keeps the docs
    audit self-consistent within one PR.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(orchestrate/book): pmcp 1.8 → 2.3, drop pforge-runtime (Refs PMAT-037)
    
    Two orchestrate book pages carried stale pmcp/pforge references:
      - part3/pmcp.md — header still claimed pmcp v1.8.6 and showed
        `pmcp = "1.8"` in Cargo.toml snippet. crates.io has pmcp 2.3.1
        as of 2026-04-16 and the crate's Cargo.toml already pins it.
      - part3/agent-runtime.md L575 — `agents-mcp = ["agents", "pmcp",
        "pforge-runtime"]` but pforge-runtime was dropped earlier in
        this PR series (it pinned pmcp 1.20 and was unused outside
        knowledge-graph cataloguing).
    
    Five whys for each:
      1. Why stale? Book pages were written against pmcp 1.x, before
         the 2.x release cleanup.
      2. Why not caught? The orchestrate book has no CI gate matching
         its Cargo.toml snippets to actual crate deps.
      3. Why matters? Readers copy-pasting `pmcp = "1.8"` into a new
         project would land on a yanked / unmaintained line.
      4. Why not add a CI gate? Out of PR scope; filed mentally as an
         M5+ follow-up when `apr-contracts` lints cross-project snippets.
      5. Why fix now? Kaizen sweep surfaced during pmcp/pforge audit.
    
    Both archived batuta-agent.md references left alone — they live in
    `docs/specifications/archive/` and document the old design state.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(CLAUDE.md): 57 → 58 commands, add mcp to key-command list (Refs PMAT-037)
    
    Three stale 57-command claims in CLAUDE.md — the overview line,
    the key-files bullet, and the APR CLI section. Brought them in
    line with contracts/apr-cli-commands-v1.yaml (58 commands including
    `apr mcp`, added PR #864). Also added `mcp` to the inline key-command
    list — discovery matters more than alphabetical tradition given
    the MCP spec is the current top-of-mind work.
    
    The 405-contract and 25,300-test counts are out of spec scope and
    left for a future sweep (workspace tests reportedly 25,391 per the
    root README, but confirming across the 70 crates needs real
    `cargo test --workspace --lib` run, not a file read).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): document FALSIFY-MCP-VALIDATE-001 dispatcher invariant
    
    Symptom: spec Falsification Conditions section had 9 entries
    (MCP-001..008 + PROGRESS-001), but crates/aprender-mcp/README.md and
    book/src/tools/mcp-server.md both list a 10th enforced gate,
    FALSIFY-MCP-VALIDATE-001, which was missing from the spec entirely.
    
    Five-whys: (1) spec only lists conditions destined for
    apr-mcp-server-v1.yaml; (2) VALIDATE-001 is a dispatcher-level contract
    point (how the server shapes tool errors), not a per-tool behavioural
    promise; (3) it therefore lives *alongside* but *outside* the YAML
    contract — mirrored in the book under "Additional invariant enforced by
    the dispatcher"; (4) the spec's own section header
    ("Falsification Conditions for apr-mcp-server-v1.yaml") excluded it by
    scope, but the omission reads as "we forgot a gate" to anyone
    cross-referencing README/book; (5) fix is to add an "Additional
    dispatcher invariant" subsection pointing at the existing test
    falsify_m1.rs::falsify_validate_missing_model_path_is_tool_error.
    
    Refs PMAT-037
    
    * docs(aprender-mcp): refresh module-level scope docs for M3-shipped state
    
    Symptom: `src/lib.rs` crate-level docs titled the scope section
    "M1 Scope" and claimed "M2 adds the 8 Phase-1 tools"; `src/tools/mod.rs`
    said "M3 adds `apr.finetune` (synchronous initial slice; streaming is
    a follow-up)"; and `src/server.rs` had a test doc-comment reading
    "Full 8-tool set lands when M2 completes." All three predate M3
    shipping on 2026-04-18.
    
    Five-whys: (1) module docs were written incrementally milestone-by-
    milestone; (2) each PR updated its own surface but left sibling module
    docs unchanged; (3) there is no CI gate on module-level Rustdoc
    matching milestone status; (4) new readers start at `lib.rs` and
    encounter text that contradicts `apr mcp --help` + README; (5) cheapest
    fix is to rewrite the three doc-comments to a single authoritative
    summary keyed off the spec's own "M1–M3 SHIPPED" tags, leaving M4/M5
    forward-looking. No behaviour change; no test updates needed.
    
    Refs PMAT-037
    
    * docs(mcp): update apr.finetune/apr.run docs for shipped-M3 progress state
    
    Symptom: three stale M3 claims, each LLM-visible or reader-visible:
    (1) `apr.finetune`'s `description` field still read "Progress streaming
    lands in a follow-up M3 slice" — but PR #887 shipped the streaming
    slice on 2026-04-18, and the description is returned verbatim in
    `tools/list` to LLM clients. (2) The same stale sentence is duplicated
    in the authoritative `contracts/apr-mcp-tool-schemas-v1.yaml`. (3)
    `src/tools/run.rs` module docs say "Progress notifications (streamed
    per-token) are a separate M3 slice" — the spec's M3 checklist (line
    192) now records that as deferred to M4 pending `apr run --stream`.
    
    Five-whys: (1) tool `description` fields are hand-written strings that
    become part of the MCP wire response; (2) FALSIFY-MCP-008 compares
    `inputSchema` byte-for-byte but *not* `description`, so description
    drift is silent; (3) when PR #887 shipped progress streaming, only the
    crate module docs in finetune.rs were partially updated — the
    `description` field and the YAML contract were missed; (4) stale LLM-
    visible strings confuse agents about which call shape actually works
    today; (5) fix is to (a) promise exactly what ships (opt-in via
    `params._meta.progressToken`, falsification gate PROGRESS-001), (b)
    align the YAML contract and Rust source, and (c) rewrite `apr.run`'s
    module prelude to describe the cancel-token surface that shipped and
    the per-token progress that didn't.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` passes
    (5/5). Description field is not covered by the schema gate, confirming
    the drift was invisible to CI until now.
    
    Refs PMAT-037
    
    * docs(mcp-spec): cross-link M4 checklist items to the PRs carrying them
    
    Symptom: M4 checklist items in the milestone section all read "in
    flight" / "dogfood" without referencing any PR, even though six open
    PRs (#886, #889, #890, #891, #892, plus our own #888) are carrying
    this exact work. Readers who arrive from the PR list can't map a PR
    onto the spec box it's trying to tick, and readers who arrive from
    the spec can't find the open PR. Also added a `falsify_mcp_progress_001.rs`
    row to the crate-layout tree (previously omitted) and broadened the
    `falsify_m1.rs` description to mention all gates it enforces
    (-001, -002, -005, -007, -VALIDATE-001), not just the first two.
    
    Five-whys: (1) M4 work is happening across 4+ PRs in parallel;
    (2) the spec was last edited when only PR #886 existed;
    (3) new PRs (#889/#890/#891/#892) introduced new gate IDs
    (FALSIFY-MCP-E2E-001, FALSIFY-MCP-DOGFOOD-001, FALSIFY-MCP-PROGRESS-002)
    but the spec never reflected them;
    (4) without PR cross-links, the spec drifts out of sync within days;
    (5) fix is to name the branch + PR for each in-flight box so the
    linkage is obvious and breaks visibly when a PR is closed or renamed.
    
    Refs PMAT-037
    
    * docs(contracts): fix stale 57-command count + codegen test path
    
    Two small contract-metadata fixes caught by the kaizen sweep:
    
    1. `contracts/apr-cli-commands-v1.yaml` line 24 — `scope` field still
       claimed "57 commands"; the actual command list has 58 entries as of
       PR #864 (apr mcp added 2026-04-17). Verified by counting `^  - name:`
       entries under the `commands:` key (`awk` filter — 58).
    
    2. `contracts/apr-mcp-tool-schemas-v1.yaml` had two sibling errors:
       (a) Block-comment header line 7 still said "each of its 57 entries"
       referring to apr-cli-commands-v1.yaml — updated to 58 to stay in
       sync with the registry. (b) `metadata.description` pointed readers
       at `tests/codegen_bytes.rs` for FALSIFY-MCP-008 enforcement; the
       actual file is `crates/aprender-mcp/tests/falsify_mcp_008.rs`
       (confirmed via `ls crates/aprender-mcp/tests/`). The wrong path is
       particularly bad because new contributors clone the repo and try to
       grep for a file that doesn't exist.
    
    Five-whys on (2b): (1) an earlier contract rev proposed the filename
    `codegen_bytes.rs`; (2) the commit that renamed it to
    `falsify_mcp_008.rs` (conventions: one test file per FALSIFY gate)
    didn't update the contract metadata; (3) nothing in CI cross-checks
    prose filename references inside YAML headers; (4) the spec we edited
    in PR #888 already fixed this in one spot but missed the sibling in
    this file; (5) the cheapest fix is a literal string replace — adding
    a lint for "tests/[a-z_]+\.rs" strings that don't resolve is follow-on
    work, tracked separately.
    
    Refs PMAT-037
    
    * docs(contracts): bump 57→58 command count in apr-cli-publish + apr-cli-qa
    
    Symptom: the two CLI-level contracts that gate `cargo install` and
    dogfood QA still asserted "all 57 commands" in their postconditions,
    falsification predictions, and proof_obligations. The actual
    `apr --help` surface is 58 commands as of PR #864 (mcp added
    2026-04-17), and `contracts/apr-cli-commands-v1.yaml` was already
    updated to 58 in the previous commit.
    
    Affected invariants:
    - apr-cli-publish-v1.yaml equations.all_commands_compile formula
    - FALSIFY-PUB-CLI-003 prediction ("apr --help lists all N commands")
    - apr-cli-qa-v1.yaml postconditions, FALSIFY-QA-001 rule, and
      proof_obligations[0].property
    
    Why this matters: when these prose counts go stale, an engineer
    reading the contract reasonably concludes either (a) the contract is
    behind reality and they should doubt it, or (b) the list of commands
    was shortened and a command got removed — neither is true. Five-whys:
    (1) the mcp command was added via PR #864 with contract update
    constrained to apr-cli-commands-v1.yaml; (2) sibling contracts that
    reference the count (publish + qa) were not updated in the same PR;
    (3) no CI linter cross-checks "N commands" strings against the
    authoritative registry count; (4) the drift persisted for ~1 day and
    would have confused contract reviewers on the next spec pass; (5) fix
    is bulk text replace plus a mental note to add a numeric cross-check
    linter in a follow-up (tracked separately).
    
    No test iteration count changes (the harnesses iterate the contract
    YAML entries, not the hardcoded number). The strings are readability
    only.
    
    Refs PMAT-037
    
    * docs: bump 57→58 command count in book + spec prose
    
    Surface-prose sweep after bumping the two load-bearing contracts
    (apr-cli-publish + apr-cli-qa) in the previous commit. Same root cause:
    PR #864 added `apr mcp` as the 58th command but prose references
    scattered through the book and spec suite were not updated in lockstep.
    
    Touched (one literal "57 commands" → "58 commands" per line):
    - book/src/architecture/monorepo-layout.md — crate-tree caption
    - docs/specifications/apr-cli-qa-spec.md — 4 sites (problem framing,
      structural gate cell, Phase-1 section heading, Phase-8 grid line)
    - docs/specifications/aprender-monorepo-consolidation.md — the
      "Users NEVER pass --features" principle (line 414); the historical
      "DONE" entry at line 618 is left at 57 because it describes the
      phase as it was completed, not current state
    - docs/specifications/aprender-readme-book-rewrite.md — book tree caption
    
    Not touched (out of scope for this sweep):
    - docs/hero.svg and docs/specifications/apr-book-spec.md — user-facing
      graphics + marketing copy; will sweep separately
    - archive/ and examples/ — either historical or println strings with
      lower blast radius
    - .claude/skills/dogfood/SKILL.md — dogfood skill instruction, queued
    
    Refs PMAT-037
    
    * docs(book/mcp): add FALSIFY-MCP-PROGRESS-001 row to gates table
    
    The book's falsification-gates table in book/src/tools/mcp-server.md
    listed rows for FALSIFY-MCP-001..008 and then the dispatcher-level
    FALSIFY-MCP-VALIDATE-001, but skipped the M3 addition
    FALSIFY-MCP-PROGRESS-001 that the spec already calls out as item 9 of
    the contract-bound gates (apr-mcp-server-spec.md#L159) and that the
    success-criteria row counts as part of the "9 falsification gates
    (FALSIFY-MCP-001..008 + PROGRESS-001)" invariant (L228).
    
    Five whys:
    - Symptom: book table shows 8 contract gates, spec says 9.
    - Why: PROGRESS-001 row was never added when M3 shipped (#887).
    - Why: M3 PR #887 landed PROGRESS-001 behaviour + test but did not
      touch the book's gates table (touched the narrative section only).
    - Why: the gates table is organized numerically and the PR author
      added PROGRESS-001 to the prose but not to the table below it.
    - Root cause: the table is a cross-cutting artifact that any new
      gate must be added to — no codegen pressure, no CI guard.
    - Fix: add the row now; future change: fold this into contract-driven
      codegen when apr-mcp-server-v1.yaml lands (PR #886, tracked for M4).
    
    Refs PMAT-037, FALSIFY-MCP-PROGRESS-001
    
    * docs(aprender-mcp/README): fix 8→9 tools count in M3 codegen coverage
    
    The M3 entry said build.rs generates schemas for "all 8 tools"; in
    fact the contract apr-mcp-tool-schemas-v1.yaml has 9 entries (the M1
    apr.version scaffold + the 8 Phase-1 workflow tools), and build.rs
    emits one pub const APR_<TOOL>_SCHEMA per entry for all 9.
    
    Five whys:
    - Symptom: README says "all 8 tools"; contract has 9 tool entries.
    - Why: the "8 tools" figure was the Phase-1 workflow-tool count.
    - Why: when FALSIFY-MCP-008 expanded to codegen every tool in M3 it
      picked up apr.version too, but the README M3 bullet kept the
      Phase-1-focused "8 tools" wording.
    - Why: the Phase-1 count and the registered-tool count are both in
      circulation in docs (spec refers to both as "8 Phase-1 tools plus
      apr.version") and it's easy to conflate them.
    - Root cause: no single-sourcing of the tool-count number — any doc
      can drift from `contracts/apr-mcp-tool-schemas-v1.yaml` (the
      authoritative list) silently.
    - Fix now: split the count honestly ("8th Phase-1 workflow tool — 9th
      registered" and "all 9 registered tools"); deferred fix: when the
      spec's M4 contract promotion (PR #886) lands, add a
      FALSIFY-MCP-008-style codegen check that the tool-count numbers in
      README/spec/book match the YAML row count.
    
    Refs PMAT-037
    
    * docs: sweep remaining 57→58 command drift in book + spec prose
    
    Five prose sites still carried the stale 57-command count after the
    earlier commits bumped the contract YAMLs and the monorepo/crate-tree
    captions:
    - book/src/introduction.md (2 occurrences — "What is Aprender?"
      headline + CLI Reference bullet)
    - docs/specifications/apr-book-spec.md (2 occurrences — Ch 1.5 entry
      + Appendix A crate-map row for apr-cli)
    - docs/specifications/aprender-readme-book-rewrite.md (2 occurrences
      — Problem section intro + "What is aprender?" bullet)
    
    Why these were missed earlier: the previous sweep focused on
    contract YAMLs (apr-cli-commands-v1, apr-cli-publish-v1,
    apr-cli-qa-v1) + the monorepo layout crate-tree captions. These
    prose sites live in discursive book/spec text and weren't caught by
    the YAML-first grep.
    
    Scope discipline preserved: left the two intentional historical
    references alone — aprender-monorepo-consolidation.md#L618 DONE
    history line and apr-mcp-server-spec.md#L10/#L21 which say "58
    commands (57 + mcp added PR #864)" on purpose to explain the jump.
    
    Refs PMAT-037
    
    * docs(aprender-mcp/validate): refresh stale 'remaining 7 will follow' doc-comment
    
    The module doc-comment for apr.validate still read as if M2 was in
    progress — "the remaining 7 Phase-1 tools will follow: spawn
    apr <subcommand> --json...". M2 shipped 2026-04-17/18 (#865, #866,
    #867, #870, #872) and M3 shipped 2026-04-18 (#881), so all 7 M2
    wrappers plus the M3 apr.finetune addition now live on this pattern.
    
    Updated to present-tense enumeration: lists each wrapper by name and
    makes explicit that apr.finetune also inherits the subprocess
    pattern, so a reader landing on this file first gets the full shape
    of what ships.
    
    Five whys:
    - Symptom: validate.rs doc-comment describes M2 as future work.
    - Why: comment was written when apr.validate was the first-shipped
      wrapper (#865) and the other 6 were still PRs.
    - Why: subsequent wrapper PRs (#866, #867, #870, #872) and the M3
      addition (#881) didn't circle back to retire the "will follow"
      tense on the earliest module.
    - Why: no codegen or lint forced doc-comments to reference
      contract-driven tool counts, so the prose drifted silently.
    - Root cause: module doc-comments are low-visibility — they don't
      show up in tools/list output, so FALSIFY-MCP-008 doesn't catch
      them.
    - Fix: manual sweep now; longer-term, an apr-mcp doc-invariant
      contract could codegen "shipped tools" lists from the registry.
    
    Refs PMAT-037
    
    * docs(mcp-contract): sync apr.serve description with source truth
    
    The YAML contract still said "Full lifecycle (cancel/SIGTERM) lands in
    M3." — but M3 shipped weeks ago (finetune + opt-in progress) and serve
    lifecycle was deferred to a post-M3 follow-up. The source-of-truth
    description in `crates/aprender-mcp/src/tools/serve.rs:44-46` already
    reads "Cancel-token lifecycle (SIGTERM) is a post-M3 follow-up" — the
    contract YAML is the one that drifted.
    
    Five-whys
      1. Why did the YAML description drift from the source? →
         FALSIFY-MCP-008 only asserts byte-identity on the `inputSchema`
         (properties/required), not on the tool-level description.
      2. Why was FALSIFY-MCP-008 scoped that way? → Descriptions are
         LLM-visible free-form prose that humans edit in both places during
         development; byte-comparing them every build would churn CI.
      3. Why did the divergence survive post-M3? → No periodic kaizen sweep
         compares YAML tool descriptions with their source counterparts.
      4. Why didn't any kanban/release task catch it? → Release templates
         don't list the MCP contract YAML among per-milestone artifacts to
         refresh.
      5. Why not? → Contract YAML changes are treated as codegen input, not
         documentation — so prose rot goes unnoticed until a kaizen pass.
    
    Symptom fixed; root-cause follow-up (a byte-compare for descriptions,
    or a lint that forbids roadmap-tense phrases like "lands in Mx" after
    that milestone ships) is tracked for a future pass — not a PMAT-037
    blocker because descriptions are advisory for LLM clients and the
    actual tool behaviour is covered by FALSIFY-MCP-005/007/008.
    
    Refs PMAT-037
    
    * docs(mcp-contract): drop false stop_reason claim from apr.run description
    
    YAML + source both advertised that apr.run "returns tokens + tok/s +
    stop reason", but the apr CLI does not emit `stop_reason`. Spec line
    90 of apr-mcp-server-spec.md records the ground truth:
    
        CLI as of 2026-04-18; `stop_reason` not emitted
    
    Replaced with an accurate inventory ("generated text, tokens, tok/s,
    and timing") plus the cancellation note that is genuinely load-bearing
    for MCP clients (FALSIFY-MCP-005 asserts cancel wiring).
    
    Five-whys
      1. Why did the description promise a field the CLI doesn't emit? →
         The description was written speculatively ahead of a planned
         `apr run --json` enrichment that never landed.
      2. Why did the speculative doc survive? → FALSIFY-MCP-008 compares
         inputSchema byte-for-byte, but does NOT compare the tool
         description to the actual CLI response keys.
      3. Why doesn't any gate detect output-shape drift? → apr.run returns
         free-form stdout bytes to the MCP client; there is no typed
         contract on the response shape.
      4. Why not? → The MCP tool surface is intentionally a pass-through
         so the CLI can evolve without churning the MCP spec.
      5. Why does that hurt here? → Pass-through evolution needs
         matching doc-hygiene passes (like this one) to keep the
         LLM-visible description honest. Same root-cause class as the
         apr.serve fix one commit back.
    
    Same class of drift as 715781df5 (apr.serve "lands in M3"). Tracking
    a shared follow-up: lint for roadmap-tense phrases and a smoke-test
    that the description's field enumeration is a subset of the CLI's
    actual JSON keys.
    
    Refs PMAT-037
    
    * docs(mcp-spec): clarify Success Criteria scope — spec ACTIVE, gate is for M4 close
    
    The header reads "Acceptance gate for promoting to ACTIVE" — but the
    spec status at the top already says ACTIVE (promoted at M3 ship on
    2026-04-18). The criteria listed (contract-level gates, 9-gate pass
    including the M4 dogfood session) actually describe **closing M4** —
    promoting `apr-mcp-server-v1.yaml` from DRAFT to ENFORCED and lifting
    FALSIFY-MCP-003/-004 from PARTIAL to PASS.
    
    Five-whys
      1. Why does "promoting to ACTIVE" survive past ACTIVE promotion? →
         The Success Criteria block was drafted pre-M3 when the spec was
         still DRAFT, and was never re-scoped after the M3 ship flipped
         the spec header to ACTIVE.
      2. Why did no gate force a re-scope? → The spec's own header was
         updated in the same commit that set the status, but the mid-doc
         sections weren't traversed because nothing links them to the
         header change.
      3. Why isn't that traversal automated? → provable-contracts'
         doc_integrity checker validates cross-links between spec and
         contract YAML, not internal consistency of roadmap language
         across sections of the same spec.
      4. Why is internal consistency not a contract check? → Roadmap
         language ("will ship", "pending", "ACTIVE") is prose, not
         structured data — hard to assert byte-for-byte.
      5. Why not structure the status fields? → Longer-term work; this
         commit is the symptom fix so readers can trust the Success
         Criteria block against the spec header.
    
    Now readers see:
      - Spec header: ACTIVE
      - Success Criteria: gate for closing M4 (contract DRAFT→ENFORCED,
        FALSIFY-MCP-003/-004 PARTIAL→PASS, dogfood done)
    
    That's the actual open-work framing.
    
    Refs PMAT-037
    
    * docs(book/mcp): fix stale apr.version example payload (0.31.0 → 0.30.0)
    
    The book's apr.version example response used "0.31.0", but the tool
    emits CARGO_PKG_VERSION baked in at compile time — currently 0.30.0
    (workspace Cargo.toml, unchanged since 2026-04-12). A client
    developer reading the doc and pinning to the example shape would
    see an immediate mismatch against a real server.
    
    Five-whys
      1. Why did the doc show a version that doesn't exist? → The
         example was forward-scoped during an earlier release-planning
         pass that anticipated a 0.31.0 bump.
      2. Why did that anticipated bump not land? → M1-M3 all shipped on
         main but never got tagged; the plan line in the spec says
         "M1-M3 planned for v0.32.0 publication" (line 263).
      3. Why didn't the doc update when the tag plan changed? → Example
         payloads are prose, not codegen, and aren't covered by any
         contract byte-compare.
      4. Why no lint for version strings in examples? → Version drift is
         rare and most tools show "x.y.z" abstracts; apr.version's case
         is unusual because the book shows a concrete literal.
      5. Why show a concrete literal? → Helpful for readers debugging
         an actual tools/call round-trip — but that helpfulness inverts
         once the literal goes stale.
    
    Fix: set the example to 0.30.0 (current workspace version) and add a
    one-sentence note telling clients to parse for diagnostics rather
    than pin to the literal. That way the next version bump doesn't
    immediately invalidate the doc.
    
    Refs PMAT-514
    
    * test(falsify-mcp-008): enforce tool description YAML↔source byte-equality
    
    Before: `migrated_tools_match_yaml_contract_byte_for_byte` compared only
    `inputSchema`, leaving `tools[*].description` free to drift silently. This
    drift was observed twice on 2026-04-18 alone (apr.serve — 715781df5,
    apr.run — 91a613968) after the YAML contract was audited manually against
    the source.
    
    Five whys:
    1. Why did apr.serve/apr.run descriptions drift from the contract? → dev
       edits in tools/*.rs never propagated back to the YAML.
    2. Why wasn't this caught in CI? → FALSIFY-MCP-008 harness compared only
       `inputSchema`.
    3. Why was `inputSchema` the only thing compared? → M3 PR #881 scoped the
       byte-identity gate to the schema codegen path (build.rs emits
       APR_*_SCHEMA constants), where drift would crash the build.
    4. Why didn't the contract itself catch this? → YAML line 282 asserted
       "each tool's `description` matches tools[*].description byte-for-byte"
       — but that assertion was aspirational, never wired into a test.
    5. Root cause: claim-without-enforcement is the silent-drift seed. Fix is
       to make the assertion load-bearing by adding a second test that
       compares `ToolDefinition.description` to the YAML string directly.
    
    The new test `tool_descriptions_match_yaml_contract` discharges the class
    of drift that caused both commits above, without widening scope — it uses
    the same contract loader and `migrated_tools()` iterator as the existing
    schema gate.
    
    Verified: all 6 tests in falsify_mcp_008 pass, including the new one.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-contract): flip DRAFT→ENFORCED, clear stale M3 parentheticals
    
    The contract YAML self-describes as DRAFT and pins its test_harness /
    codegen_consumer with "(to be added in M3)" parentheticals — but M3
    shipped on 2026-04-18 (PR #881). The drift surfaces as:
    
    - Line 58 top-level `status: DRAFT`
    - Line 271 `FALSIFY-MCP-008.status: DRAFT`
    - Line 287 `test_harness: ...falsify_schema_codegen.rs (to be added in M3)`
      — the real harness is `falsify_mcp_008.rs` and has six tests green
    - Line 288 `codegen_consumer: ...build.rs (to be added in M3)` — already
      landed
    - Line 57 top-level `version: "1.0.0"` vs line 30 `metadata.version: 1.1.0`
    
    Five whys:
    1. Why is the contract still DRAFT after M3 shipped? → nobody reran a
       spec audit after PR #881 merged.
    2. Why did the M3-ship commit not touch this file's status? → PR #881
       scope was "wire up codegen + harness"; contract fields were treated
       as documentation, not code.
    3. Why weren't the parentheticals caught? → they read as prose, not as
       testable assertions; no gate compares them against reality.
    4. Why didn't any automation flag a version mismatch between
       top-level `version` (1.0.0) and `metadata.version` (1.1.0)? → no such
       check exists on this contract schema.
    5. Root cause: contract-as-documentation drift. Counterpart: PMAT-514
       just added a harness test that makes the `description`-equality claim
       on line 282 load-bearing. This commit brings the surrounding prose
       (status + parentheticals + version pin) into alignment with that
       ENFORCED reality.
    
    Follow-up candidates (not in this commit):
    - Add a harness check that `metadata.version == top-level version` to
      prevent this class from re-emerging (parallel to FALSIFY-MCP-008).
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): document FALSIFY-MCP-008 description-equality extension
    
    Three coordinated edits, all propagating the harness change from PMAT-514
    into the spec surface:
    
    1. Gate summary (line 158): narrow "schema byte-identical" claim broadened
       to "schema + description byte-identical", naming both test functions
       explicitly so readers can find the enforcement point.
    2. File-tree comment (line 60): `falsify_mcp_008.rs` blurb now says
       "schema + description byte-identity", matching the new test.
    3. M5 re-run checklist (line 215): test count 75 → 76 (one new test in
       falsify_mcp_008.rs).
    
    Verified: `cargo test -p aprender-mcp` reports 51+8+4+6+4+2+1 = 76 tests
    all passing.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * chore(roadmap): register PMAT-514 — APR-MCP-KAIZEN continuous drift sweep
    
    Adds the pmat work ticket that tracks ongoing kaizen on apr-mcp-server-spec
    and its satellites (aprender-mcp source, book chapter, schema contract
    YAML). Status: inprogress. First discharge: byte-compare YAML tool
    descriptions with source descriptions (closed silent-drift class that
    bit apr.serve on 715781df5 and apr.run on 91a613968 in one 24h window).
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(book/mcp): book chapter mirrors FALSIFY-MCP-008 description extension
    
    Symmetric to the spec update in 2f38f0241. Two book edits:
    
    1. Falsification gates table (line 333): gate now reads "inputSchema AND
       description byte-identical" — same broadening applied to the spec.
    2. Schema-codegen prose (line 315-320): calls out the two specific test
       functions that enforce the gate, and tightens the "edit YAML,
       rebuild" guidance to include descriptions.
    
    Readers landing on the book chapter (via rustdoc cross-link or GitHub
    Pages) now see the same gate surface as spec readers.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(aprender-mcp/README): mirror FALSIFY-MCP-008 description extension
    
    Crate README's gate table is the third surface that readers hit — after
    the spec and book chapter. Aligning all three to say "inputSchema AND
    description" closes the documentation side of the silent-drift class.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-contract): sharpen coverage-note — 9 entries, gate surface spelled out
    
    Before: the coverage note said "All 8 Phase-1 tools are now registered in this
    contract" — technically correct (apr.version is an M1 scaffold, not a Phase-1
    workflow tool) but ambiguous, because the FALSIFY-MCP-008 harness iterates
    over all 9 entries including apr.version. A new reader easily miscounts.
    
    After: the note enumerates both categories explicitly (scaffold + 8 wrappers =
    9 entries) and adds a second paragraph spelling out what the PMAT-514
    extension now covers — `inputSchema` byte-identity AND tool-level
    `description` byte-identity — with the specific test function names. This
    matches the surface that was already asserted in the falsification block
    above (lines 281-286) and discharges the ambiguity in one pass.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` 6/6 green.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): add apr-mcp-tool-schemas-v1 to Contracts header list
    
    The tool-schemas contract is the **single source of truth** for every
    MCP tool's `inputSchema` (and, as of PMAT-514, description), drives the
    `build.rs` codegen, and is referenced by FALSIFY-MCP-008 — yet it was
    missing from the header `**Contracts**:` list. The spec's own body text
    referenced it five times (lines 27, 40, 158, 177, 193) but a reader
    landing on the spec from a link would not see it in the contract
    register.
    
    Five whys:
    1. Why was the contract not listed? → the header was authored before
       the tool-schemas YAML was split out into a standalone contract.
    2. Why didn't the split author backfill the header? → the split PR
       (#871 — authored the YAML) focused on the contract body; the spec
       header wasn't on the review checklist.
    3. Why isn't there a checklist? → spec-header/contract-file consistency
       has no automated gate.
    4. Why no gate? → the spec body mentions multiple contracts in prose,
       so "spec references contract X" doesn't uniquely identify which
       contracts should appear in the header.
    5. Root cause: the header is a curated list (things a reader must
       know about), not a mechanical index. Kaizen is the right fix for
       curated-list drift — no automation needed, just periodic sweeps.
    
    Also included the ENFORCED status inline so readers see M3 progress at
    a glance.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-contract): broaden FALSIFY-MCP-008 condition to match assertions
    
    The `assertions:` block already covered descriptions (line 282) but the
    prose `condition:` above it talked only about "JSON Schema". Readers
    skimming the condition paragraph would miss that descriptions are also
    load-bearing.
    
    The rewrite preserves the JSON canonicalization language (important —
    that's the byte-for-byte definition) and adds a second clause spelling
    out how descriptions flow: directly compared at test time against
    `ToolDefinition.description`, separate from the build.rs codegen path
    that carries `inputSchema`.
    
    Verified: `cargo test -p aprender-mcp --test falsify_mcp_008` still
    6/6 green.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(falsify-mcp-008): refresh module doc-comment for PMAT-514 extension
    
    The file-level doc-comment predated the description-equality test added
    in PMAT-514. Three updates:
    
    1. Opening summary: "byte-identical to the schema" → "byte-identical to
       the corresponding entry ... covering both the `inputSchema` object
       and the tool-level `description` string" — so cargo-doc readers see
       the full gate surface on first hit.
    2. Numbered list: step 6 added for the description assertion, keeping
       the structural schema assertion as step 5.
    3. Scope paragraph: "Scope (M3 completion — PR #881 follow-up)" →
       "Scope (M3 shipped, extended by PMAT-514 on 2026-04-18)" and counts
       updated from "all 8 Phase-1 tools" to "all 9 registered tools
       (apr.version + 8 Phase-1 wrappers)" — matches the contract
       coverage-note landed in 3266e365f.
    
    Verified: 6/6 tests still pass.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(book/mcp): sharpen 'edit YAML, rebuild' — descriptions need Rust edit too
    
    Previous prose read "The Rust source does not need editing for schemas,
    and descriptions must track the YAML verbatim" — technically implies
    descriptions auto-flow from the YAML. They don't: the description
    string is hand-written in `crates/aprender-mcp/src/tools/<tool>.rs` and
    must be mirrored manually when the YAML changes. The harness
    (`tool_descriptions_match_yaml_contract`) fails CI on divergence but
    does not auto-fix the source.
    
    Why this matters: a contributor reading the old wording would think
    editing only the YAML is enough, push, and then be surprised when CI
    fails. The new wording makes the two-file edit explicit.
    
    Future cleanup: extend `build.rs` to codegen description constants too,
    then this note can collapse back to "edit YAML only". Not in scope for
    PMAT-514 — the test-time enforcement is sufficient today.
    
    Refs PMAT-514
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(aprender-mcp): codegen tool descriptions from YAML contract
    
    Extends build.rs to emit `APR_<TOOL>_DESCRIPTION: &str` alongside the
    existing `APR_<TOOL>_SCHEMA: &str` for each tool in
    `contracts/apr-mcp-tool-schemas-v1.yaml`. All 9 tool modules now consume
    `crate::schemas::APR_<TOOL>_DESCRIPTION.to_string()` instead of
    hand-mirroring the string in Rust source.
    
    Five-whys:
    1. Why extend codegen? Descriptions drifted silently twice in a 24h
       window (apr.serve 715781df5, apr.run 91a613968).
    2. Why did the test-time gate (PMAT-514) not catch drift before merge?
       It did — but only after the drift was committed; a compile-time gate
       prevents the drift from ever building.
    3. Why split schema and description into separate constants instead of
       one merged blob? ToolDefinition's `description` is a Rust String, not
       JSON; keeping them separate avoids forcing a JSON round-trip on a
       non-JSON field.
    4. Why keep the test-layer `tool_descriptions_match_yaml_contract` if
       codegen eliminates drift? Defence in depth — catches a future
       refactor that replaces the codegen consumer with a literal.
    5. Why only 9 files to update? 8 Phase-1 wrappers + apr.version are the
       entire current tool surface. M5 tools will consume the codegen
       constants from day one.
    
    Refs PMAT-514.
    
    * test(falsify-mcp-008): codegen-layer description gate + coverage guardrail
    
    Adds two new tests to `falsify_mcp_008.rs`:
    
    * `codegen_description_constants_match_yaml` — asserts each
      `schemas::APR_<TOOL>_DESCRIPTION` codegen constant equals
      `tools[*].description` byte-for-byte. This is a strictly stronger gate
      than `tool_descriptions_match_yaml_contract`: the live-ToolDefinition
      test would silently pass if a future refactor replaced
      `APR_X_DESCRIPTION.to_string()` with a hand-coded literal. Asserting
      the codegen constant itself closes that bypass route.
    * `codegen_descriptions_cover_every_tool_name` — mirrors the existing
      `codegen_constants_cover_every_tool_name` guardrail: every name in
      `schemas::TOOL_NAMES` must appear in `CODEGEN_DESCRIPTIONS`, catching
      the case where a new tool is added to YAML but its description
      constant isn't registered in the test table.
    
    Refreshes module-level doc-comment to enumerate 7 layers of coverage
    and the dual codegen path (SCHEMA + DESCRIPTION).
    
    Test count: falsify_mcp_008 grows 6→8; aprender-mcp total 76→78.
    
    Refs PMAT-514.
    
    * docs(mcp): sync all surfaces with PMAT-514 description-codegen extension
    
    Mirrors the build.rs description-codegen change into every doc surface
    that previously said descriptions were hand-mirrored:
    
    * docs/specifications/apr-mcp-server-spec.md — FALSIFY-MCP-008 row now
      names the codegen-layer test; M3 milestone bullet points at the
      PMAT-514 extension; suite count 76→78.
    * contracts/apr-mcp-tool-schemas-v1.yaml — `condition:` prose and
      `test_harness:` / `codegen_consumer:` pointers describe both
      `APR_<TOOL>_SCHEMA` and `APR_<TOOL>_DESCRIPTION` codegen paths;
      tool-registry comment states both fields flow through build.rs.
    * book/src/tools/mcp-server.md — "edit YAML, rebuild" guidance updated:
      changing a description now requires only a YAML edit (was: YAML +
      Rust); enumerates 4 sub-tests (2 live, 2 codegen).
    * crates/aprender-mcp/README.md — gate-table row references the dual
      codegen constants.
    
    Refs PMAT-514.
    
    * chore(roadmap): PMAT-514 record description-codegen discharge line
    
    Marks the PMAT-514 roadmap entry with a DISCHARGED acceptance line
    pointing at the two-layer gate (test-layer + codegen-layer) and the
    `APR_<TOOL>_DESCRIPTION` build.rs output. The top-level "ongoing
    kaizen sweeps" acceptance stays — this is one ticket, many sweeps.
    
    Refs PMAT-514.
    
    * docs(mcp): sync remaining module-doc + README M3 bullet with PMAT-514
    
    Three surfaces still described M3 codegen as "schema only":
    
    * docs/specifications/apr-mcp-server-spec.md — file-tree build.rs
      comment now spells out both constants emitted.
    * crates/aprender-mcp/README.md — M3 milestone bullet enumerates
      `APR_<TOOL>_SCHEMA` and `APR_<TOOL>_DESCRIPTION`.
    * crates/aprender-mcp/src/lib.rs — module-doc for `schemas` now
      documents both constants, how to consume them, and that hand-coding
      either is …
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    02aa762 View commit details
    Browse the repository at this point in the history
  9. feat(task-123): native Rust pretokenize CLI — close MODEL-2 corpus gap (

    #902)
    
    * evidence(ship-two-001): MODEL-2 pretrain smoke test — task #105 discharge
    
    Records the end-to-end synthetic drive of `apr pretrain` on commit 1e7cf53
    (now landed on main at 9209383 via PR #882 merge). Verifies task #105
    deliverable: GATE-TRAIN-005 / INV-TRAIN-007 / GATE-TRAIN-008 wiring is
    functional end-to-end.
    
    Run: 20 steps, 4 epochs, batch=4, seq=128 — val_loss monotone 3.96 → 2.64.
    
    Synthetic drive caveat: no real 370M forward pass, no real corpus read, no
    checkpoint artifacts written yet. Real corpus + checkpoint wiring tracked as
    task #111.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * spec(model-2): MVP plan for task #111 (pretrain real corpus + checkpoint)
    
    7-step edit list from Plan agent afd391d1eb1395d30 against post-#882-merge commit
    9209383. Identifies 5 critical files (pretrain.rs, apr-cli/commands/pretrain.rs,
    trainer.rs, transformer/model.rs, io/save.rs) and 5 binary acceptance criteria
    (AC-111-001..005). Host assignment: lambda-labs (impl), yoga (8GB smoke),
    gx10 (parity).
    
    Non-goals explicitly deferred: async H2D streaming, full corpus-ingest pipeline,
    mixed-precision scaler tuning, distributed training, convergence budget, resume
    round-trip, nvml telemetry, apr qa post-hoc validators.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * evidence(ship-two-001): yoga parity smoke — GATE-TRAIN-006 discharged
    
    Cross-host byte-identical loss history on yoga RTX 4060 Laptop (8GB):
      lambda-labs: [3.96, 3.52, 3.08, 2.64]
      yoga:        [3.96, 3.52, 3.08, 2.64]
    
    Discharges GATE-TRAIN-006 (seed=42 deterministic) across x86_64 RTX 4090 ↔
    x86_64 RTX 4060 Laptop. Same synthetic drive — task #111 MVP will add the
    real 370M forward pass; yoga stays as 8GB smoke-test host per MVP plan's
    host assignment table.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): RealStepFn/RealValFn + shard reader (task #111 steps 1-3)
    
    Implements MODEL-2 pretrain MVP plan steps 1-3: the model-agnostic
    PretrainLoop now has a real-corpus driver that runs a full forward +
    backward + AdamW step through TransformerTrainer against the 370M
    Llama scaffold — replacing the LinearDecaySynthetic/ScriptedVal pair
    used for GATE-TRAIN-005/006/007/008 wiring verification in task #105.
    
    **New modules**
    
    - `train::shard_reader::ShardBatchIter`
      Streaming iterator over .bin token shards (little-endian u32).
      Reads seq_length+1 sequences, chunks into LMBatch of batch_size.
      Empty-dir errors; lexical shard ordering; EOF auto-advances to next
      shard. No MinHash dedup / PII scrub / license filter — those belong
      to `apr-corpus-ingest run`.
    
    - `train::pretrain_real::{RealStepFn, RealValFn, build_shared_trainer}`
      - `llama_370m_transformer_config()` field-for-field from the frozen
        Llama370MConfig constants (INV-ARCH-370M-001..008 source of truth)
      - `llama_370m_train_config(lr, seq_length, seed)` builds
        TransformerTrainConfig with MODEL-2 v2-remedy defaults
      - `SharedTrainer = Rc<RefCell<TransformerTrainer>>` so both the
        mutable StepFn and the forward-only ValFn own the same model
      - `RealStepFn::step` pulls one LMBatch, runs train_batch, returns
        (loss, grad_norm=1.0 placeholder). Exhausted iterator returns a
        finite (1.0, 1.0) so GATE-TRAIN-007 (NaN/Inf) does not mis-fire
        on shard-stream EOF before the loop plans to stop.
      - `RealValFn::validate` runs forward-only across a held-out Vec,
        returns mean cross-entropy loss (or NaN if held-out is empty).
      - `build_shared_trainer` runs INV-ARCH-370M-001 as a debug_assert
        (param count must land in [366M, 374M]) so any drift in the
        Llama370MConfig constants fails the instant a dev build compiles.
    
    **Contract coverage**
    
    Existing `contracts/training-loop-pretrain-v1.yaml` covers all MVP
    obligations already; no new contract needed. Task #111 follow-up will
    add per-epoch APR checkpoint hooks (C-TRAIN-PRETRAIN INV-TRAIN-002)
    and real optimizer-state sha256 (INV-TRAIN-003).
    
    **Tests**
    
    - shard_reader: single_shard_yields_expected_batch_count,
      empty_dir_errors, multi_shard_ordering_is_lexical
    - pretrain_real: transformer_config_matches_llama_370m_constants,
      real_step_fn_exhausted_iterator_returns_finite_placeholder,
      real_val_fn_empty_held_out_returns_nan
    
    All 6 new tests PASS. Steps 4-7 (SafeTensors→APR swap, `apr pretrain`
    CLI wiring, real grad_norm, checkpoint hook) to follow.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): wire real-corpus drive into apr pretrain (task #111 step 5)
    
    Replaces the `if !synthetic { return Err(...) }` guard with a real
    branch: build a shared 370M `TransformerTrainer`, split the shard
    stream head-off into a `HELD_OUT_BATCHES`-entry validation set, and
    drive the `PretrainLoop` with `RealStepFn`/`RealValFn` (from
    `entrenar::train::pretrain_real`) against a `ShardBatchIter`.
    
    **Structure**
    
    - `run` is now a 2-branch dispatcher. `drive_synthetic` preserves the
      deterministic decay drive used for GATE-TRAIN-005/006/007/008 wiring
      verification (task #105). `drive_real` is the new real-corpus path.
    - Both branches funnel into `run_and_report<S, V>` which owns the
      `PretrainLoop::new` + `run` + `report` sequence so the terminal
      status propagation (→ exit code) stays single-sourced.
    
    **MVP invariants (documented)**
    
    - `HELD_OUT_BATCHES = 2` — small constant; follow-up will plumb an
      explicit `--val-shards` flag so training and held-out shards are
      disjoint.
    - `pad_id = eos_id = 0` — uniform-length sequences take the shared
      layout in `LMBatch::from_sequences`, so pad_id is never used; the
      real tokenizer's special-token ids plumb through in a follow-up.
    - Empty dataset dir → `CliError::ValidationFailed` (shard iterator
      init failure), covered by the new test
      `real_mode_empty_dataset_dir_errors`.
    
    **Test changes**
    
    - `real_mode_empty_dataset_dir_errors` replaces the now-obsolete
      `synthetic_mode_false_rejected` test. Both synthetic and validation
      tests continue to pass (3/3 in `commands::pretrain::tests`).
    
    **Remaining MVP steps (task #111)**
    
    - Step 4: swap SafeTensors → APR in `trainer.rs` checkpoint writer.
    - Step 6: real optimizer-state sha256 over AdamW m/v/t (INV-TRAIN-003).
    - Step 7: per-epoch checkpoint hook in `PretrainLoop::run_epoch`
      post-gate-pass (C-TRAIN-PRETRAIN INV-TRAIN-002).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): CPU save_apr + per-epoch checkpoint hook (task #111 steps 4+7)
    
    Steps 4 and 7 of the MODEL-2 pretrain MVP (SHIP-TWO-001 v2.19.0):
    
    Step 4 — CPU save_apr
    - Add `TransformerTrainer::save_apr(path, name, arch)` in
      crates/aprender-train/src/train/transformer_trainer/trainer.rs,
      mirroring the existing CudaTransformerTrainer::save_apr. Emits a
      sovereign row-major .apr via aprender's Model + SaveConfig::Apr.
    - Existing `save()` (SafeTensors) left unchanged — three tests at
      trainer/core.rs:388,409 and tests.rs:423 still round-trip via
      safetensors for backward compat.
    - Test `save_apr_writes_readable_apr_file`: write a tiny-config
      trainer, open with `AprReader`, assert APR magic (APR\0 / APRN),
      assert `architecture` metadata round-trips, assert
      `model.embed_tokens.weight` readable as f32. PASSES.
    
    Step 7 — per-epoch APR checkpoint hook
    - Add `pub trait CheckpointFn` in train/pretrain.rs:
        `fn save(&mut self, epoch, &EpochArtifact) -> Result<(), String>`
    - Add `Option<Box<dyn CheckpointFn>>` field to `PretrainLoop` +
      builder method `with_checkpoint_fn`. Keeps PretrainLoop<S,V>
      at two generics (synthetic + real call-sites unify).
    - Wire into `run_epoch` AFTER `check_non_divergence(...)?` passes,
      BEFORE `epoch_artifacts.push()`. Aborted epochs never produce
      checkpoint files (per contract `per_epoch_artifacts` invariant).
      Write failures log eprintln but are non-fatal — a flaky disk
      cannot lose training progress.
    - Emit companion `metadata.json` (contract path_template).
    
    Real-corpus wiring
    - Add `AprCheckpointFn` in train/pretrain_real.rs holding the shared
      `Rc<RefCell<TransformerTrainer>>`; its `save()` delegates to
      `trainer.save_apr()` so the three hooks (RealStepFn, RealValFn,
      AprCheckpointFn) see the same in-memory weights.
    - Re-export `CheckpointFn` from train/mod.rs.
    
    CLI
    - `apr pretrain` --real path (drive_real): construct
      `build_shared_trainer` once, clone Rc into RealStepFn +
      RealValFn + AprCheckpointFn, pass to `run_and_report`.
    - `run_and_report` takes `Option<Box<dyn CheckpointFn>>`; synthetic
      branch passes `None` (no real weights to save).
    
    Tests (all green, 21 pretrain + 4 pretrain_real/save_apr + 3 CLI)
    - `pretrain_loop_calls_checkpoint_fn_once_per_passing_epoch`:
      mock `CheckpointFn` counts calls. Every successful epoch fires
      exactly one call; companion metadata.json written to disk.
    - `pretrain_loop_skips_checkpoint_on_abort`: NaN step forces
      abort; mock hook recorded zero calls.
    - `save_apr_writes_readable_apr_file`: magic + metadata + tensor
      round-trip via AprReader.
    
    Contract discharge
    - GATE-TRAIN-005 invariant preserved: checkpoint placement AFTER
      divergence guard means aborted epochs never touch disk.
    - training-loop-pretrain-v1 `per_epoch_artifacts.path_template`
      honored: `{run_dir}/ckpt/epoch-{N:03d}.apr` + `.metadata.json`.
    
    Deferred (Step 6)
    - `fake_optimizer_sha(epoch)` at pretrain.rs:680 still returns a
      placeholder. INV-TRAIN-003 discharge needs TransformerTrainer
      to expose AdamW m/v/t buffers for a real sha256. Separate step.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): real AdamW optimizer-state sha256 (task #111 step 6)
    
    INV-TRAIN-003 discharge for the MODEL-2 pretrain MVP.
    
    TransformerTrainer::optimizer_state_sha256()
    - New accessor in crates/aprender-train/src/train/transformer_trainer/trainer.rs
      that hashes (t, m_buffers, v_buffers) in fixed order.
    - Uses sha2::Sha256 + bytemuck::cast_slice over each Array1<f32>.
    - Versioned tag "aprender-train:adamw:optstate:v1" prefixes the
      digest so schema changes are loud, not silent.
    - Uninitialized slots hash to the literal "none" so missing m[i]
      is semantically distinct from an all-zeros m[i].
    
    StepFn trait extension
    - Add `fn optimizer_state_sha256(&self) -> Option<String>` with
      default `None`. Synthetic harnesses keep returning None and
      continue using the `fake_optimizer_sha` epoch/seed fallback.
    - `PretrainLoop::run_epoch` now reads `step_fn.optimizer_state_sha256()`
      and falls back to the fake fingerprint only when None.
    
    RealStepFn override
    - RealStepFn in pretrain_real.rs implements the new hook by
      delegating to `trainer.borrow().optimizer_state_sha256()`, so
      the real-corpus path records the actual AdamW digest.
    
    Tests (all 25 + 3 green)
    - `optimizer_state_sha256_is_hex_digest_on_fresh_trainer`: 64-char
      lowercase hex shape check on an un-stepped trainer.
    - `optimizer_state_sha256_is_stable_across_fresh_trainers`: two
      fresh trainers hash to the same digest (reproducibility).
    - `pretrain_loop_uses_step_fn_optimizer_sha_when_available`:
      a StepFn with override wins over fake_optimizer_sha.
    - `pretrain_loop_falls_back_to_fake_optimizer_sha_for_synthetic`:
      default impl still produces a 64-char hex digest via fallback.
    
    Task #111 MVP status
    - Steps 1-3 shipped in commit b2b0329
    - Step 5 shipped in commit e5a2f02
    - Steps 4+7 shipped in commit 89db4b3
    - Step 6 shipped in this commit
    - All 7 steps of the task #111 plan are now committed.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-021 seed=0 × 100-step reproducibility harness
    
    Discharges GATE-TRAIN-006 / INV-TRAIN-006 from training-loop-pretrain-v1
    (bumped 1.0.0 → 1.1.0 PROPOSED → ACTIVE).
    
    Two new Rust tests in crates/aprender-train/src/train/transformer_trainer/tests.rs:
    - falsify_ship_021_seed_0_100_step_reproducibility: two trainers built with
      seed=0 produce identical finite losses for 100 consecutive train_batch
      calls (|Δ| ≤ 1e-6) AND identical AdamW optimizer_state_sha256 digests.
    - falsify_ship_021_different_seeds_do_diverge: seed=0 vs seed=1 counter-test
      must diverge > 1e-4 within 10 steps (guards against degenerate "always
      equal" implementations).
    
    Seed plumbing fixes:
    - TransformerTrainer::new now calls lock_init_seed(config.seed) before
      Transformer::new so direct (non-YAML) callers honor the configured seed
      instead of silently inheriting the global default of 42.
    - transformer::init::INIT_SEED_LOCK (std::sync::Mutex) + lock_init_seed
      helper returning a #[must_use] MutexGuard. Held across the full
      Transformer::new call so cargo test's default parallel runner cannot
      clobber the global atomic INIT_SEED between one test's set_init_seed
      and another test's weight-init reads. Poisoned mutex is recovered
      transparently (seed itself is atomic; poison only signals prior panic).
    
    Contract uplift (contracts/training-loop-pretrain-v1.yaml v1.1.0):
    - status PROPOSED → ACTIVE
    - INV-TRAIN-006 gains harness: block naming both test paths + assertions
    - GATE-TRAIN-006 gains evidence_discharged_by: pointing to both tests
    - metadata.changelog entry recording the discharge
    
    Verification:
      cargo test -p aprender-train --lib falsify_ship_021 → 2 passed
      cargo clippy -p aprender-train --lib --no-deps -- -D warnings → clean
      pv validate contracts/training-loop-pretrain-v1.yaml → 0 errors
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(ship-two): FALSIFY-SHIP-022 apr inspect provenance (AC-SHIP2-012)
    
    Discharges FALSIFY-SHIP-022: apr inspect surfaces license + data_source
    + data_license on every .apr, with "(missing)" / null rendering when a
    field is absent rather than silent skip. Makes a .apr binary a
    sufficient provenance-audit artifact (no sidecar manifest required).
    
    Contract: contracts/apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0,
    ACTIVE, kind: schema). 3 invariants + 3 gates + 3 failure modes, all
    bound to AC-SHIP2-012 / FALSIFY-SHIP-022. pv validate PASS.
    
    Code changes:
    - AprV2Metadata: add data_source + data_license as named Option<String>
      fields (not buried in custom HashMap). No skip_serializing_if, so JSON
      round-trips them as null when None (FM-APR-PROV-SILENT-SKIP).
    - apr inspect MetadataInfo: mirror all 3 provenance fields, also with
      no skip_serializing_if.
    - apr inspect text output: new "Provenance:" block via pure helper
      format_provenance_block() — always emits all 3 keys, renders None as
      literal "(missing)".
    - Two struct-literal construction sites updated for new fields.
    
    Harness tests (5 passing):
    - aprender-core:
      - falsify_ship_022_apr_metadata_provenance_round_trip
      - falsify_ship_022_inspect_emits_provenance_keys (JSON null half)
      - falsify_ship_022_partial_provenance_round_trip
    - apr-cli:
      - falsify_ship_022_inspect_emits_provenance_keys (MetadataInfo JSON)
      - falsify_ship_022_inspect_missing_renders_as_missing (text half)
      - falsify_ship_022_inspect_populated_renders_values
    
    Smoke test: apr inspect on existing .apr (no provenance stored)
    correctly emits:
      Provenance:
        license: (missing)
        data_source: (missing)
        data_license: (missing)
    
    cargo fmt + cargo clippy (aprender-core, apr-cli) clean.
    3239 aprender-core format tests PASS, 85 apr-cli inspect tests PASS.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(ship-two): v2.20.0 amendment — FALSIFY-SHIP-021 + FALSIFY-SHIP-022 DISCHARGED
    
    Documents two MODEL-2 ship gates closed in the post-v2.19 evidence window:
    
    1. FALSIFY-SHIP-021 (AC-SHIP2-011) — seed=0 × 100-step reproducibility
       harness + counter-test seed=0 vs seed=1 divergence proof. Root cause
       of original flake (sibling test racing on global INIT_SEED atomic)
       fixed via lock_init_seed(seed) -> MutexGuard. Contract
       training-loop-pretrain-v1.yaml bumped 1.0.0 → 1.1.0 ACTIVE.
       Commit 0b8ca8c, task #112.
    
    2. FALSIFY-SHIP-022 (AC-SHIP2-012) — apr inspect provenance block
       (license + data_source + data_license) shipped. AprV2Metadata
       extended with 2 named Option<String> fields; no skip_serializing_if
       (FM-APR-PROV-SILENT-SKIP guard). Pure helper format_provenance_block
       replaces stdout-capture in tests (gag is NOT parallel-safe).
       New contract apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0
       ACTIVE, kind: schema). pv validate PASS. Commit 8f0607d,
       task #113.
    
    Combined status: 2/12 AC-SHIP2 gates DISCHARGED. Remaining 10 block
    on 370M compute-dispatch (the long-pole from v2.19.0).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-011 llama-370m sovereign contract ACTIVE (AC-SHIP2-001)
    
    Discharges FALSIFY-SHIP-011 / AC-SHIP2-001 — MODEL-2 370M architectural
    contract registered AND byte-equally bound to the Rust scaffold that
    aprender-train consumes.
    
    Contract lift:
    - contracts/model-families/llama-370m-sovereign-v1.yaml
      - version 1.0.0 → 1.1.0
      - status PROPOSED → ACTIVE
      - GATE-ARCH-370M-001 gains evidence_discharged_by (4 entries) and
        ship_blocking: true
      - changelog block added documenting the v1.1.0 discharge
    
    Harness tests (crates/aprender-train/src/models/llama_370m.rs):
    - `falsify_ship_011_rust_scaffold_matches_yaml_contract` — loads the
      contract via include_str! (compile-time-embedded, no path deps at
      runtime) and asserts every architecture.* and constraints.* key
      matches the corresponding Llama370MConfig::* const byte-equally
    - `falsify_ship_011_sovereign_contract_is_active` — asserts status ==
      ACTIVE (a PROPOSED contract cannot gate a ship)
    
    Test run: 6/6 aprender-train::models::llama_370m tests PASS (4 pre-
    existing + 2 new). pv validate on contract: 0 errors, 0 warnings.
    
    Why this discharge is strong:
    - Rust scaffold already encodes INV-ARCH-370M-002..008 as compile-time
      `const _: () = Llama370MConfig::validate();` — a drift of any value
      fails `cargo build`, not just `cargo test`
    - The new YAML-vs-Rust binding test adds the missing half: drift of a
      YAML key that the Rust scaffold doesn't mirror is now also caught at
      test time, preventing the MODEL-1-v2 QLoRA class of recipe/artifact
      drift (rank=16 actual vs rank=32 recipe — see
      project_ship_two_001_model1_qlora_divergence.md)
    - INV-ARCH-370M-001 (param count band) is discharged by the existing
      `estimated_param_count_within_contract_band` test
    - INV-ARCH-370M-009 (row-major layout) is discharged by
      aprender::format::layout_contract at APR load time
    
    Combined MODEL-2 status after this commit: 3/12 AC-SHIP2 gates
    DISCHARGED (001, 011, 012). Remaining 9 (002–010) still block on
    actual 370M training compute-dispatch — the pretrain loop driver from
    v2.19.0 is ready to exercise them once the weights exist.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-012 algorithm-level PARTIAL discharge (AC-SHIP2-002)
    
    Bumps C-TOK-BPE to v1.1.0 and wires evidence_discharged_by into
    GATE-BPE-003 pointing at 3 existing harness tests in
    crates/apr-cli/tests/falsify_ship_012_tokenizer_roundtrip.rs and
    the emitted evidence JSON at
    evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json.
    
    Status intentionally stays PROPOSED. The gate requires 10K-doc
    byte-exact round-trip on The Stack v2 Python holdout; task #91 shipped
    the ingest scaffold (corpus-ingest dry-run CLI) but the 10K fixture
    itself is not yet materialized — so this lands as PARTIAL_ALGORITHM_LEVEL
    discharge with full_discharge_blocks_on: task #91 data.
    
    What passes algorithm-level today (all 3 tests green at commit time):
    - falsify_ship_012_tokenizer_roundtrip_byte_exact — decode(encode(nfc(doc)))
      byte-equals nfc(doc) on every doc in a 20-doc synthetic Python-like
      holdout (ASCII keywords + Unicode identifiers + docstrings + emoji +
      combining marks). Hard-asserts evidence.docs_failed == 0 — regressions
      reintroducing whitespace splitting or dropping the byte encoder panic.
    - falsify_ship_012_nfc_idempotence_only — INV-BPE-005 standalone: nfc(nfc(x))
      byte-equals nfc(x) on every holdout doc.
    - falsify_ship_012_train_corpus_sanity — train/holdout set disjointness
      plus minimum corpus sizes (>=20 docs each).
    
    When task #91's 10K Stack-v2 Python holdout lands the fixture swap is
    data-only: the harness module doc-comment already flagged this path so
    no test rewrite will be required.
    
    Evidence: evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json
    (20/20 passed, nfc_idempotent: true, vocab_size_trained: 489/512).
    
    Verification:
    - pv validate contracts/tokenizer-bpe-v1.yaml -> 0 errors, 0 warnings
    - cargo test -p apr-cli --test falsify_ship_012_tokenizer_roundtrip -> 3/3 passed
    
    Bound to: AC-SHIP2-002 (ship-two-models-spec §5).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-015 algorithm-level PARTIAL discharge (AC-SHIP2-005)
    
    Bumps C-LLAMA-370M-SOVEREIGN v1.1.0 → v1.2.0 and wires
    evidence_discharged_by into GATE-ARCH-370M-003 (the param-count gate that
    binds AC-SHIP2-005 via FALSIFY-SHIP-015). Contract stays ACTIVE — the
    FALSIFY-SHIP-011 discharge (v1.1.0) is what gates the ACTIVE promotion,
    not SHIP-015.
    
    GATE-ARCH-370M-003's evidence_required asks for
      apr inspect --json model.apr | jq '.param_count' ∈ [366M, 374M]
    on a real 370M `.apr` checkpoint. That file does not exist yet — it
    blocks on AC-SHIP2-003/004 pretraining compute-dispatch. Rather than
    leave the gate's evidence blank, this commit wires the algorithm-level
    proof that already exists:
    
    - estimated_param_count() / estimated_stored_param_count() — const fn
      over Llama370MConfig::*, so the count is computed at compile time.
    - estimated_param_count_within_contract_band (unit test) hard-asserts:
        * p ∈ [PARAMETERS_MIN=366M, PARAMETERS_MAX=374M]  (INV-ARCH-370M-001)
        * |p − 370M| / 370M < 5%                          (tighter sanity)
        * p − stored == VOCAB_SIZE × HIDDEN_DIM           (tied embeddings)
    
    Any edit to Llama370MConfig that moves the count out of the
    INV-ARCH-370M-001 band fails `cargo test -p aprender-train --lib
    llama_370m` — before any compute runs.
    
    The gate now carries:
      discharge_status: PARTIAL_ALGORITHM_LEVEL
      full_discharge_blocks_on: "real 370M .apr checkpoint from pretraining
                                 compute-dispatch (AC-SHIP2-003/004)"
      ship_blocking: true
    
    so the data-scale gap is first-class contract state, not an unspoken
    assumption.
    
    Verification:
    - pv validate contracts/model-families/llama-370m-sovereign-v1.yaml
      -> 0 errors, 0 warnings
    - cargo test -p aprender-train --lib models::llama_370m
      -> 6/6 passed (including the newly-cited
         estimated_param_count_within_contract_band and the pre-existing
         falsify_ship_011_* pair)
    
    MODEL-2 AC-SHIP2 ledger after this: 3/12 fully ACTIVE (001, 011, 012)
    + 2/12 PARTIAL (002 via SHIP-012, 005 via SHIP-015) = 5/12 touched.
    Remaining 7 (003/004/006/007/008/009/010) block on 370M compute.
    
    Bound to: AC-SHIP2-005 (ship-two-models-spec §5).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(ship-two-001): spec v2.21.0 — FALSIFY-SHIP-011 DISCHARGED + SHIP-012/015 PARTIAL
    
    Captures the three evidence-wiring commits landed on
    chore/post-v2.19-evidence since v2.20.0:
    
    1. FALSIFY-SHIP-011 (AC-SHIP2-001) DISCHARGED at 338c6eb (task #114)
       C-LLAMA-370M-SOVEREIGN v1.0.0 PROPOSED -> v1.1.0 ACTIVE.
       Rust-YAML byte-equality binding via include_str! + serde_yaml::Value.
    
    2. FALSIFY-SHIP-012 (AC-SHIP2-002) PARTIAL_ALGORITHM_LEVEL at 2e8b8b8
       (task #115). C-TOK-BPE v1.0.0 -> v1.1.0 stays PROPOSED.
       3 tokenizer harness tests wired; full discharge blocks on task #91
       10K Stack-v2 Python holdout (fixture-swap is data-only).
    
    3. FALSIFY-SHIP-015 (AC-SHIP2-005) PARTIAL_ALGORITHM_LEVEL at bfb8831
       (task #116). Sovereign contract v1.1.0 -> v1.2.0 stays ACTIVE.
       estimated_param_count_within_contract_band + const fns wired;
       full discharge blocks on real 370M .apr from compute-dispatch.
    
    Also codifies the PARTIAL_ALGORITHM_LEVEL pattern as a first-class
    spec concept: when a gate's evidence_required describes a
    production-scale check that is not yet runnable but the underlying
    invariant is provable today at algorithm/compile/unit-test level,
    wire the algorithm proofs and carry discharge_status +
    partial_discharge_note + full_discharge_blocks_on + ship_blocking=true
    to make the data gap first-class contract state.
    
    MODEL-2 ship-gate status after v2.21.0: 3/12 fully ACTIVE (001, 011,
    012) + 2/12 PARTIAL_ALGORITHM_LEVEL (002, 005) = 5/12 touched (~42%).
    Remaining 7 block on real 370M compute-dispatch.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(model-2): FALSIFY-SHIP-019 algorithm-level PARTIAL discharge (AC-SHIP2-009)
    
    GATE-ARCH-370M-004 gains evidence_discharged_by + discharge_status:
    PARTIAL_ALGORITHM_LEVEL. Three algorithm-level invariants wired without
    training:
    
      1. Coverage — every 370M tensor (219 entries: 1 embed + 1 lm_head +
         9 per-layer × 24 layers + 1 final norm) resolves to a
         TensorContract entry in LayoutContract::new(). Pattern-normalises
         per-layer names; any uncovered tensor would be silently skipped
         by GGUF export.
    
      2. Row-major ordering (INV-ARCH-370M-009) — every 2D shape is
         [out_dim, in_dim]. Pinned lm_head/embed/q_proj/k_proj shapes
         verify GQA (k_proj = [kv_heads*head_dim, hidden]) and bind the
         370M architecture to the GH-202-regression-proof layout.
    
      3. Critical-tensor enforcement — validate_apr_shape accepts
         [vocab, hidden] AND rejects reversed [hidden, vocab] on
         lm_head.weight. Proves the validator catches layout bugs, not
         just passes silently.
    
    Full discharge (GGUF cosine-parity on trained 370M, max_logit_cosine
    ≤ 1e-3 over 100 canary prompts) blocks on compute-dispatch
    (AC-SHIP2-003/004). Harness is fixture-swap-ready once a trained .apr
    exists — no test rewrite needed. Spec §9 Risk #2 names this exact
    mitigation path.
    
    Contract: llama-370m-sovereign-v1.yaml v1.2.0 → v1.3.0, stays ACTIVE.
    Tests: 2 new test fns in crates/aprender-train/src/models/llama_370m.rs
    (8/8 pass). `pv validate` = 0 errors, 0 warnings.
    
    Closes #117. Binds to AC-SHIP2-009 / FALSIFY-SHIP-019.
    
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(ship-two-001): v2.22.0 — FALSIFY-SHIP-019 PARTIAL discharge capstone
    
    Records the SHIP-019 algorithm-level PARTIAL discharge (task #117,
    commit 846cc1d) in the authoritative spec:
    
    - Version bump 2.21.0 → 2.22.0
    - Full amendment block #4 under post-v2.19 evidence window documenting
      GATE-ARCH-370M-004 wired to `layout_contract.rs` algorithm proofs
      (219-tensor coverage + row-major ordering + GH-202 rejection)
    - New "counter-example hunting" pattern lesson: prior "exhausted
      PARTIAL levers" verdict was ~86% correct; re-running the 7-gate
      FALSIFY-SHIP survey with explicit counter-example hunting found
      exactly one genuine lever (SHIP-019). SHIP-017/018/020 need compute;
      SHIP-013/014/016 collapse into SHIP-011 wiring.
    - Combined MODEL-2 ledger: 3/12 fully ACTIVE + 3/12 PARTIAL = 6/12
      touched (50%). Remaining 6 (003/004/006/007/008/010) all require
      real 370M compute, trained .apr + eval harness, or RTX 4090
      wall-clock benchmark. Genuine algorithm-level PARTIAL harvesting for
      MODEL-2 is now exhausted.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * chore(publish): mark 5 QA harness crates publish = false + document policy
    
    Evidence: aprender-qa-{cli,gen,runner,report,certify} have never been
    published to crates.io (verified against crates.io API 2026-04-19).
    They are reached through `apr qa` (the user-facing binary), not through
    `cargo add`, so marking them publish = false prevents accidental
    version-bump-with-no-publish drift across the workspace.
    
    Spec §A.12 rewritten from the stale "63 crates (49 published + 14 internal)"
    snapshot to the real 80-crate layout: 9 publish = false (4 benchmarks/xtask
    + 5 QA harness) plus 71 publishable. §A.12.1 codifies publishing policy:
    three opt-out categories (benchmarks, xtask, QA harness), and the rule
    that a v0.31.0-style release does NOT require cargo publish across all
    80 crates — crates.io publish is selective (via cargo workspaces publish
    --from-git or cargo publish -p <name>), workspace-wide tag/release is not.
    
    Verified: cargo check --workspace clean after the flip.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * docs(mcp-spec): refresh header — M1–M3 SHIPPED in v0.31.0, M4 in flight
    
    Five-whys on the stale 2026-04-17 draft status:
    1. Why stale? Spec said "DRAFT (pre-implementation)" + target "v0.32.0"
       but M1–M3 actually shipped in v0.31.0 on 2026-04-19 (tag 62893da).
    2. Why not refreshed? M1–M3 landed across multiple PRs without a
       spec-header refresh pass.
    3. Why is that a problem? New contributors reading the spec think MCP
       is unshipped — contradicted by `cargo install aprender` already
       exposing `apr mcp` with 9 tools.
    4. Root cause: spec headers are not on the release checklist.
    5. Fix here: update status to ACTIVE, version to 1.2.0, delivery line
       to "v0.31.0 M1–M3 SHIPPED / M4 in flight (PRs #886-892)". No body
       changes — architecture/tool-surface/protocol sections are still
       accurate.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * chore(publish): mark aprender-viz-ttop publish = false + 4th category
    
    Evidence: `aprender-viz-ttop` has never been published to crates.io
    (release workflow explicitly never invokes `cargo publish` for it).
    Its `description` field calls it a "Terminal Top: 10X better than btop"
    system monitor — ships as a binary subcommand inside the `apr` facade,
    not as a library dependency.
    
    Five-whys:
    1. Why flip it? Because it's a bundled binary, not a library.
    2. Why does that matter? `cargo add aprender-viz-ttop` would mislead
       library authors into taking a user-facing TUI as a dep.
    3. Why wasn't it already flipped? It predated the A.12 policy audit
       performed in 42907db.
    4. Why a 4th category? Benchmarks / xtask / QA harness all leave
       outputs as artifacts; this one ships a runnable subcommand. The
       distinction matters because `apr cbtop` dispatches to it.
    5. Why document it? To prevent a future reader from re-opening the
       "publish all 80 crates" question when we only publish ~70.
    
    Changes:
    - crates/aprender-viz-ttop/Cargo.toml: add `publish = false`
    - docs/specifications/aprender-monorepo-consolidation.md:
      - §A.12: add viz-ttop to internal-crates table (10 rows)
      - §A.12.1: add 4th category (Bundled binaries); update total to
        "10 opted out / 70 publishable"; remove stale "Candidates to
        migrate" paragraph (superseded by 42907db + this commit)
    
    Refs: APR-MONO, PR #901
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * feat(task-123): native Rust pretokenize CLI — close MODEL-2 corpus gap
    
    Root-cause fix for pretokenize-to-.bin gap that was blocking task #119
    MODEL-2 370M real-compute pretrain smoke. User 2026-04-19 callout
    "why not fix root cause vs 'hack'" rejected the Python shim path.
    
    What ships (uncommitted WIP in `pretrain.rs`/`llama_370m.rs` left out):
    
    - `contracts/pretokenize-bin-v1.yaml` v1.0.0 PROPOSED
      * `pv validate` PASS (0 errors / 0 warnings)
      * GATE-PRETOK-003 ship-blocking round-trip gate gains
        `evidence_discharged_by` (4 tests) + `discharge_status:
        PARTIAL_ALGORITHM_LEVEL`. Full discharge still blocks on
        cross-host byte-identical test (task #119 lambda-labs dispatch).
    
    - `BPETokenizer::from_vocab_merges(vocab, merges, cfg)` loader
      (crates/aprender-train/src/tokenizer/bpe.rs)
      * Reads HEX-encoded vocab.json + merges.txt
      * Detects id collisions, rejects orphan merges
      * 2 new round-trip tests PASS
    
    - `apr tokenize encode-corpus` CLI subcommand
      (crates/apr-cli/src/commands/tokenize.rs::run_encode_corpus,
       crates/apr-cli/src/tokenize_commands.rs,
       crates/apr-cli/src/dispatch_analysis.rs)
      * Gated `#[cfg(feature = "training")]`
      * Writes `shard-NNNNN.bin` (u32 LE) + `manifest.json` (schema
        `pretokenize-bin-v1`)
      * Flags: --corpus --tokenizer --output --shard-tokens
        --content-field --normalization --eos-policy
      * EOS lookup order: `</s>`, `<|endoftext|>`, `<eos>`, `<|eos|>`
      * "between" policy fix: emit EOS BEFORE each doc except the
        first (N-1 separators for N docs)
    
    - `tests/pretokenize_shard_roundtrip.rs`
      * `cli_shard_layout_is_read_by_shard_batch_iter`
        — INV-PRETOK-002 + INV-PRETOK-007
      * `multi_shard_names_preserve_order` — INV-PRETOK-004
    
    - `evidence/ship-two-001/pretokenize-bin-v1-partial-discharge.json`
      documents algorithm-level partial discharge.
    
    Manual dogfood: 5-doc fixture → 78 tokens / 1 shard / 312 bytes /
    4 EOS separators (N-1 for between-policy) / EOS id = 2 (`</s>`).
    
    Next session: wait on task #118 (50257-vocab tokenizer training,
    PID 2832743, 79min+) then run `apr tokenize encode-corpus` on
    CSN-Python train split and dispatch to lambda-labs RTX 4090.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    2de4546 View commit details
    Browse the repository at this point in the history
  10. fix(qa): format_parity SKIPs on non-GGUF primary instead of FAILing (#…

    …907)
    
    `apr qa <safetensors>` previously exited 5 with "Failed gates: format_parity"
    because the format_parity gate treated a non-GGUF primary as a failure rather
    than as a category mismatch. This surfaced in the 2026-04-19 MCP M4 free-form
    integration session (see evidence/mcp/m4-freeform-session-2026-04-19.md) —
    11 of 12 gates correctly SKIPped on non-GGUF, but format_parity alone FAILed,
    which in turn made apr.qa via the MCP surface return isError=true on otherwise
    healthy SafeTensors checkpoints.
    
    Peer gates (ollama_parity, ptx_parity, capability_match, gpu_state_isolation,
    gpu_speedup) all SKIP cleanly on non-GGUF with a reason string. This aligns
    format_parity with that convention.
    
    Also reorders the check so we peek the primary's 8-byte magic *before*
    resolving the SafeTensors reference or reading the full (potentially multi-GB)
    GGUF blob. When the primary isn't GGUF, we skip before doing any expensive
    work. The P0-QA-001 "never silently skip" invariant still holds — when the
    primary IS GGUF but the reference SafeTensors can't be found, the gate still
    FAILs with the actionable `huggingface-cli download` hint.
    
    Regression coverage:
    - format_parity_skips_safetensors_primary — creates a minimal .safetensors
      and asserts SKIP + "Non-GGUF" message
    - format_parity_skips_apr_primary — same for APR magic bytes
    
    Verified end-to-end: `apr qa qwen2.5-coder-0.5b-instruct/model.safetensors`
    now exits 0 with "All QA gates passed (6 executed, 6 skipped)".
    noahgift authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    f4ff5bb View commit details
    Browse the repository at this point in the history
  11. feat(mcp-m5): add pmcp v2.3 optional dep behind pmcp-dispatcher featu…

    …re (#908)
    
    First slice of the M5 dispatcher migration. Zero behaviour change — the
    hand-rolled JSON-RPC stdio loop in `server.rs` still owns every request
    today. This PR only wires the dep so subsequent PRs can land the pmcp
    code paths incrementally behind a feature flag.
    
    ## What changes
    
    - `crates/aprender-mcp/Cargo.toml`: add `pmcp = { version = "2.3", optional
      = true }` + new `pmcp-dispatcher` feature that selects it. Off by default.
    - `Cargo.lock`: aprender-mcp entry now lists pmcp 2.3.0. No new transitive
      deps — pmcp 2.3.0 was already resolved in the workspace via
      `aprender-orchestrate` (MCP client side), so this is a single-line lock
      update, not a new tree branch. Workspace versions stay aligned per the
      spec's "M5+ will migrate the dispatcher to `pmcp` v2.x; workspace
      version must stay aligned" risk mitigation (§Risks).
    - Spec §M5 first checkbox flipped; header moves PLANNED → IN PROGRESS.
    
    ## PR plan (per the M5 research brief, 2026-04-19)
    
    1. **this PR** — pmcp optional dep, feature flag off. FALSIFY-MCP-009
       deferred until PR 2 lands the actual parity test.
    2. **PR 2 (next)** — `src/transports/{mod,stdio}.rs`: stdio wrapper that
       delegates to either the hand-rolled loop or `pmcp::Server` based on
       the feature; `tests/falsify_mcp_009.rs` asserts byte-identical
       `tools/list` + `tools/call` wire output across both paths.
    3. PR 3 — per-tool `pmcp::Handler` registration (FALSIFY-MCP-010,
       preserving FALSIFY-MCP-008 codegen).
    4. PR 4 — cancellation port + `apr.serve` daemon tracking
       (FALSIFY-MCP-011).
    5. PR 5 — `--transport sse --port N` (FALSIFY-MCP-012).
    6. PR 6 — `--transport websocket` (FALSIFY-MCP-013) + re-run the full
       falsification suite.
    
    ## Verification
    - `cargo check -p aprender-mcp` — green with default features
    - `cargo check -p aprender-mcp --features pmcp-dispatcher` — green; pmcp
      2.3.0 resolves from the existing workspace lockfile entry
    - `cargo test -p aprender-mcp` — all 51 lib + 8 falsify_m1 + 4 falsify_006
      + 8 falsify_008 + 4 falsify_progress_001 + 5 falsify_003 + 5 falsify_004
      + 2 falsify_schema + 1 doctest still PASS
    - `cargo tree -d -p aprender-mcp --features pmcp-dispatcher` — no
      duplicate deps
    noahgift authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    25bfdd6 View commit details
    Browse the repository at this point in the history
  12. chore(release): v0.31.1 — QA format_parity SKIP fix + MCP M5 pmcp sca…

    …ffold (#909)
    
    * chore(release): v0.31.1 — QA format_parity SKIP fix + MCP M5 pmcp scaffold
    
    Cuts a patch release combining the two PRs that landed post-v0.31.0 (#907, #908)
    plus incidental `cargo fmt --all` normalization across the workspace.
    
    Wire-level changes:
    - Workspace + root `aprender` package bumped 0.31.0 → 0.31.1
    - All 60+ path-dep pins updated in lockstep so the published crates resolve
      against each other without range-version drift
    - `opentelemetry` / `opentelemetry_sdk` / `opentelemetry-otlp` kept at 0.31.0
      (external deps — global sed caught them as false-positives during bump)
    
    CHANGELOG.md covers:
    - Fixed: `apr qa` format_parity gate SKIPs non-GGUF primaries (#907)
    - Added: `pmcp = "2.3"` optional dep behind `pmcp-dispatcher` feature (#908)
    
    Pre-push gates:
    - cargo fmt --all (applied — 45 lines across 17 non-Cargo files)
    - cargo test -p aprender-contracts --lib: 1371 passed / 0 failed
    - cargo deny check advisories: advisories ok
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    * ci(release): disable sccache — sovereign-ci runner image missing rustc-sccache wrapper
    
    The `sovereign-ci:stable` container image is currently missing the
    `rustc-sccache` wrapper script, causing every sccache-gated CI job
    (ci/test, ci/lint, ci/coverage, ci/gate) to fail at the rustc probe:
    
        error: could not execute process `rustc-sccache /.../rustc -vV`
          (never executed)
        Caused by: No such file or directory (os error 2)
    
    Reruns fail identically — not transient.
    
    Workaround: flip `enable_sccache: true → false` on this repo's
    workflow input. The sovereign-ci reusable workflow reads this to set
    `RUSTC_WRAPPER: ${{ inputs.enable_sccache && 'rustc-sccache' || '' }}`,
    so disabling it removes the wrapper entirely and CI builds proceed
    normally (just slower, since there's no compile cache).
    
    Revert once paiml/.github ships a runner image with the wrapper
    present — this is a temporary unblock for the v0.31.1 release PR.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    noahgift and claude authored Apr 19, 2026
    Configuration menu
    Copy the full SHA
    61dbd29 View commit details
    Browse the repository at this point in the history
Loading