feat(apr-cli + aprender-train): apr pretrain --init wireup — §50.4 step 5f.4#1494

Merged
noahgift merged 1 commit into main from feat/cli-wireup-init-pretrain
May 5, 2026

@noahgift noahgift commented May 5, 2026

Summary

Wire `apr pretrain --init <PATH>` end-to-end so the step 5g LIVE 500-step fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err with the actual init-tensor load + trainer populate path that §50.4 steps 5f.1/5f.2/5f.3 made possible.

Architecture

Two functions:

  1. `entrenar::train::pretrain_real::build_shared_trainer_with_init` — composes 5c (polymorphic dispatch) + 5f.1 (encoder rejection) + 5f.2 (load) + 5f.3 (populate). `init=None` preserves from-scratch baseline. `init=Some` validates arch family, builds polymorphic config, loads tensors, populates.

  2. `apr-cli::commands::pretrain::run` — extracts init APR's TransformerConfig via existing `model_config::read_apr_architecture`, plumbs through `drive_real → drive_real_cpu → build_shared_trainer_with_init`. Polymorphic preflight now receives EXTRACTED vocab.

Discharges (`apr-pretrain-arch-polymorphic-v1`)

  • §init_load_semantics integration: load + populate composed end-to-end
  • §arch_extraction_signature integration: `read_apr_architecture` wired
  • §qwen_tokenizer_vocab_compatibility integration: extracted vocab flows into preflight call site
  • FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level

The legacy "not yet wired" guard from §49 step 4 is RETIRED.

NOT in this PR

  • CUDA path (5f.5 follow-up): `drive_real_cuda` fail-fasts when `--init` set (FALSIFY-APR-PRETRAIN-INIT-CUDA-001).
  • Step 5g LIVE: dispatchable now; running 500 steps is operator action.

Tests (4 new + 2 retrofitted, all pass)

`aprender-train::pretrain_real::tests` (4 new):

  • `_none_uses_llama370m_shape` (regression-free)
  • `_rejects_unpaired_args` (caller-bug guard)
  • `_rejects_encoder_family` (FALSIFY-007 integration)
  • `_decoder_family_proceeds_to_tensor_load` (failure ordering)

`apr-cli::commands::pretrain` (2 retrofitted):

  • `_valid_magic_but_bogus_metadata_fails_at_arch_extraction`
  • `_v1_magic_aprn_passes_validate_init_apr_path`

19/19 + 23/23 pass. `cargo clippy` clean.

Five Whys

  1. Why was 5f.4 needed? §50 decomposition missed the CLI-dispatch seam (§52 caught it).
  2. Why is removing the safety Err load-bearing? §28 SHIP-007: silent random-init via half-implementation = "silent gibberish" defect class.
  3. Why a separate builder? `build_shared_trainer` enforces INV-ARCH-370M-001 which only applies to Llama370M.
  4. Why fail-fast on CUDA + --init? Same reasoning as Why 2.
  5. Why not in #1483 (feat(aprender-train): populate_trainer_from_init_tensors, §50.4 step 5f.3)? Different crate, different review concern.

Cascade context

Once 5f.4 lands AND 5g produces val_loss < 9.38, MODEL-2 ship % moves 57% → ≥58%.

🤖 Generated with Claude Code

…ep 5f.4

## Summary

Wire `apr pretrain --init <PATH>` end-to-end so step 5g LIVE 500-step
fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err
with the actual init-tensor load + trainer populate path that
§50.4 steps 5f.1/5f.2/5f.3 made possible.

## Architecture

Two functions added/changed:

1. `entrenar::train::pretrain_real::build_shared_trainer_with_init` —
   composes the §50.4 step-5f machinery (5c polymorphic dispatch +
   5f.1 encoder rejection + 5f.2 load + 5f.3 populate) into a single
   trainer-builder entry. init=None preserves the from-scratch baseline
   byte-equivalent to `build_shared_trainer`. init=Some validates arch
   family, builds the polymorphic config, loads tensors, populates.

2. `apr-cli/src/commands/pretrain.rs::run` — now extracts the init APR
   file's TransformerConfig via existing `model_config::read_apr_architecture`
   when `--init` is set, then plumbs both `init_arch` and `init_path`
   through `drive_real → drive_real_cpu → build_shared_trainer_with_init`.
   The polymorphic preflight (§50.4 step 5d) already accepted an EXTRACTED
   vocab; this PR wires the call site to actually pass it.
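
The decision order described for `build_shared_trainer_with_init` can be sketched as follows. This is a minimal illustration, not the crate's code: `ArchFamily`, `InitArch`, `BuildPlan`, and `plan_trainer_build` are hypothetical names, and a real builder would return a trainer rather than a plan.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the architecture family read from the init APR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchFamily {
    Decoder,
    Encoder,
}

/// Hypothetical subset of the extracted transformer config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InitArch {
    pub family: ArchFamily,
    pub vocab_size: usize,
}

/// What the builder would do next; a real builder would construct a trainer.
#[derive(Debug, PartialEq, Eq)]
pub enum BuildPlan {
    /// init=None: keep the from-scratch baseline untouched.
    FromScratchBaseline,
    /// init=Some: load tensors from the APR file, then populate the trainer.
    LoadAndPopulate { init_path: PathBuf, vocab_size: usize },
}

/// Mirrors the ordering pinned by the tests: unpaired arguments are a caller
/// bug, encoder families are rejected, and only then does tensor loading start.
pub fn plan_trainer_build(
    init_arch: Option<&InitArch>,
    init_path: Option<&Path>,
) -> Result<BuildPlan, String> {
    match (init_arch, init_path) {
        (None, None) => Ok(BuildPlan::FromScratchBaseline),
        (Some(arch), Some(path)) => {
            if arch.family == ArchFamily::Encoder {
                return Err("encoder-family init is not supported".to_string());
            }
            Ok(BuildPlan::LoadAndPopulate {
                init_path: path.to_path_buf(),
                vocab_size: arch.vocab_size,
            })
        }
        _ => Err("init_arch and init_path must be provided together".to_string()),
    }
}
```

Returning a plan keeps the sketch testable without any tensor I/O; the four new `aprender-train` tests listed below exercise the same four branches.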

## What this PR DOES NOT do

- **CUDA path** (~80 LOC follow-up as 5f.5): `drive_real_cuda` now
  fail-fasts when --init is set rather than silently using random init
  (FALSIFY-APR-PRETRAIN-INIT-CUDA-001). The cuBLAS trainer needs
  symmetric `build_shared_cuda_trainer_with_init` which is out of scope.
- **Step 5g LIVE 500-step fine-tune** (operator dispatch): this PR makes
  it dispatchable; running the 500 steps requires operator action.

## Discharges (per apr-pretrain-arch-polymorphic-v1)

- §init_load_semantics integration: load + populate composed end-to-end
- §arch_extraction_signature integration: read_apr_architecture wired
- §qwen_tokenizer_vocab_compatibility integration: extracted vocab
  flows into preflight call site (no longer hardcoded Llama370M)
- FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level
- The legacy "not yet wired" guard from §49 step 4 is RETIRED — the
  drift-prevention test now pins the new fail-closed semantic.

## Tests (4 new + 2 retrofitted across 2 crates, all pass)

- `aprender-train`: 4 new tests for `build_shared_trainer_with_init`:
  - `_none_uses_llama370m_shape` (regression-free init=None)
  - `_rejects_unpaired_args` (caller-bug guard)
  - `_rejects_encoder_family` (FALSIFY-007 integration)
  - `_decoder_family_proceeds_to_tensor_load` (failure ordering pin)
- `apr-cli`: 2 retrofitted tests for the new fail-closed semantic:
  - `pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction`
    (replaces the old "not yet wired" trip-wire)
  - `pretrain_init_v1_magic_aprn_passes_validate_init_apr_path`
    (helper now returns Ok on valid magic)

19/19 pretrain_real tests pass. 23/23 apr-cli pretrain tests pass.
cargo clippy --lib -- -D warnings clean across both crates.

## Five Whys

1. **Why was 5f.4 needed at all?** §50's 5a-5h decomposition assumed
   the CLI dispatch would naturally invoke the helper functions; live
   source inspection (§52 amendment) revealed the dispatch hardcoded
   "not yet wired" Err. 5f.4 is the explicit wireup.
2. **Why is removing the safety Err so load-bearing?** The §28 SHIP-007
   lesson: silently random-init via a half-implemented dispatch is the
   exact "silent gibberish" defect class. Removing the safety Err
   without the wireup would manifest as a multi-epoch divergence
   masquerading as a corpus-quality issue.
3. **Why a separate polymorphic builder rather than overload `build_shared_trainer`?**
   `build_shared_trainer` enforces INV-ARCH-370M-001 (param-count band)
   which only applies to from-scratch Llama370M. The polymorphic builder
   sidesteps it by design — Qwen2.5-0.5B is 0.5B params, outside the
   band by intent.
4. **Why fail-fast on `--init` + `--device cuda` rather than silently
   ignore?** Same reasoning as Why 2: a silent CUDA random-init would
   land in the same "silent gibberish" class. The 5f.5 follow-up wires
   the symmetric CUDA path; until then, fail-closed.
5. **Why couldn't this be inside #1483 (the populate PR)?** Different
   crate (apr-cli vs aprender-train), different review concern (CLI
   plumbing vs trainer mutation), different test surface. One atomic
   PR per file/crate boundary.

## Test plan

- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (19/19 pass)
- [x] `cargo test -p apr-cli --lib commands::pretrain` (23/23 pass)
- [x] `cargo clippy -p aprender-train -p apr-cli --lib -- -D warnings` (clean)
- [x] `cargo check -p apr-cli --lib` (clean)
- [ ] Operator-dispatched: `apr pretrain --init <Qwen2.5-Coder-0.5B>.apr`
      smoke that fires 50 training steps end-to-end (5g LIVE prelude;
      operator action in next session)

## Cascade context

This is the §52-identified gap closing the §50.4 step 5f sub-cascade:
- 5f.1 encoder validator: PR #1479 ✅ MERGED
- 5f.2 load_init_tensors_from_apr: PR #1481 ✅ MERGED
- 5f.3 populate_trainer_from_init_tensors: PR #1483 (mergeable, in queue)
- **5f.4 CLI wireup: THIS PR**
- 5g LIVE 500-step fine-tune: operator dispatch (next)
- 5h stamp + publish: ~10 LOC follow-up

Once 5f.4 lands AND 5g produces val_loss < 9.38 evidence, MODEL-2 ship % moves 57% → ≥58%.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 5, 2026 01:24
@noahgift noahgift merged commit 9afca16 into main May 5, 2026
11 checks passed
@noahgift noahgift deleted the feat/cli-wireup-init-pretrain branch May 5, 2026 01:48
noahgift added a commit that referenced this pull request May 5, 2026
…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495)

§50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at
2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now
end-to-end functional on CPU; the legacy "not yet wired" Err is
RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-%
can move from 57% → ≥58%.

Spec amendment §53:
- Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007
  via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass
  smoke + 008 contract validation are inherently algorithm-level).
- Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED;
  5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090.
- Cascade ship statistics: 13 PRs over 2 days
  (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494).
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%
  (gated on 5g empirical val_loss < 9.38 evidence).
- 3 CI andon classes documented as feedback memories during cascade
  (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge
  behind-state).

Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL:
- All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the
  user-facing `apr pretrain --init` flow.
- verification_summary updated: tested 7 → 8; status partial →
  functional.
- Added §52 + §53 references.
- Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical
  500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr
  producing val_loss < 9.38.

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…requisites + live preflight smoke (#1496)

§53 closed with "step 5g LIVE remains", framing 5g as a single operator
dispatch. Live source inspection of the post-#1494 binary plus an
actual smoke run revealed step 5g has multi-step prerequisites that
were NOT enumerated in §50's original 8-step decomposition.

Live empirical smoke on canonical inputs:
  apr pretrain --init <Qwen2.5-Coder-0.5B-Instruct-fp16.apr>
               --tokenizer <legacy 50257-vocab dir>
               --dataset <legacy codeparrot shards>
  → CORRECT FAIL-FAST: GATE-ARCH-370M-011 (INV-ARCH-370M-006)
    violated: tokenizer vocab_size (50257) != model vocab_size (151936)

This is the FIRST end-to-end runtime evidence that the §50.4 cascade's
polymorphic preflight (PR #1476 + #1494) works in the user-facing CLI:
  - Read --init APR metadata: vocab=151936, hidden=896, layers=24
  - target_vocab = init_arch.vocab_size = 151936 (NOT legacy 50257)
  - Tokenizer dir vocab.json count = 50257
  - Mismatch → fail-fast before trainer allocation
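
The preflight comparison traced in the four steps above can be sketched like this. It is an illustration only: `LEGACY_VOCAB_SIZE`, `target_vocab`, and `preflight_vocab` are hypothetical names, and treating 50257 as the default when no `--init` is given is an assumption drawn from the "legacy 50257" wording in the smoke output.

```rust
/// Vocab size of the legacy tokenizer, as reported in the smoke output above.
/// Using it as the no-init default is an assumption of this sketch.
pub const LEGACY_VOCAB_SIZE: usize = 50_257;

/// With an init APR, the target vocab comes from its extracted metadata;
/// otherwise the legacy default applies.
pub fn target_vocab(init_vocab: Option<usize>) -> usize {
    init_vocab.unwrap_or(LEGACY_VOCAB_SIZE)
}

/// Fails before any trainer allocation when the tokenizer and the model
/// disagree on vocabulary size.
pub fn preflight_vocab(tokenizer_vocab: usize, init_vocab: Option<usize>) -> Result<(), String> {
    let target = target_vocab(init_vocab);
    if tokenizer_vocab == target {
        Ok(())
    } else {
        Err(format!(
            "tokenizer vocab_size ({}) != model vocab_size ({})",
            tokenizer_vocab, target
        ))
    }
}
```

With the values from the smoke run, `preflight_vocab(50257, Some(151936))` returns the mismatch error, while a Qwen-vocab tokenizer paired with the same init passes.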

But the smoke also surfaces 5g's true scope. A Qwen-vocab tokenizer dir
+ Qwen-tokenized corpus must exist BEFORE the preflight passes. Neither
exists on this host today.

Step 5g re-scoped:
  5g.0 — Qwen tokenizer extraction (~50 LOC, ~5min wall) [next PR]
  5g.1 — Qwen-tokenized corpus (0 LOC, ~10hr wall, operator-dispatch)
  5g.2 — LIVE 500-step fine-tune (0 LOC, ~20-60min, operator-dispatch)
  5g.3 — val_loss < 9.38 verdict; flip MODEL-2 ship % 57% → ≥58%

Methodology takeaway: top-down spec planning consistently
underestimates scope-coupling between heterogeneous code paths. This
is the third instance of the same lesson:
  - §50 found §49's "0 LOC" was 8-step (architectural coupling)
  - §52 found §50's "5f weight load" was 2-step (CLI dispatch coupling)
  - §54 found §53's "5g LIVE" is 4-step (tokenizer-format coupling)

Falsifier scoreboard impact:
  - FALSIFY-APR-PRETRAIN-ARCH-005/006 reach LIVE-INTEGRATION level
    (proven via real CLI dispatch, not just unit tests)
  - Contract `apr-pretrain-arch-polymorphic-v1` v1.2.0 FUNCTIONAL is
    reinforced; promotion to DISCHARGED waits for 5g.3 val_loss measurement

Net effects:
  - Spec v2.98.0 → v2.99.0
  - MODEL-1 ship % unchanged at 91%
  - MODEL-2 ship % unchanged at 57% (gated on 5g.3)
  - Coverage tally: snapshot, no contract status flip

Refs: SPEC-SHIP-TWO-001 §50.4 step 5g, PR #1476 + #1494,
      evidence/section-54-5g-prereqs-2026-05-05/preflight-fail-fast-smoke.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…Y-APR-PRETRAIN-INIT-CUDA-001 + drift-prevention test (#1502)

Before this bump, the falsifier id `FALSIFY-APR-PRETRAIN-INIT-CUDA-001`
was REFERENCED in the v1.2.0 changelog and verification_summary BUT
was not formally registered as a falsification_test entry. The
fail-fast guard at `crates/apr-cli/src/commands/pretrain.rs::drive_real`
(post-#1494 5f.4 wireup) returns Err with this id when
`init_arch.is_some() && device.is_cuda()`, but no test pinned the
citation. A future refactor could silently drop the citation OR let
CUDA + --init fall through → §28 SHIP-007 "silent gibberish" defect class.

## What ships

Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL:
- Adds FALSIFY-APR-PRETRAIN-INIT-CUDA-001 as formal falsification_test
  (PARTIAL_ALGORITHM_LEVEL).
- 10 → 11 falsifiers, all PASS.

Source:
- Extracted error message into `pub(crate) const
  FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG` so the const itself can be
  unit-tested without a `--features cuda` build.

Test:
- `drive_real_cuda_init_path_fail_fasts_with_falsifier_citation` pins:
  (a) falsifier id appears
  (b) "not yet wired for --device cuda" phrase appears
  (c) "step 5f.5 follow-up" reference appears
  (d) both workarounds (--device cpu OR omit --init) are suggested
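
A minimal sketch of how such a drift-prevention check might look. The constant name comes from this commit, but its message text here is invented for illustration (the real wording is not shown in this thread), and `guard_cuda_init` and `missing_pins` are hypothetical helpers.

```rust
/// Illustrative wording only; the real constant's text is not quoted above.
pub const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG: &str =
    "FALSIFY-APR-PRETRAIN-INIT-CUDA-001: --init is not yet wired for --device cuda \
     (step 5f.5 follow-up). Workarounds: pass --device cpu, or omit --init.";

/// Hypothetical guard: fail closed when an init checkpoint meets the CUDA path.
pub fn guard_cuda_init(init_is_some: bool, device_is_cuda: bool) -> Result<(), String> {
    if init_is_some && device_is_cuda {
        Err(FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG.to_string())
    } else {
        Ok(())
    }
}

/// Returns every pinned phrase that is absent from `msg`.
/// An empty result means the message still carries all citations.
pub fn missing_pins(msg: &str) -> Vec<&'static str> {
    const PINS: [&str; 5] = [
        "FALSIFY-APR-PRETRAIN-INIT-CUDA-001", // (a) falsifier id
        "not yet wired for --device cuda",    // (b) phrase
        "step 5f.5 follow-up",                // (c) follow-up reference
        "--device cpu",                       // (d) workaround 1
        "omit --init",                        // (d) workaround 2
    ];
    PINS.iter().copied().filter(|p| !msg.contains(p)).collect()
}
```

Because the check runs against a plain string constant, it needs no GPU and no `--features cuda` build, which is the point of extracting the message.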

## Why drift-prevention matters

Promotion of CUDA-001 to DISCHARGED requires §50.4 step 5f.5 LIVE
(CUDA wireup landed + GPU smoke). That's multi-PR scope (refactor
upload_blocks + new constructor + wire CLI). Until then, the
fail-fast guard is the only safety. Without a formal falsifier +
test, that guard silently regresses if anyone refactors drive_real.

## Net effects

- Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL.
- 11 falsifiers total (10 → 11), all PASS.
- 1 new drift-prevention test.
- 1 source const extraction (lockup).
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is a quality-and-hygiene PR while the 5g.1 17hr corpus retokenize
runs in the background. Doesn't move ship-% but reduces drift risk +
binds a previously-free-floating falsifier reference.

Refs: SPEC-SHIP-TWO-001 §50.4 step 5f.5,
      contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.4.0

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…ft correction (#1504)

* contract(apr-pretrain-from-init-v1): v1.1.0 → v1.2.0 — test-reference drift correction

v1.1.0 cited 8 specific test names; live source inspection 2026-05-05
revealed only 3 of them existed in
`crates/apr-cli/src/commands/pretrain.rs`. The §50.4 cascade (5f.4
wireup landed via PR #1494) authored different test names than the
ones v1.1.0 stamped, leaving 6 falsifier bindings with dangling
`test:` references.

## Drift inventory

  Falsifier  | v1.1.0 cited test                                | Exists?
  ---        | ---                                              | ---
  001        | apr pretrain --help | grep -qE 'init'            | ⚠️ shell pipe, not unit test
  002        | pretrain_no_init_synthetic_ok                    | ❌
  003        | pretrain_init_missing_file_errors                | ✅
  004        | pretrain_init_bad_magic_errors                   | ✅
  005        | pretrain_init_arch_mismatch_errors               | ❌
  006        | pretrain_init_step0_loss_below_from_scratch      | ❌ (LIVE-only)
  007        | pretrain_init_flag_registered                    | ❌
  008        | pv validate                                      | ✅
  009        | pretrain_init_optimizer_state_fresh              | ❌ (LIVE-only)
  010        | pretrain_init_loadback_idempotent                | ❌ (LIVE-only)

## Resolution

Re-align each falsifier to a test that actually exists, OR explicitly
mark the falsifier PARTIAL_ALGORITHM_LEVEL with a `LIVE-PENDING:`
prefix in the `test:` field naming the exact prerequisite that
prevents unit-test binding.

  Falsifier  | v1.2.0 binding
  ---        | ---
  001        | pretrain_init_flag_absent_parses_to_none + pretrain_init_flag_parses_path
  002        | synthetic_pretrain_end_to_end_happy_path
  003        | pretrain_init_missing_file_errors (unchanged)
  004        | pretrain_init_bad_magic_errors + pretrain_init_empty_file_errors
  005        | pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction
  006        | LIVE-PENDING (5g.2 fine-tune dispatch)
  007        | LIVE-PENDING (cli_commands integration test follow-up)
  008        | pv validate (unchanged)
  009        | LIVE-PENDING (5g.2 + Adam state debug accessor)
  010        | LIVE-PENDING (5g.2 smoke evidence pack)
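
The binding rule in the Resolution section (every `test:` field either names tests that exist or carries a `LIVE-PENDING` prefix) can be sketched as a small classifier. This is an assumption-laden illustration, not the `pv` lint: `BindingStatus` and `classify_binding` are hypothetical, and splitting multiple test names on `+` simply follows the table's notation.

```rust
use std::collections::HashSet;

/// Outcome of checking one falsifier's `test:` field.
#[derive(Debug, PartialEq, Eq)]
pub enum BindingStatus {
    /// Every cited test name exists.
    Bound,
    /// Explicitly deferred with a `LIVE-PENDING` prefix.
    LivePending,
    /// At least one cited test name does not exist (or none was cited).
    Dangling(Vec<String>),
}

pub fn classify_binding(test_field: &str, known_tests: &HashSet<&str>) -> BindingStatus {
    let field = test_field.trim();
    if field.starts_with("LIVE-PENDING") {
        return BindingStatus::LivePending;
    }
    // The table joins multiple test names with '+'.
    let names: Vec<&str> = field
        .split('+')
        .map(str::trim)
        .filter(|n| !n.is_empty())
        .collect();
    let missing: Vec<String> = names
        .iter()
        .filter(|n| !known_tests.contains(**n))
        .map(|n| n.to_string())
        .collect();
    if !names.is_empty() && missing.is_empty() {
        BindingStatus::Bound
    } else {
        BindingStatus::Dangling(missing)
    }
}
```

A `Dangling` result corresponds to the ❌ rows in the drift inventory; the set of known test names would come from wherever the real tests are enumerated.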

## Net effect

- Status remains PARTIAL_ALGORITHM_LEVEL.
- 4/10 falsifiers bound to existing PASSING unit tests.
- 6/10 explicitly LIVE-PENDING with named prerequisites.
- 25/25 commands::pretrain::tests pass.
- pv validate exits 0.

Promotion to FUNCTIONAL gated on 006/007 binding (which need the 5g.2
LIVE fine-tune + the 3-surface integration test from cli_commands.rs).
DISCHARGED still gated on §50.4 step 5g.3 LIVE val_loss < 9.38.

## Five Whys

1. Why did the test references drift? §50.4 cascade (5b through 5f.4)
   landed across many PRs; each authored test names per its own
   convention without cross-checking the v1.1.0 contract claims.
2. Why is "no test for X" not the same as "X is broken"? The IMPL
   exists and works (proven by the 25-test sweep). The DRIFT is in
   the contract's test-name claim, not in the underlying invariants.
3. Why mark some PARTIAL_ALGORITHM_LEVEL and document `LIVE-PENDING:`?
   Because the false binding (claiming a test exists when it doesn't)
   is worse than honest "no test yet"; future agents reading the
   contract get a clear signal of what's binding and what's pending.
4. Why not author the missing tests in this PR? Tests 006/009/010 are
   LIVE-only (need 942MB FP16 init APR + 5g.2 dispatch); test 007
   needs an integration test in `cli_commands.rs`. Each is its own
   future PR; bundling them here would mix concerns.
5. Why bump to v1.2.0 (not v1.1.1 patch)? The contract semantics
   didn't change but the test-binding INVARIANT (every cited test
   exists) was broken in v1.1.0. v1.2.0 restores that invariant.

## Test plan
- [x] pv validate exits 0
- [x] PMAT pre-commit quality gates pass
- [x] 25/25 commands::pretrain::tests pass
- [ ] CI gate green
- [ ] Auto-merge fires on green CI

Refs: SPEC-SHIP-TWO-001 §50.4 cascade (5f.4 PR #1494),
      contracts/apr-pretrain-from-init-v1.yaml v1.2.0

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(contract+test): author pretrain_init_flag_registered + bind FALSIFY-007

CI lint engine flagged FALSIFY-APR-PRETRAIN-INIT-007 with
PV-VER-001 Error: the cited test `pretrain_init_flag_registered` did
not exist as a callable target, leaving the falsifier unfalsifiable.

Author the missing test in `crates/apr-cli/tests/cli_commands.rs`:
invokes `apr pretrain --help` against the installed binary and asserts
`--init` is reachable. This closes the 3-surface drift triangle:
(1) clap field, (2) unit tests in `pretrain.rs`, (3) integration test
in `cli_commands.rs`.

Update `apr-pretrain-from-init-v1.yaml` v1.2.0 to bind FALSIFY-007 to
the new test and bump the changelog count from 4/10 to 5/10 falsifiers
bound (LIVE-pending count drops from 6 to 5; FALSIFY-007 promoted
out of LIVE-PENDING).

Local verification:
  - cargo test pretrain_init_flag_registered: PASS
  - cargo test lint::tests::lint_passes_on_real_contracts: PASS
  - pv validate contracts/apr-pretrain-from-init-v1.yaml: 0 errors

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      contracts/apr-pretrain-from-init-v1.yaml v1.2.0,
      feedback_cli_subcommand_three_surface_drift.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…-005/006 test-reference drift (#1505)

Same drift class as PR #1504 caught in apr-pretrain-from-init-v1.
Test names cited in v1.1.0 changelog never matched the actual tests
PR #1476 authored. Drift survived three intervening bumps
(v1.1→v1.2→v1.3→v1.4) because each focused on adding new falsifiers,
not auditing existing bindings.

## Drift inventory

| Falsifier | v1.4.0 cited test | Exists? | Actual test |
|---|---|---|---|
| FALSIFY-005 | preflight_qwen_vocab_passes_with_qwen_init | ❌ | preflight_qwen_vocab_passes_with_qwen_target |
| FALSIFY-006 | preflight_qwen_vocab_fails_without_init | ❌ | preflight_qwen_vocab_fails_with_llama_target |

## Resolution

Update the `test:` field for FALSIFY-005 and FALSIFY-006 to reference
the actual tests authored by PR #1476. No falsifier semantics change.
No new tests added.

## Verification

  $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_passes_with_qwen_target
    test result: ok. 1 passed; ...
  $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_fails_with_llama_target
    test result: ok. 1 passed; ...
  $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did the drift survive 3 bumps? Each bump (v1.2/v1.3/v1.4)
   focused on ADDING new content (CUDA-001, relaxed bound, etc.);
   none audited existing bindings.
2. Why didn't the §50.4 cascade catch this? The cascade authored
   tests; the contract was authored separately. Names diverged at
   the boundary; no cross-check landed.
3. Why is this a contract-only fix (no source change)? The tests
   exist and pass — the IMPL is correct. Only the contract's text
   reference needed correction.
4. Why bump to v1.5.0 (not v1.4.1 patch)? Same logic as PR #1504:
   the test-binding INVARIANT (every cited test exists) was broken
   in v1.4.0. v1.5.0 restores it.
5. Why is this important if the impl is correct? Per
   feedback_no_guessing.md, contracts that cite non-existent tests
   are unfalsifiable — future agents reading the contract get a
   false signal that the falsifier is bound. PV-VER-001 lint will
   catch this; better to fix it than wait for the lint engine to
   flag.

## Net effects

- Contract v1.4.0 → v1.5.0 FUNCTIONAL.
- 11 falsifiers, all PASS — same count, but FALSIFY-005/006 now
  reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is hygiene work while 5g.1 (~12hr) corpus retokenize runs.
Same defect class as PR #1504; together they close the
test-reference drift across both pretrain contracts.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade (PRs #1473-#1494, #1502),
      contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.5.0,
      contracts/apr-pretrain-from-init-v1.yaml v1.2.0 (PR #1504, sibling fix)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 9, 2026
… (PMAT-CODE-PRETRAIN-INIT-FINETUNE-001)

Adds contracts/apr-pretrain-init-finetune-v1.yaml v1.0.0 DRAFT, the
falsifier scaffold for SHIP-TWO §56.4 step 5g.2 — the LIVE 500-step
fine-tune dispatch that flips MODEL-2 ship % 57% → ≥58%.

Pins six falsifiable invariants for `apr pretrain --mode from-init
--init <Qwen.apr> --shards-dir <5g.1-corpus> --steps 500 --device cuda`:

- FALSIFY-001 (ship-blocking): exit code == 0
- FALSIFY-002 (advisory):     wall ≤ 3600 s on RTX 4090
- FALSIFY-003 (ship-blocking): step-0 loss ≤ 0.7 × ln(151936) ≈ 8.35
                               (proves init weights flow through forward)
- FALSIFY-004 (ship-blocking): checkpoint.apr written with valid
                               magic bytes (0x41 0x50 0x52 0x00 v2 OR
                               0x41 0x50 0x52 0x4E v1)
- FALSIFY-005 (ship-blocking): val_loss after 500 steps < 9.38
                               (the §34 370M-from-scratch ceiling)
- FALSIFY-006 (advisory):     no CUDA OOM / illegal-address / launch-
                               OoR errors during run
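
Two of these gates reduce to small pure checks, sketched below under stated assumptions: `step0_loss_ceiling`, `AprMagic`, and `classify_apr_magic` are hypothetical names, and only the byte values and the 0.7 × ln(vocab) formula come from the list above.

```rust
/// FALSIFY-003 ceiling: 0.7 × ln(vocab_size). A uniform-random model would
/// sit near ln(vocab_size), so landing below this bound suggests the init
/// weights actually reached the forward pass.
pub fn step0_loss_ceiling(vocab_size: usize) -> f64 {
    0.7 * (vocab_size as f64).ln()
}

/// APR container versions distinguished by their 4-byte magic.
#[derive(Debug, PartialEq, Eq)]
pub enum AprMagic {
    /// 0x41 0x50 0x52 0x4E ("APRN")
    V1,
    /// 0x41 0x50 0x52 0x00 ("APR\0")
    V2,
}

/// FALSIFY-004 check on the first four bytes of a checkpoint file.
/// Returns `None` for short inputs or unknown magic.
pub fn classify_apr_magic(bytes: &[u8]) -> Option<AprMagic> {
    match bytes.get(..4)? {
        [0x41, 0x50, 0x52, 0x00] => Some(AprMagic::V2),
        [0x41, 0x50, 0x52, 0x4E] => Some(AprMagic::V1),
        _ => None,
    }
}
```

For the Qwen vocab of 151936, `step0_loss_ceiling` evaluates to roughly 8.35, matching the figure quoted in FALSIFY-003.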

Five-Whys (why this contract first, then live dispatch):

1. Why a contract before the dispatch? Per CLAUDE.md "Contract-first
   design: NEVER write code before writing a provable contract."
   Even though 5g.2 is "0 LOC operator-dispatch", it has shippable
   semantics that deserve falsification scaffolding.
2. Why these particular six gates? They cover the four orthogonal
   failure modes of a fine-tune-from-init dispatch: process-level
   (exit/wall), correctness (step-0 baseline + val_loss),
   serialization (checkpoint magic bytes), and GPU resource health.
3. Why DRAFT status (not PROPOSED, not ACTIVE)? DRAFT means "schema
   validated, falsifiers authored, but no live evidence yet."
   Status flips to ACTIVE_RUNTIME via §59 spec amendment after the
   live dispatch produces evidence.
4. Why a separate contract from apr-pretrain-from-init-v1? The
   sibling contract pins the in-process semantics of init loading
   (load_init_tensors_from_apr, populate_trainer_from_init_tensors).
   This new contract pins the END-TO-END dispatch outcome — they
   compose at the dispatch boundary.
5. Why the val_loss < 9.38 threshold (not 5.0 or 7.0)? §34's 200K-
   step retrain confirmed val_loss=9.38 as the 370M-from-scratch
   capacity ceiling on this corpus. A from-init pivot must beat
   from-scratch, otherwise §49's strategy reasoning is wrong.

Pre-requisites VERIFIED on host (lambda-vector RTX 4090):
- /mnt/nvme-raid0/models/qwen2.5-coder-0.5b-instruct-fp16.apr exists
- /mnt/nvme-raid0/data/codeparrot-python-permissive-shards-qwen has
  228 shards / 2.278B tokens (manifest.json reconstructed by PR #1575)
- `apr pretrain --init <PATH>` end-to-end runnable per §53 (#1494 MERGED)
- Polymorphic preflight per §55 (#1500 MERGED)

Quality gates:
- `pv validate contracts/apr-pretrain-init-finetune-v1.yaml`: 0 errors
- `pv lint --strict-test-binding`: 9/9 gates PASS

SHIP-TWO impact:
- MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep work)
- MODEL-2 ship %: unchanged at 57% (this PR is contract-only;
  ship-% flips on §59 amendment after live verdict)
- Unblocks: §59 spec amendment recording 5g.2 dispatch result

Next steps (follow-ups, NOT this PR):
- LIVE dispatch on RTX 4090 (~20-60 min wall, pre-authorized per
  feedback_compute_pre_authorized.md)
- §59 spec amendment v3.05.0 → v3.06.0 with verdict + ship-% flip
- Contract status DRAFT → ACTIVE_RUNTIME

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 9, 2026
… (PMAT-CODE-PRETRAIN-INIT-FINETUNE-001) (#1576)

noahgift added a commit that referenced this pull request May 9, 2026
…DE-PRETRAIN-INIT-CUDA-WIREUP-001)

Mirror the CPU path's `build_shared_trainer_with_init` (§50.4 step 5f.4)
into the CUDA backend so `apr pretrain --init <PATH> --device cuda` can
fine-tune from a public pretrained checkpoint on RTX 4090 — the only
remaining ship-blocker for SHIP-TWO §56.4 step 5g.2.

This PR:

- Adds `entrenar::train::pretrain_real_cuda::build_shared_cuda_trainer_with_init`,
  symmetric to the CPU sibling. Composes the SAME §50.4 step-5f machinery
  through both backends:
    5c:   build_transformer_config(init_arch)
    5f.1: validate_pretrain_init_arch_compatible(init_arch) — encoder rejection
    5f.2: load_init_tensors_from_apr(path) — read APR weights
    5f.3: populate_trainer_from_init_tensors(transformer, &tensors) — populate CPU model
    5f.5: CudaTransformerTrainer::with_model uploads populated blocks
          / final_norm / lm_head / embed_tokens to GPU.
  The §50.4 step 5f.1/5f.2/5f.3 helpers are reused VERBATIM — populate
  semantics are identical between CPU and CUDA backends.

- Updates `apr-cli::drive_real_cuda` to accept the same `init_arch:
  Option<&TransformerConfig>` + `init_path: Option<&Path>` pair as the
  CPU path. When either is `Some`, routes through the new builder,
  which enforces the paired-args invariant (FALSIFY-CUDA-002).
  When both are `None`, preserves the existing from-scratch baseline
  (INV-ARCH-370M-001 stays enforced on the from-scratch CUDA path).

- Removes the `FALSIFY-APR-PRETRAIN-INIT-CUDA-001` fail-fast Err in
  `drive_real`. The `pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG`
  survives and is repurposed as a drift-prevention sentinel — its
  payload now reads "is wired for --device cuda via
  build_shared_cuda_trainer_with_init (5f.5 SHIPPED)" so a future
  regression that re-introduces a fail-fast fires the sentinel test
  before the contract reference goes stale.
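
A drift-prevention sentinel of the kind described above might look like the sketch below. This is an illustration only: the const body is just the excerpt quoted in this message (the real payload may carry more text), and the `sentinel_reports_wired` helper and the "not yet wired" phrase it guards against are assumptions, not the shipped test.

```rust
/// Excerpt of the repurposed sentinel payload, as quoted in this commit message.
/// The real const may contain additional text (assumption).
pub const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG: &str =
    "is wired for --device cuda via build_shared_cuda_trainer_with_init (5f.5 SHIPPED)";

/// Hypothetical drift check: the payload must name the CUDA builder, say it
/// is wired, and must not read like a fail-fast guard.
fn sentinel_reports_wired(msg: &str) -> bool {
    msg.contains("build_shared_cuda_trainer_with_init")
        && msg.contains("is wired")
        && !msg.contains("not yet wired")
}
```

A test asserting `sentinel_reports_wired(FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG)` would fail as soon as someone reverts the payload to a fail-fast message, which is the regression the sentinel is meant to catch.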

Five-Whys (root-cause class) for the wireup itself:

1. Why was the CUDA wireup deferred while the CPU wireup landed in
   PR #1494? §50.4 step 5f.4 was the smallest cascade-completing PR;
   landing both backends in one PR would have conflated the
   algorithm-level wireup with the CUDA-feature-build dependency. Per
   `feedback_falsifier_first_cascade_pattern.md`, 1 PR ≈ 1 logical
   change.
2. Why does the CUDA path even need its own builder? Because the
   `CudaTransformerTrainer` constructor uploads weights to GPU at
   allocation time — the populated CPU model must exist BEFORE the
   GPU upload, or the GPU sees random initialization while the CPU
   model has the loaded init.
3. Why pass the populated CPU `Transformer` to `with_model` rather
   than loading directly into GPU buffers? Because the CUDA upload
   path (`upload_blocks` + `final_norm` + `lm_head`) reads weights
   FROM the CPU `Transformer` struct. The cleanest symmetry is
   "build CPU model, populate via shared helper, hand to CUDA
   constructor" — the same helper closes the §28 SHIP-007 silent-
   gibberish defect class on both backends.
4. Why preserve the const sentinel rather than delete it? The const
   is referenced by name in `apr-pretrain-arch-polymorphic-v1.yaml`
   v1.4.0..v1.6.0 changelog and falsifier entries. Deleting it would
   break the contract's audit trail. Repurposing it (semantic flip
   from "fail-fast" to "is wired") preserves the audit chain while
   the new payload still anchors a drift-prevention test.
5. Why does this PR not run the LIVE 500-step fine-tune? Per PR
   atomicity: this PR ships the wireup. The 500-step val_loss < 9.38
   verdict is gated by `apr-pretrain-init-finetune-v1.yaml` v1.0.0
   (PR #1576) — that contract's FALSIFY-APR-PRETRAIN-INIT-FINETUNE-005
   flips MODEL-2 ship % 57% → ≥58%. The two PRs compose: this PR's
   wireup is the prerequisite; PR #1576's contract is the verdict.
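
The ordering constraint in whys 2 and 3 (populate the CPU model first, then hand it to the CUDA constructor) can be shown with a toy model. Everything here is hypothetical: `CpuModel` and `MockGpuTrainer` are stand-ins, not the real `Transformer` or `CudaTransformerTrainer`, and a cloned `Vec<f32>` plays the role of the GPU upload.

```rust
/// Hypothetical CPU-side model; a flat weight vector stands in for
/// blocks / final_norm / lm_head / embed_tokens.
#[derive(Clone, Debug)]
struct CpuModel {
    weights: Vec<f32>,
}

impl CpuModel {
    /// Placeholder for random initialization (constant values keep the sketch deterministic).
    fn random_init(n: usize) -> Self {
        CpuModel { weights: vec![0.5; n] }
    }

    /// Overwrite the weights with loaded init tensors (same length assumed).
    fn populate(&mut self, init: &[f32]) {
        self.weights.copy_from_slice(init);
    }
}

/// Hypothetical trainer that snapshots weights at construction time,
/// mimicking a constructor that uploads to GPU at allocation.
struct MockGpuTrainer {
    device_weights: Vec<f32>,
}

impl MockGpuTrainer {
    fn with_model(model: &CpuModel) -> Self {
        MockGpuTrainer { device_weights: model.weights.clone() }
    }
}

/// Correct order: populate, then upload. The device copy holds the init weights.
fn build_populate_then_upload(init: &[f32]) -> MockGpuTrainer {
    let mut model = CpuModel::random_init(init.len());
    model.populate(init);
    MockGpuTrainer::with_model(&model)
}

/// Wrong order: upload, then populate. The device copy keeps the random init
/// while the CPU model silently diverges from it.
fn build_upload_then_populate(init: &[f32]) -> (CpuModel, MockGpuTrainer) {
    let mut model = CpuModel::random_init(init.len());
    let trainer = MockGpuTrainer::with_model(&model);
    model.populate(init);
    (model, trainer)
}
```

In the wrong-order variant nothing errors: the CPU weights look correct, and only the device copy is stale. A single populate-then-construct builder rules that ordering out.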

LIVE END-TO-END DOGFOOD on lambda-vector RTX 4090 (this branch built
with `--features cuda`):

  $ apr pretrain --dataset .../codeparrot-python-permissive-shards-qwen \
        --tokenizer .../qwen-0.5b-tokenizer-extracted \
        --run-dir .../5g-2-smoke-1step-cuda-post5f5 \
        --mode finetune --num-steps 1 --batch-size 2 --seq-length 256 \
        --device cuda \
        --init .../qwen2.5-coder-0.5b-instruct-fp16.apr

  [CUDA] cuBLAS initialized — forward TF32 tensor cores
  [CUDA] Pre-warmed 27 forward kernels
  ✓ 24 transformer blocks uploaded to GPU
  ✓ GPU training state allocated (LM head: 544.5 MB)
  === Run Result ===
    OK CONVERGED  final val_loss=0.6847 after 1 epoch(s)

  Checkpoint: 2.35 GiB, 219 tensors, valid APR v2 (✓ checksum).

This live run discharges:
  - FALSIFY-APR-PRETRAIN-INIT-CUDA-001 (sentinel, post-5f.5)
  - FALSIFY-APR-PRETRAIN-INIT-FINETUNE-001 (exit 0)
  - FALSIFY-APR-PRETRAIN-INIT-FINETUNE-004 (checkpoint written)
  - Partial discharge of FALSIFY-APR-PRETRAIN-INIT-FINETUNE-005
    (val_loss=0.6847 << 9.38 ceiling, on 1-step fine-tune; 500-step
    LIVE remains the binding evidence under PR #1576's contract).
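
For the checkpoint gate, FALSIFY-APR-PRETRAIN-INIT-FINETUNE-004 in PR #1576's contract accepts two magic-byte prefixes: 0x41 0x50 0x52 0x00 (v2) or 0x41 0x50 0x52 0x4E (v1). A minimal sketch of that classification follows; the `AprMagic` enum and `classify_apr_magic` function are hypothetical names, and the sketch covers only the 4-byte prefix, not the checksum reported above.

```rust
/// APR format generation implied by the 4-byte magic prefix.
#[derive(Debug, PartialEq, Eq)]
enum AprMagic {
    V1,
    V2,
}

/// Classify the first four bytes of a checkpoint header.
/// Returns `None` for short input or an unrecognized prefix.
fn classify_apr_magic(header: &[u8]) -> Option<AprMagic> {
    match header.get(..4)? {
        [0x41, 0x50, 0x52, 0x00] => Some(AprMagic::V2), // "APR\0"
        [0x41, 0x50, 0x52, 0x4E] => Some(AprMagic::V1), // "APRN"
        _ => None,
    }
}
```

In practice the header bytes would come from reading the start of `checkpoint.apr`; the sketch takes a byte slice so it stays independent of any file I/O.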

Contract updates:

- `contracts/apr-pretrain-arch-polymorphic-v1.yaml`: v1.6.0 → v1.7.0.
  - FALSIFY-CUDA-001 semantic flip (fail-fast → wireup-is-wired sentinel)
  - NEW FALSIFY-CUDA-002 (paired-args invariant on the new builder)
  - NEW FALSIFY-CUDA-003 (encoder family rejection on the new builder)
  - All three tests (the flipped 001 sentinel plus the new 002 and 003)
    fire WITHOUT a CUDA runtime; 002 and 003 exercise the args-check and
    encoder-rejection paths that happen before any GPU allocation.
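
The reason the paired-args and encoder-rejection checks can run without a GPU is that they are pure validation over the two `Option` arguments. The sketch below illustrates that shape under stated assumptions: `InitArch`, `ArchFamily`, `BuildPlan`, `BuildError`, and `plan_build` are hypothetical stand-ins (the real builder takes `Option<&TransformerConfig>`), and treating an unpaired argument as an error is an assumption about how the invariant is enforced.

```rust
use std::path::Path;

/// Hypothetical architecture family tag.
#[derive(Debug, PartialEq)]
enum ArchFamily {
    Decoder,
    Encoder,
}

/// Hypothetical stand-in for the init architecture config.
struct InitArch {
    family: ArchFamily,
}

#[derive(Debug, PartialEq)]
enum BuildPlan {
    FromScratch,
    FromInit,
}

#[derive(Debug, PartialEq)]
enum BuildError {
    UnpairedInitArgs,
    EncoderInitRejected,
}

/// Validate the (init_arch, init_path) pair before any device allocation.
fn plan_build(
    init_arch: Option<&InitArch>,
    init_path: Option<&Path>,
) -> Result<BuildPlan, BuildError> {
    match (init_arch, init_path) {
        // Neither set: from-scratch baseline.
        (None, None) => Ok(BuildPlan::FromScratch),
        // Both set: reject encoder families, otherwise proceed from init.
        (Some(arch), Some(_path)) => {
            if arch.family == ArchFamily::Encoder {
                return Err(BuildError::EncoderInitRejected);
            }
            Ok(BuildPlan::FromInit)
        }
        // Exactly one set: the pair is inconsistent.
        _ => Err(BuildError::UnpairedInitArgs),
    }
}
```

Because every branch returns before touching device state, tests over this logic run on a machine with no CUDA runtime installed.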

Quality gates:
- `pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml`: 0 errors
- `pv lint --strict-test-binding`: 9/9 gates PASS
- `cargo test -p apr-cli --features training --lib`: 5644/5644 PASS
- `cargo test -p apr-cli --features training --test cli_commands`: 8/8 PASS
- `cargo test -p aprender-train --features cuda --lib build_shared_cuda_trainer_with_init`: 2/2 PASS
- `cargo clippy -p apr-cli --features training --lib -- -D warnings`: clean
- `cargo check -p apr-cli --features training`: clean
- `cargo check -p apr-cli --features training,cuda`: clean
- LIVE: `apr pretrain --init Qwen.apr --device cuda` runs end-to-end on RTX 4090

SHIP-TWO impact:
- MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep)
- MODEL-2 ship %: unchanged at 57% (5g.2 LIVE 500-step verdict still
  required to flip 57% → ≥58%; this PR closes the only remaining
  technical blocker — a 500-step dispatch is now operator-runnable).
- §50.4 cascade COMPLETE (5a-5f.5 all shipped; only 5g LIVE remains).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 9, 2026
…DE-PRETRAIN-INIT-CUDA-WIREUP-001) (#1577)
