feat(apr-cli + aprender-train): apr pretrain --init wireup — §50.4 step 5f.4#1494
Merged
Conversation
…ep 5f.4
## Summary
Wire `apr pretrain --init <PATH>` end-to-end so step 5g LIVE 500-step
fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err
with the actual init-tensor load + trainer populate path that
§50.4 steps 5f.1/5f.2/5f.3 made possible.
## Architecture
Two functions added/changed:
1. `entrenar::train::pretrain_real::build_shared_trainer_with_init` —
composes the §50.4 step-5f machinery (5c polymorphic dispatch +
5f.1 encoder rejection + 5f.2 load + 5f.3 populate) into a single
trainer-builder entry. init=None preserves the from-scratch baseline
byte-equivalent to `build_shared_trainer`. init=Some validates arch
family, builds the polymorphic config, loads tensors, populates.
2. `apr-cli/src/commands/pretrain.rs::run` — now extracts the init APR
file's TransformerConfig via existing `model_config::read_apr_architecture`
when `--init` is set, then plumbs both `init_arch` and `init_path`
through `drive_real → drive_real_cpu → build_shared_trainer_with_init`.
The polymorphic preflight (§50.4 step 5d) already used the EXTRACTED
vocab — this PR wires the call site to actually pass it.
## What this PR DOES NOT do
- **CUDA path** (~80 LOC follow-up as 5f.5): `drive_real_cuda` now
fail-fasts when --init is set rather than silently using random init
(FALSIFY-APR-PRETRAIN-INIT-CUDA-001). The cuBLAS trainer needs
symmetric `build_shared_cuda_trainer_with_init` which is out of scope.
- **Step 5g LIVE 500-step fine-tune** (operator dispatch): this PR makes
it dispatchable; running the 500 steps requires operator action.
## Discharges (per apr-pretrain-arch-polymorphic-v1)
- §init_load_semantics integration: load + populate composed end-to-end
- §arch_extraction_signature integration: read_apr_architecture wired
- §qwen_tokenizer_vocab_compatibility integration: extracted vocab
flows into preflight call site (no longer hardcoded Llama370M)
- FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level
- The legacy "not yet wired" guard from §49 step 4 is RETIRED — the
drift-prevention test now pins the new fail-closed semantic.
## Tests (8 new across 2 crates, all pass)
- `aprender-train`: 4 new tests for `build_shared_trainer_with_init`:
- `_none_uses_llama370m_shape` (regression-free init=None)
- `_rejects_unpaired_args` (caller-bug guard)
- `_rejects_encoder_family` (FALSIFY-007 integration)
- `_decoder_family_proceeds_to_tensor_load` (failure ordering pin)
- `apr-cli`: 2 retrofitted tests for the new fail-closed semantic:
- `pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction`
(replaces the old "not yet wired" trip-wire)
- `pretrain_init_v1_magic_aprn_passes_validate_init_apr_path`
(helper now returns Ok on valid magic)
19/19 pretrain_real tests pass. 23/23 apr-cli pretrain tests pass.
cargo clippy --lib -- -D warnings clean across both crates.
## Five Whys
1. **Why was 5f.4 needed at all?** §50's 5a-5h decomposition assumed
the CLI dispatch would naturally invoke the helper functions; live
source inspection (§52 amendment) revealed the dispatch hardcoded
"not yet wired" Err. 5f.4 is the explicit wireup.
2. **Why is removing the safety Err so load-bearing?** The §28 SHIP-007
lesson: silently random-init via a half-implemented dispatch is the
exact "silent gibberish" defect class. Removing the safety Err
without the wireup would manifest as a multi-epoch divergence
masquerading as a corpus-quality issue.
3. **Why a separate polymorphic builder rather than overload `build_shared_trainer`?**
`build_shared_trainer` enforces INV-ARCH-370M-001 (param-count band)
which only applies to from-scratch Llama370M. The polymorphic builder
sidesteps it by design — Qwen2.5-0.5B is 0.5B params, outside the
band by intent.
4. **Why fail-fast on `--init` + `--device cuda` rather than silently
ignore?** Same reasoning as #2: silent CUDA random-init would
bisect the same "silent gibberish" class. 5f.5 follow-up wires
symmetric CUDA path; until then, fail-closed.
5. **Why couldn't this be inside #1483 (the populate PR)?** Different
crate (apr-cli vs aprender-train), different review concern (CLI
plumbing vs trainer mutation), different test surface. One atomic
PR per file/crate boundary.
## Test plan
- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (19/19 pass)
- [x] `cargo test -p apr-cli --lib commands::pretrain` (23/23 pass)
- [x] `cargo clippy -p aprender-train -p apr-cli --lib -- -D warnings` (clean)
- [x] `cargo check -p apr-cli --lib` (clean)
- [ ] Operator-dispatched: `apr pretrain --init <Qwen2.5-Coder-0.5B>.apr`
smoke that fires 50 training steps end-to-end (5g LIVE prelude;
operator action in next session)
## Cascade context
This is the §52-identified gap closing the §50.4 step 5f sub-cascade:
- 5f.1 encoder validator: PR #1479 ✅ MERGED
- 5f.2 load_init_tensors_from_apr: PR #1481 ✅ MERGED
- 5f.3 populate_trainer_from_init_tensors: PR #1483 (mergeable, in queue)
- **5f.4 CLI wireup: THIS PR**
- 5g LIVE 500-step fine-tune: operator dispatch (next)
- 5h stamp + publish: ~10 LOC follow-up
Once 5f.4 lands AND 5g produces val_loss < 9.38 evidence, MODEL-2 ship % moves 57% → ≥58%.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4 tasks
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495) §50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at 2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now end-to-end functional on CPU; the legacy "not yet wired" Err is RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-% can move from 57% → ≥58%. Spec amendment §53: - Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007 via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass smoke + 008 contract validation are inherently algorithm-level). - Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED; 5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090. - Cascade ships statistics: 11 PRs over 2 days (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494). - MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% (gated on 5g empirical val_loss < 9.38 evidence). - 3 CI andon classes documented as feedback memories during cascade (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge behind-state). Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL: - All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the user-facing `apr pretrain --init` flow. - verification_summary updated: tested 7 → 8; status partial → functional. - Added §52 + §53 references. - Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical 500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr producing val_loss < 9.38. `pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16 Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Merged
3 tasks
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…requisites + live preflight smoke (#1496) §53 closed with "step 5g LIVE remains" framing 5g as a single operator dispatch. Live source inspection of the post-#1494 binary plus an actual smoke run revealed step 5g has multi-step prerequisites that were NOT enumerated in §50's original 8-step decomposition. Live empirical smoke on canonical inputs: apr pretrain --init <Qwen2.5-Coder-0.5B-Instruct-fp16.apr> --tokenizer <legacy 50257-vocab dir> --dataset <legacy codeparrot shards> → CORRECT FAIL-FAST: GATE-ARCH-370M-011 (INV-ARCH-370M-006) violated: tokenizer vocab_size (50257) != model vocab_size (151936) This is the FIRST end-to-end runtime evidence that the §50.4 cascade's polymorphic preflight (PR #1476 + #1494) works in the user-facing CLI: - Read --init APR metadata: vocab=151936, hidden=896, layers=24 - target_vocab = init_arch.vocab_size = 151936 (NOT legacy 50257) - Tokenizer dir vocab.json count = 50257 - Mismatch → fail-fast before trainer allocation But the smoke also surfaces 5g's true scope. A Qwen-vocab tokenizer dir + Qwen-tokenized corpus must exist BEFORE the preflight passes. Neither exists on this host today. Step 5g re-scoped: 5g.0 — Qwen tokenizer extraction (~50 LOC, ~5min wall) [next PR] 5g.1 — Qwen-tokenized corpus (0 LOC, ~10hr wall, operator-dispatch) 5g.2 — LIVE 500-step fine-tune (0 LOC, ~20-60min, operator-dispatch) 5g.3 — val_loss < 9.38 verdict; flip MODEL-2 ship % 57% → ≥58% Methodology takeaway: top-down spec planning consistently underestimates scope-coupling between heterogeneous code paths. This is the third instance of the same lesson: - §50 found §49's "0 LOC" was 8-step (architectural coupling) - §52 found §50's "5f weight load" was 2-step (CLI dispatch coupling) - §54 found §53's "5g LIVE" is 4-step (tokenizer-format coupling) Falsifier scoreboard impact: - FALSIFY-APR-PRETRAIN-ARCH-005/006 reach LIVE-INTEGRATION level (proven via real CLI dispatch, not just unit tests) - Contract `apr-pretrain-arch-polymorphic-v1` v1.2.0 FUNCTIONAL is reinforced; promotion to DISCHARGED waits for 5g.3 val_loss measurement Net effects: - Spec v2.98.0 → v2.99.0 - MODEL-1 ship % unchanged at 91% - MODEL-2 ship % unchanged at 57% (gated on 5g.3) - Coverage tally: snapshot, no contract status flip Refs: SPEC-SHIP-TWO-001 §50.4 step 5g, PR #1476 + #1494, evidence/section-54-5g-prereqs-2026-05-05/preflight-fail-fast-smoke.md Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
5 tasks
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…Y-APR-PRETRAIN-INIT-CUDA-001 + drift-prevention test (#1502) Pre-this-bump, the falsifier id `FALSIFY-APR-PRETRAIN-INIT-CUDA-001` was REFERENCED in the v1.2.0 changelog and verification_summary BUT was not formally registered as a falsification_test entry. The fail-fast guard at `crates/apr-cli/src/commands/pretrain.rs::drive_real` (post-#1494 5f.4 wireup) returns Err with this id when `init_arch.is_some() && device.is_cuda()`, but no test pinned the citation. A future refactor could silently drop the citation OR let CUDA + --init fall through → §28 SHIP-007 "silent gibberish" defect class. ## What ships Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL: - Adds FALSIFY-APR-PRETRAIN-INIT-CUDA-001 as formal falsification_test (PARTIAL_ALGORITHM_LEVEL). - 10 → 11 falsifiers, all PASS. Source: - Extracted error message into `pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG` so the const itself can be unit-tested without a `--features cuda` build. Test: - `drive_real_cuda_init_path_fail_fasts_with_falsifier_citation` pins: (a) falsifier id appears (b) "not yet wired for --device cuda" phrase appears (c) "step 5f.5 follow-up" reference appears (d) both workarounds (--device cpu OR omit --init) are suggested ## Why drift-prevention matters Promotion of CUDA-001 to DISCHARGED requires §50.4 step 5f.5 LIVE (CUDA wireup landed + GPU smoke). That's multi-PR scope (refactor upload_blocks + new constructor + wire CLI). Until then, the fail-fast guard is the only safety. Without a formal falsifier + test, that guard silently regresses if anyone refactors drive_real. ## Net effects - Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL. - 11 falsifiers total (10 → 11), all PASS. - 1 new drift-prevention test. - 1 source const extraction (lockup). - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3. This is a quality-and-hygiene PR while the 5g.1 17hr corpus retokenize runs in the background. Doesn't move ship-% but reduces drift risk + binds a previously-free-floating falsifier reference. Refs: SPEC-SHIP-TWO-001 §50.4 step 5f.5, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.4.0 Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 5, 2026
… drift correction v1.1.0 cited 8 specific test names; live source inspection 2026-05-05 revealed only 3 of them existed in `crates/apr-cli/src/commands/pretrain.rs`. The §50.4 cascade (5f.4 wireup landed via PR #1494) authored different test names than the ones v1.1.0 stamped, leaving 6 falsifier bindings with dangling `test:` references. ## Drift inventory Falsifier | v1.1.0 cited test | Exists? --- | --- | --- 001 | apr pretrain --help | grep -qE 'init' |⚠️ shell pipe, not unit test 002 | pretrain_no_init_synthetic_ok | ❌ 003 | pretrain_init_missing_file_errors | ✅ 004 | pretrain_init_bad_magic_errors | ✅ 005 | pretrain_init_arch_mismatch_errors | ❌ 006 | pretrain_init_step0_loss_below_from_scratch | ❌ (LIVE-only) 007 | pretrain_init_flag_registered | ❌ 008 | pv validate | ✅ 009 | pretrain_init_optimizer_state_fresh | ❌ (LIVE-only) 010 | pretrain_init_loadback_idempotent | ❌ (LIVE-only) ## Resolution Re-align each falsifier to a test that actually exists, OR explicitly mark the falsifier PARTIAL_ALGORITHM_LEVEL with a `LIVE-PENDING:` prefix in the `test:` field naming the exact prerequisite that prevents unit-test binding. Falsifier | v1.2.0 binding --- | --- 001 | pretrain_init_flag_absent_parses_to_none + pretrain_init_flag_parses_path 002 | synthetic_pretrain_end_to_end_happy_path 003 | pretrain_init_missing_file_errors (unchanged) 004 | pretrain_init_bad_magic_errors + pretrain_init_empty_file_errors 005 | pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction 006 | LIVE-PENDING (5g.2 fine-tune dispatch) 007 | LIVE-PENDING (cli_commands integration test follow-up) 008 | pv validate (unchanged) 009 | LIVE-PENDING (5g.2 + Adam state debug accessor) 010 | LIVE-PENDING (5g.2 smoke evidence pack) ## Net effect - Status remains PARTIAL_ALGORITHM_LEVEL. - 4/10 falsifiers bound to existing PASSING unit tests. - 6/10 explicitly LIVE-PENDING with named prerequisites. - 25/25 commands::pretrain::tests pass. - pv validate exits 0. Promotion to FUNCTIONAL gated on 006/007 binding (which need the 5g.2 LIVE fine-tune + the 3-surface integration test from cli_commands.rs). DISCHARGED still gated on §50.4 step 5g.3 LIVE val_loss < 9.38. ## Five Whys 1. Why did the test references drift? §50.4 cascade (5b through 5f.4) landed across many PRs; each authored test names per its own convention without cross-checking the v1.1.0 contract claims. 2. Why is "no test for X" not the same as "X is broken"? The IMPL exists and works (proven by the 25-test sweep). The DRIFT is in the contract's test-name claim, not in the underlying invariants. 3. Why mark some PARTIAL_ALGORITHM_LEVEL and document `LIVE-PENDING:`? Because the false binding (claiming a test exists when it doesn't) is worse than honest "no test yet"; future agents reading the contract get a clear signal of what's binding and what's pending. 4. Why not author the missing tests in this PR? Tests 006/009/010 are LIVE-only (need 942MB FP16 init APR + 5g.2 dispatch); test 007 needs an integration test in `cli_commands.rs`. Each is its own future PR; bundling them here would mix concerns. 5. Why bump to v1.2.0 (not v1.1.1 patch)? The contract semantics didn't change but the test-binding INVARIANT (every cited test exists) was broken in v1.1.0. v1.2.0 restores that invariant. ## Test plan - [x] pv validate exits 0 - [x] PMAT pre-commit quality gates pass - [x] 25/25 commands::pretrain::tests pass - [ ] CI gate green - [ ] Auto-merge fires on green CI Refs: SPEC-SHIP-TWO-001 §50.4 cascade (5f.4 PR #1494), contracts/apr-pretrain-from-init-v1.yaml v1.2.0 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…ft correction (#1504) * contract(apr-pretrain-from-init-v1): v1.1.0 → v1.2.0 — test-reference drift correction v1.1.0 cited 8 specific test names; live source inspection 2026-05-05 revealed only 3 of them existed in `crates/apr-cli/src/commands/pretrain.rs`. The §50.4 cascade (5f.4 wireup landed via PR #1494) authored different test names than the ones v1.1.0 stamped, leaving 6 falsifier bindings with dangling `test:` references. ## Drift inventory Falsifier | v1.1.0 cited test | Exists? --- | --- | --- 001 | apr pretrain --help | grep -qE 'init' |⚠️ shell pipe, not unit test 002 | pretrain_no_init_synthetic_ok | ❌ 003 | pretrain_init_missing_file_errors | ✅ 004 | pretrain_init_bad_magic_errors | ✅ 005 | pretrain_init_arch_mismatch_errors | ❌ 006 | pretrain_init_step0_loss_below_from_scratch | ❌ (LIVE-only) 007 | pretrain_init_flag_registered | ❌ 008 | pv validate | ✅ 009 | pretrain_init_optimizer_state_fresh | ❌ (LIVE-only) 010 | pretrain_init_loadback_idempotent | ❌ (LIVE-only) ## Resolution Re-align each falsifier to a test that actually exists, OR explicitly mark the falsifier PARTIAL_ALGORITHM_LEVEL with a `LIVE-PENDING:` prefix in the `test:` field naming the exact prerequisite that prevents unit-test binding. Falsifier | v1.2.0 binding --- | --- 001 | pretrain_init_flag_absent_parses_to_none + pretrain_init_flag_parses_path 002 | synthetic_pretrain_end_to_end_happy_path 003 | pretrain_init_missing_file_errors (unchanged) 004 | pretrain_init_bad_magic_errors + pretrain_init_empty_file_errors 005 | pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction 006 | LIVE-PENDING (5g.2 fine-tune dispatch) 007 | LIVE-PENDING (cli_commands integration test follow-up) 008 | pv validate (unchanged) 009 | LIVE-PENDING (5g.2 + Adam state debug accessor) 010 | LIVE-PENDING (5g.2 smoke evidence pack) ## Net effect - Status remains PARTIAL_ALGORITHM_LEVEL. - 4/10 falsifiers bound to existing PASSING unit tests. - 6/10 explicitly LIVE-PENDING with named prerequisites. - 25/25 commands::pretrain::tests pass. - pv validate exits 0. Promotion to FUNCTIONAL gated on 006/007 binding (which need the 5g.2 LIVE fine-tune + the 3-surface integration test from cli_commands.rs). DISCHARGED still gated on §50.4 step 5g.3 LIVE val_loss < 9.38. ## Five Whys 1. Why did the test references drift? §50.4 cascade (5b through 5f.4) landed across many PRs; each authored test names per its own convention without cross-checking the v1.1.0 contract claims. 2. Why is "no test for X" not the same as "X is broken"? The IMPL exists and works (proven by the 25-test sweep). The DRIFT is in the contract's test-name claim, not in the underlying invariants. 3. Why mark some PARTIAL_ALGORITHM_LEVEL and document `LIVE-PENDING:`? Because the false binding (claiming a test exists when it doesn't) is worse than honest "no test yet"; future agents reading the contract get a clear signal of what's binding and what's pending. 4. Why not author the missing tests in this PR? Tests 006/009/010 are LIVE-only (need 942MB FP16 init APR + 5g.2 dispatch); test 007 needs an integration test in `cli_commands.rs`. Each is its own future PR; bundling them here would mix concerns. 5. Why bump to v1.2.0 (not v1.1.1 patch)? The contract semantics didn't change but the test-binding INVARIANT (every cited test exists) was broken in v1.1.0. v1.2.0 restores that invariant. ## Test plan - [x] pv validate exits 0 - [x] PMAT pre-commit quality gates pass - [x] 25/25 commands::pretrain::tests pass - [ ] CI gate green - [ ] Auto-merge fires on green CI Refs: SPEC-SHIP-TWO-001 §50.4 cascade (5f.4 PR #1494), contracts/apr-pretrain-from-init-v1.yaml v1.2.0 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(contract+test): author pretrain_init_flag_registered + bind FALSIFY-007 CI lint engine flagged FALSIFY-APR-PRETRAIN-INIT-007 with PV-VER-001 Error: the cited test `pretrain_init_flag_registered` did not exist as a callable target, leaving the falsifier unfalsifiable. Author the missing test in `crates/apr-cli/tests/cli_commands.rs`: invokes `apr pretrain --help` against the installed binary and asserts `--init` is reachable. This closes the 3-surface drift triangle: (1) clap field, (2) unit tests in `pretrain.rs`, (3) integration test in `cli_commands.rs`. Update `apr-pretrain-from-init-v1.yaml` v1.2.0 to bind FALSIFY-007 to the new test and bump the changelog count from 4/10 to 5/10 falsifiers bound (LIVE-pending count drops from 6 to 5; FALSIFY-007 promoted out of LIVE-PENDING). Local verification: - cargo test pretrain_init_flag_registered: PASS - cargo test lint::tests::lint_passes_on_real_contracts: PASS - pv validate contracts/apr-pretrain-from-init-v1.yaml: 0 errors Refs: SPEC-SHIP-TWO-001 §50.4 cascade, contracts/apr-pretrain-from-init-v1.yaml v1.2.0, feedback_cli_subcommand_three_surface_drift.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…-005/006 test-reference drift (#1505) Same drift class as PR #1504 caught in apr-pretrain-from-init-v1. Test names cited in v1.1.0 changelog never matched the actual tests PR #1476 authored. Drift survived three intervening bumps (v1.1→v1.2→v1.3→v1.4) because each focused on adding new falsifiers, not auditing existing bindings. ## Drift inventory | Falsifier | v1.4.0 cited test | Exists? | Actual test | |---|---|---|---| | FALSIFY-005 | preflight_qwen_vocab_passes_with_qwen_init | ❌ | preflight_qwen_vocab_passes_with_qwen_target | | FALSIFY-006 | preflight_qwen_vocab_fails_without_init | ❌ | preflight_qwen_vocab_fails_with_llama_target | ## Resolution Update the `test:` field for FALSIFY-005 and FALSIFY-006 to reference the actual tests authored by PR #1476. No falsifier semantics change. No new tests added. ## Verification $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_passes_with_qwen_target test result: ok. 1 passed; ... $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_fails_with_llama_target test result: ok. 1 passed; ... $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml 0 error(s), 0 warning(s) ## Five Whys 1. Why did the drift survive 3 bumps? Each bump (v1.2/v1.3/v1.4) focused on ADDING new content (CUDA-001, relaxed bound, etc.); none audited existing bindings. 2. Why didn't the §50.4 cascade catch this? The cascade authored tests; the contract was authored separately. Names diverged at the boundary; no cross-check landed. 3. Why is this a contract-only fix (no source change)? The tests exist and pass — the IMPL is correct. Only the contract's text reference needed correction. 4. Why bump to v1.5.0 (not v1.4.1 patch)? Same logic as PR #1504: the test-binding INVARIANT (every cited test exists) was broken in v1.4.0. v1.5.0 restores it. 5. Why is this important if the impl is correct? Per feedback_no_guessing.md, contracts that cite non-existent tests are unfalsifiable — future agents reading the contract get a false signal that the falsifier is bound. PV-VER-001 lint will catch this; better to fix it than wait for the lint engine to flag. ## Net effects - Contract v1.4.0 → v1.5.0 FUNCTIONAL. - 11 falsifiers, all PASS — same count, but FALSIFY-005/006 now reference tests that actually exist. - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3. This is hygiene work while 5g.1 (~12hr) corpus retokenize runs. Same defect class as PR #1504; together they close the test-reference drift across both pretrain contracts. Refs: SPEC-SHIP-TWO-001 §50.4 cascade (PRs #1473-#1494, #1502), contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.5.0, contracts/apr-pretrain-from-init-v1.yaml v1.2.0 (PR #1504, sibling fix) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2 tasks
noahgift
added a commit
that referenced
this pull request
May 9, 2026
… (PMAT-CODE-PRETRAIN-INIT-FINETUNE-001)
Adds contracts/apr-pretrain-init-finetune-v1.yaml v1.0.0 DRAFT, the
falsifier scaffold for SHIP-TWO §56.4 step 5g.2 — the LIVE 500-step
fine-tune dispatch that flips MODEL-2 ship % 57% → ≥58%.
Pins six falsifiable invariants for `apr pretrain --mode from-init
--init <Qwen.apr> --shards-dir <5g.1-corpus> --steps 500 --device cuda`:
- FALSIFY-001 (ship-blocking): exit code == 0
- FALSIFY-002 (advisory): wall ≤ 3600 s on RTX 4090
- FALSIFY-003 (ship-blocking): step-0 loss ≤ 0.7 × ln(151936) ≈ 8.35
(proves init weights flow through forward)
- FALSIFY-004 (ship-blocking): checkpoint.apr written with valid
magic bytes (0x41 0x50 0x52 0x00 v2 OR
0x41 0x50 0x52 0x4E v1)
- FALSIFY-005 (ship-blocking): val_loss after 500 steps < 9.38
(the §34 370M-from-scratch ceiling)
- FALSIFY-006 (advisory): no CUDA OOM / illegal-address / launch-
OoR errors during run
Five-Whys (why this contract first, then live dispatch):
1. Why a contract before the dispatch? Per CLAUDE.md "Contract-first
design: NEVER write code before writing a provable contract."
Even though 5g.2 is "0 LOC operator-dispatch", it has shippable
semantics that deserve falsification scaffolding.
2. Why these particular six gates? They cover the four orthogonal
failure modes of a fine-tune-from-init dispatch: process-level
(exit/wall), correctness (step-0 baseline + val_loss), and
serialization (checkpoint magic bytes + GPU resource health).
3. Why DRAFT status (not PROPOSED, not ACTIVE)? DRAFT means "schema
validated, falsifiers authored, but no live evidence yet."
Status flips to ACTIVE_RUNTIME via §59 spec amendment after the
live dispatch produces evidence.
4. Why a separate contract from apr-pretrain-from-init-v1? The
sibling contract pins the in-process semantics of init loading
(load_init_tensors_from_apr, populate_trainer_from_init_tensors).
This new contract pins the END-TO-END dispatch outcome — they
compose at the dispatch boundary.
5. Why the val_loss < 9.38 threshold (not 5.0 or 7.0)? §34's 200K-
step retrain confirmed val_loss=9.38 as the 370M-from-scratch
capacity ceiling on this corpus. A from-init pivot must beat
from-scratch, otherwise §49's strategy reasoning is wrong.
Pre-requisites VERIFIED on host (lambda-vector RTX 4090):
- /mnt/nvme-raid0/models/qwen2.5-coder-0.5b-instruct-fp16.apr exists
- /mnt/nvme-raid0/data/codeparrot-python-permissive-shards-qwen has
228 shards / 2.278B tokens (manifest.json reconstructed by PR #1575)
- `apr pretrain --init <PATH>` end-to-end runnable per §53 (#1494 MERGED)
- Polymorphic preflight per §55 (#1500 MERGED)
Quality gates:
- `pv validate contracts/apr-pretrain-init-finetune-v1.yaml`: 0 errors
- `pv lint --strict-test-binding`: 9/9 gates PASS
SHIP-TWO impact:
- MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep work)
- MODEL-2 ship %: unchanged at 57% (this PR is contract-only;
ship-% flips on §59 amendment after live verdict)
- Unblocks: §59 spec amendment recording 5g.2 dispatch result
Next steps (follow-ups, NOT this PR):
- LIVE dispatch on RTX 4090 (~20-60 min wall, pre-authorized per
feedback_compute_pre_authorized.md)
- §59 spec amendment v3.05.0 → v3.06.0 with verdict + ship-% flip
- Contract status DRAFT → ACTIVE_RUNTIME
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 9, 2026
… (PMAT-CODE-PRETRAIN-INIT-FINETUNE-001) (#1576) Adds contracts/apr-pretrain-init-finetune-v1.yaml v1.0.0 DRAFT, the falsifier scaffold for SHIP-TWO §56.4 step 5g.2 — the LIVE 500-step fine-tune dispatch that flips MODEL-2 ship % 57% → ≥58%. Pins six falsifiable invariants for `apr pretrain --mode from-init --init <Qwen.apr> --shards-dir <5g.1-corpus> --steps 500 --device cuda`: - FALSIFY-001 (ship-blocking): exit code == 0 - FALSIFY-002 (advisory): wall ≤ 3600 s on RTX 4090 - FALSIFY-003 (ship-blocking): step-0 loss ≤ 0.7 × ln(151936) ≈ 8.35 (proves init weights flow through forward) - FALSIFY-004 (ship-blocking): checkpoint.apr written with valid magic bytes (0x41 0x50 0x52 0x00 v2 OR 0x41 0x50 0x52 0x4E v1) - FALSIFY-005 (ship-blocking): val_loss after 500 steps < 9.38 (the §34 370M-from-scratch ceiling) - FALSIFY-006 (advisory): no CUDA OOM / illegal-address / launch- OoR errors during run Five-Whys (why this contract first, then live dispatch): 1. Why a contract before the dispatch? Per CLAUDE.md "Contract-first design: NEVER write code before writing a provable contract." Even though 5g.2 is "0 LOC operator-dispatch", it has shippable semantics that deserve falsification scaffolding. 2. Why these particular six gates? They cover the four orthogonal failure modes of a fine-tune-from-init dispatch: process-level (exit/wall), correctness (step-0 baseline + val_loss), and serialization (checkpoint magic bytes + GPU resource health). 3. Why DRAFT status (not PROPOSED, not ACTIVE)? DRAFT means "schema validated, falsifiers authored, but no live evidence yet." Status flips to ACTIVE_RUNTIME via §59 spec amendment after the live dispatch produces evidence. 4. Why a separate contract from apr-pretrain-from-init-v1? The sibling contract pins the in-process semantics of init loading (load_init_tensors_from_apr, populate_trainer_from_init_tensors). This new contract pins the END-TO-END dispatch outcome — they compose at the dispatch boundary. 5. Why the val_loss < 9.38 threshold (not 5.0 or 7.0)? §34's 200K- step retrain confirmed val_loss=9.38 as the 370M-from-scratch capacity ceiling on this corpus. A from-init pivot must beat from-scratch, otherwise §49's strategy reasoning is wrong. Pre-requisites VERIFIED on host (lambda-vector RTX 4090): - /mnt/nvme-raid0/models/qwen2.5-coder-0.5b-instruct-fp16.apr exists - /mnt/nvme-raid0/data/codeparrot-python-permissive-shards-qwen has 228 shards / 2.278B tokens (manifest.json reconstructed by PR #1575) - `apr pretrain --init <PATH>` end-to-end runnable per §53 (#1494 MERGED) - Polymorphic preflight per §55 (#1500 MERGED) Quality gates: - `pv validate contracts/apr-pretrain-init-finetune-v1.yaml`: 0 errors - `pv lint --strict-test-binding`: 9/9 gates PASS SHIP-TWO impact: - MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep work) - MODEL-2 ship %: unchanged at 57% (this PR is contract-only; ship-% flips on §59 amendment after live verdict) - Unblocks: §59 spec amendment recording 5g.2 dispatch result Next steps (follow-ups, NOT this PR): - LIVE dispatch on RTX 4090 (~20-60 min wall, pre-authorized per feedback_compute_pre_authorized.md) - §59 spec amendment v3.05.0 → v3.06.0 with verdict + ship-% flip - Contract status DRAFT → ACTIVE_RUNTIME Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 9, 2026
…DE-PRETRAIN-INIT-CUDA-WIREUP-001)
Mirror the CPU path's `build_shared_trainer_with_init` (§50.4 step 5f.4)
into the CUDA backend so `apr pretrain --init <PATH> --device cuda` can
fine-tune from a public pretrained checkpoint on RTX 4090 — the only
remaining ship-blocker for SHIP-TWO §56.4 step 5g.2.
This PR:
- Adds `entrenar::train::pretrain_real_cuda::build_shared_cuda_trainer_with_init`,
symmetric to the CPU sibling. Composes the SAME §50.4 step-5f machinery
through both backends:
5c: build_transformer_config(init_arch)
5f.1: validate_pretrain_init_arch_compatible(init_arch) — encoder rejection
5f.2: load_init_tensors_from_apr(path) — read APR weights
5f.3: populate_trainer_from_init_tensors(transformer, &tensors) — populate CPU model
5f.5: CudaTransformerTrainer::with_model uploads populated blocks
/ final_norm / lm_head / embed_tokens to GPU.
The §50.4 step 5f.1/5f.2/5f.3 helpers are reused VERBATIM — populate
semantics are identical between CPU and CUDA backends.
- Updates `apr-cli::drive_real_cuda` to accept the same `init_arch:
Option<&TransformerConfig>` + `init_path: Option<&Path>` pair as the
CPU path. When either is `Some`, routes through the new builder.
When both are `None`, preserves the existing from-scratch baseline
(INV-ARCH-370M-001 stays enforced on the from-scratch CUDA path).
- Removes the `FALSIFY-APR-PRETRAIN-INIT-CUDA-001` fail-fast Err in
`drive_real`. The `pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG`
survives and is repurposed as a drift-prevention sentinel — its
payload now reads "is wired for --device cuda via
build_shared_cuda_trainer_with_init (5f.5 SHIPPED)" so a future
regression that re-introduces a fail-fast fires the sentinel test
before the contract reference goes stale.
Five-Whys (root-cause class) for the wireup itself:
1. Why was the CUDA wireup deferred while the CPU wireup landed in
PR #1494? §50.4 step 5f.4 was the smallest cascade-completing PR;
landing both backends in one PR conflated the algorithm-level
wireup with the CUDA-feature-build dependency. Per
`feedback_falsifier_first_cascade_pattern.md`, 1 PR ≈ 1 logical
change.
2. Why does the CUDA path even need its own builder? Because the
`CudaTransformerTrainer` constructor uploads weights to GPU at
allocation time — the populated CPU model must exist BEFORE the
GPU upload, or the GPU sees random initialization while the CPU
model has the loaded init.
3. Why pass the populated CPU `Transformer` to `with_model` rather
than loading directly into GPU buffers? Because the CUDA upload
path (`upload_blocks` + `final_norm` + `lm_head`) reads weights
FROM the CPU `Transformer` struct. The cleanest symmetry is
"build CPU model, populate via shared helper, hand to CUDA
constructor" — the same helper closes the §28 SHIP-007 silent-
gibberish defect class on both backends.
4. Why preserve the const sentinel rather than delete it? The const
is referenced by name in `apr-pretrain-arch-polymorphic-v1.yaml`
v1.4.0..v1.6.0 changelog and falsifier entries. Deleting it would
break the contract's audit trail. Repurposing it (semantic flip
from "fail-fast" to "is wired") preserves the audit chain while
the new payload still anchors a drift-prevention test.
5. Why does this PR not run the LIVE 500-step fine-tune? Per PR
atomicity: this PR ships the wireup. The 500-step val_loss < 9.38
verdict is gated by `apr-pretrain-init-finetune-v1.yaml` v1.0.0
(PR #1576) — that contract's FALSIFY-APR-PRETRAIN-INIT-FINETUNE-005
flips MODEL-2 ship % 57% → ≥58%. The two PRs compose: this PR's
wireup is the prerequisite; PR #1576's contract is the verdict.
LIVE END-TO-END DOGFOOD on lambda-vector RTX 4090 (this branch built
with `--features cuda`):
$ apr pretrain --dataset .../codeparrot-python-permissive-shards-qwen \
--tokenizer .../qwen-0.5b-tokenizer-extracted \
--run-dir .../5g-2-smoke-1step-cuda-post5f5 \
--mode finetune --num-steps 1 --batch-size 2 --seq-length 256 \
--device cuda \
--init .../qwen2.5-coder-0.5b-instruct-fp16.apr
[CUDA] cuBLAS initialized — forward TF32 tensor cores
[CUDA] Pre-warmed 27 forward kernels
✓ 24 transformer blocks uploaded to GPU
✓ GPU training state allocated (LM head: 544.5 MB)
=== Run Result ===
OK CONVERGED final val_loss=0.6847 after 1 epoch(s)
Checkpoint: 2.35 GiB, 219 tensors, valid APR v2 (✓ checksum).
This live run discharges:
- FALSIFY-APR-PRETRAIN-INIT-CUDA-001 (sentinel, post-5f.5)
- FALSIFY-APR-PRETRAIN-INIT-FINETUNE-001 (exit 0)
- FALSIFY-APR-PRETRAIN-INIT-FINETUNE-004 (checkpoint written)
- Partial discharge of FALSIFY-APR-PRETRAIN-INIT-FINETUNE-005
(val_loss=0.6847 << 9.38 ceiling, on 1-step fine-tune; 500-step
LIVE remains the binding evidence under PR #1576's contract).
Contract updates:
- `contracts/apr-pretrain-arch-polymorphic-v1.yaml`: v1.6.0 → v1.7.0.
- FALSIFY-CUDA-001 semantic flip (fail-fast → wireup-is-wired sentinel)
- NEW FALSIFY-CUDA-002 (paired-args invariant on the new builder)
- NEW FALSIFY-CUDA-003 (encoder family rejection on the new builder)
- All three new tests fire WITHOUT a CUDA runtime — they exercise
the args-check and encoder-rejection paths that happen before any
GPU allocation.
Quality gates:
- `pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml`: 0 errors
- `pv lint --strict-test-binding`: 9/9 gates PASS
- `cargo test -p apr-cli --features training --lib`: 5644/5644 PASS
- `cargo test -p apr-cli --features training --test cli_commands`: 8/8 PASS
- `cargo test -p aprender-train --features cuda --lib build_shared_cuda_trainer_with_init`: 2/2 PASS
- `cargo clippy -p apr-cli --features training --lib -- -D warnings`: clean
- `cargo check -p apr-cli --features training`: clean
- `cargo check -p apr-cli --features training,cuda`: clean
- LIVE: `apr pretrain --init Qwen.apr --device cuda` runs end-to-end on RTX 4090
SHIP-TWO impact:
- MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep)
- MODEL-2 ship %: unchanged at 57% (5g.2 LIVE 500-step verdict still
required to flip 57% → ≥58%; this PR closes the only remaining
technical blocker — a 500-step dispatch is now operator-runnable).
- §50.4 cascade COMPLETE (5a-5f.5 all shipped; only 5g LIVE remains).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 9, 2026
…DE-PRETRAIN-INIT-CUDA-WIREUP-001) (#1577) Mirror the CPU path's `build_shared_trainer_with_init` (§50.4 step 5f.4) into the CUDA backend so `apr pretrain --init <PATH> --device cuda` can fine-tune from a public pretrained checkpoint on RTX 4090 — the only remaining ship-blocker for SHIP-TWO §56.4 step 5g.2. This PR: - Adds `entrenar::train::pretrain_real_cuda::build_shared_cuda_trainer_with_init`, symmetric to the CPU sibling. Composes the SAME §50.4 step-5f machinery through both backends: 5c: build_transformer_config(init_arch) 5f.1: validate_pretrain_init_arch_compatible(init_arch) — encoder rejection 5f.2: load_init_tensors_from_apr(path) — read APR weights 5f.3: populate_trainer_from_init_tensors(transformer, &tensors) — populate CPU model 5f.5: CudaTransformerTrainer::with_model uploads populated blocks / final_norm / lm_head / embed_tokens to GPU. The §50.4 step 5f.1/5f.2/5f.3 helpers are reused VERBATIM — populate semantics are identical between CPU and CUDA backends. - Updates `apr-cli::drive_real_cuda` to accept the same `init_arch: Option<&TransformerConfig>` + `init_path: Option<&Path>` pair as the CPU path. When either is `Some`, routes through the new builder. When both are `None`, preserves the existing from-scratch baseline (INV-ARCH-370M-001 stays enforced on the from-scratch CUDA path). - Removes the `FALSIFY-APR-PRETRAIN-INIT-CUDA-001` fail-fast Err in `drive_real`. The `pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG` survives and is repurposed as a drift-prevention sentinel — its payload now reads "is wired for --device cuda via build_shared_cuda_trainer_with_init (5f.5 SHIPPED)" so a future regression that re-introduces a fail-fast fires the sentinel test before the contract reference goes stale. Five-Whys (root-cause class) for the wireup itself: 1. Why was the CUDA wireup deferred while the CPU wireup landed in PR #1494? §50.4 step 5f.4 was the smallest cascade-completing PR; landing both backends in one PR conflated the algorithm-level wireup with the CUDA-feature-build dependency. Per `feedback_falsifier_first_cascade_pattern.md`, 1 PR ≈ 1 logical change. 2. Why does the CUDA path even need its own builder? Because the `CudaTransformerTrainer` constructor uploads weights to GPU at allocation time — the populated CPU model must exist BEFORE the GPU upload, or the GPU sees random initialization while the CPU model has the loaded init. 3. Why pass the populated CPU `Transformer` to `with_model` rather than loading directly into GPU buffers? Because the CUDA upload path (`upload_blocks` + `final_norm` + `lm_head`) reads weights FROM the CPU `Transformer` struct. The cleanest symmetry is "build CPU model, populate via shared helper, hand to CUDA constructor" — the same helper closes the §28 SHIP-007 silent- gibberish defect class on both backends. 4. Why preserve the const sentinel rather than delete it? The const is referenced by name in `apr-pretrain-arch-polymorphic-v1.yaml` v1.4.0..v1.6.0 changelog and falsifier entries. Deleting it would break the contract's audit trail. Repurposing it (semantic flip from "fail-fast" to "is wired") preserves the audit chain while the new payload still anchors a drift-prevention test. 5. Why does this PR not run the LIVE 500-step fine-tune? Per PR atomicity: this PR ships the wireup. The 500-step val_loss < 9.38 verdict is gated by `apr-pretrain-init-finetune-v1.yaml` v1.0.0 (PR #1576) — that contract's FALSIFY-APR-PRETRAIN-INIT-FINETUNE-005 flips MODEL-2 ship % 57% → ≥58%. The two PRs compose: this PR's wireup is the prerequisite; PR #1576's contract is the verdict. LIVE END-TO-END DOGFOOD on lambda-vector RTX 4090 (this branch built with `--features cuda`): $ apr pretrain --dataset .../codeparrot-python-permissive-shards-qwen \ --tokenizer .../qwen-0.5b-tokenizer-extracted \ --run-dir .../5g-2-smoke-1step-cuda-post5f5 \ --mode finetune --num-steps 1 --batch-size 2 --seq-length 256 \ --device cuda \ --init .../qwen2.5-coder-0.5b-instruct-fp16.apr [CUDA] cuBLAS initialized — forward TF32 tensor cores [CUDA] Pre-warmed 27 forward kernels ✓ 24 transformer blocks uploaded to GPU ✓ GPU training state allocated (LM head: 544.5 MB) === Run Result === OK CONVERGED final val_loss=0.6847 after 1 epoch(s) Checkpoint: 2.35 GiB, 219 tensors, valid APR v2 (✓ checksum). This live run discharges: - FALSIFY-APR-PRETRAIN-INIT-CUDA-001 (sentinel, post-5f.5) - FALSIFY-APR-PRETRAIN-INIT-FINETUNE-001 (exit 0) - FALSIFY-APR-PRETRAIN-INIT-FINETUNE-004 (checkpoint written) - Partial discharge of FALSIFY-APR-PRETRAIN-INIT-FINETUNE-005 (val_loss=0.6847 << 9.38 ceiling, on 1-step fine-tune; 500-step LIVE remains the binding evidence under PR #1576's contract). Contract updates: - `contracts/apr-pretrain-arch-polymorphic-v1.yaml`: v1.6.0 → v1.7.0. - FALSIFY-CUDA-001 semantic flip (fail-fast → wireup-is-wired sentinel) - NEW FALSIFY-CUDA-002 (paired-args invariant on the new builder) - NEW FALSIFY-CUDA-003 (encoder family rejection on the new builder) - All three new tests fire WITHOUT a CUDA runtime — they exercise the args-check and encoder-rejection paths that happen before any GPU allocation. Quality gates: - `pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml`: 0 errors - `pv lint --strict-test-binding`: 9/9 gates PASS - `cargo test -p apr-cli --features training --lib`: 5644/5644 PASS - `cargo test -p apr-cli --features training --test cli_commands`: 8/8 PASS - `cargo test -p aprender-train --features cuda --lib build_shared_cuda_trainer_with_init`: 2/2 PASS - `cargo clippy -p apr-cli --features training --lib -- -D warnings`: clean - `cargo check -p apr-cli --features training`: clean - `cargo check -p apr-cli --features training,cuda`: clean - LIVE: `apr pretrain --init Qwen.apr --device cuda` runs end-to-end on RTX 4090 SHIP-TWO impact: - MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep) - MODEL-2 ship %: unchanged at 57% (5g.2 LIVE 500-step verdict still required to flip 57% → ≥58%; this PR closes the only remaining technical blocker — a 500-step dispatch is now operator-runnable). - §50.4 cascade COMPLETE (5a-5f.5 all shipped; only 5g LIVE remains). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wire `apr pretrain --init ` end-to-end so step 5g LIVE 500-step fine-tune can dispatch. Replaces the §49-step-4 "not yet wired" Err with the actual init-tensor load + trainer populate path that §50.4 steps 5f.1/5f.2/5f.3 made possible.
Architecture
Two functions:
`entrenar::train::pretrain_real::build_shared_trainer_with_init` — composes 5c (polymorphic dispatch) + 5f.1 (encoder rejection) + 5f.2 (load) + 5f.3 (populate). `init=None` preserves from-scratch baseline. `init=Some` validates arch family, builds polymorphic config, loads tensors, populates.
`apr-cli::commands::pretrain::run` — extracts init APR's TransformerConfig via existing `model_config::read_apr_architecture`, plumbs through `drive_real → drive_real_cpu → build_shared_trainer_with_init`. Polymorphic preflight now receives EXTRACTED vocab.
Discharges (`apr-pretrain-arch-polymorphic-v1`)
The legacy "not yet wired" guard from §49 step 4 is RETIRED.
NOT in this PR
Tests (6 new, all pass)
`aprender-train::pretrain_real::tests` (4 new):
`apr-cli::commands::pretrain` (2 retrofitted):
19/19 + 23/23 pass. `cargo clippy` clean.
Five Whys
Cascade context
Once 5f.4 lands AND 5g produces val_loss < 9.38, MODEL-2 ship % moves 57% → ≥58%.
🤖 Generated with Claude Code