feat(aprender-train): populate_trainer_from_init_tensors — §50.4 step 5f.3 by noahgift · Pull Request #1483 · paiml/aprender

noahgift · 2026-05-04T19:39:58Z

Summary

Add populate_trainer_from_init_tensors(transformer, init_tensors) —
the population half of apr pretrain --init. Iterates the model's
named_parameters() set, looks up each name in the init BTreeMap (HF
naming preserved by §50.4 step 5f.2's loader), validates length, and
calls Transformer::set_named_parameter().
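
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ToyModel`, its methods, and `populate_sketch` are hypothetical stand-ins for the real `Transformer` and `populate_trainer_from_init_tensors`, and the exact error wording is an assumption apart from the "not present in init APR" phrase and the falsifier ID quoted in this description.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the trainer model; NOT the real `Transformer`.
pub struct ToyModel {
    params: BTreeMap<String, Vec<f32>>,
}

impl ToyModel {
    /// Build a model with zero-initialised parameters of the given flat lengths.
    pub fn new(shapes: &[(&str, usize)]) -> Self {
        let params = shapes
            .iter()
            .map(|(name, len)| (name.to_string(), vec![0.0; *len]))
            .collect();
        Self { params }
    }

    /// Names and flat lengths of every trainable parameter.
    pub fn named_parameters(&self) -> Vec<(String, usize)> {
        self.params.iter().map(|(n, v)| (n.clone(), v.len())).collect()
    }

    /// Overwrite one parameter; returns false if the name or length is wrong.
    pub fn set_named_parameter(&mut self, name: &str, data: &[f32]) -> bool {
        match self.params.get_mut(name) {
            Some(slot) if slot.len() == data.len() => {
                slot.copy_from_slice(data);
                true
            }
            _ => false,
        }
    }

    pub fn get(&self, name: &str) -> Option<&[f32]> {
        self.params.get(name).map(|v| v.as_slice())
    }
}

/// Assumed shape of the population step: strict on missing or mis-sized
/// parameters, permissive on extra init entries (e.g. an untied `lm_head.weight`).
pub fn populate_sketch(
    model: &mut ToyModel,
    init: &BTreeMap<String, Vec<f32>>,
) -> Result<usize, String> {
    const FALSIFIER: &str = "FALSIFY-APR-PRETRAIN-INIT-007";
    let mut populated = 0;
    // Iterate the model's parameters, not the init map, so extras are never visited.
    for (name, expected_len) in model.named_parameters() {
        let data = init.get(&name).ok_or_else(|| {
            format!("{}: parameter '{}' not present in init APR", FALSIFIER, name)
        })?;
        if data.len() != expected_len {
            return Err(format!(
                "{}: parameter '{}' has length {}, expected {}",
                FALSIFIER,
                name,
                data.len(),
                expected_len
            ));
        }
        if !model.set_named_parameter(&name, data) {
            return Err(format!("{}: failed to set parameter '{}'", FALSIFIER, name));
        }
        populated += 1;
    }
    Ok(populated)
}
```

Driving the loop from the model's parameter set is what makes extra init entries harmless in this sketch: a name that exists only in the init map is simply never looked up, while a name that exists only in the model fails on the first lookup.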

Discharges

apr-pretrain-arch-polymorphic-v1 §init_load_semantics:

Population invariant: "Init tensors populate trainer parameters byte-equivalent to source"
FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL

Strictness rules (5-whys-driven)

Why strict on missing-required? Architecture mismatch would silently leave random init for absent parameters, exactly the §28 SHIP-007 "silent gibberish" defect class.
Why strict on length-mismatch? A length mismatch indicates the from_apr_metadata extractor misread a shape — populating regardless would silently truncate or pad.
Why permissive on extra-init-entries? Tied embeddings: a Qwen2.5 APR may publish a separate `lm_head.weight` that the trainer's tied model omits.
Why FALSIFIER ID in error message? Falsifier IDs in error messages turn opaque CI failures into self-explaining defects.
Why one function not two (load+populate fused)? Decoupling keeps `aprender-train` free of `aprender-serve` (the APR loader); the loader is in §50.4 step 5f.2; this is the consumer.

Tests (4 new, 12 total in pretrain_real::tests)

`populate_trainer_from_init_tensors_happy_path`: every param matched → returns Ok(N) where N = named_parameters().len()
`populate_trainer_from_init_tensors_extra_entries_silently_ignored`: fictitious extra entry must NOT cause Err
`populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong flat length → Err naming the param + falsifier ID
`populate_trainer_from_init_tensors_rejects_missing_required_param`: missing required → Err with "not present in init APR" + falsifier ID

Test plan

`cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass)
`cargo clippy -p aprender-train --lib -- -D warnings` (clean)
No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap

Cascade context

Step 5f.3 caps the §50.4 step-5f sub-cascade:

5f.1 — encoder-family validator (PR feat(aprender-train): validate_pretrain_init_arch_compatible — §50.4 step 5f.1 #1479, awaiting CI)
5f.2 — load_init_tensors_from_apr (PR feat(aprender-train): load_init_tensors_from_apr — §50.4 step 5f.2 #1481 MERGED)
5f.3 — THIS PR (populate_trainer_from_init_tensors)

Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch), 5h (stamp + publish as MODEL-2 v2).

🤖 Generated with Claude Code

… 5f.3

Add `populate_trainer_from_init_tensors(transformer, init_tensors)` — the population half of `apr pretrain --init`. Iterates the model's `named_parameters()` set, looks up each name in the init BTreeMap (HF naming preserved by §50.4 step 5f.2's loader), validates length, and calls `Transformer::set_named_parameter()`.

`apr-pretrain-arch-polymorphic-v1` §init_load_semantics:
- Population invariant: "Init tensors populate trainer parameters byte-equivalent to source"
- FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL

1. **Why strict on missing-required?** Architecture mismatch (e.g., init from a different model family) would silently leave random init for absent parameters, which §28's SHIP-007 lesson teaches us is the exact class of "silent gibberish" defect that hides for many epochs.
2. **Why strict on length-mismatch?** A length mismatch indicates the from_apr_metadata extractor misread a shape — populating regardless would silently truncate or pad, masking the bug.
3. **Why permissive on extra-init-entries?** Tied embeddings: a Qwen2.5 APR may publish a separate `lm_head.weight` that the trainer's tied model omits. Failing on extra entries would force operators to pre-strip APRs, which is muda.
4. **Why FALSIFIER ID in error message?** §28 lesson — falsifier IDs in error messages turn opaque CI failures into self-explaining defects.
5. **Why one function not two (load+populate fused)?** Decoupling keeps `aprender-train` free of `aprender-serve` (the APR loader): the loader is a free function in §50.4 step 5f.2; this is the consumer. Two-step composition is testable independently (and is, in this PR).

- `populate_trainer_from_init_tensors_happy_path`: every param matched → returns Ok(N) where N = named_parameters().len()
- `populate_trainer_from_init_tensors_extra_entries_silently_ignored`: fictitious extra entry must NOT cause Err (tied-embeddings safety)
- `populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong flat length → Err naming the param + falsifier ID
- `populate_trainer_from_init_tensors_rejects_missing_required_param`: missing required → Err with "not present in init APR" + falsifier ID

All 12 tests pass; cargo clippy --lib clean.

- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass)
- [x] `cargo clippy -p aprender-train --lib -- -D warnings` (clean)
- [x] No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap

Step 5f.3 caps the §50.4 step-5f sub-cascade:
- 5f.1 — encoder-family validator (PR #1479, awaiting CI)
- 5f.2 — load_init_tensors_from_apr (PR #1481 MERGED)
- 5f.3 — THIS PR (populate_trainer_from_init_tensors)

Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch), 5h (stamp + publish as MODEL-2 v2).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…LETE + 5f.4 wireup gap identified (#1486)

## Summary

Same-day continuation of §51 cascade landed PR #1479 (FALSIFY-007 encoder/decoder validator) and PR #1481 (load_init_tensors_from_apr). PR #1483 (5f.3 populate) and PR #1482 (contract status bump) are MERGEABLE in queue. All 8 falsifiers in `apr-pretrain-arch-polymorphic-v1` are now PARTIAL_ALGORITHM_LEVEL bound on main or about to land.

§52 records:
1. Updated falsifier scoreboard (8/8 vs §51's 7/8)
2. NEW step 5f.4 (CLI wireup, ~150 LOC) identified via live source inspection of `apr-cli/src/commands/pretrain.rs:259-297`
3. Step 5g LIVE 500-step fine-tune is now gated on 5f.4 landing first

## Why now

Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue blocks more impl PRs (#1483 + #1482 both in queue touching pretrain_real.rs), switch to non-conflicting work. This spec amendment touches one markdown file with no PR conflicts.

## Five Whys (§52.8 in body)

1. Why didn't §50 catch 5f.4? — top-down arch-coupling lens missed the CLI-dispatch seam.
2. Why is 5f.4 separate from 5f.3? — different crate (apr-cli vs aprender-train).
3. Why must 5f.4 be atomic? — removing "not yet wired" Err without the wireup produces silent random-init (§28 SHIP-007 defect class).
4. Why ~150 LOC? — 4 levels of plumbing + new builder + tests + CUDA.
5. Why call 5f.4 out in spec? — without §52, readers would assume 5g is dispatchable; spec is the source of truth.

## Test plan

- [x] Single markdown file, no Rust changes
- [x] Falsifier scoreboard table updated to 8/8 PARTIAL_ALGORITHM_LEVEL
- [x] Step roadmap table adds 5f.4 between 5f.3 and 5g
- [x] Cadence preserved: §41 → ... → §51 → §52 (12 amendments since 2026-05-03)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ep 5f.4 (#1494)

## Summary

Wire `apr pretrain --init <PATH>` end-to-end so step 5g LIVE 500-step fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err with the actual init-tensor load + trainer populate path that §50.4 steps 5f.1/5f.2/5f.3 made possible.

## Architecture

Two functions added/changed:

1. `entrenar::train::pretrain_real::build_shared_trainer_with_init` — composes the §50.4 step-5f machinery (5c polymorphic dispatch + 5f.1 encoder rejection + 5f.2 load + 5f.3 populate) into a single trainer-builder entry. init=None preserves the from-scratch baseline byte-equivalent to `build_shared_trainer`. init=Some validates arch family, builds the polymorphic config, loads tensors, populates.
2. `apr-cli/src/commands/pretrain.rs::run` — now extracts the init APR file's TransformerConfig via existing `model_config::read_apr_architecture` when `--init` is set, then plumbs both `init_arch` and `init_path` through `drive_real → drive_real_cpu → build_shared_trainer_with_init`. The polymorphic preflight (§50.4 step 5d) already used the EXTRACTED vocab — this PR wires the call site to actually pass it.

## What this PR DOES NOT do

- **CUDA path** (~80 LOC follow-up as 5f.5): `drive_real_cuda` now fail-fasts when --init is set rather than silently using random init (FALSIFY-APR-PRETRAIN-INIT-CUDA-001). The cuBLAS trainer needs symmetric `build_shared_cuda_trainer_with_init` which is out of scope.
- **Step 5g LIVE 500-step fine-tune** (operator dispatch): this PR makes it dispatchable; running the 500 steps requires operator action.

## Discharges (per apr-pretrain-arch-polymorphic-v1)

- §init_load_semantics integration: load + populate composed end-to-end
- §arch_extraction_signature integration: read_apr_architecture wired
- §qwen_tokenizer_vocab_compatibility integration: extracted vocab flows into preflight call site (no longer hardcoded Llama370M)
- FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level
- The legacy "not yet wired" guard from §49 step 4 is RETIRED — the drift-prevention test now pins the new fail-closed semantic.

## Tests (8 new across 2 crates, all pass)

- `aprender-train`: 4 new tests for `build_shared_trainer_with_init`:
  - `_none_uses_llama370m_shape` (regression-free init=None)
  - `_rejects_unpaired_args` (caller-bug guard)
  - `_rejects_encoder_family` (FALSIFY-007 integration)
  - `_decoder_family_proceeds_to_tensor_load` (failure ordering pin)
- `apr-cli`: 2 retrofitted tests for the new fail-closed semantic:
  - `pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction` (replaces the old "not yet wired" trip-wire)
  - `pretrain_init_v1_magic_aprn_passes_validate_init_apr_path` (helper now returns Ok on valid magic)

19/19 pretrain_real tests pass. 23/23 apr-cli pretrain tests pass. cargo clippy --lib -- -D warnings clean across both crates.

## Five Whys

1. **Why was 5f.4 needed at all?** §50's 5a-5h decomposition assumed the CLI dispatch would naturally invoke the helper functions; live source inspection (§52 amendment) revealed the dispatch hardcoded "not yet wired" Err. 5f.4 is the explicit wireup.
2. **Why is removing the safety Err so load-bearing?** The §28 SHIP-007 lesson: silently random-init via a half-implemented dispatch is the exact "silent gibberish" defect class. Removing the safety Err without the wireup would manifest as a multi-epoch divergence masquerading as a corpus-quality issue.
3. **Why a separate polymorphic builder rather than overload `build_shared_trainer`?** `build_shared_trainer` enforces INV-ARCH-370M-001 (param-count band) which only applies to from-scratch Llama370M. The polymorphic builder sidesteps it by design — Qwen2.5-0.5B is 0.5B params, outside the band by intent.
4. **Why fail-fast on `--init` + `--device cuda` rather than silently ignore?** Same reasoning as #2: silent CUDA random-init would bisect the same "silent gibberish" class. 5f.5 follow-up wires symmetric CUDA path; until then, fail-closed.
5. **Why couldn't this be inside #1483 (the populate PR)?** Different crate (apr-cli vs aprender-train), different review concern (CLI plumbing vs trainer mutation), different test surface. One atomic PR per file/crate boundary.

## Test plan

- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (19/19 pass)
- [x] `cargo test -p apr-cli --lib commands::pretrain` (23/23 pass)
- [x] `cargo clippy -p aprender-train -p apr-cli --lib -- -D warnings` (clean)
- [x] `cargo check -p apr-cli --lib` (clean)
- [ ] Operator-dispatched: `apr pretrain --init <Qwen2.5-Coder-0.5B>.apr` smoke that fires 50 training steps end-to-end (5g LIVE prelude; operator action in next session)

## Cascade context

This is the §52-identified gap closing the §50.4 step 5f sub-cascade:

- 5f.1 encoder validator: PR #1479 ✅ MERGED
- 5f.2 load_init_tensors_from_apr: PR #1481 ✅ MERGED
- 5f.3 populate_trainer_from_init_tensors: PR #1483 (mergeable, in queue)
- **5f.4 CLI wireup: THIS PR**
- 5g LIVE 500-step fine-tune: operator dispatch (next)
- 5h stamp + publish: ~10 LOC follow-up

Once 5f.4 lands AND 5g produces val_loss < 9.38 evidence, MODEL-2 ship % moves 57% → ≥58%.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
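
The caller-bug guard (`_rejects_unpaired_args`) and the CUDA fail-fast described in this commit message might look something like the sketch below. Everything here is assumed for illustration: `Device`, `InitPlan`, and `plan_init` are hypothetical names, the architecture is reduced to a plain string, and the real checks live in `build_shared_trainer_with_init` and `drive_real_cuda` with signatures that are not shown in this thread. Only the falsifier ID is taken from the text.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical device selector; the real CLI type is not shown in the thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Cuda,
}

/// Hypothetical outcome of validating the `--init` arguments.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InitPlan {
    /// No init supplied: from-scratch baseline.
    FromScratch,
    /// Both the extracted architecture and the APR path were supplied.
    FromInit { arch: String, path: PathBuf },
}

/// Assumed guard logic: the two init arguments must arrive together, and
/// `--init` on CUDA fails closed instead of silently using random init.
pub fn plan_init(
    device: Device,
    init_arch: Option<&str>,
    init_path: Option<&Path>,
) -> Result<InitPlan, String> {
    match (init_arch, init_path) {
        (None, None) => Ok(InitPlan::FromScratch),
        (Some(arch), Some(path)) => {
            if device == Device::Cuda {
                // Fail fast until a symmetric CUDA builder exists (5f.5).
                return Err(
                    "FALSIFY-APR-PRETRAIN-INIT-CUDA-001: --init is not wired for CUDA"
                        .to_string(),
                );
            }
            Ok(InitPlan::FromInit {
                arch: arch.to_string(),
                path: path.to_path_buf(),
            })
        }
        // Exactly one of the two was supplied: a caller bug, not a user error.
        _ => Err("init_arch and init_path must be provided together".to_string()),
    }
}
```

Matching on the pair of `Option`s keeps the unpaired case explicit, so a half-plumbed call site produces an error rather than quietly falling back to the from-scratch path.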

…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495)

§50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at 2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now end-to-end functional on CPU; the legacy "not yet wired" Err is RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-% can move from 57% → ≥58%.

Spec amendment §53:
- Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007 via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass smoke + 008 contract validation are inherently algorithm-level).
- Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED; 5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090.
- Cascade ships statistics: 11 PRs over 2 days (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494).
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% (gated on 5g empirical val_loss < 9.38 evidence).
- 3 CI andon classes documented as feedback memories during cascade (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge behind-state).

Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL:
- All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the user-facing `apr pretrain --init` flow.
- verification_summary updated: tested 7 → 8; status partial → functional.
- Added §52 + §53 references.
- Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical 500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr producing val_loss < 9.38.

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 4, 2026 19:40

noahgift force-pushed the feat/populate-trainer-from-init-tensors branch from 867ff5f to abe4f5c Compare May 4, 2026 20:08

noahgift mentioned this pull request May 4, 2026

spec(ship-two-models): v2.96.0 → v2.97.0 — §52 cascade ALGORITHM-COMPLETE + 5f.4 wireup gap #1486

Merged

4 tasks

noahgift added 8 commits May 4, 2026 22:37

Merge branch 'main' into feat/populate-trainer-from-init-tensors

22dfbc8

Merge branch 'main' into feat/populate-trainer-from-init-tensors

677da00

Merge branch 'main' into feat/populate-trainer-from-init-tensors

9c9fa93

Merge branch 'main' into feat/populate-trainer-from-init-tensors

f9f4ad1

Merge branch 'main' into feat/populate-trainer-from-init-tensors

aff0056

Merge branch 'main' into feat/populate-trainer-from-init-tensors

cee4043

Merge branch 'main' into feat/populate-trainer-from-init-tensors

677a199

Merge branch 'main' into feat/populate-trainer-from-init-tensors

53651c0

noahgift added 2 commits May 5, 2026 02:40

Merge branch 'main' into feat/populate-trainer-from-init-tensors

7a579f3

Merge branch 'main' into feat/populate-trainer-from-init-tensors

06a6b2c

noahgift merged commit 6179441 into main May 5, 2026
10 checks passed

noahgift deleted the feat/populate-trainer-from-init-tensors branch May 5, 2026 01:16

noahgift mentioned this pull request May 5, 2026

feat(apr-cli + aprender-train): apr pretrain --init wireup — §50.4 step 5f.4 #1494

Merged

noahgift mentioned this pull request May 5, 2026

spec(ship-two-models): v2.97 → v2.98 — §53 §50.4 cascade INTEGRATION-COMPLETE; contract v1.1 → v1.2 FUNCTIONAL #1495

Merged

4 tasks

This was referenced May 9, 2026

feat: §50.4 step 5f.5 CUDA --init wireup (PMAT-CODE-PRETRAIN-INIT-CUDA-WIREUP-001) #1577

Merged

fix(task-148): Toyota Way 500-line refactor + FALSIFY-CORPUS-004 + QLoRA + GPU training backend #1003

Closed

noahgift merged 11 commits into main from feat/populate-trainer-from-init-tensors
