feat(aprender-train): populate_trainer_from_init_tensors — §50.4 step 5f.3 #1483

Merged
noahgift merged 11 commits into main from feat/populate-trainer-from-init-tensors
May 5, 2026

@noahgift noahgift commented May 4, 2026

Summary

Add populate_trainer_from_init_tensors(transformer, init_tensors),
the population half of apr pretrain --init. Iterates the model's
named_parameters() set, looks up each name in the init BTreeMap (HF
naming preserved by §50.4 step 5f.2's loader), validates length, and
calls Transformer::set_named_parameter().

Discharges

apr-pretrain-arch-polymorphic-v1 §init_load_semantics:

  • Population invariant: "Init tensors populate trainer parameters byte-equivalent to source"
  • FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL
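
One way to read "byte-equivalent" is a bit-for-bit comparison rather than float equality, since `==` on `f32` treats NaN as unequal to itself and `0.0` as equal to `-0.0`. A minimal sketch, assuming flat `f32` buffers; `byte_equivalent` is a hypothetical helper and not part of this PR:

```rust
/// Hypothetical helper: true when two flat f32 buffers have the same
/// length and every element has an identical bit pattern.
fn byte_equivalent(source: &[f32], populated: &[f32]) -> bool {
    source.len() == populated.len()
        && source
            .iter()
            .zip(populated)
            // Compare raw bits so NaN payloads and signed zeros are
            // distinguished exactly as they are stored.
            .all(|(a, b)| a.to_bits() == b.to_bits())
}
```

A check like this would flag a population step that normalized or re-encoded values on the way in, which plain float equality could miss.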

Strictness rules (5-whys-driven)

  1. Why strict on missing-required? Architecture mismatch would silently leave random init for absent parameters, exactly the §28 SHIP-007 "silent gibberish" defect class.
  2. Why strict on length-mismatch? A length mismatch indicates the from_apr_metadata extractor misread a shape — populating regardless would silently truncate or pad.
  3. Why permissive on extra-init-entries? Tied embeddings: a Qwen2.5 APR may publish a separate `lm_head.weight` that the trainer's tied model omits.
  4. Why FALSIFIER ID in error message? Falsifier IDs in error messages turn opaque CI failures into self-explaining defects.
  5. Why one function not two (load+populate fused)? Decoupling keeps `aprender-train` free of `aprender-serve` (the APR loader); the loader is in §50.4 step 5f.2; this is the consumer.
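
The rules above can be sketched against a toy model. `ToyTransformer`, `populate_sketch`, and the exact error wording are illustrative assumptions; the real function works on the trainer's `Transformer` and autograd `Tensor` types, whose signatures are not shown in this description.

```rust
use std::collections::BTreeMap;

/// Falsifier ID embedded in every error, per rule 4.
const FALSIFIER_ID: &str = "FALSIFY-APR-PRETRAIN-INIT-007";

/// Hypothetical stand-in for the trainer model: name -> flat f32 buffer.
struct ToyTransformer {
    params: BTreeMap<String, Vec<f32>>,
}

impl ToyTransformer {
    /// Returns (name, expected flat length) for every parameter.
    fn named_parameters(&self) -> Vec<(String, usize)> {
        self.params.iter().map(|(n, v)| (n.clone(), v.len())).collect()
    }

    /// Overwrites one parameter; false if the name or length does not match.
    fn set_named_parameter(&mut self, name: &str, data: &[f32]) -> bool {
        match self.params.get_mut(name) {
            Some(slot) if slot.len() == data.len() => {
                slot.copy_from_slice(data);
                true
            }
            _ => false,
        }
    }
}

/// Sketch of the population step: iterate the model's parameters, not the
/// init map, so extra init entries are never visited (rule 3).
fn populate_sketch(
    model: &mut ToyTransformer,
    init: &BTreeMap<String, Vec<f32>>,
) -> Result<usize, String> {
    let mut populated = 0;
    for (name, expected_len) in model.named_parameters() {
        // Rule 1: a required parameter missing from the init map is an error.
        let src = init.get(&name).ok_or_else(|| {
            format!("{}: parameter '{}' not present in init APR", FALSIFIER_ID, name)
        })?;
        // Rule 2: never truncate or pad on a length mismatch.
        if src.len() != expected_len {
            return Err(format!(
                "{}: parameter '{}' expected {} values, init has {}",
                FALSIFIER_ID, name, expected_len, src.len()
            ));
        }
        if !model.set_named_parameter(&name, src) {
            return Err(format!("{}: failed to set '{}'", FALSIFIER_ID, name));
        }
        populated += 1;
    }
    Ok(populated)
}
```

Because the loop is driven by `named_parameters()`, an extra `lm_head.weight` entry in the init map is simply never looked up, which is how the permissive rule falls out without special-casing.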

Tests (4 new, 12 total in pretrain_real::tests)

  • `populate_trainer_from_init_tensors_happy_path`: every param matched → returns Ok(N) where N = named_parameters().len()
  • `populate_trainer_from_init_tensors_extra_entries_silently_ignored`: fictitious extra entry must NOT cause Err
  • `populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong flat length → Err naming the param + falsifier ID
  • `populate_trainer_from_init_tensors_rejects_missing_required_param`: missing required → Err with "not present in init APR" + falsifier ID

Test plan

  • `cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass)
  • `cargo clippy -p aprender-train --lib -- -D warnings` (clean)
  • No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap

Cascade context

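Step 5f.3 caps the §50.4 step-5f sub-cascade:

  • 5f.1: encoder-family validator (PR #1479, awaiting CI)
  • 5f.2: load_init_tensors_from_apr (PR #1481 MERGED)
  • 5f.3: THIS PR (populate_trainer_from_init_tensors)
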
Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch), 5h (stamp + publish as MODEL-2 v2).

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) May 4, 2026 19:40
… 5f.3

Add `populate_trainer_from_init_tensors(transformer, init_tensors)` —
the population half of `apr pretrain --init`. Iterates the model's
`named_parameters()` set, looks up each name in the init BTreeMap (HF
naming preserved by §50.4 step 5f.2's loader), validates length, and
calls `Transformer::set_named_parameter()`.

`apr-pretrain-arch-polymorphic-v1` §init_load_semantics:
  - Population invariant: "Init tensors populate trainer parameters
    byte-equivalent to source"
  - FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL

1. **Why strict on missing-required?** Architecture mismatch (e.g., init
   from a different model family) would silently leave random init for
   absent parameters, which §28's SHIP-007 lesson teaches us is the
   exact class of "silent gibberish" defect that hides for many epochs.
2. **Why strict on length-mismatch?** A length mismatch indicates the
   from_apr_metadata extractor misread a shape — populating regardless
   would silently truncate or pad, masking the bug.
3. **Why permissive on extra-init-entries?** Tied embeddings: a Qwen2.5
   APR may publish a separate `lm_head.weight` that the trainer's tied
   model omits. Failing on extra entries would force operators to
   pre-strip APRs, which is muda.
4. **Why FALSIFIER ID in error message?** §28 lesson — falsifier IDs in
   error messages turn opaque CI failures into self-explaining defects.
5. **Why one function not two (load+populate fused)?** Decoupling keeps
   `aprender-train` free of `aprender-serve` (the APR loader): the
   loader is a free function in §50.4 step 5f.2; this is the consumer.
   Two-step composition is testable independently (and is, in this PR).

- `populate_trainer_from_init_tensors_happy_path`: every param matched
  → returns Ok(N) where N = named_parameters().len()
- `populate_trainer_from_init_tensors_extra_entries_silently_ignored`:
  fictitious extra entry must NOT cause Err (tied-embeddings safety)
- `populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong
  flat length → Err naming the param + falsifier ID
- `populate_trainer_from_init_tensors_rejects_missing_required_param`:
  missing required → Err with "not present in init APR" + falsifier ID

All 12 tests pass; cargo clippy --lib clean.

- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass)
- [x] `cargo clippy -p aprender-train --lib -- -D warnings` (clean)
- [x] No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap

Step 5f.3 caps the §50.4 step-5f sub-cascade:
  5f.1 — encoder-family validator (PR #1479, awaiting CI)
  5f.2 — load_init_tensors_from_apr (PR #1481 MERGED)
  5f.3 — THIS PR (populate_trainer_from_init_tensors)
Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch),
5h (stamp + publish as MODEL-2 v2).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…LETE + 5f.4 wireup gap identified (#1486)

## Summary

Same-day continuation of §51 cascade landed PR #1479 (FALSIFY-007
encoder/decoder validator) and PR #1481 (load_init_tensors_from_apr).
PR #1483 (5f.3 populate) and PR #1482 (contract status bump) are
MERGEABLE in queue. All 8 falsifiers in `apr-pretrain-arch-polymorphic-v1`
are now PARTIAL_ALGORITHM_LEVEL bound on main or about to land.

§52 records:
1. Updated falsifier scoreboard (8/8 vs §51's 7/8)
2. NEW step 5f.4 (CLI wireup, ~150 LOC) identified via live source
   inspection of `apr-cli/src/commands/pretrain.rs:259-297`
3. Step 5g LIVE 500-step fine-tune is now gated on 5f.4 landing first

## Why now

Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated
auto-merge queue blocks more impl PRs (#1483 + #1482 both in queue
touching pretrain_real.rs), switch to non-conflicting work. This spec
amendment touches one markdown file with no PR conflicts.

## Five Whys (§52.8 in body)

1. Why didn't §50 catch 5f.4? — top-down arch-coupling lens missed the
   CLI-dispatch seam.
2. Why is 5f.4 separate from 5f.3? — different crate (apr-cli vs
   aprender-train).
3. Why must 5f.4 be atomic? — removing "not yet wired" Err without the
   wireup produces silent random-init (§28 SHIP-007 defect class).
4. Why ~150 LOC? — 4 levels of plumbing + new builder + tests + CUDA.
5. Why call 5f.4 out in spec? — without §52, readers would assume 5g
   is dispatchable; spec is the source of truth.

## Test plan

- [x] Single markdown file, no Rust changes
- [x] Falsifier scoreboard table updated to 8/8 PARTIAL_ALGORITHM_LEVEL
- [x] Step roadmap table adds 5f.4 between 5f.3 and 5g
- [x] Cadence preserved: §41 → ... → §51 → §52 (12 amendments since 2026-05-03)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 6179441 into main May 5, 2026
10 checks passed
@noahgift noahgift deleted the feat/populate-trainer-from-init-tensors branch May 5, 2026 01:16
noahgift added a commit that referenced this pull request May 5, 2026
…ep 5f.4 (#1494)

## Summary

Wire `apr pretrain --init <PATH>` end-to-end so step 5g LIVE 500-step
fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err
with the actual init-tensor load + trainer populate path that
§50.4 steps 5f.1/5f.2/5f.3 made possible.

## Architecture

Two functions added/changed:

1. `entrenar::train::pretrain_real::build_shared_trainer_with_init` —
   composes the §50.4 step-5f machinery (5c polymorphic dispatch +
   5f.1 encoder rejection + 5f.2 load + 5f.3 populate) into a single
   trainer-builder entry. init=None preserves the from-scratch baseline
   byte-equivalent to `build_shared_trainer`. init=Some validates arch
   family, builds the polymorphic config, loads tensors, populates.

2. `apr-cli/src/commands/pretrain.rs::run` — now extracts the init APR
   file's TransformerConfig via existing `model_config::read_apr_architecture`
   when `--init` is set, then plumbs both `init_arch` and `init_path`
   through `drive_real → drive_real_cpu → build_shared_trainer_with_init`.
   The polymorphic preflight (§50.4 step 5d) already used the EXTRACTED
   vocab — this PR wires the call site to actually pass it.
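
The init=None / init=Some split described in item 1 can be sketched as a match over a pair of options. This is only an illustration of the control flow: `ArchSketch`, `BuildPlan`, and `plan_trainer_build` are hypothetical names, and the real builder's signature and error types are not shown in this message.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical architecture summary extracted from the init APR.
#[derive(Debug, Clone, PartialEq)]
enum ArchFamily {
    Decoder,
    Encoder,
}

#[derive(Debug, Clone, PartialEq)]
struct ArchSketch {
    family: ArchFamily,
    vocab_size: usize,
}

/// What the builder would go on to do; stands in for constructing a trainer.
#[derive(Debug, PartialEq)]
enum BuildPlan {
    /// init=None: the from-scratch baseline.
    FromScratch,
    /// init=Some: polymorphic config, then tensor load and populate.
    FromInit { vocab_size: usize, path: PathBuf },
}

fn plan_trainer_build(
    init_arch: Option<&ArchSketch>,
    init_path: Option<&Path>,
) -> Result<BuildPlan, String> {
    match (init_arch, init_path) {
        (None, None) => Ok(BuildPlan::FromScratch),
        (Some(arch), Some(path)) => {
            // Reject the encoder family before any tensor is read, so the
            // failure ordering is arch check first, tensor load second.
            if arch.family == ArchFamily::Encoder {
                return Err("encoder-family init rejected before tensor load".to_string());
            }
            Ok(BuildPlan::FromInit {
                vocab_size: arch.vocab_size,
                path: path.to_path_buf(),
            })
        }
        // Exactly one of the two supplied is a caller bug, not a mode.
        _ => Err("init_arch and init_path must be supplied together".to_string()),
    }
}
```

Matching on the tuple makes the unpaired case an explicit error arm instead of an accidental fall-through to from-scratch initialization.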

## What this PR DOES NOT do

- **CUDA path** (~80 LOC follow-up as 5f.5): `drive_real_cuda` now
  fails fast when --init is set rather than silently using random init
  (FALSIFY-APR-PRETRAIN-INIT-CUDA-001). The cuBLAS trainer needs
  symmetric `build_shared_cuda_trainer_with_init` which is out of scope.
- **Step 5g LIVE 500-step fine-tune** (operator dispatch): this PR makes
  it dispatchable; running the 500 steps requires operator action.
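
The fail-closed CUDA behaviour amounts to a guard evaluated before any trainer is built. A minimal sketch, assuming a two-variant device enum; `Device` and `check_init_supported` are hypothetical names and the message text is illustrative:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Device {
    Cpu,
    Cuda,
}

/// Returns Err when --init is requested on a device whose trainer cannot
/// yet consume init tensors, instead of proceeding with random init.
fn check_init_supported(device: Device, init_requested: bool) -> Result<(), String> {
    if device == Device::Cuda && init_requested {
        return Err(
            "FALSIFY-APR-PRETRAIN-INIT-CUDA-001: --init is not wired for the CUDA \
             trainer; refusing to continue with random init"
                .to_string(),
        );
    }
    Ok(())
}
```

The guard leaves CUDA runs without `--init` and all CPU runs untouched, so only the unsupported combination is blocked.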

## Discharges (per apr-pretrain-arch-polymorphic-v1)

- §init_load_semantics integration: load + populate composed end-to-end
- §arch_extraction_signature integration: read_apr_architecture wired
- §qwen_tokenizer_vocab_compatibility integration: extracted vocab
  flows into preflight call site (no longer hardcoded Llama370M)
- FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level
- The legacy "not yet wired" guard from §49 step 4 is RETIRED — the
  drift-prevention test now pins the new fail-closed semantic.

## Tests (8 new across 2 crates, all pass)

- `aprender-train`: 4 new tests for `build_shared_trainer_with_init`:
  - `_none_uses_llama370m_shape` (regression-free init=None)
  - `_rejects_unpaired_args` (caller-bug guard)
  - `_rejects_encoder_family` (FALSIFY-007 integration)
  - `_decoder_family_proceeds_to_tensor_load` (failure ordering pin)
- `apr-cli`: 2 retrofitted tests for the new fail-closed semantic:
  - `pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction`
    (replaces the old "not yet wired" trip-wire)
  - `pretrain_init_v1_magic_aprn_passes_validate_init_apr_path`
    (helper now returns Ok on valid magic)

19/19 pretrain_real tests pass. 23/23 apr-cli pretrain tests pass.
cargo clippy --lib -- -D warnings clean across both crates.

## Five Whys

1. **Why was 5f.4 needed at all?** §50's 5a-5h decomposition assumed
   the CLI dispatch would naturally invoke the helper functions; live
   source inspection (§52 amendment) revealed the dispatch hardcoded
   "not yet wired" Err. 5f.4 is the explicit wireup.
2. **Why is removing the safety Err so load-bearing?** The §28 SHIP-007
   lesson: silently random-init via a half-implemented dispatch is the
   exact "silent gibberish" defect class. Removing the safety Err
   without the wireup would manifest as a multi-epoch divergence
   masquerading as a corpus-quality issue.
3. **Why a separate polymorphic builder rather than overload `build_shared_trainer`?**
   `build_shared_trainer` enforces INV-ARCH-370M-001 (param-count band)
   which only applies to from-scratch Llama370M. The polymorphic builder
   sidesteps it by design — Qwen2.5-0.5B is 0.5B params, outside the
   band by intent.
4. **Why fail-fast on `--init` + `--device cuda` rather than silently
   ignore?** Same reasoning as item 2: silent CUDA random-init would
   fall into the same "silent gibberish" class. The 5f.5 follow-up wires
   the symmetric CUDA path; until then, fail-closed.
5. **Why couldn't this be inside #1483 (the populate PR)?** Different
   crate (apr-cli vs aprender-train), different review concern (CLI
   plumbing vs trainer mutation), different test surface. One atomic
   PR per file/crate boundary.

## Test plan

- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (19/19 pass)
- [x] `cargo test -p apr-cli --lib commands::pretrain` (23/23 pass)
- [x] `cargo clippy -p aprender-train -p apr-cli --lib -- -D warnings` (clean)
- [x] `cargo check -p apr-cli --lib` (clean)
- [ ] Operator-dispatched: `apr pretrain --init <Qwen2.5-Coder-0.5B>.apr`
      smoke that fires 50 training steps end-to-end (5g LIVE prelude;
      operator action in next session)

## Cascade context

This is the §52-identified gap closing the §50.4 step 5f sub-cascade:
- 5f.1 encoder validator: PR #1479 ✅ MERGED
- 5f.2 load_init_tensors_from_apr: PR #1481 ✅ MERGED
- 5f.3 populate_trainer_from_init_tensors: PR #1483 (mergeable, in queue)
- **5f.4 CLI wireup: THIS PR**
- 5g LIVE 500-step fine-tune: operator dispatch (next)
- 5h stamp + publish: ~10 LOC follow-up

Once 5f.4 lands AND 5g produces val_loss < 9.38 evidence, MODEL-2 ship % moves 57% → ≥58%.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495)

§50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at
2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now
end-to-end functional on CPU; the legacy "not yet wired" Err is
RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-%
can move from 57% → ≥58%.

Spec amendment §53:
- Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007
  via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass
  smoke + 008 contract validation are inherently algorithm-level).
- Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED;
  5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090.
- Cascade ship statistics: 11 PRs over 2 days
  (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494).
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%
  (gated on 5g empirical val_loss < 9.38 evidence).
- 3 CI andon classes documented as feedback memories during cascade
  (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge
  behind-state).

Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL:
- All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the
  user-facing `apr pretrain --init` flow.
- verification_summary updated: tested 7 → 8; status partial →
  functional.
- Added §52 + §53 references.
- Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical
  500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr
  producing val_loss < 9.38.

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>