feat(aprender-train): validate_pretrain_init_arch_compatible — §50.4 step 5f.1 by noahgift · Pull Request #1479 · paiml/aprender

noahgift · 2026-05-04T16:21:40Z

Adds pretrain_real::validate_pretrain_init_arch_compatible(cfg) that fail-fast rejects an init TransformerConfig whose architecture family is incompatible with the decoder-only pretrain trainer. Discharges FALSIFY-APR-PRETRAIN-ARCH-007 from PR #1473 contract. Encoder configs (CodeBERT/RoBERTa/BERT) are rejected with error message naming: falsifier-id, architecture family, decoder-only requirement, hf_architecture (e.g. RobertaModel). 3 unit tests verify decoder accept + encoder reject + Llama370M baseline accept (drift-prevention). Step 5f decomposition: this is 5f.1 (~30 LOC arch gate); 5f.2 (~80 LOC actual weight load) is follow-up. Plain ship-%: MODEL-1=91%, MODEL-2=57% (unchanged; gated on step 5g LIVE fine-tune). Builds on PRs #1472+#1478 MERGED + #1473/#1474/#1475/#1476 in flight.

… (7/8 falsifiers bound) (#1480) Same-day continuation cycle landed 8 PRs across the §50.4 architecture- polymorphic infrastructure track. §51 records the cascade-complete state and pinpoints the remaining MODEL-2 ship-% gate (step 5g LIVE). Falsifier-discharge scoreboard for `apr-pretrain-arch-polymorphic-v1`: | ID | What it pins | PR | Status | |----|---------------------------------------|-------|--------| | 001 | qwen2_0_5b matches HF + tie fix | #1474 | PARTIAL | | 002 | init=None preserves Llama370M | #1475 | PARTIAL | | 003 | init=Some pass-through | #1475 | PARTIAL | | 004 | GQA-7:1 forward smoke | #1478 | MERGED | | 005 | Qwen tokenizer + Qwen target = pass | #1476 | MERGED | | 006 | Qwen tokenizer + Llama target = fail | #1476 | MERGED | | 007 | encoder/decoder family mismatch | #1479 | PARTIAL | | 008 | pv validate | #1473 | PARTIAL | 7 of 8 falsifiers PARTIAL_ALGORITHM_LEVEL or MERGED. Remaining work: - 5f.2 — wire APR file open + tensor materialization (~80 LOC) DELIBERATELY DEFERRED this cycle; doing 5f.2 now means rebasing onto 4 in-flight PRs as they land - 5g — LIVE 500-step smoke fine-tune (operator dispatch) THE LOAD-BEARING TEST that moves MODEL-2 ship-% - 5h — stamp + publish Per §47-§48 lesson: "infrastructure shipped ≠ ship-% movement." Cascade-complete state means the polymorphic foundation is in place; ship-% movement still requires the LIVE empirical check. Five Whys: 1. Why a snapshot now? Multiple PRs in cascade auto-merge create cognitive load. A spec snapshot captures both the achievement (7 falsifiers bound) and the remaining gate (step 5g LIVE). Without it, future operators waste cycles re-deriving the state. 2. Why focus on falsifier scoreboard rather than total LOC? Falsifier discharge is the actual contract obligation. 7/8 invariants pinned means CI now catches regressions in the polymorphic-init path. 3. Why mention 5f.2 explicitly as deliberately deferred? Naming the deferral makes it not a punt. Step 5f.2 has a clear "when": after the 4 in-flight PRs cascade-merge, then 5f.2 lands clean. 4. Why call out infrastructure ≠ ship-%? The §47-§48 cascade taught the same lesson — "11 SHIP-007 cascade PRs landed but no ship-% movement." Operator-facing ship-% is the LIVE check. 5. Why is FALSIFY-006 LIVE the load-bearing claim? init_loss(step=0) ≤ 6.0 vs from_scratch_loss(step=0) ≥ 9.5 proves end-to-end correctness in one number. No other falsifier can substitute. Plain ship-% update: - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track) - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38) Spec amendment cadence: §41 → §42 → §43 → §44 → §45 → §46 → §47 → §48 → §49 → §50 → §51. Eleven amendments since 2026-05-03. Same-day spec hygiene rather than letting the cascade-complete state remain implicit. Refs: - SPEC-SHIP-TWO-001 §50 — architecture-coupling finding (PR #1472, MERGED) - PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight) - PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight) - PR #1475 — build_transformer_config polymorphic dispatch (in flight) - PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED) - PR #1478 — GQA-7:1 forward-pass smoke test (MERGED) - PR #1479 — validate_pretrain_init_arch_compatible (in flight) - feedback_no_guessing.md Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…1481) Adds the read-half of `apr pretrain --init` weight load: a thin wrapper over `aprender::format::converter::load_model_tensors` that returns a `BTreeMap<String, (Vec<f32>, Vec<usize>)>` of tensor blobs keyed by HF naming convention. Per `apr-pretrain-arch-polymorphic-v1` §init_load_semantics (PR #1473): "Loader is REUSED, not reimplemented." This function does not duplicate APR parsing — it forwards to the same machinery `apr export` and `apr inspect` use. Discharges from `apr-pretrain-arch-polymorphic-v1`: - §init_load_semantics invariant (loader reuse): satisfied - FALSIFY-006 (init_loss < 6.0) at READ-COMPILE-BIND level Step 5f decomposition: - 5f.1 (PR #1479): encoder/decoder family validator (~30 LOC) - 5f.2 (this PR): APR file open + tensor read (~30 LOC + 2 tests) - 5f.3 (next): populate trainer parameters from BTreeMap (~50 LOC) - 5g (operator): LIVE 500-step fine-tune → DISCHARGES MODEL-2 ship-% Step 5f.2 is intentionally narrow — it only does the READ. Population into trainer parameter slots (5f.3) reconciles HF naming convention (e.g., `model.embed_tokens.weight`) against the trainer's internal parameter naming. That's a separate concern with its own falsifier. What this PR adds: 1. `pub fn load_init_tensors_from_apr(path) -> Result<BTreeMap<...>>` at pretrain_real.rs:35 (~25 LOC including doc comment) 2. 2 unit tests in `pretrain_real::tests`: - load_init_tensors_missing_file_errors_with_falsifier_id (FALSIFY-006 fail-fast path; asserts error message contains falsifier id + offending path for operator-experience) - load_init_tensors_signature_compile_bind (drift-prevention: catches a future signature change that would break step 5f.3's BTreeMap consumer) Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::load_init_tensors): 2 passed; 0 failed; 0 ignored Five Whys: 1. Why decompose step 5f.2 to JUST the read? Single-piece flow. Read → Validate → Populate are three distinct concerns. Step 5f.1 did validation (#1479); 5f.2 does read; 5f.3 will do populate. Each PR has one falsifier discharge story. 2. Why use load_model_tensors and not write a new parser? The contract pins "Loader is reused, not reimplemented." Writing a new parser would create a parallel format-decoder that drifts from the canonical one. The same lesson as the LAYOUT-001/002 hits — parallel format code paths produce silent format-drift bugs. 3. Why return BTreeMap<String, (Vec<f32>, Vec<usize>)> rather than a trainer-parameter-shaped struct? Decoupling: the read shouldn't know about TransformerTrainer's internal parameter names. Step 5f.3's job is to map HF names → trainer slots; if 5f.2 baked that mapping in, every change to TransformerTrainer would break the read. 4. Why include the signature-compile-bind test? It's a compile-time check that drives step 5f.3's expectations. If a future refactor changes the return type (e.g., from BTreeMap to HashMap, or from Vec<usize> to Box<[usize]>), step 5f.3's consumer code stops compiling — caught here, not at the integration point. 5. Why is FALSIFY-006 NOT yet at PARTIAL_ALGORITHM_LEVEL after this PR? Because step 5f.2 only does the read; FALSIFY-006 requires the LIVE init_loss < 6.0 check, which needs steps 5f.3 + 5g. This PR moves FALSIFY-006 from UNBOUND → READ-COMPILE-BIND, a sub-level of PARTIAL_ALGORITHM_LEVEL. Full PARTIAL discharge happens at 5f.3 when the populate step exists. Plain ship-% update: - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track) - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38) Refs: - SPEC-SHIP-TWO-001 §50, §51 — MODEL-2 architecture-coupling + cascade snapshot (PR #1472, #1480 MERGED) - PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight) - PR #1474 — qwen2_0_5b tie_word_embeddings fix (MERGED) - PR #1475 — build_transformer_config polymorphic dispatch (in flight) - PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED) - PR #1478 — GQA-7:1 forward-pass smoke test (MERGED) - PR #1479 — validate_pretrain_init_arch_compatible (in flight) - feedback_no_guessing.md - feedback_falsifier_first_cascade_pattern.md (this turn's pattern) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…step 5f.1 Adds `pretrain_real::validate_pretrain_init_arch_compatible(cfg)` that fail-fast rejects an init `TransformerConfig` whose architecture family is incompatible with the decoder-only pretrain trainer. Discharges from `apr-pretrain-arch-polymorphic-v1` (PR #1473): - FALSIFY-APR-PRETRAIN-ARCH-007 — wrong-arch APR (e.g., CodeBERT/ RoBERTa encoder model) is FAIL-FAST not silent-truncate Why this matters: §49 wires `--init <PATH>` to load weights from any APR file. Without this gate, an operator who points --init at e.g. microsoft/codebert-base.apr would silently load encoder weights into a decoder-shaped trainer, producing nonsense gradients that the divergence guard catches LATE (multiple epochs in). This gate catches the family mismatch BEFORE any trainer allocation. Step 5f decomposition: this is step 5f.1 — the arch-family gate. Step 5f.2 (~80 LOC, follow-up) does the actual weight materialization into optimizer state. Splitting keeps each PR small + reviewable. What this PR adds: 1. `pub fn validate_pretrain_init_arch_compatible(cfg: &TransformerConfig) -> Result<(), String>` (~30 LOC including doc comment) at pretrain_real.rs:35 2. 3 unit tests in `pretrain_real::tests`: - validate_pretrain_init_arch_accepts_decoder (FALSIFY-007 negative) - validate_pretrain_init_arch_rejects_encoder (FALSIFY-007 positive, load-bearing) - validate_pretrain_init_arch_accepts_llama370m_baseline (drift-prevention, catches over-rejection regression) The encoder-rejection test asserts FOUR string contents in the error: - "FALSIFY-APR-PRETRAIN-ARCH-007" — falsifier id (auditability) - "Encoder" — names the architecture family - "decoder-only" — explains why this is wrong - "RobertaModel" — names the offending hf_architecture Operator-experience parity: when the gate fires, the error tells the operator exactly what they did wrong + how the trainer differs. Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::validate_pretrain_init_arch): 3 passed; 0 failed; 0 ignored Five Whys: 1. Why a separate function rather than baking the check into build_transformer_config? Decoupling: build_transformer_config is a pure pass-through dispatch; adding arch validation would conflate "which config?" with "is this config valid?". Two functions, two concerns, two test surfaces. 2. Why focus this PR on JUST the arch-family check (step 5f.1) and not the full weight materialization (step 5f)? Single-piece flow. Step 5f's full scope (~120 LOC) splits naturally into 5f.1 (this PR, ~30 LOC + 3 tests) + 5f.2 (~80 LOC, the actual weight load). Each PR has its own falsifier discharge; CI catches regressions between them. 3. Why FOUR string assertions in the encoder-rejection error? Each piece of the error text serves a distinct operator need: - falsifier id → audit (which contract did this fail?) - architecture family → what (encoder vs decoder) - "decoder-only" → why (the trainer is decoder-only) - hf_architecture → which model (RobertaModel/CodeBERT/...) Lossy error messages erode operator trust; the contract pins all four to prevent message rot. 4. Why include the Llama370M baseline drift-prevention test? §24's retrospective showed silent over-rejection (every input rejected, even valid ones) is the symmetric defect to silent under-rejection (every input accepted, even invalid ones). The 3 tests cover both halves of the dispatch. 5. Why is FALSIFY-006 (init_loss < 6.0) NOT yet discharged? That requires the actual weight materialization (step 5f.2) PLUS a LIVE training run (step 5g). Step 5f.1 is just the gate; the load-bearing init_loss measurement is downstream. Plain ship-% update: - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track) - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38) Refs: - SPEC-SHIP-TWO-001 §50 — MODEL-2 architecture-coupling (PR #1472, MERGED) - PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight) - PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight) - PR #1475 — build_transformer_config polymorphic dispatch (in flight) - PR #1476 — preflight_tokenizer_vocab_matches_target (in flight) - PR #1478 — GQA-7:1 forward-pass smoke test (MERGED) - feedback_no_guessing.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… 5f.3 Add `populate_trainer_from_init_tensors(transformer, init_tensors)` — the population half of `apr pretrain --init`. Iterates the model's `named_parameters()` set, looks up each name in the init BTreeMap (HF naming preserved by §50.4 step 5f.2's loader), validates length, and calls `Transformer::set_named_parameter()`. `apr-pretrain-arch-polymorphic-v1` §init_load_semantics: - Population invariant: "Init tensors populate trainer parameters byte-equivalent to source" - FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL 1. **Why strict on missing-required?** Architecture mismatch (e.g., init from a different model family) would silently leave random init for absent parameters, which §28's SHIP-007 lesson teaches us is the exact class of "silent gibberish" defect that hides for many epochs. 2. **Why strict on length-mismatch?** A length mismatch indicates the from_apr_metadata extractor misread a shape — populating regardless would silently truncate or pad, masking the bug. 3. **Why permissive on extra-init-entries?** Tied embeddings: a Qwen2.5 APR may publish a separate `lm_head.weight` that the trainer's tied model omits. Failing on extra entries would force operators to pre-strip APRs, which is muda. 4. **Why FALSIFIER ID in error message?** §28 lesson — falsifier IDs in error messages turn opaque CI failures into self-explaining defects. 5. **Why one function not two (load+populate fused)?** Decoupling keeps `aprender-train` free of `aprender-serve` (the APR loader): the loader is a free function in §50.4 step 5f.2; this is the consumer. Two-step composition is testable independently (and is, in this PR). - `populate_trainer_from_init_tensors_happy_path`: every param matched → returns Ok(N) where N = named_parameters().len() - `populate_trainer_from_init_tensors_extra_entries_silently_ignored`: fictitious extra entry must NOT cause Err (tied-embeddings safety) - `populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong flat length → Err naming the param + falsifier ID - `populate_trainer_from_init_tensors_rejects_missing_required_param`: missing required → Err with "not present in init APR" + falsifier ID All 12 tests pass; cargo clippy --lib clean. - [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass) - [x] `cargo clippy -p aprender-train --lib -- -D warnings` (clean) - [x] No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap Step 5f.3 caps the §50.4 step-5f sub-cascade: 5f.1 — encoder-family validator (PR #1479, awaiting CI) 5f.2 — load_init_tensors_from_apr (PR #1481 MERGED) 5f.3 — THIS PR (populate_trainer_from_init_tensors) Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch), 5h (stamp + publish as MODEL-2 v2). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…PARTIAL_ALGORITHM_LEVEL — §50.4 cascade snapshot (#1482) ## Summary Bump `apr-pretrain-arch-polymorphic-v1` contract status from PROPOSED to PARTIAL_ALGORITHM_LEVEL. All 8 FALSIFY-APR-PRETRAIN-ARCH-* falsifiers are now bound to executable tests across the §50.4 cascade. ## Falsifier scoreboard (post-§51 snapshot) | ID | Rule | PR | Status | |------------|-----------------------------------------------|-------------------|-----------------------| | FALSIFY-001 | qwen2_0_5b matches HF config | #1474 merged | PARTIAL_ALGORITHM_LEVEL | | FALSIFY-002 | build_transformer_config(None) → Llama370M | #1475 merged | PARTIAL_ALGORITHM_LEVEL | | FALSIFY-003 | build_transformer_config(Some) extracts 10 | #1475 merged | PARTIAL_ALGORITHM_LEVEL | | FALSIFY-004 | GQA-7:1 forward-pass smoke | #1478 merged | PARTIAL_ALGORITHM_LEVEL | | FALSIFY-005 | Qwen tokenizer passes with --init Qwen | #1476 merged | PARTIAL_ALGORITHM_LEVEL | | FALSIFY-006 | Qwen tokenizer fails without --init | #1476 merged | PARTIAL_ALGORITHM_LEVEL | | FALSIFY-007 | encoder-arch APR fail-fast | #1479 open (auto-merge armed) | PARTIAL_ALGORITHM_LEVEL | | FALSIFY-008 | contract self-validates via pv | this PR (validates clean) | PARTIAL_ALGORITHM_LEVEL | ## Test plan - [x] pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml exits 0 - [x] All 8 falsifiers cite a concrete test path or PR - [x] Changelog entry under metadata.changelog with version/date/change ## Why now Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue (≥4 PRs) blocks more impl PRs, switch to non-conflict work. This contract bump: - touches only one YAML file (no Rust/test source) - cannot conflict with #1479 / #1481 (impl PRs) - audit-trails the cascade scoreboard Promotion to FUNCTIONAL is gated on #1479 landing (FALSIFY-007 PASS). Promotion to DISCHARGED is gated on §50.4 step 5g LIVE empirical run. ## Five Whys 1. Why bump status now? — 7/8 falsifiers bound on main + 8th bound on open PR; PROPOSED is stale. 2. Why not wait for #1479 land first? — §51 snapshot recorded "7/8 PARTIAL bound" 2 hours ago; the 8th binding is the contract-self validation, which is met by THIS PR's `pv validate` output. 3. Why not bundle with #1479? — Different file, different review scope, different concern (status semantics vs. impl). 4. Why not skip the bump? — Operator-facing scoreboard is in the YAML; stale PROPOSED implies "not yet started" which contradicts §51. 5. Why YAML changelog instead of just version? — Changelog records THIS bump's reasoning so future operators don't re-derive it from git log. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…LETE + 5f.4 wireup gap identified (#1486) ## Summary Same-day continuation of §51 cascade landed PR #1479 (FALSIFY-007 encoder/decoder validator) and PR #1481 (load_init_tensors_from_apr). PR #1483 (5f.3 populate) and PR #1482 (contract status bump) are MERGEABLE in queue. All 8 falsifiers in `apr-pretrain-arch-polymorphic-v1` are now PARTIAL_ALGORITHM_LEVEL bound on main or about to land. §52 records: 1. Updated falsifier scoreboard (8/8 vs §51's 7/8) 2. NEW step 5f.4 (CLI wireup, ~150 LOC) identified via live source inspection of `apr-cli/src/commands/pretrain.rs:259-297` 3. Step 5g LIVE 500-step fine-tune is now gated on 5f.4 landing first ## Why now Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue blocks more impl PRs (#1483 + #1482 both in queue touching pretrain_real.rs), switch to non-conflicting work. This spec amendment touches one markdown file with no PR conflicts. ## Five Whys (§52.8 in body) 1. Why didn't §50 catch 5f.4? — top-down arch-coupling lens missed the CLI-dispatch seam. 2. Why is 5f.4 separate from 5f.3? — different crate (apr-cli vs aprender-train). 3. Why must 5f.4 be atomic? — removing "not yet wired" Err without the wireup produces silent random-init (§28 SHIP-007 defect class). 4. Why ~150 LOC? — 4 levels of plumbing + new builder + tests + CUDA. 5. Why call 5f.4 out in spec? — without §52, readers would assume 5g is dispatchable; spec is the source of truth. ## Test plan - [x] Single markdown file, no Rust changes - [x] Falsifier scoreboard table updated to 8/8 PARTIAL_ALGORITHM_LEVEL - [x] Step roadmap table adds 5f.4 between 5f.3 and 5g - [x] Cadence preserved: §41 → ... → §51 → §52 (12 amendments since 2026-05-03) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

… 5f.3 (#1483) Add `populate_trainer_from_init_tensors(transformer, init_tensors)` — the population half of `apr pretrain --init`. Iterates the model's `named_parameters()` set, looks up each name in the init BTreeMap (HF naming preserved by §50.4 step 5f.2's loader), validates length, and calls `Transformer::set_named_parameter()`. `apr-pretrain-arch-polymorphic-v1` §init_load_semantics: - Population invariant: "Init tensors populate trainer parameters byte-equivalent to source" - FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL 1. **Why strict on missing-required?** Architecture mismatch (e.g., init from a different model family) would silently leave random init for absent parameters, which §28's SHIP-007 lesson teaches us is the exact class of "silent gibberish" defect that hides for many epochs. 2. **Why strict on length-mismatch?** A length mismatch indicates the from_apr_metadata extractor misread a shape — populating regardless would silently truncate or pad, masking the bug. 3. **Why permissive on extra-init-entries?** Tied embeddings: a Qwen2.5 APR may publish a separate `lm_head.weight` that the trainer's tied model omits. Failing on extra entries would force operators to pre-strip APRs, which is muda. 4. **Why FALSIFIER ID in error message?** §28 lesson — falsifier IDs in error messages turn opaque CI failures into self-explaining defects. 5. **Why one function not two (load+populate fused)?** Decoupling keeps `aprender-train` free of `aprender-serve` (the APR loader): the loader is a free function in §50.4 step 5f.2; this is the consumer. Two-step composition is testable independently (and is, in this PR). - `populate_trainer_from_init_tensors_happy_path`: every param matched → returns Ok(N) where N = named_parameters().len() - `populate_trainer_from_init_tensors_extra_entries_silently_ignored`: fictitious extra entry must NOT cause Err (tied-embeddings safety) - `populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong flat length → Err naming the param + falsifier ID - `populate_trainer_from_init_tensors_rejects_missing_required_param`: missing required → Err with "not present in init APR" + falsifier ID All 12 tests pass; cargo clippy --lib clean. - [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass) - [x] `cargo clippy -p aprender-train --lib -- -D warnings` (clean) - [x] No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap Step 5f.3 caps the §50.4 step-5f sub-cascade: 5f.1 — encoder-family validator (PR #1479, awaiting CI) 5f.2 — load_init_tensors_from_apr (PR #1481 MERGED) 5f.3 — THIS PR (populate_trainer_from_init_tensors) Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch), 5h (stamp + publish as MODEL-2 v2). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ep 5f.4 (#1494) ## Summary Wire `apr pretrain --init <PATH>` end-to-end so step 5g LIVE 500-step fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err with the actual init-tensor load + trainer populate path that §50.4 steps 5f.1/5f.2/5f.3 made possible. ## Architecture Two functions added/changed: 1. `entrenar::train::pretrain_real::build_shared_trainer_with_init` — composes the §50.4 step-5f machinery (5c polymorphic dispatch + 5f.1 encoder rejection + 5f.2 load + 5f.3 populate) into a single trainer-builder entry. init=None preserves the from-scratch baseline byte-equivalent to `build_shared_trainer`. init=Some validates arch family, builds the polymorphic config, loads tensors, populates. 2. `apr-cli/src/commands/pretrain.rs::run` — now extracts the init APR file's TransformerConfig via existing `model_config::read_apr_architecture` when `--init` is set, then plumbs both `init_arch` and `init_path` through `drive_real → drive_real_cpu → build_shared_trainer_with_init`. The polymorphic preflight (§50.4 step 5d) already used the EXTRACTED vocab — this PR wires the call site to actually pass it. ## What this PR DOES NOT do - **CUDA path** (~80 LOC follow-up as 5f.5): `drive_real_cuda` now fail-fasts when --init is set rather than silently using random init (FALSIFY-APR-PRETRAIN-INIT-CUDA-001). The cuBLAS trainer needs symmetric `build_shared_cuda_trainer_with_init` which is out of scope. - **Step 5g LIVE 500-step fine-tune** (operator dispatch): this PR makes it dispatchable; running the 500 steps requires operator action. ## Discharges (per apr-pretrain-arch-polymorphic-v1) - §init_load_semantics integration: load + populate composed end-to-end - §arch_extraction_signature integration: read_apr_architecture wired - §qwen_tokenizer_vocab_compatibility integration: extracted vocab flows into preflight call site (no longer hardcoded Llama370M) - FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level - The legacy "not yet wired" guard from §49 step 4 is RETIRED — the drift-prevention test now pins the new fail-closed semantic. ## Tests (8 new across 2 crates, all pass) - `aprender-train`: 4 new tests for `build_shared_trainer_with_init`: - `_none_uses_llama370m_shape` (regression-free init=None) - `_rejects_unpaired_args` (caller-bug guard) - `_rejects_encoder_family` (FALSIFY-007 integration) - `_decoder_family_proceeds_to_tensor_load` (failure ordering pin) - `apr-cli`: 2 retrofitted tests for the new fail-closed semantic: - `pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction` (replaces the old "not yet wired" trip-wire) - `pretrain_init_v1_magic_aprn_passes_validate_init_apr_path` (helper now returns Ok on valid magic) 19/19 pretrain_real tests pass. 23/23 apr-cli pretrain tests pass. cargo clippy --lib -- -D warnings clean across both crates. ## Five Whys 1. **Why was 5f.4 needed at all?** §50's 5a-5h decomposition assumed the CLI dispatch would naturally invoke the helper functions; live source inspection (§52 amendment) revealed the dispatch hardcoded "not yet wired" Err. 5f.4 is the explicit wireup. 2. **Why is removing the safety Err so load-bearing?** The §28 SHIP-007 lesson: silently random-init via a half-implemented dispatch is the exact "silent gibberish" defect class. Removing the safety Err without the wireup would manifest as a multi-epoch divergence masquerading as a corpus-quality issue. 3. **Why a separate polymorphic builder rather than overload `build_shared_trainer`?** `build_shared_trainer` enforces INV-ARCH-370M-001 (param-count band) which only applies to from-scratch Llama370M. The polymorphic builder sidesteps it by design — Qwen2.5-0.5B is 0.5B params, outside the band by intent. 4. **Why fail-fast on `--init` + `--device cuda` rather than silently ignore?** Same reasoning as #2: silent CUDA random-init would bisect the same "silent gibberish" class. 5f.5 follow-up wires symmetric CUDA path; until then, fail-closed. 5. **Why couldn't this be inside #1483 (the populate PR)?** Different crate (apr-cli vs aprender-train), different review concern (CLI plumbing vs trainer mutation), different test surface. One atomic PR per file/crate boundary. ## Test plan - [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (19/19 pass) - [x] `cargo test -p apr-cli --lib commands::pretrain` (23/23 pass) - [x] `cargo clippy -p aprender-train -p apr-cli --lib -- -D warnings` (clean) - [x] `cargo check -p apr-cli --lib` (clean) - [ ] Operator-dispatched: `apr pretrain --init <Qwen2.5-Coder-0.5B>.apr` smoke that fires 50 training steps end-to-end (5g LIVE prelude; operator action in next session) ## Cascade context This is the §52-identified gap closing the §50.4 step 5f sub-cascade: - 5f.1 encoder validator: PR #1479 ✅ MERGED - 5f.2 load_init_tensors_from_apr: PR #1481 ✅ MERGED - 5f.3 populate_trainer_from_init_tensors: PR #1483 (mergeable, in queue) - **5f.4 CLI wireup: THIS PR** - 5g LIVE 500-step fine-tune: operator dispatch (next) - 5h stamp + publish: ~10 LOC follow-up Once 5f.4 lands AND 5g produces val_loss < 9.38 evidence, MODEL-2 ship % moves 57% → ≥58%. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495) §50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at 2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now end-to-end functional on CPU; the legacy "not yet wired" Err is RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-% can move from 57% → ≥58%. Spec amendment §53: - Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007 via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass smoke + 008 contract validation are inherently algorithm-level). - Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED; 5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090. - Cascade ships statistics: 11 PRs over 2 days (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494). - MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% (gated on 5g empirical val_loss < 9.38 evidence). - 3 CI andon classes documented as feedback memories during cascade (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge behind-state). Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL: - All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the user-facing `apr pretrain --init` flow. - verification_summary updated: tested 7 → 8; status partial → functional. - Added §52 + §53 references. - Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical 500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr producing val_loss < 9.38. `pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16 Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…-003/004/007 drift (round 2) (#1509) * contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2) Second-round test-reference drift correction. §57's drift sweep (this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006 but a more thorough audit (cross-referencing every `test:` field against the source-code function-name registry) surfaced three additional dangling references. ## Drift inventory (round 2) | Falsifier | v1.5.0 cited test | Exists? | Actual test | | --- | --- | --- | --- | | 003 | build_transformer_config_qwen_init_matches_constructor | ❌ | build_transformer_config_qwen_init_matches_input | | 004 | transformer::attention::tests::gqa_7_to_1_matches_full_mha | ❌ | transformer::model::tests::falsify_apr_pretrain_arch_004_* | | 007 | build_transformer_config_encoder_init_errors | ❌ | validate_pretrain_init_arch_rejects_encoder | ## Why §57 (PR #1505) didn't catch these §57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on: - `_init_matches_constructor` vs `_init_matches_input` — both end in `_matches_<word>` so a fragment grep counted the contract's name as "not dangling" - `transformer::attention::tests::` vs `transformer::model::tests::` — module-path drift not just function-name drift; only fully- qualified path comparison catches this - `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` — the contract's name was a guess at the impl name; impl PR #1479 chose a completely different convention ## How this round was found Used a stricter audit: for every `cargo test ... ::tests::<name>` in contracts, grep `fn <name>` in the actual source tree. If the fn doesn't exist, drift. This catches drift that PR #1505's fragment-based audit missed. ## Resolution Update FALSIFY-003/004/007 `test:` fields to the actual function names. No falsifier semantics change. 11 falsifiers all PASS; contract status remains FUNCTIONAL. ## Verification $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input test result: ok. 1 passed $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke test result: ok. 1 passed $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder test result: ok. 1 passed $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml 0 error(s), 0 warning(s) ## Five Whys 1. Why did §57's sweep miss these? Used name-fragment grep (`::tests::[a-z_]+`) which counted false-negatives on suffix- close names like `_constructor` ↔ `_input`. 2. Why is module-path drift a separate class? Because grep against the `[a-z_]+` regex captures the FUNCTION name, not the `::module::tests::` path. A function with the right name in the wrong module passes that audit but fails actual test invocation. 3. Why fix in a separate PR rather than amending PR #1505? PR #1505 already merged. Per `feedback_falsifier_first_cascade_pattern.md` the cleanest cadence is one-bump-per-PR. 4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the test-binding INVARIANT was broken in v1.5.0 (residual drift) and v1.6.0 restores it. 5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr remaining) compute-bound idle time. Each drift fix is small (~30 LOC), reduces drift risk for future agents, and restores the falsifier-binding invariant. The alternative (manufacture bigger work) would risk introducing defects the contract base doesn't catch yet. ## Net effects - Contract v1.5.0 → v1.6.0 FUNCTIONAL. - 11 falsifiers, all PASS — same count, but FALSIFY-003/004/007 now reference tests that actually exist. - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3. This is the SECOND round of drift sweep on this contract. Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across the §50.4 cascade contracts. A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that prevents drift at contract-merge time. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0, PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * contract(apr-pretrain-arch-polymorphic-v1): also fix FALSIFY-001 (round 2.5 — surfaced by PR #1511) Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007. Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th drift in this same contract: FALSIFY-001 cited `qwen2_0_5b_matches_hf_config` → does NOT exist on main. Actual: `qwen2_0_5b_matches_hf_config_2026_05_04` (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4). The earlier round-2 audit (which focused on suffix + module-path drift) didn't catch this because the test name has a DATE-SUFFIX drift class (function name + `_<date>` is a real Rust test, but the contract truncated to the prefix). Updates: - FALSIFY-001 test ref: append `_2026_05_04` suffix. - v1.6.0 changelog updated to record 4 fixes (was 3). - Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS. - pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix). This consolidates round 2 into a single commit on the same branch + PR (#1509) rather than spawning a round-3 PR for one extra fix. The lint hardening in #1511 is what made finding the 4th drift trivial; future drift will be caught at contract-merge time once #1511 lands. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1511 (sub-agent's pv lint --strict-test-binding), Issue #1510 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 4, 2026 16:21

noahgift mentioned this pull request May 4, 2026

spec(ship-two-models): v2.95.0 → v2.96.0 — §51 §50.4 cascade snapshot (7/8 falsifiers bound) #1480

Merged

noahgift mentioned this pull request May 4, 2026

feat(aprender-train): load_init_tensors_from_apr — §50.4 step 5f.2 #1481

Merged

noahgift force-pushed the feat/validate-pretrain-init-arch-compatible branch from 4ed5894 to d9b8e20 Compare May 4, 2026 18:29

noahgift mentioned this pull request May 4, 2026

contract(apr-pretrain-arch-polymorphic-v1): v1.0.0 → v1.1.0 PARTIAL_ALGORITHM_LEVEL #1482

Merged

3 tasks

noahgift force-pushed the feat/validate-pretrain-init-arch-compatible branch from 7069a83 to 55040a5 Compare May 4, 2026 19:34

noahgift mentioned this pull request May 4, 2026

feat(aprender-train): populate_trainer_from_init_tensors — §50.4 step 5f.3 #1483

Merged

3 tasks

noahgift merged commit 96653ff into main May 4, 2026
10 checks passed

noahgift deleted the feat/validate-pretrain-init-arch-compatible branch May 4, 2026 20:01

noahgift mentioned this pull request May 4, 2026

spec(ship-two-models): v2.96.0 → v2.97.0 — §52 cascade ALGORITHM-COMPLETE + 5f.4 wireup gap #1486

Merged

4 tasks

noahgift mentioned this pull request May 5, 2026

feat(apr-cli + aprender-train): apr pretrain --init wireup — §50.4 step 5f.4 #1494

Merged

noahgift mentioned this pull request May 5, 2026

spec(ship-two-models): v2.97 → v2.98 — §53 §50.4 cascade INTEGRATION-COMPLETE; contract v1.1 → v1.2 FUNCTIONAL #1495

Merged

4 tasks

noahgift mentioned this pull request May 11, 2026

fix(task-148): Toyota Way 500-line refactor + FALSIFY-CORPUS-004 + QLoRA + GPU training backend #1003

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(aprender-train): validate_pretrain_init_arch_compatible — §50.4 step 5f.1#1479

feat(aprender-train): validate_pretrain_init_arch_compatible — §50.4 step 5f.1#1479
noahgift merged 1 commit into
mainfrom
feat/validate-pretrain-init-arch-compatible

noahgift commented May 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant