feat(aprender-train): validate_pretrain_init_arch_compatible — §50.4 step 5f.1#1479
Merged
noahgift added a commit that referenced this pull request on May 4, 2026
… (7/8 falsifiers bound) (#1480)

Same-day continuation cycle landed 8 PRs across the §50.4 architecture-polymorphic infrastructure track. §51 records the cascade-complete state and pinpoints the remaining MODEL-2 ship-% gate (step 5g LIVE).

Falsifier-discharge scoreboard for `apr-pretrain-arch-polymorphic-v1`:

| ID  | What it pins                         | PR    | Status  |
|-----|--------------------------------------|-------|---------|
| 001 | qwen2_0_5b matches HF + tie fix      | #1474 | PARTIAL |
| 002 | init=None preserves Llama370M        | #1475 | PARTIAL |
| 003 | init=Some pass-through               | #1475 | PARTIAL |
| 004 | GQA-7:1 forward smoke                | #1478 | MERGED  |
| 005 | Qwen tokenizer + Qwen target = pass  | #1476 | MERGED  |
| 006 | Qwen tokenizer + Llama target = fail | #1476 | MERGED  |
| 007 | encoder/decoder family mismatch      | #1479 | PARTIAL |
| 008 | pv validate                          | #1473 | PARTIAL |

7 of 8 falsifiers PARTIAL_ALGORITHM_LEVEL or MERGED.

Remaining work:
- 5f.2 — wire APR file open + tensor materialization (~80 LOC)
  DELIBERATELY DEFERRED this cycle; doing 5f.2 now means rebasing onto 4 in-flight PRs as they land
- 5g — LIVE 500-step smoke fine-tune (operator dispatch)
  THE LOAD-BEARING TEST that moves MODEL-2 ship-%
- 5h — stamp + publish

Per §47-§48 lesson: "infrastructure shipped ≠ ship-% movement." Cascade-complete state means the polymorphic foundation is in place; ship-% movement still requires the LIVE empirical check.

Five Whys:
1. Why a snapshot now? Multiple PRs in cascade auto-merge create cognitive load. A spec snapshot captures both the achievement (7 falsifiers bound) and the remaining gate (step 5g LIVE). Without it, future operators waste cycles re-deriving the state.
2. Why focus on falsifier scoreboard rather than total LOC? Falsifier discharge is the actual contract obligation. 7/8 invariants pinned means CI now catches regressions in the polymorphic-init path.
3. Why mention 5f.2 explicitly as deliberately deferred? Naming the deferral makes it not a punt. Step 5f.2 has a clear "when": after the 4 in-flight PRs cascade-merge, then 5f.2 lands clean.
4. Why call out infrastructure ≠ ship-%? The §47-§48 cascade taught the same lesson — "11 SHIP-007 cascade PRs landed but no ship-% movement." Operator-facing ship-% is the LIVE check.
5. Why is FALSIFY-006 LIVE the load-bearing claim? init_loss(step=0) ≤ 6.0 vs from_scratch_loss(step=0) ≥ 9.5 proves end-to-end correctness in one number. No other falsifier can substitute.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Spec amendment cadence: §41 → §42 → §43 → §44 → §45 → §46 → §47 → §48 → §49 → §50 → §51. Eleven amendments since 2026-05-03. Same-day spec hygiene rather than letting the cascade-complete state remain implicit.

Refs:
- SPEC-SHIP-TWO-001 §50 — architecture-coupling finding (PR #1472, MERGED)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED)
- PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
- PR #1479 — validate_pretrain_init_arch_compatible (in flight)
- feedback_no_guessing.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
4ed5894 to
d9b8e20
noahgift added a commit that referenced this pull request on May 4, 2026
…1481)

Adds the read-half of `apr pretrain --init` weight load: a thin wrapper over `aprender::format::converter::load_model_tensors` that returns a `BTreeMap<String, (Vec<f32>, Vec<usize>)>` of tensor blobs keyed by HF naming convention.

Per `apr-pretrain-arch-polymorphic-v1` §init_load_semantics (PR #1473): "Loader is REUSED, not reimplemented." This function does not duplicate APR parsing — it forwards to the same machinery `apr export` and `apr inspect` use.

Discharges from `apr-pretrain-arch-polymorphic-v1`:
- §init_load_semantics invariant (loader reuse): satisfied
- FALSIFY-006 (init_loss < 6.0) at READ-COMPILE-BIND level

Step 5f decomposition:
- 5f.1 (PR #1479): encoder/decoder family validator (~30 LOC)
- 5f.2 (this PR): APR file open + tensor read (~30 LOC + 2 tests)
- 5f.3 (next): populate trainer parameters from BTreeMap (~50 LOC)
- 5g (operator): LIVE 500-step fine-tune → DISCHARGES MODEL-2 ship-%

Step 5f.2 is intentionally narrow — it only does the READ. Population into trainer parameter slots (5f.3) reconciles HF naming convention (e.g., `model.embed_tokens.weight`) against the trainer's internal parameter naming. That's a separate concern with its own falsifier.

What this PR adds:
1. `pub fn load_init_tensors_from_apr(path) -> Result<BTreeMap<...>>` at pretrain_real.rs:35 (~25 LOC including doc comment)
2. 2 unit tests in `pretrain_real::tests`:
   - load_init_tensors_missing_file_errors_with_falsifier_id (FALSIFY-006 fail-fast path; asserts error message contains falsifier id + offending path for operator-experience)
   - load_init_tensors_signature_compile_bind (drift-prevention: catches a future signature change that would break step 5f.3's BTreeMap consumer)

Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::load_init_tensors): 2 passed; 0 failed; 0 ignored

Five Whys:
1. Why decompose step 5f.2 to JUST the read? Single-piece flow. Read → Validate → Populate are three distinct concerns. Step 5f.1 did validation (#1479); 5f.2 does read; 5f.3 will do populate. Each PR has one falsifier discharge story.
2. Why use load_model_tensors and not write a new parser? The contract pins "Loader is reused, not reimplemented." Writing a new parser would create a parallel format-decoder that drifts from the canonical one. The same lesson as the LAYOUT-001/002 hits — parallel format code paths produce silent format-drift bugs.
3. Why return BTreeMap<String, (Vec<f32>, Vec<usize>)> rather than a trainer-parameter-shaped struct? Decoupling: the read shouldn't know about TransformerTrainer's internal parameter names. Step 5f.3's job is to map HF names → trainer slots; if 5f.2 baked that mapping in, every change to TransformerTrainer would break the read.
4. Why include the signature-compile-bind test? It's a compile-time check that drives step 5f.3's expectations. If a future refactor changes the return type (e.g., from BTreeMap to HashMap, or from Vec<usize> to Box<[usize]>), step 5f.3's consumer code stops compiling — caught here, not at the integration point.
5. Why is FALSIFY-006 NOT yet at PARTIAL_ALGORITHM_LEVEL after this PR? Because step 5f.2 only does the read; FALSIFY-006 requires the LIVE init_loss < 6.0 check, which needs steps 5f.3 + 5g. This PR moves FALSIFY-006 from UNBOUND → READ-COMPILE-BIND, a sub-level of PARTIAL_ALGORITHM_LEVEL. Full PARTIAL discharge happens at 5f.3 when the populate step exists.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50, §51 — MODEL-2 architecture-coupling + cascade snapshot (PR #1472, #1480 MERGED)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (MERGED)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED)
- PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
- PR #1479 — validate_pretrain_init_arch_compatible (in flight)
- feedback_no_guessing.md
- feedback_falsifier_first_cascade_pattern.md (this turn's pattern)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
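
The two tests described in this commit (fail-fast on a missing file, and a signature compile-bind) can be sketched as follows. This is a minimal illustration only: `canonical_loader_stub` is a hypothetical stand-in for `aprender::format::converter::load_model_tensors` (whose real signature is not shown here), the `String` error type is an assumption, and the full id `FALSIFY-APR-PRETRAIN-ARCH-006` is assumed from the naming of the other falsifiers.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Hypothetical stand-in for the canonical loader. It treats the file as one
/// flat little-endian f32 tensor, which is NOT the real APR format.
fn canonical_loader_stub(
    path: &Path,
) -> Result<BTreeMap<String, (Vec<f32>, Vec<usize>)>, String> {
    let bytes = std::fs::read(path).map_err(|e| e.to_string())?;
    let data: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    let shape = vec![data.len()];
    let mut tensors = BTreeMap::new();
    tensors.insert("model.embed_tokens.weight".to_string(), (data, shape));
    Ok(tensors)
}

/// Thin wrapper: forwards to the loader and decorates any failure with the
/// falsifier id (assumed spelling) and the offending path.
pub fn load_init_tensors_from_apr(
    path: &Path,
) -> Result<BTreeMap<String, (Vec<f32>, Vec<usize>)>, String> {
    canonical_loader_stub(path).map_err(|e| {
        format!(
            "FALSIFY-APR-PRETRAIN-ARCH-006: cannot load init APR {}: {}",
            path.display(),
            e
        )
    })
}

/// Compile-bind check: this stops compiling if the wrapper's signature drifts
/// (for example BTreeMap -> HashMap, or Vec<usize> -> Box<[usize]>).
pub fn signature_compile_bind() {
    let _f: fn(&Path) -> Result<BTreeMap<String, (Vec<f32>, Vec<usize>)>, String> =
        load_init_tensors_from_apr;
}
```

Spelling the full return type out in `signature_compile_bind`, not hiding it behind a type alias, is what lets the binding catch a silent change to the map or shape types.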
…step 5f.1

Adds `pretrain_real::validate_pretrain_init_arch_compatible(cfg)` that fail-fast rejects an init `TransformerConfig` whose architecture family is incompatible with the decoder-only pretrain trainer.

Discharges from `apr-pretrain-arch-polymorphic-v1` (PR #1473):
- FALSIFY-APR-PRETRAIN-ARCH-007 — wrong-arch APR (e.g., CodeBERT/RoBERTa encoder model) is FAIL-FAST not silent-truncate

Why this matters: §49 wires `--init <PATH>` to load weights from any APR file. Without this gate, an operator who points --init at e.g. microsoft/codebert-base.apr would silently load encoder weights into a decoder-shaped trainer, producing nonsense gradients that the divergence guard catches LATE (multiple epochs in). This gate catches the family mismatch BEFORE any trainer allocation.

Step 5f decomposition: this is step 5f.1 — the arch-family gate. Step 5f.2 (~80 LOC, follow-up) does the actual weight materialization into optimizer state. Splitting keeps each PR small + reviewable.

What this PR adds:
1. `pub fn validate_pretrain_init_arch_compatible(cfg: &TransformerConfig) -> Result<(), String>` (~30 LOC including doc comment) at pretrain_real.rs:35
2. 3 unit tests in `pretrain_real::tests`:
   - validate_pretrain_init_arch_accepts_decoder (FALSIFY-007 negative)
   - validate_pretrain_init_arch_rejects_encoder (FALSIFY-007 positive, load-bearing)
   - validate_pretrain_init_arch_accepts_llama370m_baseline (drift-prevention, catches over-rejection regression)

The encoder-rejection test asserts FOUR string contents in the error:
- "FALSIFY-APR-PRETRAIN-ARCH-007" — falsifier id (auditability)
- "Encoder" — names the architecture family
- "decoder-only" — explains why this is wrong
- "RobertaModel" — names the offending hf_architecture

Operator-experience parity: when the gate fires, the error tells the operator exactly what they did wrong + how the trainer differs.

Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::validate_pretrain_init_arch): 3 passed; 0 failed; 0 ignored

Five Whys:
1. Why a separate function rather than baking the check into build_transformer_config? Decoupling: build_transformer_config is a pure pass-through dispatch; adding arch validation would conflate "which config?" with "is this config valid?". Two functions, two concerns, two test surfaces.
2. Why focus this PR on JUST the arch-family check (step 5f.1) and not the full weight materialization (step 5f)? Single-piece flow. Step 5f's full scope (~120 LOC) splits naturally into 5f.1 (this PR, ~30 LOC + 3 tests) + 5f.2 (~80 LOC, the actual weight load). Each PR has its own falsifier discharge; CI catches regressions between them.
3. Why FOUR string assertions in the encoder-rejection error? Each piece of the error text serves a distinct operator need:
   - falsifier id → audit (which contract did this fail?)
   - architecture family → what (encoder vs decoder)
   - "decoder-only" → why (the trainer is decoder-only)
   - hf_architecture → which model (RobertaModel/CodeBERT/...)

   Lossy error messages erode operator trust; the contract pins all four to prevent message rot.
4. Why include the Llama370M baseline drift-prevention test? §24's retrospective showed silent over-rejection (every input rejected, even valid ones) is the symmetric defect to silent under-rejection (every input accepted, even invalid ones). The 3 tests cover both halves of the dispatch.
5. Why is FALSIFY-006 (init_loss < 6.0) NOT yet discharged? That requires the actual weight materialization (step 5f.2) PLUS a LIVE training run (step 5g). Step 5f.1 is just the gate; the load-bearing init_loss measurement is downstream.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50 — MODEL-2 architecture-coupling (PR #1472, MERGED)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- PR #1476 — preflight_tokenizer_vocab_matches_target (in flight)
- PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
- feedback_no_guessing.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
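
For readers who want the shape of the gate without opening the diff, here is a minimal sketch of an arch-family validator whose error carries the four pinned strings. The `TransformerConfig` and `ArchFamily` types below are hypothetical stand-ins with assumed field names; the real struct in `aprender-train` may differ, and the exact error wording is illustrative.

```rust
/// Hypothetical architecture-family tag; the real config may encode this differently.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArchFamily {
    Decoder,
    Encoder,
}

/// Hypothetical stand-in for the trainer's `TransformerConfig`.
#[derive(Debug, Clone)]
pub struct TransformerConfig {
    pub family: ArchFamily,
    pub hf_architecture: String,
}

/// Rejects encoder-family configs before any trainer allocation happens.
/// The error text carries: falsifier id, family name, the reason
/// ("decoder-only"), and the offending hf_architecture.
pub fn validate_pretrain_init_arch_compatible(cfg: &TransformerConfig) -> Result<(), String> {
    match cfg.family {
        ArchFamily::Decoder => Ok(()),
        ArchFamily::Encoder => Err(format!(
            "FALSIFY-APR-PRETRAIN-ARCH-007: init APR has architecture family Encoder \
             (hf_architecture = {}), but the pretrain trainer is decoder-only",
            cfg.hf_architecture
        )),
    }
}
```

Keeping the check as a pure function over the config means it can run before tensors are read, which is the ordering the description asks for.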
7069a83 to
55040a5
noahgift added a commit that referenced this pull request on May 4, 2026
… 5f.3
Add `populate_trainer_from_init_tensors(transformer, init_tensors)` —
the population half of `apr pretrain --init`. Iterates the model's
`named_parameters()` set, looks up each name in the init BTreeMap (HF
naming preserved by §50.4 step 5f.2's loader), validates length, and
calls `Transformer::set_named_parameter()`.
`apr-pretrain-arch-polymorphic-v1` §init_load_semantics:
- Population invariant: "Init tensors populate trainer parameters
byte-equivalent to source"
- FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL
1. **Why strict on missing-required?** Architecture mismatch (e.g., init
from a different model family) would silently leave random init for
absent parameters, which §28's SHIP-007 lesson teaches us is the
exact class of "silent gibberish" defect that hides for many epochs.
2. **Why strict on length-mismatch?** A length mismatch indicates the
from_apr_metadata extractor misread a shape — populating regardless
would silently truncate or pad, masking the bug.
3. **Why permissive on extra-init-entries?** Tied embeddings: a Qwen2.5
APR may publish a separate `lm_head.weight` that the trainer's tied
model omits. Failing on extra entries would force operators to
pre-strip APRs, which is muda.
4. **Why FALSIFIER ID in error message?** §28 lesson — falsifier IDs in
error messages turn opaque CI failures into self-explaining defects.
5. **Why keep load and populate as two functions rather than fusing them?**
Decoupling keeps `aprender-train` free of `aprender-serve` (the APR
loader): the loader is a free function in §50.4 step 5f.2; this is the
consumer. Two-step composition is testable independently (and is, in this PR).
- `populate_trainer_from_init_tensors_happy_path`: every param matched
→ returns Ok(N) where N = named_parameters().len()
- `populate_trainer_from_init_tensors_extra_entries_silently_ignored`:
fictitious extra entry must NOT cause Err (tied-embeddings safety)
- `populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong
flat length → Err naming the param + falsifier ID
- `populate_trainer_from_init_tensors_rejects_missing_required_param`:
missing required → Err with "not present in init APR" + falsifier ID
All 12 tests pass; cargo clippy --lib clean.
- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass)
- [x] `cargo clippy -p aprender-train --lib -- -D warnings` (clean)
- [x] No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap
Step 5f.3 caps the §50.4 step-5f sub-cascade:
5f.1 — encoder-family validator (PR #1479, awaiting CI)
5f.2 — load_init_tensors_from_apr (PR #1481 MERGED)
5f.3 — THIS PR (populate_trainer_from_init_tensors)
Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch),
5h (stamp + publish as MODEL-2 v2).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
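
The three population rules above (strict on missing parameters, strict on length mismatch, permissive on extra init entries) can be sketched on a toy model. `ToyModel` and `populate_from_init` are hypothetical names for illustration; the real function works against `Transformer::named_parameters()` and `set_named_parameter()`, which are not reproduced here.

```rust
use std::collections::BTreeMap;

pub const FALSIFIER_ID: &str = "FALSIFY-APR-PRETRAIN-INIT-007";

/// Hypothetical stand-in for the trainer's model: named flat f32 buffers.
pub struct ToyModel {
    pub params: BTreeMap<String, Vec<f32>>,
}

/// Copies init tensors into every model parameter and returns how many were set.
/// Iterating the MODEL's parameters (not the init map) is what makes extra
/// init entries, such as an untied `lm_head.weight`, harmless.
pub fn populate_from_init(
    model: &mut ToyModel,
    init: &BTreeMap<String, (Vec<f32>, Vec<usize>)>,
) -> Result<usize, String> {
    let mut populated = 0;
    for (name, slot) in model.params.iter_mut() {
        // Strict: a parameter the model needs must exist in the init map.
        let (data, _shape) = init.get(name).ok_or_else(|| {
            format!("{FALSIFIER_ID}: parameter '{name}' not present in init APR")
        })?;
        // Strict: never truncate or pad on a flat-length mismatch.
        if data.len() != slot.len() {
            return Err(format!(
                "{FALSIFIER_ID}: parameter '{name}' expects {} values, init APR has {}",
                slot.len(),
                data.len()
            ));
        }
        slot.copy_from_slice(data);
        populated += 1;
    }
    Ok(populated)
}
```

One consequence of this sketch is that a failure partway through leaves earlier parameters already overwritten; whether the real implementation validates everything before mutating is not stated in the commit message.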
Merged
noahgift added a commit that referenced this pull request on May 4, 2026
…PARTIAL_ALGORITHM_LEVEL — §50.4 cascade snapshot (#1482)

## Summary

Bump `apr-pretrain-arch-polymorphic-v1` contract status from PROPOSED to PARTIAL_ALGORITHM_LEVEL. All 8 FALSIFY-APR-PRETRAIN-ARCH-* falsifiers are now bound to executable tests across the §50.4 cascade.

## Falsifier scoreboard (post-§51 snapshot)

| ID          | Rule                                       | PR                            | Status                  |
|-------------|--------------------------------------------|-------------------------------|-------------------------|
| FALSIFY-001 | qwen2_0_5b matches HF config               | #1474 merged                  | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-002 | build_transformer_config(None) → Llama370M | #1475 merged                  | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-003 | build_transformer_config(Some) extracts 10 | #1475 merged                  | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-004 | GQA-7:1 forward-pass smoke                 | #1478 merged                  | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-005 | Qwen tokenizer passes with --init Qwen     | #1476 merged                  | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-006 | Qwen tokenizer fails without --init        | #1476 merged                  | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-007 | encoder-arch APR fail-fast                 | #1479 open (auto-merge armed) | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-008 | contract self-validates via pv             | this PR (validates clean)     | PARTIAL_ALGORITHM_LEVEL |

## Test plan

- [x] pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml exits 0
- [x] All 8 falsifiers cite a concrete test path or PR
- [x] Changelog entry under metadata.changelog with version/date/change

## Why now

Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue (≥4 PRs) blocks more impl PRs, switch to non-conflict work. This contract bump:
- touches only one YAML file (no Rust/test source)
- cannot conflict with #1479 / #1481 (impl PRs)
- audit-trails the cascade scoreboard

Promotion to FUNCTIONAL is gated on #1479 landing (FALSIFY-007 PASS). Promotion to DISCHARGED is gated on §50.4 step 5g LIVE empirical run.

## Five Whys

1. Why bump status now? — 7/8 falsifiers bound on main + 8th bound on open PR; PROPOSED is stale.
2. Why not wait for #1479 land first? — §51 snapshot recorded "7/8 PARTIAL bound" 2 hours ago; the 8th binding is the contract-self validation, which is met by THIS PR's `pv validate` output.
3. Why not bundle with #1479? — Different file, different review scope, different concern (status semantics vs. impl).
4. Why not skip the bump? — Operator-facing scoreboard is in the YAML; stale PROPOSED implies "not yet started" which contradicts §51.
5. Why YAML changelog instead of just version? — Changelog records THIS bump's reasoning so future operators don't re-derive it from git log.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 5, 2026
…LETE + 5f.4 wireup gap identified (#1486)

## Summary

Same-day continuation of §51 cascade landed PR #1479 (FALSIFY-007 encoder/decoder validator) and PR #1481 (load_init_tensors_from_apr). PR #1483 (5f.3 populate) and PR #1482 (contract status bump) are MERGEABLE in queue. All 8 falsifiers in `apr-pretrain-arch-polymorphic-v1` are now PARTIAL_ALGORITHM_LEVEL bound on main or about to land.

§52 records:
1. Updated falsifier scoreboard (8/8 vs §51's 7/8)
2. NEW step 5f.4 (CLI wireup, ~150 LOC) identified via live source inspection of `apr-cli/src/commands/pretrain.rs:259-297`
3. Step 5g LIVE 500-step fine-tune is now gated on 5f.4 landing first

## Why now

Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue blocks more impl PRs (#1483 + #1482 both in queue touching pretrain_real.rs), switch to non-conflicting work. This spec amendment touches one markdown file with no PR conflicts.

## Five Whys (§52.8 in body)

1. Why didn't §50 catch 5f.4? — top-down arch-coupling lens missed the CLI-dispatch seam.
2. Why is 5f.4 separate from 5f.3? — different crate (apr-cli vs aprender-train).
3. Why must 5f.4 be atomic? — removing "not yet wired" Err without the wireup produces silent random-init (§28 SHIP-007 defect class).
4. Why ~150 LOC? — 4 levels of plumbing + new builder + tests + CUDA.
5. Why call 5f.4 out in spec? — without §52, readers would assume 5g is dispatchable; spec is the source of truth.

## Test plan

- [x] Single markdown file, no Rust changes
- [x] Falsifier scoreboard table updated to 8/8 PARTIAL_ALGORITHM_LEVEL
- [x] Step roadmap table adds 5f.4 between 5f.3 and 5g
- [x] Cadence preserved: §41 → ... → §51 → §52 (12 amendments since 2026-05-03)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 5, 2026
… 5f.3 (#1483)
noahgift added a commit that referenced this pull request on May 5, 2026
…ep 5f.4 (#1494)

## Summary

Wire `apr pretrain --init <PATH>` end-to-end so step 5g LIVE 500-step fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err with the actual init-tensor load + trainer populate path that §50.4 steps 5f.1/5f.2/5f.3 made possible.

## Architecture

Two functions added/changed:

1. `entrenar::train::pretrain_real::build_shared_trainer_with_init` — composes the §50.4 step-5f machinery (5c polymorphic dispatch + 5f.1 encoder rejection + 5f.2 load + 5f.3 populate) into a single trainer-builder entry. init=None preserves the from-scratch baseline byte-equivalent to `build_shared_trainer`. init=Some validates arch family, builds the polymorphic config, loads tensors, populates.
2. `apr-cli/src/commands/pretrain.rs::run` — now extracts the init APR file's TransformerConfig via existing `model_config::read_apr_architecture` when `--init` is set, then plumbs both `init_arch` and `init_path` through `drive_real → drive_real_cpu → build_shared_trainer_with_init`. The polymorphic preflight (§50.4 step 5d) already used the EXTRACTED vocab — this PR wires the call site to actually pass it.

## What this PR DOES NOT do

- **CUDA path** (~80 LOC follow-up as 5f.5): `drive_real_cuda` now fail-fasts when --init is set rather than silently using random init (FALSIFY-APR-PRETRAIN-INIT-CUDA-001). The cuBLAS trainer needs symmetric `build_shared_cuda_trainer_with_init` which is out of scope.
- **Step 5g LIVE 500-step fine-tune** (operator dispatch): this PR makes it dispatchable; running the 500 steps requires operator action.

## Discharges (per apr-pretrain-arch-polymorphic-v1)

- §init_load_semantics integration: load + populate composed end-to-end
- §arch_extraction_signature integration: read_apr_architecture wired
- §qwen_tokenizer_vocab_compatibility integration: extracted vocab flows into preflight call site (no longer hardcoded Llama370M)
- FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level
- The legacy "not yet wired" guard from §49 step 4 is RETIRED — the drift-prevention test now pins the new fail-closed semantic.

## Tests (8 new across 2 crates, all pass)

- `aprender-train`: 4 new tests for `build_shared_trainer_with_init`:
  - `_none_uses_llama370m_shape` (regression-free init=None)
  - `_rejects_unpaired_args` (caller-bug guard)
  - `_rejects_encoder_family` (FALSIFY-007 integration)
  - `_decoder_family_proceeds_to_tensor_load` (failure ordering pin)
- `apr-cli`: 2 retrofitted tests for the new fail-closed semantic:
  - `pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction` (replaces the old "not yet wired" trip-wire)
  - `pretrain_init_v1_magic_aprn_passes_validate_init_apr_path` (helper now returns Ok on valid magic)

19/19 pretrain_real tests pass. 23/23 apr-cli pretrain tests pass. cargo clippy --lib -- -D warnings clean across both crates.

## Five Whys

1. **Why was 5f.4 needed at all?** §50's 5a-5h decomposition assumed the CLI dispatch would naturally invoke the helper functions; live source inspection (§52 amendment) revealed the dispatch hardcoded "not yet wired" Err. 5f.4 is the explicit wireup.
2. **Why is removing the safety Err so load-bearing?** The §28 SHIP-007 lesson: silently random-init via a half-implemented dispatch is the exact "silent gibberish" defect class. Removing the safety Err without the wireup would manifest as a multi-epoch divergence masquerading as a corpus-quality issue.
3. **Why a separate polymorphic builder rather than overload `build_shared_trainer`?** `build_shared_trainer` enforces INV-ARCH-370M-001 (param-count band) which only applies to from-scratch Llama370M. The polymorphic builder sidesteps it by design — Qwen2.5-0.5B is 0.5B params, outside the band by intent.
4. **Why fail-fast on `--init` + `--device cuda` rather than silently ignore?** Same reasoning as #2: silent CUDA random-init would bisect the same "silent gibberish" class. 5f.5 follow-up wires symmetric CUDA path; until then, fail-closed.
5. **Why couldn't this be inside #1483 (the populate PR)?** Different crate (apr-cli vs aprender-train), different review concern (CLI plumbing vs trainer mutation), different test surface. One atomic PR per file/crate boundary.

## Test plan

- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (19/19 pass)
- [x] `cargo test -p apr-cli --lib commands::pretrain` (23/23 pass)
- [x] `cargo clippy -p aprender-train -p apr-cli --lib -- -D warnings` (clean)
- [x] `cargo check -p apr-cli --lib` (clean)
- [ ] Operator-dispatched: `apr pretrain --init <Qwen2.5-Coder-0.5B>.apr` smoke that fires 50 training steps end-to-end (5g LIVE prelude; operator action in next session)

## Cascade context

This is the §52-identified gap closing the §50.4 step 5f sub-cascade:
- 5f.1 encoder validator: PR #1479 ✅ MERGED
- 5f.2 load_init_tensors_from_apr: PR #1481 ✅ MERGED
- 5f.3 populate_trainer_from_init_tensors: PR #1483 (mergeable, in queue)
- **5f.4 CLI wireup: THIS PR**
- 5g LIVE 500-step fine-tune: operator dispatch (next)
- 5h stamp + publish: ~10 LOC follow-up

Once 5f.4 lands AND 5g produces val_loss < 9.38 evidence, MODEL-2 ship % moves 57% → ≥58%.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
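
The fail-closed dispatch described in this commit (paired `init_arch`/`init_path`, encoder rejection, and the CUDA guard) can be sketched as one decision function. Everything below is a hypothetical simplification: `plan_trainer`, `TrainerPlan`, `Device`, and `ArchFamily` are illustrative names, and the real code spreads these checks across `drive_real_cuda` and `build_shared_trainer_with_init` without necessarily using this shape.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArchFamily {
    Decoder,
    Encoder,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Device {
    Cpu,
    Cuda,
}

/// What the builder would go on to construct (illustrative only).
#[derive(Debug, PartialEq)]
pub enum TrainerPlan {
    FromScratchLlama370M,
    InitFrom(PathBuf),
}

/// Decides the build path, failing closed instead of falling back to random init.
pub fn plan_trainer(
    device: Device,
    init_arch: Option<ArchFamily>,
    init_path: Option<&Path>,
) -> Result<TrainerPlan, String> {
    match (init_arch, init_path) {
        // init=None keeps the from-scratch baseline untouched.
        (None, None) => Ok(TrainerPlan::FromScratchLlama370M),
        // Caller-bug guard: the two init arguments travel together.
        (Some(_), None) | (None, Some(_)) => {
            Err("init_arch and init_path must be provided together".to_string())
        }
        // CUDA init is not wired in this step, so refuse instead of ignoring --init.
        (Some(_), Some(_)) if device == Device::Cuda => Err(
            "FALSIFY-APR-PRETRAIN-INIT-CUDA-001: --init is not wired for the CUDA trainer"
                .to_string(),
        ),
        // Arch-family gate runs before any tensor load.
        (Some(ArchFamily::Encoder), Some(_)) => Err(
            "FALSIFY-APR-PRETRAIN-ARCH-007: Encoder init is incompatible with the decoder-only trainer"
                .to_string(),
        ),
        (Some(ArchFamily::Decoder), Some(path)) => Ok(TrainerPlan::InitFrom(path.to_path_buf())),
    }
}
```

The arm order mirrors the "failure ordering pin" idea from the test list: argument pairing is checked first, then device support, then architecture family, and only a decoder on CPU reaches the load path.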
noahgift added a commit that referenced this pull request on May 5, 2026
…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495)

§50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at 2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now end-to-end functional on CPU; the legacy "not yet wired" Err is RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-% can move from 57% → ≥58%.

Spec amendment §53:
- Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007 via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass smoke + 008 contract validation are inherently algorithm-level).
- Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED; 5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090.
- Cascade ships statistics: 11 PRs over 2 days (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494).
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% (gated on 5g empirical val_loss < 9.38 evidence).
- 3 CI andon classes documented as feedback memories during cascade (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge behind-state).

Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL:
- All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the user-facing `apr pretrain --init` flow.
- verification_summary updated: tested 7 → 8; status partial → functional.
- Added §52 + §53 references.
- Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical 500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr producing val_loss < 9.38.

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 5, 2026
…-003/004/007 drift (round 2) (#1509) * contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2) Second-round test-reference drift correction. §57's drift sweep (this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006 but a more thorough audit (cross-referencing every `test:` field against the source-code function-name registry) surfaced three additional dangling references. ## Drift inventory (round 2) | Falsifier | v1.5.0 cited test | Exists? | Actual test | | --- | --- | --- | --- | | 003 | build_transformer_config_qwen_init_matches_constructor | ❌ | build_transformer_config_qwen_init_matches_input | | 004 | transformer::attention::tests::gqa_7_to_1_matches_full_mha | ❌ | transformer::model::tests::falsify_apr_pretrain_arch_004_* | | 007 | build_transformer_config_encoder_init_errors | ❌ | validate_pretrain_init_arch_rejects_encoder | ## Why §57 (PR #1505) didn't catch these §57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on: - `_init_matches_constructor` vs `_init_matches_input` — both end in `_matches_<word>` so a fragment grep counted the contract's name as "not dangling" - `transformer::attention::tests::` vs `transformer::model::tests::` — module-path drift not just function-name drift; only fully- qualified path comparison catches this - `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` — the contract's name was a guess at the impl name; impl PR #1479 chose a completely different convention ## How this round was found Used a stricter audit: for every `cargo test ... ::tests::<name>` in contracts, grep `fn <name>` in the actual source tree. If the fn doesn't exist, drift. This catches drift that PR #1505's fragment-based audit missed. ## Resolution Update FALSIFY-003/004/007 `test:` fields to the actual function names. No falsifier semantics change. 11 falsifiers all PASS; contract status remains FUNCTIONAL. 
## Verification

    $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input
    test result: ok. 1 passed
    $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke
    test result: ok. 1 passed
    $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder
    test result: ok. 1 passed
    $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did §57's sweep miss these? Used name-fragment grep (`::tests::[a-z_]+`), which counted false-negatives on suffix-close names like `_constructor` ↔ `_input`.
2. Why is module-path drift a separate class? Because grep against the `[a-z_]+` regex captures the FUNCTION name, not the `::module::tests::` path. A function with the right name in the wrong module passes that audit but fails actual test invocation.
3. Why fix in a separate PR rather than amending PR #1505? PR #1505 already merged. Per `feedback_falsifier_first_cascade_pattern.md`, the cleanest cadence is one-bump-per-PR.
4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the test-binding INVARIANT was broken in v1.5.0 (residual drift) and v1.6.0 restores it.
5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr remaining) compute-bound idle time. Each drift fix is small (~30 LOC), reduces drift risk for future agents, and restores the falsifier-binding invariant. The alternative (manufacture bigger work) would risk introducing defects the contract base doesn't catch yet.

## Net effects

- Contract v1.5.0 → v1.6.0 FUNCTIONAL.
- 11 falsifiers, all PASS: same count, but FALSIFY-003/004/007 now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is the SECOND round of drift sweep on this contract. Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across the §50.4 cascade contracts.
A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that prevents drift at contract-merge time.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0, PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* contract(apr-pretrain-arch-polymorphic-v1): also fix FALSIFY-001 (round 2.5 — surfaced by PR #1511)

Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007. Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th drift in this same contract: FALSIFY-001 cited `qwen2_0_5b_matches_hf_config` → does NOT exist on main. Actual: `qwen2_0_5b_matches_hf_config_2026_05_04` (date-suffix added by impl PR #1474 / commit 9af6e71, May 4).

The earlier round-2 audit (which focused on suffix + module-path drift) didn't catch this because the test name has a DATE-SUFFIX drift class (function name + `_<date>` is a real Rust test, but the contract truncated to the prefix).

Updates:

- FALSIFY-001 test ref: append `_2026_05_04` suffix.
- v1.6.0 changelog updated to record 4 fixes (was 3).
- Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS.
- pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix).

This consolidates round 2 into a single commit on the same branch + PR (#1509) rather than spawning a round-3 PR for one extra fix. The lint hardening in #1511 is what made finding the 4th drift trivial; future drift will be caught at contract-merge time once #1511 lands.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1511 (sub-agent's pv lint --strict-test-binding), Issue #1510

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Adds `pretrain_real::validate_pretrain_init_arch_compatible(cfg)`, which fails fast by rejecting an init `TransformerConfig` whose architecture family is incompatible with the decoder-only pretrain trainer. Discharges FALSIFY-APR-PRETRAIN-ARCH-007 from the PR #1473 contract. Encoder configs (CodeBERT/RoBERTa/BERT) are rejected with an error message that names the falsifier ID, the architecture family, the decoder-only requirement, and the `hf_architecture` (e.g. `RobertaModel`). Three unit tests verify decoder accept, encoder reject, and Llama370M baseline accept (drift prevention). Step 5f decomposition: this is 5f.1 (~30 LOC arch gate); 5f.2 (~80 LOC actual weight load) is a follow-up. Plain ship-%: MODEL-1=91%, MODEL-2=57% (unchanged; gated on the step 5g LIVE fine-tune). Builds on PRs #1472 and #1478 (MERGED), with #1473/#1474/#1475/#1476 in flight.
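
As a rough illustration of the gate summarized above, the following sketch shows the general shape of a fail-fast architecture check. It is not the code from this PR: `ArchFamily`, `InitConfigSketch`, and `validate_init_arch_sketch` are hypothetical stand-ins, the real `TransformerConfig` fields are not shown on this page, and the error type and message wording are assumptions. Only the four elements named in the summary (falsifier ID, architecture family, decoder-only requirement, `hf_architecture`) are taken from the source.

```rust
/// Hypothetical architecture-family tag; the real representation may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArchFamily {
    Decoder,
    Encoder,
}

/// Hypothetical, reduced stand-in for the init `TransformerConfig`.
#[derive(Debug, Clone)]
struct InitConfigSketch {
    family: ArchFamily,
    /// Hugging Face architecture string, e.g. "RobertaModel".
    hf_architecture: String,
}

/// Reject any init config that is not decoder-only, before weights are loaded.
/// The message names the falsifier ID, the family, the decoder-only
/// requirement, and the `hf_architecture`, as the PR summary describes.
fn validate_init_arch_sketch(cfg: &InitConfigSketch) -> Result<(), String> {
    match cfg.family {
        ArchFamily::Decoder => Ok(()),
        ArchFamily::Encoder => Err(format!(
            "FALSIFY-APR-PRETRAIN-ARCH-007: architecture family {:?} is incompatible; \
             the pretrain trainer is decoder-only (hf_architecture = {})",
            cfg.family, cfg.hf_architecture
        )),
    }
}
```

With this sketch, a config tagged `ArchFamily::Decoder` returns `Ok(())`, while one tagged `ArchFamily::Encoder` with `hf_architecture` set to `RobertaModel` returns an `Err` whose text contains both the falsifier ID and `RobertaModel`. Running the check before any weight load keeps the failure cheap and the message specific.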