feat(aprender-train): load_init_tensors_from_apr — §50.4 step 5f.2#1481
Merged
Adds the read-half of `apr pretrain --init` weight load: a thin wrapper over `aprender::format::converter::load_model_tensors` that returns a `BTreeMap<String, (Vec<f32>, Vec<usize>)>` of tensor blobs keyed by HF naming convention.

Per `apr-pretrain-arch-polymorphic-v1` §init_load_semantics (PR #1473): "Loader is REUSED, not reimplemented." This function does not duplicate APR parsing; it forwards to the same machinery `apr export` and `apr inspect` use.

Discharges from `apr-pretrain-arch-polymorphic-v1`:
- §init_load_semantics invariant (loader reuse): satisfied
- FALSIFY-006 (init_loss < 6.0) at READ-COMPILE-BIND level

Step 5f decomposition:
- 5f.1 (PR #1479): encoder/decoder family validator (~30 LOC)
- 5f.2 (this PR): APR file open + tensor read (~30 LOC + 2 tests)
- 5f.3 (next): populate trainer parameters from BTreeMap (~50 LOC)
- 5g (operator): LIVE 500-step fine-tune → DISCHARGES MODEL-2 ship-%

Step 5f.2 is intentionally narrow: it only does the READ. Population into trainer parameter slots (5f.3) reconciles HF naming convention (e.g., `model.embed_tokens.weight`) against the trainer's internal parameter naming. That's a separate concern with its own falsifier.

What this PR adds:
1. `pub fn load_init_tensors_from_apr(path) -> Result<BTreeMap<...>>` at pretrain_real.rs:35 (~25 LOC including doc comment)
2. 2 unit tests in `pretrain_real::tests`:
   - `load_init_tensors_missing_file_errors_with_falsifier_id` (FALSIFY-006 fail-fast path; asserts error message contains falsifier id + offending path for operator-experience)
   - `load_init_tensors_signature_compile_bind` (drift-prevention: catches a future signature change that would break step 5f.3's BTreeMap consumer)

Test results (`cargo test -p aprender-train --lib train::pretrain_real::tests::load_init_tensors`): 2 passed; 0 failed; 0 ignored

Five Whys:
1. Why decompose step 5f.2 to JUST the read? Single-piece flow. Read → Validate → Populate are three distinct concerns. Step 5f.1 did validation (#1479); 5f.2 does read; 5f.3 will do populate. Each PR has one falsifier discharge story.
2. Why use load_model_tensors and not write a new parser? The contract pins "Loader is reused, not reimplemented." Writing a new parser would create a parallel format-decoder that drifts from the canonical one. The same lesson as the LAYOUT-001/002 hits: parallel format code paths produce silent format-drift bugs.
3. Why return `BTreeMap<String, (Vec<f32>, Vec<usize>)>` rather than a trainer-parameter-shaped struct? Decoupling: the read shouldn't know about TransformerTrainer's internal parameter names. Step 5f.3's job is to map HF names → trainer slots; if 5f.2 baked that mapping in, every change to TransformerTrainer would break the read.
4. Why include the signature-compile-bind test? It's a compile-time check that drives step 5f.3's expectations. If a future refactor changes the return type (e.g., from BTreeMap to HashMap, or from `Vec<usize>` to `Box<[usize]>`), step 5f.3's consumer code stops compiling, and the break is caught here rather than at the integration point.
5. Why is FALSIFY-006 NOT yet at PARTIAL_ALGORITHM_LEVEL after this PR? Because step 5f.2 only does the read; FALSIFY-006 requires the LIVE init_loss < 6.0 check, which needs steps 5f.3 + 5g. This PR moves FALSIFY-006 from UNBOUND → READ-COMPILE-BIND, a sub-level of PARTIAL_ALGORITHM_LEVEL. Full PARTIAL discharge happens at 5f.3 when the populate step exists.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57%; first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50, §51: MODEL-2 architecture-coupling + cascade snapshot (PR #1472, #1480 MERGED)
- PR #1473: apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474: qwen2_0_5b tie_word_embeddings fix (MERGED)
- PR #1475: build_transformer_config polymorphic dispatch (in flight)
- PR #1476: preflight_tokenizer_vocab_matches_target (MERGED)
- PR #1478: GQA-7:1 forward-pass smoke test (MERGED)
- PR #1479: validate_pretrain_init_arch_compatible (in flight)
- feedback_no_guessing.md
- feedback_falsifier_first_cascade_pattern.md (this turn's pattern)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
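The wrapper shape and the two tests described above can be sketched roughly as follows. This is a minimal illustration, not the code in `pretrain_real.rs`: `load_model_tensors_stub` is a hypothetical stand-in for `aprender::format::converter::load_model_tensors` (whose real signature and error type are not shown in this PR), the `String` error type is a simplification, and the exact falsifier-id string in the message is an assumption.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Tensor blobs keyed by HF-style name: (flat f32 data, shape).
pub type InitTensors = BTreeMap<String, (Vec<f32>, Vec<usize>)>;

// Hypothetical stand-in for the shared APR loader. It only checks that the
// file exists and returns one dummy tensor; the real loader parses the file.
fn load_model_tensors_stub(path: &Path) -> Result<InitTensors, String> {
    if !path.exists() {
        return Err("no such file".to_string());
    }
    let mut tensors = InitTensors::new();
    tensors.insert(
        "model.embed_tokens.weight".to_string(),
        (vec![0.0; 8], vec![2, 4]),
    );
    Ok(tensors)
}

/// Thin wrapper: forwards to the shared loader and decorates any error with
/// a falsifier id (assumed spelling) plus the offending path.
pub fn load_init_tensors_from_apr(path: &Path) -> Result<InitTensors, String> {
    load_model_tensors_stub(path).map_err(|e| {
        format!(
            "FALSIFY-006: failed to load init APR {}: {}",
            path.display(),
            e
        )
    })
}

/// Compile-bind check: this coercion only type-checks while the wrapper
/// keeps exactly this parameter and return type, so a signature change
/// breaks the build here instead of in the downstream consumer.
pub fn signature_compile_bind() -> fn(&Path) -> Result<InitTensors, String> {
    load_init_tensors_from_apr
}
```

Binding the function to an explicit `fn` pointer type is one simple way to pin a signature; a change from `BTreeMap` to `HashMap`, for example, would fail to compile at `signature_compile_bind`.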
noahgift added a commit that referenced this pull request May 4, 2026
… 5f.3
Add `populate_trainer_from_init_tensors(transformer, init_tensors)` —
the population half of `apr pretrain --init`. Iterates the model's
`named_parameters()` set, looks up each name in the init BTreeMap (HF
naming preserved by §50.4 step 5f.2's loader), validates length, and
calls `Transformer::set_named_parameter()`.
`apr-pretrain-arch-polymorphic-v1` §init_load_semantics:
- Population invariant: "Init tensors populate trainer parameters
byte-equivalent to source"
- FALSIFY-APR-PRETRAIN-INIT-007 (population step) at PARTIAL_ALGORITHM_LEVEL
1. **Why strict on missing-required?** Architecture mismatch (e.g., init
from a different model family) would silently leave random init for
absent parameters, which §28's SHIP-007 lesson teaches us is the
exact class of "silent gibberish" defect that hides for many epochs.
2. **Why strict on length-mismatch?** A length mismatch indicates the
from_apr_metadata extractor misread a shape — populating regardless
would silently truncate or pad, masking the bug.
3. **Why permissive on extra-init-entries?** Tied embeddings: a Qwen2.5
APR may publish a separate `lm_head.weight` that the trainer's tied
model omits. Failing on extra entries would force operators to
pre-strip APRs, which is muda.
4. **Why FALSIFIER ID in error message?** §28 lesson — falsifier IDs in
error messages turn opaque CI failures into self-explaining defects.
5. **Why one function not two (load+populate fused)?** Decoupling keeps
`aprender-train` free of `aprender-serve` (the APR loader): the
loader is a free function in §50.4 step 5f.2; this is the consumer.
Two-step composition is testable independently (and is, in this PR).
- `populate_trainer_from_init_tensors_happy_path`: every param matched
→ returns Ok(N) where N = named_parameters().len()
- `populate_trainer_from_init_tensors_extra_entries_silently_ignored`:
fictitious extra entry must NOT cause Err (tied-embeddings safety)
- `populate_trainer_from_init_tensors_rejects_length_mismatch`: wrong
flat length → Err naming the param + falsifier ID
- `populate_trainer_from_init_tensors_rejects_missing_required_param`:
missing required → Err with "not present in init APR" + falsifier ID
All 12 tests pass; cargo clippy --lib clean.
- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (12/12 pass)
- [x] `cargo clippy -p aprender-train --lib -- -D warnings` (clean)
- [x] No new dependencies; pure aprender-train + autograd Tensor + std::collections::BTreeMap
Step 5f.3 caps the §50.4 step-5f sub-cascade:
5f.1 — encoder-family validator (PR #1479, awaiting CI)
5f.2 — load_init_tensors_from_apr (PR #1481 MERGED)
5f.3 — THIS PR (populate_trainer_from_init_tensors)
Remaining roadmap: 5g (LIVE 500-step fine-tune, operator dispatch),
5h (stamp + publish as MODEL-2 v2).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
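The three population rules above (strict on missing parameters, strict on length mismatch, permissive on extra init entries) can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `ToyModel` is a hypothetical stand-in for the trainer's `Transformer`, its `named_parameters` and `set_named_parameter` methods only mimic the names mentioned in the commit message, and errors are plain `String`s.

```rust
use std::collections::BTreeMap;

type InitTensors = BTreeMap<String, (Vec<f32>, Vec<usize>)>;

const FALSIFIER: &str = "FALSIFY-APR-PRETRAIN-INIT-007";

// Hypothetical stand-in for the trainer's model: name -> flat parameter buffer.
pub struct ToyModel {
    params: BTreeMap<String, Vec<f32>>,
}

impl ToyModel {
    /// Builds a model from (name, flat length) pairs, zero-initialised.
    pub fn new(spec: &[(&str, usize)]) -> Self {
        let params: BTreeMap<String, Vec<f32>> = spec
            .iter()
            .map(|(name, len)| (name.to_string(), vec![0.0; *len]))
            .collect();
        Self { params }
    }

    /// Returns every parameter name together with its expected flat length.
    pub fn named_parameters(&self) -> Vec<(String, usize)> {
        self.params.iter().map(|(n, v)| (n.clone(), v.len())).collect()
    }

    /// Overwrites a parameter; the caller must have validated the length.
    pub fn set_named_parameter(&mut self, name: &str, data: &[f32]) {
        if let Some(slot) = self.params.get_mut(name) {
            slot.copy_from_slice(data);
        }
    }

    pub fn get(&self, name: &str) -> Option<&[f32]> {
        self.params.get(name).map(|v| v.as_slice())
    }
}

/// Iterates the MODEL's parameters (not the init map), so extra init entries
/// such as an untied `lm_head.weight` are never visited and never fail.
pub fn populate_from_init(model: &mut ToyModel, init: &InitTensors) -> Result<usize, String> {
    let mut populated = 0;
    for (name, expected_len) in model.named_parameters() {
        // Strict: a parameter absent from the init map is an error.
        let (data, _shape) = init
            .get(&name)
            .ok_or_else(|| format!("{}: parameter {} not present in init APR", FALSIFIER, name))?;
        // Strict: never truncate or pad on a length mismatch.
        if data.len() != expected_len {
            return Err(format!(
                "{}: parameter {} has length {}, expected {}",
                FALSIFIER, name, data.len(), expected_len
            ));
        }
        model.set_named_parameter(&name, data);
        populated += 1;
    }
    Ok(populated)
}
```

Driving the loop from the model's parameter set rather than from the init map is what makes the "extra entries" case fall out for free: entries the model never asks for are simply not looked up.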
Merged
noahgift added a commit that referenced this pull request May 4, 2026
…PARTIAL_ALGORITHM_LEVEL - §50.4 cascade snapshot (#1482)

## Summary

Bump `apr-pretrain-arch-polymorphic-v1` contract status from PROPOSED to PARTIAL_ALGORITHM_LEVEL. All 8 FALSIFY-APR-PRETRAIN-ARCH-* falsifiers are now bound to executable tests across the §50.4 cascade.

## Falsifier scoreboard (post-§51 snapshot)

| ID | Rule | PR | Status |
|-------------|-----------------------------------------------|-------------------------------|-------------------------|
| FALSIFY-001 | qwen2_0_5b matches HF config | #1474 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-002 | build_transformer_config(None) → Llama370M | #1475 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-003 | build_transformer_config(Some) extracts 10 | #1475 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-004 | GQA-7:1 forward-pass smoke | #1478 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-005 | Qwen tokenizer passes with --init Qwen | #1476 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-006 | Qwen tokenizer fails without --init | #1476 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-007 | encoder-arch APR fail-fast | #1479 open (auto-merge armed) | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-008 | contract self-validates via pv | this PR (validates clean) | PARTIAL_ALGORITHM_LEVEL |

## Test plan

- [x] pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml exits 0
- [x] All 8 falsifiers cite a concrete test path or PR
- [x] Changelog entry under metadata.changelog with version/date/change

## Why now

Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue (≥4 PRs) blocks more impl PRs, switch to non-conflict work. This contract bump:
- touches only one YAML file (no Rust/test source)
- cannot conflict with #1479 / #1481 (impl PRs)
- audit-trails the cascade scoreboard

Promotion to FUNCTIONAL is gated on #1479 landing (FALSIFY-007 PASS). Promotion to DISCHARGED is gated on §50.4 step 5g LIVE empirical run.

## Five Whys

1. Why bump status now? 7/8 falsifiers bound on main + 8th bound on open PR; PROPOSED is stale.
2. Why not wait for #1479 to land first? §51 snapshot recorded "7/8 PARTIAL bound" 2 hours ago; the 8th binding is the contract-self validation, which is met by THIS PR's `pv validate` output.
3. Why not bundle with #1479? Different file, different review scope, different concern (status semantics vs. impl).
4. Why not skip the bump? Operator-facing scoreboard is in the YAML; stale PROPOSED implies "not yet started", which contradicts §51.
5. Why YAML changelog instead of just version? Changelog records THIS bump's reasoning so future operators don't re-derive it from git log.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…LETE + 5f.4 wireup gap identified (#1486)

## Summary

Same-day continuation of §51 cascade landed PR #1479 (FALSIFY-007 encoder/decoder validator) and PR #1481 (load_init_tensors_from_apr). PR #1483 (5f.3 populate) and PR #1482 (contract status bump) are MERGEABLE in queue. All 8 falsifiers in `apr-pretrain-arch-polymorphic-v1` are now PARTIAL_ALGORITHM_LEVEL bound on main or about to land.

§52 records:
1. Updated falsifier scoreboard (8/8 vs §51's 7/8)
2. NEW step 5f.4 (CLI wireup, ~150 LOC) identified via live source inspection of `apr-cli/src/commands/pretrain.rs:259-297`
3. Step 5g LIVE 500-step fine-tune is now gated on 5f.4 landing first

## Why now

Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue blocks more impl PRs (#1483 + #1482 both in queue touching pretrain_real.rs), switch to non-conflicting work. This spec amendment touches one markdown file with no PR conflicts.

## Five Whys (§52.8 in body)

1. Why didn't §50 catch 5f.4? Top-down arch-coupling lens missed the CLI-dispatch seam.
2. Why is 5f.4 separate from 5f.3? Different crate (apr-cli vs aprender-train).
3. Why must 5f.4 be atomic? Removing "not yet wired" Err without the wireup produces silent random-init (§28 SHIP-007 defect class).
4. Why ~150 LOC? 4 levels of plumbing + new builder + tests + CUDA.
5. Why call 5f.4 out in spec? Without §52, readers would assume 5g is dispatchable; spec is the source of truth.

## Test plan

- [x] Single markdown file, no Rust changes
- [x] Falsifier scoreboard table updated to 8/8 PARTIAL_ALGORITHM_LEVEL
- [x] Step roadmap table adds 5f.4 between 5f.3 and 5g
- [x] Cadence preserved: §41 → ... → §51 → §52 (12 amendments since 2026-05-03)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
… 5f.3 (#1483)
noahgift added a commit that referenced this pull request May 5, 2026
…ep 5f.4 (#1494)

## Summary

Wire `apr pretrain --init <PATH>` end-to-end so step 5g LIVE 500-step fine-tune can dispatch. Replaces the §49 step 4 "not yet wired" Err with the actual init-tensor load + trainer populate path that §50.4 steps 5f.1/5f.2/5f.3 made possible.

## Architecture

Two functions added/changed:

1. `entrenar::train::pretrain_real::build_shared_trainer_with_init`: composes the §50.4 step-5f machinery (5c polymorphic dispatch + 5f.1 encoder rejection + 5f.2 load + 5f.3 populate) into a single trainer-builder entry. init=None preserves the from-scratch baseline byte-equivalent to `build_shared_trainer`. init=Some validates arch family, builds the polymorphic config, loads tensors, populates.
2. `apr-cli/src/commands/pretrain.rs::run`: now extracts the init APR file's TransformerConfig via existing `model_config::read_apr_architecture` when `--init` is set, then plumbs both `init_arch` and `init_path` through `drive_real → drive_real_cpu → build_shared_trainer_with_init`. The polymorphic preflight (§50.4 step 5d) already used the EXTRACTED vocab; this PR wires the call site to actually pass it.

## What this PR DOES NOT do

- **CUDA path** (~80 LOC follow-up as 5f.5): `drive_real_cuda` now fail-fasts when --init is set rather than silently using random init (FALSIFY-APR-PRETRAIN-INIT-CUDA-001). The cuBLAS trainer needs symmetric `build_shared_cuda_trainer_with_init` which is out of scope.
- **Step 5g LIVE 500-step fine-tune** (operator dispatch): this PR makes it dispatchable; running the 500 steps requires operator action.

## Discharges (per apr-pretrain-arch-polymorphic-v1)

- §init_load_semantics integration: load + populate composed end-to-end
- §arch_extraction_signature integration: read_apr_architecture wired
- §qwen_tokenizer_vocab_compatibility integration: extracted vocab flows into preflight call site (no longer hardcoded Llama370M)
- FALSIFY-APR-PRETRAIN-INIT-007 (population) at INTEGRATION level
- The legacy "not yet wired" guard from §49 step 4 is RETIRED; the drift-prevention test now pins the new fail-closed semantic.

## Tests (8 new across 2 crates, all pass)

- `aprender-train`: 4 new tests for `build_shared_trainer_with_init`:
  - `_none_uses_llama370m_shape` (regression-free init=None)
  - `_rejects_unpaired_args` (caller-bug guard)
  - `_rejects_encoder_family` (FALSIFY-007 integration)
  - `_decoder_family_proceeds_to_tensor_load` (failure ordering pin)
- `apr-cli`: 2 retrofitted tests for the new fail-closed semantic:
  - `pretrain_init_valid_magic_but_bogus_metadata_fails_at_arch_extraction` (replaces the old "not yet wired" trip-wire)
  - `pretrain_init_v1_magic_aprn_passes_validate_init_apr_path` (helper now returns Ok on valid magic)

19/19 pretrain_real tests pass. 23/23 apr-cli pretrain tests pass. cargo clippy --lib -- -D warnings clean across both crates.

## Five Whys

1. **Why was 5f.4 needed at all?** §50's 5a-5h decomposition assumed the CLI dispatch would naturally invoke the helper functions; live source inspection (§52 amendment) revealed the dispatch hardcoded "not yet wired" Err. 5f.4 is the explicit wireup.
2. **Why is removing the safety Err so load-bearing?** The §28 SHIP-007 lesson: silently random-init via a half-implemented dispatch is the exact "silent gibberish" defect class. Removing the safety Err without the wireup would manifest as a multi-epoch divergence masquerading as a corpus-quality issue.
3. **Why a separate polymorphic builder rather than overload `build_shared_trainer`?** `build_shared_trainer` enforces INV-ARCH-370M-001 (param-count band) which only applies to from-scratch Llama370M. The polymorphic builder sidesteps it by design: Qwen2.5-0.5B is 0.5B params, outside the band by intent.
4. **Why fail-fast on `--init` + `--device cuda` rather than silently ignore?** Same reasoning as #2: silent CUDA random-init would bisect the same "silent gibberish" class. 5f.5 follow-up wires symmetric CUDA path; until then, fail-closed.
5. **Why couldn't this be inside #1483 (the populate PR)?** Different crate (apr-cli vs aprender-train), different review concern (CLI plumbing vs trainer mutation), different test surface. One atomic PR per file/crate boundary.

## Test plan

- [x] `cargo test -p aprender-train --lib train::pretrain_real::tests` (19/19 pass)
- [x] `cargo test -p apr-cli --lib commands::pretrain` (23/23 pass)
- [x] `cargo clippy -p aprender-train -p apr-cli --lib -- -D warnings` (clean)
- [x] `cargo check -p apr-cli --lib` (clean)
- [ ] Operator-dispatched: `apr pretrain --init <Qwen2.5-Coder-0.5B>.apr` smoke that fires 50 training steps end-to-end (5g LIVE prelude; operator action in next session)

## Cascade context

This is the §52-identified gap closing the §50.4 step 5f sub-cascade:

- 5f.1 encoder validator: PR #1479 ✅ MERGED
- 5f.2 load_init_tensors_from_apr: PR #1481 ✅ MERGED
- 5f.3 populate_trainer_from_init_tensors: PR #1483 (mergeable, in queue)
- **5f.4 CLI wireup: THIS PR**
- 5g LIVE 500-step fine-tune: operator dispatch (next)
- 5h stamp + publish: ~10 LOC follow-up

Once 5f.4 lands AND 5g produces val_loss < 9.38 evidence, MODEL-2 ship % moves 57% → ≥58%.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
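The builder's argument pairing and failure ordering, as pinned by the four `build_shared_trainer_with_init` tests listed above, might look roughly like the sketch below. Everything here is hypothetical scaffolding for illustration: `ArchFamily`, `Plan`, and `plan_trainer_build` are invented names, the real builder takes a `TransformerConfig` rather than a bare family tag, and the file-existence check merely stands in for the tensor-load stage.

```rust
use std::path::{Path, PathBuf};

// Hypothetical tag for the architecture family extracted from the init APR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArchFamily {
    Decoder,
    Encoder,
}

// Hypothetical outcome: which trainer-construction path would be taken.
#[derive(Debug, PartialEq)]
pub enum Plan {
    FromScratch,
    InitFrom(PathBuf),
}

/// Decides the construction path. Check order matters:
/// 1. both-or-neither pairing of the two init arguments,
/// 2. encoder-family rejection,
/// 3. only then the (stubbed) tensor-load stage.
pub fn plan_trainer_build(
    init_arch: Option<ArchFamily>,
    init_path: Option<&Path>,
) -> Result<Plan, String> {
    match (init_arch, init_path) {
        // No init requested: keep the from-scratch baseline.
        (None, None) => Ok(Plan::FromScratch),
        // Caller-bug guard: one argument without the other.
        (Some(_), None) | (None, Some(_)) => {
            Err("init_arch and init_path must be provided together".to_string())
        }
        // Arch validation runs before any file is opened.
        (Some(ArchFamily::Encoder), Some(_)) => {
            Err("encoder-family init APR rejected before tensor load".to_string())
        }
        // Decoder family proceeds to the load stage (stubbed as an existence check).
        (Some(ArchFamily::Decoder), Some(path)) => {
            if !path.exists() {
                return Err(format!("tensor load failed for {}", path.display()));
            }
            Ok(Plan::InitFrom(path.to_path_buf()))
        }
    }
}
```

With a bogus path, an encoder-family input fails at the arch check while a decoder-family input fails at the load stage, which is the ordering the `_decoder_family_proceeds_to_tensor_load` test is described as pinning.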
noahgift added a commit that referenced this pull request May 5, 2026
…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495)

§50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at 2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now end-to-end functional on CPU; the legacy "not yet wired" Err is RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-% can move from 57% → ≥58%.

Spec amendment §53:
- Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007 via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass smoke + 008 contract validation are inherently algorithm-level).
- Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED; 5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090.
- Cascade ships statistics: 11 PRs over 2 days (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494).
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% (gated on 5g empirical val_loss < 9.38 evidence).
- 3 CI andon classes documented as feedback memories during cascade (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge behind-state).

Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL:
- All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the user-facing `apr pretrain --init` flow.
- verification_summary updated: tested 7 → 8; status partial → functional.
- Added §52 + §53 references.
- Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical 500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr producing val_loss < 9.38.

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Adds the read-half of `apr pretrain --init` weight load: a thin wrapper over `aprender::format::converter::load_model_tensors` returning `BTreeMap<String, (Vec<f32>, Vec<usize>)>` keyed by HF naming. Per PR #1473 contract §init_load_semantics: 'Loader is REUSED, not reimplemented.' Step 5f decomposition: 5f.1 (#1479 validator) + 5f.2 (this PR, read) + 5f.3 (populate, next) + 5g (operator LIVE). Discharges FALSIFY-006 at READ-COMPILE-BIND level. 2 tests: missing-file fail-fast with falsifier id, signature compile-bind for step 5f.3 consumer drift-prevention. Plain ship-%: MODEL-1=91%, MODEL-2=57% (unchanged; step 5g gates ship-%). Builds on PRs #1472/#1474/#1476/#1478/#1480 MERGED + #1473/#1475/#1479 in flight.