feat(apr-cli): polymorphic preflight_tokenizer_vocab_matches_target — §50.4 step 5d by noahgift · Pull Request #1476 · paiml/aprender

noahgift · 2026-05-04T15:39:55Z

Summary

Refactors preflight_tokenizer_vocab_matches_model(tokenizer_dir) → preflight_tokenizer_vocab_matches_target(tokenizer_dir, target_vocab_size) so the GATE-ARCH-370M-011 preflight no longer hardcodes the Llama370M baseline. When --init wires up (§50.4 step 5f), the caller passes the EXTRACTED arch's vocab_size (e.g., 151_936 for Qwen2.5-0.5B); for now (init=None), the only existing caller passes Llama370MConfig::VOCAB_SIZE explicitly, preserving regression-free behavior on the §24/§25 from-scratch path.

Discharges

From apr-pretrain-arch-polymorphic-v1 (PR #1473, in flight):

FALSIFY-APR-PRETRAIN-ARCH-005 — Qwen tokenizer (151_936) PASSES preflight when target is Qwen-shaped
FALSIFY-APR-PRETRAIN-ARCH-006 — Qwen tokenizer (151_936) FAILS preflight when target is Llama-shaped (the silent-pass class)

What this PR adds

Renamed function with new target_vocab_size: usize parameter
Updated 4 callers (1 production at line 361 + 3 existing tests) to pass Llama370MConfig::VOCAB_SIZE explicitly
2 new unit tests in commands::pretrain::tests:
- preflight_qwen_vocab_passes_with_qwen_target (FALSIFY-005)
- preflight_qwen_vocab_fails_with_llama_target (FALSIFY-006 with error-message assertion that names BOTH vocab sizes)
Updated docstring noting the §50 polymorphism + cross-references to contract
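
A minimal sketch of the parameterized check, assuming the comparison reduces to "tokenizer vocab size equals `target_vocab_size`". The real function takes `tokenizer_dir` and reads the vocabulary from disk; this sketch replaces that step with a plain `usize` argument. The name `check_vocab_matches_target`, the constant names, and the error wording are hypothetical. The sizes 50_257 and 151_936 are the ones quoted in this PR.

```rust
/// Vocab sizes quoted in this PR description (constant names are illustrative).
const LLAMA_370M_VOCAB_SIZE: usize = 50_257;
const QWEN_VOCAB_SIZE: usize = 151_936;

/// Hypothetical stand-in for the preflight comparison. The real
/// `preflight_tokenizer_vocab_matches_target` takes a tokenizer directory;
/// this sketch takes the already-counted tokenizer vocab size instead.
fn check_vocab_matches_target(
    tokenizer_vocab_size: usize,
    target_vocab_size: usize,
) -> Result<(), String> {
    if tokenizer_vocab_size == target_vocab_size {
        Ok(())
    } else {
        // Name BOTH sizes so the operator can see which side is which.
        Err(format!(
            "tokenizer vocab mismatch: tokenizer has {} entries, target expects {}",
            tokenizer_vocab_size, target_vocab_size
        ))
    }
}
```

Because the target is an argument rather than a hidden constant, each call site has to state which architecture it is gating against, which is the "no silent defaults" property the Five Whys section describes.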

Test results

$ cargo test -p apr-cli --lib commands::pretrain::tests::preflight
running 5 tests
test commands::pretrain::tests::preflight_rejects_missing_vocab_json ... ok
test commands::pretrain::tests::preflight_accepts_matching_vocab ... ok
test commands::pretrain::tests::preflight_rejects_tokenizer_vocab_mismatch ... ok
test commands::pretrain::tests::preflight_qwen_vocab_fails_with_llama_target ... ok
test commands::pretrain::tests::preflight_qwen_vocab_passes_with_qwen_target ... ok

test result: ok. 5 passed; 0 failed; 0 ignored

Plain ship-% update

MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Five Whys

Why polymorphic preflight NOW rather than at step 5f? Each step gets its own falsifier discharge. Step 5d's invariant — "preflight gates by EXTRACTED vocab when --init is set, by Llama370M vocab when --init is absent" — is independently testable WITHOUT actually reading an APR file. Authoring the test now pins the algorithm before the I/O integration arrives.
Why rename _matches_model → _matches_target? Old name implied "matches the model" with a fixed/canonical model. New name reflects the polymorphic dispatch where the target depends on call-site context.
Why pass target as parameter rather than extract from PretrainConfig? Decoupling: PretrainConfig already exists and is wired through many callers. Adding a new field would create a parallel drift surface. A function parameter forces every call site to decide explicitly — exactly the contract's "no silent defaults" invariant.
Why 2 new tests not 1? The two falsifiers (-005 and -006) are complementary proofs of the polymorphism: 005 (positive, Qwen+Qwen=pass), 006 (negative, Qwen+Llama=fail). Without the negative case, a regression that always returns Ok would silently pass the positive case.
Why does FALSIFY-006 assert BOTH vocab sizes appear in the error? Operator-experience: when a fine-tune fails, the operator needs to see WHICH tokenizer (151_936) and WHICH target (50_257) — not just an abstract "they don't match" error.
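
To illustrate why the positive and negative cases are both needed, here is a small sketch that runs the two test shapes against a correct check and against a regression that always returns `Ok`. All function names are hypothetical and the checks operate on bare vocab sizes rather than a tokenizer directory.

```rust
/// Signature shared by a correct check and a broken one (hypothetical).
type VocabCheck = fn(usize, usize) -> Result<(), String>;

fn correct_check(tokenizer: usize, target: usize) -> Result<(), String> {
    if tokenizer == target {
        Ok(())
    } else {
        Err(format!("tokenizer vocab {} != target vocab {}", tokenizer, target))
    }
}

/// A regression that ignores its inputs and always passes.
fn always_ok_regression(_tokenizer: usize, _target: usize) -> Result<(), String> {
    Ok(())
}

/// Positive case (FALSIFY-005 shape): Qwen tokenizer, Qwen target.
fn positive_case_passes(check: VocabCheck) -> bool {
    check(151_936, 151_936).is_ok()
}

/// Negative case (FALSIFY-006 shape): Qwen tokenizer, Llama target,
/// and the error must name both sizes.
fn negative_case_passes(check: VocabCheck) -> bool {
    match check(151_936, 50_257) {
        Ok(()) => false,
        Err(msg) => msg.contains("151936") && msg.contains("50257"),
    }
}
```

The always-`Ok` regression satisfies `positive_case_passes` but not `negative_case_passes`, so only the pair together detects it.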

Test plan

cargo test -p apr-cli --lib commands::pretrain::tests::preflight — 5/5 pass
Pre-commit quality gates pass
CI checks (gate, test, lint, coverage, security)

Refs

Spec: PR #1472 (spec(ship-two-models): v2.94.0 → v2.95.0 — §50 MODEL-2 architecture-coupling finding), §50 (in flight)
Contract: PR #1473 (contract(apr-pretrain-arch-polymorphic-v1): v1.0.0 PROPOSED — §50.4 step 5a), in flight — FALSIFY-005 + -006 discharged here
Sibling impl: PR #1474 (fix(aprender-train): qwen2_0_5b tie_word_embeddings true — §50.4 step 5b + DEFECT FIX) and PR #1475 (feat(aprender-train): build_transformer_config polymorphic dispatch — §50.4 step 5c), both in flight
feedback_no_guessing.md

🤖 Generated with Claude Code

… §50.4 step 5d

Refactors `preflight_tokenizer_vocab_matches_model(tokenizer_dir)` → `preflight_tokenizer_vocab_matches_target(tokenizer_dir, target_vocab_size)` so the GATE-ARCH-370M-011 preflight no longer hardcodes the Llama370M baseline. When --init wires up (§50.4 step 5f), the caller passes the EXTRACTED arch's vocab_size (e.g., 151_936 for Qwen2.5-0.5B); for now (init=None), the only existing caller passes Llama370MConfig::VOCAB_SIZE explicitly, preserving regression-free behavior on the §24/§25 from-scratch path.

Discharges from `apr-pretrain-arch-polymorphic-v1` (PR #1473):
- FALSIFY-APR-PRETRAIN-ARCH-005 — Qwen tokenizer (151_936) PASSES preflight when target is Qwen-shaped
- FALSIFY-APR-PRETRAIN-ARCH-006 — Qwen tokenizer (151_936) FAILS preflight when target is Llama-shaped (the silent-pass class)

What this PR adds:
1. Renamed function `preflight_tokenizer_vocab_matches_model` → `preflight_tokenizer_vocab_matches_target` with new `target_vocab_size: usize` parameter
2. Updated 4 callers (1 production at line 361 + 3 existing tests) to pass `Llama370MConfig::VOCAB_SIZE` explicitly — same behavior, now visible at the call site
3. 2 new unit tests in `commands::pretrain::tests`:
   - preflight_qwen_vocab_passes_with_qwen_target (FALSIFY-005)
   - preflight_qwen_vocab_fails_with_llama_target (FALSIFY-006 with error-message assertion that names BOTH vocab sizes)
4. Updated docstring noting the §50 polymorphism + cross-references to contract `apr-pretrain-arch-polymorphic-v1`

Test results (cargo test -p apr-cli --lib commands::pretrain::tests::preflight): 5 passed; 0 failed; 0 ignored
- preflight_accepts_matching_vocab (regression-free, unchanged behavior)
- preflight_rejects_tokenizer_vocab_mismatch (regression-free)
- preflight_rejects_missing_vocab_json (regression-free)
- preflight_qwen_vocab_passes_with_qwen_target (NEW — FALSIFY-005)
- preflight_qwen_vocab_fails_with_llama_target (NEW — FALSIFY-006)

Five Whys:
1. Why polymorphic preflight NOW rather than at step 5f? Each step gets its own falsifier discharge. Step 5d's invariant — "preflight gates by EXTRACTED vocab when --init is set, by Llama370M vocab when --init is absent" — is independently testable WITHOUT actually reading an APR file. Authoring the test now pins the algorithm before the I/O integration arrives.
2. Why rename `_matches_model` → `_matches_target`? Old name implied "matches the model" with a fixed/canonical model. New name reflects the polymorphic dispatch where the target depends on call-site context. The rename is a one-time cost; staying with the old name would ossify the misleading abstraction.
3. Why pass target as parameter rather than extract from PretrainConfig? Decoupling: PretrainConfig already exists and is wired through many callers. Adding a new field to PretrainConfig would create a parallel drift surface (every constructor of PretrainConfig must remember to set it correctly). A function parameter forces every call site to decide explicitly, which is exactly the contract's "no silent defaults" invariant.
4. Why 2 new tests not 1? The two falsifiers (-005 and -006) are complementary proofs of the polymorphism:
   - 005 (positive): Qwen+Qwen target = pass
   - 006 (negative): Qwen+Llama target = fail
   Without the negative case, a regression that always returns Ok would silently pass the positive case. The pair pins the dispatch.
5. Why does FALSIFY-006 assert BOTH vocab sizes appear in the error message? Operator-experience: when a fine-tune fails with "tokenizer vocab mismatch", the operator needs to see WHICH tokenizer (151_936) and WHICH target (50_257) — not just an abstract "they don't match" error. The dual-name requirement prevents lossy error messages during the §49 strategy pivot.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50 — MODEL-2 architecture-coupling (#1472, in flight)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- feedback_no_guessing.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…#1478)

Adds `falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke` that constructs a tiny GQA-7:1 transformer (kv_heads=2, query_heads=14, hidden=112=14*8, head_dim=8 — mimicking Qwen2.5-Coder-0.5B's GQA ratio) and verifies the forward pass:
- runs without panic
- returns the correct shape (seq_len * vocab_size)
- produces all-finite logits (no NaN, no Inf)

Discharges from `apr-pretrain-arch-polymorphic-v1` (PR #1473):
- FALSIFY-APR-PRETRAIN-ARCH-004 at SMOKE level: kernel handles GQA-7:1 without per-ratio specialization. Full numerical-parity vs GQA-1:1 reference (cosine ≥ 0.9999) is a FUNCTIONAL-level discharge, not PARTIAL_ALGORITHM_LEVEL.

Why this matters: the existing aprender-train Llama370M codepath only empirically exercised GQA-4:1 (kv_heads=4, query_heads=16). Qwen2.5-0.5B (the §49 fine-tune init source) uses GQA-7:1. Without this test, a future refactor of the attention kernel could silently break the 7:1 case while keeping 4:1 working — exactly the §24 silent-failure class. The test runs in <1ms (tiny shape: hidden=112, vocab=256, layers=1).

Drift-prevention: also asserts the GQA ratio at construction time, so a typo in num_attention_heads or num_kv_heads is caught before the forward pass even runs.

Test results (cargo test -p aprender-train --lib transformer::model::tests::falsify_apr_pretrain_arch_004): 1 passed; 0 failed; 0 ignored

Five Whys:
1. Why a smoke test, not a numerical-parity test? PARTIAL_ALGORITHM_LEVEL requires only "compile + run + finite". FUNCTIONAL would require cosine vs reference. Smoke is the right scope for §50.4 step 5e — full parity is a follow-up if FALSIFY-006 (init_loss < 6.0) ever fails on the LIVE 500-step run.
2. Why num_attention_heads=14 (Qwen2.5-0.5B exact) and not e.g. 7 (smaller test model)? The Qwen2.5-0.5B-canonical 14/2=7 ratio is the load-bearing GQA shape. A 7/1 ratio would also be 7:1 but wouldn't exercise the multi-query-head-per-kv-head dispatch on more than one query group. 14/2 forces 2 query groups, each with 7 heads — the actual production shape.
3. Why use_bias=true and tie_word_embeddings=true? Mirror the Qwen2 scaling-law defaults verified by PR #1474 (the `qwen2_0_5b()` HF config check). If the test used the Llama defaults (use_bias=false, tie=false), it wouldn't catch a regression in the bias-add or embedding-tie code paths under the Qwen variant.
4. Why include the all-finite check, not just shape? §24's retrospective showed silent NaN propagation through GQA can produce loss=NaN that the divergence guard catches LATE (multiple steps in). The smoke test catches it at the first forward pass, before any optimizer state corrupts.
5. Why is this a SEPARATE test, not an extension of `test_transformer_tiny_forward`? The existing tiny() config uses defaults that may include GQA=1:1 (no GQA at all). A separate test makes the GQA-7:1 assertion auditable — `cargo test gqa_7_1` finds it directly, and contract drift between this test and FALSIFY-004 is detectable via grep.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50 — MODEL-2 architecture-coupling (PR #1472, MERGED)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- PR #1476 — preflight_tokenizer_vocab_matches_target (in flight)
- feedback_no_guessing.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
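
The shape arithmetic and smoke-level checks described in this commit can be sketched as follows. This is not the aprender-train test: `GqaShape`, `kv_head_for`, and `logits_look_sane` are hypothetical names, and no forward pass is run. The sketch only shows the ratio assertion (14 query heads / 2 KV heads = 7), the two query groups, and the "right length, all finite" check on an output buffer.

```rust
/// Tiny GQA shape from the commit message: 14 query heads, 2 KV heads,
/// head_dim 8, hidden 112. The struct itself is hypothetical.
struct GqaShape {
    hidden: usize,
    num_attention_heads: usize,
    num_kv_heads: usize,
    head_dim: usize,
}

impl GqaShape {
    /// Query heads per KV head, or None if the shape is inconsistent
    /// (for example a typo in one of the head counts).
    fn gqa_ratio(&self) -> Option<usize> {
        let heads_divide =
            self.num_kv_heads > 0 && self.num_attention_heads % self.num_kv_heads == 0;
        let hidden_matches = self.hidden == self.num_attention_heads * self.head_dim;
        if heads_divide && hidden_matches {
            Some(self.num_attention_heads / self.num_kv_heads)
        } else {
            None
        }
    }

    /// Index of the KV head that serves a given query head.
    fn kv_head_for(&self, query_head: usize) -> Option<usize> {
        let ratio = self.gqa_ratio()?;
        if query_head < self.num_attention_heads {
            Some(query_head / ratio)
        } else {
            None
        }
    }
}

/// Smoke-level output check: expected length and no NaN/Inf.
fn logits_look_sane(logits: &[f32], seq_len: usize, vocab_size: usize) -> bool {
    logits.len() == seq_len * vocab_size && logits.iter().all(|x| x.is_finite())
}
```

With 14 query heads and 2 KV heads, query heads 0 through 6 map to KV head 0 and heads 7 through 13 map to KV head 1, which is the two-group case a 7/1 configuration would not exercise.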

… (7/8 falsifiers bound) (#1480)

Same-day continuation cycle landed 8 PRs across the §50.4 architecture-polymorphic infrastructure track. §51 records the cascade-complete state and pinpoints the remaining MODEL-2 ship-% gate (step 5g LIVE).

Falsifier-discharge scoreboard for `apr-pretrain-arch-polymorphic-v1`:

| ID  | What it pins                          | PR    | Status  |
|-----|---------------------------------------|-------|---------|
| 001 | qwen2_0_5b matches HF + tie fix       | #1474 | PARTIAL |
| 002 | init=None preserves Llama370M         | #1475 | PARTIAL |
| 003 | init=Some pass-through                | #1475 | PARTIAL |
| 004 | GQA-7:1 forward smoke                 | #1478 | MERGED  |
| 005 | Qwen tokenizer + Qwen target = pass   | #1476 | MERGED  |
| 006 | Qwen tokenizer + Llama target = fail  | #1476 | MERGED  |
| 007 | encoder/decoder family mismatch       | #1479 | PARTIAL |
| 008 | pv validate                           | #1473 | PARTIAL |

7 of 8 falsifiers PARTIAL_ALGORITHM_LEVEL or MERGED.

Remaining work:
- 5f.2 — wire APR file open + tensor materialization (~80 LOC)
  DELIBERATELY DEFERRED this cycle; doing 5f.2 now means rebasing onto 4 in-flight PRs as they land
- 5g — LIVE 500-step smoke fine-tune (operator dispatch)
  THE LOAD-BEARING TEST that moves MODEL-2 ship-%
- 5h — stamp + publish

Per §47-§48 lesson: "infrastructure shipped ≠ ship-% movement." Cascade-complete state means the polymorphic foundation is in place; ship-% movement still requires the LIVE empirical check.

Five Whys:
1. Why a snapshot now? Multiple PRs in cascade auto-merge create cognitive load. A spec snapshot captures both the achievement (7 falsifiers bound) and the remaining gate (step 5g LIVE). Without it, future operators waste cycles re-deriving the state.
2. Why focus on falsifier scoreboard rather than total LOC? Falsifier discharge is the actual contract obligation. 7/8 invariants pinned means CI now catches regressions in the polymorphic-init path.
3. Why mention 5f.2 explicitly as deliberately deferred? Naming the deferral makes it not a punt. Step 5f.2 has a clear "when": after the 4 in-flight PRs cascade-merge, then 5f.2 lands clean.
4. Why call out infrastructure ≠ ship-%? The §47-§48 cascade taught the same lesson — "11 SHIP-007 cascade PRs landed but no ship-% movement." Operator-facing ship-% is the LIVE check.
5. Why is FALSIFY-006 LIVE the load-bearing claim? init_loss(step=0) ≤ 6.0 vs from_scratch_loss(step=0) ≥ 9.5 proves end-to-end correctness in one number. No other falsifier can substitute.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Spec amendment cadence: §41 → §42 → §43 → §44 → §45 → §46 → §47 → §48 → §49 → §50 → §51. Eleven amendments since 2026-05-03. Same-day spec hygiene rather than letting the cascade-complete state remain implicit.

Refs:
- SPEC-SHIP-TWO-001 §50 — architecture-coupling finding (PR #1472, MERGED)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED)
- PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
- PR #1479 — validate_pretrain_init_arch_compatible (in flight)
- feedback_no_guessing.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…step 5f.1

Adds `pretrain_real::validate_pretrain_init_arch_compatible(cfg)` that fail-fast rejects an init `TransformerConfig` whose architecture family is incompatible with the decoder-only pretrain trainer.

Discharges from `apr-pretrain-arch-polymorphic-v1` (PR #1473):
- FALSIFY-APR-PRETRAIN-ARCH-007 — wrong-arch APR (e.g., CodeBERT/RoBERTa encoder model) is FAIL-FAST not silent-truncate

Why this matters: §49 wires `--init <PATH>` to load weights from any APR file. Without this gate, an operator who points --init at e.g. microsoft/codebert-base.apr would silently load encoder weights into a decoder-shaped trainer, producing nonsense gradients that the divergence guard catches LATE (multiple epochs in). This gate catches the family mismatch BEFORE any trainer allocation.

Step 5f decomposition: this is step 5f.1 — the arch-family gate. Step 5f.2 (~80 LOC, follow-up) does the actual weight materialization into optimizer state. Splitting keeps each PR small + reviewable.

What this PR adds:
1. `pub fn validate_pretrain_init_arch_compatible(cfg: &TransformerConfig) -> Result<(), String>` (~30 LOC including doc comment) at pretrain_real.rs:35
2. 3 unit tests in `pretrain_real::tests`:
   - validate_pretrain_init_arch_accepts_decoder (FALSIFY-007 negative)
   - validate_pretrain_init_arch_rejects_encoder (FALSIFY-007 positive, load-bearing)
   - validate_pretrain_init_arch_accepts_llama370m_baseline (drift-prevention, catches over-rejection regression)

The encoder-rejection test asserts FOUR string contents in the error:
- "FALSIFY-APR-PRETRAIN-ARCH-007" — falsifier id (auditability)
- "Encoder" — names the architecture family
- "decoder-only" — explains why this is wrong
- "RobertaModel" — names the offending hf_architecture

Operator-experience parity: when the gate fires, the error tells the operator exactly what they did wrong + how the trainer differs.

Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::validate_pretrain_init_arch): 3 passed; 0 failed; 0 ignored

Five Whys:
1. Why a separate function rather than baking the check into build_transformer_config? Decoupling: build_transformer_config is a pure pass-through dispatch; adding arch validation would conflate "which config?" with "is this config valid?". Two functions, two concerns, two test surfaces.
2. Why focus this PR on JUST the arch-family check (step 5f.1) and not the full weight materialization (step 5f)? Single-piece flow. Step 5f's full scope (~120 LOC) splits naturally into 5f.1 (this PR, ~30 LOC + 3 tests) + 5f.2 (~80 LOC, the actual weight load). Each PR has its own falsifier discharge; CI catches regressions between them.
3. Why FOUR string assertions in the encoder-rejection error? Each piece of the error text serves a distinct operator need:
   - falsifier id → audit (which contract did this fail?)
   - architecture family → what (encoder vs decoder)
   - "decoder-only" → why (the trainer is decoder-only)
   - hf_architecture → which model (RobertaModel/CodeBERT/...)
   Lossy error messages erode operator trust; the contract pins all four to prevent message rot.
4. Why include the Llama370M baseline drift-prevention test? §24's retrospective showed silent over-rejection (every input rejected, even valid ones) is the symmetric defect to silent under-rejection (every input accepted, even invalid ones). The 3 tests cover both halves of the dispatch.
5. Why is FALSIFY-006 (init_loss < 6.0) NOT yet discharged? That requires the actual weight materialization (step 5f.2) PLUS a LIVE training run (step 5g). Step 5f.1 is just the gate; the load-bearing init_loss measurement is downstream.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50 — MODEL-2 architecture-coupling (PR #1472, MERGED)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- PR #1476 — preflight_tokenizer_vocab_matches_target (in flight)
- PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
- feedback_no_guessing.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
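
A rough sketch of what an arch-family gate of this kind could look like, assuming the init config exposes an architecture family and an `hf_architecture` string. The real function takes `&TransformerConfig`, whose fields are not shown in this commit message, so `ArchFamily`, `InitArchSketch`, `validate_init_arch_sketch`, and the exact error wording are all hypothetical. The error is built to contain the four substrings the encoder-rejection test is said to assert.

```rust
/// Hypothetical architecture families; the real `TransformerConfig`
/// may represent this differently.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArchFamily {
    Decoder,
    Encoder,
}

/// Hypothetical, cut-down stand-in for the init `TransformerConfig`.
struct InitArchSketch {
    family: ArchFamily,
    hf_architecture: String,
}

/// Accept decoder-family configs, reject encoder-family ones before any
/// trainer state would be allocated.
fn validate_init_arch_sketch(cfg: &InitArchSketch) -> Result<(), String> {
    match cfg.family {
        ArchFamily::Decoder => Ok(()),
        // The message carries: falsifier id, family, hf_architecture, and the reason.
        ArchFamily::Encoder => Err(format!(
            "FALSIFY-APR-PRETRAIN-ARCH-007: init architecture family {:?} ({}) \
             is incompatible with the decoder-only pretrain trainer",
            cfg.family, cfg.hf_architecture
        )),
    }
}
```

Keeping the gate as its own function, separate from config construction, matches the "two functions, two concerns" reasoning in the Five Whys above.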

…1481)

Adds the read-half of `apr pretrain --init` weight load: a thin wrapper over `aprender::format::converter::load_model_tensors` that returns a `BTreeMap<String, (Vec<f32>, Vec<usize>)>` of tensor blobs keyed by HF naming convention.

Per `apr-pretrain-arch-polymorphic-v1` §init_load_semantics (PR #1473): "Loader is REUSED, not reimplemented." This function does not duplicate APR parsing — it forwards to the same machinery `apr export` and `apr inspect` use.

Discharges from `apr-pretrain-arch-polymorphic-v1`:
- §init_load_semantics invariant (loader reuse): satisfied
- FALSIFY-006 (init_loss < 6.0) at READ-COMPILE-BIND level

Step 5f decomposition:
- 5f.1 (PR #1479): encoder/decoder family validator (~30 LOC)
- 5f.2 (this PR): APR file open + tensor read (~30 LOC + 2 tests)
- 5f.3 (next): populate trainer parameters from BTreeMap (~50 LOC)
- 5g (operator): LIVE 500-step fine-tune → DISCHARGES MODEL-2 ship-%

Step 5f.2 is intentionally narrow — it only does the READ. Population into trainer parameter slots (5f.3) reconciles HF naming convention (e.g., `model.embed_tokens.weight`) against the trainer's internal parameter naming. That's a separate concern with its own falsifier.

What this PR adds:
1. `pub fn load_init_tensors_from_apr(path) -> Result<BTreeMap<...>>` at pretrain_real.rs:35 (~25 LOC including doc comment)
2. 2 unit tests in `pretrain_real::tests`:
   - load_init_tensors_missing_file_errors_with_falsifier_id (FALSIFY-006 fail-fast path; asserts error message contains falsifier id + offending path for operator-experience)
   - load_init_tensors_signature_compile_bind (drift-prevention: catches a future signature change that would break step 5f.3's BTreeMap consumer)

Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::load_init_tensors): 2 passed; 0 failed; 0 ignored

Five Whys:
1. Why decompose step 5f.2 to JUST the read? Single-piece flow. Read → Validate → Populate are three distinct concerns. Step 5f.1 did validation (#1479); 5f.2 does read; 5f.3 will do populate. Each PR has one falsifier discharge story.
2. Why use load_model_tensors and not write a new parser? The contract pins "Loader is reused, not reimplemented." Writing a new parser would create a parallel format-decoder that drifts from the canonical one. The same lesson as the LAYOUT-001/002 hits — parallel format code paths produce silent format-drift bugs.
3. Why return BTreeMap<String, (Vec<f32>, Vec<usize>)> rather than a trainer-parameter-shaped struct? Decoupling: the read shouldn't know about TransformerTrainer's internal parameter names. Step 5f.3's job is to map HF names → trainer slots; if 5f.2 baked that mapping in, every change to TransformerTrainer would break the read.
4. Why include the signature-compile-bind test? It's a compile-time check that drives step 5f.3's expectations. If a future refactor changes the return type (e.g., from BTreeMap to HashMap, or from Vec<usize> to Box<[usize]>), step 5f.3's consumer code stops compiling — caught here, not at the integration point.
5. Why is FALSIFY-006 NOT yet at PARTIAL_ALGORITHM_LEVEL after this PR? Because step 5f.2 only does the read; FALSIFY-006 requires the LIVE init_loss < 6.0 check, which needs steps 5f.3 + 5g. This PR moves FALSIFY-006 from UNBOUND → READ-COMPILE-BIND, a sub-level of PARTIAL_ALGORITHM_LEVEL. Full PARTIAL discharge happens at 5f.3 when the populate step exists.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50, §51 — MODEL-2 architecture-coupling + cascade snapshot (PR #1472, #1480 MERGED)
- PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474 — qwen2_0_5b tie_word_embeddings fix (MERGED)
- PR #1475 — build_transformer_config polymorphic dispatch (in flight)
- PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED)
- PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
- PR #1479 — validate_pretrain_init_arch_compatible (in flight)
- feedback_no_guessing.md
- feedback_falsifier_first_cascade_pattern.md (this turn's pattern)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
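
The read-half wrapper and its two tests might be sketched roughly like this. It is an illustration under assumptions, not the code in this PR: `canonical_loader_stub` stands in for the real `load_model_tensors` call and returns a fake tensor instead of parsing a file, the `&Path` parameter type, the `String` error type, and the exact falsifier-id string in the error are guesses, and `load_init_tensors_sketch` is a hypothetical name.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Return type quoted in the commit message: tensor name -> (data, shape).
type InitTensors = BTreeMap<String, (Vec<f32>, Vec<usize>)>;

/// Hypothetical stand-in for the canonical loader that the real wrapper
/// forwards to; it returns one fake tensor instead of parsing a file.
fn canonical_loader_stub(_path: &Path) -> Result<InitTensors, String> {
    let mut tensors = InitTensors::new();
    tensors.insert(
        "model.embed_tokens.weight".to_string(),
        (vec![0.0; 6], vec![2, 3]),
    );
    Ok(tensors)
}

/// Sketch of the read half: fail fast on a missing file, otherwise delegate
/// to the shared loader rather than re-implementing any parsing.
fn load_init_tensors_sketch(path: &Path) -> Result<InitTensors, String> {
    if !path.exists() {
        // Assumed message shape: falsifier id plus the offending path.
        return Err(format!(
            "FALSIFY-APR-PRETRAIN-ARCH-006: init APR file not found: {}",
            path.display()
        ));
    }
    canonical_loader_stub(path)
}

/// Compile-time bind: this line stops compiling if the signature drifts,
/// for example if the map type or the shape type changes.
const _SIGNATURE_BIND: fn(&Path) -> Result<InitTensors, String> = load_init_tensors_sketch;
```

The `const` binding plays the role of the signature-compile-bind test: it has no runtime behavior, but a change to the return type breaks the build at this line instead of at the later populate step.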

…step 5f.1 (#1479)

Adds `pretrain_real::validate_pretrain_init_arch_compatible(cfg)` that fail-fast rejects an init `TransformerConfig` whose architecture family is incompatible with the decoder-only pretrain trainer.

Discharges from `apr-pretrain-arch-polymorphic-v1` (PR #1473):
- FALSIFY-APR-PRETRAIN-ARCH-007: wrong-arch APR (e.g., CodeBERT/RoBERTa encoder model) is FAIL-FAST not silent-truncate

Why this matters: §49 wires `--init <PATH>` to load weights from any APR file. Without this gate, an operator who points --init at e.g. microsoft/codebert-base.apr would silently load encoder weights into a decoder-shaped trainer, producing nonsense gradients that the divergence guard catches LATE (multiple epochs in). This gate catches the family mismatch BEFORE any trainer allocation.

Step 5f decomposition: this is step 5f.1, the arch-family gate. Step 5f.2 (~80 LOC, follow-up) does the actual weight materialization into optimizer state. Splitting keeps each PR small + reviewable.

What this PR adds:
1. `pub fn validate_pretrain_init_arch_compatible(cfg: &TransformerConfig) -> Result<(), String>` (~30 LOC including doc comment) at pretrain_real.rs:35
2. 3 unit tests in `pretrain_real::tests`:
   - validate_pretrain_init_arch_accepts_decoder (FALSIFY-007 negative)
   - validate_pretrain_init_arch_rejects_encoder (FALSIFY-007 positive, load-bearing)
   - validate_pretrain_init_arch_accepts_llama370m_baseline (drift-prevention, catches over-rejection regression)

The encoder-rejection test asserts FOUR string contents in the error:
- "FALSIFY-APR-PRETRAIN-ARCH-007": falsifier id (auditability)
- "Encoder": names the architecture family
- "decoder-only": explains why this is wrong
- "RobertaModel": names the offending hf_architecture

Operator-experience parity: when the gate fires, the error tells the operator exactly what they did wrong + how the trainer differs.

Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::validate_pretrain_init_arch): 3 passed; 0 failed; 0 ignored

Five Whys:
1. Why a separate function rather than baking the check into build_transformer_config? Decoupling: build_transformer_config is a pure pass-through dispatch; adding arch validation would conflate "which config?" with "is this config valid?". Two functions, two concerns, two test surfaces.
2. Why focus this PR on JUST the arch-family check (step 5f.1) and not the full weight materialization (step 5f)? Single-piece flow. Step 5f's full scope (~120 LOC) splits naturally into 5f.1 (this PR, ~30 LOC + 3 tests) + 5f.2 (~80 LOC, the actual weight load). Each PR has its own falsifier discharge; CI catches regressions between them.
3. Why FOUR string assertions in the encoder-rejection error? Each piece of the error text serves a distinct operator need:
   - falsifier id → audit (which contract did this fail?)
   - architecture family → what (encoder vs decoder)
   - "decoder-only" → why (the trainer is decoder-only)
   - hf_architecture → which model (RobertaModel/CodeBERT/...)
   Lossy error messages erode operator trust; the contract pins all four to prevent message rot.
4. Why include the Llama370M baseline drift-prevention test? §24's retrospective showed silent over-rejection (every input rejected, even valid ones) is the symmetric defect to silent under-rejection (every input accepted, even invalid ones). The 3 tests cover both halves of the dispatch.
5. Why is FALSIFY-006 (init_loss < 6.0) NOT yet discharged? That requires the actual weight materialization (step 5f.2) PLUS a LIVE training run (step 5g). Step 5f.1 is just the gate; the load-bearing init_loss measurement is downstream.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57%; first ship-% movement gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
- SPEC-SHIP-TWO-001 §50: MODEL-2 architecture-coupling (PR #1472, MERGED)
- PR #1473: apr-pretrain-arch-polymorphic-v1 contract (in flight)
- PR #1474: qwen2_0_5b tie_word_embeddings fix (in flight)
- PR #1475: build_transformer_config polymorphic dispatch (in flight)
- PR #1476: preflight_tokenizer_vocab_matches_target (in flight)
- PR #1478: GQA-7:1 forward-pass smoke test (MERGED)
- feedback_no_guessing.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
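To make the gate's contract concrete, here is a minimal sketch of an arch-family check whose error carries the four pinned substrings. It is not the code from pretrain_real.rs: the `ArchFamily` enum and the two-field `TransformerConfig` below are hypothetical stand-ins, since the real struct's fields are not shown in this PR, and the exact error wording is an assumption.

```rust
/// Hypothetical stand-in for the architecture family recorded in an init config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchFamily {
    Decoder,
    Encoder,
}

/// Hypothetical, trimmed-down stand-in for `TransformerConfig`.
/// The real struct's layout is not shown in the source.
#[derive(Debug, Clone)]
pub struct TransformerConfig {
    pub family: ArchFamily,
    pub hf_architecture: String,
}

/// Rejects any init config that is not decoder-shaped, so the mismatch
/// surfaces before a trainer is allocated.
pub fn validate_pretrain_init_arch_compatible(cfg: &TransformerConfig) -> Result<(), String> {
    match cfg.family {
        ArchFamily::Decoder => Ok(()),
        // The message names the falsifier id, the family, the reason
        // ("decoder-only"), and the offending hf_architecture.
        ArchFamily::Encoder => Err(format!(
            "FALSIFY-APR-PRETRAIN-ARCH-007: init architecture family {:?} ({}) \
             is incompatible with the decoder-only pretrain trainer",
            cfg.family, cfg.hf_architecture
        )),
    }
}
```

Under these assumptions, a config with `family: ArchFamily::Encoder` and `hf_architecture: "RobertaModel"` yields an `Err` containing all four strings the encoder-rejection test looks for, while any decoder config passes through as `Ok(())`.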

…PARTIAL_ALGORITHM_LEVEL — §50.4 cascade snapshot (#1482)

## Summary

Bump `apr-pretrain-arch-polymorphic-v1` contract status from PROPOSED to PARTIAL_ALGORITHM_LEVEL. All 8 FALSIFY-APR-PRETRAIN-ARCH-* falsifiers are now bound to executable tests across the §50.4 cascade.

## Falsifier scoreboard (post-§51 snapshot)

| ID | Rule | PR | Status |
|---|---|---|---|
| FALSIFY-001 | qwen2_0_5b matches HF config | #1474 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-002 | build_transformer_config(None) → Llama370M | #1475 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-003 | build_transformer_config(Some) extracts 10 | #1475 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-004 | GQA-7:1 forward-pass smoke | #1478 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-005 | Qwen tokenizer passes with --init Qwen | #1476 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-006 | Qwen tokenizer fails without --init | #1476 merged | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-007 | encoder-arch APR fail-fast | #1479 open (auto-merge armed) | PARTIAL_ALGORITHM_LEVEL |
| FALSIFY-008 | contract self-validates via pv | this PR (validates clean) | PARTIAL_ALGORITHM_LEVEL |

## Test plan

- [x] pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml exits 0
- [x] All 8 falsifiers cite a concrete test path or PR
- [x] Changelog entry under metadata.changelog with version/date/change

## Why now

Per `feedback_falsifier_first_cascade_pattern.md`: when a saturated auto-merge queue (≥4 PRs) blocks more impl PRs, switch to non-conflict work. This contract bump:
- touches only one YAML file (no Rust/test source)
- cannot conflict with #1479 / #1481 (impl PRs)
- audit-trails the cascade scoreboard

Promotion to FUNCTIONAL is gated on #1479 landing (FALSIFY-007 PASS). Promotion to DISCHARGED is gated on §50.4 step 5g LIVE empirical run.

## Five Whys

1. Why bump status now? 7/8 falsifiers bound on main + 8th bound on open PR; PROPOSED is stale.
2. Why not wait for #1479 to land first? §51 snapshot recorded "7/8 PARTIAL bound" 2 hours ago; the 8th binding is the contract-self validation, which is met by THIS PR's `pv validate` output.
3. Why not bundle with #1479? Different file, different review scope, different concern (status semantics vs. impl).
4. Why not skip the bump? Operator-facing scoreboard is in the YAML; stale PROPOSED implies "not yet started", which contradicts §51.
5. Why YAML changelog instead of just version? Changelog records THIS bump's reasoning so future operators don't re-derive it from git log.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ION-COMPLETE; contract v1.1.0 → v1.2.0 FUNCTIONAL (#1495)

§50.4 cascade INTEGRATION-COMPLETE on main with PR #1494 merging at 2026-05-05T01:48:14Z. The `apr pretrain --init <PATH>` flow is now end-to-end functional on CPU; the legacy "not yet wired" Err is RETIRED; step 5g LIVE is the only remaining gate before MODEL-2 ship-% can move from 57% → ≥58%.

Spec amendment §53:
- Updated falsifier scoreboard: 6/8 INTEGRATION (001/002/003/005/006/007 via live CLI dispatch); 2/8 PARTIAL_ALGORITHM_LEVEL (004 forward-pass smoke + 008 contract validation are inherently algorithm-level).
- Step roadmap: 5a-5f.4 ✅ MERGED; 5f.5 (CUDA wireup) NOT YET STARTED; 5g (LIVE 500-step fine-tune) operator-dispatchable on RTX 4090.
- Cascade ships statistics: 11 PRs over 2 days (#1471/#1472/#1473/#1474/#1475/#1476/#1478/#1479/#1481/#1482/#1483/#1486/#1494).
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% (gated on 5g empirical val_loss < 9.38 evidence).
- 3 CI andon classes documented as feedback memories during cascade (workspace-test missing-binary, trueno SIGSEGV-on-cleanup, auto-merge behind-state).

Contract apr-pretrain-arch-polymorphic-v1 v1.1.0 → v1.2.0 FUNCTIONAL:
- All 8 falsifiers PASS on main; 6/8 reach INTEGRATION via the user-facing `apr pretrain --init` flow.
- verification_summary updated: tested 7 → 8; status partial → functional.
- Added §52 + §53 references.
- Promotion to DISCHARGED still requires §50.4 step 5g LIVE empirical 500-step fine-tune on canonical Qwen2.5-Coder-0.5B-Instruct.apr producing val_loss < 9.38.

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1494 merge commit 9afca16

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…requisites + live preflight smoke (#1496)

§53 closed with "step 5g LIVE remains" framing 5g as a single operator dispatch. Live source inspection of the post-#1494 binary plus an actual smoke run revealed step 5g has multi-step prerequisites that were NOT enumerated in §50's original 8-step decomposition.

Live empirical smoke on canonical inputs:

  apr pretrain --init <Qwen2.5-Coder-0.5B-Instruct-fp16.apr> --tokenizer <legacy 50257-vocab dir> --dataset <legacy codeparrot shards>
  → CORRECT FAIL-FAST: GATE-ARCH-370M-011 (INV-ARCH-370M-006) violated: tokenizer vocab_size (50257) != model vocab_size (151936)

This is the FIRST end-to-end runtime evidence that the §50.4 cascade's polymorphic preflight (PR #1476 + #1494) works in the user-facing CLI:
- Read --init APR metadata: vocab=151936, hidden=896, layers=24
- target_vocab = init_arch.vocab_size = 151936 (NOT legacy 50257)
- Tokenizer dir vocab.json count = 50257
- Mismatch → fail-fast before trainer allocation

But the smoke also surfaces 5g's true scope. A Qwen-vocab tokenizer dir + Qwen-tokenized corpus must exist BEFORE the preflight passes. Neither exists on this host today.

Step 5g re-scoped:
- 5g.0: Qwen tokenizer extraction (~50 LOC, ~5min wall) [next PR]
- 5g.1: Qwen-tokenized corpus (0 LOC, ~10hr wall, operator-dispatch)
- 5g.2: LIVE 500-step fine-tune (0 LOC, ~20-60min, operator-dispatch)
- 5g.3: val_loss < 9.38 verdict; flip MODEL-2 ship % 57% → ≥58%

Methodology takeaway: top-down spec planning consistently underestimates scope-coupling between heterogeneous code paths. This is the third instance of the same lesson:
- §50 found §49's "0 LOC" was 8-step (architectural coupling)
- §52 found §50's "5f weight load" was 2-step (CLI dispatch coupling)
- §54 found §53's "5g LIVE" is 4-step (tokenizer-format coupling)

Falsifier scoreboard impact:
- FALSIFY-APR-PRETRAIN-ARCH-005/006 reach LIVE-INTEGRATION level (proven via real CLI dispatch, not just unit tests)
- Contract `apr-pretrain-arch-polymorphic-v1` v1.2.0 FUNCTIONAL is reinforced; promotion to DISCHARGED waits for 5g.3 val_loss measurement

Net effects:
- Spec v2.98.0 → v2.99.0
- MODEL-1 ship % unchanged at 91%
- MODEL-2 ship % unchanged at 57% (gated on 5g.3)
- Coverage tally: snapshot, no contract status flip

Refs: SPEC-SHIP-TWO-001 §50.4 step 5g, PR #1476 + #1494, evidence/section-54-5g-prereqs-2026-05-05/preflight-fail-fast-smoke.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
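The preflight decision traced in the smoke run above can be sketched in a few lines. This is an illustration under stated assumptions, not the apr-cli implementation: the function names are hypothetical, the tokenizer's `vocab.json` entry count is passed in as a plain number instead of being read from disk, and the baseline vocab size is a caller-supplied parameter because its value is not stated in the source. Only the error text is taken from the smoke output.

```rust
/// Picks the preflight target: the init architecture's vocab size when
/// `--init` is given, otherwise the caller-supplied from-scratch baseline.
pub fn target_vocab_size(init_vocab_size: Option<usize>, baseline_vocab_size: usize) -> usize {
    init_vocab_size.unwrap_or(baseline_vocab_size)
}

/// Compares the tokenizer's vocabulary size with the target model's.
/// `tokenizer_vocab_size` stands in for the entry count of `vocab.json`;
/// file reading is omitted to keep the sketch std-only.
pub fn check_vocab_matches_target(
    tokenizer_vocab_size: usize,
    target_vocab_size: usize,
) -> Result<(), String> {
    if tokenizer_vocab_size == target_vocab_size {
        Ok(())
    } else {
        // Both sizes are named so the operator can see which side is off.
        Err(format!(
            "GATE-ARCH-370M-011 (INV-ARCH-370M-006) violated: \
             tokenizer vocab_size ({tokenizer_vocab_size}) != model vocab_size ({target_vocab_size})"
        ))
    }
}
```

With an init vocab of 151_936 and a 50_257-entry tokenizer, `check_vocab_matches_target(50_257, target_vocab_size(Some(151_936), baseline))` returns the same mismatch message the smoke run printed, whatever baseline is supplied, because the init value takes precedence.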

…-005/006 test-reference drift (#1505)

Same drift class as PR #1504 caught in apr-pretrain-from-init-v1. Test names cited in v1.1.0 changelog never matched the actual tests PR #1476 authored. Drift survived three intervening bumps (v1.1→v1.2→v1.3→v1.4) because each focused on adding new falsifiers, not auditing existing bindings.

## Drift inventory

| Falsifier | v1.4.0 cited test | Exists? | Actual test |
|---|---|---|---|
| FALSIFY-005 | preflight_qwen_vocab_passes_with_qwen_init | ❌ | preflight_qwen_vocab_passes_with_qwen_target |
| FALSIFY-006 | preflight_qwen_vocab_fails_without_init | ❌ | preflight_qwen_vocab_fails_with_llama_target |

## Resolution

Update the `test:` field for FALSIFY-005 and FALSIFY-006 to reference the actual tests authored by PR #1476. No falsifier semantics change. No new tests added.

## Verification

$ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_passes_with_qwen_target
test result: ok. 1 passed; ...
$ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_fails_with_llama_target
test result: ok. 1 passed; ...
$ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
0 error(s), 0 warning(s)

## Five Whys

1. Why did the drift survive 3 bumps? Each bump (v1.2/v1.3/v1.4) focused on ADDING new content (CUDA-001, relaxed bound, etc.); none audited existing bindings.
2. Why didn't the §50.4 cascade catch this? The cascade authored tests; the contract was authored separately. Names diverged at the boundary; no cross-check landed.
3. Why is this a contract-only fix (no source change)? The tests exist and pass: the IMPL is correct. Only the contract's text reference needed correction.
4. Why bump to v1.5.0 (not v1.4.1 patch)? Same logic as PR #1504: the test-binding INVARIANT (every cited test exists) was broken in v1.4.0. v1.5.0 restores it.
5. Why is this important if the impl is correct? Per feedback_no_guessing.md, contracts that cite non-existent tests are unfalsifiable: future agents reading the contract get a false signal that the falsifier is bound. PV-VER-001 lint will catch this; better to fix it than wait for the lint engine to flag.

## Net effects

- Contract v1.4.0 → v1.5.0 FUNCTIONAL.
- 11 falsifiers, all PASS: same count, but FALSIFY-005/006 now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is hygiene work while 5g.1 (~12hr) corpus retokenize runs. Same defect class as PR #1504; together they close the test-reference drift across both pretrain contracts.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade (PRs #1473-#1494, #1502), contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.5.0, contracts/apr-pretrain-from-init-v1.yaml v1.2.0 (PR #1504, sibling fix)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
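The test-binding invariant (every cited test exists) lends itself to a mechanical check. The sketch below is a hypothetical illustration of that cross-check, not the PV-VER-001 lint: the function name is invented, and both the contract's `test:` entries and the set of existing test names are assumed to have been extracted already.

```rust
use std::collections::BTreeSet;

/// Returns every (falsifier id, cited test) pair whose cited test name
/// does not appear in `existing_tests`.
pub fn dangling_test_refs<'a>(
    cited: &[(&'a str, &'a str)],
    existing_tests: &BTreeSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    cited
        .iter()
        .copied()
        // Keep only the bindings that point at a test that does not exist.
        .filter(|(_, test)| !existing_tests.contains(*test))
        .collect()
}
```

Fed the v1.4.0 bindings from the drift inventory and the two test names PR #1476 actually authored, this returns both FALSIFY-005 and FALSIFY-006 as dangling; fed the v1.5.0 bindings, it returns an empty list.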

…oughput characterization (#1508)

§56 closed with 5g.1 full-corpus retokenization dispatched (PID 2767124, ~17hr wall projected). §57 records the parallel drift-sweep work that landed during the 5g.1 wait + throughput characterization of 5g.1 mid-run.

## Drift sweep (4 PRs)

While 5g.1 ran in the background, a sweep of the §50.4 cascade contracts surfaced THE SAME drift class across multiple contracts: cited test names that didn't match what the impl PR actually authored.

PR | Contract | v_old → v_new | Drift
--- | --- | --- | ---
#1502 | apr-pretrain-arch-polymorphic-v1 | v1.3 → v1.4 | CUDA-001 was REFERENCED in changelog but had no formal falsification_test entry
#1504 | apr-pretrain-from-init-v1 | v1.1 → v1.2 | 7 of 8 cited test names didn't exist; re-aligned to existing tests
#1505 | apr-pretrain-arch-polymorphic-v1 | v1.4 → v1.5 | FALSIFY-005/006 cited names diverged from PR #1476's actual authoring
#1506 | apr-cli-tokenize-import-hf-v1 | v1.0 → v1.1 | FALSIFY-001 cited "or equivalent", no real test name

After PR #1506 lands, `pv lint contracts/` reports 0 PV-VER-001 errors across all 870+ contracts. The drift class is fully closed.

## 5g.1 throughput (real-time mid-run)

Shard | Closed at | Δ from prev
--- | --- | ---
0 | 07:08 | (start)
1 | 07:24 | 16 min
2 | 07:39 | 15 min
3 | 07:55 | 16 min
... | |
12 | 10:16 | (in progress)

Mean wall: 16.3 min/shard. Linear projection: 57 shards × 16.3 min = 929 min = ~15.5 hr total → ETA ~22:30Z (slightly under §56's 17hr smoke estimate).

## Methodology takeaway

When a contract is authored in PR_A alongside its impl, AND the impl's test names are stamped in the contract's `test:` field BEFORE the impl PR finalizes the names, the names diverge at the cascade boundary. Happened in 3 of 4 §50.4 cascade contracts.

Prevention rule: when authoring a new contract that cites tests, EITHER reference tests that already exist on main, OR mark them `PENDING_PR_<N>:` with the impl PR ref so PV-VER-001 lint can flag dangling refs at contract-merge time.

A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that blocks contract merge when any `test:` field doesn't resolve to an existing test invocation. Out of §57 scope.

## Net effects

- Spec v3.01.0 → v3.02.0.
- Three contract bumps land cleanly (apr-pretrain-arch-polymorphic-v1 v1.3→v1.4→v1.5, apr-pretrain-from-init-v1 v1.1→v1.2, apr-cli-tokenize-import-hf-v1 v1.0→v1.1).
- pv lint contracts/ 0 PV-VER-001 errors across 870+ contracts.
- 5g.1 full corpus run progressing at 16.3 min/shard; ETA ~22:30Z.
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% until step 5g.3 produces val_loss < 9.38.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PRs #1502/#1504/#1505/#1506 (drift sweep), apr-cookbook spec v5.1.0 (companion update, operator-facing recipe)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
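The linear projection quoted for 5g.1 is plain arithmetic, and a small helper makes it easy to redo as the mean shard time shifts. The function names below are hypothetical; only the inputs (57 shards, 16.3 min/shard) come from the commit message.

```rust
/// Projects total wall time in minutes from a mean per-shard time,
/// assuming every shard takes the mean (a linear extrapolation).
pub fn projected_total_minutes(shards: u32, mean_minutes_per_shard: f64) -> f64 {
    f64::from(shards) * mean_minutes_per_shard
}

/// Converts minutes to fractional hours.
pub fn minutes_to_hours(minutes: f64) -> f64 {
    minutes / 60.0
}
```

For 57 shards at 16.3 min/shard this gives 929.1 minutes, or about 15.5 hours, matching the figure in the throughput section.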

noahgift enabled auto-merge (squash) May 4, 2026 15:40

noahgift mentioned this pull request May 4, 2026

test(aprender-train): GQA-7:1 forward-pass smoke test — §50.4 step 5e #1478

Merged

Merge branch 'main' into feat/preflight-tokenizer-vocab-polymorphic

65a9d88

noahgift mentioned this pull request May 4, 2026

feat(aprender-train): validate_pretrain_init_arch_compatible — §50.4 step 5f.1 #1479

Merged

noahgift merged commit 2e8044a into main May 4, 2026
10 checks passed

noahgift deleted the feat/preflight-tokenizer-vocab-polymorphic branch May 4, 2026 16:46

noahgift mentioned this pull request May 4, 2026

spec(ship-two-models): v2.95.0 → v2.96.0 — §51 §50.4 cascade snapshot (7/8 falsifiers bound) #1480

Merged

noahgift mentioned this pull request May 4, 2026

feat(aprender-train): load_init_tensors_from_apr — §50.4 step 5f.2 #1481

Merged

noahgift mentioned this pull request May 4, 2026

contract(apr-pretrain-arch-polymorphic-v1): v1.0.0 → v1.1.0 PARTIAL_ALGORITHM_LEVEL #1482

Merged

3 tasks

noahgift mentioned this pull request May 4, 2026

spec(ship-two-models): v2.96.0 → v2.97.0 — §52 cascade ALGORITHM-COMPLETE + 5f.4 wireup gap #1486

Merged

4 tasks

noahgift mentioned this pull request May 5, 2026

spec(ship-two-models): v2.97 → v2.98 — §53 §50.4 cascade INTEGRATION-COMPLETE; contract v1.1 → v1.2 FUNCTIONAL #1495

Merged

4 tasks

noahgift mentioned this pull request May 5, 2026

contract(apr-pretrain-arch-polymorphic-v1): v1.4 → v1.5 — fix FALSIFY-005/006 test-reference drift #1505

Merged

5 tasks

noahgift mentioned this pull request May 11, 2026

fix(task-148): Toyota Way 500-line refactor + FALSIFY-CORPUS-004 + QLoRA + GPU training backend #1003

Closed

6 tasks

noahgift merged 2 commits into main from feat/preflight-tokenizer-vocab-polymorphic
