
spec(ship-two-models): v2.95.0 → v2.96.0 — §51 §50.4 cascade snapshot (7/8 falsifiers bound)#1480

Merged
noahgift merged 1 commit into main from spec/ship-two-models-v2-96-cascade-complete
May 4, 2026
@noahgift noahgift commented May 4, 2026


Same-day continuation cycle landed 8 PRs across the §50.4 architecture-polymorphic infrastructure track. §51 records the cascade-complete state and pinpoints the remaining MODEL-2 ship-% gate (step 5g LIVE). 7/8 falsifiers in apr-pretrain-arch-polymorphic-v1 are now PARTIAL_ALGORITHM_LEVEL or MERGED.

Remaining:
- step 5f.2: APR weight load + tensor materialization, ~80 LOC, deliberately deferred to let the cascade settle
- step 5g: LIVE 500-step fine-tune, operator-runnable, the load-bearing test that moves MODEL-2 ship-%
- step 5h: stamp + publish

Per §47-§48 lesson: infrastructure ship != ship-% movement. Plain ship-%: MODEL-1=91%, MODEL-2=57% (unchanged; gated on step 5g).

Refs: PRs #1472/#1476/#1478 MERGED, #1473/#1474/#1475/#1479 in flight.

… (7/8 falsifiers bound)

Same-day continuation cycle landed 8 PRs across the §50.4 architecture-
polymorphic infrastructure track. §51 records the cascade-complete
state and pinpoints the remaining MODEL-2 ship-% gate (step 5g LIVE).

Falsifier-discharge scoreboard for `apr-pretrain-arch-polymorphic-v1`:

  | ID  | What it pins                         | PR    | Status  |
  |-----|--------------------------------------|-------|---------|
  | 001 | qwen2_0_5b matches HF + tie fix      | #1474 | PARTIAL |
  | 002 | init=None preserves Llama370M        | #1475 | PARTIAL |
  | 003 | init=Some pass-through               | #1475 | PARTIAL |
  | 004 | GQA-7:1 forward smoke                | #1478 | MERGED  |
  | 005 | Qwen tokenizer + Qwen target = pass  | #1476 | MERGED  |
  | 006 | Qwen tokenizer + Llama target = fail | #1476 | MERGED  |
  | 007 | encoder/decoder family mismatch      | #1479 | PARTIAL |
  | 008 | pv validate                          | #1473 | PARTIAL |

7 of 8 falsifiers PARTIAL_ALGORITHM_LEVEL or MERGED.

Remaining work:
  - 5f.2 — wire APR file open + tensor materialization (~80 LOC)
           DELIBERATELY DEFERRED this cycle; doing 5f.2 now means
           rebasing onto 4 in-flight PRs as they land
  - 5g  — LIVE 500-step smoke fine-tune (operator dispatch)
          THE LOAD-BEARING TEST that moves MODEL-2 ship-%
  - 5h  — stamp + publish

Per §47-§48 lesson: "infrastructure shipped ≠ ship-% movement."
Cascade-complete state means the polymorphic foundation is in place;
ship-% movement still requires the LIVE empirical check.

Five Whys:
  1. Why a snapshot now? Multiple PRs in cascade auto-merge create
     cognitive load. A spec snapshot captures both the achievement
     (7 falsifiers bound) and the remaining gate (step 5g LIVE).
     Without it, future operators waste cycles re-deriving the state.
  2. Why focus on falsifier scoreboard rather than total LOC? Falsifier
     discharge is the actual contract obligation. 7/8 invariants pinned
     means CI now catches regressions in the polymorphic-init path.
  3. Why mention 5f.2 explicitly as deliberately deferred? Naming the
     deferral makes it not a punt. Step 5f.2 has a clear "when": after
     the 4 in-flight PRs cascade-merge, then 5f.2 lands clean.
  4. Why call out infrastructure ≠ ship-%? The §47-§48 cascade taught
     the same lesson — "11 SHIP-007 cascade PRs landed but no ship-%
     movement." Operator-facing ship-% is the LIVE check.
  5. Why is FALSIFY-006 LIVE the load-bearing claim? init_loss(step=0)
     ≤ 6.0 vs from_scratch_loss(step=0) ≥ 9.5 proves end-to-end
     correctness in one number. No other falsifier can substitute.
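
The FALSIFY-006 LIVE claim in item 5 reduces to a two-sided threshold check. A minimal sketch, assuming both step-0 losses are available as `f32` values; the function and constant names are hypothetical and do not come from the contract:

```rust
/// Thresholds quoted in the Five Whys above (item 5).
const INIT_LOSS_MAX: f32 = 6.0;
const SCRATCH_LOSS_MIN: f32 = 9.5;

/// HYPOTHETICAL helper: returns true only when the initialized run starts at
/// or below 6.0 AND the from-scratch run starts at or above 9.5.
/// Non-finite losses (NaN, inf) never pass.
pub fn init_load_is_effective(init_loss_step0: f32, from_scratch_loss_step0: f32) -> bool {
    init_loss_step0.is_finite()
        && from_scratch_loss_step0.is_finite()
        && init_loss_step0 <= INIT_LOSS_MAX
        && from_scratch_loss_step0 >= SCRATCH_LOSS_MIN
}
```

Checking both sides guards against a run where the baseline is unexpectedly low, which would make a low initialized loss uninformative.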

Plain ship-% update:
  - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
  - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4
    step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Spec amendment cadence: §41 → §42 → §43 → §44 → §45 → §46 → §47 →
§48 → §49 → §50 → §51. Eleven amendments since 2026-05-03. This is same-day
spec hygiene: the cascade-complete state is recorded rather than left
implicit.

Refs:
  - SPEC-SHIP-TWO-001 §50 — architecture-coupling finding (PR #1472, MERGED)
  - PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
  - PR #1474 — qwen2_0_5b tie_word_embeddings fix (in flight)
  - PR #1475 — build_transformer_config polymorphic dispatch (in flight)
  - PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED)
  - PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
  - PR #1479 — validate_pretrain_init_arch_compatible (in flight)
  - feedback_no_guessing.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 4, 2026 16:51
@noahgift noahgift merged commit 27e530d into main May 4, 2026
11 checks passed
@noahgift noahgift deleted the spec/ship-two-models-v2-96-cascade-complete branch May 4, 2026 17:17
noahgift added a commit that referenced this pull request May 4, 2026
…1481)

Adds the read-half of `apr pretrain --init` weight load: a thin
wrapper over `aprender::format::converter::load_model_tensors` that
returns a `BTreeMap<String, (Vec<f32>, Vec<usize>)>` of tensor blobs
keyed by HF naming convention.

Per `apr-pretrain-arch-polymorphic-v1` §init_load_semantics (PR #1473):
"Loader is REUSED, not reimplemented." This function does not duplicate
APR parsing — it forwards to the same machinery `apr export` and
`apr inspect` use.
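
The "reused, not reimplemented" shape can be sketched as a wrapper that forwards to the shared loader and only decorates the error. This is an illustration under stated assumptions: the real signature of `load_model_tensors` is not shown here, so `load_model_tensors_stub` is a hypothetical stand-in, and the `String` error type and the exact message wording are assumptions as well.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Tensor blobs keyed by HF-style name: (flat f32 data, shape).
pub type TensorMap = BTreeMap<String, (Vec<f32>, Vec<usize>)>;

/// HYPOTHETICAL stand-in for the shared loader named in the commit message.
/// It only checks that the file exists and returns one dummy tensor so the
/// sketch compiles and runs without the aprender crate.
fn load_model_tensors_stub(path: &Path) -> Result<TensorMap, String> {
    if !path.is_file() {
        return Err(format!("cannot open {}", path.display()));
    }
    let mut tensors = TensorMap::new();
    tensors.insert(
        "model.embed_tokens.weight".to_string(),
        (vec![0.0; 8], vec![2, 4]),
    );
    Ok(tensors)
}

/// Thin wrapper: no parsing of its own. It forwards to the loader and adds
/// the falsifier id plus the offending path to any error.
pub fn load_init_tensors_from_apr(path: &Path) -> Result<TensorMap, String> {
    load_model_tensors_stub(path).map_err(|e| {
        format!(
            "FALSIFY-006: failed to load init tensors from {}: {}",
            path.display(),
            e
        )
    })
}
```

Because the wrapper holds no format logic, a change to the APR layout only has to be handled in the one loader.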

Discharges from `apr-pretrain-arch-polymorphic-v1`:
  - §init_load_semantics invariant (loader reuse): satisfied
  - FALSIFY-006 (init_loss < 6.0) at READ-COMPILE-BIND level

Step 5f decomposition:
  - 5f.1 (PR #1479): encoder/decoder family validator (~30 LOC)
  - 5f.2 (this PR): APR file open + tensor read (~30 LOC + 2 tests)
  - 5f.3 (next):    populate trainer parameters from BTreeMap (~50 LOC)
  - 5g  (operator): LIVE 500-step fine-tune → DISCHARGES MODEL-2 ship-%

Step 5f.2 is intentionally narrow — it only does the READ. Population
into trainer parameter slots (5f.3) reconciles HF naming convention
(e.g., `model.embed_tokens.weight`) against the trainer's internal
parameter naming. That's a separate concern with its own falsifier.
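
One way the later populate step could reconcile names is with an explicit lookup table. The sketch below is not the 5f.3 implementation: the trainer's internal slot names are not given in this message, so the right-hand names in the table are hypothetical, as is the `reconcile_names` function itself.

```rust
use std::collections::BTreeMap;

pub type TensorMap = BTreeMap<String, (Vec<f32>, Vec<usize>)>;

/// Re-key tensors from HF names to trainer slot names using an explicit table.
/// On failure, returns the HF names the table expects but the file lacks.
pub fn reconcile_names(
    loaded: &TensorMap,
    name_table: &[(&str, &str)], // (HF name, HYPOTHETICAL trainer slot name)
) -> Result<TensorMap, Vec<String>> {
    let mut slots = TensorMap::new();
    let mut missing = Vec::new();
    for (hf_name, slot_name) in name_table {
        match loaded.get(*hf_name) {
            Some(tensor) => {
                slots.insert((*slot_name).to_string(), tensor.clone());
            }
            None => missing.push((*hf_name).to_string()),
        }
    }
    if missing.is_empty() {
        Ok(slots)
    } else {
        Err(missing)
    }
}
```

Keeping the table outside the read function is what lets the read stay ignorant of trainer internals.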

What this PR adds:

  1. `pub fn load_init_tensors_from_apr(path) -> Result<BTreeMap<...>>`
     at pretrain_real.rs:35 (~25 LOC including doc comment)
  2. 2 unit tests in `pretrain_real::tests`:
       - load_init_tensors_missing_file_errors_with_falsifier_id
         (FALSIFY-006 fail-fast path; asserts error message contains
          falsifier id + offending path for operator-experience)
       - load_init_tensors_signature_compile_bind
         (drift-prevention: catches a future signature change that
          would break step 5f.3's BTreeMap consumer)
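
A signature compile-bind check can be as small as coercing the function to an explicit `fn` pointer type. The following is a sketch, not the test in `pretrain_real::tests`: the function body is a hypothetical stub, and the `&Path` parameter and `String` error type are assumptions, since the message elides the full signature.

```rust
use std::collections::BTreeMap;
use std::path::Path;

// HYPOTHETICAL stub for the function under test; it always fails so the
// sketch needs no APR file on disk.
pub fn load_init_tensors_from_apr(
    path: &Path,
) -> Result<BTreeMap<String, (Vec<f32>, Vec<usize>)>, String> {
    Err(format!("FALSIFY-006: cannot open {}", path.display()))
}

/// Compile-bind check: the coercion below stops compiling if the parameter
/// or return type drifts (for example BTreeMap -> HashMap).
pub fn signature_compile_bind() -> bool {
    type Expected = fn(&Path) -> Result<BTreeMap<String, (Vec<f32>, Vec<usize>)>, String>;
    let bound: Expected = load_init_tensors_from_apr;
    // Calling through the pointer also exercises the fail-fast path.
    bound(Path::new("/nonexistent/init.apr")).is_err()
}
```

The useful part of this test is that it compiles at all; the runtime assertion is secondary.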

Test results (cargo test -p aprender-train --lib train::pretrain_real::tests::load_init_tensors):
    2 passed; 0 failed; 0 ignored

Five Whys:

  1. Why decompose step 5f.2 to JUST the read? Single-piece flow.
     Read → Validate → Populate are three distinct concerns. Step 5f.1
     did validation (#1479); 5f.2 does read; 5f.3 will do populate.
     Each PR has one falsifier discharge story.

  2. Why use load_model_tensors and not write a new parser? The contract
     pins "Loader is reused, not reimplemented." Writing a new parser
     would create a parallel format-decoder that drifts from the canonical
     one. The same lesson as the LAYOUT-001/002 hits — parallel format
     code paths produce silent format-drift bugs.

  3. Why return BTreeMap<String, (Vec<f32>, Vec<usize>)> rather than a
     trainer-parameter-shaped struct? Decoupling: the read shouldn't
     know about TransformerTrainer's internal parameter names. Step
     5f.3's job is to map HF names → trainer slots; if 5f.2 baked that
     mapping in, every change to TransformerTrainer would break the read.

  4. Why include the signature-compile-bind test? It's a compile-time
     check that drives step 5f.3's expectations. If a future refactor
     changes the return type (e.g., from BTreeMap to HashMap, or from
     Vec<usize> to Box<[usize]>), step 5f.3's consumer code stops
     compiling — caught here, not at the integration point.

  5. Why is FALSIFY-006 NOT yet at PARTIAL_ALGORITHM_LEVEL after this
     PR? Because step 5f.2 only does the read; FALSIFY-006 requires
     the LIVE init_loss < 6.0 check, which needs steps 5f.3 + 5g.
     This PR moves FALSIFY-006 from UNBOUND → READ-COMPILE-BIND, a
     sub-level of PARTIAL_ALGORITHM_LEVEL. Full PARTIAL discharge
     happens at 5f.3 when the populate step exists.

Plain ship-% update:
  - MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
  - MODEL-2: unchanged at 57% — first ship-% movement gated on §50.4
    step 5g (LIVE 500-step fine-tune producing val_loss < 9.38)

Refs:
  - SPEC-SHIP-TWO-001 §50, §51 — MODEL-2 architecture-coupling +
    cascade snapshot (PR #1472, #1480 MERGED)
  - PR #1473 — apr-pretrain-arch-polymorphic-v1 contract (in flight)
  - PR #1474 — qwen2_0_5b tie_word_embeddings fix (MERGED)
  - PR #1475 — build_transformer_config polymorphic dispatch (in flight)
  - PR #1476 — preflight_tokenizer_vocab_matches_target (MERGED)
  - PR #1478 — GQA-7:1 forward-pass smoke test (MERGED)
  - PR #1479 — validate_pretrain_init_arch_compatible (in flight)
  - feedback_no_guessing.md
  - feedback_falsifier_first_cascade_pattern.md (this turn's pattern)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>