feat(apr-cli + aprender-train + spec): §55 polymorphic preflight relaxation v1.2 → v1.3 FUNCTIONAL by noahgift · Pull Request #1500 · paiml/aprender

noahgift · 2026-05-05T03:59:14Z

Summary

Implements §55 per spec §54 (PR #1496) finding. Refines the polymorphic-path preflight semantic from strict equality to relaxed bound (`tokenizer_vocab ≤ model_vocab`) when `--init` is set, while preserving strict equality for the from-scratch path. Bumps contract v1.2.0 → v1.3.0 FUNCTIONAL. Unblocks 5g.1 (corpus retokenize with Qwen vocab).

Why

§54 LIVE smoke + 5g.0 (PR #1497) chain surfaced that public Qwen2.5-Coder-0.5B-Instruct's `tokenizer.json` materializes 151643 BPE entries + 22 added = 151665 effective, but `config.json` declares vocab_size = 151936. The 271-slot gap is reserved/special slots — the lm_head + embedding layers have weights for IDs 151665..151935, but no tokenizer string maps to them. This is the canonical HF reserved-slot pattern across Qwen2.5/Llama2/Mistral.

Strict equality preflight was correct for §24/§25 from-scratch (where tokenizer is trained to exactly match) but wrong for HF-distributed pretrained checkpoints.

What ships

Artifact	Change
`aprender-train/src/models/llama_370m.rs`	New helper `assert_tokenizer_vocab_within_model_bound`
`apr-cli/src/commands/pretrain.rs`	`preflight_tokenizer_vocab_matches_target` takes `init_is_some: bool`; `drive_real` routes through it
Contract v1.2.0 → v1.3.0 FUNCTIONAL	refined `qwen_tokenizer_vocab_compatibility` formula + invariants
FALSIFY-APR-PRETRAIN-ARCH-009	NEW: relaxed bound accepts HF reserved slots
FALSIFY-APR-PRETRAIN-ARCH-010	NEW: relaxed bound rejects oversized (OOB safety)
4 new tests (2 helper + 2 integration)	All PASS
`evidence/section-55-relaxed-preflight-2026-05-05/`	LIVE smoke evidence
`docs/specifications/aprender-train/ship-two-models-spec.md`	§55 amendment

LIVE smoke

```
$ timeout 30 apr pretrain \
--init <Qwen2.5-Coder-0.5B-Instruct-fp16.apr> \
--tokenizer /tmp/qwen-0.5b-tokenizer-extracted \
--mode finetune --num-steps 1 --device cpu --vocab-size 151936
exit=124 # timeout, AFTER preflight passed

No GATE-ARCH-370M-011 violations in output

```

Process proceeded past preflight to weight load (timeout fired at 30s mid-load). FALSIFY-APR-PRETRAIN-ARCH-009 LIVE-INTEGRATION DISCHARGE.

Five Whys

Why did §54 not catch this? §54 used legacy 50257-vocab tokenizer (not §54's own extracted Qwen tokenizer); within-Qwen mismatch only surfaces after 5g.0 lands.
Why bound is ≤ not ==? HF checkpoints standardly declare vocab > tokenizer materialized; strict-equality would fail every Qwen/Llama2/Mistral.
Why preserve strict equality on from-scratch? §24/§25 evidence was under strict; weakening retroactively could mask future from-scratch tokenizer-drift bugs.
Why new helper rather than mode param on existing? External callers (`training-loop-pretrain-v1` contract, `llama-370m-sovereign-v1`) explicitly want strict; mode param would be backward-incompatible.
Why pin both FALSIFY-009 (accept) and FALSIFY-010 (reject)? Bound is ≤, not <. Without FALSIFY-010, a regression to `tokenizer_vocab > model_vocab` would silently restore N-09 OOB risk.

Test plan

`pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0
PMAT pre-commit quality gates pass
3/3 aprender-train llama_370m helper tests pass (existing strict + 2 new relaxed)
7/7 apr-cli preflight tests pass (5 existing + 2 new integration)
LIVE smoke: relaxed preflight passes on §54-extracted Qwen tokenizer
CI gate green (workspace-test, ci/gate)
Auto-merge fires on green CI

🤖 Generated with Claude Code

…xation §54 LIVE smoke surfaced that public Qwen2.5-Coder-0.5B-Instruct's tokenizer.json materializes 151643 BPE entries + 22 added = 151665 effective strings, but config.json declares vocab_size=151936 (271 reserved/special slots not in tokenizer.json). Strict equality preflight was correct for §24/§25 from-scratch but too strict for HF-distributed pretrained checkpoints with reserved slots. ## Relaxation init=Some → tokenizer_vocab ≤ model_vocab (RELAXED, admits HF reserved slots) init=None → tokenizer_vocab == model_vocab (UNCHANGED, §24/§25 baseline) Safety: tokenizer-emitted ids ∈ [0, tokenizer_vocab) ⊆ [0, model_vocab). Reserved high-id slots are never indexed at training time. N-09 OOB escape impossible. Symmetric guard: tokenizer_vocab > model_vocab MUST FAIL even under init=Some (FALSIFY-APR-PRETRAIN-ARCH-010 — bound is ≤, not <). ## What ships Helper: assert_tokenizer_vocab_within_model_bound (aprender-train) symmetric to assert_tokenizer_vocab_matches_model Wireup: preflight_tokenizer_vocab_matches_target now takes init_is_some: bool; drive_real passes init_arch.is_some() to route to relaxed/strict. Contract: apr-pretrain-arch-polymorphic-v1 v1.2.0 → v1.3.0 FUNCTIONAL qwen_tokenizer_vocab_compatibility refined formula + invariants +FALSIFY-009 (relaxed accept) +FALSIFY-010 (oversize reject — OOB safety) total: 10 falsifiers, all PASS Tests (4 new, all PASS): falsify_apr_pretrain_arch_009_relaxed_bound_accepts_qwen_reserved_slots falsify_apr_pretrain_arch_010_relaxed_bound_rejects_oversized_tokenizer preflight_qwen_reserved_slots_pass_under_polymorphic_init preflight_oversized_tokenizer_rejected_even_under_polymorphic_init Spec amendment §55 + LIVE smoke evidence. ## LIVE smoke (this branch's apr binary + §54 extracted Qwen tokenizer) timeout 30 apr pretrain --init <Qwen.apr> --tokenizer <extracted-dir> ... → exit=124 (timeout), AFTER preflight passed → Configuration printed + Device: cpu + (proceeded to weight load) → No GATE-ARCH-370M-011 violations Evidence: evidence/section-55-relaxed-preflight-2026-05-05/relaxed-preflight-passes-smoke.md ## Five Whys 1. Why did §54 not catch this? §54 used legacy 50257 tokenizer (not §54's own extracted Qwen tokenizer); within-Qwen mismatch only surfaces after 5g.0 lands. 2. Why bound is ≤ not ==? HF checkpoints standardly declare vocab > tokenizer materialized; strict-equality would fail every Qwen/Llama2/Mistral. 3. Why preserve strict equality on from-scratch? §24/§25 evidence was gathered under strict; weakening retroactively could mask future from-scratch tokenizer-drift bugs. 4. Why new helper rather than mode parameter on existing? External callers (training-loop-pretrain-v1 contract) explicitly want strict; mode param would be backward-incompatible. 5. Why pin both FALSIFY-009 + FALSIFY-010? Bound is ≤, not <. Without FALSIFY-010, a regression to tokenizer_vocab > model_vocab would silently restore N-09 OOB risk. ## Net effects - Spec v2.99.0 → v3.00.0 (cascade pivots from infrastructure to LIVE prerequisites). - Contract apr-pretrain-arch-polymorphic-v1 v1.2.0 → v1.3.0 FUNCTIONAL. - 5g.0.1 lands as single PR; 5g.1 unblocked. - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3 val_loss < 9.38. - Coverage tally: +2 PARTIAL_ALGORITHM_LEVEL falsifiers + LIVE-INTEGRATION reinforcement of FALSIFY-005/009. Refs: SPEC-SHIP-TWO-001 §54 (PR #1496) for the gap finding, §55 (this PR) for the resolution. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…tion

…ted; full run is ~17hr operator-dispatch §55 (in-flight PR #1500) closes the polymorphic preflight strictness gap and unblocks 5g.1 dispatch. §56 records the LIVE smoke that validates 5g.1's correctness end-to-end before committing to the multi-hour full run. apr tokenize encode-corpus \ --corpus <python-permissive-5k.jsonl> \ --tokenizer /tmp/qwen-0.5b-tokenizer-extracted \ --output <smoke-shards> --shard-tokens 1000000 → 13 valid u32 shards (12 full × ~1M + 1 partial = ~13M tokens for 5000 docs) → ~110 sec / M-token single-thread → No errors; shard rotation correct → Killed before manifest.json write (sufficient evidence accumulated) Legacy 50257-vocab: ~64 sec / M-token → 9.99 hr for 565M (validated) Qwen 151643-vocab: ~110 sec / M-token → ~17 hr for 565M (projected) Qwen is ~70% slower per-token because the BPE merge table is 3× larger (151387 vs 49997 merges); per-character merge-table search dominates encoding cost. Below the 48hr feedback_compute_pre_authorized.md ceiling, so 5g.1 full run is pre-authorized. 5g.0 ✅ MERGED PR #1497 (apr tokenize import-hf) 5g.0.1 in-flight PR #1500 (§55 polymorphic preflight relaxation) 5g.1 CORRECTNESS-VALIDATED (this PR), full run pending operator 5g.2 gated on 5g.1 full run 5g.3 gated on 5g.2 (val_loss < 9.38 verdict) 1. Why smoke before full run? ~17hr non-trivial; smoke proves chain correctness before committing to long wall. 2. Why 5000 docs? Smallest slice that exercises shard rotation (12M tokens > 10 shards). 3. Why kill smoke instead of complete? 13 shards = sufficient evidence; finishing wouldn't add information. 4. Why Qwen 70% slower? BPE merge-table size dominates encoding cost. 5. Why not parallelize? Out of 5g.1 scope; single-thread wall is below 48hr ceiling; ROI negative for current cycle. - Spec v3.00.0 → v3.01.0 (assumes §55 lands first; safe either way — §56 has no code/contract changes). - 5g.1 reaches CORRECTNESS-VALIDATED state. - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3. Refs: SPEC-SHIP-TWO-001 §54 (PR #1496), §55 (PR #1500 in-flight), §56 (this PR), evidence/section-56-5g-1-smoke-2026-05-05/ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ted; full run is ~17hr operator-dispatch (#1501) §55 (in-flight PR #1500) closes the polymorphic preflight strictness gap and unblocks 5g.1 dispatch. §56 records the LIVE smoke that validates 5g.1's correctness end-to-end before committing to the multi-hour full run. apr tokenize encode-corpus \ --corpus <python-permissive-5k.jsonl> \ --tokenizer /tmp/qwen-0.5b-tokenizer-extracted \ --output <smoke-shards> --shard-tokens 1000000 → 13 valid u32 shards (12 full × ~1M + 1 partial = ~13M tokens for 5000 docs) → ~110 sec / M-token single-thread → No errors; shard rotation correct → Killed before manifest.json write (sufficient evidence accumulated) Legacy 50257-vocab: ~64 sec / M-token → 9.99 hr for 565M (validated) Qwen 151643-vocab: ~110 sec / M-token → ~17 hr for 565M (projected) Qwen is ~70% slower per-token because the BPE merge table is 3× larger (151387 vs 49997 merges); per-character merge-table search dominates encoding cost. Below the 48hr feedback_compute_pre_authorized.md ceiling, so 5g.1 full run is pre-authorized. 5g.0 ✅ MERGED PR #1497 (apr tokenize import-hf) 5g.0.1 in-flight PR #1500 (§55 polymorphic preflight relaxation) 5g.1 CORRECTNESS-VALIDATED (this PR), full run pending operator 5g.2 gated on 5g.1 full run 5g.3 gated on 5g.2 (val_loss < 9.38 verdict) 1. Why smoke before full run? ~17hr non-trivial; smoke proves chain correctness before committing to long wall. 2. Why 5000 docs? Smallest slice that exercises shard rotation (12M tokens > 10 shards). 3. Why kill smoke instead of complete? 13 shards = sufficient evidence; finishing wouldn't add information. 4. Why Qwen 70% slower? BPE merge-table size dominates encoding cost. 5. Why not parallelize? Out of 5g.1 scope; single-thread wall is below 48hr ceiling; ROI negative for current cycle. - Spec v3.00.0 → v3.01.0 (assumes §55 lands first; safe either way — §56 has no code/contract changes). - 5g.1 reaches CORRECTNESS-VALIDATED state. - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3. Refs: SPEC-SHIP-TWO-001 §54 (PR #1496), §55 (PR #1500 in-flight), §56 (this PR), evidence/section-56-5g-1-smoke-2026-05-05/ Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

… (PMAT-CODE-PRETRAIN-INIT-FINETUNE-001) Adds contracts/apr-pretrain-init-finetune-v1.yaml v1.0.0 DRAFT, the falsifier scaffold for SHIP-TWO §56.4 step 5g.2 — the LIVE 500-step fine-tune dispatch that flips MODEL-2 ship % 57% → ≥58%. Pins six falsifiable invariants for `apr pretrain --mode from-init --init <Qwen.apr> --shards-dir <5g.1-corpus> --steps 500 --device cuda`: - FALSIFY-001 (ship-blocking): exit code == 0 - FALSIFY-002 (advisory): wall ≤ 3600 s on RTX 4090 - FALSIFY-003 (ship-blocking): step-0 loss ≤ 0.7 × ln(151936) ≈ 8.35 (proves init weights flow through forward) - FALSIFY-004 (ship-blocking): checkpoint.apr written with valid magic bytes (0x41 0x50 0x52 0x00 v2 OR 0x41 0x50 0x52 0x4E v1) - FALSIFY-005 (ship-blocking): val_loss after 500 steps < 9.38 (the §34 370M-from-scratch ceiling) - FALSIFY-006 (advisory): no CUDA OOM / illegal-address / launch- OoR errors during run Five-Whys (why this contract first, then live dispatch): 1. Why a contract before the dispatch? Per CLAUDE.md "Contract-first design: NEVER write code before writing a provable contract." Even though 5g.2 is "0 LOC operator-dispatch", it has shippable semantics that deserve falsification scaffolding. 2. Why these particular six gates? They cover the four orthogonal failure modes of a fine-tune-from-init dispatch: process-level (exit/wall), correctness (step-0 baseline + val_loss), and serialization (checkpoint magic bytes + GPU resource health). 3. Why DRAFT status (not PROPOSED, not ACTIVE)? DRAFT means "schema validated, falsifiers authored, but no live evidence yet." Status flips to ACTIVE_RUNTIME via §59 spec amendment after the live dispatch produces evidence. 4. Why a separate contract from apr-pretrain-from-init-v1? The sibling contract pins the in-process semantics of init loading (load_init_tensors_from_apr, populate_trainer_from_init_tensors). This new contract pins the END-TO-END dispatch outcome — they compose at the dispatch boundary. 5. Why the val_loss < 9.38 threshold (not 5.0 or 7.0)? §34's 200K- step retrain confirmed val_loss=9.38 as the 370M-from-scratch capacity ceiling on this corpus. A from-init pivot must beat from-scratch, otherwise §49's strategy reasoning is wrong. Pre-requisites VERIFIED on host (lambda-vector RTX 4090): - /mnt/nvme-raid0/models/qwen2.5-coder-0.5b-instruct-fp16.apr exists - /mnt/nvme-raid0/data/codeparrot-python-permissive-shards-qwen has 228 shards / 2.278B tokens (manifest.json reconstructed by PR #1575) - `apr pretrain --init <PATH>` end-to-end runnable per §53 (#1494 MERGED) - Polymorphic preflight per §55 (#1500 MERGED) Quality gates: - `pv validate contracts/apr-pretrain-init-finetune-v1.yaml`: 0 errors - `pv lint --strict-test-binding`: 9/9 gates PASS SHIP-TWO impact: - MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep work) - MODEL-2 ship %: unchanged at 57% (this PR is contract-only; ship-% flips on §59 amendment after live verdict) - Unblocks: §59 spec amendment recording 5g.2 dispatch result Next steps (follow-ups, NOT this PR): - LIVE dispatch on RTX 4090 (~20-60 min wall, pre-authorized per feedback_compute_pre_authorized.md) - §59 spec amendment v3.05.0 → v3.06.0 with verdict + ship-% flip - Contract status DRAFT → ACTIVE_RUNTIME Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… (PMAT-CODE-PRETRAIN-INIT-FINETUNE-001) (#1576) Adds contracts/apr-pretrain-init-finetune-v1.yaml v1.0.0 DRAFT, the falsifier scaffold for SHIP-TWO §56.4 step 5g.2 — the LIVE 500-step fine-tune dispatch that flips MODEL-2 ship % 57% → ≥58%. Pins six falsifiable invariants for `apr pretrain --mode from-init --init <Qwen.apr> --shards-dir <5g.1-corpus> --steps 500 --device cuda`: - FALSIFY-001 (ship-blocking): exit code == 0 - FALSIFY-002 (advisory): wall ≤ 3600 s on RTX 4090 - FALSIFY-003 (ship-blocking): step-0 loss ≤ 0.7 × ln(151936) ≈ 8.35 (proves init weights flow through forward) - FALSIFY-004 (ship-blocking): checkpoint.apr written with valid magic bytes (0x41 0x50 0x52 0x00 v2 OR 0x41 0x50 0x52 0x4E v1) - FALSIFY-005 (ship-blocking): val_loss after 500 steps < 9.38 (the §34 370M-from-scratch ceiling) - FALSIFY-006 (advisory): no CUDA OOM / illegal-address / launch- OoR errors during run Five-Whys (why this contract first, then live dispatch): 1. Why a contract before the dispatch? Per CLAUDE.md "Contract-first design: NEVER write code before writing a provable contract." Even though 5g.2 is "0 LOC operator-dispatch", it has shippable semantics that deserve falsification scaffolding. 2. Why these particular six gates? They cover the four orthogonal failure modes of a fine-tune-from-init dispatch: process-level (exit/wall), correctness (step-0 baseline + val_loss), and serialization (checkpoint magic bytes + GPU resource health). 3. Why DRAFT status (not PROPOSED, not ACTIVE)? DRAFT means "schema validated, falsifiers authored, but no live evidence yet." Status flips to ACTIVE_RUNTIME via §59 spec amendment after the live dispatch produces evidence. 4. Why a separate contract from apr-pretrain-from-init-v1? The sibling contract pins the in-process semantics of init loading (load_init_tensors_from_apr, populate_trainer_from_init_tensors). This new contract pins the END-TO-END dispatch outcome — they compose at the dispatch boundary. 5. Why the val_loss < 9.38 threshold (not 5.0 or 7.0)? §34's 200K- step retrain confirmed val_loss=9.38 as the 370M-from-scratch capacity ceiling on this corpus. A from-init pivot must beat from-scratch, otherwise §49's strategy reasoning is wrong. Pre-requisites VERIFIED on host (lambda-vector RTX 4090): - /mnt/nvme-raid0/models/qwen2.5-coder-0.5b-instruct-fp16.apr exists - /mnt/nvme-raid0/data/codeparrot-python-permissive-shards-qwen has 228 shards / 2.278B tokens (manifest.json reconstructed by PR #1575) - `apr pretrain --init <PATH>` end-to-end runnable per §53 (#1494 MERGED) - Polymorphic preflight per §55 (#1500 MERGED) Quality gates: - `pv validate contracts/apr-pretrain-init-finetune-v1.yaml`: 0 errors - `pv lint --strict-test-binding`: 9/9 gates PASS SHIP-TWO impact: - MODEL-1 ship %: unchanged at 91% (this is MODEL-2 prep work) - MODEL-2 ship %: unchanged at 57% (this PR is contract-only; ship-% flips on §59 amendment after live verdict) - Unblocks: §59 spec amendment recording 5g.2 dispatch result Next steps (follow-ups, NOT this PR): - LIVE dispatch on RTX 4090 (~20-60 min wall, pre-authorized per feedback_compute_pre_authorized.md) - §59 spec amendment v3.05.0 → v3.06.0 with verdict + ship-% flip - Contract status DRAFT → ACTIVE_RUNTIME Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 5, 2026 03:59

noahgift added 2 commits May 5, 2026 06:25

Merge branch 'main' into feat/section-55-polymorphic-preflight-relaxa…

8fdef09

…tion

Merge branch 'main' into feat/section-55-polymorphic-preflight-relaxa…

31530e8

…tion

noahgift mentioned this pull request May 5, 2026

spec(ship-two-models): v3.00 → v3.01 — §56 5g.1 LIVE smoke validated; ~17hr full run operator-dispatch #1501

Merged

6 tasks

noahgift merged commit d4beb7d into main May 5, 2026
10 checks passed

noahgift deleted the feat/section-55-polymorphic-preflight-relaxation branch May 5, 2026 05:06

This was referenced May 5, 2026

docs(M61-M63): record §50.4 cascade aprender PRs #1500/#1501/#1502 SHIPPED paiml/claude-code-parity-apr#49

Merged

feat(contracts): apr-pretrain-init-finetune-v1 5g.2 dispatch (PMAT-CODE-PRETRAIN-INIT-FINETUNE-001) #1576

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(apr-cli + aprender-train + spec): §55 polymorphic preflight relaxation v1.2 → v1.3 FUNCTIONAL#1500

feat(apr-cli + aprender-train + spec): §55 polymorphic preflight relaxation v1.2 → v1.3 FUNCTIONAL#1500
noahgift merged 3 commits into
mainfrom
feat/section-55-polymorphic-preflight-relaxation

noahgift commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 5, 2026

Summary

Why

What ships

LIVE smoke

No GATE-ARCH-370M-011 violations in output

Five Whys

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant