feat(pretrain): INV-INIT-ARCH-MATCH-001 — fail-fast on arch-mismatched --init (§86.6) by noahgift · Pull Request #1760 · paiml/aprender

noahgift · 2026-05-17T15:02:12Z

Summary

Closes the SPEC §86.6 follow-up: catches the §86 silent-failure pattern at load time, before training falls back to random init without any error.

The §86 case this catches

P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. The init eval at step 0 produced val_loss = 8.60, 1.86× the 4.62 recorded for P2-E ep49. Silent failure: --init loaded random weights. Root cause: metadata.architecture = "LlamaForCausalLM" (the §82 P0-H fallback) drove the trainer to build Llama-family parameter names; the APR's Qwen2-style tensor names didn't match, so populate fell back to random init. Detection latency before this PR: 1 epoch (~55 s on an RTX 4090), and only if the operator noticed that the init eval val_loss disagreed with the init checkpoint's recorded val_loss.
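Before this PR, the only signal was that step-0 eval. A hypothetical post-hoc version of the operator's manual check compares the init eval loss against the checkpoint's recorded val_loss. The function names and the 1.10 tolerance are illustrative assumptions, not part of the PR.

```rust
/// Ratio of the step-0 eval loss to the checkpoint's recorded val_loss.
fn init_eval_ratio(observed: f64, recorded: f64) -> f64 {
    observed / recorded
}

/// Hypothetical post-hoc check (not in this PR): treat the init as suspect
/// when the step-0 loss exceeds the recorded loss by more than `max_ratio`.
fn init_looks_random(observed: f64, recorded: f64, max_ratio: f64) -> bool {
    init_eval_ratio(observed, recorded) > max_ratio
}
```

With the §86 numbers, `init_eval_ratio(8.60, 4.62)` is about 1.86, so the check fires. Its weakness is that it only fires after the model has been loaded and evaluated. The gate added here inspects tensor names at load time instead.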

After this PR: detection latency = 0 (gated at load time, before training starts).

What this adds

Three public functions in aprender-train::train::pretrain_real:

- family_from_tensor_names(names) → &'static str: lightweight name-only family inference (qwen3 / qwen2 / llama / mamba / rwkv / gpt-neox / opt / bert / gpt2 / unknown). Mirrors the heavyweight infer_architecture_from_names in aprender-core::format::converter::tokenizer_loader.
- normalize_metadata_arch_family(arch) → Option<&'static str>: maps all three forms of the metadata architecture field (HF class names like Qwen2ForCausalLM, family slugs like qwen2, capitalised legacy like Qwen2) to a canonical family slug.
- validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors): the actual gate. Errors with FALSIFY-INIT-ARCH-MATCH-001, naming both the claimed and inferred families, plus an inline apr stamp salvage recipe (PR #1757).
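The following is a minimal sketch of the two helpers, for illustration only. It assumes Hugging Face-style tensor names (e.g. model.layers.N.self_attn.q_proj.bias). The marker substrings and the suffix list are assumptions and are not the heuristics that the PR or infer_architecture_from_names actually use.

```rust
/// Hypothetical name-only family inference (marker substrings are assumptions).
pub fn family_from_tensor_names<'a>(names: impl IntoIterator<Item = &'a str>) -> &'static str {
    let names: Vec<&str> = names.into_iter().collect();
    let any = |needle: &str| names.iter().any(|n| n.contains(needle));

    if any("mixer.A_log") {
        "mamba"
    } else if any("time_mix") {
        "rwkv"
    } else if any("gpt_neox.") {
        "gpt-neox"
    } else if any("attn.c_attn") {
        "gpt2"
    } else if any("attention.self.query") {
        "bert"
    } else if any("model.decoder.layers") {
        "opt"
    } else if any("self_attn.q_norm") {
        // Qwen3 adds per-head norms on queries/keys.
        "qwen3"
    } else if any("self_attn.q_proj.bias") {
        // Qwen2 keeps biases on q/k/v projections; Llama does not.
        "qwen2"
    } else if any("self_attn.q_proj.weight") {
        "llama"
    } else {
        // e.g. GGUF `blk.N.attn_q.weight` names cannot disambiguate families.
        "unknown"
    }
}

/// Hypothetical normalizer for the three metadata forms:
/// "Qwen2ForCausalLM" (HF class), "qwen2" (slug), "Qwen2" (capitalised legacy).
pub fn normalize_metadata_arch_family(arch: &str) -> Option<&'static str> {
    let lower = arch.trim().to_ascii_lowercase();
    // Strip a common HF class-name suffix, if any (assumed list).
    const SUFFIXES: [&str; 4] = ["forcausallm", "formaskedlm", "lmheadmodel", "model"];
    let base = SUFFIXES
        .iter()
        .find_map(|s| lower.strip_suffix(*s))
        .unwrap_or(lower.as_str());
    match base {
        "qwen3" => Some("qwen3"),
        "qwen2" => Some("qwen2"),
        "llama" => Some("llama"),
        "mamba" => Some("mamba"),
        "rwkv" => Some("rwkv"),
        "gptneox" | "gpt_neox" | "gpt-neox" => Some("gpt-neox"),
        "opt" => Some("opt"),
        "bert" => Some("bert"),
        "gpt2" => Some("gpt2"),
        // "unknown" and novel names map to "no claim".
        _ => None,
    }
}
```

Ordering matters in the inference sketch: the more specific markers (Qwen3's q_norm, Qwen2's q/k/v bias) are tested before the generic q_proj.weight, which Qwen2 and Llama checkpoints both contain.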

Wired into build_shared_trainer_with_init between load_init_tensors_from_apr and populate_trainer_from_init_tensors. A small helper reads the raw metadata.architecture string: TransformerConfig::hf_architecture is None for pre-P0-K APRs (the §86 case), so the cross-check cannot rely on it.
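Here is a minimal sketch of the two points in this paragraph, under stated assumptions. AprMetadata, raw_metadata_architecture and load_validate_populate are hypothetical names, since the real helper's name isn't given in the PR. Tensors are also reduced to their names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the parsed APR metadata section.
struct AprMetadata {
    fields: HashMap<String, String>,
}

/// Read the raw `architecture` string; empty or whitespace-only values
/// count as "no claim".
fn raw_metadata_architecture(meta: &AprMetadata) -> Option<&str> {
    meta.fields
        .get("architecture")
        .map(String::as_str)
        .filter(|s| !s.trim().is_empty())
}

/// Order of operations at the wiring point: load, then validate, then populate.
fn load_validate_populate(
    load: impl FnOnce() -> Result<Vec<String>, String>,
    validate: impl FnOnce(&[String]) -> Result<(), String>,
    populate: impl FnOnce(Vec<String>) -> Result<usize, String>,
) -> Result<usize, String> {
    let tensors = load()?;
    // On a mismatch `?` returns here, so `populate` is never called.
    validate(tensors.as_slice())?;
    populate(tensors)
}
```

Because validate runs with `?`, a rejected APR returns before populate is called. This is what moves detection from the first epoch to load time.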

Three skip-the-check fallback cases (no false-positives)

No metadata claim (metadata.architecture absent) → allow
Unmappable claim (e.g. WeirdNovelArch) → allow (novel arch isn't §86)
Tensor inference returns unknown (GGUF blk.* names) → trust metadata

Only fails when BOTH inferences produce concrete family slugs AND they differ.
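The decision rule above reduces to a small match. This sketch takes the already-normalized claim and the name-inferred family as inputs, which keeps it independent of the helpers. check_arch_match is a hypothetical name, and the error text only approximates the real message.

```rust
/// Hypothetical decision core of the INV-INIT-ARCH-MATCH-001 gate.
/// `claimed`: normalized metadata family (None = absent or unmappable).
/// `inferred`: tensor-name family ("unknown" when names can't decide).
fn check_arch_match(claimed: Option<&str>, inferred: &str) -> Result<(), String> {
    match claimed {
        // Skip cases 1 and 2: no concrete claim to contradict.
        None => Ok(()),
        // Skip case 3: tensor names are ambiguous, so trust the metadata.
        Some(_) if inferred == "unknown" => Ok(()),
        // Both concrete and equal: consistent APR.
        Some(c) if c == inferred => Ok(()),
        // Both concrete and different: the §86 pattern, fail fast.
        Some(c) => Err(format!(
            "FALSIFY-INIT-ARCH-MATCH-001: metadata claims '{c}' but tensor names imply '{inferred}'.\n\
             Salvage: apr stamp <pre-p0k.apr> --architecture {inferred} --hf-architecture <HF class> -o <stamped.apr>"
        )),
    }
}
```

The skip cases are kept as early `Ok(())` arms, so the single failing arm (concrete and different) is easy to audit against the invariant.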

Salvage recipe (in the error message)

```
apr stamp <pre-p0k.apr> --architecture qwen2 --hf-architecture Qwen2ForCausalLM \
                       -o <stamped.apr>
apr pretrain --init <stamped.apr> ...
```

Requires PR #1757 (apr stamp HF identity extension).

Test plan

New tests (9):
- 7 INV-INIT-ARCH-MATCH-001 tests:
  - rejects_llama_stamped_qwen2_tensors (canonical §86 case)
  - rejects_qwen2_stamped_llama_tensors (inverse)
  - accepts_matching_qwen2 (no false-positive)
  - accepts_matching_llama (no false-positive)
  - skips_when_metadata_absent
  - skips_unmappable_metadata
  - trusts_metadata_when_tensors_unknown
- family_from_tensor_names_distinguishes_qwen2_from_llama (helper)
- normalize_metadata_arch_family_handles_three_forms (normalizer)

Results:
- All 9 new tests pass
- 7,595 existing aprender-train lib tests still pass
- 3 pre-existing prune::snapshot_tests failures (insta-snapshot drift in main) are NOT caused by this PR

Discharges

SPEC §86.6 follow-up (the "Recommended follow-up — new INV-INIT-ARCH-MATCH-001 invariant" section)
INV-INIT-ARCH-MATCH-001 invariant for contracts/apr-pretrain-from-init-v1.yaml — the contract amendment is a small separate follow-up PR

Refs

PR #1742 (PMAT-690 P0-K base — apr_convert + apr_import stamping)
PR #1757 (apr stamp HF identity extension — the salvage path this invariant points operators to)
PR #1758 (SPEC §86 amendment — context this invariant operationalizes)
evidence/p2g-2026-05-17/section-86-draft.md
memory/feedback_upstream_metadata_masquerade.md (methodology #33)

🤖 Generated with Claude Code

…d --init (SPEC §86.6)

Catches the §86 silent-failure pattern at the gate: when an APR's metadata `architecture` claim contradicts what its tensor names imply, `apr pretrain --init` exits non-zero with a clear error naming both claims and an inline `apr stamp` salvage recipe.

## Background: the §86 case this catches

P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. The init eval at step 0 produced val_loss = 8.60, 1.86× P2-E ep49's recorded 4.62. Silent failure: `--init` loaded random weights instead of the trained checkpoint.

Root cause walk-through:

1. `read_apr_architecture` parses `metadata.architecture = "LlamaForCausalLM"` (the §82 P0-H fallback when init_arch.hf_architecture is None).
2. `transformer_config_from_apr_metadata` builds a Llama-family TransformerConfig (dimensions correct, family discriminator wrong).
3. `populate_trainer_from_init_tensors` walks `trainer.named_parameters()`, which produces Llama-style names, and looks them up in the APR tensor map, which has Qwen2-style names. Mismatch → silent random-init fallback.
4. Training begins at random-init magnitude (val_loss ≈ 8.60).

This invariant catches step 1's wrong claim BEFORE step 3 silently falls through.

## What this adds

Three new public functions in `aprender-train::train::pretrain_real`:

- `family_from_tensor_names(names: impl IntoIterator<Item=&str>)` → `&'static str`: lightweight tensor-name-only family inference (no data needed). Returns one of qwen3 / qwen2 / llama / mamba / rwkv / gpt-neox / opt / bert / gpt2 / unknown. Mirrors the heavyweight `infer_architecture_from_names` in aprender-core::format::converter::tokenizer_loader.
- `normalize_metadata_arch_family(arch: &str)` → `Option<&'static str>`: maps all three forms of the metadata `architecture` field to a canonical family slug: HF class names ("Qwen2ForCausalLM"), family slugs ("qwen2"), and capitalised legacy ("Qwen2"). Returns None for "unknown" / unmappable strings, which the caller treats as "no claim".
- `validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors)` → `Result<(), String>`: the actual invariant gate. Errors with `FALSIFY-INIT-ARCH-MATCH-001`, naming both the claimed and inferred families, plus an inline `apr stamp` recipe (PR #1757) for §86 salvage.

Wired into `build_shared_trainer_with_init` between `load_init_tensors_from_apr` and `populate_trainer_from_init_tensors`. Reads the raw metadata `architecture` string via a new small helper (the `TransformerConfig`'s `hf_architecture` field is None for pre-P0-K APRs, which is the §86 case, so the cross-check needs the raw string field).

## Three skip-the-check fallback cases (no false-positives)

1. **No metadata claim** (metadata.architecture absent): nothing to contradict, allow.
2. **Unmappable claim** (e.g. "WeirdNovelArch"): a novel arch is not §86, allow.
3. **Tensor inference returns "unknown"** (GGUF blk.* names can't disambiguate): trust the metadata, allow.

Only fail when BOTH inferences produce concrete family slugs AND they differ.

## Tests

- 7 new INV-INIT-ARCH-MATCH-001 tests in `pretrain_real::tests`:
  - `inv_init_arch_match_001_rejects_llama_stamped_qwen2_tensors`: canonical §86 case, must fail with falsifier ID + salvage recipe
  - `inv_init_arch_match_001_rejects_qwen2_stamped_llama_tensors`: inverse §86 case, must fail
  - `inv_init_arch_match_001_accepts_matching_qwen2/llama`: no false-positive on correctly-stamped APRs
  - `inv_init_arch_match_001_skips_when_metadata_absent`: None metadata
  - `inv_init_arch_match_001_skips_unmappable_metadata`: novel arch
  - `inv_init_arch_match_001_trusts_metadata_when_tensors_unknown`: GGUF blk.* case
- 1 helper test: `family_from_tensor_names_distinguishes_qwen2_from_llama`
- 1 normalizer test: `normalize_metadata_arch_family_handles_three_forms`

All 9 new tests pass. 7,595 existing aprender-train lib tests still pass (the 3 pre-existing prune::snapshot_tests failures are insta-snapshot drift in main, unrelated to this PR).
## Discharges

- §86.6 SPEC follow-up (forthcoming via #1758 stack)
- INV-INIT-ARCH-MATCH-001 invariant for `contracts/apr-pretrain-from-init-v1.yaml` (contract amendment is a separate small follow-up PR)

## Refs

- PR #1742 (PMAT-690 P0-K base: apr_convert + apr_import stamping)
- PR #1757 (apr stamp HF identity extension: the salvage path this invariant points operators to)
- PR #1758 (SPEC §86 amendment: context this invariant operationalizes)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
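Step 3 of the root-cause walk-through above is a name lookup that tolerates misses. Below is a hypothetical sketch that makes those misses visible. missing_init_tensors is an illustrative name, not an aprender API, and tensor data is reduced to `Vec<f32>`.

```rust
use std::collections::HashMap;

/// Hypothetical illustration of step 3: list the trainer parameter names that
/// have no matching tensor in the APR map. A silent populate would
/// random-init each of these instead of reporting them.
fn missing_init_tensors<'a>(
    param_names: &[&'a str],
    tensors: &HashMap<String, Vec<f32>>,
) -> Vec<&'a str> {
    param_names
        .iter()
        .copied()
        .filter(|name| !tensors.contains_key(*name))
        .collect()
}
```

A silent populate random-inits every name this function returns. The gate in this PR avoids reaching that point when the metadata and tensor families disagree.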

…H-MATCH-001 (SPEC §86.6 closure) (#1761)

Codifies the INV-INIT-ARCH-MATCH-001 invariant authored as runtime code in PR #1760 (`validate_init_arch_matches_tensor_evidence` in aprender-train::train::pretrain_real). Adds:

- FALSIFY-INIT-ARCH-MATCH-001: integration falsifier bound to the unit-test family `cargo test -p aprender-train --lib inv_init_arch_match_001` (7 tests covering: canonical §86 reject, inverse reject, matching qwen2 accept, matching llama accept, None metadata skip, unmappable metadata skip, GGUF-unknown tensor skip).
- INV-INIT-ARCH-MATCH-001 proof_obligation: safety invariant. When both metadata.architecture and the tensor-name-inferred family resolve to concrete, distinct slugs, the gate MUST fail fast before any training step. No false-positive when either side returns "unknown".

## Salvage path

The error message includes an inline `apr stamp` recipe (PR #1757):

```
apr stamp <pre-p0k.apr> --architecture qwen2 --hf-architecture Qwen2ForCausalLM \
    -o <stamped.apr>
apr pretrain --init <stamped.apr> ...
```

## Refs

- PR #1742 (PMAT-690 P0-K base: producer-side stamping)
- PR #1750 (P3-A `apr inspect --quality`: surfaces hf_identity=0/20 pre-stamp)
- PR #1754 (SPEC §85 P2-E findings: context)
- PR #1757 (apr stamp HF identity extension: salvage path)
- PR #1758 (SPEC §86 amendment: defect specification this contract closes)
- PR #1760 (INV-INIT-ARCH-MATCH-001 runtime implementation)
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
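The proof_obligation in the #1761 commit message above can be stated compactly as follows. The notation is introduced here only as shorthand: m is the raw metadata.architecture value (possibly absent), T is the set of tensor names, ν stands for normalize_metadata_arch_family (with ⊥ meaning "no usable claim"), and φ stands for family_from_tensor_names.

```math
\mathrm{gate}(m, T) =
\begin{cases}
\mathrm{Err} & \text{if } \nu(m) = c \neq \bot \ \wedge\ \phi(T) = t \neq \texttt{unknown} \ \wedge\ c \neq t, \\
\mathrm{Ok} & \text{otherwise.}
\end{cases}
```

The three skip cases are exactly the three ways the conjunction can be false without c ≠ t being tested: ν(m) = ⊥ (absent or unmappable claim) or φ(T) = unknown.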

…t (P3-C prep) (#1764)

Author the HuggingFace model card for `paiml/albor-370m-v1` and the publish-readiness pre-flight script. Per SPEC §88, this model is shipped as a stack-existence-proof, not a production code-completion model. Both artifacts make that framing explicit so HF Hub users calibrate expectations correctly.

## docs/model-cards/albor-370m-v1.md (255 lines)

Standard HF model card with model-index frontmatter:

- YAML metadata: Apache-2.0, code/python/stack-existence-proof tags, Qwen2.5-Coder-0.5B-Instruct base, codeparrot + the-stack-dedup datasets, val_loss=4.6227 / val_perplexity=101.78 metrics.
- §88 framing section spelling out the stack-existence-proof purpose.
- Training procedure table (architecture, optimizer, LR schedule, hardware, wall time, throughput; all from the §85 P2-E run).
- Trajectory table (every 5 epochs from 7.43 → 4.62).
- Intended uses (✅ stack demos, infra validation, tokenization round-trip, quantization research) vs NOT-recommended uses (production code-LM, zero-shot reasoning, long-context, HumanEval submission).
- Limitations (compute-bounded, plateau evidence, init lineage, val drift).
- Training data table (sources, sizes, licenses, role).
- How-to-use code snippets (apr CLI, Rust direct load, format export).
- Reproduce-the-run shell example using the exact §85 P2-E recipe.
- Citation, license/provenance, acknowledgments.

## scripts/publish/albor-370m-publish-readiness.sh (182 lines)

7-gate pre-flight checklist that returns a GO / NO-GO verdict before `apr publish` is invoked. Gates:

1. `apr validate` exits 0
2. `apr inspect --quality` ≥ 90 (P3-A scorer; surfaces the §86 salvage recipe inline if hf_identity < 20 or provenance < 25)
3. `apr qa --json` verdict = GO (8 gates)
4. Model card present and has HF YAML frontmatter
5. HF_TOKEN set
6. Smoke `apr run` produces text-like output
7. GGUF Q4_K + SafeTensors export round-trip both succeed

Exit 0 = ready to publish. Exit 1 = NO-GO with explicit blocker list.
Bashrs-validated (1 SEC011 false-positive on the multi-condition rm -rf guard; functionally safe).

## What this PR does NOT do

- Does NOT invoke `apr publish` (external action; requires user OK)
- Does NOT touch any APR files (read-only checks)
- Does NOT modify the §85 P2-E ep49 checkpoint (operator runs `apr stamp` via the §86.4 salvage recipe separately)

## Operator workflow (post-PR landing)

```bash
# 1. Stamp the pre-P0-K P2-E ep49 checkpoint to bring hf_identity up
apr stamp /mnt/nvme-raid0/runs/model-2-p2e-tuned-hp-20260517/ckpt/epoch-049.apr \
  --architecture qwen2 \
  --hf-architecture Qwen2ForCausalLM \
  --hf-model-type qwen2 \
  --license Apache-2.0 \
  --data-source "huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct + bigcode/the-stack-dedup + codeparrot/codeparrot-clean" \
  --data-license "Apache-2.0 / permissive-aggregate" \
  -o /tmp/albor-370m-v1.apr

# 2. Run the readiness check
bash scripts/publish/albor-370m-publish-readiness.sh /tmp/albor-370m-v1.apr
# Expected output: "VERDICT: GO" (or NO-GO with explicit blocker list)

# 3. Publish (still requires explicit user invocation)
apr publish paiml/albor-370m-v1 --formats apr,safetensors,gguf \
  --model-card docs/model-cards/albor-370m-v1.md
```

## Refs

- PR #1742 (PMAT-690 P0-K: upstream stamping)
- PR #1750 (P3-A `apr inspect --quality`: gate 2)
- PR #1754 (SPEC §84+§85+§86+§87+§88 stack: context)
- PR #1757 (apr stamp HF identity extension: §86 salvage)
- PR #1760 (INV-INIT-ARCH-MATCH-001: validation chain)
- docs/specifications/aprender-train/ship-model-2-spec.md §88
- docs/specifications/aprender-train/albor-370m-roadmap.md §4 P3-C

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
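The readiness script in #1764 is bash. Its GO / NO-GO rule, as described in that commit message (every gate must pass; exit 0 = ready, exit 1 = NO-GO with a blocker list), can be modeled in a few lines. This is a hypothetical model of the verdict logic only. Verdict, verdict and exit_code are illustrative names, and the sketch does not run any of the gates.

```rust
/// Hypothetical model of the readiness verdict.
#[derive(Debug, PartialEq)]
enum Verdict {
    Go,
    NoGo(Vec<String>),
}

/// GO only if every gate passed; otherwise NO-GO listing the failed gates.
fn verdict(gates: &[(&str, bool)]) -> Verdict {
    let blockers: Vec<String> = gates
        .iter()
        .filter(|(_, passed)| !passed)
        .map(|(name, _)| name.to_string())
        .collect();
    if blockers.is_empty() {
        Verdict::Go
    } else {
        Verdict::NoGo(blockers)
    }
}

/// Exit-code convention from the commit message: 0 = ready, 1 = NO-GO.
fn exit_code(v: &Verdict) -> i32 {
    match v {
        Verdict::Go => 0,
        Verdict::NoGo(_) => 1,
    }
}
```

Collecting every failed gate, rather than stopping at the first one, matches the "explicit blocker list" behavior, so an operator can fix all blockers in one pass.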

noahgift enabled auto-merge (squash) May 17, 2026 15:02

noahgift mentioned this pull request May 17, 2026

contracts(apr-pretrain-from-init): v1.2.0 → v1.3.0 — FALSIFY-INIT-ARCH-MATCH-001 (§86.6 closure) #1761

Merged


Merge branch 'main' into feat/init-arch-match-invariant

6786e09

This was referenced May 17, 2026

feat(pretrain): §87 P0-J' — Chinchilla 20·N hard gate (was 10·N) #1762

Closed

docs(model-cards): albor-370m-v1 model card + publish-readiness script (P3-C prep) #1764

Merged

Merge branch 'main' into feat/init-arch-match-invariant

98b8683

noahgift added 4 commits May 18, 2026 06:42

Merge branch 'main' into feat/init-arch-match-invariant

e3cf03f

Merge branch 'main' into feat/init-arch-match-invariant

df0711e

Merge branch 'main' into feat/init-arch-match-invariant

c6b3787

Merge branch 'main' into feat/init-arch-match-invariant

61b0672

noahgift added 3 commits May 18, 2026 09:40

Merge branch 'main' into feat/init-arch-match-invariant

9a5cea1

Merge branch 'main' into feat/init-arch-match-invariant

54b7b39

Merge branch 'main' into feat/init-arch-match-invariant

8f567f2

noahgift merged commit 2d42992 into main May 18, 2026
10 checks passed

noahgift deleted the feat/init-arch-match-invariant branch May 18, 2026 09:27

noahgift merged 10 commits into main from feat/init-arch-match-invariant
