feat(pretrain): INV-INIT-ARCH-MATCH-001 — fail-fast on arch-mismatched --init (§86.6)#1760
Merged
…d --init (SPEC §86.6)
Catches the §86 silent-failure pattern at the gate: when an APR's
metadata `architecture` claim contradicts what its tensor names imply,
`apr pretrain --init` exits non-zero with a clear error that names both
claims and includes an inline `apr stamp` salvage recipe.
## Background — the §86 case this catches
P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. The
init eval at step 0 produced val_loss = 8.60 — 1.86× P2-E ep49's
recorded 4.62. Silent failure: `--init` loaded random weights instead
of the trained checkpoint. Root cause walk-through:
1. `read_apr_architecture` parses `metadata.architecture = "LlamaForCausalLM"`
(the §82 P0-H fallback when init_arch.hf_architecture is None).
2. `transformer_config_from_apr_metadata` builds a Llama-family
TransformerConfig (dimensions correct, family discriminator wrong).
3. `populate_trainer_from_init_tensors` walks `trainer.named_parameters()` —
produces Llama-style names — and looks them up in the APR tensor map
which has Qwen2-style names. Mismatch → silent random-init fallback.
4. Training begins at random-init magnitude (val_loss ≈ 8.60).
This invariant catches step 1's wrong claim BEFORE step 3 silently
falls through.
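
The fall-through in step 3 can be illustrated with a minimal sketch. It is not the project's `populate_trainer_from_init_tensors`; the function name, the `HashMap` representation, and the `family_a` / `family_b` tensor names are all hypothetical stand-ins. It only shows why a name-keyed copy that tolerates misses leaves every parameter at its initial value when the two naming schemes are disjoint.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: copy init tensors into parameters by name and
/// return how many parameter names were found in the init map.
/// A caller that ignores the returned count gets the silent fallback.
fn populate_by_name_sketch(
    params: &mut HashMap<String, Vec<f32>>,
    init: &HashMap<String, Vec<f32>>,
) -> usize {
    let mut loaded = 0;
    for (name, value) in params.iter_mut() {
        if let Some(src) = init.get(name) {
            // Name matched: overwrite the parameter with the checkpoint data.
            *value = src.clone();
            loaded += 1;
        }
        // No match: the parameter keeps its existing (random-init) value
        // and nothing is reported.
    }
    loaded
}
```

With disjoint naming schemes the returned count is 0 and no error is raised, which is the situation the new gate is meant to rule out before this step runs.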
## What this adds
Three new public functions in `aprender-train::train::pretrain_real`:
- `family_from_tensor_names(names: impl IntoIterator<Item=&str>)`
→ `&'static str` — lightweight tensor-name-only family inference
(no data needed). Returns one of qwen3 / qwen2 / llama / mamba /
rwkv / gpt-neox / opt / bert / gpt2 / unknown. Mirrors the
heavyweight `infer_architecture_from_names` in
aprender-core::format::converter::tokenizer_loader.
- `normalize_metadata_arch_family(arch: &str)` → `Option<&'static str>`
— maps all three forms of the metadata `architecture` field to a
canonical family slug: HF class names ("Qwen2ForCausalLM"), family
slugs ("qwen2"), and capitalised legacy ("Qwen2"). Returns None
for "unknown" / unmappable strings — caller treats as "no claim".
- `validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors)`
→ `Result<(), String>` — the actual invariant gate. Errors with
`FALSIFY-INIT-ARCH-MATCH-001` naming both the claimed and inferred
families, plus an inline `apr stamp` recipe (PR #1757) for §86 salvage.
Wired into `build_shared_trainer_with_init` between `load_init_tensors_from_apr`
and `populate_trainer_from_init_tensors`. Reads the raw metadata
`architecture` string via a new small helper: the `TransformerConfig`'s
`hf_architecture` field is None for pre-P0-K APRs (the §86 case), so
the cross-check needs the raw string field.
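
As a rough illustration of the normalization described above, the following sketch collapses the three metadata forms into one slug. It is an assumption-laden stand-in, not the body of `normalize_metadata_arch_family`: the function name is hypothetical, the slug list is copied from the families named in this description, and only the `ForCausalLM` suffix is handled, so HF class names whose stem does not lowercase to a slug (or that use another suffix) come back as `None` here.

```rust
/// Hypothetical sketch: map an `architecture` string to a family slug.
/// Handles "Qwen2ForCausalLM", "qwen2", and "Qwen2" style inputs.
fn normalize_arch_family_sketch(arch: &str) -> Option<&'static str> {
    // Family slugs as listed in this PR description (assumed complete
    // only for the purpose of this example).
    const FAMILIES: [&str; 9] = [
        "qwen3", "qwen2", "llama", "mamba", "rwkv", "gpt-neox", "opt", "bert", "gpt2",
    ];
    let trimmed = arch.trim();
    // HF class form: "Qwen2ForCausalLM" becomes "Qwen2".
    let stem = trimmed.strip_suffix("ForCausalLM").unwrap_or(trimmed);
    // Capitalised legacy form: "Qwen2" becomes "qwen2".
    let lowered = stem.to_ascii_lowercase();
    // Anything not in the list ("unknown", novel archs) is "no claim".
    FAMILIES.iter().copied().find(|slug| *slug == lowered)
}
```

Returning `None` for unrecognised strings, rather than an error, matches the description's intent that the caller treats such input as "no claim".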
## Three skip-the-check fallback cases (no false-positives)
1. **No metadata claim** (metadata.architecture absent): nothing to
contradict, allow.
2. **Unmappable claim** (e.g. "WeirdNovelArch"): novel arch is not §86,
allow.
3. **Tensor inference returns "unknown"** (GGUF blk.* names can't
disambiguate): trust the metadata, allow.
Only fail when BOTH inferences produce concrete family slugs AND they differ.
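
The decision table above can be sketched as a single `match`. This is a simplified illustration under stated assumptions, not the real `validate_init_arch_matches_tensor_evidence`: the function name and signature are hypothetical, both inputs are assumed to be already-normalized family slugs, and the error wording is invented apart from the falsifier ID, which comes from this description.

```rust
/// Hypothetical sketch of the gate's decision table.
/// `claimed`: normalized metadata family; None = absent or unmappable.
/// `inferred`: family implied by tensor names; "unknown" = cannot tell.
fn check_arch_claim_sketch(claimed: Option<&str>, inferred: &str) -> Result<(), String> {
    match claimed {
        // Cases 1 and 2: no usable metadata claim, nothing to contradict.
        None => Ok(()),
        // Case 3: tensor names cannot disambiguate, trust the metadata.
        Some(_) if inferred == "unknown" => Ok(()),
        // Both sides concrete and in agreement.
        Some(c) if c == inferred => Ok(()),
        // Both sides concrete and different: fail fast, naming both.
        Some(c) => Err(format!(
            "FALSIFY-INIT-ARCH-MATCH-001: metadata claims '{c}' but tensor names imply '{inferred}'"
        )),
    }
}
```

Ordering the permissive arms first keeps the failure arm reachable only when both families are concrete and distinct, which is the no-false-positive property the three fallback cases are meant to guarantee.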
## Tests
- 7 new INV-INIT-ARCH-MATCH-001 tests in `pretrain_real::tests`:
- `inv_init_arch_match_001_rejects_llama_stamped_qwen2_tensors` —
canonical §86 case, must fail with falsifier ID + salvage recipe
- `inv_init_arch_match_001_rejects_qwen2_stamped_llama_tensors` —
inverse §86 case, must fail
- `inv_init_arch_match_001_accepts_matching_qwen2/llama` — no
false-positive on correctly-stamped APRs
- `inv_init_arch_match_001_skips_when_metadata_absent` — None metadata
- `inv_init_arch_match_001_skips_unmappable_metadata` — novel arch
- `inv_init_arch_match_001_trusts_metadata_when_tensors_unknown` —
GGUF blk.* case
- 1 helper test: `family_from_tensor_names_distinguishes_qwen2_from_llama`
- 1 normalizer test: `normalize_metadata_arch_family_handles_three_forms`
All 9 new tests pass. 7,595 existing aprender-train lib tests still pass
(the 3 pre-existing prune::snapshot_tests failures are insta-snapshot
drift in main, unrelated to this PR).
## Discharges
- §86.6 SPEC follow-up (forthcoming via #1758 stack)
- INV-INIT-ARCH-MATCH-001 invariant for `contracts/apr-pretrain-from-init-v1.yaml`
(contract amendment is a separate small follow-up PR)
## Refs
- PR #1742 (PMAT-690 P0-K base — apr_convert + apr_import stamping)
- PR #1757 (apr stamp HF identity extension — the salvage path this
invariant points operators to)
- PR #1758 (SPEC §86 amendment — context this invariant operationalizes)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 17, 2026

…H-MATCH-001 (SPEC §86.6 closure) (#1761)

Codifies the INV-INIT-ARCH-MATCH-001 invariant authored as runtime code in PR #1760 (`validate_init_arch_matches_tensor_evidence` in aprender-train::train::pretrain_real).

Adds:
- FALSIFY-INIT-ARCH-MATCH-001: integration falsifier bound to the unit-test family `cargo test -p aprender-train --lib inv_init_arch_match_001` (7 tests covering: canonical §86 reject, inverse reject, matching qwen2 accept, matching llama accept, None metadata skip, unmappable metadata skip, GGUF-unknown tensor skip).
- INV-INIT-ARCH-MATCH-001 proof_obligation: safety invariant. When both metadata.architecture and tensor-name-inferred family resolve to concrete distinct slugs, gate MUST fail-fast before any training step. No false-positive when either side returns "unknown".

## Salvage path
The error message includes an inline `apr stamp` recipe (PR #1757):
```
apr stamp <pre-p0k.apr> --architecture qwen2 --hf-architecture Qwen2ForCausalLM \
    -o <stamped.apr>
apr pretrain --init <stamped.apr> ...
```

## Refs
- PR #1742 (PMAT-690 P0-K base: producer-side stamping)
- PR #1750 (P3-A `apr inspect --quality`: surfaces hf_identity=0/20 pre-stamp)
- PR #1754 (SPEC §85 P2-E findings: context)
- PR #1757 (apr stamp HF identity extension: salvage path)
- PR #1758 (SPEC §86 amendment: defect specification this contract closes)
- PR #1760 (INV-INIT-ARCH-MATCH-001 runtime implementation)
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 18, 2026

…t (P3-C prep) (#1764)

Author the HuggingFace model card for `paiml/albor-370m-v1` and the publish-readiness pre-flight script. Per SPEC §88: this model is shipped as a stack-existence-proof, not a production code-completion model. Both artifacts make that framing explicit so HF Hub users calibrate expectations correctly.

## docs/model-cards/albor-370m-v1.md (255 lines)
Standard HF model card with model-index frontmatter:
- YAML metadata: Apache-2.0, code/python/stack-existence-proof tags, Qwen2.5-Coder-0.5B-Instruct base, codeparrot + the-stack-dedup datasets, val_loss=4.6227 / val_perplexity=101.78 metrics.
- §88 framing section spelling out the stack-existence-proof purpose.
- Training procedure table (architecture, optimizer, LR schedule, hardware, wall time, throughput; all from the §85 P2-E run).
- Trajectory table (every 5 epochs from 7.43 → 4.62).
- Intended uses (✅ stack demos, infra validation, tokenization round-trip, quantization research) vs NOT-recommended uses (production code-LM, zero-shot reasoning, long-context, HumanEval submission).
- Limitations (compute-bounded, plateau evidence, init lineage, val drift).
- Training data table (sources, sizes, licenses, role).
- How-to-use code snippets (apr CLI, Rust direct load, format export).
- Reproduce-the-run shell example using the exact §85 P2-E recipe.
- Citation, license/provenance, acknowledgments.

## scripts/publish/albor-370m-publish-readiness.sh (182 lines)
7-gate pre-flight checklist. GO / NO-GO verdict before invoking `apr publish`. Gates:
1. `apr validate` exits 0
2. `apr inspect --quality` ≥ 90 (P3-A scorer; surfaces §86 salvage recipe inline if hf_identity < 20 or provenance < 25)
3. `apr qa --json` verdict = GO (8 gates)
4. Model card present + has HF YAML frontmatter
5. HF_TOKEN set
6. Smoke `apr run` produces text-like output
7. GGUF Q4_K + SafeTensors export round-trip both succeed

Exit 0 = ready to publish. Exit 1 = NO-GO with explicit blocker list.
Bashrs-validated (1 SEC011 false-positive on multi-condition rm -rf guard; functionally safe).

## What this PR does NOT do
- Does NOT invoke `apr publish` (external action; requires user OK)
- Does NOT touch any APR files (read-only checks)
- Does NOT modify the §85 P2-E ep49 checkpoint (operator runs `apr stamp` via the §86.4 salvage recipe separately)

## Operator workflow (post-PR landing)
```bash
# 1. Stamp the pre-P0-K P2-E ep49 checkpoint to bring hf_identity up
apr stamp /mnt/nvme-raid0/runs/model-2-p2e-tuned-hp-20260517/ckpt/epoch-049.apr \
    --architecture qwen2 \
    --hf-architecture Qwen2ForCausalLM \
    --hf-model-type qwen2 \
    --license Apache-2.0 \
    --data-source "huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct + bigcode/the-stack-dedup + codeparrot/codeparrot-clean" \
    --data-license "Apache-2.0 / permissive-aggregate" \
    -o /tmp/albor-370m-v1.apr

# 2. Run the readiness check
bash scripts/publish/albor-370m-publish-readiness.sh /tmp/albor-370m-v1.apr
# Expected output: "VERDICT: GO" (or NO-GO with explicit blocker list)

# 3. Publish (still requires explicit user invocation)
apr publish paiml/albor-370m-v1 --formats apr,safetensors,gguf \
    --model-card docs/model-cards/albor-370m-v1.md
```

## Refs
- PR #1742 (PMAT-690 P0-K: upstream stamping)
- PR #1750 (P3-A `apr inspect --quality`: gate 2)
- PR #1754 (SPEC §84+§85+§86+§87+§88 stack: context)
- PR #1757 (apr stamp HF identity extension: §86 salvage)
- PR #1760 (INV-INIT-ARCH-MATCH-001: validation chain)
- docs/specifications/aprender-train/ship-model-2-spec.md §88
- docs/specifications/aprender-train/albor-370m-roadmap.md §4 P3-C

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
## Summary
Closes the SPEC §86.6 follow-up: catches the §86 silent-failure pattern at the gate before training silently falls back to random init.

## The §86 case this catches
P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. Init eval at step 0 produced val_loss = 8.60, which is 1.86× P2-E ep49's recorded 4.62. Silent failure: `--init` loaded random weights. Root cause: `metadata.architecture = "LlamaForCausalLM"` (the §82 P0-H fallback) drove the trainer to build Llama-family parameter names; the APR's Qwen2-style tensor names didn't match; populate fell back to random init.

Detection latency before this PR: 1 epoch (~55s on RTX 4090) once the operator notices the init eval val_loss disagrees with the init checkpoint's recorded val_loss.

After this PR: detection latency = 0 (gated at load time, before training starts).

## What this adds
Three public functions in `aprender-train::train::pretrain_real`:
- `family_from_tensor_names(names)` → `&'static str`: lightweight name-only family inference (qwen3 / qwen2 / llama / mamba / rwkv / gpt-neox / opt / bert / gpt2 / unknown). Mirrors the heavyweight `infer_architecture_from_names` in `aprender-core::format::converter::tokenizer_loader`.
- `normalize_metadata_arch_family(arch)` → `Option<&'static str>`: maps all three forms of the metadata `architecture` field (HF class names like `Qwen2ForCausalLM`, family slugs like `qwen2`, capitalised legacy like `Qwen2`) to a canonical family slug.
- `validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors)`: the actual gate. Errors with `FALSIFY-INIT-ARCH-MATCH-001` naming both the claimed and inferred families plus an inline `apr stamp` salvage recipe (PR #1757).

Wired into `build_shared_trainer_with_init` between `load_init_tensors_from_apr` and `populate_trainer_from_init_tensors`. Reads the raw `metadata.architecture` string via a small helper: `TransformerConfig::hf_architecture` is `None` for pre-P0-K APRs (the §86 case), so the cross-check needs the raw field.

## Three skip-the-check fallback cases (no false-positives)
1. No metadata claim (`metadata.architecture` absent) → allow
2. Unmappable claim (e.g. `WeirdNovelArch`) → allow (novel arch isn't §86)
3. Tensor inference returns `unknown` (GGUF `blk.*` names) → trust metadata

Only fails when BOTH inferences produce concrete family slugs AND they differ.

## Salvage recipe (in the error message)
Requires PR #1757 (apr stamp HF identity extension).

## Test plan
- `rejects_llama_stamped_qwen2_tensors` (canonical §86 case)
- `rejects_qwen2_stamped_llama_tensors` (inverse)
- `accepts_matching_qwen2` (no false-positive)
- `accepts_matching_llama` (no false-positive)
- `skips_when_metadata_absent`
- `skips_unmappable_metadata`
- `trusts_metadata_when_tensors_unknown`
- `family_from_tensor_names_distinguishes_qwen2_from_llama` (helper)
- `normalize_metadata_arch_family_handles_three_forms` (normalizer)
- `prune::snapshot_tests` failures (insta-snapshot drift in main) NOT caused by this PR

## Discharges
- `contracts/apr-pretrain-from-init-v1.yaml`: the contract amendment is a small separate follow-up PR

## Refs
- `evidence/p2g-2026-05-17/section-86-draft.md`
- `memory/feedback_upstream_metadata_masquerade.md` (methodology #33)

🤖 Generated with Claude Code