docs(spec): §86 - apr pretrain --init mismatch defect + #1757 salvage workflow #1758
Merged
…matched APRs; PR #1757 ships in-place stamp salvage

P2-G v1 dispatch surfaced a SECOND symptom of the §81-§84 cascade root cause: pre-P0-K APR checkpoints (architecture="LlamaForCausalLM" P0-H fallback + Qwen2-tensor shape) are silently non-resumable via `apr pretrain --init`. The init eval at step 0 produced val_loss=8.60 instead of P2-E ep49's recorded 4.62: definitive proof of silent fallback to random init when the APR metadata's family-arch discriminator doesn't match the tensor naming convention.

## What §86 covers

1. Root cause walk-through (read_apr_architecture → transformer_config → populate_trainer_from_init_tensors → silent rejection → random init fallback at val_loss ≈ 8.60).
2. Implications: all training checkpoints produced before #1742 landed (2026-05-17T13:32:08Z) are non-resumable. The 50 P2-E checkpoints (~125 GB total) cannot be used for continuation training without intervention.
3. Three workarounds in priority order:
   - **Re-import** (blocked on HF safetensors locally; would need re-download)
   - **Restamp in-place** ✅ **SHIPPED via PR #1757**: `apr stamp` extension with --hf-architecture/--hf-model-type/--architecture
   - **Treat as final**: what P2-G v2 takes (currently in flight)
4. Operator recipe for the §86 salvage (3-line shell example).
5. Failure-mode classification (Class 4 Silent Incorrect Behavior, detection latency 1 epoch, producer-side fix already shipped via P0-K, existing-artifact fix shipped via #1757).
6. Recommended follow-up: INV-INIT-ARCH-MATCH-001 invariant on the apr-pretrain-from-init-v1 contract, which would catch the §86 case at the gate instead of at the init-eval surface. Defer to follow-up PR.

## Stacked on PR #1754 (SPEC §85)

Base: `feat/spec-85-p2e-findings`. The §86 amendment depends on §85 context (the P2-E run that surfaced §86). Will auto-rebase to main after #1754 lands.

## Refs

- PR #1742 (PMAT-690 P0-K base: apr_import + apr_convert stamping)
- PR #1750 (P3-A `apr inspect --quality` scorer: the diagnostic that surfaces §86, quality=40 pre-stamp, 60 post-stamp)
- PR #1754 (SPEC §85 P2-E findings: the run that surfaced §86)
- PR #1757 (apr stamp HF identity extension: workaround #2 above)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
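The silent fallback in item 1 of the walk-through can be illustrated with a minimal sketch. It assumes a loader that copies checkpoint tensors into trainer parameters by exact name and skips any parameter it cannot find. `Param`, `populate_from_init`, and `populate_or_fail` are hypothetical names for illustration, not the aprender-train API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a trainer parameter: a name plus its weights.
struct Param {
    name: String,
    data: Vec<f32>,
}

/// Copies checkpoint tensors into same-named trainer parameters and returns
/// how many parameters were populated. A parameter with no matching tensor
/// keeps its initial (random) values, which is the silent part of the defect.
fn populate_from_init(params: &mut [Param], checkpoint: &HashMap<String, Vec<f32>>) -> usize {
    let mut loaded = 0;
    for p in params.iter_mut() {
        if let Some(src) = checkpoint.get(&p.name) {
            if src.len() == p.data.len() {
                p.data.copy_from_slice(src);
                loaded += 1;
            }
        }
    }
    loaded
}

/// Fail-fast wrapper (an assumption, not the shipped fix): refuse to continue
/// when no parameter at all was populated from the checkpoint.
fn populate_or_fail(
    params: &mut [Param],
    checkpoint: &HashMap<String, Vec<f32>>,
) -> Result<usize, String> {
    let loaded = populate_from_init(params, checkpoint);
    if loaded == 0 && !params.is_empty() {
        return Err(format!(
            "--init matched 0 of {} parameters; refusing to train from random init",
            params.len()
        ));
    }
    Ok(loaded)
}
```

When the trainer's parameter names and the checkpoint's tensor names follow different conventions, `populate_from_init` returns 0 without any error, which matches the random-init val_loss seen at step 0.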
noahgift added a commit that referenced this pull request May 17, 2026
…H-MATCH-001 (SPEC §86.6 closure) (#1761)

Codifies the INV-INIT-ARCH-MATCH-001 invariant authored as runtime code in PR #1760 (`validate_init_arch_matches_tensor_evidence` in aprender-train::train::pretrain_real).

Adds:

- FALSIFY-INIT-ARCH-MATCH-001: integration falsifier bound to the unit-test family `cargo test -p aprender-train --lib inv_init_arch_match_001` (7 tests covering: canonical §86 reject, inverse reject, matching qwen2 accept, matching llama accept, None metadata skip, unmappable metadata skip, GGUF-unknown tensor skip).
- INV-INIT-ARCH-MATCH-001 proof_obligation: safety invariant. When both metadata.architecture and the tensor-name-inferred family resolve to concrete distinct slugs, the gate MUST fail fast before any training step. No false-positive when either side returns "unknown".

## Salvage path

The error message includes an inline `apr stamp` recipe (PR #1757):

```
apr stamp <pre-p0k.apr> --architecture qwen2 --hf-architecture Qwen2ForCausalLM \
    -o <stamped.apr>
apr pretrain --init <stamped.apr> ...
```

## Refs

- PR #1742 (PMAT-690 P0-K base: producer-side stamping)
- PR #1750 (P3-A `apr inspect --quality`: surfaces hf_identity=0/20 pre-stamp)
- PR #1754 (SPEC §85 P2-E findings: context)
- PR #1757 (apr stamp HF identity extension: salvage path)
- PR #1758 (SPEC §86 amendment: defect specification this contract closes)
- PR #1760 (INV-INIT-ARCH-MATCH-001 runtime implementation)
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
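The proof obligation above reduces to a small decision table. The following is a sketch under stated assumptions, not the shipped `validate_init_arch_matches_tensor_evidence`: `check_init_arch` is a hypothetical name, `claimed` is assumed to be `None` when the metadata is absent or unmappable, and `inferred` is assumed to be `"unknown"` when tensor names cannot disambiguate.

```rust
/// Hypothetical gate: decides whether `--init` may proceed.
/// `claimed`  = family slug from metadata (None = absent or unmappable).
/// `inferred` = family slug from tensor names ("unknown" = cannot tell).
fn check_init_arch(claimed: Option<&str>, inferred: &str) -> Result<(), String> {
    match claimed {
        // No usable metadata claim: nothing to contradict.
        None => Ok(()),
        // Tensor names are inconclusive: trust the metadata.
        Some(_) if inferred == "unknown" => Ok(()),
        // Both sides are concrete and agree.
        Some(c) if c == inferred => Ok(()),
        // Both sides are concrete and differ: fail before any training step.
        // (The real message also carries the `apr stamp` recipe.)
        Some(c) => Err(format!(
            "FALSIFY-INIT-ARCH-MATCH-001: metadata claims '{}' but tensor names imply '{}'",
            c, inferred
        )),
    }
}
```

Only the last arm rejects, so an inconclusive input on either side can never produce a false positive.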
noahgift added a commit that referenced this pull request May 18, 2026
…d --init (SPEC §86.6) (#1760)

Catches the §86 silent-failure pattern at the gate: when an APR's metadata `architecture` claim contradicts what its tensor names imply, `apr pretrain --init` exits non-zero with a clear naming-both-claims error and an inline `apr stamp` salvage recipe.

## Background: the §86 case this catches

P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. The init eval at step 0 produced val_loss = 8.60, which is 1.86× P2-E ep49's recorded 4.62. Silent failure: `--init` loaded random weights instead of the trained checkpoint.

Root cause walk-through:

1. `read_apr_architecture` parses `metadata.architecture = "LlamaForCausalLM"` (the §82 P0-H fallback when init_arch.hf_architecture is None).
2. `transformer_config_from_apr_metadata` builds a Llama-family TransformerConfig (dimensions correct, family discriminator wrong).
3. `populate_trainer_from_init_tensors` walks `trainer.named_parameters()`, which produces Llama-style names, and looks them up in the APR tensor map, which has Qwen2-style names. Mismatch → silent random-init fallback.
4. Training begins at random-init magnitude (val_loss ≈ 8.60).

This invariant catches step 1's wrong claim BEFORE step 3 silently falls through.

## What this adds

Three new public functions in `aprender-train::train::pretrain_real`:

- `family_from_tensor_names(names: impl IntoIterator<Item=&str>)` → `&'static str`: lightweight tensor-name-only family inference (no data needed). Returns one of qwen3 / qwen2 / llama / mamba / rwkv / gpt-neox / opt / bert / gpt2 / unknown. Mirrors the heavyweight `infer_architecture_from_names` in aprender-core::format::converter::tokenizer_loader.
- `normalize_metadata_arch_family(arch: &str)` → `Option<&'static str>`: maps all three forms of the metadata `architecture` field to a canonical family slug: HF class names ("Qwen2ForCausalLM"), family slugs ("qwen2"), and capitalised legacy ("Qwen2"). Returns None for "unknown" / unmappable strings; the caller treats that as "no claim".
- `validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors)` → `Result<(), String>`: the actual invariant gate. Errors with `FALSIFY-INIT-ARCH-MATCH-001` naming both the claimed and inferred families, plus an inline `apr stamp` recipe (PR #1757) for §86 salvage.

Wired into `build_shared_trainer_with_init` between `load_init_tensors_from_apr` and `populate_trainer_from_init_tensors`. The raw metadata `architecture` string is read via a new small helper (the `TransformerConfig`'s `hf_architecture` field is None for pre-P0-K APRs, which is the §86 case, so the cross-check needs the raw string field).

## Three skip-the-check fallback cases (no false-positives)

1. **No metadata claim** (metadata.architecture absent): nothing to contradict, allow.
2. **Unmappable claim** (e.g. "WeirdNovelArch"): novel arch is not §86, allow.
3. **Tensor inference returns "unknown"** (GGUF blk.* names can't disambiguate): trust the metadata, allow.

Only fail when BOTH inferences produce concrete family slugs AND they differ.

## Tests

- 7 new INV-INIT-ARCH-MATCH-001 tests in `pretrain_real::tests`:
  - `inv_init_arch_match_001_rejects_llama_stamped_qwen2_tensors`: canonical §86 case, must fail with falsifier ID + salvage recipe
  - `inv_init_arch_match_001_rejects_qwen2_stamped_llama_tensors`: inverse §86 case, must fail
  - `inv_init_arch_match_001_accepts_matching_qwen2/llama`: no false-positive on correctly-stamped APRs
  - `inv_init_arch_match_001_skips_when_metadata_absent`: None metadata
  - `inv_init_arch_match_001_skips_unmappable_metadata`: novel arch
  - `inv_init_arch_match_001_trusts_metadata_when_tensors_unknown`: GGUF blk.* case
- 1 helper test: `family_from_tensor_names_distinguishes_qwen2_from_llama`
- 1 normalizer test: `normalize_metadata_arch_family_handles_three_forms`

All 9 new tests pass. 7,595 existing aprender-train lib tests still pass (the 3 pre-existing prune::snapshot_tests failures are insta-snapshot drift in main, unrelated to this PR).

## Discharges

- §86.6 SPEC follow-up (forthcoming via #1758 stack)
- INV-INIT-ARCH-MATCH-001 invariant for `contracts/apr-pretrain-from-init-v1.yaml` (contract amendment is a separate small follow-up PR)

## Refs

- PR #1742 (PMAT-690 P0-K base: apr_convert + apr_import stamping)
- PR #1757 (apr stamp HF identity extension: the salvage path this invariant points operators to)
- PR #1758 (SPEC §86 amendment: context this invariant operationalizes)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
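The two helper functions described under "What this adds" might look roughly like the sketch below. It is an illustration, not the code in `pretrain_real`: the names `normalize_arch` and `family_from_names` are hypothetical, only three families are covered, and the tensor-name heuristic (Qwen3 has per-head q_norm weights, Qwen2 has a q_proj bias, Llama has neither) is an assumption based on the usual Hugging Face naming, which may differ from what aprender actually checks.

```rust
/// Hypothetical normalizer: maps an HF class name ("Qwen2ForCausalLM"),
/// a family slug ("qwen2"), or a capitalised legacy form ("Qwen2") to a slug.
/// Returns None for anything it cannot map, which callers read as "no claim".
fn normalize_arch(arch: &str) -> Option<&'static str> {
    let lower = arch.trim().to_ascii_lowercase();
    let stem = lower.strip_suffix("forcausallm").unwrap_or(lower.as_str());
    match stem {
        "qwen3" => Some("qwen3"),
        "qwen2" => Some("qwen2"),
        "llama" => Some("llama"),
        _ => None,
    }
}

/// Hypothetical name-only family inference. Assumed heuristic, checked in
/// priority order: q_norm weights imply qwen3, a q_proj bias implies qwen2,
/// a bare q_proj weight implies llama, anything else is "unknown".
fn family_from_names<'a>(names: impl IntoIterator<Item = &'a str>) -> &'static str {
    let (mut qk_norm, mut q_bias, mut q_weight) = (false, false, false);
    for n in names {
        qk_norm |= n.ends_with("self_attn.q_norm.weight");
        q_bias |= n.ends_with("self_attn.q_proj.bias");
        q_weight |= n.ends_with("self_attn.q_proj.weight");
    }
    if qk_norm {
        "qwen3"
    } else if q_bias {
        "qwen2"
    } else if q_weight {
        "llama"
    } else {
        "unknown"
    }
}
```

Under these assumptions, GGUF-style `blk.*` names match none of the suffixes and fall through to `"unknown"`, which is the case where the gate trusts the metadata.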
## Summary

Documents the §86 finding surfaced by P2-G v1's failed dispatch: pre-P0-K APR checkpoints are silently non-resumable via `apr pretrain --init` due to the §82 P0-H "LlamaForCausalLM" fallback stamp colliding with Qwen2 tensor names. Symptoms manifest as val_loss = 8.60 at init eval instead of the expected ≈ 4.62, i.e. a 1.86× wrong starting point.

## What §86 covers

- … `read_apr_architecture` → wrong family discriminator → `populate_trainer_from_init_tensors` parameter-name mismatch → silent random-init fallback
- … `apr stamp` invocation that brings hf_identity sub-score from 0/20 → 20/20
- … `apr-pretrain-from-init-v1` contract (defer to follow-up PR)

Plus the `evidence/p2g-2026-05-17/section-86-draft.md` source: the raw analysis that this spec section formalizes.

## Stacked on #1754

Base: `feat/spec-85-p2e-findings` (PR #1754, SPEC §85 P2-E findings). §86 depends on §85's P2-E context.

Will auto-rebase to `main` once #1754 lands.

## Test plan

- `grep -n "^## §86" docs/specifications/aprender-train/ship-model-2-spec.md`: §86 section present at line 2310
- `wc -l evidence/p2g-2026-05-17/section-86-draft.md`: 58 lines, full draft preserved

## Refs

- … `apr inspect --quality`: the diagnostic that surfaces §86)
- `evidence/p2g-2026-05-17/section-86-draft.md`
- `memory/feedback_upstream_metadata_masquerade.md` (methodology #33)

🤖 Generated with Claude Code