docs(spec): §86 — apr pretrain --init mismatch defect + #1757 salvage workflow#1758

Merged
noahgift merged 1 commit into feat/spec-85-p2e-findings from feat/spec-86-init-mismatch on May 17, 2026
Summary

Documents the §86 finding surfaced by P2-G v1's failed dispatch: pre-P0-K APR checkpoints are silently non-resumable via `apr pretrain --init` because the §82 P0-H "LlamaForCausalLM" fallback stamp collides with Qwen2 tensor names. The symptom is val_loss = 8.60 at the init eval instead of the expected ≈ 4.62, i.e. a starting loss 1.86× too high.

What §86 covers

  1. Root cause walk-through: read_apr_architecture → wrong family discriminator → populate_trainer_from_init_tensors parameter-name mismatch → silent random-init fallback
  2. Implications: all ~125 GB of checkpoints trained before #1742 (feat(apr-convert): stamp hf_architecture/hf_model_type from config.json, PMAT-690 P0-K), i.e. the 50 P2-E epochs, are non-resumable
  3. Three workarounds in priority order:
    • Re-import (blocked on HF safetensors locally)
    • Restamp in-place ✅ SHIPPED via PR #1757
    • Treat as final (what P2-G v2 takes — currently in flight)
  4. Operator recipe — 3-line apr stamp invocation that brings hf_identity sub-score from 0/20 → 20/20
  5. Failure-mode classification — Class 4 Silent Incorrect Behavior, detection latency 1 epoch
  6. Recommended follow-up — INV-INIT-ARCH-MATCH-001 invariant on apr-pretrain-from-init-v1 contract (defer to follow-up PR)

Plus the evidence/p2g-2026-05-17/section-86-draft.md source — the raw analysis that this spec section formalizes.

Stacked on #1754

Base: feat/spec-85-p2e-findings (PR #1754 — SPEC §85 P2-E findings). §86 depends on §85's P2-E context.

Will auto-rebase to main once #1754 lands.


…matched APRs; PR #1757 ships in-place stamp salvage

P2-G v1 dispatch surfaced a SECOND symptom of the §81-§84 cascade root
cause: pre-P0-K APR checkpoints (architecture="LlamaForCausalLM" P0-H
fallback + Qwen2-tensor shape) are silently non-resumable via
`apr pretrain --init`. The init eval at step 0 produced val_loss=8.60
instead of P2-E ep49's recorded 4.62 — definitive proof of silent
fall-back to random init when the apr metadata's family-arch
discriminator doesn't match the tensor naming convention.

## What §86 covers

1. Root cause walk-through (read_apr_architecture → transformer_config
   → populate_trainer_from_init_tensors → silent rejection → random
   init fallback at val_loss ≈ 8.60).
2. Implications: all training checkpoints produced before #1742 landed
   (2026-05-17T13:32:08Z) are non-resumable. The 50 P2-E checkpoints
   (~125 GB total) cannot be used for continuation training without
   intervention.
3. Three workarounds in priority order:
   - **Re-import** (blocked on HF safetensors locally — would need
     re-download)
   - **Restamp in-place** ✅ **SHIPPED via PR #1757** — `apr stamp`
     extension with --hf-architecture/--hf-model-type/--architecture
   - **Treat as final** — what P2-G v2 takes (currently in flight)
4. Operator recipe for the §86 salvage (3-line shell example).
5. Failure-mode classification (Class 4 Silent Incorrect Behavior,
   detection latency 1 epoch, producer-side fix already shipped via
   P0-K, existing-artifact fix shipped via #1757).
6. Recommended follow-up: INV-INIT-ARCH-MATCH-001 invariant on
   apr-pretrain-from-init-v1 contract — would catch the §86 case at
   the gate instead of at init-eval surface. Defer to follow-up PR.

## Stacked on PR #1754 (SPEC §85)

Base: `feat/spec-85-p2e-findings`. The §86 amendment depends on §85
context (the P2-E run that surfaced §86). Will auto-rebase to main
after #1754 lands.

## Refs

- PR #1742 (PMAT-690 P0-K base — apr_import + apr_convert stamping)
- PR #1750 (P3-A `apr inspect --quality` scorer — the diagnostic
  that surfaces §86 quality=40 pre-stamp, 60 post-stamp)
- PR #1754 (SPEC §85 P2-E findings — the run that surfaced §86)
- PR #1757 (apr stamp HF identity extension — workaround #2 above)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 7fcd4aa into feat/spec-85-p2e-findings May 17, 2026
1 check passed
@noahgift noahgift deleted the feat/spec-86-init-mismatch branch May 17, 2026 14:45
noahgift added a commit that referenced this pull request May 17, 2026
…H-MATCH-001 (SPEC §86.6 closure) (#1761)

Codifies the INV-INIT-ARCH-MATCH-001 invariant authored as runtime code
in PR #1760 (`validate_init_arch_matches_tensor_evidence` in
aprender-train::train::pretrain_real). Adds:

- FALSIFY-INIT-ARCH-MATCH-001: integration falsifier bound to the
  unit-test family `cargo test -p aprender-train --lib
  inv_init_arch_match_001` (7 tests covering: canonical §86 reject,
  inverse reject, matching qwen2 accept, matching llama accept, None
  metadata skip, unmappable metadata skip, GGUF-unknown tensor skip).
- INV-INIT-ARCH-MATCH-001 proof_obligation: safety invariant — when
  both metadata.architecture and tensor-name-inferred family resolve
  to concrete distinct slugs, gate MUST fail-fast before any training
  step. No false-positive when either side returns "unknown".

## Salvage path

The error message includes an inline `apr stamp` recipe (PR #1757):

```
apr stamp <pre-p0k.apr> --architecture qwen2 --hf-architecture Qwen2ForCausalLM \
                       -o <stamped.apr>
apr pretrain --init <stamped.apr> ...
```

## Refs

- PR #1742 (PMAT-690 P0-K base — producer-side stamping)
- PR #1750 (P3-A `apr inspect --quality` — surfaces hf_identity=0/20 pre-stamp)
- PR #1754 (SPEC §85 P2-E findings — context)
- PR #1757 (apr stamp HF identity extension — salvage path)
- PR #1758 (SPEC §86 amendment — defect specification this contract closes)
- PR #1760 (INV-INIT-ARCH-MATCH-001 runtime implementation)
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 18, 2026
…d --init (SPEC §86.6) (#1760)

Catches the §86 silent-failure pattern at the gate: when an APR's
metadata `architecture` claim contradicts what its tensor names imply,
`apr pretrain --init` exits non-zero with an error that names both
claims and includes an inline `apr stamp` salvage recipe.

## Background — the §86 case this catches

P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. The
init eval at step 0 produced val_loss = 8.60 — 1.86× P2-E ep49's
recorded 4.62. Silent failure: `--init` loaded random weights instead
of the trained checkpoint. Root cause walk-through:

1. `read_apr_architecture` parses `metadata.architecture = "LlamaForCausalLM"`
   (the §82 P0-H fallback when init_arch.hf_architecture is None).
2. `transformer_config_from_apr_metadata` builds a Llama-family
   TransformerConfig (dimensions correct, family discriminator wrong).
3. `populate_trainer_from_init_tensors` walks `trainer.named_parameters()` —
   produces Llama-style names — and looks them up in the APR tensor map
   which has Qwen2-style names. Mismatch → silent random-init fallback.
4. Training begins at random-init magnitude (val_loss ≈ 8.60).

This invariant catches step 1's wrong claim BEFORE step 3 silently
falls through.
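
The fall-through in step 3 can be illustrated with a minimal sketch. This is not the `populate_trainer_from_init_tensors` code; `count_matches` and `require_all_loaded` are hypothetical helpers, and the tensor names in the usage note are made-up placeholders standing in for the real Llama-style and Qwen2-style conventions. The sketch assumes the loader fills parameters by exact name lookup and only shows why a lookup that tolerates misses reports nothing when every name misses.

```rust
use std::collections::HashSet;

/// Counts how many trainer parameter names have a same-named tensor in the
/// checkpoint. Returns `(loaded, missed)`.
/// Hypothetical helper: it models the name lookup only, not the tensor copy.
pub fn count_matches(trainer_names: &[&str], checkpoint_names: &[&str]) -> (usize, usize) {
    let available: HashSet<&str> = checkpoint_names.iter().copied().collect();
    let loaded = trainer_names
        .iter()
        .filter(|name| available.contains(**name))
        .count();
    (loaded, trainer_names.len() - loaded)
}

/// Strict variant (also hypothetical): any missed parameter is an error, so a
/// total mismatch cannot fall through to random init unnoticed.
pub fn require_all_loaded(
    trainer_names: &[&str],
    checkpoint_names: &[&str],
) -> Result<usize, String> {
    let (loaded, missed) = count_matches(trainer_names, checkpoint_names);
    if missed > 0 {
        return Err(format!(
            "{missed} of {} parameters had no matching tensor in the checkpoint",
            trainer_names.len()
        ));
    }
    Ok(loaded)
}
```

With placeholder names such as `["llama_style.block0.weight"]` for the trainer and `["qwen2_style.block0.weight"]` for the checkpoint, `count_matches` returns zero loaded parameters without signalling anything, which is the shape of the §86 failure; `require_all_loaded` turns the same situation into an error.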

## What this adds

Three new public functions in `aprender-train::train::pretrain_real`:

- `family_from_tensor_names(names: impl IntoIterator<Item=&str>)`
  → `&'static str` — lightweight tensor-name-only family inference
  (no data needed). Returns one of qwen3 / qwen2 / llama / mamba /
  rwkv / gpt-neox / opt / bert / gpt2 / unknown. Mirrors the
  heavyweight `infer_architecture_from_names` in
  aprender-core::format::converter::tokenizer_loader.
- `normalize_metadata_arch_family(arch: &str)` → `Option<&'static str>`
  — maps all three forms of the metadata `architecture` field to a
  canonical family slug: HF class names ("Qwen2ForCausalLM"), family
  slugs ("qwen2"), and capitalised legacy ("Qwen2"). Returns None
  for "unknown" / unmappable strings — caller treats as "no claim".
- `validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors)`
  → `Result<(), String>` — the actual invariant gate. Errors with
  `FALSIFY-INIT-ARCH-MATCH-001` naming both the claimed and inferred
  families, plus an inline `apr stamp` recipe (PR #1757) for §86 salvage.

Wired into `build_shared_trainer_with_init` between `load_init_tensors_from_apr`
and `populate_trainer_from_init_tensors`. The gate reads the raw metadata
`architecture` string via a new small helper: the `TransformerConfig`'s
`hf_architecture` field is None for pre-P0-K APRs (the §86 case), so
the cross-check needs the raw string field.
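
As an illustration of the three metadata forms described above, here is a minimal sketch of a normalizer with the same shape as `normalize_metadata_arch_family`. The function name `normalize_arch_family_sketch`, the suffix-stripping approach, and the three-family table are assumptions for illustration only; the real function's mapping table is not shown in this PR.

```rust
/// Maps a metadata `architecture` string to a canonical family slug.
/// Hypothetical sketch: it handles HF class names ("Qwen2ForCausalLM"),
/// family slugs ("qwen2"), and capitalised legacy names ("Qwen2") for an
/// assumed, deliberately tiny set of families.
pub fn normalize_arch_family_sketch(arch: &str) -> Option<&'static str> {
    // Drop a trailing HF causal-LM class suffix if present.
    let stem = arch.strip_suffix("ForCausalLM").unwrap_or(arch);
    // Compare case-insensitively so "Qwen2" and "qwen2" agree.
    match stem.to_ascii_lowercase().as_str() {
        "qwen2" => Some("qwen2"),
        "qwen3" => Some("qwen3"),
        "llama" => Some("llama"),
        // "unknown" and anything unmappable count as "no claim".
        _ => None,
    }
}
```

Under these assumptions, `"Qwen2ForCausalLM"`, `"qwen2"`, and `"Qwen2"` all normalize to `Some("qwen2")`, while `"unknown"` and `"WeirdNovelArch"` yield `None`, which the caller treats as "no claim".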

## Three skip-the-check fallback cases (no false-positives)

1. **No metadata claim** (metadata.architecture absent): nothing to
   contradict, allow.
2. **Unmappable claim** (e.g. "WeirdNovelArch"): novel arch is not §86,
   allow.
3. **Tensor inference returns "unknown"** (GGUF blk.* names can't
   disambiguate): trust the metadata, allow.

Only fail when BOTH inferences produce concrete family slugs AND they differ.
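
The decision rule above can be sketched as a small function. This is not the `validate_init_arch_matches_tensor_evidence` implementation: `check_arch_claim` is a hypothetical name, it takes the already-normalized claim (with `None` covering both the absent and the unmappable case) and the already-inferred tensor family as inputs, and its error text is illustrative apart from the falsifier ID quoted in this PR.

```rust
/// Fails only when the metadata claim and the tensor-name evidence both
/// resolve to concrete family slugs and those slugs differ.
/// Hypothetical sketch of the gate's decision logic.
pub fn check_arch_claim(
    claimed_family: Option<&str>,
    inferred_family: &str,
) -> Result<(), String> {
    match claimed_family {
        // Cases 1 and 2: no claim, or a claim that maps to no known family.
        None => Ok(()),
        // Case 3: tensor names cannot disambiguate, so trust the metadata.
        Some(_) if inferred_family == "unknown" => Ok(()),
        // Both sides are concrete and agree.
        Some(claimed) if claimed == inferred_family => Ok(()),
        // Both sides are concrete and disagree: refuse before any training step.
        Some(claimed) => Err(format!(
            "FALSIFY-INIT-ARCH-MATCH-001: metadata claims '{claimed}' \
             but tensor names imply '{inferred_family}'"
        )),
    }
}
```

In this sketch the canonical §86 input, `check_arch_claim(Some("llama"), "qwen2")`, returns an error naming both families, while `check_arch_claim(Some("llama"), "unknown")` and `check_arch_claim(None, "qwen2")` are allowed through.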

## Tests

- 7 new INV-INIT-ARCH-MATCH-001 tests in `pretrain_real::tests`:
  - `inv_init_arch_match_001_rejects_llama_stamped_qwen2_tensors` —
    canonical §86 case, must fail with falsifier ID + salvage recipe
  - `inv_init_arch_match_001_rejects_qwen2_stamped_llama_tensors` —
    inverse §86 case, must fail
  - `inv_init_arch_match_001_accepts_matching_qwen2/llama` — no
    false-positive on correctly-stamped APRs
  - `inv_init_arch_match_001_skips_when_metadata_absent` — None metadata
  - `inv_init_arch_match_001_skips_unmappable_metadata` — novel arch
  - `inv_init_arch_match_001_trusts_metadata_when_tensors_unknown` —
    GGUF blk.* case
- 1 helper test: `family_from_tensor_names_distinguishes_qwen2_from_llama`
- 1 normalizer test: `normalize_metadata_arch_family_handles_three_forms`

All 9 new tests pass. 7,595 existing aprender-train lib tests still pass
(the 3 pre-existing prune::snapshot_tests failures are insta-snapshot
drift in main, unrelated to this PR).

## Discharges

- §86.6 SPEC follow-up (forthcoming via #1758 stack)
- INV-INIT-ARCH-MATCH-001 invariant for `contracts/apr-pretrain-from-init-v1.yaml`
  (contract amendment is a separate small follow-up PR)

## Refs

- PR #1742 (PMAT-690 P0-K base — apr_convert + apr_import stamping)
- PR #1757 (apr stamp HF identity extension — the salvage path this
  invariant points operators to)
- PR #1758 (SPEC §86 amendment — context this invariant operationalizes)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>