feat(apr-stamp): --hf-architecture/--hf-model-type/--architecture (PMAT-690 P0-K extension) #1757

Merged
noahgift merged 2 commits into main from feat/apr-stamp-hf-identity on May 17, 2026
@noahgift

Summary

Extends apr stamp to patch HF identity + architecture family slug in place. Unblocks salvage of pre-P0-K APR checkpoints whose architecture stamps were corrupted by the §82 P0-H fallback.

SPEC §86 root cause

The §85 P2-E live run produced 50 epoch checkpoints (~125 GB) at best val_loss=4.6227. P2-G v1 attempted to resume from P2-E ep49, but the init eval surfaced val_loss=8.60, proving that --init silently failed to load the trained weights. Root cause: P2-E's init APR pre-dates P0-K, so the §82 P0-H fallback stamped architecture="LlamaForCausalLM" into the trained checkpoint despite Qwen2 tensors. apr pretrain --init reads the wrong architecture stamp and rejects the load.

What this PR adds

Three new CLI flags on apr stamp:

  • --hf-architecture <CLASS> (e.g. Qwen2ForCausalLM) — the HF class name. PMAT-690 P0-K's upstream stamp.
  • --hf-model-type <SLUG> (e.g. qwen2) — config.json::model_type.
  • --architecture <SLUG> (e.g. qwen2) — the lowercase family slug that apr pretrain --init reads for arch dispatch. This is the load-bearing field for §86 salvage — patching just hf_architecture alone won't unblock apr pretrain --init.

The existing --license / --data-source / --data-license are unchanged.

Operator workflow for §86 salvage

# Patch a pre-P0-K Qwen2-actual-Llama-stamped checkpoint in place
apr stamp /path/to/p2e-epoch-049.apr \
  --architecture qwen2 \
  --hf-architecture Qwen2ForCausalLM \
  --hf-model-type qwen2 \
  -o /path/to/p2e-epoch-049-stamped.apr

# Verify quality scorer jump
apr inspect /path/to/p2e-epoch-049-stamped.apr --quality --json | jq .quality
# breakdown.hf_identity: 0 → 20

# Now usable as resume init:
apr pretrain --init /path/to/p2e-epoch-049-stamped.apr ...

Test plan

  • 2 new unit tests in aprender-core::format::v2::stamp: ProvenancePatch round-trips the three new fields. 6/6 pass.
  • 2 new CLI tests in apr-cli::commands::stamp:
    • stamp_p0k_recovers_pre_p0k_apr_identity — full §86 use case (Llama-stamped + Qwen2-tensor APR → patched to qwen2)
    • stamp_p0k_partial_hf_architecture_only — field independence (patch one without touching others)
    • 7/7 pass.
  • cargo test -p apr-cli --features training --lib — 5,944 tests, 0 regressions.
  • cargo test -p aprender-core --lib format::v2::stamp — 6 tests, 0 regressions.

Discharges

  • §86 SPEC amendment (evidence/p2g-2026-05-17/section-86-draft.md) workaround #2 (in-place restamp)
  • Salvages ~125 GB of pre-P0-K P2-E checkpoints without a 53-min retrain
  • Establishes a pattern for in-place metadata patching that future spec amendments can build on

Refs

🤖 Generated with Claude Code

…AT-690 P0-K extension)

Extends the existing `apr stamp` command (PR #1050 — provenance fields)
to also patch HF identity + architecture family slug in place. Unblocks
in-place salvage of pre-P0-K APR checkpoints whose architecture stamps
were corrupted by the §82 P0-H fallback.

## Background — SPEC §86 root cause

The §85 P2-E live run produced 50 epoch checkpoints (~125 GB total) at
best val_loss=4.6227. P2-G v1 attempted to resume from P2-E ep49 and
the init eval surfaced val_loss=8.60 — proof that --init silently
failed to load the trained weights. Root cause: P2-E's init APR
pre-dates P0-K (PR #1742), so the P0-H fallback stamped
architecture="LlamaForCausalLM" into the trained checkpoint despite
the actual tensors being Qwen2-shaped. `apr pretrain --init` reads
the (wrong) architecture stamp and rejects the load.

## What this PR adds

Three new CLI flags on `apr stamp`:

- `--hf-architecture <CLASS>` (e.g. Qwen2ForCausalLM) — the HF class
  name. PMAT-690 P0-K's upstream stamp.
- `--hf-model-type <SLUG>` (e.g. qwen2) — config.json::model_type.
- `--architecture <SLUG>` (e.g. qwen2) — the lowercase family slug
  that `apr pretrain --init` reads for arch dispatch. **This is the
  load-bearing field** for §86 salvage — patching just hf_architecture
  alone won't unblock `apr pretrain --init`.

The existing `--license` / `--data-source` / `--data-license` flags
are unchanged. The patch struct's `has_any()` gate now accepts any
combination of the six fields; at least one must be specified or the
stamp is rejected up-front.
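
The up-front gate described above can be sketched as follows. This is a minimal illustration, not the in-tree code: `PatchSketch` and `check_patch` are hypothetical names, and the real `ProvenancePatch` may use different field names, types, and error text.

```rust
/// Hypothetical mirror of the patch struct described above; the real
/// `ProvenancePatch` in aprender-core may differ in names and types.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PatchSketch {
    pub license: Option<String>,
    pub data_source: Option<String>,
    pub data_license: Option<String>,
    pub hf_architecture: Option<String>,
    pub hf_model_type: Option<String>,
    pub architecture: Option<String>,
}

impl PatchSketch {
    /// True when at least one of the six fields is set.
    pub fn has_any(&self) -> bool {
        [
            &self.license,
            &self.data_source,
            &self.data_license,
            &self.hf_architecture,
            &self.hf_model_type,
            &self.architecture,
        ]
        .iter()
        .any(|field| field.is_some())
    }
}

/// Reject an empty patch before touching the file (error text is illustrative).
pub fn check_patch(patch: &PatchSketch) -> Result<(), String> {
    if patch.has_any() {
        Ok(())
    } else {
        Err("no fields to stamp: pass at least one flag".to_string())
    }
}
```

Keeping every field an `Option` is what makes the fields independent: a patch that sets only `hf_architecture` leaves the other five untouched.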

## Operator workflow for §86 salvage

```bash
# Patch a pre-P0-K Qwen2-actual-Llama-stamped checkpoint in place
apr stamp /path/to/p2e-epoch-049.apr \
  --architecture qwen2 \
  --hf-architecture Qwen2ForCausalLM \
  --hf-model-type qwen2 \
  -o /path/to/p2e-epoch-049-stamped.apr

# Verify
apr inspect /path/to/p2e-epoch-049-stamped.apr --quality --json | jq .quality
# breakdown.hf_identity should jump 0 → 20

# Now usable as init for resume training:
apr pretrain --init /path/to/p2e-epoch-049-stamped.apr ...
```

## Discharges

- §86 SPEC amendment (`evidence/p2g-2026-05-17/section-86-draft.md`) —
  workaround #2 (in-place restamp)
- Salvages ~125 GB of pre-P0-K P2-E checkpoints without a 53-min retrain
- Establishes a pattern for in-place metadata patching that future
  spec amendments can build on (e.g., a `--name` / `--description`
  extension for model card metadata)

## Tests

- 2 new unit tests in `aprender-core::format::v2::stamp` (extends 4
  → 6, existing tests adjusted for new struct fields via Default)
- 2 new CLI tests in `apr-cli::commands::stamp` (extends 5 → 7):
  - `stamp_p0k_recovers_pre_p0k_apr_identity` — full §86 use case
  - `stamp_p0k_partial_hf_architecture_only` — verifies field
    independence (stamp one without touching others)
- All 5,944 apr-cli lib tests pass — 0 regressions
- All 13,800 aprender-core lib tests pass — 0 regressions

## Refs

- PR [#1742](#1742) (PMAT-690 P0-K base)
- PR [#1750](#1750) (P3-A `apr inspect --quality` scorer)
- PR [#1754](#1754) (SPEC §85 P2-E findings)
- PR #1050 (the original `apr stamp` PR — this extends it)
- `docs/specifications/aprender-train/ship-model-2-spec.md §86` (forthcoming)
- `evidence/p2g-2026-05-17/section-86-draft.md` (root cause + workaround analysis)
- `memory/feedback_upstream_metadata_masquerade.md` (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit af058b1 into main May 17, 2026
10 checks passed
@noahgift noahgift deleted the feat/apr-stamp-hf-identity branch May 17, 2026 15:50
noahgift added a commit that referenced this pull request May 17, 2026
…H-MATCH-001 (SPEC §86.6 closure) (#1761)

Codifies the INV-INIT-ARCH-MATCH-001 invariant authored as runtime code
in PR #1760 (`validate_init_arch_matches_tensor_evidence` in
aprender-train::train::pretrain_real). Adds:

- FALSIFY-INIT-ARCH-MATCH-001: integration falsifier bound to the
  unit-test family `cargo test -p aprender-train --lib
  inv_init_arch_match_001` (7 tests covering: canonical §86 reject,
  inverse reject, matching qwen2 accept, matching llama accept, None
  metadata skip, unmappable metadata skip, GGUF-unknown tensor skip).
- INV-INIT-ARCH-MATCH-001 proof_obligation: safety invariant — when
  both metadata.architecture and tensor-name-inferred family resolve
  to concrete distinct slugs, gate MUST fail-fast before any training
  step. No false-positive when either side returns "unknown".

## Salvage path

The error message includes an inline `apr stamp` recipe (PR #1757):

```
apr stamp <pre-p0k.apr> --architecture qwen2 --hf-architecture Qwen2ForCausalLM \
                       -o <stamped.apr>
apr pretrain --init <stamped.apr> ...
```

## Refs

- PR #1742 (PMAT-690 P0-K base — producer-side stamping)
- PR #1750 (P3-A `apr inspect --quality` — surfaces hf_identity=0/20 pre-stamp)
- PR #1754 (SPEC §85 P2-E findings — context)
- PR #1757 (apr stamp HF identity extension — salvage path)
- PR #1758 (SPEC §86 amendment — defect specification this contract closes)
- PR #1760 (INV-INIT-ARCH-MATCH-001 runtime implementation)
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 17, 2026
…C-prep defect 1) (#1769)

Closes Defect 1 surfaced by the §86 publish-readiness preflight on
P2-E ep49: pre-P0-K APRs trained from inits without embedded tokenizers
fail `apr run` with PMAT-172 ("APR file missing embedded tokenizer").
Without this fix, the §86 salvage produces a 6 GB HF-publish-ready
directory whose headline command doesn't work.

## What ships

- `ProvenancePatch` gains three optional fields:
  - `tokenizer_vocab: Option<Vec<String>>` — token strings indexed by id
  - `tokenizer_merges: Option<Vec<String>>` — BPE merge rules
  - `tokenizer_model_type: Option<String>` — e.g. "BPE", "Unigram"
- `stamp_provenance_bytes` extended to write these into
  `metadata.custom["tokenizer.vocabulary"]` / `tokenizer.merges` /
  `tokenizer.model_type` AND set the HAS_VOCAB header flag (the
  load-bearing check in `apr run`'s PMAT-172 gate).
- `apr stamp` CLI gains `--tokenizer <DIR>` flag. Accepts either:
  - `<dir>/vocab.json + <dir>/merges.txt` (HF GPT-2/Qwen BPE format,
    the Qwen-coder pretrain default)
  - `<dir>/tokenizer.json` (HF unified format)
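
A rough sketch of how the two accepted directory layouts could be told apart. `TokenizerLayout` and `detect_layout` are hypothetical names, the function works on a list of file names rather than a real directory so it stays self-contained, and the precedence when both layouts are present is an assumption not stated in this PR.

```rust
/// The two on-disk layouts named above.
#[derive(Debug, PartialEq)]
pub enum TokenizerLayout {
    /// `vocab.json` + `merges.txt`
    VocabAndMerges,
    /// `tokenizer.json`
    Unified,
}

/// Decide the layout from the file names present in the directory.
/// ASSUMPTION: `tokenizer.json` wins when both layouts are present.
pub fn detect_layout(files: &[&str]) -> Result<TokenizerLayout, String> {
    let has = |name: &str| files.iter().any(|f| *f == name);
    if has("tokenizer.json") {
        Ok(TokenizerLayout::Unified)
    } else if has("vocab.json") && has("merges.txt") {
        Ok(TokenizerLayout::VocabAndMerges)
    } else {
        // Error wording is illustrative, modelled on the message quoted in Tests.
        Err("neither tokenizer.json nor vocab.json + merges.txt found".to_string())
    }
}
```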

## Operator workflow post-this-PR

```bash
apr stamp /mnt/.../p2e-epoch-049.apr \
    --architecture qwen2 \
    --hf-architecture Qwen2ForCausalLM \
    --hf-model-type qwen2 \
    --license Apache-2.0 \
    --data-source "..." \
    --data-license "Apache-2.0 / permissive-aggregate" \
    --tokenizer /mnt/nvme-raid0/tokenizers/qwen-0.5b-tokenizer-v3/ \
    -o /tmp/albor-370m-v1.apr
# Resulting APR is self-contained: apr run works without --tokenizer flag
apr run /tmp/albor-370m-v1.apr "def fibonacci(n):" --max-tokens 32
```

## Tests

- 3 new CLI tests in `apr-cli::commands::stamp::tests`:
  - `stamp_p3c_defect1_embeds_tokenizer_from_vocab_merges` — full
    happy path: vocab.json + merges.txt → embedded vocab array +
    merges array + HAS_VOCAB flag + BPE model_type
  - `stamp_p3c_defect1_tokenizer_alone_passes_has_any_gate` —
    --tokenizer alone (no other patches) satisfies has_any()
  - `stamp_p3c_defect1_tokenizer_dir_without_files_errors` —
    empty dir surfaces clear "neither tokenizer.json nor vocab.json"
- 10/10 stamp tests pass (3 new + 7 existing updated for the
  new `tokenizer_dir: Option<&Path>` arg slot)
- aprender-core stamp.rs tests: 6/6 pass (existing literals updated
  for the 3 new ProvenancePatch fields)

## Refs

- PR #1742 (PMAT-690 P0-K base — upstream stamping)
- PR #1750 (P3-A apr inspect --quality — the diagnostic that surfaces
  hf_identity=0/20 + tokenizer=0/15 pre-stamp)
- PR #1757 (apr stamp HF identity extension — this PR extends it)
- evidence/p2e-2026-05-17/ (the run this defect was surfaced on)
- memory/feedback_publish_readiness_preflight.md (#37)
- PMAT-172 (the gate that motivates this fix)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 18, 2026
…t (P3-C prep) (#1764)

Author the HuggingFace model card for `paiml/albor-370m-v1` and the
publish-readiness pre-flight script. Per SPEC §88: this model is
shipped as a stack-existence-proof, not a production code-completion
model. Both artifacts make that framing explicit so HF Hub users
calibrate expectations correctly.

## docs/model-cards/albor-370m-v1.md (255 lines)

Standard HF model card with model-index frontmatter:

- YAML metadata: Apache-2.0, code/python/stack-existence-proof tags,
  Qwen2.5-Coder-0.5B-Instruct base, codeparrot + the-stack-dedup
  datasets, val_loss=4.6227 / val_perplexity=101.78 metrics.
- §88 framing section spelling out the stack-existence-proof purpose.
- Training procedure table (architecture, optimizer, LR schedule,
  hardware, wall time, throughput — all from the §85 P2-E run).
- Trajectory table (every 5 epochs from 7.43 → 4.62).
- Intended uses (✅ stack demos, infra validation, tokenization
  round-trip, quantization research) vs NOT-recommended uses
  (production code-LM, zero-shot reasoning, long-context, HumanEval
  submission).
- Limitations (compute-bounded, plateau evidence, init lineage,
  val drift).
- Training data table (sources, sizes, licenses, role).
- How-to-use code snippets (apr CLI, Rust direct load, format export).
- Reproduce-the-run shell example using the exact §85 P2-E recipe.
- Citation, license/provenance, acknowledgments.
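
The `val_perplexity` figure in the YAML metadata appears to follow the usual definition, the exponential of the mean cross-entropy loss. A one-line sketch, assuming the loss is measured in nats (`perplexity` is an illustrative helper, not a function from this repository):

```rust
/// Perplexity as the exponential of a mean cross-entropy loss in nats.
pub fn perplexity(val_loss: f64) -> f64 {
    val_loss.exp()
}
```

`perplexity(4.6227)` comes out near 101.77, which agrees with the quoted 101.78 to within rounding of the loss.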

## scripts/publish/albor-370m-publish-readiness.sh (182 lines)

7-gate pre-flight checklist. GO / NO-GO verdict before invoking
`apr publish`. Gates:

1. `apr validate` exits 0
2. `apr inspect --quality` ≥ 90 (P3-A scorer; surfaces §86 salvage
   recipe inline if hf_identity < 20 or provenance < 25)
3. `apr qa --json` verdict = GO (8 gates)
4. Model card present + has HF YAML frontmatter
5. HF_TOKEN set
6. Smoke `apr run` produces text-like output
7. GGUF Q4_K + SafeTensors export round-trip both succeed

Exit 0 = ready to publish. Exit 1 = NO-GO with explicit blocker list.
Bashrs-validated (1 SEC011 false-positive on multi-condition rm -rf
guard; functionally safe).
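
The GO / NO-GO aggregation can be pictured as below. The script itself is Bash; this Rust sketch only illustrates the verdict logic, and `readiness_verdict` plus the NO-GO message format are assumptions.

```rust
/// Collapse per-gate pass/fail results into an exit code and a verdict line.
/// Gate names are free-form labels supplied by the caller.
pub fn readiness_verdict(gates: &[(&str, bool)]) -> (i32, String) {
    // Every failed gate becomes a named blocker.
    let blockers: Vec<&str> = gates
        .iter()
        .filter(|(_, passed)| !*passed)
        .map(|(name, _)| *name)
        .collect();
    if blockers.is_empty() {
        (0, "VERDICT: GO".to_string())
    } else {
        (1, format!("VERDICT: NO-GO (blockers: {})", blockers.join(", ")))
    }
}
```

Collecting all blockers before deciding, rather than exiting on the first failure, is what lets a single run report the full blocker list.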

## What this PR does NOT do

- Does NOT invoke `apr publish` (external action; requires user OK)
- Does NOT touch any APR files (read-only checks)
- Does NOT modify the §85 P2-E ep49 checkpoint (operator runs
  `apr stamp` via the §86.4 salvage recipe separately)

## Operator workflow (post-PR landing)

```bash
# 1. Stamp the pre-P0-K P2-E ep49 checkpoint to bring hf_identity up
apr stamp /mnt/nvme-raid0/runs/model-2-p2e-tuned-hp-20260517/ckpt/epoch-049.apr \
    --architecture qwen2 \
    --hf-architecture Qwen2ForCausalLM \
    --hf-model-type qwen2 \
    --license Apache-2.0 \
    --data-source "huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct + bigcode/the-stack-dedup + codeparrot/codeparrot-clean" \
    --data-license "Apache-2.0 / permissive-aggregate" \
    -o /tmp/albor-370m-v1.apr

# 2. Run the readiness check
bash scripts/publish/albor-370m-publish-readiness.sh /tmp/albor-370m-v1.apr
# Expected output: "VERDICT: GO" (or NO-GO with explicit blocker list)

# 3. Publish (still requires explicit user invocation)
apr publish paiml/albor-370m-v1 --formats apr,safetensors,gguf \
    --model-card docs/model-cards/albor-370m-v1.md
```

## Refs

- PR #1742 (PMAT-690 P0-K — upstream stamping)
- PR #1750 (P3-A `apr inspect --quality` — gate 2)
- PR #1754 (SPEC §84+§85+§86+§87+§88 stack — context)
- PR #1757 (apr stamp HF identity extension — §86 salvage)
- PR #1760 (INV-INIT-ARCH-MATCH-001 — validation chain)
- docs/specifications/aprender-train/ship-model-2-spec.md §88
- docs/specifications/aprender-train/albor-370m-roadmap.md §4 P3-C

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 18, 2026
…d --init (SPEC §86.6) (#1760)

Catches the §86 silent-failure pattern at the gate: when an APR's
metadata `architecture` claim contradicts what its tensor names imply,
`apr pretrain --init` exits non-zero with a clear naming-both-claims
error and an inline `apr stamp` salvage recipe.

## Background — the §86 case this catches

P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. The
init eval at step 0 produced val_loss = 8.60 — 1.86× P2-E ep49's
recorded 4.62. Silent failure: `--init` loaded random weights instead
of the trained checkpoint. Root cause walk-through:

1. `read_apr_architecture` parses `metadata.architecture = "LlamaForCausalLM"`
   (the §82 P0-H fallback when init_arch.hf_architecture is None).
2. `transformer_config_from_apr_metadata` builds a Llama-family
   TransformerConfig (dimensions correct, family discriminator wrong).
3. `populate_trainer_from_init_tensors` walks `trainer.named_parameters()` —
   produces Llama-style names — and looks them up in the APR tensor map
   which has Qwen2-style names. Mismatch → silent random-init fallback.
4. Training begins at random-init magnitude (val_loss ≈ 8.60).

This invariant catches step 1's wrong claim BEFORE step 3 silently
falls through.
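
Step 3 is the silent part. A simplified sketch of a by-name populate loop shows why a naming mismatch leaves every parameter at its initial value without raising an error. `populate_by_name` is a hypothetical stand-in for `populate_trainer_from_init_tensors`, whose real signature is not shown in this PR.

```rust
use std::collections::HashMap;

/// Copy init tensors into trainer parameters by name and report how many matched.
/// Parameters with no match keep their existing (random-init) values, which is
/// the silent fall-through described in step 3.
pub fn populate_by_name(
    params: &mut HashMap<String, Vec<f32>>,
    init: &HashMap<String, Vec<f32>>,
) -> usize {
    let mut loaded = 0;
    for (name, value) in params.iter_mut() {
        if let Some(src) = init.get(name) {
            *value = src.clone();
            loaded += 1;
        }
    }
    loaded
}
```

When the two naming conventions share no keys, the returned count is 0 and nothing fails, so the first visible symptom is the init-eval loss.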

## What this adds

Three new public functions in `aprender-train::train::pretrain_real`:

- `family_from_tensor_names(names: impl IntoIterator<Item=&str>)`
  → `&'static str` — lightweight tensor-name-only family inference
  (no data needed). Returns one of qwen3 / qwen2 / llama / mamba /
  rwkv / gpt-neox / opt / bert / gpt2 / unknown. Mirrors the
  heavyweight `infer_architecture_from_names` in
  aprender-core::format::converter::tokenizer_loader.
- `normalize_metadata_arch_family(arch: &str)` → `Option<&'static str>`
  — maps all three forms of the metadata `architecture` field to a
  canonical family slug: HF class names ("Qwen2ForCausalLM"), family
  slugs ("qwen2"), and capitalised legacy ("Qwen2"). Returns None
  for "unknown" / unmappable strings — caller treats as "no claim".
- `validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors)`
  → `Result<(), String>` — the actual invariant gate. Errors with
  `FALSIFY-INIT-ARCH-MATCH-001` naming both the claimed and inferred
  families, plus an inline `apr stamp` recipe (PR #1757) for §86 salvage.

Wired into `build_shared_trainer_with_init` between `load_init_tensors_from_apr`
and `populate_trainer_from_init_tensors`. Read the raw metadata
`architecture` string via a new small helper (the `TransformerConfig`'s
`hf_architecture` field is None for pre-P0-K APRs — the §86 case — so
the cross-check needs the raw string field).
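
For illustration, the three-form normalization might look like the sketch below. `normalize_arch_sketch` is a hypothetical stand-in covering only two families; the capitalised "Llama" spelling is an assumption by analogy with "Qwen2".

```rust
/// Map the three documented spellings onto one family slug.
/// Only two families are covered here; the real function handles more.
pub fn normalize_arch_sketch(arch: &str) -> Option<&'static str> {
    match arch {
        // HF class name, family slug, capitalised legacy form.
        "Qwen2ForCausalLM" | "qwen2" | "Qwen2" => Some("qwen2"),
        "LlamaForCausalLM" | "llama" | "Llama" => Some("llama"),
        // "unknown" and anything unmappable count as "no claim".
        _ => None,
    }
}
```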

## Three skip-the-check fallback cases (no false-positives)

1. **No metadata claim** (metadata.architecture absent): nothing to
   contradict, allow.
2. **Unmappable claim** (e.g. "WeirdNovelArch"): novel arch is not §86,
   allow.
3. **Tensor inference returns "unknown"** (GGUF blk.* names can't
   disambiguate): trust the metadata, allow.

Only fail when BOTH inferences produce concrete family slugs AND they differ.
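
The decision table above reduces to a small match. This is a sketch under stated assumptions: `check_init_arch` is a hypothetical name, it takes an already-normalized claim and an already-inferred family instead of raw metadata and tensors, and the error text is illustrative apart from the falsifier ID.

```rust
/// `claimed` is the normalized metadata family (None = absent or unmappable);
/// `inferred` is the family guessed from tensor names ("unknown" = no evidence).
pub fn check_init_arch(claimed: Option<&str>, inferred: &str) -> Result<(), String> {
    match claimed {
        // Cases 1 and 2: no usable metadata claim, nothing to contradict.
        None => Ok(()),
        // Case 3: tensor names are inconclusive, so trust the metadata.
        Some(_) if inferred == "unknown" => Ok(()),
        // Both sides are concrete and agree.
        Some(family) if family == inferred => Ok(()),
        // Both sides are concrete and differ: fail before any training step.
        Some(family) => Err(format!(
            "FALSIFY-INIT-ARCH-MATCH-001: metadata claims '{}' but tensor names imply '{}'",
            family, inferred
        )),
    }
}
```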

## Tests

- 7 new INV-INIT-ARCH-MATCH-001 tests in `pretrain_real::tests`:
  - `inv_init_arch_match_001_rejects_llama_stamped_qwen2_tensors` —
    canonical §86 case, must fail with falsifier ID + salvage recipe
  - `inv_init_arch_match_001_rejects_qwen2_stamped_llama_tensors` —
    inverse §86 case, must fail
  - `inv_init_arch_match_001_accepts_matching_qwen2/llama` — no
    false-positive on correctly-stamped APRs
  - `inv_init_arch_match_001_skips_when_metadata_absent` — None metadata
  - `inv_init_arch_match_001_skips_unmappable_metadata` — novel arch
  - `inv_init_arch_match_001_trusts_metadata_when_tensors_unknown` —
    GGUF blk.* case
- 1 helper test: `family_from_tensor_names_distinguishes_qwen2_from_llama`
- 1 normalizer test: `normalize_metadata_arch_family_handles_three_forms`

All 9 new tests pass. 7,595 existing aprender-train lib tests still pass
(the 3 pre-existing prune::snapshot_tests failures are insta-snapshot
drift in main, unrelated to this PR).

## Discharges

- §86.6 SPEC follow-up (forthcoming via #1758 stack)
- INV-INIT-ARCH-MATCH-001 invariant for `contracts/apr-pretrain-from-init-v1.yaml`
  (contract amendment is a separate small follow-up PR)

## Refs

- PR #1742 (PMAT-690 P0-K base — apr_convert + apr_import stamping)
- PR #1757 (apr stamp HF identity extension — the salvage path this
  invariant points operators to)
- PR #1758 (SPEC §86 amendment — context this invariant operationalizes)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 18, 2026
…erified (#1754)

* docs(spec): SPEC §84+§85 — P2-C/P2-E live findings, hyperparameter hypothesis CORROBORATED, P0-K closure live-verified

Two new spec sections + full P2-E evidence directory.

## §84 — P2-C dispatched; audit hypothesis FALSIFIED; P0-K surfaced

P2-C ran the audit-recommended multi-source corpus (49.6B tokens, 80×
§82's 1.24B) at the same hyperparameters as §82. Result: val_loss=4.91
@ ep20 (vs §82's 4.71) — IDENTICAL termination shape, +0.2 WORSE despite
80× more data. The Chinchilla-data-starvation hypothesis is FALSIFIED.

Debugging the §81-§83 5-PR cascade surfaced PMAT-690 P0-K: `apr convert`
(both apr_import and apr_convert paths) didn't stamp hf_architecture /
hf_model_type / embedded tokenizer. Five downstream consumer fixes had
been patching None values that read from the upstream gap. P0-K closes
the producer.

## §85 — P2-E live findings; hyperparameter hypothesis CORROBORATED

P2-E ran the same qwen-v3 corpus at LR=1.5e-5 (3.3× lower) + warmup=500
(5× longer). Result: val_loss=4.6227 @ ep49 — BELOW §82's 4.71 AND
P2-C's 4.91 floors. No early-stop; smooth monotonic descent across all
50 epochs. Hypothesis from §84 P2-E queue is CORROBORATED.

Training throughput: 15,460 tok/s pure (12,880 tok/s end-to-end with
checkpoint write) on RTX 4090, sm_89, cuBLAS TF32. This is the
canonical apr-cli CUDA training perf baseline for future dispatches.

§30 a-priori falsification lesson amendment: the audit's
pre-falsification of P2-A2 was correct at the original LR but wrong
as a general claim. Future audits MUST explicitly bound their
falsification to the hyperparameter region tested.

## P0-K live-verification

Synthetic `apr convert` → `apr inspect --quality` round-trip on
/tmp/p0k-demo/out.apr (Qwen2 config.json + tiny safetensors fixture)
produces:
- metadata.hf_architecture = "Qwen2ForCausalLM" (was null pre-P0-K)
- metadata.hf_model_type = "qwen2" (was null pre-P0-K)
- quality.score = 60/100, hf_identity sub-score = 20/20

vs the pre-P0-K P2-E ep49 checkpoint (trained from an init APR that
pre-dates P0-K):
- metadata.hf_architecture = null
- quality.score = 40/100, hf_identity sub-score = 0/20

The +20 delta on hf_identity empirically confirms P0-K closes the
§81-§83 cascade root cause at the CLI surface.

## Ship % impact

MODEL-2 stays at 79%. val_loss 4.62 > 3.0 ship gate. Marginal-gain
decay analysis says more-of-the-same plateaus ~4.4. Next move (§85
P2-G/H/I queue) requires architectural change or different init.

## Refs

- PR #1742 (PMAT-690 P0-K base — apr_import + apr_convert stamping)
- PR #1744 (PMAT-690 P2-F — apr pretrain --val-shard)
- PR #1746 (P0-K inspect surface)
- PR #1748 (P0-K E2E test + apr_convert second path)
- PR #1750 (P3-A apr inspect --quality scorer)
- memory/feedback_upstream_metadata_masquerade.md (lesson #33)
- memory/feedback_parallel_session_worktree_isolation.md (lesson #34)
- memory/feedback_cargo_feature_cache_staleness.md (lesson #35)
- evidence/p2c-2026-05-17/findings.md (P2-C trajectory + root cause)
- evidence/p2e-2026-05-17/findings.md (P2-E corroboration + perf baseline)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): SPEC §86 — apr pretrain --init silently fails on arch-mismatched APRs; PR #1757 ships in-place stamp salvage

P2-G v1 dispatch surfaced a SECOND symptom of the §81-§84 cascade root
cause: pre-P0-K APR checkpoints (architecture="LlamaForCausalLM" P0-H
fallback + Qwen2-tensor shape) are silently non-resumable via
`apr pretrain --init`. The init eval at step 0 produced val_loss=8.60
instead of P2-E ep49's recorded 4.62 — definitive proof of silent
fall-back to random init when the apr metadata's family-arch
discriminator doesn't match the tensor naming convention.

## What §86 covers

1. Root cause walk-through (read_apr_architecture → transformer_config
   → populate_trainer_from_init_tensors → silent rejection → random
   init fallback at val_loss ≈ 8.60).
2. Implications: all training checkpoints produced before #1742 landed
   (2026-05-17T13:32:08Z) are non-resumable. The 50 P2-E checkpoints
   (~125 GB total) cannot be used for continuation training without
   intervention.
3. Three workarounds in priority order:
   - **Re-import** (blocked on HF safetensors locally — would need
     re-download)
   - **Restamp in-place** ✅ **SHIPPED via PR #1757** — `apr stamp`
     extension with --hf-architecture/--hf-model-type/--architecture
   - **Treat as final** — what P2-G v2 takes (currently in flight)
4. Operator recipe for the §86 salvage (3-line shell example).
5. Failure-mode classification (Class 4 Silent Incorrect Behavior,
   detection latency 1 epoch, producer-side fix already shipped via
   P0-K, existing-artifact fix shipped via #1757).
6. Recommended follow-up: INV-INIT-ARCH-MATCH-001 invariant on
   apr-pretrain-from-init-v1 contract — would catch the §86 case at
   the gate instead of at init-eval surface. Defer to follow-up PR.

## Stacked on PR #1754 (SPEC §85)

Base: `feat/spec-85-p2e-findings`. The §86 amendment depends on §85
context (the P2-E run that surfaced §86). Will auto-rebase to main
after #1754 lands.

## Refs

- PR #1742 (PMAT-690 P0-K base — apr_import + apr_convert stamping)
- PR #1750 (P3-A `apr inspect --quality` scorer — the diagnostic
  that surfaces §86 quality=40 pre-stamp, 60 post-stamp)
- PR #1754 (SPEC §85 P2-E findings — the run that surfaced §86)
- PR #1757 (apr stamp HF identity extension — workaround #2 above)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): §87 + §88 — Chinchilla 20·N gate + AC-SHIP2-003 compute-bounded ship target; MODEL-2 ships at 95%

Two new spec sections plus the AC-SHIP2-003 row amendment that
unblocks the Two-Model spec closure.

## §87 — Chinchilla 20·N hard gate (P0-J' upgrade)

Per the §85 P2-E + §85.4 P2-G empirical sequence, the 10-20× "ablation
band" hits a val_loss ≈ 4.65 plateau regardless of hyperparameter
tuning. The §83 v1.0.0 gate (hard at <10, warn-only at 10-20) is
upgraded to hard at <20. Audit's compute-optimal target now enforced
as the hard floor. Codified via PR #1762.
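
A minimal sketch of the upgraded gate, assuming the "20·N" rule compares the training-token count against 20 times the parameter count. `chinchilla_gate` is a hypothetical name and is not the runtime code from PR #1762.

```rust
/// Tokens-per-parameter gate. ASSUMPTION: the ratio is training tokens
/// divided by parameter count, with a hard floor at 20.
pub fn chinchilla_gate(tokens: u64, params: u64) -> Result<f64, String> {
    if params == 0 {
        return Err("parameter count must be non-zero".to_string());
    }
    let ratio = tokens as f64 / params as f64;
    if ratio < 20.0 {
        // The former 10-20 warn-only band now fails as well.
        Err(format!("tokens/params = {:.1} is below the 20x hard floor", ratio))
    } else {
        Ok(ratio)
    }
}
```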

## §88 — AC-SHIP2-003 compute-bounded ship target

Per user direction (Option 4): the strict CE ≤ 2.2 target requires
9-day continuous compute (213 GPU-hours), violating the 48-hour
single-shot limit. §88 amends:

- `AC-SHIP2-003` (loose form, new compute-bounded target):
  val CE ≤ 4.7. P2-E's 4.6227 DISCHARGES.
- `AC-SHIP2-003-STRICT` (NEW, preserved as distillation epic
  target): val CE ≤ 2.2. Belongs to PMAT-683/684 (multi-week).

Rationale: the Two-Model spec is an EXISTENCE PROOF of the Sovereign
AI Stack. P2-E's converged 4.62 proves the Rust-only pipeline
end-to-end works perfectly — compute time, not software capability,
is the bottleneck. Iteration speed on the stack outweighs hitting a
specific perplexity target on a proof-of-concept model.

Downstream effects:
- MODEL-2 ship % advances 79% → 95%.
- All remaining unblocked ACs (AC-SHIP2-007/008/009/010) become
  operator-dispatchable within the 48-hr compute budget.
- P3-C (HF publish) and P3-D (/dogfood) are unblocked.
- AC-SHIP2-003-STRICT is the dispatch target for the distillation
  follow-up epic (NOT a ship blocker for v1).

## What §88 explicitly does NOT do

- Does NOT lower the model-quality bar for production. The shipped
  artifact is a stack-capability proof, not a production model.
  Model card will note val_loss ≈ 4.62 and the §88 framing.
- Does NOT retire AC-SHIP2-003 — renames the strict form to
  AC-SHIP2-003-STRICT, amends the loose form.
- Does NOT block future stricter ships on larger architectures.

## Refs

- PR #1742 (PMAT-690 P0-K base)
- PR #1754 (SPEC §84+§85+§86 context)
- PR #1762 (§87 Chinchilla 20×N hard gate runtime)
- docs/specifications/audits/albor-370.md (external audit motivation)
- docs/specifications/aprender-train/albor-370m-roadmap.md (P3 phases)
- memory/feedback_a_priori_theoretical_falsification.md (#30)
- memory/feedback_audit_hypothesis_bounds.md (#36)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): §89 distillation epic scoping + roadmap status sweep + /dogfood template

Closes the §80-class spec stack for MODEL-2 v1 ship. Three artifacts:

## §89 — distillation epic scoping (SPEC)

Documents the path to AC-SHIP2-003-STRICT (val_loss ≤ 2.2) via
Qwen-7B teacher distillation. ~110 lines covering:

- 89.1 Why distillation works at this scale (Stanton et al. 2021's
  5× token-reduction claim → 9.88B → 2B tokens → 43h GPU fits the
  48-hour iteration budget).
- 89.2 Existing infrastructure inventory (aprender-train::distill
  + apr distill CLI + realizar 7B Q4_K load + apr pretrain --init
  with post-§86 INV-INIT-ARCH-MATCH-001 gate — all already in-tree).
- 89.3 PMAT-683 teacher selection + pull (4-6h scope).
- 89.4 PMAT-684 distillation training dispatch + evidence (~43h
  GPU + 8h operator, fits 48-hour budget).
- 89.5 PMAT-685 hardening (deferred — multi-teacher / curriculum /
  LR cycling / layer-wise losses).
- 89.6 Out-of-scope alternatives explicitly rejected (9-day compute,
  1.5B+ arch, multi-host distributed).
- 89.7 Sequencing — v1 must ship + /dogfood GO + at least one
  external consumer validation BEFORE v2 dispatches.
- 89.8 Discharge criteria.
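
The GPU-hour figures in 89.1 and §88 can be reproduced from token count and throughput. The sketch assumes the rate behind them is the 12,880 tok/s end-to-end throughput reported in §85; the spec text quoted here does not say so explicitly.

```rust
/// Wall-clock GPU hours to process `tokens` at a sustained `tok_per_s`.
pub fn gpu_hours(tokens: f64, tok_per_s: f64) -> f64 {
    tokens / tok_per_s / 3600.0
}
```

Under that assumption, `gpu_hours(9.88e9, 12_880.0)` is roughly 213 and `gpu_hours(2.0e9, 12_880.0)` is roughly 43, matching the 213 GPU-hour and 43h figures.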

## Roadmap status sweep

`docs/specifications/aprender-train/albor-370m-roadmap.md` P3 table
updated to reflect actual ship state:

- P3-A apr inspect --quality: ✅ SHIPPED (PR #1750)
- P3-B apr lint: ⚙️ operator-dispatchable
- P3-C-prep model card + readiness: ✅ SHIPPED (PR #1764)
- P3-C-exec apr publish: 🟡 OPERATOR-READY
- P3-D /dogfood: 🟡 TEMPLATE READY (this PR)

Plus new P4 section for the distillation epic (PMAT-683/684/685
expanded entries with effort + probability + acceptance criteria),
and a new §7 Post-§88 shipping plan that supersedes the 4-week plan
which assumed val_loss < 3.0 was achievable within iteration budget.

## /dogfood verdict template

`docs/dogfood-templates/albor-370m-v1-dogfood-template.md` (236
lines) pre-authors the post-publish QA checklist so that when the operator
runs /dogfood after apr publish, the structure is ready. 8 sections:
provenance + identity, pull/install verification, inference smoke,
benchmark, format export round-trip, apr qa, /dogfood 12+5 gates,
independent consumer test (the §89.7 validation-by-use gate that
sequences v2 distillation dispatch), final verdict + post-verdict
actions (GO / WARN / NO-GO branching).

## What this PR does NOT do

- Does NOT actually run /dogfood (template only — execution gated
  on P3-C-exec which requires user authorization)
- Does NOT dispatch PMAT-683/684 distillation (43h GPU; explicit
  user authorization required + sequencing per §89.7)
- Does NOT close ship-model-2-spec.md (stays at 95% per §88 until
  P3-C-exec lands)

## Stacked on PR #1754 (SPEC §84-§88)

Base: `feat/spec-85-p2e-findings`. The §89 scoping depends on the
§88 framing. Will auto-rebase to main after #1754 lands.

## Refs

- PR #1742 (PMAT-690 P0-K base)
- PR #1750 (P3-A apr inspect --quality)
- PR #1754 (SPEC §84-§88 stack — context)
- PR #1757 (apr stamp HF identity — §86 salvage path)
- PR #1764 (model card + readiness script — P3-C-prep)
- memory/feedback_post_publish_qa_required.md (#29)
- memory/feedback_publish_readiness_preflight.md (#37)
- Hinton et al. 2015 (arXiv:1503.02531) — distillation foundations
- Stanton et al. 2021 (arXiv:2106.05945) — 5× token-reduction claim

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>