feat(apr-convert): stamp hf_architecture/hf_model_type from config.json (PMAT-690 P0-K) by noahgift · Pull Request #1742 · paiml/aprender

noahgift · 2026-05-17T10:34:14Z

Summary

apr convert <src.safetensors> now extracts architectures[0] and model_type from a sibling config.json and stamps them into AprV2Metadata.hf_architecture + .hf_model_type. Closes the upstream producer gap that masqueraded as the §81–§83 Class 3 packaging cascade — 5 prior PRs patched downstream consumers (apr qa, apr bench, GGUF export mapper, apr pretrain checkpoint stamping) but each re-failed on every fresh P2-C-style live training run because the imported init APR had hf_architecture = None.

GGUF → APR conversion has no architectures[] source, so the GGUF import path synthesizes the canonical HF class name from the family slug via synthesize_hf_architecture_from_family (qwen2 → Qwen2ForCausalLM, llama → LlamaForCausalLM, gemma2 → Gemma2ForCausalLM, etc.) so round-tripping a GGUF through APR preserves arch identity for llama-cli interop.

What changes

AprV2Metadata (crates/aprender-core/src/format/v2/header_impl.rs): adds hf_architecture: Option<String> + hf_model_type: Option<String> after the existing architecture field.
GgufModelConfig (crates/aprender-core/src/format/gguf/api.rs): same two fields.
load_model_config_from_json (crates/aprender-core/src/format/converter/source_load_result.rs): extracts architectures[0] into hf_architecture, mirrors model_type into hf_model_type.
write_apr_file (crates/aprender-core/src/format/converter/write.rs): stamps both new fields into AprV2Metadata.
synthesize_hf_architecture_from_family (crates/aprender-core/src/format/converter/write_model_config.rs): new helper used by the GGUF→APR path; canonical naming convention with explicit Qwen/Llama/Mistral/Phi/Gemma special cases + default capitalize-and-suffix for unknown families.
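
The naming convention described for the GGUF→APR helper can be sketched roughly as below. This is a minimal illustration, not the code in `write_model_config.rs`: the function name `synthesize_hf_architecture_sketch`, the exact set of match arms, and the empty-slug behaviour are assumptions; only the slug-to-class pairs for qwen2, llama and gemma2 and the capitalize-and-suffix fallback come from the description above.

```rust
/// Hypothetical sketch: map a lowercase family slug to an HF class name.
/// Explicit arms cover a few of the families the PR names; everything else
/// falls back to "capitalize first letter, then append ForCausalLM".
fn synthesize_hf_architecture_sketch(family: &str) -> String {
    match family {
        "qwen2" => "Qwen2ForCausalLM".to_string(),
        "llama" => "LlamaForCausalLM".to_string(),
        "mistral" => "MistralForCausalLM".to_string(),
        "gemma2" => "Gemma2ForCausalLM".to_string(),
        other => {
            let mut chars = other.chars();
            match chars.next() {
                // Uppercase only the first character; keep the rest verbatim.
                Some(first) => format!("{}{}ForCausalLM", first.to_uppercase(), chars.as_str()),
                // Assumption: an empty slug yields an empty class name.
                None => String::new(),
            }
        }
    }
}
```

The explicit arms are redundant for slugs where the fallback already gives the right answer; they are kept here only to mirror the "special cases plus default" shape the description mentions.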

Discharges

PMAT-690 P0-K (per docs/specifications/aprender-train/albor-370m-roadmap.md §4 P0-K)
INV-CONVERT-HF-ARCH-001/002/003/004 (new contract contracts/apr-convert-hf-arch-v1.yaml)

Methodology

Lesson #33 (memory/feedback_upstream_metadata_masquerade.md): when a Class 3 packaging wave extends past 4–5 defects in the same code area, the producer is the defect, not the consumers. The 2026-05-15 → 2026-05-17 §81–§83 cascade fixed 5 downstream consumers without addressing the upstream producer. P2-C's live training run (evidence/p2c-2026-05-17/findings.md) re-exhibited every failure because the imported init APR had hf_architecture = None.

Test plan

cargo test -p aprender-core --lib loads_hf_architecture_from_architectures_array — passes (extract from config.json)
cargo test -p aprender-core --lib missing_architectures_leaves_hf_architecture_none — passes (no fabrication)
cargo test -p aprender-core --lib picks_first_when_multiple_architectures — passes (HF convention)
cargo test -p aprender-core --lib known_families_map_to_canonical_class_names — passes (qwen2/llama/mistral/gemma2)
cargo test -p aprender-core --lib unknown_family_capitalizes_first_letter — passes (fallback)
cargo test -p aprender-core --lib format::converter — 1,260 tests pass, 0 failures, 0 regressions
cargo test -p aprender-contracts --lib lint::gates::tests::load_contracts_real — passes (new YAML parses against schema)
cargo test -p aprender-contracts --lib lint::tests::lint_passes_on_real_contracts — passes (no warnings introduced)
cargo check -p aprender-core --lib — clean
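
The three extraction behaviours exercised by the first tests in this plan (read `architectures[0]`, leave the field `None` when the array is absent, pick the first entry when several are listed) can be sketched as follows. The `HfConfigView` struct and `hf_identity` function are hypothetical names for illustration; the real `load_model_config_from_json` parses JSON, which this dependency-free sketch skips by starting from already-parsed values.

```rust
/// Hypothetical, already-parsed view of the two config.json keys of interest.
#[derive(Default)]
struct HfConfigView {
    architectures: Option<Vec<String>>,
    model_type: Option<String>,
}

/// Returns (hf_architecture, hf_model_type).
/// A missing or empty `architectures` array yields `None` (no fabrication);
/// with several entries the first one wins.
fn hf_identity(cfg: &HfConfigView) -> (Option<String>, Option<String>) {
    let hf_architecture = cfg
        .architectures
        .as_ref()
        .and_then(|list| list.first().cloned());
    (hf_architecture, cfg.model_type.clone())
}
```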

Refs

docs/specifications/aprender-train/ship-model-2-spec.md §84
evidence/p2c-2026-05-17/findings.md
memory/feedback_upstream_metadata_masquerade.md (methodology #33)
contracts/apr-convert-hf-arch-v1.yaml

🤖 Generated with Claude Code

…on (PMAT-690 P0-K)

`apr convert <src.safetensors>` now extracts `architectures[0]` and `model_type` from a sibling `config.json` and stamps them into `AprV2Metadata.hf_architecture` + `.hf_model_type`. Closes the upstream producer gap that masqueraded as the §81-§83 Class 3 packaging cascade (5 PRs patching downstream consumers; each re-failed on every fresh P2-C-style live training run because the imported init APR had hf_architecture = None).

GGUF -> APR conversion has no `architectures[]` source, so the GGUF import path synthesizes the canonical HF class name from the family slug via `synthesize_hf_architecture_from_family` (qwen2 -> Qwen2ForCausalLM, llama -> LlamaForCausalLM, etc.) so round-tripping a GGUF through APR preserves arch identity for llama-cli interop.

Discharges:
- PMAT-690 P0-K (per albor-370m-roadmap.md §4 P0-K)
- INV-CONVERT-HF-ARCH-001/002/003/004 (new contract apr-convert-hf-arch-v1)

Tests: 3 unit tests on load_model_config_from_json + 2 on synthesize_hf_architecture_from_family. Full converter module: 1,260 tests pass locally.

Methodology lesson #33 applied: when a Class 3 packaging wave extends past 4-5 defects, the producer is the defect (memory/feedback_upstream_metadata_masquerade.md).

Refs:
- docs/specifications/aprender-train/ship-model-2-spec.md §84
- evidence/p2c-2026-05-17/findings.md
- contracts/apr-convert-hf-arch-v1.yaml

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…P0-K follow-up) (#1746)

`apr inspect` now renders the HF identity fields that PMAT-690 P0-K stamps into AprV2Metadata. Operators can verify upstream `apr convert` stamping via `apr inspect --json | jq .metadata.hf_architecture` and `.metadata.hf_model_type` instead of grepping source code.

## What changes

- MetadataInfo gains `hf_architecture: Option<String>` + `hf_model_type: Option<String>` fields (both serialize as null when None, NOT skipped via skip_serializing_if, mirroring the C-APR-PROVENANCE pattern so auditors can grep-check every output).
- `read_metadata` copies the two fields from AprV2Metadata into MetadataInfo.
- `output_architecture` (text path) renders new "HF Class" and "HF model_type" rows beneath the existing "Family" row when populated.

## Stacked on top of PR #1742 (P0-K)

This branch is based on `feat/pmat-690-p0k-apr-convert-hf-arch-v2` because it depends on the AprV2Metadata fields that #1742 adds. Will rebase to main after #1742 lands.

## Tests

- `pmat_690_p0k_inspect_emits_hf_arch_keys_when_none`: both keys serialize as null (not skipped) when absent. Required for the grep-check audit recipe.
- `pmat_690_p0k_inspect_emits_hf_arch_values_when_populated`: when populated, keys render the actual values (Qwen2ForCausalLM / qwen2).
- Full apr-cli lib suite: 5,938 tests pass, 0 regressions.

## Refs

- PR #1742 (PMAT-690 P0-K, the upstream stamping)
- contracts/apr-convert-hf-arch-v1.yaml (round-trip invariant)
- docs/specifications/aprender-train/ship-model-2-spec.md §84
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
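
The "serialize as null, never skip" behaviour is what makes the grep-check audit possible: the key is always present in the output. A dependency-free sketch of that idea follows. The commit message implies the real code relies on serde (it mentions `skip_serializing_if`); the hand-rolled `json_member` and `render_hf_identity` helpers here are hypothetical stand-ins and do no string escaping.

```rust
/// Render one JSON member; `None` becomes an explicit `null` so the key
/// is always present. Sketch only: `value` is not escaped.
fn json_member(key: &str, value: Option<&str>) -> String {
    match value {
        Some(v) => format!("\"{}\": \"{}\"", key, v),
        None => format!("\"{}\": null", key),
    }
}

/// Render the two HF identity fields as a small JSON object.
fn render_hf_identity(hf_architecture: Option<&str>, hf_model_type: Option<&str>) -> String {
    format!(
        "{{ {}, {} }}",
        json_member("hf_architecture", hf_architecture),
        json_member("hf_model_type", hf_model_type)
    )
}
```

With both fields absent, the rendered object still contains both keys, so a check such as `jq .metadata.hf_architecture` distinguishes "stamped as null" from "field unknown to this build".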

…tecture (PMAT-690 P0-K extension)

Closes the second half of the §84 P0-K root cause: `apr convert <safetensors>` goes through `apr_convert` (in mod.rs), NOT through `apr_import` (which is used by `apr pull` / `apr import`). The original P0-K commit patched `apr_import` end-to-end but left `apr_convert` reading no config.json for SafeTensors sources, meaning the very CLI the §84 evidence indicted ("apr convert ... does NOT stamp apr_metadata.hf_architecture") was still broken after P0-K v1.

This commit:

- Makes `apr_convert` read sibling config.json for SafeTensors sources (previously only for GGUF), populating the full GgufModelConfig including hf_architecture + hf_model_type.
- Threads hf_architecture + hf_model_type through `save_model_tensors_with_gguf_config_and_tokenizer` (the writer used by the apr_convert path).
- Patches the streaming-import AprV2Metadata initializer in import.rs (the `realizar#136` path triggered for sharded SafeTensors >10B params) that had been missed in P0-K v1.

## Integration test (closes FALSIFY-CONVERT-HF-ARCH-001 at the CLI surface)

`crates/apr-cli/tests/p0k_convert_inspect_e2e_test.rs` exercises the FULL chain that the §81-§83 cascade unknowingly assumed worked:

1. Stage tempdir with synthetic Qwen2 config.json + safetensors fixture
2. Run `apr convert <safetensors> -o out.apr --compress none`
3. Run `apr inspect out.apr --json`
4. Assert `metadata.hf_architecture == "Qwen2ForCausalLM"`
5. Assert `metadata.hf_model_type == "qwen2"`

This is the test that would have caught the §81-§83 cascade in the first place per methodology lesson #33 (memory/feedback_upstream_metadata_masquerade.md): the absent end-to-end test was what let 5 PRs ship downstream consumer fixes without anyone noticing the upstream producer was broken.

Also includes a negative test: when config.json is ABSENT alongside the safetensors, hf_architecture / hf_model_type MUST remain null (no fabrication).

## Stacked on PR #1742 (P0-K) + #1746 (P0-K inspect surfacing)

Base: feat/pmat-690-p0k-apr-convert-hf-arch-v2 (which already absorbed #1746). Will auto-rebase to main after #1742 merges.

## Tests

- 2 new E2E integration tests pass
- 5,938 apr-cli unit/integration tests pass (no regressions)
- 1,260 aprender-core converter tests pass (no regressions)
- contracts lint: clean

## Refs

- PR #1742 (PMAT-690 P0-K, base stamping)
- PR #1746 (PMAT-690 P0-K, apr inspect surfacing)
- contracts/apr-convert-hf-arch-v1.yaml
- evidence/p2c-2026-05-17/findings.md §76-§83
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)
- memory/feedback_parallel_session_worktree_isolation.md (methodology #34)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`apr inspect --quality` emits a 0-100 model quality score for any APR file. Per SPEC-SHIP-TWO-001 §84 P3-A (AC-SHIP2-007), ship-ready models MUST score ≥ 90.

The scorer is a transparent weighted sum across five sub-scores:

| Sub-score   | Weight | Checks                                        |
|-------------|--------|-----------------------------------------------|
| physics     | 20     | header.checksum_valid                         |
| structural  | 20     | arch + hidden_size + num_layers + num_heads   |
| provenance  | 25     | license + data_source + data_license non-null |
| hf_identity | 20     | hf_architecture + hf_model_type non-null      |
| tokenizer   | 15     | has_vocab flag (HAS_VOCAB bit set)            |

Weights reflect SPEC §84 ship-blocker priorities: provenance + HF identity are weighted heaviest because their absence was the exact §81-§83 cascade root cause that P0-K (#1742) fixed. The ≥ 90 gate allows at most one sub-score missing; typically `has_vocab` (15 pts) is the recoverable one for distilled / from-scratch models without an embedded tokenizer.

## Operator workflow

```bash
# Verify a model is ship-ready
apr inspect model.apr --quality --json | jq '.quality'
# {
#   "score": 100,
#   "ship_ready": true,
#   "threshold": 90,
#   "breakdown": { "physics": 20, "structural": 20, "provenance": 25, ... }
# }

# Text mode for human review
apr inspect model.apr --quality
# Quality (0-100):
#   Score: 75 / 100
#   Ship-ready (≥90 per AC-SHIP2-007): NO
#   Breakdown:
#     physics:     20 / 20
#     structural:  20 / 20
#     provenance:   0 / 25  ← missing license/data_source/data_license
#     hf_identity: 20 / 20
#     tokenizer:   15 / 15
```

## Stacked on PR #1742 (P0-K base) + #1746 (inspect surfacing) + #1748 (E2E test)

Base: `feat/pmat-690-p0k-apr-convert-hf-arch-v2`. Depends on the hf_architecture / hf_model_type fields that P0-K v1 + v2 added. Will auto-rebase to main after the P0-K stack lands.

## Tests

- 4 new unit tests in `inspect_tests.rs::pmat_690_p3a_*`:
  - Ship-ready model scores ≥ 90 (full provenance + HF + has_vocab)
  - No HF + no provenance caps at ≤ 55 (the §81-§83 cascade scenario)
  - Invalid checksum drops physics to 0, blocks ship gate
  - QualityReport JSON contains all 5 breakdown sub-scores
- Full apr-cli lib suite: 5,942 tests pass, 0 regressions

## Discharges

- PMAT-690 P3-A (per albor-370m-roadmap.md §4 P3-A)
- AC-SHIP2-007 (apr inspect --quality ≥ 90 gate per spec §5.2)

## Refs

- PR #1742 (PMAT-690 P0-K, base stamping)
- PR #1746 (P0-K inspect surfacing)
- PR #1748 (P0-K E2E test)
- docs/specifications/aprender-train/ship-model-2-spec.md §84
- docs/specifications/aprender-train/albor-370m-roadmap.md §4 P3-A
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
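
The weighted sum behind `--quality` can be illustrated with a small sketch. It assumes each sub-score is all-or-nothing (the real scorer may award partial credit inside a sub-score), and the `QualityChecks` struct, `quality_score` and `ship_ready` names are hypothetical; only the weights and the 90 threshold come from the commit message.

```rust
/// Hypothetical pass/fail inputs, one flag per sub-score in the weight table.
struct QualityChecks {
    physics: bool,
    structural: bool,
    provenance: bool,
    hf_identity: bool,
    tokenizer: bool,
}

/// Ship gate from AC-SHIP2-007.
const SHIP_THRESHOLD: u32 = 90;

/// Sum the weights of every sub-score that passed (all-or-nothing assumption).
fn quality_score(c: &QualityChecks) -> u32 {
    let weighted = [
        (c.physics, 20),
        (c.structural, 20),
        (c.provenance, 25),
        (c.hf_identity, 20),
        (c.tokenizer, 15),
    ];
    weighted.iter().filter(|(ok, _)| *ok).map(|(_, w)| *w).sum()
}

fn ship_ready(c: &QualityChecks) -> bool {
    quality_score(c) >= SHIP_THRESHOLD
}
```

Under this all-or-nothing reading, a model missing only provenance scores 75 (matching the text-mode example), one missing both provenance and HF identity scores 55, and one missing only the tokenizer scores 85, which would sit below the 90 gate unless the real scorer grants partial credit.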

…-e2e feat(apr-convert): apr_convert path + E2E integration test (PMAT-690 P0-K extension)

feat(apr-inspect): --quality 0-100 model quality scorer (PMAT-690 P3-A)

…AT-690 P0-K extension) (#1757)

Extends the existing `apr stamp` command (PR #1050, provenance fields) to also patch HF identity + architecture family slug in place. Unblocks in-place salvage of pre-P0-K APR checkpoints whose architecture stamps were corrupted by the §82 P0-H fallback.

## Background: SPEC §86 root cause

The §85 P2-E live run produced 50 epoch checkpoints (~125 GB total) at best val_loss=4.6227. P2-G v1 attempted to resume from P2-E ep49 and the init eval surfaced val_loss=8.60, proof that --init silently failed to load the trained weights.

Root cause: P2-E's init APR pre-dates P0-K (PR #1742), so the P0-H fallback stamped architecture="LlamaForCausalLM" into the trained checkpoint despite the actual tensors being Qwen2-shaped. `apr pretrain --init` reads the (wrong) architecture stamp and rejects the load.

## What this PR adds

Three new CLI flags on `apr stamp`:

- `--hf-architecture <CLASS>` (e.g. Qwen2ForCausalLM): the HF class name. PMAT-690 P0-K's upstream stamp.
- `--hf-model-type <SLUG>` (e.g. qwen2): config.json::model_type.
- `--architecture <SLUG>` (e.g. qwen2): the lowercase family slug that `apr pretrain --init` reads for arch dispatch. **This is the load-bearing field** for §86 salvage; patching just hf_architecture alone won't unblock `apr pretrain --init`.

The existing `--license` / `--data-source` / `--data-license` flags are unchanged. The patch struct's `has_any()` gate now accepts any combination of the six fields; at least one must be specified or the stamp is rejected up-front.

## Operator workflow for §86 salvage

```bash
# Patch a pre-P0-K Qwen2-actual-Llama-stamped checkpoint in place
apr stamp /path/to/p2e-epoch-049.apr \
  --architecture qwen2 \
  --hf-architecture Qwen2ForCausalLM \
  --hf-model-type qwen2 \
  -o /path/to/p2e-epoch-049-stamped.apr

# Verify
apr inspect /path/to/p2e-epoch-049-stamped.apr --quality --json | jq .quality
# breakdown.hf_identity should jump 0 → 20

# Now usable as init for resume training:
apr pretrain --init /path/to/p2e-epoch-049-stamped.apr ...
```

## Discharges

- §86 SPEC amendment (`evidence/p2g-2026-05-17/section-86-draft.md`): workaround #2 (in-place restamp)
- Salvages ~125 GB of pre-P0-K P2-E checkpoints without a 53-min retrain
- Establishes a pattern for in-place metadata patching that future spec amendments can build on (e.g., a `--name` / `--description` extension for model card metadata)

## Tests

- 2 new unit tests in `aprender-core::format::v2::stamp` (extends 6 → 6, existing tests adjusted for new struct fields via Default)
- 2 new CLI tests in `apr-cli::commands::stamp` (extends 5 → 7):
  - `stamp_p0k_recovers_pre_p0k_apr_identity`: full §86 use case
  - `stamp_p0k_partial_hf_architecture_only`: verifies field independence (stamp one without touching others)
- All 5,944 apr-cli lib tests pass, 0 regressions
- All 13,800 aprender-core lib tests pass, 0 regressions

## Refs

- PR #1742 (PMAT-690 P0-K base)
- PR #1750 (P3-A `apr inspect --quality` scorer)
- PR #1754 (SPEC §85 P2-E findings)
- PR #1050 (the original `apr stamp` PR; this extends it)
- `docs/specifications/aprender-train/ship-model-2-spec.md §86` (forthcoming)
- `evidence/p2g-2026-05-17/section-86-draft.md` (root cause + workaround analysis)
- `memory/feedback_upstream_metadata_masquerade.md` (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
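
The up-front `has_any()` gate described in this commit can be sketched as below. The struct name `PatchSketch`, its field names, the `validate_patch` helper and the error text are all assumptions chosen to mirror the six CLI flags; the commit message only states that at least one of the six fields must be set.

```rust
/// Hypothetical patch struct: one optional field per `apr stamp` flag.
#[derive(Default)]
struct PatchSketch {
    license: Option<String>,
    data_source: Option<String>,
    data_license: Option<String>,
    hf_architecture: Option<String>,
    hf_model_type: Option<String>,
    architecture: Option<String>,
}

impl PatchSketch {
    /// True when at least one field would be patched.
    fn has_any(&self) -> bool {
        [
            &self.license,
            &self.data_source,
            &self.data_license,
            &self.hf_architecture,
            &self.hf_model_type,
            &self.architecture,
        ]
        .iter()
        .any(|field| field.is_some())
    }
}

/// Reject an empty patch before touching the APR file (error text is illustrative).
fn validate_patch(patch: &PatchSketch) -> Result<(), String> {
    if patch.has_any() {
        Ok(())
    } else {
        Err("nothing to stamp: specify at least one field".to_string())
    }
}
```

Deriving `Default` keeps call sites short: a patch that sets only `architecture` can be written as `PatchSketch { architecture: Some("qwen2".into()), ..Default::default() }`, which is also how the commit says existing tests absorbed the new fields.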

…H-MATCH-001 (SPEC §86.6 closure) (#1761)

Codifies the INV-INIT-ARCH-MATCH-001 invariant authored as runtime code in PR #1760 (`validate_init_arch_matches_tensor_evidence` in aprender-train::train::pretrain_real). Adds:

- FALSIFY-INIT-ARCH-MATCH-001: integration falsifier bound to the unit-test family `cargo test -p aprender-train --lib inv_init_arch_match_001` (7 tests covering: canonical §86 reject, inverse reject, matching qwen2 accept, matching llama accept, None metadata skip, unmappable metadata skip, GGUF-unknown tensor skip).
- INV-INIT-ARCH-MATCH-001 proof_obligation: safety invariant. When both metadata.architecture and tensor-name-inferred family resolve to concrete distinct slugs, the gate MUST fail fast before any training step. No false positive when either side returns "unknown".

## Salvage path

The error message includes an inline `apr stamp` recipe (PR #1757):

```
apr stamp <pre-p0k.apr> --architecture qwen2 --hf-architecture Qwen2ForCausalLM \
  -o <stamped.apr>
apr pretrain --init <stamped.apr> ...
```

## Refs

- PR #1742 (PMAT-690 P0-K base, producer-side stamping)
- PR #1750 (P3-A `apr inspect --quality`, surfaces hf_identity=0/20 pre-stamp)
- PR #1754 (SPEC §85 P2-E findings, context)
- PR #1757 (apr stamp HF identity extension, salvage path)
- PR #1758 (SPEC §86 amendment, defect specification this contract closes)
- PR #1760 (INV-INIT-ARCH-MATCH-001 runtime implementation)
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…C-prep defect 1) (#1769)

Closes Defect 1 surfaced by the §86 publish-readiness preflight on P2-E ep49: pre-P0-K APRs trained from inits without embedded tokenizers fail `apr run` with PMAT-172 ("APR file missing embedded tokenizer"). Without this fix, the §86 salvage produces a 6 GB HF-publish-ready directory whose headline command doesn't work.

## What ships

- `ProvenancePatch` gains three optional fields:
  - `tokenizer_vocab: Option<Vec<String>>`: token strings indexed by id
  - `tokenizer_merges: Option<Vec<String>>`: BPE merge rules
  - `tokenizer_model_type: Option<String>`: e.g. "BPE", "Unigram"
- `stamp_provenance_bytes` extended to write these into `metadata.custom["tokenizer.vocabulary"]` / `tokenizer.merges` / `tokenizer.model_type` AND set the HAS_VOCAB header flag (the load-bearing check in `apr run`'s PMAT-172 gate).
- `apr stamp` CLI gains `--tokenizer <DIR>` flag. Accepts either:
  - `<dir>/vocab.json + <dir>/merges.txt` (HF GPT-2/Qwen BPE format, the Qwen-coder pretrain default)
  - `<dir>/tokenizer.json` (HF unified format)

## Operator workflow post-this-PR

```bash
apr stamp /mnt/.../p2e-epoch-049.apr \
  --architecture qwen2 \
  --hf-architecture Qwen2ForCausalLM \
  --hf-model-type qwen2 \
  --license Apache-2.0 \
  --data-source "..." \
  --data-license "Apache-2.0 / permissive-aggregate" \
  --tokenizer /mnt/nvme-raid0/tokenizers/qwen-0.5b-tokenizer-v3/ \
  -o /tmp/albor-370m-v1.apr

# Resulting APR is self-contained: apr run works without --tokenizer flag
apr run /tmp/albor-370m-v1.apr "def fibonacci(n):" --max-tokens 32
```

## Tests

- 3 new CLI tests in `apr-cli::commands::stamp::tests`:
  - `stamp_p3c_defect1_embeds_tokenizer_from_vocab_merges`: full happy path, vocab.json + merges.txt → embedded vocab array + merges array + HAS_VOCAB flag + BPE model_type
  - `stamp_p3c_defect1_tokenizer_alone_passes_has_any_gate`: --tokenizer alone (no other patches) satisfies has_any()
  - `stamp_p3c_defect1_tokenizer_dir_without_files_errors`: empty dir surfaces clear "neither tokenizer.json nor vocab.json"
- 10/10 stamp tests pass (3 new + 7 existing updated for the new `tokenizer_dir: Option<&Path>` arg slot)
- aprender-core stamp.rs tests: 6/6 pass (existing literals updated for the 3 new ProvenancePatch fields)

## Refs

- PR #1742 (PMAT-690 P0-K base, upstream stamping)
- PR #1750 (P3-A apr inspect --quality, the diagnostic that surfaces hf_identity=0/20 + tokenizer=0/15 pre-stamp)
- PR #1757 (apr stamp HF identity extension; this PR extends it)
- evidence/p2e-2026-05-17/ (the run this defect was surfaced on)
- memory/feedback_publish_readiness_preflight.md (#37)
- PMAT-172 (the gate that motivates this fix)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
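
The two directory layouts accepted by `--tokenizer <DIR>` suggest a small detection step before any parsing. The sketch below works on a list of file names rather than the filesystem so it stays self-contained. The `TokenizerLayout` enum, the `detect_tokenizer_layout` function, the precedence between the two layouts and the error wording beyond the quoted "neither tokenizer.json nor vocab.json" fragment are assumptions.

```rust
/// Hypothetical classification of a tokenizer directory.
#[derive(Debug, PartialEq)]
enum TokenizerLayout {
    /// vocab.json + merges.txt (HF GPT-2/Qwen BPE format)
    VocabAndMerges,
    /// tokenizer.json (HF unified format)
    UnifiedJson,
}

/// Decide which layout a directory uses, given the file names it contains.
/// Assumption: the vocab.json + merges.txt pair is preferred when both layouts exist.
fn detect_tokenizer_layout(files: &[&str]) -> Result<TokenizerLayout, String> {
    let has = |name: &str| files.iter().any(|f| *f == name);
    if has("vocab.json") && has("merges.txt") {
        Ok(TokenizerLayout::VocabAndMerges)
    } else if has("tokenizer.json") {
        Ok(TokenizerLayout::UnifiedJson)
    } else if has("vocab.json") {
        Err("vocab.json present but merges.txt is missing".to_string())
    } else {
        Err("neither tokenizer.json nor vocab.json found in tokenizer dir".to_string())
    }
}
```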

…t (P3-C prep) (#1764)

Author the HuggingFace model card for `paiml/albor-370m-v1` and the publish-readiness pre-flight script.

Per SPEC §88: this model is shipped as a stack-existence-proof, not a production code-completion model. Both artifacts make that framing explicit so HF Hub users calibrate expectations correctly.

## docs/model-cards/albor-370m-v1.md (255 lines)

Standard HF model card with model-index frontmatter:

- YAML metadata: Apache-2.0, code/python/stack-existence-proof tags, Qwen2.5-Coder-0.5B-Instruct base, codeparrot + the-stack-dedup datasets, val_loss=4.6227 / val_perplexity=101.78 metrics.
- §88 framing section spelling out the stack-existence-proof purpose.
- Training procedure table (architecture, optimizer, LR schedule, hardware, wall time, throughput; all from the §85 P2-E run).
- Trajectory table (every 5 epochs from 7.43 → 4.62).
- Intended uses (✅ stack demos, infra validation, tokenization round-trip, quantization research) vs NOT-recommended uses (production code-LM, zero-shot reasoning, long-context, HumanEval submission).
- Limitations (compute-bounded, plateau evidence, init lineage, val drift).
- Training data table (sources, sizes, licenses, role).
- How-to-use code snippets (apr CLI, Rust direct load, format export).
- Reproduce-the-run shell example using the exact §85 P2-E recipe.
- Citation, license/provenance, acknowledgments.

## scripts/publish/albor-370m-publish-readiness.sh (182 lines)

7-gate pre-flight checklist. GO / NO-GO verdict before invoking `apr publish`. Gates:

1. `apr validate` exits 0
2. `apr inspect --quality` ≥ 90 (P3-A scorer; surfaces §86 salvage recipe inline if hf_identity < 20 or provenance < 25)
3. `apr qa --json` verdict = GO (8 gates)
4. Model card present + has HF YAML frontmatter
5. HF_TOKEN set
6. Smoke `apr run` produces text-like output
7. GGUF Q4_K + SafeTensors export round-trip both succeed

Exit 0 = ready to publish. Exit 1 = NO-GO with explicit blocker list.

Bashrs-validated (1 SEC011 false-positive on multi-condition rm -rf guard; functionally safe).

## What this PR does NOT do

- Does NOT invoke `apr publish` (external action; requires user OK)
- Does NOT touch any APR files (read-only checks)
- Does NOT modify the §85 P2-E ep49 checkpoint (operator runs `apr stamp` via the §86.4 salvage recipe separately)

## Operator workflow (post-PR landing)

```bash
# 1. Stamp the pre-P0-K P2-E ep49 checkpoint to bring hf_identity up
apr stamp /mnt/nvme-raid0/runs/model-2-p2e-tuned-hp-20260517/ckpt/epoch-049.apr \
  --architecture qwen2 \
  --hf-architecture Qwen2ForCausalLM \
  --hf-model-type qwen2 \
  --license Apache-2.0 \
  --data-source "huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct + bigcode/the-stack-dedup + codeparrot/codeparrot-clean" \
  --data-license "Apache-2.0 / permissive-aggregate" \
  -o /tmp/albor-370m-v1.apr

# 2. Run the readiness check
bash scripts/publish/albor-370m-publish-readiness.sh /tmp/albor-370m-v1.apr
# Expected output: "VERDICT: GO" (or NO-GO with explicit blocker list)

# 3. Publish (still requires explicit user invocation)
apr publish paiml/albor-370m-v1 --formats apr,safetensors,gguf \
  --model-card docs/model-cards/albor-370m-v1.md
```

## Refs

- PR #1742 (PMAT-690 P0-K, upstream stamping)
- PR #1750 (P3-A `apr inspect --quality`, gate 2)
- PR #1754 (SPEC §84+§85+§86+§87+§88 stack, context)
- PR #1757 (apr stamp HF identity extension, §86 salvage)
- PR #1760 (INV-INIT-ARCH-MATCH-001, validation chain)
- docs/specifications/aprender-train/ship-model-2-spec.md §88
- docs/specifications/aprender-train/albor-370m-roadmap.md §4 P3-C

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
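
The readiness script itself is bash; the aggregation it performs (collect per-gate results, exit 0 on GO, exit 1 with a blocker list on NO-GO) is simple enough to restate as a Rust sketch. The `Gate` struct, the `verdict` function and the exact NO-GO message format are hypothetical; only the exit codes and the "VERDICT: GO" string appear in the commit message.

```rust
/// Hypothetical record of one pre-flight gate and whether it passed.
struct Gate {
    name: &'static str,
    passed: bool,
}

/// Returns (exit_code, message): 0 with GO when every gate passed,
/// otherwise 1 with the names of the failing gates.
fn verdict(gates: &[Gate]) -> (i32, String) {
    let blockers: Vec<&str> = gates
        .iter()
        .filter(|g| !g.passed)
        .map(|g| g.name)
        .collect();
    if blockers.is_empty() {
        (0, "VERDICT: GO".to_string())
    } else {
        (1, format!("VERDICT: NO-GO (blockers: {})", blockers.join(", ")))
    }
}
```

Collecting every failing gate before deciding, rather than stopping at the first failure, is what lets the operator fix all blockers in one pass.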

…d --init (SPEC §86.6) (#1760)

Catches the §86 silent-failure pattern at the gate: when an APR's metadata `architecture` claim contradicts what its tensor names imply, `apr pretrain --init` exits non-zero with a clear error naming both claims and an inline `apr stamp` salvage recipe.

## Background: the §86 case this catches

P2-G v1 was dispatched to resume P2-E ep49 for 10,000 more steps. The init eval at step 0 produced val_loss = 8.60, which is 1.86× P2-E ep49's recorded 4.62. Silent failure: `--init` loaded random weights instead of the trained checkpoint.

Root cause walk-through:

1. `read_apr_architecture` parses `metadata.architecture = "LlamaForCausalLM"` (the §82 P0-H fallback when init_arch.hf_architecture is None).
2. `transformer_config_from_apr_metadata` builds a Llama-family TransformerConfig (dimensions correct, family discriminator wrong).
3. `populate_trainer_from_init_tensors` walks `trainer.named_parameters()`, which produces Llama-style names, and looks them up in the APR tensor map, which has Qwen2-style names. Mismatch → silent random-init fallback.
4. Training begins at random-init magnitude (val_loss ≈ 8.60).

This invariant catches step 1's wrong claim BEFORE step 3 silently falls through.

## What this adds

Three new public functions in `aprender-train::train::pretrain_real`:

- `family_from_tensor_names(names: impl IntoIterator<Item=&str>)` → `&'static str`: lightweight tensor-name-only family inference (no data needed). Returns one of qwen3 / qwen2 / llama / mamba / rwkv / gpt-neox / opt / bert / gpt2 / unknown. Mirrors the heavyweight `infer_architecture_from_names` in aprender-core::format::converter::tokenizer_loader.
- `normalize_metadata_arch_family(arch: &str)` → `Option<&'static str>`: maps all three forms of the metadata `architecture` field to a canonical family slug: HF class names ("Qwen2ForCausalLM"), family slugs ("qwen2"), and capitalised legacy ("Qwen2"). Returns None for "unknown" / unmappable strings; the caller treats that as "no claim".
- `validate_init_arch_matches_tensor_evidence(metadata_arch, &tensors)` → `Result<(), String>`: the actual invariant gate. Errors with `FALSIFY-INIT-ARCH-MATCH-001` naming both the claimed and inferred families, plus an inline `apr stamp` recipe (PR #1757) for §86 salvage.

Wired into `build_shared_trainer_with_init` between `load_init_tensors_from_apr` and `populate_trainer_from_init_tensors`. Reads the raw metadata `architecture` string via a new small helper (the `TransformerConfig`'s `hf_architecture` field is None for pre-P0-K APRs, the §86 case, so the cross-check needs the raw string field).

## Three skip-the-check fallback cases (no false-positives)

1. **No metadata claim** (metadata.architecture absent): nothing to contradict, allow.
2. **Unmappable claim** (e.g. "WeirdNovelArch"): novel arch is not §86, allow.
3. **Tensor inference returns "unknown"** (GGUF blk.* names can't disambiguate): trust the metadata, allow.

Only fail when BOTH inferences produce concrete family slugs AND they differ.

## Tests

- 7 new INV-INIT-ARCH-MATCH-001 tests in `pretrain_real::tests`:
  - `inv_init_arch_match_001_rejects_llama_stamped_qwen2_tensors`: canonical §86 case, must fail with falsifier ID + salvage recipe
  - `inv_init_arch_match_001_rejects_qwen2_stamped_llama_tensors`: inverse §86 case, must fail
  - `inv_init_arch_match_001_accepts_matching_qwen2/llama`: no false-positive on correctly-stamped APRs
  - `inv_init_arch_match_001_skips_when_metadata_absent`: None metadata
  - `inv_init_arch_match_001_skips_unmappable_metadata`: novel arch
  - `inv_init_arch_match_001_trusts_metadata_when_tensors_unknown`: GGUF blk.* case
- 1 helper test: `family_from_tensor_names_distinguishes_qwen2_from_llama`
- 1 normalizer test: `normalize_metadata_arch_family_handles_three_forms`

All 9 new tests pass. 7,595 existing aprender-train lib tests still pass (the 3 pre-existing prune::snapshot_tests failures are insta-snapshot drift in main, unrelated to this PR).

## Discharges

- §86.6 SPEC follow-up (forthcoming via #1758 stack)
- INV-INIT-ARCH-MATCH-001 invariant for `contracts/apr-pretrain-from-init-v1.yaml` (contract amendment is a separate small follow-up PR)

## Refs

- PR #1742 (PMAT-690 P0-K base, apr_convert + apr_import stamping)
- PR #1757 (apr stamp HF identity extension, the salvage path this invariant points operators to)
- PR #1758 (SPEC §86 amendment, context this invariant operationalizes)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
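
The gate's decision logic (fail only when both sides resolve to concrete, different families) can be sketched as follows. This is an illustration under stated assumptions, not the code in `pretrain_real`: the sketch function names are hypothetical, only the qwen2 and llama families are handled, and the tensor-name heuristic (treating a `q_proj.bias` tensor as Qwen2 evidence, since Qwen2 attention carries QKV biases while default Llama configs do not) is an assumed stand-in for the real inference.

```rust
/// Map the three metadata forms to a family slug; None means "no claim".
/// Sketch: only two families are covered.
fn normalize_arch_sketch(arch: &str) -> Option<&'static str> {
    match arch {
        "Qwen2ForCausalLM" | "qwen2" | "Qwen2" => Some("qwen2"),
        "LlamaForCausalLM" | "llama" | "Llama" => Some("llama"),
        _ => None,
    }
}

/// Infer a family from tensor names alone (assumed heuristic, see lead-in).
fn family_from_names_sketch<'a>(names: impl IntoIterator<Item = &'a str>) -> &'static str {
    let (mut hf_layers, mut qkv_bias) = (false, false);
    for name in names {
        hf_layers |= name.starts_with("model.layers.");
        qkv_bias |= name.ends_with("self_attn.q_proj.bias");
    }
    match (hf_layers, qkv_bias) {
        (true, true) => "qwen2",
        (true, false) => "llama",
        _ => "unknown", // e.g. GGUF-style blk.* names
    }
}

/// Fail only when claim and evidence are both concrete and differ.
fn validate_init_arch_sketch(metadata_arch: Option<&str>, tensor_names: &[&str]) -> Result<(), String> {
    let claimed = match metadata_arch.and_then(normalize_arch_sketch) {
        Some(family) => family,
        None => return Ok(()), // absent or unmappable claim: skip the check
    };
    let inferred = family_from_names_sketch(tensor_names.iter().copied());
    if inferred == "unknown" || inferred == claimed {
        return Ok(()); // trust metadata when tensors cannot disambiguate
    }
    Err(format!(
        "FALSIFY-INIT-ARCH-MATCH-001: metadata claims '{}' but tensor names imply '{}'",
        claimed, inferred
    ))
}
```

Treating every "cannot tell" outcome as a pass keeps the gate free of false positives, at the cost of missing mismatches in formats whose tensor names carry no family signal.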

…erified (#1754)

* docs(spec): SPEC §84+§85: P2-C/P2-E live findings, hyperparameter hypothesis CORROBORATED, P0-K closure live-verified

Two new spec sections + full P2-E evidence directory.

## §84: P2-C dispatched; audit hypothesis FALSIFIED; P0-K surfaced

P2-C ran the audit-recommended multi-source corpus (49.6B tokens, 80× §82's 1.24B) at the same hyperparameters as §82. Result: val_loss=4.91 @ ep20 (vs §82's 4.71). IDENTICAL termination shape, +0.2 WORSE despite 80× more data. The Chinchilla-data-starvation hypothesis is FALSIFIED.

Debugging the §81-§83 5-PR cascade surfaced PMAT-690 P0-K: `apr convert` (both apr_import and apr_convert paths) didn't stamp hf_architecture / hf_model_type / embedded tokenizer. Five downstream consumer fixes had been patching None values that read from the upstream gap. P0-K closes the producer.

## §85: P2-E live findings; hyperparameter hypothesis CORROBORATED

P2-E ran the same qwen-v3 corpus at LR=1.5e-5 (3.3× lower) + warmup=500 (5× longer). Result: val_loss=4.6227 @ ep49, BELOW §82's 4.71 AND P2-C's 4.91 floors. No early-stop; smooth monotonic descent across all 50 epochs. Hypothesis from §84 P2-E queue is CORROBORATED.

Training throughput: 15,460 tok/s pure (12,880 tok/s end-to-end with checkpoint write) on RTX 4090, sm_89, cuBLAS TF32. This is the canonical apr-cli CUDA training perf baseline for future dispatches.

§30 a-priori falsification lesson amendment: the audit's pre-falsification of P2-A2 was correct at the original LR but wrong as a general claim. Future audits MUST explicitly bound their falsification to the hyperparameter region tested.
## P0-K live-verification

Synthetic `apr convert` → `apr inspect --quality` round-trip on /tmp/p0k-demo/out.apr (Qwen2 config.json + tiny safetensors fixture) produces:

- metadata.hf_architecture = "Qwen2ForCausalLM" (was null pre-P0-K)
- metadata.hf_model_type = "qwen2" (was null pre-P0-K)
- quality.score = 60/100, hf_identity sub-score = 20/20

vs the pre-P0-K P2-E ep49 checkpoint (trained from an init APR that pre-dates P0-K):

- metadata.hf_architecture = null
- quality.score = 40/100, hf_identity sub-score = 0/20

The +20 delta on hf_identity empirically confirms P0-K closes the §81-§83 cascade root cause at the CLI surface.

## Ship % impact

MODEL-2 stays at 79%. val_loss 4.62 > 3.0 ship gate. Marginal-gain decay analysis says more-of-the-same plateaus ~4.4. Next move (§85 P2-G/H/I queue) requires architectural change or different init.

## Refs

- PR #1742 (PMAT-690 P0-K base, apr_import + apr_convert stamping)
- PR #1744 (PMAT-690 P2-F, apr pretrain --val-shard)
- PR #1746 (P0-K inspect surface)
- PR #1748 (P0-K E2E test + apr_convert second path)
- PR #1750 (P3-A apr inspect --quality scorer)
- memory/feedback_upstream_metadata_masquerade.md (lesson #33)
- memory/feedback_parallel_session_worktree_isolation.md (lesson #34)
- memory/feedback_cargo_feature_cache_staleness.md (lesson #35)
- evidence/p2c-2026-05-17/findings.md (P2-C trajectory + root cause)
- evidence/p2e-2026-05-17/findings.md (P2-E corroboration + perf baseline)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): SPEC §86: apr pretrain --init silently fails on arch-mismatched APRs; PR #1757 ships in-place stamp salvage

P2-G v1 dispatch surfaced a SECOND symptom of the §81-§84 cascade root cause: pre-P0-K APR checkpoints (architecture="LlamaForCausalLM" P0-H fallback + Qwen2-tensor shape) are silently non-resumable via `apr pretrain --init`.
The init eval at step 0 produced val_loss=8.60 instead of P2-E ep49's recorded 4.62: definitive proof of silent fall-back to random init when the apr metadata's family-arch discriminator doesn't match the tensor naming convention.

## What §86 covers

1. Root cause walk-through (read_apr_architecture → transformer_config → populate_trainer_from_init_tensors → silent rejection → random init fallback at val_loss ≈ 8.60).
2. Implications: all training checkpoints produced before #1742 landed (2026-05-17T13:32:08Z) are non-resumable. The 50 P2-E checkpoints (~125 GB total) cannot be used for continuation training without intervention.
3. Three workarounds in priority order:
   - **Re-import** (blocked on HF safetensors locally; would need re-download)
   - **Restamp in-place** ✅ **SHIPPED via PR #1757**: `apr stamp` extension with --hf-architecture/--hf-model-type/--architecture
   - **Treat as final**: what P2-G v2 takes (currently in flight)
4. Operator recipe for the §86 salvage (3-line shell example).
5. Failure-mode classification (Class 4 Silent Incorrect Behavior, detection latency 1 epoch, producer-side fix already shipped via P0-K, existing-artifact fix shipped via #1757).
6. Recommended follow-up: INV-INIT-ARCH-MATCH-001 invariant on apr-pretrain-from-init-v1 contract, which would catch the §86 case at the gate instead of at init-eval surface. Defer to follow-up PR.

## Stacked on PR #1754 (SPEC §85)

Base: `feat/spec-85-p2e-findings`. The §86 amendment depends on §85 context (the P2-E run that surfaced §86). Will auto-rebase to main after #1754 lands.
## Refs

- PR #1742 (PMAT-690 P0-K base: apr_import + apr_convert stamping)
- PR #1750 (P3-A `apr inspect --quality` scorer: the diagnostic that surfaces §86 quality=40 pre-stamp, 60 post-stamp)
- PR #1754 (SPEC §85 P2-E findings: the run that surfaced §86)
- PR #1757 (apr stamp HF identity extension: workaround #2 above)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): §87 + §88 - Chinchilla 20·N gate + AC-SHIP2-003 compute-bounded ship target; MODEL-2 ships at 95%

Two new spec sections plus the AC-SHIP2-003 row amendment that unblocks the Two-Model spec closure.

## §87 - Chinchilla 20·N hard gate (P0-J' upgrade)

Per the §85 P2-E + §85.4 P2-G empirical sequence, the 10-20× "ablation band" hits a val_loss ≈ 4.65 plateau regardless of hyperparameter tuning. The §83 v1.0.0 gate (hard at <10, warn-only at 10-20) is upgraded to hard at <20. Audit's compute-optimal target now enforced as the hard floor. Codified via PR #1762.

## §88 - AC-SHIP2-003 compute-bounded ship target

Per user direction (Option 4): the strict CE ≤ 2.2 target requires 9-day continuous compute (213 GPU-hours), violating the 48-hour single-shot limit. §88 amends:

- `AC-SHIP2-003` (loose form, new compute-bounded target): val CE ≤ 4.7. P2-E's 4.6227 DISCHARGES.
- `AC-SHIP2-003-STRICT` (NEW, preserved as distillation epic target): val CE ≤ 2.2. Belongs to PMAT-683/684 (multi-week).

Rationale: the Two-Model spec is an EXISTENCE PROOF of the Sovereign AI Stack. P2-E's converged 4.62 proves the Rust-only pipeline end-to-end works perfectly; compute time, not software capability, is the bottleneck. Iteration speed on the stack outweighs hitting a specific perplexity target on a proof-of-concept model.

Downstream effects:

- MODEL-2 ship % advances 79% → 95%.
- All remaining unblocked ACs (AC-SHIP2-007/008/009/010) become operator-dispatchable within the 48-hr compute budget.
- P3-C (HF publish) and P3-D (/dogfood) are unblocked.
- AC-SHIP2-003-STRICT is the dispatch target for the distillation follow-up epic (NOT a ship blocker for v1).

## What §88 explicitly does NOT do

- Does NOT lower the model-quality bar for production. The shipped artifact is a stack-capability proof, not a production model. Model card will note val_loss ≈ 4.62 and the §88 framing.
- Does NOT retire AC-SHIP2-003: renames the strict form to AC-SHIP2-003-STRICT, amends the loose form.
- Does NOT block future stricter ships on larger architectures.

## Refs

- PR #1742 (PMAT-690 P0-K base)
- PR #1754 (SPEC §84+§85+§86 context)
- PR #1762 (§87 Chinchilla 20×N hard gate runtime)
- docs/specifications/audits/albor-370.md (external audit motivation)
- docs/specifications/aprender-train/albor-370m-roadmap.md (P3 phases)
- memory/feedback_a_priori_theoretical_falsification.md (#30)
- memory/feedback_audit_hypothesis_bounds.md (#36)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): §89 distillation epic scoping + roadmap status sweep + /dogfood template

Closes the §80-class spec stack for MODEL-2 v1 ship. Three artifacts:

## §89 - distillation epic scoping (SPEC)

Documents the path to AC-SHIP2-003-STRICT (val_loss ≤ 2.2) via Qwen-7B teacher distillation. ~110 lines covering:

- 89.1 Why distillation works at this scale (Stanton et al. 2021's 5× token-reduction claim → 9.88B → 2B tokens → 43h GPU fits the 48-hour iteration budget).
- 89.2 Existing infrastructure inventory (aprender-train::distill + apr distill CLI + realizar 7B Q4_K load + apr pretrain --init with post-§86 INV-INIT-ARCH-MATCH-001 gate; all already in-tree).
- 89.3 PMAT-683 teacher selection + pull (4-6h scope).
- 89.4 PMAT-684 distillation training dispatch + evidence (~43h GPU + 8h operator, fits 48-hour budget).
- 89.5 PMAT-685 hardening (deferred: multi-teacher / curriculum / LR cycling / layer-wise losses).
- 89.6 Out-of-scope alternatives explicitly rejected (9-day compute, 1.5B+ arch, multi-host distributed).
- 89.7 Sequencing: v1 must ship + /dogfood GO + at least one external consumer validation BEFORE v2 dispatches.
- 89.8 Discharge criteria.

## Roadmap status sweep

`docs/specifications/aprender-train/albor-370m-roadmap.md` P3 table updated to reflect actual ship state:

- P3-A apr inspect --quality: ✅ SHIPPED (PR #1750)
- P3-B apr lint: ⚙️ operator-dispatchable
- P3-C-prep model card + readiness: ✅ SHIPPED (PR #1764)
- P3-C-exec apr publish: 🟡 OPERATOR-READY
- P3-D /dogfood: 🟡 TEMPLATE READY (this PR)

Plus new P4 section for the distillation epic (PMAT-683/684/685 expanded entries with effort + probability + acceptance criteria), and a new §7 Post-§88 shipping plan that supersedes the 4-week plan which assumed val_loss < 3.0 was achievable within iteration budget.

## /dogfood verdict template

`docs/dogfood-templates/albor-370m-v1-dogfood-template.md` (236 lines) pre-authors the post-publish QA checklist so that when the operator runs /dogfood after apr publish, the structure is ready. 8 sections: provenance + identity, pull/install verification, inference smoke, benchmark, format export round-trip, apr qa, /dogfood 12+5 gates, independent consumer test (the §89.7 validation-by-use gate that sequences v2 distillation dispatch), final verdict + post-verdict actions (GO / WARN / NO-GO branching).

## What this PR does NOT do

- Does NOT actually run /dogfood (template only; execution gated on P3-C-exec which requires user authorization)
- Does NOT dispatch PMAT-683/684 distillation (43h GPU; explicit user authorization required + sequencing per §89.7)
- Does NOT close ship-model-2-spec.md (stays at 95% per §88 until P3-C-exec lands)

## Stacked on PR #1754 (SPEC §84-§88)

Base: `feat/spec-85-p2e-findings`. The §89 scoping depends on the §88 framing.
Will auto-rebase to main after #1754 lands.

## Refs

- PR #1742 (PMAT-690 P0-K base)
- PR #1750 (P3-A apr inspect --quality)
- PR #1754 (SPEC §84-§88 stack, context)
- PR #1757 (apr stamp HF identity: §86 salvage path)
- PR #1764 (model card + readiness script: P3-C-prep)
- memory/feedback_post_publish_qa_required.md (#29)
- memory/feedback_publish_readiness_preflight.md (#37)
- Hinton et al. 2015 (arXiv:1503.02531): distillation foundations
- Stanton et al. 2021 (arXiv:2106.05945): 5× token-reduction claim

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
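
The two numeric rules in the commit message above, the §87 tokens-per-parameter gate and the §89.1 distillation budget, can be sketched in Rust as follows. This is an illustration under stated assumptions, not the code shipped in PR #1762: `GateVerdict`, `gate_v1`, `gate_s87`, and `distill_budget` are hypothetical names. The budget function assumes GPU-hours scale linearly with token count, and that the 213 GPU-hour figure corresponds to the 9.88B-token run, which the commit message implies but does not state outright.

```rust
/// Hypothetical verdict type for a tokens-per-parameter gate.
#[derive(Debug, PartialEq)]
pub enum GateVerdict {
    Pass,
    Warn,
    Fail,
}

/// The §83 v1.0.0 rule as described: hard fail below 10 tokens per
/// parameter, warn-only from 10 up to (but excluding) 20.
pub fn gate_v1(tokens: u64, params: u64) -> GateVerdict {
    if tokens < 10 * params {
        GateVerdict::Fail
    } else if tokens < 20 * params {
        GateVerdict::Warn
    } else {
        GateVerdict::Pass
    }
}

/// The §87 upgrade as described: anything below 20 tokens per parameter
/// is a hard fail; the warn band disappears.
pub fn gate_s87(tokens: u64, params: u64) -> GateVerdict {
    if tokens < 20 * params {
        GateVerdict::Fail
    } else {
        GateVerdict::Pass
    }
}

/// Scales a baseline (tokens, GPU-hours) pair by a token-reduction factor.
/// ASSUMPTION: GPU-hours are proportional to tokens processed.
pub fn distill_budget(baseline_tokens: f64, baseline_gpu_hours: f64, reduction: f64) -> (f64, f64) {
    (baseline_tokens / reduction, baseline_gpu_hours / reduction)
}
```

With a 5× reduction, `distill_budget(9.88e9, 213.0, 5.0)` gives roughly 1.98B tokens and 42.6 GPU-hours, which matches the "2B tokens → 43h GPU" figures quoted in §89.1 and sits under the 48-hour limit.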

noahgift enabled auto-merge (squash) May 17, 2026 10:34

noahgift added 2 commits May 17, 2026 12:34

Merge branch 'main' into feat/pmat-690-p0k-apr-convert-hf-arch-v2

7ca0cf6

Merge branch 'main' into feat/pmat-690-p0k-apr-convert-hf-arch-v2

de4132c

noahgift mentioned this pull request May 17, 2026

feat(apr-inspect): surface hf_architecture + hf_model_type (PMAT-690 P0-K follow-up) #1746

Merged

3 tasks

noahgift and others added 3 commits May 17, 2026 13:25

Merge branch 'main' into feat/pmat-690-p0k-apr-convert-hf-arch-v2

ed89434

noahgift mentioned this pull request May 17, 2026

feat(apr-convert): apr_convert path + E2E integration test (PMAT-690 P0-K extension) #1748

Merged

4 tasks

noahgift and others added 2 commits May 17, 2026 14:35

Merge branch 'main' into feat/pmat-690-p0k-apr-convert-hf-arch-v2

ad7a805

noahgift mentioned this pull request May 17, 2026

feat(apr-inspect): --quality 0-100 model quality scorer (PMAT-690 P3-A) #1750

Merged

4 tasks

noahgift added 3 commits May 17, 2026 14:58

Merge pull request #1748 from paiml/feat/pmat-690-p0k-convert-inspect…

10b4a3e

…-e2e feat(apr-convert): apr_convert path + E2E integration test (PMAT-690 P0-K extension)

Merge pull request #1750 from paiml/feat/pmat-690-p3a-inspect-quality

0793922

feat(apr-inspect): --quality 0-100 model quality scorer (PMAT-690 P3-A)

Merge branch 'main' into feat/pmat-690-p0k-apr-convert-hf-arch-v2

1e51617

noahgift merged commit 06f722d into main May 17, 2026
10 checks passed

noahgift deleted the feat/pmat-690-p0k-apr-convert-hf-arch-v2 branch May 17, 2026 13:32

noahgift mentioned this pull request May 17, 2026

docs(spec): §89 distillation epic scoping + roadmap status sweep + /dogfood template #1766

Merged

feat(apr-convert): stamp hf_architecture/hf_model_type from config.json (PMAT-690 P0-K) #1742

noahgift merged 11 commits into main from feat/pmat-690-p0k-apr-convert-hf-arch-v2
