feat(apr-inspect): surface hf_architecture + hf_model_type (PMAT-690 P0-K follow-up) #1746

Merged: noahgift merged 1 commit into feat/pmat-690-p0k-apr-convert-hf-arch-v2 from feat/pmat-690-p0k-inspect-surface-v2 on May 17, 2026

Conversation

@noahgift

Contributor

Summary

apr inspect now renders the HF identity fields that PMAT-690 P0-K stamps into AprV2Metadata. Operators can verify upstream apr convert stamping via apr inspect --json | jq .metadata.hf_architecture and .metadata.hf_model_type instead of grepping source code.

This closes the operator-visibility gap left by #1742: the stamping happens, but until this PR there was no CLI surface to verify it.

Stacked on #1742

This branch is based on feat/pmat-690-p0k-apr-convert-hf-arch-v2 (the P0-K branch) because it depends on the AprV2Metadata.hf_architecture / .hf_model_type fields that #1742 adds. It will be rebased onto main after #1742 lands; alternatively, the two can be merged in order, since this PR is small and isolated.

What changes

  • MetadataInfo (crates/apr-cli/src/commands/inspect.rs): adds hf_architecture + hf_model_type fields. Both serialize as null when None (NOT skipped via skip_serializing_if), mirroring the C-APR-PROVENANCE pattern so auditors can grep-check every output.
  • read_metadata: copies the two fields from AprV2Metadata into MetadataInfo.
  • output_architecture (text path): renders new "HF Class" and "HF model_type" rows beneath the existing "Family" row when populated.
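
The null-versus-skipped distinction can be illustrated with a std-only sketch. The real `MetadataInfo` presumably relies on serde for serialization; the hand-rolled renderer below is only a stand-in, and `MetadataInfoSketch`, `json_opt`, and `render_json` are hypothetical names that do not appear in the codebase.

```rust
/// Hypothetical stand-in for the two new `MetadataInfo` fields.
struct MetadataInfoSketch {
    hf_architecture: Option<String>,
    hf_model_type: Option<String>,
}

/// Render an optional string as a JSON value: a quoted string or `null`.
/// No escaping is done; the sketch assumes plain ASCII identifiers.
fn json_opt(value: &Option<String>) -> String {
    match value {
        Some(s) => format!("\"{}\"", s),
        None => "null".to_string(),
    }
}

/// Always emit both keys, so a grep for the key name matches every output,
/// whether or not the value is populated.
fn render_json(m: &MetadataInfoSketch) -> String {
    format!(
        "{{\"hf_architecture\":{},\"hf_model_type\":{}}}",
        json_opt(&m.hf_architecture),
        json_opt(&m.hf_model_type)
    )
}
```

With both fields set to `None`, `render_json` still emits both keys with `null` values, which is the property the grep-check audit relies on.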

Test plan

  • cargo test -p apr-cli --features training --lib pmat_690_p0k — 2/2 pass:
    • pmat_690_p0k_inspect_emits_hf_arch_keys_when_none — both keys serialize as null (not skipped) when absent. Required for the grep-check audit recipe.
    • pmat_690_p0k_inspect_emits_hf_arch_values_when_populated — when populated, keys render Qwen2ForCausalLM / qwen2.
  • cargo test -p apr-cli --features training --lib — 5,938 tests pass, 0 regressions.
  • cargo check -p apr-cli --features training — clean.

Refs

🤖 Generated with Claude Code

…P0-K follow-up)

`apr inspect` now renders the HF identity fields that PMAT-690 P0-K
stamps into AprV2Metadata. Operators can verify upstream `apr convert`
stamping via `apr inspect --json | jq .metadata.hf_architecture` and
`.metadata.hf_model_type` instead of grepping source code.

## What changes

- MetadataInfo gains `hf_architecture: Option<String>` +
  `hf_model_type: Option<String>` fields (both serialize as null when
  None — NOT skipped via skip_serializing_if, mirroring the
  C-APR-PROVENANCE pattern so auditors can grep-check every output).
- `read_metadata` copies the two fields from AprV2Metadata into
  MetadataInfo.
- `output_architecture` (text path) renders new "HF Class" and
  "HF model_type" rows beneath the existing "Family" row when
  populated.

## Stacked on top of PR #1742 (P0-K)

This branch is based on `feat/pmat-690-p0k-apr-convert-hf-arch-v2`
because it depends on the AprV2Metadata fields that #1742 adds.
Will rebase to main after #1742 lands.

## Tests

- `pmat_690_p0k_inspect_emits_hf_arch_keys_when_none` — both keys
  serialize as null (not skipped) when absent. Required for the
  grep-check audit recipe.
- `pmat_690_p0k_inspect_emits_hf_arch_values_when_populated` — when
  populated, keys render the actual values (Qwen2ForCausalLM / qwen2).
- Full apr-cli lib suite: 5,938 tests pass, 0 regressions.

## Refs

- PR #1742 (PMAT-690 P0-K — the upstream stamping)
- contracts/apr-convert-hf-arch-v1.yaml (round-trip invariant)
- docs/specifications/aprender-train/ship-model-2-spec.md §84
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 1cf3bdb into feat/pmat-690-p0k-apr-convert-hf-arch-v2 May 17, 2026
1 check passed
@noahgift noahgift deleted the feat/pmat-690-p0k-inspect-surface-v2 branch May 17, 2026 11:25
noahgift added a commit that referenced this pull request May 17, 2026
…on (PMAT-690 P0-K) (#1742)

* feat(apr-convert): stamp hf_architecture/hf_model_type from config.json (PMAT-690 P0-K)

`apr convert <src.safetensors>` now extracts `architectures[0]` and
`model_type` from a sibling `config.json` and stamps them into
`AprV2Metadata.hf_architecture` + `.hf_model_type`. Closes the upstream
producer gap that masqueraded as the §81-§83 Class 3 packaging cascade
(5 PRs patching downstream consumers — each re-failed on every fresh
P2-C-style live training run because the imported init APR had
hf_architecture = None).

GGUF -> APR conversion has no `architectures[]` source, so the GGUF
import path synthesizes the canonical HF class name from the family
slug via `synthesize_hf_architecture_from_family` (qwen2 -> Qwen2ForCausalLM,
llama -> LlamaForCausalLM, etc.) so round-tripping a GGUF through APR
preserves arch identity for llama-cli interop.
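
A minimal sketch of the family-slug lookup, assuming a plain match table. Only the two mappings named above are included; the signature, the `_sketch` name, and the `None` fallback for unknown families are assumptions, not the actual `synthesize_hf_architecture_from_family` implementation.

```rust
/// Map a GGUF family slug to a canonical HF class name.
/// Only the two mappings named in the commit message are listed here.
fn synthesize_hf_architecture_sketch(family: &str) -> Option<&'static str> {
    match family {
        "qwen2" => Some("Qwen2ForCausalLM"),
        "llama" => Some("LlamaForCausalLM"),
        // Assumption: unknown families yield None rather than a guessed name.
        _ => None,
    }
}
```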

Discharges:
- PMAT-690 P0-K (per albor-370m-roadmap.md §4 P0-K)
- INV-CONVERT-HF-ARCH-001/002/003/004 (new contract apr-convert-hf-arch-v1)

Tests: 3 unit tests on load_model_config_from_json + 2 on
synthesize_hf_architecture_from_family. Full converter module: 1,260
tests pass locally.

Methodology lesson #33 applied: when a Class 3 packaging wave extends
past 4-5 defects, the producer is the defect (memory/feedback_upstream_metadata_masquerade.md).

Refs:
- docs/specifications/aprender-train/ship-model-2-spec.md §84
- evidence/p2c-2026-05-17/findings.md
- contracts/apr-convert-hf-arch-v1.yaml

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(apr-inspect): surface hf_architecture + hf_model_type (PMAT-690 P0-K follow-up) (#1746)

* feat(apr-convert): apr_convert + streaming-import also stamp hf_architecture (PMAT-690 P0-K extension)

Closes the second half of the §84 P0-K root cause: `apr convert <safetensors>`
goes through `apr_convert` (in mod.rs), NOT through `apr_import` (which is
used by `apr pull` / `apr import`). The original P0-K commit patched
`apr_import` end-to-end but left `apr_convert` reading no config.json for
SafeTensors sources — meaning the very CLI the §84 evidence indicted
("apr convert ... does NOT stamp apr_metadata.hf_architecture") was still
broken after P0-K v1.

This commit:
- Makes `apr_convert` read sibling config.json for SafeTensors sources
  (previously only for GGUF), populating the full GgufModelConfig including
  hf_architecture + hf_model_type.
- Threads hf_architecture + hf_model_type through
  `save_model_tensors_with_gguf_config_and_tokenizer` (the writer used by
  the apr_convert path).
- Patches the streaming-import AprV2Metadata initializer in import.rs
  (the `realizar#136` path triggered for sharded SafeTensors >10B params)
  that had been missed in P0-K v1.

## Integration test (closes FALSIFY-CONVERT-HF-ARCH-001 at the CLI surface)

`crates/apr-cli/tests/p0k_convert_inspect_e2e_test.rs` exercises the FULL
chain that the §81-§83 cascade unknowingly assumed worked:

  1. Stage tempdir with synthetic Qwen2 config.json + safetensors fixture
  2. Run `apr convert <safetensors> -o out.apr --compress none`
  3. Run `apr inspect out.apr --json`
  4. Assert `metadata.hf_architecture == "Qwen2ForCausalLM"`
  5. Assert `metadata.hf_model_type == "qwen2"`

This is the test that would have caught the §81-§83 cascade in the first
place per methodology lesson #33 (memory/feedback_upstream_metadata_masquerade.md):
the absent end-to-end test was what let 5 PRs ship downstream consumer
fixes without anyone noticing the upstream producer was broken.

Also includes a negative test: when config.json is ABSENT alongside the
safetensors, hf_architecture / hf_model_type MUST remain null (no
fabrication).
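
The "no fabrication" rule can be sketched as a sibling-file lookup that returns `None` when config.json is missing. This is a std-only illustration; `sibling_config_path` and `find_sibling_config` are hypothetical helper names, and the real converter may locate the file differently.

```rust
use std::path::{Path, PathBuf};

/// Path of the `config.json` that would sit next to `src`.
fn sibling_config_path(src: &Path) -> PathBuf {
    src.with_file_name("config.json")
}

/// Some(path) only when the sibling file exists on disk. None means the
/// HF identity fields stay null instead of being guessed.
fn find_sibling_config(src: &Path) -> Option<PathBuf> {
    let candidate = sibling_config_path(src);
    if candidate.is_file() {
        Some(candidate)
    } else {
        None
    }
}
```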

## Stacked on PR #1742 (P0-K) + #1746 (P0-K inspect surfacing)

Base: feat/pmat-690-p0k-apr-convert-hf-arch-v2 (which already absorbed #1746).
Will auto-rebase to main after #1742 merges.

## Tests

- 2 new E2E integration tests pass
- 5,938 apr-cli unit/integration tests pass (no regressions)
- 1,260 aprender-core converter tests pass (no regressions)
- contracts lint: clean

## Refs

- PR #1742 (PMAT-690 P0-K — base stamping)
- PR #1746 (PMAT-690 P0-K — apr inspect surfacing)
- contracts/apr-convert-hf-arch-v1.yaml
- evidence/p2c-2026-05-17/findings.md §76-§83
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)
- memory/feedback_parallel_session_worktree_isolation.md (methodology #34)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(apr-inspect): --quality 0-100 model quality scorer (PMAT-690 P3-A)

`apr inspect --quality` emits a 0-100 model quality score for any APR
file. Per SPEC-SHIP-TWO-001 §84 P3-A (AC-SHIP2-007), ship-ready models
MUST score ≥ 90. The scorer is a transparent weighted sum across five
sub-scores:

| Sub-score    | Weight | Checks                                        |
|--------------|--------|-----------------------------------------------|
| physics      | 20     | header.checksum_valid                         |
| structural   | 20     | arch + hidden_size + num_layers + num_heads   |
| provenance   | 25     | license + data_source + data_license non-null |
| hf_identity  | 20     | hf_architecture + hf_model_type non-null      |
| tokenizer    | 15     | has_vocab flag (HAS_VOCAB bit set)            |

Weights reflect SPEC §84 ship-blocker priorities: provenance + HF
identity are weighted heaviest because their absence was the exact
§81-§83 cascade root cause we shipped in P0-K (#1742). The ≥ 90 gate
allows at most one sub-score missing — typically `has_vocab` (15 pts)
is the recoverable one for distilled / from-scratch models without an
embedded tokenizer.
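
The weighted sum can be sketched as follows, using the weights from the table. The sketch assumes each sub-score is all-or-nothing, which may not match the real scorer; `SubScoreChecks`, `quality_score`, and `ship_ready` are hypothetical names.

```rust
/// Each flag stands for "every check in that table row passed".
struct SubScoreChecks {
    physics: bool,
    structural: bool,
    provenance: bool,
    hf_identity: bool,
    tokenizer: bool,
}

/// Weighted sum with the table's weights (20 / 20 / 25 / 20 / 15).
/// Assumption: a sub-score contributes its full weight or nothing.
fn quality_score(c: &SubScoreChecks) -> u32 {
    let mut score = 0;
    if c.physics {
        score += 20;
    }
    if c.structural {
        score += 20;
    }
    if c.provenance {
        score += 25;
    }
    if c.hf_identity {
        score += 20;
    }
    if c.tokenizer {
        score += 15;
    }
    score
}

/// Ship gate: a score of at least 90.
fn ship_ready(score: u32) -> bool {
    score >= 90
}
```

Under this all-or-nothing assumption, a model that lacks only the 15-point tokenizer sub-score totals 85, so whether such a model clears the gate depends on how the real scorer awards partial credit.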

## Operator workflow

```bash
# Verify a model is ship-ready
apr inspect model.apr --quality --json | jq '.quality'
# {
#   "score": 100,
#   "ship_ready": true,
#   "threshold": 90,
#   "breakdown": { "physics": 20, "structural": 20, "provenance": 25, ... }
# }

# Text mode for human review
apr inspect model.apr --quality
# Quality (0-100):
#   Score: 75 / 100
#   Ship-ready (≥90 per AC-SHIP2-007): NO
#   Breakdown:
#     physics:     20 / 20
#     structural:  20 / 20
#     provenance:   0 / 25  ← missing license/data_source/data_license
#     hf_identity: 20 / 20
#     tokenizer:   15 / 15
```

## Stacked on PR #1742 (P0-K base) + #1746 (inspect surfacing) + #1748 (E2E test)

Base: `feat/pmat-690-p0k-apr-convert-hf-arch-v2`. Depends on the
hf_architecture / hf_model_type fields that P0-K v1 + v2 added. Will
auto-rebase to main after the P0-K stack lands.

## Tests

- 4 new unit tests in `inspect_tests.rs::pmat_690_p3a_*`:
  - Ship-ready model scores ≥ 90 (full provenance + HF + has_vocab)
  - No HF + no provenance caps at ≤ 55 (the §81-§83 cascade scenario)
  - Invalid checksum drops physics to 0, blocks ship gate
  - QualityReport JSON contains all 5 breakdown sub-scores
- Full apr-cli lib suite: 5,942 tests pass, 0 regressions

## Discharges

- PMAT-690 P3-A (per albor-370m-roadmap.md §4 P3-A)
- AC-SHIP2-007 (apr inspect --quality ≥ 90 gate per spec §5.2)

## Refs

- PR #1742 (PMAT-690 P0-K — base stamping)
- PR #1746 (P0-K inspect surfacing)
- PR #1748 (P0-K E2E test)
- docs/specifications/aprender-train/ship-model-2-spec.md §84
- docs/specifications/aprender-train/albor-370m-roadmap.md §4 P3-A
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 18, 2026
…erified (#1754)

* docs(spec): SPEC §84+§85 — P2-C/P2-E live findings, hyperparameter hypothesis CORROBORATED, P0-K closure live-verified

Two new spec sections + full P2-E evidence directory.

## §84 — P2-C dispatched; audit hypothesis FALSIFIED; P0-K surfaced

P2-C ran the audit-recommended multi-source corpus (49.6B tokens, 80×
§82's 1.24B) at the same hyperparameters as §82. Result: val_loss=4.91
@ ep20 (vs §82's 4.71) — IDENTICAL termination shape, +0.2 WORSE despite
80× more data. The Chinchilla-data-starvation hypothesis is FALSIFIED.

Debugging the §81-§83 5-PR cascade surfaced PMAT-690 P0-K: `apr convert`
(both apr_import and apr_convert paths) didn't stamp hf_architecture /
hf_model_type / embedded tokenizer. Five downstream consumer fixes had
been patching around None values that originated in this upstream gap.
P0-K closes the gap at the producer.

## §85 — P2-E live findings; hyperparameter hypothesis CORROBORATED

P2-E ran the same qwen-v3 corpus at LR=1.5e-5 (3.3× lower) + warmup=500
(5× longer). Result: val_loss=4.6227 @ ep49 — BELOW §82's 4.71 AND
P2-C's 4.91 floors. No early-stop; smooth monotonic descent across all
50 epochs. Hypothesis from §84 P2-E queue is CORROBORATED.

Training throughput: 15,460 tok/s pure (12,880 tok/s end-to-end with
checkpoint write) on RTX 4090, sm_89, cuBLAS TF32. This is the
canonical apr-cli CUDA training perf baseline for future dispatches.

§30 a-priori falsification lesson amendment: the audit's
pre-falsification of P2-A2 was correct at the original LR but wrong
as a general claim. Future audits MUST explicitly bound their
falsification to the hyperparameter region tested.

## P0-K live-verification

Synthetic `apr convert` → `apr inspect --quality` round-trip on
/tmp/p0k-demo/out.apr (Qwen2 config.json + tiny safetensors fixture)
produces:
- metadata.hf_architecture = "Qwen2ForCausalLM" (was null pre-P0-K)
- metadata.hf_model_type = "qwen2" (was null pre-P0-K)
- quality.score = 60/100, hf_identity sub-score = 20/20

vs the pre-P0-K P2-E ep49 checkpoint (trained from an init APR that
pre-dates P0-K):
- metadata.hf_architecture = null
- quality.score = 40/100, hf_identity sub-score = 0/20

The +20 delta on hf_identity empirically confirms P0-K closes the
§81-§83 cascade root cause at the CLI surface.

## Ship % impact

MODEL-2 stays at 79%. val_loss 4.62 > 3.0 ship gate. Marginal-gain
decay analysis says more-of-the-same plateaus ~4.4. Next move (§85
P2-G/H/I queue) requires architectural change or different init.

## Refs

- PR #1742 (PMAT-690 P0-K base — apr_import + apr_convert stamping)
- PR #1744 (PMAT-690 P2-F — apr pretrain --val-shard)
- PR #1746 (P0-K inspect surface)
- PR #1748 (P0-K E2E test + apr_convert second path)
- PR #1750 (P3-A apr inspect --quality scorer)
- memory/feedback_upstream_metadata_masquerade.md (lesson #33)
- memory/feedback_parallel_session_worktree_isolation.md (lesson #34)
- memory/feedback_cargo_feature_cache_staleness.md (lesson #35)
- evidence/p2c-2026-05-17/findings.md (P2-C trajectory + root cause)
- evidence/p2e-2026-05-17/findings.md (P2-E corroboration + perf baseline)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): SPEC §86 — apr pretrain --init silently fails on arch-mismatched APRs; PR #1757 ships in-place stamp salvage

P2-G v1 dispatch surfaced a SECOND symptom of the §81-§84 cascade root
cause: pre-P0-K APR checkpoints (architecture="LlamaForCausalLM" P0-H
fallback + Qwen2-tensor shape) are silently non-resumable via
`apr pretrain --init`. The init eval at step 0 produced val_loss=8.60
instead of P2-E ep49's recorded 4.62 — definitive proof of silent
fall-back to random init when the apr metadata's family-arch
discriminator doesn't match the tensor naming convention.

## What §86 covers

1. Root cause walk-through (read_apr_architecture → transformer_config
   → populate_trainer_from_init_tensors → silent rejection → random
   init fallback at val_loss ≈ 8.60).
2. Implications: all training checkpoints produced before #1742 landed
   (2026-05-17T13:32:08Z) are non-resumable. The 50 P2-E checkpoints
   (~125 GB total) cannot be used for continuation training without
   intervention.
3. Three workarounds in priority order:
   - **Re-import** (blocked on HF safetensors locally — would need
     re-download)
   - **Restamp in-place** ✅ **SHIPPED via PR #1757** — `apr stamp`
     extension with --hf-architecture/--hf-model-type/--architecture
   - **Treat as final** — what P2-G v2 takes (currently in flight)
4. Operator recipe for the §86 salvage (3-line shell example).
5. Failure-mode classification (Class 4 Silent Incorrect Behavior,
   detection latency 1 epoch, producer-side fix already shipped via
   P0-K, existing-artifact fix shipped via #1757).
6. Recommended follow-up: INV-INIT-ARCH-MATCH-001 invariant on
   apr-pretrain-from-init-v1 contract — would catch the §86 case at
   the gate instead of at init-eval surface. Defer to follow-up PR.
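
The kind of gate that item 6 recommends could look like the sketch below: fail loudly on a mismatch instead of falling back to random init. How the architecture is inferred from tensor names is not described here, so the sketch takes it as an input; `check_init_arch` and its error text are hypothetical.

```rust
/// Compare the architecture declared in APR metadata with the one inferred
/// from the tensor naming convention. Both values are supplied by the caller.
fn check_init_arch(declared: &str, inferred: &str) -> Result<(), String> {
    if declared == inferred {
        Ok(())
    } else {
        // Returning an error stops the run before any silent fallback.
        Err(format!(
            "init APR declares architecture {declared} but tensors look like {inferred}"
        ))
    }
}
```

A caller would invoke this before populating the trainer from the init tensors and abort on `Err`, so a mismatch surfaces before the first evaluation rather than at it.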

## Stacked on PR #1754 (SPEC §85)

Base: `feat/spec-85-p2e-findings`. The §86 amendment depends on §85
context (the P2-E run that surfaced §86). Will auto-rebase to main
after #1754 lands.

## Refs

- PR #1742 (PMAT-690 P0-K base — apr_import + apr_convert stamping)
- PR #1750 (P3-A `apr inspect --quality` scorer — the diagnostic
  that surfaces §86 quality=40 pre-stamp, 60 post-stamp)
- PR #1754 (SPEC §85 P2-E findings — the run that surfaced §86)
- PR #1757 (apr stamp HF identity extension — workaround #2 above)
- evidence/p2g-2026-05-17/section-86-draft.md
- memory/feedback_upstream_metadata_masquerade.md (methodology #33)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): §87 + §88 — Chinchilla 20·N gate + AC-SHIP2-003 compute-bounded ship target; MODEL-2 ships at 95%

Two new spec sections plus the AC-SHIP2-003 row amendment that
unblocks the Two-Model spec closure.

## §87 — Chinchilla 20·N hard gate (P0-J' upgrade)

Per the §85 P2-E + §85.4 P2-G empirical sequence, the 10-20× "ablation
band" hits a val_loss ≈ 4.65 plateau regardless of hyperparameter
tuning. The §83 v1.0.0 gate (hard at <10, warn-only at 10-20) is
upgraded to hard at <20. Audit's compute-optimal target now enforced
as the hard floor. Codified via PR #1762.
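
The change in gate policy can be sketched as two functions over the tokens-per-parameter ratio. The names `Gate`, `gate_v1`, and `gate_v2` are hypothetical, and the exact boundary handling (whether a ratio of exactly 10 or 20 passes) is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Gate {
    Pass,
    Warn,
    Fail,
}

/// Earlier policy as described: hard fail below 10 tokens per parameter,
/// warn-only between 10 and 20.
fn gate_v1(tokens: u64, params: u64) -> Gate {
    if tokens < 10 * params {
        Gate::Fail
    } else if tokens < 20 * params {
        Gate::Warn
    } else {
        Gate::Pass
    }
}

/// Upgraded policy: hard fail anywhere below 20 tokens per parameter.
fn gate_v2(tokens: u64, params: u64) -> Gate {
    if tokens < 20 * params {
        Gate::Fail
    } else {
        Gate::Pass
    }
}
```

A run at 15 tokens per parameter, which only warned under the earlier policy, is rejected under the upgraded one.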

## §88 — AC-SHIP2-003 compute-bounded ship target

Per user direction (Option 4): the strict CE ≤ 2.2 target requires
9-day continuous compute (213 GPU-hours), violating the 48-hour
single-shot limit. §88 amends:

- `AC-SHIP2-003` (loose form, new compute-bounded target):
  val CE ≤ 4.7. P2-E's 4.6227 DISCHARGES.
- `AC-SHIP2-003-STRICT` (NEW, preserved as distillation epic
  target): val CE ≤ 2.2. Belongs to PMAT-683/684 (multi-week).

Rationale: the Two-Model spec is an EXISTENCE PROOF of the Sovereign
AI Stack. P2-E's converged val_loss of 4.62 shows that the Rust-only
pipeline works end to end; compute time, not software capability,
is the bottleneck. Iteration speed on the stack outweighs hitting a
specific perplexity target on a proof-of-concept model.

Downstream effects:
- MODEL-2 ship % advances 79% → 95%.
- All remaining unblocked ACs (AC-SHIP2-007/008/009/010) become
  operator-dispatchable within the 48-hr compute budget.
- P3-C (HF publish) and P3-D (/dogfood) are unblocked.
- AC-SHIP2-003-STRICT is the dispatch target for the distillation
  follow-up epic (NOT a ship blocker for v1).

## What §88 explicitly does NOT do

- Does NOT lower the model-quality bar for production. The shipped
  artifact is a stack-capability proof, not a production model.
  Model card will note val_loss ≈ 4.62 and the §88 framing.
- Does NOT retire AC-SHIP2-003 — renames the strict form to
  AC-SHIP2-003-STRICT, amends the loose form.
- Does NOT block future stricter ships on larger architectures.

## Refs

- PR #1742 (PMAT-690 P0-K base)
- PR #1754 (SPEC §84+§85+§86 context)
- PR #1762 (§87 Chinchilla 20×N hard gate runtime)
- docs/specifications/audits/albor-370.md (external audit motivation)
- docs/specifications/aprender-train/albor-370m-roadmap.md (P3 phases)
- memory/feedback_a_priori_theoretical_falsification.md (#30)
- memory/feedback_audit_hypothesis_bounds.md (#36)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(spec): §89 distillation epic scoping + roadmap status sweep + /dogfood template

Closes the §80-class spec stack for MODEL-2 v1 ship. Three artifacts:

## §89 — distillation epic scoping (SPEC)

Documents the path to AC-SHIP2-003-STRICT (val_loss ≤ 2.2) via
Qwen-7B teacher distillation. ~110 lines covering:

- 89.1 Why distillation works at this scale (Stanton et al. 2021's
  5× token-reduction claim → 9.88B → 2B tokens → 43h GPU fits the
  48-hour iteration budget).
- 89.2 Existing infrastructure inventory (aprender-train::distill
  + apr distill CLI + realizar 7B Q4_K load + apr pretrain --init
  with post-§86 INV-INIT-ARCH-MATCH-001 gate — all already in-tree).
- 89.3 PMAT-683 teacher selection + pull (4-6h scope).
- 89.4 PMAT-684 distillation training dispatch + evidence (~43h
  GPU + 8h operator, fits 48-hour budget).
- 89.5 PMAT-685 hardening (deferred — multi-teacher / curriculum /
  LR cycling / layer-wise losses).
- 89.6 Out-of-scope alternatives explicitly rejected (9-day compute,
  1.5B+ arch, multi-host distributed).
- 89.7 Sequencing — v1 must ship + /dogfood GO + at least one
  external consumer validation BEFORE v2 dispatches.
- 89.8 Discharge criteria.
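
The compute figures in 89.1 can be cross-checked with a one-line formula. The sketch assumes the 12,880 tok/s end-to-end throughput reported for P2-E applies to both runs; pairing that figure with the 9.88B token count is an inference, not something the spec states. `gpu_hours` is a hypothetical helper.

```rust
/// Wall-clock GPU hours needed to process `tokens` at `tok_per_s`.
fn gpu_hours(tokens: f64, tok_per_s: f64) -> f64 {
    tokens / tok_per_s / 3600.0
}
```

Under that assumption, `gpu_hours(9.88e9, 12_880.0)` comes to roughly 213 hours, in line with the 9-day figure, and `gpu_hours(9.88e9 / 5.0, 12_880.0)` comes to roughly 42.6 hours, in line with the ~43h distillation estimate.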

## Roadmap status sweep

`docs/specifications/aprender-train/albor-370m-roadmap.md` P3 table
updated to reflect actual ship state:

- P3-A apr inspect --quality: ✅ SHIPPED (PR #1750)
- P3-B apr lint: ⚙️ operator-dispatchable
- P3-C-prep model card + readiness: ✅ SHIPPED (PR #1764)
- P3-C-exec apr publish: 🟡 OPERATOR-READY
- P3-D /dogfood: 🟡 TEMPLATE READY (this PR)

Plus new P4 section for the distillation epic (PMAT-683/684/685
expanded entries with effort + probability + acceptance criteria),
and a new §7 Post-§88 shipping plan that supersedes the 4-week plan
which assumed val_loss < 3.0 was achievable within iteration budget.

## /dogfood verdict template

`docs/dogfood-templates/albor-370m-v1-dogfood-template.md` (236
lines) pre-authors the post-publish QA checklist so that the structure
is ready when the operator runs /dogfood after apr publish. 8 sections:
provenance + identity, pull/install verification, inference smoke,
benchmark, format export round-trip, apr qa, /dogfood 12+5 gates,
independent consumer test (the §89.7 validation-by-use gate that
sequences v2 distillation dispatch), final verdict + post-verdict
actions (GO / WARN / NO-GO branching).

## What this PR does NOT do

- Does NOT actually run /dogfood (template only — execution gated
  on P3-C-exec which requires user authorization)
- Does NOT dispatch PMAT-683/684 distillation (43h GPU; explicit
  user authorization required + sequencing per §89.7)
- Does NOT close ship-model-2-spec.md (stays at 95% per §88 until
  P3-C-exec lands)

## Stacked on PR #1754 (SPEC §84-§88)

Base: `feat/spec-85-p2e-findings`. The §89 scoping depends on the
§88 framing. Will auto-rebase to main after #1754 lands.

## Refs

- PR #1742 (PMAT-690 P0-K base)
- PR #1750 (P3-A apr inspect --quality)
- PR #1754 (SPEC §84-§88 stack — context)
- PR #1757 (apr stamp HF identity — §86 salvage path)
- PR #1764 (model card + readiness script — P3-C-prep)
- memory/feedback_post_publish_qa_required.md (#29)
- memory/feedback_publish_readiness_preflight.md (#37)
- Hinton et al. 2015 (arXiv:1503.02531) — distillation foundations
- Stanton et al. 2021 (arXiv:2106.05945) — 5× token-reduction claim

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>