feat(p1-0): apr-cli-pull-dataset-v1 - contract for apr pull dataset --include --license-allowlist #1080
Merged
…--include --license-allowlist` per spec §26.8 P1.0 of the SHIP-TWO-001 §26.9 corpus pipeline.

Authoring this contract is the prerequisite for P1.1 (extend apr CLI) per the binding methodology rule §26.8.1: when `apr` lacks a feature, author contract → extend apr → use extended stack tool. Never route around to non-stack CLIs (huggingface-cli) or to deprecated namespaces (batuta hf pull).

Contract defines:
- New `apr pull dataset <REPO>` asset-type (currently apr pull is model-only with `apr pull <MODEL>`)
- --include <GLOB> for shard-pattern selection (fnmatch, no-match fails fast)
- --license-allowlist <CSV> for row-level SPDX-id filtering
- --revision <REV> propagated from existing model path
- --output <DIR> with sensible default

8 falsification tests cover:
- Subcommand exists with required flags
- include glob filters correctly
- No-match glob fails fast (not silent no-op)
- License allowlist drops disallowed rows
- Model-path backward compatibility preserved
- 3-surface drift prevention (clap + registry yaml + cli_commands test)
- pv validate passes
- Deprecated namespaces (batuta hf pull, huggingface-cli) not used in P1 pipeline

4 proof obligations (1 invariant + 1 invariant + 1 safety + 1 liveness).
2 Kani harnesses with bounds.

`pv validate` exits 0, 0 errors / 0 warnings.
`pv score` = 0.71 Grade C (Falsify 1.00, Spec 0.70, Bind 1.00).
Kani/Lean scores upgrade in P1.1 with implementation.

Status: PROPOSED. Promotion to ACTIVE requires P1.1 (implementation) + all 8 FALSIFY tests passing live.

Spec: SPEC-SHIP-TWO-001 §26.8 + §26.9

References:
- feedback_monorepo_single_source_of_truth.md (APR-MONO)
- feedback_fix_root_cause_never_route_around.md
- feedback_cli_subcommand_three_surface_drift.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
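To make the `--include` semantics concrete, here is a minimal Python sketch of fnmatch-style shard selection that fails fast when nothing matches. It is an illustration under stated assumptions, not the `apr` implementation (which lands in P1.1): the names `select_shards` and `NoShardMatched` and the file listing are hypothetical, and case-sensitive matching is assumed.

```python
from fnmatch import fnmatchcase


class NoShardMatched(Exception):
    """Hypothetical error type: the include pattern selected no files."""


def select_shards(files, include_glob):
    """Return the files matching include_glob, in listing order.

    Raises NoShardMatched instead of returning an empty list, so a typo in
    the pattern cannot turn the pull into a silent no-op.
    """
    # fnmatchcase is case-sensitive on every platform; note that "*" in
    # fnmatch patterns also matches "/" characters.
    selected = [name for name in files if fnmatchcase(name, include_glob)]
    if not selected:
        raise NoShardMatched(
            f"--include {include_glob!r} matched 0 of {len(files)} files"
        )
    return selected


# Hypothetical repository listing, for illustration only.
files = ["data/train-00000.parquet", "data/train-00001.parquet", "README.md"]

print(select_shards(files, "data/train-*.parquet"))

try:
    select_shards(files, "*.jsonl")
except NoShardMatched as err:
    print(f"error: {err}")
```

A real CLI would presumably map the exception to a non-zero exit code, which is the behaviour the "no-match glob fails fast" falsification test looks for.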
4 tasks
noahgift added a commit that referenced this pull request Apr 27, 2026
…oard + critical-path map, spec v2.73.0 → v2.74.0 (#1087)

Session-end snapshot consolidating today's 10-PR cascade into a single source-of-truth for next session. The goal: ship two models to HF, both built end-to-end on the in-tree Sovereign AI Stack.

Coverage scoreboard EOD 2026-04-27:

| Category   | DISCHARGED | PARTIAL | Total |   %D |
|------------|-----------:|--------:|------:|-----:|
| MODEL-1    |          5 |       5 |    10 |  50% |
| MODEL-2    |          3 |       9 |    12 |  25% |
| GPUTRAIN   |          7 |       0 |     7 | 100% |
| Ship Gates |          - |      12 |    12 |   0% |
| Falsifiers |          - |       7 |     7 |   0% |
| Sum        |         15 |      33 |    48 |  31% |

Critical path, MODEL-1: PR E (replace helpers::f32_matmul with Q4K-fused dispatch) discharges 5 PARTIALs at one fix site. ~150-300 LOC.

Critical path, MODEL-2: P1.1 (apr pull dataset extension) → P1.4 (corpus pull) → P2 (100K-step training) discharges 9 PARTIALs.

10-PR session cascade (6 merged, 4 open + this):
- #1076-#1080: spec + contract foundation (MERGED)
- #1081: P3 PR A scaffold (MERGED)
- #1082-#1083: P3 PR B+C wiring (OPEN, stacked)
- #1084-#1085: §27/§28 binding criterion + root cause (OPEN)
- #1086: PR D forward-parity contract (OPEN)

Falsification chain (complete, root-reached):
- §15.4 → §16 → §17 → §23 → §27 → §28 → PR D contract → PR E (next)
- "forward path" → ... → "APR F32 vs GGUF Q4K matmul precision" → "binding criterion as durable spec" → "fix at mod_apr_transformer.rs:138-140"

Methodology preserved: zero eprintln!, zero route-arounds, apr canonical, contract-first, lambda-labs pre-authorized, 5-whys reaches root.

Next session: PR E first (5 ACs), then P1.1 + P1.4 + P2 (9 ACs).

Spec v2.73.0 → v2.74.0. No coverage flip at amendment: §29 is a scoreboard, not a discharge.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
3 tasks
noahgift added a commit that referenced this pull request Apr 28, 2026
…confirms §25 corpus-diversity hypothesis, v2.77 → v2.78 (#1094)

P1 corpus pipeline complete end-to-end. P2 MODEL-2 retrain on 565.6M-token codeparrot Python+permissive corpus (7.6× the 4× CSN-Python baseline) pushes val_loss from the 9.7507 plateau to 9.3837, a 0.367-nat (3.8%) improvement with the SAME training configuration.

§25 had concluded (after 80K-step LR-budget falsification on 4× CSN-Python): "There is no LR/step configuration that beats val_loss=9.75 on CSN-Python — only Stack v2 will move the needle." §33 confirms this empirically. The corpus-diversity binding criterion of §26.9 is satisfied.

## Pipeline (all stack-canonical, no muda)

| Phase | Outcome |
|-------|---------|
| P1.0 contract authored (PROPOSED → ACTIVE) | #1080 → #1089 |
| P1.1 apr pull dataset extension | #1089 MERGED |
| P1.4 codeparrot pull | 80 shards / 27 GB |
| P1.5a parquet → JSONL filter | 405,904 rows / 3.17 GB |
| P1.5b BPE encode-corpus | 57 shards / 565.6M tokens / 10h |
| P2 MODEL-2 retrain on RTX 4090 | EARLY_STOP at 51 ep / 47 min |

Total wall time from contract authoring to val_loss=9.3837: ~14 hours.

## Training curve highlights

- epoch 0: train=9.7567, val=10.0698 (init)
- epoch 10: train=9.4610, val=9.5657 (post-warmup)
- epoch 30: train=9.2x, val=9.42x
- epoch 44: val=9.3837 (BEST)
- epoch 50: train=9.2093, val=9.3889 (EARLY_STOP next)

Full per-epoch metadata in evidence/model-2-codeparrot-retrain-2026-04-28/all-epochs.json.

## Coverage impact

§33 is binding evidence for SHIP-021 (corpus diversity binding). Promotion to DISCHARGED is deferred to a separate PR that updates the SHIP-021 contract atomically. Spec scoreboard unchanged (15+33) in this PR.

## Files

- evidence/model-2-codeparrot-retrain-2026-04-28/launch.log
- evidence/model-2-codeparrot-retrain-2026-04-28/all-epochs.json
- §33 spec section (8 subsections, ~80 lines)
- Header: v2.77.0 → v2.78.0

## Methodology landed

The §26.8 stack-tool-extension rule paid off concretely:
- 6h authoring cost (P1.0 contract + P1.1 impl) → permanent apr capability
- Every future dataset pull benefits
- §33's val_loss=9.3837 is downstream proof of the methodology

This commit represents the first cycle in §22→§33 where the spec amendment has the same priority as the empirical result.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
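As a quick sanity check on the headline numbers in the commit message above, the absolute and relative val_loss change can be recomputed from the two quoted figures. This is a small sketch; the variable names are illustrative, and the relative change is taken against the plateau value, which is an assumption about how the percentage is meant to be read.

```python
# val_loss figures quoted in the commit message above.
plateau_val_loss = 9.7507  # 4x CSN-Python plateau
best_val_loss = 9.3837     # codeparrot retrain, epoch 44

# Absolute improvement in nats and improvement relative to the plateau.
delta_nats = plateau_val_loss - best_val_loss
relative = delta_nats / plateau_val_loss

print(f"delta = {delta_nats:.4f} nats")
print(f"relative improvement = {relative:.2%}")
```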
Summary
P1.0 of the SHIP-TWO-001 §26.9 corpus pipeline. Authoring `contracts/apr-cli-pull-dataset-v1.yaml` is the prerequisite for P1.1 (extend `apr` CLI) per the §26.8.1 binding methodology rule: when `apr` lacks a feature, author contract → extend `apr` → use extended stack tool. Never route around to non-stack CLIs (`huggingface-cli`) or deprecated namespaces (`batuta hf pull`).

What this contract codifies
- `apr_pull_dataset_signature`
- `include_glob_semantics`
- `license_allowlist_semantics` (license; row-level)
- `registry_drift_prevention`

Falsification tests (8 total)
- FALSIFY-APR-PULL-DATASET-001: subcommand exists with required flags
- FALSIFY-APR-PULL-DATASET-002: include glob filters correctly (1 file in/1 file out)
- FALSIFY-APR-PULL-DATASET-003: no-match glob fails fast (exit non-zero)
- FALSIFY-APR-PULL-DATASET-004: license allowlist drops disallowed rows (parquet row filter)
- FALSIFY-APR-PULL-DATASET-005: model-path backward compatible (`apr pull <MODEL>` unchanged)
- FALSIFY-APR-PULL-DATASET-006: 3-surface drift prevention (registry test passes)
- FALSIFY-APR-PULL-DATASET-007: `pv validate` exits 0
- FALSIFY-APR-PULL-DATASET-008: deprecated namespaces (`batuta hf pull`, `huggingface-cli`) not used in P1

Validation
Kani/Lean scores upgrade in P1.1 once implementation provides harnesses + theorems.
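
For reviewers who want a concrete picture of the row-level `--license-allowlist` behaviour this contract describes, the following Python sketch filters rows by SPDX id. It is illustrative only and makes several assumptions that the contract may settle differently: rows are plain dicts rather than parquet records, the license lives in a column named `license`, matching is case-insensitive, and rows with no license value are dropped. `parse_allowlist` and `filter_rows` are hypothetical names.

```python
def parse_allowlist(csv_value):
    """Split a comma-separated allowlist into a set of normalized SPDX ids."""
    return {item.strip().lower() for item in csv_value.split(",") if item.strip()}


def filter_rows(rows, allowlist_csv, license_field="license"):
    """Keep only rows whose license is in the allowlist.

    Assumption: a row with a missing or empty license is treated as
    disallowed, so unlabeled data never passes the filter.
    """
    allowed = parse_allowlist(allowlist_csv)
    kept = []
    for row in rows:
        value = str(row.get(license_field) or "").strip().lower()
        if value in allowed:
            kept.append(row)
    return kept


# Hypothetical rows standing in for one parquet shard.
rows = [
    {"path": "a.py", "license": "MIT"},
    {"path": "b.py", "license": "GPL-3.0-only"},
    {"path": "c.py", "license": "Apache-2.0"},
    {"path": "d.py"},  # no license recorded
]

kept = filter_rows(rows, "MIT, Apache-2.0")
print([row["path"] for row in kept])
```

Dropping unlabeled rows is the conservative choice for a permissive-license corpus; a different default would need to be stated in the contract.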
Status
PROPOSED — promotion to ACTIVE requires:
- `FALSIFY-APR-PULL-DATASET-*` tests pass live
- `apr-cli-commands-v1.yaml` registry updated
- `cli_commands::registered_commands()` test PASSES with new dataset asset-type

Spec references
- `SPEC-SHIP-TWO-001 §26.8`: apr-is-canonical binding methodology rule
- `SPEC-SHIP-TWO-001 §26.9`: P1.0 prerequisite of corpus pipeline
- `feedback_monorepo_single_source_of_truth.md`: APR-MONO consolidation 2026-04-23
- `feedback_fix_root_cause_never_route_around.md`
- `feedback_cli_subcommand_three_surface_drift.md`

Test plan
- `pv validate contracts/apr-cli-pull-dataset-v1.yaml` exits 0

🤖 Generated with Claude Code