feat(falsify-ship-017): MODEL-2 AC-SHIP2-007 PARTIAL discharge (restacked) by noahgift · Pull Request #1032 · paiml/aprender

noahgift · 2026-04-23T17:49:38Z

Summary

Restacks the MODEL-2 FALSIFY-SHIP-017 PARTIAL discharge (originally PR #1004) onto the completed MODEL-1 stack (SHIP-009 at v2.33.0 → this PR at v2.34.0).

What this discharges

AC-SHIP2-007 — "apr run produces syntactically valid Python on 100 held-out prompts" — via new GATE-ARCH-370M-005 in contracts/model-families/llama-370m-sovereign-v1.yaml (v1.5.0 → v1.6.0, stays ACTIVE) with discharge_status: PARTIAL_ALGORITHM_LEVEL.
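The gate's YAML shape is what the Rust side binds to. As a rough illustration only, the kind of field check involved can be sketched with the standard library. The real test is described as loading the contract through include_str!; here `GATE_FIXTURE` is a hypothetical inline stand-in, `field_value` is a hypothetical helper, and the `id` key and overall layout are assumptions. Only the three field values are taken from this PR.

```rust
/// Hypothetical stand-in for the contract file. The layout and the `id` key
/// are assumptions; the three field values come from the PR description.
const GATE_FIXTURE: &str = "\
- id: GATE-ARCH-370M-005
  falsification_id: FALSIFY-SHIP-017
  binds_to: AC-SHIP2-007
  discharge_status: PARTIAL_ALGORITHM_LEVEL
";

/// Returns the value of the first `key: value` line found in `yaml`.
/// This is a plain text scan, not a YAML parser.
fn field_value<'a>(yaml: &'a str, key: &str) -> Option<&'a str> {
    yaml.lines().find_map(|line| {
        // Drop indentation and a leading list marker before splitting.
        let line = line.trim().trim_start_matches("- ");
        let (k, v) = line.split_once(':')?;
        (k.trim() == key).then(|| v.trim())
    })
}
```

A test built on this would assert that `field_value(GATE_FIXTURE, "discharge_status")` is `Some("PARTIAL_ALGORITHM_LEVEL")`, so a change to the contract that drops or renames the marker fails at `cargo test` time.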

Decision rule: "≤ 1 SyntaxError tolerated out of 100 held-out prompts, ≥ 2 is a ship-blocker" — a pure integer threshold bound in crates/aprender-train/src/models/llama_370m.rs:

AC_SHIP2_007_HELDOUT_PROMPT_COUNT = 100
AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS = 1
verdict_from_syntax_error_count(errors: usize) -> Ship017Verdict
2 falsification tests (threshold logic + YAML shape bind via include_str!)
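A minimal sketch of how the items above could fit together, assuming `usize` constants and a two-variant `Ship017Verdict` enum. The variant names `Pass` and `Fail` and the derive list are assumptions; the actual definitions in llama_370m.rs may differ.

```rust
/// Number of held-out prompts in the AC-SHIP2-007 harness.
pub const AC_SHIP2_007_HELDOUT_PROMPT_COUNT: usize = 100;

/// Largest SyntaxError count that still passes the gate.
pub const AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS: usize = 1;

/// Outcome of the FALSIFY-SHIP-017 gate (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship017Verdict {
    Pass,
    Fail,
}

/// Pure integer threshold: at most 1 error passes, 2 or more is a ship-blocker.
pub const fn verdict_from_syntax_error_count(errors: usize) -> Ship017Verdict {
    if errors <= AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS {
        Ship017Verdict::Pass
    } else {
        Ship017Verdict::Fail
    }
}
```

Because the function is a `const fn` over plain integers, it can be evaluated at compile time and tested without any trained model, which is what makes an algorithm-level discharge possible before the real harness exists.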

Full discharge is blocked on a real trained 370M .apr and the 100-prompt apr run harness, both pending pretraining compute-dispatch (AC-SHIP2-003/004). Closing the gap is a fixture swap only; no harness rewrite is required.

Stacked on

Base: main @ 57adc9f0c (SHIP-009 via PR #1031, "feat(falsify-ship-009): MODEL-1 apr-provenance multi-bind PARTIAL discharge (10/10 — last MODEL-1 row)")
Prior MODEL-1 stack: SHIP-008 → SHIP-006 → SHIP-002 → SHIP-005 → SHIP-010 → SHIP-007 → SHIP-003 → SHIP-004 → SHIP-001 → SHIP-009

Spec bump

v2.33.0 → v2.34.0 (amendment block rewritten; AC-SHIP2-007 table row marked **(PARTIAL_ALGORITHM_LEVEL v2.34.0)**).

Aggregate status

MODEL-2 coverage 4/12 → 5/12 touched. MODEL-1 remains fully saturated at 10/10 PARTIAL. Combined: 18 PARTIAL + 3 DISCHARGED across both models.

Verification

cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/model-families/llama-370m-sovereign-v1.yaml → Contract is valid, 0 errors
cargo test -p aprender-train --lib llama_370m → 12/12 pass (including both new falsify_ship_017_* tests)

Supersedes

Supersedes and closes #1004 (the DIRTY pre-stack version).

Test plan

Contract validates clean
All new tests green
CI gate + workspace-test pass

🤖 Generated with Claude Code

…ce-v1 multi-bind

FALSIFY-SHIP-009 (AC-SHIP1-009 "MODEL-1 teacher license + data provenance recorded in model.apr metadata") attains PARTIAL_ALGORITHM_LEVEL by attaching a second binding to the same C-APR-PROVENANCE contract that already discharges MODEL-2's AC-SHIP2-012. The AprV2Metadata + serde-JSON decision rule is model-agnostic, so one contract cleanly carries both discharges.

Changes:
- contracts/apr-provenance-v1.yaml v1.0.0 → v1.1.0 (stays ACTIVE): new GATE-APR-PROV-004 block binds AC-SHIP1-009 / FALSIFY-SHIP-009 at PARTIAL_ALGORITHM_LEVEL with ship_blocking=true; full discharge blocks on teacher .apr republish populating license, data_source, data_license as named fields (PMAT-686 fixture-swap).
- crates/aprender-core/src/format/tests/provenance_tests.rs:
  - falsify_ship_009_apr_metadata_applies_to_model_1_teacher: teacher-representative round-trip (license="apache-2.0", data_source="qwen2.5-coder-7b-instruct", data_license="apache-2.0").
  - falsify_ship_009_gate_apr_prov_004_has_partial_discharge_marker: include_str! YAML-binding assertion that the new gate has the correct binds_to / falsification_id / discharge_status / flags.
- crates/aprender-core/Cargo.toml: add serde_yaml to [dev-dependencies] (needed for the YAML-binding test).
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.24.0: new v2.24.0 amendment block documenting the first MODEL-1 PARTIAL and first multi-model multi-bind on one contract.

Pattern extensions:
- First MODEL-1 PARTIAL (prior six targeted MODEL-2).
- First multi-model multi-bind on ONE contract (prior PARTIALs each had a dedicated contract).
- Sixth falsification of the "exhausted" verdict: SHIP-019 → SHIP-017 → SHIP-020 → SHIP-018 → SHIP-016 → SHIP-009; the sixth is cross-model, strictly more surprising than the prior five.

All 5 provenance tests green (3 SHIP-022 + 2 SHIP-009).

Status after v2.24.0:
- MODEL-2: 3/12 ACTIVE + 7/12 PARTIAL = 10/12 touched (83.3%)
- MODEL-1: 9/10 DISCHARGED (via SHIP-TWO-001-MODEL-1-TEACHER tag) + 1/10 PARTIAL (009). Will flip to fully ACTIVE when PMAT-686 republishes teacher.apr with provenance fields populated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…EVEL discharge (task #149)

MODEL-2 (albor 370M Sovereign) gate #4 at PARTIAL: binds AC-SHIP2-007 ("apr run produces syntactically valid Python on 100 held-out prompts") to FALSIFY-SHIP-017 via new GATE-ARCH-370M-005 with `discharge_status: PARTIAL_ALGORITHM_LEVEL`.

The decision rule ("≤ 1 SyntaxError tolerated out of 100, ≥ 2 is a ship-blocker") is a pure integer threshold and is proven correct at `cargo test` time today. Full discharge (100-prompt `apr run` harness against a trained 370M .apr) remains PENDING on pretraining compute-dispatch (AC-SHIP2-003/004); the fixture swap is data-only, no harness rewrite required.

Changes:
- crates/aprender-train/src/models/llama_370m.rs:
  - Adds `AC_SHIP2_007_HELDOUT_PROMPT_COUNT` (=100) + `AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS` (=1) consts mirroring the spec §6 harness size and §8.3 FALSIFY-SHIP-017 tolerance.
  - Adds `verdict_from_syntax_error_count(errors) -> Ship017Verdict` const fn, the pure threshold.
  - Adds `falsify_ship_017_syntax_error_count_threshold_logic`: Pass boundary (0,1), Fail boundary (2,50,100), monotonicity sweep ∈ [0,100], and provenance pinning.
  - Adds `falsify_ship_017_gate_arch_370m_005_has_partial_discharge_marker`: binds sovereign contract YAML shape (falsification_id, binds_to, discharge_status, evidence_discharged_by, full_discharge_blocks_on, ship_blocking) to Rust tests via include_str!.
- contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE): adds GATE-ARCH-370M-005.
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.25.0 with amendment block: counter-example survey continues to find new PARTIAL levers after two prior "exhausted" verdicts (SHIP-015 → SHIP-019 → SHIP-017).

New status: 3/12 ACTIVE + 4/12 PARTIAL = 7/12 touched (58.3%).

Verification:
- cargo test -p aprender-train --lib llama_370m → 12/12 pass (including both new falsify_ship_017_* tests)
- cargo clippy -p aprender-train --lib -- -D warnings → clean
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml → Contract is valid

Closes task #149.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
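The threshold-logic test described in this commit (Pass boundary, Fail boundary, monotonicity sweep) might look roughly like the sketch below. It is not the repository's test: the enum variants and the helper names `boundaries_hold` and `verdict_is_monotone` are hypothetical, and the threshold function is restated here only so the block stands alone.

```rust
/// Assumed shape of the verdict type; variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ship017Verdict {
    Pass,
    Fail,
}

const AC_SHIP2_007_HELDOUT_PROMPT_COUNT: usize = 100;
const AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS: usize = 1;

/// Restated threshold: at most 1 error passes, 2 or more fails.
const fn verdict_from_syntax_error_count(errors: usize) -> Ship017Verdict {
    if errors <= AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS {
        Ship017Verdict::Pass
    } else {
        Ship017Verdict::Fail
    }
}

/// Checks the boundary points listed in the commit message:
/// 0 and 1 pass; 2, 50, and 100 fail.
fn boundaries_hold() -> bool {
    let pass_ok = [0usize, 1]
        .iter()
        .all(|&e| verdict_from_syntax_error_count(e) == Ship017Verdict::Pass);
    let fail_ok = [2usize, 50, 100]
        .iter()
        .all(|&e| verdict_from_syntax_error_count(e) == Ship017Verdict::Fail);
    pass_ok && fail_ok
}

/// Monotonicity sweep over 0..=100: once a count fails,
/// no larger count may pass.
fn verdict_is_monotone() -> bool {
    let mut seen_fail = false;
    for errors in 0..=AC_SHIP2_007_HELDOUT_PROMPT_COUNT {
        match verdict_from_syntax_error_count(errors) {
            Ship017Verdict::Fail => seen_fail = true,
            Ship017Verdict::Pass if seen_fail => return false,
            Ship017Verdict::Pass => {}
        }
    }
    true
}
```

The sweep guards against a future edit that turns the rule into something non-monotone (for example, an exact-equality check), which the five boundary points alone would not necessarily catch.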

noahgift · 2026-04-24T11:42:13Z

Superseded by #1044 — 11-PR cascade collapsed into single squash-merge to avoid O(n²) rebase treadmill. Content identical; this branch's commit is in #1044.

noahgift enabled auto-merge (squash) April 23, 2026 17:49

noahgift force-pushed the feat/falsify-ship-017-restacked branch 2 times, most recently from 12298c5 to 42b12cf Compare April 24, 2026 06:43

noahgift and others added 2 commits April 24, 2026 12:40

noahgift force-pushed the feat/falsify-ship-017-restacked branch from 42b12cf to 166c305 Compare April 24, 2026 10:42

noahgift mentioned this pull request Apr 24, 2026

feat(ship-two-001): full algorithmic coverage bundle + README contract-backed rewrite (v2.30 → v2.43) #1044

Merged

noahgift closed this Apr 24, 2026

auto-merge was automatically disabled April 24, 2026 11:42
Pull request was closed
