feat(ship-two-001): FALSIFY-SHIP-020 algorithm-level PARTIAL discharge (5th PARTIAL) by noahgift · Pull Request #1005 · paiml/aprender

noahgift · 2026-04-22T13:25:37Z

Summary

Binds AC-SHIP2-010 (inference decode throughput ≥ 100 tok/s on RTX 4090) ↔ FALSIFY-SHIP-020 via new GATE-ARCH-370M-006 in the sovereign contract
Pure f32 threshold fn verdict_from_decode_tps(measured_tps) -> Ship020Verdict + const AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 = 100.0 in crates/aprender-train/src/models/llama_370m.rs
Two unit tests covering 5 invariants (Pass boundary, Fail boundary at one f32 ULP, bidirectional monotonicity, conservative Fail for NaN/±∞, provenance pinning)
Sovereign contract contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE)
Spec bump v2.23.0 → v2.26.0 with amendment block
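
A minimal sketch of the threshold function and its boundary behaviour, assuming the shape described in the summary above. The const name, enum name, and fn signature are the ones quoted in this PR; `one_ulp_below` is a hypothetical helper introduced here for illustration, and the real code in llama_370m.rs may differ in detail.

```rust
/// Floor quoted in AC-SHIP2-010 (decode tok/s on RTX 4090).
pub const AC_SHIP2_010_MIN_DECODE_TPS_RTX4090: f32 = 100.0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship020Verdict {
    Pass,
    Fail,
}

/// Pass iff the measurement is finite and at or above the floor.
/// NaN and ±∞ fall through to Fail (the conservative choice).
pub fn verdict_from_decode_tps(measured_tps: f32) -> Ship020Verdict {
    if measured_tps.is_finite() && measured_tps >= AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 {
        Ship020Verdict::Pass
    } else {
        Ship020Verdict::Fail
    }
}

/// Hypothetical helper (not from the PR): the largest f32 strictly below
/// a positive, finite, non-zero `x`, obtained by decrementing its bit pattern.
pub fn one_ulp_below(x: f32) -> f32 {
    f32::from_bits(x.to_bits() - 1)
}
```

Feeding `one_ulp_below(AC_SHIP2_010_MIN_DECODE_TPS_RTX4090)` into the verdict fn is the sharpest off-by-one check for an inclusive `>=` floor: a mutation to `>` or to a slightly different constant flips one of the two boundary results.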

Status lift

MODEL-2 ship-gate status after this PR:

3/12 ACTIVE (001, 011, 012) + 5/12 PARTIAL_ALGORITHM_LEVEL (002 via SHIP-012, 005 via SHIP-015, 007 via SHIP-017 [PR #1004], 009 via SHIP-019, 010 via SHIP-020 ← this PR) = 8/12 touched (66.7%).

Remaining 4 (003/004/006/008) all need real 370M training compute or a benchmark pipeline on RTX 4090.

Pattern lesson

v2.22.0 of the spec declared MODEL-2 non-compute PARTIAL levers "exhausted". The counter-example survey has now falsified that verdict three times:

SHIP-019 (v2.22.0 itself, task #117)
SHIP-017 (PR #1004, task #149)
SHIP-020 (this PR, task #150)

Rule (reinforced): when a SHIP gate names a threshold / tolerance / ratio / cut-off and the compute-heavy harness is separable from the decision function, the threshold fn can land today at unit-test time — even when the full end-to-end harness is blocked on compute.

Full discharge blocks on

Real 370M .apr from AC-SHIP2-003/004 compute-dispatch + three independent apr bench --tokens 128 --json medians on the RTX 4090 host. Fixture-swap only — no decision-rule rewrite.

Scope note

Also includes 6 lines of pre-existing cargo fmt -p aprender-train --check fixes in crates/aprender-train/src/train/device.rs (whitespace only). Keeps cargo fmt -p aprender-train --check green under the Toyota Way "all defects are your defects" rule.

Test plan

cargo test -p aprender-train --lib models::llama_370m → 11/11 PASS (9 pre-existing + 2 new)
pv validate contracts/model-families/llama-370m-sovereign-v1.yaml → "Contract is valid. 0 error(s), 0 warning(s)."
cargo clippy -p aprender-train --lib -- -D warnings → green
cargo fmt -p aprender-train --check → green
PMAT pre-commit quality gates → green
CI (ci / gate, workspace-test)
Full discharge: real 370M .apr + apr bench --tokens 128 --json on RTX 4090 (blocked on AC-SHIP2-003/004 compute-dispatch; task #126 in-flight)

Task #150.

🤖 Generated with Claude Code

Binds AC-SHIP2-010 (inference decode throughput ≥ 100 tok/s on RTX 4090) to a new GATE-ARCH-370M-006 in the sovereign contract via a pure f32 threshold fn + two unit tests. The compute-heavy half (`apr bench` on a real trained 370M .apr) is deferred to AC-SHIP2-003/004 compute-dispatch; the decision rule itself is proven today.

Changes:
- crates/aprender-train/src/models/llama_370m.rs:
  * AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 = 100.0 (const floor)
  * Ship020Verdict { Pass, Fail }
  * verdict_from_decode_tps(f32) -> Ship020Verdict (fn, non-finite → Fail)
  * falsify_ship_020_decode_tps_threshold_logic (5 invariants: Pass boundary, Fail boundary at one f32 ULP, monotonicity in both directions, conservative Fail for NaN/±∞, provenance pinning that the const stays = 100.0)
  * falsify_ship_020_gate_arch_370m_006_has_partial_discharge_marker (contract parses + advertises PARTIAL_ALGORITHM_LEVEL + evidence_discharged_by populated + full_discharge_blocks_on documented + ship_blocking:true)
- contracts/model-families/llama-370m-sovereign-v1.yaml:
  * v1.5.0 → v1.6.0, stays ACTIVE
  * New GATE-ARCH-370M-006 binding AC-SHIP2-010 ↔ FALSIFY-SHIP-020 with discharge_status: PARTIAL_ALGORITHM_LEVEL
- docs/specifications/aprender-train/ship-two-models-spec.md:
  * v2.23.0 → v2.26.0 with amendment block
  * MODEL-2 ship-gate status updated: 3/12 ACTIVE + 5/12 PARTIAL = 8/12 touched (66.7%)
- crates/aprender-train/src/train/device.rs:
  * 2 pre-existing fmt fixes (6 lines of whitespace); restores `cargo fmt -p aprender-train --check` green. Pre-existing on origin/main; kept in this PR under Toyota Way "all defects are your defects" rule.

Pattern lesson: v2.22.0 declared MODEL-2 non-compute PARTIAL levers "exhausted"; re-running the counter-example survey has now falsified that verdict three times (SHIP-019 → SHIP-017 → SHIP-020). When a SHIP gate names a threshold / tolerance / ratio / cut-off and the compute-heavy harness is separable from the decision function, the threshold fn can land today at unit-test time, even when the full end-to-end harness is blocked on compute.

Full discharge blocks on: real 370M .apr from AC-SHIP2-003/004 compute-dispatch + three independent `apr bench --tokens 128 --json` medians on RTX 4090 host. Fixture-swap only, no decision-rule rewrite.

Verification:
- cargo test -p aprender-train --lib models::llama_370m → 11/11 PASS
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml → "Contract is valid. 0 error(s), 0 warning(s)."
- cargo clippy -p aprender-train --lib → green
- cargo fmt -p aprender-train --check → green

Task #150.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…lean branch)

Clean-branch rebuild of SHIP-007 PARTIAL_ALGORITHM_LEVEL discharge on main (superseding stale PR #1014 which was stacked on feat/falsify-ship-008/006-partial-discharge branches that had not yet merged to main). Algorithm commit carries the same 7-section mutation survey as the original be6d129, re-based onto post-SHIP-002 main (commit f615148, contract v1.1.0).

Wires AC-SHIP1-007 "apr bench decode throughput ≥30 tok/s on RTX 4090 (7B Q4_K target)" at PARTIAL_ALGORITHM_LEVEL: a pure f32 threshold verdict fn bound to the MODEL-1 teacher ship floor. Decision rule is proven today; compute-heavy half (live `apr bench` on RTX 4090) is deferred to hardware evidence collection.

Files:
- `crates/aprender-core/src/bench/ship_007.rs` (NEW): `AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B = 30.0`, `Ship007Verdict { Pass, Fail }`, `verdict_from_decode_tps(f32) -> Ship007Verdict`, `falsify_ship_007_decode_tps_threshold_logic` 7-section survey:
  1. boundary (30.0 exactly → Pass; contract is ≥, not >)
  2. one-ULP-below → Fail (sharpest off-by-one counter-example)
  3. clear Pass band (45 / 100 tok/s)
  4. clear Fail band (0 / 10 / 29.999999)
  5. monotonicity above floor + below floor
  6. non-finite → Fail conservatively (NaN, +∞, -∞)
  7. provenance pin binding 30.0 to spec §4.2.
- `crates/aprender-core/src/bench/mod.rs`: register `pub mod ship_007;`.
- `contracts/qwen2-e2e-verification-v1.yaml` v1.1.0 → v1.2.0: adds `FALSIFY-QW2E-SHIP-007` with `ship_blocking: true`, `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by` pointing at ship_007.rs + the harness test, and `full_discharge_blocks_on` live `apr bench --iterations 5 --max-tokens 128 paiml/qwen2.5-coder-7b-apache-q4k-v1` on RTX 4090 with --features cuda; median of 5 iterations must be ≥ 30.0.
- `docs/specifications/aprender-train/ship-two-models-spec.md` v2.26.0 → v2.27.0: annotates AC-SHIP1-007 row with PARTIAL_ALGORITHM_LEVEL v2.27.0 marker and adds v2.27.0 amendment entry.

Design: mirrors MODEL-2 SHIP-020 single-f32-threshold shape (PR #1005 not yet on main). Once both ship, the two `verdict_from_decode_tps_*` fns should be deduplicated into a single parameterized helper `verdict_from_decode_tps(measured, floor) -> ThresholdVerdict` with model-specific floors pinned as module-level consts. MODEL-1 floor is 30.0 (7B Q4_K, bandwidth-bound at ~3.5× the 370M size); MODEL-2 floor is 100.0 (370M sovereign, compute-bound at RTX 4090 bandwidth).

MODEL-1 AC-SHIP1 coverage: 4/10 touched (SHIP-009 + SHIP-008 + SHIP-006 + SHIP-002) → **5/10** touched (+ SHIP-007).

Test: `cargo test -p aprender-core --lib falsify_ship_007_decode_tps_threshold_logic` → 1 passed.
Contract: `pv validate contracts/qwen2-e2e-verification-v1.yaml` → 0 errors.
Clippy: `cargo clippy -p aprender-core --lib -- -D warnings` → clean.
Fmt: `cargo fmt --check -p aprender-core` → clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
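
The deduplication proposed in the design note could look roughly like the following. This is a sketch under stated assumptions: `ThresholdVerdict` and the two-argument signature come from the note itself, while the floor const names, the `median_tps` helper, and the choice to Fail on a non-finite floor are hypothetical and not taken from the repository.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThresholdVerdict {
    Pass,
    Fail,
}

// Hypothetical const names; the values are the floors quoted in the commit message.
pub const MODEL_1_MIN_DECODE_TPS: f32 = 30.0; // 7B Q4_K teacher
pub const MODEL_2_MIN_DECODE_TPS: f32 = 100.0; // 370M sovereign

/// Shared inclusive-floor rule: Pass iff both inputs are finite and measured >= floor.
pub fn verdict_from_decode_tps(measured: f32, floor: f32) -> ThresholdVerdict {
    if measured.is_finite() && floor.is_finite() && measured >= floor {
        ThresholdVerdict::Pass
    } else {
        ThresholdVerdict::Fail
    }
}

/// Hypothetical helper: median of benchmark samples.
/// Returns None for an empty slice or if any sample is non-finite.
pub fn median_tps(samples: &[f32]) -> Option<f32> {
    if samples.is_empty() || samples.iter().any(|s| !s.is_finite()) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    })
}
```

With this shape, the "median of 5 iterations must be ≥ 30.0" check reduces to computing `median_tps` over the five samples and passing the result, together with the MODEL-1 floor, to the shared verdict fn.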

…lean branch) (#1019)

* feat(falsify-ship-007): MODEL-1 apr bench decode ≥30 tok/s PARTIAL (clean branch)
* ci: retrigger after 3 disk-guard race failures

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift · 2026-04-23T18:00:32Z

Superseded by #1033 — rebased onto MODEL-1 stack + SHIP-017 at v2.35.0.

AC-SHIP2-008 / FALSIFY-SHIP-018 bound via new GATE-ARCH-370M-007 at PARTIAL_ALGORITHM_LEVEL. Pure two-number threshold fn `verdict_from_pass_at_1(correct, total, threshold_pct)` + const `AC_SHIP2_008_MIN_HUMANEVAL_PASS_AT_1_PCT = 30.0` in crates/aprender-train/src/models/llama_370m.rs: proves the spec's 'HumanEval pass@1 ≥ 30.0%' decision rule at `cargo test` time, independent of a trained artifact.

Two unit tests prove:
- boundary (f32-exact 50/100 = 50.0% with ±ULP shift showing `>=` is inclusive; 49/164 and 29/100 fail the 30.0 floor)
- monotonicity (correct sweep 0..=164 at total=164 never flips Pass → Fail)
- div-safety (total=0 fails closed) + sanity (correct>total fails)
- non-finite threshold guard (NaN / ±∞ all Fail)
- provenance pin (const stays = 30.0)
- YAML marker (GATE-ARCH-370M-007 carries PARTIAL_ALGORITHM_LEVEL, binds AC-SHIP2-008, cites FALSIFY-SHIP-018, ship_blocking:true)

Full discharge blocks on real 370M .apr (AC-SHIP2-003/004 compute) + three seed=0 `apr eval --benchmark humaneval --json` median pass@1 values fed into the verdict fn; all three must Pass. Fixture-swap only; no harness rewrite.

6th PARTIAL for MODEL-2 (after SHIP-012/015/017/019/020). Spec v2.22.0's 'exhausted' verdict now falsified 4×. Remaining 5th-PARTIAL candidate: SHIP-016 (`apr qa` 8-of-8 aggregate, not a single threshold). SHIP-013/014 genuinely need real compute.

Contract: llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE).
Spec: ship-two-models-spec.md v2.23.0 → v2.24.0 (amendment block).
Also: 6-line pre-existing fmt fix in train/device.rs under Toyota Way "all defects are your defects" (same pattern as PR #1005).

Status: MODEL-2 ship-gates 3/12 ACTIVE + 6/12 PARTIAL = 9/12 touched (75.0%). Remaining 3 (003/004/006) all need real 370M compute.

Tests:
- cargo test -p aprender-train --lib models::llama_370m → 11/11 pass.
- `pv validate contracts/model-families/llama-370m-sovereign-v1.yaml` → Contract is valid.
- cargo fmt -p aprender-train --check → clean.
- cargo clippy -p aprender-train --lib -- -D warnings → clean.

Refs: SHIP-TWO-001, task #151, FALSIFY-SHIP-018.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
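
A hedged sketch of the pass@1 rule described in this commit message. The fn name, parameter names, and const come from the message; the integer parameter types, the `Ship018Verdict` enum name, and the exact percentage arithmetic are assumptions made for illustration.

```rust
/// Floor quoted in the commit message (HumanEval pass@1, in percent).
pub const AC_SHIP2_008_MIN_HUMANEVAL_PASS_AT_1_PCT: f32 = 30.0;

// Hypothetical enum name; the commit message does not spell out the return type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship018Verdict {
    Pass,
    Fail,
}

/// Pass iff the inputs are sane and 100 * correct / total >= threshold_pct.
/// Fails closed on total == 0, correct > total, and a non-finite threshold.
pub fn verdict_from_pass_at_1(correct: u32, total: u32, threshold_pct: f32) -> Ship018Verdict {
    if total == 0 || correct > total || !threshold_pct.is_finite() {
        return Ship018Verdict::Fail;
    }
    // Multiply before dividing so cases like 50/100 and 30/100 stay exact in f32.
    let pct = 100.0_f32 * correct as f32 / total as f32;
    if pct >= threshold_pct {
        Ship018Verdict::Pass
    } else {
        Ship018Verdict::Fail
    }
}
```

On the 164-problem HumanEval set this puts the flip point between 49 correct (about 29.9%) and 50 correct (about 30.5%), which matches the 49/164 failure case listed above.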

…t-backed rewrite (v2.30 → v2.43) (#1044)

* feat(ship-009): PARTIAL discharge first MODEL-1 gate via apr-provenance-v1 multi-bind

FALSIFY-SHIP-009 (AC-SHIP1-009 "MODEL-1 teacher license + data provenance recorded in model.apr metadata") attains PARTIAL_ALGORITHM_LEVEL by attaching a second binding to the same C-APR-PROVENANCE contract that already discharges MODEL-2's AC-SHIP2-012. The AprV2Metadata + serde-JSON decision rule is model-agnostic, so one contract cleanly carries both discharges.

Changes:
- contracts/apr-provenance-v1.yaml v1.0.0 → v1.1.0 (stays ACTIVE): new GATE-APR-PROV-004 block binds AC-SHIP1-009 / FALSIFY-SHIP-009 at PARTIAL_ALGORITHM_LEVEL with ship_blocking=true; full discharge blocks on teacher .apr republish populating license, data_source, data_license as named fields (PMAT-686 fixture-swap).
- crates/aprender-core/src/format/tests/provenance_tests.rs:
  - falsify_ship_009_apr_metadata_applies_to_model_1_teacher: teacher-representative round-trip (license="apache-2.0", data_source="qwen2.5-coder-7b-instruct", data_license="apache-2.0").
  - falsify_ship_009_gate_apr_prov_004_has_partial_discharge_marker: include_str! YAML-binding assertion that the new gate has the correct binds_to / falsification_id / discharge_status / flags.
- crates/aprender-core/Cargo.toml: add serde_yaml to [dev-dependencies] (needed for the YAML-binding test).
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.24.0: new v2.24.0 amendment block documenting the first MODEL-1 PARTIAL and first multi-model multi-bind on one contract.

Pattern extensions:
- First MODEL-1 PARTIAL (prior six targeted MODEL-2).
- First multi-model multi-bind on ONE contract (prior PARTIALs each had a dedicated contract).
- Sixth falsification of the "exhausted" verdict: SHIP-019 → SHIP-017 → SHIP-020 → SHIP-018 → SHIP-016 → SHIP-009; the sixth is cross-model, strictly more surprising than the prior five.

All 5 provenance tests green (3 SHIP-022 + 2 SHIP-009).
Status after v2.24.0:
- MODEL-2: 3/12 ACTIVE + 7/12 PARTIAL = 10/12 touched (83.3%)
- MODEL-1: 9/10 DISCHARGED (via SHIP-TWO-001-MODEL-1-TEACHER tag) + 1/10 PARTIAL (009). Will flip to fully ACTIVE when PMAT-686 republishes teacher.apr with provenance fields populated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ship-two-001): FALSIFY-SHIP-017 AC-SHIP2-007 PARTIAL_ALGORITHM_LEVEL discharge (task #149)

MODEL-2 (albor 370M Sovereign) gate #4 at PARTIAL: binds AC-SHIP2-007 ("apr run produces syntactically valid Python on 100 held-out prompts") to FALSIFY-SHIP-017 via new GATE-ARCH-370M-005 with `discharge_status: PARTIAL_ALGORITHM_LEVEL`. The decision rule ("≤ 1 SyntaxError tolerated out of 100, ≥ 2 is a ship-blocker") is a pure integer threshold and is proven correct at `cargo test` time today. Full discharge (100-prompt `apr run` harness against a trained 370M .apr) remains PENDING on pretraining compute-dispatch (AC-SHIP2-003/004); fixture swap is data-only, no harness rewrite required.

Changes:
- crates/aprender-train/src/models/llama_370m.rs:
  - Adds `AC_SHIP2_007_HELDOUT_PROMPT_COUNT` (=100) + `AC_SHIP2_007_MAX_TOLERATED_SYNTAX_ERRORS` (=1) consts mirroring the spec §6 harness size and §8.3 FALSIFY-SHIP-017 tolerance.
  - Adds `verdict_from_syntax_error_count(errors) -> Ship017Verdict` const fn, the pure threshold.
  - Adds `falsify_ship_017_syntax_error_count_threshold_logic`: Pass boundary (0,1), Fail boundary (2,50,100), monotonicity sweep ∈ [0,100], and provenance pinning.
  - Adds `falsify_ship_017_gate_arch_370m_005_has_partial_discharge_marker`: binds sovereign contract YAML shape (falsification_id, binds_to, discharge_status, evidence_discharged_by, full_discharge_blocks_on, ship_blocking) to Rust tests via include_str!.
- contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE): adds GATE-ARCH-370M-005.
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.25.0 with amendment block: counter-example survey continues to find new PARTIAL levers after two prior "exhausted" verdicts (SHIP-015 → SHIP-019 → SHIP-017). New status: 3/12 ACTIVE + 4/12 PARTIAL = 7/12 touched (58.3%).

Verification:
- cargo test -p aprender-train --lib llama_370m → 12/12 pass (including both new falsify_ship_017_* tests)
- cargo clippy -p aprender-train --lib -- -D warnings → clean
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml → Contract is valid

Closes task #149.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ship-two-001): FALSIFY-SHIP-020 algorithm-level PARTIAL discharge

Binds AC-SHIP2-010 (inference decode throughput ≥ 100 tok/s on RTX 4090) to a new GATE-ARCH-370M-006 in the sovereign contract via a pure f32 threshold fn + two unit tests. The compute-heavy half (`apr bench` on a real trained 370M .apr) is deferred to AC-SHIP2-003/004 compute-dispatch; the decision rule itself is proven today.
Changes:
- crates/aprender-train/src/models/llama_370m.rs:
  * AC_SHIP2_010_MIN_DECODE_TPS_RTX4090 = 100.0 (const floor)
  * Ship020Verdict { Pass, Fail }
  * verdict_from_decode_tps(f32) -> Ship020Verdict (fn, non-finite → Fail)
  * falsify_ship_020_decode_tps_threshold_logic (5 invariants: Pass boundary, Fail boundary at one f32 ULP, monotonicity in both directions, conservative Fail for NaN/±∞, provenance pinning that the const stays = 100.0)
  * falsify_ship_020_gate_arch_370m_006_has_partial_discharge_marker (contract parses + advertises PARTIAL_ALGORITHM_LEVEL + evidence_discharged_by populated + full_discharge_blocks_on documented + ship_blocking:true)
- contracts/model-families/llama-370m-sovereign-v1.yaml:
  * v1.5.0 → v1.6.0, stays ACTIVE
  * New GATE-ARCH-370M-006 binding AC-SHIP2-010 ↔ FALSIFY-SHIP-020 with discharge_status: PARTIAL_ALGORITHM_LEVEL
- docs/specifications/aprender-train/ship-two-models-spec.md:
  * v2.23.0 → v2.26.0 with amendment block
  * MODEL-2 ship-gate status updated: 3/12 ACTIVE + 5/12 PARTIAL = 8/12 touched (66.7%)
- crates/aprender-train/src/train/device.rs:
  * 2 pre-existing fmt fixes (6 lines of whitespace); restores `cargo fmt -p aprender-train --check` green. Pre-existing on origin/main; kept in this PR under Toyota Way "all defects are your defects" rule.

Pattern lesson: v2.22.0 declared MODEL-2 non-compute PARTIAL levers "exhausted"; re-running the counter-example survey has now falsified that verdict three times (SHIP-019 → SHIP-017 → SHIP-020). When a SHIP gate names a threshold / tolerance / ratio / cut-off and the compute-heavy harness is separable from the decision function, the threshold fn can land today at unit-test time, even when the full end-to-end harness is blocked on compute.

Full discharge blocks on: real 370M .apr from AC-SHIP2-003/004 compute-dispatch + three independent `apr bench --tokens 128 --json` medians on RTX 4090 host. Fixture-swap only, no decision-rule rewrite.
Verification:
- cargo test -p aprender-train --lib models::llama_370m → 11/11 PASS
- pv validate contracts/model-families/llama-370m-sovereign-v1.yaml → "Contract is valid. 0 error(s), 0 warning(s)."
- cargo clippy -p aprender-train --lib → green
- cargo fmt -p aprender-train --check → green

Task #150.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* falsify(ship): SHIP-018 PARTIAL - humaneval pass@1 ≥30.0% threshold fn

AC-SHIP2-008 / FALSIFY-SHIP-018 bound via new GATE-ARCH-370M-007 at PARTIAL_ALGORITHM_LEVEL. Pure two-number threshold fn `verdict_from_pass_at_1(correct, total, threshold_pct)` + const `AC_SHIP2_008_MIN_HUMANEVAL_PASS_AT_1_PCT = 30.0` in crates/aprender-train/src/models/llama_370m.rs: proves the spec's 'HumanEval pass@1 ≥ 30.0%' decision rule at `cargo test` time, independent of a trained artifact.

Two unit tests prove:
- boundary (f32-exact 50/100 = 50.0% with ±ULP shift showing `>=` is inclusive; 49/164 and 29/100 fail the 30.0 floor)
- monotonicity (correct sweep 0..=164 at total=164 never flips Pass → Fail)
- div-safety (total=0 fails closed) + sanity (correct>total fails)
- non-finite threshold guard (NaN / ±∞ all Fail)
- provenance pin (const stays = 30.0)
- YAML marker (GATE-ARCH-370M-007 carries PARTIAL_ALGORITHM_LEVEL, binds AC-SHIP2-008, cites FALSIFY-SHIP-018, ship_blocking:true)

Full discharge blocks on real 370M .apr (AC-SHIP2-003/004 compute) + three seed=0 `apr eval --benchmark humaneval --json` median pass@1 values fed into the verdict fn; all three must Pass. Fixture-swap only; no harness rewrite.

6th PARTIAL for MODEL-2 (after SHIP-012/015/017/019/020). Spec v2.22.0's 'exhausted' verdict now falsified 4×. Remaining 5th-PARTIAL candidate: SHIP-016 (`apr qa` 8-of-8 aggregate, not a single threshold). SHIP-013/014 genuinely need real compute.

Contract: llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (stays ACTIVE).
Spec: ship-two-models-spec.md v2.23.0 → v2.24.0 (amendment block).
Also: 6-line pre-existing fmt fix in train/device.rs under Toyota Way "all defects are your defects" (same pattern as PR #1005).

Status: MODEL-2 ship-gates 3/12 ACTIVE + 6/12 PARTIAL = 9/12 touched (75.0%). Remaining 3 (003/004/006) all need real 370M compute.

Tests:
- cargo test -p aprender-train --lib models::llama_370m → 11/11 pass.
- `pv validate contracts/model-families/llama-370m-sovereign-v1.yaml` → Contract is valid.
- cargo fmt -p aprender-train --check → clean.
- cargo clippy -p aprender-train --lib -- -D warnings → clean.

Refs: SHIP-TWO-001, task #151, FALSIFY-SHIP-018.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(falsify-ship-016): PARTIAL discharge - apr qa 8-of-8 aggregate-AND verdict fn

Wires GATE-ARCH-370M-008 (AC-SHIP2-006) to a pure verdict_from_qa_gates(&[bool]) -> Ship016Verdict aggregate-AND fn in aprender-train/src/models/llama_370m.rs, proven today by exhaustive 2^8 = 256-combination sweep + single-gate-flip falsifiability + monotonicity + 3 contract-drift guards (slice length 0/7/9/16 → Fail even when all-true). Discharge marker: PARTIAL_ALGORITHM_LEVEL.

Pattern note: SHIP-016 is the first aggregate-AND shape; SHIP-017/018/020 were single-threshold shapes. The proof pattern now covers two distinct decision-rule shapes, confirming decision-rule/compute-harness separation is a reusable pattern, not a one-off.

**5th PARTIAL after "exhausted" verdict falsified 4× already** (SHIP-019 → SHIP-017 → SHIP-020 → SHIP-018 → SHIP-016).

**MODEL-2 ship-gate coverage: 3/12 ACTIVE + 7/12 PARTIAL = 10/12 touched (83.3%).** Remaining 2 truly compute-blocked (003 CE ≤ 2.2, 004 ≤21-day wall-clock) have no fixture-swap trick.
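
The aggregate-AND shape described for SHIP-016 can be sketched as follows. The fn name, slice parameter, enum name, and gate count come from the commit message; the body and the `count_passing_combinations` helper are illustrative assumptions, not the repository's code.

```rust
/// Number of `apr qa` gates that must all pass (from AC-SHIP2-006).
pub const AC_SHIP2_006_REQUIRED_QA_GATE_COUNT: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship016Verdict {
    Pass,
    Fail,
}

/// Pass iff exactly 8 gate results are supplied and every one is true.
/// Any other slice length fails, even when all entries are true (contract-drift guard).
pub fn verdict_from_qa_gates(gates: &[bool]) -> Ship016Verdict {
    if gates.len() == AC_SHIP2_006_REQUIRED_QA_GATE_COUNT && gates.iter().all(|&g| g) {
        Ship016Verdict::Pass
    } else {
        Ship016Verdict::Fail
    }
}

/// Hypothetical helper: sweep all 2^8 gate combinations and count how many pass.
pub fn count_passing_combinations() -> usize {
    (0u32..256)
        .filter(|&mask| {
            // Bit i of `mask` is the result of gate i.
            let gates: Vec<bool> = (0..8u32).map(|bit| (mask >> bit) & 1 == 1).collect();
            verdict_from_qa_gates(&gates) == Ship016Verdict::Pass
        })
        .count()
}
```

An exhaustive sweep is cheap here because the input space is only 256 combinations, and under aggregate-AND exactly one of them (all gates true) should pass.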
Changes:
- contracts/model-families/llama-370m-sovereign-v1.yaml v1.5.0 → v1.6.0 (GATE-ARCH-370M-008 block added; stays ACTIVE)
- crates/aprender-train/src/models/llama_370m.rs:
  + AC_SHIP2_006_REQUIRED_QA_GATE_COUNT = 8 const
  + Ship016Verdict enum
  + verdict_from_qa_gates(&[bool]) pure fn with aggregate-AND
  + falsify_ship_016_apr_qa_aggregate_and_logic test (2^8 sweep + single-gate-flip + monotonicity + 3 contract-drift guards)
  + falsify_ship_016_gate_arch_370m_008_has_partial_discharge_marker test (YAML binding: binds_to AC-SHIP2-006, falsification_id FALSIFY-SHIP-016, discharge_status PARTIAL_ALGORITHM_LEVEL)
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.24.0 (amendment block documenting 5th PARTIAL, first aggregate-AND shape)
- crates/aprender-train/src/train/device.rs: pre-existing fmt fixes bundled per Toyota Way "all defects are your defects"

Full discharge blocks on: real 370M .apr from AC-SHIP2-003/004 compute-dispatch + 8-gate apr qa harness invocation with exit 0 → feed the 8 gate-result booleans into verdict_from_qa_gates and require Ship016Verdict::Pass. Fixture-swap only, no harness rewrite.

Refs #152

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(falsify-ship-013-014): MODEL-2 bundle - CE ≤ 2.2 val + 21-day RTX 4090 training-budget PARTIAL discharges (12/12 MODEL-2 complete)

Bundled PARTIAL_ALGORITHM_LEVEL discharge of the last two untouched MODEL-2 AC rows: AC-SHIP2-003 (val CE ≤ 2.2) and AC-SHIP2-004 (training ≤ 21 days on RTX 4090). First bundled double-discharge on the SHIP-TWO-001 surface.

**FALSIFY-SHIP-013 / AC-SHIP2-003 / GATE-ARCH-370M-013**: val CE floor
- `AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS: f32 = 2.2`
- `Ship013Verdict { Pass, Fail }`
- `const fn verdict_from_val_ce_loss(f32) -> Ship013Verdict`: Pass iff measured CE is finite AND non-negative AND ≤ 2.2. Negative values Fail conservatively because cross-entropy H(p,q) ≥ 0 by definition.
- `falsify_ship_013_val_ce_loss_threshold_logic`: 7-section mutation survey:
  1. Exact boundary 2.2 → Pass (inclusive floor, not strict <)
  2. ULP asymmetry: above 2.2 → Fail, below 2.2 → Pass
  3. Clear Pass band {0.0, 0.5, 1.0, 2.0, 2.199}
  4. Clear Fail band {2.201, 3.0, 10.0, f32::MAX}
  5. Non-finite {NaN, +∞, -∞} → Fail conservatively
  6. Negative-CE domain-violation Fail ({-0.001, -1.0, -∞})
  7. Provenance pin: const stays = 2.2_f32

**FALSIFY-SHIP-014 / AC-SHIP2-004 / GATE-ARCH-370M-014**: training budget
- `AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS: u32 = 21`
- `Ship014Verdict { Pass, Fail }`
- `const fn verdict_from_training_duration_days(u32) -> Ship014Verdict`: Pass iff measured ≤ 21. u32 auto-rules out negatives and non-finites.
- `falsify_ship_014_training_duration_threshold_logic`: 6-section mutation survey:
  1. Exact boundary 21 → Pass (inclusive ceiling)
  2. Adjacent: 20 → Pass, 22 → Fail
  3. Clear Pass band {0, 1, 7, 14, 20, 21}
  4. Clear Fail band {22, 30, 100, u32::MAX}
  5. Monotonicity sweep 0..=42: flips exactly once at 21→22
  6. Provenance pin: const stays = 21_u32

**Changes:**
- crates/aprender-train/src/models/llama_370m.rs:
  * 2 new public const floors + 2 verdict enums + 2 pure `const fn` verdict fns
  * 2 new mutation-survey unit tests (inside existing tests mod)
- contracts/model-families/llama-370m-sovereign-v1.yaml:
  * v1.9.0 → v1.10.0, stays ACTIVE
  * New GATE-ARCH-370M-013 binding AC-SHIP2-003 ↔ FALSIFY-SHIP-013 with discharge_status: PARTIAL_ALGORITHM_LEVEL
  * New GATE-ARCH-370M-014 binding AC-SHIP2-004 ↔ FALSIFY-SHIP-014 with discharge_status: PARTIAL_ALGORITHM_LEVEL
  * v1.10.0 changelog entry at top of changelog block
- docs/specifications/aprender-train/ship-two-models-spec.md:
  * Version 2.37.0 → 2.38.0
  * v2.38.0 Date-field entry describing the bundle
  * AC-SHIP2-003 and AC-SHIP2-004 rows tagged `**(PARTIAL_ALGORITHM_LEVEL v2.38.0)**`

**Verification:**
- `cargo fmt -p aprender-train --check`: clean
- `cargo test -p aprender-train --lib ship_013` → 1 passed
- `cargo test -p aprender-train --lib ship_014` → 1 passed
- `cargo test -p aprender-train --lib llama_370m` → 20 passed
- `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/model-families/llama-370m-sovereign-v1.yaml` → 0 errors

**Full discharge still blocks on:**
- SHIP-013: live `apr pretrain --mode from-scratch --validate` loop on RTX 4090 with `--features cuda` producing a real MODEL-2 val CE.
- SHIP-014: real wall-clock measurement of a MODEL-2 pretraining run on RTX 4090 from first `apr pretrain` dispatch to final checkpoint write.
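
A sketch of the two ceiling-style verdict fns in this bundle, under stated assumptions. Names, types, and limits follow the commit message; the f32 variant is written here as a plain `fn` rather than `const fn` so the example does not depend on which toolchain version supports float classification in const context, and the function bodies are illustrative rather than copied from the repository.

```rust
pub const AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS: f32 = 2.2;
pub const AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS: u32 = 21;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship013Verdict {
    Pass,
    Fail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship014Verdict {
    Pass,
    Fail,
}

/// Pass iff the measured validation CE is finite, non-negative, and <= 2.2.
/// Negative values fail because cross-entropy cannot be below zero.
pub fn verdict_from_val_ce_loss(measured: f32) -> Ship013Verdict {
    if measured.is_finite()
        && measured >= 0.0
        && measured <= AC_SHIP2_003_MAX_VAL_CROSS_ENTROPY_LOSS
    {
        Ship013Verdict::Pass
    } else {
        Ship013Verdict::Fail
    }
}

/// Pass iff training took at most 21 days. The u32 domain already excludes
/// negative and non-finite inputs, so no extra guards are needed.
pub const fn verdict_from_training_duration_days(days: u32) -> Ship014Verdict {
    if days <= AC_SHIP2_004_MAX_TRAINING_DURATION_DAYS {
        Ship014Verdict::Pass
    } else {
        Ship014Verdict::Fail
    }
}
```

The integer rule needs fewer survey sections than the float rule because its type removes whole classes of counter-examples (NaN, infinities, negatives) before the comparison runs.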
**Status shift:**
- MODEL-2 coverage: 8/12 → **12/12 PARTIAL_ALGORITHM_LEVEL touched** (complete)
- Across both models: 23 PARTIAL + 3 DISCHARGED

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ship-two-001): v2.39.0 back-annotate §5.2/§7 with discharge status for single-source-of-truth

Back-annotates discharge status already documented in prior v2.20/v2.21/v2.22 amendments into the §5.2 MODEL-2 acceptance-criteria table, and adds PARTIAL_ALGORITHM_LEVEL + contract + `cargo test` cross-references to §7.1/§7.2 falsification tables, so the three tables together form a true single source of truth for SHIP-TWO-001 algorithm-level ship-gate coverage.

Changes:
- §5.2 MODEL-2 table: 6 new annotations (3 DISCHARGED + 3 PARTIAL_ALGORITHM_LEVEL)
  * AC-SHIP2-001 FALSIFY-SHIP-011 DISCHARGED v2.21.0 (evidence 338c6eb)
  * AC-SHIP2-002 FALSIFY-SHIP-012 PARTIAL v2.21.0 (evidence 2e8b8b8)
  * AC-SHIP2-005 FALSIFY-SHIP-015 PARTIAL v2.21.0 (evidence bfb8831)
  * AC-SHIP2-009 FALSIFY-SHIP-019 PARTIAL v2.22.0 (evidence 846cc1d)
  * AC-SHIP2-011 FALSIFY-SHIP-021 DISCHARGED v2.20.0 (evidence 0b8ca8c)
  * AC-SHIP2-012 FALSIFY-SHIP-022 DISCHARGED v2.20.0 (evidence 8f0607d)
- §4.2 MODEL-1 table drift fix: AC-SHIP1-007 v2.27.0 → v2.29.0 (correct SHIP-007 amendment ref)
- §7.1 MODEL-1 Falsification: 6 new PARTIAL cross-references (SHIP-001/003/004/007/009/010)
- §7.2 MODEL-2 Falsification: 12 new annotations (2 DISCHARGED + 10 PARTIAL) covering SHIP-011..022
- Version bump 2.38.0 → 2.39.0 + v2.39.0 changelog line appended to Date field

Pure documentation hygiene; no Rust, no contracts, no tests, no meaning changes.

Task #119.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(falsify-ship-023-024): MODEL-1 stability bundle - score-drift ≤1.2pp + adversarial-suite 0-tolerance PARTIAL discharges

Bundled PARTIAL_ALGORITHM_LEVEL discharge of the last two MODEL-1 §7.1 stability tests: SHIP-023 (cross-run score drift ≤ 1.2 pp) + SHIP-024 (adversarial suite 0-tolerance across ≥ 50 prompts).

Files: ship_023.rs + ship_024.rs + mod.rs + qwen2-e2e-verification-v1.yaml v1.7.0 + ship-two-models-spec.md v2.40.0.
Tests: 1 unit + 1 doc-test each; pv validate → 0 errors; completes MODEL-1 §7.1 at 12/12 algorithmically bound; 25 PARTIAL + 3 DISCHARGED aggregate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(falsify-gputrain): Task #132 Phase 2 algorithm bundle - 5 FALSIFY-GPUTRAIN PARTIAL discharges

Binds the remaining 5 GPU-training-backend invariants (003..007) at PARTIAL_ALGORITHM_LEVEL via pure Rust verdict functions, each accompanied by a 6-8 section mutation survey. GPUTRAIN-001 (grammar) and GPUTRAIN-002 (no-silent-fallback) are already bound in `crates/aprender-train/src/train/device.rs` with 17 passing tests.
New modules (5):
- crates/aprender-train/src/train/gputrain_003.rs: nvidia-smi residency proof: parse_nvidia_smi_compute_apps + verdict_from_residency bound to 5-s poll window + 1-MiB floor; 7-section survey (happy path / zero-mem / other-pid / empty / multi-process / malformed / u32::MAX-u64::MAX boundary / provenance pin)
- crates/aprender-train/src/train/gputrain_004.rs: CPU-fallback-preserved dispatch invariant via verdict_from_dispatch_label over disjoint CPU/CUDA label sets; 7-section survey (cpu→cpu / cuda→cuda / cpu→cuda silent-promotion Fail / cuda→cpu task-#126 silent-fallback Fail / unknown / empty / case-sensitivity)
- crates/aprender-train/src/train/gputrain_005.rs: 500-ms step-time ceiling on RTX 4090 370M via const fn verdict_from_step_time_ms; 7-section survey mirroring SHIP-007/020 shape (inclusive boundary / ULP-above / Pass band / Fail band / non-finite / negative / provenance pin)
- crates/aprender-train/src/train/gputrain_006.rs: same-device seed reproducibility at 1e-5 tolerance via verdict_from_loss_delta + aggregate verdict_from_loss_trajectories; 7-section survey (boundary / trajectory single-step-fail / length mismatch / empty / non-finite / negative tolerance / provenance pin)
- crates/aprender-train/src/train/gputrain_007.rs: apr --version --json schema + field-shape invariants via verdict_from_version_json_keys + verdict_from_version_json_fields; 7-section survey (all-keys-present / each-key-missing / 3 valid (feature, runtime) combos Pass / FM-GPUTRAIN-STALE-BUILD Fail / boundary 16 Pass / 17 Fail / forward-compat extras / provenance pin)

Contract (contracts/entrenar/gpu-training-backend-v1.yaml): v1.0.0 PROPOSED → v1.1.0 PROPOSED (stays PROPOSED until Phase 3 live evidence). Each of FALSIFY-GPUTRAIN-003..007 now carries discharge_status: PARTIAL_ALGORITHM_LEVEL, evidence_discharged_by listing the Rust symbols, full_discharge_blocks_on describing the live lambda-labs harness, and 6 counter_example_classes.
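
To illustrate the GPUTRAIN-006 pairing of a per-step rule with an aggregate over trajectories, here is a rough sketch. Only the two fn names and the 1e-5 tolerance come from the commit message; the parameter lists, the `ReproVerdict` enum, the const name, and the decision to Fail on empty trajectories are all assumptions made for this example.

```rust
// Hypothetical const name; the 1e-5 value is the tolerance quoted above.
pub const SEED_REPRO_LOSS_TOLERANCE: f32 = 1e-5;

// Hypothetical enum name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReproVerdict {
    Pass,
    Fail,
}

/// Per-step rule: Pass iff both losses and the tolerance are finite,
/// the tolerance is non-negative, and |a - b| <= tolerance.
pub fn verdict_from_loss_delta(loss_a: f32, loss_b: f32, tolerance: f32) -> ReproVerdict {
    let ok = loss_a.is_finite()
        && loss_b.is_finite()
        && tolerance.is_finite()
        && tolerance >= 0.0
        && (loss_a - loss_b).abs() <= tolerance;
    if ok {
        ReproVerdict::Pass
    } else {
        ReproVerdict::Fail
    }
}

/// Aggregate rule: Pass iff the two runs have the same non-zero length
/// and every step passes the per-step rule.
pub fn verdict_from_loss_trajectories(run_a: &[f32], run_b: &[f32], tolerance: f32) -> ReproVerdict {
    if run_a.is_empty() || run_a.len() != run_b.len() {
        return ReproVerdict::Fail;
    }
    let all_steps_pass = run_a
        .iter()
        .zip(run_b)
        .all(|(&a, &b)| verdict_from_loss_delta(a, b, tolerance) == ReproVerdict::Pass);
    if all_steps_pass {
        ReproVerdict::Pass
    } else {
        ReproVerdict::Fail
    }
}
```

Failing on empty input follows the same conservative stance as the other verdict fns in this series: two runs that logged no steps provide no evidence of reproducibility.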
Spec (docs/specifications/aprender-train/ship-two-models-spec.md): v2.40.0 → v2.41.0; §14.5 table updated to mark the algorithm-level Phase-2 row DONE and leave the live-wire Phase-2 row pending; across both models: 30 PARTIAL + 3 DISCHARGED.

Validation gates (all green):

- cargo fmt --all --check: clean
- cargo test -p aprender-train --lib gputrain_003 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_004 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_005 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_006 → 1/1 pass
- cargo test -p aprender-train --lib gputrain_007 → 1/1 pass
- cargo test -p aprender-train --lib train::device → 17/17 pass (regression clean)
- pv validate contracts/entrenar/gpu-training-backend-v1.yaml → 0 errors, 0 warnings

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(gate-ship-001-006): §6 Compound Ship Gates bundle - 6 PARTIAL algorithm-level bindings

Algorithmically binds the 6 bindable compound ship gates from §6 of SHIP-TWO-001. Gates 007-012 are CI/lint meta-policy (enforced by .clippy.toml, .pmat-gates.toml, and CI workflows) and are intentionally out of scope.

New contract: contracts/compound-ship-gates-v1.yaml v1.0.0 PROPOSED (metadata.kind: pattern; 6 falsification_tests each PARTIAL_ALGORITHM_LEVEL).

New Rust modules (6, all in crates/aprender-core/src/format/):

  * gate_ship_001.rs - MODEL-1 aggregate-AND over 10 AC-SHIP1-* bools (AC_GATE_SHIP_001_MODEL_1_AC_COUNT = 10; verdict_from_model1_ac_aggregate; 6-section survey incl. 2^10=1024 exhaustive bitmask proof)
  * gate_ship_002.rs - MODEL-2 aggregate-AND over 12 AC-SHIP2-* bools (AC_GATE_SHIP_002_MODEL_2_AC_COUNT = 12; verdict_from_model2_ac_aggregate; 6-section survey incl. 2^12=4096 exhaustive bitmask proof)
  * gate_ship_003.rs - apr qa Golden Output byte-identity across quantize round-trip (verdict_from_golden_output_diff; 6-section survey with conservative-Fail on empty input - SKIPPED Golden Output = no regression proof)
  * gate_ship_004.rs - HumanEval bitwise-identical determinism on two seed=0 runs (verdict_from_identical_humaneval_scores uses f32::to_bits() equality - STRICTLY STRICTER than FALSIFY-SHIP-023's 1.2 pp drift tolerance; 7-section survey)
  * gate_ship_005.rs - License metadata byte-equal + non-empty + ASCII-printable (AC_GATE_SHIP_005_REQUIRED_LICENSE_FIELD = "license"; verdict_from_license_metadata; 6-section survey incl. SPDX case-sensitivity guard)
  * gate_ship_006.rs - GGUF round-trip first-token probability delta (AC_GATE_SHIP_006_MAX_FIRST_TOKEN_DELTA = 1e-3; const fn verdict_from_first_token_probability_delta; symmetric via .abs(); 7-section survey)

Test counts:

  cargo test -p aprender-core --lib format::gate_ship → 6/6 pass
  cargo test -p aprender-core --doc format::gate_ship → 6/6 pass

Contract validation: pv validate contracts/compound-ship-gates-v1.yaml → 0 errors, 0 warnings

Spec update: v2.41.0 → v2.42.0; §6 Compound Ship Gates table annotates GATE-SHIP-001..006 with (PARTIAL_ALGORITHM_LEVEL v2.42.0) markers.

Across both models: 30 PARTIAL + 3 DISCHARGED → 36 PARTIAL + 3 DISCHARGED.

Full discharge of each gate blocks on the live compound-gate harness:

  * GATE-SHIP-001: all 10 per-AC MODEL-1 checks
  * GATE-SHIP-002: all 12 per-AC MODEL-2 checks
  * GATE-SHIP-003: apr qa --golden-output on pre+post quantize checkpoints
  * GATE-SHIP-004: two consecutive apr eval --seed 0 runs
  * GATE-SHIP-005: apr inspect .metadata.license vs upstream HF card
  * GATE-SHIP-006: apr run --emit-logprobs vs llama-cli --logits-all
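The aggregate-AND gate and its exhaustive bitmask proof for GATE-SHIP-001 can be sketched as follows. The constant and verdict-function names come from the commit message; the `Verdict` enum, the array-based signature, and the helpers `acs_from_mask` and `passing_masks` are hypothetical, introduced only to show how a 2^10 = 1024-case enumeration can pin the gate to a pure AND.

```rust
/// Hypothetical two-state verdict; the real crate may use a different type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Number of MODEL-1 acceptance criteria, as stated in the commit message.
pub const AC_GATE_SHIP_001_MODEL_1_AC_COUNT: usize = 10;

/// Pass only when every acceptance criterion holds (signature assumed).
pub fn verdict_from_model1_ac_aggregate(
    acs: &[bool; AC_GATE_SHIP_001_MODEL_1_AC_COUNT],
) -> Verdict {
    if acs.iter().all(|&ok| ok) {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

/// Hypothetical helper: expands the low 10 bits of `mask` into one bool
/// per acceptance criterion (bit i set means criterion i holds).
pub fn acs_from_mask(mask: u32) -> [bool; AC_GATE_SHIP_001_MODEL_1_AC_COUNT] {
    let mut acs = [false; AC_GATE_SHIP_001_MODEL_1_AC_COUNT];
    for (i, slot) in acs.iter_mut().enumerate() {
        *slot = (mask >> i) & 1 == 1;
    }
    acs
}

/// Hypothetical helper: enumerates all 1024 input combinations and
/// returns the masks that Pass. A pure AND leaves exactly one: all bits set.
pub fn passing_masks() -> Vec<u32> {
    (0u32..(1u32 << AC_GATE_SHIP_001_MODEL_1_AC_COUNT))
        .filter(|&mask| verdict_from_model1_ac_aggregate(&acs_from_mask(mask)) == Verdict::Pass)
        .collect()
}
```

Enumerating the whole input space is cheap at this size, and it catches a mutation such as swapping `all` for `any`, which a few hand-picked cases could miss.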
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(gate-ship-007-012): §6 meta-policy gates bundle - 6 PARTIAL algorithm-level bindings (12/12 §6 total)

Task #123: completes §6 Compound Ship Gates coverage by algorithmically binding the 6 merge-gate meta-policy rows (GATE-SHIP-007..012) at PARTIAL_ALGORITHM_LEVEL. Prior v2.42.0 bundle covered 001..006 (ship-blocking); this release adds the remaining 6 (merge-gate) via new sibling modules in `crates/aprender-core/src/format/gate_ship_0XX.rs`:

- GATE-SHIP-007 (`.unwrap()` count): const fn `verdict_from_unwrap_count` bound to `AC_GATE_SHIP_007_MAX_TOLERATED_UNWRAP_COUNT = 0` via zero-tolerance threshold + 5-section survey.
- GATE-SHIP-008 (contract density): `verdict_from_contract_density` bound to `AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE = 1.0` via divide-by-zero-guarded ratio threshold + 7-section survey.
- GATE-SHIP-009 (CI aggregate): const fn `verdict_from_ci_aggregate` over (fmt, clippy, test) + 8-section survey incl. exhaustive 2^3 = 8 bitmask proof + AND-symmetry pin.
- GATE-SHIP-010 (advisory count): const fn `verdict_from_advisory_count` bound to `AC_GATE_SHIP_010_MAX_TOLERATED_ADVISORY_COUNT = 0` via zero-tolerance threshold + 5-section survey.
- GATE-SHIP-011 (PMAT TDG): const fn `verdict_from_tdg_score` bound to `AC_GATE_SHIP_011_MIN_PMAT_TDG_SCORE = 90.0` via inclusive-floor threshold + 7-section survey.
- GATE-SHIP-012 (line coverage): const fn `verdict_from_line_coverage_pct` bound to `AC_GATE_SHIP_012_MIN_LINE_COVERAGE_PCT = 95.0` via inclusive-floor threshold + 7-section survey.

Contract: `contracts/compound-ship-gates-v1.yaml` v1.0.0 → v1.1.0 (stays PROPOSED) adds 6 new `falsification_tests` (FALSIFY-GATE-SHIP-007..012), 6 new equations, and 6 new proof_obligations.

Spec: `docs/specifications/aprender-train/ship-two-models-spec.md` v2.42.0 → v2.43.0; §6 table rows 007..012 annotated PARTIAL_ALGORITHM_LEVEL v2.43.0. §6 Compound Ship Gates now 12/12 algorithmically bound.
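Two of the threshold shapes named above, the divide-by-zero-guarded ratio (GATE-SHIP-008) and the inclusive floor (GATE-SHIP-011), might look roughly like this. The constant and function names and their values are taken from the commit message; everything else is an assumption: the `Verdict` enum, the `f32` types, the integer inputs to the density function (contracts bound vs. new code items), and the choice to map a zero denominator or a non-finite score to Fail. The commit describes `verdict_from_tdg_score` as a const fn; a plain fn is used here to stay portable across toolchains.

```rust
/// Hypothetical two-state verdict; the real crate may use a different type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

pub const AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE: f32 = 1.0;
pub const AC_GATE_SHIP_011_MIN_PMAT_TDG_SCORE: f32 = 90.0;

/// Ratio threshold with a divide-by-zero guard.
/// Assumption: density = contracts / new items, and an empty
/// denominator is treated as Fail rather than a vacuous Pass.
pub fn verdict_from_contract_density(contract_count: u32, new_item_count: u32) -> Verdict {
    if new_item_count == 0 {
        return Verdict::Fail;
    }
    let density = contract_count as f32 / new_item_count as f32;
    if density >= AC_GATE_SHIP_008_MIN_CONTRACT_DENSITY_NEW_CODE {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

/// Inclusive floor: exactly 90.0 passes, anything below fails.
/// Assumption: NaN and infinities are rejected as measurement errors.
pub fn verdict_from_tdg_score(score: f32) -> Verdict {
    if score.is_finite() && score >= AC_GATE_SHIP_011_MIN_PMAT_TDG_SCORE {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

Checking the denominator before dividing keeps the function total: without the guard, `0 / 0` would yield NaN and the verdict would depend on how NaN compares against the floor.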
Full discharge still blocks on live CI tooling invocation (`cargo clippy -- -D warnings` / `pmat density` / branch-protection ci-gate / `cargo deny check advisories` / `pmat tdg` / `cargo llvm-cov report --json`).

Validation:

- cargo fmt --check: clean (aprender-core only)
- cargo test -p aprender-core --lib format::gate_ship_0XX: 6/6 pass
- cargo test -p aprender-core --doc format::gate_ship_0XX: 6/6 pass
- pv validate contracts/compound-ship-gates-v1.yaml: 0 errors, 0 warnings

Across both models: 42 PARTIAL + 3 DISCHARGED.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(readme): contract-backed claims + apr-cookbook link + structural cleanup

The README had three independent crate counts (70/76/80), three contract counts (405/799/1095), two CLI command counts (58/79), and two test totals (25,391/28,700+): stale numbers drifting because nothing bound them to live repo state. Users saw different facts depending on which paragraph they read.

Fix: pin every quantitative claim to a falsifiable re-derivation and collapse the narrative to a single source of truth.

CHANGES

- `contracts/readme-claims-v1.yaml` (NEW, kind=pattern) - 4 equations + 4 falsification tests (FALSIFY-README-001..004) binding:
  * workspace crate count to `ls crates/ | wc -l`
  * provable contract count to `find contracts -name '*.yaml' | wc -l`
  * CLI command count to `apr --help` subcommand lines
  * apr-cookbook link presence

  `pv validate contracts/readme-claims-v1.yaml` → 0 errors / 0 warnings.
- `scripts/check_readme_claims.sh` (NEW) - runs the 4 falsification tests; supports `--claim <name>` to target one, `--regen` to print live numbers for manual README edit. Uses `cargo run -p apr-cli --bin apr -- --help` for CLI count so the number tracks HEAD, not a stale `cargo install aprender` binary on PATH. 0 bashrs errors, 0 warnings.
- `README.md` - rewrote. Numbers now live (80/1095/79) and every one carries a source-of-truth footnote.
  Added cookbook section linking paiml/apr-cookbook (341 recipes). Removed the stale three-table framework-comparison block that bled Candle/Ludwig/llama.cpp numbers across Inference/Batched/Training sections. Kept the essential perf/architecture/migration facts in single canonical form.
- `.bashrsignore` - documented 4 false-positive suppressions (SC1020, SC1140, SC1009, SC2102) where bashrs mis-parses regex character classes inside grep/awk single-quoted patterns.

TEST

    bash scripts/check_readme_claims.sh
    PASS FALSIFY-README-001 crate_count: 80
    PASS FALSIFY-README-002 contract_count: 1095
    PASS FALSIFY-README-003 cli_command_count: 79
    PASS FALSIFY-README-004 cookbook_link: present

Does not touch #1044 (SHIP-TWO-001 algorithmic coverage cascade); this PR stands off main independently.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): clippy + readme_contract regressions surfaced by #1044 CI

Three issues caught on first CI run:

1. `clippy::result-unit-err` in crates/aprender-train/src/train/gputrain_003.rs: `parse_nvidia_smi_compute_apps` returns `Result<_, ()>` by design. The parser emits a single opaque "parse failed" signal and callers conservatively map it to Fail. A custom Error type would add noise with no information gain. Added `#[allow(clippy::result_unit_err)]` with the rationale already documented in the fn's doc comment.
2. `clippy::doc-overindented-list-items` in gputrain_007.rs:94: continuation line of a nested list item was indented 6 spaces instead of the canonical 5 (3 for `///` + 2 for list nesting). Stripped the extra space.
3. `crates/aprender-core/tests/readme_contract.rs` hardcoded stale benchmark numbers (`369.9` tok/s, `3,220` batched) and required a specific `## Framework Comparison` heading, both dropped in the README rewrite that ships in this same PR (contract-backed claims via `contracts/readme-claims-v1.yaml`).
   Relaxed the two tests to:
   - `test_readme_crate_count_matches_workspace` now counts `ls crates/` (matches the canonical scripts/check_readme_claims.sh source of truth, not `cargo metadata` which over-counts for crates with multiple [package] definitions).
   - `test_readme_has_framework_comparison` now asserts the reproducible POC repo citations (`candle-vs-apr`, `ground-truth-apr-ludwig`) without freezing the tok/s numbers themselves; those should re-derive from the POC repos at review time, not trap the README on stale values.

All 10 tests in readme_contract pass locally; clippy clean on aprender-train.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
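
The `Result<_, ()>` design from item 1 of the fix(ci) commit above can be illustrated with a small sketch: the parser reports a single opaque failure, and the caller folds every failure into Fail. The function names and the 1-MiB floor come from the commit messages. The input format is an assumption (lines of `pid, used_mib`, as produced by `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`), as are the signatures, the `Verdict` enum, and the `MIN_RESIDENT_MIB` constant name; the 5-s poll window is out of scope here.

```rust
/// Hypothetical two-state verdict; the real crate may use a different type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// 1-MiB residency floor from the commit message (constant name assumed).
pub const MIN_RESIDENT_MIB: u64 = 1;

/// Parses `pid, used_mib` lines into (pid, MiB) pairs.
/// Any malformed line yields the opaque `Err(())`: callers only need
/// to know that parsing failed, not why.
#[allow(clippy::result_unit_err)]
pub fn parse_nvidia_smi_compute_apps(csv: &str) -> Result<Vec<(u32, u64)>, ()> {
    let mut apps = Vec::new();
    for line in csv.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let (pid, mib) = line.split_once(',').ok_or(())?;
        let pid: u32 = pid.trim().parse().map_err(|_| ())?;
        let mib: u64 = mib.trim().parse().map_err(|_| ())?;
        apps.push((pid, mib));
    }
    Ok(apps)
}

/// Pass only when `pid` appears with at least the floor of GPU memory.
/// Parse failure, empty output, and a missing pid all map to Fail.
pub fn verdict_from_residency(csv: &str, pid: u32) -> Verdict {
    match parse_nvidia_smi_compute_apps(csv) {
        Ok(apps) if apps.iter().any(|&(p, mib)| p == pid && mib >= MIN_RESIDENT_MIB) => {
            Verdict::Pass
        }
        _ => Verdict::Fail,
    }
}
```

Since every error path ends in the same Fail arm, a richer error type would not change any verdict, which is the rationale the commit gives for suppressing the lint.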

This was referenced Apr 22, 2026

falsify(ship): SHIP-018 PARTIAL — humaneval pass@1 ≥30.0% threshold fn #1006

Closed

feat(falsify-ship-007): MODEL-1 apr bench decode ≥30 tok/s PARTIAL discharge #1014

Closed

noahgift mentioned this pull request Apr 23, 2026

feat(falsify-ship-020): MODEL-2 AC-SHIP2-010 PARTIAL discharge (restacked) #1033

Closed

5 tasks

noahgift closed this Apr 23, 2026

noahgift wants to merge 1 commit into main from feat/falsify-ship-020-partial-discharge
