feat(aprender-contracts): add actually_verified field on KaniHarness (closes #1595) by noahgift · Pull Request #1657 · paiml/aprender

noahgift · 2026-05-13T12:29:36Z

Summary

Closes #1595. Adds `actually_verified: Option<bool>` to `KaniHarness` and, when it is `true`, short-circuits the harness's D3 strategy weight in `pv score` to 1.0: a runtime CI witness (e.g. apr-cookbook `kani-gate`) supplants the static-readiness 0.9 cap.

Why

`pv score` currently caps D3 at 0.9 for harnesses with `strategy: bounded_int`, regardless of whether they actually pass `cargo kani`. That cap is only a static-readiness signal. apr-cookbook PR #421 just wired `kani-gate` into CI for 108 harnesses; once that gate is green, the runtime witness should fully unlock D3.

Schema change

```yaml
kani_harnesses:
  - id: KANI-001
    obligation: "finite"
    bound: 16
    strategy: bounded_int
    solver: cadical
    actually_verified: true  # NEW: when true, the D3 weight becomes 1.0
```

`actually_verified` is an `Option<bool>` marked `#[serde(default)]`, so all existing contracts continue to parse and score identically: when the field is absent it deserializes to `None`, which leaves behavior unchanged.
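The back-compat property can be sketched without serde. This is a std-only illustration under stated assumptions: `KaniHarness` here is a simplified stand-in with only a few of the YAML fields, and `parse_harness` is a hypothetical flat `key: value` reader, not the real contract loader. It mirrors what `#[serde(default)]` gives an `Option` field: a missing key yields `None`, while an explicit value is parsed.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `KaniHarness`: the field subset and types are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
struct KaniHarness {
    id: String,
    strategy: String,
    bound: Option<u32>,
    /// New field: stays `None` when the key is absent, as `#[serde(default)]` does.
    actually_verified: Option<bool>,
}

/// Hypothetical parser over flat `key: value` lines (not the real contract loader).
fn parse_harness(src: &str) -> Result<KaniHarness, String> {
    let mut kv: HashMap<String, String> = HashMap::new();
    for raw in src.lines() {
        // Drop trailing `# ...` comments and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (k, v) = line
            .split_once(':')
            .ok_or_else(|| format!("malformed line: {line}"))?;
        kv.insert(k.trim().to_string(), v.trim().trim_matches('"').to_string());
    }
    // Required keys error out; optional keys fall back to None.
    let mut h = KaniHarness::default();
    h.id = kv.get("id").cloned().ok_or("missing id")?;
    h.strategy = kv.get("strategy").cloned().ok_or("missing strategy")?;
    h.bound = kv
        .get("bound")
        .map(|b| b.parse::<u32>().map_err(|e| format!("bad bound: {e}")))
        .transpose()?;
    h.actually_verified = kv
        .get("actually_verified")
        .map(|v| v.parse::<bool>().map_err(|e| format!("bad actually_verified: {e}")))
        .transpose()?;
    Ok(h)
}

fn main() {
    // An existing contract without the new key still parses; the field is None.
    let legacy = parse_harness("id: KANI-001\nstrategy: bounded_int\nbound: 16").unwrap();
    // A contract that opts in once the CI witness is green.
    let verified =
        parse_harness("id: KANI-001\nstrategy: bounded_int\nactually_verified: true").unwrap();
    println!("{} {} {:?}", legacy.id, legacy.strategy, legacy.actually_verified);
    println!("{} {} {:?}", verified.id, verified.strategy, verified.actually_verified);
}
```

Because the new field is optional, no existing YAML needs editing; only contracts that want the runtime witness add the key.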

Test plan

- `kani_actually_verified_lifts_bounded_int_to_full_score`: passes
- `kani_actually_verified_false_keeps_strategy_default`: passes
- All 1392 prior tests pass
- `cargo check -p aprender-contracts-cli` type-checks the `pv` binary cleanly

🤖 Generated with Claude Code

…w-major (was [K,N]); MODEL-1 → 100% (PMAT-CODE-SHIP-007-F32-GEMV-LAYOUT-FIX)

§74 localized the SHIP-007 PARITY-GATE bug to `f32_gemv_into` via PR-B's stage-bisection scaffold (CPU vs GPU per-stage statistics analysis). The F32 GEMV PTX kernel was reading weights with a TRANSPOSED layout interpretation:

- Bug: the kernel assumed A is K-rows × N-cols row-major (`A[i,j]` at `i*N+j`), but actual ML weights are stored `[output_dim=N, input_dim=K]` row-major (`A[i,j]` at `i*K+j`, per the PyTorch/SafeTensors/GGUF convention and the PMAT-333 F32 dequantization output).
- Symptom: the GPU read transposed weights and computed `y = A^T @ x` instead of `y = A @ x`, giving systematically anti-correlated logits (cos=-0.005190 vs CPU, top-10 divergences all sign-flipped, CPU mean=-2.42 vs GPU mean=0.013).
- Fix: rewrite the inner loop to iterate along the K dimension within row `block_id`:
  - now: `row_base = a_ptr + block_id * K * 4`; each thread reads `A[block_id, t]`, `A[block_id, t+32]`, ...
  - instead of: `col_base = a_ptr + block_id * 4`; each thread read `A[t, block_id]`, `A[t+32, block_id]`, ...

Empirical discharge (canonical 7B teacher, lambda-vector RTX 4090, default graphed path):

- PARITY-GATE: PASS (no error from `forward_gpu_resident`)
- Throughput @ 128-tok 5-iter decode: 124.6 tok/s
- AC-SHIP1-007 floor: 30 tok/s
- Headroom: 4.15× over floor
- TTFT: 8.39 ms
- p50 latency: 1016 ms

Before PR-E:

- PARITY-GATE FAILED, cos=-0.005190
- Throughput (with `SKIP_PARITY_GATE=1` + `SKIP_FP8_WARMUP=1`): 5.6 tok/s (§63) / 54.5 tok/s (§73)
- GPU CANNOT serve this model

After PR-E:

- PARITY-GATE PASS, default path, NO workarounds
- 124.6 tok/s, 4.15× over floor

Ship-% impact:

- MODEL-1 ship %: **99% → 100%**
- 10 of 10 AC-SHIP1-* LIVE-DISCHARGED: SHIP-001 (§72), SHIP-002 (§61), SHIP-003 (§72), SHIP-004 (§72), SHIP-005 (§71), SHIP-006 (§61.8), SHIP-007 (this PR), SHIP-008 (§61), SHIP-009 (§72), SHIP-010 (§72)
- MODEL-2 ship %: unchanged at 57% (independent track).

Cascade arc closeout: §63 → §73 → PR-A (#1648) → PR-B (#1649) → §74 (#1650) → PR-E (this).
One PR, shipped in 1 day, against §73's estimate of 3-5 PRs over 3-5 days.

Auxiliary change: `logits.rs` adds an `APR_LM_HEAD_FORCE_QTYPE` env-var probe, kept as a diagnostic tool (zero behavior change when unset).

Test plan:

- [x] `cargo build --release -p apr-cli --bin apr --features cuda` → clean
- [x] `apr bench` (default path, 128-tok 5-iter) → 124.6 tok/s, passed: true
- [x] `apr parity` → PARITY-GATE PASS
- [ ] CI tests (workspace-test on per-PR runner)

Refs:

- §74 SHIP-007 bug localized (PR #1650)
- §73 SHIP-007 cascade reduction (PR #1647)
- contracts/apr-ship-007-gpu-stage-bisection-v1.yaml (PR-A #1648 contract)
- PR #1649 (PR-B GPU stage dump scaffold)
- AC-SHIP1-007 (spec §5)
- evidence/section-75-ship-007-discharged-2026-05-13/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
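The GEMV layout bug described in the commit message can be reproduced on the CPU. The sketch below is not the PTX kernel: it is a std-only Rust emulation that assumes weights stored `[N, K]` row-major (element `(n, k)` at `n*K + k`), uses element indices where the kernel uses byte offsets (`* 4`), runs one "block" per output row with 32 lanes striding over K, and replaces the warp reduction with a plain sum. The function names and the toy data are hypothetical.

```rust
/// Lanes per block in this emulation (one warp on NVIDIA hardware).
const LANES: usize = 32;

/// Reference y = A x with A stored [n_out, k_in] row-major: A[n, k] at n*k_in + k.
fn gemv_ref(a: &[f32], x: &[f32], n_out: usize, k_in: usize) -> Vec<f32> {
    (0..n_out)
        .map(|n| (0..k_in).map(|k| a[n * k_in + k] * x[k]).sum::<f32>())
        .collect()
}

/// Emulates one block per output row: lane t accumulates elements t, t+32, ...
/// `fixed = true` reads along row `block_id` (index block_id*K + t);
/// `fixed = false` reproduces the old column read (index t*N + block_id).
fn gemv_lane_strided(a: &[f32], x: &[f32], n_out: usize, k_in: usize, fixed: bool) -> Vec<f32> {
    let mut y = vec![0.0f32; n_out];
    for block_id in 0..n_out {
        let mut partial = [0.0f32; LANES];
        for lane in 0..LANES {
            let mut t = lane;
            while t < k_in {
                let idx = if fixed { block_id * k_in + t } else { t * n_out + block_id };
                partial[lane] += a[idx] * x[t];
                t += LANES;
            }
        }
        // Stand-in for the warp-level reduction: sum the 32 lane partials.
        y[block_id] = partial.iter().sum();
    }
    y
}

/// Cosine similarity, the metric quoted for the parity comparison.
fn cosine(u: &[f32], v: &[f32]) -> f32 {
    let dot: f32 = u.iter().zip(v).map(|(a, b)| a * b).sum();
    let nu = u.iter().map(|a| a * a).sum::<f32>().sqrt();
    let nv = v.iter().map(|b| b * b).sum::<f32>().sqrt();
    dot / (nu * nv)
}

fn main() {
    let (n_out, k_in) = (8, 70); // non-square, K not a multiple of 32
    // Deterministic pseudo-random values in [-1, 1) from a simple LCG.
    let mut seed: u32 = 12345;
    let mut next = || {
        seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
        ((seed >> 8) as f32 / (1u32 << 24) as f32) * 2.0 - 1.0
    };
    let a: Vec<f32> = (0..n_out * k_in).map(|_| next()).collect();
    let x: Vec<f32> = (0..k_in).map(|_| next()).collect();

    let y_ref = gemv_ref(&a, &x, n_out, k_in);
    let y_fixed = gemv_lane_strided(&a, &x, n_out, k_in, true);
    let y_old = gemv_lane_strided(&a, &x, n_out, k_in, false);
    println!("cos(fixed, ref) = {:.6}", cosine(&y_fixed, &y_ref));
    println!("cos(old,   ref) = {:.6}", cosine(&y_old, &y_ref));
}
```

On a square matrix the old read computes exactly `A^T x`; on a non-square one it reads a scrambled set of weights, which is why only the row-wise read agrees with the CPU reference.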

…07 contract violation (PMAT-CODE-SHIP-007-PR-E-FALSIFY-007-CLEAN)

The env-var bisection probe added in PR-E (this branch) introduced a `_ =>` catch-all inside a `match` expression that referenced `WeightQuantType` in its arm values. The `falsify_007_no_catch_all_in_dispatch_sites` contract test's 30-line walk-back heuristic flagged this as a violation, even though the match was on `&str` (the env-var value), not on `WeightQuantType`.

The probe was a bisection tool used to locate the bug during §74. Now that §75 has shipped the actual fix and the probe is no longer needed, removing it clears the contract violation. The only remaining PR-E change is the F32 GEMV PTX kernel layout fix in `crates/aprender-gpu/src/kernels/gemv/mod.rs`, which is the actual bug fix.

Test verified: `cargo test -p aprender-serve --lib quantize::contract_tests::tests::falsify_007_no_catch_all_in_dispatch_sites` → 1 passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
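Why a text-level heuristic misfires on such a probe can be shown with a small reconstruction. The real `falsify_007_no_catch_all_in_dispatch_sites` test is not included in this PR, so everything below is an assumption: the hypothetical `flag_catch_alls` flags any line starting with `_ =>` when the guarded type name appears within the preceding 30 lines, and the probe and dispatch snippets (including the `"f32"`/`"q8"` values and the `Q8` variant) are invented for illustration.

```rust
/// How far back (in lines) to look for the guarded type name (assumed value).
const WALK_BACK: usize = 30;

/// Hypothetical reconstruction: returns 1-based line numbers of `_ =>` arms
/// with `type_name` on the arm's own line or within the WALK_BACK lines before it.
fn flag_catch_alls(src: &str, type_name: &str) -> Vec<usize> {
    let lines: Vec<&str> = src.lines().collect();
    let mut hits = Vec::new();
    for (i, line) in lines.iter().enumerate() {
        if !line.trim_start().starts_with("_ =>") {
            continue;
        }
        let start = i.saturating_sub(WALK_BACK);
        if lines[start..=i].iter().any(|l| l.contains(type_name)) {
            hits.push(i + 1);
        }
    }
    hits
}

fn main() {
    // Invented shape of an env-var probe: the match is on &str, but the arm
    // values mention WeightQuantType, so a purely textual check fires.
    let probe = r#"
let forced = match std::env::var("APR_LM_HEAD_FORCE_QTYPE").as_deref() {
    Ok("f32") => Some(WeightQuantType::F32),
    Ok("q8") => Some(WeightQuantType::Q8),
    _ => None,
};"#;
    // An exhaustive match on the enum with no catch-all is not flagged.
    let dispatch = r#"
match qtype {
    WeightQuantType::F32 => gemv_f32(),
    WeightQuantType::Q8 => gemv_q8(),
}"#;
    println!("probe flagged at {:?}", flag_catch_alls(probe, "WeightQuantType"));
    println!("dispatch flagged at {:?}", flag_catch_alls(dispatch, "WeightQuantType"));
}
```

A check of this kind cannot tell what the `match` scrutinee is, so deleting the probe (rather than special-casing the heuristic) is the simplest way to get a clean result.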

…loses #1595)

When `kani_harnesses[].actually_verified: true`, `pv score` D3 lifts the strategy weight to 1.0 regardless of strategy (`bounded_int` / `stub_float` / `compositional`).

Rationale: the static-readiness 0.9 cap reflects uncertainty about whether the harness actually proves anything; once CI runs `cargo kani` green (e.g. apr-cookbook PR #421's kani-gate), the runtime witness supplants the static signal.

Schema change: `KaniHarness` gets `actually_verified: Option<bool>` (default `None`; back-compatible with existing contracts).

Scoring change: `scoring::mod::strategy_weight()` short-circuits to 1.0 when `actually_verified == Some(true)`, before the strategy table lookup.

Tests:

- `kani_actually_verified_lifts_bounded_int_to_full_score`
- `kani_actually_verified_false_keeps_strategy_default`

Both pass; 1392 prior tests unaffected. Updates the explicit `KaniHarness { ... }` literal in `gates_extended_tests.rs` to include the new field (`None`).
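The scoring change can be sketched as follows, under stated assumptions: the `Strategy` enum, this two-field `KaniHarness`, and the `static_weight` table are hypothetical, and only the 0.9 value for `bounded_int` comes from this PR (the `stub_float` and `compositional` weights are placeholders). What the sketch is meant to show is the order of operations: the `Some(true)` check runs before the table lookup, so `None` and `Some(false)` both fall through to the strategy default.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    BoundedInt,
    StubFloat,
    Compositional,
}

/// Hypothetical two-field view of a harness; the real struct has more fields.
#[derive(Debug, Clone)]
struct KaniHarness {
    strategy: Strategy,
    actually_verified: Option<bool>,
}

/// Static-readiness weights. Only BoundedInt = 0.9 is stated in the PR;
/// the other two values are placeholders for illustration.
fn static_weight(strategy: Strategy) -> f64 {
    match strategy {
        Strategy::BoundedInt => 0.9,
        Strategy::StubFloat => 0.5,     // placeholder
        Strategy::Compositional => 0.7, // placeholder
    }
}

/// D3 weight for one harness: a green runtime witness short-circuits to 1.0
/// before the strategy table is consulted.
fn strategy_weight(h: &KaniHarness) -> f64 {
    if h.actually_verified == Some(true) {
        return 1.0;
    }
    static_weight(h.strategy)
}

fn main() {
    for verified in [None, Some(false), Some(true)] {
        // Explicit struct literals must now name the new field.
        let h = KaniHarness { strategy: Strategy::BoundedInt, actually_verified: verified };
        println!("{:?} -> {}", verified, strategy_weight(&h));
    }
}
```

The explicit literal in `main` also illustrates why `gates_extended_tests.rs` needed updating: Rust struct literals must list every field, so adding `actually_verified` breaks any literal that does not set it.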

noahgift enabled auto-merge (squash) May 13, 2026 12:29

noahgift and others added 4 commits May 13, 2026 15:59

ci: trigger fresh workflow run for flake-class test re-execution

fc98d5e

noahgift force-pushed the fix/1595-kani-actually-verified branch from e0a656e to 65539c8 May 13, 2026 13:59

noahgift added 3 commits May 13, 2026 16:31

Merge branch 'main' into fix/1595-kani-actually-verified

843b207

Merge branch 'main' into fix/1595-kani-actually-verified

f463024

Merge branch 'main' into fix/1595-kani-actually-verified

f9e04a7

noahgift merged commit 2c82254 into main May 13, 2026
10 checks passed

noahgift deleted the fix/1595-kani-actually-verified branch May 13, 2026 16:09

noahgift merged 7 commits into main from fix/1595-kani-actually-verified

noahgift commented May 13, 2026


1 participant
