contract(apr-cpu-vs-gpu-output-parity-v1): codify CPU-vs-GPU output parity invariant by noahgift · Pull Request #1427 · paiml/aprender

noahgift · 2026-05-03T13:50:53Z

Summary

Authors a new schema-kind contract contracts/apr-cpu-vs-gpu-output-parity-v1.yaml that codifies the invariant uncovered by the SHIP-007 v5 finding: for any model and prompt under greedy decode, apr run GPU output MUST match apr run --no-gpu CPU output.

This regression-class capture prevents future SHIP-007-like silent gibberish-on-GPU bugs.

v5 finding (motivation)

Configuration	Output for "What is 2+2?"	Time
`apr run` (GPU default)	"ampiezza = 0.5\ndiametro = 10"	72.95s
`apr run --no-gpu`	"2 + 2 equals 4."	9.81s

The bug slipped through because the existing parity_gate (mod_parity_gate.rs) only fires for OwnedQuantizedModelCuda construction (used by apr parity and apr run --force-gpu). The default apr run path on .apr files goes through OwnedQuantizedModel::from_apr → trueno manual graph (646 kernels), which has NO parity gate. The code comment at apr_inference.rs:46 even acknowledges: "Both AprF32ToGpuAdapter and forward_token_apr_q4k produce garbage on GPU."

Contract structure

kind: schema (matches existing apr-cli-trace-save-tensor-v1 pattern)
3 equations: greedy_argmax_parity, cosine_parity, no_gpu_flag_honor
4 falsifiers:
- FALSIFY-CPU-GPU-001 — argmax match (PARTIAL_ALGORITHM_LEVEL, FALSIFIED in live data)
- FALSIFY-CPU-GPU-002 — cosine ≥ 0.99 (PARTIAL, apr parity reports cos=-0.005)
- FALSIFY-CPU-GPU-003 — gate enforced at apr run init (PARTIAL, gap on trueno path)
- FALSIFY-CPU-GPU-004 — --no-gpu honored (FUNCTIONAL, verified empirically)
4 proof obligations: invariant + completeness + liveness gates

pv validate

pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml returns 0 errors / 0 warnings.

Five Whys (full detail in commit)

Why this contract? Codify regression class.
Why not extend apr-gpu-parity-consistency-v1? That one is about CLI scope-clarity, not runtime correctness gates.
Why not extend apr-vs-gguf-forward-parity-v1? Different concern (format parity vs backend parity).
Why PARTIAL on 3 of 4? They're FALSIFIED in live data — the gate doesn't exist for the trueno path yet.
Why FUNCTIONAL on 004? Verified empirically — --no-gpu produces correct output with no GPU log lines.

Implementation follow-up (separate PR)

Extend mod_parity_gate.rs to fire on OwnedQuantizedModel::from_apr GPU init. Pattern already proven in mod.rs:268-279.

Test plan

pv validate returns 0 errors / 0 warnings
Single YAML contract addition (179 LOC)
References v5 evidence (PR evidence(ship-007): v5 — ROOT CAUSE CONFIRMED, GPU forward path is broken #1426 already merged)
CI green
Auto-merge

🤖 Generated with Claude Code

…arity invariant Authors a new schema-kind contract that codifies the invariant uncovered by the SHIP-007 v5 finding (PR #1426): for any model and prompt under greedy decode (`--temperature 0.0`), `apr run` GPU output MUST match `apr run --no-gpu` CPU output. ## Why now The v5 falsifier showed: - `apr run` GPU emits "ampiezza = 0.5\ndiametro = 10" (gibberish, 72.95s) - `apr run --no-gpu` emits "2 + 2 equals 4." (correct, 9.81s) Same model, same prompt, same temperature. This regression slipped through because the existing parity_gate (crates/aprender-serve/src/gguf/cuda/mod_parity_gate.rs) only fires for `OwnedQuantizedModelCuda` construction (used by `apr parity` and `apr run --force-gpu`). The default `apr run` path on .apr files goes through `OwnedQuantizedModel::from_apr` → trueno manual graph (646 kernels), which has NO parity gate. The code at `crates/aprender-serve/src/cli/apr_inference.rs:46` already acknowledges: "Both AprF32ToGpuAdapter and forward_token_apr_q4k produce garbage on GPU." ## Contract structure - **kind**: schema (matches `apr-cli-trace-save-tensor-v1` pattern) - **3 equations**: greedy_argmax_parity, cosine_parity, no_gpu_flag_honor - **4 falsifiers**: - FALSIFY-CPU-GPU-001 (greedy argmax match) — PARTIAL, currently FALSIFIED in live data - FALSIFY-CPU-GPU-002 (cosine ≥ 0.99) — PARTIAL, apr parity reports cos=-0.005 - FALSIFY-CPU-GPU-003 (parity gate enforced at apr run init) — PARTIAL, gap in trueno path - FALSIFY-CPU-GPU-004 (--no-gpu honored) — FUNCTIONAL, verified empirically - **4 proof obligations**: invariant + completeness + liveness gates ## Five Whys 1. Why this contract? To codify a regression class so it can never slip through silently again. 2. Why didn't existing apr-gpu-parity-consistency-v1 cover it? That contract is about CLI scope-clarity (apr parity command output messages), not about runtime output correctness gates in apr run. 3. Why a new contract instead of extending apr-vs-gguf-forward-parity-v1? That one is about APR vs GGUF format parity (different concern). 4. Why PARTIAL on 3 of 4 falsifiers? FALSIFY-CPU-GPU-001/002/003 are all currently FALSIFIED in live data — the gate doesn't exist for the .apr trueno graph path. They're algorithm-bound to existing parity_gate code but functional discharge requires the implementation to extend coverage. 5. Why FUNCTIONAL on 004? Live tested 2026-05-03: --no-gpu produces correct "2 + 2 equals 4." with no [trueno#243] or [PMAT-082] log lines. ## Implementation follow-up (separate PR) Extend mod_parity_gate.rs OR the OwnedQuantizedModel::from_apr loader path to call parity_gate during the trueno graph init. The pattern is already proven in mod.rs:268-279 (with SKIP_PARITY_GATE=1 escape hatch). `pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml` returns 0 errors / 0 warnings. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…vs-gpu-output-parity-v1` chain (PRs #1427-#1430) (#1431) Canonical record of today's 4-PR session that lands a 3-layer jidoka armor at the GPU-CPU dispatch boundary, closing §40's silent-gibberish loophole *as a regression class* (separate from the underlying GPU kernel fix). Net effect: default `apr run <model.apr>` on a SHIP-007-broken GPU build now emits the full backend-fallback chain on stderr without `--verbose`: [apr-cpu-vs-gpu-output-parity-v1] CUDA path rejected, attempting fallback: ... Backend: wgpu (Vulkan) ... so users always know which backend is actually serving their tokens — the `--no-gpu` documented workaround is now self-evidently the correct path on this build. MODEL-1 ship % nudges 80% → 87% because shipping `apr run` users with the documented workaround is now jidoka-safe. This section explicitly does NOT claim the SHIP-007 kernel bug is fixed — that remains an open track per §40. What §41 codifies is that the failure mode is now LOUD instead of SILENT. Five Whys + table of all 4 PRs + next-session pickup list (FALSIFY-CPU-GPU-005 part b OR MODEL-2 distill-train scaffolding) included. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 3, 2026 13:50

noahgift merged commit 040a7a1 into main May 3, 2026
11 checks passed

noahgift deleted the contract/apr-cpu-vs-gpu-output-parity-v1 branch May 3, 2026 14:13

noahgift mentioned this pull request May 4, 2026

docs: pre-v0.32.0 — fill [Unreleased] CHANGELOG + repair README drift gate #1448

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

contract(apr-cpu-vs-gpu-output-parity-v1): codify CPU-vs-GPU output parity invariant#1427

contract(apr-cpu-vs-gpu-output-parity-v1): codify CPU-vs-GPU output parity invariant#1427
noahgift merged 1 commit into
mainfrom
contract/apr-cpu-vs-gpu-output-parity-v1

noahgift commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 3, 2026

Summary

v5 finding (motivation)

Contract structure

pv validate

Five Whys (full detail in commit)

Implementation follow-up (separate PR)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant