contract(apr-cpu-vs-gpu-output-parity-v1): codify CPU-vs-GPU output parity invariant#1427
Merged
Merged
Conversation
…arity invariant Authors a new schema-kind contract that codifies the invariant uncovered by the SHIP-007 v5 finding (PR #1426): for any model and prompt under greedy decode (`--temperature 0.0`), `apr run` GPU output MUST match `apr run --no-gpu` CPU output. ## Why now The v5 falsifier showed: - `apr run` GPU emits "ampiezza = 0.5\ndiametro = 10" (gibberish, 72.95s) - `apr run --no-gpu` emits "2 + 2 equals 4." (correct, 9.81s) Same model, same prompt, same temperature. This regression slipped through because the existing parity_gate (crates/aprender-serve/src/gguf/cuda/mod_parity_gate.rs) only fires for `OwnedQuantizedModelCuda` construction (used by `apr parity` and `apr run --force-gpu`). The default `apr run` path on .apr files goes through `OwnedQuantizedModel::from_apr` → trueno manual graph (646 kernels), which has NO parity gate. The code at `crates/aprender-serve/src/cli/apr_inference.rs:46` already acknowledges: "Both AprF32ToGpuAdapter and forward_token_apr_q4k produce garbage on GPU." ## Contract structure - **kind**: schema (matches `apr-cli-trace-save-tensor-v1` pattern) - **3 equations**: greedy_argmax_parity, cosine_parity, no_gpu_flag_honor - **4 falsifiers**: - FALSIFY-CPU-GPU-001 (greedy argmax match) — PARTIAL, currently FALSIFIED in live data - FALSIFY-CPU-GPU-002 (cosine ≥ 0.99) — PARTIAL, apr parity reports cos=-0.005 - FALSIFY-CPU-GPU-003 (parity gate enforced at apr run init) — PARTIAL, gap in trueno path - FALSIFY-CPU-GPU-004 (--no-gpu honored) — FUNCTIONAL, verified empirically - **4 proof obligations**: invariant + completeness + liveness gates ## Five Whys 1. Why this contract? To codify a regression class so it can never slip through silently again. 2. Why didn't existing apr-gpu-parity-consistency-v1 cover it? That contract is about CLI scope-clarity (apr parity command output messages), not about runtime output correctness gates in apr run. 3. Why a new contract instead of extending apr-vs-gguf-forward-parity-v1? That one is about APR vs GGUF format parity (different concern). 4. Why PARTIAL on 3 of 4 falsifiers? FALSIFY-CPU-GPU-001/002/003 are all currently FALSIFIED in live data — the gate doesn't exist for the .apr trueno graph path. They're algorithm-bound to existing parity_gate code but functional discharge requires the implementation to extend coverage. 5. Why FUNCTIONAL on 004? Live tested 2026-05-03: --no-gpu produces correct "2 + 2 equals 4." with no [trueno#243] or [PMAT-082] log lines. ## Implementation follow-up (separate PR) Extend mod_parity_gate.rs OR the OwnedQuantizedModel::from_apr loader path to call parity_gate during the trueno graph init. The pattern is already proven in mod.rs:268-279 (with SKIP_PARITY_GATE=1 escape hatch). `pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml` returns 0 errors / 0 warnings. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 3, 2026
…vs-gpu-output-parity-v1` chain (PRs #1427-#1430) (#1431) Canonical record of today's 4-PR session that lands a 3-layer jidoka armor at the GPU-CPU dispatch boundary, closing §40's silent-gibberish loophole *as a regression class* (separate from the underlying GPU kernel fix). Net effect: default `apr run <model.apr>` on a SHIP-007-broken GPU build now emits the full backend-fallback chain on stderr without `--verbose`: [apr-cpu-vs-gpu-output-parity-v1] CUDA path rejected, attempting fallback: ... Backend: wgpu (Vulkan) ... so users always know which backend is actually serving their tokens — the `--no-gpu` documented workaround is now self-evidently the correct path on this build. MODEL-1 ship % nudges 80% → 87% because shipping `apr run` users with the documented workaround is now jidoka-safe. This section explicitly does NOT claim the SHIP-007 kernel bug is fixed — that remains an open track per §40. What §41 codifies is that the failure mode is now LOUD instead of SILENT. Five Whys + table of all 4 PRs + next-session pickup list (FALSIFY-CPU-GPU-005 part b OR MODEL-2 distill-train scaffolding) included. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Authors a new schema-kind contract
contracts/apr-cpu-vs-gpu-output-parity-v1.yamlthat codifies the invariant uncovered by the SHIP-007 v5 finding: for any model and prompt under greedy decode,apr runGPU output MUST matchapr run --no-gpuCPU output.This regression-class capture prevents future SHIP-007-like silent gibberish-on-GPU bugs.
v5 finding (motivation)
apr run(GPU default)apr run --no-gpuThe bug slipped through because the existing
parity_gate(mod_parity_gate.rs) only fires forOwnedQuantizedModelCudaconstruction (used byapr parityandapr run --force-gpu). The defaultapr runpath on.aprfiles goes throughOwnedQuantizedModel::from_apr→ trueno manual graph (646 kernels), which has NO parity gate. The code comment atapr_inference.rs:46even acknowledges: "Both AprF32ToGpuAdapter and forward_token_apr_q4k produce garbage on GPU."Contract structure
apr-cli-trace-save-tensor-v1pattern)apr parityreports cos=-0.005)apr runinit (PARTIAL, gap on trueno path)--no-gpuhonored (FUNCTIONAL, verified empirically)pv validate
pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yamlreturns 0 errors / 0 warnings.Five Whys (full detail in commit)
apr-gpu-parity-consistency-v1? That one is about CLI scope-clarity, not runtime correctness gates.apr-vs-gguf-forward-parity-v1? Different concern (format parity vs backend parity).--no-gpuproduces correct output with no GPU log lines.Implementation follow-up (separate PR)
Extend
mod_parity_gate.rsto fire onOwnedQuantizedModel::from_aprGPU init. Pattern already proven inmod.rs:268-279.Test plan
pv validatereturns 0 errors / 0 warnings🤖 Generated with Claude Code