fix(apr-cpu-vs-gpu-output-parity-v1): make CUDA fallback decision visible without --verbose (v1.0→v1.1, PROPOSED→ACTIVE) by noahgift · Pull Request #1428 · paiml/aprender

noahgift · 2026-05-03T14:35:00Z

Summary

Empirical correction: parity_gate IS already wired on the .apr GPU path; the user-visible gap was that the rejection was verbose-only so default apr run showed silent gibberish from the wgpu fallback.
One-line fix in gguf_gpu_generate.rs:487-494 makes the fallback decision unconditional, tagged with contract ID [apr-cpu-vs-gpu-output-parity-v1].
Contract apr-cpu-vs-gpu-output-parity-v1 bumped v1.0.0 PROPOSED → v1.1.0 ACTIVE with corrected algorithm_evidence and a flagged follow-up (FALSIFY-CPU-GPU-005 — add wgpu parity gate).
Evidence: evidence/ship-007-layer-0-oracle-bisection-2026-05-03/findings-v6-parity-gate-fires-but-fallback-is-silent.md walks the live trace + Five Whys.

Why this PR (Five Whys, abridged)

Why did apr run ship gibberish silently on canonical 7B? CUDA parity_gate fails (ILLEGAL_ADDRESS in gate's GPU forward) → returns Err → swallowed by if verbose → falls to wgpu → wgpu also produces gibberish.
Why was the verbose gate there? The pattern was inherited from non-fatal load failures; for backend-fallback decisions it hides Toyota Way jidoka information.
Why didn't tests catch this? apr parity --assert exits non-zero, but apr run has no equivalent stderr-tag assertion.

Test plan

cargo check -p aprender-serve --features cuda — clean
pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml — 0 errors
Live smoke after release rebuild: apr run <canonical-7b-teacher.apr> --prompt 'What is 2+2?' --max-tokens 4 --temperature 0.0 2>&1 | grep '\[apr-cpu-vs-gpu-output-parity-v1\] CUDA path rejected' — should hit
Live smoke regression: apr run --no-gpu <model> still produces correct '2 + 2 equals 4.' (workaround path unchanged)

Out of scope (follow-ups)

FALSIFY-CPU-GPU-005: wgpu path parity gate so wgpu-gibberish is caught too
SHIP-007 root-cause GPU kernel audit (independent track per memory feedback_model_1_ships_gpu_only)

🤖 Generated with Claude Code

…ible without --verbose Empirical correction of the parity contract's PARTIAL_ALGORITHM_LEVEL claim. Live `apr -v run` on the canonical 7B teacher today shows the parity_gate IS already wired into the .apr GPU init path (load_apr_cuda_model:487 → OwnedQuantizedModelCuda::with_max_seq_len → preload_and_verify → parity_gate). The contract's v1.0.0 narrative — "no gate runs for the trueno-graph .apr load path" — was wrong. The actual user-visible gap was that the gate's failure (cosine < 0.99 or, on the canonical teacher, ILLEGAL_ADDRESS during the gate's own GPU forward) was logged behind `if verbose`, so default-mode `apr run` swallowed the rejection silently and fell through to wgpu, which itself produces gibberish on this build. Net user experience: garbage output with zero stderr signal that GPU had been rejected — a Toyota Way / jidoka regression. This PR converts the verbose-only fallback log to one unconditional eprintln tagged with the contract ID. Default-mode `apr run` on a broken GPU build will now always surface: [apr-cpu-vs-gpu-output-parity-v1] CUDA path rejected, attempting fallback: ... Five Whys + the empirical reproducer trace are documented in the new evidence file. Contract bumped v1.0.0 PROPOSED → v1.1.0 ACTIVE with corrected algorithm_evidence and a flagged follow-up (FALSIFY-CPU-GPU-005) to add a parity gate to the wgpu path so wgpu-gibberish is also caught. Files: - crates/aprender-serve/src/infer/gguf_gpu_generate.rs:487-494 — drop `if verbose` - contracts/apr-cpu-vs-gpu-output-parity-v1.yaml — v1.0.0 → v1.1.0, ACTIVE - evidence/ship-007-.../findings-v6-parity-gate-fires-but-fallback-is-silent.md Verification: - cargo check -p aprender-serve --features cuda → clean - pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml → 0 errors Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…back log tag (#1429) Promotes the FALSIFY-CPU-GPU-003 jidoka eprintln tag to a `pub(crate) const` and adds a unit test asserting the prefix shape. Locks in PR #1428 against two regression classes: 1. Renaming the contract tag without bumping `apr-cpu-vs-gpu-output-parity-v1` in lockstep 2. Re-wrapping the eprintln in `if verbose { ... }` (which would re-introduce the silent-gibberish behaviour v6 fixed). Five Whys: 1. Why this test? Because the v1.1.0 contract requires "stderr grep tag visibility" but a future refactor could quietly delete or rename the tag. 2. Why a string-literal const + assert vs a full integration test? The full test would need a real CUDA GPU + a model that fails parity, which can't run in CI deterministically. A const-shape test catches the regression class at compile/test time without GPU. 3. Why two assertions (starts_with + contains)? `starts_with` locks the contract ID prefix for greppability; `contains("CUDA path rejected")` locks the human-readable backend name so users still understand the message even if the contract ID changes between major versions. Verified locally: - cargo build -p aprender-serve --features cuda --release → clean - cargo test -p aprender-serve --features cuda --lib cuda_fallback_log_prefix_is_contract_tagged → 1 passed Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…CPU-GPU-005 wgpu visibility + parity-gate entry (#1430) Closes the second half of the silent-fallback loophole. After PR #1428 made CUDA rejection visible (FALSIFY-CPU-GPU-003), the wgpu fallback path still ships gibberish silently for the canonical 7B teacher because (a) its init and "Backend: wgpu (Vulkan)" logs are verbose-gated and (b) it has no parity_gate analog to CUDA's. This PR lands the (a) visibility fix immediately and binds the (b) parity- gate at PARTIAL_ALGORITHM_LEVEL pending a follow-up implementation (~100-150 LOC, requires extracting the per-token wgpu decode loop body into a callable single-step function). Five Whys: 1. Why ship visibility before the parity gate? Visibility is one-line, low- risk, and immediately useful — users now see "Backend: wgpu (Vulkan)" on stderr without --verbose, so they know which backend is serving their tokens after CUDA falls through. 2. Why not full gate now? wgpu's existing API doesn't expose a single-step forward; adding one means refactoring the autoregressive loop body. Doable but bigger PR — keep this one bounded. 3. Why bump v1.1.0 → v1.2.0 not v2.0.0? FALSIFY-CPU-GPU-005 is additive; no existing falsifier semantics changed. Minor bump per semver. Code: gguf_gpu_generate.rs:23-32 (try_wgpu_generate) and 311-326 (try_apr_wgpu_inference) drop `if verbose { ... }` from the wgpu init/ Backend log lines. Verification: - pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml → 0 errors - cargo build -p aprender-serve --features cuda --release → clean - cargo test -p aprender-serve --features cuda --lib cuda_fallback_log_prefix_is_contract_tagged → 1 passed (existing drift-prevention from PR #1429 still green) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

… drift-prevention tests (#1435) Closes the contract drift between v1.2.0's prediction and the actual code: contract `apr-cpu-vs-gpu-output-parity-v1` v1.2.0 (PR #1430) said the wgpu rejection log should emit `[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected, attempting fallback: ...` symmetric to the CUDA tag, but #1430 only made the existing `[GH-559]`/`Backend:` logs unconditional — the contract-tagged wgpu rejection log itself was missing. Mirror of the FALSIFY-CPU-GPU-003 chain (#1428 visibility + #1429 drift test) for FALSIFY-CPU-GPU-005: - Adds `pub(crate) const WGPU_FALLBACK_LOG_PREFIX = "[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected"` - Updates `try_apr_wgpu_inference` to emit it on `GpuDevice::new()` failure (alongside the existing `[GH-559]` runbook tag — both, not either) - 3 new unit tests: * `wgpu_fallback_log_prefix_is_contract_tagged` (symmetric to the CUDA test) * `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` (symmetry guard: both prefixes must start with the same contract ID and end with "path rejected" so grep recipes work uniformly across backends) Five Whys: 1. Why was this gap in v1.2.0? PR #1430 conflated "make existing wgpu log visible" (done) with "add contract-tagged rejection log" (deferred). The contract's prediction text wrote the second; the code shipped only the first. 2. Why catch it now? `--features hub` build healthy across PRs #1432-#1434 means the test surface is reliable; this is the natural follow-up. 3. Why the symmetry test? `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` locks in that BOTH backends use the same `[CONTRACT_ID] <backend> path rejected` shape. Without it a future PR could drift one but not the other and grep recipes would silently skip backends. 4. Why keep `[GH-559]` alongside the new contract tag? Runbook continuity — humans tracking that issue tag in logs over time shouldn't lose it. 5. Why no contract version bump? v1.2.0 already specifies this tag in FALSIFY-CPU-GPU-005's prediction; this PR closes the implementation gap. Bumping again would imply a contract semantic change, which isn't happening — only the code catches up to the contract. Verified locally: - `cargo test -p aprender-serve --features cuda --lib --release fallback_log_prefix` → 3/3 pass (cuda + wgpu + symmetry) - `cargo fmt --all -- --check` → no diff in touched file Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift merged commit 94b46a7 into main May 3, 2026
11 checks passed

noahgift deleted the feat/falsify-cpu-gpu-003-parity-gate-apr-init branch May 3, 2026 15:01

noahgift mentioned this pull request May 3, 2026

test(apr-cpu-vs-gpu-output-parity-v1): drift-prevention for CUDA fallback log tag #1429

Merged

2 tasks

noahgift mentioned this pull request May 3, 2026

contract(apr-cpu-vs-gpu-output-parity-v1): v1.2.0 — FALSIFY-CPU-GPU-005 wgpu visibility + parity-gate entry #1430

Merged

4 tasks

noahgift mentioned this pull request May 3, 2026

spec(ship-two-models-spec): v2.86.0 — §41 records apr-cpu-vs-gpu-output-parity-v1 chain (PRs #1427-#1430) #1431

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(apr-cpu-vs-gpu-output-parity-v1): make CUDA fallback decision visible without --verbose (v1.0→v1.1, PROPOSED→ACTIVE)#1428

fix(apr-cpu-vs-gpu-output-parity-v1): make CUDA fallback decision visible without --verbose (v1.0→v1.1, PROPOSED→ACTIVE)#1428
noahgift merged 1 commit into
mainfrom
feat/falsify-cpu-gpu-003-parity-gate-apr-init

noahgift commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 3, 2026

Summary

Why this PR (Five Whys, abridged)

Test plan

Out of scope (follow-ups)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant