Skip to content

fix(apr-cpu-vs-gpu-output-parity-v1): make CUDA fallback decision visible without --verbose (v1.0→v1.1, PROPOSED→ACTIVE)#1428

Merged
noahgift merged 1 commit into
mainfrom
feat/falsify-cpu-gpu-003-parity-gate-apr-init
May 3, 2026
Merged

fix(apr-cpu-vs-gpu-output-parity-v1): make CUDA fallback decision visible without --verbose (v1.0→v1.1, PROPOSED→ACTIVE)#1428
noahgift merged 1 commit into
mainfrom
feat/falsify-cpu-gpu-003-parity-gate-apr-init

Conversation

@noahgift

@noahgift noahgift commented May 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Empirical correction: parity_gate IS already wired on the .apr GPU path; the user-visible gap was that the rejection was verbose-only so default apr run showed silent gibberish from the wgpu fallback.
  • One-line fix in gguf_gpu_generate.rs:487-494 makes the fallback decision unconditional, tagged with contract ID [apr-cpu-vs-gpu-output-parity-v1].
  • Contract apr-cpu-vs-gpu-output-parity-v1 bumped v1.0.0 PROPOSED → v1.1.0 ACTIVE with corrected algorithm_evidence and a flagged follow-up (FALSIFY-CPU-GPU-005 — add wgpu parity gate).
  • Evidence: evidence/ship-007-layer-0-oracle-bisection-2026-05-03/findings-v6-parity-gate-fires-but-fallback-is-silent.md walks the live trace + Five Whys.

Why this PR (Five Whys, abridged)

  1. Why did apr run ship gibberish silently on canonical 7B? CUDA parity_gate fails (ILLEGAL_ADDRESS in gate's GPU forward) → returns Err → swallowed by if verbose → falls to wgpu → wgpu also produces gibberish.
  2. Why was the verbose gate there? The pattern was inherited from non-fatal load failures; for backend-fallback decisions it hides Toyota Way jidoka information.
  3. Why didn't tests catch this? apr parity --assert exits non-zero, but apr run has no equivalent stderr-tag assertion.

Test plan

  • cargo check -p aprender-serve --features cuda — clean
  • pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml — 0 errors
  • Live smoke after release rebuild: apr run <canonical-7b-teacher.apr> --prompt 'What is 2+2?' --max-tokens 4 --temperature 0.0 2>&1 | grep '\[apr-cpu-vs-gpu-output-parity-v1\] CUDA path rejected' — should hit
  • Live smoke regression: apr run --no-gpu <model> still produces correct '2 + 2 equals 4.' (workaround path unchanged)

Out of scope (follow-ups)

  • FALSIFY-CPU-GPU-005: wgpu path parity gate so wgpu-gibberish is caught too
  • SHIP-007 root-cause GPU kernel audit (independent track per memory feedback_model_1_ships_gpu_only)

🤖 Generated with Claude Code

…ible without --verbose

Empirical correction of the parity contract's PARTIAL_ALGORITHM_LEVEL claim.
Live `apr -v run` on the canonical 7B teacher today shows the parity_gate IS
already wired into the .apr GPU init path (load_apr_cuda_model:487 →
OwnedQuantizedModelCuda::with_max_seq_len → preload_and_verify → parity_gate).
The contract's v1.0.0 narrative — "no gate runs for the trueno-graph .apr
load path" — was wrong.

The actual user-visible gap was that the gate's failure (cosine < 0.99 or, on
the canonical teacher, ILLEGAL_ADDRESS during the gate's own GPU forward) was
logged behind `if verbose`, so default-mode `apr run` swallowed the rejection
silently and fell through to wgpu, which itself produces gibberish on this
build. Net user experience: garbage output with zero stderr signal that GPU
had been rejected — a Toyota Way / jidoka regression.

This PR converts the verbose-only fallback log to one unconditional eprintln
tagged with the contract ID. Default-mode `apr run` on a broken GPU build will
now always surface:

  [apr-cpu-vs-gpu-output-parity-v1] CUDA path rejected, attempting fallback: ...

Five Whys + the empirical reproducer trace are documented in the new evidence
file. Contract bumped v1.0.0 PROPOSED → v1.1.0 ACTIVE with corrected
algorithm_evidence and a flagged follow-up (FALSIFY-CPU-GPU-005) to add a
parity gate to the wgpu path so wgpu-gibberish is also caught.

Files:
- crates/aprender-serve/src/infer/gguf_gpu_generate.rs:487-494 — drop `if verbose`
- contracts/apr-cpu-vs-gpu-output-parity-v1.yaml — v1.0.0 → v1.1.0, ACTIVE
- evidence/ship-007-.../findings-v6-parity-gate-fires-but-fallback-is-silent.md

Verification:
- cargo check -p aprender-serve --features cuda → clean
- pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml → 0 errors

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 94b46a7 into main May 3, 2026
11 checks passed
@noahgift noahgift deleted the feat/falsify-cpu-gpu-003-parity-gate-apr-init branch May 3, 2026 15:01
noahgift added a commit that referenced this pull request May 3, 2026
…back log tag (#1429)

Promotes the FALSIFY-CPU-GPU-003 jidoka eprintln tag to a `pub(crate) const`
and adds a unit test asserting the prefix shape. Locks in PR #1428 against
two regression classes:
  1. Renaming the contract tag without bumping
     `apr-cpu-vs-gpu-output-parity-v1` in lockstep
  2. Re-wrapping the eprintln in `if verbose { ... }` (which would
     re-introduce the silent-gibberish behaviour v6 fixed).

Five Whys:
1. Why this test? Because the v1.1.0 contract requires "stderr grep tag
   visibility" but a future refactor could quietly delete or rename the tag.
2. Why a string-literal const + assert vs a full integration test? The full
   test would need a real CUDA GPU + a model that fails parity, which can't
   run in CI deterministically. A const-shape test catches the regression
   class at compile/test time without GPU.
3. Why two assertions (starts_with + contains)? `starts_with` locks the
   contract ID prefix for greppability; `contains("CUDA path rejected")`
   locks the human-readable backend name so users still understand the
   message even if the contract ID changes between major versions.

Verified locally:
- cargo build -p aprender-serve --features cuda --release → clean
- cargo test -p aprender-serve --features cuda --lib
    cuda_fallback_log_prefix_is_contract_tagged → 1 passed

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
…CPU-GPU-005 wgpu visibility + parity-gate entry (#1430)

Closes the second half of the silent-fallback loophole. After PR #1428 made
CUDA rejection visible (FALSIFY-CPU-GPU-003), the wgpu fallback path still
ships gibberish silently for the canonical 7B teacher because (a) its init
and "Backend: wgpu (Vulkan)" logs are verbose-gated and (b) it has no
parity_gate analog to CUDA's.

This PR lands the (a) visibility fix immediately and binds the (b) parity-
gate at PARTIAL_ALGORITHM_LEVEL pending a follow-up implementation
(~100-150 LOC, requires extracting the per-token wgpu decode loop body
into a callable single-step function).

Five Whys:
1. Why ship visibility before the parity gate? Visibility is one-line, low-
   risk, and immediately useful — users now see "Backend: wgpu (Vulkan)"
   on stderr without --verbose, so they know which backend is serving
   their tokens after CUDA falls through.
2. Why not full gate now? wgpu's existing API doesn't expose a single-step
   forward; adding one means refactoring the autoregressive loop body.
   Doable but bigger PR — keep this one bounded.
3. Why bump v1.1.0 → v1.2.0 not v2.0.0? FALSIFY-CPU-GPU-005 is additive;
   no existing falsifier semantics changed. Minor bump per semver.

Code: gguf_gpu_generate.rs:23-32 (try_wgpu_generate) and 311-326
(try_apr_wgpu_inference) drop `if verbose { ... }` from the wgpu init/
Backend log lines.

Verification:
- pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml → 0 errors
- cargo build -p aprender-serve --features cuda --release → clean
- cargo test -p aprender-serve --features cuda --lib
    cuda_fallback_log_prefix_is_contract_tagged → 1 passed (existing
    drift-prevention from PR #1429 still green)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
… drift-prevention tests (#1435)

Closes the contract drift between v1.2.0's prediction and the actual code:
contract `apr-cpu-vs-gpu-output-parity-v1` v1.2.0 (PR #1430) said the wgpu
rejection log should emit `[apr-cpu-vs-gpu-output-parity-v1] wgpu path
rejected, attempting fallback: ...` symmetric to the CUDA tag, but #1430
only made the existing `[GH-559]`/`Backend:` logs unconditional — the
contract-tagged wgpu rejection log itself was missing.

Mirror of the FALSIFY-CPU-GPU-003 chain (#1428 visibility + #1429 drift
test) for FALSIFY-CPU-GPU-005:
- Adds `pub(crate) const WGPU_FALLBACK_LOG_PREFIX = "[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected"`
- Updates `try_apr_wgpu_inference` to emit it on `GpuDevice::new()` failure
  (alongside the existing `[GH-559]` runbook tag — both, not either)
- 3 new unit tests:
  * `wgpu_fallback_log_prefix_is_contract_tagged` (symmetric to the CUDA test)
  * `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` (symmetry guard:
    both prefixes must start with the same contract ID and end with
    "path rejected" so grep recipes work uniformly across backends)

Five Whys:
1. Why was this gap in v1.2.0? PR #1430 conflated "make existing wgpu log
   visible" (done) with "add contract-tagged rejection log" (deferred). The
   contract's prediction text wrote the second; the code shipped only the
   first.
2. Why catch it now? `--features hub` build healthy across PRs #1432-#1434
   means the test surface is reliable; this is the natural follow-up.
3. Why the symmetry test? `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag`
   locks in that BOTH backends use the same `[CONTRACT_ID] <backend> path
   rejected` shape. Without it a future PR could drift one but not the
   other and grep recipes would silently skip backends.
4. Why keep `[GH-559]` alongside the new contract tag? Runbook continuity
   — humans tracking that issue tag in logs over time shouldn't lose it.
5. Why no contract version bump? v1.2.0 already specifies this tag in
   FALSIFY-CPU-GPU-005's prediction; this PR closes the implementation
   gap. Bumping again would imply a contract semantic change, which
   isn't happening — only the code catches up to the contract.

Verified locally:
- `cargo test -p aprender-serve --features cuda --lib --release fallback_log_prefix`
    → 3/3 pass (cuda + wgpu + symmetry)
- `cargo fmt --all -- --check` → no diff in touched file

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant