test(apr-cpu-vs-gpu-output-parity-v1): drift-prevention for CUDA fallback log tag by noahgift · Pull Request #1429 · paiml/aprender

noahgift · 2026-05-03T15:06:43Z

Summary

Locks in the user-visible eprintln tag from PR #1428 (fix(apr-cpu-vs-gpu-output-parity-v1): make CUDA fallback decision visible without --verbose (v1.0→v1.1, PROPOSED→ACTIVE)) against two regression classes: renaming the contract ID without a matching contract bump, and re-wrapping the eprintln in if verbose.
Promotes the literal to pub(crate) const CUDA_FALLBACK_LOG_PREFIX in gguf_gpu_generate.rs.
Adds unit test cuda_fallback_log_prefix_is_contract_tagged (no GPU required).
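A minimal sketch of what the promoted constant and its unit test could look like. The literal's value is an assumption inferred from the commit messages on this page (the contract ID in brackets followed by "CUDA path rejected"); it is not copied from gguf_gpu_generate.rs.

```rust
// Assumed value: inferred from the PR text, not taken from the real source file.
pub(crate) const CUDA_FALLBACK_LOG_PREFIX: &str =
    "[apr-cpu-vs-gpu-output-parity-v1] CUDA path rejected";

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn cuda_fallback_log_prefix_is_contract_tagged() {
        // Locks the contract ID at the start of the line so grep recipes keep working.
        assert!(CUDA_FALLBACK_LOG_PREFIX.starts_with("[apr-cpu-vs-gpu-output-parity-v1]"));
        // Locks the human-readable backend name.
        assert!(CUDA_FALLBACK_LOG_PREFIX.contains("CUDA path rejected"));
    }
}
```

Because the test only inspects a string constant, it runs without a GPU.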

Why this PR

Per the v1.1.0 contract apr-cpu-vs-gpu-output-parity-v1, the drift-prevention test should grep stderr for the contract tag. A full integration test would need a real CUDA GPU plus a deliberately broken model, which is not viable in CI. A const-shape unit test catches the same regression class at compile/test time.
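For illustration, the stderr grep the contract describes can be sketched as a pure function over captured output. The helper name `contract_tagged_lines` is hypothetical and is not part of this PR; only the tag string comes from the PR text.

```rust
/// Contract tag as written in the PR text.
const CONTRACT_TAG: &str = "[apr-cpu-vs-gpu-output-parity-v1]";

/// Hypothetical helper: returns every captured stderr line that carries the contract tag.
fn contract_tagged_lines(stderr: &str) -> Vec<&str> {
    stderr
        .lines()
        .filter(|line| line.contains(CONTRACT_TAG))
        .collect()
}
```

An integration test would feed this the stderr of a real run, which is the step that needs a GPU and a failing model; the const-shape test avoids that requirement.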

Test plan

cargo build -p aprender-serve --features cuda --release — clean
cargo test -p aprender-serve --features cuda --lib cuda_fallback_log_prefix_is_contract_tagged — 1 passed

🤖 Generated with Claude Code

…back log tag

Promotes the FALSIFY-CPU-GPU-003 jidoka eprintln tag to a `pub(crate) const` and adds a unit test asserting the prefix shape. Locks in PR #1428 against two regression classes:

1. Renaming the contract tag without bumping `apr-cpu-vs-gpu-output-parity-v1` in lockstep
2. Re-wrapping the eprintln in `if verbose { ... }` (which would re-introduce the silent-gibberish behaviour v6 fixed).

Five Whys:

1. Why this test? Because the v1.1.0 contract requires "stderr grep tag visibility" but a future refactor could quietly delete or rename the tag.
2. Why a string-literal const + assert vs a full integration test? The full test would need a real CUDA GPU + a model that fails parity, which can't run in CI deterministically. A const-shape test catches the regression class at compile/test time without GPU.
3. Why two assertions (starts_with + contains)? `starts_with` locks the contract ID prefix for greppability; `contains("CUDA path rejected")` locks the human-readable backend name so users still understand the message even if the contract ID changes between major versions.

Verified locally:
- cargo build -p aprender-serve --features cuda --release → clean
- cargo test -p aprender-serve --features cuda --lib cuda_fallback_log_prefix_is_contract_tagged → 1 passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
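The second regression class (re-gating the message behind `if verbose`) can be illustrated with a small sketch. The helper `report_cuda_fallback` is hypothetical, the constant's value is assumed, and the sketch writes to a generic `Write` rather than calling `eprintln!` so the behaviour can be checked in a test.

```rust
use std::io::Write;

// Assumed value, inferred from the commit message above.
pub(crate) const CUDA_FALLBACK_LOG_PREFIX: &str =
    "[apr-cpu-vs-gpu-output-parity-v1] CUDA path rejected";

/// Hypothetical helper: the tagged line is written regardless of `verbose`.
fn report_cuda_fallback<W: Write>(
    out: &mut W,
    verbose: bool,
    reason: &str,
) -> std::io::Result<()> {
    // Always emitted: this is the line users and grep recipes rely on.
    writeln!(out, "{}, attempting fallback: {}", CUDA_FALLBACK_LOG_PREFIX, reason)?;
    if verbose {
        // Only supplementary detail is gated on verbosity.
        writeln!(out, "  (verbose) additional diagnostics would go here")?;
    }
    Ok(())
}
```

Moving the first `writeln!` inside the `if verbose` block would reproduce the silent-fallback behaviour the commit message warns about.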

…CPU-GPU-005 wgpu visibility + parity-gate entry (#1430)

Closes the second half of the silent-fallback loophole. After PR #1428 made CUDA rejection visible (FALSIFY-CPU-GPU-003), the wgpu fallback path still ships gibberish silently for the canonical 7B teacher because (a) its init and "Backend: wgpu (Vulkan)" logs are verbose-gated and (b) it has no parity_gate analog to CUDA's. This PR lands the (a) visibility fix immediately and binds the (b) parity-gate at PARTIAL_ALGORITHM_LEVEL pending a follow-up implementation (~100-150 LOC, requires extracting the per-token wgpu decode loop body into a callable single-step function).

Five Whys:

1. Why ship visibility before the parity gate? Visibility is one-line, low-risk, and immediately useful: users now see "Backend: wgpu (Vulkan)" on stderr without --verbose, so they know which backend is serving their tokens after CUDA falls through.
2. Why not full gate now? wgpu's existing API doesn't expose a single-step forward; adding one means refactoring the autoregressive loop body. Doable but a bigger PR, so keep this one bounded.
3. Why bump v1.1.0 → v1.2.0 not v2.0.0? FALSIFY-CPU-GPU-005 is additive; no existing falsifier semantics changed. Minor bump per semver.

Code: gguf_gpu_generate.rs:23-32 (try_wgpu_generate) and 311-326 (try_apr_wgpu_inference) drop `if verbose { ... }` from the wgpu init/Backend log lines.

Verification:
- pv validate contracts/apr-cpu-vs-gpu-output-parity-v1.yaml → 0 errors
- cargo build -p aprender-serve --features cuda --release → clean
- cargo test -p aprender-serve --features cuda --lib cuda_fallback_log_prefix_is_contract_tagged → 1 passed (existing drift-prevention from PR #1429 still green)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
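The deferred refactor (pulling the per-token loop body out into a single-step function so a parity gate can call it) might look roughly like the sketch below. Everything here is hypothetical: the `SingleStep` trait, `generate`, `first_step_matches`, and the `Toy` backend are illustrative stand-ins, not the wgpu or aprender API, and comparing a single step against a reference is only one assumed way a gate could work.

```rust
/// Hypothetical stand-in for a backend that exposes one decode step.
trait SingleStep {
    /// Consume one token id and return the next token id.
    fn step(&mut self, token: u32) -> u32;
}

/// The autoregressive loop, now expressed in terms of `step`.
fn generate<B: SingleStep>(backend: &mut B, first: u32, max_new: usize) -> Vec<u32> {
    let mut out = Vec::with_capacity(max_new);
    let mut tok = first;
    for _ in 0..max_new {
        tok = backend.step(tok);
        out.push(tok);
    }
    out
}

/// Assumed gate shape: run one step on a reference and a candidate and compare.
fn first_step_matches<A: SingleStep, B: SingleStep>(
    reference: &mut A,
    candidate: &mut B,
    prompt_token: u32,
) -> bool {
    reference.step(prompt_token) == candidate.step(prompt_token)
}

/// Toy backend used only to exercise the sketch.
struct Toy {
    offset: u32,
}

impl SingleStep for Toy {
    fn step(&mut self, token: u32) -> u32 {
        token.wrapping_add(self.offset)
    }
}
```

With the step isolated, the gate can run before any tokens are served, which is what the loop-only shape does not allow.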

… drift-prevention tests (#1435)

Closes the contract drift between v1.2.0's prediction and the actual code: contract `apr-cpu-vs-gpu-output-parity-v1` v1.2.0 (PR #1430) said the wgpu rejection log should emit `[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected, attempting fallback: ...` symmetric to the CUDA tag, but #1430 only made the existing `[GH-559]`/`Backend:` logs unconditional; the contract-tagged wgpu rejection log itself was missing.

Mirror of the FALSIFY-CPU-GPU-003 chain (#1428 visibility + #1429 drift test) for FALSIFY-CPU-GPU-005:

- Adds `pub(crate) const WGPU_FALLBACK_LOG_PREFIX = "[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected"`
- Updates `try_apr_wgpu_inference` to emit it on `GpuDevice::new()` failure (alongside the existing `[GH-559]` runbook tag: both, not either)
- 3 new unit tests:
  * `wgpu_fallback_log_prefix_is_contract_tagged` (symmetric to the CUDA test)
  * `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` (symmetry guard: both prefixes must start with the same contract ID and end with "path rejected" so grep recipes work uniformly across backends)

Five Whys:

1. Why was this gap in v1.2.0? PR #1430 conflated "make existing wgpu log visible" (done) with "add contract-tagged rejection log" (deferred). The contract's prediction text wrote the second; the code shipped only the first.
2. Why catch it now? `--features hub` build healthy across PRs #1432-#1434 means the test surface is reliable; this is the natural follow-up.
3. Why the symmetry test? `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` locks in that BOTH backends use the same `[CONTRACT_ID] <backend> path rejected` shape. Without it a future PR could drift one but not the other and grep recipes would silently skip backends.
4. Why keep `[GH-559]` alongside the new contract tag? Runbook continuity: humans tracking that issue tag in logs over time shouldn't lose it.
5. Why no contract version bump? v1.2.0 already specifies this tag in FALSIFY-CPU-GPU-005's prediction; this PR closes the implementation gap. Bumping again would imply a contract semantic change, which isn't happening; only the code catches up to the contract.

Verified locally:
- `cargo test -p aprender-serve --features cuda --lib --release fallback_log_prefix` → 3/3 pass (cuda + wgpu + symmetry)
- `cargo fmt --all -- --check` → no diff in touched file

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
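The symmetry guard described above can be sketched as follows. The wgpu literal is the one quoted in the commit message; the CUDA literal is an assumption inferred from the earlier commit, and `has_contract_shape` is a hypothetical helper introduced only for this illustration.

```rust
const CONTRACT_TAG: &str = "[apr-cpu-vs-gpu-output-parity-v1]";

// Assumed value, inferred from the CUDA drift-prevention commit.
pub(crate) const CUDA_FALLBACK_LOG_PREFIX: &str =
    "[apr-cpu-vs-gpu-output-parity-v1] CUDA path rejected";
// Value as quoted in the commit message above.
pub(crate) const WGPU_FALLBACK_LOG_PREFIX: &str =
    "[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected";

/// Hypothetical helper: checks the `[CONTRACT_ID] <backend> path rejected` shape.
fn has_contract_shape(prefix: &str) -> bool {
    prefix.starts_with(CONTRACT_TAG) && prefix.ends_with("path rejected")
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn cuda_and_wgpu_fallback_log_prefixes_share_contract_tag() {
        // Both backends must match the same shape so one grep covers both.
        assert!(has_contract_shape(CUDA_FALLBACK_LOG_PREFIX));
        assert!(has_contract_shape(WGPU_FALLBACK_LOG_PREFIX));
    }
}
```

Checking both constants against one shared predicate means a rename in one backend fails the test instead of silently dropping out of grep results.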

noahgift merged commit 6777383 into main May 3, 2026
11 checks passed

noahgift deleted the feat/falsify-cpu-gpu-003-drift-prevention-test branch May 3, 2026 15:32

noahgift mentioned this pull request May 3, 2026

contract(apr-cpu-vs-gpu-output-parity-v1): v1.2.0 — FALSIFY-CPU-GPU-005 wgpu visibility + parity-gate entry #1430

Merged

4 tasks

noahgift mentioned this pull request May 3, 2026

spec(ship-two-models-spec): v2.86.0 — §41 records apr-cpu-vs-gpu-output-parity-v1 chain (PRs #1427-#1430) #1431

Merged

3 tasks

noahgift merged 1 commit into main from feat/falsify-cpu-gpu-003-drift-prevention-test
