contract(gpu-training-backend-v1): GATE-GPUTRAIN-004 verdict pending → pass (v1.4 → v1.5) #1071
Merged
…→ pass — spec §20 + #1059 evidence — v1.4.0 → v1.5.0

GATE-GPUTRAIN-004 (370M step-time budget < 500 ms on RTX 4090) was marked `verdict: pending` even though its paired falsification test, FALSIFY-GPUTRAIN-005, has been DISCHARGED since 2026-04-24 with a median step time of 101.30 ms (20.3% of budget). This contract bump flips the gate to `verdict: pass` and adds a `verdict_basis` field citing both pieces of evidence:

1. **FALSIFY-GPUTRAIN-005 evidence** (canonical config, seq_len=2048, batch=1): median 101.30 ms across 25 steps on noah-Lambda-Vector (RTX 4090); see `evidence/task-132/`.
2. **§20 evidence** (PR #1070, different config, seq_len=512): median 264.74 ms across 100 steps; see `evidence/task-132-residual-b/`.

Both medians are well under the 500 ms ceiling. Two evidence files at different config bands show that budget compliance is robust at this margin.

Contract version v1.4.0 → v1.5.0 (additive metadata, no rule change). `pv validate`: 0 errors, 0 warnings.

This is a contract-cosmetic flip. GATE-GPUTRAIN-004's underlying invariant has been satisfied since 2026-04-24; the gate read `verdict: pending` only because its own pointer to that evidence was missing.

References:
- spec §20 (PR #1070): live evidence capture 2026-04-26
- spec §19.4 Residual B: this is the contractual durable verdict
- evidence/task-132/rtx4090-370m-step-budget-and-repro.json
- evidence/task-132-residual-b/cuda-50step-2026-04-26.json

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5f962ce to 11f725c
Summary
- GATE-GPUTRAIN-004: `pending` → `pass` (370M step-time budget < 500 ms on RTX 4090).
- `gpu-training-backend-v1.yaml`: v1.4.0 → v1.5.0.

Why
GATE-GPUTRAIN-004's paired falsification test, FALSIFY-GPUTRAIN-005, has been DISCHARGED since 2026-04-24 with a median of 101.30 ms (20.3% of the 500 ms budget). The gate's own `verdict: pending` was a contract-cosmetic gap: the underlying invariant was already satisfied.

Evidence basis (now cited inline as `verdict_basis`)

- `evidence/task-132/`
- `evidence/task-132-residual-b/`

Two evidence files at different config bands show that budget compliance is robust at this margin (well below 500 ms in both).
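The budget arithmetic behind the gate can be sketched as below. This is a minimal illustration that takes the two medians reported in this PR as inputs; the `Evidence` class, its fields, and the helper names are hypothetical and do not describe the layout of the evidence JSON files or anything `pv` does.

```python
# Hypothetical sketch of the GATE-GPUTRAIN-004 budget check.
# Evidence and its fields are illustrative, not the evidence-file schema;
# the medians and step counts are the values reported in this PR.
from dataclasses import dataclass

BUDGET_MS = 500.0  # gate ceiling: 370M step time < 500 ms on RTX 4090


@dataclass
class Evidence:
    path: str         # evidence directory cited in verdict_basis
    config: str       # config band of the run
    median_ms: float  # reported median step time
    steps: int        # number of timed steps


def budget_fraction(ev: Evidence, budget_ms: float = BUDGET_MS) -> float:
    """Median step time as a percentage of the budget."""
    return 100.0 * ev.median_ms / budget_ms


def within_budget(ev: Evidence, budget_ms: float = BUDGET_MS) -> bool:
    """The gate bound is strict: the median must be below the ceiling."""
    return ev.median_ms < budget_ms


EVIDENCE = [
    Evidence("evidence/task-132/", "seq_len=2048, batch=1", 101.30, 25),
    Evidence("evidence/task-132-residual-b/", "seq_len=512", 264.74, 100),
]

if __name__ == "__main__":
    for ev in EVIDENCE:
        print(f"{ev.path} ({ev.config}): {ev.median_ms:.2f} ms over {ev.steps} steps "
              f"= {budget_fraction(ev):.1f}% of budget, pass={within_budget(ev)}")
```

With these inputs the first band sits at about 20.3% of the budget (matching the figure above) and the second at about 52.9%, so both clear the ceiling with margin.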
Validation
`pv validate contracts/entrenar/gpu-training-backend-v1.yaml`: 0 errors, 0 warnings.

Test plan

- `pv validate` passes

Stacks under
Coverage tally impact
GATE-GPUTRAIN-004 was already counted as PARTIAL_ALGORITHM_LEVEL in the 33+12 tally. Promoting it to `verdict: pass` keeps the overall PARTIAL count flat (it was already counted) and leaves the DISCHARGED count unchanged (the falsifier was already DISCHARGED). The verdict flip is bookkeeping that aligns the gate state with the falsifier state.
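The bookkeeping argument can be illustrated with a hypothetical sketch. It assumes the tally is keyed on each gate's coverage level and each falsifier's status, with the gate verdict not an input; under that assumption, flipping a stale `pending` gate to `pass` leaves both counts unchanged. `Gate`, `Falsifier`, `stale_gates`, and `tally` are illustrative names, not part of the contract schema or the `pv` tool.

```python
# Hypothetical sketch: find gates still "pending" although their paired
# falsifier is DISCHARGED, flip them, and confirm the tally does not move.
# Data shapes and the tally rule are assumptions, not the real contract schema.
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Falsifier:
    id: str
    status: str  # e.g. "DISCHARGED"


@dataclass(frozen=True)
class Gate:
    id: str
    verdict: str       # "pending" or "pass"
    coverage: str      # e.g. "PARTIAL_ALGORITHM_LEVEL"
    falsifier_id: str  # paired falsification test


def stale_gates(gates, falsifiers):
    """Gates still marked pending although their paired falsifier is DISCHARGED."""
    status = {f.id: f.status for f in falsifiers}
    return [g for g in gates
            if g.verdict == "pending" and status.get(g.falsifier_id) == "DISCHARGED"]


def tally(gates, falsifiers):
    """Assumed tally rule: count coverage levels and falsifier statuses; verdict is ignored."""
    return Counter(g.coverage for g in gates), Counter(f.status for f in falsifiers)


gates = [Gate("GATE-GPUTRAIN-004", "pending", "PARTIAL_ALGORITHM_LEVEL", "FALSIFY-GPUTRAIN-005")]
falsifiers = [Falsifier("FALSIFY-GPUTRAIN-005", "DISCHARGED")]

# Flip every stale gate to pass, as this PR does for GATE-GPUTRAIN-004.
stale_ids = {g.id for g in stale_gates(gates, falsifiers)}
flipped = [replace(g, verdict="pass") if g.id in stale_ids else g for g in gates]

before, after = tally(gates, falsifiers), tally(flipped, falsifiers)
print("stale before:", sorted(stale_ids))
print("stale after:", [g.id for g in stale_gates(flipped, falsifiers)])
print("tally unchanged:", before == after)
```

A check like this could run alongside contract validation to catch the same gate/falsifier drift before it accumulates, though whether `pv` offers anything similar is not stated here.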
🤖 Generated with Claude Code