contract(apr-cli-trace-save-tensor-v1): v1.3.0 → v1.4.0 — FUNCTIONAL discharge for FALSIFY-009/010/011#1422
Merged
noahgift merged 1 commit intoMay 3, 2026
Conversation
…discharge for FALSIFY-009/010/011 End-to-end live smoke on canonical Qwen2.5-Coder-7B-Instruct-Q4K teacher (RTX 4090 lambda-labs, 2026-05-03) produced all 16 APRT stage files in a single forward pass via SHIP-007 PR-C-real step 3 (PRs #1416 + #1421): - 14 per-layer (layer-0/*): embedding, attn_norm, qkv_matmul, qkv_bias, attention, attn_out, post_attn_residual, ffn_norm, ffn_gate, ffn_up, ffn_silu, ffn_swigl, ffn_out, post_ffn_residual - 2 whole-model (root/*): final_norm, lm_head All 16 file sizes match `12 + 4 * dim_product` for their stage type (3584 hidden / 18944 intermediate / 4608 qkv / 152064 vocab). Three FALSIFY entries promoted PARTIAL_ALGORITHM_LEVEL → FUNCTIONAL: - FALSIFY-APR-TRACE-SAVE-009 (apr_diff_values_compat — APRT byte format) - FALSIFY-APR-TRACE-SAVE-010 (LmHead step-2 capture) - FALSIFY-APR-TRACE-SAVE-011 (CLI dispatch wire-up) `pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` returns 0 errors / 0 warnings. Five Whys 1. Why FUNCTIONAL not DISCHARGED? FUNCTIONAL means "behavior empirically verified in single live run". DISCHARGED requires either bytewise equivalence vs an oracle OR repeatable run-to-run cross-machine verification. SHIP-007 PR-C-real step 3 just ships the surface; the oracle comparison (APR vs HF Transformers reference) is the next leg. 2. Why bump on PR #1421 merge, not on a single follow-up commit? Each of FALSIFY-009/010/011 was already at PARTIAL with separate `_evidence` blocks; bumping all three together at FUNCTIONAL is the natural semver event. 3. Why `functional_evidence` block (alongside existing `algorithm_evidence`)? Drift-prevention: future readers need to see BOTH the algorithm-level tests that pin the impl AND the live byte-counts/file-counts that validate the impl runs end-to-end on the canonical teacher. 4. Why hand-cite the 16 stage names in the contract? They're the surface over which the next milestone (SHIP-007 layer-0 element-wise bisection vs HF reference) will diff — making them visible in the contract is the drift-prevention pin. 5. Why no v1.5.0 status: ACTIVE bump? The metadata `status: PROPOSED` tracks the document's lifecycle, not the falsifier maturity. Promoting to ACTIVE requires a separate decision after the spec audit (out of scope for this paperwork commit). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
0183d98 to
7b2070b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
contracts/apr-cli-trace-save-tensor-v1.yamlv1.3.0 → v1.4.0apr_diff_values_compat)functional_evidenceblocks to each, citing the 2026-05-03 live smoke producing 16 APRT stage files on the canonical 7B teacherpv validatereturns 0 errors / 0 warningsWhy
SHIP-007 PR-C-real step 3 (PRs #1416 + #1421) lands per-layer SaveTensorPlan threading through
forward_traced. Live smoke on canonical Qwen2.5-Coder-7B-Instruct-Q4K produced all 16 APRT stages in a single forward pass with byte counts matching algebraic prediction. This is the empirical evidence that promotes the three falsifiers from algorithm-level (unit tests pinning the impl) to functional-level (impl observed running end-to-end on real model on real hardware).Five Whys
_evidenceblocks; promoting them together at FUNCTIONAL is the natural semver event tied to step 3.functional_evidencealongsidealgorithm_evidence? Drift-prevention: readers need BOTH the unit tests that pin the impl AND the live byte-counts that validate end-to-end.status: PROPOSEDtracks doc lifecycle, not falsifier maturity. ACTIVE promotion requires a separate spec audit.Test plan
pv validate contracts/apr-cli-trace-save-tensor-v1.yaml— 0 errors / 0 warnings🤖 Generated with Claude Code