contract(apr-cli-trace-save-tensor-v1): v1.3.0 → v1.4.0 — FUNCTIONAL discharge for FALSIFY-009/010/011 #1422

Merged
noahgift merged 1 commit into main from chore/contract-trace-save-tensor-v1.4.0-functional-discharge on May 3, 2026

noahgift (Contributor) commented May 3, 2026

Summary

  • Bumps contracts/apr-cli-trace-save-tensor-v1.yaml v1.3.0 → v1.4.0
  • Promotes 3 FALSIFY entries from PARTIAL_ALGORITHM_LEVEL → FUNCTIONAL:
    • FALSIFY-APR-TRACE-SAVE-009 (apr_diff_values_compat)
    • FALSIFY-APR-TRACE-SAVE-010 (LmHead step-2 capture)
    • FALSIFY-APR-TRACE-SAVE-011 (CLI dispatch wire-up)
  • Adds functional_evidence blocks to each, citing the 2026-05-03 live smoke producing 16 APRT stage files on the canonical 7B teacher
  • pv validate returns 0 errors / 0 warnings

Why

SHIP-007 PR-C-real step 3 (PRs #1416 + #1421) lands per-layer SaveTensorPlan threading through forward_traced. A live smoke run on the canonical Qwen2.5-Coder-7B-Instruct-Q4K produced all 16 APRT stages in a single forward pass, with byte counts matching the algebraic prediction. This is the empirical evidence that promotes the three falsifiers from algorithm-level (unit tests pinning the impl) to functional-level (impl observed running end-to-end on a real model on real hardware).

Five Whys

  1. Why FUNCTIONAL not DISCHARGED? FUNCTIONAL = "behavior verified in a single live run"; DISCHARGED requires oracle bytewise equivalence (HF Transformers reference comparison), which is the next milestone.
  2. Why bundle the bump? All three were at PARTIAL with separate _evidence blocks; promoting them together at FUNCTIONAL is the natural semver event tied to step 3.
  3. Why functional_evidence alongside algorithm_evidence? Drift-prevention: readers need BOTH the unit tests that pin the impl AND the live byte-counts that validate end-to-end.
  4. Why cite the 16 stage names verbatim? They're the surface over which the next milestone (layer-0 element-wise bisection vs HF) will diff.
  5. Why no v1.5.0 ACTIVE bump? status: PROPOSED tracks doc lifecycle, not falsifier maturity. ACTIVE promotion requires a separate spec audit.
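
The maturity ladder described in item 1 can be sketched as an ordered enum. This is a hypothetical illustration only: the names `Maturity` and `highest_supported` and the three boolean inputs are assumptions made for the example, not part of the contract schema or the `pv` tooling.

```python
from enum import IntEnum
from typing import Optional


class Maturity(IntEnum):
    """Falsifier maturity levels named in this PR, lowest to highest."""
    PARTIAL_ALGORITHM_LEVEL = 1
    FUNCTIONAL = 2
    DISCHARGED = 3


def highest_supported(unit_tests_pin_impl: bool,
                      live_run_observed: bool,
                      oracle_bytewise_match: bool) -> Optional[Maturity]:
    """Return the highest level the available evidence supports (hypothetical rule)."""
    if not unit_tests_pin_impl:
        return None  # no algorithm-level evidence at all
    level = Maturity.PARTIAL_ALGORITHM_LEVEL
    if live_run_observed:
        level = Maturity.FUNCTIONAL
        if oracle_bytewise_match:
            level = Maturity.DISCHARGED
    return level


# Evidence cited in this PR: unit tests plus one live run, no oracle comparison yet.
print(highest_supported(True, True, False).name)
```

Under these assumptions the three falsifiers stop at FUNCTIONAL until the HF Transformers comparison exists.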

Test plan

  • pv validate contracts/apr-cli-trace-save-tensor-v1.yaml — 0 errors / 0 warnings
  • Diff confined to single contract YAML (20 insertions, 10 deletions)
  • CI green
  • Auto-merge on green

🤖 Generated with Claude Code

noahgift enabled auto-merge (squash) May 3, 2026 11:44

…discharge for FALSIFY-009/010/011

End-to-end live smoke on canonical Qwen2.5-Coder-7B-Instruct-Q4K teacher
(RTX 4090 lambda-labs, 2026-05-03) produced all 16 APRT stage files in a
single forward pass via SHIP-007 PR-C-real step 3 (PRs #1416 + #1421):

- 14 per-layer (layer-0/*): embedding, attn_norm, qkv_matmul, qkv_bias,
  attention, attn_out, post_attn_residual, ffn_norm, ffn_gate, ffn_up,
  ffn_silu, ffn_swigl, ffn_out, post_ffn_residual
- 2 whole-model (root/*): final_norm, lm_head
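
A minimal sketch of the stage list above as data, useful for checking that a run produced every expected stage. The key format (`layer-0/<name>`, `root/<name>`) follows the notation used in this message; whether the real files carry an extension or a different path layout is not stated here, and `stage_keys` is a hypothetical helper.

```python
# Stage names copied verbatim from the list above.
PER_LAYER_STAGES = [
    "embedding", "attn_norm", "qkv_matmul", "qkv_bias",
    "attention", "attn_out", "post_attn_residual", "ffn_norm",
    "ffn_gate", "ffn_up", "ffn_silu", "ffn_swigl",
    "ffn_out", "post_ffn_residual",
]
WHOLE_MODEL_STAGES = ["final_norm", "lm_head"]


def stage_keys(layer: int = 0) -> list:
    """Build the expected stage keys for one traced layer plus the root stages."""
    per_layer = [f"layer-{layer}/{name}" for name in PER_LAYER_STAGES]
    whole_model = [f"root/{name}" for name in WHOLE_MODEL_STAGES]
    return per_layer + whole_model


keys = stage_keys()
print(len(keys))  # 14 per-layer + 2 whole-model
```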

All 16 file sizes match `12 + 4 * dim_product` for their stage type
(3584 hidden / 18944 intermediate / 4608 qkv / 152064 vocab).
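
The size formula can be checked by hand with a short sketch. It assumes that `dim_product` is the product of the saved tensor's shape and, for the printed values, that each stage captures a single position so the product equals the stage width; the sequence length of the smoke run is not stated here. The names `expected_aprt_size` and `WIDTHS` are hypothetical.

```python
from math import prod

HEADER_BYTES = 12       # fixed per-file overhead from the formula above
BYTES_PER_ELEMENT = 4   # 4 bytes per element, per the formula above

# Widths quoted in this message for the canonical 7B teacher.
WIDTHS = {
    "hidden": 3584,
    "intermediate": 18944,
    "qkv": 4608,
    "vocab": 152064,
}


def expected_aprt_size(shape) -> int:
    """Expected file size in bytes: 12 + 4 * dim_product."""
    return HEADER_BYTES + BYTES_PER_ELEMENT * prod(shape)


for kind, width in WIDTHS.items():
    # Assumption: one captured position, so shape is (width,).
    print(kind, expected_aprt_size((width,)))
# hidden 14348
# intermediate 75788
# qkv 18444
# vocab 608268
```

Comparing these numbers against the sizes reported by the filesystem for each stage file is the kind of check the "match" claim refers to; a longer capture would scale the element count by the number of positions.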

Three FALSIFY entries promoted PARTIAL_ALGORITHM_LEVEL → FUNCTIONAL:
- FALSIFY-APR-TRACE-SAVE-009 (apr_diff_values_compat — APRT byte format)
- FALSIFY-APR-TRACE-SAVE-010 (LmHead step-2 capture)
- FALSIFY-APR-TRACE-SAVE-011 (CLI dispatch wire-up)

`pv validate contracts/apr-cli-trace-save-tensor-v1.yaml` returns
0 errors / 0 warnings.

Five Whys
1. Why FUNCTIONAL not DISCHARGED? FUNCTIONAL means "behavior empirically
   verified in single live run". DISCHARGED requires either bytewise
   equivalence vs an oracle OR repeatable run-to-run cross-machine
   verification. SHIP-007 PR-C-real step 3 just ships the surface; the
   oracle comparison (APR vs HF Transformers reference) is the next leg.
2. Why bump on PR #1421 merge, not on a single follow-up commit? Each of
   FALSIFY-009/010/011 was already at PARTIAL with separate `_evidence`
   blocks; bumping all three together at FUNCTIONAL is the natural
   semver event.
3. Why `functional_evidence` block (alongside existing `algorithm_evidence`)?
   Drift-prevention: future readers need to see BOTH the algorithm-level
   tests that pin the impl AND the live byte-counts/file-counts that
   validate the impl runs end-to-end on the canonical teacher.
4. Why hand-cite the 16 stage names in the contract? They're the surface
   over which the next milestone (SHIP-007 layer-0 element-wise bisection
   vs HF reference) will diff — making them visible in the contract is
   the drift-prevention pin.
5. Why no v1.5.0 status: ACTIVE bump? The metadata `status: PROPOSED`
   tracks the document's lifecycle, not the falsifier maturity. Promoting
   to ACTIVE requires a separate decision after the spec audit (out of
   scope for this paperwork commit).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift force-pushed the chore/contract-trace-save-tensor-v1.4.0-functional-discharge branch from 0183d98 to 7b2070b May 3, 2026 12:06
noahgift merged commit f4b7f72 into main May 3, 2026
10 checks passed
noahgift deleted the chore/contract-trace-save-tensor-v1.4.0-functional-discharge branch May 3, 2026 12:25