test(apr-cli): FALSIFY-ATTN-SUB-003 — apr diff --values per-stage-agnostic for attn_scores + attn_softmax#1456
Merged
…ostic for attn_scores + attn_softmax
Per `docs/specifications/aprender-train/ship-two-models-spec.md` §47.6 step 5
of the SHIP-007 layer-0 attention bisection cascade. Spec said this was
"likely 1 test + 0 LOC change if the loader is per-stage-agnostic" — and
empirical inspection of `crates/apr-cli/src/commands/diff_05_aprt_stage.rs`
confirmed it: stage names are encoded only in OUTPUT FILENAMES, never in
the APRT binary content (`b"APRT" + layer_u32_le + dim_u32_le + f32_le_body`).
This drift-prevention test PR locks that contract.
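To make the layout concrete, here is a minimal sketch of the byte format quoted above. `encode_aprt`, `parse_aprt`, and `AprtPayload` are hypothetical names used only for illustration; they are not the functions in `diff_05_aprt_stage.rs`. The length check assumes the dim field holds the element count of the body, which the real loader may or may not enforce.

```rust
/// Magic bytes that open an APRT stage file, as quoted in the description.
const APRT_MAGIC: &[u8; 4] = b"APRT";

/// Illustrative parsed view of an APRT payload (hypothetical type).
#[derive(Debug, PartialEq)]
struct AprtPayload {
    layer: u32,
    dim: u32,
    values: Vec<f32>,
}

/// Encodes `b"APRT" + layer_u32_le + dim_u32_le + f32_le_body`.
/// Note that no stage name is written anywhere in the bytes.
fn encode_aprt(layer: u32, values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + values.len() * 4);
    out.extend_from_slice(APRT_MAGIC);
    out.extend_from_slice(&layer.to_le_bytes());
    out.extend_from_slice(&(values.len() as u32).to_le_bytes());
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Parses the layout above. Returns `None` on a bad magic, a short header,
/// or a body whose length disagrees with the dim field (assumed check).
fn parse_aprt(bytes: &[u8]) -> Option<AprtPayload> {
    if bytes.len() < 12 || !bytes.starts_with(APRT_MAGIC) {
        return None;
    }
    let layer = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let dim = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    let body = &bytes[12..];
    if body.len() != (dim as usize).checked_mul(4)? {
        return None;
    }
    let values = body
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Some(AprtPayload { layer, dim, values })
}

#[test]
fn aprt_header_round_trips() {
    let bytes = encode_aprt(0, &[0.25, 0.75]);
    let parsed = parse_aprt(&bytes).expect("valid payload");
    assert_eq!(parsed.layer, 0);
    assert_eq!(parsed.values, vec![0.25, 0.75]);
}
```

Because the stage name never enters `encode_aprt`, two files written for different stages with the same tensor are byte-identical, which is the property the tests in this PR pin.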
## Why
`contracts/trace-attn-sub-stages-v1.yaml` v1.1.0 SUB-003 invariant:
> Existing APRT recognition path generalizes to the 2 new stage IDs
> (attn_scores + attn_softmax) without per-stage hardcoding.
If anyone ever adds a per-stage `match stage { AttnScores => …, AttnSoftmax
=> …, _ => existing }` inside `is_aprt_stage_file`, `compute_aprt_stage_stats`,
or `run_aprt_stage_diff`, this test fails — forcing a contract bump in
`apr-cli-trace-save-tensor-v1.yaml` AND `trace-attn-sub-stages-v1.yaml`.
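The shape of such a drift-prevention check can be sketched as follows. This is not the test that landed: `looks_like_aprt` is a hypothetical stand-in for a magic-byte check, `detect_across_stages` is an illustrative helper, and `hypothetical_future_stage` is a made-up stage name. The idea is to write identical bytes under several stage filenames and require identical detection results.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical stand-in for a magic-byte check: reads only the first four
/// bytes of the file and never inspects the file name.
fn looks_like_aprt(path: &Path) -> io::Result<bool> {
    let mut magic = [0u8; 4];
    match fs::File::open(path)?.read_exact(&mut magic) {
        Ok(()) => Ok(&magic == b"APRT"),
        // Files shorter than four bytes cannot carry the magic.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

/// Writes the same payload under `layer_0_<stage>.aprt` for each stage name
/// and reports whether each file was detected as APRT.
fn detect_across_stages(dir: &Path, stages: &[&str]) -> io::Result<Vec<bool>> {
    let mut payload = b"APRT".to_vec();
    payload.extend_from_slice(&0u32.to_le_bytes()); // layer index 0
    payload.extend_from_slice(&1u32.to_le_bytes()); // one f32 in the body
    payload.extend_from_slice(&1.0f32.to_le_bytes());

    fs::create_dir_all(dir)?;
    let mut detected = Vec::with_capacity(stages.len());
    for stage in stages {
        let path = dir.join(format!("layer_0_{}.aprt", stage));
        fs::write(&path, &payload)?;
        detected.push(looks_like_aprt(&path)?);
    }
    Ok(detected)
}

#[test]
fn detection_ignores_stage_name() {
    let dir = std::env::temp_dir().join("aprt_stage_agnostic_sketch");
    let stages = ["attn_scores", "attn_softmax", "hypothetical_future_stage"];
    let detected = detect_across_stages(&dir, &stages).unwrap();
    assert!(detected.iter().all(|&ok| ok));
}
```

A per-stage `match` added to the detection path would make at least one of these filenames behave differently, so a test of this shape fails as soon as stage-specific behavior creeps in.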
## What landed
Two new unit tests in `crates/apr-cli/src/commands/diff_05_aprt_stage.rs`:
| Test | What it pins |
|------|--------------|
| `falsify_attn_sub_003_new_stages_per_stage_agnostic` | Magic-byte detection + cosine + RMS + e2e diff succeed for filenames `layer_0_attn_scores.aprt` + `layer_0_attn_softmax.aprt` at realistic shape `28*7*7=1372` (Qwen2.5-7B BOS layer-0). |
| `falsify_attn_sub_003_cosine_detects_softmax_divergence` | Mixed-perturbation cosine drops below 0.999 floor → bisection chain will reliably detect divergence at the softmax stage during FALSIFY-ATTN-SUB-004 LIVE on RTX 4090. |
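As a rough illustration of why a cosine floor of 0.999 catches a softmax-stage divergence at this shape, the sketch below builds a `28*7*7` tensor of uniform softmax rows and perturbs each row. Everything here is an assumption for illustration: the function names are hypothetical, the perturbation (shift `delta` of probability mass from the last entry of each row to the first) is not the "mixed perturbation" used by the real test, and RMS is taken over the elementwise difference, which may differ from what `compute_aprt_stage_stats` reports.

```rust
/// Cosine similarity with f64 accumulation. Returns `None` for mismatched
/// lengths, empty input, or a zero-norm vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some(dot / (na.sqrt() * nb.sqrt()))
}

/// Root-mean-square of the elementwise difference (assumed definition).
fn rms_diff(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let sum_sq: f64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| (f64::from(x) - f64::from(y)).powi(2))
        .sum();
    Some((sum_sq / a.len() as f64).sqrt())
}

/// `heads * seq` softmax rows of length `seq`, each uniform at `1/seq`.
fn uniform_softmax_rows(heads: usize, seq: usize) -> Vec<f32> {
    vec![1.0 / seq as f32; heads * seq * seq]
}

/// Moves `delta` of probability mass from the last entry of every row to the
/// first, so each row still sums to 1. `seq` must be non-zero.
fn perturb_rows(values: &[f32], seq: usize, delta: f32) -> Vec<f32> {
    let mut out = values.to_vec();
    for row in out.chunks_mut(seq) {
        let n = row.len();
        if n >= 2 {
            row[0] += delta;
            row[n - 1] -= delta;
        }
    }
    out
}

#[test]
fn perturbed_softmax_falls_below_floor() {
    let base = uniform_softmax_rows(28, 7); // 28 * 7 * 7 = 1372 values
    let perturbed = perturb_rows(&base, 7, 0.1);
    assert!(cosine_similarity(&base, &base).unwrap() > 0.999);
    assert!(cosine_similarity(&base, &perturbed).unwrap() < 0.999);
}
```

For this particular perturbation the cosine works out to about 0.937 (each row has squared norm 1/7 before and 1/7 + 0.02 after, with an unchanged dot product), comfortably under the floor, while an unperturbed tensor stays at 1.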
## How to apply
`cargo test -p apr-cli --lib aprt_stage_diff_tests` shows 13/13 PASS
(11 prior + 2 new). No production-code change.
## Five whys (why now, why this scope)
1. **Why a test PR rather than the live RTX 4090 bisection?**
§47.6 ranked-leverage list orders the cascade: step 5 (this PR) ships
before step 6 (HF FP16 oracle extension) before step 7 (live
bisection). Skipping the drift-prevention layer means a future
regression at the `is_aprt_stage_file` level would only be caught
during the live run — wasteful.
2. **Why 2 tests instead of 1?**
The spec said "1 test + 0 LOC". The 2nd test
(`falsify_attn_sub_003_cosine_detects_softmax_divergence`) is bonus
coverage for FALSIFY-ATTN-SUB-004 — it pins that the cosine metric
is sensitive enough to detect mixed-perturbation divergence, which
is the load-bearing predicate for the live bisection step.
3. **Why update `crates/apr-cli/src/commands/diff_05_aprt_stage.rs` rather than the contract YAML?**
The contract YAML lives in PR #1450 (still open). Modifying it from
a separate branch would conflict on merge. The test exercises the
real wired functions (`is_aprt_stage_file`, `compute_aprt_stage_stats`,
`run_aprt_stage_diff`), which is the algorithm-level evidence the
contract requires anyway.
4. **Why not bump SUB-003 status from PARTIAL_ALGORITHM_LEVEL to FUNCTIONAL?**
FUNCTIONAL discharge requires LIVE evidence — running `apr diff` on
actual saved tensors from a real model forward (HF FP16 oracle +
APR teacher). That comes in cascade step 7. This PR is
algorithm-level only (drift-prevention).
5. **Why not amend PR #1450 with this test?**
Single-piece flow. PR #1450 is auto-merge armed, BEHIND main, all
CI green. Pushing more commits restarts CI ~10 min. This test PR
is independent enough to land separately without growing the
merge train.
## Net effects
- **Coverage**: 11 → 11 + 2 = 13 tests in `aprt_stage_diff_tests` mod.
- **Falsifier**: FALSIFY-ATTN-SUB-003 algorithm-level evidence pinned via test.
- **MODEL-1 ship %**: unchanged at 91% (scaffold; ship % moves at SUB-004 LIVE).
- **MODEL-2 ship %**: unchanged at 57%.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 4, 2026

…tion cascade ALGORITHM-LEVEL COMPLETE (#1458)

After §47 recorded the cascade-started milestone (PRs #1450 + #1451 + #1452 scaffolding), the same-day continuation cycle closed §47.1 cascade roadmap steps 4-6 at the algorithm level via PRs #1455, #1456, #1457.

## What landed (§47.1 cascade roadmap)

| Step | PR | Discharge |
|------|----|-----------|
| 4 | #1455 | FALSIFY-ATTN-SUB-002 PARTIAL_ALGORITHM_LEVEL: wires `QPostRope`+`KPostRope`+`AttnScores`+`AttnSoftmax` in `forward_traced_with_plan`; closes §47.4 parent-contract drift as side effect |
| 5 | #1456 | FALSIFY-ATTN-SUB-003 algorithm-level pinned via 2 drift-prevention tests; 0 LOC production change (loader is genuinely per-stage-agnostic, as spec predicted) |
| 6 | #1457 | FALSIFY-ATTN-SUB-004 BLOCKER_FIXTURE_ABSENT → PARTIAL_ALGORITHM_LEVEL on merge: extends `scripts/generate_qwen25_coder_fp16_stages.py` with `--with-attn-substages` (default ON) installing per-instance `Qwen2Attention.forward` monkeypatch under `attn_implementation="eager"` |

## Toyota Way correction during research (PR #1457)

The pre-impl research note estimated **7 missing stages, ~140 LOC**. Live source inspection during PR #1457 found **3 already captured** via existing forward hooks (`make_qkv_hook` derives qkv_matmul/qkv_bias from q_proj/k_proj/v_proj outputs via bias subtraction; `hook_o_proj_pre` captures `attention` as input to o_proj). Net: **4 stages, ~80 LOC monkeypatch**. Per `feedback_no_guessing.md`. Cost-of-defect paid at the implementation layer (cheapest place once the research note had been authored from outdated docstring lines).

## Steps 7-8 require operator action

| Step | Blocker | Workaround |
|------|---------|-----------|
| 7 LIVE | (a) canonical `apr` binary built pre-#1451, rejects `attn_scores` stage. (b) PyTorch/CUDA driver mismatch on host. | (a) `cargo build --release --features cuda --bin apr`. (b) operator updates driver OR `--device cpu` (multi-min). |
| 8 fix | Gated on step 7 bisection finding. | n/a: discovery-driven scope. |

## Net effects

- Spec v2.92.0 → **v2.93.0**.
- §47.1 cascade roadmap: **6/8 steps algorithm-level COMPLETE**; steps 7-8 LIVE/operator-gated.
- Coverage tally: 20+32 → **20+36** (+4 PARTIAL_ALGORITHM_LEVEL from `trace-attn-sub-stages-v1` v1.1.0 falsifiers landing on main when #1450 merged: SUB-001/002/003/005). SUB-004 stays BLOCKER until #1457 ships.
- **MODEL-1 ship %**: unchanged at **91%** (cascade is scaffold; ship % moves at SUB-004 LIVE DISCHARGE in step 7).
- **MODEL-2 ship %**: unchanged at **57%**.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 4, 2026

…PARTIAL_ALGORITHM_LEVEL: fixture is now on main

Bundles the SUB-004 status promotion into the v1.2.0 PR alongside the SUB-003 function-name drift fix already authored. Both changes ship as one v1.2.0 unit because they are the two contract-level updates that follow the §47.1 cascade roadmap closing at the algorithm level.

## Why now

PR #1457 (HF FP16 oracle script extension) merged on main. The fixture previously claimed "absent" is now generated by:

```
uv run --with torch --with transformers --with safetensors --with accelerate \
  scripts/generate_qwen25_coder_fp16_stages.py \
  --output /tmp/qwen25-coder-7b-hf-fp16-stages \
  --layers 0 --with-attn-substages
```

Per `feedback_no_guessing.md`: SUB-004's status is now provable from main. Promote.

## What landed

Updated SUB-004 algorithm_evidence:

- `status`: BLOCKER_FIXTURE_ABSENT → PARTIAL_ALGORITHM_LEVEL
- `file_paths`: added the actual script + APR-side wire files
- `function_names`: replaced placeholder `run_hf_fp16_reference` with the 6 real symbols (`install_attn_substages_patch`, `traced_forward`, plus 4 SaveTensorStage variants)
- `invariants_enforced`: 1 line → 4 lines explicitly naming what each PR pinned
- `notes`: documents the FUNCTIONAL discharge prerequisites (binary rebuild + driver/CPU)

Updated metadata.description v1.2.0 changelog to bundle (1) SUB-003 drift fix + (2) SUB-004 promotion as a coherent unit.

## Five whys

1. **Why combine SUB-003 drift fix + SUB-004 promotion in v1.2.0?** Both contract-level changes follow from the same upstream cause (PRs #1455 + #1456 + #1457 landed). Splitting into v1.2.0 + v1.3.0 would force a follow-up rebase + double-review with no audit benefit.
2. **Why PARTIAL_ALGORITHM_LEVEL not FUNCTIONAL?** FUNCTIONAL requires LIVE evidence. The 9-element cosine sequence has not been produced on actual hardware yet. Promoting to FUNCTIONAL without LIVE evidence would claim more than is true.
3. **Why isn't the LIVE run inside this PR?** Per `feedback_compute_pre_authorized.md`, named GPU lanes are pre-authorized but SHIP-007 LIVE bisection is borderline (binary rebuild needed + host driver mismatch). Operator-triggered keeps the audit clean.
4. **Why list SaveTensorStage variants as "function_names"?** They're enum variants, not functions strictly speaking, but they are the symbolic identities that the algorithm-level evidence binds to. The contract validator accepts them.
5. **Why explicit prerequisites in `notes`?** Future readers who see "PARTIAL_ALGORITHM_LEVEL" need to know WHY it's not yet FUNCTIONAL. The notes are the operator-handoff document inside the contract itself.

## Net effects

- Contract `trace-attn-sub-stages-v1.yaml` v1.1.0 → v1.2.0 PROPOSED.
- SUB-003: drift fix (3 real wired functions, 2 explicit drift-prevention test pins).
- SUB-004: BLOCKER_FIXTURE_ABSENT → PARTIAL_ALGORITHM_LEVEL with 4-line invariants + explicit FUNCTIONAL prereqs.
- **MODEL-1 ship %**: unchanged at **91%** (FUNCTIONAL discharge gates ship %, not PARTIAL).
- **MODEL-2 ship %**: unchanged at **57%**.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 4, 2026

…x + SUB-004 BLOCKER → PARTIAL_ALGORITHM_LEVEL (#1459)

* contract(trace-attn-sub-stages-v1): v1.1.0 → v1.2.0: function-name drift fix in SUB-003 algorithm_evidence

## Why

Contract drift discovered after PR #1456 (FALSIFY-ATTN-SUB-003 drift-prevention test) merged on main. The algorithm_evidence block named:

```yaml
function_names:
  - load_tensor_apr_aprt
```

But this function does not exist anywhere in the codebase. The actual functions wired in `crates/apr-cli/src/commands/diff_05_aprt_stage.rs` and exercised by PR #1456's tests are:

- `is_aprt_stage_file` (magic-byte detection)
- `compute_aprt_stage_stats` (cosine + RMS + top-K)
- `run_aprt_stage_diff` (e2e reader + emitter)

Per `feedback_no_guessing.md`. Contract author defect that pre-existed PR #1450's merge, likely speculation from the parent contract's `apr_diff_values_compat` invariant naming convention. Caught here at the cheapest layer (contract YAML, no implementation rolled back).

## What landed

- Bumped `metadata.version` 1.1.0 → 1.2.0 with v1.2.0 changelog block describing the fix.
- Replaced `load_tensor_apr_aprt` with the 3 real wired functions in `algorithm_evidence.function_names`.
- Added `crates/apr-cli/src/commands/diff_05_aprt_stage.rs` to `algorithm_evidence.file_paths` (the actual location of the wired functions).
- Added 2 new `invariants_enforced` lines naming the 2 specific drift-prevention tests from PR #1456.
- Expanded `notes` field to make the algorithm-level evidence trail explicit (which tests, what shapes, why per-stage-agnostic by construction).

## Test plan

- [x] `pv validate contracts/trace-attn-sub-stages-v1.yaml` reports `0 error(s), 0 warning(s) — Contract is valid.`
- [ ] CI green
- [ ] Auto-merge

## Five whys

1. **Why now and not in §47/§48?** The drift was discovered while authoring PR #1456 but not fixed there because PR #1456 modified Rust code, not contract YAML; single-piece flow says don't mix. Now that #1456 is merged on main, the contract drift can be addressed cleanly without conflict against an in-flight PR.
2. **Why a separate PR rather than in PR #1457?** PR #1457 is the HF FP16 oracle script extension (Python-only). Modifying the contract there would couple two independent fixes. This PR is contract-only YAML and lands independently.
3. **Why bump to v1.2.0 rather than v1.1.1?** Convention in this contract family treats `algorithm_evidence` corrections as MINOR bumps (v1.0.0 → v1.1.0 for the Toyota Way scope correction, also algorithm_evidence-level). v1.1.1 would suggest "PATCH = no semantic change", but renaming functions in the evidence block is a semantic improvement (readers can now find the real code).
4. **Why not also bump SUB-004 from BLOCKER_FIXTURE_ABSENT to PARTIAL_ALGORITHM_LEVEL here?** SUB-004's algorithm-bind requires PR #1457 (HF FP16 oracle ext) to be on main; the script is the fixture. PR #1457 is in flight. Bumping SUB-004 status here would claim more than the codebase can prove. Keeping single-piece flow: this PR ships the SUB-003 drift fix only.
5. **Why is the loader genuinely per-stage-agnostic?** `is_aprt_stage_file` checks the 4-byte magic `b"APRT"` only; `compute_aprt_stage_stats` operates on `&[f32]` slices; `run_aprt_stage_diff` reads APRT header (4-byte magic + u32 layer + u32 dim_product) + f32 LE body. Stage names are encoded only in the OUTPUT FILENAME (e.g., `layer_0_attn_scores.aprt`), never in the binary content. So the loader is shape/value-agnostic by construction, which is why FALSIFY-ATTN-SUB-003's drift-prevention tests need 0 LOC production change.

## Net effects

- Contract `trace-attn-sub-stages-v1.yaml` v1.1.0 → v1.2.0 PROPOSED.
- SUB-003 algorithm_evidence now correctly names the wired functions.
- **MODEL-1 ship %**: unchanged at **91%** (drift fix; ship % moves at SUB-004 LIVE DISCHARGE).
- **MODEL-2 ship %**: unchanged at **57%**.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* contract(trace-attn-sub-stages-v1): SUB-004 BLOCKER_FIXTURE_ABSENT → PARTIAL_ALGORITHM_LEVEL: fixture is now on main

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
## Summary

- Step 5 of the cascade, per `docs/specifications/aprender-train/ship-two-models-spec.md` §47.6 ranked-leverage order.
- Two new unit tests in `crates/apr-cli/src/commands/diff_05_aprt_stage.rs` proving the `apr diff --values` APRT loader is per-stage-agnostic for the new `attn_scores` + `attn_softmax` stages introduced in PR #1451 (feat(aprender-serve): SaveTensorStage gains AttnScores + AttnSoftmax, FALSIFY-ATTN-SUB-001 PARTIAL_ALGORITHM_LEVEL).

## What the tests pin

| Test | What it pins |
|------|--------------|
| `falsify_attn_sub_003_new_stages_per_stage_agnostic` | Magic-byte detection + cosine + RMS + e2e diff succeed for filenames `layer_0_attn_scores.aprt` + `layer_0_attn_softmax.aprt` at realistic shape `28*7*7=1372` (Qwen2.5-7B BOS layer-0 `[num_heads, seq, seq]`). |
| `falsify_attn_sub_003_cosine_detects_softmax_divergence` | Mixed-perturbation cosine drops below the 0.999 floor. |

## Why this is drift-prevention

`contracts/trace-attn-sub-stages-v1.yaml` v1.1.0 SUB-003 invariant:

> Existing APRT recognition path generalizes to the 2 new stage IDs
> (attn_scores + attn_softmax) without per-stage hardcoding.

If anyone ever introduces a per-stage `match stage { … }` inside `is_aprt_stage_file`, `compute_aprt_stage_stats`, or `run_aprt_stage_diff`, these assertions fail, forcing a contract bump.

## Cascade context

- `trace-attn-sub-stages-v1` v1.1.0

## Test plan

- `cargo test -p apr-cli --lib aprt_stage_diff_tests`: 13/13 PASS (11 prior + 2 new)
- … `apr-cli`

## Five whys

1. **Why a test PR rather than the live RTX 4090 bisection?** §47.6 ranked-leverage list orders the cascade: step 5 (this PR) ships before step 6 (HF FP16 oracle extension) before step 7 (live bisection). Skipping the drift-prevention layer means future regressions at `is_aprt_stage_file` would only be caught during a live run, which is wasteful.
2. **Why 2 tests?** Spec said "1 test + 0 LOC". 2nd test is bonus FALSIFY-ATTN-SUB-004 coverage: pins cosine sensitivity for the load-bearing predicate of the live bisection.
3. **Why update Rust rather than the contract YAML?** Contract YAML lives in PR #1450 (contract(trace-attn-sub-stages-v1): v1.1.0 PROPOSED, layer-0 attention bisection plan, 2 new SaveTensorStage variants + 9-stage chain), which is still open; modifying it from a separate branch would conflict. The test exercises the real wired functions, providing the algorithm-level evidence the contract requires.
4. **Why not bump SUB-003 status from PARTIAL_ALGORITHM_LEVEL to FUNCTIONAL?** FUNCTIONAL requires LIVE evidence (HF FP16 oracle + APR teacher tensors). That's cascade step 7. This PR is algorithm-level only.
5. **Why not amend PR #1450?** Single-piece flow. PR #1450 is auto-merge armed and CI-green; pushing more commits restarts ~10 min CI. This PR is independent enough to land separately.

## Plain ship % (unchanged this cycle)

🤖 Generated with Claude Code