feat(aprender-core): trace-ffn-sub-block-v1 SUB-FFN-005 PARTIAL_ALGORITHM_LEVEL #1149
Merged
Per trace-ffn-sub-block-v1.yaml v1.0.0 PROPOSED. **Fresh contract** —
first algorithm-bound SUB-FFN falsifier (was 0/8). Diversifies to a 6th
PROPOSED contract surface.
## What FALSIFY-SUB-FFN-005 says
rule: Per-layer payload line count grows from 6 to 10
prediction: Stdout line count for one layer block SHALL be exactly 10
(was 6). Sentinel:
`apr trace --payload | grep -c "^\\s\\+ffn_"`
SHALL return 4 * 28 = 112 (was 2 * 28 = 56) on the 28-layer teacher.
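The sentinel counts indented lines that start with `ffn_`. As a rough illustration only, the same count can be taken in Rust over captured stdout. The function name `count_ffn_lines` is hypothetical and not part of this PR, and the match is only an approximation of the grep pattern (ASCII whitespace, then `ffn_`).

```rust
/// Hypothetical helper (not part of this PR): counts lines that start with
/// at least one ASCII whitespace character followed by `ffn_`, approximating
/// `grep -c "^\s\+ffn_"` over already-captured stdout.
pub fn count_ffn_lines(stdout: &str) -> u64 {
    stdout
        .lines()
        .filter(|line| {
            // Strip leading whitespace; the line only qualifies if something
            // was actually stripped (the `\+` in the grep pattern).
            let rest = line.trim_start_matches(|c: char| c.is_ascii_whitespace());
            rest.len() < line.len() && rest.starts_with("ffn_")
        })
        .count() as u64
}
```

On a 28-layer payload with four indented `ffn_*` lines per layer this would yield 112, which is the value the decision rule then checks against `4 * num_layers`.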
## What this file proves NOW
Decision rule: ffn_line_count == 4 * num_layers AND num_layers > 0 AND
ffn_line_count > 0. Computed via checked_mul to prevent overflow.
Pinning the constant `4` (post-implementation `ffn_*` line count per
layer) catches:
- Future regression to 2 (revert sub-FFN telemetry, undo SHIP-007 instrumentation)
- Future drift to 3 (drop a sub-FFN field) or 5 (add without contract bump)
New file crates/aprender-core/src/format/sub_ffn_005.rs:
- pub const AC_SUB_FFN_005_FFN_LINES_PER_LAYER: u64 = 4
- pub const AC_SUB_FFN_005_PRE_IMPL_LINES_PER_LAYER: u64 = 2
- pub enum SubFfn005Verdict { Pass, Fail }
- pub fn verdict_from_ffn_line_count(u64, u64) -> ..
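The decision rule is small enough to sketch in full. This is a minimal sketch, not the contents of `sub_ffn_005.rs`: the constant, enum, and function names come from the list above, while the argument order (`ffn_line_count`, then `num_layers`), the return type, and the derive attributes are assumptions, since the signature is abbreviated here.

```rust
pub const AC_SUB_FFN_005_FFN_LINES_PER_LAYER: u64 = 4;
pub const AC_SUB_FFN_005_PRE_IMPL_LINES_PER_LAYER: u64 = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubFfn005Verdict {
    Pass,
    Fail,
}

/// Assumed signature: (observed ffn_* line count, number of layers).
/// Pass only when both inputs are non-zero and
/// ffn_line_count == 4 * num_layers, with the product overflow-checked.
pub fn verdict_from_ffn_line_count(ffn_line_count: u64, num_layers: u64) -> SubFfn005Verdict {
    // Caller errors: zero layers or zero lines can never satisfy the rule.
    if num_layers == 0 || ffn_line_count == 0 {
        return SubFfn005Verdict::Fail;
    }
    // checked_mul returns None on overflow, which is treated as Fail.
    match num_layers.checked_mul(AC_SUB_FFN_005_FFN_LINES_PER_LAYER) {
        Some(expected) if expected == ffn_line_count => SubFfn005Verdict::Pass,
        _ => SubFfn005Verdict::Fail,
    }
}
```

Under these assumptions, `verdict_from_ffn_line_count(112, 28)` passes, while the pre-implementation count `verdict_from_ffn_line_count(56, 28)` fails.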
17 unit tests + 2 doctests organized as a 7-section mutation survey:
1. Provenance pin (4 lines per layer, 2 pre-impl, post == 2 * pre)
2. Pass band (Qwen2.5-Coder-7B 28 layers, Llama-3.1-8B 32, minimal 1,
Llama-3.1-70B 80)
3. Fail band — pre-impl regression (28 layers, 32 layers, 80 layers)
4. Fail band — drift to 3 or 5 lines per layer
5. Fail band — caller errors (zero layers, zero count, both)
6. Off-by-one (113 vs 112, 111 vs 112)
7. Overflow protection (num_layers * 4 overflows u64)
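Sections 2 through 7 of the survey can be pictured as one table of (line count, layer count, expected outcome) rows. The sketch below is illustrative and does not reproduce the actual tests: `passes`, `SURVEY`, and `survey_mismatches` are hypothetical names, and `passes` is a compact stand-in for the decision rule that hard-codes the pinned constant 4.

```rust
/// Hypothetical stand-in for the decision rule: both inputs non-zero and
/// ffn_line_count == 4 * num_layers, with the product overflow-checked.
fn passes(ffn_line_count: u64, num_layers: u64) -> bool {
    num_layers > 0
        && ffn_line_count > 0
        && num_layers.checked_mul(4) == Some(ffn_line_count)
}

/// (label, ffn_line_count, num_layers, expected pass?)
pub const SURVEY: &[(&str, u64, u64, bool)] = &[
    // Pass band
    ("28 layers", 112, 28, true),
    ("32 layers", 128, 32, true),
    ("minimal 1 layer", 4, 1, true),
    ("80 layers", 320, 80, true),
    // Fail band: pre-impl regression (2 lines per layer)
    ("pre-impl 28", 56, 28, false),
    ("pre-impl 32", 64, 32, false),
    ("pre-impl 80", 160, 80, false),
    // Fail band: drift to 3 or 5 lines per layer
    ("drift to 3", 84, 28, false),
    ("drift to 5", 140, 28, false),
    // Fail band: caller errors
    ("zero count", 0, 28, false),
    ("zero layers", 112, 0, false),
    ("both zero", 0, 0, false),
    // Off-by-one around 112
    ("113 vs 112", 113, 28, false),
    ("111 vs 112", 111, 28, false),
    // Overflow: num_layers * 4 does not fit in u64
    ("overflow", u64::MAX, u64::MAX / 4 + 1, false),
];

/// Returns the labels of rows whose outcome disagrees with the expectation.
pub fn survey_mismatches() -> Vec<&'static str> {
    SURVEY
        .iter()
        .filter(|(_, count, layers, want)| passes(*count, *layers) != *want)
        .map(|(label, ..)| *label)
        .collect()
}
```

An empty result from `survey_mismatches()` means every row behaves as the survey expects. Changing the pinned constant in `passes` to 2, 3, or 5 would make rows in the pass band and the matching fail band disagree, which is the regression the survey is meant to surface.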
Live results:
cargo test -p aprender-core --lib format::sub_ffn_005
test result: ok. 17 passed; 0 failed; 0 ignored.
trace-ffn-sub-block contract: 1 of 8 SUB-FFN falsifiers algorithm-bound
(was 0/8). Sixth PROPOSED contract on the algorithm-binding surface.
Five-Whys (Toyota Way):
Why 1: SHIP-007 layer-3 bisection landed sub-FFN telemetry (PR #1082).
Why 2: Without a guard, a future revert PR could silently undo the
instrumentation, removing the bisection capability.
Why 3: The decision rule (line_count == 4 * num_layers) is purely
arithmetic — testable today against expected per-model layer counts.
Why 4: Pinning the strict-== boundary NOW means a future impl cannot
silently regress to 2 (pre-impl) or drift to 3/5 (drop/add field).
Why 5: §26.8 stack-tool-extension methodology.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
## Summary
Per `trace-ffn-sub-block-v1.yaml` v1.0.0 PROPOSED. Fresh contract — first algorithm-bound SUB-FFN falsifier (was 0/8). Diversifies to a 6th PROPOSED contract surface.
FALSIFY-SUB-FFN-005: Per-layer payload line count grows from 6 to 10. Sentinel: `apr trace --payload | grep -c "^\s\+ffn_"` returns 4 × num_layers (= 112 on the 28-layer teacher).
This PR pins the decision rule — "line_count == 4 × num_layers, computed via checked_mul, both inputs non-zero" — at PARTIAL_ALGORITHM_LEVEL.
## What's added
New file `crates/aprender-core/src/format/sub_ffn_005.rs` (~280 LOC):
## Tests (17 unit + 2 doctests, all green): 7-section mutation survey
## Live verification
```
$ cargo test -p aprender-core --lib format::sub_ffn_005
test result: ok. 17 passed; 0 failed; 0 ignored
```
## Why this is small
Tight: 1 new file (~280 LOC), 1 line added to `mod.rs`. No CLI surface change. First non-pretokenize-bin, non-distill, non-save-tensor, non-dataset-thestack, non-tokenize-parallel-bpe binding — diversifies to a 6th PROPOSED contract.
## Five-Whys (Toyota Way)
## Test plan
🤖 Generated with Claude Code