feat(aprender-core): trace-ffn-sub-block-v1 SUB-FFN-005 PARTIAL_ALGORITHM_LEVEL #1149

Merged: noahgift merged 1 commit into main from feat/sub-ffn-005-verdict-binding on Apr 29, 2026
@noahgift

Summary

Per `trace-ffn-sub-block-v1.yaml` v1.0.0 PROPOSED. Fresh contract — first algorithm-bound SUB-FFN falsifier (was 0/8). Diversifies to a 6th PROPOSED contract surface.

FALSIFY-SUB-FFN-005: Per-layer payload line count grows from 6 to 10. Sentinel: `apr trace --payload | grep -c "^\s\+ffn_"` returns 4 × num_layers (= 112 on the 28-layer teacher).
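The sentinel's `grep -c "^\s\+ffn_"` step can be approximated in Rust for readers who want to check a captured payload without a shell. This is a minimal sketch, not code from the PR: `count_ffn_lines` is a hypothetical helper, and it assumes the payload is plain text with one field per line.

```rust
/// Hypothetical helper (not part of the PR): counts lines that match the
/// sentinel pattern `^\s\+ffn_`, i.e. at least one leading whitespace
/// character followed by the literal prefix `ffn_`.
fn count_ffn_lines(payload: &str) -> u64 {
    payload
        .lines()
        .filter(|line| {
            let rest = line.trim_start();
            // `\s\+` requires one or more whitespace characters, so an
            // unindented `ffn_` line must not be counted.
            rest.len() < line.len() && rest.starts_with("ffn_")
        })
        .count() as u64
}
```

On the 28-layer teacher, the contract expects this count to be 4 × 28 = 112.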

This PR pins the decision rule — "line_count == 4 × num_layers, computed via checked_mul, both inputs non-zero" — at PARTIAL_ALGORITHM_LEVEL.

What's added

New file `crates/aprender-core/src/format/sub_ffn_005.rs` (~280 LOC):

  • `pub const AC_SUB_FFN_005_FFN_LINES_PER_LAYER: u64 = 4`
  • `pub const AC_SUB_FFN_005_PRE_IMPL_LINES_PER_LAYER: u64 = 2` (regression-detection)
  • `pub enum SubFfn005Verdict { Pass, Fail }`
  • `pub fn verdict_from_ffn_line_count(u64, u64) -> SubFfn005Verdict`
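The decision rule behind these items can be sketched as follows. This is an illustrative reconstruction from the rule stated in this PR, not the merged source: the argument order `(ffn_line_count, num_layers)`, the parameter names, and the derives are assumptions.

```rust
/// Post-implementation `ffn_*` lines emitted per layer (from the PR).
pub const AC_SUB_FFN_005_FFN_LINES_PER_LAYER: u64 = 4;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubFfn005Verdict {
    Pass,
    Fail,
}

/// Sketch of the rule: Pass only when both inputs are non-zero and
/// `ffn_line_count == 4 * num_layers`, with the product computed via
/// `checked_mul`. Argument order is an assumption.
pub fn verdict_from_ffn_line_count(ffn_line_count: u64, num_layers: u64) -> SubFfn005Verdict {
    // Caller errors: a zero on either side can never be a valid trace.
    if ffn_line_count == 0 || num_layers == 0 {
        return SubFfn005Verdict::Fail;
    }
    match num_layers.checked_mul(AC_SUB_FFN_005_FFN_LINES_PER_LAYER) {
        // Strict equality: 3 or 5 lines per layer is drift, 2 is a regression.
        Some(expected) if expected == ffn_line_count => SubFfn005Verdict::Pass,
        // Mismatch, or the product overflowed u64.
        _ => SubFfn005Verdict::Fail,
    }
}
```

Under these assumptions, `verdict_from_ffn_line_count(112, 28)` is `Pass`, while the pre-implementation count `verdict_from_ffn_line_count(56, 28)` is `Fail`.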

Tests (17 unit + 2 doctests, all green) — 7-section mutation survey

  1. Provenance pin (3): 4 lines per layer, 2 pre-impl, post == 2 × pre.
  2. Pass band (4): Qwen2.5-Coder-7B (28 layers), Llama-3.1-8B (32), minimal (1), Llama-3.1-70B (80).
  3. Fail band — pre-impl regression (2): 28-layer pre-impl, all canonical sizes pre-impl.
  4. Fail band — drift (2): 3 lines/layer (drop a field), 5 lines/layer (add a field).
  5. Fail band — caller errors (3): zero layers, zero count, both.
  6. Off-by-one (2): 113 vs 112, 111 vs 112.
  7. Overflow protection (1): num_layers × 4 overflows u64.
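The arithmetic that the provenance, off-by-one, and overflow sections exercise can be shown in isolation. The sketch below is an assumption-labelled illustration: `expected_ffn_lines` and the shortened constant names are hypothetical and do not appear in the PR.

```rust
// Hypothetical short names for the two pinned constants.
const FFN_LINES_PER_LAYER: u64 = 4; // post-implementation
const PRE_IMPL_LINES_PER_LAYER: u64 = 2; // pre-implementation

/// Hypothetical helper: expected `ffn_*` line count for a model, or `None`
/// when `num_layers * 4` does not fit in a u64.
fn expected_ffn_lines(num_layers: u64) -> Option<u64> {
    num_layers.checked_mul(FFN_LINES_PER_LAYER)
}

/// Hypothetical helper: the line count a pre-implementation build would emit.
fn pre_impl_ffn_lines(num_layers: u64) -> Option<u64> {
    num_layers.checked_mul(PRE_IMPL_LINES_PER_LAYER)
}
```

For the 28-layer teacher this gives `Some(112)` post-implementation against `Some(56)` pre-implementation, so 111 and 113 both miss the strict-`==` target, and any `num_layers` above `u64::MAX / 4` yields `None` rather than a wrapped product.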

Live verification

```
$ cargo test -p aprender-core --lib format::sub_ffn_005
test result: ok. 17 passed; 0 failed; 0 ignored
```

Why this is small

Tight: 1 new file (~280 LOC), 1 line added to `mod.rs`. No CLI surface change. First non-pretokenize-bin, non-distill, non-save-tensor, non-dataset-thestack, non-tokenize-parallel-bpe binding — diversifies to a 6th PROPOSED contract.

Five-Whys (Toyota Way)

  1. SHIP-007 layer-3 bisection landed sub-FFN telemetry (PR feat(p3-prb): SHIP-007 GGUF forward_traced sub-FFN populate — 4 sub-FFN ActivationStats slots filled #1082).
  2. Without a guard, a future revert PR could silently undo the instrumentation.
  3. The decision rule (line_count == 4 × num_layers) is purely arithmetic.
  4. Pinning the strict-`==` boundary NOW means a future impl cannot silently regress to 2 (pre-impl) or drift to 3/5 (drop/add field).
  5. §26.8 stack-tool-extension methodology.

Test plan

  • `cargo test -p aprender-core --lib format::sub_ffn_005` — 17 pass green
  • `cargo fmt -p aprender-core` — formatted
  • Pre-commit quality gates passed

🤖 Generated with Claude Code

…ITHM_LEVEL

Per trace-ffn-sub-block-v1.yaml v1.0.0 PROPOSED. **Fresh contract** —
first algorithm-bound SUB-FFN falsifier (was 0/8). Diversifies to a 6th
PROPOSED contract surface.

## What FALSIFY-SUB-FFN-005 says

  rule: Per-layer payload line count grows from 6 to 10
  prediction: Stdout line count for one layer block SHALL be exactly 10
              (was 6). Sentinel: `apr trace --payload | grep -c
              "^\\s\\+ffn_"` SHALL return 4 * 28 = 112 (was 2 * 28 =
              56) on the 28-layer teacher.

## What this file proves NOW

Decision rule: ffn_line_count == 4 * num_layers AND num_layers > 0 AND
ffn_line_count > 0. Computed via checked_mul to prevent overflow.

Pinning the constant `4` (post-implementation `ffn_*` line count per
layer) catches:
- Future regression to 2 (revert sub-FFN telemetry, undo SHIP-007 instrumentation)
- Future drift to 3 (drop a sub-FFN field) or 5 (add without contract bump)

New file crates/aprender-core/src/format/sub_ffn_005.rs:
- pub const AC_SUB_FFN_005_FFN_LINES_PER_LAYER: u64 = 4
- pub const AC_SUB_FFN_005_PRE_IMPL_LINES_PER_LAYER: u64 = 2
- pub enum SubFfn005Verdict { Pass, Fail }
- pub fn verdict_from_ffn_line_count(u64, u64) -> ..

17 unit tests + 2 doctests organized as a 7-section mutation survey:
1. Provenance pin (4 lines per layer, 2 pre-impl, post == 2 * pre)
2. Pass band (Qwen2.5-Coder-7B 28 layers, Llama-3.1-8B 32, minimal 1,
   Llama-3.1-70B 80)
3. Fail band — pre-impl regression (28 layers, 32 layers, 80 layers)
4. Fail band — drift to 3 or 5 lines per layer
5. Fail band — caller errors (zero layers, zero count, both)
6. Off-by-one (113 vs 112, 111 vs 112)
7. Overflow protection (num_layers * 4 overflows u64)

Live results:
  cargo test -p aprender-core --lib format::sub_ffn_005
    test result: ok. 17 passed; 0 failed; 0 ignored.

trace-ffn-sub-block contract: 1 of 8 SUB-FFN falsifiers algorithm-bound
(was 0/8). Sixth PROPOSED contract on the algorithm-binding surface.

Five-Whys (Toyota Way):
  Why 1: SHIP-007 layer-3 bisection landed sub-FFN telemetry (PR #1082).
  Why 2: Without a guard, a future revert PR could silently undo the
         instrumentation, removing the bisection capability.
  Why 3: The decision rule (line_count == 4 * num_layers) is purely
         arithmetic — testable today against expected per-model layer counts.
  Why 4: Pinning the strict-== boundary NOW means a future impl cannot
         silently regress to 2 (pre-impl) or drift to 3/5 (drop/add field).
  Why 5: §26.8 stack-tool-extension methodology.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) April 29, 2026 19:50
@noahgift noahgift merged commit f7b60b4 into main Apr 29, 2026
11 checks passed
@noahgift noahgift deleted the feat/sub-ffn-005-verdict-binding branch April 29, 2026 20:15