contract(trace-ffn-sub-block-v1): pre-commit schema for sub-FFN telemetry (SHIP-007 load-bearing) #1065

Merged
noahgift merged 2 commits into main from feat/trace-ffn-sub-block-v1-contract on Apr 26, 2026
@noahgift

Contributor

Summary

  • Authors contracts/trace-ffn-sub-block-v1.yaml v1.0.0 PROPOSED — the contract envelope for extending realizar::apr_transformer::LayerActivation to capture intermediate FFN sub-tensor stats (gate_proj_out, up_proj_out, silu_gate, swiglu_inner, ffn_down_out).
  • Per feedback_apr_trace_not_eprintln.md: missing TraceStep granularity must be addressed via contract-first extension. This pre-commits the schema BEFORE the implementation lands.
  • Load-bearing for the SHIP-007 fix per ship-two-models-spec.md §15.5 + §17.4. §17 identified APR teacher CPU layer-3 ffn_out std=11.459 vs layer-2 std=0.216 (53× spike). Sub-FFN bisection requires this schema extension.

What this contract pins

  • 8 proof_obligations: LayerActivation field-set is additive; new fields reflect their respective compute steps; ffn_out backward-compat preserved; render order is total; JSON keys additive
  • 8 falsification_tests: the layer-3 spike SHALL be pinned to exactly one of 5 sub-FFN slots; existing ffn_out_stats values byte-identical; naming aligns with peer layer-parity-v1.yaml; GPU TracedForward populates new fields; renderer emits 10 lines per layer not 6; intermediate vectors have correct length; doc-comments cite the contract; coverage co-evolves
  • 8 kani_harnesses (one per obligation)
  • 6 constants, including SHIP_007_REFERENCE_FFN_OUT_STD_LAYER_3=11.459 and SHIP_007_SPIKE_RATIO=53.0
  • qa_gate F-SUB-FFN-001
  • Equations: swiglu_inner + ffn_output formulas. NOTE: §17.4's prose erroneously said "silu(up_proj_out)"; the actual code is silu(gate) * up. This contract pins the correct schema.
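
For readers who want the five capture points spelled out, here is a minimal sketch of a SwiGLU FFN that keeps every intermediate tensor named in the Summary. It is not the realizar implementation: `SubFfnTensors`, `swiglu_ffn`, `matvec`, and the row-major weight layout are assumptions made for illustration. The only detail taken from this PR is the formula itself, `silu(gate) * up`, followed by the down projection.

```rust
/// SiLU (swish): x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Row-major matrix-vector product; `w` holds `rows` rows of `x.len()` columns.
fn matvec(w: &[f32], rows: usize, x: &[f32]) -> Vec<f32> {
    let cols = x.len();
    assert_eq!(w.len(), rows * cols, "weight shape mismatch");
    (0..rows)
        .map(|r| {
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(a, b)| a * b)
                .sum::<f32>()
        })
        .collect()
}

/// Hypothetical container for the five sub-FFN tensors listed in the Summary.
pub struct SubFfnTensors {
    pub gate_proj_out: Vec<f32>,
    pub up_proj_out: Vec<f32>,
    pub silu_gate: Vec<f32>,
    pub swiglu_inner: Vec<f32>,
    pub ffn_down_out: Vec<f32>,
}

/// SwiGLU FFN for one token. `x` has hidden_dim entries; `inter` is intermediate_dim.
pub fn swiglu_ffn(
    x: &[f32],
    w_gate: &[f32],
    w_up: &[f32],
    w_down: &[f32],
    inter: usize,
) -> SubFfnTensors {
    let hidden = x.len();
    let gate_proj_out = matvec(w_gate, inter, x);
    let up_proj_out = matvec(w_up, inter, x);
    // SiLU is applied to the gate projection, not to the up projection.
    let silu_gate: Vec<f32> = gate_proj_out.iter().map(|&g| silu(g)).collect();
    // swiglu_inner = silu(gate) * up, elementwise.
    let swiglu_inner: Vec<f32> = silu_gate
        .iter()
        .zip(&up_proj_out)
        .map(|(s, u)| s * u)
        .collect();
    let ffn_down_out = matvec(w_down, hidden, &swiglu_inner);
    SubFfnTensors { gate_proj_out, up_proj_out, silu_gate, swiglu_inner, ffn_down_out }
}
```

Keeping each intermediate as its own vector is what lets a tracer compute statistics per slot instead of only on the final output.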

Validation

pv validate contracts/trace-ffn-sub-block-v1.yaml → 0 error(s), 0 warning(s). Contract is valid.

What this is NOT

  • This is the contract envelope only. discharge_status: PROPOSED. Implementation, falsification tests, and live evidence land in a follow-up PR.

Stacks under

Test plan

Why this matters

Per §17.5 of the spec: "Whatever fix lands also discharges all 5 transitively-blocked MODEL-1 PARTIALs (SHIP-002/005/006/007/008) per §15.7's blast-radius inventory." This contract is the prerequisite for that fix.

🤖 Generated with Claude Code

…etry — load-bearing for SHIP-007

Per `feedback_apr_trace_not_eprintln.md`: "Missing TraceStep granularity
→ extend the enum behind a contract." This authors the contract envelope
BEFORE the implementation lands.

Why: §17 of ship-two-models-spec.md identified APR teacher CPU layer-3
ffn_out std=11.459 vs layer-2 std=0.216 (53× spike) on the canonical
paiml/qwen2.5-coder-7b-apache-q4k-v1 teacher. To localize the bug to a
sub-block (gate_proj / silu(gate) / silu(gate)*up / down_proj),
instrumentation must subdivide the existing `ffn_out_stats` field.

Contract pins:
- 8 proof_obligations (LayerActivation field-set additive,
  ffn_gate/up/silu_gate/swiglu_inner reflect their respective compute
  steps, ffn_out backward-compat semantics preserved, render order is
  total, JSON keys additive).
- 8 falsification_tests (the layer-3 spike SHALL be pinned to exactly
  one of 5 sub-FFN slots; existing ffn_out_stats values byte-identical;
  naming aligns with peer layer-parity-v1.yaml; GPU TracedForward
  populates new fields; renderer emits 10 lines per layer not 6;
  intermediate vectors have correct length; doc-comments cite the
  contract; coverage co-evolves).
- 8 kani_harnesses (one per obligation).
- 6 constants (including SHIP_007_REFERENCE_FFN_OUT_STD_LAYER_3=11.459).
- qa_gate F-SUB-FFN-001.
- Equations: swiglu_inner formula + ffn_output formula. NOTE: §17.4's
  prose erroneously said "silu(up_proj_out)"; the actual code is
  `silu(gate) * up`. This contract pins the correct schema.

Validates clean against `pv validate`.

`discharge_status: PROPOSED` — contract envelope only. Implementation
+ live evidence land in follow-up PR.

References:
- docs/specifications/aprender-train/ship-two-models-spec.md §15.5 / §17.4
- evidence/ship-007-layer-3-anomaly/discharge-evidence-v1.json
- contracts/layer-parity-v1.yaml (peer)

Closes task #155.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) April 26, 2026 07:14
@noahgift noahgift merged commit 2b4c941 into main Apr 26, 2026
10 checks passed
@noahgift noahgift deleted the feat/trace-ffn-sub-block-v1-contract branch April 26, 2026 08:03
noahgift added a commit that referenced this pull request Apr 26, 2026
…ml — 4 new ActivationStats fields on LayerActivation (#1066)

Implements `contracts/trace-ffn-sub-block-v1.yaml` v1.0.0 (PR #1065).
Load-bearing for the SHIP-007 fix per ship-two-models-spec.md §15.5 +
§17.4. Closes the per-layer telemetry gap that was the §17.4 falsifier
prerequisite.

## What changed

`LayerActivation` (apr_transformer/mod.rs) gains 4 new `ActivationStats`
fields between `ffn_norm_stats` and `ffn_out_stats`:

  - ffn_gate_stats          (post-gate-proj-matmul; OBL-SUB-FFN-001)
  - ffn_up_stats            (post-up-proj-matmul; OBL-SUB-FFN-002)
  - ffn_silu_gate_stats     (post-SiLU on gate; OBL-SUB-FFN-003)
  - ffn_swiglu_inner_stats  (post-elementwise silu(gate)*up; OBL-SUB-FFN-004)

`forward_traced` (CPU path, inference.rs) populates all 4 on the SwiGLU
path; on the GELU/non-gated path, only `ffn_up_stats` is populated
(other 3 stay default-zero — there's no SwiGLU sub-structure to capture).
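
A rough sketch of the shape this describes, under stated assumptions: the real `ActivationStats` and `LayerActivation` definitions are not shown in this message, so the `count`/`mean`/`std`/`min`/`max` members, the `from_slice` constructor, and the `record_swiglu` / `record_non_gated` helpers below are hypothetical. Only the four new field names and their position between `ffn_norm_stats` and `ffn_out_stats` come from the text above.

```rust
/// Hypothetical summary statistics; the real ActivationStats may differ.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct ActivationStats {
    pub count: usize,
    pub mean: f32,
    pub std: f32,
    pub min: f32,
    pub max: f32,
}

impl ActivationStats {
    /// Population statistics over one activation vector; empty input yields the default.
    pub fn from_slice(v: &[f32]) -> Self {
        if v.is_empty() {
            return Self::default();
        }
        let n = v.len() as f32;
        let mean = v.iter().sum::<f32>() / n;
        let var = v.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
        let min = v.iter().copied().fold(f32::INFINITY, f32::min);
        let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        Self { count: v.len(), mean, std: var.sqrt(), min, max }
    }
}

/// Only the FFN-related fields are sketched here.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LayerActivation {
    pub ffn_norm_stats: ActivationStats,
    pub ffn_gate_stats: ActivationStats,         // post-gate-proj matmul
    pub ffn_up_stats: ActivationStats,           // post-up-proj matmul
    pub ffn_silu_gate_stats: ActivationStats,    // post-SiLU on gate
    pub ffn_swiglu_inner_stats: ActivationStats, // post silu(gate) * up
    pub ffn_out_stats: ActivationStats,
}

impl LayerActivation {
    /// SwiGLU path: all four new fields are populated.
    pub fn record_swiglu(&mut self, gate: &[f32], up: &[f32], silu_gate: &[f32], inner: &[f32]) {
        self.ffn_gate_stats = ActivationStats::from_slice(gate);
        self.ffn_up_stats = ActivationStats::from_slice(up);
        self.ffn_silu_gate_stats = ActivationStats::from_slice(silu_gate);
        self.ffn_swiglu_inner_stats = ActivationStats::from_slice(inner);
    }

    /// GELU / non-gated path: only ffn_up_stats is populated, from pre-activation values.
    pub fn record_non_gated(&mut self, up: &[f32]) {
        self.ffn_up_stats = ActivationStats::from_slice(up);
    }
}
```

Deriving `Default` gives the default-zero state for free, so a path that never records a field leaves it distinguishable from a populated one.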

GPU TracedForward (forward_from_model.rs, gpu_forward_pass.rs) zero-fills
the 4 new fields per OBL-SUB-FFN-008 (FALSIFY-SUB-FFN-004 follow-up — GPU-
side capture is staged separately).

CLI renderer (apr-cli/src/commands/vector_stats.rs) emits 4 new lines
per layer in computation order (ffn_gate, ffn_up, ffn_silu_gate,
ffn_swiglu_inner) BETWEEN ffn_norm and ffn_out, suppressed when
default-zero (GPU path until follow-up).
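
The suppression rule can be sketched as follows. This is an illustration only, not the code in vector_stats.rs: the `Stats` struct, the `render_ffn_rows` function, and the line format are assumptions. It shows the ordering described above plus the "skip when default-zero" check.

```rust
/// Hypothetical, trimmed-down stats record used only for this sketch.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Stats {
    pub count: usize,
    pub mean: f32,
    pub std: f32,
}

/// Format one row; the column layout is made up for illustration.
fn fmt_row(name: &str, s: &Stats) -> String {
    format!("{:<18} n={} mean={:.4} std={:.4}", name, s.count, s.mean, s.std)
}

/// Emit FFN rows in computation order. The four sub-FFN rows sit between
/// ffn_norm and ffn_out and are skipped while they are still default-zero.
pub fn render_ffn_rows(norm: Stats, sub: [(&str, Stats); 4], out: Stats) -> Vec<String> {
    let mut lines = vec![fmt_row("ffn_norm", &norm)];
    for (name, s) in sub {
        if s != Stats::default() {
            lines.push(fmt_row(name, &s));
        }
    }
    lines.push(fmt_row("ffn_out", &out));
    lines
}
```

With zero-filled sub-FFN stats (the GPU path for now) the output is the same two FFN rows as before, so existing consumers of the rendered text see no change.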

## Backward compat (FALSIFY-SUB-FFN-002)

`ffn_out_stats` semantics are byte-identical pre/post: same matmul
output, same residual contribution. All 720 existing apr_transformer
tests pass without modification.

## New tests

`tests/forward_traced.rs::test_sub_ffn_telemetry_swiglu_path_populates_all_4_fields`
asserts SwiGLU path populates all 4 fields with the correct
intermediate_dim count (FALSIFY-SUB-FFN-001 + FALSIFY-SUB-FFN-006).

`tests/forward_traced.rs::test_sub_ffn_telemetry_gelu_path_only_up_populated`
asserts GELU path leaves 3 fields default-zero and only populates
ffn_up_stats with pre-GELU values.

## What this enables

§17.4's falsifier next step: run `apr trace --payload` on the canonical
7B teacher and identify whichever of {ffn_gate, ffn_up, ffn_silu_gate,
ffn_swiglu_inner, ffn_out} carries the layer-3 53× spike. Whichever
sub-tensor first shows the discontinuity is the SHIP-007 bug site.
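
One way to mechanize that bisection, as a sketch under assumptions: `SLOTS`, `first_spiking_slot`, and the idea of comparing each slot's layer-3 std against its layer-2 std with a ratio threshold are illustrative choices, not part of the contract. The only measured values available so far are the `ffn_out` stds quoted in §17 (0.216 and 11.459); per-slot values would have to come from the follow-up `apr trace` run.

```rust
/// Sub-FFN slots in the order the renderer emits them.
pub const SLOTS: [&str; 5] = [
    "ffn_gate",
    "ffn_up",
    "ffn_silu_gate",
    "ffn_swiglu_inner",
    "ffn_out",
];

/// Ratio of the current layer's std to the previous layer's std.
pub fn spike_ratio(prev_std: f32, cur_std: f32) -> f32 {
    cur_std / prev_std
}

/// Return the first slot (in SLOTS order) whose std grew by at least
/// `threshold` between the previous layer and the current one.
pub fn first_spiking_slot(
    prev_std: &[f32; 5],
    cur_std: &[f32; 5],
    threshold: f32,
) -> Option<&'static str> {
    for i in 0..SLOTS.len() {
        let (p, c) = (prev_std[i], cur_std[i]);
        // Skip slots with no usable baseline (for example, zero-filled stats).
        if p > 0.0 && spike_ratio(p, c) >= threshold {
            return Some(SLOTS[i]);
        }
    }
    None
}
```

Note that `ffn_gate` and `ffn_up` are computed from the same input rather than one after the other, so "first" here means first in render order; if both spike, the shared input would be the next thing to inspect.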

## Outstanding follow-ups (per contract)

- OBL-SUB-FFN-008: GPU sub-FFN telemetry capture — staged for follow-up
- Live evidence on the canonical 7B teacher pinning the layer-3 spike to a sub-FFN slot — staged for follow-up

Closes contract obligations OBL-SUB-FFN-001..007 at the algorithm level.
Live-evidence discharge is pending a follow-up `apr trace` run.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>