contract(trace-ffn-sub-block-v1): pre-commit schema for sub-FFN telemetry (SHIP-007 load-bearing) #1065
Merged
…etry - load-bearing for SHIP-007

Per `feedback_apr_trace_not_eprintln.md`: "Missing TraceStep granularity → extend the enum behind a contract." This authors the contract envelope BEFORE the implementation lands.

Why: §17 of ship-two-models-spec.md identified APR teacher CPU layer-3 ffn_out std=11.459 vs layer-2 std=0.216 (53× spike) on the canonical paiml/qwen2.5-coder-7b-apache-q4k-v1 teacher. To localize the bug to a sub-block (gate_proj / silu(gate) / silu(gate)*up / down_proj), instrumentation must subdivide the existing `ffn_out_stats` field.

Contract pins:
- 8 proof_obligations (LayerActivation field-set additive, ffn_gate/up/silu_gate/swiglu_inner reflect their respective compute steps, ffn_out backward-compat semantics preserved, render order is total, JSON keys additive).
- 8 falsification_tests (the layer-3 spike SHALL be pinned to exactly one of 5 sub-FFN slots; existing ffn_out_stats values byte-identical; naming aligns with peer layer-parity-v1.yaml; GPU TracedForward populates new fields; renderer emits 10 lines per layer not 6; intermediate vectors have correct length; doc-comments cite the contract; coverage co-evolves).
- 8 kani_harnesses (one per obligation).
- 6 constants (including SHIP_007_REFERENCE_FFN_OUT_STD_LAYER_3=11.459).
- qa_gate F-SUB-FFN-001.
- Equations: swiglu_inner formula + ffn_output formula.

NOTE: §17.4's prose erroneously said "silu(up_proj_out)"; the actual code is `silu(gate) * up`. This contract pins the correct schema.

Validates clean against `pv validate`. `discharge_status: PROPOSED`: contract envelope only. Implementation + live evidence land in follow-up PR.

References:
- docs/specifications/aprender-train/ship-two-models-spec.md §15.5 / §17.4
- evidence/ship-007-layer-3-anomaly/discharge-evidence-v1.json
- contracts/layer-parity-v1.yaml (peer)

Closes task #155.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
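For readers following the two equations pinned by the contract, the sub-block capture can be sketched roughly as below. This is a minimal, hypothetical illustration and not the realizar implementation: `Stats`, `SubFfnTrace`, `matvec`, and `swiglu_ffn_traced` are made-up names, weights are assumed to be plain row-major `f32` slices rather than Q4K tensors, and population (not sample) standard deviation is assumed. It computes `swiglu_inner = silu(gate) * up` and `ffn_out = down_proj(swiglu_inner)`, recording stats after each step.

```rust
/// Summary statistics for one activation vector.
/// Hypothetical stand-in for `ActivationStats`; the real type may differ.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Stats {
    pub mean: f32,
    pub std: f32,
    pub count: usize,
}

/// Population mean/std of a slice; an empty slice yields all-zero stats.
pub fn stats(xs: &[f32]) -> Stats {
    if xs.is_empty() {
        return Stats::default();
    }
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    Stats { mean, std: var.sqrt(), count: xs.len() }
}

/// SiLU(x) = x * sigmoid(x) = x / (1 + e^(-x)).
pub fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Row-major matrix-vector product: `w` is `rows x cols`, `x` has `cols` entries.
pub fn matvec(w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols, "weight shape mismatch");
    assert_eq!(x.len(), cols, "input length mismatch");
    w.chunks(cols)
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Stats for the five sub-FFN slots (field names are illustrative).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct SubFfnTrace {
    pub ffn_gate: Stats,
    pub ffn_up: Stats,
    pub ffn_silu_gate: Stats,
    pub ffn_swiglu_inner: Stats,
    pub ffn_out: Stats,
}

/// SwiGLU FFN forward pass that records stats after every sub-step.
/// `hidden` is the model dimension, `inter` the intermediate dimension.
pub fn swiglu_ffn_traced(
    x: &[f32],
    w_gate: &[f32],
    w_up: &[f32],
    w_down: &[f32],
    hidden: usize,
    inter: usize,
) -> (Vec<f32>, SubFfnTrace) {
    let gate = matvec(w_gate, inter, hidden, x); // gate_proj output
    let up = matvec(w_up, inter, hidden, x); // up_proj output
    let silu_gate: Vec<f32> = gate.iter().map(|&g| silu(g)).collect();
    // swiglu_inner = silu(gate) * up, elementwise
    let inner: Vec<f32> = silu_gate.iter().zip(&up).map(|(s, u)| s * u).collect();
    let out = matvec(w_down, hidden, inter, &inner); // down_proj output
    let trace = SubFfnTrace {
        ffn_gate: stats(&gate),
        ffn_up: stats(&up),
        ffn_silu_gate: stats(&silu_gate),
        ffn_swiglu_inner: stats(&inner),
        ffn_out: stats(&out),
    };
    (out, trace)
}
```

Recording stats after each step, rather than only at `ffn_out`, is what allows a discontinuity to be attributed to a single step instead of the FFN as a whole.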
noahgift added a commit that referenced this pull request on Apr 26, 2026
…ml - 4 new ActivationStats fields on LayerActivation (#1066)

Implements `contracts/trace-ffn-sub-block-v1.yaml` v1.0.0 (PR #1065). Load-bearing for the SHIP-007 fix per ship-two-models-spec.md §15.5 + §17.4. Closes the per-layer telemetry gap that was the §17.4 falsifier prerequisite.

## What changed

`LayerActivation` (apr_transformer/mod.rs) gains 4 new `ActivationStats` fields between `ffn_norm_stats` and `ffn_out_stats`:

- ffn_gate_stats (post-gate-proj-matmul; OBL-SUB-FFN-001)
- ffn_up_stats (post-up-proj-matmul; OBL-SUB-FFN-002)
- ffn_silu_gate_stats (post-SiLU on gate; OBL-SUB-FFN-003)
- ffn_swiglu_inner_stats (post-elementwise silu(gate)*up; OBL-SUB-FFN-004)

`forward_traced` (CPU path, inference.rs) populates all 4 on the SwiGLU path; on the GELU/non-gated path, only `ffn_up_stats` is populated (the other 3 stay default-zero because there is no SwiGLU sub-structure to capture).

GPU TracedForward (forward_from_model.rs, gpu_forward_pass.rs) zero-fills the 4 new fields per OBL-SUB-FFN-008 (FALSIFY-SUB-FFN-004 follow-up; GPU-side capture is staged separately).

CLI renderer (apr-cli/src/commands/vector_stats.rs) emits 4 new lines per layer in computation order (ffn_gate/up/silu/swigl) BETWEEN ffn_norm and ffn_out, suppressed when default-zero (GPU path until follow-up).

## Backward compat (FALSIFY-SUB-FFN-002)

`ffn_out_stats` semantics are byte-identical pre/post: same matmul output, same residual contribution. All 720 existing apr_transformer tests pass without modification.

## New tests

`tests/forward_traced.rs::test_sub_ffn_telemetry_swiglu_path_populates_all_4_fields` asserts SwiGLU path populates all 4 fields with the correct intermediate_dim count (FALSIFY-SUB-FFN-001 + FALSIFY-SUB-FFN-006).

`tests/forward_traced.rs::test_sub_ffn_telemetry_gelu_path_only_up_populated` asserts GELU path leaves 3 fields default-zero and only populates ffn_up_stats with pre-GELU values.

## What this enables

§17.4's falsifier next step: run `apr trace --payload` on the canonical 7B teacher and identify whichever of {ffn_gate, ffn_up, ffn_silu_gate, ffn_swiglu_inner, ffn_out} carries the layer-3 53× spike. Whichever sub-tensor first shows the discontinuity is the SHIP-007 bug site.

## Outstanding follow-ups (per contract)

- OBL-SUB-FFN-008: GPU sub-FFN telemetry capture, staged for follow-up
- Live evidence on the canonical 7B teacher pinning the layer-3 spike to a sub-FFN slot, staged for follow-up

Closes contract OBL-SUB-FFN-001..007 algorithm-level. Live evidence discharge pending follow-up `apr trace` run.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
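The localization step described in this commit message could be automated along the following lines. This is a minimal sketch under stated assumptions: `Slot`, `SLOTS`, `LayerStd`, and `first_spiking_slot` are hypothetical names and not part of `apr trace`; the input is assumed to be per-slot standard deviations for two adjacent layers, already extracted from the trace output; and a slot whose previous-layer std is not positive is treated as "no data" (for example, default-zero fields on the GPU path).

```rust
/// The five sub-FFN slots, in the render order given in the commit message.
/// The enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slot {
    FfnGate,
    FfnUp,
    FfnSiluGate,
    FfnSwigluInner,
    FfnOut,
}

pub const SLOTS: [Slot; 5] = [
    Slot::FfnGate,
    Slot::FfnUp,
    Slot::FfnSiluGate,
    Slot::FfnSwigluInner,
    Slot::FfnOut,
];

/// Standard deviation per slot for one layer, indexed in `SLOTS` order.
pub type LayerStd = [f32; 5];

/// Returns the first slot whose std grew by at least `min_ratio` relative to
/// the previous layer, together with the observed ratio.
/// Slots with a non-positive previous-layer std are skipped as unpopulated.
pub fn first_spiking_slot(
    prev: &LayerStd,
    cur: &LayerStd,
    min_ratio: f32,
) -> Option<(Slot, f32)> {
    SLOTS
        .iter()
        .zip(prev.iter().zip(cur.iter()))
        .find_map(|(slot, (p, c))| {
            if *p <= 0.0 {
                return None; // no baseline to compare against
            }
            let ratio = c / p;
            (ratio >= min_ratio).then_some((*slot, ratio))
        })
}
```

With the §17 figures for ffn_out (std 0.216 at layer 2, 11.459 at layer 3), any threshold below roughly 53 would return `Slot::FfnOut` unless an earlier slot already shows the jump, in which case that earlier slot is returned instead.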
## Summary

`contracts/trace-ffn-sub-block-v1.yaml` v1.0.0 PROPOSED: the contract envelope for extending `realizar::apr_transformer::LayerActivation` to capture intermediate FFN sub-tensor stats (gate_proj_out, up_proj_out, silu_gate, swiglu_inner, ffn_down_out).

Per `feedback_apr_trace_not_eprintln.md`: missing TraceStep granularity must be addressed via contract-first extension. This pre-commits the schema BEFORE the implementation lands.

## What this contract pins

- Constants including `SHIP_007_REFERENCE_FFN_OUT_STD_LAYER_3=11.459` and `SHIP_007_SPIKE_RATIO=53.0`
- The actual code is `silu(gate) * up`. This contract pins the correct schema.

## Validation

`pv validate contracts/trace-ffn-sub-block-v1.yaml` → 0 error(s), 0 warning(s). Contract is valid.

## What this is NOT

`discharge_status: PROPOSED`. Implementation, falsification tests, and live evidence land in a follow-up PR.

## Stacks under

## Test plan

- `pv validate` passes (0 errors, 0 warnings)

## Why this matters
Per §17.5 of the spec: "Whatever fix lands also discharges all 5 transitively-blocked MODEL-1 PARTIALs (SHIP-002/005/006/007/008) per §15.7's blast-radius inventory." This contract is the prerequisite for that fix.
🤖 Generated with Claude Code