docs(ship-007): §23 sub-FFN bisection: layer-3 ffn_swigl first 17× anomaly site (v2.67.0) #1075
Merged
…t 17× anomaly site: spec v2.66.0 → v2.67.0

§17.4 specified sub-layer bisection of FFN as the falsifier next step. PR #1066 added the 4 sub-FFN ActivationStats fields. §23 records the first run on the canonical 7B teacher post-#1066-merge.

(Originally authored as §21 in the closed PR #1072. Re-numbered as §23 because §22 (PR #1074) landed first with v2.66.0 banner; this PR brings v2.67.0.)

## Key finding

Live `apr trace --payload` on `paiml/qwen2.5-coder-7b-apache-q4k-v1` teacher (CPU, prompt "What is 2+2?") layer-3 sub-FFN std:

| Sub-FFN slot | L1-2 baseline | L3 | Ratio |
|--------------|--------------:|----:|------:|
| ffn_norm | 0.85 / 0.86 | 1.00 | 1.16× normal |
| ffn_gate | 1.50 / 1.99 | 1.92 | 0.97× normal |
| ffn_up | 1.10 / 0.94 | 1.34 | 1.42× small |
| ffn_silu | 0.043 / 0.052 | 0.168 | 3.2× precursor |
| **ffn_swigl** | **0.061 / 0.071** | **1.222** | **17.2× anomaly** |
| ffn_out | 0.345 / 0.216 | 11.459 | 53× cascade |

Gate/up individually normal at layer 3. Element-wise multiply at inference.rs:163 `ffn_hidden.push(silu_g * u)` is the named bug site (possibly off-by-one slice indexing).

## Bug surface narrowing chain

- §15.4: GPU GQA kernel ELIMINATED
- §16: GPU stack ELIMINATED (CPU APR vs GGUF)
- §17: layer 3 FFN sub-block named (53× ffn_out)
- **§23: layer 3 ffn_swigl named (17× first anomaly site)**

## Falsifiable next investigation step (§23.6)

Extend `OwnedQuantizedModel::forward_traced` (the GGUF path; needs to be authored per `project_ship_007_gguf_forward_traced_plan.md`) with same 4 sub-FFN fields. Compare APR vs GGUF layer-3 ffn_swigl directly:

- ≈0.07 → APR-side bug pinned to inference.rs:160-164
- ≈1.22 → spike is normal model behavior; bug elsewhere

## Evidence persisted

- evidence/ship-007-layer-3-anomaly/sub-ffn-bisection-2026-04-26.txt (386 lines)
- evidence/ship-007-layer-3-anomaly/sub-ffn-per-layer-stds.csv (28-layer × 6-field summary)

Spec v2.66.0 → v2.67.0. No coverage tally change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
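The bisection logic behind the table above can be sketched in a few lines of Rust: walk the sub-FFN slots in forward-pass order and report the first one whose layer-3 std exceeds its layer-2 baseline by some factor. This is only an illustration under stated assumptions. `SlotStd`, `table_rows`, and `first_anomaly` are hypothetical names, not part of `apr`, and the 10× cutoff used in the usage note is an illustrative choice rather than a value this PR defines for the per-slot comparison.

```rust
/// One row of the sub-FFN table: slot name, layer-2 baseline std, layer-3 std.
/// (Hypothetical type for illustration; not an `apr` data structure.)
struct SlotStd {
    name: &'static str,
    baseline: f64,
    layer3: f64,
}

/// The six rows reported in the table, in forward-pass order.
/// The baseline is the layer-2 value (the second number in the "L1-2" column).
fn table_rows() -> Vec<SlotStd> {
    vec![
        SlotStd { name: "ffn_norm", baseline: 0.86, layer3: 1.00 },
        SlotStd { name: "ffn_gate", baseline: 1.99, layer3: 1.92 },
        SlotStd { name: "ffn_up", baseline: 0.94, layer3: 1.34 },
        SlotStd { name: "ffn_silu", baseline: 0.052, layer3: 0.168 },
        SlotStd { name: "ffn_swigl", baseline: 0.071, layer3: 1.222 },
        SlotStd { name: "ffn_out", baseline: 0.216, layer3: 11.459 },
    ]
}

/// Return the first slot (in pipeline order) whose layer3/baseline ratio
/// reaches `threshold`, together with that ratio. `threshold` is an
/// assumed, caller-chosen cutoff.
fn first_anomaly(rows: &[SlotStd], threshold: f64) -> Option<(&'static str, f64)> {
    rows.iter()
        .map(|r| (r.name, r.layer3 / r.baseline))
        .find(|&(_, ratio)| ratio >= threshold)
}
```

Calling `first_anomaly(&table_rows(), 10.0)` on these rows selects `ffn_swigl` rather than `ffn_out`, because the search stops at the earliest slot over the cutoff; everything downstream is treated as cascade.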
noahgift added a commit that referenced this pull request Apr 27, 2026
…irmed APR-side at inference.rs:160-164: spec v2.71.0 → v2.72.0 (#1084)

Live evidence on noah-Lambda-Vector RTX 4090 2026-04-27. Built apr from PR #1083 branch (commits 77c016b + c657968 + f249464 from PR A+B+C cascade). Ran `apr trace --payload` on canonical 7B teacher in BOTH formats with identical prompt + tokenizer.

Result:

| Layer | APR ffn_swigl std | GGUF ffn_swigl std | Ratio |
|------:|------------------:|-------------------:|------:|
| 3 | 1.2216 | 0.0670 | 18.23x |

§26.4 binding criterion threshold: ≥10x → APR-side bug. **Observed 18.23x, which is 8.23x above the 10x threshold: decisive verdict.**

The investigation chain that started in §15.4 (GPU GQA elimination) has reached its conclusion at §27:

§15.4 → §16 → §17 → §23 → §27 (this)

"Whole forward path" → "GPU eliminated" → "(layer=3, FFN sub-block)" → "(layer=3, ffn_swigl)" → "**APR-side at inference.rs:160-164**"

Cascade-damping signature confirmed:

- Layers 0-2: ratio ~1.1x (normal)
- Layer 3: 18.23x (anomaly)
- Layers 4-5: 3.3-4.5x (cascade)
- Layer 6+: ~1x (recovered)

This is consistent with a localized perturbation (off-by-one, buffer aliasing, or F32-vs-Q4K dequant defect at layer 3 specifically) rather than persistent residual-stream corruption.

Per §17.5, SHIP-007 fix discharges 5 MODEL-1 PARTIALs at once (SHIP-002/005/006/007/008). §26.5 expected coverage flip: 33+12 → 28+17 when fix lands.

§27 does NOT discharge by itself; it locates the bug for fixing. Next investigation reads `inference.rs:160-164` and tests 4 hypotheses:

1. Off-by-one slice indexing
2. Buffer aliasing (scratch reuse pattern)
3. F32-vs-Q4K dequant defect at layer-3 input range
4. Activation overflow (SiLU saturation amplifies multiply)

Methodology held throughout: zero eprintln!, zero route-arounds, apr is canonical (§26.8), all instrumentation via `apr trace --payload`. Lambda-labs lane pre-authorized.

Evidence persisted to evidence/ship-007-apr-vs-gguf-2026-04-27/:

- apr-trace.txt (13.5 KB)
- gguf-trace.txt (13.7 KB)
- binding-criterion-summary.json

Note: §27 reproduction requires PR #1081 + #1082 + #1083 cascade to merge first (the `apr trace --payload <gguf>` wiring is in PR C). Evidence was generated with a local build of PR #1083 branch.

Spec v2.71.0 → v2.72.0. Coverage flip pending fix.

Spec: SPEC-SHIP-TWO-001 §26.4 P3 verdict

References:

- §15.4 (PR #1062): GPU GQA eliminated
- §16 (PR #1063): APR CPU isolated
- §17 (PR #1064): layer-3 FFN sub-block
- §23 (PR #1075): layer-3 ffn_swigl named
- §26.8 (PR #1079): apr-is-canonical methodology rule
- PR #1081 (P3 PR A scaffold)
- PR #1082 (P3 PR B sub-FFN populate)
- PR #1083 (P3 PR C CLI wiring)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
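The §26.4 binding criterion reduces to one division and one comparison, sketched below in Rust as a sanity check on the arithmetic. `Verdict`, `std_ratio`, and `classify` are hypothetical names introduced for illustration and are not part of `apr` or the spec. The commit message only states what a ratio of 10x or more means, so the sketch labels every other outcome `BelowThreshold` without interpreting it.

```rust
/// Outcome of the APR-vs-GGUF comparison (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Ratio reached the binding threshold: the defect is on the APR side.
    AprSideBug,
    /// Ratio stayed below the threshold; the commit message assigns no
    /// verdict here, so this sketch leaves it uninterpreted.
    BelowThreshold,
}

/// Ratio of the APR ffn_swigl std to the GGUF ffn_swigl std at one layer.
fn std_ratio(apr_std: f64, gguf_std: f64) -> f64 {
    apr_std / gguf_std
}

/// Apply the binding criterion: ratio >= threshold means APR-side bug.
fn classify(apr_std: f64, gguf_std: f64, threshold: f64) -> Verdict {
    if std_ratio(apr_std, gguf_std) >= threshold {
        Verdict::AprSideBug
    } else {
        Verdict::BelowThreshold
    }
}
```

With the layer-3 values from the result table, `classify(1.2216, 0.0670, 10.0)` yields `Verdict::AprSideBug`, and `std_ratio(1.2216, 0.0670)` is approximately 18.23.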
## Summary

## Key finding

Gate/up individually normal. Bug surface = `inference.rs:163` `ffn_hidden.push(silu_g * u)`.

## Falsifiable next step

Extend `OwnedQuantizedModel::forward_traced` with same 4 sub-FFN fields → compare APR vs GGUF layer-3 ffn_swigl directly. Disambiguates "APR-side bug" vs "normal trained behavior".

## Evidence

- `evidence/ship-007-layer-3-anomaly/sub-ffn-bisection-2026-04-26.txt` (386 lines, full apr trace)
- `evidence/ship-007-layer-3-anomaly/sub-ffn-per-layer-stds.csv` (28-layer × 6-field summary)

🤖 Generated with Claude Code
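For readers unfamiliar with the operation at the named bug site, the following is a minimal Rust sketch of an element-wise SwiGLU product and the std statistic used to compare slots. It is not the code in `inference.rs`; `silu`, `swiglu`, and `std_dev` are hypothetical helpers, and the use of population (rather than sample) standard deviation is an assumption, since the PR does not say which one `apr trace` reports.

```rust
/// SiLU activation: x * sigmoid(x), written as x / (1 + e^(-x)).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Element-wise SwiGLU product: out[i] = silu(gate[i]) * up[i].
/// Returns `None` when the slices differ in length, which is one place
/// an off-by-one slicing mistake would surface in a sketch like this.
fn swiglu(gate: &[f32], up: &[f32]) -> Option<Vec<f32>> {
    if gate.len() != up.len() {
        return None;
    }
    Some(gate.iter().zip(up).map(|(&g, &u)| silu(g) * u).collect())
}

/// Population standard deviation (assumed; see the note above).
/// Returns 0.0 for an empty slice.
fn std_dev(xs: &[f32]) -> f32 {
    if xs.is_empty() {
        return 0.0;
    }
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    var.sqrt()
}
```

Because the product multiplies two streams index by index, a misaligned slice on either side can pair well-behaved gate and up values incorrectly, which fits the observation that both inputs look normal while their product does not.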