docs(ship-007): §23 sub-FFN bisection — layer-3 ffn_swigl first 17× anomaly site (v2.67.0)#1075

Merged
noahgift merged 2 commits into main from docs/ship-007-23-sub-ffn-bisection
Apr 26, 2026


Summary

Key finding

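| Sub-FFN slot | L1 / L2 baseline | L3 | Ratio (L3 / L2) |
|--------------|-----------------:|----:|----------------:|
| ffn_silu     | 0.043 / 0.052    | 0.168 | 3.2× precursor |
| ffn_swigl    | 0.061 / 0.071    | 1.222 | 17.2× FIRST ANOMALY |
| ffn_out      | 0.345 / 0.216    | 11.459 | 53× cascade |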

Gate/up individually normal. Bug surface = inference.rs:163 ffn_hidden.push(silu_g * u).

Falsifiable next step

Extend OwnedQuantizedModel::forward_traced with same 4 sub-FFN fields → compare APR vs GGUF layer-3 ffn_swigl directly. Disambiguates "APR-side bug" vs "normal trained behavior".

Evidence

  • evidence/ship-007-layer-3-anomaly/sub-ffn-bisection-2026-04-26.txt (386 lines, full apr trace)
  • evidence/ship-007-layer-3-anomaly/sub-ffn-per-layer-stds.csv (28-layer × 6-field summary)

🤖 Generated with Claude Code

…t 17× anomaly site — spec v2.66.0 → v2.67.0

§17.4 specified sub-layer bisection of FFN as the falsifier next
step. PR #1066 added the 4 sub-FFN ActivationStats fields. §23
records the first run on the canonical 7B teacher post-#1066-merge.

(Originally authored as §21 in the closed PR #1072. Re-numbered as
§23 because §22 (PR #1074) landed first with v2.66.0 banner; this
PR brings v2.67.0.)

## Key finding

Live `apr trace --payload` on `paiml/qwen2.5-coder-7b-apache-q4k-v1`
teacher (CPU, prompt "What is 2+2?") layer-3 sub-FFN std:

| Sub-FFN slot | L1 / L2 baseline | L3 | Ratio (L3 / L2) |
|--------------|-----------------:|----:|----------------:|
| ffn_norm     | 0.85 / 0.86   | 1.00 | 1.16× normal |
| ffn_gate     | 1.50 / 1.99   | 1.92 | 0.97× normal |
| ffn_up       | 1.10 / 0.94   | 1.34 | 1.42× small |
| ffn_silu     | 0.043 / 0.052 | 0.168 | 3.2× precursor |
| **ffn_swigl** | **0.061 / 0.071** | **1.222** | **17.2× anomaly** |
| ffn_out      | 0.345 / 0.216 | 11.459 | 53× cascade |

Gate/up individually normal at layer 3. Element-wise multiply at
inference.rs:163 `ffn_hidden.push(silu_g * u)` is the named bug
site (possibly off-by-one slice indexing).
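
For orientation, here is a minimal reference sketch of the operation under suspicion. It is not the code at `inference.rs:163`; the function names are hypothetical, and the use of population standard deviation is an assumption, since how `apr trace` computes its std is not stated here.

```rust
/// SiLU (swish) activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Reference SwiGLU hidden state: silu(gate[i]) * up[i], paired by index.
/// Hypothetical helper, not the APR implementation.
fn swiglu(gate: &[f32], up: &[f32]) -> Vec<f32> {
    assert_eq!(gate.len(), up.len(), "gate and up must have equal length");
    gate.iter().zip(up).map(|(&g, &u)| silu(g) * u).collect()
}

/// Population standard deviation (assumed; the trace may use a different estimator).
fn std_dev(xs: &[f32]) -> f32 {
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    var.sqrt()
}
```

Running `std_dev(&swiglu(gate, up))` on a layer's gate and up vectors gives a figure comparable in kind to the `ffn_swigl` column above, provided the estimator assumption holds.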

## Bug surface narrowing chain
- §15.4: GPU GQA kernel ELIMINATED
- §16: GPU stack ELIMINATED (CPU APR vs GGUF)
- §17: layer 3 FFN sub-block named (53× ffn_out)
- **§23: layer 3 ffn_swigl named (17× first anomaly site)**

## Falsifiable next investigation step (§23.6)

Extend `OwnedQuantizedModel::forward_traced` (the GGUF path; needs
to be authored per `project_ship_007_gguf_forward_traced_plan.md`)
with same 4 sub-FFN fields. Compare APR vs GGUF layer-3 ffn_swigl
directly:
- ≈0.07 → APR-side bug pinned to inference.rs:160-164
- ≈1.22 → spike is normal model behavior; bug elsewhere

## Evidence persisted
- evidence/ship-007-layer-3-anomaly/sub-ffn-bisection-2026-04-26.txt (386 lines)
- evidence/ship-007-layer-3-anomaly/sub-ffn-per-layer-stds.csv (28-layer × 6-field summary)

Spec v2.66.0 → v2.67.0. No coverage tally change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) April 26, 2026 18:02
@noahgift noahgift merged commit 211edea into main Apr 26, 2026
10 checks passed
@noahgift noahgift deleted the docs/ship-007-23-sub-ffn-bisection branch April 26, 2026 18:45
noahgift added a commit that referenced this pull request Apr 27, 2026
…irmed APR-side at inference.rs:160-164 — spec v2.71.0 → v2.72.0 (#1084)

Live evidence on noah-Lambda-Vector RTX 4090 2026-04-27.
Built apr from PR #1083 branch (commits 77c016b + c657968
+ f249464 from PR A+B+C cascade). Ran `apr trace --payload`
on canonical 7B teacher in BOTH formats with identical prompt
+ tokenizer.

Result:
| Layer | APR ffn_swigl std | GGUF ffn_swigl std | Ratio |
|------:|------------------:|-------------------:|------:|
| 3     | 1.2216            | 0.0670             | 18.23x |

§26.4 binding criterion threshold: ≥10x → APR-side bug.
**Observed 18.23x, about 1.8x the threshold: decisive verdict.**
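
The arithmetic behind the verdict can be sketched as below. The constant and function names are hypothetical; only the 10x threshold and the two std values come from the trace results above.

```rust
/// Threshold from §26.4: a ratio at or above this marks an APR-side bug.
const APR_SIDE_THRESHOLD: f64 = 10.0;

/// Ratio of the APR std to the GGUF std for the same layer and slot.
fn apr_to_gguf_ratio(apr_std: f64, gguf_std: f64) -> f64 {
    apr_std / gguf_std
}

/// True when the ratio meets the §26.4 binding criterion.
fn is_apr_side_bug(apr_std: f64, gguf_std: f64) -> bool {
    apr_to_gguf_ratio(apr_std, gguf_std) >= APR_SIDE_THRESHOLD
}
```

With the layer-3 values, `apr_to_gguf_ratio(1.2216, 0.0670)` is roughly 18.23, so `is_apr_side_bug` returns `true`.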

The investigation chain that started in §15.4 (GPU GQA
elimination) has reached its conclusion at §27:

§15.4 → §16 → §17 → §23 → §27 (this)
"Whole forward path" → "GPU eliminated" → "(layer=3, FFN sub-block)"
→ "(layer=3, ffn_swigl)" → "**APR-side at inference.rs:160-164**"

Cascade-damping signature confirmed:
- Layers 0-2: ratio ~1.1x (normal)
- Layer 3: 18.23x (anomaly)
- Layers 4-5: 3.3-4.5x (cascade)
- Layer 6+: ~1x (recovered)
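
One way to check this signature mechanically is sketched below. The 10x anomaly cutoff is the §26.4 threshold; the 2x "elevated" cutoff and all names are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum LayerStatus {
    Normal,
    Elevated,
    Anomaly,
}

const ANOMALY_RATIO: f64 = 10.0; // §26.4 threshold
const ELEVATED_RATIO: f64 = 2.0; // hypothetical cutoff for cascade layers

/// Bucket a per-layer APR/GGUF std ratio.
fn classify(ratio: f64) -> LayerStatus {
    if ratio >= ANOMALY_RATIO {
        LayerStatus::Anomaly
    } else if ratio >= ELEVATED_RATIO {
        LayerStatus::Elevated
    } else {
        LayerStatus::Normal
    }
}

/// Returns (first anomalous layer, first later layer back to Normal),
/// or None if there is no anomaly or the ratios never recover.
fn localized_spike(ratios: &[f64]) -> Option<(usize, usize)> {
    let spike = ratios
        .iter()
        .position(|&r| classify(r) == LayerStatus::Anomaly)?;
    let recovered = ratios[spike + 1..]
        .iter()
        .position(|&r| classify(r) == LayerStatus::Normal)?;
    Some((spike, spike + 1 + recovered))
}
```

Fed per-layer ratios shaped like the list above, `localized_spike` reports the spike layer and the recovery layer. A `None` result after an anomaly would instead point toward persistent residual-stream corruption.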

This is consistent with a localized perturbation (off-by-one,
buffer aliasing, or an F32-vs-Q4K dequant defect specific to
layer 3) rather than persistent residual-stream corruption.

Per §17.5, SHIP-007 fix discharges 5 MODEL-1 PARTIALs at once
(SHIP-002/005/006/007/008). §26.5 expected coverage flip: 33+12
→ 28+17 when fix lands.

§27 does NOT discharge by itself — it locates the bug for fixing.
Next investigation reads `inference.rs:160-164` and tests 4 hypotheses:
1. Off-by-one slice indexing
2. Buffer aliasing (scratch reuse pattern)
3. F32-vs-Q4K dequant defect at layer-3 input range
4. Activation overflow (SiLU saturation amplifies multiply)
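
Hypothesis 1 lends itself to a simple differential check against a reference output. The sketch below covers only that hypothesis, and both helper names are hypothetical. It assumes a candidate `ffn_swigl` vector and a trusted reference vector for the same layer are already available as `f32` slices.

```rust
/// Index of the first element where candidate and reference differ by more
/// than `tol`; a length mismatch reports the shorter length as the index.
fn first_mismatch(candidate: &[f32], reference: &[f32], tol: f32) -> Option<usize> {
    if candidate.len() != reference.len() {
        return Some(candidate.len().min(reference.len()));
    }
    candidate
        .iter()
        .zip(reference)
        .position(|(&c, &r)| (c - r).abs() > tol)
}

/// True when candidate[i] matches reference[i + 1] for every overlapping index,
/// which is the pattern an off-by-one read would produce.
fn looks_shifted_by_one(candidate: &[f32], reference: &[f32], tol: f32) -> bool {
    let n = candidate.len();
    n > 1
        && n == reference.len()
        && candidate[..n - 1]
            .iter()
            .zip(&reference[1..])
            .all(|(&c, &r)| (c - r).abs() <= tol)
}
```

A mismatch at index 0 combined with `looks_shifted_by_one` returning `true` would support the off-by-one hypothesis. A mismatch without the shift pattern leaves hypotheses 2 to 4 open.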

Methodology held throughout: zero eprintln!, zero route-arounds,
apr is canonical (§26.8), all instrumentation via `apr trace
--payload`. Lambda-labs lane pre-authorized.

Evidence persisted to evidence/ship-007-apr-vs-gguf-2026-04-27/:
- apr-trace.txt (13.5 KB)
- gguf-trace.txt (13.7 KB)
- binding-criterion-summary.json

Note: §27 reproduction requires PR #1081 + #1082 + #1083
cascade to merge first (the apr trace --payload <gguf> wiring
is in PR C). Evidence was generated with a local build of PR
#1083 branch.

Spec v2.71.0 → v2.72.0. Coverage flip pending fix.

Spec: SPEC-SHIP-TWO-001 §26.4 P3 verdict
References:
- §15.4 (PR #1062) — GPU GQA eliminated
- §16 (PR #1063) — APR CPU isolated
- §17 (PR #1064) — layer-3 FFN sub-block
- §23 (PR #1075) — layer-3 ffn_swigl named
- §26.8 (PR #1079) — apr-is-canonical methodology rule
- PR #1081 (P3 PR A scaffold)
- PR #1082 (P3 PR B sub-FFN populate)
- PR #1083 (P3 PR C CLI wiring)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>