docs(ship-007): §17 layer-3 ffn_out anomaly identified — first divergent layer named by noahgift · Pull Request #1064 · paiml/aprender

noahgift · 2026-04-26T06:36:24Z

Summary

§17 of the SHIP-TWO-001 spec records the result of the first iteration of the §16.4 falsifier. The APR teacher's apr trace --payload already emits per-layer mean/std for all 28 transformer blocks. Examining the progression revealed a 31× discontinuity at layer 3: ffn_out std=11.46 (vs layer 2 std=0.22 and a layer 4-26 median of 0.5–2.0).
Three signals point at layer 3 ffn_out specifically: (a) the 31× magnitude isn't architecture-driven (SHIP-003 PR #1059, the apr diff 339-tensor cosine sweep, proved the underlying weights byte-equivalent to SafeTensors); (b) it damps within 1 layer (a one-off perturbation, not a stable feature); (c) the mean shift of -0.082 is 100× the median magnitude, which suggests a sign-bias defect.
§17.3 narrows §16.3's surviving suspects to layer-composition glue in forward_single_with_scratch at the layer 3 FFN ("most likely"), plus 3 new suspects: Q4K dequant under load on the 18944-dim FFN; SiLU numerical stability under SwiGLU; and a fused gate+up matvec dispatch defect.
Spec v2.61.0 → v2.62.0. No coverage tally change.
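
The discontinuity check described in this summary can be sketched as a scan over consecutive per-layer std values. This is a minimal illustration under stated assumptions, not code from the repository: `first_std_jump` and its `factor` parameter are hypothetical names, and the only inputs used are the rounded layer 2-4 ffn_out stds quoted in this PR.

```rust
/// Hypothetical helper: returns the first layer whose std exceeds `factor`
/// times the std of the layer listed immediately before it, together with
/// the observed ratio. `stds` holds (layer index, std) pairs in layer order.
fn first_std_jump(stds: &[(usize, f64)], factor: f64) -> Option<(usize, f64)> {
    stds.windows(2).find_map(|w| {
        let (_, prev) = w[0];
        let (layer, cur) = w[1];
        let ratio = cur / prev;
        // Skip pairs where the previous std is zero or negative, since the
        // ratio would not be meaningful there.
        (prev > 0.0 && ratio > factor).then_some((layer, ratio))
    })
}
```

With the rounded values 0.22, 11.46, and 3.84 for layers 2, 3, and 4 and a factor of 10, the scan reports layer 3 with a ratio near 52.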

What §17 contains

§17.1 — full 28-layer ffn_out / output std table with the layer-3 spike highlighted
§17.2 — 3-signal argument for why layer 3 is suspect (not surprise)
§17.3 — refined surviving suspect surface (4 §16.3 candidates + 3 new §17.3 candidates)
§17.4 — falsifiable next investigation step: sub-layer bisection of gate_proj_out / silu(up_proj_out) / down_proj_out (requires §15.5 TraceStep enum extension — now load-bearing)
§17.5 — re-confirms §16.2's 7-suspect elimination (none are layer-specific)
§17.6 — methodological continuation: third re-use of apr trace --payload primitive without modification

Why this matters

The bug surface for SHIP-007 is now a single layer index (3) and a single sub-block (FFN), narrowed from §16.3's 28×4 candidate space. The next root-cause fix PR is much more focused than §16's "APR forward CPU path" surface.

Stacks under

docs(ship-007): §16 APR forward CPU path isolated as root cause #1063 (§16 — APR CPU forward path isolation)
Both auto-merge ready

Test plan

§17 added at end of spec, before END OF SPECIFICATION marker
Atomic-next-action banner updated v2.61.0 → v2.62.0
PMAT pre-commit gates pass (complexity, SATD, docs)
Investigation evidence reproducible: apr trace <model.apr> --payload emits the per-layer stats used in §17.1
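
For readers unfamiliar with the per-layer statistics, the following sketch shows one way a mean/std summary of an activation tensor could be computed. It is an assumption-laden illustration: `TensorStats` and `tensor_stats` are hypothetical names rather than the crate's types, and the choice of population (not sample) standard deviation is a guess, since the PR does not say which one apr trace reports.

```rust
/// Hypothetical per-tensor summary; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TensorStats {
    mean: f32,
    std: f32,
}

/// Computes the mean and population standard deviation of `values`,
/// accumulating in f64 to limit rounding error on large tensors.
/// Returns `None` for an empty slice.
fn tensor_stats(values: &[f32]) -> Option<TensorStats> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().map(|&v| f64::from(v)).sum::<f64>() / n;
    let variance = values
        .iter()
        .map(|&v| {
            let d = f64::from(v) - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    Some(TensorStats {
        mean: mean as f32,
        std: variance.sqrt() as f32,
    })
}
```

Accumulating in f64 is a design choice for the sketch: an 18944-element FFN activation summed in f32 can lose precision that would blur small mean shifts such as the -0.082 discussed above.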

🤖 Generated with Claude Code

….0 → v2.62.0

Executed §16.4's first iteration ("apr trace --payload --layer 0 on both APR and GGUF teachers, bisect through 28 layers") against the APR teacher's existing per-layer telemetry. The full 28-layer ffn_out std progression on paiml/qwen2.5-coder-7b-apache-q4k-v1 (prompt "What is 2+2?") shows a 31× discontinuity at layer 3:

    Layer 2: ffn_out std=0.22
    Layer 3: ffn_out std=11.46   ← 31× spike
    Layer 4: ffn_out std=3.84    ← damps in 1 layer (one-off perturbation)
    Median:  ffn_out std=0.5–2.0

The residual stream's output std jumps 0.72 → 11.78 at layer 3 and stays elevated.

Three signals point at layer 3 ffn_out specifically:
(a) magnitude 31× isn't architecture-driven (SHIP-003 PR #1059's 339-tensor cosine sweep proved underlying weights are byte-equivalent to SafeTensors);
(b) damps in one layer (one-off perturbation pattern, not stable feature);
(c) mean shift -0.082 is 100× median magnitude, suggesting sign-bias defect not magnitude defect.

§17.3 narrows §16.3's four candidates: layer-composition glue in forward_single_with_scratch at layer 3 FFN is "most likely". Three new §17.3 candidates added: Q4K dequant under load on 18944-dim FFN; SiLU numerical stability under SwiGLU `gate * silu(up)`; fused gate+up matvec dispatch defect (per CLAUDE.md FFN section).

§17.4 specifies sub-layer bisection: emit gate_proj_out, silu(up_proj_out), gate_proj_out * silu(up_proj_out), down_proj_out separately. Whichever sub-tensor first shows the 31× std discontinuity vs GGUF path is the bug site. This requires the §15.5 TraceStep enum extension, which is now load-bearing for the fix.

Spec v2.61.0 → v2.62.0. No coverage tally change.

Methodologically: zero eprintln!, zero bash workarounds, third re-use of `apr trace --payload` primitive without modification (after §15 and §16). Per feedback_apr_trace_not_eprintln.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Captures the §17 falsifier evidence as raw artifacts:

    evidence/ship-007-layer-3-anomaly/
    ├── apr-trace-payload-7b-2026-04-26.txt    # 274 lines, all 28 layers
    ├── gguf-trace-payload-7b-2026-04-26.txt   # 34 lines, final decode only
    └── discharge-evidence-v1.json             # JSON summary

Precise measurement: layer-3 ffn_out std = 11.459 / layer-2 ffn_out std = 0.216 → 53× spike (§17 stated 31×; actual ratio is even more extreme).

The output residual stream's std jumps 0.7159 (layer 2) → 11.7756 (layer 3) → 25+ (layers 9-19) and never recovers below 13.

This matches the realizar/aprender-serve CLAUDE.md FFN verification checklist note: "Verify FFN output doesn't cause catastrophic cancellation". The layer-3 spike IS that catastrophic cancellation pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…t 17× anomaly site — spec v2.65.0 → v2.66.0

§17.4 specified the falsifier next step as sub-layer bisection of {ffn_gate_out, silu(g), silu(g)*u, ffn_down_out}. PR #1066 added the 4 new ActivationStats fields. §21 records the **first run of the bisection on the canonical 7B teacher**.

## What §21 contains (8 subsections)

- §21.1 Live trace command + 10-line per-layer block
- §21.2 Per-layer std table (28 layers × 6 fields)
- §21.3 The first divergent sub-FFN slot is **ffn_swigl** (17.2× layer 2; ffn_silu shows 3.2× precursor; ffn_out shows 53× cascade)
- §21.4 Why this matters: silu(g) and u are individually normal at layer 3, but their elementwise product is 17× larger, which implies an unusual positive correlation or an alignment bug
- §21.5 Refined surviving suspect surface: element-wise multiply correctness (`inference.rs:163`) + off-by-one slice indexing as newly-named candidate
- §21.6 Falsifiable next step: GGUF-path sub-FFN telemetry, compare APR vs GGUF layer-3 ffn_swigl directly
- §21.7 What §21 is NOT (doesn't pin to a code line yet, depends on PR #1066 in cascade)
- §21.8 Methodological alignment (live-evidence pattern)

## Per-layer ffn_swigl progression (key data)

| Layer | ffn_swigl std | Note |
|------:|--------------:|:-----|
| 0 | 0.088 | |
| 1 | 0.061 | |
| 2 | 0.071 | |
| **3** | **1.222** | ← 17.2× layer 2 |
| 4 | 0.390 | |
| 5-25 | ~0.15-0.55 | |
| 26 | 1.452 | |
| 27 | 2.247 | |

Layer 3 stands out specifically: among layers 0-25, both above and below it, ffn_swigl is in the 0.06-0.55 band. The 1.22 value is anomalous.

## Bug surface narrowing (across §15→§16→§17→§21)

- §15: candidate space = whole forward path
- §15.4: GPU GQA attention kernel ELIMINATED
- §16: GPU stack ELIMINATED (CPU APR vs CPU GGUF)
- §17: layer 3 FFN sub-block named (53× ffn_out spike)
- **§21: layer 3 ffn_swigl named** (17× spike, first anomaly site)

The fix surface is now: `inference.rs:160-164`, specifically the `ffn_hidden.push(silu_g * u)` element-wise multiply.

Spec v2.65.0 → v2.66.0. No coverage tally change: this is investigation-recording, not a discharge.

Evidence persisted to:
- evidence/ship-007-layer-3-anomaly/sub-ffn-bisection-2026-04-26.txt (386 lines)
- evidence/ship-007-layer-3-anomaly/sub-ffn-per-layer-stds.csv

Stacks under #1070 (§20) which is under #1068 (§19) which is under #1067 (§18) which is under #1064 (§17) which is under #1063 (§16).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
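
The sub-FFN bisection above compares the spread of each SwiGLU stage separately. The sketch below shows the idea on plain slices, assuming the `silu(g) * u` ordering named in this commit message. All names (`silu`, `std_dev`, `SwigluStageStd`, `swiglu_stage_std`) are hypothetical and are not taken from `inference.rs`; the projections that produce `gate` and `up` are assumed to have run already.

```rust
/// SiLU activation: x * sigmoid(x), written as x / (1 + e^(-x)).
/// For large negative x the denominator overflows to infinity and the
/// result is -0.0 rather than NaN.
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Population standard deviation, accumulated in f64 (0.0 for empty input).
fn std_dev(values: &[f32]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let n = values.len() as f64;
    let mean = values.iter().map(|&v| f64::from(v)).sum::<f64>() / n;
    let var = values
        .iter()
        .map(|&v| (f64::from(v) - mean).powi(2))
        .sum::<f64>()
        / n;
    var.sqrt()
}

/// Hypothetical record of the std at each SwiGLU stage.
#[derive(Debug)]
struct SwigluStageStd {
    gate: f64,
    silu_gate: f64,
    up: f64,
    product: f64,
}

/// Computes the std of gate, silu(gate), up, and silu(gate) * up.
/// Returns `None` when the inputs are empty or differ in length.
fn swiglu_stage_std(gate: &[f32], up: &[f32]) -> Option<SwigluStageStd> {
    if gate.is_empty() || gate.len() != up.len() {
        return None;
    }
    let silu_gate: Vec<f32> = gate.iter().map(|&g| silu(g)).collect();
    let product: Vec<f32> = silu_gate.iter().zip(up).map(|(&s, &u)| s * u).collect();
    Some(SwigluStageStd {
        gate: std_dev(gate),
        silu_gate: std_dev(&silu_gate),
        up: std_dev(up),
        product: std_dev(&product),
    })
}
```

Reporting the four values side by side is what lets a reader see the pattern described in §21.4: inputs with ordinary spread whose element-wise product has a much larger one.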

…irmed APR-side at inference.rs:160-164 — spec v2.71.0 → v2.72.0 (#1084)

Live evidence on noah-Lambda-Vector RTX 4090 2026-04-27. Built apr from PR #1083 branch (commits 77c016b + c657968 + f249464 from PR A+B+C cascade). Ran `apr trace --payload` on canonical 7B teacher in BOTH formats with identical prompt + tokenizer.

Result:

| Layer | APR ffn_swigl std | GGUF ffn_swigl std | Ratio |
|------:|------------------:|-------------------:|------:|
| 3 | 1.2216 | 0.0670 | 18.23x |

§26.4 binding criterion threshold: ≥10x → APR-side bug. **Observed 18.23x, which is 8.23x above the threshold: a decisive verdict.**

The investigation chain that started in §15.4 (GPU GQA elimination) has reached its conclusion at §27:

    §15.4 → §16 → §17 → §23 → §27 (this)
    "Whole forward path" → "GPU eliminated" → "(layer=3, FFN sub-block)" → "(layer=3, ffn_swigl)" → "**APR-side at inference.rs:160-164**"

Cascade-damping signature confirmed:
- Layers 0-2: ratio ~1.1x (normal)
- Layer 3: 18.23x (anomaly)
- Layers 4-5: 3.3-4.5x (cascade)
- Layer 6+: ~1x (recovered)

This is consistent with a localized perturbation (off-by-one, buffer aliasing, or F32-vs-Q4K dequant defect at layer 3 specifically) rather than persistent residual-stream corruption.

Per §17.5, SHIP-007 fix discharges 5 MODEL-1 PARTIALs at once (SHIP-002/005/006/007/008). §26.5 expected coverage flip: 33+12 → 28+17 when fix lands.

§27 does NOT discharge by itself; it locates the bug for fixing. Next investigation reads `inference.rs:160-164` and tests 4 hypotheses:
1. Off-by-one slice indexing
2. Buffer aliasing (scratch reuse pattern)
3. F32-vs-Q4K dequant defect at layer-3 input range
4. Activation overflow (SiLU saturation amplifies multiply)

Methodology held throughout: zero eprintln!, zero route-arounds, apr is canonical (§26.8), all instrumentation via `apr trace --payload`. Lambda-labs lane pre-authorized.

Evidence persisted to evidence/ship-007-apr-vs-gguf-2026-04-27/:
- apr-trace.txt (13.5 KB)
- gguf-trace.txt (13.7 KB)
- binding-criterion-summary.json

Note: §27 reproduction requires PR #1081 + #1082 + #1083 cascade to merge first (the apr trace --payload <gguf> wiring is in PR C). Evidence was generated with a local build of PR #1083 branch.

Spec v2.71.0 → v2.72.0. Coverage flip pending fix.

Spec: SPEC-SHIP-TWO-001 §26.4 P3 verdict

References:
- §15.4 (PR #1062): GPU GQA eliminated
- §16 (PR #1063): APR CPU isolated
- §17 (PR #1064): layer-3 FFN sub-block
- §23 (PR #1075): layer-3 ffn_swigl named
- §26.8 (PR #1079): apr-is-canonical methodology rule
- PR #1081 (P3 PR A scaffold)
- PR #1082 (P3 PR B sub-FFN populate)
- PR #1083 (P3 PR C CLI wiring)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
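
The §26.4 binding criterion quoted above reduces to a ratio test between the APR and GGUF stds for the same layer and slot. A minimal sketch follows, assuming the rule is exactly "ratio ≥ threshold means APR-side". `Verdict` and `binding_verdict` are hypothetical names, and what the spec concludes below the threshold is not stated in this PR, so that arm is labeled only as "below threshold".

```rust
/// Hypothetical outcome of the ratio test; the spec's own wording for the
/// below-threshold case is not given in this PR.
#[derive(Debug, PartialEq)]
enum Verdict {
    AprSideAnomaly,
    BelowThreshold,
}

/// Compares APR and GGUF stds for one (layer, slot) pair.
/// Returns the ratio and the verdict, or `None` when the GGUF std is not a
/// positive number or the APR std is not finite.
fn binding_verdict(apr_std: f64, gguf_std: f64, threshold: f64) -> Option<(f64, Verdict)> {
    if gguf_std.is_nan() || gguf_std <= 0.0 || !apr_std.is_finite() {
        return None;
    }
    let ratio = apr_std / gguf_std;
    let verdict = if ratio >= threshold {
        Verdict::AprSideAnomaly
    } else {
        Verdict::BelowThreshold
    };
    Some((ratio, verdict))
}
```

Applied to the layer-3 ffn_swigl values in the table (1.2216 and 0.0670) with a threshold of 10, the ratio comes out near 18.23 and the verdict is the APR-side arm.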

noahgift enabled auto-merge (squash) April 26, 2026 06:36

noahgift mentioned this pull request Apr 26, 2026

contract(trace-ffn-sub-block-v1): pre-commit schema for sub-FFN telemetry (SHIP-007 load-bearing) #1065

Merged

4 tasks

noahgift force-pushed the docs/ship-007-layer-3-ffn-out-anomaly branch from 71753f2 to 80010cf Compare April 26, 2026 07:41

This was referenced Apr 26, 2026

docs(ship-two-001): §18 training status snapshot as chain-of-thought #1067

Closed

docs(ship-007): §21 sub-FFN bisection — layer-3 ffn_swigl first 17× anomaly site (v2.66.0) #1072

Closed

noahgift and others added 2 commits April 26, 2026 13:13

noahgift force-pushed the docs/ship-007-layer-3-ffn-out-anomaly branch from 2162c0e to ac06497 Compare April 26, 2026 11:13

Merge branch 'main' into docs/ship-007-layer-3-ffn-out-anomaly

d9ead49

noahgift merged commit 1dd6285 into main Apr 26, 2026
10 checks passed

noahgift deleted the docs/ship-007-layer-3-ffn-out-anomaly branch April 26, 2026 12:35

noahgift merged 3 commits into main from docs/ship-007-layer-3-ffn-out-anomaly
