docs(ship-007): §16 APR forward CPU path isolated as root cause by noahgift · Pull Request #1063 · paiml/aprender

noahgift · 2026-04-26T06:26:28Z

Summary

§16 of the SHIP-TWO-001 spec records a major SHIP-007 root-cause narrowing: apr trace --payload was run on the canonical 7B Qwen2.5-Coder teacher in BOTH formats (APR vs GGUF) with the same prompt "What is 2+2?", same encoded tokens, same embedded BPE tokenizer, and same CPU. APR forward returns top-1 token=220 (" "); GGUF forward returns " 2+2 is 4." (correct).
Combined with §15.4 (PR #1061, which ruled out the GPU GQA-7:1 attention kernel), this eliminates the GPU stack, GQA attention kernel, tokenizer, loader-side data layout, Q4K dequantization, RMSNorm, and embedding lookup. Surviving suspects are exclusively in the APR-format CPU forward path.
Spec v2.59.0 → v2.61.0 (skips v2.60.0, which is reserved for the #1062 conflict-merge). No coverage tally change: this is an investigation-recording amendment, not a rule promotion.

What §16 contains

§16.1: full live trace evidence with both run outputs side-by-side
§16.2: elimination table (7 suspects ruled out, with citations to #1059/#1061 and existing parity tests)
§16.3: surviving suspects (layer-composition glue in forward_single_with_scratch, multi-layer KV cache layout, RoPE setup, LM head)
§16.4: falsifiable next investigation step: apr trace --payload --layer 0 bisection across 28 layers (1-2 sessions, not multi-PR)
§16.5: methodological continuation: zero eprintln! per feedback_apr_trace_not_eprintln.md
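
The §16.4 bisection can be pictured as a binary search over layer indices. The following is a minimal sketch, not code from aprender: `first_bad_layer` and the `is_bad` predicate are hypothetical names, and the predicate stands in for whatever per-layer comparison of `apr trace --payload` output is actually used. It assumes that once a layer's output has diverged, every later layer also looks diverged; if divergence can recover in later layers, a linear scan from layer 0 is the safer choice.

```rust
/// Binary search for the first layer index in `0..num_layers` where `is_bad`
/// returns true. Assumes `is_bad` is monotonic (all false, then all true).
/// `is_bad` is a hypothetical stand-in for a per-layer trace comparison.
fn first_bad_layer<F>(num_layers: usize, is_bad: F) -> Option<usize>
where
    F: Fn(usize) -> bool,
{
    let (mut lo, mut hi) = (0usize, num_layers); // candidate range is [lo, hi)
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(mid) {
            hi = mid; // the first bad layer is at mid or earlier
        } else {
            lo = mid + 1; // the first bad layer is after mid
        }
    }
    if lo < num_layers { Some(lo) } else { None }
}

fn main() {
    // Illustrative predicate only: pretend every layer from 3 onward diverges.
    let found = first_bad_layer(28, |layer| layer >= 3);
    println!("first divergent layer: {:?}", found);
}
```

With 28 layers this needs at most five probes, which is in line with the "1-2 sessions" estimate, provided the monotonicity assumption holds.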

Why this matters

All 5 transitively-blocked MODEL-1 PARTIALs (SHIP-002/005/006/007/008) discharge once this single bug is fixed (per §15.7 blast-radius inventory). The next root-cause-fix PR is now much more focused: start with crates/aprender-serve/src/gguf/inference/forward/single_cache.rs and the APR-specific forward_single_with_scratch path.

Test plan

§16 added at end of spec, before END OF SPECIFICATION marker
Atomic-next-action banner updated v2.59.0 → v2.61.0
PMAT pre-commit gates pass (complexity, SATD, docs)
Investigation evidence is reproducible: apr trace <model.apr|.gguf> --payload

🤖 Generated with Claude Code

…cause - spec v2.59.0 → v2.61.0

Live `apr trace --payload` on the canonical paiml/qwen2.5-coder-7b-apache-q4k-v1 teacher (noah-Lambda-Vector RTX 4090, 2026-04-26) ran twice on CPU with the same prompt "What is 2+2?", same encoded tokens [3838, 374, 220, 17, 10, 17, 30], same embedded BPE tokenizer:

  APR teacher  → top-1 token=220 (" "), logit=16.7368  ← WRONG
  GGUF teacher → " 2+2 is 4."                           ← CORRECT

Combined with §15.4 (PR #1061: GPU GQA-7:1 attention parity tests all PASS), this eliminates: GPU stack, GQA attention kernel, tokenizer, loader-side data layout, Q4K dequantization, RMSNorm, embedding lookup.

Surviving suspects are all in the APR-format CPU forward path:
- Layer-composition glue in forward_single_with_scratch
- Multi-layer KV cache layout (across-layer indexing)
- Position embedding (RoPE) layout / sin/cos cache
- LM head projection

§16.4 specifies the falsifiable next investigation step: `apr trace --payload --layer 0` bisection across 28 layers. This is a 1-2 session task, not multi-PR.

Whatever fix lands also discharges all 5 transitively-blocked MODEL-1 PARTIALs (SHIP-002/005/006/007/008) per §15.7's blast-radius inventory.

Spec v2.59.0 → v2.61.0 (skips v2.60.0, which is reserved for the #1062 conflict-merge). No coverage tally change: investigation-recording amendment, not rule promotion.

Methodological continuation per feedback_apr_trace_not_eprintln.md: zero eprintln! added, exact same `apr trace --payload` primitive used in §15.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…t 17× anomaly site - spec v2.65.0 → v2.66.0

§17.4 specified the falsifier next step as sub-layer bisection of {ffn_gate_out, silu(g), silu(g)*u, ffn_down_out}. PR #1066 added the 4 new ActivationStats fields. §21 records the **first run of the bisection on the canonical 7B teacher**.

## What §21 contains (8 subsections)

- §21.1 Live trace command + 10-line per-layer block
- §21.2 Per-layer std table (28 layers × 6 fields)
- §21.3 The first divergent sub-FFN slot is **ffn_swigl** (17.2× layer 2; ffn_silu shows 3.2× precursor; ffn_out shows 53× cascade)
- §21.4 Why this matters: silu(g) and u are individually normal at layer 3, but their elementwise product is 17× the layer-2 value, which implies an unusual positive correlation or an alignment bug
- §21.5 Refined surviving suspect surface: element-wise multiply correctness (`inference.rs:163`) + off-by-one slice indexing as newly-named candidate
- §21.6 Falsifiable next step: GGUF-path sub-FFN telemetry, compare APR vs GGUF layer-3 ffn_swigl directly
- §21.7 What §21 is NOT (doesn't pin to a code line yet, depends on PR #1066 in cascade)
- §21.8 Methodological alignment (live-evidence pattern)

## Per-layer ffn_swigl progression (key data)

| Layer | ffn_swigl std | Note |
|------:|--------------:|:-----|
| 0 | 0.088 | |
| 1 | 0.061 | |
| 2 | 0.071 | |
| **3** | **1.222** | ← 17.2× layer 2 |
| 4 | 0.390 | |
| 5-25 | ~0.15-0.55 | |
| 26 | 1.452 | |
| 27 | 2.247 | |

Layer 3 stands out among the early and middle layers: for layers 0-2 and 4-25, ffn_swigl stays in the 0.06-0.55 band, so the 1.22 value is anomalous. (Only the final layers 26-27 also exceed that band.)

## Bug surface narrowing (across §15→§16→§17→§21)

- §15: candidate space = whole forward path
- §15.4: GPU GQA attention kernel ELIMINATED
- §16: GPU stack ELIMINATED (CPU APR vs CPU GGUF)
- §17: layer 3 FFN sub-block named (53× ffn_out spike)
- **§21: layer 3 ffn_swigl named** (17× spike, first anomaly site)

The fix surface is now: `inference.rs:160-164`, specifically the `ffn_hidden.push(silu_g * u)` element-wise multiply.

Spec v2.65.0 → v2.66.0. No coverage tally change: investigation-recording, not a discharge.

Evidence persisted to:
- evidence/ship-007-layer-3-anomaly/sub-ffn-bisection-2026-04-26.txt (386 lines)
- evidence/ship-007-layer-3-anomaly/sub-ffn-per-layer-stds.csv

Stacks under #1070 (§20), which is under #1068 (§19), which is under #1067 (§18), which is under #1064 (§17), which is under #1063 (§16).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
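
To make the §21 quantities concrete, here is a hedged sketch of the SwiGLU element-wise product and of a per-layer standard deviation. It is not the code at `inference.rs:160-164`; `silu`, `swiglu`, and `std_dev` are illustrative names, and using the population standard deviation is an assumption about how the trace statistic is defined. The length check is included because a misaligned gate/up slice pair is one of the suspects named in the message.

```rust
/// SiLU activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Element-wise SwiGLU product: silu(gate[i]) * up[i].
fn swiglu(gate: &[f32], up: &[f32]) -> Vec<f32> {
    // A length mismatch would silently truncate in `zip`, so fail loudly.
    assert_eq!(gate.len(), up.len(), "gate and up must have the same length");
    gate.iter().zip(up).map(|(&g, &u)| silu(g) * u).collect()
}

/// Population standard deviation (assumed definition of the per-layer stat).
fn std_dev(xs: &[f32]) -> f32 {
    if xs.is_empty() {
        return 0.0;
    }
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    var.sqrt()
}

fn main() {
    // Toy inputs, not trace data.
    let hidden = swiglu(&[0.0, 1.0, -1.0], &[2.0, 2.0, 2.0]);
    println!("swiglu = {:?}, std = {:.4}", hidden, std_dev(&hidden));

    // Ratio of the layer-3 to layer-2 ffn_swigl std reported in the message.
    println!("layer 3 / layer 2 = {:.1}x", 1.222_f32 / 0.071_f32);
}
```

The last line reproduces the 17.2× figure from the reported values 1.222 and 0.071.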

…irmed APR-side at inference.rs:160-164 - spec v2.71.0 → v2.72.0 (#1084)

Live evidence on noah-Lambda-Vector RTX 4090, 2026-04-27. Built apr from the PR #1083 branch (commits 77c016b + c657968 + f249464 from the PR A+B+C cascade). Ran `apr trace --payload` on the canonical 7B teacher in BOTH formats with identical prompt + tokenizer. Result:

| Layer | APR ffn_swigl std | GGUF ffn_swigl std | Ratio |
|------:|------------------:|-------------------:|------:|
| 3 | 1.2216 | 0.0670 | 18.23x |

§26.4 binding criterion threshold: ≥10x → APR-side bug. **Observed 18.23x, which exceeds the 10x threshold by 8.23: decisive verdict.**

The investigation chain that started in §15.4 (GPU GQA elimination) has reached its conclusion at §27:

  §15.4 → §16 → §17 → §23 → §27 (this)
  "Whole forward path" → "GPU eliminated" → "(layer=3, FFN sub-block)" → "(layer=3, ffn_swigl)" → "**APR-side at inference.rs:160-164**"

Cascade-damping signature confirmed:
- Layers 0-2: ratio ~1.1x (normal)
- Layer 3: 18.23x (anomaly)
- Layers 4-5: 3.3-4.5x (cascade)
- Layer 6+: ~1x (recovered)

This is consistent with a localized perturbation (off-by-one, buffer aliasing, or an F32-vs-Q4K dequant defect specifically at layer 3) rather than persistent residual-stream corruption.

Per §17.5, the SHIP-007 fix discharges 5 MODEL-1 PARTIALs at once (SHIP-002/005/006/007/008). §26.5 expected coverage flip: 33+12 → 28+17 when the fix lands. §27 does NOT discharge by itself; it locates the bug for fixing.

Next investigation reads `inference.rs:160-164` and tests 4 hypotheses:
1. Off-by-one slice indexing
2. Buffer aliasing (scratch reuse pattern)
3. F32-vs-Q4K dequant defect at layer-3 input range
4. Activation overflow (SiLU saturation amplifies multiply)

Methodology held throughout: zero eprintln!, zero route-arounds, apr is canonical (§26.8), all instrumentation via `apr trace --payload`. Lambda-labs lane pre-authorized.

Evidence persisted to evidence/ship-007-apr-vs-gguf-2026-04-27/:
- apr-trace.txt (13.5 KB)
- gguf-trace.txt (13.7 KB)
- binding-criterion-summary.json

Note: §27 reproduction requires the PR #1081 + #1082 + #1083 cascade to merge first (the apr trace --payload <gguf> wiring is in PR C). Evidence was generated with a local build of the PR #1083 branch.

Spec v2.71.0 → v2.72.0. Coverage flip pending fix.

Spec: SPEC-SHIP-TWO-001 §26.4 P3 verdict

References:
- §15.4 (PR #1062): GPU GQA eliminated
- §16 (PR #1063): APR CPU isolated
- §17 (PR #1064): layer-3 FFN sub-block
- §23 (PR #1075): layer-3 ffn_swigl named
- §26.8 (PR #1079): apr-is-canonical methodology rule
- PR #1081 (P3 PR A scaffold)
- PR #1082 (P3 PR B sub-FFN populate)
- PR #1083 (P3 PR C CLI wiring)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
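
The §26.4 binding criterion reduces to a single ratio test. A minimal sketch under stated assumptions: `Verdict` and `binding_verdict` are hypothetical names, not aprender APIs, and since the message only says what a ratio of at least 10x means, the below-threshold case is labeled generically here.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Ratio met the threshold: per §26.4, the bug is on the APR side.
    AprSideBug,
    /// Ratio below the threshold: this criterion alone assigns no blame.
    BelowThreshold,
}

/// Returns the APR/GGUF std ratio and the verdict for the given threshold.
fn binding_verdict(apr_std: f64, gguf_std: f64, threshold: f64) -> (f64, Verdict) {
    let ratio = apr_std / gguf_std;
    let verdict = if ratio >= threshold {
        Verdict::AprSideBug
    } else {
        Verdict::BelowThreshold
    };
    (ratio, verdict)
}

fn main() {
    // Layer-3 ffn_swigl std values reported in the commit message.
    let (ratio, verdict) = binding_verdict(1.2216, 0.0670, 10.0);
    println!("ratio = {:.2}x, verdict = {:?}", ratio, verdict);
}
```

Run on the reported layer-3 values, the ratio rounds to 18.23x and falls on the APR-side branch, matching the verdict stated in the message.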

noahgift enabled auto-merge (squash) April 26, 2026 06:26

This was referenced Apr 26, 2026

docs(ship-007): §17 layer-3 ffn_out anomaly identified — first divergent layer named #1064

Merged

contract(trace-ffn-sub-block-v1): pre-commit schema for sub-FFN telemetry (SHIP-007 load-bearing) #1065

Merged

noahgift force-pushed the docs/ship-007-apr-forward-isolation branch from f1f01aa to dcb6124 Compare April 26, 2026 07:41

noahgift mentioned this pull request Apr 26, 2026

docs(ship-two-001): §18 training status snapshot as chain-of-thought #1067

Closed

4 tasks

noahgift added 2 commits April 26, 2026 11:43

Merge branch 'main' into docs/ship-007-apr-forward-isolation

b868077

Merge branch 'main' into docs/ship-007-apr-forward-isolation

c71f79c

noahgift mentioned this pull request Apr 26, 2026

docs(ship-007): §21 sub-FFN bisection — layer-3 ffn_swigl first 17× anomaly site (v2.66.0) #1072

Closed

4 tasks

noahgift merged commit 71c1152 into main Apr 26, 2026
10 checks passed

noahgift deleted the docs/ship-007-apr-forward-isolation branch April 26, 2026 11:04

noahgift merged 3 commits into main from docs/ship-007-apr-forward-isolation
