docs(ship-two-001): §30 — PR E investigation refutes §28 narrow hypothesis — v2.74.0 → v2.75.0 by noahgift · Pull Request #1088 · paiml/aprender

noahgift · 2026-04-27T13:32:40Z

Summary

A live diagnostic on an RTX 4090 against the canonical 7B teacher refutes §28's narrow root-cause hypothesis for SHIP-007. Two empirical findings show that the proposed PR E fix would NOT close the 9× layer-0 qkv divergence:

- q4k_layers IS fully populated for all 28 layers, so §28.4's option (a) ("preserve Q4K bytes") is already shipped.
- APR's F32 fused-qkv weight ≡ Q4K dispatch within Q4K rounding (max |diff|=0.005, RMS=0.0007 on the layer-0 Q-projection). Switching the matmul kernel would change <0.5% of std.
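The second finding rests on an element-wise comparison between APR's F32 fused-qkv weight and the Q4K-dequantized weight. A minimal sketch of the two statistics (max |diff| and RMS diff) follows; `diff_stats` is a hypothetical helper, not the code in diag_apr_qkv_layer0.rs, and the arrays in `main` are toy values rather than the real layer-0 Q-projection.

```rust
/// Hypothetical helper: element-wise max |a - b| and RMS of (a - b)
/// for two weight buffers of identical shape.
fn diff_stats(a: &[f32], b: &[f32]) -> (f32, f32) {
    assert_eq!(a.len(), b.len(), "buffers must have the same length");
    assert!(!a.is_empty(), "buffers must be non-empty");
    let mut max_abs = 0.0_f32;
    // Accumulate squares in f64 to limit rounding over millions of weights.
    let mut sum_sq = 0.0_f64;
    for (&x, &y) in a.iter().zip(b.iter()) {
        let d = (x - y).abs();
        max_abs = max_abs.max(d);
        sum_sq += f64::from(d) * f64::from(d);
    }
    let rms = (sum_sq / a.len() as f64).sqrt() as f32;
    (max_abs, rms)
}

fn main() {
    // Toy stand-ins: an F32 reference row and a Q4K-dequantized copy of it.
    let f32_weight = [0.10_f32, -0.20, 0.30, -0.40];
    let q4k_dequant = [0.10_f32, -0.21, 0.30, -0.40];
    let (max_abs, rms) = diff_stats(&f32_weight, &q4k_dequant);
    println!("max |diff| = {max_abs:.6}, RMS diff = {rms:.6}");
}
```

Reporting both numbers matters: max |diff| bounds the worst single weight, while RMS diff describes the typical perturbation that a matmul would actually propagate.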

The 9× layer-0 qkv std gap (APR=10.33 vs GGUF=1.14) is REAL but lives upstream of the gate matmul. Three candidate sites for §30.4 bisection: qkv_bias add, RoPE precision, per-head Q/K RMSNorm.
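The 9× figure is a ratio of standard deviations of the same layer-0 qkv tensor captured from the two forward paths. The sketch below assumes population std (divide by n); the diagnostic's exact convention is not stated here, and `mean_std` / `std_ratio` are illustrative names.

```rust
/// Mean and population standard deviation (divides by n) of an activation tensor.
fn mean_std(x: &[f32]) -> (f64, f64) {
    assert!(!x.is_empty(), "tensor must be non-empty");
    let n = x.len() as f64;
    let mean = x.iter().map(|&v| f64::from(v)).sum::<f64>() / n;
    let var = x.iter().map(|&v| (f64::from(v) - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

/// APR-to-GGUF std ratio for the same tensor captured from both forward paths.
fn std_ratio(apr: &[f32], gguf: &[f32]) -> f64 {
    mean_std(apr).1 / mean_std(gguf).1
}

fn main() {
    // The PR's layer-0 qkv summary statistics: APR std 10.33, GGUF std 1.14.
    let reported = 10.33_f64 / 1.14;
    println!("reported layer-0 qkv std ratio: {reported:.2}x"); // 9.06x

    // Toy tensors: the "APR" copy is the "GGUF" copy scaled by 3.
    let gguf = [-1.0_f32, 0.0, 1.0, 2.0];
    let apr: Vec<f32> = gguf.iter().map(|v| v * 3.0).collect();
    println!("toy ratio: {:.2}x", std_ratio(&apr, &gguf)); // 3.00x
}
```

`mean_std` also returns the mean, which is the statistic behind the qkv_bias mean-shift observation; a uniform scale changes the std ratio, while an additive offset shows up only in the mean.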

PR E paused. Spec v2.74.0 → v2.75.0. Coverage scoreboard unchanged (15+33).

Files

Spec: §30 added (~80 lines)
- 30.1 Diagnostic evidence
- 30.2 What §28 got right and got wrong
- 30.3 What's still load-bearing
- 30.4 Falsifiable next investigation step
- 30.5 Coverage scoreboard (unchanged)
- 30.6 Methodology note — investigative falsification IS the discharge
Diagnostics (re-runnable on noah-Lambda-Vector):
- crates/aprender-serve/examples/check_q4k_population.rs
- crates/aprender-serve/examples/diag_apr_qkv_layer0.rs
Evidence (live RTX 4090 output):
- evidence/ship-007-pr-e-investigation-2026-04-27/findings.md
- evidence/ship-007-pr-e-investigation-2026-04-27/check_q4k_population.txt
- evidence/ship-007-pr-e-investigation-2026-04-27/diag_apr_qkv_layer0.txt

Methodology

Per feedback_fix_root_cause_never_route_around.md: the §28 mechanical fix would have routed around a real bug, because the named site (the matmul kernel) is not where the divergence originates. This empirical refutation is itself a coverage-protecting artifact: it prevents a no-op fix from shipping. The Toyota Way is to bisect upstream, not to flip the kernel call.

Test plan

- pv validate continues to pass for unchanged contracts
- Diagnostic examples build with cargo build --release -p aprender-serve --example check_q4k_population --example diag_apr_qkv_layer0
- Both diagnostics ran live on noah-Lambda-Vector RTX 4090; output captured in evidence/
- Spec v2.75.0 self-consistent with §29 scoreboard

Next session

Per §30.4, capture the qkv tensor at three points (post-matmul, post-bias, post-RoPE) for layer 0 in BOTH the APR and GGUF forward paths. Whichever bisection point first shows the 9× std gap is the actual fix surface. Then PR E can be written.
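The bisection amounts to walking the three capture points in forward order and stopping at the first one where the APR/GGUF std ratio leaves a tolerance band. A sketch, assuming per-stage std values have already been captured from both forward paths; `Stage`, `first_divergent`, and the 2.0 tolerance are hypothetical, and the numbers in `main` are synthetic placeholders chosen only to exercise the search, not measurements.

```rust
/// Capture points in the layer-0 qkv path, in forward order (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    PostMatmul,
    PostBias,
    PostRope,
}

/// Walks the stages in forward order and returns the first one whose
/// APR/GGUF std ratio falls outside [1/tol, tol]. Later stages inherit an
/// upstream divergence, so only the first crossing locates the fix surface.
fn first_divergent(stages: &[(Stage, f64, f64)], tol: f64) -> Option<Stage> {
    for &(stage, apr_std, gguf_std) in stages {
        let ratio = apr_std / gguf_std;
        if ratio > tol || ratio < 1.0 / tol {
            return Some(stage);
        }
    }
    None
}

fn main() {
    // Synthetic (stage, apr_std, gguf_std) triples; not measured values.
    let stages = [
        (Stage::PostMatmul, 1.0, 1.0),
        (Stage::PostBias, 1.1, 1.0),
        (Stage::PostRope, 5.0, 1.0),
    ];
    // Hypothetical tolerance: flag anything beyond a 2x std ratio either way.
    match first_divergent(&stages, 2.0) {
        Some(stage) => println!("first divergent stage: {stage:?}"),
        None => println!("no stage exceeds the tolerance"),
    }
}
```

Checking both directions of the ratio keeps the test symmetric, so a path that collapses variance is flagged just like one that inflates it.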

🤖 Generated with Claude Code

…hesis — spec v2.74.0 → v2.75.0

Live diagnostic on noah-Lambda-Vector RTX 4090 against the canonical 7B teacher (/mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr) shows that §28's mechanical "replace helpers::f32_matmul with Q4K-fused dispatch" fix would NOT close the 9× layer-0 qkv std gap that propagates to layer 3's 18.23× ffn_swigl ratio.

Two empirical findings refute the narrow hypothesis:

1. q4k_layers IS fully populated for all 28 layers (Q=7,225,344 bytes per layer, K=V=1,032,192 each, gate=up=down=38,191,104 each), so §28.4 option (a) is already shipped.
2. APR's F32 fused-qkv weight (load_qkv_weight at mod_dequant_q4k_apr.rs:181) is numerically equivalent to per-Q/K/V Q4K dispatch within Q4K rounding: max |diff|=0.005294, RMS diff=0.000673 on the layer-0 Q-projection. The matmul kernel switch in §28 would change <0.5% of std.

The 9× layer-0 qkv divergence (APR=10.33 vs GGUF=1.14) is REAL but lives upstream of the gate matmul. Three candidate sites for §30.4 bisection:
- qkv_bias add at pmat-260.rs:332-334 (mean shift APR=0.2559 vs GGUF=-0.0163 is suggestive)
- RoPE precision at pmat-260.rs:377-378 (apply_rope_f32)
- Per-head Q/K RMSNorm at pmat-260.rs:359-374 (should be skipped for Qwen2.5-7B but worth verifying)

PR E paused. Coverage scoreboard unchanged at 15+33 (§29, §30.5).
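The per-layer byte counts in finding 1 are what the standard GGML Q4_K layout predicts for Qwen2.5-7B projection shapes, where each 256-weight super-block packs into 144 bytes. The sketch assumes that layout and the public Qwen2.5-7B config (hidden size 3584, 4 KV heads × head_dim 128, FFN intermediate size 18944); `q4k_bytes` is a hypothetical helper, not necessarily how check_q4k_population.rs computes anything.

```rust
// GGML Q4_K super-block (assumed standard layout): 256 weights stored as
// f16 d + f16 dmin (4 bytes), 12 bytes of packed 6-bit scales/mins,
// and 128 bytes of 4-bit quants.
const QK_K: usize = 256;
const Q4K_BLOCK_BYTES: usize = 4 + 12 + QK_K / 2; // 144 bytes

/// Bytes needed to store a row-major [rows, cols] matrix in Q4_K.
/// Quantization runs along each row, so `cols` must be a multiple of 256.
fn q4k_bytes(rows: usize, cols: usize) -> usize {
    assert_eq!(cols % QK_K, 0, "row length must be a multiple of 256");
    rows * (cols / QK_K) * Q4K_BLOCK_BYTES
}

fn main() {
    // Qwen2.5-7B dimensions (assumption: taken from the public model config).
    let hidden = 3584;
    let kv_dim = 4 * 128; // num_key_value_heads * head_dim
    let ffn = 18944;

    println!("Q    : {}", q4k_bytes(hidden, hidden)); // 7225344
    println!("K, V : {}", q4k_bytes(kv_dim, hidden)); // 1032192
    println!("gate : {}", q4k_bytes(ffn, hidden)); // 38191104
    println!("up   : {}", q4k_bytes(ffn, hidden)); // 38191104
    println!("down : {}", q4k_bytes(hidden, ffn)); // 38191104
}
```

All five values match the per-layer counts above exactly, which is consistent with every attention and FFN projection in q4k_layers being held as Q4_K bytes rather than a dequantized copy.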
Files added:
- crates/aprender-serve/examples/check_q4k_population.rs (q4k_layers dump)
- crates/aprender-serve/examples/diag_apr_qkv_layer0.rs (F32 vs Q4K compare)
- evidence/ship-007-pr-e-investigation-2026-04-27/findings.md (full analysis)
- evidence/ship-007-pr-e-investigation-2026-04-27/check_q4k_population.txt
- evidence/ship-007-pr-e-investigation-2026-04-27/diag_apr_qkv_layer0.txt

Spec amendment:
- §30 — PR E investigation refutes §28 narrow hypothesis
  - 30.1 Diagnostic evidence
  - 30.2 What §28 got right and got wrong
  - 30.3 What's still load-bearing
  - 30.4 Falsifiable next investigation step
  - 30.5 Coverage scoreboard (unchanged)
  - 30.6 Methodology note — investigative falsification IS the discharge
- Header: v2.74.0 → v2.75.0

Methodology: per feedback_fix_root_cause_never_route_around.md, the §28 mechanical fix would have routed around a real bug, because the named site (the matmul kernel) is not where the divergence originates. The empirical refutation is itself a coverage-protecting artifact: it prevents a no-op fix from shipping.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) April 27, 2026 13:32

noahgift merged commit d906d74 into main Apr 27, 2026
11 checks passed

noahgift deleted the docs/spec-30-pr-e-investigation-refutes-s28 branch April 27, 2026 13:56

noahgift mentioned this pull request Apr 27, 2026

docs(ship-two-001): §28 — SHIP-007 root cause REFINED to F32 vs Q4K matmul precision mismatch — spec v2.72.0 → v2.73.0 #1085

Closed
