docs(ship-two-001): §30 — PR E investigation refutes §28 narrow hypothesis — v2.74.0 → v2.75.0 #1088
Merged
…hesis — spec v2.74.0 → v2.75.0

Live diagnostic on noah-Lambda-Vector RTX 4090 against canonical 7B teacher (/mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr) shows that §28's mechanical "replace helpers::f32_matmul with Q4K-fused dispatch" fix would NOT close the 9× layer-0 qkv std gap that propagates to layer 3's 18.23× ffn_swigl ratio.

Two empirical findings refute the narrow hypothesis:

1. q4k_layers IS fully populated for all 28 layers (Q=7,225,344 bytes per layer, K=V=1,032,192 each, gate=up=down=38,191,104 each); §28.4 option (a) is already shipped.
2. APR's F32 fused-qkv weight (load_qkv_weight at mod_dequant_q4k_apr.rs:181) is numerically equivalent to per-Q/K/V Q4K dispatch within Q4K rounding: max |diff|=0.005294, RMS diff=0.000673 on layer-0 Q-projection. The matmul kernel switch in §28 would change <0.5% of std.

The 9× layer-0 qkv divergence (APR=10.33 vs GGUF=1.14) is REAL but lives upstream of the gate matmul. Three candidate sites for §30.4 bisection:

- qkv_bias add at pmat-260.rs:332-334 (mean shift APR=0.2559 vs GGUF=-0.0163 is suggestive)
- RoPE precision at pmat-260.rs:377-378 apply_rope_f32
- Per-head Q/K RMSNorm at pmat-260.rs:359-374 (should be skipped for Qwen2.5-7B but worth verifying)

PR E paused. Coverage scoreboard unchanged at 15+33 (§29 §30.5).

Files added:

- crates/aprender-serve/examples/check_q4k_population.rs (q4k_layers dump)
- crates/aprender-serve/examples/diag_apr_qkv_layer0.rs (F32 vs Q4K compare)
- evidence/ship-007-pr-e-investigation-2026-04-27/findings.md (full analysis)
- evidence/ship-007-pr-e-investigation-2026-04-27/check_q4k_population.txt
- evidence/ship-007-pr-e-investigation-2026-04-27/diag_apr_qkv_layer0.txt

Spec amendment:

- §30 — PR E investigation refutes §28 narrow hypothesis
  - 30.1 Diagnostic evidence
  - 30.2 What §28 got right and got wrong
  - 30.3 What's still load-bearing
  - 30.4 Falsifiable next investigation step
  - 30.5 Coverage scoreboard (unchanged)
  - 30.6 Methodology note — investigative falsification IS the discharge
- Header: v2.74.0 → v2.75.0

Methodology: per feedback_fix_root_cause_never_route_around.md, the §28 mechanical fix would have route-around'd a real bug because the named site (matmul kernel) is not where the divergence originates. The empirical refutation is itself a coverage-protecting artifact: it prevents a no-op fix from shipping.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
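For readers who want to reproduce the kind of comparison behind finding 2, the two reported metrics (max |diff| and RMS diff) could be computed along these lines. This is a minimal sketch, not the code in diag_apr_qkv_layer0.rs: the function name `max_abs_and_rms_diff` is hypothetical, and it assumes both weights have already been materialized as equally sized F32 buffers. The values in `main` are synthetic and only exercise the function.

```rust
/// Returns (max absolute difference, RMS difference) between two equally
/// sized buffers, or None if the lengths differ or the buffers are empty.
/// Hypothetical helper for illustration; not taken from the repository.
fn max_abs_and_rms_diff(a: &[f32], b: &[f32]) -> Option<(f32, f32)> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let mut max_abs = 0.0f32;
    // Accumulate in f64 so that millions of small squared terms do not
    // lose precision in the running sum.
    let mut sum_sq = 0.0f64;
    for (x, y) in a.iter().zip(b) {
        let d = x - y;
        max_abs = max_abs.max(d.abs());
        sum_sq += f64::from(d) * f64::from(d);
    }
    let rms = (sum_sq / a.len() as f64).sqrt() as f32;
    Some((max_abs, rms))
}

fn main() {
    // Synthetic stand-ins for "F32 fused weight" and "dequantized Q4K weight".
    let fused = [0.10f32, -0.20, 0.30, 0.40];
    let dequantized = [0.10f32, -0.25, 0.30, 0.45];
    match max_abs_and_rms_diff(&fused, &dequantized) {
        Some((max_abs, rms)) => println!("max |diff| = {max_abs:.6}, RMS diff = {rms:.6}"),
        None => println!("buffers are empty or differ in length"),
    }
}
```

Reporting both numbers is useful because max |diff| bounds the worst single weight while RMS diff summarizes the typical error; a divergence of the size described above would need far more than rounding-level differences in either.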
Summary
Live diagnostic on an RTX 4090 against the canonical 7B teacher refutes §28's narrow root-cause hypothesis for SHIP-007. Two empirical findings show that the proposed PR E fix would NOT close the 9× layer-0 qkv divergence:

1. q4k_layers IS fully populated for all 28 layers, so §28.4's option (a) ("preserve Q4K bytes") is already shipped.
2. APR's F32 fused-qkv weight is numerically equivalent to per-Q/K/V Q4K dispatch within Q4K rounding (max |diff|=0.005294, RMS diff=0.000673 on the layer-0 Q-projection), so the §28 matmul kernel switch would change <0.5% of std.

The 9× layer-0 qkv std gap (APR=10.33 vs GGUF=1.14) is REAL but lives upstream of the gate matmul. Three candidate sites for §30.4 bisection: the qkv_bias add, RoPE precision, and per-head Q/K RMSNorm.

PR E paused. Spec v2.74.0 → v2.75.0. Coverage scoreboard unchanged (15+33).
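As a sanity check on the "fully populated" claim, the per-layer byte counts reported in the commit message (Q=7,225,344, K=V=1,032,192, gate=up=down=38,191,104) can be reproduced from tensor shapes. The sketch below rests on two assumptions that are not stated in this PR: that q4k_layers stores bytes in the GGML Q4_K layout (256-weight super-blocks of 144 bytes each), and that the teacher uses the public Qwen2.5-7B dimensions (hidden size 3584, 4 KV heads of dimension 128, intermediate size 18944). The function name `q4k_expected_bytes` is hypothetical.

```rust
/// GGML Q4_K packs each row into super-blocks of 256 weights, 144 bytes each
/// (assumed here to be the layout behind q4k_layers).
const Q4K_BLOCK_ELEMS: usize = 256;
const Q4K_BLOCK_BYTES: usize = 144;

/// Expected byte size of a rows x cols weight stored as Q4_K, or None if a row
/// is not a whole number of super-blocks or the size overflows usize.
fn q4k_expected_bytes(rows: usize, cols: usize) -> Option<usize> {
    if cols == 0 || cols % Q4K_BLOCK_ELEMS != 0 {
        return None;
    }
    let blocks_per_row = cols / Q4K_BLOCK_ELEMS;
    rows.checked_mul(blocks_per_row)?.checked_mul(Q4K_BLOCK_BYTES)
}

fn main() {
    // Assumed Qwen2.5-7B dimensions (from the public model config).
    let hidden = 3584;
    let kv_dim = 4 * 128; // 4 KV heads x head_dim 128
    let intermediate = 18944;

    // (name, rows, cols, bytes reported in the commit message)
    let tensors = [
        ("q", hidden, hidden, 7_225_344usize),
        ("k", kv_dim, hidden, 1_032_192),
        ("v", kv_dim, hidden, 1_032_192),
        ("gate", intermediate, hidden, 38_191_104),
        ("up", intermediate, hidden, 38_191_104),
        ("down", hidden, intermediate, 38_191_104),
    ];
    for (name, rows, cols, reported) in tensors {
        let expected = q4k_expected_bytes(rows, cols);
        let status = if expected == Some(reported) { "match" } else { "MISMATCH" };
        println!("{name:>4}: expected {expected:?}, reported {reported} ({status})");
    }
}
```

Under those assumptions every reported size equals the full-tensor Q4_K size, which is consistent with the buffers being complete rather than truncated or empty.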
Files
- crates/aprender-serve/examples/check_q4k_population.rs
- crates/aprender-serve/examples/diag_apr_qkv_layer0.rs
- evidence/ship-007-pr-e-investigation-2026-04-27/findings.md
- evidence/ship-007-pr-e-investigation-2026-04-27/check_q4k_population.txt
- evidence/ship-007-pr-e-investigation-2026-04-27/diag_apr_qkv_layer0.txt
Methodology
Per feedback_fix_root_cause_never_route_around.md: the §28 mechanical fix would have routed around a real bug, because the named site (the matmul kernel) is not where the divergence originates. This empirical refutation is itself a coverage-protecting artifact: it prevents a no-op fix from shipping. The Toyota Way is to bisect upstream, not flip the kernel call.
Test plan
- pv validate continues to pass for unchanged contracts
- cargo build --release -p aprender-serve --example check_q4k_population --example diag_apr_qkv_layer0
Next session
Per §30.4, capture the qkv tensor at three points (post-matmul / post-bias / post-RoPE) for layer 0 in BOTH the APR and GGUF forward paths. The first capture point at which the 9× std gap appears is the actual fix surface. Then PR E can be written.
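The bisection described here could be organized roughly as follows. This is a sketch under stated assumptions rather than the planned implementation: the `Stage` enum, `std_dev`, and `first_divergent_stage` are hypothetical names, the captures are assumed to be available as flat F32 slices, and the data in `main` is synthetic (it is not measured output from either forward path).

```rust
/// Capture points named in the plan, in forward-pass order (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    PostMatmul,
    PostBias,
    PostRope,
}

const STAGES: [Stage; 3] = [Stage::PostMatmul, Stage::PostBias, Stage::PostRope];

/// Population standard deviation, accumulated in f64. None for empty input.
fn std_dev(xs: &[f32]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().map(|&x| f64::from(x)).sum::<f64>() / n;
    let var = xs.iter().map(|&x| (f64::from(x) - mean).powi(2)).sum::<f64>() / n;
    Some(var.sqrt())
}

/// Returns the first stage whose APR/GGUF std ratio falls outside
/// [1/max_ratio, max_ratio]. Returns None if no stage diverges, and also
/// if any capture is empty (callers should validate captures first).
fn first_divergent_stage(
    apr: [&[f32]; 3],
    gguf: [&[f32]; 3],
    max_ratio: f64,
) -> Option<Stage> {
    for (i, stage) in STAGES.iter().enumerate() {
        let a = std_dev(apr[i])?;
        let g = std_dev(gguf[i])?;
        let ratio = if g > 0.0 {
            a / g
        } else if a > 0.0 {
            f64::INFINITY
        } else {
            1.0
        };
        if ratio > max_ratio || ratio < 1.0 / max_ratio {
            return Some(*stage);
        }
    }
    None
}

fn main() {
    // Synthetic captures: the two paths agree after the matmul and the APR
    // side widens after the bias add. Illustrative only.
    let unit = [1.0f32, -1.0, 1.0, -1.0];
    let wide = [10.0f32, -10.0, 10.0, -10.0];
    let hit = first_divergent_stage(
        [&unit[..], &wide[..], &wide[..]],
        [&unit[..], &unit[..], &unit[..]],
        2.0,
    );
    println!("first divergent stage: {hit:?}");
}
```

Scanning in forward-pass order matters: once the gap appears it propagates to every later capture, so only the first stage that exceeds the threshold localizes the bug. Comparing the mean at each stage alongside std would also help test the qkv_bias candidate, given the mean shift noted in the commit message.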
🤖 Generated with Claude Code