
docs(ship-two-001): §30 — PR E investigation refutes §28 narrow hypothesis — v2.74.0 → v2.75.0#1088

Merged
noahgift merged 1 commit into
mainfrom
docs/spec-30-pr-e-investigation-refutes-s28
Apr 27, 2026


@noahgift


Summary

A live diagnostic on an RTX 4090 against the canonical 7B teacher refutes §28's narrow root-cause hypothesis for SHIP-007. Two empirical findings show that the proposed PR E fix would NOT close the 9× layer-0 qkv std divergence:

  1. q4k_layers IS fully populated for all 28 layers — §28.4's option (a) ("preserve Q4K bytes") is already shipped.
  2. APR's F32 fused-qkv weight is equivalent to the Q4K dispatch within Q4K rounding (max |diff|=0.005, RMS=0.0007 on the layer-0 Q-projection), so switching the matmul kernel would shift the output std by <0.5%.
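The equivalence claim rests on three numbers: max |diff|, RMS diff, and the relative change in output std between the two paths. A minimal sketch of how such metrics can be computed, assuming both outputs are already available as equal-length f32 slices; `diff_metrics`, `std_dev`, and `rel_std_change` are hypothetical helpers, not the code in diag_apr_qkv_layer0.rs, and the values in `main` are toy data.

```rust
/// Max absolute and RMS element-wise difference between two equal-length
/// slices, accumulated in f64 so the metric itself adds no f32 rounding.
fn diff_metrics(a: &[f32], b: &[f32]) -> (f64, f64) {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    assert!(!a.is_empty(), "slices must be non-empty");
    let mut max_abs = 0.0f64;
    let mut sum_sq = 0.0f64;
    for (&x, &y) in a.iter().zip(b) {
        let d = x as f64 - y as f64;
        max_abs = max_abs.max(d.abs());
        sum_sq += d * d;
    }
    (max_abs, (sum_sq / a.len() as f64).sqrt())
}

/// Population standard deviation, accumulated in f64.
fn std_dev(v: &[f32]) -> f64 {
    let n = v.len() as f64;
    let mean = v.iter().map(|&x| x as f64).sum::<f64>() / n;
    let var = v.iter().map(|&x| (x as f64 - mean).powi(2)).sum::<f64>() / n;
    var.sqrt()
}

/// |std(candidate) - std(reference)| as a fraction of std(reference).
fn rel_std_change(candidate: &[f32], reference: &[f32]) -> f64 {
    let s_ref = std_dev(reference);
    (std_dev(candidate) - s_ref).abs() / s_ref
}

fn main() {
    // Toy stand-ins for layer-0 Q-projection outputs from the two dispatch paths.
    let f32_path = [0.120f32, -0.310, 0.275, -0.040, 0.198];
    let q4k_path = [0.121f32, -0.309, 0.274, -0.041, 0.199];
    let (max_abs, rms) = diff_metrics(&f32_path, &q4k_path);
    let rel = rel_std_change(&q4k_path, &f32_path);
    println!(
        "max |diff| = {max_abs:.6}, RMS diff = {rms:.6}, std change = {:.3}%",
        rel * 100.0
    );
}
```

A relative std change well below 1% is what "would shift the output std by <0.5%" means operationally: swapping kernels cannot produce a 9× std gap.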

The 9× layer-0 qkv std gap (APR=10.33 vs GGUF=1.14) is REAL but originates upstream of the gate matmul. There are three candidate sites for the §30.4 bisection: the qkv_bias add, RoPE precision, and per-head Q/K RMSNorm.

PR E paused. Spec v2.74.0 → v2.75.0. Coverage scoreboard unchanged (15+33).

Files

  • Spec: §30 added (~80 lines)
    • 30.1 Diagnostic evidence
    • 30.2 What §28 got right and got wrong
    • 30.3 What's still load-bearing
    • 30.4 Falsifiable next investigation step
    • 30.5 Coverage scoreboard (unchanged)
    • 30.6 Methodology note — investigative falsification IS the discharge
  • Diagnostics (re-runnable on noah-Lambda-Vector):
    • crates/aprender-serve/examples/check_q4k_population.rs
    • crates/aprender-serve/examples/diag_apr_qkv_layer0.rs
  • Evidence (live RTX 4090 output):
    • evidence/ship-007-pr-e-investigation-2026-04-27/findings.md
    • evidence/ship-007-pr-e-investigation-2026-04-27/check_q4k_population.txt
    • evidence/ship-007-pr-e-investigation-2026-04-27/diag_apr_qkv_layer0.txt

Methodology

Per feedback_fix_root_cause_never_route_around.md, the §28 mechanical fix would have routed around a real bug, because the named site (the matmul kernel) is not where the divergence originates. This empirical refutation is itself a coverage-protecting artifact: it prevents a no-op fix from shipping. The Toyota Way is to bisect upstream, not to flip the kernel call.

Test plan

  • pv validate continues to pass for unchanged contracts
  • Diagnostic examples build with cargo build --release -p aprender-serve --example check_q4k_population --example diag_apr_qkv_layer0
  • Both diagnostics ran live on noah-Lambda-Vector RTX 4090; output captured in evidence/
  • Spec v2.75.0 self-consistent with §29 scoreboard

Next session

Per §30.4, capture the qkv tensor at three points (post-matmul, post-bias, post-RoPE) for layer 0 in BOTH the APR and GGUF forward paths. Whichever bisection point first shows the 9× std gap is the actual fix surface; only then can PR E be written.
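Once the captures exist, the bisection reduces to scanning the capture points in forward order and reporting the first one where the APR/GGUF std ratio leaves a tolerance band. A minimal sketch, assuming the tensors have already been dumped into f32 slices; the `Stage` enum, `first_divergent_stage`, and the 2× tolerance are hypothetical illustrations, not code from aprender-serve.

```rust
/// Capture points in the layer-0 qkv pipeline, in forward order (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    PostMatmul,
    PostBias,
    PostRope,
}

/// Population standard deviation, accumulated in f64.
fn std_dev(v: &[f32]) -> f64 {
    let n = v.len() as f64;
    let mean = v.iter().map(|&x| x as f64).sum::<f64>() / n;
    let var = v.iter().map(|&x| (x as f64 - mean).powi(2)).sum::<f64>() / n;
    var.sqrt()
}

/// Returns the first stage whose APR/GGUF std ratio lies outside [1/tol, tol],
/// together with that ratio, or None if every stage agrees within `tol`.
/// Assumes each GGUF capture has non-zero std.
fn first_divergent_stage(
    captures: &[(Stage, &[f32], &[f32])],
    tol: f64,
) -> Option<(Stage, f64)> {
    for &(stage, apr, gguf) in captures {
        let ratio = std_dev(apr) / std_dev(gguf);
        if ratio > tol || ratio < 1.0 / tol {
            return Some((stage, ratio));
        }
    }
    None
}

fn main() {
    // Toy captures: identical after the matmul, 9x wider in APR after the bias add.
    let same = [1.0f32, -1.0, 1.0, -1.0];
    let wide = [9.0f32, -9.0, 9.0, -9.0];
    let captures = [
        (Stage::PostMatmul, &same[..], &same[..]),
        (Stage::PostBias, &wide[..], &same[..]),
        (Stage::PostRope, &wide[..], &same[..]),
    ];
    match first_divergent_stage(&captures, 2.0) {
        Some((stage, ratio)) => {
            println!("first divergence at {stage:?}: APR/GGUF std ratio = {ratio:.2}")
        }
        None => println!("no stage exceeds the tolerance"),
    }
}
```

Reporting the first divergent stage, rather than any divergent stage, matters because a gap introduced at one point propagates to every later capture.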

🤖 Generated with Claude Code

…hesis — spec v2.74.0 → v2.75.0

A live diagnostic on the noah-Lambda-Vector RTX 4090 against the canonical 7B
teacher (/mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr)
shows that §28's mechanical "replace helpers::f32_matmul with Q4K-fused
dispatch" fix would NOT close the 9× layer-0 qkv std gap that propagates to
layer 3's 18.23× ffn_swigl ratio.

Two empirical findings refute the narrow hypothesis:

1. q4k_layers IS fully populated for all 28 layers (Q=7,225,344 bytes per
   layer, K=V=1,032,192 each, gate=up=down=38,191,104 each) —
   §28.4 option (a) is already shipped.

2. APR's F32 fused-qkv weight (load_qkv_weight at mod_dequant_q4k_apr.rs:181)
   is numerically equivalent to per-Q/K/V Q4K dispatch within Q4K rounding:
   max |diff|=0.005294, RMS diff=0.000673 on the layer-0 Q-projection. The
   matmul kernel switch proposed in §28 would shift the output std by <0.5%.
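The per-layer byte counts in finding 1 are exactly what Q4_K super-block accounting predicts, which is why they indicate fully populated (not truncated) tensors. A minimal sketch, assuming GGML's Q4_K layout (256 elements per 144-byte super-block) and the Qwen2.5-7B shapes from the public model config (hidden 3584, 4 KV heads of dim 128, intermediate 18944); `q4k_bytes` is a hypothetical helper, not part of check_q4k_population.rs.

```rust
/// Elements per Q4_K super-block (GGML's QK_K).
const QK_K: usize = 256;
/// Bytes per Q4_K super-block: f16 d + f16 dmin (4), packed 6-bit
/// scales/mins (12), and 256 4-bit quants (128).
const Q4K_BLOCK_BYTES: usize = 2 + 2 + 12 + QK_K / 2;

/// Expected Q4_K byte size for a rows x cols weight matrix.
fn q4k_bytes(rows: usize, cols: usize) -> usize {
    let n = rows * cols;
    assert_eq!(n % QK_K, 0, "element count must be a multiple of QK_K");
    n / QK_K * Q4K_BLOCK_BYTES
}

fn main() {
    // Assumed Qwen2.5-7B shapes (public config), not read from the .apr file.
    let hidden = 3584;
    let kv_dim = 4 * 128; // 4 KV heads x head_dim 128
    let intermediate = 18944;

    println!("Q            = {}", q4k_bytes(hidden, hidden));
    println!("K = V        = {}", q4k_bytes(kv_dim, hidden));
    // down is hidden x intermediate, so it has the same element count as gate/up.
    println!("gate/up/down = {}", q4k_bytes(intermediate, hidden));
}
```

Under these assumptions the sketch reproduces 7,225,344, 1,032,192, and 38,191,104 bytes, matching the dump.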

The 9× layer-0 qkv divergence (APR=10.33 vs GGUF=1.14) is REAL but lives
upstream of the gate matmul. Three candidate sites for §30.4 bisection:

- qkv_bias add at pmat-260.rs:332-334 (mean shift APR=0.2559 vs GGUF=-0.0163
  is suggestive)
- RoPE precision at pmat-260.rs:377-378 apply_rope_f32
- Per-head Q/K RMSNorm at pmat-260.rs:359-374 (should be skipped for
  Qwen2.5-7B but worth verifying)
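For the RoPE candidate, one way to quantify precision is to compare an f32 rotate-half RoPE against an f64 reference at increasing positions. A minimal sketch, assuming NeoX-style rotate-half pairing (i, i + d/2) as in Hugging Face's Qwen2 implementation and an assumed rope_theta of 1,000,000; `rope_f32`, `rope_f64`, and `max_abs_diff` are hypothetical and are not the apply_rope_f32 in pmat-260.rs, whose pairing and frequency formula would need to be checked against this.

```rust
/// Rotate-half RoPE on one head vector, computed in f32.
fn rope_f32(x: &[f32], pos: usize, theta: f32) -> Vec<f32> {
    let d = x.len();
    assert!(d % 2 == 0, "head_dim must be even");
    let half = d / 2;
    let mut out = x.to_vec();
    for i in 0..half {
        let freq = theta.powf(-2.0 * i as f32 / d as f32);
        let (s, c) = (pos as f32 * freq).sin_cos();
        let (a, b) = (x[i], x[i + half]);
        out[i] = a * c - b * s;
        out[i + half] = a * s + b * c;
    }
    out
}

/// Same rotation computed in f64, used as the precision reference.
fn rope_f64(x: &[f32], pos: usize, theta: f64) -> Vec<f64> {
    let d = x.len();
    assert!(d % 2 == 0, "head_dim must be even");
    let half = d / 2;
    let mut out: Vec<f64> = x.iter().map(|&v| v as f64).collect();
    for i in 0..half {
        let freq = theta.powf(-2.0 * i as f64 / d as f64);
        let (s, c) = (pos as f64 * freq).sin_cos();
        let (a, b) = (x[i] as f64, x[i + half] as f64);
        out[i] = a * c - b * s;
        out[i + half] = a * s + b * c;
    }
    out
}

/// Largest element-wise |f32 - f64| between the two results.
fn max_abs_diff(a: &[f32], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as f64 - y).abs())
        .fold(0.0, f64::max)
}

fn main() {
    let head_dim = 128;
    let x: Vec<f32> = (0..head_dim).map(|i| (i as f32 * 0.37).sin()).collect();
    let theta = 1_000_000.0; // assumed rope_theta; confirm against the model config
    for pos in [0usize, 1, 512, 4096, 32767] {
        let err = max_abs_diff(&rope_f32(&x, pos, theta as f32), &rope_f64(&x, pos, theta));
        println!("pos {pos:>5}: max |f32 - f64| = {err:.3e}");
    }
}
```

Because RoPE rotates each (i, i + d/2) pair, a correct implementation also preserves the head vector's L2 norm, which gives a cheap sanity check independent of the f64 reference.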

PR E paused. Coverage scoreboard unchanged at 15+33 (§29, §30.5).

Files added:
- crates/aprender-serve/examples/check_q4k_population.rs (q4k_layers dump)
- crates/aprender-serve/examples/diag_apr_qkv_layer0.rs (F32 vs Q4K compare)
- evidence/ship-007-pr-e-investigation-2026-04-27/findings.md (full analysis)
- evidence/ship-007-pr-e-investigation-2026-04-27/check_q4k_population.txt
- evidence/ship-007-pr-e-investigation-2026-04-27/diag_apr_qkv_layer0.txt

Spec amendment:
- §30 — PR E investigation refutes §28 narrow hypothesis
  - 30.1 Diagnostic evidence
  - 30.2 What §28 got right and got wrong
  - 30.3 What's still load-bearing
  - 30.4 Falsifiable next investigation step
  - 30.5 Coverage scoreboard (unchanged)
  - 30.6 Methodology note — investigative falsification IS the discharge
- Header: v2.74.0 → v2.75.0

Methodology: per feedback_fix_root_cause_never_route_around.md, the §28
mechanical fix would have routed around a real bug, because the named site
(the matmul kernel) is not where the divergence originates. The empirical
refutation is itself a coverage-protecting artifact: it prevents a no-op
fix from shipping.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) April 27, 2026 13:32
@noahgift noahgift merged commit d906d74 into main Apr 27, 2026
11 checks passed
@noahgift noahgift deleted the docs/spec-30-pr-e-investigation-refutes-s28 branch April 27, 2026 13:56
