feat(M-FFN-GGUF-3): heavy APR-vs-GGUF layer-3 ffn_swigl diff harness — SHIP-007 H1/H2 bisection #1533
Merged
…— SHIP-007 H1/H2 bisection

Authors the heavy comparison harness pinned in contract trace-ffn-sub-block-gguf-v1 v1.0.0 step M-FFN-GGUF-3 (M88 PR #1532 squash ca03361). Mirrors the M80 skip-if-not-present pattern from `qwen3_moe_gpu_per_stage_diff::falsify_moe_sub_002_*`.

What it does:
- Loads APR 7B teacher via AprTransformer::from_apr_file
- Loads same model GGUF via OwnedQuantizedModel::from_mapped
- Runs forward_traced on both with the same canonical SHIP-007 prompt ("What is 2+2?")
- Extracts per-layer ffn_swiglu_inner_stats.std_dev for each
- Computes ratio at layer 3 (the §21 anomaly site)
- Reports verdict:
  - H1 (ratio in [0.5, 2.0]): NORMAL model behavior; SHIP-007 root cause is ELSEWHERE (lm_head / post-FFN residual / token-position correlation)
  - H2 (ratio outside band): APR-side bug; fix at `inference.rs:160-164` swigl elementwise multiply

Discharges FALSIFY-FFN-GGUF-003 at the algorithm level: the harness EXISTS and produces a single H1/H2 verdict on operator dispatch. Full DISCHARGED requires the operator to run with canonical files present and capture the verdict in evidence.

`#[ignore]`-gated; skips cleanly if either canonical 7B .apr or .gguf is missing on the host (verified locally: host has the .gguf at /mnt/nvme-raid0/models/ship-two-001/ but no .apr; test ran clean and skipped).

Contract amendment: M-FFN-GGUF-3 status PENDING → ALGORITHM_LEVEL_DISCHARGED. Operator workflow documented inline.

No production hot path touched; additive test-only file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
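The skip-if-missing gating mentioned in the commit message can be sketched roughly as follows. This is a minimal illustration, not the harness itself: `first_missing` and the test name are hypothetical, and the `.apr` path is a placeholder because the PR does not state where the canonical `.apr` lives. Only the `.gguf` directory comes from the text above.

```rust
use std::path::PathBuf;

/// Hypothetical helper: returns the first required file that is not present,
/// or `None` when every path exists.
pub fn first_missing(paths: &[PathBuf]) -> Option<&PathBuf> {
    paths.iter().find(|p| !p.exists())
}

#[test]
#[ignore] // heavy test: run explicitly with `cargo test -- --ignored`
fn heavy_harness_skips_without_models() {
    let required = [
        // Directory quoted in the commit message; the file name is a placeholder.
        PathBuf::from("/mnt/nvme-raid0/models/ship-two-001/model.gguf"),
        // HYPOTHETICAL: the canonical .apr location is not given in the PR.
        PathBuf::from("/path/to/canonical-7b.apr"),
    ];
    if let Some(missing) = first_missing(&required) {
        // Return early so the test passes (skips) on hosts without the models.
        eprintln!("SKIP: {} not present", missing.display());
        return;
    }
    // Both files are present: the real harness would load the models and
    // run the layer-3 comparison here.
}
```

Returning early rather than failing keeps the test green on hosts that lack the multi-gigabyte model files, while `#[ignore]` keeps it out of the default `cargo test` run.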
b4d6732 to 942729b
noahgift added a commit that referenced this pull request on May 6, 2026
… integrated, M-FFN-GGUF-3 DISCHARGED (#1534)

Same-day post-M88+M89 follow-up: ship-two-models-spec.md v2.72.0 §27 records that the H1/H2 bisection has ALREADY been LIVE-run on noah-Lambda-Vector RTX 4090 on 2026-04-27 (built `apr` from PR #1083 branch + commits 77c016b + c657968 + f249464):

    APR layer-3 ffn_swigl std  = 1.2216
    GGUF layer-3 ffn_swigl std = 0.0670
    Ratio                      = 18.23×
    Verdict                    = H2 CONFIRMED (APR-side bug)

This exceeds the §26.4 ≥10× threshold by roughly 8.2× in absolute terms.

Status promotions in v1.1.0:
- M-FFN-GGUF-3 implementation_stage: ALGORITHM_LEVEL_DISCHARGED → DISCHARGED
- FALSIFY-FFN-GGUF-003: PROPOSED → DISCHARGED
- contract metadata.status: PROPOSED → ACTIVE_ALGORITHM_LEVEL

The M89 PR #1533 harness (falsify_ffn_gguf_003_layer_3_swigl_h1_h2_bisection) adds regression-test coverage for any future re-run; the §27 data remains the canonical operator-dispatched discharge proof.

Only M-FFN-GGUF-4 (SHIP-007 fix PR) remains PENDING, gated on engineering investigation of the `inference.rs` SwiGLU site (the line range shifted to 298-302 after the sub-FFN telemetry landed; the §22 spec was authored against :160-164).

Three candidate hypotheses for the layer-3-specific behavior within the SwiGLU block are authored in the v1.1.0 amendment for the M-FFN-GGUF-4 investigation:
- H2a: Buffer aliasing / scratch-buffer corruption in APR multi-token
- H2b: Layer-3-specific upstream divergence (gate or up at L3 only)
- H2c: Quantization dequant alignment differs at certain layer configs

YAML-only; production hot paths byte-unchanged (this amendment records pre-existing §27 evidence + corrects status drift).

Methodology lesson #2 firing in retrospect: had I grep'd the spec for §22 / §27 BEFORE authoring M88's contract scaffold, the M-FFN-GGUF-3 status would have been DISCHARGED at v1.0.0 instead of needing this v1.1.0 follow-up amendment.

`pv validate` 0/0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
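The arithmetic behind the recorded verdict can be re-derived from the two standard deviations quoted in the commit message. The sketch below only re-does that division and subtraction. The constant and function names are illustrative and are not taken from the repository.

```rust
/// Layer-3 values as recorded in the commit message (spec §27).
pub const APR_STD: f64 = 1.2216;
pub const GGUF_STD: f64 = 0.0670;

/// The §26.4 threshold quoted in the commit message: a ratio of at least 10x.
pub const THRESHOLD: f64 = 10.0;

/// APR / GGUF standard-deviation ratio at layer 3.
pub fn ratio() -> f64 {
    APR_STD / GGUF_STD
}

/// Absolute distance of the ratio above the threshold.
pub fn margin() -> f64 {
    ratio() - THRESHOLD
}
```

With these inputs `ratio()` evaluates to about 18.23 and `margin()` to about 8.23, which matches the reported 18.23× and the "8× absolute" margin.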
Summary
Authors the heavy comparison harness pinned in contract `trace-ffn-sub-block-gguf-v1` v1.0.0 step M-FFN-GGUF-3 (M88 PR #1532). The harness produces the H1 vs H2 verdict for SHIP-007 layer-3 ffn_swigl bisection on operator dispatch.

Mirrors the proven M80 skip-if-not-present pattern from `qwen3_moe_gpu_per_stage_diff::falsify_moe_sub_002_*`.

What it does
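H1 vs H2 verdict logic
- H1 (ratio in `[0.5, 2.0]`): NORMAL model behavior; SHIP-007 root cause is elsewhere (lm_head / post-FFN residual / token-position correlation)
- H2 (ratio outside the band): APR-side bug; fix at `inference.rs:160-164` swigl elementwise multiply

Either verdict is a valid outcome; only FALSIFY-FFN-GGUF-004 requires the operator to take action with the result (write the SHIP-007 root-cause fix PR citing H1 or H2).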
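A minimal sketch of the verdict rule, assuming the band is inclusive at both ends (as the `[0.5, 2.0]` notation suggests) and that the compared statistic is a population standard deviation. The names `Verdict`, `std_dev`, and `classify` are hypothetical and do not come from the harness, which reads `std_dev` from its own traced stats and may normalize differently.

```rust
/// Outcome of the layer-3 bisection.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Ratio inside the band: normal model behavior, root cause is elsewhere.
    H1,
    /// Ratio outside the band: points at an APR-side bug.
    H2,
}

/// Population standard deviation of a slice (assumption: the real harness
/// may use a different normalization). Returns 0.0 for an empty slice.
pub fn std_dev(xs: &[f64]) -> f64 {
    if xs.is_empty() {
        return 0.0;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    var.sqrt()
}

/// Classify an APR-vs-GGUF std_dev pair against the inclusive [0.5, 2.0] band.
/// Returns `None` when the ratio is undefined (non-finite input or a
/// non-positive GGUF std_dev).
pub fn classify(apr_std: f64, gguf_std: f64) -> Option<Verdict> {
    if !apr_std.is_finite() || !gguf_std.is_finite() || gguf_std <= 0.0 {
        return None;
    }
    let ratio = apr_std / gguf_std;
    if (0.5..=2.0).contains(&ratio) {
        Some(Verdict::H1)
    } else {
        Some(Verdict::H2)
    }
}
```

Returning `Option` keeps a degenerate denominator from silently producing a verdict. Whether the actual harness guards against that case is not stated in the PR.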
Skip-if-not-present

`#[ignore]`-gated; skips cleanly if either canonical 7B `.apr` or `.gguf` is missing on the host.

Verified locally: this host has the `.gguf` at `/mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.gguf` but no canonical `.apr`; test ran clean and skipped.

Contract amendment
M-FFN-GGUF-3: PENDING → ALGORITHM_LEVEL_DISCHARGED

The harness EXISTS and produces a verdict on operator dispatch. Full DISCHARGED requires the operator to run with canonical files present and capture the verdict in `evidence/ship-007-layer-3-h1-h2-bisection-{date}/`.

Cascade state after this PR
Operator workflow (next deliberate-session)
Test plan
`pv validate` 0/0

🤖 Generated with Claude Code