feat(M-FFN-GGUF-3): heavy APR-vs-GGUF layer-3 ffn_swigl diff harness — SHIP-007 H1/H2 bisection#1533

Merged
noahgift merged 1 commit into main from feat/m-ffn-gguf-3-heavy-comparison-harness
May 6, 2026
@noahgift noahgift commented May 6, 2026

Summary

Authors the heavy comparison harness pinned in contract trace-ffn-sub-block-gguf-v1 v1.0.0 step M-FFN-GGUF-3 (M88 PR #1532). The harness produces the H1 vs H2 verdict for SHIP-007 layer-3 ffn_swigl bisection on operator dispatch.

Mirrors the proven M80 skip-if-not-present pattern from qwen3_moe_gpu_per_stage_diff::falsify_moe_sub_002_*.

What it does

let apr = AprTransformer::from_apr_file(apr_path)?;
let mapped = MappedGGUFModel::from_path(gguf_path)?;
let tokens = mapped.model.encode("What is 2+2?")?;
let apr_trace = apr.forward_traced(&tokens)?;
let gguf_model = OwnedQuantizedModel::from_mapped(&mapped)?;
let gguf_trace = gguf_model.forward_traced(&tokens)?;

// Per-layer table + ratio at layer 3
let ratio = apr_layer_3.ffn_swiglu_inner_stats.std_dev
    / gguf_layer_3.ffn_swiglu_inner_stats.std_dev;
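
The ratio compares one summary statistic per runtime. A minimal sketch of that arithmetic follows, assuming the statistic is a population standard deviation over the layer's activation values. The harness's actual `std_dev` definition is not shown in this PR, and `std_dev` / `std_ratio` below are hypothetical helper names, not part of the aprender API.

```rust
/// Population standard deviation of a slice of activations.
/// Assumption: the real `ffn_swiglu_inner_stats.std_dev` may be computed differently.
fn std_dev(values: &[f32]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().map(|&v| f64::from(v)).sum::<f64>() / n;
    let var = values
        .iter()
        .map(|&v| {
            let d = f64::from(v) - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    Some(var.sqrt())
}

/// APR-over-GGUF ratio. Returns `None` when either slice is empty,
/// or when the GGUF std-dev is zero or either value is non-finite.
fn std_ratio(apr: &[f32], gguf: &[f32]) -> Option<f64> {
    let a = std_dev(apr)?;
    let g = std_dev(gguf)?;
    if g > 0.0 && g.is_finite() && a.is_finite() {
        Some(a / g)
    } else {
        None
    }
}

fn main() {
    // Toy activations, not model output.
    let apr = [2.0_f32, -2.0, 2.0, -2.0];
    let gguf = [1.0_f32, -1.0, 1.0, -1.0];
    println!("{:?}", std_ratio(&apr, &gguf));
}
```

Returning `Option` keeps a degenerate GGUF trace (all-equal activations) from turning into a division by zero that would silently read as an out-of-band ratio.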

H1 vs H2 verdict logic

| Ratio | Verdict | Implication |
|---|---|---|
| [0.5, 2.0] | H1 confirmed | NORMAL model behavior; SHIP-007 root cause is ELSEWHERE (lm_head / post-FFN residual / token-position correlation) |
| outside band | H2 confirmed | APR-side bug; fix at inference.rs:160-164 swigl elementwise multiply |

Either verdict is a valid outcome. Only FALSIFY-FFN-GGUF-004 requires the operator to act on the result, by writing the SHIP-007 root-cause fix PR citing H1 or H2.
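
The band check itself is small. The sketch below shows one way to express it, assuming the band is inclusive at both ends as the `[0.5, 2.0]` notation suggests. The `Verdict` enum, the constants, and `classify` are hypothetical names for illustration, and treating a non-finite ratio as H2 is an assumption rather than documented harness behavior.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    H1, // ratio inside the band: normal model behavior
    H2, // ratio outside the band: APR-side bug
}

// Band limits taken from the verdict table; inclusive bounds are an assumption.
const BAND_LO: f64 = 0.5;
const BAND_HI: f64 = 2.0;

/// Maps an APR/GGUF std-dev ratio to a verdict.
/// NaN and infinities are not contained in the range, so they map to H2.
fn classify(ratio: f64) -> Verdict {
    if (BAND_LO..=BAND_HI).contains(&ratio) {
        Verdict::H1
    } else {
        Verdict::H2
    }
}

fn main() {
    for ratio in [1.0, 0.49, 18.23] {
        println!("ratio {ratio}: {:?}", classify(ratio));
    }
}
```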

Skip-if-not-present

#[ignore]-gated; skips cleanly if either canonical 7B .apr or .gguf is missing on the host:

$ cargo test -p aprender-serve --test ffn_gguf_apr_layer_3_swigl_diff -- --include-ignored --nocapture
M-FFN-GGUF-3 layer-3 swigl diff: skipped — no canonical 7B APR teacher in [...]
test result: ok. 1 passed; 0 failed; 0 ignored

Verified locally — this host has the .gguf at /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.gguf but no canonical .apr; test ran clean and skipped.
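
For readers unfamiliar with the skip-if-not-present pattern, the sketch below shows its general shape using only the standard library. It is not the harness source: `missing_inputs` and `run_or_skip` are hypothetical names, the paths in `main` are deliberately nonexistent, and in a real test the body would sit under `#[test]` and `#[ignore]`.

```rust
use std::path::{Path, PathBuf};

/// Returns the subset of `paths` that does not exist on this host.
fn missing_inputs(paths: &[&Path]) -> Vec<PathBuf> {
    paths
        .iter()
        .filter(|p| !p.exists())
        .map(|p| p.to_path_buf())
        .collect()
}

/// Hypothetical test body: returns `false` (skipped) when any input is absent.
fn run_or_skip(apr: &Path, gguf: &Path) -> bool {
    let missing = missing_inputs(&[apr, gguf]);
    if !missing.is_empty() {
        // Early return instead of panicking, so the test still reports as passed.
        eprintln!("skipped: missing {:?}", missing);
        return false;
    }
    // ... load both models and compare traces here ...
    true
}

fn main() {
    let ran = run_or_skip(
        Path::new("/nonexistent/model.apr"),
        Path::new("/nonexistent/model.gguf"),
    );
    println!("ran = {ran}");
}
```

The early return is what makes the run above report `1 passed` even though no comparison happened, which is why the skip message is printed explicitly.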

Contract amendment

M-FFN-GGUF-3: PENDING → ALGORITHM_LEVEL_DISCHARGED

The harness EXISTS and produces a verdict on operator dispatch. Full DISCHARGED requires the operator to run it with the canonical files present and capture the verdict in evidence/ship-007-layer-3-h1-h2-bisection-{date}/.

Cascade state after this PR

| Stage | Status |
|---|---|
| M-FFN-GGUF-0 | SHIPPED (M88: contract scaffold) |
| M-FFN-GGUF-1 | SHIPPED (LayerActivation re-export) |
| M-FFN-GGUF-2 | SHIPPED (PRs #1081 + #1082) |
| M-FFN-GGUF-3 | ALGORITHM_LEVEL_DISCHARGED (this PR: harness exists; full DISCHARGED awaits operator dispatch) |
| M-FFN-GGUF-4 | PENDING (SHIP-007 fix PR cites H1 or H2) |

Operator workflow (next deliberate-session)

# Ensure canonical 7B .apr is on host (apr pull or copy from teacher source)
ls /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-apache-q4k-v1.apr

# Run the harness
cargo test -p aprender-serve --test ffn_gguf_apr_layer_3_swigl_diff \
    -- --include-ignored --nocapture

# Capture verdict in evidence/ship-007-layer-3-h1-h2-bisection-2026-MM-DD/findings.md
# Author M-FFN-GGUF-4 SHIP-007 fix PR citing H1 or H2 by name

Test plan

  • Compiles + lists 1 test
  • Skips cleanly when files missing (host: 1/2 files present)
  • No production code touched (additive test-only file)
  • Contract amendment validates (pv validate 0/0)
  • Mirrors M80 skip-if-not-present pattern verbatim

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) May 6, 2026 12:15
…— SHIP-007 H1/H2 bisection

Authors the heavy comparison harness pinned in contract
trace-ffn-sub-block-gguf-v1 v1.0.0 step M-FFN-GGUF-3 (M88 PR #1532
squash ca03361). Mirrors the M80 skip-if-not-present pattern
from `qwen3_moe_gpu_per_stage_diff::falsify_moe_sub_002_*`.

What it does:
- Loads APR 7B teacher via AprTransformer::from_apr_file
- Loads same model GGUF via OwnedQuantizedModel::from_mapped
- Runs forward_traced on both with the same canonical SHIP-007 prompt
  ("What is 2+2?")
- Extracts per-layer ffn_swiglu_inner_stats.std_dev for each
- Computes ratio at layer 3 (the §21 anomaly site)
- Reports verdict:
  - H1 (ratio in [0.5, 2.0]): NORMAL model behavior — SHIP-007
    root cause is ELSEWHERE (lm_head / post-FFN residual / token-
    position correlation)
  - H2 (ratio outside band): APR-side bug — fix at
    `inference.rs:160-164` swigl elementwise multiply

Discharges FALSIFY-FFN-GGUF-003 at the algorithm level: the
harness EXISTS and produces a single H1/H2 verdict on operator
dispatch. Full DISCHARGED requires the operator to run with
canonical files present and capture the verdict in evidence.

`#[ignore]`-gated; skips cleanly if either canonical 7B .apr or
.gguf is missing on the host (verified locally — host has the
.gguf at /mnt/nvme-raid0/models/ship-two-001/ but no .apr; test
ran clean and skipped).

Contract amendment: M-FFN-GGUF-3 status PENDING →
ALGORITHM_LEVEL_DISCHARGED. Operator workflow documented inline.

No production hot path touched — additive test-only file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/m-ffn-gguf-3-heavy-comparison-harness branch from b4d6732 to 942729b Compare May 6, 2026 12:46
@noahgift noahgift merged commit 009d8b3 into main May 6, 2026
10 checks passed
@noahgift noahgift deleted the feat/m-ffn-gguf-3-heavy-comparison-harness branch May 6, 2026 13:09
noahgift added a commit that referenced this pull request May 6, 2026
… integrated, M-FFN-GGUF-3 DISCHARGED (#1534)

Same-day post-M88+M89 follow-up: ship-two-models-spec.md v2.72.0
§27 records that the H1/H2 bisection has ALREADY been LIVE-run on
noah-Lambda-Vector RTX 4090 on 2026-04-27 (built `apr` from PR
#1083 branch + commits 77c016b + c657968 + f249464):

  APR layer-3 ffn_swigl std  = 1.2216
  GGUF layer-3 ffn_swigl std = 0.0670
  Ratio                       = 18.23×
  Verdict                     = H2 CONFIRMED (APR-side bug)

At 18.23×, the ratio exceeds the §26.4 ≥10× threshold by 8.23× in
absolute terms.

Status promotions in v1.1.0:
- M-FFN-GGUF-3 implementation_stage: ALGORITHM_LEVEL_DISCHARGED → DISCHARGED
- FALSIFY-FFN-GGUF-003: PROPOSED → DISCHARGED
- contract metadata.status: PROPOSED → ACTIVE_ALGORITHM_LEVEL

The M89 PR #1533 harness (falsify_ffn_gguf_003_layer_3_swigl_h1_h2_bisection)
adds regression-test coverage for any future re-run; the §27 data
remains the canonical operator-dispatched discharge proof.

Only M-FFN-GGUF-4 (SHIP-007 fix PR) remains PENDING, gated on
engineering investigation of the `inference.rs` SwiGLU site (now
at lines 298-302 after the sub-FFN telemetry landed; it was at
:160-164 when the §22 spec was authored).

The v1.1.0 amendment records three candidate hypotheses for the
layer-3-specific behavior within the SwiGLU block, to guide the
M-FFN-GGUF-4 investigation:
- H2a: Buffer aliasing / scratch-buffer corruption in APR multi-token
- H2b: Layer-3-specific upstream divergence (gate or up at L3 only)
- H2c: Quantization dequant alignment differs at certain layer configs
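
One way to start separating H2b from the other candidates is to apply
the same ratio test to each sub-stage at every layer and see where the
divergence first appears. A sketch, assuming per-layer std-dev pairs
have already been extracted from the two traces. `layers_outside_band`
is a hypothetical helper; only the layer-3 values below come from the
§27 run, and the other rows are placeholders, not measurements.

```rust
/// Returns the layers whose APR/GGUF std-dev ratio falls outside `[lo, hi]`.
/// Each row is (layer index, APR std, GGUF std) for one sub-stage.
/// A zero GGUF std gives an infinite or NaN ratio, so that layer is flagged too.
fn layers_outside_band(rows: &[(usize, f64, f64)], lo: f64, hi: f64) -> Vec<usize> {
    rows.iter()
        .filter(|&&(_, apr, gguf)| !(lo..=hi).contains(&(apr / gguf)))
        .map(|&(layer, _, _)| layer)
        .collect()
}

fn main() {
    let rows: [(usize, f64, f64); 3] = [
        (2, 1.0, 1.0),       // placeholder values, not measured
        (3, 1.2216, 0.0670), // layer-3 ffn_swigl std from the §27 run
        (4, 1.0, 1.0),       // placeholder values, not measured
    ];
    println!("{:?}", layers_outside_band(&rows, 0.5, 2.0));
}
```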

YAML-only — production hot paths byte-unchanged (this amendment
records pre-existing §27 evidence + corrects status drift).

Methodology lesson #2 firing in retrospect: had I grep'd the spec
for §22 / §27 BEFORE authoring M88's contract scaffold, the
M-FFN-GGUF-3 status would have been DISCHARGED at v1.0.0 instead
of needing this v1.1.0 follow-up amendment.

`pv validate` 0/0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>