contract(trace-ffn-sub-block-gguf-v1): v1.0.0 → v1.1.0 - §27 evidence integrated, M-FFN-GGUF-3 DISCHARGED #1534
Merged
… integrated, M-FFN-GGUF-3 DISCHARGED

Same-day post-M88+M89 follow-up: ship-two-models-spec.md v2.72.0 §27 records that the H1/H2 bisection has ALREADY been LIVE-run on noah-Lambda-Vector RTX 4090 on 2026-04-27 (built `apr` from PR #1083 branch + commits 77c016b + c657968 + f249464):

- APR layer-3 ffn_swigl std = 1.2216
- GGUF layer-3 ffn_swigl std = 0.0670
- Ratio = 18.23×
- Verdict = H2 CONFIRMED (APR-side bug)

The ratio exceeds the §26.4 ≥10× threshold by roughly 8× in absolute terms (18.23× against 10×).

Status promotions in v1.1.0:

- M-FFN-GGUF-3 implementation_stage: ALGORITHM_LEVEL_DISCHARGED → DISCHARGED
- FALSIFY-FFN-GGUF-003: PROPOSED → DISCHARGED
- contract metadata.status: PROPOSED → ACTIVE_ALGORITHM_LEVEL

The M89 PR #1533 harness (falsify_ffn_gguf_003_layer_3_swigl_h1_h2_bisection) adds regression-test coverage for any future re-run; the §27 data remains the canonical operator-dispatched discharge proof.

Only M-FFN-GGUF-4 (SHIP-007 fix PR) remains PENDING, gated on engineering investigation of the `inference.rs` SwiGLU site (line shifted to 298-302 post sub-FFN telemetry from §22 spec authoring at :160-164).

3 candidate hypotheses for the layer-3-specific behavior within the SwiGLU block, authored in the v1.1.0 amendment for the M-FFN-GGUF-4 investigation:

- H2a: Buffer aliasing / scratch-buffer corruption in APR multi-token
- H2b: Layer-3-specific upstream divergence (gate or up at L3 only)
- H2c: Quantization dequant alignment differs at certain layer configs

YAML-only: production hot paths byte-unchanged (this amendment records pre-existing §27 evidence + corrects status drift).

Methodology lesson #2 firing in retrospect: had I grep'd the spec for §22 / §27 BEFORE authoring M88's contract scaffold, the M-FFN-GGUF-3 status would have been DISCHARGED at v1.0.0 instead of needing this v1.1.0 follow-up amendment.

`pv validate` 0/0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
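The verdict arithmetic quoted in the commit message can be checked with a few lines of Rust. This is only a sketch: the names `std_ratio`, `h2_confirmed` and `H2_RATIO_THRESHOLD` are hypothetical and do not come from the `apr` codebase or the contract YAML. The only inputs are the two standard deviations and the ≥10× threshold reported above.

```rust
/// Threshold quoted from §26.4 in the commit message: ratio >= 10x.
/// (Hypothetical constant name, for illustration only.)
const H2_RATIO_THRESHOLD: f64 = 10.0;

/// Ratio of the APR standard deviation to the GGUF standard deviation
/// for the same tensor (here: layer-3 ffn_swigl).
fn std_ratio(apr_std: f64, gguf_std: f64) -> f64 {
    apr_std / gguf_std
}

/// Hypothetical helper: true when the ratio meets the §26.4 threshold.
fn h2_confirmed(ratio: f64) -> bool {
    ratio >= H2_RATIO_THRESHOLD
}

fn main() {
    // Values as reported in §27.
    let ratio = std_ratio(1.2216, 0.0670);
    println!("ratio                 = {:.2}x", ratio);
    println!("margin over threshold = {:.2}x", ratio - H2_RATIO_THRESHOLD);
    println!("H2 confirmed          = {}", h2_confirmed(ratio));
}
```

With the reported values the ratio rounds to 18.23×, which is about 8.23× above the threshold in absolute terms.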
noahgift added a commit that referenced this pull request on May 6, 2026
…tion hypothesis FALSIFIED (#1535)

Authors 2 lib-only determinism falsifiers (FALSIFY-FFN-GGUF-005) in apr_transformer::helpers::determinism_tests:

- falsify_ffn_gguf_005_f32_matmul_byte_deterministic_above_parallel_threshold
- falsify_ffn_gguf_005b_f32_matmul_byte_deterministic_below_parallel_threshold

Both tests run `f32_matmul` TWICE with identical synthetic inputs (out_dim above + below F32_PARALLEL_THRESHOLD=256) and assert byte-identical output via f32::to_bits() comparison.

BOTH TESTS PASS on first run. APR's f32_matmul (and the underlying f32_matvec_parallel rayon-parallel kernel) is byte-deterministic across repeated calls. This FALSIFIES the §28 parallel-reduction hypothesis at the kernel level. The §27 layer-3 18.23× drift is NOT caused by APR being non-deterministic with itself.

REFINED HYPOTHESIS (post-§28 falsification): The cumulative APR↔GGUF drift must be a DIFFERENCE between APR's and GGUF's reduction order, not non-determinism within APR. APR uses simd_dot_f32_avx2 (4-wide FMA, 8-element AVX2 chunks); GGUF uses fused_q4k_q8k_parallel_matvec_into (different unroll + block boundaries). F32 sum-of-products is non-associative; different unroll → different bit-level results.

NEXT M-FFN-GGUF-4 INVESTIGATION STEP: Cross-implementation deterministic-difference test: run APR's f32_matmul AND GGUF's fused_q4k_q8k_parallel_matvec_into on byte-identical synthetic inputs and assert whether outputs match. If they differ at the bit level, fix scope = align reduction order.

Contract amendment: trace-ffn-sub-block-gguf-v1 v1.1.0 → v1.2.0.

Status promotions:

- FALSIFY-FFN-GGUF-005: NEW → DISCHARGED (tests pass)
- M-FFN-GGUF-4 step (a): PENDING → SHIPPED

Stages (b) cross-impl diff + (c) fix remain PENDING.

Methodology lesson #2 firing prophylactically: branched off main AFTER #1534 (v1.1.0) merged to avoid cascade-ordering rebase conflict. Verified by `git rebase origin/main` succeeding cleanly with the v1.1.0 amendment intact.

`pv validate` 0/0; 2 lib tests pass; production hot paths byte-unchanged.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
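The distinction drawn in this commit message (each kernel is byte-deterministic with itself, yet two kernels with different reduction orders can disagree at the bit level) can be illustrated with a small scalar sketch. Everything here is a hypothetical stand-in: `dot_sequential`, `dot_lanes8`, `bits_equal` and `cancellation_inputs` are illustrative names, not the project's `f32_matmul`, `simd_dot_f32_avx2` or `fused_q4k_q8k_parallel_matvec_into`, and no SIMD intrinsics or quantized formats are involved. The inputs are chosen so that the rounding behaviour can be followed by hand.

```rust
/// Sequential dot product: a single accumulator, summed left to right.
fn dot_sequential(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).fold(0.0f32, |acc, (x, y)| acc + x * y)
}

/// Dot product with 8 independent accumulators, loosely mimicking a
/// lane-wise (SIMD-style) reduction. Element i is added to lane i % 8,
/// and the lanes are summed at the end. Scalar code, no intrinsics.
fn dot_lanes8(a: &[f32], b: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 8];
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        lanes[i % 8] += x * y;
    }
    lanes.iter().sum()
}

/// Byte-level equality, the same style of check as an f32::to_bits()
/// comparison.
fn bits_equal(x: f32, y: f32) -> bool {
    x.to_bits() == y.to_bits()
}

/// Synthetic inputs built to expose cancellation:
/// a[0] = 1e8, a[1] = 1.0, a[8] = -1e8, everything else 0; b is all ones.
fn cancellation_inputs() -> (Vec<f32>, Vec<f32>) {
    let mut a = vec![0.0f32; 16];
    a[0] = 1.0e8;
    a[1] = 1.0;
    a[8] = -1.0e8;
    (a, vec![1.0f32; 16])
}

fn main() {
    let (a, b) = cancellation_inputs();

    // Each reduction order is deterministic with itself.
    assert!(bits_equal(dot_sequential(&a, &b), dot_sequential(&a, &b)));
    assert!(bits_equal(dot_lanes8(&a, &b), dot_lanes8(&a, &b)));

    // The two orders disagree with each other on the same inputs.
    let s = dot_sequential(&a, &b);
    let l = dot_lanes8(&a, &b);
    println!("sequential = {s} (bits {:#010x})", s.to_bits());
    println!("lanes8     = {l} (bits {:#010x})", l.to_bits());
    println!("bit-identical across orders: {}", bits_equal(s, l));
}
```

On these inputs the sequential order absorbs the `1.0` into `1e8` (the f32 spacing near 1e8 is 8) and ends at `0.0`, while the lane-wise order cancels `1e8` and `-1e8` inside lane 0 first and ends at `1.0`. Both results are reproducible across runs; only the cross-order comparison fails, which is the shape of the stage (b) cross-implementation test described above.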
Summary
Same-day post-M88+M89 follow-up. Discovered that ship-two-models-spec.md v2.72.0 §27 records the H1/H2 bisection had already been LIVE-run on noah-Lambda-Vector RTX 4090 on 2026-04-27 — verdict: H2 CONFIRMED (APR-side bug, ratio 18.23×).
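§27 evidence
- APR layer-3 ffn_swigl std: 1.2216
- GGUF layer-3 ffn_swigl std: 0.0670
- Ratio: 18.23× (§26.4 threshold: ≥10×)
- Verdict: H2 CONFIRMED (APR-side bug)
Status promotions
- M-FFN-GGUF-3 implementation_stage: ALGORITHM_LEVEL_DISCHARGED → DISCHARGED
- FALSIFY-FFN-GGUF-003: PROPOSED → DISCHARGED
- contract metadata.status: PROPOSED → ACTIVE_ALGORITHM_LEVEL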
Why this needed a separate amendment
Methodology lesson #2 firing in retrospect: had I grep'd the spec for §22 / §27 BEFORE authoring M88's contract scaffold, the M-FFN-GGUF-3 status would have been DISCHARGED at v1.0.0 instead of needing this v1.1.0 follow-up amendment. The M89 harness (PR #1533) adds regression-test coverage but the §27 data remains the canonical operator-dispatched discharge proof.
Remaining work
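Only M-FFN-GGUF-4 (SHIP-007 fix PR) remains PENDING, gated on engineering investigation of the `inference.rs` SwiGLU site at `:298-302` (line shifted from the spec's `:160-164` after sub-FFN telemetry was added in PR #1066).

3 candidate hypotheses for the layer-3-specific behavior within the SwiGLU block, authored in the v1.1.0 amendment for the M-FFN-GGUF-4 investigation:
- H2a: Buffer aliasing / scratch-buffer corruption in APR multi-token
- H2b: Layer-3-specific upstream divergence (gate or up at L3 only)
- H2c: Quantization dequant alignment differs at certain layer configs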
Test plan
`pv validate` 0/0

🤖 Generated with Claude Code