feat(M-FFN-GGUF-4 step a): determinism falsifier — §28 parallel-reduction hypothesis FALSIFIED#1535
Merged
Merged
Conversation
…tion hypothesis FALSIFIED Authors 2 lib-only determinism falsifiers (FALSIFY-FFN-GGUF-005) in apr_transformer::helpers::determinism_tests: falsify_ffn_gguf_005_f32_matmul_byte_deterministic_above_parallel_threshold falsify_ffn_gguf_005b_f32_matmul_byte_deterministic_below_parallel_threshold Both tests run `f32_matmul` TWICE with identical synthetic inputs (out_dim above + below F32_PARALLEL_THRESHOLD=256) and assert byte-identical output via f32::to_bits() comparison. BOTH TESTS PASS on first run. APR's f32_matmul (and the underlying f32_matvec_parallel rayon-parallel kernel) is byte-deterministic across repeated calls. This FALSIFIES the §28 parallel-reduction hypothesis at the kernel level. The §27 layer-3 18.23× drift is NOT caused by APR being non-deterministic with itself. REFINED HYPOTHESIS (post-§28 falsification): The cumulative APR↔GGUF drift must be a DIFFERENCE between APR's and GGUF's reduction order, not non-determinism within APR. APR uses simd_dot_f32_avx2 (4-wide FMA, 8-element AVX2 chunks); GGUF uses fused_q4k_q8k_parallel_matvec_into (different unroll + block boundaries). F32 sum-of-products is non-associative; different unroll → different bit-level results. NEXT M-FFN-GGUF-4 INVESTIGATION STEP: Cross-implementation deterministic-difference test — run APR's f32_matmul AND GGUF's fused_q4k_q8k_parallel_matvec_into on byte-identical synthetic inputs and assert whether outputs match. If they differ at the bit level, fix scope = align reduction order. Contract amendment: trace-ffn-sub-block-gguf-v1 v1.1.0 → v1.2.0. Status promotions: - FALSIFY-FFN-GGUF-005: NEW → DISCHARGED (tests pass) - M-FFN-GGUF-4 step (a): PENDING → SHIPPED Stages (b) cross-impl diff + (c) fix remain PENDING. Methodology lesson #2 firing prophylactically: branched off main AFTER #1534 (v1.1.0) merged to avoid cascade-ordering rebase conflict. Verified by `git rebase origin/main` succeeding cleanly with the v1.1.0 amendment intact. `pv validate` 0/0; 2 lib tests pass; production hot paths byte-unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5 tasks
noahgift
added a commit
that referenced
this pull request
May 7, 2026
…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546) Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/ #1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23× APR-vs-GGUF std-ratio. Final empirical decomposition (2026-05-07): M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual = 0.077% × 5.70× × 50× × 5.56× × 14× ≈ 1715% ≈ §27's 1723% (within rounding) Six synthetic amplifier candidates resolved: - A1 (RoPE phase, M98) — FALSIFIED 1.00× UNITARY - A2 (Softmax saturation, M97) — FALSIFIED 0.01× COMPRESSES - A3 (Block-scale variance, M96) — FALSIFIED 1.00× SCALE-INVARIANT - A4 (Multi-token batch, M99) — FALSIFIED 0.26× per-token + 50× std-ratio - A5 (Real-weight non-uniformity, M100) — PARTIALLY CONFIRMED 5.56× LIVE - A6 (RMSNorm rsqrt, M101) — FALSIFIED 1.00× HOMOGENEOUS 14× residual is now attributed entirely to cumulative-layer interaction. SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE GGUF-PATH semantics into APR forward): switching APR's `f32_matmul` to Q8K activation quant + fused matvec semantics will recover the 5.56× per-matvec amplification on every matmul, eliminating cumulative APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007, SHIP-008) per §17.5. Cascade methodology consolidated: - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md Companion-spec entries M91-M101 in claude-code-parity-apr/docs/ specifications/claude-code-parity-apr-poc.md provide the full per-PR narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml` v1.0.0 → v1.12.0 across 12 amendments. MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands. MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38. Spec v3.03.0 → v3.04.0. Atomic next action banner only — full §59 narrative deferred to deliberate-session work alongside M-FFN-GGUF-5 fix PR. Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Authors 2 lib-only determinism falsifiers (FALSIFY-FFN-GGUF-005) in
apr_transformer::helpers::determinism_tests. Both tests runf32_matmulTWICE with identical synthetic inputs and assert byte-identical output viaf32::to_bits()comparison.Both tests PASS on first run. APR's
f32_matmulis byte-deterministic across repeated calls.This FALSIFIES the §28 parallel-reduction hypothesis at the kernel level. The §27 layer-3 18.23× drift is NOT caused by APR being non-deterministic with itself.
Test code
Refined hypothesis (post-§28 falsification)
The cumulative APR↔GGUF drift must be a DIFFERENCE between APR's and GGUF's reduction order, not non-determinism within APR.
simd_dot_f32_avx2(4-wide FMA, 8-element AVX2 chunks)fused_q4k_q8k_parallel_matvec_into(different unroll + block boundaries)Next M-FFN-GGUF-4 investigation step
Cross-implementation deterministic-difference test — author a SECOND lib-only test that runs APR's
f32_matmulAND GGUF'sfused_q4k_q8k_parallel_matvec_into(or its f32 equivalent) on byte-identical synthetic inputs and asserts whether outputs match. If they differ at the bit level, fix scope = align reduction order.Contract amendment (v1.1.0 → v1.2.0)
Methodology lesson applied
Lesson #3 (cascade ordering): branched off main AFTER #1534 (v1.1.0) merged to avoid cascade-ordering rebase conflict. Verified by
git rebase origin/mainsucceeding cleanly with the v1.1.0 amendment intact.Test plan
pv validate0/0#[cfg(test)] mod)🤖 Generated with Claude Code