Skip to content

feat(M-FFN-GGUF-4 step a): determinism falsifier — §28 parallel-reduction hypothesis FALSIFIED#1535

Merged
noahgift merged 1 commit into
mainfrom
feat/m-ffn-gguf-4a-determinism-falsifier
May 6, 2026
Merged

feat(M-FFN-GGUF-4 step a): determinism falsifier — §28 parallel-reduction hypothesis FALSIFIED#1535
noahgift merged 1 commit into
mainfrom
feat/m-ffn-gguf-4a-determinism-falsifier

Conversation

@noahgift

@noahgift noahgift commented May 6, 2026

Copy link
Copy Markdown
Contributor

Summary

Authors 2 lib-only determinism falsifiers (FALSIFY-FFN-GGUF-005) in apr_transformer::helpers::determinism_tests. Both tests run f32_matmul TWICE with identical synthetic inputs and assert byte-identical output via f32::to_bits() comparison.

Both tests PASS on first run. APR's f32_matmul is byte-deterministic across repeated calls.

This FALSIFIES the §28 parallel-reduction hypothesis at the kernel level. The §27 layer-3 18.23× drift is NOT caused by APR being non-deterministic with itself.

Test code

#[test]
fn falsify_ffn_gguf_005_f32_matmul_byte_deterministic_above_parallel_threshold() {
    // out_dim above F32_PARALLEL_THRESHOLD (256) so f32_matvec_parallel fires
    let result_a = f32_matmul(&input, &weight, in_dim, out_dim);
    let result_b = f32_matmul(&input, &weight, in_dim, out_dim);
    for (i, (&a, &b)) in result_a.iter().zip(result_b.iter()).enumerate() {
        assert_eq!(a.to_bits(), b.to_bits(), ...);
    }
}

Refined hypothesis (post-§28 falsification)

The cumulative APR↔GGUF drift must be a DIFFERENCE between APR's and GGUF's reduction order, not non-determinism within APR.

  • APR uses simd_dot_f32_avx2 (4-wide FMA, 8-element AVX2 chunks)
  • GGUF uses fused_q4k_q8k_parallel_matvec_into (different unroll + block boundaries)
  • F32 sum-of-products is non-associative; different unroll → different bit-level results

Next M-FFN-GGUF-4 investigation step

Cross-implementation deterministic-difference test — author a SECOND lib-only test that runs APR's f32_matmul AND GGUF's fused_q4k_q8k_parallel_matvec_into (or its f32 equivalent) on byte-identical synthetic inputs and asserts whether outputs match. If they differ at the bit level, fix scope = align reduction order.

Contract amendment (v1.1.0 → v1.2.0)

Field Before After
version 1.1.0 1.2.0
FALSIFY-FFN-GGUF-005 NEW DISCHARGED
M-FFN-GGUF-4 step (a) PENDING SHIPPED

Methodology lesson applied

Lesson #3 (cascade ordering): branched off main AFTER #1534 (v1.1.0) merged to avoid cascade-ordering rebase conflict. Verified by git rebase origin/main succeeding cleanly with the v1.1.0 amendment intact.

Test plan

  • pv validate 0/0
  • 2 lib tests pass on first run
  • No production hot path touched (additive #[cfg(test)] mod)
  • §28 hypothesis empirically falsified

🤖 Generated with Claude Code

…tion hypothesis FALSIFIED

Authors 2 lib-only determinism falsifiers (FALSIFY-FFN-GGUF-005)
in apr_transformer::helpers::determinism_tests:

  falsify_ffn_gguf_005_f32_matmul_byte_deterministic_above_parallel_threshold
  falsify_ffn_gguf_005b_f32_matmul_byte_deterministic_below_parallel_threshold

Both tests run `f32_matmul` TWICE with identical synthetic inputs
(out_dim above + below F32_PARALLEL_THRESHOLD=256) and assert
byte-identical output via f32::to_bits() comparison.

BOTH TESTS PASS on first run. APR's f32_matmul (and the underlying
f32_matvec_parallel rayon-parallel kernel) is byte-deterministic
across repeated calls.

This FALSIFIES the §28 parallel-reduction hypothesis at the kernel
level. The §27 layer-3 18.23× drift is NOT caused by APR being
non-deterministic with itself.

REFINED HYPOTHESIS (post-§28 falsification):
The cumulative APR↔GGUF drift must be a DIFFERENCE between APR's
and GGUF's reduction order, not non-determinism within APR. APR
uses simd_dot_f32_avx2 (4-wide FMA, 8-element AVX2 chunks); GGUF
uses fused_q4k_q8k_parallel_matvec_into (different unroll + block
boundaries). F32 sum-of-products is non-associative; different
unroll → different bit-level results.

NEXT M-FFN-GGUF-4 INVESTIGATION STEP:
Cross-implementation deterministic-difference test — run APR's
f32_matmul AND GGUF's fused_q4k_q8k_parallel_matvec_into on
byte-identical synthetic inputs and assert whether outputs match.
If they differ at the bit level, fix scope = align reduction order.

Contract amendment: trace-ffn-sub-block-gguf-v1 v1.1.0 → v1.2.0.

Status promotions:
- FALSIFY-FFN-GGUF-005: NEW → DISCHARGED (tests pass)
- M-FFN-GGUF-4 step (a): PENDING → SHIPPED

Stages (b) cross-impl diff + (c) fix remain PENDING.

Methodology lesson #2 firing prophylactically: branched off main
AFTER #1534 (v1.1.0) merged to avoid cascade-ordering rebase
conflict. Verified by `git rebase origin/main` succeeding cleanly
with the v1.1.0 amendment intact.

`pv validate` 0/0; 2 lib tests pass; production hot paths
byte-unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 6, 2026 14:59
@noahgift noahgift merged commit 1eb8124 into main May 6, 2026
11 checks passed
@noahgift noahgift deleted the feat/m-ffn-gguf-4a-determinism-falsifier branch May 6, 2026 15:22
noahgift added a commit that referenced this pull request May 7, 2026
…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546)

Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test
falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/
#1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23×
APR-vs-GGUF std-ratio.

Final empirical decomposition (2026-05-07):

  M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual
  = 0.077% × 5.70× × 50× × 5.56× × 14×
  ≈ 1715%   ≈   §27's 1723% (within rounding)

Six synthetic amplifier candidates resolved:
- A1 (RoPE phase, M98)        — FALSIFIED 1.00× UNITARY
- A2 (Softmax saturation, M97) — FALSIFIED 0.01× COMPRESSES
- A3 (Block-scale variance, M96) — FALSIFIED 1.00× SCALE-INVARIANT
- A4 (Multi-token batch, M99) — FALSIFIED 0.26× per-token + 50× std-ratio
- A5 (Real-weight non-uniformity, M100) — PARTIALLY CONFIRMED 5.56× LIVE
- A6 (RMSNorm rsqrt, M101)    — FALSIFIED 1.00× HOMOGENEOUS

14× residual is now attributed entirely to cumulative-layer interaction.

SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE
GGUF-PATH semantics into APR forward): switching APR's `f32_matmul`
to Q8K activation quant + fused matvec semantics will recover the
5.56× per-matvec amplification on every matmul, eliminating cumulative
APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively
discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007,
SHIP-008) per §17.5.

Cascade methodology consolidated:
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md

Companion-spec entries M91-M101 in claude-code-parity-apr/docs/
specifications/claude-code-parity-apr-poc.md provide the full per-PR
narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml`
v1.0.0 → v1.12.0 across 12 amendments.

MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands.
MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38.

Spec v3.03.0 → v3.04.0. Atomic next action banner only — full §59
narrative deferred to deliberate-session work alongside M-FFN-GGUF-5
fix PR.

Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant