feat(M-FFN-GGUF-4 step a): determinism falsifier — §28 parallel-reduction hypothesis FALSIFIED by noahgift · Pull Request #1535 · paiml/aprender

noahgift · 2026-05-06T14:59:00Z

Summary

Authors 2 lib-only determinism falsifiers (FALSIFY-FFN-GGUF-005) in apr_transformer::helpers::determinism_tests. Both tests run f32_matmul TWICE with identical synthetic inputs and assert byte-identical output via f32::to_bits() comparison.

Both tests PASS on first run. APR's f32_matmul is byte-deterministic across repeated calls.

This FALSIFIES the §28 parallel-reduction hypothesis at the kernel level. The §27 layer-3 18.23× drift is NOT caused by APR being non-deterministic with itself.

Test code

#[test]
fn falsify_ffn_gguf_005_f32_matmul_byte_deterministic_above_parallel_threshold() {
    // out_dim above F32_PARALLEL_THRESHOLD (256) so f32_matvec_parallel fires
    let result_a = f32_matmul(&input, &weight, in_dim, out_dim);
    let result_b = f32_matmul(&input, &weight, in_dim, out_dim);
    for (i, (&a, &b)) in result_a.iter().zip(result_b.iter()).enumerate() {
        assert_eq!(a.to_bits(), b.to_bits(), ...);
    }
}

Refined hypothesis (post-§28 falsification)

The cumulative APR↔GGUF drift must be a DIFFERENCE between APR's and GGUF's reduction order, not non-determinism within APR.

APR uses simd_dot_f32_avx2 (4-wide FMA, 8-element AVX2 chunks)
GGUF uses fused_q4k_q8k_parallel_matvec_into (different unroll + block boundaries)
F32 sum-of-products is non-associative; different unroll → different bit-level results

Next M-FFN-GGUF-4 investigation step

Cross-implementation deterministic-difference test — author a SECOND lib-only test that runs APR's f32_matmul AND GGUF's fused_q4k_q8k_parallel_matvec_into (or its f32 equivalent) on byte-identical synthetic inputs and asserts whether outputs match. If they differ at the bit level, fix scope = align reduction order.

Contract amendment (v1.1.0 → v1.2.0)

Field	Before	After
version	1.1.0	1.2.0
FALSIFY-FFN-GGUF-005	NEW	DISCHARGED
M-FFN-GGUF-4 step (a)	PENDING	SHIPPED

Methodology lesson applied

Lesson #3 (cascade ordering): branched off main AFTER #1534 (v1.1.0) merged to avoid cascade-ordering rebase conflict. Verified by git rebase origin/main succeeding cleanly with the v1.1.0 amendment intact.

Test plan

pv validate 0/0
2 lib tests pass on first run
No production hot path touched (additive #[cfg(test)] mod)
§28 hypothesis empirically falsified

🤖 Generated with Claude Code

…tion hypothesis FALSIFIED Authors 2 lib-only determinism falsifiers (FALSIFY-FFN-GGUF-005) in apr_transformer::helpers::determinism_tests: falsify_ffn_gguf_005_f32_matmul_byte_deterministic_above_parallel_threshold falsify_ffn_gguf_005b_f32_matmul_byte_deterministic_below_parallel_threshold Both tests run `f32_matmul` TWICE with identical synthetic inputs (out_dim above + below F32_PARALLEL_THRESHOLD=256) and assert byte-identical output via f32::to_bits() comparison. BOTH TESTS PASS on first run. APR's f32_matmul (and the underlying f32_matvec_parallel rayon-parallel kernel) is byte-deterministic across repeated calls. This FALSIFIES the §28 parallel-reduction hypothesis at the kernel level. The §27 layer-3 18.23× drift is NOT caused by APR being non-deterministic with itself. REFINED HYPOTHESIS (post-§28 falsification): The cumulative APR↔GGUF drift must be a DIFFERENCE between APR's and GGUF's reduction order, not non-determinism within APR. APR uses simd_dot_f32_avx2 (4-wide FMA, 8-element AVX2 chunks); GGUF uses fused_q4k_q8k_parallel_matvec_into (different unroll + block boundaries). F32 sum-of-products is non-associative; different unroll → different bit-level results. NEXT M-FFN-GGUF-4 INVESTIGATION STEP: Cross-implementation deterministic-difference test — run APR's f32_matmul AND GGUF's fused_q4k_q8k_parallel_matvec_into on byte-identical synthetic inputs and assert whether outputs match. If they differ at the bit level, fix scope = align reduction order. Contract amendment: trace-ffn-sub-block-gguf-v1 v1.1.0 → v1.2.0. Status promotions: - FALSIFY-FFN-GGUF-005: NEW → DISCHARGED (tests pass) - M-FFN-GGUF-4 step (a): PENDING → SHIPPED Stages (b) cross-impl diff + (c) fix remain PENDING. Methodology lesson #2 firing prophylactically: branched off main AFTER #1534 (v1.1.0) merged to avoid cascade-ordering rebase conflict. Verified by `git rebase origin/main` succeeding cleanly with the v1.1.0 amendment intact. `pv validate` 0/0; 2 lib tests pass; production hot paths byte-unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546) Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/ #1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23× APR-vs-GGUF std-ratio. Final empirical decomposition (2026-05-07): M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual = 0.077% × 5.70× × 50× × 5.56× × 14× ≈ 1715% ≈ §27's 1723% (within rounding) Six synthetic amplifier candidates resolved: - A1 (RoPE phase, M98) — FALSIFIED 1.00× UNITARY - A2 (Softmax saturation, M97) — FALSIFIED 0.01× COMPRESSES - A3 (Block-scale variance, M96) — FALSIFIED 1.00× SCALE-INVARIANT - A4 (Multi-token batch, M99) — FALSIFIED 0.26× per-token + 50× std-ratio - A5 (Real-weight non-uniformity, M100) — PARTIALLY CONFIRMED 5.56× LIVE - A6 (RMSNorm rsqrt, M101) — FALSIFIED 1.00× HOMOGENEOUS 14× residual is now attributed entirely to cumulative-layer interaction. SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE GGUF-PATH semantics into APR forward): switching APR's `f32_matmul` to Q8K activation quant + fused matvec semantics will recover the 5.56× per-matvec amplification on every matmul, eliminating cumulative APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007, SHIP-008) per §17.5. Cascade methodology consolidated: - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md Companion-spec entries M91-M101 in claude-code-parity-apr/docs/ specifications/claude-code-parity-apr-poc.md provide the full per-PR narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml` v1.0.0 → v1.12.0 across 12 amendments. MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands. MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38. Spec v3.03.0 → v3.04.0. Atomic next action banner only — full §59 narrative deferred to deliberate-session work alongside M-FFN-GGUF-5 fix PR. Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 6, 2026 14:59

noahgift merged commit 1eb8124 into main May 6, 2026
11 checks passed

noahgift deleted the feat/m-ffn-gguf-4a-determinism-falsifier branch May 6, 2026 15:22

noahgift mentioned this pull request May 7, 2026

docs(SHIP-TWO-001 §59): SHIP-007 §22 falsifier cascade CLOSED — 11 PRs decompose §27 1723% within rounding #1546

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(M-FFN-GGUF-4 step a): determinism falsifier — §28 parallel-reduction hypothesis FALSIFIED#1535

feat(M-FFN-GGUF-4 step a): determinism falsifier — §28 parallel-reduction hypothesis FALSIFIED#1535
noahgift merged 1 commit into
mainfrom
feat/m-ffn-gguf-4a-determinism-falsifier

noahgift commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 6, 2026

Summary

Test code

Refined hypothesis (post-§28 falsification)

Next M-FFN-GGUF-4 investigation step

Contract amendment (v1.1.0 → v1.2.0)

Methodology lesson applied

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant