feat(M-FFN-GGUF-4 step b): SIMD-vs-scalar byte-identity test — H2a' ALSO FALSIFIED by noahgift · Pull Request #1536 · paiml/aprender

noahgift · 2026-05-06T19:49:53Z

Summary

Authors FALSIFY-FFN-GGUF-006, a byte-identity test comparing APR's simd_dot_f32_avx2 (AVX2 8-wide FMA) with APR's scalar fallback (iter().zip().map(*).sum()) on a canonical synthetic input.

Empirical result: both paths produce the byte-identical output 0x44191e70 (612.4756), which is asserted as a regression-test invariant.

This FALSIFIES the refined H2a' hypothesis at the SIMD-vs-scalar level: the cumulative APR↔GGUF drift cannot be explained by APR's SIMD and scalar paths reducing in different orders, at least on this class of f32 inputs.
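The result is scoped to "this class of f32 inputs", and that scoping matters: reassociating a float sum can change its bits. The following is a minimal sketch of the kind of byte-identity check described here, assuming a portable emulation instead of real AVX2 code. `dot_8lane_fma` is a hypothetical stand-in for simd_dot_f32_avx2 (eight lane accumulators updated with `f32::mul_add`, lanes summed 0..7, then a scalar tail); the horizontal-reduction order is an assumption, and a real kernel would use `std::arch::x86_64` intrinsics such as `_mm256_fmadd_ps`.

```rust
/// Scalar reference: sequential left-to-right reduction, the same shape as
/// the `iter().zip().map(..).sum()` fallback described in the PR.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical portable stand-in for an 8-wide FMA kernel (NOT APR's
/// `simd_dot_f32_avx2`): lane i % 8 accumulates element i via fused
/// multiply-add, lanes are summed in order 0..7, then a scalar FMA tail.
fn dot_8lane_fma(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f32; 8];
    let full = a.len() / 8 * 8;
    for i in 0..full {
        acc[i % 8] = a[i].mul_add(b[i], acc[i % 8]);
    }
    let mut sum: f32 = acc.iter().sum();
    for i in full..a.len() {
        sum = a[i].mul_add(b[i], sum);
    }
    sum
}

/// Byte-identity in the PR's sense: compare bit patterns via to_bits(),
/// not approximate equality.
fn byte_identical(a: &[f32], b: &[f32]) -> bool {
    dot_scalar(a, b).to_bits() == dot_8lane_fma(a, b).to_bits()
}
```

Both outcomes are possible. With a[i] = (i % 7) - 3 and b[i] = (i % 5) - 2 over 37 elements, every partial sum is an exact small integer, so both functions return 8.0 with identical bits. With a 16-element vector holding 1e8 at index 0, 1.0 at index 1 and -1e8 at index 8, dotted with all ones, dot_scalar returns 0.0 (the 1.0 is absorbed into 1e8) while dot_8lane_fma returns 1.0.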

Two hypothesis falsifications in one session

Falsifier	Hypothesis	Result
FALSIFY-FFN-GGUF-005 (M91)	§28 parallel-reduction non-determinism	FALSIFIED — APR f32_matmul byte-deterministic
FALSIFY-FFN-GGUF-006 (this PR)	H2a' SIMD-vs-scalar reduction-order	FALSIFIED — AVX2 and scalar produce byte-identical output
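FALSIFY-FFN-GGUF-005 found APR's f32_matmul byte-deterministic, but the PR does not say why. One common reason a multithreaded matmul stays deterministic is that it parallelizes over output rows, so each output element is still one sequential dot product whose reduction order does not depend on scheduling. This sketch assumes that partitioning; `matmul_parallel` is hypothetical and is not APR's f32_matmul.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical row-major kernel for a range of output rows:
/// out[r][c] = sum over kk of a[r][kk] * b[kk][c], accumulated in a fixed
/// sequential order. `out` holds only the rows in `rows`.
fn matmul_rows(a: &[f32], b: &[f32], k: usize, n: usize, rows: Range<usize>, out: &mut [f32]) {
    for (ri, r) in rows.enumerate() {
        for c in 0..n {
            let mut acc = 0.0f32;
            for kk in 0..k {
                acc += a[r * k + kk] * b[kk * n + c];
            }
            out[ri * n + c] = acc;
        }
    }
}

/// Split the m output rows into contiguous chunks, one scoped thread each.
/// Threads never share an output element, so per-element reduction order
/// is independent of `threads`.
fn matmul_parallel(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, threads: usize) -> Vec<f32> {
    assert!(m > 0 && n > 0 && threads > 0);
    let mut out = vec![0.0f32; m * n];
    let rows_per = (m + threads - 1) / threads;
    thread::scope(|s| {
        for (t, chunk) in out.chunks_mut(rows_per * n).enumerate() {
            let start = t * rows_per;
            let end = start + chunk.len() / n;
            s.spawn(move || matmul_rows(a, b, k, n, start..end, chunk));
        }
    });
    out
}
```

By contrast, a split-K scheme that combined per-thread partial sums in completion order could produce different bits from run to run. That is the behavior §28 hypothesized and FFN-GGUF-005 ruled out for APR.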

Refined hypothesis H2d (post-second-falsification)

Pinned in the v1.3.0 contract amendment. The bit-level APR↔GGUF difference must come from one of:

Hypothesis	Description
H2d.1	Per-block dequant boundaries differ (whole-row F32 reduction vs Q4K-super-block-wise)
H2d.2	APR's F32 weights differ at bit level from dequantized GGUF Q4K bytes
H2d.3	GGUF's intermediate Q8K activation quantization rounds differently than APR's F32 path
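H2d.3 concerns activation quantization. To illustrate what such a step does to an F32 activation vector, this sketch applies a simplified symmetric int8 quantizer per 256-value block (one f32 scale per block, values rounded into [-127, 127]) and then dequantizes back. It is a stand-in under stated assumptions, not ggml's Q8_K routine, whose scale selection and storage layout differ in detail. `quantize_block` and `q8_roundtrip` are hypothetical names.

```rust
/// Block length, matching ggml's default K-quant super-block size (QK_K = 256).
const BLOCK: usize = 256;

/// Hypothetical simplified symmetric int8 quantizer for one block:
/// scale = max|x| / 127, q = round(x / scale) clamped to [-127, 127].
/// Not ggml's exact Q8_K routine.
fn quantize_block(x: &[f32]) -> (f32, Vec<i8>) {
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if amax == 0.0 {
        return (0.0, vec![0i8; x.len()]);
    }
    let scale = amax / 127.0;
    let q: Vec<i8> = x
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (scale, q)
}

/// Quantize activations block by block and dequantize back to f32, so the
/// result can be compared against the unquantized F32 path.
fn q8_roundtrip(x: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(x.len());
    for block in x.chunks(BLOCK) {
        let (scale, q) = quantize_block(block);
        out.extend(q.iter().map(|&qi| qi as f32 * scale));
    }
    out
}

/// Plain sequential f32 dot product.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Each element's round-trip error is bounded by roughly half its block's scale. A dot product against quantized activations therefore generally will not match the full-F32 dot product bit for bit, even with identical weights.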

Next M-FFN-GGUF-4 step (c) deliverable

H2d.2 is the most directly testable autonomously: load the APR F32 weights and the GGUF Q4K bytes for the same tensor, dequantize the Q4K bytes via APR's dequant routine, and compare element-wise. If they differ at the bit level, H2d.2 is confirmed and the SHIP-007 fix scope narrows to "fix dequant invariant".
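The final comparison in step (c) reduces to an element-wise check of two f32 tensors of the same shape. This is a minimal sketch, assuming both tensors are already in memory as f32 slices. Loading the APR weights and running APR's Q4K dequant routine are out of scope, and `compare_bitwise` and `BitDiff` are hypothetical names. It counts elements whose bit patterns differ and reports the largest ULP distance, which distinguishes a real value mismatch from a sign-of-zero difference.

```rust
/// Summary of an element-wise bit-level comparison of two f32 tensors.
#[derive(Debug, PartialEq)]
struct BitDiff {
    mismatches: usize,             // elements whose bit patterns differ
    max_ulp: u64,                  // largest distance in units in the last place
    first_mismatch: Option<usize>, // index of the first differing element
}

/// Map an f32 onto an ordered integer line: adjacent floats differ by 1 and
/// both +0.0 and -0.0 map to 0. NaNs are not treated specially.
fn ordered_key(x: f32) -> i64 {
    let i = x.to_bits() as i32;
    let k = if i < 0 { i32::MIN - i } else { i };
    k as i64
}

/// Compare a reference tensor (e.g. APR's F32 weights) with a candidate
/// (e.g. GGUF Q4K bytes dequantized to f32), element by element.
fn compare_bitwise(reference: &[f32], candidate: &[f32]) -> BitDiff {
    assert_eq!(reference.len(), candidate.len(), "tensor lengths must match");
    let mut diff = BitDiff { mismatches: 0, max_ulp: 0, first_mismatch: None };
    for (idx, (&r, &c)) in reference.iter().zip(candidate).enumerate() {
        if r.to_bits() != c.to_bits() {
            diff.mismatches += 1;
            if diff.first_mismatch.is_none() {
                diff.first_mismatch = Some(idx);
            }
            let ulp = (ordered_key(r) - ordered_key(c)).unsigned_abs();
            diff.max_ulp = diff.max_ulp.max(ulp);
        }
    }
    diff
}
```

Under this harness, H2d.2 would be confirmed by any nonzero `mismatches`. `max_ulp` then indicates whether the drift is last-bit rounding or something larger.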

Contract amendment (v1.2.0 → v1.3.0)

Field	Before	After
version	1.2.0	1.3.0
FALSIFY-FFN-GGUF-006	NEW	DISCHARGED
M-FFN-GGUF-4 step (b)	PENDING	SHIPPED

Test plan

pv validate 0/0
3 lib tests pass (FFN-GGUF-005a, 005b, 006)
No production hot path touched (additive #[cfg(test)] mod)
H2a' empirically falsified at SIMD-vs-scalar level
H2d hypothesis triplet authored for next-step falsification

🤖 Generated with Claude Code

noahgift · 2026-05-06T20:09:36Z

Re-trigger CI

…efined hypothesis ALSO FALSIFIED

Authors a third lib-only falsifier (FALSIFY-FFN-GGUF-006) in apr_transformer::helpers::determinism_tests:

falsify_ffn_gguf_006_simd_vs_scalar_reduction_order_byte_identity

The test runs APR's simd_dot_f32_avx2 (AVX2 8-wide FMA) and APR's scalar fallback (iter().zip().map(*).sum()) on the same canonical synthetic input and compares bit patterns via f32::to_bits().

EMPIRICAL RESULT (2026-05-06): both paths produce BYTE-IDENTICAL output 0x44191e70 = 612.4756. Asserted as regression-test invariant.

This FALSIFIES the refined H2a' hypothesis at the SIMD-vs-scalar level. The cumulative APR↔GGUF drift cannot be explained by APR's SIMD vs APR's scalar path differing on this class of f32 inputs.

SECOND HYPOTHESIS FALSIFICATION IN ONE SESSION:
- §28 (parallel-reduction non-determinism, M91): FALSIFIED
- H2a' (SIMD-vs-scalar reduction-order, this PR): FALSIFIED

NEW REFINED HYPOTHESIS H2d (post-second-falsification): the bit-level difference between APR and GGUF must come from one of:
- H2d.1: Per-block dequant boundaries differ between APR's whole-row F32 reduction and GGUF's Q4K-super-block-wise reduction
- H2d.2: APR's F32 weights differ at bit level from a true dequantization of the GGUF Q4K bytes (despite SHIP-003 PR #1059 cos≥0.9999999 weight invariance)
- H2d.3: GGUF's intermediate Q8K activation quantization rounds activations to ~7-bit precision differently than APR's full-F32 path

Each H2d.x is a separate falsifier candidate.

Next M-FFN-GGUF-4 step (c) deliverable: H2d.2 is the most directly testable autonomously. Load APR F32 weights + GGUF Q4K bytes for the same tensor, dequantize Q4K via APR's dequant routine, and compare element-wise. If they differ at the bit level, H2d.2 is confirmed.

Contract amendment: trace-ffn-sub-block-gguf-v1 v1.2.0 → v1.3.0. Status promotions:
- FALSIFY-FFN-GGUF-006: NEW → DISCHARGED (test passes after flip)
- M-FFN-GGUF-4 step (b): PENDING → SHIPPED

Step (c) remains PENDING, with scope narrowed to H2d.{1,2,3}.

Production hot paths byte-unchanged. `pv validate` 0/0; 3 lib tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546)

Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/#1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23× APR-vs-GGUF std-ratio.

Final empirical decomposition (2026-05-07):

M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual
= 0.077% × 5.70× × 50× × 5.56× × 14×
≈ 1715% ≈ §27's 1723% (within rounding)

Six synthetic amplifier candidates resolved:
- A1 (RoPE phase, M98): FALSIFIED, 1.00× UNITARY
- A2 (Softmax saturation, M97): FALSIFIED, 0.01× COMPRESSES
- A3 (Block-scale variance, M96): FALSIFIED, 1.00× SCALE-INVARIANT
- A4 (Multi-token batch, M99): FALSIFIED, 0.26× per-token + 50× std-ratio
- A5 (Real-weight non-uniformity, M100): PARTIALLY CONFIRMED, 5.56× LIVE
- A6 (RMSNorm rsqrt, M101): FALSIFIED, 1.00× HOMOGENEOUS

The 14× residual is now attributed entirely to cumulative-layer interaction.

SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE GGUF-PATH semantics into APR forward): switching APR's `f32_matmul` to Q8K activation quant + fused matvec semantics will recover the 5.56× per-matvec amplification on every matmul, eliminating cumulative APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007, SHIP-008) per §17.5.

Cascade methodology consolidated:
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md

Companion-spec entries M91-M101 in claude-code-parity-apr/docs/specifications/claude-code-parity-apr-poc.md provide the full per-PR narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml` v1.0.0 → v1.12.0 across 12 amendments.

MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands.
MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38.

Spec v3.03.0 → v3.04.0. Atomic next action banner only; full §59 narrative deferred to deliberate-session work alongside M-FFN-GGUF-5 fix PR.

Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
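The decomposition multiplies five factors. As a quick arithmetic check, this sketch uses the factors exactly as printed in the commit message. Those values are already rounded, so the product is approximate. It comes to roughly 1708%, slightly below the stated ≈1715% and about 0.9% under §27's 1723%. The unrounded factors are not given in the message.

```rust
/// Product of the cascade factors exactly as printed in the #1546 commit
/// message. Inputs are already rounded, so the result is approximate.
fn cascade_product_percent() -> f64 {
    let mechanism_pct: f64 = 0.077; // M94 mechanism, in percent
    let factors: [f64; 4] = [
        5.70, // M95 compounding
        50.0, // M99 std-ratio
        5.56, // A5 real-teacher
        14.0, // residual (cumulative-layer interaction)
    ];
    factors.iter().fold(mechanism_pct, |acc, f| acc * f)
}
```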

noahgift enabled auto-merge (squash) May 6, 2026 19:49

noahgift force-pushed the feat/m-ffn-gguf-4b-simd-vs-scalar-reduction-order branch from a6fa508 to f6fa816 May 6, 2026 20:10

noahgift merged commit 496e955 into main May 6, 2026
10 checks passed

noahgift deleted the feat/m-ffn-gguf-4b-simd-vs-scalar-reduction-order branch May 6, 2026 20:39

noahgift mentioned this pull request May 7, 2026

docs(SHIP-TWO-001 §59): SHIP-007 §22 falsifier cascade CLOSED — 11 PRs decompose §27 1723% within rounding #1546

Merged

noahgift merged 1 commit into main from feat/m-ffn-gguf-4b-simd-vs-scalar-reduction-order

1 participant