feat(M-FFN-GGUF-4 step b): SIMD-vs-scalar byte-identity test — H2a' ALSO FALSIFIED#1536
Merged
…efined hypothesis ALSO FALSIFIED
Authors a third lib-only falsifier (FALSIFY-FFN-GGUF-006) in
apr_transformer::helpers::determinism_tests:
falsify_ffn_gguf_006_simd_vs_scalar_reduction_order_byte_identity
Test runs APR's simd_dot_f32_avx2 (AVX2 8-wide FMA) and APR's
scalar fallback (iter().zip().map(*).sum()) on the same canonical
synthetic input, compares bit patterns via f32::to_bits().
EMPIRICAL RESULT (2026-05-06): both paths produce BYTE-IDENTICAL
output 0x44191e70 = 612.4756. Asserted as regression-test
invariant.
This FALSIFIES the refined H2a' hypothesis at the SIMD-vs-scalar
level. The cumulative APR↔GGUF drift cannot be explained by APR's
SIMD vs APR's scalar path differing on this class of f32 inputs.
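The byte-identity check described above can be sketched in portable Rust. This is an illustration only, not APR's test: `dot_scalar`, `dot_8lane`, and `byte_identical` are hypothetical names, and `dot_8lane` is a stand-in that emulates an 8-wide FMA kernel with eight lane accumulators. The left-to-right horizontal sum at the end is an assumption, not necessarily the order `simd_dot_f32_avx2` uses.

```rust
/// Scalar reference: strict left-to-right accumulation.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Portable stand-in for an 8-wide FMA kernel (assumption: NOT APR's code).
/// Eight independent lane accumulators, then a horizontal sum and a scalar tail.
fn dot_8lane(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let main = n - n % 8;
    let mut lanes = [0.0f32; 8];
    for i in (0..main).step_by(8) {
        for l in 0..8 {
            // Fused multiply-add: one rounding per step, as an FMA lane would do.
            lanes[l] = a[i + l].mul_add(b[i + l], lanes[l]);
        }
    }
    // Horizontal reduction order is an assumption of this sketch.
    let mut acc: f32 = lanes.iter().sum();
    for i in main..n {
        acc = a[i].mul_add(b[i], acc);
    }
    acc
}

/// Compare bit patterns rather than values, so even 1-ULP drift is visible.
fn byte_identical(x: f32, y: f32) -> bool {
    x.to_bits() == y.to_bits()
}

fn main() {
    // Small dyadic inputs: every partial sum is exactly representable,
    // so any reduction order yields the same bits.
    let a: Vec<f32> = (1..=19).map(|i| i as f32).collect();
    let b = vec![0.5f32; a.len()];
    let (s, v) = (dot_scalar(&a, &b), dot_8lane(&a, &b));
    println!(
        "scalar={:#010x} lanes={:#010x} identical={}",
        s.to_bits(),
        v.to_bits(),
        byte_identical(s, v)
    );
}
```

Identical bits on one input class do not prove identity in general. They only rule out reduction order as the source of drift for inputs of that class, which is the scope the commit message claims.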
SECOND HYPOTHESIS FALSIFICATION IN ONE SESSION:
- §28 (parallel-reduction non-determinism, M91): FALSIFIED
- H2a' (SIMD-vs-scalar reduction-order, this PR): FALSIFIED
NEW REFINED HYPOTHESIS H2d (post-second-falsification):
The bit-level difference between APR and GGUF must come from one
of:
H2d.1: Per-block dequant boundaries differ between APR's whole-row
F32 reduction and GGUF's Q4K-super-block-wise reduction
H2d.2: APR's F32 weights differ at bit level from a true
dequantization of the GGUF Q4K bytes (despite SHIP-003 PR
#1059 cos≥0.9999999 weight invariance)
H2d.3: GGUF's intermediate Q8K activation quantization rounds
activations to ~7-bit precision differently than APR's
full-F32 path
Each H2d.x is a separate falsifier candidate.
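To make H2d.1 and H2d.3 concrete, here is a minimal Rust sketch; it is not APR or GGUF code. `whole_row_sum`, `blockwise_sum`, and `q8_round_trip` are hypothetical helpers, and the 8-bit scheme is a simplified symmetric quantizer (scale = max_abs / 127) assumed for illustration, not the actual Q8K block layout.

```rust
/// H2d.1: reduce a whole row in one sequential pass.
fn whole_row_sum(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0f32, |acc, &x| acc + x)
}

/// H2d.1: reduce each block first, then sum the block partials.
/// `block` must be non-zero (`chunks` panics on 0).
fn blockwise_sum(xs: &[f32], block: usize) -> f32 {
    xs.chunks(block)
        .map(whole_row_sum)
        .fold(0.0f32, |acc, p| acc + p)
}

/// H2d.3: quantize to 8 bits and back with one shared scale.
/// Simplified symmetric scheme (assumption), returns (round-tripped values, scale).
fn q8_round_trip(xs: &[f32]) -> (Vec<f32>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    if max_abs == 0.0 {
        return (xs.to_vec(), 0.0);
    }
    let scale = max_abs / 127.0;
    let out = xs
        .iter()
        .map(|&x| {
            let q = (x / scale).round().clamp(-127.0, 127.0) as i8;
            f32::from(q) * scale
        })
        .collect();
    (out, scale)
}

fn main() {
    // Cancellation makes the block boundary visible in the result.
    let row = [1.0e8f32, 1.0, -1.0e8, 1.0];
    println!("whole-row = {}", whole_row_sum(&row));
    println!("blockwise = {}", blockwise_sum(&row, 2));

    // Round-tripping activations through 8 bits perturbs them.
    let acts = [0.1f32, -0.4, 1.0, 0.333];
    let (rt, scale) = q8_round_trip(&acts);
    println!("scale = {scale}, round-trip = {rt:?}");
}
```

The first half shows that moving a reduction boundary can change the result even when the values are the same. The second shows that an 8-bit activation round trip changes the inputs to the reduction before any summation happens. Either effect alone is enough to produce bit-level differences between two otherwise equivalent paths.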
Next M-FFN-GGUF-4 step (c) deliverable: H2d.2 is most directly
testable autonomously — load APR F32 weights + GGUF Q4K bytes for
same tensor, dequantize Q4K via APR's dequant routine, compare
element-wise. If they differ at the bit level, H2d.2 is confirmed.
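The element-wise comparison for H2d.2 could look like the sketch below. It assumes both tensors are already available as `&[f32]` slices of equal length; loading the APR weights and running APR's Q4K dequant routine are out of scope here. `first_bit_mismatch` and `cosine` are hypothetical helper names. The example also shows why a cos≥0.9999999 invariant does not imply bit-level equality.

```rust
/// Index of the first element whose bit pattern differs, if any.
/// Note: `to_bits` distinguishes +0.0 from -0.0 and NaN payloads.
fn first_bit_mismatch(a: &[f32], b: &[f32]) -> Option<usize> {
    a.iter()
        .zip(b)
        .position(|(x, y)| x.to_bits() != y.to_bits())
}

/// Cosine similarity, accumulated in f64 to keep the metric itself stable.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    // Stand-ins for "APR F32 weights" and "dequantized Q4K weights".
    let apr = vec![1.0f32; 1024];
    let mut deq = apr.clone();
    // Perturb one element by a single ULP.
    deq[7] = f32::from_bits(1.0f32.to_bits() + 1);

    println!("cosine         = {:.12}", cosine(&apr, &deq));
    println!("first mismatch = {:?}", first_bit_mismatch(&apr, &deq));
}
```

A cosine threshold tolerates many 1-ULP differences, so the bit-level check is the stricter of the two and is the one that decides H2d.2.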
Contract amendment: trace-ffn-sub-block-gguf-v1 v1.2.0 → v1.3.0.
Status promotions:
- FALSIFY-FFN-GGUF-006: NEW → DISCHARGED (test passes after flip)
- M-FFN-GGUF-4 step (b): PENDING → SHIPPED
Step (c) remains PENDING — narrowed scope to H2d.{1,2,3}.
Production hot paths byte-unchanged.
`pv validate` 0/0; 3 lib tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
a6fa508 to f6fa816
noahgift added a commit that referenced this pull request May 7, 2026
…s decompose §27 1723% within rounding; fix scope EMPIRICALLY VALIDATED; spec v3.03.0 → v3.04.0 (#1546)

Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test
falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/
#1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23×
APR-vs-GGUF std-ratio.

Final empirical decomposition (2026-05-07):
  M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual
  = 0.077% × 5.70× × 50× × 5.56× × 14× ≈ 1715% ≈ §27's 1723% (within rounding)

Six synthetic amplifier candidates resolved:
- A1 (RoPE phase, M98): FALSIFIED 1.00× UNITARY
- A2 (Softmax saturation, M97): FALSIFIED 0.01× COMPRESSES
- A3 (Block-scale variance, M96): FALSIFIED 1.00× SCALE-INVARIANT
- A4 (Multi-token batch, M99): FALSIFIED 0.26× per-token + 50× std-ratio
- A5 (Real-weight non-uniformity, M100): PARTIALLY CONFIRMED 5.56× LIVE
- A6 (RMSNorm rsqrt, M101): FALSIFIED 1.00× HOMOGENEOUS

14× residual is now attributed entirely to cumulative-layer interaction.

SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE
GGUF-PATH semantics into APR forward): switching APR's `f32_matmul` to
Q8K activation quant + fused matvec semantics will recover the 5.56×
per-matvec amplification on every matmul, eliminating cumulative
APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively
discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007,
SHIP-008) per §17.5.

Cascade methodology consolidated:
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md

Companion-spec entries M91-M101 in
claude-code-parity-apr/docs/specifications/claude-code-parity-apr-poc.md
provide the full per-PR narrative. Aprender contract
`contracts/trace-ffn-sub-block-gguf-v1.yaml` v1.0.0 → v1.12.0 across
12 amendments.

MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands.
MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38.

Spec v3.03.0 → v3.04.0. Atomic next action banner only; full §59
narrative deferred to deliberate-session work alongside M-FFN-GGUF-5
fix PR.

Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Authors FALSIFY-FFN-GGUF-006: byte-identity test between APR's simd_dot_f32_avx2 (AVX2 8-wide FMA) and APR's scalar fallback (iter().zip().map(*).sum()) on canonical synthetic input.
Empirical result: both paths produce byte-identical output 0x44191e70 = 612.4756. Asserted as regression-test invariant.
This FALSIFIES the refined H2a' hypothesis at the SIMD-vs-scalar level: the cumulative APR↔GGUF drift cannot be explained by APR-internal reduction-order differences on this class of f32 inputs.
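Two hypothesis falsifications in one session
- §28 (parallel-reduction non-determinism, M91): FALSIFIED
- H2a' (SIMD-vs-scalar reduction-order, this PR): FALSIFIED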
Refined hypothesis H2d (post-second-falsification)
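Pinned in v1.3.0 amendment. The bit-level APR↔GGUF difference must come from one of:
- H2d.1: per-block dequant boundaries differ between APR's whole-row F32 reduction and GGUF's Q4K-super-block-wise reduction
- H2d.2: APR's F32 weights differ at bit level from a true dequantization of the GGUF Q4K bytes
- H2d.3: GGUF's intermediate Q8K activation quantization rounds activations differently than APR's full-F32 path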
Next M-FFN-GGUF-4 step (c) deliverable
H2d.2 is most directly testable autonomously — load APR F32 weights + GGUF Q4K bytes for same tensor, dequantize Q4K via APR's dequant routine, compare element-wise. If bit-level differs, H2d.2 confirmed and the SHIP-007 fix scope narrows to "fix dequant invariant".
Contract amendment (v1.2.0 → v1.3.0)
Test plan
pv validate 0/0
… #[cfg(test)] mod)
🤖 Generated with Claude Code