feat(M-FFN-GGUF-4 step g, A2): softmax saturation amplification falsifier — A2 FALSIFIED (compresses 0.01×)#1541
Merged
noahgift merged 1 commit intoMay 7, 2026
Conversation
4 tasks
Base automatically changed from
feat/m-ffn-gguf-4f-q4k-block-scale-variance
to
main
May 7, 2026 00:26
…fier — A2 FALSIFIED (amplification 0.01×, COMPRESSES) M96 (sibling commit, c7091ab on PR #1540) falsified A3 (block-scale variance). A2 (softmax saturation) is the next-most-tractable synthetic candidate amplifier for the §27 magnitude gap. A2 hypothesis: attention softmax in saturation regime (one logit much larger than others) is non-linear and could amplify tiny logit drift to large probability drift — contributing to the §27 1723% magnitude beyond what M95's 5.70× chained matvec compounding explains. This PR authors `falsify_ffn_gguf_011_softmax_saturation_amplification`. Test: 7-element logit vector with one saturated value (+10.0) and others in normal range; perturbs saturated logit by 0.077% × 10.0 = 0.0077 (M94-equivalent absolute drift); compares numerically-stable softmax output before/after. EMPIRICAL RESULT (2026-05-06): input_rel_drift = 0.051333% (perturbation / |logits|_L1) output_rel_drift = 0.000578% (Σ |p_b - p_a| / Σ p_a) amplification = 0.0113× ← COMPRESSES, not amplifies! **A2 EMPIRICALLY FALSIFIED** in the saturation regime. Mechanism explanation: in saturation, the dominant probability is near 1.0 and tail probabilities are near 0.0. Softmax is LOCALLY linear in this regime — small input perturbations produce proportionally smaller output changes (compression rather than amplification). The 0.01× amplification means softmax suppresses M94 perturbations by ~100×. AMPLIFIER LANDSCAPE POST-A2+A3 FALSIFICATION: - A1 (RoPE phase amplification) — UNTESTED, only remaining synthetic candidate - A2 (Softmax saturation) — FALSIFIED ✗ (compresses) - A3 (Block-scale variance) — FALSIFIED ✗ (linear-scaling) Three additional candidates pinned in v1.8.0 amendment (real-teacher or multi-token testable): - A4 (Multi-token batch dimension) — §27 is 7-token batch; M95 was single - A5 (Real-weight non-uniformity) — heavy-tailed weight distributions - A6 (RMSNorm rsqrt approximation) — non-linearity in normalization Most likely path post-2 sequential falsifications: M-FFN-GGUF-6 (real-teacher) is now the highest-leverage next test. Contract trace-ffn-sub-block-gguf-v1 v1.7.0 → v1.8.0: - FALSIFY-FFN-GGUF-011 NEW → DISCHARGED - M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED Stacked atop M96 (PR #1540). Test runs locally: cargo test -p aprender-serve --lib falsify_ffn_gguf_011 -- --nocapture test result: ok. 1 passed; finished in 0.03s Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-011. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1df04ee to
0fafa30
Compare
noahgift
added a commit
that referenced
this pull request
May 7, 2026
…1 FALSIFIED (amplification 1.00×, UNITARY) A1 was the LAST remaining synthetic-testable amplifier candidate after M96 (A3 falsified) and M97 (A2 falsified). With this PR all three synthetic amplifiers are FALSIFIED. A1 hypothesis: RoPE rotates F32 vectors by per-position phase; tiny magnitude drift in pre-RoPE Q becomes ROTATIONAL drift in post-RoPE Q. When Q' is dotted with K' (also rotated), rotational drift may compound non-linearly into larger QK^T attention score drift than the magnitude drift alone. This PR authors `falsify_ffn_gguf_012_rope_phase_amplification`. Test: head_dim=64, rope_theta=10000, Q at position 0 perturbed by 0.077% (M94-equivalent), K at position 1, scaled QK^T. EMPIRICAL RESULT (2026-05-06): input_rel_drift = 0.076997% output_rel_drift = 0.076986% amplification = 0.9999× ← UNITARY, essentially 1× **A1 EMPIRICALLY FALSIFIED.** RoPE rotation is approximately unitary; QK^T dot product preserves drift magnitude exactly. Tiny pre-RoPE perturbation produces a proportional post-attention score drift, NOT amplified. AMPLIFIER LANDSCAPE POST-A1+A2+A3 FALSIFICATION: - A1 (RoPE phase) — FALSIFIED ✗ (unitary) - A2 (Softmax saturation) — FALSIFIED ✗ (compresses) - A3 (Block-scale variance) — FALSIFIED ✗ (linear-scaling) - A4 (Multi-token batch) — UNTESTED, synthetically testable - A5 (Real-weight non-uniformity) — UNTESTED, real-teacher gated - A6 (RMSNorm rsqrt approx) — UNTESTED, real-teacher gated ALL THREE SYNTHETIC-TESTABLE amplifiers are now FALSIFIED. The 28× magnitude gap between M95's synthetic 0.4391% and §27's measured 1723% MUST come from A4 (multi-token), A5 (real-weight), or A6 (RMSNorm). Combined synthetic upper bound: ~5.70× total amplification from 0.077% per-matvec mechanism = ~0.4391% total drift. §27 measured 1723% drift = **3920× residual gap unexplained** by synthetic mechanisms. Either M-FFN-GGUF-6 (real-teacher) shows real-weight non-uniformity produces 3920× larger per-tensor rel_diff than synthetic uniform weights, OR there's a non-decomposable interaction between layers that synthetic falsifiers can't isolate. M-FFN-GGUF-6 (real-teacher falsifier) is now THE highest-leverage remaining test. Contract trace-ffn-sub-block-gguf-v1 v1.8.0 → v1.9.0: - FALSIFY-FFN-GGUF-012 NEW → DISCHARGED - M-FFN-GGUF-4 step (h) A1 candidate: NEW → DISCHARGED - All three synthetic amplifiers DISCHARGED - M-FFN-GGUF-4 step (i) A4 multi-token batch: NEW, PENDING - M-FFN-GGUF-6 real-teacher: now highest-leverage remaining Stacked atop M97 (PR #1541). Test runs locally: cargo test -p aprender-serve --lib falsify_ffn_gguf_012 -- --nocapture test result: ok. 1 passed; finished in 0.00s Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-012. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5 tasks
noahgift
added a commit
that referenced
this pull request
May 7, 2026
…1 FALSIFIED (amplification 1.00×, UNITARY) A1 was the LAST remaining synthetic-testable amplifier candidate after M96 (A3 falsified) and M97 (A2 falsified). With this PR all three synthetic amplifiers are FALSIFIED. A1 hypothesis: RoPE rotates F32 vectors by per-position phase; tiny magnitude drift in pre-RoPE Q becomes ROTATIONAL drift in post-RoPE Q. When Q' is dotted with K' (also rotated), rotational drift may compound non-linearly into larger QK^T attention score drift than the magnitude drift alone. This PR authors `falsify_ffn_gguf_012_rope_phase_amplification`. Test: head_dim=64, rope_theta=10000, Q at position 0 perturbed by 0.077% (M94-equivalent), K at position 1, scaled QK^T. EMPIRICAL RESULT (2026-05-06): input_rel_drift = 0.076997% output_rel_drift = 0.076986% amplification = 0.9999× ← UNITARY, essentially 1× **A1 EMPIRICALLY FALSIFIED.** RoPE rotation is approximately unitary; QK^T dot product preserves drift magnitude exactly. Tiny pre-RoPE perturbation produces a proportional post-attention score drift, NOT amplified. AMPLIFIER LANDSCAPE POST-A1+A2+A3 FALSIFICATION: - A1 (RoPE phase) — FALSIFIED ✗ (unitary) - A2 (Softmax saturation) — FALSIFIED ✗ (compresses) - A3 (Block-scale variance) — FALSIFIED ✗ (linear-scaling) - A4 (Multi-token batch) — UNTESTED, synthetically testable - A5 (Real-weight non-uniformity) — UNTESTED, real-teacher gated - A6 (RMSNorm rsqrt approx) — UNTESTED, real-teacher gated ALL THREE SYNTHETIC-TESTABLE amplifiers are now FALSIFIED. The 28× magnitude gap between M95's synthetic 0.4391% and §27's measured 1723% MUST come from A4 (multi-token), A5 (real-weight), or A6 (RMSNorm). Combined synthetic upper bound: ~5.70× total amplification from 0.077% per-matvec mechanism = ~0.4391% total drift. §27 measured 1723% drift = **3920× residual gap unexplained** by synthetic mechanisms. Either M-FFN-GGUF-6 (real-teacher) shows real-weight non-uniformity produces 3920× larger per-tensor rel_diff than synthetic uniform weights, OR there's a non-decomposable interaction between layers that synthetic falsifiers can't isolate. M-FFN-GGUF-6 (real-teacher falsifier) is now THE highest-leverage remaining test. Contract trace-ffn-sub-block-gguf-v1 v1.8.0 → v1.9.0: - FALSIFY-FFN-GGUF-012 NEW → DISCHARGED - M-FFN-GGUF-4 step (h) A1 candidate: NEW → DISCHARGED - All three synthetic amplifiers DISCHARGED - M-FFN-GGUF-4 step (i) A4 multi-token batch: NEW, PENDING - M-FFN-GGUF-6 real-teacher: now highest-leverage remaining Stacked atop M97 (PR #1541). Test runs locally: cargo test -p aprender-serve --lib falsify_ffn_gguf_012 -- --nocapture test result: ok. 1 passed; finished in 0.00s Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-012. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 7, 2026
…1 FALSIFIED (amplification 1.00×, UNITARY) (#1542) A1 was the LAST remaining synthetic-testable amplifier candidate after M96 (A3 falsified) and M97 (A2 falsified). With this PR all three synthetic amplifiers are FALSIFIED. A1 hypothesis: RoPE rotates F32 vectors by per-position phase; tiny magnitude drift in pre-RoPE Q becomes ROTATIONAL drift in post-RoPE Q. When Q' is dotted with K' (also rotated), rotational drift may compound non-linearly into larger QK^T attention score drift than the magnitude drift alone. This PR authors `falsify_ffn_gguf_012_rope_phase_amplification`. Test: head_dim=64, rope_theta=10000, Q at position 0 perturbed by 0.077% (M94-equivalent), K at position 1, scaled QK^T. EMPIRICAL RESULT (2026-05-06): input_rel_drift = 0.076997% output_rel_drift = 0.076986% amplification = 0.9999× ← UNITARY, essentially 1× **A1 EMPIRICALLY FALSIFIED.** RoPE rotation is approximately unitary; QK^T dot product preserves drift magnitude exactly. Tiny pre-RoPE perturbation produces a proportional post-attention score drift, NOT amplified. AMPLIFIER LANDSCAPE POST-A1+A2+A3 FALSIFICATION: - A1 (RoPE phase) — FALSIFIED ✗ (unitary) - A2 (Softmax saturation) — FALSIFIED ✗ (compresses) - A3 (Block-scale variance) — FALSIFIED ✗ (linear-scaling) - A4 (Multi-token batch) — UNTESTED, synthetically testable - A5 (Real-weight non-uniformity) — UNTESTED, real-teacher gated - A6 (RMSNorm rsqrt approx) — UNTESTED, real-teacher gated ALL THREE SYNTHETIC-TESTABLE amplifiers are now FALSIFIED. The 28× magnitude gap between M95's synthetic 0.4391% and §27's measured 1723% MUST come from A4 (multi-token), A5 (real-weight), or A6 (RMSNorm). Combined synthetic upper bound: ~5.70× total amplification from 0.077% per-matvec mechanism = ~0.4391% total drift. §27 measured 1723% drift = **3920× residual gap unexplained** by synthetic mechanisms. Either M-FFN-GGUF-6 (real-teacher) shows real-weight non-uniformity produces 3920× larger per-tensor rel_diff than synthetic uniform weights, OR there's a non-decomposable interaction between layers that synthetic falsifiers can't isolate. M-FFN-GGUF-6 (real-teacher falsifier) is now THE highest-leverage remaining test. Contract trace-ffn-sub-block-gguf-v1 v1.8.0 → v1.9.0: - FALSIFY-FFN-GGUF-012 NEW → DISCHARGED - M-FFN-GGUF-4 step (h) A1 candidate: NEW → DISCHARGED - All three synthetic amplifiers DISCHARGED - M-FFN-GGUF-4 step (i) A4 multi-token batch: NEW, PENDING - M-FFN-GGUF-6 real-teacher: now highest-leverage remaining Stacked atop M97 (PR #1541). Test runs locally: cargo test -p aprender-serve --lib falsify_ffn_gguf_012 -- --nocapture test result: ok. 1 passed; finished in 0.00s Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-012. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
5 tasks
noahgift
added a commit
that referenced
this pull request
May 7, 2026
…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546) Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/ #1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23× APR-vs-GGUF std-ratio. Final empirical decomposition (2026-05-07): M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual = 0.077% × 5.70× × 50× × 5.56× × 14× ≈ 1715% ≈ §27's 1723% (within rounding) Six synthetic amplifier candidates resolved: - A1 (RoPE phase, M98) — FALSIFIED 1.00× UNITARY - A2 (Softmax saturation, M97) — FALSIFIED 0.01× COMPRESSES - A3 (Block-scale variance, M96) — FALSIFIED 1.00× SCALE-INVARIANT - A4 (Multi-token batch, M99) — FALSIFIED 0.26× per-token + 50× std-ratio - A5 (Real-weight non-uniformity, M100) — PARTIALLY CONFIRMED 5.56× LIVE - A6 (RMSNorm rsqrt, M101) — FALSIFIED 1.00× HOMOGENEOUS 14× residual is now attributed entirely to cumulative-layer interaction. SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE GGUF-PATH semantics into APR forward): switching APR's `f32_matmul` to Q8K activation quant + fused matvec semantics will recover the 5.56× per-matvec amplification on every matmul, eliminating cumulative APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007, SHIP-008) per §17.5. Cascade methodology consolidated: - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md Companion-spec entries M91-M101 in claude-code-parity-apr/docs/ specifications/claude-code-parity-apr-poc.md provide the full per-PR narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml` v1.0.0 → v1.12.0 across 12 amendments. MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands. MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38. Spec v3.03.0 → v3.04.0. Atomic next action banner only — full §59 narrative deferred to deliberate-session work alongside M-FFN-GGUF-5 fix PR. Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacked atop PR #1540 (M96 A3 falsification). Will be re-targeted to
mainwhen #1540 merges.M96 (parent PR) falsified A3 (block-scale variance). A2 (softmax saturation) was the next-most-tractable synthetic candidate amplifier.
A2 hypothesis: attention softmax in saturation regime amplifies tiny logit drift non-linearly. This PR tests it directly.
Empirical result (2026-05-06)
A2 EMPIRICALLY FALSIFIED. Softmax in saturation regime suppresses M94 perturbations by ~100×.
Amplifier landscape post-A2+A3 falsification
Three additional candidates pinned in v1.8.0 (real-teacher or multi-token testable):
Most likely path post-2 sequential falsifications: M-FFN-GGUF-6 (real-teacher) is now the highest-leverage next test.
Status changes
contracts/trace-ffn-sub-block-gguf-v1.yamlv1.7.0 → v1.8.0:pv validate→ 0 errors / 0 warnings on v1.8.0.Test plan
pv validate contracts/trace-ffn-sub-block-gguf-v1.yaml→ greencargo test -p aprender-serve --lib falsify_ffn_gguf_011→ greenmainand rebase🤖 Generated with Claude Code