
feat(M-FFN-GGUF-4 step g, A2): softmax saturation amplification falsifier — A2 FALSIFIED (compresses 0.01×)#1541

Merged: noahgift merged 1 commit into main from feat/m-ffn-gguf-4g-softmax-saturation-amplification on May 7, 2026

@noahgift (Contributor) commented May 6, 2026

Summary

Stacked atop PR #1540 (M96 A3 falsification). Will be re-targeted to main when #1540 merges.

M96 (parent PR) falsified A3 (block-scale variance). A2 (softmax saturation) was the next-most-tractable synthetic candidate amplifier.

A2 hypothesis: attention softmax in saturation regime amplifies tiny logit drift non-linearly. This PR tests it directly.

Empirical result (2026-05-06)

logits (saturated at index 3):    [-1.5, 0.5, -0.8, 10.0, 1.2, -0.3, 0.7]
perturbation on saturated logit:  +0.0077 (= 0.077% × 10.0, M94-equivalent)
input rel_drift  = 0.051333%
output rel_drift = 0.000578%
amplification    = 0.0113×        ← COMPRESSES, not amplifies!

A2 EMPIRICALLY FALSIFIED. Softmax in saturation regime suppresses M94 perturbations by ~100×.

Amplifier landscape post-A2+A3 falsification

| Amplifier | Status |
| --- | --- |
| A1 (RoPE phase amplification) | UNTESTED, only remaining synthetic candidate |
| A2 (Softmax saturation) | FALSIFIED ✗ (compresses) |
| A3 (Block-scale variance) | FALSIFIED ✗ (linear-scaling) |

Three additional candidates pinned in v1.8.0 (real-teacher or multi-token testable):

  • A4 (Multi-token batch dimension)
  • A5 (Real-weight non-uniformity)
  • A6 (RMSNorm rsqrt approximation)

Most likely path after two consecutive falsifications: the real-teacher test, M-FFN-GGUF-6, is now the highest-leverage next step.

Status changes

contracts/trace-ffn-sub-block-gguf-v1.yaml v1.7.0 → v1.8.0:

  • FALSIFY-FFN-GGUF-011 NEW → DISCHARGED
  • M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED

pv validate → 0 errors / 0 warnings on v1.8.0.

Test plan

🤖 Generated with Claude Code

Base automatically changed from feat/m-ffn-gguf-4f-q4k-block-scale-variance to main May 7, 2026 00:26
…fier — A2 FALSIFIED (amplification 0.01×, COMPRESSES)

M96 (sibling commit, c7091ab on PR #1540) falsified A3 (block-scale
variance). A2 (softmax saturation) is the next-most-tractable synthetic
candidate amplifier for the §27 magnitude gap.

A2 hypothesis: attention softmax in saturation regime (one logit much
larger than others) is non-linear and could amplify tiny logit drift
to large probability drift — contributing to the §27 1723% magnitude
beyond what M95's 5.70× chained matvec compounding explains.

This PR authors `falsify_ffn_gguf_011_softmax_saturation_amplification`.
Test: 7-element logit vector with one saturated value (+10.0) and
others in normal range; perturbs saturated logit by 0.077% × 10.0
= 0.0077 (M94-equivalent absolute drift); compares numerically-stable
softmax output before/after.

EMPIRICAL RESULT (2026-05-06):
  input_rel_drift  = 0.051333% (perturbation / |logits|_L1)
  output_rel_drift = 0.000578% (Σ |p_b - p_a| / Σ p_a)
  amplification    = 0.0113×   ← COMPRESSES, not amplifies!

**A2 EMPIRICALLY FALSIFIED** in the saturation regime.
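
The measurement above can be sketched in a few lines. This is a minimal illustration, not the body of `falsify_ffn_gguf_011_softmax_saturation_amplification`: the function names `softmax` and `saturation_amplification` are hypothetical, and it uses f64, whereas the precision of the actual test is not stated here. The drift definitions follow the two formulas quoted in the result block.

```rust
/// Numerically stable softmax: subtract the max logit before exponentiating.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Perturb `logits[idx]` by `delta` and return
/// (input_rel_drift, output_rel_drift, amplification), where
///   input_rel_drift  = |delta| / ||logits||_1
///   output_rel_drift = sum |p_b - p_a| / sum p_a
/// (hypothetical helper; names are assumptions for illustration).
fn saturation_amplification(logits: &[f64], idx: usize, delta: f64) -> (f64, f64, f64) {
    let mut perturbed = logits.to_vec();
    perturbed[idx] += delta;

    let p_a = softmax(logits);
    let p_b = softmax(&perturbed);

    let l1: f64 = logits.iter().map(|x| x.abs()).sum();
    let input_rel = delta.abs() / l1;

    let num: f64 = p_a.iter().zip(&p_b).map(|(a, b)| (b - a).abs()).sum();
    let den: f64 = p_a.iter().sum();
    let output_rel = num / den;

    (input_rel, output_rel, output_rel / input_rel)
}

fn main() {
    // Logit vector from the PR description; index 3 is the saturated entry.
    let logits = [-1.5, 0.5, -0.8, 10.0, 1.2, -0.3, 0.7];
    // 0.077% of 10.0, the M94-equivalent absolute drift.
    let delta = 0.00077 * 10.0;
    let (input_rel, output_rel, amp) = saturation_amplification(&logits, 3, delta);
    println!("input rel_drift  = {:.6}%", input_rel * 100.0);
    println!("output rel_drift = {:.6}%", output_rel * 100.0);
    println!("amplification    = {:.4}x", amp);
}
```

In f64 the amplification comes out near 0.01×, in line with the reported figure; the last digits of the output drift may differ from the values above because of precision.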

Mechanism explanation: in saturation, the dominant probability is
near 1.0 and tail probabilities are near 0.0. The derivative of the
dominant probability p with respect to its own logit is p(1 - p),
which is close to zero in this regime, so small input perturbations
produce much smaller output changes (compression rather than
amplification). The 0.01× amplification means softmax suppresses
M94 perturbations by ~100×.
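
As a rough first-order check of this mechanism (a back-of-envelope estimate, not part of the test), write $z$ for the logits, $p$ for the softmax output, and $k$ for the saturated index:

$$
\frac{\partial p_i}{\partial z_j} = p_i\,(\delta_{ij} - p_j)
\quad\Longrightarrow\quad
\Delta p_k \approx p_k\,(1 - p_k)\,\Delta z_k .
$$

For the logit vector used here, $1 - p_k = \frac{\sum_{j \ne k} e^{z_j - 10}}{1 + \sum_{j \ne k} e^{z_j - 10}} \approx 3.8 \times 10^{-4}$, so with $\Delta z_k = 0.0077$:

$$
\Delta p_k \approx 3.8 \times 10^{-4} \times 0.0077 \approx 2.9 \times 10^{-6},
\qquad
\sum_i |\Delta p_i| \approx 2\,\Delta p_k \approx 5.9 \times 10^{-6} \;(\approx 0.0006\%),
$$

which is the same order as the measured output drift of 0.000578%. The factor 2 arises because the probabilities sum to 1, so the tail loses exactly what the dominant entry gains.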

AMPLIFIER LANDSCAPE POST-A2+A3 FALSIFICATION:
- A1 (RoPE phase amplification) — UNTESTED, only remaining synthetic candidate
- A2 (Softmax saturation)       — FALSIFIED ✗ (compresses)
- A3 (Block-scale variance)     — FALSIFIED ✗ (linear-scaling)

Three additional candidates pinned in v1.8.0 amendment (real-teacher
or multi-token testable):
- A4 (Multi-token batch dimension) — §27 is 7-token batch; M95 was single
- A5 (Real-weight non-uniformity) — heavy-tailed weight distributions
- A6 (RMSNorm rsqrt approximation) — non-linearity in normalization

Most likely path after two consecutive falsifications: M-FFN-GGUF-6
(real-teacher) is now the highest-leverage next test.

Contract trace-ffn-sub-block-gguf-v1 v1.7.0 → v1.8.0:
- FALSIFY-FFN-GGUF-011 NEW → DISCHARGED
- M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED

Stacked atop M96 (PR #1540).

Test runs locally:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_011 -- --nocapture
  test result: ok. 1 passed; finished in 0.03s

Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-011.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift force-pushed the feat/m-ffn-gguf-4g-softmax-saturation-amplification branch from 1df04ee to 0fafa30 on May 7, 2026 00:43
noahgift enabled auto-merge (squash) on May 7, 2026 00:43
noahgift added a commit that referenced this pull request May 7, 2026
…1 FALSIFIED (amplification 1.00×, UNITARY)

A1 was the LAST remaining synthetic-testable amplifier candidate
after M96 (A3 falsified) and M97 (A2 falsified). With this PR all
three synthetic amplifiers are FALSIFIED.

A1 hypothesis: RoPE rotates F32 vectors by per-position phase;
tiny magnitude drift in pre-RoPE Q becomes ROTATIONAL drift in
post-RoPE Q. When Q' is dotted with K' (also rotated), rotational
drift may compound non-linearly into larger QK^T attention score
drift than the magnitude drift alone.

This PR authors `falsify_ffn_gguf_012_rope_phase_amplification`.
Test: head_dim=64, rope_theta=10000, Q at position 0 perturbed
by 0.077% (M94-equivalent), K at position 1, scaled QK^T.

EMPIRICAL RESULT (2026-05-06):
  input_rel_drift  = 0.076997%
  output_rel_drift = 0.076986%
  amplification    = 0.9999×  ← UNITARY, essentially 1×

**A1 EMPIRICALLY FALSIFIED.** RoPE is an orthogonal rotation, so it
preserves vector norms, and the scaled QK^T score is linear in Q.
A tiny pre-RoPE perturbation of Q therefore produces a proportional
attention-score drift (measured 0.9999×), NOT an amplified one.
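
A minimal sketch of this kind of check follows, under stated assumptions. The commit message does not specify the perturbation pattern or the RoPE pairing convention, so this version assumes a uniform relative scaling of Q by 0.077% and rotation of adjacent pairs (2i, 2i+1). The vectors and the names `rope`, `scaled_score`, and `rope_amplification` are hypothetical and are not taken from `falsify_ffn_gguf_012_rope_phase_amplification`.

```rust
/// Apply RoPE at position `pos`: rotate each adjacent pair (2i, 2i+1)
/// by the angle pos * theta^(-2i/d). The pairing convention is an assumption.
fn rope(x: &[f64], pos: usize, theta: f64) -> Vec<f64> {
    let d = x.len();
    let mut out = vec![0.0; d];
    for i in 0..d / 2 {
        let angle = pos as f64 * theta.powf(-2.0 * i as f64 / d as f64);
        let (s, c) = angle.sin_cos();
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s;
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c;
    }
    out
}

/// Scaled attention score q.k / sqrt(d).
fn scaled_score(q: &[f64], k: &[f64]) -> f64 {
    let dot: f64 = q.iter().zip(k).map(|(a, b)| a * b).sum();
    dot / (q.len() as f64).sqrt()
}

/// Euclidean norm.
fn l2(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Ratio of relative score drift to relative Q drift when Q is scaled by (1 + eps).
/// Q sits at position 0 and K at position 1, as in the test description.
fn rope_amplification(eps: f64) -> f64 {
    let d = 64;
    let theta = 10000.0;
    // Arbitrary deterministic vectors, chosen only so the score is far from zero.
    let q: Vec<f64> = (0..d).map(|i| 0.1 + 0.01 * i as f64).collect();
    let k: Vec<f64> = (0..d).map(|i| 0.2 + 0.005 * i as f64).collect();
    let q_pert: Vec<f64> = q.iter().map(|x| x * (1.0 + eps)).collect();

    let k_rot = rope(&k, 1, theta);
    let s_a = scaled_score(&rope(&q, 0, theta), &k_rot);
    let s_b = scaled_score(&rope(&q_pert, 0, theta), &k_rot);

    let diff: Vec<f64> = q.iter().zip(&q_pert).map(|(a, b)| b - a).collect();
    let input_rel = l2(&diff) / l2(&q);
    let output_rel = (s_b - s_a).abs() / s_a.abs();
    output_rel / input_rel
}

fn main() {
    println!("amplification = {:.6}x", rope_amplification(0.00077));
}
```

Under the uniform-scaling assumption the ratio is 1 up to floating-point rounding, because the rotation and the dot product are both linear in Q. A perturbation that changes the direction of Q rather than its length can give a ratio slightly different from 1, which is one possible reading of the measured 0.9999×.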

AMPLIFIER LANDSCAPE POST-A1+A2+A3 FALSIFICATION:
- A1 (RoPE phase)            — FALSIFIED ✗ (unitary)
- A2 (Softmax saturation)    — FALSIFIED ✗ (compresses)
- A3 (Block-scale variance)  — FALSIFIED ✗ (linear-scaling)
- A4 (Multi-token batch)     — UNTESTED, synthetically testable
- A5 (Real-weight non-uniformity) — UNTESTED, real-teacher gated
- A6 (RMSNorm rsqrt approx)  — UNTESTED, real-teacher gated

ALL THREE originally listed synthetic amplifiers (A1, A2, A3) are now
FALSIFIED. The ~3920× magnitude gap between M95's synthetic 0.4391%
and §27's measured 1723% must therefore come from A4 (multi-token),
A5 (real-weight), A6 (RMSNorm), or an interaction between layers.

Combined synthetic upper bound: ~5.70× total amplification from
0.077% per-matvec mechanism = ~0.4391% total drift. §27 measured
1723% drift = **3920× residual gap unexplained** by synthetic
mechanisms.
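
Spelled out with the rounded figures quoted in this message, the arithmetic behind the bound and the residual is:

$$
0.077\% \times 5.70 \approx 0.439\%,
\qquad
\frac{1723\%}{0.4391\%} \approx 3924 \approx 3920\times .
$$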

Either M-FFN-GGUF-6 (real-teacher) shows real-weight non-uniformity
produces 3920× larger per-tensor rel_diff than synthetic uniform
weights, OR there's a non-decomposable interaction between layers
that synthetic falsifiers can't isolate.

M-FFN-GGUF-6 (real-teacher falsifier) is now THE highest-leverage
remaining test.

Contract trace-ffn-sub-block-gguf-v1 v1.8.0 → v1.9.0:
- FALSIFY-FFN-GGUF-012 NEW → DISCHARGED
- M-FFN-GGUF-4 step (h) A1 candidate: NEW → DISCHARGED
- All three synthetic amplifiers DISCHARGED
- M-FFN-GGUF-4 step (i) A4 multi-token batch: NEW, PENDING
- M-FFN-GGUF-6 real-teacher: now highest-leverage remaining

Stacked atop M97 (PR #1541).

Test runs locally:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_012 -- --nocapture
  test result: ok. 1 passed; finished in 0.00s

Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-012.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift merged commit 916cc45 into main on May 7, 2026
10 checks passed
noahgift deleted the feat/m-ffn-gguf-4g-softmax-saturation-amplification branch on May 7, 2026 01:07
noahgift added a commit that referenced this pull request May 7, 2026
…1 FALSIFIED (amplification 1.00×, UNITARY) (#1542)

noahgift added a commit that referenced this pull request May 7, 2026
…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546)

Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test
falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/
#1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23×
APR-vs-GGUF std-ratio.

Final empirical decomposition (2026-05-07):

  M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual
  = 0.077% × 5.70× × 50× × 5.56× × 14×
  ≈ 1715%   ≈   §27's 1723% (within rounding)
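
The product can be checked directly from the printed factors. The sketch below is illustrative only; `chained_drift_percent` is a hypothetical helper, and the inputs are the rounded values shown above, so the result need not match a figure computed from unrounded measurements.

```rust
/// Multiply a base drift (in percent) by a chain of amplification factors.
/// Hypothetical helper for illustration only.
fn chained_drift_percent(base_percent: f64, factors: &[f64]) -> f64 {
    factors.iter().fold(base_percent, |acc, f| acc * f)
}

fn main() {
    // Rounded factors as printed: M95 compounding, M99 std-ratio,
    // A5 real-teacher, residual.
    let total = chained_drift_percent(0.077, &[5.70, 50.0, 5.56, 14.0]);
    println!("product of rounded factors = {:.1}%", total);

    // Residual implied by the first three factors and the measured 1723%.
    let explained = chained_drift_percent(0.077, &[5.70, 50.0, 5.56]);
    println!("implied residual = {:.2}x", 1723.0 / explained);
}
```

With these rounded inputs the product is about 1708%, and the residual implied by the measured 1723% is about 14.1×. The difference from the quoted 1715% is small and is presumably due to rounding of the individual factors.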

Six amplifier candidates resolved:
- A1 (RoPE phase, M98)        — FALSIFIED 1.00× UNITARY
- A2 (Softmax saturation, M97) — FALSIFIED 0.01× COMPRESSES
- A3 (Block-scale variance, M96) — FALSIFIED 1.00× SCALE-INVARIANT
- A4 (Multi-token batch, M99) — FALSIFIED 0.26× per-token + 50× std-ratio
- A5 (Real-weight non-uniformity, M100) — PARTIALLY CONFIRMED 5.56× LIVE
- A6 (RMSNorm rsqrt, M101)    — FALSIFIED 1.00× HOMOGENEOUS

14× residual is now attributed entirely to cumulative-layer interaction.

SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE
GGUF-PATH semantics into APR forward): switching APR's `f32_matmul`
to Q8K activation quant + fused matvec semantics will recover the
5.56× per-matvec amplification on every matmul, eliminating cumulative
APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively
discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007,
SHIP-008) per §17.5.
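
For readers unfamiliar with activation quantization, the general idea is sketched below: quantize a block of activations to int8 with one shared scale, accumulate the dot product on the quantized values, and apply the scale once at the end. This is a simplified symmetric scheme for illustration only. It is not the Q8K block layout and not aprender's `f32_matmul` or its planned replacement, and the names `quantize_block_i8` and `quantized_dot` are hypothetical.

```rust
/// Symmetric int8 quantization of one activation block.
/// Returns (scale, quants) such that x[i] is approximately scale * quants[i].
/// Simplified illustration; not the actual Q8K format.
fn quantize_block_i8(x: &[f32]) -> (f32, Vec<i8>) {
    let max_abs = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if max_abs == 0.0 {
        return (0.0, vec![0; x.len()]);
    }
    let scale = max_abs / 127.0;
    let quants: Vec<i8> = x
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (scale, quants)
}

/// Dot product against quantized activations: accumulate first,
/// then apply the shared block scale once ("fused" in the loose sense).
fn quantized_dot(weights: &[f32], scale: f32, quants: &[i8]) -> f32 {
    let acc: f32 = weights
        .iter()
        .zip(quants)
        .map(|(w, &q)| w * q as f32)
        .sum();
    acc * scale
}

fn main() {
    // Arbitrary deterministic data for one 256-element block.
    let x: Vec<f32> = (0..256).map(|i| 0.5 + 0.4 * (0.1 * i as f32).sin()).collect();
    let w: Vec<f32> = (0..256).map(|i| 1.0 + 0.001 * i as f32).collect();

    let (scale, quants) = quantize_block_i8(&x);
    let exact: f32 = w.iter().zip(&x).map(|(a, b)| a * b).sum();
    let approx = quantized_dot(&w, scale, &quants);

    println!(
        "relative difference = {:.4}%",
        100.0 * ((approx - exact) / exact).abs()
    );
}
```

The point of the sketch is only that an F32 matmul and a quantized-activation matmul compute slightly different numbers for the same inputs. Which path a runtime takes on every matmul is what the fix scope above is about.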

Cascade methodology consolidated:
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md

Companion-spec entries M91-M101 in claude-code-parity-apr/docs/
specifications/claude-code-parity-apr-poc.md provide the full per-PR
narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml`
v1.0.0 → v1.12.0 across 12 amendments.

MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands.
MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38.

Spec v3.03.0 → v3.04.0. Atomic next action banner only — full §59
narrative deferred to deliberate-session work alongside M-FFN-GGUF-5
fix PR.

Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>