Skip to content

feat(M-FFN-GGUF-4 step f, A3): Q4K block-scale variance falsifier — A3 FALSIFIED (variance_factor 1.00×)#1540

Merged
noahgift merged 1 commit into
mainfrom
feat/m-ffn-gguf-4f-q4k-block-scale-variance
May 7, 2026
Merged

feat(M-FFN-GGUF-4 step f, A3): Q4K block-scale variance falsifier — A3 FALSIFIED (variance_factor 1.00×)#1540
noahgift merged 1 commit into
mainfrom
feat/m-ffn-gguf-4f-q4k-block-scale-variance

Conversation

@noahgift

@noahgift noahgift commented May 6, 2026

Copy link
Copy Markdown
Contributor

Summary

M95 (now landed on main via #1538 squash) recorded a 28× magnitude gap between synthetic 0.4391% (5-tensor chained) and §27's 1723% (18.23× std-ratio at layer-3 ffn_swigl). Three candidate amplifiers were pinned: A1 (RoPE phase), A2 (softmax saturation), A3 (real-weight magnitude variance). A3 was the strongest candidate because real Qwen Q4K weights have huge per-tensor magnitude variance not present in synthetic tests.

This PR authors falsify_ffn_gguf_010_q4k_block_scale_variance testing whether per-block scale variance amplifies M94 mechanism beyond linear-scaling. Compares Path A vs Path B at 7 block scales spanning 4 orders of magnitude.

Empirical result (2026-05-06)

scale (d) Path A Path B rel_diff
0.001 -15.4086 -15.4228 0.091873%
0.01 -155.39 -155.53 0.091873%
0.05 -631.04 -631.62 0.091924%
0.1 -1553.19 -1554.62 0.092017%
0.5 -7767.85 -7774.99 0.091932%
1.0 -15535.70 -15549.99 0.091932%
10.0 -155356.97 -155499.84 0.091966%

variance_factor = max/min = 1.00× across 4 orders of magnitude.

A3 EMPIRICALLY FALSIFIED at per-block granularity

The M94 mechanism is LINEAR-SCALING: Path A and Path B both scale proportionally with block magnitude, so rel_diff (a RATIO) is scale-INVARIANT.

Amplifier landscape post-A3 falsification

Amplifier Status
A1 (RoPE phase amplification) UNTESTED, candidate
A2 (Softmax saturation) UNTESTED, candidate
A3 (Block-scale variance) FALSIFIED ✗

Per-block magnitude variance in real Qwen weights does NOT amplify M94 mechanism beyond the measured 0.077-0.092% rel_diff baseline.

Next investigation candidate

M-FFN-GGUF-4 step (g): A2 (softmax saturation) — small synthetic test with one near-saturated logit + tiny perturbation, measure softmax(logits) drift.

A1 (RoPE phase) is harder to test in isolation. M-FFN-GGUF-6 (real-teacher) remains the most-direct test but is gated on operator dispatch.

Status changes

contracts/trace-ffn-sub-block-gguf-v1.yaml v1.6.0 → v1.7.0:

  • FALSIFY-FFN-GGUF-010 NEW → DISCHARGED
  • M-FFN-GGUF-4 step (f) A3 candidate: NEW → DISCHARGED

pv validate0 errors / 0 warnings on v1.7.0.

Test plan

  • pv validate contracts/trace-ffn-sub-block-gguf-v1.yaml → green
  • cargo test -p aprender-serve --lib falsify_ffn_gguf_010 → green
  • Production hot paths byte-unchanged (additive test only)
  • Test asserts rel_diff > 1e-7 per scale (sanity bound)
  • CI workspace-test green
  • Auto-merge once required checks pass

🤖 Generated with Claude Code

…3 FALSIFIED (variance_factor 1.00× across 4 orders)

M95 (sibling commit, c641d2d) recorded a 28× magnitude gap between
M95's synthetic 0.4391% (5-tensor chained) and §27's 1723% (18.23×
std-ratio at layer-3 ffn_swigl). Three candidate amplifiers were
pinned for M-FFN-GGUF-6 investigation: A1 (RoPE phase), A2 (Softmax
saturation), A3 (Real-weight magnitude variance).

A3 was the strongest candidate because real Qwen Q4K weights have
huge per-tensor magnitude variance not present in synthetic tests.
Hypothesis: per-block scale variance amplifies M94 mechanism beyond
linear-scaling.

This PR authors `falsify_ffn_gguf_010_q4k_block_scale_variance` in
`crates/aprender-serve/src/apr_transformer/helpers.rs::
determinism_tests`. Test compares Path A vs Path B per-block
divergence at 7 block scales spanning 4 orders of magnitude:
  d ∈ {0.001, 0.01, 0.05, 0.1, 0.5, 1.0, 10.0}

EMPIRICAL RESULT (2026-05-06):
  d=0.001:  0.091873% rel_diff
  d=0.01:   0.091873%
  d=0.05:   0.091924%
  d=0.1:    0.092017%
  d=0.5:    0.091932%
  d=1.0:    0.091932%
  d=10.0:   0.091966%

variance_factor = max/min = **1.00×** across 4 orders of magnitude.

**A3 EMPIRICALLY FALSIFIED** at per-block granularity. The M94
mechanism is LINEAR-SCALING: Path A and Path B both scale
proportionally with block magnitude, so rel_diff (a RATIO) is
scale-INVARIANT.

AMPLIFIER LANDSCAPE POST-A3 FALSIFICATION:
- A1 (RoPE phase amplification) — UNTESTED, candidate
- A2 (Softmax saturation)       — UNTESTED, candidate
- A3 (Block-scale variance)     — FALSIFIED ✗

Per-block magnitude variance in real Qwen weights does NOT amplify
M94 mechanism beyond the measured 0.077-0.092% rel_diff baseline.

NEXT INVESTIGATION CANDIDATE (M-FFN-GGUF-4 step (g)): A2 (softmax
saturation) is the simplest synthetic test. A1 (RoPE phase) is
harder to test in isolation. M-FFN-GGUF-6 (real-teacher) remains
the most-direct test but is gated on operator dispatch.

Contract trace-ffn-sub-block-gguf-v1 v1.6.0 → v1.7.0:
- FALSIFY-FFN-GGUF-010 NEW → DISCHARGED
- M-FFN-GGUF-4 step (f) A3 candidate: NEW → DISCHARGED

Stacked atop the M94+M95 branch. Will rebase on main after #1538
merges (which carries M94 + M95).

Test runs locally:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_010 -- --nocapture
  test result: ok. 1 passed; finished in 0.00s

Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-010.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 6, 2026 23:41
@noahgift noahgift merged commit 4b385a0 into main May 7, 2026
19 of 21 checks passed
@noahgift noahgift deleted the feat/m-ffn-gguf-4f-q4k-block-scale-variance branch May 7, 2026 00:26
noahgift added a commit that referenced this pull request May 7, 2026
…fier — A2 FALSIFIED (amplification 0.01×, COMPRESSES)

M96 (sibling commit, c7091ab on PR #1540) falsified A3 (block-scale
variance). A2 (softmax saturation) is the next-most-tractable synthetic
candidate amplifier for the §27 magnitude gap.

A2 hypothesis: attention softmax in saturation regime (one logit much
larger than others) is non-linear and could amplify tiny logit drift
to large probability drift — contributing to the §27 1723% magnitude
beyond what M95's 5.70× chained matvec compounding explains.

This PR authors `falsify_ffn_gguf_011_softmax_saturation_amplification`.
Test: 7-element logit vector with one saturated value (+10.0) and
others in normal range; perturbs saturated logit by 0.077% × 10.0
= 0.0077 (M94-equivalent absolute drift); compares numerically-stable
softmax output before/after.

EMPIRICAL RESULT (2026-05-06):
  input_rel_drift  = 0.051333% (perturbation / |logits|_L1)
  output_rel_drift = 0.000578% (Σ |p_b - p_a| / Σ p_a)
  amplification    = 0.0113×   ← COMPRESSES, not amplifies!

**A2 EMPIRICALLY FALSIFIED** in the saturation regime.

Mechanism explanation: in saturation, the dominant probability is
near 1.0 and tail probabilities are near 0.0. Softmax is LOCALLY
linear in this regime — small input perturbations produce
proportionally smaller output changes (compression rather than
amplification). The 0.01× amplification means softmax suppresses
M94 perturbations by ~100×.

AMPLIFIER LANDSCAPE POST-A2+A3 FALSIFICATION:
- A1 (RoPE phase amplification) — UNTESTED, only remaining synthetic candidate
- A2 (Softmax saturation)       — FALSIFIED ✗ (compresses)
- A3 (Block-scale variance)     — FALSIFIED ✗ (linear-scaling)

Three additional candidates pinned in v1.8.0 amendment (real-teacher
or multi-token testable):
- A4 (Multi-token batch dimension) — §27 is 7-token batch; M95 was single
- A5 (Real-weight non-uniformity) — heavy-tailed weight distributions
- A6 (RMSNorm rsqrt approximation) — non-linearity in normalization

Most likely path post-2 sequential falsifications: M-FFN-GGUF-6
(real-teacher) is now the highest-leverage next test.

Contract trace-ffn-sub-block-gguf-v1 v1.7.0 → v1.8.0:
- FALSIFY-FFN-GGUF-011 NEW → DISCHARGED
- M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED

Stacked atop M96 (PR #1540).

Test runs locally:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_011 -- --nocapture
  test result: ok. 1 passed; finished in 0.03s

Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-011.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 7, 2026
…fier — A2 FALSIFIED (amplification 0.01×, COMPRESSES)

M96 (sibling commit, c7091ab on PR #1540) falsified A3 (block-scale
variance). A2 (softmax saturation) is the next-most-tractable synthetic
candidate amplifier for the §27 magnitude gap.

A2 hypothesis: attention softmax in saturation regime (one logit much
larger than others) is non-linear and could amplify tiny logit drift
to large probability drift — contributing to the §27 1723% magnitude
beyond what M95's 5.70× chained matvec compounding explains.

This PR authors `falsify_ffn_gguf_011_softmax_saturation_amplification`.
Test: 7-element logit vector with one saturated value (+10.0) and
others in normal range; perturbs saturated logit by 0.077% × 10.0
= 0.0077 (M94-equivalent absolute drift); compares numerically-stable
softmax output before/after.

EMPIRICAL RESULT (2026-05-06):
  input_rel_drift  = 0.051333% (perturbation / |logits|_L1)
  output_rel_drift = 0.000578% (Σ |p_b - p_a| / Σ p_a)
  amplification    = 0.0113×   ← COMPRESSES, not amplifies!

**A2 EMPIRICALLY FALSIFIED** in the saturation regime.

Mechanism explanation: in saturation, the dominant probability is
near 1.0 and tail probabilities are near 0.0. Softmax is LOCALLY
linear in this regime — small input perturbations produce
proportionally smaller output changes (compression rather than
amplification). The 0.01× amplification means softmax suppresses
M94 perturbations by ~100×.

AMPLIFIER LANDSCAPE POST-A2+A3 FALSIFICATION:
- A1 (RoPE phase amplification) — UNTESTED, only remaining synthetic candidate
- A2 (Softmax saturation)       — FALSIFIED ✗ (compresses)
- A3 (Block-scale variance)     — FALSIFIED ✗ (linear-scaling)

Three additional candidates pinned in v1.8.0 amendment (real-teacher
or multi-token testable):
- A4 (Multi-token batch dimension) — §27 is 7-token batch; M95 was single
- A5 (Real-weight non-uniformity) — heavy-tailed weight distributions
- A6 (RMSNorm rsqrt approximation) — non-linearity in normalization

Most likely path post-2 sequential falsifications: M-FFN-GGUF-6
(real-teacher) is now the highest-leverage next test.

Contract trace-ffn-sub-block-gguf-v1 v1.7.0 → v1.8.0:
- FALSIFY-FFN-GGUF-011 NEW → DISCHARGED
- M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED

Stacked atop M96 (PR #1540).

Test runs locally:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_011 -- --nocapture
  test result: ok. 1 passed; finished in 0.03s

Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-011.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 7, 2026
…fier — A2 FALSIFIED (amplification 0.01×, COMPRESSES) (#1541)

M96 (sibling commit, c7091ab on PR #1540) falsified A3 (block-scale
variance). A2 (softmax saturation) is the next-most-tractable synthetic
candidate amplifier for the §27 magnitude gap.

A2 hypothesis: attention softmax in saturation regime (one logit much
larger than others) is non-linear and could amplify tiny logit drift
to large probability drift — contributing to the §27 1723% magnitude
beyond what M95's 5.70× chained matvec compounding explains.

This PR authors `falsify_ffn_gguf_011_softmax_saturation_amplification`.
Test: 7-element logit vector with one saturated value (+10.0) and
others in normal range; perturbs saturated logit by 0.077% × 10.0
= 0.0077 (M94-equivalent absolute drift); compares numerically-stable
softmax output before/after.

EMPIRICAL RESULT (2026-05-06):
  input_rel_drift  = 0.051333% (perturbation / |logits|_L1)
  output_rel_drift = 0.000578% (Σ |p_b - p_a| / Σ p_a)
  amplification    = 0.0113×   ← COMPRESSES, not amplifies!

**A2 EMPIRICALLY FALSIFIED** in the saturation regime.

Mechanism explanation: in saturation, the dominant probability is
near 1.0 and tail probabilities are near 0.0. Softmax is LOCALLY
linear in this regime — small input perturbations produce
proportionally smaller output changes (compression rather than
amplification). The 0.01× amplification means softmax suppresses
M94 perturbations by ~100×.

AMPLIFIER LANDSCAPE POST-A2+A3 FALSIFICATION:
- A1 (RoPE phase amplification) — UNTESTED, only remaining synthetic candidate
- A2 (Softmax saturation)       — FALSIFIED ✗ (compresses)
- A3 (Block-scale variance)     — FALSIFIED ✗ (linear-scaling)

Three additional candidates pinned in v1.8.0 amendment (real-teacher
or multi-token testable):
- A4 (Multi-token batch dimension) — §27 is 7-token batch; M95 was single
- A5 (Real-weight non-uniformity) — heavy-tailed weight distributions
- A6 (RMSNorm rsqrt approximation) — non-linearity in normalization

Most likely path post-2 sequential falsifications: M-FFN-GGUF-6
(real-teacher) is now the highest-leverage next test.

Contract trace-ffn-sub-block-gguf-v1 v1.7.0 → v1.8.0:
- FALSIFY-FFN-GGUF-011 NEW → DISCHARGED
- M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED

Stacked atop M96 (PR #1540).

Test runs locally:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_011 -- --nocapture
  test result: ok. 1 passed; finished in 0.03s

Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-011.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 7, 2026
…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546)

Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test
falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/
#1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23×
APR-vs-GGUF std-ratio.

Final empirical decomposition (2026-05-07):

  M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual
  = 0.077% × 5.70× × 50× × 5.56× × 14×
  ≈ 1715%   ≈   §27's 1723% (within rounding)

Six synthetic amplifier candidates resolved:
- A1 (RoPE phase, M98)        — FALSIFIED 1.00× UNITARY
- A2 (Softmax saturation, M97) — FALSIFIED 0.01× COMPRESSES
- A3 (Block-scale variance, M96) — FALSIFIED 1.00× SCALE-INVARIANT
- A4 (Multi-token batch, M99) — FALSIFIED 0.26× per-token + 50× std-ratio
- A5 (Real-weight non-uniformity, M100) — PARTIALLY CONFIRMED 5.56× LIVE
- A6 (RMSNorm rsqrt, M101)    — FALSIFIED 1.00× HOMOGENEOUS

14× residual is now attributed entirely to cumulative-layer interaction.

SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE
GGUF-PATH semantics into APR forward): switching APR's `f32_matmul`
to Q8K activation quant + fused matvec semantics will recover the
5.56× per-matvec amplification on every matmul, eliminating cumulative
APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively
discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007,
SHIP-008) per §17.5.

Cascade methodology consolidated:
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md
- ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md

Companion-spec entries M91-M101 in claude-code-parity-apr/docs/
specifications/claude-code-parity-apr-poc.md provide the full per-PR
narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml`
v1.0.0 → v1.12.0 across 12 amendments.

MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands.
MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38.

Spec v3.03.0 → v3.04.0. Atomic next action banner only — full §59
narrative deferred to deliberate-session work alongside M-FFN-GGUF-5
fix PR.

Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant