feat(M-FFN-GGUF-4 step f, A3): Q4K block-scale variance falsifier — A3 FALSIFIED (variance_factor 1.00×) by noahgift · Pull Request #1540 · paiml/aprender

noahgift · 2026-05-06T23:41:04Z

Summary

M95 (now landed on main via #1538 squash) recorded a 28× magnitude gap between synthetic 0.4391% (5-tensor chained) and §27's 1723% (18.23× std-ratio at layer-3 ffn_swigl). Three candidate amplifiers were pinned: A1 (RoPE phase), A2 (softmax saturation), A3 (real-weight magnitude variance). A3 was the strongest candidate because real Qwen Q4K weights have huge per-tensor magnitude variance not present in synthetic tests.

This PR authors falsify_ffn_gguf_010_q4k_block_scale_variance testing whether per-block scale variance amplifies M94 mechanism beyond linear-scaling. Compares Path A vs Path B at 7 block scales spanning 4 orders of magnitude.

Empirical result (2026-05-06)

scale (d)	Path A	Path B	rel_diff
0.001	-15.4086	-15.4228	0.091873%
0.01	-155.39	-155.53	0.091873%
0.05	-631.04	-631.62	0.091924%
0.1	-1553.19	-1554.62	0.092017%
0.5	-7767.85	-7774.99	0.091932%
1.0	-15535.70	-15549.99	0.091932%
10.0	-155356.97	-155499.84	0.091966%

variance_factor = max/min = 1.00× across 4 orders of magnitude.

A3 EMPIRICALLY FALSIFIED at per-block granularity

The M94 mechanism is LINEAR-SCALING: Path A and Path B both scale proportionally with block magnitude, so rel_diff (a RATIO) is scale-INVARIANT.

Amplifier landscape post-A3 falsification

Amplifier	Status
A1 (RoPE phase amplification)	UNTESTED, candidate
A2 (Softmax saturation)	UNTESTED, candidate
A3 (Block-scale variance)	FALSIFIED ✗

Per-block magnitude variance in real Qwen weights does NOT amplify M94 mechanism beyond the measured 0.077-0.092% rel_diff baseline.

Next investigation candidate

M-FFN-GGUF-4 step (g): A2 (softmax saturation) — small synthetic test with one near-saturated logit + tiny perturbation, measure softmax(logits) drift.

A1 (RoPE phase) is harder to test in isolation. M-FFN-GGUF-6 (real-teacher) remains the most-direct test but is gated on operator dispatch.

Status changes

contracts/trace-ffn-sub-block-gguf-v1.yaml v1.6.0 → v1.7.0:

FALSIFY-FFN-GGUF-010 NEW → DISCHARGED
M-FFN-GGUF-4 step (f) A3 candidate: NEW → DISCHARGED

pv validate → 0 errors / 0 warnings on v1.7.0.

Test plan

pv validate contracts/trace-ffn-sub-block-gguf-v1.yaml → green
cargo test -p aprender-serve --lib falsify_ffn_gguf_010 → green
Production hot paths byte-unchanged (additive test only)
Test asserts rel_diff > 1e-7 per scale (sanity bound)
CI workspace-test green
Auto-merge once required checks pass

🤖 Generated with Claude Code

…3 FALSIFIED (variance_factor 1.00× across 4 orders) M95 (sibling commit, c641d2d) recorded a 28× magnitude gap between M95's synthetic 0.4391% (5-tensor chained) and §27's 1723% (18.23× std-ratio at layer-3 ffn_swigl). Three candidate amplifiers were pinned for M-FFN-GGUF-6 investigation: A1 (RoPE phase), A2 (Softmax saturation), A3 (Real-weight magnitude variance). A3 was the strongest candidate because real Qwen Q4K weights have huge per-tensor magnitude variance not present in synthetic tests. Hypothesis: per-block scale variance amplifies M94 mechanism beyond linear-scaling. This PR authors `falsify_ffn_gguf_010_q4k_block_scale_variance` in `crates/aprender-serve/src/apr_transformer/helpers.rs:: determinism_tests`. Test compares Path A vs Path B per-block divergence at 7 block scales spanning 4 orders of magnitude: d ∈ {0.001, 0.01, 0.05, 0.1, 0.5, 1.0, 10.0} EMPIRICAL RESULT (2026-05-06): d=0.001: 0.091873% rel_diff d=0.01: 0.091873% d=0.05: 0.091924% d=0.1: 0.092017% d=0.5: 0.091932% d=1.0: 0.091932% d=10.0: 0.091966% variance_factor = max/min = **1.00×** across 4 orders of magnitude. **A3 EMPIRICALLY FALSIFIED** at per-block granularity. The M94 mechanism is LINEAR-SCALING: Path A and Path B both scale proportionally with block magnitude, so rel_diff (a RATIO) is scale-INVARIANT. AMPLIFIER LANDSCAPE POST-A3 FALSIFICATION: - A1 (RoPE phase amplification) — UNTESTED, candidate - A2 (Softmax saturation) — UNTESTED, candidate - A3 (Block-scale variance) — FALSIFIED ✗ Per-block magnitude variance in real Qwen weights does NOT amplify M94 mechanism beyond the measured 0.077-0.092% rel_diff baseline. NEXT INVESTIGATION CANDIDATE (M-FFN-GGUF-4 step (g)): A2 (softmax saturation) is the simplest synthetic test. A1 (RoPE phase) is harder to test in isolation. M-FFN-GGUF-6 (real-teacher) remains the most-direct test but is gated on operator dispatch. Contract trace-ffn-sub-block-gguf-v1 v1.6.0 → v1.7.0: - FALSIFY-FFN-GGUF-010 NEW → DISCHARGED - M-FFN-GGUF-4 step (f) A3 candidate: NEW → DISCHARGED Stacked atop the M94+M95 branch. Will rebase on main after #1538 merges (which carries M94 + M95). Test runs locally: cargo test -p aprender-serve --lib falsify_ffn_gguf_010 -- --nocapture test result: ok. 1 passed; finished in 0.00s Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-010. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…fier — A2 FALSIFIED (amplification 0.01×, COMPRESSES) M96 (sibling commit, c7091ab on PR #1540) falsified A3 (block-scale variance). A2 (softmax saturation) is the next-most-tractable synthetic candidate amplifier for the §27 magnitude gap. A2 hypothesis: attention softmax in saturation regime (one logit much larger than others) is non-linear and could amplify tiny logit drift to large probability drift — contributing to the §27 1723% magnitude beyond what M95's 5.70× chained matvec compounding explains. This PR authors `falsify_ffn_gguf_011_softmax_saturation_amplification`. Test: 7-element logit vector with one saturated value (+10.0) and others in normal range; perturbs saturated logit by 0.077% × 10.0 = 0.0077 (M94-equivalent absolute drift); compares numerically-stable softmax output before/after. EMPIRICAL RESULT (2026-05-06): input_rel_drift = 0.051333% (perturbation / |logits|_L1) output_rel_drift = 0.000578% (Σ |p_b - p_a| / Σ p_a) amplification = 0.0113× ← COMPRESSES, not amplifies! **A2 EMPIRICALLY FALSIFIED** in the saturation regime. Mechanism explanation: in saturation, the dominant probability is near 1.0 and tail probabilities are near 0.0. Softmax is LOCALLY linear in this regime — small input perturbations produce proportionally smaller output changes (compression rather than amplification). The 0.01× amplification means softmax suppresses M94 perturbations by ~100×. AMPLIFIER LANDSCAPE POST-A2+A3 FALSIFICATION: - A1 (RoPE phase amplification) — UNTESTED, only remaining synthetic candidate - A2 (Softmax saturation) — FALSIFIED ✗ (compresses) - A3 (Block-scale variance) — FALSIFIED ✗ (linear-scaling) Three additional candidates pinned in v1.8.0 amendment (real-teacher or multi-token testable): - A4 (Multi-token batch dimension) — §27 is 7-token batch; M95 was single - A5 (Real-weight non-uniformity) — heavy-tailed weight distributions - A6 (RMSNorm rsqrt approximation) — non-linearity in normalization Most likely path post-2 sequential falsifications: M-FFN-GGUF-6 (real-teacher) is now the highest-leverage next test. Contract trace-ffn-sub-block-gguf-v1 v1.7.0 → v1.8.0: - FALSIFY-FFN-GGUF-011 NEW → DISCHARGED - M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED Stacked atop M96 (PR #1540). Test runs locally: cargo test -p aprender-serve --lib falsify_ffn_gguf_011 -- --nocapture test result: ok. 1 passed; finished in 0.03s Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-011. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…fier — A2 FALSIFIED (amplification 0.01×, COMPRESSES) (#1541) M96 (sibling commit, c7091ab on PR #1540) falsified A3 (block-scale variance). A2 (softmax saturation) is the next-most-tractable synthetic candidate amplifier for the §27 magnitude gap. A2 hypothesis: attention softmax in saturation regime (one logit much larger than others) is non-linear and could amplify tiny logit drift to large probability drift — contributing to the §27 1723% magnitude beyond what M95's 5.70× chained matvec compounding explains. This PR authors `falsify_ffn_gguf_011_softmax_saturation_amplification`. Test: 7-element logit vector with one saturated value (+10.0) and others in normal range; perturbs saturated logit by 0.077% × 10.0 = 0.0077 (M94-equivalent absolute drift); compares numerically-stable softmax output before/after. EMPIRICAL RESULT (2026-05-06): input_rel_drift = 0.051333% (perturbation / |logits|_L1) output_rel_drift = 0.000578% (Σ |p_b - p_a| / Σ p_a) amplification = 0.0113× ← COMPRESSES, not amplifies! **A2 EMPIRICALLY FALSIFIED** in the saturation regime. Mechanism explanation: in saturation, the dominant probability is near 1.0 and tail probabilities are near 0.0. Softmax is LOCALLY linear in this regime — small input perturbations produce proportionally smaller output changes (compression rather than amplification). The 0.01× amplification means softmax suppresses M94 perturbations by ~100×. AMPLIFIER LANDSCAPE POST-A2+A3 FALSIFICATION: - A1 (RoPE phase amplification) — UNTESTED, only remaining synthetic candidate - A2 (Softmax saturation) — FALSIFIED ✗ (compresses) - A3 (Block-scale variance) — FALSIFIED ✗ (linear-scaling) Three additional candidates pinned in v1.8.0 amendment (real-teacher or multi-token testable): - A4 (Multi-token batch dimension) — §27 is 7-token batch; M95 was single - A5 (Real-weight non-uniformity) — heavy-tailed weight distributions - A6 (RMSNorm rsqrt approximation) — non-linearity in normalization Most likely path post-2 sequential falsifications: M-FFN-GGUF-6 (real-teacher) is now the highest-leverage next test. Contract trace-ffn-sub-block-gguf-v1 v1.7.0 → v1.8.0: - FALSIFY-FFN-GGUF-011 NEW → DISCHARGED - M-FFN-GGUF-4 step (g) A2 candidate: NEW → DISCHARGED Stacked atop M96 (PR #1540). Test runs locally: cargo test -p aprender-serve --lib falsify_ffn_gguf_011 -- --nocapture test result: ok. 1 passed; finished in 0.03s Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-011. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…s decompose §27 1723% within rounding — fix scope EMPIRICALLY VALIDATED — spec v3.03.0 → v3.04.0 (#1546) Two-day autonomous /loop session shipped 11 lib-test + 1 integration-test falsifiers (M91-M101, aprender PRs #1535/#1536/#1537/#1538/#1540/#1541/ #1542/#1543/#1544/#1545) decomposing the §27 layer-3 ffn_swigl 18.23× APR-vs-GGUF std-ratio. Final empirical decomposition (2026-05-07): M94 mechanism × M95 compounding × M99 std-ratio × A5 real-teacher × residual = 0.077% × 5.70× × 50× × 5.56× × 14× ≈ 1715% ≈ §27's 1723% (within rounding) Six synthetic amplifier candidates resolved: - A1 (RoPE phase, M98) — FALSIFIED 1.00× UNITARY - A2 (Softmax saturation, M97) — FALSIFIED 0.01× COMPRESSES - A3 (Block-scale variance, M96) — FALSIFIED 1.00× SCALE-INVARIANT - A4 (Multi-token batch, M99) — FALSIFIED 0.26× per-token + 50× std-ratio - A5 (Real-weight non-uniformity, M100) — PARTIALLY CONFIRMED 5.56× LIVE - A6 (RMSNorm rsqrt, M101) — FALSIFIED 1.00× HOMOGENEOUS 14× residual is now attributed entirely to cumulative-layer interaction. SHIP-007 §22 fix scope EMPIRICALLY VALIDATED as Option-A (PROMOTE GGUF-PATH semantics into APR forward): switching APR's `f32_matmul` to Q8K activation quant + fused matvec semantics will recover the 5.56× per-matvec amplification on every matmul, eliminating cumulative APR-vs-GGUF drift. Estimated fix scope ~250-400 LOC; transitively discharges 5 MODEL-1 PARTIALs (SHIP-002, SHIP-005, SHIP-006, SHIP-007, SHIP-008) per §17.5. Cascade methodology consolidated: - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_cascade_decomposes_magnitude.md - ~/.claude/projects/-home-noah-src-aprender/memory/feedback_falsifier_chain_assert_difference.md Companion-spec entries M91-M101 in claude-code-parity-apr/docs/ specifications/claude-code-parity-apr-poc.md provide the full per-PR narrative. Aprender contract `contracts/trace-ffn-sub-block-gguf-v1.yaml` v1.0.0 → v1.12.0 across 12 amendments. MODEL-1 ship %: unchanged at 91% until M-FFN-GGUF-5 (actual fix PR) lands. MODEL-2 ship %: unchanged at 57% until step 5g.3 produces val_loss < 9.38. Spec v3.03.0 → v3.04.0. Atomic next action banner only — full §59 narrative deferred to deliberate-session work alongside M-FFN-GGUF-5 fix PR. Refs PMAT-CCPA, SHIP-007 §22, M91-M101 cascade. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 6, 2026 23:41

noahgift mentioned this pull request May 6, 2026

feat(M-FFN-GGUF-4 step g, A2): softmax saturation amplification falsifier — A2 FALSIFIED (compresses 0.01×) #1541

Merged

4 tasks

noahgift merged commit 4b385a0 into main May 7, 2026
19 of 21 checks passed

noahgift deleted the feat/m-ffn-gguf-4f-q4k-block-scale-variance branch May 7, 2026 00:26

noahgift mentioned this pull request May 7, 2026

docs(M96+M97+M98): all 3 synthetic amplifiers (A1/A2/A3) FALSIFIED — bundled cascade record paiml/claude-code-parity-apr#83

Merged

5 tasks

noahgift mentioned this pull request May 7, 2026

docs(SHIP-TWO-001 §59): SHIP-007 §22 falsifier cascade CLOSED — 11 PRs decompose §27 1723% within rounding #1546

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(M-FFN-GGUF-4 step f, A3): Q4K block-scale variance falsifier — A3 FALSIFIED (variance_factor 1.00×)#1540

feat(M-FFN-GGUF-4 step f, A3): Q4K block-scale variance falsifier — A3 FALSIFIED (variance_factor 1.00×)#1540
noahgift merged 1 commit into
mainfrom
feat/m-ffn-gguf-4f-q4k-block-scale-variance

noahgift commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 6, 2026

Summary

Empirical result (2026-05-06)

A3 EMPIRICALLY FALSIFIED at per-block granularity

Amplifier landscape post-A3 falsification

Next investigation candidate

Status changes

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant