
feat(M-FFN-GGUF-4 step e): multi-tensor compound falsifier — SUPER-LINEAR growth (5.70×)#1539

Merged
noahgift merged 1 commit into feat/m-ffn-gguf-4d-fused-vs-standalone-matvec from feat/m-ffn-gguf-4e-multi-tensor-divergence-compound
May 6, 2026
@noahgift noahgift commented May 6, 2026

Contributor

Summary

Stacked atop PR #1538 (M94 H2d.3+H2d.4 confirmation). Will be re-targeted to main when #1538 merges.

M94 confirmed Path A vs Path B differ by 0.077% on a SINGLE 144-byte Q4K super-block matvec. The v1.5.0 amendment hypothesized (without measurement) that this compounds across "28 layers × 4 matmuls × 7 tokens" to match the §27 layer-3 ffn_swigl 18.23× std-ratio.

This PR authors falsify_ffn_gguf_009_multi_tensor_divergence_compound to MEASURE the compounding empirically. Test runs N=5 sequential matvecs (chained — each output is the next input, with RMSNorm between layers to keep magnitude bounded), comparing Path A vs Path B at the final layer.

Empirical result (2026-05-06)

Single-tensor rel_diff (M94): 0.077%
5-tensor chained rel_diff:    0.4391%
Growth factor:                5.70×  ← SUPER-LINEAR
| Hypothesis | Predicted growth | Observed |
| --- | --- | --- |
| H-COMPOUND-LINEAR | 5.00× | |
| H-COMPOUND-SUBLINEAR (√N) | 2.24× | |
| H-COMPOUND-SUPER (k > 1) | >5.00× | 5.70× ✓ |
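
The comparison in the table can be reproduced from the two measured rel_diff values. The following is a minimal sketch, not code from the PR: the names `growth_factor`, `sublinear_prediction`, `classify`, and the tolerance `tol` are hypothetical, and the PR does not state what margin it used to separate linear from super-linear growth.

```rust
/// Which compounding hypothesis a measured growth factor supports.
#[derive(Debug, PartialEq)]
enum Compounding {
    SubLinear,
    Linear,
    SuperLinear,
}

/// Growth factor = chained rel_diff / single-tensor rel_diff.
fn growth_factor(single_rel_diff: f64, chained_rel_diff: f64) -> f64 {
    chained_rel_diff / single_rel_diff
}

/// Growth predicted by the sub-linear (sqrt(N)) hypothesis.
fn sublinear_prediction(n: usize) -> f64 {
    (n as f64).sqrt()
}

/// Compare the observed growth with the linear prediction (growth == N).
/// `tol` is an assumed tolerance band around N, not a value from the PR.
fn classify(growth: f64, n: usize, tol: f64) -> Compounding {
    let linear = n as f64;
    if growth > linear + tol {
        Compounding::SuperLinear
    } else if growth < linear - tol {
        Compounding::SubLinear
    } else {
        Compounding::Linear
    }
}
```

With the measured inputs, `growth_factor(0.077, 0.4391)` is about 5.70 and `classify` reports `SuperLinear` for N = 5 under any tolerance smaller than 0.70.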

Quantitative extrapolation to §27

Layer-3 chain depth ≈ 21 chained ops (3 layers × ~7 tensor-ops). Naive super-linear extrapolation: ~1.85% rel_diff, far below §27's 1723% (18.23× std-ratio). The M94 mechanism explains COMPOUNDING but not the §27 MAGNITUDE.

Three candidate amplifiers (M-FFN-GGUF-6 scope)

  • A1: RoPE phase amplification (rotational drift across heads)
  • A2: Softmax saturation (logit drift → near-max amplification)
  • A3: Real-weight magnitude variance (synthetic uniform vs real Qwen Q4K with high per-tensor variance)

Next investigation: M-FFN-GGUF-6 real-teacher falsifier at /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr to discriminate A3 (real-weight) vs A1+A2 (non-linearity).

Status changes

contracts/trace-ffn-sub-block-gguf-v1.yaml v1.5.0 → v1.6.0:

  • FALSIFY-FFN-GGUF-009 NEW → DISCHARGED
  • M-FFN-GGUF-4 step (e): NEW → DISCHARGED
  • M-FFN-GGUF-6 (NEW, NEXT): real-teacher falsifier; PENDING

pv validate contracts/trace-ffn-sub-block-gguf-v1.yaml: 0 errors / 0 warnings on v1.6.0.

Test plan

🤖 Generated with Claude Code

…NEAR growth confirmed (5.70× over 5 chained matvecs)

M94 (FALSIFY-FFN-GGUF-008, sibling PR #1538) confirmed Path A vs
Path B differ by 0.077% on a SINGLE 144-byte Q4K super-block matvec.
The v1.5.0 amendment hypothesized (without measurement) that this
compounds across "28 layers × 4 matmuls/layer × 7 tokens" to match
the §27 layer-3 ffn_swigl 18.23× std-ratio.

This PR authors `falsify_ffn_gguf_009_multi_tensor_divergence_compound`
in `crates/aprender-serve/src/apr_transformer/helpers.rs::determinism_tests`
to MEASURE that compounding empirically. The test runs N=5 sequential
matvecs (chained: each output is the next input, with RMSNorm between
layers to keep magnitude bounded) and compares Path A vs Path B at the
final layer.
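
The chaining scheme described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the test in `helpers.rs`: it uses dense F32 square matrices instead of Q4K super-blocks, an RMSNorm without a learned gain, and an L2 relative difference (the PR does not say how rel_diff is defined at the final layer). All function names are hypothetical.

```rust
/// RMSNorm without a learned gain: x / sqrt(mean(x^2) + eps).
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * inv).collect()
}

/// Dense row-major matvec: `w` holds rows of `x.len()` values each.
/// `x` must be non-empty (a chunk size of zero panics).
fn matvec(w: &[f32], x: &[f32]) -> Vec<f32> {
    w.chunks(x.len())
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Feed `input` through every layer with `step`, normalizing between
/// layers (but not after the last one) so magnitudes stay bounded.
fn run_chain(
    layers: &[Vec<f32>],
    input: &[f32],
    step: fn(&[f32], &[f32]) -> Vec<f32>,
) -> Vec<f32> {
    let mut x = input.to_vec();
    for (i, w) in layers.iter().enumerate() {
        x = step(w, &x);
        if i + 1 < layers.len() {
            x = rms_norm(&x, 1e-6);
        }
    }
    x
}

/// Relative L2 difference of `b` with respect to `a` (NaN if `a` is all zeros).
fn rel_diff(a: &[f32], b: &[f32]) -> f32 {
    let num: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    let den: f32 = a.iter().map(|x| x * x).sum();
    (num / den).sqrt()
}
```

To compare two paths, call `run_chain` once per path with a different `step` function over the same five layers, then pass the two final outputs to `rel_diff`.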

EMPIRICAL RESULT (2026-05-06):
  Single-tensor rel_diff (M94): 0.077%
  5-tensor chained rel_diff:    0.4391%
  Growth factor:                5.70×

Linear projection would be 5.00× (5 × 0.077%); sub-linear (√N)
projection would be 2.24×. The empirical 5.70× growth is
**SUPER-LINEAR** — confirms H-COMPOUND-SUPER hypothesis.

QUANTITATIVE EXTRAPOLATION TO §27:
  Layer-3 chain depth = 3 layers × ~7 tensor-ops = 21 chained ops.
  Naive super-linear extrapolation:
    21 × 0.077% × (5.70/5)^log2(21/5) ≈ 1.85% (rel_diff)

This is FAR BELOW §27's 1723% (18.23× std-ratio). The M94 mechanism
explains COMPOUNDING but not the §27 MAGNITUDE.
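
The extrapolation can be checked numerically. The sketch below evaluates the expression as quoted, with hypothetical function names that are not from the repository. With the quoted inputs the expression comes out at roughly 2.12% rather than 1.85%, while applying the (5.70/5) factor only once gives about 1.84%. Either figure is hundreds of times smaller than §27's 1723%, so the conclusion is the same.

```rust
/// Evaluate n * single_pct * (growth / m)^log2(n / m), the expression quoted above.
/// `m` is the chain length at which `growth` was measured (5 in this PR).
fn extrapolate(n: f64, single_pct: f64, growth: f64, m: f64) -> f64 {
    n * single_pct * (growth / m).powf((n / m).log2())
}

/// Same inputs, but with the per-step excess (growth / m) applied once.
fn extrapolate_single_factor(n: f64, single_pct: f64, growth: f64, m: f64) -> f64 {
    n * single_pct * (growth / m)
}

/// Convert a std-ratio such as 18.23x into a percent relative difference.
fn ratio_to_pct(ratio: f64) -> f64 {
    (ratio - 1.0) * 100.0
}
```

For example, `extrapolate(21.0, 0.077, 5.70, 5.0)` and `ratio_to_pct(18.23)` give the two sides of the comparison in percent.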

Three candidate amplifiers (M-FFN-GGUF-6 investigation scope):
- A1: RoPE phase amplification (rotational drift across heads)
- A2: Softmax saturation (logit drift → output drift via near-max)
- A3: Real-weight magnitude variance (synthetic uniform magnitude
       vs real Qwen Q4K weights with high per-tensor variance)

Most likely path forward: M-FFN-GGUF-6 = real-teacher falsifier.
Load the actual layer-3 down_proj Q4K bytes from the canonical 7B
Qwen2.5-Coder .apr file at
`/mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr`,
run both paths against a real activation vector, and measure rel_diff.
If the real-teacher rel_diff is 5-50× larger than synthetic, A3 alone
explains the §27 magnitude. If it matches synthetic, A1+A2 are
load-bearing.
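
The decision rule for the planned falsifier can be written down as a small function. This is a sketch only: the `Verdict` enum, the `interpret` name, and the `match_tol` parameter are assumptions, and the text above does not say how close "matches synthetic" must be or how to read ratios outside the two stated cases, so those fall into `Inconclusive` here.

```rust
/// Possible readings of the real-teacher measurement.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Real rel_diff is 5-50x the synthetic one: A3 alone may explain the magnitude.
    A3Sufficient,
    /// Real rel_diff matches synthetic: A1 + A2 carry the amplification.
    NonLinearityLoadBearing,
    /// Any ratio the stated rule does not cover.
    Inconclusive,
}

/// `match_tol` is an assumed relative tolerance for "matches synthetic".
fn interpret(real_rel_diff: f64, synthetic_rel_diff: f64, match_tol: f64) -> Verdict {
    let ratio = real_rel_diff / synthetic_rel_diff;
    if (5.0..=50.0).contains(&ratio) {
        Verdict::A3Sufficient
    } else if (ratio - 1.0).abs() <= match_tol {
        Verdict::NonLinearityLoadBearing
    } else {
        Verdict::Inconclusive
    }
}
```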

Contract trace-ffn-sub-block-gguf-v1 v1.5.0 → v1.6.0:
- FALSIFY-FFN-GGUF-009 NEW → DISCHARGED
- M-FFN-GGUF-4 step (e) compounding-hypothesis: DISCHARGED
- M-FFN-GGUF-6 (NEW, NEXT): real-teacher falsifier; PENDING

Production hot paths byte-unchanged. Test additive in
helpers.rs::determinism_tests. `pv validate`: 0 errors / 0 warnings
on v1.6.0.

Stacked atop PR #1538 (M94/M-FFN-GGUF-4d). Will rebase on main
after #1538 merges.

Test runs locally:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_009 -- --nocapture
  test result: ok. 1 passed; finished in 0.03s

Refs PMAT-CCPA, SHIP-007 §22, FALSIFY-FFN-GGUF-009.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit d6e9161 into feat/m-ffn-gguf-4d-fused-vs-standalone-matvec May 6, 2026
1 check passed
@noahgift noahgift deleted the feat/m-ffn-gguf-4e-multi-tensor-divergence-compound branch May 6, 2026 23:21
noahgift added a commit that referenced this pull request May 6, 2026
…mpounding (5.70×) — bundled M94+M95 (#1538)

* feat(M-FFN-GGUF-4 step c, H2d.3+H2d.4): fused-vs-standalone Q4K matvec — FIRST CONFIRMED hypothesis in chain

After three sequential falsifications (M91 §28 parallel-reduction,
M92 H2a' SIMD-vs-scalar dot, M93 H2d.2 APR-internal Q4K dequant
byte-identity), the H2d.4 falsifier (FALSIFY-FFN-GGUF-008) is the
**first test in the SHIP-007 §22 hypothesis chain that produces
the EXPECTED bit-level divergence between paths**.

Adds `falsify_ffn_gguf_008_fused_vs_standalone_q4k_matvec` to
`crates/aprender-serve/src/apr_transformer/helpers.rs::determinism_tests`.
Compares:

  Path A (APR-style):  dequantize_q4_k_simd + manual F32 dot
  Path B (GGUF-style): quantize_activations_q8k_into +
                       fused_q4k_q8k_parallel_matvec_into

On a synthetic 144-byte Q4K super-block + 256-element F32
activation. Both paths compute the same mathematical operation
(W @ a) but Path B has an additional Q8K activation-quantization
step Path A doesn't have.
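
To illustrate why an extra activation-quantization step changes the result, here is a simplified stand-in for the two paths. It is not the real kernels: `quantize_i8` is a hypothetical symmetric 8-bit quantizer with one scale per vector, not the Q8K block layout used by `quantize_activations_q8k_into`, and the weights are plain F32 rather than Q4K.

```rust
/// Path A stand-in: plain F32 dot product of weights and activations.
fn dot_f32(w: &[f32], a: &[f32]) -> f32 {
    w.iter().zip(a).map(|(x, y)| x * y).sum()
}

/// Simplified symmetric 8-bit quantizer (assumed; not the real Q8K format).
/// Returns the per-vector scale and the quantized values.
fn quantize_i8(a: &[f32]) -> (f32, Vec<i8>) {
    let max_abs = a.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if max_abs == 0.0 {
        return (0.0, vec![0; a.len()]);
    }
    let scale = max_abs / 127.0;
    let q: Vec<i8> = a
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (scale, q)
}

/// Path B stand-in: quantize the activations first, then dot and rescale.
fn dot_quantized(w: &[f32], a: &[f32]) -> f32 {
    let (scale, q) = quantize_i8(a);
    let acc: f32 = w.iter().zip(&q).map(|(x, &y)| x * y as f32).sum();
    acc * scale
}
```

Each quantized activation is off by at most half a scale step, so the two dot products agree to within roughly `sum(|w|) * scale / 2` but are generally not bit-identical.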

EMPIRICAL RESULT (2026-05-06):
  Path A = -18882.443 (0xc69384e3)
  Path B = -18897.059 (0xc693a21e)
  diff   = 14.615 (rel_diff = 0.077%)
  bits_a != bits_b ✓

Paths DIFFER at bit level as expected. Math agreement within
0.10% (Q8K precision loss is mathematically reasonable but NOT
bit-exact). This **CONFIRMS H2d.3 + H2d.4 simultaneously** at
the kernel level.
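
The reported values can be recovered from their bit patterns. The sketch below decodes the two hex words with `f32::from_bits` and recomputes the relative difference; the helper names are illustrative, and dividing by |Path A| is an assumption, since the commit message does not say which value is the denominator.

```rust
/// Decode an IEEE-754 single-precision bit pattern.
fn decode(bits: u32) -> f32 {
    f32::from_bits(bits)
}

/// Relative difference in percent, taken with respect to `a` (assumed reference).
fn rel_diff_pct(a: f32, b: f32) -> f64 {
    let (a, b) = (a as f64, b as f64);
    (a - b).abs() / a.abs() * 100.0
}

/// True when two floats differ in at least one bit.
fn bits_differ(a: f32, b: f32) -> bool {
    a.to_bits() != b.to_bits()
}
```

Decoding `0xc69384e3` and `0xc693a21e` this way gives the Path A and Path B values listed above, and `rel_diff_pct` on the pair rounds to 0.077%.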

SHIP-007 §22 ROOT CAUSE NOW HAS A CONCRETE MECHANISM:

APR's loader path uses Path A semantics — full F32 dequant of
weights, then F32 matmul with F32 activations. GGUF's matvec
uses Path B semantics — Q8K quantization of activations + fused
inline Q4K dequant during the parallel matvec. Per-tensor the
divergence is small (0.077%) but cumulative across 28 layers ×
4 matmuls/layer × 7 tokens, the divergence compounds in a way
that matches the §27 layer-3 ffn_swigl 18.23× APR↔GGUF drift.

Hypothesis chain (CLOSED for kernel-level reduction-order):
- §28  parallel-reduction non-determinism      (M91): FALSIFIED
- H2a' SIMD-vs-scalar dot reduction            (M92): FALSIFIED
- H2d.2 APR-internal Q4K dequant byte-identity (M93): FALSIFIED
- H2d.3 + H2d.4 fused-vs-standalone matvec     (M94): CONFIRMED ✓

Contract trace-ffn-sub-block-gguf-v1 v1.4.0 → v1.5.0:
- Documents the first hypothesis CONFIRMATION in the chain
- Records empirical evidence (-18882.443 vs -18897.059)
- Records the two architecturally-clean fix options:
  - Option-A (PROMOTE GGUF-PATH semantics into APR forward)
  - Option-B (PROMOTE APR-PATH semantics into GGUF forward)
- M-FFN-GGUF-4 step (c) hypothesis-narrowing: ALGORITHM_LEVEL
  → DISCHARGED — chain produced first CONFIRMED mechanism
- M-FFN-GGUF-5 (NEW, NEXT): SHIP-007 §22 actual fix PR; gate
  Option-A vs Option-B; PENDING

Production hot paths byte-unchanged. New test additive in
`crates/aprender-serve/src/apr_transformer/helpers.rs::determinism_tests`.
`pv validate`: 0 errors / 0 warnings on v1.5.0.

Test runs locally on RTX 4090:
  cargo test -p aprender-serve --lib falsify_ffn_gguf_008
  test result: ok. 1 passed; 0 failed; finished in 0.00s

Refs PMAT-CCPA, SHIP-007 §22.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(M-FFN-GGUF-4 step e): multi-tensor compound falsifier — SUPER-LINEAR growth confirmed (5.70× over 5 chained matvecs) (#1539)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>