fix(M-FFN-GGUF-5b): SHIP-007 §22 closure — QKV split-Q4K dispatch in forward_traced + production forward() by noahgift · Pull Request #1556 · paiml/aprender

noahgift · 2026-05-07T07:07:04Z

Summary

Closes the 8th (final) F32-fallback matmul site that M-FFN-GGUF-5 (PR #1550) left as a fused F32 matmul because Q4K storage splits Q/K/V into separate attn_q_weight / attn_k_weight / attn_v_weight{,_q6k} arrays while APR uses a fused F32 qkv_weight array.

After this PR, BOTH forward_traced (inference.rs) AND production forward() (pmat-260.rs) use the Q4K-split QKV path when q4k_layer is available, mirroring the production decode forward_with_cache ↔ project_qkv_fused semantics at sequence (multi-token) granularity.

What changes

New helper: `qkv_split_q4k_traced` (mod_apr_transformer.rs)

Computes Q, K, V independently across all sequence positions via seq_matmul_q4k / seq_matmul_q6k, then re-interleaves per-token to produce the fused [Q_pos | K_pos | V_pos] layout that the downstream RoPE + attention code expects.
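The re-interleave step can be sketched as follows. This is a minimal illustration, not the actual body of `qkv_split_q4k_traced`: the function name `interleave_qkv`, the row-major `[seq_len, dim]` input layout, and the `q_dim` / `kv_dim` parameters are assumptions made for the example.

```rust
/// Re-interleave separately computed Q, K, V activations into the fused
/// per-token layout [Q_pos | K_pos | V_pos].
///
/// Assumed layout (hypothetical): `q` is row-major [seq_len, q_dim];
/// `k` and `v` are row-major [seq_len, kv_dim].
fn interleave_qkv(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    seq_len: usize,
    q_dim: usize,
    kv_dim: usize,
) -> Vec<f32> {
    assert_eq!(q.len(), seq_len * q_dim);
    assert_eq!(k.len(), seq_len * kv_dim);
    assert_eq!(v.len(), seq_len * kv_dim);

    let qkv_dim = q_dim + 2 * kv_dim;
    let mut fused = Vec::with_capacity(seq_len * qkv_dim);
    for pos in 0..seq_len {
        // One token's Q row, then its K row, then its V row.
        fused.extend_from_slice(&q[pos * q_dim..(pos + 1) * q_dim]);
        fused.extend_from_slice(&k[pos * kv_dim..(pos + 1) * kv_dim]);
        fused.extend_from_slice(&v[pos * kv_dim..(pos + 1) * kv_dim]);
    }
    fused
}
```

With this layout, token `pos` occupies `fused[pos * qkv_dim..(pos + 1) * qkv_dim]`, which is the shape a fused QKV matmul would produce for the same input.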

V supports the Q4K → Q6K cascade (mirrors select_q4k_q6k).

Falls back to fused F32 matmul when any required Q or K bytes are missing.
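The dispatch rule described above (split path only when Q and K bytes are present, with V choosing between Q4K and Q6K) might look roughly like this. All type and field names here (`QuantLayer`, `VKernel`, `QkvPath`, `select_qkv_path`) are hypothetical stand-ins, not identifiers from the PR. Falling back to F32 when V has neither Q4K nor Q6K bytes is an assumption the description does not spell out.

```rust
/// Hypothetical stand-in for one layer's quantized attention weights.
struct QuantLayer<'a> {
    q_q4k: Option<&'a [u8]>,
    k_q4k: Option<&'a [u8]>,
    v_q4k: Option<&'a [u8]>,
    v_q6k: Option<&'a [u8]>,
}

/// Which quantized kernel the V projection would use.
#[derive(Debug, PartialEq)]
enum VKernel {
    Q4K,
    Q6K,
}

/// Which QKV path a forward pass would take for this layer.
#[derive(Debug, PartialEq)]
enum QkvPath {
    SplitQuant { v: VKernel },
    FusedF32,
}

fn select_qkv_path(layer: Option<&QuantLayer<'_>>) -> QkvPath {
    // No quantized layer at all: keep the fused F32 matmul.
    let Some(l) = layer else {
        return QkvPath::FusedF32;
    };
    // Missing Q or K bytes triggers the fallback.
    if l.q_q4k.is_none() || l.k_q4k.is_none() {
        return QkvPath::FusedF32;
    }
    // V cascade: prefer Q4K, then Q6K.
    match (l.v_q4k, l.v_q6k) {
        (Some(_), _) => QkvPath::SplitQuant { v: VKernel::Q4K },
        (None, Some(_)) => QkvPath::SplitQuant { v: VKernel::Q6K },
        // Assumption: with no quantized V bytes, fall back as well.
        (None, None) => QkvPath::FusedF32,
    }
}
```

Making the decision a pure function of the available bytes keeps the fallback branch easy to test without loading a model.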

Two call-site swaps

forward_traced in inference.rs:99-100 — swap fused F32 matmul → qkv_split_q4k_traced
Production forward() in pmat-260.rs:330-331 — same swap on the production hot path used by apr run for prompt processing

Empirical verification

Build + lib tests

cargo build -p aprender-serve → clean compile
cargo test -p aprender-serve --lib → 15233 passed; 0 failed
cargo test -p aprender-serve --lib determinism_tests → 10 passed (M91-M101)

LIVE on canonical 7B (lambda-vector RTX 4090, 180s)

cargo test -p aprender-serve --test ffn_gguf_apr_layer_3_swigl_diff \
    -- --include-ignored --nocapture

Layer-3 ratio = 1.2059 (in [0.5, 2.0] H1 band; tighter than M-FFN-GGUF-5's prior 1.245× reading).

layer | apr.ffn_swigl.std | gguf.ffn_swigl.std | ratio (apr/gguf)
------|-------------------|--------------------|-----------------
L00   | 0.077376          | 0.079255           | 0.9763
L01   | 0.050151          | 0.044786           | 1.1198
L02   | 0.044975          | 0.063019           | 0.7137
L03   | 0.080802          | 0.067006           | 1.2059  ← H1 BAND
...
L27   | 1.187084          | 1.532710           | 0.7745

verdict: H1 CONFIRMED — APR layer-3 ffn_swigl matches GGUF apples-to-apples

All 28 layers' last-token-only ffn_swigl std now lands within the H1 band [0.5, 2.0].
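For readers who want to reproduce the ratio column by hand, here is a minimal sketch of the band check. The helper names are illustrative, and the use of population (not sample) standard deviation is an assumption; the actual test in `ffn_gguf_apr_layer_3_swigl_diff` may compute it differently.

```rust
/// Population standard deviation of an activation slice, accumulated in f64.
/// Assumption: the real test may use a different estimator.
fn std_dev(xs: &[f32]) -> f64 {
    if xs.is_empty() {
        return 0.0;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().map(|&x| x as f64).sum::<f64>() / n;
    let var = xs
        .iter()
        .map(|&x| {
            let d = x as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    var.sqrt()
}

/// Ratio of the APR std to the GGUF std for one layer.
fn std_ratio(apr_std: f64, gguf_std: f64) -> f64 {
    apr_std / gguf_std
}

/// H1 band from the PR description: ratio must lie in [0.5, 2.0].
fn in_h1_band(ratio: f64) -> bool {
    (0.5..=2.0).contains(&ratio)
}
```

Plugging in the layer-3 row, `std_ratio(0.080802, 0.067006)` is approximately 1.2059, which `in_h1_band` accepts.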

Test plan

cargo build -p aprender-serve → clean
cargo test -p aprender-serve --lib → 15233 passed
cargo test -p aprender-serve --lib determinism_tests → 10 passed (M91-M101)
LIVE 7B teacher layer-3 ffn_swigl diff → H1 CONFIRMED (ratio 1.2059)
Production hot path coverage: pmat-260.rs forward() uses qkv_split_q4k_traced when q4k_layer is present (apr run prompt processing)
F32-only fallback unchanged when q4k_layer is None or Q/K bytes are absent

Refs SHIP-007 §22, M-FFN-GGUF-5 (#1550), M91-M101 + M-FFN-GGUF-7 cascade, FALSIFY-FFN-GGUF-003 H1 verdict.

🤖 Generated with Claude Code

…forward_traced + production forward()

Closes the 8th (final) F32-fallback matmul site that M-FFN-GGUF-5 (PR #1550) left as a fused F32 matmul because Q4K storage splits Q/K/V into separate `attn_q_weight` / `attn_k_weight` / `attn_v_weight{,_q6k}` arrays while APR uses a fused F32 `qkv_weight` array.

After this PR, BOTH `forward_traced` (inference.rs) and production `forward()` (pmat-260.rs) use the Q4K-split QKV path when q4k_layer is available, mirroring the production decode `forward_with_cache` ↔ `project_qkv_fused` semantics at sequence (multi-token) granularity. The fused F32 matmul remains as fallback when Q4K bytes are absent.

## What changes

### New helper: `qkv_split_q4k_traced` (mod_apr_transformer.rs)

Computes Q, K, V independently across all sequence positions via `seq_matmul_q4k` / `seq_matmul_q6k` (mirrors `project_qkv_fused`'s single-token semantics at sequence granularity), then re-interleaves per-token to produce the fused `[Q_pos | K_pos | V_pos]` layout that the downstream RoPE + attention code expects (matches the F32 fused QKV matmul output of `f32_matmul(normed, qkv_weight, hidden_dim, qkv_dim)`).

V supports the Q4K → Q6K cascade used by some 7B Qwen2.5 quantizations (mirrors `select_q4k_q6k`).

Falls back to fused F32 matmul when any required Q or K bytes are missing (V-only Q4K or Q6K is acceptable; missing Q or K triggers fallback).

### Two call-site swaps

1. `forward_traced` in `inference.rs:99-100` — `let mut qkv = self.matmul(&normed, &layer.qkv_weight, hidden_dim, qkv_dim);` → `let mut qkv = self.qkv_split_q4k_traced(&normed, q4k_layer, &layer.qkv_weight, ...);`
2. Production `forward()` in `pmat-260.rs:330-331` — same swap on the production hot path used by `apr run` for prompt processing.

## Empirical verification

### Build + lib tests

```
cargo build -p aprender-serve → clean compile
cargo test -p aprender-serve --lib → 15233 passed (single-thread mode); 0 failed
cargo test -p aprender-serve --lib determinism_tests → 10 passed (M91-M101 falsifiers)
```

### LIVE on canonical 7B (lambda-vector RTX 4090, 180s)

```
cargo test -p aprender-serve --test ffn_gguf_apr_layer_3_swigl_diff \
    -- --include-ignored --nocapture
```

Layer-3 ratio = **1.2059** (in [0.5, 2.0] H1 band; tighter than M-FFN-GGUF-5's prior 1.245× reading).

```
layer | apr.ffn_swigl.std | gguf.ffn_swigl.std | ratio (apr/gguf)
------|-------------------|--------------------|-----------------
L00   | 0.077376          | 0.079255           | 0.9763
L01   | 0.050151          | 0.044786           | 1.1198
L02   | 0.044975          | 0.063019           | 0.7137
L03   | 0.080802          | 0.067006           | 1.2059  ← H1 BAND
...
L27   | 1.187084          | 1.532710           | 0.7745

verdict: **H1 CONFIRMED** — APR layer-3 ffn_swigl matches GGUF within 1.21× (apples-to-apples agreement).
```

All 28 layers' last-token-only ffn_swigl std now lands within the H1 band [0.5, 2.0]. The §27 1723% std-ratio decomposition is fully closed at sub-FFN ffn_swigl granularity.

## Why this matters for SHIP-007 §22

M-FFN-GGUF-5 (PR #1550) closed 7 of 8 matmul call sites in `forward_traced` to use Q4K+Q8K dispatch matching GGUF. The 8th (QKV) was deferred because the storage layout difference (split attn_q/k/v vs fused qkv) required a non-trivial re-interleave helper. This PR delivers that helper and closes the gap in BOTH trace (inference.rs) and production (pmat-260.rs) paths.

This means any future `apr run` / `apr trace` invocation on a canonical 7B Q4K teacher uses Q4K-split QKV semantics, eliminating the F32-vs-Q4K matmul precision delta at the QKV stage. The 5 MODEL-1 PARTIALs (SHIP-002/005/006/007/008) tied to forward/decode parity can now reference both `forward_traced` AND production `forward()` as discharged.

## Test plan

- [x] `cargo build -p aprender-serve` → clean
- [x] `cargo test -p aprender-serve --lib` → 15233 passed
- [x] `cargo test -p aprender-serve --lib determinism_tests` → 10 passed (M91-M101)
- [x] LIVE 7B teacher layer-3 ffn_swigl diff → H1 CONFIRMED (ratio 1.2059, tighter than prior 1.245×)
- [x] Production hot path coverage: pmat-260.rs `forward()` uses qkv_split_q4k_traced when q4k_layer is present (apr run prompt processing)
- [x] F32-only path unchanged: when q4k_layer is None or Q/K bytes are absent, falls through to byte-identical f32_matmul

Refs SHIP-007 §22, M-FFN-GGUF-5 (PR #1550), M91-M101 + M-FFN-GGUF-7 cascade, FALSIFY-FFN-GGUF-003 H1 verdict.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…IONAL — falsifier passes refine §61.8 picture (PMAT-CODE-GGUF-PROMPT-SENS) (#1612)

Authored a falsifier-first contract for the SPEC-SHIP-TWO-001 §61.8 "GGUF prompt-insensitive output" finding, then ran the falsifiers LIVE on canonical 7B teacher. All 3 falsifiers PASSED — empirical data refines the §61.8 picture significantly.

Five-Whys:

1. Why this contract? §61.8 named Branch B (GGUF prompt-insensitive bug) as a major bisection target. Falsifier-first cascade pattern requires a contract+test before any fix attempt.
2. Why DRAFT_RED → ACTIVE_FUNCTIONAL same-day? The falsifier-test surprised me with GREEN at run_inference() library level. The original §61.8 RED claim was based on `apr run` CLI output truncation (max-tokens 16-32 sharing prefix "ampiezza = 0.5\n diametro = 10"), not byte-identical full-length output.
3. Why is this a real finding? At run_inference library:
   - GGUF P1 → "ampiezza = 0.5\ndiametro = 10\naltezza = 20\n# Calcolo del volume\nvolume = ("
   - GGUF P2 → "ampiezza = 10\nampiezza\n# Stampa il doppio del valore di ampiezza\ndoppio_ampiezz"

   Outputs DIFFER — distinctness invariant HOLDS. GGUF still emits Italian-coding-style gibberish (mode-collapse to a cluster), but it's prompt-correlated.
4. Why does APR work cleanly?
   - APR P1 → "2+2 is 4." (correct numerical answer)
   - APR P2 → "Hello! It's nice to meet you. What can I help you with today?" (correct conversational)

   The M-FFN-GGUF-5/5b cascade (PRs #1550 + #1556 on 2026-05-07) fully fixed APR. APR + ChatML auto-wrap is FUNCTIONAL through run_inference today.
5. Why does this matter for ship-%? SHIP-008 (chat template render) may LIVE-discharge today via APR path — the underlying engine produces clean conversational output. SHIP-005 (HumanEval) and SHIP-007 (decode tps) may also discharge on APR path. The residual GGUF mode-collapse bug warrants a SEPARATE contract (gguf-mode-collapse-v1) authored as a follow-up.

Methodology lesson #9 (NEW): a falsifier's GREEN outcome may INVALIDATE an earlier RED observation when the falsifier is more rigorous than the original. The §61.8 "byte-identical" claim came from CLI output truncation at low max-tokens; the run_inference library test ran 32 tokens and revealed clustered-but-distinct outputs. Status flips PROPOSED → ACTIVE_FUNCTIONAL same-day.

Changes:

- contracts/gguf-prompt-sensitivity-v1.yaml (NEW, v1.1.0 ACTIVE_FUNCTIONAL):
  - 3 falsifiers (FALSIFY-GGUF-PROMPT-SENS-001/002/003)
  - All 3 carry status_v1_1_0: PASS + evidence_v1_1_0 with LIVE output snippets
  - description: §61.8 background + v1.1.0 empirical refinement
  - Methodology lesson #9 codified in description
  - qa_gate.follow_up_contract: notes need for gguf-mode-collapse-v1
- crates/aprender-serve/tests/gguf_prompt_sensitivity.rs (NEW, 3 tests):
  - falsify_gguf_prompt_sensitivity_distinct_prompts_distinct_outputs
  - falsify_gguf_prompt_sensitivity_three_prompt_sweep
  - falsify_gguf_prompt_sensitivity_apr_control_passes

  Each #[ignore] gated on canonical 7B fixtures; auto-skips on CI runners that lack the 8 GB models.

Validation:

- pv validate contracts/gguf-prompt-sensitivity-v1.yaml ✓ (0 errors)
- pv lint --strict-test-binding ✓ (PASS, 9 gates)
- cargo test -p aprender-serve --test gguf_prompt_sensitivity --release -- --ignored --test-threads=1 ✓ (3 passed, 0 failed, 321.91s wall)

Spec movement:

- MODEL-1 ship %: stays at 92% (this contract documents what IS; no fix shipped)
- MODEL-2 ship %: unchanged at 57% (gated on step 5g.3)

Refs:

- SPEC-SHIP-TWO-001 §61.8 (parent — defines Branch B)
- contracts/apr-vs-gguf-forward-parity-v1.yaml v1.2.0 (sibling, PR #1608)
- evidence/section-61-8-pred-fired-2026-05-10/findings.json (CLI evidence)

Closes the Branch B bisection investigation. Follow-up: gguf-mode-collapse-v1 contract for the residual Italian-gibberish output (separate semantic-correctness invariant).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
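The truncation effect behind methodology lesson #9 can be illustrated with a small sketch. The helpers below are hypothetical and are not the falsifier tests from `gguf_prompt_sensitivity.rs`; they only show how two outputs that share a prefix can look identical under a short comparison window while differing over a longer one.

```rust
/// Number of leading characters two outputs have in common.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count()
}

/// True when the outputs differ within the first `max_chars` characters.
/// A small `max_chars` models a truncated comparison.
fn distinct_within(a: &str, b: &str, max_chars: usize) -> bool {
    a.chars().take(max_chars).ne(b.chars().take(max_chars))
}
```

Applied to the beginnings of the two GGUF outputs quoted above, the shared prefix is `"ampiezza = "`, so a window no longer than that prefix reports the outputs as identical, while a longer window reports them as distinct.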

…nonical 7B teacher (PMAT-CODE-SHIP-008-DISCHARGE) (#1614)

§17.5 cascade follow-up #2 to PR #1608 (apr-vs-gguf-forward-parity-v1 v1.2.0) and PR #1612 (gguf-prompt-sensitivity-v1 v1.1.0). With the SHIP-007 §22 upstream blocker resolved on 2026-05-07 (M-FFN-GGUF-5 PR #1550) AND Branch B (§61.8 GGUF prompt-insensitive bug) resolved 2026-05-10 (PR #1612 — bug was CLI truncation artifact, not library bug), SHIP-008 is now LIVE-dispatch-ready.

Five-Whys:

1. Why SHIP-008 still PARTIAL? Held on SHIP-007 §22 + Branch B bisection until both resolved.
2. Why upstream resolved? §60 closure (PR #1550 + #1556) fixed APR forward path to within H1 band; PR #1612 confirmed APR + ChatML produces clean conversational output through run_inference.
3. Why this AC after SHIP-002? SHIP-008 is the chat template render gate — exercises the ChatML auto-wrap path through inference. Independent of SHIP-005 (eval) and SHIP-007 (perf).
4. Why now? Per `feedback_compute_pre_authorized.md`, lambda-labs LIVE evidence dispatch is pre-authorized. Empirical evidence from PR #1612 already shows clean output for similar prompts.
5. Why use SHIP-008 canonical USER message ("Write a Python function to compute the nth Fibonacci number.")? It's the literal AC_SHIP1_008_CANONICAL_USER constant pinned in `crates/aprender-core/src/text/chat_template/ship_008.rs:36`. Using anything else would be off-spec.

Evidence (LIVE 2026-05-10, noah-Lambda-Vector RTX 4090):

- Binary: /mnt/nvme-raid0/targets/aprender/release/apr v0.32.0 (post-e856eb91f)
- Artifact: /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr
- Sha256: a394dd286732a5f32dfb983fd2ea0eeba4d6239ac4c47e44bcfe62f590ddeb28
- Size: 8,035,635,652 bytes (8.0 GB Q4K)
- Command: `apr run <artifact> --prompt "Write a Python function to compute the nth Fibonacci number." --max-tokens 256`
- Wall time: 82.97s (CPU fallback, CUDA path hit transient ILLEGAL_ADDRESS, wgpu rejected)
- Output: 256-token ChatML response with:
  * Conversational opening: "Certainly! The Fibonacci sequence..."
  * Markdown ### headings (Iterative Approach / Recursive Approach / Example Usage / Explanation)
  * 3 ```python``` fenced code blocks (all parseable, 0 syntax errors)
  * 2 function definitions: fibonacci_iterative, fibonacci_recursive
- Algorithm-level (existing): cargo test -p aprender-core --lib falsify_ship_008_chat_template_render_bind ✓ (1 passed)

Changes:

- contracts/chat-template-v1.yaml v1.2.0 → v1.3.0
  - GATE-CHAT-SHIP-008.discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  - + 4 evidence file paths in evidence_discharged_by
  - + new live_discharge: block (date, host, binary, artifact sha256, command, teacher_response_summary, wall_time, backend_path, upstream_blocker_resolved, branch_b_finding_resolved)
  - full_discharge_blocks_on: rewritten to record post-2026-05-10 LIVE state
  - description: prepended v1.3.0 changelog with full evidence summary
  - + reference to §60, §61.8, evidence directory
- evidence/ship-008-discharge-2026-05-10/ (NEW directory):
  - discharge-evidence-v1.json (6-step verification chain + provenance)
  - apr-run-output.txt (raw apr run log)
  - completion.md (extracted ChatML teacher response)
  - parse-result.json (Python ast.parse + structural verdict per code block)

Validation:

- pv validate contracts/chat-template-v1.yaml ✓ (0 errors)
- pv lint --strict-test-binding ✓ (PASS)
- ast.parse on each ```python``` block ✓ (3/3 parseable, 0 syntax errors)
- LIVE on canonical 7B teacher: reproducible via single apr run command

Spec movement:

- SHIP-TWO-001 MODEL-1 ship %: 92% → 93% (2 of 5 §17.5 PARTIALs LIVE-discharged; SHIP-005, SHIP-006, SHIP-007 remain).
- MODEL-2 ship %: unchanged at 57% (gated on step 5g.3 val_loss < 9.38).

Refs:

- contracts/chat-template-v1.yaml v1.3.0 (this PR)
- contracts/apr-vs-gguf-forward-parity-v1.yaml v1.2.0 (PR #1608, parent §17.5)
- contracts/gguf-prompt-sensitivity-v1.yaml v1.1.0 (PR #1612, sibling §61.8)
- evidence/ship-008-discharge-2026-05-10/ (this PR)
- crates/aprender-core/src/text/chat_template/ship_008.rs (canonical golden + verdict fn)
- SPEC-SHIP-TWO-001 §18.3 (MODEL-1 5/10 ACs blocked on SHIP-007)
- SPEC-SHIP-TWO-001 §60 (SHIP-007 §22 closure)
- SPEC-SHIP-TWO-001 §61.8 (Branch A vs Branch B taxonomy)

Closes task #31 PMAT-CODE-SHIP-008-DISCHARGE.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 7, 2026 07:07

noahgift force-pushed the feat/m-ffn-gguf-5b-qkv-q4k-closure branch from 5da2486 to 4447f7d Compare May 7, 2026 07:23

noahgift force-pushed the feat/m-ffn-gguf-5b-qkv-q4k-closure branch from 4447f7d to d47e92c Compare May 7, 2026 07:52

noahgift merged commit a68252e into main May 7, 2026
10 checks passed

noahgift deleted the feat/m-ffn-gguf-5b-qkv-q4k-closure branch May 7, 2026 08:12

This was referenced May 7, 2026

docs(M104): M-FFN-GGUF-5b QKV F32 gap closure — layer-3 ratio 1.245× → 1.2059× paiml/claude-code-parity-apr#90

Merged

feat(contracts): GGUF prompt-sensitivity v1.1.0 — falsifier RED→GREEN refines §61.8 picture #1612

Merged
