spec(M32d): KV cache for qwen3_moe inference path — scope + operator decision doc by noahgift · Pull Request #1826 · paiml/aprender

noahgift · 2026-05-19T20:51:06Z

Summary

Scope doc + operator decision matrix for M32d (KV cache on the qwen3_moe inference path). This is the upstream blocker for FALSIFY-QWEN3_MOE_SERVE_DISPATCH_V1_004 in `contracts/qwen3-moe-serve-dispatch-v1.yaml` v1.1.1.

This is not an implementation PR. It documents the work and estimates effort and risk so the operator can choose go/no-go without re-deriving the analysis.

Why now

The empirical evidence chain (5 Phase 6 dispatches across the post-#1789 fix series — #1806, #1812, #1814, #1819) shows that the 30B-MoE student fails uniformly at the per-turn timeout, regardless of how the timeouts are tuned. Root cause: full-prefill-per-token at ~0.5 tok/s. Symptom-class progression documented in paiml/claude-code-parity-apr evidence/phase-6/30b-moe-empirical-2026-05-19.md.

The dense path already has `OwnedQuantizedKVCache` + `forward_single_with_cache`. The MoE path has neither.
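To make the cost of full-prefill-per-token concrete, the following rough sketch counts how many token positions pass through the model with and without a cache. The function names and the prompt and output lengths are illustrative assumptions, not measurements from the Phase 6 runs, and the count ignores the extra attention cost per position.

```rust
/// Token positions forwarded when every generated token re-runs the
/// whole sequence (no KV cache). Hypothetical helper, not from aprender.
pub fn positions_without_cache(prompt_len: u64, new_tokens: u64) -> u64 {
    // Step t (0-based) forwards the prompt plus the t tokens generated so far.
    (0..new_tokens).map(|t| prompt_len + t).sum()
}

/// Token positions forwarded when K/V for earlier positions are cached:
/// one prefill over the prompt, then a single position per later token.
pub fn positions_with_cache(prompt_len: u64, new_tokens: u64) -> u64 {
    if new_tokens == 0 {
        return 0;
    }
    prompt_len + (new_tokens - 1)
}
```

For an assumed 1000-token prompt and 100 generated tokens, this gives 104,950 positions without a cache against 1,099 with one, which is why tuning timeouts alone does not help.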

What this scope covers

- Dense-path reference: file refs + line numbers for the existing dense KV cache integration
- API inventory: `OwnedQuantizedKVCache` is sufficient as-is; no struct changes needed
- 5 implementation steps: function skeleton → attention helper lift → MoE FFN helper lift → generate-loop wire → tests
- 6 risk surfaces: numerical equivalence, dense path regression, RoPE position offset, GQA shapes, expert routing under cache, free streaming SSE
- Effort estimate: 8 focused engineering hours total
- 3 operator decisions: greenlight in-session vs engineer-driven follow-up vs skip
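Two of the risk surfaces above, numerical equivalence and RoPE position offset, can be illustrated with a toy single-head attention. This is a minimal sketch under stated assumptions: `ToyKvCache`, `forward_full`, and `forward_cached` are hypothetical names with identity projections and unquantized `f32` storage. They are not `OwnedQuantizedKVCache` or `forward_single_with_cache`, whose real signatures are not shown in this PR.

```rust
/// Toy, unquantized cache (hypothetical; not aprender's OwnedQuantizedKVCache).
#[derive(Default)]
pub struct ToyKvCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl ToyKvCache {
    pub fn len(&self) -> usize {
        self.keys.len()
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// RoPE-style rotation of consecutive pairs by a position-dependent angle.
/// Assumes an even-length vector.
fn rope(x: &[f32], pos: usize) -> Vec<f32> {
    let d = x.len() as f32;
    let mut out = x.to_vec();
    for (pair, chunk) in out.chunks_exact_mut(2).enumerate() {
        let theta = pos as f32 / 10000.0_f32.powf(2.0 * pair as f32 / d);
        let (s, c) = theta.sin_cos();
        let (a, b) = (chunk[0], chunk[1]);
        chunk[0] = a * c - b * s;
        chunk[1] = a * s + b * c;
    }
    out
}

/// Softmax-weighted sum of `values` for one query.
fn attend(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys.iter().map(|k| dot(q, k) * scale).collect();
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut out = vec![0.0; q.len()];
    for (w, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / sum) * x;
        }
    }
    out
}

/// Full recompute: rebuild every key, return the output for the last token.
pub fn forward_full(tokens: &[Vec<f32>]) -> Vec<f32> {
    let keys: Vec<Vec<f32>> = tokens.iter().enumerate().map(|(p, x)| rope(x, p)).collect();
    let last = tokens.len() - 1;
    attend(&rope(&tokens[last], last), &keys, tokens)
}

/// Cached decode: the new token's position is the number of cached entries.
/// Using 0 here instead of `cache.len()` is the RoPE offset bug to guard against.
pub fn forward_cached(x: &[f32], cache: &mut ToyKvCache) -> Vec<f32> {
    let pos = cache.len();
    cache.keys.push(rope(x, pos));
    cache.values.push(x.to_vec());
    attend(&rope(x, pos), &cache.keys, &cache.values)
}

/// Largest element-wise difference, used as the equivalence metric.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}
```

An equivalence test in this style feeds tokens one at a time through `forward_cached` and checks each output against `forward_full` on the same prefix. A real MoE test would need a tolerance chosen for quantized K/V rather than the near-exact match this toy produces.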

What this does NOT cover

- Perf tuning beyond 5-15 tok/s baseline
- Streaming SSE delivery (natural follow-up; one-line addition once KV cache lands)
- GPU MoE acceleration (separate `qwen3-moe-forward-gpu-v1` contract + M-GPU-MOE-2.x track)

Test plan

- Doc-only change; no code touched
- CI: doc/spec markdown lint (if configured)

🤖 Generated with Claude Code

…decision doc

Scopes the M32d work that's currently blocking contracts/qwen3-moe-serve-dispatch-v1.yaml V1_004 (CCPA Phase 6 bench non-zero student pass rate against Qwen3-Coder-30B-A3B).

Empirical finding (paiml/claude-code-parity-apr Phase 6, 5 dispatches across the post-#1789 fix chain): 30B-MoE full-prefill-per-token at ~0.5 tok/s cannot fit any reasonable per-turn budget. The dense path already has `OwnedQuantizedKVCache` + `forward_single_with_cache`; the MoE path has neither and re-runs the whole prompt on every token.

This scope doc:
- Surveys the dense KV cache code path (file refs + line numbers)
- Inventories the OwnedQuantizedKVCache API (sufficient as-is)
- Lays out 5 implementation steps (function skeleton → attention helper lift → MoE FFN helper lift → generate-loop wire → tests)
- Identifies 6 risk surfaces (numerical equivalence, dense regression, RoPE offset, GQA shapes, expert routing under cache, streaming SSE)
- Estimates 8 focused engineering hours total
- Presents three operator decisions (greenlight in-session vs engineer-driven follow-up vs skip)

NOT an implementation PR. Documents the work so operator can choose go/no-go without re-deriving the analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift and others added 4 commits May 19, 2026 22:50

Merge branch 'main' into spec/m32d-moe-kv-cache-scope

aac57da

Merge branch 'main' into spec/m32d-moe-kv-cache-scope

542b426

Merge branch 'main' into spec/m32d-moe-kv-cache-scope

616e534

noahgift merged commit 33aee24 into main May 19, 2026
10 checks passed

noahgift deleted the spec/m32d-moe-kv-cache-scope branch May 19, 2026 23:18

noahgift mentioned this pull request May 20, 2026

M32d: KV cache for qwen3_moe inference path (engineer-driven, 1-2 week) #1830

Closed
