[Bug] Piecewise CUDA graph replay crashes with FlashInfer ≥0.6.6: q.shape[0] does not match qo_indptr[-1] in paged prefill #21218

@yyihuang

Description

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

When using --attention-backend flashinfer with piecewise CUDA graph enabled (default), the server crashes during replay with:

ValueError: q.shape[0] (8) does not match qo_indptr[-1] (6).
For paged prefill, q must have shape [total_tokens, num_heads, head_dim]
where total_tokens = qo_indptr[-1].

FlashInfer PR #2801 (merged 2026-03-23) added explicit shape validation in prefill.run() to catch what was previously a silent out-of-bounds read. The validation now raises ValueError when q.shape[0] != qo_indptr[-1].
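The invariant the new validation enforces can be illustrated with a minimal sketch (this is not FlashInfer's actual implementation): `qo_indptr` is a CSR-style offset array over the batch, so `qo_indptr[-1]` must equal the total number of query tokens packed into `q`. A CUDA-graph replay that pads `q` to a fixed size without updating `qo_indptr` trips the check, which matches the crash above.

```python
# Illustrative sketch of the shape invariant (not FlashInfer's real code).
import numpy as np

def check_prefill_shapes(q: np.ndarray, qo_indptr: np.ndarray) -> None:
    # For paged prefill, q is [total_tokens, num_heads, head_dim]
    # where total_tokens = qo_indptr[-1].
    total_tokens = int(qo_indptr[-1])
    if q.shape[0] != total_tokens:
        raise ValueError(
            f"q.shape[0] ({q.shape[0]}) does not match "
            f"qo_indptr[-1] ({total_tokens})."
        )

# Two sequences of lengths 2 and 4 -> qo_indptr = [0, 2, 6], 6 total tokens.
qo_indptr = np.array([0, 2, 6])
check_prefill_shapes(np.zeros((6, 8, 128)), qo_indptr)  # OK

# Padding q to 8 tokens without updating qo_indptr raises ValueError,
# mirroring the "(8) does not match (6)" error in this report.
try:
    check_prefill_shapes(np.zeros((8, 8, 128)), qo_indptr)
except ValueError:
    pass
```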

Workaround: pass --disable-piecewise-cuda-graph (the flag SGLang's own error message already suggests) until the shape mismatch is fixed.
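For reference, a launch command with the workaround applied might look like the following (a sketch; all flags other than the workaround are taken from the reproduction below):

```shell
# Workaround sketch: launch with piecewise CUDA graph disabled.
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-14B \
  --attention-backend flashinfer \
  --disable-piecewise-cuda-graph
```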

Reproduction

python -m sglang.launch_server \
  --model-path Qwen/Qwen3-14B \
  --attention-backend flashinfer \
  --disable-cuda-graph

Environment

SGLang: latest main
FlashInfer: latest main
GPU: 4× NVIDIA B200 (SM100, Compute 10.0)
PyTorch: 2.9.1+cu128
Model: Qwen/Qwen3-14B, tp=1
