
[Bug] HiCache CUDA illegal memory #18166

@dongyibo

Description


Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

error:
[2026-02-04 19:36:48 DP7 PP0 TP7] Decode batch, #running-req: 52, #token: 382208, token usage: 0.94, cuda graph: False, gen throughput (token/s): 112.22, #queue-req: 215,
[2026-02-04 19:36:48 DP5 PP0 TP5] Decode batch, #running-req: 56, #token: 378880, token usage: 0.93, cuda graph: False, gen throughput (token/s): 120.51, #queue-req: 226,
[2026-02-04 19:36:48 DP3 PP0 TP3] Prefill batch, #new-seq: 1, #new-token: 3264, #cached-token: 640, token usage: 0.92, #running-req: 50, #queue-req: 203,
[2026-02-04 19:36:48 DP6 PP0 TP6] Decode batch, #running-req: 56, #token: 379200, token usage: 0.93, cuda graph: False, gen throughput (token/s): 122.81, #queue-req: 254,
[2026-02-04 19:36:48 DP4 PP0 TP4] Decode batch, #running-req: 56, #token: 383232, token usage: 0.94, cuda graph: False, gen throughput (token/s): 118.84, #queue-req: 200,
[2026-02-04 19:36:48 DP1 PP0 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler.py", line 2974, in run_scheduler_process
    scheduler.event_loop_pp()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler_pp_mixin.py", line 90, in event_loop_pp
    self.mbs[mb_id] = self.get_next_batch_to_run()
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler.py", line 1880, in get_next_batch_to_run
    ret = self.maybe_prepare_mlp_sync_batch_and_log_stats(
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler_dp_attn_mixin.py", line 253, in maybe_prepare_mlp_sync_batch_and_log_stats
    batch = self.prepare_mlp_sync_batch(batch)
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler_dp_attn_mixin.py", line 225, in prepare_mlp_sync_batch
    return prepare_mlp_sync_batch_raw(
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler_dp_attn_mixin.py", line 197, in prepare_mlp_sync_batch_raw
    mlp_sync_info.all_gather(device=device, group=group)
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler_dp_attn_mixin.py", line 73, in all_gather
    local_info_tensor = self._get_local_tensor(device=device)
  File "/local-ssd/pv0/python/sglang/srt/managers/scheduler_dp_attn_mixin.py", line 45, in _get_local_tensor
    return torch.tensor(
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
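Since the illegal access is reported asynchronously, the stack trace above (ending in a host-to-device `torch.tensor` call) may only show the victim of an earlier faulting kernel. A minimal debug re-run sketch, assuming the same launch command as in the Reproduction section, would set synchronous kernel launches first so the trace points at the kernel that actually faulted:

```shell
# Force synchronous CUDA kernel launches so errors surface at the
# kernel that raised them, not at a later unrelated API call.
export CUDA_LAUNCH_BLOCKING=1
# Keep the allocator setting from the original environment.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

Note this slows the server down considerably, so it is only suitable for reproducing the crash, not for benchmarking throughput.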

Reproduction

python3 -m sglang.launch_server \
  --model-path /local_ssd/DeepSeek-V3.2 \
  --nccl-init-addr "$MASTER_IP_ADDRESS:20000" \
  --nnodes 4 \
  --node-rank "$RANK" \
  --trust-remote-code \
  --host 0.0.0.0 \
  --schedule-policy fcfs \
  --port "$PORT" \
  --decode-log-interval 1 \
  --context-length 128000 \
  --tokenizer-worker-num 4 \
  $ARGS

ARGS = --enable-hierarchical-cache --hicache-ratio 3 --cuda-graph-bs 1 2 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 72 80 88 96 104 112 120 128 --tp-size 8 --pp-size 4 --pp-async-batch-depth 1 --dp-size 8 --enable-dp-attention --max-running-requests 5120 --pp-max-micro-batch-size 1024 --chunked-prefill-size 32768 --schedule-conservativeness 3.333 --tokenizer-worker-num 1 --tool-call-parser deepseekv32 --mem-fraction-static 0.82 --reasoning-parser deepseek-v3 --disable-custom-all-reduce

Environment

sglang 0.5.8 / H800 / 32 cards
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
