...
batch. #new-seq: 1, #new-token: 32, #cached-token: 0, token usage: 0.00, #running-req: 8, #queue-req: 119
[2025-07-02 03:22:31] Prefill batch. #new-seq: 1, #new-token: 32, #cached-token: 0, token usage: 0.00, #running-req: 8, #queue-req: 119
...
[2025-07-02 03:22:34] Prefill batch. #new-seq: 2, #new-token: 32, #cached-token: 0, token usage: 0.01, #running-req: 45, #queue-req: 81
Memory access fault by GPU node-2 (Agent handle: 0xdedb180) on address 0x7f57d9a00000. Reason: Unknown.
Checklist
Describe the bug
The CI unit-test-backend-1-gpu-amd failed when run
test/srt/test_no_overlap_scheduler.py. It exits with a GPU memory access fault on node-2.Error snippet:
@hubertlu-tw suggust temporarily disable the test in AMD CI to avoid blocking other PRs.
Reproduction
Sample failure run: https://github.com/sgl-project/sglang/actions/runs/15965626491/job/45029188833
Environment
lmsysorg/sglang:v0.4.8.post1-rocm630CC: @saienduri @HaiShaw @hubertlu-tw