
[Bug] KIMI-K2.5 can't use context parallel #22692

@zhaotyer

Description

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

Deploying KIMI-K2.5 with context parallelism enabled crashes during decoding: the sampling step hits a CUDA device-side assert (`probability tensor contains either inf, nan or element < 0`) and the scheduler ranks abort.
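For context, this assertion matches the input check inside torch.multinomial (the sampling step): on CUDA it is performed with an async device-side assert, so any non-finite or negative probability aborts with exactly this message. A minimal, sglang-independent sketch that reproduces the same assertion (illustrative only; the NaN injection is a hypothetical stand-in for whatever the attn-CP path is producing):

    import torch

    # torch.multinomial validates its probabilities with a device-side assert
    # on CUDA; any inf/nan or negative entry fires the message in the log below.
    probs = torch.softmax(torch.randn(1, 8, device="cuda"), dim=-1)
    probs[0, 0] = float("nan")  # hypothetical: simulates bad logits from the CP path
    torch.multinomial(probs, num_samples=1)  # -> cudaErrorAssert at TensorCompare.cu:112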

[2026-04-13 10:35:43 PP1 ATTN_CP2 TP2] Decode batch, #running-req: 1, #token: 25088, token usage: 0.02, cuda graph: True, gen throughput (token/s): 21.39, #queue-req: 0
[2026-04-13 10:35:43 PP1 ATTN_CP3 TP3] Decode batch, #running-req: 1, #token: 25088, token usage: 0.02, cuda graph: True, gen throughput (token/s): 21.39, #queue-req: 0
[2026-04-13 10:35:43 PP1 ATTN_CP6 TP6] Decode batch, #running-req: 1, #token: 25088, token usage: 0.02, cuda graph: True, gen throughput (token/s): 21.38, #queue-req: 0
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
[2026-04-13 10:35:45 PP1 ATTN_CP4 TP4] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3616, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1300, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3495, in dispatch_event_loop
    scheduler.event_loop_pp()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler_pp_mixin.py", line 122, in event_loop_pp
    d2h_event.synchronize()
  File "/usr/local/lib/python3.12/dist-packages/torch/cuda/streams.py", line 231, in synchronize
    super().synchronize()
torch.AcceleratorError: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


(The identical traceback was raised concurrently by ranks PP1 ATTN_CP0 TP0, PP1 ATTN_CP6 TP6, PP1 ATTN_CP7 TP7, and PP1 ATTN_CP1 TP1, and each process printed `terminate called after throwing an instance of 'c10::AcceleratorError'`.)


  what():  CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7f2a8f57cb80 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x11fb7 (0x7f2a8f90efb7 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1db4e (0x7f2a8f91ab4e in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1fac2 (0x7f2a8f91cac2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x4827af (0x7f2a8130b7af in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #5: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f2a8f559d69 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #6: <unknown function> + 0x7cb658 (0x7f2a81654658 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x7cb9c5 (0x7f2a816549c5 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #8: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x575eae]
frame #9: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x575bfc]
frame #10: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x579982]
frame #11: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x59edf9]
frame #12: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x558da1]
frame #13: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610665]
frame #14: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #15: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #16: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #17: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #18: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x55331b]
frame #19: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x59ef53]
frame #20: _PyEval_EvalFrameDefault + 0x502d (0x5db4ad in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #21: PyEval_EvalCode + 0x15b (0x5d543b in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #22: PyRun_StringFlags + 0xd3 (0x6084b3 in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #23: PyRun_SimpleStringFlags + 0x3e (0x6b3d0e in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #24: Py_RunMain + 0x481 (0x6bc9d1 in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #25: Py_BytesMain + 0x2d (0x6bc3ed in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #26: <unknown function> + 0x2a1ca (0x7f2c11d9a1ca in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #27: __libc_start_main + 0x8b (0x7f2c11d9a28b in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #28: _start + 0x25 (0x6576c5 in sglang::scheduler_PP1_ATTN_CP4_TP4)

Fatal Python error: Aborted

Thread 0x00007efce3fff6c0 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/utils/watchdog.py", line 147 in _watchdog_once
  File "/sgl-workspace/sglang/python/sglang/srt/utils/watchdog.py", line 127 in _watchdog_thread
  File "/usr/lib/python3.12/threading.py", line 1010 in run
  File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007efceffff6c0 (most recent call first):
  File "/usr/lib/python3.12/threading.py", line 359 in wait
  File "/usr/lib/python3.12/queue.py", line 180 in get
  File "/sgl-workspace/sglang/python/sglang/srt/managers/cache_controller.py", line 1047 in backup_thread_func
  File "/usr/lib/python3.12/threading.py", line 1010 in run
  File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap
terminate called after throwing an instance of 'c10::AcceleratorError'

Thread 0x00007efcfbffd6c0 (most recent call first):
  File "/usr/lib/python3.12/threading.py", line 359 in wait
  File "/usr/lib/python3.12/queue.py", line 180 in get
  File "/sgl-workspace/sglang/python/sglang/srt/managers/cache_controller.py", line 886 in prefetch_io_aux_func
  File "/usr/lib/python3.12/threading.py", line 1010 in run
  File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007efcffffe6c0 (most recent call first):
  File "/usr/lib/python3.12/threading.py", line 359 in wait
  File "/usr/lib/python3.12/queue.py", line 180 in get
  File "/sgl-workspace/sglang/python/sglang/srt/managers/cache_controller.py", line 950 in prefetch_thread_func
  File "[2026-04-13 10:35:45] Received sigquit from a child process. It usually means the child failed.
/usr/lib/python3.12/threading.py", line 1010 in run
  File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007f15e7fff6c0 (most recent call first):
  File "/usr/lib/python3.12/threading.py", line 359 in wait
  File "/usr/lib/python3.12/threading.py", line 655 in wait
  File "/usr/local/lib/python3.12/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap

Current thread 0x00007f2c11d6f300 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3621 in run_scheduler_process
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 135 in _main
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 122 in spawn_main
  File "<string>", line 1 in <module>
(Rank PP1 ATTN_CP7 TP7 then aborted with the same `what():  CUDA error: device-side assert triggered` message and an equivalent c10 frame dump, ending in `Fatal Python error: Aborted`.)

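Per the trace's own advice, the Python stack above is likely displaced by asynchronous error reporting; a hedged first debugging step is to relaunch with synchronous kernel launches so the assert surfaces at the op that actually failed (flags otherwise identical to the start command under Reproduction):

    # Sketch: same launch as below, run with synchronous kernel launches so
    # the failing kernel is reported at its real call site.
    CUDA_LAUNCH_BLOCKING=1 python3 -m sglang.launch_server \
      --model /models/Kimi-K2.5 \
      ...   # remaining flags unchanged from the start command below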
Reproduction

Start command:

    python3 -m sglang.launch_server \
      --model /models/Kimi-K2.5 \
      --dist-init-addr $LWS_LEADER_ADDRESS:20000 \
      --tensor-parallel-size 8 \
      --pp-size 2 \
      --nnodes $LWS_GROUP_SIZE \
      --node-rank $LWS_WORKER_INDEX \
      --trust-remote-code \
      --host 0.0.0.0 \
      --port 8000 \
      --dist-timeout 7200 \
      --enable-metrics \
      --reasoning-parser kimi_k2 \
      --tool-call-parser kimi_k2 \
      --mem-fraction-static 0.85 \
      --log-requests --log-requests-level 1 \
      --kv-cache-dtype fp8_e4m3 \
      --enable-hierarchical-cache \
      --hicache-ratio 1 \
      --hicache-write-policy write_through \
      --hicache-storage-backend mooncake \
      --page-size 64 \
      --served-model-name kimi-2.5 \
      --enable-cache-report \
      --allow-auto-truncate \
      --preferred-sampling-params '{"max_new_tokens": 8192}' \
      --dp-size 1 --moe-dense-tp-size 1 \
      --attn-cp-size 8 \
      --enable-prefill-context-parallel
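
To confirm that context parallelism is the trigger (as the title suggests), a hypothetical A/B baseline is the identical launch with the two CP flags removed; if the assert disappears, the non-finite sampling probabilities are being produced somewhere in the attn-CP path:

    # Hypothetical baseline: same command as above, minus these two flags:
    #   --attn-cp-size 8 \
    #   --enable-prefill-context-parallel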

Environment

Python: 3.12.3 (main, Mar  3 2026, 12:15:18) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H100 80GB HBM3
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 570.124.06
PyTorch: 2.9.1+cu129
sglang: 0.5.10
sglang-kernel: 0.4.1
flashinfer_python: 0.6.7.post2
flashinfer_cubin: 0.6.7.post2
flashinfer_jit_cache: 0.6.7.post2+cu129
triton: 3.5.1
transformers: 5.3.0
torchao: 0.9.0
numpy: 2.3.5
aiohttp: 3.13.5
fastapi: 0.135.3
huggingface_hub: 1.9.0
interegular: 0.3.3
modelscope: 1.35.3
orjson: 3.11.8
outlines: 0.1.11
packaging: 26.0
psutil: 7.2.2
pydantic: 2.12.5
python-multipart: 0.0.22
pyzmq: 27.1.0
uvicorn: 0.43.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.32
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.89.0
litellm: Module Not Found
torchcodec: 0.9.1
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    PIX     PIX     NODE    NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    NODE    NODE    PIX     PIX     SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     0-47,96-143     0               N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    SYS     SYS     SYS     SYS     PIX     PIX     NODE    NODE    48-95,144-191   1               N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    48-95,144-191   1               N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    SYS     SYS     SYS     SYS     NODE    NODE    PIX     PIX     48-95,144-191   1               N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    48-95,144-191   1               N/A
NIC0    PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS      X      PIX     NODE    NODE    SYS     SYS     SYS     SYS
NIC1    PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS     PIX      X      NODE    NODE    SYS     SYS     SYS     SYS
NIC2    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE     X      PIX     SYS     SYS     SYS     SYS
NIC3    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE    PIX      X      SYS     SYS     SYS     SYS
NIC4    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS      X      PIX     NODE    NODE
NIC5    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS     PIX      X      NODE    NODE
NIC6    SYS     SYS     SYS     SYS     NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE     X      PIX
NIC7    SYS     SYS     SYS     SYS     NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE    PIX      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7

