Describe the bug

Deploying kimi-k2.5 with context parallelism results in an error. Decoding starts normally, then a device-side CUDA assertion on the probability tensor fires and every scheduler rank aborts with `torch.AcceleratorError: CUDA error: device-side assert triggered`.
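For context (not part of the original log): the assertion text below is PyTorch's standard validity check on sampling probabilities, the same message torch.multinomial raises when its input contains NaN, Inf, or a negative entry. A minimal illustration; on CPU the identical check surfaces as a catchable RuntimeError rather than a device-side assert:

    import torch

    # A NaN anywhere in the probability tensor trips the same check that
    # appears in the log ("probability tensor contains either `inf`, `nan`
    # or element < 0"). On CUDA it fires as a device-side assert that
    # poisons the context; on CPU it is an ordinary RuntimeError.
    probs = torch.tensor([float("nan"), 0.5])
    try:
        torch.multinomial(probs, num_samples=1)
    except RuntimeError as err:
        print(err)

Full server log: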
[2026-04-13 10:35:43 PP1 ATTN_CP2 TP2] Decode batch, #running-req: 1, #token: 25088, token usage: 0.02, cuda graph: True, gen throughput (token/s): 21.39, #queue-req: 0
[2026-04-13 10:35:43 PP1 ATTN_CP3 TP3] Decode batch, #running-req: 1, #token: 25088, token usage: 0.02, cuda graph: True, gen throughput (token/s): 21.39, #queue-req: 0
[2026-04-13 10:35:43 PP1 ATTN_CP6 TP6] Decode batch, #running-req: 1, #token: 25088, token usage: 0.02, cuda graph: True, gen throughput (token/s): 21.38, #queue-req: 0
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
[2026-04-13 10:35:45 PP1 ATTN_CP4 TP4] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3616, in run_scheduler_process
scheduler.run_event_loop()
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1300, in run_event_loop
dispatch_event_loop(self)
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3495, in dispatch_event_loop
scheduler.event_loop_pp()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler_pp_mixin.py", line 122, in event_loop_pp
d2h_event.synchronize()
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/streams.py", line 231, in synchronize
super().synchronize()
torch.AcceleratorError: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2026-04-13 10:35:45 PP1 ATTN_CP0 TP0] Scheduler hit an exception: (same traceback as above)
[2026-04-13 10:35:45 PP1 ATTN_CP6 TP6] Scheduler hit an exception: (same traceback as above)
[2026-04-13 10:35:45 PP1 ATTN_CP7 TP7] Scheduler hit an exception: (same traceback as above)
terminate called after throwing an instance of 'c10::AcceleratorError'
terminate called after throwing an instance of 'c10::AcceleratorError'
terminate called after throwing an instance of 'c10::AcceleratorError'
terminate called after throwing an instance of 'c10::AcceleratorError'
[2026-04-13 10:35:45 PP1 ATTN_CP1 TP1] Scheduler hit an exception: (same traceback as above)
what(): CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7f2a8f57cb80 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x11fb7 (0x7f2a8f90efb7 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1db4e (0x7f2a8f91ab4e in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1fac2 (0x7f2a8f91cac2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x4827af (0x7f2a8130b7af in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #5: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f2a8f559d69 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #6: <unknown function> + 0x7cb658 (0x7f2a81654658 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x7cb9c5 (0x7f2a816549c5 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #8: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x575eae]
frame #9: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x575bfc]
frame #10: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x579982]
frame #11: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x59edf9]
frame #12: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x558da1]
frame #13: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610665]
frame #14: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #15: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #16: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #17: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x610675]
frame #18: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x55331b]
frame #19: sglang::scheduler_PP1_ATTN_CP4_TP4() [0x59ef53]
frame #20: _PyEval_EvalFrameDefault + 0x502d (0x5db4ad in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #21: PyEval_EvalCode + 0x15b (0x5d543b in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #22: PyRun_StringFlags + 0xd3 (0x6084b3 in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #23: PyRun_SimpleStringFlags + 0x3e (0x6b3d0e in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #24: Py_RunMain + 0x481 (0x6bc9d1 in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #25: Py_BytesMain + 0x2d (0x6bc3ed in sglang::scheduler_PP1_ATTN_CP4_TP4)
frame #26: <unknown function> + 0x2a1ca (0x7f2c11d9a1ca in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #27: __libc_start_main + 0x8b (0x7f2c11d9a28b in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #28: _start + 0x25 (0x6576c5 in sglang::scheduler_PP1_ATTN_CP4_TP4)
Fatal Python error: Aborted
Thread 0x00007efce3fff6c0 (most recent call first):
File "/sgl-workspace/sglang/python/sglang/srt/utils/watchdog.py", line 147 in _watchdog_once
File "/sgl-workspace/sglang/python/sglang/srt/utils/watchdog.py", line 127 in _watchdog_thread
File "/usr/lib/python3.12/threading.py", line 1010 in run
File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap
Thread 0x00007efceffff6c0 (most recent call first):
File "/usr/lib/python3.12/threading.py", line 359 in wait
File "/usr/lib/python3.12/queue.py", line 180 in get
File "/sgl-workspace/sglang/python/sglang/srt/managers/cache_controller.py", line 1047 in backup_thread_func
File "/usr/lib/python3.12/threading.py", line 1010 in run
File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap
terminate called after throwing an instance of 'c10::AcceleratorError'
Thread 0x00007efcfbffd6c0 (most recent call first):
File "/usr/lib/python3.12/threading.py", line 359 in wait
File "/usr/lib/python3.12/queue.py", line 180 in get
File "/sgl-workspace/sglang/python/sglang/srt/managers/cache_controller.py", line 886 in prefetch_io_aux_func
File "/usr/lib/python3.12/threading.py", line 1010 in run
File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap
Thread 0x00007efcffffe6c0 (most recent call first):
File "/usr/lib/python3.12/threading.py", line 359 in wait
File "/usr/lib/python3.12/queue.py", line 180 in get
File "/sgl-workspace/sglang/python/sglang/srt/managers/cache_controller.py", line 950 in prefetch_thread_func
File "[2026-04-13 10:35:45] Received sigquit from a child process. It usually means the child failed.
/usr/lib/python3.12/threading.py", line 1010 in run
File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap
Thread 0x00007f15e7fff6c0 (most recent call first):
File "/usr/lib/python3.12/threading.py", line 359 in wait
File "/usr/lib/python3.12/threading.py", line 655 in wait
File "/usr/local/lib/python3.12/dist-packages/tqdm/_monitor.py", line 60 in run
File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap
Current thread 0x00007f2c11d6f300 (most recent call first):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3621 in run_scheduler_process
File "/usr/lib/python3.12/multiprocessing/process.py", line 108 in run
File "/usr/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
File "/usr/lib/python3.12/multiprocessing/spawn.py", line 135 in _main
File "/usr/lib/python3.12/multiprocessing/spawn.py", line 122 in spawn_main
File "<string>", line 1 in <module>
what(): CUDA error: device-side assert triggered
(same CUDA error message and C++ backtrace as above, this time from sglang::scheduler_PP1_ATTN_CP7_TP7)
Fatal Python error: Aborted

Reproduction

start command
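The exact start command was not captured. A hypothetical sketch of the kind of launch being described, for orientation only; the model path and the context-parallelism flag are placeholders rather than verified sglang CLI options, and the parallel sizes are guessed from the PP1 / TP0-7 / ATTN_CP0-7 rank labels in the log:

    # Hypothetical sketch only: "--cp-size" stands in for whatever flag
    # enables context parallelism, and the model path is a placeholder.
    python3 -m sglang.launch_server \
      --model-path <path-to-kimi-k2.5> \
      --tp-size 8 \
      --pp-size 2 \
      --cp-size 2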
Environment

Python: 3.12.3 (main, Mar 3 2026, 12:15:18) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H100 80GB HBM3
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 570.124.06
PyTorch: 2.9.1+cu129
sglang: 0.5.10
sglang-kernel: 0.4.1
flashinfer_python: 0.6.7.post2
flashinfer_cubin: 0.6.7.post2
flashinfer_jit_cache: 0.6.7.post2+cu129
triton: 3.5.1
transformers: 5.3.0
torchao: 0.9.0
numpy: 2.3.5
aiohttp: 3.13.5
fastapi: 0.135.3
huggingface_hub: 1.9.0
interegular: 0.3.3
modelscope: 1.35.3
orjson: 3.11.8
outlines: 0.1.11
packaging: 26.0
psutil: 7.2.2
pydantic: 2.12.5
python-multipart: 0.0.22
pyzmq: 27.1.0
uvicorn: 0.43.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.32
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.89.0
litellm: Module Not Found
torchcodec: 0.9.1
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 PIX PIX NODE NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE NODE NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 NODE NODE PIX PIX SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 NODE NODE NODE NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS SYS SYS PIX PIX NODE NODE 48-95,144-191 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS SYS SYS NODE NODE NODE NODE 48-95,144-191 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS SYS SYS NODE NODE PIX PIX 48-95,144-191 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS SYS SYS NODE NODE NODE NODE 48-95,144-191 1 N/A
NIC0 PIX NODE NODE NODE SYS SYS SYS SYS X PIX NODE NODE SYS SYS SYS SYS
NIC1 PIX NODE NODE NODE SYS SYS SYS SYS PIX X NODE NODE SYS SYS SYS SYS
NIC2 NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE X PIX SYS SYS SYS SYS
NIC3 NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE PIX X SYS SYS SYS SYS
NIC4 SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS SYS SYS X PIX NODE NODE
NIC5 SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS SYS SYS PIX X NODE NODE
NIC6 SYS SYS SYS SYS NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE X PIX
NIC7 SYS SYS SYS SYS NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
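The dump above matches the format of SGLang's bundled environment checker; assuming a standard install, it can be regenerated with:

    python3 -m sglang.check_env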