[TransferEngine] Tcp Transport supporting vram data transfer (#602) #609
alogfans wants to merge 6 commits into kvcache-ai:main
Conversation
The header file cuda_runtime.h is missing.

@ZhenshengWu Do you set USE_CUDA=1? Also, you need to include your local CUDA header file in your library path.

Yes, I set USE_CUDA, but during compilation it reported that the header file was missing.
@alogfans please check the above feedback.

@ZhenshengWu I have fixed this problem.

@ZhenshengWu Can you check whether this is feasible for your sglang e2e tests?
Yes, I have already done a complete test, but I found a nearly always reproducible bug that causes the prefill node to core dump. My input length is 5120, output is 128, with max-concurrency set to 2. Below is the error log. From "[lts-4090:12961:0:14725] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xb0)", I suspect this may be related to the release of the cache buffer. If needed, I can provide more testing details. The version of sglang I'm using is 0.4.7, and I will try the latest branch code later.

I don't know the actual reason, because cudaMemcpy doesn't cause a segfault, and the other modifications are consistent with previous versions. BTW, I have added support for dumping backtrace logs in the C++ part. You can reproduce it using the latest whl package.

@ZhenshengWu You can try the latest patch.

This fix doesn't seem to work; the same error still occurs.

If needed, I can provide you with my test machine and environment to help reproduce the issue. As of now, based on this fix ([Fix coredump problem due to slice allocation failed]), the stack error from slice no longer appears.
#ifdef USE_CUDA
#include <cuda.h>
#include <cuda_runtime.h>
#endif
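The include guard above lets the same source build with or without the CUDA toolchain, which is exactly what the release-package discussion below is about. A minimal sketch of that pattern, assuming a hypothetical `copy_buffer` helper (not the actual Mooncake code); when `USE_CUDA` is undefined, the code falls back to a plain host copy:

```cpp
#include <cstddef>
#include <cstring>

#ifdef USE_CUDA
#include <cuda.h>
#include <cuda_runtime.h>
#endif

// Hypothetical helper: copies `len` bytes, using cudaMemcpy when the
// library was built with USE_CUDA, and a host memcpy otherwise.
inline int copy_buffer(void* dst, const void* src, std::size_t len) {
#ifdef USE_CUDA
    // cudaMemcpyDefault lets the driver infer host/device direction
    // (requires unified virtual addressing).
    return cudaMemcpy(dst, src, len, cudaMemcpyDefault) == cudaSuccess ? 0 : -1;
#else
    // Host-only build: VRAM pointers are not usable here, so only
    // host-to-host copies are supported.
    std::memcpy(dst, src, len);
    return 0;
#endif
}
```

A package compiled without `USE_CUDA` silently takes the fallback path, which would explain why a release wheel "is not sufficient to get the job done" for VRAM transfers.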
It requires USE_CUDA; maybe the release pkg is not sufficient to get the job done, since it hasn't been compiled with CUDA. cc: @xiaguan

Let me try to fix the USE_CUDA issue in CI.
ShangmingCai left a comment:

Do we need an env var like MC_FORCE_MNNVL, or will the E2E use RDMA first?

I'm puzzled why the UCX

In the current implementation, we have an env var MC_FORCE_MNNVL.

@alogfans What I really mean is: should we have an env var MC_FORCE_TCP, in case users want to use TCP for transport even if they have RDMA, so that they can use RDMA for EP and transfer KVCache through TCP?
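At this point in the thread MC_FORCE_TCP is only a proposal. A minimal sketch of how such an override could be read, where `select_transport` and the `Transport` enum are hypothetical names, not Mooncake's actual API:

```cpp
#include <cstdlib>
#include <string>

enum class Transport { kRdma, kTcp };

// Hypothetical selector: prefer RDMA when available, unless the user
// forces TCP via the (proposed) MC_FORCE_TCP environment variable.
inline Transport select_transport(bool rdma_available) {
    const char* force_tcp = std::getenv("MC_FORCE_TCP");
    if (force_tcp != nullptr && std::string(force_tcp) == "1") {
        return Transport::kTcp;
    }
    return rdma_available ? Transport::kRdma : Transport::kTcp;
}
```

This matches the use case described above: RDMA stays available for EP traffic while KVCache transfers are pinned to TCP.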
@ZhenshengWu Do you have time to verify this PR? We are about to release v0.3.5, just wondering whether we should include this PR.

I will try to verify this PR end to end.

I tried the latest code, but errors still occur during stress testing, and almost always at a fixed stage of the test.
I noticed something unusual: the length of the KVCache data being transmitted seems off right before and after the coredump. Below is a comparison between the failing stress test and a normal one.
Additionally, this week we attempted an adaptation on the sglang side. Without modifying Mooncake's code, we perform device-to-host (D to H) transfers on the P side of sglang and host-to-device (H to D) transfers on the D side, and transfer the KV cache via mooncake-tcp. We've already implemented this, and single curl requests work fine and return correct results. However, we still encounter coredumps during stress testing. At this point, I'm still unsure whether the issue is introduced by sglang or on the Mooncake side.
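The staging workaround described above can be sketched roughly as follows. This is an illustration only: `cuda_d2h` and `cuda_h2d` stand in for cudaMemcpy calls (plain memcpy is used here so the sketch runs without CUDA), and the vector copy stands in for the actual mooncake-tcp hop:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-ins for cudaMemcpy(dst, src, n, cudaMemcpyDeviceToHost/HostToDevice).
inline void cuda_d2h(void* host, const void* dev, std::size_t n) { std::memcpy(host, dev, n); }
inline void cuda_h2d(void* dev, const void* host, std::size_t n) { std::memcpy(dev, host, n); }

// Hypothetical end-to-end path of the workaround: P side stages VRAM
// into a host bounce buffer, the bytes cross the TCP transport, and
// D side copies the host buffer back into VRAM.
inline void send_kv_via_tcp(const void* src_dev, void* dst_dev, std::size_t n) {
    std::vector<char> p_stage(n), d_stage(n);
    cuda_d2h(p_stage.data(), src_dev, n);  // P side: D -> H
    d_stage = p_stage;                     // stands in for the mooncake-tcp hop
    cuda_h2d(dst_dev, d_stage.data(), n);  // D side: H -> D
}
```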
I tested our DtoH and HtoD implementation with the latest sglang code, and it seems the error no longer occurs. I will soon run tests combining the latest versions of Mooncake-tcp-vram and sglang. The latest test results will be available by tomorrow morning at the latest.

@yangelaboy It seems that mlx5_bond0 may not actually be an InfiniBand (IB) card, but Mooncake mistakenly recognizes it as one. As a result, it defaults to using the RDMA protocol instead of the TCP protocol. You should set the MC_FORCE_TCP environment variable to force the use of the TCP protocol and then continue testing.
@ZhenshengWu I have set MC_FORCE_TCP=1, but it seems not to work. I will try disabling RDMA and test again.

@yangelaboy

@ZhenshengWu Try pulling the code again. I fixed the code yesterday.

@alogfans Sorry, you mean the latest code is beb3230ddd271b227bc3770b600498057aa83e51? I am testing on beb3230 now.
@yangelaboy Did you test end to end again?

@ZhenshengWu It's ok to start the P/D instances, but there is an error as follows when triggering an HTTP request:

CUDA_VISIBLE_DEVICES=6 MC_FORCE_TCP=1 MC_TE_METRIC=true SGLANG_TBO_DEBUG=1 python3 -m sglang.launch_server --model-path /Qwen2-VL-2B-Instruct --disaggregation-mode prefill --nnodes 1 --node-rank 0 --tp-size 1 --decode-log-interval 1 --page-size 1 --host 0.0.0.0 --trust-remote-code --disable-radix-cache --watchdog-timeout 1000000 --mem-fraction-static 0.85 --chunked-prefill-size 8192 --enable-metrics --enable-p2p-check --attention-backend torch_native --port 9001 &
CUDA_VISIBLE_DEVICES=5 MC_FORCE_TCP=1 SGLANG_TBO_DEBUG=1 python3 -m sglang.launch_server --model-path /models/Qwen2-VL-2B-Instruct/ --disaggregation-mode decode --nnodes 1 --node-rank 0 --tp-size 1 --decode-log-interval 1 --page-size 1 --host 0.0.0.0 --trust-remote-code --disable-radix-cache --watchdog-timeout 1000000 --mem-fraction-static 0.85 --chunked-prefill-size 8192 --enable-metrics --enable-p2p-check --attention-backend torch_native --port 9002 &
python3 -m sglang.srt.disaggregation.mini_lb --prefill http://0.0.0.0:9001 --decode http://0.0.0.0:9002 --host 0.0.0.0 --port 8000 &
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "uiagent", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": [ {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}}, {"type": "text", "text": "What is the text in the illustrate?"} ]} ] }'
@yangelaboy This still seems to be an issue with the registration information when Mooncake starts up. Would it be convenient to connect via WeChat for further discussion? If possible, could you please send your WeChat ID to my email: 1910433006@email.szu.edu.cu? Thank you!

@ZhenshengWu done, please check the email

@ZhenshengWu Sending the email failed; please check the email address: 1910433006@email.szu.edu.cu

@ZhenshengWu @yangelaboy Could you help us verify this PR and provide feedback? Thanks!
@stmatengss I am working on it.

# prefill log
Transfer Engine parseHostNameWithPort. server_name: 10.38.244.193 port: 15442
Transfer Engine RPC using P2P handshake, listening on 10.38.244.193:16115
TcpTransport: listen on port 16285

# decode log
Transfer Engine parseHostNameWithPort. server_name: 10.38.244.193 port: 12001
Transfer Engine RPC using P2P handshake, listening on 10.38.244.193:15194
TcpTransport: listen on port **15793**

# prefill sending log
Register KVArgs from 10.38.244.193:15194 successfully
Failed to transfer data from 140367617853952 to 10.38.244.193:**15194**
Failed to transfer data from 140367382972928 to 10.38.244.193:**15194**
Session 10.38.244.193:**15194** failed
Failed to send kv chunk of xxx to 10.38.244.193:**53249**
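The logs above show two distinct listeners on the decode side: the RPC handshake port (15194) and the TcpTransport data port (15793), yet the transfer failures all target 15194. A small sketch of the distinction, where `PeerEndpoint` and `data_address` are hypothetical names used only to illustrate which port a data transfer should dial:

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of the two ports visible in the decode log:
// the P2P handshake (RPC) listener and the TcpTransport data listener
// are separate sockets.
struct PeerEndpoint {
    std::string host;
    uint16_t rpc_port;   // e.g. 15194 in the decode log
    uint16_t data_port;  // e.g. 15793 in the decode log
};

// Data transfers must dial the TcpTransport data port; dialing the RPC
// port would be consistent with the "Failed to transfer data ... :15194"
// lines above.
inline std::string data_address(const PeerEndpoint& p) {
    return p.host + ":" + std::to_string(p.data_port);
}
```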
@stmatengss We've been conducting end-to-end testing for the past two weeks, and errors are still occurring, both in the sglang end-to-end test and in a single test using the standalone simulated PD processes I mentioned in my previous comment (mock_d.py, mock_p.py) to transfer the KV cache. The error message is as follows. If I set the length to 264000, it works normally.

@alogfans, could you take a look? Thanks!

@ZhenshengWu Use MC_LOG_LEVEL. This option can be set to TRACE/INFO/WARNING/ERROR (see the glog docs), and more detailed logs will be output at runtime.

This PR is deprecated. I've re-implemented it; you can try #702 @ZhenshengWu

Get it!

Get it, thanks

















This addresses issue #602.