
[TransferEngine] Tcp Transport supporting vram data transfer (#602) #609

Closed
alogfans wants to merge 6 commits into kvcache-ai:main from alogfans:tcp-vram


@alogfans
Collaborator

@alogfans alogfans commented Jul 10, 2025

This addresses issue #602.

Collaborator

@ShangmingCai ShangmingCai left a comment

LGTM

@ZhenshengWu

The header file cuda_runtime.h is missing.

@ShangmingCai
Collaborator

ShangmingCai commented Jul 15, 2025

@ZhenshengWu Did you set USE_CUDA=1? Also, you need to add your local CUDA header directory to your include path.

@ZhenshengWu

ZhenshengWu commented Jul 15, 2025

> @ZhenshengWu Did you set USE_CUDA=1? Also, you need to add your local CUDA header directory to your include path.

Yes, I set USE_CUDA, but during compilation it reported that the header file was missing.
So I added #include <cuda_runtime.h>, and after that the compilation succeeded. The test case also ran correctly, and the data being transferred was GPU data.
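
As a rough illustration of the fix described here, the CUDA runtime header can be included only when the build defines `USE_CUDA`, with a plain host copy as the fallback. This is a sketch, not the PR's code: the helper name `copyBytes` is hypothetical, and the use of `cudaMemcpyDefault` assumes unified virtual addressing is available so the runtime can infer the copy direction.

```cpp
#include <cstddef>
#include <cstring>

#ifdef USE_CUDA
#include <cuda_runtime.h>  // declares cudaMemcpy and cudaMemcpyDefault
#endif

// Hypothetical helper: copy `len` bytes from `src` to `dst`.
// Returns true on success.
inline bool copyBytes(void* dst, const void* src, std::size_t len) {
#ifdef USE_CUDA
    // With unified virtual addressing, cudaMemcpyDefault lets the runtime
    // infer whether each pointer refers to host or device memory.
    return cudaMemcpy(dst, src, len, cudaMemcpyDefault) == cudaSuccess;
#else
    // Without CUDA support only host memory can be handled.
    std::memcpy(dst, src, len);
    return true;
#endif
}
```

Without the guarded include, a build with `USE_CUDA` defined fails at the first CUDA symbol, which matches the missing-header error reported above.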

@ShangmingCai
Collaborator

@alogfans please check the above feedback.

@alogfans
Collaborator Author

@ZhenshengWu I have fixed this problem.

@ShangmingCai
Collaborator

@ZhenshengWu Can you check whether this is feasible for your sglang e2e tests?

@ZhenshengWu

ZhenshengWu commented Jul 15, 2025

> @ZhenshengWu Can you check whether this is feasible for your sglang e2e tests?

Yes, I have already run a complete test, but I found a nearly always reproducible bug that causes the prefill node to core dump. My input length is 5120, output length is 128, with max-concurrency set to 2. The error log is below. From the line [lts-4090:12961:0:14725] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xb0), I suspect this may be related to the release of the cache buffer.

If needed, I can provide more testing details. The version of sglang I’m using is 0.4.7, and I will try the latest branch code later.

[2025-07-15 07:47:56 TP0] Prefill batch. #new-seq: 1, #new-token: 5120, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 2945.74
[2025-07-15 07:47:58] INFO:     172.16.16.63:48076 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 07:47:58 TP0] Prefill batch. #new-seq: 1, #new-token: 5108, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 3040.07
I0715 07:47:59.415562 14106 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 120.18 MB/s (over last 5s)
I0715 07:47:59.422766 14116 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 121.18 MB/s (over last 5s)
I0715 07:48:04.416076 14106 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 26.49 MB/s (over last 5s)
I0715 07:48:04.423153 14116 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 24.98 MB/s (over last 5s)
[2025-07-15 07:48:05] INFO:     172.16.16.63:48088 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 07:48:05 TP0] Prefill batch. #new-seq: 1, #new-token: 5105, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 730.52
[lts-4090:12961:0:14725] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xb0)
Fatal Python error: Segmentation fault

Thread 0x00007fb209fff640 (most recent call first):
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/transfer_engine.py", line 75 in transfer_sync
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 259 in process_layer
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007fb24bfff640 (most recent call first):
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/transfer_engine.py", line 75 in transfer_sync
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 259 in process_layer
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb26ffff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb283fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 245 in as_completed
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 276 in send_kvcache
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 359 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb28ffff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/common/utils.py", line 24 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 319 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb29bfff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/common/utils.py", line 24 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 319 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb2a7fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/common/utils.py", line 24 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 319 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb2b3fff640 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/socket.py", line 799 in recv_multipart
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 437 in bootstrap_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb2ebfff640 (most recent call first):
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/managers/scheduler.py", line 1967 in watchdog_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb2f5fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/usr/lib/python3.10/queue.py", line 171 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 139 in forward_thread_func_
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 127 in forward_thread_func
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

==== backtrace (tid:  14725) ====
 0  /usr/lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x7fbca40c2fc4]
 1  /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x7fbca40c6fec]
 2  /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x7fbca40c71aa]
 3  /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_GetDataRelBase+0x4) [0x7fbf68986984]
 4  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x189) [0x7fbf68a3fa49]
 5  /usr/lib/x86_64-linux-gnu/libunwind.so.8(__libunwind_Unwind_Resume+0x129) [0x7fbab4029fe9]
 6  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x89e38) [0x7fbaa19f7e38]
 7  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE+0x12cf) [0x7fbaa19f13cf]
 8  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport18submitTransferTaskERKSt6vectorIPNS_9Transport15TransferRequestESaIS4_EERKS1_IPNS2_12TransferTaskESaISA_EE+0xd3) [0x7fbaa19f1cc3]
 9  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14MultiTransport14submitTransferEmRKSt6vectorINS_9Transport15TransferRequestESaIS3_EE+0x36e) [0x7fbaa1a0663e]
10  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x4b5b5) [0x7fbaa19b95b5]
11  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x7979e) [0x7fbaa19e779e]
12  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x6a570) [0x7fbaa19d8570]
13  sglang::scheduler_TP1(+0x18ae12) [0x5566851e2e12]
14  sglang::scheduler_TP1(_PyObject_MakeTpCall+0x25b) [0x5566851d975b]
15  sglang::scheduler_TP1(+0x198a6b) [0x5566851f0a6b]
16  sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x5642) [0x5566851d28b2]
17  sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x5566851e366c]
18  sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x5566851cda74]
19  sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x5566851e366c]
20  sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x2a83) [0x5566851cfcf3]
21  sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x5566851e366c]
22  sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x5566851cda74]
23  sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x5566851e366c]
24  sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x2a83) [0x5566851cfcf3]
25  sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x5566851e366c]
26  sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x5566851cda74]
27  sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x5566851e366c]
28  sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x5566851cda74]
29  sglang::scheduler_TP1(+0x1989f1) [0x5566851f09f1]
30  sglang::scheduler_TP1(+0x2acfca) [0x556685304fca]
31  sglang::scheduler_TP1(+0x2a28e8) [0x5566852fa8e8]
32  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fbf6b0d2ac3]
33  /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fbf6b163a04]
=================================

Thread 0x00007fb495fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
I0715 07:48:09.423519 14116 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 25.45 MB/s (over last 5s)
[lts-4090:12960:0:14726] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xb0)
Fatal Python error: Segmentation fault

Thread 0x00007fb701fff640 (most recent call first):
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/transfer_engine.py", line 75 in transfer_sync
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 259 in process_layer
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007fb73fffe640 (most recent call first):
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/transfer_engine.py", line 75 in transfer_sync
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 259 in process_layer
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb767fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb77bfff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 245 in as_completed
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 276 in send_kvcache
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 359 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb787fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/common/utils.py", line 24 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 319 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb793fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/common/utils.py", line 24 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 319 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb79ffff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/common/utils.py", line 24 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 319 in transfer_worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb7abfff640 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/socket.py", line 799 in recv_multipart
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 437 in bootstrap_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb863fff640 (most recent call first):
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/managers/scheduler.py", line 1967 in watchdog_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fb86dfff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 320 in wait
  File "/usr/lib/python3.10/queue.py", line 171 in get
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 139 in forward_thread_func_
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 127 in forward_thread_func
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

==== backtrace (tid:  14726) ====
 0  /usr/lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x7fc2700a4fc4]
 1  /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x7fc2700a8fec]
 2  /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x7fc2700a91aa]
 3  /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_GetDataRelBase+0x4) [0x7fc4fa25a984]
 4  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x189) [0x7fc4fa313a49]
 5  /usr/lib/x86_64-linux-gnu/libunwind.so.8(__libunwind_Unwind_Resume+0x129) [0x7fc02801dfe9]
 6  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x89e38) [0x7fc0152f7e38]
 7  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE+0x12cf) [0x7fc0152f13cf]
 8  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport18submitTransferTaskERKSt6vectorIPNS_9Transport15TransferRequestESaIS4_EERKS1_IPNS2_12TransferTaskESaISA_EE+0xd3) [0x7fc0152f1cc3]
 9  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14MultiTransport14submitTransferEmRKSt6vectorINS_9Transport15TransferRequestESaIS3_EE+0x36e) [0x7fc01530663e]
10  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x4b5b5) [0x7fc0152b95b5]
11  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x7979e) [0x7fc0152e779e]
12  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x6a570) [0x7fc0152d8570]
13  sglang::scheduler_TP0(+0x18ae12) [0x56488f9c4e12]
14  sglang::scheduler_TP0(_PyObject_MakeTpCall+0x25b) [0x56488f9bb75b]
15  sglang::scheduler_TP0(+0x198a6b) [0x56488f9d2a6b]
16  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x5642) [0x56488f9b48b2]
17  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56488f9c566c]
18  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56488f9afa74]
19  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56488f9c566c]
20  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x2a83) [0x56488f9b1cf3]
21  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56488f9c566c]
22  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56488f9afa74]
23  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56488f9c566c]
24  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x2a83) [0x56488f9b1cf3]
25  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56488f9c566c]
26  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56488f9afa74]
27  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56488f9c566c]
28  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56488f9afa74]
29  sglang::scheduler_TP0(+0x1989f1) [0x56488f9d29f1]
30  sglang::scheduler_TP0(+0x2acfca) [0x56488fae6fca]
31  sglang::scheduler_TP0(+0x2a28e8) [0x56488fadc8e8]
32  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fc4fc9a6ac3]
33  /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fc4fca37a04]
=================================

Thread 0x00007fba0ffff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
[2025-07-15 07:48:39] Child process unexpectedly failed with exitcode=139. pid=12961
[2025-07-15 07:48:39] Child process unexpectedly failed with exitcode=139. pid=12960

@alogfans
Collaborator Author

I don't know the actual cause, because cudaMemcpy doesn't cause a segfault, and the other modifications are consistent with previous versions. BTW, I have added support for dumping backtrace logs in the C++ part. You can reproduce it using the latest whl package.
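
The Mooncake frames in these backtraces are printed as mangled C++ symbols. One way to read them is with `abi::__cxa_demangle` from `<cxxabi.h>`, available with GCC and Clang. The sketch below is only a reading aid with a hypothetical helper name `demangle`; it is not the backtrace support added in this PR.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: turn an Itanium-ABI mangled symbol into readable C++.
// Returns the input unchanged if it cannot be demangled.
inline std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc; the caller frees it.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);
    return result;
}
```

For example, `demangle("_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE")` gives `mooncake::TcpTransport::startTransfer(mooncake::Transport::Slice*)`, the innermost named Mooncake frame in the backtraces above.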

@ZhenshengWu

[2025-07-15 09:19:55] INFO:     172.16.16.63:57890 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 09:19:55 TP0] Prefill batch. #new-seq: 1, #new-token: 127, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 1228.97
I0715 09:19:55.802920 17460 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 5.26 MB/s (over last 5s)
I0715 09:19:55.803395 17466 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 5.32 MB/s (over last 5s)
I0715 09:20:00.803344 17460 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 1.53 MB/s (over last 5s)
I0715 09:20:00.803800 17466 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 1.36 MB/s (over last 5s)
[2025-07-15 09:20:01] INFO:     172.16.16.63:47096 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 09:20:01 TP0] Prefill batch. #new-seq: 1, #new-token: 128, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 21.73
[2025-07-15 09:20:02] INFO:     172.16.16.63:47102 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 09:20:02 TP0] Prefill batch. #new-seq: 1, #new-token: 122, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 0, input throughput (token/s): 80.89
[2025-07-15 09:20:03] INFO:     172.16.16.63:47104 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 09:20:03 TP0] Prefill batch. #new-seq: 1, #new-token: 128, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 242.79
I0715 09:20:05.803707 17460 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 2.21 MB/s (over last 5s)
I0715 09:20:05.804239 17466 transfer_engine.cpp:424] [Metrics] Transfer Engine Throughput: 2.40 MB/s (over last 5s)
[2025-07-15 09:20:09] INFO:     172.16.16.63:51718 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 09:20:09 TP0] Prefill batch. #new-seq: 1, #new-token: 128, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 0, input throughput (token/s): 21.91
[2025-07-15 09:20:10] INFO:     172.16.16.63:51720 - "POST /generate HTTP/1.1" 200 OK
[2025-07-15 09:20:10 TP0] Prefill batch. #new-seq: 1, #new-token: 125, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 143.14
[lts-4090:16314:0:18012] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:  18012) ====
 0  /usr/lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x7f9854ea5fc4]
 1  /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x7f9854ea9fec]
 2  /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x7f9854eaa1aa]
 3  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad550) [0x7f9ac29f6550]
 4  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad8b1) [0x7f9ac29f68b1]
 5  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x107) [0x7f9ac29f69c7]
 6  /usr/lib/x86_64-linux-gnu/libunwind.so.8(__libunwind_Unwind_Resume+0x129) [0x7f9600019fe9]
 7  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x89e38) [0x7f95dd9f7e38]
 8  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE+0x12cf) [0x7f95dd9f13cf]
 9  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport18submitTransferTaskERKSt6vectorIPNS_9Transport15TransferRequestESaIS4_EERKS1_IPNS2_12TransferTaskESaISA_EE+0xd3) [0x7f95dd9f1cc3]
10  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14MultiTransport14submitTransferEmRKSt6vectorINS_9Transport15TransferRequestESaIS3_EE+0x36e) [0x7f95dda0663e]
11  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x4b5b5) [0x7f95dd9b95b5]
12  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x7979e) [0x7f95dd9e779e]
13  /usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x6a570) [0x7f95dd9d8570]
14  sglang::scheduler_TP0(+0x18ae12) [0x56412e960e12]
15  sglang::scheduler_TP0(_PyObject_MakeTpCall+0x25b) [0x56412e95775b]
16  sglang::scheduler_TP0(+0x198a6b) [0x56412e96ea6b]
17  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x5642) [0x56412e9508b2]
18  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56412e96166c]
19  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56412e94ba74]
20  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56412e96166c]
21  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x2a83) [0x56412e94dcf3]
22  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56412e96166c]
23  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56412e94ba74]
24  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56412e96166c]
25  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x2a83) [0x56412e94dcf3]
26  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56412e96166c]
27  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56412e94ba74]
28  sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x56412e96166c]
29  sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x56412e94ba74]
30  sglang::scheduler_TP0(+0x1989f1) [0x56412e96e9f1]
31  sglang::scheduler_TP0(+0x2acfca) [0x56412ea82fca]
32  sglang::scheduler_TP0(+0x2a28e8) [0x56412ea788e8]
33  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f9ac5089ac3]
34  /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f9ac511aa04]
=================================
Fatal Python error: Segmentation fault

Current thread 0x00007f8cdbfff640 (most recent call first):
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/transfer_engine.py", line 75 in transfer_sync
  File "/home/sglang_0.4.7_dea8aa7/python/sglang/srt/disaggregation/mooncake/conn.py", line 259 in process_layer
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

@alogfans
Collaborator Author

@ZhenshengWu You can try the latest patch.

@ZhenshengWu

> @ZhenshengWu You can try the latest patch.

This fix doesn’t seem to work; the same error still occurs.

@ZhenshengWu

If needed, I can provide you with my test machine and environment to help reproduce the issue. As of now, with this fix ([Fix coredump problem due to slice allocation failed]), the stack error from slice no longer appears.

Comment on lines +31 to +34
#ifdef USE_CUDA
#include <cuda.h>
#include <cuda_runtime.h>
#endif
Collaborator

This requires USE_CUDA, so the release package may not be sufficient, since it hasn't been compiled with CUDA. cc: @xiaguan

Collaborator

Let me try to fix the use_cuda issue in CI.

Collaborator

@ShangmingCai ShangmingCai left a comment

Do we need an env var like MC_FORCE_MNNVL, or will E2E use RDMA first?

@alogfans
Collaborator Author

I'm puzzled why the UCX libucs.so is present in the backtrace. Mooncake doesn't rely on this currently.

@alogfans
Collaborator Author

> Do we need an env var like MC_FORCE_MNNVL, or will E2E use RDMA first?

In the current implementation, we have an env var MC_FORCE_MNNVL.

@ShangmingCai
Collaborator

> > Do we need an env var like MC_FORCE_MNNVL, or will E2E use RDMA first?
>
> In the current implementation, we have an env var MC_FORCE_MNNVL.

@alogfans What I really mean is: should we have an env var MC_FORCE_TCP, in case users want to use TCP for transport even when RDMA is available, so that they can use RDMA for EP and transfer the KVCache over TCP?
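
To make the proposal concrete, here is a minimal sketch of how such an override could be checked. Everything in it is an assumption for illustration: `MC_FORCE_TCP` is only proposed in this thread, and `TransportKind`, `envFlagEnabled`, and `chooseTransport` are hypothetical names that do not come from the Mooncake code base.

```cpp
#include <cstdlib>
#include <string>

enum class TransportKind { Rdma, Tcp };

// Hypothetical: treat the variable as enabled when it is set to anything
// other than an empty string or "0".
inline bool envFlagEnabled(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) return false;
    const std::string v(value);
    return !v.empty() && v != "0";
}

// Hypothetical selection rule: MC_FORCE_TCP overrides RDMA even when RDMA
// devices are available; otherwise RDMA is preferred when present.
inline TransportKind chooseTransport(bool rdma_available) {
    if (envFlagEnabled("MC_FORCE_TCP")) return TransportKind::Tcp;
    return rdma_available ? TransportKind::Rdma : TransportKind::Tcp;
}
```

Under this rule a user could launch with `MC_FORCE_TCP=1` to keep the RDMA NICs free for EP traffic while the KVCache goes over TCP.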

@ShangmingCai
Collaborator

@ZhenshengWu Do you have time to verify this PR? We are about to release v0.3.5 and are wondering whether we should include this PR.

@ZhenshengWu

> @ZhenshengWu Do you have time to verify this PR? We are about to release v0.3.5 and are wondering whether we should include this PR.

I will try to verify this PR end to end

@ZhenshengWu

ZhenshengWu commented Jul 24, 2025

> @ZhenshengWu Do you have time to verify this PR? We are about to release v0.3.5 and are wondering whether we should include this PR.

I tried the latest code, but errors still occur during stress testing, and almost always at a fixed stage of the test.

I0724 06:38:23.078465 50958 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 190.17 MB/s (over last 5s)
I0724 06:38:23.098349 50962 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 189.18 MB/s (over last 5s)
[2025-07-24 06:38:23] INFO:     172.16.16.63:55310 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:23 TP0] Prefill batch. #new-seq: 1, #new-token: 5120, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 0, input throughput (token/s): 2857.05
[2025-07-24 06:38:23] INFO:     172.16.16.63:55326 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:23 TP0] Prefill batch. #new-seq: 1, #new-token: 5120, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 18350.93
[2025-07-24 06:38:25] INFO:     172.16.16.63:55340 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:25 TP0] Prefill batch. #new-seq: 1, #new-token: 5116, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 0, input throughput (token/s): 2831.76
[2025-07-24 06:38:25] INFO:     172.16.16.63:55354 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:25 TP0] Prefill batch. #new-seq: 1, #new-token: 5120, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 18825.14
[2025-07-24 06:38:27] INFO:     172.16.16.63:55356 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:27 TP0] Prefill batch. #new-seq: 1, #new-token: 5120, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 0, input throughput (token/s): 2814.35
[2025-07-24 06:38:27] INFO:     172.16.16.63:47712 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:27 TP0] Prefill batch. #new-seq: 1, #new-token: 5114, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 18786.64
I0724 06:38:28.078872 50958 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 272.80 MB/s (over last 5s)
I0724 06:38:28.098749 50962 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 280.87 MB/s (over last 5s)
[2025-07-24 06:38:29] INFO:     172.16.16.63:47722 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:29 TP0] Prefill batch. #new-seq: 1, #new-token: 5120, #cached-token: 0, token usage: 0.00, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 0, input throughput (token/s): 2785.64
[2025-07-24 06:38:29] INFO:     172.16.16.63:47728 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:29 TP0] Prefill batch. #new-seq: 1, #new-token: 5100, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 18812.22
[2025-07-24 06:38:31] INFO:     172.16.16.63:47738 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:31 TP0] Prefill batch. #new-seq: 1, #new-token: 5120, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 3072.73
I0724 06:38:33.079277 50958 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 130.36 MB/s (over last 5s)
I0724 06:38:33.099143 50962 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 122.99 MB/s (over last 5s)
[2025-07-24 06:38:33] INFO:     172.16.16.63:47754 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:33 TP0] Prefill batch. #new-seq: 1, #new-token: 5108, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 3050.47
I0724 06:38:38.079701 50958 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 35.97 MB/s (over last 5s)
I0724 06:38:38.099504 50962 transfer_engine.cpp:460] [Metrics] Transfer Engine Throughput: 35.67 MB/s (over last 5s)
[2025-07-24 06:38:38] INFO:     172.16.16.63:60296 - "POST /generate HTTP/1.1" 200 OK
[2025-07-24 06:38:38 TP0] Prefill batch. #new-seq: 1, #new-token: 5105, #cached-token: 0, token usage: 0.05, #unbootstrapped-req: 0, #queue-req: 0, #transferring-req: 1, input throughput (token/s): 968.96
Received signal 11
Backtrace (C++):
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake15print_backtraceEv+0x36) [0x7fce89bfeeb6]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14signal_handlerEi+0x2c) [0x7fce89bfefac]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fd353435520]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad550) [0x7fd350df4550]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad8b1) [0x7fd350df48b1]
/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x107) [0x7fd350df49c7]
/lib/x86_64-linux-gnu/libunwind.so.8(__libunwind_Unwind_Resume+0x129) [0x7fce89b23fe9]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x8afa8) [0x7fce89bf3fa8]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE+0x13d3) [0x7fce89bed553]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport18submitTransferTaskERKSt6vectorIPNS_9Transport15TransferRequestESaIS4_EERKS1_IPNS2_12TransferTaskESaISA_EE+0xd3) [0x7fce89bede33]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14MultiTransport14submitTransferEmRKSt6vectorINS_9Transport15TransferRequestESaIS3_EE+0x36e) [0x7fce89c0295e]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x4b702) [0x7fce89bb4702]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x7a7be) [0x7fce89be37be]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x6aa90) [0x7fce89bd3a90]
sglang::scheduler_TP1(+0x18ae12) [0x562c371b1e12]
sglang::scheduler_TP1(_PyObject_MakeTpCall+0x25b) [0x562c371a875b]
sglang::scheduler_TP1(+0x198a6b) [0x562c371bfa6b]
sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x5642) [0x562c371a18b2]
sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x562c371b266c]
sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x562c3719ca74]
sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x562c371b266c]
sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x2a83) [0x562c3719ecf3]
sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x562c371b266c]
sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x562c3719ca74]
sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x562c371b266c]
sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x2a83) [0x562c3719ecf3]
sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x562c371b266c]
sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x562c3719ca74]
sglang::scheduler_TP1(_PyFunction_Vectorcall+0x7c) [0x562c371b266c]
sglang::scheduler_TP1(_PyEval_EvalFrameDefault+0x804) [0x562c3719ca74]
sglang::scheduler_TP1(+0x1989f1) [0x562c371bf9f1]
sglang::scheduler_TP1(+0x2acfca) [0x562c372d3fca]
sglang::scheduler_TP1(+0x2a28e8) [0x562c372c98e8]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fd353487ac3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fd353518a04]
Received signal 6
terminate called after throwing an instance of 'std::system_error'
Backtrace (C++):
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake15print_backtraceEv+0x36) [0x7f56ec481eb6]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14signal_handlerEi+0x2c) [0x7f56ec481fac]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f5bc1ece520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f5bc1f229fc]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f5bc1ece476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3) [0x7f5bc1eb47f3]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2692) [0x7f5bbf882692]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad89f) [0x7f5bbf88d89f]
/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x107) [0x7f5bbf88d9c7]
/lib/x86_64-linux-gnu/libunwind.so.8(__libunwind_Unwind_Resume+0x129) [0x7f5714019fe9]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x8afa8) [0x7f56ec476fa8]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE+0x13d3) [0x7f56ec470553]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport18submitTransferTaskERKSt6vectorIPNS_9Transport15TransferRequestESaIS4_EERKS1_IPNS2_12TransferTaskESaISA_EE+0xd3) [0x7f56ec470e33]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14MultiTransport14submitTransferEmRKSt6vectorINS_9Transport15TransferRequestESaIS3_EE+0x36e) [0x7f56ec48595e]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x4b702) [0x7f56ec437702]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x7a7be) [0x7f56ec4667be]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x6aa90) [0x7f56ec456a90]
sglang::scheduler_TP0(+0x18ae12) [0x5599b7068e12]
sglang::scheduler_TP0(_PyObject_MakeTpCall+0x25b) [0x5599b705f75b]
sglang::scheduler_TP0(+0x198a6b) [0x5599b7076a6b]
sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x5642) [0x5599b70588b2]
sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x5599b706966c]
sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x5599b7053a74]
sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x5599b706966c]
sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x2a83) [0x5599b7055cf3]
sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x5599b706966c]
sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x5599b7053a74]
sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x5599b706966c]
sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x2a83) [0x5599b7055cf3]
sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x5599b706966c]
sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x5599b7053a74]
sglang::scheduler_TP0(_PyFunction_Vectorcall+0x7c) [0x5599b706966c]
sglang::scheduler_TP0(_PyEval_EvalFrameDefault+0x804) [0x5599b7053a74]
sglang::scheduler_TP0(+0x1989f1) [0x5599b70769f1]
sglang::scheduler_TP0(+0x2acfca) [0x5599b718afca]
sglang::scheduler_TP0(+0x2a28e8) [0x5599b71808e8]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f5bc1f20ac3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f5bc1fb1a04]
[2025-07-24 06:38:41] Child process unexpectedly failed with exitcode=256. pid=48836
[2025-07-24 06:38:41] Child process unexpectedly failed with exitcode=256. pid=48835
image

CMD

#P
CUDA_VISIBLE_DEVICES=0,1 MC_TE_METRIC=true SGLANG_TBO_DEBUG=1 python3 -m sglang.launch_server --model-path /models/Qwen/Qwen3-30B-A3B-FP8/ --disaggregation-mode prefill --dist-init-addr  172.16.16.63:9000 --nnodes 1 --node-rank 0 --tp-size 2 --decode-log-interval 1  --page-size 1 --host  172.16.16.63 --trust-remote-code  --disable-radix-cache --watchdog-timeout 1000000  --mem-fraction-static 0.85 --chunked-prefill-size 8192  --enable-metrics --enable-p2p-check --attention-backend fa3 --reasoning-parser qwen3  --port 9001

#D
CUDA_VISIBLE_DEVICES=2,3 SGLANG_TBO_DEBUG=1  python3 -m sglang.launch_server --model-path  /models/Qwen/Qwen3-30B-A3B-FP8/ --disaggregation-mode decode  --dist-init-addr  172.16.16.63:9991 --nnodes 1 --node-rank 0 --tp-size 2 --decode-log-interval 1 --page-size 1 --host   172.16.16.63 --trust-remote-code  --disable-radix-cache --watchdog-timeout 1000000  --mem-fraction-static 0.85 --chunked-prefill-size 8192  --enable-metrics --enable-p2p-check --attention-backend fa3 --port 9002


#test 
python3 -m sglang.bench_serving --backend sglang   --num-prompts 512 --random-input-len 5120 --random-output-len  128 --dataset-name random --dataset-path  /datasets/ShareGPT_V3_unfiltered_cleaned_split.json --seed 42 --host 0.0.0.0 --port 8001 --random-range-ratio 1.0 --max-concurrency 2

I noticed something unusual — the length of the KVCache data being transmitted seems off right before and after the coredump. Below is a comparison between the failing stress test and a normal one.

image image

Additionally, this week we attempted an adaptation on the sglang side. Without modifying MoonCake’s code, we perform device-to-host (D to H) copies on the P side of sglang and host-to-device (H to D) copies on the D side, and transfer the KV cache between the hosts with mooncake-tcp. We've already implemented this, and single curl requests work fine and return correct results. However, we still encounter coredumps during stress testing.

At this point, I’m still unsure whether the issue is introduced by sglang or on the MoonCake side.

@ZhenshengWu

I tested our DtoH and HtoD implementation with the latest sglang code, and it seems that the error no longer occurs. I will soon run tests combining the latest versions of MoonCake-tcp-vram and sglang. The latest test results will be available by tomorrow morning at the latest.

@ZhenshengWu

ZhenshengWu commented Jul 24, 2025

I tested our DtoH and HtoD implementation with the latest sglang code, and it seems that the error no longer occurs. I will soon run tests combining the latest versions of MoonCake-tcp-vram and sglang. The latest test results will be available by tomorrow morning at the latest.

Switching to the latest version of sglang still results in the same error. However, the difference is that the number of completed prompts before the error occurs has increased from 23 to 322.
image

And the adaptation we made on the sglang side—specifically the way DtoH and HtoD are handled—exhibits the same issue. I suspect that under high pressure, when each transfer only contains 512 bytes, the error is likely triggered in mooncake, possibly in:

TcpTransport::startTransfer(Slice *slice){
    .....
    .....
    asio::connect(socket, endpoint_iterator);

}

Of course, this is just my current hypothesis. And I believe this might be an issue that has existed for a long time, not something introduced by this PR. This PR LGTM
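
To make the hypothesis concrete: the snippet above calls asio::connect inside startTransfer(Slice *), which suggests one new TCP connection per slice. The following self-contained Python sketch (not Mooncake code; all function names are hypothetical, and it uses only a loopback sink) counts how many connections a sender opens when it connects per 512-byte chunk versus when it reuses one connection. Whether Mooncake caches connections elsewhere is not shown in this thread, so treat it as an illustration of the suspected pattern only.

```python
import socket
import threading


def run_sink(server, accepted, done):
    """Accept connections one at a time, drain each, and count them."""
    server.settimeout(0.2)
    while True:
        try:
            conn, _ = server.accept()
        except socket.timeout:
            if done.is_set():
                return  # sender finished and no connection is pending
            continue
        accepted.append(1)
        with conn:
            while conn.recv(65536):
                pass  # discard the payload until the peer closes


def send_per_chunk(addr, chunks):
    """Open a fresh TCP connection for every chunk."""
    for chunk in chunks:
        with socket.create_connection(addr) as sock:
            sock.sendall(chunk)


def send_reused(addr, chunks):
    """Send every chunk over a single long-lived connection."""
    with socket.create_connection(addr) as sock:
        for chunk in chunks:
            sock.sendall(chunk)


def count_connections(sender, chunks):
    """Run `sender` against a local sink and return how many connections it opened."""
    accepted, done = [], threading.Event()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(128)
        sink = threading.Thread(target=run_sink, args=(server, accepted, done))
        sink.start()
        try:
            sender(server.getsockname(), chunks)
        finally:
            done.set()
            sink.join()
    return len(accepted)


chunks = [bytes(512)] * 50
per_chunk = count_connections(send_per_chunk, chunks)
reused = count_connections(send_reused, chunks)
print(per_chunk, reused)
```

The per-chunk sender opens as many connections as there are chunks, so with very small chunks the connection setup cost and the number of short-lived sockets grow with the transfer size rather than staying constant.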

@yangelaboy

I have the same problem even when I install Mooncake from the branch alogfans:tcp-vram.

#P
CUDA_VISIBLE_DEVICES=3 MC_FORCE_TCP=1 MC_TE_METRIC=1 SGLANG_TBO_DEBUG=1 python3 -m sglang.launch_server --model-path /models/Qwen2-VL-2B-Instruct --disaggregation-mode prefill --nnodes 1 --node-rank 0 --tp-size 1 --decode-log-interval 1  --page-size 1 --host  0.0.0.0 --trust-remote-code  --disable-radix-cache --watchdog-timeout 1000000  --mem-fraction-static 0.85 --chunked-prefill-size 8192  --enable-metrics --enable-p2p-check --attention-backend torch_native  --port 9001 &

#D
CUDA_VISIBLE_DEVICES=4 MC_FORCE_TCP=1 MC_TE_METRIC=1 SGLANG_TBO_DEBUG=1 python3 -m sglang.launch_server --model-path /models/Qwen2-VL-2B-Instruct --disaggregation-mode decode --nnodes 1 --node-rank 0 --tp-size 1 --decode-log-interval 1 --page-size 1 --host 0.0.0.0 --trust-remote-code  --disable-radix-cache --watchdog-timeout 1000000  --mem-fraction-static 0.85 --chunked-prefill-size 8192  --enable-metrics --enable-p2p-check --attention-backend torch_native --port 9002 &

#router

python3 -m sglang.srt.disaggregation.mini_lb --prefill http://0.0.0.0:9001 --decode http://0.0.0.0:9002 --host 0.0.0.0 --port 8000 &

The error is as follows:

image image

@ZhenshengWu

ZhenshengWu commented Jul 25, 2025

error is as follow:

image image

@yangelaboy It seems that mlx5_bond0 may not actually be an InfiniBand (IB) card, but Mooncake mistakenly recognizes it as one. As a result, it defaults to using the RDMA protocol instead of the TCP protocol. You should set the MC_FORCE_TCP environment variable to force the use of the TCP protocol and then continue testing.
beb3230

@yangelaboy

@ZhenshengWu I have set MC_FORCE_TCP=1, but it does not seem to work. I will disable RDMA and try again.

image

@ZhenshengWu

@yangelaboy
image
To test this, I think you can modify the code here (transfer_engine.cpp) and recompile Mooncake to force it to use the TCP protocol directly.

@ZhenshengWu

ZhenshengWu commented Jul 25, 2025

I’m fairly certain the issue lies in the TCP transmission on the MoonCake side, and I’ve found a minimal reproducible case. I mocked a P service and a D service, then used a curl request to send the CPU address registered in D to P. On the P side, I directly simulated the KVCache transmission and modified the byte length of the KV being sent. We found that when the length is set to 512 bytes, the error consistently occurs.
image

When I set the length to 630784 bytes, it works.
image
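
For a sense of scale, the number of transfer_sync calls that the send_kv loop in mock_p.py below would issue can be worked out for both chunk sizes. This is a back-of-the-envelope sketch: `transfer_calls` is a hypothetical helper, and it assumes the per-buffer size is the 44436787 bytes allocated in register_kv and the 32 buffers used in the CMD section.

```python
def transfer_calls(buf_size, chunk_size, num_buffers):
    """Number of per-chunk transfers issued for the given sizes."""
    # Integer division mirrors the loop in send_kv: a trailing partial chunk is skipped.
    chunks_per_buffer = buf_size // chunk_size
    return chunks_per_buffer * num_buffers


BUF_SIZE = 44436787  # per-buffer size allocated in register_kv (assumed here)
NUM_BUFFERS = 32     # value passed on the command line in the CMD section

for chunk_size in (512, 630784):
    print(chunk_size, transfer_calls(BUF_SIZE, chunk_size, NUM_BUFFERS))
```

Under these assumptions the 512-byte setting issues roughly three orders of magnitude more transfers than the 630784-byte setting, which fits the observation that only the small-chunk run fails.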

mock_p.py

from fastapi import FastAPI, APIRouter, Body
from pydantic import BaseModel
from typing import List
import struct
from transfer_engine import MooncakeTransferEngine  # assumes transfer_engine.py (shown below) is importable
import numpy as np
import ctypes
from concurrent.futures import ThreadPoolExecutor
from loguru import logger as logger_my

import torch
class InputIPPort(BaseModel):
    ip: str
    port: int


class KVManager:
    def __init__(self):
        self.kv_data_ptrs: List[int] = []


class MyAPI:
    def __init__(self, hostname: str, gpu_id: int, num_buffers: int, mem_type: str = "cpu"):
        self.app = FastAPI()
        self.num_buffers = num_buffers
        self.mem_type = mem_type
        self.router = APIRouter()
        self.kv_mgr = KVManager()
        self.setup_routes()
        self.app.include_router(self.router)
        self.engine = MooncakeTransferEngine(hostname, gpu_id, "", )  
        self.register_kv()
        self.d_seesion_id = None

    def setup_routes(self):
        @self.router.put("/set_ip_port")
        async def set_ip_port(data: InputIPPort):
            # self.engine could be called here to set up the communication peer
            self.d_seesion_id = data.ip + ":" + str(data.port)
            logger_my.info("self.d_seesion_id:{}".format(self.d_seesion_id))
            return {"message": f"Received IP: {data.ip}, Port: {data.port}"}

        @self.router.put("/register_kv_ptrs")
        async def upload_kv_ptrs(raw_data: bytes = Body(...)):
            num_ptrs = len(raw_data) // 8
            fmt = f"{num_ptrs}Q"  # Q = unsigned long long (8 bytes)
            raw_ptrs = struct.unpack(fmt, raw_data)

            # Explicitly convert to Python int
            logger_my.info("raw_ptrs:{}".format(raw_ptrs))
            self.kv_mgr.kv_data_ptrs = [int(ptr) for ptr in raw_ptrs]

            return {
                "message": f"Received {num_ptrs} pointers",
                "ptrs": self.kv_mgr.kv_data_ptrs  
            }

        @self.router.post("/send_kv")
        async def trigger_send_kv():
            self.send_kv()
            return {"message": "KV buffers sent from local to remote successfully."}


    def register_kv(self):
        buf_size = 44436787  

        self.kv_buffers = []
        self.kv_data_ptrs = []

        for _ in range(self.num_buffers):
            if self.mem_type == "cpu":
                data = np.random.randint(0, 256, size=buf_size, dtype=np.uint8)
                buf = (ctypes.c_ubyte * buf_size)()
                ctypes.memmove(buf, data.ctypes.data, buf_size)
                ptr = ctypes.cast(buf, ctypes.c_void_p).value
                logger_my.info("ptr:{}  length:{}".format(ptr, buf_size))
                self.kv_buffers.append(buf)
                self.kv_data_ptrs.append(ptr)
            elif self.mem_type == "gpu":
                buf = torch.randint(0, 256, (buf_size,), dtype=torch.uint8, device='cuda')
                ptr = buf.data_ptr()  # get the GPU device pointer
                logger_my.info("GPU ptr:{}  length:{}".format(ptr, buf_size))
                self.kv_buffers.append(buf)
                self.kv_data_ptrs.append(ptr)

    def send_kv(self):
        if len(self.kv_data_ptrs) != len(self.kv_mgr.kv_data_ptrs):
            raise ValueError("Local and remote KV pointer lists have different lengths.")

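        buf_size = 44436787  # must match the size allocated in register_kv; a larger value addresses memory past the end of each buffer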
        chunk_size = 512
        total_chunks = buf_size // chunk_size

        tasks = []

        with ThreadPoolExecutor(max_workers=32) as executor:
            for local_ptr, remote_ptr in zip(self.kv_data_ptrs, self.kv_mgr.kv_data_ptrs):
                for i in range(total_chunks):
                    offset = i * chunk_size
                    src = local_ptr + offset
                    dest = remote_ptr + offset

                    # Submit the transfer task
                    tasks.append(executor.submit(self.send, src, dest, chunk_size))

        # Optional: wait for all tasks to complete
        for task in tasks:
            task.result()

        print("All KV transfers completed.")

    def send(self, src, dest, length=512):
        logger_my.info("src:{}, dest:{}, length:{}".format(src, dest, length))
        self.engine.transfer_sync(self.d_seesion_id, src, dest, length)

# For example, in main.py or a startup script
if __name__ == "__main__":
    import uvicorn
    import sys

    # Read hostname and gpu_id from the command line; the defaults can be changed
    hostname = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    gpu_id = int(sys.argv[2]) if len(sys.argv) > 2 else 0
    port = int(sys.argv[3]) if len(sys.argv) > 3 else 8001
    num_buffers  = int(sys.argv[4]) if len(sys.argv) > 4 else 16
    mem_type = str(sys.argv[5]) if len(sys.argv) > 5 else "cpu"

    client = MyAPI(hostname, gpu_id, num_buffers, mem_type)
    uvicorn.run(client.app, host="0.0.0.0", port=port)

mock_d.py

import numpy as np
import ctypes
import struct
import requests
from fastapi import FastAPI, APIRouter, Body
from pydantic import BaseModel
from typing import List
from transfer_engine import MooncakeTransferEngine

import torch
from loguru import  logger as logger_my

class InputIPPort(BaseModel):
    ip: str
    port: int


class ClientAPI:
    def __init__(self, hostname: str, gpu_id: int, num_buffers, mem_type: str = "cpu"):
        self.app = FastAPI()
        self.num_buffer = num_buffers
        self.mem_type = mem_type
        self.router = APIRouter()
        self.kv_buffers: List[ctypes.Array] = []
        self.kv_data_ptrs: List[int] = []
        self.remote_url: str = ""
        self.engine = MooncakeTransferEngine(hostname, gpu_id, "", )  
        self.register_kv()
        self.setup_routes()
        self.app.include_router(self.router)



    def register_kv(self):
        buf_size = 44436787  

        self.kv_buffers = []
        self.kv_data_ptrs = []

        for _ in range(self.num_buffer):

            if self.mem_type == "cpu":

                data = np.random.randint(0, 256, size=buf_size, dtype=np.uint8)
                buf = (ctypes.c_ubyte * buf_size)()
                ctypes.memmove(buf, data.ctypes.data, buf_size)
                ptr = ctypes.cast(buf, ctypes.c_void_p).value
                logger_my.info("ptr:{}  length:{}".format(ptr, buf_size))
                self.kv_buffers.append(buf)
                self.kv_data_ptrs.append(ptr)
            elif self.mem_type == "gpu":
                buf = torch.randint(0, 256, (buf_size,), dtype=torch.uint8, device='cuda')
                ptr = buf.data_ptr()  # get the GPU device pointer
                logger_my.info("GPU ptr:{}  length:{}".format(ptr, buf_size))
                self.kv_buffers.append(buf)
                self.kv_data_ptrs.append(ptr)

    def send_session_id_to_remote(self):
        if not self.remote_url:
            raise ValueError("Remote URL not set")

        session_id = self.engine.session_id  
        ip, port = session_id.split(":")
        payload = {"ip": ip, "port": int(port)}
        print(f"Sending session ID to remote: {payload}")

        resp = requests.put(f"{self.remote_url}/set_ip_port", json=payload)
        print(f"Response from remote: {resp.status_code} - {resp.text}")

    def send_kv_ptrs_to_remote(self):
        if not self.remote_url:
            raise ValueError("Remote URL not set")

        packed = b''.join(struct.pack("Q", ptr) for ptr in self.kv_data_ptrs)
        logger_my.info(packed)
        print(f"Sending {len(self.kv_data_ptrs)} KV pointers ({len(packed)} bytes)")

        resp = requests.put(f"{self.remote_url}/register_kv_ptrs", data=packed,
                            headers={"Content-Type": "application/octet-stream"})
        print(f"Response from remote: {resp.status_code} - {resp.text}")

    def setup_routes(self):
        @self.router.put("/set_remote")
        async def set_remote(data: InputIPPort):
            self.remote_url = f"http://{data.ip}:{data.port}"
            logger_my.info("remote_url:{}".format(self.remote_url))
            self.send_session_id_to_remote()

            return {"message": f"Remote URL set to {self.remote_url}"}

        @self.router.post("/send_kv_ptrs")
        async def send_kv_ptrs():
            self.send_kv_ptrs_to_remote()
            return {"message": "KV pointers sent to remote."}


# Start the service
if __name__ == "__main__":
    import uvicorn
    import sys

    # Read hostname and gpu_id from the command line; the defaults can be changed
    hostname = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    gpu_id = int(sys.argv[2]) if len(sys.argv) > 2 else 0
    port = int(sys.argv[3]) if len(sys.argv) > 3 else 8001
    num_buffers  = int(sys.argv[4]) if len(sys.argv) > 4 else 16
    mem_type = str(sys.argv[5]) if len(sys.argv) > 5 else "cpu"

    client = ClientAPI(hostname, gpu_id, num_buffers, mem_type)
    uvicorn.run(client.app, host="0.0.0.0", port=port)
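
The pointer hand-off between mock_d.py and mock_p.py packs each buffer address as an 8-byte unsigned integer and unpacks it on the other side. A minimal round-trip sketch of that encoding follows. `pack_ptrs` and `unpack_ptrs` are hypothetical helper names, and the explicit little-endian prefix `<` is an assumption added here (the scripts above use the native-order format `Q`).

```python
import struct


def pack_ptrs(ptrs):
    """Pack a list of addresses as consecutive little-endian uint64 values."""
    return struct.pack(f"<{len(ptrs)}Q", *ptrs)


def unpack_ptrs(raw):
    """Inverse of pack_ptrs; rejects payloads that are not a multiple of 8 bytes."""
    if len(raw) % 8 != 0:
        raise ValueError("payload length must be a multiple of 8 bytes")
    return list(struct.unpack(f"<{len(raw) // 8}Q", raw))


ptrs = [0x7F0000001000, 0x7F0000002000]
payload = pack_ptrs(ptrs)
print(len(payload), unpack_ptrs(payload) == ptrs)
```

Fixing the byte order makes the payload independent of the host that produced it, which matters once P and D run on different machines.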

transfer_engine.py is copied from sglang

import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


class MooncakeTransferEngine:

    def __init__(self, hostname: str, gpu_id: int, ib_device: Optional[str] = None):
        try:
            from mooncake.engine import TransferEngine
        except ImportError as e:
            raise ImportError(
                "Please install mooncake by following the instructions at "
                "https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/build.md "  # noqa: E501
                "to run SGLang with MooncakeTransferEngine."
            ) from e

        self.engine = TransferEngine()
        self.hostname = hostname
        self.gpu_id = gpu_id
        self.ib_device = ib_device

        self.initialize(
            hostname=self.hostname,
            device_name=self.ib_device,
        )
        self.session_id = f"{self.hostname}:{self.engine.get_rpc_port()}"

    def register(self, ptr, length):
        try:
            ret_value = self.engine.register_memory(ptr, length)
        except Exception:
            # Mark register as failed
            ret_value = -1

        if ret_value != 0:
            logger.debug("Mooncake memory registration %s failed.", ptr)

    def deregister(self, ptr):
        try:
            ret_value = self.engine.unregister_memory(ptr)
        except Exception:
            # Mark deregister as failed
            ret_value = -1

        if ret_value != 0:
            logger.debug("Mooncake memory deregistration %s failed.", ptr)

    def initialize(
        self,
        hostname: str,
        device_name: Optional[str],
    ) -> None:
        """Initialize the mooncake instance."""
        ret_value = self.engine.initialize(
            hostname,
            "P2PHANDSHAKE",
            "rdma",
            device_name if device_name is not None else "",
        )
        if ret_value != 0:
            logger.error("Mooncake Transfer Engine initialization failed.")
            raise RuntimeError("Mooncake Transfer Engine initialization failed.")

    def transfer_sync(
        self, session_id: str, buffer: int, peer_buffer_address: int, length: int
    ) -> int:
        """Synchronously transfer data to the specified address."""
        try:

            ret = self.engine.transfer_sync_write(
                session_id, buffer, peer_buffer_address, length
            )
        except Exception:
            ret = -1

        if ret < 0:
            logger.debug(
                "Failed to transfer data from %s to %s - %s.",
                buffer,
                session_id,
                peer_buffer_address,
            )

        return ret

    def get_session_id(self):
        return self.session_id

CMD

# p
python3 mock_p.py 172.16.16.63 0 8003 32 gpu
 
# d
python3 mock_d.py 172.16.16.63 2 8002 32 gpu


curl  -X PUT http://172.16.16.63:8002/set_remote -H "Content-Type: application/json" -d '{"ip": "172.16.16.63", "port": 8003}'
curl -X POST http://172.16.16.63:8002/send_kv_ptrs
curl -X POST http://172.16.16.63:8003/send_kv

@alogfans
Collaborator Author

@ZhenshengWu Try pulling the code again. I fixed it yesterday.

@ZhenshengWu

@ZhenshengWu Try pulling the code again. I fixed it yesterday.

@alogfans Sorry, do you mean the latest commit is "beb3230ddd271b227bc3770b600498057aa83e51"? I am testing on beb3230 now.
image

@ZhenshengWu

@yangelaboy Did you test end to end again?

@yangelaboy

@ZhenshengWu Starting the P/D instances works, but the following error occurs when an HTTP request is triggered:
image

  • command to start P
CUDA_VISIBLE_DEVICES=6 MC_FORCE_TCP=1 MC_TE_METRIC=true SGLANG_TBO_DEBUG=1 python3 -m sglang.launch_server --model-path /Qwen2-VL-2B-Instruct --disaggregation-mode prefill --nnodes 1 --node-rank 0 --tp-size 1 --decode-log-interval 1  --page-size 1 --host  0.0.0.0 --trust-remote-code  --disable-radix-cache --watchdog-timeout 1000000  --mem-fraction-static 0.85 --chunked-prefill-size 8192  --enable-metrics --enable-p2p-check --attention-backend torch_native --port 9001 &
  • command to start D
CUDA_VISIBLE_DEVICES=5 MC_FORCE_TCP=1 SGLANG_TBO_DEBUG=1  python3 -m sglang.launch_server --model-path /models/Qwen2-VL-2B-Instruct/ --disaggregation-mode decode  --nnodes 1 --node-rank 0 --tp-size 1 --decode-log-interval 1 --page-size 1 --host   0.0.0.0 --trust-remote-code  --disable-radix-cache --watchdog-timeout 1000000  --mem-fraction-static 0.85 --chunked-prefill-size 8192  --enable-metrics --enable-p2p-check --attention-backend torch_native --port 9002 &
  • command to start router
python3 -m sglang.srt.disaggregation.mini_lb --prefill http://0.0.0.0:9001 --decode http://0.0.0.0:9002 --host 0.0.0.0 --port 8000 &
  • command to trigger request
curl http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{    "model": "uiagent",    "messages": [    {"role": "system", "content": "You are a helpful assistant."},    {"role": "user", "content": [        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},        {"type": "text", "text": "What is the text in the illustrate?"}    ]}    ]    }'

@yangelaboy

More info as follows:

tcp        0      0 10.38.244.193:23715     0.0.0.0:*               LISTEN      20833/sglang::sched 
tcp        0      0 0.0.0.0:8998            0.0.0.0:*               LISTEN      19906/python3       
tcp        0      0 0.0.0.0:9001            0.0.0.0:*               LISTEN      19906/python3       
tcp        0      0 0.0.0.0:9002            0.0.0.0:*               LISTEN      20701/python3       
tcp        0      0 0.0.0.0:16330           0.0.0.0:*               LISTEN      20833/sglang::sched 
tcp        0      0 10.38.244.193:31533     0.0.0.0:*               LISTEN      20038/sglang::sched 
tcp        0      0 0.0.0.0:15791           0.0.0.0:*               LISTEN      20038/sglang::sched 
tcp        0      0 0.0.0.0:16049           0.0.0.0:*               LISTEN      20038/sglang::sched 
tcp        0      0 10.38.244.193:16851     0.0.0.0:*               LISTEN      20038/sglang::sched 
tcp        0      0 10.38.244.193:24725     0.0.0.0:*               LISTEN      20038/sglang::sched 
tcp        0      0 10.38.244.193:26869     0.0.0.0:*               LISTEN      20038/sglang::sched 
tcp        0      0 0.0.0.0:16055           0.0.0.0:*               LISTEN      20833/sglang::sched 
tcp        0      0 10.38.244.193:36411     0.0.0.0:*               LISTEN      20833/sglang::sched 
tcp        0      0 10.38.244.193:64925     0.0.0.0:*               LISTEN      20833/sglang::sched 
tcp        0      0 10.38.244.193:25885     0.0.0.0:*               LISTEN      20833/sglang::sched 
tcp        0      0 10.38.244.193:20989     0.0.0.0:*               LISTEN      20038/sglang::sched 
tcp        0      0 10.38.244.193:31071     0.0.0.0:*               LISTEN      20833/sglang::sched 
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      21489/python3       
tcp6       0      0 :::9571                 :::*                    LISTEN      20038/sglang::sched 
tcp6       0      0 :::8998                 :::*                    LISTEN      19906/python3       
tcp6       0      0 :::9533                 :::*                    LISTEN      20833/sglang::sched

Session 10.38.244.193:16330 failed.
Failed to send kv chunk of 1667001290960120928 to 10.38.244.193:64925

image

@ZhenshengWu

ZhenshengWu commented Jul 28, 2025

@yangelaboy This still seems to be an issue with the registration information when MoonCake starts up. Would it be convenient to connect via WeChat for further discussion? If possible, could you please send your WeChat ID to my email: 1910433006@email.szu.edu.cu? Thank you!

@yangelaboy

@ZhenshengWu done, please check the email

@yangelaboy

@ZhenshengWu Sending the email failed, please check the email address: 1910433006@email.szu.edu.cu

@stmatengss
Collaborator

@ZhenshengWu @yangelaboy Could you help us verify this PR and provide feedback? Thanks!

@yangelaboy

@stmatengss I am working on it.
It seems that the prefill instance transfers data to the P2P handshake port of the decode instance, not to the port its TcpTransport listens on:

# prefill log 

Transfer Engine parseHostNameWithPort. server_name: 10.38.244.193 port: 15442
Transfer Engine RPC using P2P handshake, listening on 10.38.244.193:16115
TcpTransport: listen on port 16285

# decode log 
Transfer Engine parseHostNameWithPort. server_name: 10.38.244.193 port: 12001
Transfer Engine RPC using P2P handshake, listening on 10.38.244.193:15194
TcpTransport: listen on port **15793**

# prefill sending log
Register KVArgs from 10.38.244.193:15194 successfully
Failed to transfer data from 140367617853952 to 10.38.244.193:**15194**
Failed to transfer data from 140367382972928 to 10.38.244.193:**15194**
Session 10.38.244.193:**15194** failed
Failed to send kv chunk of xxx to 10.38.244.193:**53249**
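
The port mismatch described above can be checked mechanically. The following is a minimal sketch, not part of Mooncake or SGLang: the helper names (`handshake_port`, `tcp_port`, `failed_target_ports`, `diagnose`) are hypothetical, and the regular expressions assume the log line formats shown in this comment, with the `**` highlighting markers removed.

```python
import re

# Log lines copied from the comment above, with the ** highlighting removed.
DECODE_LOG = """\
Transfer Engine RPC using P2P handshake, listening on 10.38.244.193:15194
TcpTransport: listen on port 15793
"""
PREFILL_LOG = """\
Failed to transfer data from 140367617853952 to 10.38.244.193:15194
Failed to transfer data from 140367382972928 to 10.38.244.193:15194
"""


def handshake_port(log):
    """Return the P2P handshake port announced in a log, or None if absent."""
    m = re.search(r"P2P handshake, listening on \S+:(\d+)", log)
    return int(m.group(1)) if m else None


def tcp_port(log):
    """Return the TcpTransport listen port announced in a log, or None if absent."""
    m = re.search(r"TcpTransport: listen on port (\d+)", log)
    return int(m.group(1)) if m else None


def failed_target_ports(log):
    """Collect the destination ports of all failed transfers in a log."""
    pattern = r"Failed to transfer data from \d+ to \S+:(\d+)"
    return {int(port) for port in re.findall(pattern, log)}


def diagnose(decode_log, prefill_log):
    """Report which decode-side port the failed transfers were aimed at."""
    targets = failed_target_ports(prefill_log)
    return {
        "targets": targets,
        "hit_handshake_port": handshake_port(decode_log) in targets,
        "hit_tcp_port": tcp_port(decode_log) in targets,
    }


if __name__ == "__main__":
    print(diagnose(DECODE_LOG, PREFILL_LOG))
```

For the logs above, the failed transfers target port 15194 (the handshake port) rather than 15793 (the TcpTransport port), which matches the observation.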

@ZhenshengWu

ZhenshengWu commented Jul 28, 2025

@ZhenshengWu Sending the email failed, please check the email address: 1910433006@email.szu.edu.cu

@yangelaboy sorry: 1910433006@email.szu.edu.cn

@ZhenshengWu

@ZhenshengWu @yangelaboy Could you help us verify this PR and provide feedback? Thanks!

@stmatengss We’ve been conducting end-to-end testing for the past two weeks, and errors are still occurring.

SGLang end-to-end test

I0728 06:46:11.288584 145851 tcp_transport.cpp:496] Resolving 172.16.16.63:15740
I0728 06:46:11.288594 145851 tcp_transport.cpp:501] Attempting to connect to 172.16.16.63:15740
terminate called after throwing an instance of 'std::system_error'
  what():  connect: Cannot assign requested address
Received signal 6
I0728 06:46:11.289353 145853 tcp_transport.cpp:506] Successfully connected to 172.16.16.63:15740
Received signal 11
I0728 06:46:11.289366 145853 tcp_transport.cpp:521] Initiating session with source_addr: 0x7f3c6c880800, dest_addr: 140089473385472, length: 1536, opcode: 1
I0728 06:46:11.289405 144947 tcp_transport.cpp:512] Transfer completed for slice targeting: 1
I0728 06:46:11.289423 145853 tcp_transport.cpp:473] TcpTransport::startTransfer started for target_id: 1
I0728 06:46:11.289427 145853 tcp_transport.cpp:496] Resolving 172.16.16.63:15740
I0728 06:46:11.289431 145853 tcp_transport.cpp:501] Attempting to connect to 172.16.16.63:15740
Backtrace (C++):
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake15print_backtraceEv+0x36) [0x7f5e966b1336]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14signal_handlerEi+0x2c) [0x7f5e966b142c]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f648bb3b520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f648bb8f9fc]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f648bb3b476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3) [0x7f648bb217f3]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e) [0x7f64894efb9e]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f64894fb20c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad1e9) [0x7f64894fa1e9]
/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x99) [0x7f64894fa959]
/lib/x86_64-linux-gnu/libunwind.so.8(__libunwind_Unwind_Resume+0x129) [0x7f5fc0017fe9]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x8b3e8) [0x7f5e966a63e8]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE+0x15c5) [0x7f5e9669f905]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport18submitTransferTaskERKSt6vectorIPNS_9Transport15TransferRequestESaIS4_EERKS1_IPNS2_12TransferTaskESaISA_EE+0xd3) [0x7f5e966a01b3]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14MultiTransport14submitTransferEmRKSt6vectorINS_9Transport15TransferRequestESaIS3_EE+0x36e) [0x7f5e966b4dde]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x4b7c2) [0x7f5e966667c2]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x7a87e) [0x7f5e9669587e]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x6ab50) [0x7f5e96685b50]

Single test

Using the standalone simulated PD processes I mentioned in my previous comment (mock_d.py, mock_p.py) to transfer the KV cache, the error message is as follows:

I0728 07:30:03.210448 146717 tcp_transport.cpp:496] Resolving 172.16.16.63:15620
I0728 07:30:03.210454 146717 tcp_transport.cpp:501] Attempting to connect to 172.16.16.63:15620
Received signal 11
Received signal 11
terminate called after throwing an instance of 'std::system_error'
  what():  connect: Cannot assign requested address
Received signal 6
Received signal 11
Backtrace (C++):
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake15print_backtraceEv+0x36) [0x7fb3bcf43336]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14signal_handlerEi+0x2c) [0x7fb3bcf4342c]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fb52c210520]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_GetDataRelBase+0x4) [0x7fb52bcd3984]
/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x189) [0x7fb52867ca49]
/lib/x86_64-linux-gnu/libunwind.so.8(__libunwind_Unwind_Resume+0x129) [0x7fb52b405fe9]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x8b3e8) [0x7fb3bcf383e8]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport13startTransferEPNS_9Transport5SliceE+0x15c5) [0x7fb3bcf31905]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake12TcpTransport18submitTransferTaskERKSt6vectorIPNS_9Transport15TransferRequestESaIS4_EERKS1_IPNS2_12TransferTaskESaISA_EE+0xd3) [0x7fb3bcf321b3]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(_ZN8mooncake14MultiTransport14submitTransferEmRKSt6vectorINS_9Transport15TransferRequestESaIS3_EE+0x36e) [0x7fb3bcf46dde]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x4b7c2) [0x7fb3bcef87c2]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x7a87e) [0x7fb3bcf2787e]
/usr/local/lib/python3.10/dist-packages/mooncake/engine.cpython-310-x86_64-linux-gnu.so(+0x6ab50) [0x7fb3bcf17b50]
python3(+0x18ae12) [0x55f1ed7cae12]
python3(_PyObject_MakeTpCall+0x25b) [0x55f1ed7c175b]
python3(+0x198a6b) [0x55f1ed7d8a6b]
python3(_PyEval_EvalFrameDefault+0x5642) [0x55f1ed7ba8b2]
python3(_PyFunction_Vectorcall+0x7c) [0x55f1ed7cb66c]
python3(_PyEval_EvalFrameDefault+0x804) [0x55f1ed7b5a74]
python3(+0x1988de) [0x55f1ed7d88de]
python3(_PyEval_EvalFrameDefault+0x2a83) [0x55f1ed7b7cf3]
python3(_PyFunction_Vectorcall+0x7c) [0x55f1ed7cb66c]
python3(_PyEval_EvalFrameDefault+0x804) [0x55f1ed7b5a74]
python3(_PyFunction_Vectorcall+0x7c) [0x55f1ed7cb66c]
python3(_PyEval_EvalFrameDefault+0x2a83) [0x55f1ed7b7cf3]
python3(_PyFunction_Vectorcall+0x7c) [0x55f1ed7cb66c]
python3(_PyEval_EvalFrameDefault+0x804) [0x55f1ed7b5a74]
python3(_PyFunction_Vectorcall+0x7c) [0x55f1ed7cb66c]
python3(_PyEval_EvalFrameDefault+0x804) [0x55f1ed7b5a74]
python3(+0x1989f1) [0x55f1ed7d89f1]
python3(+0x2acfca) [0x55f1ed8ecfca]
python3(+0x2a28e8) [0x55f1ed8e28e8]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fb52c262ac3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fb52c2f3a04]

If I set the length to 264000, it works normally.

@stmatengss
Collaborator

@alogfans, could you take a look? Thanks!

@stmatengss
Collaborator

@ZhenshengWu Use MC_LOG_LEVEL. This option can be set to TRACE/INFO/WARNING/ERROR (see the glog documentation), and more detailed logs will be output at runtime.
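
As a minimal sketch of how this could be applied from a Python launcher script: the variable name `MC_LOG_LEVEL` and its four values come from the comment above, while the helper `set_mooncake_log_level` is hypothetical. It is an assumption here that the variable has to be set before the transfer engine is initialized in the same process for it to take effect.

```python
import os

# The variable name and its accepted values are taken from the comment above.
VALID_LEVELS = ("TRACE", "INFO", "WARNING", "ERROR")


def set_mooncake_log_level(level):
    """Validate a level name and export it as MC_LOG_LEVEL for this process."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(
            f"unsupported level {level!r}, expected one of {VALID_LEVELS}"
        )
    os.environ["MC_LOG_LEVEL"] = level
    return level


# Assumption: call this before the engine is imported and initialized,
# so that the engine sees the variable when it configures logging.
set_mooncake_log_level("trace")
```

Child processes started afterwards inherit the variable, so the same call would also cover workers spawned by the launcher.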

@alogfans
Collaborator Author

alogfans commented Aug 1, 2025

This PR is deprecated. I've re-implemented it, and you can try #702 instead. @ZhenshengWu

@alogfans alogfans closed this Aug 1, 2025
@ZhenshengWu

This PR is deprecated. I've re-implemented it, and you can try #702 instead. @ZhenshengWu

Got it!

@ZhenshengWu

@ZhenshengWu Use MC_LOG_LEVEL. This option can be set to TRACE/INFO/WARNING/ERROR (see the glog documentation), and more detailed logs will be output at runtime.

Got it, thanks.
