
Reimplement VRAM buffering in TCP transport#702

Merged
ShangmingCai merged 5 commits into kvcache-ai:main from alogfans:reimplement-tcp-vram
Aug 6, 2025

Conversation

@alogfans
Collaborator

No description provided.

Review threads: mooncake-transfer-engine/src/transport/tcp_transport/tcp_transport.cpp (outdated); mooncake-transfer-engine/src/transfer_engine.cpp
Collaborator

@staryxchen left a comment


LGTM

```cpp
char *dram_buffer = addr + total_transferred_bytes_;

#ifdef USE_CUDA
    if (isCudaMemory(addr)) {
```
Collaborator


We need to invoke this check once for every transfer. Why not move it to the memory registration phase?

Collaborator Author


The memory type check is simpler than looking up the memory registration table.

Collaborator

@stmatengss left a comment


Only one concern; the rest looks good to me.

@ZhenshengWu

@alogfans @ShangmingCai I have completed thorough testing on the RTX 4090, and the output is correct with no issues under stress testing. The previous coredump problem I encountered was resolved after changing the sglang pagesize parameter from 1 to 32. If it is still set to 1, the error persists.

@ShangmingCai
Collaborator

> @alogfans @ShangmingCai I have completed thorough testing on the RTX 4090, and the output is correct with no issues under stress testing. The previous coredump problem I encountered was resolved after changing the sglang pagesize parameter from 1 to 32. If it is still set to 1, the error persists.

@ZhenshengWu Thx for the info! Guess we can merge this now and figure out later where (maybe in sglang) the coredump should be fixed.

@ShangmingCai ShangmingCai merged commit 2eadcef into kvcache-ai:main Aug 6, 2025
10 checks passed
@purp1e-ace

I tried to use TCP as the mooncake backend to do PD disaggregation with sglang, but I still got some errors. Here are my environment, start scripts, and logs.

Here is my environment:

Python: 3.12.11 | packaged by Anaconda, Inc. | (main, Jun  5 2025, 13:09:17) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 550.144.03
PyTorch: 2.7.1+cu128
sglang: 0.4.8.post1+dev0.8
sgl_kernel: 0.1.9
flashinfer_python: 0.2.6.post1
triton: 3.3.1
transformers: 4.52.3
torchao: 0.9.0
numpy: 2.2.6
aiohttp: 3.12.14
fastapi: 0.116.1
hf_transfer: 0.1.9
huggingface_hub: 0.34.1
interegular: 0.3.3
modelscope: 1.28.1
orjson: 3.11.1
outlines: 0.1.11
packaging: 24.2
psutil: 7.0.0
pydantic: 2.11.7
python-multipart: 0.0.20
pyzmq: 27.0.0
uvicorn: 0.35.0
uvloop: 0.21.0
vllm: 0.10.0
xgrammar: 0.1.19
openai: 1.90.0
tiktoken: 0.9.0
anthropic: 0.60.0
litellm: 1.74.15.post1
decord: 0.6.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    CPU Affinity      NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    PIX     NODE    SYS     SYS     SYS     SYS     0-105   0N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    PIX     NODE    SYS     SYS     SYS     SYS     0-105   0N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    NODE    PIX     SYS     SYS     SYS     SYS     0-105   0N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    NODE    PIX     SYS     SYS     SYS     SYS     0-105   0N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    SYS     SYS     PIX     NODE    NODE    NODE    106-211 1N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    SYS     SYS     PIX     NODE    NODE    NODE    106-211 1N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    SYS     SYS     NODE    PIX     NODE    NODE    106-211 1N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      SYS     SYS     NODE    PIX     NODE    NODE    106-211 1N/A
NIC0    PIX     PIX     NODE    NODE    SYS     SYS     SYS     SYS      X      NODE    SYS     SYS     SYS     SYS
NIC1    NODE    NODE    PIX     PIX     SYS     SYS     SYS     SYS     NODE     X      SYS     SYS     SYS     SYS
NIC2    SYS     SYS     SYS     SYS     PIX     PIX     NODE    NODE    SYS     SYS      X      NODE    NODE    NODE
NIC3    SYS     SYS     SYS     SYS     NODE    NODE    PIX     PIX     SYS     SYS     NODE     X      NODE    NODE
NIC4    SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    SYS     SYS     NODE    NODE     X      PIX
NIC5    SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    SYS     SYS     NODE    NODE    PIX      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_1
  NIC1: mlx5_2
  NIC2: mlx5_3
  NIC3: mlx5_4
  NIC4: mlx5_5
  NIC5: mlx5_6


Hypervisor vendor: KVM
ulimit soft: 65536

Start script for P:

export SGL_ENABLE_JIT_DEEPGEMM=1
export NCCL_DEBUG_ENABLE=false
export NCCL_IB_DISABLE=0
export NCCL_P2P_DISABLE=0
export NCCL_IB_HCA=mlx5
export GLOO_SOCKET_IFNAME=eth0
export NCCL_SOCKET_IFNAME=eth0
export SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=600
export MC_FORCE_TCP=1

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_HOME/lib
export NCCL_NET_MERGE_LEVEL="LOC"

export MC_TE_METRIC=true
export NCCL_DEBUG_ENABLE=true
export NCCL_DEBUG=DEBUG 

nohup python -m sglang.launch_server --model-path /data/data90/hf-models/DeepSeek-R1 --served-model-name DeepSeek-R1 \
    --disaggregation-mode prefill --trust-remote-code --enable-nccl-nvls \
    --tp 8 --mem-fraction-static 0.88 --host 10.148.4.114 --port 30000 \
    --max-prefill-tokens 16384 --chunked-prefill-size 16384 \
    --disaggregation-decode-tp 16 --disaggregation-decode-dp 16 \
    --watchdog-timeout 1000000 \
    --page-size 1024 --disable-radix-cache \
    --log-level debug --pd-sgl-router-url 10.148.4.114:8080 \
    --tool-call-parser deepseekv3 > output.log 2>&1 &

D node0:

export SGL_ENABLE_JIT_DEEPGEMM=1
export NCCL_DEBUG_ENABLE=false
export NCCL_IB_DISABLE=0
export NCCL_P2P_DISABLE=0
export NCCL_IB_HCA=mlx5
export GLOO_SOCKET_IFNAME=eth0
export NCCL_SOCKET_IFNAME=eth0
export SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=600
export MC_FORCE_TCP=1

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_HOME/lib
export NCCL_NET_MERGE_LEVEL="LOC"

export MC_TE_METRIC=true
export NCCL_DEBUG_ENABLE=true
export NCCL_DEBUG=DEBUG 

nohup python -m sglang.launch_server --model-path /data/data90/hf-models/DeepSeek-R1 --served-model-name DeepSeek-R1 \
    --host 10.148.12.136 --port 30001 --trust-remote-code --enable-nccl-nvls \
    --dist-init-addr 10.148.12.136:29500 --nnodes 2 --node-rank 0 \
    --tp-size 16 --mem-fraction-static 0.75 \
    --disaggregation-mode decode \
    --enable-dp-lm-head \
    --enable-dp-attention --dp 16 \
    --enable-deepep-moe --moe-dense-tp-size 1 --enable-two-batch-overlap --deepep-mode low_latency \
    --enable-eplb --ep-dispatch-algorithm dynamic --eplb-algorithm auto \
    --watchdog-timeout 1000000 \
    --chunked-prefill-size 16384 \
    --page-size 1024 --disable-radix-cache \
    --log-level debug --pd-sgl-router-url 10.148.4.114:8080 \
    --tool-call-parser deepseekv3 > output.log 2>&1 &

D node1:

export SGL_ENABLE_JIT_DEEPGEMM=1
export NCCL_DEBUG_ENABLE=false
export NCCL_IB_DISABLE=0
export NCCL_P2P_DISABLE=0
export NCCL_IB_HCA=mlx5
export GLOO_SOCKET_IFNAME=eth0
export NCCL_SOCKET_IFNAME=eth0
export SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=600
export MC_FORCE_TCP=1

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_HOME/lib
export NCCL_NET_MERGE_LEVEL="LOC"

export MC_TE_METRIC=true
export NCCL_DEBUG_ENABLE=true
export NCCL_DEBUG=DEBUG 

nohup python -m sglang.launch_server --model-path /data/data90/hf-models/DeepSeek-R1 --served-model-name DeepSeek-R1 \
    --host 10.148.4.216 --port 30001 --trust-remote-code --enable-nccl-nvls \
    --dist-init-addr 10.148.12.136:29500 --nnodes 2 --node-rank 1 \
    --tp-size 16 --mem-fraction-static 0.75 \
    --disaggregation-mode decode \
    --enable-dp-lm-head \
    --enable-dp-attention --dp 16 \
    --enable-deepep-moe --moe-dense-tp-size 1 --enable-two-batch-overlap --deepep-mode low_latency \
    --enable-eplb --ep-dispatch-algorithm dynamic --eplb-algorithm auto \
    --watchdog-timeout 1000000 \
    --chunked-prefill-size 16384 \
    --page-size 1024 --disable-radix-cache \
    --log-level debug \
    --tool-call-parser deepseekv3 > output.log 2>&1 &

Here are the logs.
tcp_decode0.log
tcp_decode1.log
tcp_prefill.log

Should I just upgrade the sglang version?

@stmatengss
Collaborator

> I try to use tcp as mooncake backend to do pd disaggregation with sglang, but I still got some errors. Here is my environment, start script and logs.
>
> […]
>
> Should I just upgrade sglang version?

There are no code changes for this feature. The error messages are below (from batch_transfer_sync):

[2025-08-08 15:45:33] 10.148.4.216 [08/Aug/2025:15:45:33 +0800] "GET /health HTTP/1.1" 200 155 "-" "python-requests/2.32.4"
Fatal Python error: Segmentation fault

Thread 0x00007f4d75ffd640 (most recent call first):
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/disaggregation/mooncake/transfer_engine.py", line 102 in batch_transfer_sync
  File "/opt/app/python3.12/lib/python3.12/site-packages/sglang/srt/disaggregation/mooncake/conn.py", line 272 in process_layer
  File "/opt/app/python3.12/lib/python3.12/concurrent/futures/thread.py", line 59 in run
  File "/opt/app/python3.12/lib/python3.12/concurrent/futures/thread.py", line 93 in _worker
  File "/opt/app/python3.12/lib/python3.12/threading.py", line 1012 in run
  File "/opt/app/python3.12/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/opt/app/python3.12/lib/python3.12/threading.py", line 1032 in _bootstrap

Could you take a look? @alogfans

@stmatengss
Collaborator

Did you export MC_FORCE_TCP before running sglang? @purp1e-ace

@purp1e-ace

> Did you export MC_FORCE_TCP before running sglang? @purp1e-ace

Yes, this error only occurs after I export MC_FORCE_TCP=1.

@stmatengss
Collaborator

> Did you export MC_FORCE_TCP before running sglang? @purp1e-ace
>
> Yes, this error only occurs after I export MC_FORCE_TCP=1

You can generate more mooncake logs by setting export MC_LOG_LEVEL=TRACE. On the other hand, since this is a segmentation fault, could you check the system log using dmesg?
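For anyone reproducing this, a minimal sketch of the debugging setup suggested in this thread (MC_FORCE_TCP and MC_LOG_LEVEL are the variable names used above; the core-dump line is an additional general suggestion, not something prescribed in this thread):

```shell
# Force Mooncake's TCP transport and enable its most verbose logging
# before launching sglang, as discussed in this thread.
export MC_FORCE_TCP=1
export MC_LOG_LEVEL=TRACE

# Allow core dumps so a segfault leaves something to inspect
# even when dmesg is unavailable (e.g. inside a k8s pod).
ulimit -c unlimited
```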

@purp1e-ace

> Did you export MC_FORCE_TCP before running sglang? @purp1e-ace
>
> Yes, this error only occurs after I export MC_FORCE_TCP=1
>
> You can generate more mooncake logs by setting export MC_LOG_LEVEL=TRACE. On the other hand, I see a segmentation fault; could you check the system log using dmesg?

The logs remain the same after I export MC_LOG_LEVEL=TRACE. Also, because I'm working in a k8s pod, I cannot read the kernel buffer. Do you have any other advice? Is there anything else I can do?

@wl-lei

wl-lei commented Oct 16, 2025

> Did you export MC_FORCE_TCP before running sglang? @purp1e-ace
>
> Yes, this error only occurs after I export MC_FORCE_TCP=1
>
> You can generate more mooncake logs by setting export MC_LOG_LEVEL=TRACE. On the other hand, I see a segmentation fault; could you check the system log using dmesg?
>
> The logs remain the same after I export MC_LOG_LEVEL=TRACE. Also, because I'm working in a k8s pod, I cannot read the kernel buffer. Do you have any other advice?

Did you solve the problem? I have the same issue.

wanyue-wy pushed a commit to wanyue-wy/Mooncake that referenced this pull request Dec 14, 2025
* Re-implement vram support

* Test logging

* Remove CUDA logging line

* Add comments

* Avoid memcpy if addr is dram
JasonZhang517 pushed a commit to JasonZhang517/Mooncake that referenced this pull request Feb 9, 2026
* Re-implement vram support

* Test logging

* Remove CUDA logging line

* Add comments

* Avoid memcpy if addr is dram


7 participants