Bug Report
I was testing SGLang with HiCache + Mooncake L3 (DFS enabled).
The test deployment is an sgl router + two SGLang servers with HiCache + Mooncake L3 (+ persistent storage). I ran this test inside a Docker container.
Server setup:
# mooncake master
mooncake_master --enable_http_metadata_server=true --http_metadata_server_port=8080 --eviction_high_watermark_ratio=0.95 --root_fs_dir /mnt/data-cbs
export MOONCAKE_TE_META_DATA_SERVER="http://127.0.0.1:8080/metadata"
export MOONCAKE_MASTER="127.0.0.1:50051"
export MOONCAKE_PROTOCOL="tcp"
export MOONCAKE_DEVICE=""
export MOONCAKE_GLOBAL_SEGMENT_SIZE="16gb" # each sglang server contributes 16 GB
# sglang server
CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --model-path Qwen3-8B --tp 1 --mem-fraction-static 0.6 --watchdog-timeout 1000 --host 0.0.0.0 --port 30001 --enable-hierarchical-cache --hicache-storage-backend mooncake
CUDA_VISIBLE_DEVICES=1 python -m sglang.launch_server --model-path Qwen3-8B --tp 1 --mem-fraction-static 0.6 --watchdog-timeout 1000 --host 0.0.0.0 --port 30002 --enable-hierarchical-cache --hicache-storage-backend mooncake
# router
python -m sglang_router.launch_router \
--worker-urls http://127.0.0.1:30001 http://127.0.0.1:30002 \
--policy round_robin \
--host 0.0.0.0 --port 30000
benchmark clients:
# cd /sgl-workspace/sglang/benchmark/hicache
python bench_multiturn.py --model-path /root/workspace/data_dir/Qwen3-8B --host 127.0.0.1 --disable-auto-run --request-rate 16 --request-length 1024 --output-length 64
Master service error log (it seems to be reported while discarding replicas):
E0306 11:12:10.776895 116419 replica.h:281] Invalid replica type: DISK
E0306 11:12:10.776903 116419 replica.h:281] Invalid replica type: DISK
E0306 11:12:10.776911 116419 replica.h:281] Invalid replica type: DISK
E0306 11:12:10.776917 116419 replica.h:281] Invalid replica type: DISK
E0306 11:12:10.776937 116419 replica.h:281] Invalid replica type: DISK
E0306 11:12:10.776943 116419 replica.h:281] Invalid replica type: DISK
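To gauge how often the master emits this error, the log lines can be tallied. The following is a minimal sketch, assuming the output of mooncake_master was redirected to a file; the file name master.log and both function names are hypothetical and not part of Mooncake.

```shell
#!/usr/bin/env bash
# Assumption: the mooncake_master output was saved to a file (for example master.log).

# Print the number of "Invalid replica type: DISK" lines in the given log file.
count_replica_errors() {
    local log_file="$1"
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that exit code.
    grep -c 'Invalid replica type: DISK' "$log_file" || true
}

# Print "<count> <HH:MM:SS>" for each second in which the error was logged.
# The second whitespace-separated field of each log line is the timestamp,
# e.g. 11:12:10.776895, so the fractional part is cut off before counting.
replica_errors_per_second() {
    local log_file="$1"
    grep 'Invalid replica type: DISK' "$log_file" \
        | awk '{ split($2, t, "."); print t[1] }' \
        | sort | uniq -c
}
```

Usage would be `count_replica_errors master.log` for the total and `replica_errors_per_second master.log` to see whether the errors arrive in bursts.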
client-side log:
#Output tokens: 65536
78%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 995/1280 [03:36<00:52, 5.48it/s]Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
79%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 1010/1280 [03:41<01:09, 3.86it/s]Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Error processing response for client 222: Request failed with error: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Error processing response for client 103: Request failed with error: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Request failed: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
Error processing response for client 17: Request failed with error: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
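To quantify the client-side failures, the benchmark output can be summarized. This is a rough sketch, assuming the output of bench_multiturn.py was captured to a file (for example with `2>&1 | tee bench.log`); the file name and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the benchmark output was captured to a file such as bench.log.

# Count every "Request failed: " occurrence. grep -o is used because the
# progress bar can share a line with an error message.
count_failed_requests() {
    local log_file="$1"
    grep -o 'Request failed: ' "$log_file" | wc -l | tr -d ' '
}

# List the distinct client ids that reported a processing error, sorted numerically.
failed_client_ids() {
    local log_file="$1"
    grep -o 'Error processing response for client [0-9]*' "$log_file" \
        | awk '{ print $NF }' \
        | sort -n | uniq
}
```

Comparing these numbers with the timestamps of the master-side errors may help show whether the two symptoms are related.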
Is this error expected? The client side also reports errors, which harms the user experience.
Also, I did not limit the SSD max segment size usage, so why is disk storage being discarded?
Before submitting...