Motivation
Title
Performance regression between sglang 0.5.5 and 0.5.6+ when using Mooncake
Description
I would like to report a performance regression observed after upgrading sglang from 0.5.5 to 0.5.6 or 0.5.7 while keeping Mooncake at 0.3.7.
We are running an online service with sglang + Mooncake and applying sustained benchmark traffic.
Observed Behavior
sglang 0.5.5 + Mooncake 0.3.7
Under continuous benchmark workload:
- TTFT is relatively low (per-round average between 0.29s and 1.20s in the benchmark results below)
- Memory usage of the scheduler process keeps increasing over time
sglang 0.5.6 / 0.5.7 + Mooncake 0.3.7
With the Mooncake version unchanged:
- Scheduler memory no longer shows continuous growth
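- TTFT increases significantly (per-round average reaches 5.54s by Round 11 in the 0.5.6 results below)
- Cache hit rate drops after Round 2 and then stays low (at or below about 0.41 in the 0.5.6 results below)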
The behavior observed in sglang 0.5.7 is consistent with 0.5.6.
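The scheduler memory trend described above could be quantified by sampling the process's resident set size over time. The following is only a minimal sketch, assuming a Linux host with `/proc` available and that the scheduler process PID has been identified separately; the helper names `rss_kib` and `sample_rss` are hypothetical and not part of sglang or Mooncake.

```python
import re
import time


def rss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status.

    Returns None if the field is missing (e.g. for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid, interval_s=60.0, samples=10):
    """Yield (unix_time, rss_kib) pairs for the given PID.

    Assumption: `pid` is the sglang scheduler process, found by the
    operator (for example with `ps`); this sketch does not locate it.
    """
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            yield time.time(), rss_kib(f.read())
        time.sleep(interval_s)
```

Logging these samples alongside the benchmark rounds on both versions would make it easier to line up the memory growth on 0.5.5 against the hit-rate drop on 0.5.6+.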
Reproduction Conditions
Environment Variables
export MOONCAKE_TE_META_DATA_SERVER="etcd://etcdn1.mooncake-c1.dns.org:2379;etcd://etcdn2.mooncake-c1.dns.org:2379;etcd://etcdn3.mooncake-c1.dns.org:2379;etcd://etcdn4.mooncake-c1.dns.org:2379;etcd://etcdn5.mooncake-c1.dns.org:2379"
export MOONCAKE_MASTER="etcd://etcdn1.mooncake-c1.dns.org:2379;etcd://etcdn2.mooncake-c1.dns.org:2379;etcd://etcdn3.mooncake-c1.dns.org:2379;etcd://etcdn4.mooncake-c1.dns.org:2379;etcd://etcdn5.mooncake-c1.dns.org:2379"
export MOONCAKE_PROTOCOL="tcp"
export MOONCAKE_DEVICE=""
export MOONCAKE_GLOBAL_SEGMENT_SIZE=0
Launch Command
nohup python3 -m sglang.launch_server \
--model-path Qwen/Qwen1.5-1.8B-Chat \
--trust-remote-code \
--host 0.0.0.0 \
--port 30000 \
--mem-fraction-static 0.85 \
--enable-hierarchical-cache \
--hicache-size 100 \
--page-size 64 \
--hicache-io-backend kernel \
--hicache-mem-layout page_first \
--hicache-storage-backend mooncake &
Benchmark Results Comparison
Below are benchmark results collected under identical workload, environment, and configuration settings.
sglang 0.5.5 + Mooncake 0.3.7
Round 0: Average TTFT = 0.29s, Cache Hit Rate = 0.000000 (48 requests)
Round 1: Average TTFT = 0.29s, Cache Hit Rate = 0.495324 (48 requests)
Round 2: Average TTFT = 0.40s, Cache Hit Rate = 0.664739 (48 requests)
Round 3: Average TTFT = 0.41s, Cache Hit Rate = 0.744979 (48 requests)
Round 4: Average TTFT = 0.51s, Cache Hit Rate = 0.715385 (48 requests)
Round 5: Average TTFT = 0.93s, Cache Hit Rate = 0.557634 (48 requests)
Round 6: Average TTFT = 1.20s, Cache Hit Rate = 0.599504 (48 requests)
Round 7: Average TTFT = 0.98s, Cache Hit Rate = 0.634969 (48 requests)
Round 8: Average TTFT = 0.61s, Cache Hit Rate = 0.888961 (48 requests)
Round 9: Average TTFT = 0.54s, Cache Hit Rate = 0.899401 (48 requests)
Round 10: Average TTFT = 0.94s, Cache Hit Rate = 0.909295 (48 requests)
Round 11: Average TTFT = 1.05s, Cache Hit Rate = 0.916223 (48 requests)
Overall, cache hit rate trends upward, dipping in Rounds 4 to 7 before reaching about 0.92 by Round 11, while average TTFT stays between 0.29s and 1.20s.
sglang 0.5.6 + Mooncake 0.3.7
Round 0: Average TTFT = 0.29s, Cache Hit Rate = 0.000000 (48 requests)
Round 1: Average TTFT = 0.29s, Cache Hit Rate = 0.494886 (48 requests)
Round 2: Average TTFT = 0.41s, Cache Hit Rate = 0.663523 (48 requests)
Round 3: Average TTFT = 0.90s, Cache Hit Rate = 0.409318 (48 requests)
Round 4: Average TTFT = 1.88s, Cache Hit Rate = 0.051335 (48 requests)
Round 5: Average TTFT = 2.38s, Cache Hit Rate = 0.000000 (48 requests)
Round 6: Average TTFT = 1.81s, Cache Hit Rate = 0.317494 (48 requests)
Round 7: Average TTFT = 2.37s, Cache Hit Rate = 0.287596 (48 requests)
Round 8: Average TTFT = 2.66s, Cache Hit Rate = 0.304387 (48 requests)
Round 9: Average TTFT = 2.88s, Cache Hit Rate = 0.321815 (48 requests)
Round 10: Average TTFT = 3.33s, Cache Hit Rate = 0.310904 (48 requests)
Round 11: Average TTFT = 5.54s, Cache Hit Rate = 0.197224 (48 requests)
Compared to 0.5.5, TTFT increases significantly starting from Round 3 (0.90s versus 0.41s) and reaches 5.54s by Round 11, while cache hit rate drops sharply, falling to 0 in Round 5 and staying below 0.33 in all subsequent rounds.
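For anyone comparing their own runs against the numbers above, the per-round lines can be parsed and summarized with a short script. This is a sketch that assumes the benchmark output follows exactly the `Round N: Average TTFT = ...s, Cache Hit Rate = ...` format shown in this issue; `parse_rounds` and `summarize` are hypothetical helper names, not part of any benchmark tool.

```python
import re
from statistics import mean

# Matches lines such as:
# Round 3: Average TTFT = 0.90s, Cache Hit Rate = 0.409318 (48 requests)
ROUND_RE = re.compile(
    r"Round (\d+): Average TTFT = ([\d.]+)s, Cache Hit Rate = ([\d.]+)"
)


def parse_rounds(log_text):
    """Return a list of (round_index, ttft_seconds, hit_rate) tuples."""
    return [
        (int(r), float(t), float(h))
        for r, t, h in ROUND_RE.findall(log_text)
    ]


def summarize(rounds, start=0):
    """Return (mean TTFT, mean hit rate) over rounds with index >= start."""
    picked = [(t, h) for r, t, h in rounds if r >= start]
    return mean(t for t, _ in picked), mean(h for _, h in picked)


if __name__ == "__main__":
    sample = (
        "Round 3: Average TTFT = 0.90s, Cache Hit Rate = 0.409318 (48 requests)\n"
        "Round 4: Average TTFT = 1.88s, Cache Hit Rate = 0.051335 (48 requests)\n"
    )
    print(summarize(parse_rounds(sample), start=3))
```

Calling `summarize(rounds, start=3)` on each version's log would compare only the rounds where the two versions diverge.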
Expected / Questions
I am opening this issue mainly to ask the following two questions:
1. Cause of behavioral differences
What causes the behavioral differences between sglang 0.5.5 and sglang 0.5.6+ in this setup?
In particular, are there any known changes related to the scheduler, hierarchical cache, or Mooncake integration that could explain:
- the disappearance of scheduler memory growth, and
- the significant increase in TTFT and drop in cache hit rate?
2. Configuration or tuning for newer versions
Is it possible to use sglang 0.5.6+ while achieving TTFT comparable to older versions (e.g., 0.5.5)?
If so, are there recommended configuration changes or tuning options that should be applied when upgrading?
Additional Context
- The issue is observed under long-running, sustained benchmark traffic, not short or bursty tests.
- The benchmark workload, runtime environment, and startup configuration are identical across all tested versions.