add decode round robin policy #15164

Merged
iforgetmyname merged 3 commits into sgl-project:main from Hexq0210:dpc_decode_round_robin
Dec 22, 2025

Hexq0210 (Contributor) commented Dec 15, 2025

Motivation

In the PD (prefill/decode) separation scenario with the DeepSeek-R1 model, stress testing shows that requests are distributed unevenly across the DP channels of the decode instance. (The following figure shows the stress test.)
[Figure: per-DP-channel request load on the decode instance]

To solve this problem, this PR adds a round-robin scheduling policy for the decode instance in the PD separation scenario. The test result is as follows:
[Figure: test result with the round-robin policy]

Modifications

Add the `decode_round_robin` policy, which enables round-robin scheduling on the decode instance.
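
For readers unfamiliar with the idea, here is a minimal sketch of round-robin dispatch across DP channels. It is not the code from this PR: the class `RoundRobinDispatcher`, its method `pick_worker`, and the helper `dispatch_counts` are hypothetical names chosen for illustration, and the sketch assumes each incoming request is assigned to exactly one DP channel identified by an integer index.

```python
from collections import Counter
from itertools import cycle


class RoundRobinDispatcher:
    """Hypothetical dispatcher that assigns requests to DP channels in rotation."""

    def __init__(self, num_workers: int) -> None:
        if num_workers <= 0:
            raise ValueError("num_workers must be positive")
        # cycle() yields 0, 1, ..., num_workers - 1 and then starts over.
        self._rotation = cycle(range(num_workers))

    def pick_worker(self) -> int:
        """Return the index of the DP channel that receives the next request."""
        return next(self._rotation)


def dispatch_counts(num_workers: int, num_requests: int) -> Counter:
    """Count how many requests each channel receives under round-robin."""
    dispatcher = RoundRobinDispatcher(num_workers)
    return Counter(dispatcher.pick_worker() for _ in range(num_requests))


if __name__ == "__main__":
    counts = dispatch_counts(num_workers=4, num_requests=10)
    for worker, n in sorted(counts.items()):
        print(f"DP channel {worker}: {n} requests")
```

Under this scheme the request counts of any two channels differ by at most one, regardless of arrival order. It balances the number of requests only; it does not account for differences in request length or decode duration.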

Accuracy Tests

NA

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@iforgetmyname (Collaborator)

/tag-and-rerun-ci

@iforgetmyname iforgetmyname merged commit cb30d05 into sgl-project:main Dec 22, 2025
147 of 152 checks passed
Liwansi added a commit to iforgetmyname/sglang that referenced this pull request Dec 23, 2025
…n_eagle3_dp

* 'main' of https://github.com/sgl-project/sglang: (208 commits)
  MoE: Skip SiLU/GELU activation for masked experts (sgl-project#15539)
  [GLM-ASR] GLM-ASR Support  (sgl-project#15570)
  Improve engine customization interface (sgl-project#15635)
  chore: bump sgl-kernel version to 0.3.20 (sgl-project#15590)
  bugfix[schedule]: Refactor sort method and add related UT (sgl-project#13576)
  Adjust wrong `mtp` meaning introduce by mimo (sgl-project#15632)
  Tiny add back missing router per attempt response metric (sgl-project#15621)
  Fix router gRPC mode launch error caused by async loading (sgl-project#15368)
  [model-gateway] return 503 when all workers are circuit-broken (sgl-project#15611)
  [Diffusion] Support peak memory record in offline generate and serving (sgl-project#15610)
  [VLM] Tiny: Unify VLM environment variables (sgl-project#15572)
  [diffusion] chore: remove default post-denoising dit offload in local mode (sgl-project#15573)
  Tiny enable soft watchdog in CI for stuck without logs (sgl-project#15616)
  Tiny add stuck simulation (sgl-project#15613)
  Support soft watchdog for tokenizer/detokenizer/dp-controller processes (sgl-project#15607)
  Tiny avoid EnvField misuse (sgl-project#15612)
  add decode round robin policy (sgl-project#15164)
  Add glm-4.6-fp8 with/without mtp in nightly ci (sgl-project#15566)
  Adapt fixture-kit to gsm8k mixin (sgl-project#15599)
  [model-gateway] add retry support to OpenAI router chat endpoint (sgl-project#15589)
  ...
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
@jovany-wang

Nice work!

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026