[sglang] chore: Upgrade SGLang 0.4.9 with multi-stage awake feature by hebiao064 · Pull Request #2187 · verl-project/verl

hebiao064 · 2025-06-24T21:08:54Z

What does this PR do?

Upgrade SGLang to 0.4.9 with the multi-stage awake feature.

I built the SGLang Dockerfile based on https://github.com/volcengine/verl/blob/main/docker/verl0.5-cu126-torch2.7.1-fa2.8.0/Dockerfile.app.sglang.mcore0.12

(Only a two-line change, bumping sglang from 0.4.8 to 0.4.9.)

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

GRPO Test with SGLang works as expected

(TaskRunner pid=3043098) step:5 - global_seqlen/min:53826.000 - global_seqlen/max:59368.000 - global_seqlen/minmax_diff:5542.000 - global_seqlen/balanced_min:55522.000 - global_seqlen/balanced_max:55551.000 - global_seqlen/mean:55525.625 - actor/entropy:0.152 - actor/kl_loss:0.003 - actor/kl_coef:0.001 - actor/pg_loss:0.039 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.098 - perf/mfu/actor:0.332 - perf/max_memory_allocated_gb:25.143 - perf/max_memory_reserved_gb:44.846 - perf/cpu_memory_used_gb:79.942 - actor/lr:0.000 - val-core/openai/gsm8k/reward/mean@1:0.895 - training/global_step:5.000 - training/epoch:0.000 - critic/score/mean:0.925 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.925 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.008 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.008 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:247.129 - response_length/max:878.000 - response_length/min:93.000 - response_length/clip_ratio:0.000 - prompt_length/mean:99.906 - prompt_length/max:256.000 - prompt_length/min:66.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:5.777 - timing_s/reshard:1.795 - timing_s/gen:8.290 - timing_s/reward:0.225 - timing_s/old_log_prob:1.973 - timing_s/ref:1.997 - timing_s/adv:0.021 - timing_s/update_actor:7.820 - timing_s/testing:10.669 - timing_s/step:31.018 - timing_per_token_ms/gen:0.026 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.004 - timing_per_token_ms/update_actor:0.018 - perf/total_num_tokens:444205.000 - perf/time_per_step:31.018 - perf/throughput:1790.105
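
The step log above packs every metric into one line. A minimal sketch of turning such a line into a dictionary, assuming the ` - `-separated `key:value` layout shown above holds for every step; `parse_metrics_line` is a hypothetical helper for illustration, not part of verl:

```python
import re


def parse_metrics_line(line: str) -> dict[str, float]:
    """Parse a ' - '-separated 'key:value' metrics line into a dict of floats."""
    # Drop a leading "(TaskRunner pid=...)" style prefix if one is present.
    line = re.sub(r"^\(\w+ pid=\d+\)\s*", "", line.strip())
    metrics: dict[str, float] = {}
    for field in line.split(" - "):
        # Split on the last colon; metric names use '/' and '@', not ':'.
        key, sep, value = field.rpartition(":")
        if not sep:
            continue  # not a key:value fragment
        try:
            metrics[key] = float(value)
        except ValueError:
            continue  # non-numeric value, skipped in this sketch
    return metrics


# Shortened sample taken from the log line above.
sample = (
    "(TaskRunner pid=3043098) step:5 - actor/entropy:0.152 - "
    "critic/advantages/mean:-0.008 - perf/throughput:1790.105"
)
parsed = parse_metrics_line(sample)
print(parsed["perf/throughput"])
```

Splitting on ` - ` (with surrounding spaces) keeps negative values such as `-0.008` intact, since the minus sign there is not followed by a space.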

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace.

eric-haibin-lin

does installing sglang now require nvcc?

hebiao064 · 2025-06-26T21:12:48Z

does installing sglang now require nvcc?

The newer version of SGLang relies on flashinfer 0.2.6, which no longer provides AOT binaries as previous versions did: flashinfer-ai/flashinfer#1064

So we may need nvcc to make sure flashinfer works. I think vllm also needs flashinfer. Is it possible to bake nvcc into the CI?
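
A minimal sketch of a pre-flight check a CI job could run to see whether nvcc is available. It only looks for `nvcc` on `PATH`; whether flashinfer needs it at install time or at first kernel use is not stated in this thread, and the helper names `find_nvcc` and `nvcc_version` are hypothetical:

```python
import shutil
import subprocess
from typing import Optional


def find_nvcc() -> Optional[str]:
    """Return the full path to nvcc if it is on PATH, else None."""
    return shutil.which("nvcc")


def nvcc_version(nvcc_path: str) -> str:
    """Return the text printed by `nvcc --version`."""
    result = subprocess.run(
        [nvcc_path, "--version"],
        capture_output=True,
        text=True,
        check=False,  # report whatever nvcc prints instead of raising
    )
    return result.stdout.strip()


nvcc = find_nvcc()
if nvcc is None:
    print("nvcc not found on PATH; JIT-compiled kernels may fail to build")
else:
    print(nvcc_version(nvcc))
```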

zhaochenyang20

Great

eric-haibin-lin

good to go as long as CI tests pass

zhaochenyang20 · 2025-07-01T16:12:36Z

Please wait until we fix Megatron for VLM.

zhaochenyang20 · 2025-07-06T23:55:26Z

Great job so far.

hebiao064 · 2025-07-07T20:25:09Z

flash-attn 2.8.0.post2 (required by torch 2.7) has some issues with the apply rotary kernel: huggingface/transformers#39167 (comment)

So I'll disable the CI for the qwen2.5vl sglang test: e2e_ppo_trainer_sglang_vlm

hebiao064 · 2025-07-29T18:49:51Z

Duplicate of #2794

Upgrade SGLang 0.4.8

44bb4c0

hebiao064 requested review from PeterSH6, chenhaiq, eric-haibin-lin, tongyx361, vermouth1992 and zhaochenyang20 as code owners June 24, 2025 21:08

hebiao064 added 3 commits June 25, 2025 20:31

fix docker

5e8842f

fix

9df6216

Revert "fix docker"

2eb279d

hebiao064 mentioned this pull request Jun 26, 2025

[sglang] feat: Support SGLang v0.4.8 #2206

Closed

eric-haibin-lin reviewed Jun 26, 2025

hebiao064 added 3 commits June 26, 2025 15:39

Merge branch 'volcengine:main' into upgrade_sglang

54603ab

fix

d829083

fix

d170de8

zhaochenyang20 approved these changes Jun 27, 2025

eric-haibin-lin approved these changes Jun 27, 2025

fix

4adb876

hebiao064 mentioned this pull request Jun 28, 2025

[rollout] feat: Support Multi-stage Awake for SGLang+Megatron #2248

Closed

7 tasks

hebiao064 added 8 commits July 1, 2025 01:27

fix

acc12ab

Merge branch 'main' into upgrade_sglang

8048618

fix

f283346

fix

503e19f

Revert "fix"

3ea9467

This reverts commit 503e19f.

Merge branch 'main' of https://github.com/volcengine/verl into upgrade_sglang

70f4a2e

fix

28c406d

fix

76fe5dd

zhaochenyang20 mentioned this pull request Jul 1, 2025

[sglang, doc] feat: Update Acknoledgement of SGLang Team Members [WIP] #2307

Closed

7 tasks

zhaochenyang20 mentioned this pull request Jul 2, 2025

[rollout] feat: Support Multi-stage Awake for SGLang+Megatron #2297

Open

7 tasks

enable multistage wakeup by default

4da1074

hebiao064 requested a review from SwordFaith as a code owner July 3, 2025 18:04

remove flash-attn upgrade

96cfcb9

hebiao064 mentioned this pull request Jul 4, 2025

[RL] Fix illegal memory for _import_static_state sgl-project/sglang#7733

Merged

6 tasks

hebiao064 and others added 3 commits July 6, 2025 12:16

Merge branch 'volcengine:main' into upgrade_sglang

f7f8093

chore: update sglang to 0.4.9

aeffae2

chore: update sglang to 0.4.9

ee4fbe3

hebiao064 changed the title ~~[sglang] chore: Upgrade SGLang 0.4.8 with multi-stage awake feature~~ [sglang] chore: Upgrade SGLang 0.4.9 with multi-stage awake feature Jul 7, 2025

hebiao064 commented Jul 7, 2025

Comment thread setup.py

hebiao064 and others added 7 commits July 7, 2025 06:14

fix image for sglang

32e9d67

Merge branch 'main' into upgrade_sglang

a7c34b6

fix the convert weight issue brought by earlier commit

6066714

fix

4bf1fff

Merge branch 'main' into upgrade_sglang

5119e3e

fix

98352bb

Merge branch 'upgrade_sglang' of https://github.com/hebiao064/verl into upgrade_sglang

287fd98

hebiao064 added 3 commits July 7, 2025 13:25

Merge branch 'main' into upgrade_sglang

ab71890

remove broken ci test for fused kernel

ae6cc37

fix

1de510f

hebiao064 mentioned this pull request Jul 7, 2025

Qwen2.5 VL in compatible with torch 2.7 and flash attn 2.8: AttributeError("'constexpr' object has no attribute 'bit_length'") #2405

Open

nanjiangwill and others added 3 commits July 8, 2025 09:24

Merge branch 'main' into upgrade_sglang

95edd72

fix: remove duplicate tool import

59b348e

Merge branch 'main' into upgrade_sglang

8da4dfd

hebiao064 mentioned this pull request Jul 10, 2025

[sglang] chore: Bump SGLang to 0.4.9.post1 #2451

Closed

7 tasks

hebiao064 closed this Jul 29, 2025

hebiao064 wants to merge 34 commits into verl-project:main from hebiao064:upgrade_sglang

4 participants
