
[RL] Support Multi-Stage Awake#6962

Closed
hebiao064 wants to merge 15 commits into main from bhe/support_multiple_tms

Conversation

Collaborator

@hebiao064 hebiao064 commented Jun 8, 2025

To Reviewer: this PR needs to land first: fzyzcjy/torch_memory_saver#17

Currently, many unit tests fail because torch_memory_saver only supports a singleton instance without the change above.

Related PR:

  1. Torch Memory Saver: Support Multiple Instances fzyzcjy/torch_memory_saver#17
  2. SGLang: https://github.com/sgl-project/sglang/pull/6962/files
  3. [rollout] feat: Support Multi-stage Awake for SGLang verl-project/verl#1911

Thanks a lot to @fzyzcjy for his guidance and help.

Motivation

To close: fzyzcjy/torch_memory_saver#3

vLLM already supports Multi-Stage Awake for its engine: vllm-project/vllm#15254

But in SGLang, we use torch_memory_saver to hold the virtual addresses of the model weights and the KV cache (to ensure CUDA Graphs keep working across different rollouts).

However, torch_memory_saver is a singleton, which makes it hard for SGLang to support Multi-Stage Awake, a feature that is critical for RL use cases.

Without this feature, we can only set the KV cache memory fraction to <= 0.7, or even lower (e.g. 0.3).

This PR needs to land first: fzyzcjy/torch_memory_saver#17
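To illustrate the idea (not the real torch_memory_saver API), here is a minimal sketch using a hypothetical `MemorySaver` class: with one saver per region instead of a singleton, the RL loop can pause both regions during training and then wake them up in stages, rather than all-or-nothing.

```python
# Sketch of multi-stage awake with a HYPOTHETICAL MemorySaver class.
# This is NOT the torch_memory_saver API; it only models the concept:
# pause() releases physical memory while keeping virtual addresses,
# resume() re-backs the same addresses so CUDA Graphs stay valid.

class MemorySaver:
    """Hypothetical per-region saver; each instance is independent."""

    def __init__(self, tag):
        self.tag = tag
        self.paused = False

    def pause(self):
        # Release physical pages; keep virtual address range reserved.
        self.paused = True
        return f"{self.tag}: physical pages released, virtual addresses kept"

    def resume(self):
        # Map physical pages back to the same virtual addresses.
        self.paused = False
        return f"{self.tag}: physical pages restored"


# With a singleton, weights and KV cache pause/resume together.
# With one instance per region, the wake-up can be staged:
weights_saver = MemorySaver("weights")
kv_cache_saver = MemorySaver("kv_cache")

# Rollout finished: release everything so the trainer can use the GPU.
weights_saver.pause()
kv_cache_saver.pause()

# Multi-stage awake: restore weights first (e.g. to sync updated
# parameters from the trainer), then bring the KV cache back.
log = [weights_saver.resume(), kv_cache_saver.resume()]
print(log)
```

Staging matters because the weight-sync step only needs the weights resident; resuming the KV cache last leaves more headroom during the sync, which is what allows a higher KV cache memory fraction.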

Modifications

Checklist


@hebiao064 hebiao064 requested a review from fzyzcjy June 8, 2025 02:00
@hebiao064 hebiao064 changed the title Support Multiple Torch Memory Saver for Multi-Stage-Awake [RL] Support Multi-Stage-Awake Jun 8, 2025
Comment thread python/sglang/srt/managers/scheduler.py Outdated
Comment thread test/srt/test_release_memory_occupation.py Outdated
@hebiao064 hebiao064 changed the title [RL] Support Multi-Stage-Awake [RL] Support Multi-Stage Awake Jun 8, 2025
Collaborator

@fzyzcjy fzyzcjy left a comment


not yet fully reviewed, just spent a few minutes glancing

Comment thread python/sglang/srt/torch_memory_saver_adapter.py Outdated
Comment thread python/sglang/srt/torch_memory_saver_adapter.py Outdated
Collaborator

@fzyzcjy fzyzcjy left a comment


some nits

Comment thread python/sglang/srt/torch_memory_saver_adapter.py Outdated
Comment thread python/sglang/srt/torch_memory_saver_adapter.py Outdated
Comment thread python/sglang/srt/managers/scheduler.py Outdated
Collaborator


nit: to be honest, I feel the PR is a little over-complicated; if time permits, could you please simplify it a bit?

Collaborator Author


Simplified, let me know if it's better now.

Collaborator

@zhaochenyang20 zhaochenyang20 left a comment


LGTM

Comment thread python/sglang/srt/managers/scheduler.py Outdated
@hebiao064
Collaborator Author

Will be using this PR: #7099

And here is the issue for tracking: #7009

@hebiao064 hebiao064 closed this Jun 17, 2025
@zhyncs zhyncs deleted the bhe/support_multiple_tms branch June 20, 2025 05:03


Development

Successfully merging this pull request may close these issues.

Support for multi-instance memory saver

3 participants