[BREAKING][rollout] feat: adapt to verl LLMServerClient refactor#52
Merged
SamitHuang merged 3 commits intoverl-project:mainfrom May 1, 2026
Merged
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the verl dependency to a newer commit and refactors the agent loop and trainer components to integrate with the LLMServerManager architecture. The DiffusionAgentLoopWorker now receives an LLMServerClient instead of managing server handles internally, and the RayDiffusionTrainer has been updated to handle the lifecycle of the LLM server. Feedback was provided to improve type safety by explicitly marking the teacher_client parameter as optional in the DiffusionAgentLoopWorker constructor.
zhtmike
reviewed
May 1, 2026
zhtmike
approved these changes
May 1, 2026
Adapt verl-omni's diffusion agent loop and ray trainer to verl-project/verl#6129, which removed AsyncLLMServerManager and made AgentLoopManager / AgentLoopWorker consume an LLMServerClient produced by a separately-owned LLMServerManager. verl-omni changes: - DiffusionAgentLoopWorker.__init__ now takes (config, llm_client, teacher_client, reward_loop_worker_handles), matching the positional contract that AgentLoopManager.create() uses when spawning workers. _get_rollout_and_model_config was also dropped upstream, so the config slicing is inlined to keep the diff minimal. - ray_diffusion_trainer now creates an LLMServerManager first, hands its client to AgentLoopManager.create(), and uses llm_server_manager.get_replicas() (instead of async_rollout_manager.rollout_replicas) to wire the CheckpointEngineManager. This mirrors the new pattern in upstream verl/trainer/ppo/ray_trainer.py. - tests/agent_loop/test_diffusion_agent_loop.py is updated for the new API; in standalone test mode LLMServerManager spins up its own replicas via rollout.nnodes / n_gpus_per_node. Pin / docs / CI: - Bump the pinned verl commit to a4351480 (the merge commit of #5209), which is the first commit that ships verl/experimental/reward_loop/router/ in the wheel AND contains the #6129 refactor that this change adapts to. With this commit, the workaround in PR verl-project#51 (clone + editable install) is no longer required. - Restore the simple `uv pip install git+...@<commit>` install line in docs/start/install.md. - Bump the same pin in .github/workflows/{cpu_unit_tests,sanity,type-coverage-check}.yml. This is a BREAKING change because DiffusionAgentLoopWorker.__init__ signature changed. Any downstream code that subclasses or directly instantiates DiffusionAgentLoopWorker must switch from (servers, load_balancer_handle, teacher_servers, teacher_load_balancer_handle) to (llm_client, teacher_client). No public CLI/config surface is affected. Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: samithuang <285365963@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: samithuang <285365963@qq.com>
c26e9e4 to
7c1e490
Compare
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adapt verl-omni's diffusion agent loop and ray trainer to verl-project/verl#6129, which removed AsyncLLMServerManager and made AgentLoopManager / AgentLoopWorker consume an LLMServerClient produced by a separately-owned LLMServerManager.
verl-omni changes:
Pin / docs / CI:
uv pip install git+...@<commit>install line in docs/start/install.md.This is a BREAKING change because DiffusionAgentLoopWorker.init signature changed. Any downstream code that subclasses or directly instantiates DiffusionAgentLoopWorker must switch from (servers, load_balancer_handle, teacher_servers, teacher_load_balancer_handle) to (llm_client, teacher_client). No public CLI/config surface is affected.
What does this PR do?
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,vllm_omni,rollout,trainer,ci,training_utils,recipe,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,diffusion,omni,tests,docker,like[diffusion, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][diffusion, fsdp] feat: new rollout schedulerTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always