[BREAKING][rollout] refactor: move LLMServerManager out of AgentLoopManager #6129
Conversation
Code Review
This pull request refactors the LLM server management architecture by introducing LLMServerManager and LLMServerClient to replace the previous AsyncLLMServerManager implementation. The core logic for server lifecycle management and load balancing has been moved to verl/workers/rollout/llm_server.py, while AgentLoopManager and AgentLoopWorker have been updated to use the new client-based interface. Additionally, the FullyAsyncAgentLoopManager was refactored and moved to the fully async policy module, and corresponding updates were made across documentation, tests, and various trainer implementations to align with these changes. I have no feedback to provide.
    max_cache_size=DEFAULT_ROUTING_CACHE_SIZE,
)

def get_client(self, fully_async: bool = False) -> LLMServerClient:
I think this implementation should be fine, but it doesn't feel very elegant. Later, I might change it to pass in a client class and initialize it here.
Sure. I noticed that #5900 adds an additional model_engine_server_handle to FullyAsyncLLMServerManager, so we may need to pass in a subclass with additional kwargs in get_client.
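A minimal sketch of what the suggested change could look like: get_client() accepts the client class (plus any subclass-specific kwargs, such as the model_engine_server_handle mentioned above) instead of a fully_async flag. All class bodies here are illustrative stand-ins, not verl's actual implementation.

```python
# Hypothetical sketch of the suggestion in this thread; names mirror the
# PR but the bodies are placeholders, not verl's real classes.
class LLMServerClient:
    def __init__(self, replicas):
        self.replicas = replicas

class FullyLLMServerClient(LLMServerClient):
    # A subclass that needs an extra constructor kwarg, analogous to the
    # model_engine_server_handle mentioned for FullyAsyncLLMServerManager.
    def __init__(self, replicas, model_engine_server_handle=None):
        super().__init__(replicas)
        self.model_engine_server_handle = model_engine_server_handle

class LLMServerManager:
    def __init__(self, replicas):
        self.replicas = replicas

    def get_client(self, client_cls=LLMServerClient, **kwargs):
        # Callers pass the client class; subclass-specific kwargs are
        # simply forwarded, so no fully_async flag is needed.
        return client_cls(self.replicas, **kwargs)

manager = LLMServerManager(replicas=["replica-0", "replica-1"])
client = manager.get_client(FullyLLMServerClient, model_engine_server_handle="h0")
```

This keeps the manager agnostic of how many client variants exist, at the cost of exposing the client classes to callers.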
PeterSH6 left a comment
LGTM. What's the plan for the old AsyncLLMServerManager?
Force-pushed from 9c2eea4 to 1e5d587
The old
Resolve conflicts in verl/experimental/agent_loop/agent_loop.py introduced by PR verl-project#6129 (refactor: move LLMServerManager out of AgentLoopManager):

* Imports - keep the function_tool import while accepting main's removal of the prometheus_utils, teacher_loop, and single_controller.ray.base imports.
* AgentLoopWorker.__init__ - keep both the new "Online policy distillation" block (from main) and the "Load function-based tools once per worker" block (from this PR); ordering is irrelevant since they touch disjoint state.

The function_tools=FunctionToolListWrap(self.function_tools) kwarg in _run_agent_loop auto-merged cleanly next to main's renamed server_manager=self.llm_client.

Co-authored-by: Claude
Made-with: Cursor
…ckaging bug

The pinned verl commit (a512e90) ships a wheel that is missing verl/experimental/reward_loop/router/ because the upstream directory had no __init__.py at that commit, and setuptools' default package discovery silently drops it. This breaks the FlowGRPO trainer at runtime with "ModuleNotFoundError: No module named 'verl.experimental.reward_loop.router'".

Switch the verl install in docs/start/install.md from a wheel install (uv pip install git+…@<commit>) to a clone-and-editable install pinned at the same commit. An editable install exposes the source tree on sys.path, so router/ is picked up as a PEP 420 implicit namespace package and the import works without any per-venv patching.

CI workflows are intentionally not touched because they don't exercise the broken codepath. The pin will be bumped past verl-project/verl#5209 once verl-omni is also adapted to the breaking LLMServerClient refactor in verl-project/verl#6129 (tracked separately).
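The packaging failure above can be reproduced in miniature: setuptools' default discovery (find_packages) only returns directories that contain an __init__.py, so a subdirectory without one is silently excluded from the wheel, while namespace-aware discovery still sees it. The directory layout below is a hypothetical stand-in for reward_loop/router/, not the real verl tree.

```python
import os
import tempfile
from setuptools import find_packages, find_namespace_packages

# Minimal reproduction of the described bug with a fake layout:
#   reward_loop/__init__.py        (regular package)
#   reward_loop/router/mod.py      (NO __init__.py: the bug trigger)
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "reward_loop", "router"))
    open(os.path.join(root, "reward_loop", "__init__.py"), "w").close()
    open(os.path.join(root, "reward_loop", "router", "mod.py"), "w").close()

    # Default discovery drops router/; namespace discovery keeps it,
    # which is why an editable install (source tree on sys.path) works.
    print(sorted(find_packages(where=root)))            # ['reward_loop']
    print(sorted(find_namespace_packages(where=root)))  # ['reward_loop', 'reward_loop.router']
```

Adding an __init__.py upstream (or switching the project to find_namespace_packages) fixes the wheel for everyone; the editable install is the workaround that needs neither.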
Adapt verl-omni's diffusion agent loop and ray trainer to verl-project/verl#6129, which removed AsyncLLMServerManager and made AgentLoopManager / AgentLoopWorker consume an LLMServerClient produced by a separately-owned LLMServerManager.

verl-omni changes:

- DiffusionAgentLoopWorker.__init__ now takes (config, llm_client, teacher_client, reward_loop_worker_handles), matching the positional contract that AgentLoopManager.create() uses when spawning workers. _get_rollout_and_model_config was also dropped upstream, so the config slicing is inlined to keep the diff minimal.
- ray_diffusion_trainer now creates an LLMServerManager first, hands its client to AgentLoopManager.create(), and uses llm_server_manager.get_replicas() (instead of async_rollout_manager.rollout_replicas) to wire the CheckpointEngineManager. This mirrors the new pattern in upstream verl/trainer/ppo/ray_trainer.py.
- tests/agent_loop/test_diffusion_agent_loop.py is updated for the new API; in standalone test mode LLMServerManager spins up its own replicas via rollout.nnodes / n_gpus_per_node.

Pin / docs / CI:

- Bump the pinned verl commit to a4351480 (the merge commit of #5209), which is the first commit that both ships verl/experimental/reward_loop/router/ in the wheel and contains the #6129 refactor that this change adapts to. With this commit, the workaround in PR verl-project#51 (clone + editable install) is no longer required.
- Restore the simple `uv pip install git+...@<commit>` install line in docs/start/install.md.
- Bump the same pin in .github/workflows/{cpu_unit_tests,sanity,type-coverage-check}.yml.

This is a BREAKING change because the DiffusionAgentLoopWorker.__init__ signature changed. Any downstream code that subclasses or directly instantiates DiffusionAgentLoopWorker must switch from (servers, load_balancer_handle, teacher_servers, teacher_load_balancer_handle) to (llm_client, teacher_client). No public CLI/config surface is affected.

Signed-off-by: samithuang <285365963@qq.com>
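The constructor migration described in this commit can be sketched as below. This is a hedged stub that only mirrors the positional contract from the commit message; the real DiffusionAgentLoopWorker lives in verl-omni and contains actual rollout logic.

```python
# Before (old verl-omni signature, per the commit message):
#   DiffusionAgentLoopWorker(config, servers, load_balancer_handle,
#                            teacher_servers, teacher_load_balancer_handle)
#
# After: the worker consumes prebuilt clients owned by an external
# LLMServerManager. Stub only; placeholder values throughout.
class DiffusionAgentLoopWorker:
    def __init__(self, config, llm_client, teacher_client, reward_loop_worker_handles):
        self.config = config
        self.llm_client = llm_client
        self.teacher_client = teacher_client
        self.reward_loop_worker_handles = reward_loop_worker_handles

worker = DiffusionAgentLoopWorker(
    config={"rollout": {"nnodes": 1}},  # placeholder config
    llm_client="llm-client-handle",     # placeholder for an LLMServerClient
    teacher_client="teacher-handle",    # placeholder teacher client
    reward_loop_worker_handles=[],
)
```

Downstream subclasses should update their super().__init__ calls to this order; positional call sites are the ones most likely to break silently.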
* [BREAKING][rollout] feat: adapt to verl LLMServerClient refactor
What does this PR do?
AgentLoopManager is one specific agent-framework implementation in verl, and is designed to be fully replaceable by other agent frameworks, such as:

Previously the LLM server replicas (launch / tear-down / load balancer / profiling / KV-cache clearing) were owned by AgentLoopManager, which forced every alternative agent framework to either inherit from AgentLoopManager or re-implement the rollout server plumbing. This made integration of third-party agent frameworks inconvenient and entangled server life-cycle with agent scheduling.

This PR extracts LLM-server management into a standalone module verl/workers/rollout/llm_server.py, so that any agent framework can reuse the same rollout servers by consuming an LLMServerClient.

Compatibility

Breaking change for out-of-tree agent frameworks that imported AsyncLLMServerManager / FullyAsyncLLMServerManager from verl.experimental.agent_loop: import from verl.workers.rollout.llm_server and use the new names LLMServerClient / FullyLLMServerClient instead. The AgentLoopManager.create(...) signature also changed (see change #3).
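Assuming the module paths in the compatibility note are accurate, out-of-tree code that must run against both old and new verl versions could bridge the rename with a small helper. This is a sketch of one possible stopgap, not an officially supported shim, and the fallback alias papers over a real API difference.

```python
import importlib

# Try the new module path introduced by this PR first, then fall back
# to the pre-refactor location. Both paths are taken from the PR's
# compatibility note; nothing is imported until the function is called,
# so defining it is safe even without verl installed.
def load_llm_server_client():
    try:
        module = importlib.import_module("verl.workers.rollout.llm_server")
        return module.LLMServerClient
    except ImportError:
        module = importlib.import_module("verl.experimental.agent_loop")
        return module.AsyncLLMServerManager
```

A cleaner long-term fix is to pin verl past #6129 and import from verl.workers.rollout.llm_server directly.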
Test
- Updated tests/checkpoint_engine/test_special_server_adapter.py and tests/experimental/agent_loop/* to the new APIs.
- Docs (docs/advance/agent_loop.rst, docs/start/agentic_rl.rst) updated.