[BREAKING][rollout] refactor: move LLMServerManager out of AgentLoopManager #6129
Conversation
Code Review
This pull request refactors the LLM server management architecture by introducing LLMServerManager and LLMServerClient to replace the previous AsyncLLMServerManager implementation. The core logic for server lifecycle management and load balancing has been moved to verl/workers/rollout/llm_server.py, while AgentLoopManager and AgentLoopWorker have been updated to use the new client-based interface. Additionally, the FullyAsyncAgentLoopManager was refactored and moved to the fully async policy module, and corresponding updates were made across documentation, tests, and various trainer implementations to align with these changes. I have no feedback to provide.
    max_cache_size=DEFAULT_ROUTING_CACHE_SIZE,
)

def get_client(self, fully_async: bool = False) -> LLMServerClient:
I think this implementation should be fine, but it doesn't feel very elegant. Later, I might change it to pass in a client class and initialize it here.
Sure. I noticed that #5900 adds an additional model_engine_server_handle to FullyAsyncLLMServerManager, so we may need to pass in a subclass with additional kwargs in get_client.
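A minimal sketch of what the suggested change could look like: get_client() accepts the client class (plus any subclass-specific kwargs, such as the model_engine_server_handle mentioned above) instead of a fully_async flag. All class bodies here are illustrative stand-ins, not verl's actual implementation.

```python
# Hypothetical sketch of the suggestion in this thread; names mirror the
# PR but the bodies are placeholders, not verl's real classes.
class LLMServerClient:
    def __init__(self, replicas):
        self.replicas = replicas

class FullyLLMServerClient(LLMServerClient):
    # A subclass that needs an extra constructor kwarg, analogous to the
    # model_engine_server_handle mentioned for FullyAsyncLLMServerManager.
    def __init__(self, replicas, model_engine_server_handle=None):
        super().__init__(replicas)
        self.model_engine_server_handle = model_engine_server_handle

class LLMServerManager:
    def __init__(self, replicas):
        self.replicas = replicas

    def get_client(self, client_cls=LLMServerClient, **kwargs):
        # Callers pass the client class; subclass-specific kwargs are
        # simply forwarded, so no fully_async flag is needed.
        return client_cls(self.replicas, **kwargs)

manager = LLMServerManager(replicas=["replica-0", "replica-1"])
client = manager.get_client(FullyLLMServerClient, model_engine_server_handle="h0")
```

This keeps the manager agnostic of how many client variants exist, at the cost of exposing the client classes to callers.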
PeterSH6 left a comment
LGTM. What's the plan for the old AsyncLLMServerManager?
Force-pushed from 9c2eea4 to 1e5d587
The old
Resolve conflicts in verl/experimental/agent_loop/agent_loop.py introduced by PR verl-project#6129 (refactor: move LLMServerManager out of AgentLoopManager):

* Imports - keep the function_tool import while accepting main's removal of the prometheus_utils, teacher_loop, and single_controller.ray.base imports.
* AgentLoopWorker.__init__ - keep both the new "Online policy distillation" block (from main) and the "Load function-based tools once per worker" block (from this PR); ordering is irrelevant since they touch disjoint state.

The function_tools=FunctionToolListWrap(self.function_tools) kwarg in _run_agent_loop auto-merged cleanly next to main's renamed server_manager=self.llm_client.

Co-authored-by: Claude
Made-with: Cursor
…ckaging bug

The pinned verl commit (a512e90) ships a wheel that is missing verl/experimental/reward_loop/router/ because the upstream directory had no __init__.py at that commit, and setuptools' default package discovery silently drops it. This breaks the FlowGRPO trainer at runtime with "ModuleNotFoundError: No module named 'verl.experimental.reward_loop.router'".

Switch the verl install in docs/start/install.md from a wheel install (uv pip install git+…@<commit>) to a clone-and-editable install pinned at the same commit. An editable install exposes the source tree on sys.path, so router/ is picked up as a PEP 420 implicit namespace package and the import works without any per-venv patching.

CI workflows are intentionally not touched because they don't exercise the broken codepath. The pin will be bumped past verl-project/verl#5209 once verl-omni is also adapted to the breaking LLMServerClient refactor in verl-project/verl#6129 (tracked separately).
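The packaging failure above can be reproduced in miniature: setuptools' default discovery (find_packages) only returns directories that contain an __init__.py, so a subdirectory without one is silently excluded from the wheel, while namespace-aware discovery still sees it. The directory layout below is a hypothetical stand-in for reward_loop/router/, not the real verl tree.

```python
import os
import tempfile
from setuptools import find_packages, find_namespace_packages

# Minimal reproduction of the described bug with a fake layout:
#   reward_loop/__init__.py        (regular package)
#   reward_loop/router/mod.py      (NO __init__.py: the bug trigger)
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "reward_loop", "router"))
    open(os.path.join(root, "reward_loop", "__init__.py"), "w").close()
    open(os.path.join(root, "reward_loop", "router", "mod.py"), "w").close()

    # Default discovery drops router/; namespace discovery keeps it,
    # which is why an editable install (source tree on sys.path) works.
    print(sorted(find_packages(where=root)))            # ['reward_loop']
    print(sorted(find_namespace_packages(where=root)))  # ['reward_loop', 'reward_loop.router']
```

Adding an __init__.py upstream (or switching the project to find_namespace_packages) fixes the wheel for everyone; the editable install is the workaround that needs neither.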
Adapt verl-omni's diffusion agent loop and ray trainer to verl-project/verl#6129, which removed AsyncLLMServerManager and made AgentLoopManager / AgentLoopWorker consume an LLMServerClient produced by a separately-owned LLMServerManager.

verl-omni changes:

- DiffusionAgentLoopWorker.__init__ now takes (config, llm_client, teacher_client, reward_loop_worker_handles), matching the positional contract that AgentLoopManager.create() uses when spawning workers. _get_rollout_and_model_config was also dropped upstream, so the config slicing is inlined to keep the diff minimal.
- ray_diffusion_trainer now creates an LLMServerManager first, hands its client to AgentLoopManager.create(), and uses llm_server_manager.get_replicas() (instead of async_rollout_manager.rollout_replicas) to wire the CheckpointEngineManager. This mirrors the new pattern in upstream verl/trainer/ppo/ray_trainer.py.
- tests/agent_loop/test_diffusion_agent_loop.py is updated for the new API; in standalone test mode LLMServerManager spins up its own replicas via rollout.nnodes / n_gpus_per_node.

Pin / docs / CI:

- Bump the pinned verl commit to a4351480 (the merge commit of #5209), which is the first commit that both ships verl/experimental/reward_loop/router/ in the wheel and contains the #6129 refactor that this change adapts to. With this commit, the workaround in PR verl-project#51 (clone + editable install) is no longer required.
- Restore the simple `uv pip install git+...@<commit>` install line in docs/start/install.md.
- Bump the same pin in .github/workflows/{cpu_unit_tests,sanity,type-coverage-check}.yml.

This is a BREAKING change because the DiffusionAgentLoopWorker.__init__ signature changed. Any downstream code that subclasses or directly instantiates DiffusionAgentLoopWorker must switch from (servers, load_balancer_handle, teacher_servers, teacher_load_balancer_handle) to (llm_client, teacher_client). No public CLI/config surface is affected.

Signed-off-by: samithuang <285365963@qq.com>
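The constructor migration described in this commit can be sketched as below. This is a hedged stub that only mirrors the positional contract from the commit message; the real DiffusionAgentLoopWorker lives in verl-omni and contains actual rollout logic.

```python
# Before (old verl-omni signature, per the commit message):
#   DiffusionAgentLoopWorker(config, servers, load_balancer_handle,
#                            teacher_servers, teacher_load_balancer_handle)
#
# After: the worker consumes prebuilt clients owned by an external
# LLMServerManager. Stub only; placeholder values throughout.
class DiffusionAgentLoopWorker:
    def __init__(self, config, llm_client, teacher_client, reward_loop_worker_handles):
        self.config = config
        self.llm_client = llm_client
        self.teacher_client = teacher_client
        self.reward_loop_worker_handles = reward_loop_worker_handles

worker = DiffusionAgentLoopWorker(
    config={"rollout": {"nnodes": 1}},  # placeholder config
    llm_client="llm-client-handle",     # placeholder for an LLMServerClient
    teacher_client="teacher-handle",    # placeholder teacher client
    reward_loop_worker_handles=[],
)
```

Downstream subclasses should update their super().__init__ calls to this order; positional call sites are the ones most likely to break silently.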
* [BREAKING][rollout] feat: adapt to verl LLMServerClient refactor
What does this PR do?
AgentLoopManager is one specific agent-framework implementation in verl, and is designed to be fully replaceable by other agent frameworks, such as:

Previously the LLM server replicas (launch / tear-down / load balancer / profiling / KV-cache clearing) were owned by AgentLoopManager, which forced every alternative agent framework to either inherit from AgentLoopManager or re-implement the rollout server plumbing. This made integration of third-party agent frameworks inconvenient and entangled server life-cycle with agent scheduling.

This PR extracts LLM-server management into a standalone module verl/workers/rollout/llm_server.py, so that any agent framework can reuse the same rollout servers by consuming an LLMServerClient.

Compatibility

Breaking change for out-of-tree agent frameworks that imported AsyncLLMServerManager / FullyAsyncLLMServerManager from verl.experimental.agent_loop: import from verl.workers.rollout.llm_server and use the new names LLMServerClient / FullyLLMServerClient instead. The AgentLoopManager.create(...) signature also changed (see change #3).
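Assuming the module paths in the compatibility note are accurate, out-of-tree code that must run against both old and new verl versions could bridge the rename with a small helper. This is a sketch of one possible stopgap, not an officially supported shim, and the fallback alias papers over a real API difference.

```python
import importlib

# Try the new module path introduced by this PR first, then fall back
# to the pre-refactor location. Both paths are taken from the PR's
# compatibility note; nothing is imported until the function is called,
# so defining it is safe even without verl installed.
def load_llm_server_client():
    try:
        module = importlib.import_module("verl.workers.rollout.llm_server")
        return module.LLMServerClient
    except ImportError:
        module = importlib.import_module("verl.experimental.agent_loop")
        return module.AsyncLLMServerManager
```

A cleaner long-term fix is to pin verl past #6129 and import from verl.workers.rollout.llm_server directly.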
Test
- Updated tests/checkpoint_engine/test_special_server_adapter.py and tests/experimental/agent_loop/* to the new APIs.
- Docs (docs/advance/agent_loop.rst, docs/start/agentic_rl.rst) updated.