Skip to content

Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface#3469

Closed
BjarniHaukur wants to merge 68 commits into
huggingface:mainfrom
ASSERT-KTH:async-vllm-server
Closed

Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface#3469
BjarniHaukur wants to merge 68 commits into
huggingface:mainfrom
ASSERT-KTH:async-vllm-server

Conversation

@BjarniHaukur

Copy link
Copy Markdown

What does this PR do?

This PR adds a new vllm_serve_async.py script to TRL. It:

  • Enables asynchronous, OpenAI-compatible inference with vLLM
  • Supports models that use tool calls (e.g., search APIs, python tool, general terminal usage)
  • Mirrors the weight syncing logic from vllm_serve.py
  • Delegates endpoint complexity to vllm.entrypoints.openai.api_server
  • Exposes a rollout_func interface that lets users define custom input/output structures and tool definitions to forward into reward functions

Fixes #3284

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@tmabraham

Copy link
Copy Markdown

thank you for this PR!!

@lewtun

lewtun commented Oct 20, 2025

Copy link
Copy Markdown
Member

Hi @BjarniHaukur thank you for the PR! We're now looking to integrate environments in TRL, so would you like to rebase your branch on main so we can test your proposal more thoroughly?

@qgallouedec

Copy link
Copy Markdown
Member

Hey @BjarniHaukur, thanks for opening this and for the early push toward async, OpenAI-compatible, tool-enabled rollouts.
Re-reading the PR a year on, the direction you proposed turned out to be exactly the right one, but the codebase took a longer path to get there.

Most of what this PR set out to do has landed independently, just in a different shape:

  • Async rollouts + tool calls: the experimental AsyncGRPOTrainer (trl/experimental/async_grpo/) shipped via 🔌 Asynchronous GRPO #5293 with a series of follow-ups. It includes a tool-calling loop with max_tool_calling_iterations, per-rollout tool-call tracking, and a dedicated AsyncRolloutWorker.
  • OpenAI-compatible vLLM server: I'm currently iterating on this directly inside the canonical trl vllm-serve script Make trl vllm-serve OpenAI-compatible (exploratory) #5803, rather than adding a parallel server. We probably won't merge it though, see the PR. The AsyncGRPOTrainer relies on vllm serve directly, which is openai-compatible

Going to close this PR (and #3284) as superseded, but want to be clear that "superseded" here means your proposal was correct and the rest of the project caught up to it, not that the idea was wrong. Thanks for being early on this! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for realistic multi-step rollouts via async vLLM API

5 participants