Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface by BjarniHaukur · Pull Request #3469 · huggingface/trl

BjarniHaukur · 2025-05-19T14:49:42Z

What does this PR do?

This PR adds a new vllm_serve_async.py script to TRL. It:

Enables asynchronous, OpenAI-compatible inference with vLLM
Supports models that use tool calls (e.g., search APIs, python tool, general terminal usage)
Mirrors the weight syncing logic from vllm_serve.py
Delegates endpoint complexity to vllm.entrypoints.openai.api_server
Exposes a rollout_func interface that lets users define custom input/output structures and tool definitions to forward into reward functions

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

…essAider

tmabraham · 2025-06-17T01:44:30Z

thank you for this PR!!

lewtun · 2025-10-20T09:50:14Z

Hi @BjarniHaukur thank you for the PR! We're now looking to integrate environments in TRL, so would you like to rebase your branch on main so we can test your proposal more thoroughly?

qgallouedec · 2026-05-22T20:05:51Z

Hey @BjarniHaukur, thanks for opening this and for the early push toward async, OpenAI-compatible, tool-enabled rollouts.
Re-reading the PR a year on, the direction you proposed turned out to be exactly the right one, but the codebase took a longer path to get there.

Most of what this PR set out to do has landed independently, just in a different shape:

Async rollouts + tool calls: the experimental AsyncGRPOTrainer (trl/experimental/async_grpo/) shipped via 🔌 Asynchronous GRPO #5293 with a series of follow-ups. It includes a tool-calling loop with max_tool_calling_iterations, per-rollout tool-call tracking, and a dedicated AsyncRolloutWorker.
OpenAI-compatible vLLM server: I'm currently iterating on this directly inside the canonical trl vllm-serve script Make trl vllm-serve OpenAI-compatible (exploratory) #5803, rather than adding a parallel server. We probably won't merge it though, see the PR. The AsyncGRPOTrainer relies on vllm serve directly, which is openai-compatible

Going to close this PR (and #3284) as superseded, but want to be clear that "superseded" here means your proposal was correct and the rest of the project caught up to it, not that the idea was wrong. Thanks for being early on this! :)

BjarniHaukur added 30 commits March 28, 2025 18:42

log answer key to wandb

a902e46

all Table

1fa6c69

HTML logging

46bf3f8

table

338730e

bump patch

c9777d3

hmm

d81295a

formatting

bd92df8

html esacape

05610d6

reward isnt string

908d739

sync fork

99fd8de

preliminary openai compatible endpoint

3943e72

early concept, needs refining

1723c56

dedupe

1491cb1

debug print

9a9c416

some slop to work on

57557da

unslop, missing hist

044a490

almost valid pseudocode

2a3c178

middle-ware monkey patch in mp.Pool()...

faf116c

remove unused

7f2c730

More accurate .md

1348c23

need gpu

fb79fb6

renting lambda again

b16f072

much nicer

14be11b

small

7af1273

aider-chat and datasets conflict

4506bad

risky reqs change

8388e83

should work, but hacky

63088e0

some insights, but monkeypatching probably wont suffice

8b0ed76

refactor: Rewrite test script to use SWE-bench dataset with MultiProc…

cea5eec

…essAider

refactor: Remove logging statements from test.py

50ea732

BjarniHaukur added 21 commits May 16, 2025 17:37

close to being ready

2618826

Merge remote-tracking branch 'upstream/main'

50624c7

almost pull ready

0e762a6

undo rename

7c48129

accidental

e2c4c4a

fixes

42dca01

comment

415d91e

sync with upstream

2c81b08

bug bump

833d849

missing

5e9ee94

we should probably warn users to not use 8bit

207ab2f

new dev branch

adf6508

refactor

80617ec

correct logging

36da30c

removed usecase specific

0afcbb8

pr cleanup

c01eebe

pr cleanup

48fec41

pr cleanup

5e6b02f

pr cleanup

f4e4f11

pr cleanup

cd87130

pr cleanup

254510c

BjarniHaukur mentioned this pull request May 19, 2025

Support for realistic multi-step rollouts via async vLLM API #3284

Open

BjarniHaukur mentioned this pull request May 26, 2025

vLLM native OpenAI compatible server with weight syncing PrimeIntellect-ai/verifiers#63

Closed

lewtun mentioned this pull request Oct 20, 2025

🕹️ Add rollout function for OpenEnv integration #4310

Merged

9 tasks

cmunley1 mentioned this pull request Nov 29, 2025

openai-compatible responses and chat completions endpoints in vllm_serve.py #4602

Open

cmunley1 mentioned this pull request Jan 30, 2026

NeMo-Gym Integration #4848

Merged

qgallouedec closed this May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface#3469

Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface#3469
BjarniHaukur wants to merge 68 commits into
huggingface:mainfrom
ASSERT-KTH:async-vllm-server

BjarniHaukur commented May 19, 2025

Uh oh!

tmabraham commented Jun 17, 2025

Uh oh!

lewtun commented Oct 20, 2025

Uh oh!

qgallouedec commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

BjarniHaukur commented May 19, 2025

What does this PR do?

Before submitting

Who can review?

Uh oh!

tmabraham commented Jun 17, 2025

Uh oh!

lewtun commented Oct 20, 2025

Uh oh!

qgallouedec commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants