🔌 Asynchronous GRPO by qgallouedec · Pull Request #5293 · huggingface/trl

qgallouedec · 2026-03-16T16:38:02Z

Add `AsyncGRPOTrainer`

Note

This is a first MVP! It doesn't have to be perfect. The goal is to have an initial implementation to build on so we can make improvements in subsequent pull requests:

documentation (add figures + practical guidelines to configure the vLLM server and the training arguments)
tests (more!)
logging (align with GRPOTrainer)
config (find better default values)

Adds an async variant of GRPO that decouples rollout generation from training. A background worker continuously streams completions from a vLLM server while the training loop consumes them, so generation and gradient updates overlap instead of alternating.
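The following is a minimal sketch of the producer/consumer overlap. It assumes a plain `threading.Thread` worker and a bounded `queue.Queue`. The names `fake_generate`, `rollout_worker`, `consume` and `run_demo` are hypothetical stand-ins, not TRL APIs. The real worker talks to a vLLM server, and the real consumer computes the loss.

```python
import queue
import threading
import time


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a vLLM request; the sleep mimics generation latency.
    time.sleep(0.01)
    return f"{prompt} -> completion"


def rollout_worker(prompts, out_q: queue.Queue, stop: threading.Event) -> None:
    # Producer: keeps generating while the training loop is busy.
    for prompt in prompts:
        if stop.is_set():
            break
        out_q.put(fake_generate(prompt))  # blocks only when the buffer is full
    out_q.put(None)  # sentinel: no more samples


def consume(out_q: queue.Queue) -> list:
    # Consumer: a real trainer would run a gradient step on each sample here.
    consumed = []
    while (sample := out_q.get()) is not None:
        consumed.append(sample)
    return consumed


def run_demo(num_prompts: int = 8) -> list:
    out_q: queue.Queue = queue.Queue(maxsize=4)
    stop = threading.Event()
    prompts = [f"prompt {i}" for i in range(num_prompts)]
    worker = threading.Thread(target=rollout_worker, args=(prompts, out_q, stop), daemon=True)
    worker.start()
    samples = consume(out_q)  # runs while the worker is still generating
    worker.join()
    return samples


if __name__ == "__main__":
    print(len(run_demo()))
```

The bounded queue provides back-pressure: generation can run ahead of training by at most `maxsize` samples.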

Architecture

Rollout worker (background thread) — sends prompts to vLLM, scores completions with reward functions, computes advantages, pushes ready-to-train samples into a queue.
Training loop (main process) — pulls samples from the queue, computes the clipped surrogate loss, updates weights.
Weight sync — after every weight_sync_steps steps, updated weights are transferred to vLLM via NCCL.
Staleness control — max_staleness discards samples generated by an outdated policy.
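The sketch below shows one plausible reading of the weight-sync cadence and the staleness filter. It assumes the policy version is a counter bumped at each sync; the trainer does pass a `model_version_fn` to the rollout side. It also assumes that `max_staleness` counts versions. `Sample`, `is_fresh`, `filter_stale` and `sync_schedule` are hypothetical names, not part of the PR.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    model_version: int  # version of the policy that generated this sample


def is_fresh(sample: Sample, current_version: int, max_staleness: int) -> bool:
    # Assumed rule: keep samples that lag the trainer by at most max_staleness versions.
    return current_version - sample.model_version <= max_staleness


def filter_stale(samples: list, current_version: int, max_staleness: int) -> list:
    return [s for s in samples if is_fresh(s, current_version, max_staleness)]


def sync_schedule(num_steps: int, weight_sync_steps: int) -> tuple:
    # Returns (final policy version, steps at which weights were pushed to vLLM).
    version, synced_at = 0, []
    for step in range(1, num_steps + 1):
        # ... one optimizer step happens here ...
        if step % weight_sync_steps == 0:
            version += 1  # a real trainer would broadcast weights over NCCL at this point
            synced_at.append(step)
    return version, synced_at
```

For example, `sync_schedule(10, 3)` syncs at steps 3, 6 and 9. With the trainer at version 5 and `max_staleness=2`, a sample from version 1 is discarded and a sample from version 3 is kept.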

What's included

AsyncGRPOConfig / AsyncGRPOTrainer under trl.experimental.async_grpo
AsyncRolloutWorker with async generation, scoring, and NCCL weight transfer
Tool calling / environment support with max_tool_calling_iterations guard
Example script, documentation, and unit tests (with stub rollout worker)
vLLM version bumped to support 0.17.1

Note

Medium Risk
Adds a new asynchronous training path with background rollout generation and NCCL weight transfer to a vLLM server, introducing concurrency and distributed synchronization behavior that may be fragile. Also expands vLLM dependency/version handling, which can affect environments that use trl[vllm].

Overview
Introduces trl.experimental.async_grpo with AsyncGRPOConfig, AsyncGRPOTrainer, and an AsyncRolloutWorker that streams rollouts from an external vLLM server while training consumes a queue, periodically syncing updated weights back to vLLM via NCCL and discarding stale samples.

Adds an example script and unit tests (with a stub rollout worker) plus documentation wired into the experimental docs to describe setup and constraints.

Updates vLLM integration to support up to 0.17.1, adds aiohttp to the vllm extra, extends is_vllm_available() with an optional min_version, and ignores uv.lock in .gitignore.

Written by Cursor Bugbot for commit 00c802b.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 495f9676da

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


qgallouedec · 2026-03-18T03:49:18Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8c2908d81


qgallouedec · 2026-03-18T04:41:47Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c22c8fa9dd


cursor · 2026-03-18T04:52:48Z

+                            f"[generate] all {self.max_inflight_tasks} slots busy, "
+                            f"pending_groups={len(pending_groups)}, waiting for completions..."
+                        )
+                    continue


Generate loop never exits on stop with stuck tasks

Low Severity

When stop_event is set but inflight tasks are stuck in _generate_one_turn's infinite retry loop (e.g., vLLM server permanently down), _generate_loop's outer while True never exits. It repeatedly calls asyncio.wait with a 0.1s timeout, getting empty done sets, and loops back without ever checking stop_event on that path. The finally block that cancels tasks is never reached, causing the worker thread to hang.

Additional Locations (1)

trl/experimental/async_grpo/async_rollout_worker.py#L600-L608
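The sketch below shows one way to address this report. It uses hypothetical names (`generate_loop`, `poll`) and is not the PR's actual fix. It checks `stop_event` on every pass, including passes where `asyncio.wait` returns an empty `done` set. It then cancels the remaining tasks in `finally`.

```python
import asyncio
import threading


async def generate_loop(inflight: set, stop_event: threading.Event, poll: float = 0.1) -> None:
    try:
        while inflight:
            # Check the stop flag on every pass, even when no task finished.
            if stop_event.is_set():
                break
            done, _ = await asyncio.wait(inflight, timeout=poll, return_when=asyncio.FIRST_COMPLETED)
            inflight -= done  # a real loop would also free slots and enqueue results here
    finally:
        # Cancel whatever is still in flight (e.g. requests retrying against a dead server).
        for task in inflight:
            task.cancel()
        await asyncio.gather(*inflight, return_exceptions=True)


async def demo() -> bool:
    stop = threading.Event()
    stuck = {asyncio.create_task(asyncio.sleep(3600)) for _ in range(3)}  # never finish on their own
    loop_task = asyncio.create_task(generate_loop(set(stuck), stop))
    await asyncio.sleep(0.25)
    stop.set()
    await asyncio.wait_for(loop_task, timeout=2)  # returns promptly instead of hanging
    return all(t.cancelled() for t in stuck)
```

Awaiting `gather(..., return_exceptions=True)` after cancelling ensures each task has actually processed its cancellation before the worker thread exits.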

AmineDiro · 2026-03-18T08:55:21Z

        while True:
            t0 = time.time()
+            qsize = self.queue.qsize()
+            if qsize == 0:


This might be redundant with queue.get(timeout=self.timeout): the timeout raises queue.Empty, which we already catch.
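The following is a minimal sketch of the pattern this comment suggests, using a hypothetical `next_sample` helper. It relies on `get(timeout=...)` alone. The Python docs also note that `qsize()` is approximate, so a non-zero size does not guarantee that a following `get()` will not block.

```python
import queue


def next_sample(q: queue.Queue, timeout: float):
    # No qsize() pre-check: qsize() is only approximate with concurrent producers, and
    # get() already blocks for up to `timeout` seconds before raising queue.Empty.
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None  # the caller decides whether to keep waiting or end the epoch
```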

AmineDiro · 2026-03-18T08:56:04Z

            "completions."
        },
    )
+    max_tool_calling_iterations: int | None = field(


Beautiful, definitely need this!

Nit: maybe max_turns, to be explicit? In the future we may have chat-like rollouts that don't need to call tools.

AmineDiro · 2026-03-18T09:04:50Z

                    free_slots.add(slot)
                    logger.debug(f"[slot] freed   slot={slot} group={group_id} free_after={len(free_slots)}")
+                    if task.exception() is not None:
+                        raise task.exception()


I think we shouldn't raise in the main loop, as it can stop worker_thread silently. But this needs discussion.

In the open PR #5299, I opted to drop the whole group with a warning:

trl/trl/experimental/async_grpo/async_rollout_worker.py

Line 420 in f5dad09

except Exception:

Since we already retry HTTP errors for each request, this is a simple, sane default, and we don't want a group with a missing sample to be used for training.
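The following is a minimal sketch of the "drop the whole group with a warning" policy. `generate_group` and `request_fn` are hypothetical names, not the code in #5299. The rationale assumed here is that GRPO advantages are computed relative to the other completions in the same group, so a partial group would shift those statistics.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def generate_group(request_fn, prompt: str, group_size: int):
    # Fire all completions for one prompt concurrently; collect failures instead of raising.
    results = await asyncio.gather(
        *(request_fn(prompt) for _ in range(group_size)), return_exceptions=True
    )
    failures = [r for r in results if isinstance(r, Exception)]
    if failures:
        # Drop the whole group: a partial group would distort group-relative advantages.
        logger.warning(
            "Dropping group for %r: %d/%d requests failed (first error: %r)",
            prompt, len(failures), group_size, failures[0],
        )
        return None
    return results
```

Because the worker loop receives `None` rather than an exception, it keeps running, which avoids the silent worker-thread stop described above.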

AmineDiro · 2026-03-18T09:06:14Z

-                        await self._groups_to_score.put(group)
+                        while True:
+                            try:
+                                self._groups_to_score.put_nowait(group)


There is no need for a sync (non-blocking) call here.
The while True loop is equivalent to an await point: with await self._groups_to_score.put(group), the scheduler suspends the current task if the queue is full and switches to some other coroutine.

AmineDiro · 2026-03-18T09:07:16Z

+            # Use put_nowait: if the queue is full at shutdown, skip the sentinel —
+            # _score_loop will exit via stop_event check in its outer loop.
+            try:
+                self._groups_to_score.put_nowait(None)


Same as the previous comment on self._groups_to_score.put_nowait(group): the asyncio queue mechanics, with an await point, are correct and more efficient.

AmineDiro · 2026-03-18T09:08:34Z

+        while not stop_event.is_set():
+            t_wait = time.monotonic()
+            try:
+                group = await asyncio.wait_for(self._groups_to_score.get(), timeout=0.5)


I don't think we need asyncio.wait_for around .get() just to get a timeout error. We are running in a while True loop, so a plain await is the correct implementation.
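The sketch below illustrates the pattern these three comments argue for: a bounded `asyncio.Queue` where the producer uses `await put()` and the consumer uses a plain `await get()` ended by a `None` sentinel. It uses hypothetical `producer`/`consumer` coroutines. No `put_nowait` retry loop and no `asyncio.wait_for` timeout are needed.

```python
import asyncio


async def producer(q: asyncio.Queue, items) -> None:
    for item in items:
        # If the queue is full, this suspends the producer and the event loop runs other
        # coroutines; no put_nowait/QueueFull retry loop is needed.
        await q.put(item)
    await q.put(None)  # shutdown sentinel


async def consumer(q: asyncio.Queue) -> list:
    out = []
    # Plain await on get(): no asyncio.wait_for timeout; the sentinel ends the loop.
    while (item := await q.get()) is not None:
        out.append(item)
    return out


async def main() -> list:
    q: asyncio.Queue = asyncio.Queue(maxsize=2)  # tiny bound so the producer has to wait
    prod = asyncio.create_task(producer(q, range(10)))
    result = await consumer(q)
    await prod
    return result
```

One trade-off: sentinel-based shutdown assumes the consumer is still draining the queue. If the consumer may already have exited, `await q.put(None)` on a full queue would block. That case is what the author's put_nowait comment ("if the queue is full at shutdown, skip the sentinel") guards against.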

cursor · 2026-03-19T02:50:15Z

+                rollout_queue=self.rollout_queue,
+                model_version_fn=lambda: self.model_version,
+                max_staleness=self.args.max_staleness,
+                timeout=self.args.vllm_server_timeout,


Server timeout reused as queue timeout causes premature stop

Medium Severity

vllm_server_timeout is reused as the RolloutQueueDataset queue poll timeout. This conflates two separate concerns: how long to wait for the vLLM server to start and how long to wait for rollout samples during training. A user who sets a short vllm_server_timeout (because their server is already running) risks premature epoch termination whenever the queue is temporarily empty, such as during weight sync pauses.

Additional Locations (1)

trl/experimental/async_grpo/async_grpo_trainer.py#L92-L96
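The following sketch shows how the two concerns could be split into separate settings. The dataclass and the `rollout_queue_timeout` field are hypothetical, not existing `AsyncGRPOConfig` arguments. The default values are placeholders, not TRL defaults.

```python
from dataclasses import dataclass


@dataclass
class AsyncTimeoutsSketch:
    # Startup concern: how long to wait for the vLLM server to become reachable (seconds).
    vllm_server_timeout: float = 240.0
    # Hypothetical separate knob: how long the training loop waits on an empty
    # rollout queue (e.g. during a weight-sync pause) before ending the epoch.
    rollout_queue_timeout: float = 600.0
```

With separate fields, a user who shortens `vllm_server_timeout` because the server is already running leaves the queue poll timeout unchanged.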

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).


cursor · 2026-03-19T03:08:00Z

+            }
+            if metrics_list and metrics_list[0]
+            else {}
+        )


Metrics collator drops keys missing from first sample

Low Severity

DataCollatorForRollout.torch_call builds the batch metrics dict using only keys from metrics_list[0]. When a batch mixes samples from tool-calling groups (which include tools/call_frequency and tools/failure_frequency keys) and non-tool-calling groups (which lack those keys), any metric key absent from the first sample is silently dropped from the entire batch, leading to inaccurate logged metrics.
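The following is a minimal sketch of a collator that avoids the dropped keys. It uses the union of keys across the batch instead of the keys of `metrics_list[0]`. It assumes metrics are averaged, and `collate_metrics` is a hypothetical name.

```python
from collections import defaultdict


def collate_metrics(metrics_list: list[dict]) -> dict:
    # Average each metric over the samples that actually report it, using the union of keys.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for metrics in metrics_list:
        for key, value in metrics.items():
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Averaging only over samples that report a key gives, for example, the tool-call frequency among tool-calling groups. Averaging over the whole batch with missing values counted as zero would be the other reasonable choice, but it would answer a different question.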

first commit

495f967

chatgpt-codex-connector Bot reviewed Mar 16, 2026

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated

Comment thread trl/experimental/async_grpo/async_rollout_worker.py Outdated

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated

cursor Bot reviewed Mar 16, 2026

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

qgallouedec added 2 commits March 16, 2026 16:48

consistency

6a35777

fix

1652027

cursor Bot reviewed Mar 16, 2026

Comment thread trl/experimental/async_grpo/async_rollout_worker.py Outdated

qgallouedec and others added 4 commits March 16, 2026 17:34

address review

dcd9bdb

wire timeout + fix name tool

a7fecc2

Merge branch 'main' into async-grpo

5d2327e

add buffer queue size metric to samples in AsyncRolloutWorker

9e7706e

cursor Bot reviewed Mar 16, 2026

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

added aiohttp

7228399

AmineDiro reviewed Mar 17, 2026

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

Comment thread trl/experimental/async_grpo/async_rollout_worker.py Outdated

cursor Bot reviewed Mar 17, 2026

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

qgallouedec added 5 commits March 17, 2026 01:26

style

7e12720

support async scoring of reward functions and enhance metrics reporting

a62fe60

style

606c98b

docstring + consistency

887fe44

better segment trainer/rollout

f6084ff

cursor Bot reviewed Mar 17, 2026

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

inherit from base trainer and allow max_steps=None and stub for test

a162681

cursor Bot reviewed Mar 17, 2026

Comment thread trl/experimental/async_grpo/async_rollout_worker.py Outdated

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated

qgallouedec added 4 commits March 17, 2026 03:18

refactor: adjust max_steps calculation to consider accelerator processes

94fbeac

doc

99ccd54

better log tok/sec

ce6e00c

fix: update max_completion_length default value to 2048

bdaf5c1

cursor Bot reviewed Mar 17, 2026

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated

Comment thread pyproject.toml Outdated

qgallouedec added 3 commits March 17, 2026 03:50

fix timeout, tok/sec, and vllm import

3c5ec3d

style

fae8012

style

f468106

qgallouedec added 2 commits March 18, 2026 03:44

Improve error handling for server connection issues in AsyncRolloutWorker

a115faf

Improve queue handling in AsyncRolloutWorker to prevent blocking on full queue

d8c2908

cursor Bot reviewed Mar 18, 2026

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

chatgpt-codex-connector Bot reviewed Mar 18, 2026

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py Outdated

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py

qgallouedec added 2 commits March 18, 2026 04:20

first sync and then start generation

5d5c46c

nits

9a76dec

qgallouedec changed the title from "Async GRPO" to "🔌 Asynchronous GRPO" Mar 18, 2026

we don't need to set max-num-seq apparently

6266546

cursor Bot reviewed Mar 18, 2026

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

Comment thread trl/experimental/async_grpo/async_rollout_worker.py Outdated

remove dead code

c22c8fa

chatgpt-codex-connector Bot reviewed Mar 18, 2026

Comment thread trl/experimental/async_grpo/async_grpo_trainer.py

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

cursor Bot reviewed Mar 18, 2026

qgallouedec requested review from AmineDiro and albertvillanova March 18, 2026 05:05

AmineDiro reviewed Mar 18, 2026

Merge branch 'main' into async-grpo

c365023

cursor Bot reviewed Mar 18, 2026

Comment thread trl/experimental/async_grpo/async_rollout_worker.py

Merge branch 'main' into async-grpo

8d65072

cursor Bot reviewed Mar 19, 2026

fix logprobs

00c802b

cursor Bot reviewed Mar 19, 2026

AmineDiro approved these changes Mar 19, 2026

AmineDiro merged commit b86b760 into main Mar 19, 2026
16 checks passed

AmineDiro deleted the async-grpo branch March 19, 2026 08:45

qgallouedec mentioned this pull request May 22, 2026

Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface #3469

Closed

5 tasks

This was referenced Jun 11, 2026

Parallelizing generation and evaluation for complex reward functions #4489

Closed

Async-GRPO: Decouple rollout generation and reward computation/model update for faster training #4591

Closed
