Make `trl vllm-serve` OpenAI-compatible (exploratory) by qgallouedec · Pull Request #5803 · huggingface/trl

qgallouedec · 2026-05-21T01:47:47Z

Generated with Claude Code. Not intended to merge as-is. Opening this to evaluate whether the OpenAI work is worth doing.

closes #4602

Context

We have trl vllm-serve because old vLLM had no way to sync weights from a training process into a running engine. vLLM 0.17+ fixed that, and we're already on a path to drop older versions. Once that's done, trl vllm-serve can be retired in favour of stock vllm serve (already OpenAI-compatible, ofc). So the question is: is OpenAI compatibility worth doing in the meantime?

What it does

Reshapes the whole HTTP surface to OpenAI: /generate/ → /v1/completions, /chat/ → /v1/chat/completions, new /v1/models, trailing slashes dropped; TRL-only weight-sync endpoints kept. Server gains a concurrent-chat batcher to recover throughput under one-conversation-per-request semantics. Client switches to the openai SDK for /v1/*. The public VLLMClient API is unchanged, so trainers / openenv / distillation work without modification. trl vllm-serve's wire format is not public API.

Pros / cons

+ Long-standing user request; any OpenAI SDK now works against the server.
+ Real spec rather than ad-hoc TRL fields.
− Big diff for a script we plan to retire.
− New complexity (chat batcher, workers lock, OpenAI-extras plumbing) that wouldn't exist on stock vllm serve.

My take

I'm leaning toward not doing this. The diff is heavy for a script we're planning to retire, and once trl vllm-serve is gone, the OpenAI compatibility comes for free via stock vllm serve. Happy to be argued out of it.
@albertvillanova i'd love to hear what you think.

Not tested

Diff hasn't been run end-to-end. Existing tests call only the preserved public API.

Note

Medium Risk
Moderate risk because it rewires the HTTP contract for trl vllm-serve (new /v1/* endpoints, removed legacy routes) and adds new batching/concurrency paths plus an openai SDK dependency that could affect training integrations and runtime behavior.

Overview
Makes trl vllm-serve OpenAI-compatible by replacing the legacy /generate/ and /chat/ routes with /v1/completions, /v1/chat/completions, plus /v1/models, and by standardizing URL paths (no trailing slashes; /health, /world_size, etc.). Responses now follow OpenAI shapes while carrying TRL extensions like prompt_token_ids/token_ids and logprob token-id payloads.

Updates VLLMClient to use the official openai Python SDK for generation calls while keeping TRL-only side-channel endpoints for weight sync, prefix-cache reset, and teacher-logprob batching; the client also adds concurrent fan-out for chat requests and converts PIL images to OpenAI image_url data URIs.

Server throughput changes: adds a background chat request batcher (queue + grouping by tools/template) and a shared worker lock to serialize pipe I/O across concurrent endpoints, alongside refactoring sampling/logprob formatting to match OpenAI semantics.

^{Reviewed by Cursor Bugbot for commit a165260. Bugbot is set up for automated code reviews on this repo. Configure here.}

HuggingFaceDocBuilderDev · 2026-05-21T01:50:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27026732ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T01:55:41Z

+                {
+                    "index": i,
+                    "message": {"role": "assistant", "content": completion.text},
+                    "finish_reason": completion.finish_reason or "stop",


Preserve tool calls in chat completion messages

The new /v1/chat/completions handler accepts tools and forwards them to vLLM, but each choice is serialized with only message.content and never includes message.tool_calls. In requests where the model chooses a function/tool call (e.g., finish_reason == "tool_calls"), OpenAI-compatible clients rely on choices[i].message.tool_calls to execute the tool; omitting it makes tool-calling flows fail even though the request advertised tool support.

Useful? React with 👍 / 👎.

cursor · 2026-05-21T01:57:25Z

+                    if part.get("type") == "image_url":
+                        url = part.get("image_url", {}).get("url", "")
+                        part["image_pil"] = _decode_data_url(url)
+        return messages


Image conversion doesn't update part type for vLLM

Medium Severity

_convert_messages_for_vllm adds an image_pil key to content parts but never changes the type from "image_url" to "image_pil" and never removes the image_url key. The function's docstring states it prepares images for vLLM's chat API which "accepts PIL," but without updating the type field, vLLM's message processing will still see the part as image_url type and attempt to process the data URI directly rather than using the pre-decoded PIL object. The added image_pil key becomes dead weight that vLLM never reads.

^{Reviewed by Cursor Bugbot for commit 2702673. Configure here.}

cursor · 2026-05-21T01:57:25Z

+                    {
+                        "token": tok,
+                        "bytes": None,
+                        "logprob": -1e30 if v is None else v,


Chat logprobs replace None with sentinel value

Low Severity

_format_chat_logprobs replaces None logprob values (from NaN in vLLM) with -1e30, but the client's _extract_choice_logprobs reads these as plain floats from entry["logprob"]. This changes the semantics: the old code preserved None in the returned logprobs list, while the new chat path returns -1e30. Downstream code that checks if value is None to detect invalid logprobs would silently get a very negative float instead.

Additional Locations (1)

trl/generation/vllm_client.py#L235-L238

^{Reviewed by Cursor Bugbot for commit 2702673. Configure here.}

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit a165260. Configure here.}

cursor · 2026-05-22T14:45:21Z

        }
+        for connection in connections:
+            connection.send({"type": "fire_and_forget", "method": "collective_rpc", "kwargs": kwargs})
+        return {"message": "Request received, initializing communicator"}


Fire-and-forget endpoints bypass workers lock causing races

Low Severity

The init_communicator, update_named_param, and close_communicator endpoints call connection.send() without acquiring _workers_lock. The locked endpoints (/v1/completions, chat batcher, logprob batcher, reset_prefix_cache) use run_in_executor for pipe recv, which runs in a separate thread. If a fire-and-forget endpoint sends to a connection while that executor thread is concurrently sending or receiving on the same Connection object, this constitutes a thread-safety violation on multiprocessing.Connection.

Additional Locations (2)

trl/scripts/vllm_serve.py#L1063-L1066

trl/scripts/vllm_serve.py#L1044-L1047

^{Reviewed by Cursor Bugbot for commit a165260. Configure here.}

Make trl vllm-serve OpenAI-compatible (exploratory)

2702673

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

cursor Bot reviewed May 21, 2026

View reviewed changes

Merge branch 'main' into vllm-serve-oai-compat

a165260

cursor Bot reviewed May 22, 2026

View reviewed changes

qgallouedec mentioned this pull request May 22, 2026

Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface #3469

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `trl vllm-serve` OpenAI-compatible (exploratory)#5803

Make `trl vllm-serve` OpenAI-compatible (exploratory)#5803
qgallouedec wants to merge 2 commits into
mainfrom
vllm-serve-oai-compat

qgallouedec commented May 21, 2026 •

edited by cursor Bot

Loading

Uh oh!

HuggingFaceDocBuilderDev commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Uh oh!

cursor Bot May 21, 2026

Uh oh!

cursor Bot May 21, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

qgallouedec commented May 21, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

What it does

Pros / cons

My take

Not tested

Uh oh!

HuggingFaceDocBuilderDev commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 21, 2026

Choose a reason for hiding this comment

Image conversion doesn't update part type for vLLM

Uh oh!

cursor Bot May 21, 2026

Choose a reason for hiding this comment

Chat logprobs replace None with sentinel value

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 22, 2026

Choose a reason for hiding this comment

Fire-and-forget endpoints bypass workers lock causing races

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

qgallouedec commented May 21, 2026 •

edited by cursor Bot

Loading