[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs by qgallouedec · Pull Request #5242 · huggingface/trl

qgallouedec · 2026-03-07T05:40:15Z

Context

Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).

closes #5224
closes #5144

When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.

This is the final PR in the series. It eliminates the re-tokenization in the tool-calling loop — the actual source of the bug.

Changes

New _get_tool_suffix_ids(tool_messages) method: Tokenizes only the tool result portion by diffing a minimal dummy conversation (2 messages vs 3 messages). This avoids re-tokenizing the full conversation history.
_tool_call_loop: Instead of re-tokenizing prompt + completion + tool_results via apply_chat_template, builds the token sequence by concatenation: prompt_ids + completion_ids + tool_suffix_ids. The original prompt and completion token IDs are preserved exactly as they were — only the new tool result tokens are freshly tokenized.
Removed the prefix-preserving sanity check (no longer needed since the prefix is preserved by construction).
Removed the _tokenize_prompts call in the tool loop.

The bug and the fix

Previously, after a tool call:

The completion was decoded to text and appended as an assistant message
The full prompt + assistant + tool_results was re-tokenized via apply_chat_template
Due to BPE merge ambiguity, step 2 could produce different token IDs for the completion part

Now:

The original prompt_ids and completion_ids are kept as-is (never decoded and re-tokenized)
Only the tool result suffix is tokenized, using a minimal dummy conversation to extract just the template formatting
The full prompt is built by concatenation: prompt_ids + completion_ids + suffix_ids

Backward compatibility

No user-facing API changes. _get_tool_suffix_ids and _tool_call_loop are internal methods.

Note

Medium Risk
Generation/tokenization flow for GRPO tool-calling is changed to operate on raw token IDs, which can affect multi-turn tool execution, truncation behavior, and multimodal batching. Risk is mitigated by being internal-only but impacts a core training loop.

Overview
Fixes a GRPO multi-turn tool-calling bug caused by decoding and re-tokenizing completions (BPE ambiguity) by switching the loop to a token-in/token-out pipeline.

Adds _get_tool_suffix_ids() and updates _tool_call_loop() to build follow-up prompts by concatenating prompt_ids + completion_ids + tool_suffix_ids (and passing through the correct images/multimodal_fields subset), eliminating apply_chat_template re-tokenization and related prefix-preservation checks.

Adjusts _generate_single_turn()/callers in grpo_trainer.py and rloo_trainer.py to stop returning/expecting prompt IDs from generation backends and simplifies prompt-length handling in the transformers path.

^{Written by Cursor Bugbot for commit f74b5d1. This will update automatically on new commits. Configure here.}

…dling

…-token

… for None values

…-token

…ration

…_turn

… left-padding for per-token fields

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3375aeac6c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… and multimodal_fields parameters

…loop

…r and RLOOTrainer

…loop

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

qgallouedec · 2026-03-13T23:57:30Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 367a79ebc6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

kashif

The changes are correct and the approach is sound:

The concatenation approach preserves exact token IDs (the bug fix)
Images/multimodal_fields indexing is correct: filtered by idxs_with_tool after overlong removal, maintaining alignment
Overlong truncation logic remains intact
Prefix-preserving check removal is justified since prefix is now preserved by construction

…token IDs (#5242) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

qgallouedec and others added 27 commits March 5, 2026 19:10

support prompts or token IDs in VLLMClient and update API request han…

f10285e

…dling

test

7d2bb67

consistency

3b356ac

fix

82c4508

another fix

3ea2fcf

fix docstring

445f4ba

Add support for multi-modal inputs in VLLMClient and vllm_serve

8c6c88d

Merge branch 'main' into vllm-accept-token-ids

f617b2d

Merge branch 'main' into vllm-accept-token-ids

eaffd67

Move rollout_func from _generate_single_turn to _generate`

f3f6a5d

fix style

d417543

support multi-image

4b927d6

style

029fc1f

Merge branch 'vllm-accept-token-ids' into vllm-support-image-with-raw…

20b4039

…-token

Merge branch 'vllm-support-image-with-raw-token' into move-rollout-func

b8e3912

Fix handling of images in OnlineDPOTrainer to ensure proper structure…

07181cb

… for None values

Merge branch 'main' into vllm-accept-token-ids

6ff1e56

Merge branch 'vllm-accept-token-ids' into vllm-support-image-with-raw…

9f340e4

…-token

Merge branch 'vllm-support-image-with-raw-token' into move-rollout-func

d138be7

Move tokenization before vLLM generation call

09128d6

Fix deadlock issue by ensuring images are always gathered in VLLMGene…

7fd1711

…ration

Unify tokenization across all generation backends in _generate_single…

3ab04b0

…_turn

Extract tokenization out of _generate_single_turn into _tokenize_prompts

5d6d067

Enhance multimodal input handling in GRPO and RLOO trainers by adding…

b4d2c34

… left-padding for per-token fields

style

4922362

Merge branch 'unify-tokenization-generate' into extract-tokenize-prompts

37c48b3

Fix re-tokenization bug in tool-calling loop by concatenating token IDs

3375aea

chatgpt-codex-connector Bot reviewed Mar 7, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py Outdated

Enhance _tool_call_loop to support multimodal inputs by adding images…

638f88a

… and multimodal_fields parameters

cursor Bot reviewed Mar 7, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py Outdated

qgallouedec and others added 13 commits March 10, 2026 12:33

Merge branch 'extract-tokenize-prompts' into fix-retokenization-tool-…

50418e0

…loop

Merge branch 'main' into unify-tokenization-generate

7e7e3b3

Merge branch 'unify-tokenization-generate' into extract-tokenize-prompts

31d8a0c

Merge branch 'extract-tokenize-prompts' into fix-retokenization-tool-…

e88987f

…loop

Merge branch 'main' into extract-tokenize-prompts

8b4f6af

Merge branch 'extract-tokenize-prompts' into fix-retokenization-tool-…

a704d89

…loop

Merge branch 'main' into extract-tokenize-prompts

81cf273

Remove dead code: eliminate prompt tokenization logic from GRPOTraine…

918686b

…r and RLOOTrainer

remove unused extra_fields from _generate_single_turn return value

9b8de83

style

6c8f55c

Merge branch 'extract-tokenize-prompts' into fix-retokenization-tool-…

130d974

…loop

properly merge upstream

8b27397

fix

6c9db28

Base automatically changed from extract-tokenize-prompts to main March 10, 2026 21:35

Merge branch 'main' into fix-retokenization-tool-loop

441725b

cursor Bot reviewed Mar 13, 2026

View reviewed changes

Comment thread trl/trainer/rloo_trainer.py Outdated

align with main

367a79e

chatgpt-codex-connector Bot reviewed Mar 14, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py

Comment thread trl/trainer/grpo_trainer.py Outdated

qgallouedec and others added 2 commits March 14, 2026 00:21

fix

f3f0f8d

Merge branch 'main' into fix-retokenization-tool-loop

5147625

qgallouedec mentioned this pull request Mar 14, 2026

Fix GRPO tool mask alignment after tool-call retokenization #5145

Closed

qgallouedec added 2 commits March 16, 2026 07:56

Merge branch 'main' into fix-retokenization-tool-loop

10708ca

Merge branch 'main' into fix-retokenization-tool-loop

f81f6a9

sergiopaniego mentioned this pull request Mar 19, 2026

Update openenv examples to use environment_factory #5235

Merged

8 tasks

Merge branch 'main' into fix-retokenization-tool-loop

f74b5d1

kashif approved these changes Mar 19, 2026

View reviewed changes

qgallouedec merged commit ebdfe82 into main Mar 19, 2026
14 checks passed

qgallouedec deleted the fix-retokenization-tool-loop branch March 19, 2026 18:05

qgallouedec added a commit that referenced this pull request Mar 20, 2026

[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating …

3a5b99b

…token IDs (#5242) Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs#5242

[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs#5242
qgallouedec merged 103 commits into
mainfrom
fix-retokenization-tool-loop

qgallouedec commented Mar 7, 2026 •

edited by cursor Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

qgallouedec commented Mar 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

kashif left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

qgallouedec commented Mar 7, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Changes

The bug and the fix

Backward compatibility

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

qgallouedec commented Mar 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

kashif left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

qgallouedec commented Mar 7, 2026 •

edited by cursor Bot

Loading