[GRPO] Improve completion length logging by edbeeching · Pull Request #3188 · huggingface/trl

edbeeching · 2025-03-31T09:08:13Z

What does this PR do?

Adds min, mean and max logging of completion lengths
Also logs min, mean, and max completion lengths for sequences that terminate in EOS
Adds clipped completion ratio from SimpleRL-Zero
Example:

HuggingFaceDocBuilderDev · 2025-03-31T09:12:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lewtun

LGTM with a suggestion to also log the clip fraction of truncated completions

lewtun · 2025-03-31T09:25:56Z

+        self._metrics[mode]["max_completion_length"].append(max_completion_length)
+
+        # identify sequences that terminated with EOS and log their lengths
+        agg_terminated_with_eos = self.accelerator.gather_for_metrics(is_eos.any(dim=1))
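
A toy, torch-free sketch of what the diff above appears to compute: `is_eos.any(dim=1)` marks each completion whose token row contains at least one EOS token. The helper names, the EOS id, and the toy batch are assumptions for illustration, not code from the PR. Counting the EOS token in the length is also an assumption. The real trainer works on tensors and gathers the flags across processes with `gather_for_metrics`.

```python
from statistics import mean

EOS_ID = 2  # hypothetical EOS token id for this toy example


def completion_length(tokens, eos_id=EOS_ID):
    """Length up to and including the first EOS, or the full row if none."""
    for i, tok in enumerate(tokens):
        if tok == eos_id:
            return i + 1
    return len(tokens)


def length_stats(lengths):
    """min / mean / max of a list of lengths; None when the list is empty."""
    if not lengths:
        return None
    return {"min": min(lengths), "mean": mean(lengths), "max": max(lengths)}


# Toy batch: 3 completions, each padded or cut to 5 tokens.
batch = [
    [5, 7, 2, 0, 0],  # EOS at index 2
    [4, 4, 4, 4, 4],  # never emits EOS, so it ran into the length limit
    [9, 2, 0, 0, 0],  # EOS at index 1
]

# Plain-Python analogue of `is_eos.any(dim=1)`
terminated_with_eos = [any(tok == EOS_ID for tok in row) for row in batch]
lengths = [completion_length(row) for row in batch]
terminated_lengths = [n for n, done in zip(lengths, terminated_with_eos) if done]

all_stats = length_stats(lengths)             # over every completion
eos_stats = length_stats(terminated_lengths)  # only EOS-terminated ones
```

Returning `None` for an empty list is one way to handle a batch in which no completion reaches EOS; how the trainer handles that case is not shown in this thread.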


Since we now have the sequences with / without EOS, WDYT about also logging clip_ratio_completion_length or truncated_completion_length_ratio, as suggested in SimpleRL-Zero?

edbeeching · 2025-03-31

This is just num_terminated_completions / num_total_completions?

lewtun · 2025-03-31

No, it's num_truncated_completions / num_total_completions (i.e. a high clip ratio means a lot of completions are too verbose)
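
The ratio described in this thread can be sketched as follows. This is a minimal illustration of the definition given above (truncated completions divided by all completions), treating a completion as truncated when it never produced EOS. The function name and the toy flags are hypothetical, and the trainer's actual implementation may differ.

```python
def clipped_ratio(terminated_with_eos):
    """Fraction of completions that did NOT end with EOS (i.e. were truncated)."""
    total = len(terminated_with_eos)
    if total == 0:
        return 0.0  # assumption: report 0 for an empty batch
    truncated = sum(1 for done in terminated_with_eos if not done)
    return truncated / total


# Toy flags: True means the completion ended with EOS.
flags = [True, False, True, False]
ratio = clipped_ratio(flags)  # 2 of the 4 completions were truncated
```

A value near 1 would mean most completions are hitting the length limit before finishing.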

qgallouedec

Nice! Can you just add these new metrics to the doc? (There is a "logged metrics" section for this.)


edbeeching added 3 commits March 31, 2025 09:04

adds more detailed completion logging

fd05960

precommit

3d6076a

typo

d0c118e

edbeeching requested review from qgallouedec and shirinyamani March 31, 2025 09:08

lewtun approved these changes Mar 31, 2025

fix edge case

f3a7e2c

kashif approved these changes Mar 31, 2025

cleanup, adds clipped completin ratio

c35fc99

lewtun reviewed Mar 31, 2025

Comment thread trl/trainer/grpo_trainer.py Outdated

qgallouedec reviewed Mar 31, 2025

edbeeching added 2 commits April 1, 2025 07:13

docs

e45e2d7

lewis nit

f8fbf47

edbeeching merged commit 9f3702f into main Apr 1, 2025

edbeeching deleted the improve-completions-logging branch April 1, 2025 08:00

lewtun mentioned this pull request Apr 4, 2025

overlong filtering #3229

Closed

5 tasks

yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

[GRPO] Improve completion length logging (huggingface#3188)

4357c48

edbeeching merged 7 commits into main from improve-completions-logging
