[GRPO] Improve completion length logging #3188
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
lewtun
left a comment
LGTM, with a suggestion to also log the clip fraction, i.e. the share of completions that were truncated.
    self._metrics[mode]["max_completion_length"].append(max_completion_length)

    # identify sequences that terminated with EOS and log their lengths
    agg_terminated_with_eos = self.accelerator.gather_for_metrics(is_eos.any(dim=1))
Since we now have the sequences with / without EOS, WDYT about also logging `clip_ratio_completion_length` or `truncated_completion_length_ratio`, as suggested in SimpleRL-Zero?
Is this just `num_terminated_completions / num_total_completions`?
No, it's `num_truncated_completions / num_total_completions`, i.e. the fraction of completions that never emitted EOS. A high clip ratio means a lot of completions are too verbose and get cut off.
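To make the distinction concrete, here is a minimal sketch of the two ratios discussed above. It is not the code from this PR: the function name `completion_length_stats`, the metric keys, and the plain-list inputs are hypothetical, and it assumes a completion counts as truncated exactly when it contains no EOS token.

```python
from typing import Sequence


def completion_length_stats(
    completions: Sequence[Sequence[int]], eos_token_id: int
) -> dict:
    """Summarize completion lengths and the share of truncated completions.

    Hypothetical helper for illustration only. A completion is "terminated"
    if it contains the EOS token and "truncated" otherwise.
    """
    if not completions:
        raise ValueError("need at least one completion")

    lengths = []
    terminated = []
    for tokens in completions:
        tokens = list(tokens)
        if eos_token_id in tokens:
            # Count tokens up to and including the first EOS.
            lengths.append(tokens.index(eos_token_id) + 1)
            terminated.append(True)
        else:
            # No EOS: the completion ran until generation stopped.
            lengths.append(len(tokens))
            terminated.append(False)

    num_total = len(completions)
    num_truncated = terminated.count(False)
    terminated_lengths = [n for n, done in zip(lengths, terminated) if done]

    return {
        "mean_length": sum(lengths) / num_total,
        "max_length": max(lengths),
        # Mean length over EOS-terminated completions only.
        "mean_terminated_length": (
            sum(terminated_lengths) / len(terminated_lengths)
            if terminated_lengths
            else 0.0
        ),
        # num_truncated_completions / num_total_completions
        "clip_ratio": num_truncated / num_total,
    }


stats = completion_length_stats(
    completions=[[5, 7, 2], [5, 7, 9, 4], [8, 2, 0, 0], [3, 3, 3, 3]],
    eos_token_id=2,
)
print(stats["clip_ratio"])
```

In this toy batch two of the four completions contain no EOS, so the clip ratio and the terminated ratio happen to coincide; in general they sum to 1.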
qgallouedec
left a comment
Nice! Can you just add these new metrics to the docs? There is a "logged metrics" section for this.

What does this PR do?
Example: