We finally shipped TRL v1.0!!
Stable APIs, broad integrations, and a design built to absorb whatever the field throws at it next. Let's go!
hf.co/blog/trl-v1
Last moments of closed-source AI 🪦:
Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration.
🫵 Let's go!
☄️ GRPO now scales to 70B+ models with multi-node training and super-fast performance. Install TRL v0.16, the latest version:
pip install trl
With all the latest features and optimizations we've added, you can train up to 60 times faster!
More details in the
🪂 Getting GRPO Done Right (Dr GRPO) is now in TRL
@zzlccc showed that scaling the advantages by the standard deviation of each question's group rewards introduces a question-level difficulty bias! You can now remove this bias 🗑️
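A minimal sketch of the difference, in plain Python rather than TRL's own code: GRPO centers each completion's reward on its group mean and divides by the group's standard deviation, while Dr GRPO keeps only the centering. Questions the model almost always (or almost never) solves have a small std, so the division inflates their advantages; that is the difficulty bias. The function name, the `eps` value, and the use of the population std are illustrative assumptions. The Dr GRPO paper also removes per-response length normalization, which this sketch does not cover. In TRL the switch is the `scale_rewards` option of `GRPOConfig`; its accepted values have changed across releases, so check the docs for your version.

```python
from statistics import mean, pstdev


def group_advantages(rewards, scale_by_std=True, eps=1e-4):
    """Advantages for the completions sampled for one question.

    scale_by_std=True  -> GRPO-style: (r - mean) / (std + eps)
    scale_by_std=False -> Dr GRPO-style: r - mean (no std division)
    eps and the population std (pstdev) are illustrative choices.
    """
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not scale_by_std:
        return centered
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


# Binary correctness rewards for 8 completions per question.
easy = [1, 1, 1, 1, 1, 1, 1, 0]      # solved 7/8 times -> low std
balanced = [1, 0, 1, 0, 1, 0, 1, 0]  # solved 4/8 times -> maximal std

for name, rewards in [("easy", easy), ("balanced", balanced)]:
    scaled = group_advantages(rewards)
    unscaled = group_advantages(rewards, scale_by_std=False)
    # Effective per-question weight introduced by the std division.
    weight = scaled[-1] / unscaled[-1]
    print(f"{name}: weight {weight:.2f}")
```

With std scaling, the mostly-solved question gets roughly 1.5x the weight of the balanced one purely because its rewards vary less; with `scale_by_std=False` both are weighted the same.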
GRPO x Curriculum learning 😳
The only difference is that I sorted the dataset (math questions) by difficulty.
Do you agree that it's the kind of curve you'd expect?
But the most interesting question is, does it give better results? Answer in the thread 🧵 (0/n)
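Sorting the training set easiest-first can be sketched as below. The `difficulty` field, its scale, and the example records are hypothetical; the thread doesn't say how difficulty was scored. For a Hugging Face `datasets.Dataset`, `Dataset.sort("difficulty")` does the same job. One caveat: the Hugging Face `Trainer` samples training batches randomly by default, so the ordering only reaches the model if that shuffling is disabled.

```python
# Hypothetical records: the "difficulty" field is an assumption for illustration.
dataset = [
    {"prompt": "Solve x^2 - 5x + 6 = 0.", "difficulty": 3},
    {"prompt": "What is 2 + 2?", "difficulty": 1},
    {"prompt": "Prove there are infinitely many primes.", "difficulty": 5},
    {"prompt": "Compute 17 * 23.", "difficulty": 2},
]


def curriculum_order(records, key="difficulty"):
    """Return records sorted easiest-first; sorted() is stable, so ties keep their order."""
    return sorted(records, key=lambda r: r[key])


ordered = curriculum_order(dataset)
print([r["difficulty"] for r in ordered])  # [1, 2, 3, 5]
```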
🚨 Big news! We decided that @huggingface’s post-training library, TRL, will natively support training Vision Language Models 🖼️
This builds on our recent VLM support in SFTTrainer — and we’re not stopping until TRL is the #1 VLM training library 🥇
More here 👉
🤹‍♀️ GRPO Trainer in TRL now handles mixed objectives!
Simply return `None` if the reward function doesn’t apply to the sample.
More in the documentation!
Kudos to Shirin for contributing this feature to TRL.
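A minimal sketch of two reward functions that each cover one kind of sample and return `None` for the rest. The `task` and `answer` column names, the brevity scoring, and the `combine` helper are hypothetical; the TRL documentation describes the exact reward-function signature and how the trainer aggregates rewards when some entries are `None`. Here `combine` simply sums whichever rewards apply to each sample.

```python
from typing import Optional


def math_reward(completions, task, answer, **kwargs) -> list[Optional[float]]:
    """Exact-match reward for math samples; None where it does not apply."""
    rewards = []
    for completion, t, ans in zip(completions, task, answer):
        if t != "math":
            rewards.append(None)  # this objective does not apply to the sample
        else:
            rewards.append(1.0 if completion.strip() == ans else 0.0)
    return rewards


def brevity_reward(completions, task, **kwargs) -> list[Optional[float]]:
    """Hypothetical brevity reward that only applies to summarization samples."""
    return [
        None if t != "summary" else max(0.0, 1.0 - len(c) / 200)
        for c, t in zip(completions, task)
    ]


def combine(reward_lists):
    """Sum the applicable rewards per sample, skipping None (all-None gives 0.0)."""
    return [
        sum((r for r in per_sample if r is not None), 0.0)
        for per_sample in zip(*reward_lists)
    ]


completions = ["4", "A short summary.", "5"]
task = ["math", "summary", "math"]
answer = ["4", None, "4"]

math_scores = math_reward(completions, task=task, answer=answer)
brevity_scores = brevity_reward(completions, task=task, answer=answer)
print(combine([math_scores, brevity_scores]))
```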