I’m an LLM researcher with a passion for explaining scientific concepts to others.

GRPO++: Tricks for Making RL Actually Work by Cameron R. Wolfe, Ph.D.

How to go from the vanilla GRPO algorithm to functional RL training at scale...