What Makes a Base Language Model Suitable for RL?
Rumor in the community has it that RL on LLMs, specifically RLVR (reinforcement learning with verifiable rewards), is full of “mysteries”:
(1) Does the magic only happen with Qwen + Math?
(2) Does the "aha moment" only spark during math reasoning?
(3) Is evaluation hiding subtle traps?