Reinforcing Recursive Language Models
Can a 4B model learn to recursively call itself to answer hard long-context questions?
We fine-tuned a small model with reinforcement learning (RL) so that it behaves as a native RLM.
On evidence selection across scientific papers, our 4B RLM matches Sonnet 4.6 in quality.






