⏰ We introduce Reinforcement Pre-Training (RPT🍒)
— reframing next-token prediction as a reasoning task using RLVR
✅ General-purpose reasoning
📑 Scalable RL on web corpus
📈 Stronger pre-training + RLVR results
🚀 Allows allocating more compute to specific tokens
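A minimal sketch of what "next-token prediction as a reasoning task using RLVR" could look like on the reward side, assuming the verifiable reward is simply whether the model's final predicted token matches the token that actually follows in the corpus. The names `next_token_reward` and `score_rollouts` are hypothetical and used only for illustration; the real reward definition may differ in detail.

```python
from typing import List


def next_token_reward(predicted: str, ground_truth: str) -> float:
    """Assumed verifiable reward: 1.0 on an exact match with the corpus token, else 0.0."""
    return 1.0 if predicted == ground_truth else 0.0


def score_rollouts(predictions: List[str], ground_truth: str) -> List[float]:
    """Score the final prediction of each sampled reasoning rollout for one context."""
    return [next_token_reward(p, ground_truth) for p in predictions]


# Hypothetical example: three rollouts for one context whose true next token is "cat".
rewards = score_rollouts(["cat", "dog", "cat"], "cat")
```

Because the reward is checked directly against the raw text, no human annotation is needed, which is what would let this kind of signal scale to a web corpus.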
Qingxiu Dong
- (Perhaps a bit late) Excited to announce that our survey on ICL has been accepted to the #EMNLP2024 main conference and has been cited 1,000+ times! Thanks to all collaborators and contributors to this field! We've updated the survey: arxiv.org/abs/2301.00234. Excited to keep pushing boundaries!
- OpenAI o1 scores 94.8% on the MATH dataset😲 Then... how should we proceed to track and evaluate the next-gen LLMs' math skills? 👉Omni-Math: a new, challenging benchmark with 4k competition-level problems, where OpenAI o1-mini achieves only 60.54% accuracy. Paper: huggingface.co/papers/2410.07…
- How can we guide LLMs to continually expand their own capabilities with limited annotation? SynPO: a self-boosting paradigm that trains LLMs to auto-learn generative rewards and synthesize preference data. After 4 iterations, Llama3 and Mistral achieve win rate improvements of over 22.1%.
- All my labmates and I from the PKU Computational Linguistics Lab will be in #Suzhou for #EMNLP2025 (Nov 3–9) ! Looking forward to meeting old and new friends. Always happy to grab a coffee and chat 🥰
- So happy to reunite with old and new friends at ICLR! Had an amazing time exploring Singapore too! 🌟🇸🇬 #ICLR2025
- BoFei's new survey on LLM alignment is here! 🐱 ▪️ breaks down preference learning into key components: model, data, feedback, and algorithm ▪️ offers a unified framework for better understanding ... Check out: shorturl.at/n6lSv Notion blog: shorturl.at/ighog
- Discover how to reliably quantify knowledge in LLMs with our latest #NeurIPS2023 paper! - build a Bayesian network between the knowledge symbols and the textual aliases - assess 20 LLMs' factual knowledge with a statistical approach Read more👉arxiv.org/pdf/2305.10519…
- About to arrive in #Miami 🌴 after a 30-hour flight for #EMNLP2024! Excited to see new and old friends :) I’d love to chat about data synthesis and deep reasoning for LLMs (or anything else) —feel free to reach out!
- Attending #NeurIPS2023! Looking forward to meeting old and new friends. We will present our work on reliable knowledge assessment for #LLMs on 12/14 morning. Please feel free to DM/email if you’d like to catch up or chat on research :)
- Explore the visual commonsense of LLMs & VaLMs in our #EMNLP2023 paper #ImageNetVC! Unveiling insights with a unique zero-shot evaluation dataset. Dive in 👉 arxiv.org/abs/2305.15028 #AI #MachineLearning
- Amazed by the multimodal capabilities of GPT-4o!😺 But it seems GPT-4o still struggles to grasp the hidden messages in visual input—it mostly just sees what's on the surface. Here are some cases tested on GPT-4o: (For more on our paper and datasets, see arxiv.org/abs/2402.11281)
- Reinforcement Pre-Training: a new pre-training paradigm for LLMs just landed on arXiv! It incentivises effective next-token reasoning with RL. This unlocks richer reasoning capabilities using only raw text and intrinsic RL signals. A must-read! Bookmark it! Here are my notes: