Qingxiu Dong

145 posts

@qx_dong

Research Scientist @GoogleDeepmind, #Gemini RL ✨ Prev: PhD @PKU1898, Intern @MSFTResearch Asia.

Joined August 2019

Qingxiu Dong
@qx_dong
Jun 10, 2025
⏰ We introduce Reinforcement Pre-Training (RPT🍒), which reframes next-token prediction as a reasoning task using RLVR.
✅ General-purpose reasoning
📑 Scalable RL on web corpora
📈 Stronger pre-training + RLVR results
🚀 Allows allocating more compute to specific tokens
Qingxiu Dong
@qx_dong
Oct 12, 2024
(Perhaps a bit late) Excited to announce that our survey on ICL has been accepted to the #EMNLP2024 main conference and has been cited 1,000+ times! Thanks to all collaborators and contributors to this field! We've updated the survey: arxiv.org/abs/2301.00234. Excited to keep pushing boundaries!
arxiv.org
A Survey on In-context Learning
With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based...
Qingxiu Dong
@qx_dong
Oct 15, 2024
OpenAI o1 scores 94.8% on the MATH dataset😲 Then... how should we track and evaluate the math skills of next-gen LLMs? 👉 Omni-Math: a new, challenging benchmark with 4k competition-level problems, on which OpenAI o1-mini achieves only 60.54% accuracy. Paper: huggingface.co/papers/2410.07…
Qingxiu Dong
@qx_dong
Oct 11, 2024
How can we guide LLMs to continually expand their own capabilities with limited annotation? SynPO: a self-boosting paradigm that trains LLMs to automatically learn generative rewards and synthesize preference data. After 4 iterations, Llama3 and Mistral achieve win-rate improvements of over 22.1%.
Qingxiu Dong
@qx_dong
Nov 2, 2025
My labmates from the PKU Computational Linguistics Lab and I will all be in #Suzhou for #EMNLP2025 (Nov 3–9)! Looking forward to meeting old and new friends. Always happy to grab a coffee and chat 🥰
Qingxiu Dong
@qx_dong
Apr 29, 2025
So happy to reunite with old and new friends at ICLR! Had an amazing time exploring Singapore too! 🌟🇸🇬 #ICLR2025
Qingxiu Dong
@qx_dong
Jun 10, 2025
Replying to @qx_dong
Paper:
Paper page - Reinforcement Pre-Training
From huggingface.co
Qingxiu Dong
@qx_dong
Sep 12, 2024
BoFei's new survey on LLM alignment is here! 🐱
▪️ Breaks down preference learning into key components: model, data, feedback, and algorithm
▪️ Offers a unified framework for better understanding ...
Check out: shorturl.at/n6lSv
Notion blog: shorturl.at/ighog
Qingxiu Dong
@qx_dong
Oct 31, 2023
Discover how to reliably quantify knowledge in LLMs with our latest #NeurIPS2023 paper!
- Build a Bayesian network between the knowledge symbols and their textual aliases
- Assess 20 LLMs' factual knowledge with a statistical approach
Read more 👉 arxiv.org/pdf/2305.10519…
Qingxiu Dong
@qx_dong
Nov 11, 2024
About to arrive in #Miami 🌴 after a 30-hour flight for #EMNLP2024! Excited to see new and old friends :) I’d love to chat about data synthesis and deep reasoning for LLMs (or anything else), so feel free to reach out!
Qingxiu Dong
@qx_dong
Dec 12, 2023
Attending #NeurIPS2023! Looking forward to meeting old and new friends. We will present our work on reliable knowledge assessment for #LLMs on the morning of 12/14. Please feel free to DM/email if you’d like to catch up or chat about research :)
Qingxiu Dong
@qx_dong
Oct 31, 2023
Explore the visual commonsense of LLMs & VaLMs in our #EMNLP2023 paper #ImageNetVC! We unveil new insights with a unique zero-shot evaluation dataset. Dive in 👉 arxiv.org/abs/2305.15028 #AI #MachineLearning
Qingxiu Dong
@qx_dong
May 14, 2024
Amazed by the multimodal capabilities of GPT-4o!😺 But GPT-4o still seems to struggle to grasp the hidden messages in visual input: it mostly sees just what's on the surface. Here are some cases tested on GPT-4o (for more on our paper and datasets, see arxiv.org/abs/2402.11281):
Qingxiu Dong
@qx_dong
Jun 11, 2025
Thanks to @omarsar0 for sharing our work!
elvis
@omarsar0
Jun 10, 2025
Reinforcement Pre-Training
New pre-training paradigm for LLMs just landed on arXiv! It incentivises effective next-token reasoning with RL. This unlocks richer reasoning capabilities using only raw text and intrinsic RL signals. A must-read! Bookmark it! Here are my notes: