Ximing Lu (@GXiming) / X

Ximing Lu

202 posts

@GXiming

PhD @uwcse @uwnlp.

Santa Clara, CA

gloriaximinglu.github.io

Joined February 2018

Pinned
Ximing Lu
@GXiming
Feb 3
There’s growing excitement around scaling up RLVR to get continuous gains with more compute. But in practice, improvements saturate on finite training data. 😱 Introducing Golden Goose 🦢✨, a simple trick to synthesize unlimited RLVR tasks 😎 from unverifiable internet text. 🌐
109K
Ximing Lu
@GXiming
Apr 22, 2025
With the rise of R1, search seems out of fashion? We prove the opposite! 😎 Introducing Retro-Search 🌈: an MCTS-inspired search algorithm that RETROspectively revises R1’s reasoning traces to synthesize untaken, new reasoning paths that are better 💡, yet shorter in length ⚡️.
72K
Ximing Lu
@GXiming
Aug 12, 2025
🚀 How far can RL scaling take LLMs? Drop ProRLv2! 🔥 We keep expanding LLMs’ reasoning boundaries through 3,000+ RL steps over 5 domains and set a new state-of-the-art ✨ among 1.5B reasoning models. 🔗 Full blog: research.nvidia.com/labs/lpr/prorl… 🤗 Open model: huggingface.co/nvidia/Nemotro…
49K
Ximing Lu
@GXiming
Nov 22, 2024
Are LLMs 🤖 as creative as humans 👩‍🎓? Not quite! Introducing CREATIVITY INDEX: a metric that quantifies the linguistic creativity of a text by reconstructing it from existing text snippets on the web. Spoiler: professional human writers like Hemingway are still far more creative
36K
Ximing Lu
@GXiming
Jun 2, 2025
What happens when you ✨scale up RL✨? In our new work, Prolonged RL, we significantly scale RL training to >2k steps and >130k problems—and observe exciting, non-saturating gains as we spend more compute 🚀.
Andrew Zhao
@_AndrewZhao
Jun 2, 2025
RL scaling is here arxiv.org/pdf/2505.24864
14K
Ximing Lu
@GXiming
Jan 22, 2024
If you're interested in this direction, check out our paper Inference-Time Policy Adapters (IPA🍺). IPA guides a frozen LLM such as GPT-3 during decoding time through a lightweight policy adapter trained to optimize an arbitrary user objective with RL.
Sebastian Raschka
@rasbt
Jan 18, 2024
There's a new promising method for finetuning LLMs without modifying their weights called proxy-tuning (by Liu et al. arxiv.org/abs/2401.08565). How does it work? It's a simple decoding-time method where you modify the logits of the target LLM. In particular, you compute the
24K
Ximing Lu
@GXiming
Dec 9, 2023
I'm not able to travel to #EMNLP2023 due to visa issues, but my amazing co-author @faeze_brh is there and will present our work tomorrow Dec 10th at 9:00am-10:30am! ✨ Come and check it out! I'll attend #NeurIPS2023 next week. Feel free to DM me if you wanna chat! 🍹
Faeze Brahman
@faeze_brh
Dec 9, 2023
Can we efficiently tailor LLMs towards arbitrary user objectives without fine-tuning them?! The answer is ✅yes with IPA which combines RL and inference-time techniques! Come to our poster “Inference-time Policy Adapters” tomorrow Dec 10th at 9:00am-10:30am! #emnlp2023
7.9K
Ximing Lu
@GXiming
Dec 23, 2024
Check out our latest work "AI as Humanity's Salieri," featured in ✨News from Science ✨. Dive into how we quantify linguistic creativity and explore: Are LLMs 🤖 as creative as humans 👩‍🎓? Paper link:
Matthew Hutson
@SilverJacket
Dec 23, 2024
New research: AI writing is improving, but it still can’t match human creativity. My latest for @ScienceMagazine @NewsfromScience, on work by @GXiming of @UW et al. Thanks also @mircomusolesi and @VioletNPeng. Story link in reply.
arxiv.org
AI as Humanity's Salieri: Quantifying Linguistic Creativity of...
Creativity has long been considered one of the most difficult aspects of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions...
8.6K
Ximing Lu
@GXiming
Jun 7, 2021
Check out our new preprint: 🍷𝓜𝓔𝓡𝓛𝓞𝓣🍷! This is joint work with my awesome co-authors: @rown🍷, @jmhessel🍷, @easycommit, @jae_sung_park96, @JizeCao, Ali Farhadi & @YejinChoinka. I'm especially grateful to work with my amazing joint first-authors @rown🍷 and @jmhessel🍷!
Jack Hessel
@jmhessel
Jun 7, 2021
🍷Super excited about our new preprint!🍷 𝓜𝓔𝓡𝓛𝓞𝓣: Multimodal Script Knowledge Models! arxiv.org/abs/2106.02636 rowanzellers.com/merlot/ TL;DR: By pretraining on 6M youtube videos, we transfer with SoTA performance on 10+ tasks (e.g. Video QA) that require temporal reasoning
Ximing Lu
@GXiming
Nov 22, 2024
Replying to @GXiming
Moreover, we found that RLHF dramatically reduces the CREATIVITY INDEX of LLMs, by an average of 30.1%. This reduction is more significant at the verbatim level than the semantic level, indicating that LLMs may have converged to a certain linguistic style preferred by humans during
4K
Ximing Lu
@GXiming
Dec 22, 2024
Excited to talk about our latest work, "AI as Humanity's Salieri," at Fireside Chat today at 7 PM PST! 🔥 app.ploutos.dev/streams/innoce…
4.9K
Ximing Lu
@GXiming
Nov 22, 2024
Replying to @GXiming
TLDR: We found the seemingly remarkable creativity of LLMs 🤖 can be attributed in large part to the creativity of human-written texts on the web. In contrast, works by distinguished human authors 👩‍🎓 cannot be easily replicated by merely assembling snippets from other works.
1.4K
Ximing Lu
@GXiming
Nov 22, 2024
Replying to @GXiming
We found CREATIVITY INDEX of human authors—specifically professional writers and historical figures—is on average 66.2% higher than that of LLMs. This gap is consistent across various domains—novel snippets, modern poems, and speech transcripts—at both verbatim and semantic
1.2K
Ximing Lu
@GXiming
Aug 15, 2025
The official blog of ProRLv2 is now live. 🔥 Check it out and see how we scale RL! 🚀
Ximing Lu
@GXiming
Aug 12, 2025
🚀 How far can RL scaling take LLMs? Drop ProRLv2! 🔥 We keep expanding LLMs’ reasoning boundaries through 3,000+ RL steps over 5 domains and set a new state-of-the-art ✨ among 1.5B reasoning models. 🔗 Full blog: research.nvidia.com/labs/lpr/prorl… 🤗 Open model: huggingface.co/nvidia/Nemotro…
Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2 | NVIDIA Technical Blog
From developer.nvidia.com
824