Stella Li ✈️ ICML🇰🇷
422 posts
@StellaLisy
PhD student @uwnlp | visiting researcher @AIatMeta | undergrad @jhuclsp #NLProc
Seattle, WA
stellalisy.com
Joined April 2022
536 Following · 3,610 Followers
  • Pinned
    Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 7
    LMs can learn from human labels, training data, and stronger teachers. But what happens when all of these run out🫪 when the model is already at the frontier and there is no stronger external source to learn from❓ In EvoLM, we extract the model's own evaluative knowledge into
    36K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 27, 2025
    🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
    - Random rewards: +21%
    - Incorrect rewards: +25%
    - (FYI) Ground-truth rewards: +28.8%
    How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
    701K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Jul 22, 2025
    WHY do you prefer something over another? Reward models treat preference as a black-box😶‍🌫️but human brains🧠decompose decisions into hidden attributes We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵
    51K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Nov 25, 2024
    OpenReview turned into Reddit🤯 Can we now add upvote/downvote buttons to reviews and rebuttals plz?? Would be a very rich and interesting source of preference data🤡
    Ravid Shwartz Ziv
    @ziv_ravid
    Nov 25, 2024
    Looking at ICLR submissions with the lowest score - What a work of art! 🧵
    32K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Oct 2, 2025
    🚨What if solving a problem correctly isn't enough—cuz the WAY to reason about it based on your audience matters just as much⁉️ We introduce ✨personalized reasoning✨: proactively asking user preferences and adapting HOW models think Frontier models are not doing well at this!🧵
    34K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Feb 21, 2025
    Asking the right questions can make or break decisions in high-stakes fields like medicine, law, and beyond✴️ Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information through better questions🏥❓ (co-led with @jiminmun_) 👉🏻🧵
    25K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Jun 13, 2025
    Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation: tinyurl.com/spurious-prompt. Who knew Lorem ipsum can bring 19.4% gains compared to default prompt👀 Also, arXiv is out🤩 arxiv.org/abs/2506.10947📄
    Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 27, 2025
    🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
    - Random rewards: +21%
    - Incorrect rewards: +25%
    - (FYI) Ground-truth rewards: +28.8%
    How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
    47K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Apr 15, 2023
    Excited to share that I will be starting my PhD at @uwnlp next fall in the @tsvetshop. I’m so grateful for the support from my mentors and friends in the past few years at @jhuclsp. Looking forward to moving back to the west (better👀) coast for the next chapter!! #decisionday
    18K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Dec 6, 2024
    31% of US adults use generative AI for healthcare🤯But most AI systems answer questions assertively—even when they don’t have the necessary context. Introducing #MediQ a framework that enables LLMs to recognize uncertainty🤔and ask the right questions❓when info is missing: 🧵
    29K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 27, 2025
    Replying to @StellaLisy
    💡Our hypothesis: RLVR amplifies reasoning patterns that already exist.
    Qwen2.5-Math can uniquely do "code reasoning": solving math by writing Python💻 (without execution).
    Code reasoning correlates with correctness (64% w/ vs 29% w/o).
    Spurious training amplifies code usage to 90%+.
    40K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Oct 4, 2025
    I will be at #COLM2025 next week, super excited to explore Montreal!🍁 I've been thinking about personalization, question-asking, multi-turn, RL etc. DM if you want to chat! Catch me at:
    📍Poster for ALFA: Tue 1:30pm
    💡Spotlight talk for PrefPalette: Thur 10:15am (poster 11am)
    17K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    Jun 4, 2025
    Excited to share more about Spurious Rewards! Also keep an eye out for some new experiments and arxiv coming soon 👀🔜
    Cohere Labs
    Cohere
    @Cohere_Labs
    Jun 4, 2025
    Next week on Wednesday, June 11th we're excited to welcome @StellaLisy for a session on "Spurious Rewards: Rethinking Training Signals in RLVR." Thanks to @AhmadMustafaAn1 for organizing this session! 🔥 Learn more: cohere.com/events/Cohere-…
    18K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 27, 2025
    Replying to @StellaLisy
    We empirically prove this with surgical experiments:
    🐍 Directly rewarding string “python” → +11.8% performance
    🚫 Random rewards BUT blocking code → gains disappear
    The "magic" is just surfacing useful patterns already learned in pre-training.
    12K
  • Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 27, 2025
    Replying to @StellaLisy
    🚨Future RLVR research should be validated on diverse models rather than a single de facto choice, as we show that it's easy to get significant gains on Qwen even with completely spurious reward signals. 📄 Details, code, and full paper in our blogpost: tinyurl.com/spurious-rewar…
    12K
