Sohee Yang
@soheeyang_
PhD student/research scientist intern at @ucl_nlp/@GoogleDeepMind (50/50 split). Previously MS at @kaist_ai and research engineer at Naver Clova. #NLProc & ML
London, United Kingdom
Joined August 2020
  • Pinned
    Our paper "Do Large Language Models Perform Latent Multi-Hop Reasoning without exploiting shortcuts?" will be presented at #ACL2025 today. 📍 Mon 18:00-19:30 Findings Posters (Hall X4 X5) Please visit our poster if you are interested! ✨
    🚨 New Paper 🚨 Can LLMs perform latent multi-hop reasoning without exploiting shortcuts? We find the answer is yes – they can recall and compose facts not seen together in training or guessing the answer, but success greatly depends on the type of the bridge entity (80%+ for
  • 🚨 New Paper 🚨 LLMs excel at storing facts & in-context reasoning like CoT. But do they latently💭 reason over their parametric knowledge without answering step-by-step? We found positive evidence 👀 But it varies for different relation types, and scaling doesn't help much! 1/N
  • 1/9 Excited to share "Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis". We've performed a unified evaluation & analysis on probability-based prompt selection methods, increasing the effectiveness from 87.79% to 99.44%! soheeyang.github.io/publication/ya…
  • For 8 months I've been working with burning pain in all my fingers from typing. Three doctors say I may never be able to code again if I keep using a keyboard. I've finally stopped my research project, and it's a really tough time for someone who loves to work. Any advice is appreciated.
  • I uploaded to HF my DPR reproduction (arxiv.org/abs/2004.04906) trained solely on TriviaQA that hasn't been officially released and RDR (arxiv.org/abs/2010.10999) trained on NQ & TriviaQA with retrieval accuracy higher than DPR. Check huggingface.co/soheeyang if you are interested!
  • Really happy to see that @seo_minjoon's and my submission to the "Systems Under 500Mb" track of EfficientQA #NeurIPS2020 competition ranked 2nd in the automatic evaluation and 1st in manual evaluation! 😀 Thanks a lot to the organizers for setting up this exciting challenge!
  • Thanks @arankomatsuzaki for introducing our work! 🥰
    Google presents: Do Large Language Models Latently Perform Multi-Hop Reasoning? Finds strong evidence of latent multi-hop reasoning for the prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts arxiv.org/abs/2402.16837
  • Excited to share that our short paper "Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering" has been accepted at #NAACL2021! We will release the code and the final version of the paper in a month. Joint work with @seo_minjoon 😆
  • If you've ever wondered how to update the knowledge stored in the parameters of pretrained language models in a scalable way and how to evaluate if the update is successful, check out our new preprint "Towards Continual Knowledge Learning of Language Models"!
    Pretrained language models encode world knowledge 🌐 📚 in their parameters. Then, can we train ever-changing LMs 🤖 that update their implicit knowledge as the world changes? Joint work w/ @SeonghyeonYe @soheeyang_ LG AI Research @seo_minjoon 📰 arxiv.org/abs/2110.03215 1/N
  • Replying to @soheeyang_
    Work done with my amazing collaborators @elenagri_, @KassnerNora, @megamor2, @riedelcastro at @GoogleDeepMind ✨ Check out our paper for full details 👉 arxiv.org/abs/2402.16837 🧵🔚
  • Amazing work! The idea of retrieving QA pairs to solve open-domain QA is itself really fresh (I was super amazed at the EfficientQA challenge) and the competitive performance of the model with its high efficiency and flexibility is even more remarkable.
    🚨 New work 🚨 “PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them”. Read the paper here: arxiv.org/abs/2102.07033, and check out the thread below w/ @mindjimmy,@likicode, @PMinervini, @HeinrichKuttler,@olapiktus, Pontus Stenetorp, @riedelcastro. 1/N
  • Replying to @soheeyang_
    Moreover, while the first hop of reasoning shows a clear scaling trend with increasing model size, the utilization of the second-hop reasoning doesn't scale similarly. This might be the reason for the compositionality gap reported in the work of @OfirPress et al., 2023. 9/N
  • Replying to @soheeyang_
    Why does it matter? If LLMs do latent multi-hop reasoning, it means that they can connect & traverse through knowledge in parameters instead of redundantly storing & recalling information. Strengthening such behavior may enhance parameter efficiency & controllability of LLMs. 3/N
  • Replying to @soheeyang_
    Consider this: LLMs can identify "Lula" as Stevie Wonder's mother. However, if asked about "the mother of the singer of 'Superstition'" without mentioning Stevie, to what extent do today's LLMs latently deduce the answer by recalling and utilizing their parametric knowledge alone? 2/N