Danqi Chen
@danqi_chen
Princeton, NJ
Joined December 2009
Posts
  • user avatar
    I am going to present two papers at #COLM2025 tomorrow from 4:30-6:30pm, as none of our leading authors can attend due to visa issues. Haven't done poster presentations for years 🤣🤣 .... so I will do my best! #76: LongProc #80: Goedel-Prover v1
    Our Goedel-Prover V1 will be presented at COLM 2025 in Montreal this Wednesday afternoon! I won’t be there in person, but my amazing and renowned colleague @danqi_chen will be around to help with the poster — feel free to stop by!
  • user avatar
    An article written about me :)
    Danqi Chen works in natural language processing or #NLP, a fast-moving field that uses #AI to create machines that not only read documents written by humans but also assimilate and manipulate the knowledge that the documents contain. Read more here: t.ly/Vwg0V
  • user avatar
    New center at Princeton on large language models research. Come join us! 😍😍😍
    Princeton has a new Center for Language and Intelligence, researching LLMs + large AI models, as well as their interdisciplinary applications. Looking for postdocs/research scientists/engineers; attractive conditions. nlp.cs.princeton.edu/center-languag…
  • user avatar
    I am super excited about this paper. A new training approach for LMs with memory augmentation! * A simple and (maybe) better training objective for LMs? * With clever memory construction and data batching, better than kNN-LM, Transformer-XL etc.
    Very excited to share a preprint “Training Language Models with Memory Augmentation”! t.ly/6n2l We propose a new training objective TRIME for language modeling—inspired by contrastive learning—which aligns with both token embeddings and *in-batch memories*. 1/n
  • user avatar
    #NAACL2022 I am already in Seattle. This is my first conference since I became a faculty😂..... Let's catch up of course :) Oh, and all my students are here!
  • user avatar
    Very surprised and excited by this result. Contrastive learning can go a loooong way in NLP!
    💥 to share “SimCSE: Simple Contrastive Learning of Sentence Embeddings”. We show that a contrastive objective can be VERY effective with right *augmentation* or *datasets*. Large gains on STS tasks and unsup. SimCSE matches previous supervised results! bit.ly/3gqgh0d
  • user avatar
    Today, we released - ProLong: A set of long-context models (512K context). Only trained on 5% of Llama-3.1 budget but strong results. - HELMET: A comprehensive eval for LCLMs. Important to get the evaluation right first! Kudos to team @gaotianyu1350 @_awettig @HowardYen1
    Very proud to introduce two of our recent long-context works: HELMET (best long-context benchmark imo): shorturl.at/JnBHD ProLong (a cont’d training & SFT recipe + a SoTA 512K 8B model): shorturl.at/XQV7a Here is a story of how we arrived there
  • user avatar
    I’ve just arrived in Vancouver and am excited to join the final stretch of #NeurIPS2024! This morning, we are presenting 3 papers 11am-2pm: - Edge pruning for finding Transformer circuits (#3111, spotlight) @AdithyaNLP - SimPO (#3410) @yumeng0818 @xiamengzhou - CharXiv (#5303)
  • user avatar
    Glad this SimPO paper is finally out. I am intrigued by its simplicity and effectiveness. The team has done a very impressive job in various experimental settings (and careful hyper-parameter tuning!) and in-depth analysis. Kudos to @yumeng0818 @xiamengzhou
    Introducing SimPO: Simpler & more effective Preference Optimization!🎉 Significantly outperforms DPO w/o a reference model!📈 Llama-3-8B-SimPO ranked among top on leaderboards!💪 ✅44.7% LC win rate on AlpacaEval 2 ✅33.8% win rate on Arena-Hard arxiv.org/abs/2405.14734 🧵[1/n]
  • user avatar
    We are planning the 2nd workshop on Machine Reading for Question Answering (MRQA): mrqa.github.io. This year we are adding a new shared task focusing on generalization of MRQA systems. Also features awesome speakers. Check it out and vote for us!
  • user avatar
    V. happy with this work! We’ve explored domain mixtures and quality filtering (including Alex’s previous work!), but what is even a “domain” in Common Crawl? Can we use these domains to better understand quality filters, and combine them for data curation? Cool visuals too!
    🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N
  • user avatar
    Mengzhou is on the job market this year, and she is awesome :-)
    SimPO is a new method from @PrincetonPLI for improving AI models using preference data. It is simpler than last year's DPO and often outperforms it. Within a few months it has been widely adopted in models that have hit the top of the chatbot arena leaderboard in their
  • user avatar
    Announcing the EfficientQA competition and #NeurIPS2020 workshop, a collaborative effort with @Princeton and @UW that challenges developers to create end-to-end open-domain question answering systems that are small, yet robust. Learn all about it ↓ goo.gle/2AVm3Vg
    efficientqa.github.io
    Efficient Open-Domain Question Answering
    The official website for the open domain question answering challenge at NeurIPS 2020.
  • user avatar
    I am at #NeurIPS2023 today! Students are presenting two oral papers: - @danfriedman0 Transformer Programs (Oral 3B / poster 3 #1509) - @SadhikaMalladi @gaotianyu1350 Memory-efficient zeroth-order optimizer MeZO (Oral 4A / poster 4 #514) Come find us! More from Princeton 👇
    Look at the breadth of Princeton research being presented at @NeurIPSConf (happening now) - not just in computer science, but also from a range of other departments. PLI blog post for details: bit.ly/3RFioiX