Yuchen Zeng (@yzeng58) / X

Researcher @MSFTResearch, AI Frontiers Lab | Reasoning, Agent | Previously @Meta @MSFT_GSL @MITIBMLab @WisconsinCS

Redmond, WA

Joined March 2017

Pinned
Yuchen Zeng
@yzeng58
Apr 8
Reasoning models think hard — but all that thinking fills up your KV cache fast. Memento fixes this: the model compresses its own chain-of-thought mid-generation, flushing old KV entries after each block. 2-3× less peak KV cache, ~2× throughput — accuracy largely preserved.
Dimitris Papailiopoulos
@DimitrisPapail
Apr 8
Article
Memento: Teaching LLMs to Manage Their Own Context
We taught models to compress their own chain-of-thought mid-generation. Peak KV cache drops 2–3x, throughput nearly doubles, and the erased reasoning blocks leave traces in the KV cache that the model...
Yuchen Zeng
@yzeng58
Feb 17, 2024
In-context learning (ICL) excels with LLMs, but what about MLLMs? 📜 Our paper:
• Highlights an important problem: Text-to-Image ICL (T2I-ICL)
• Introduces 🔥CoBSAT🔥, the first T2I-ICL dataset
• Benchmarks MLLMs, explores challenges & enhances T2I-ICL performance
1/n 🧵
Yuchen Zeng
@yzeng58
Oct 15, 2024
📢 Excited to share our latest research: "Parameter-Efficient Fine-Tuning of State Space Models" arxiv.org/abs/2410.09016 Existing PEFT works well for Transformers, but what about State Space Models like S4 and Mamba? Our study combines theory and empirics to show: not quite!
Kangwook Lee
@Kangwook_Lee
Oct 15, 2024
🚀 Excited to share our latest research: "Parameter-Efficient Fine-Tuning of SSMs" Summary: 🧵
arxiv.org
Parameter-Efficient Fine-Tuning of State Space Models
Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However,...
Yuchen Zeng
@yzeng58
Oct 27, 2023
We delve into the theory behind LoRA's remarkable empirical performance, showing that LoRA can adapt any model to exactly represent a target model given a small rank! 🎯 "The Expressive Power of Low-Rank Adaptation" by me and my advisor @Kangwook_Lee
Kangwook Lee
@Kangwook_Lee
Oct 27, 2023
🧵 1/8 📣 Excited to share our new paper led by my student @yzeng58! "The Expressive Power of Low-Rank Adaptation" #LoRA #finetuning #LLM #diffusion arxiv.org/abs/2310.17513
arxiv.org
The Expressive Power of Low-Rank Adaptation
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models...
Yuchen Zeng
@yzeng58
Nov 11, 2024
🎉 Milestone: Our LIFT paper has hit 100+ citations! We introduced a simple method to adapt LLMs to new domains, and researchers are now achieving success with it across predictive chemistry, metamaterial physics & more! Check our work at
uw-madison-lee-lab.github.io
LIFT: Language-Interfaced Fine-Tuning
A framework that enables fine-tuning language models for non-language tasks without architectural changes, demonstrating competitive performance across classification and regression tasks.
Yuchen Zeng
@yzeng58
Dec 15, 2023
Exciting day at #NeurIPS! Presenting two papers today:
1. The Expressive Power of Low-Rank Adaptation, 3:00-4:00 p.m., OPT workshop, 📍 Hall D.
2. Outlier-Robust Group Inference via Gradient Space Clustering, 10:30 a.m.-12:00 p.m., DistShift workshop, 📍 Room R06-R09 (Level 2).
Yuchen Zeng
@yzeng58
Jun 14, 2022
Are you tired of changing model architectures and rewriting code for different machine learning tasks? 🙌 Let a pretrained language model do the work for you by asking it: when x1 = 1 and x2 = 2, what is y? With appropriate fine-tuning, this works well for various non-language tasks!
Kangwook Lee
@Kangwook_Lee
Jun 14, 2022
😎! Finetuning a pretrained lang model (e.g., GPT3) has become a popular approach to solve many text-based tasks. This paradigm is making ML very accessible as all you need to prepare is text data for finetuning. Does it also work for non-text tasks? Surprisingly, yes!!! (1/8)
Yuchen Zeng
@yzeng58
May 1, 2023
[#ICLR Today 11:30 am at MH1-2-3-4 #157] Are rejected groups treated fairly under current fairness notions? We study this issue and solve it with our new fairness notion, Equal Improvability, which takes long-term impact into consideration!
Kangwook Lee
@Kangwook_Lee
May 1, 2023
1/5 Introducing Equal Improvability (EI), our new effort-based fairness notion for ML classifiers. With many existing definitions, why another? Current notions have key limitations! If you're at #ICLR2023, join today’s poster session @ 11:30 AM!
Yuchen Zeng
@yzeng58
Oct 27, 2023
Replying to @yzeng58 and @Kangwook_Lee
I am actively seeking summer internships for 2024, specifically in the areas of Large Language Models (LLMs), model adaptation, and evaluation. If you have any related openings, please feel free to direct message me. 😁
Yuchen Zeng
@yzeng58
Nov 11, 2024
Replying to @yzeng58
🎯 Currently on the job market and open to industry & postdoc positions in LLMs & MLLMs! If interested, please DM me or email me at [email protected]!
Yuchen Zeng
@yzeng58
Jun 27, 2024
Replying to @siyan_zhao
Congratulations on your excellent work! :) We also explored a similar problem in our NeurIPS 2022 paper, which you can find here: arxiv.org/abs/2206.06565. Our study compares the decision boundaries of "fine-tuned" LLMs with those of traditional models.
Yuchen Zeng
@yzeng58
Feb 17, 2024
Replying to @yzeng58
2/n 🧵 Why does T2I-ICL matter? Multimodal ICL extends ICL's capabilities to MLLMs. T2I-ICL, a novel direction within multimodal ICL, diverges from the commonly explored image-to-text setting, opening doors to innovative applications. See the image below for examples.
Yuchen Zeng
@yzeng58
Nov 11, 2024
Replying to @yzeng58
Back in 2022, our paper already demonstrated LLMs' capabilities in regression, tabular classification, image classification, and even image generation. The core insight? Simply represent your data as sentences! This simple approach opened doors for applications across many fields.
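The "represent your data as sentences" idea from the posts above can be sketched in a few lines of Python. This is a minimal illustration, not the LIFT implementation: the function names `row_to_prompt` and `row_to_example`, the prompt/completion dictionary layout, and the exact sentence template (modeled on the "when x1 = 1 and x2 = 2, what is y?" example) are all assumptions made for this sketch.

```python
def row_to_prompt(features, target_name="y"):
    """Serialize one row of features into a question sentence.

    `features` maps feature names to values, e.g. {"x1": 1, "x2": 2}.
    The template is a hypothetical one modeled on the example in the posts.
    """
    if not features:
        raise ValueError("features must contain at least one entry")
    parts = [f"{name} = {value}" for name, value in features.items()]
    if len(parts) == 1:
        clause = parts[0]
    else:
        # Join all but the last with commas, then attach the last with "and".
        clause = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"When {clause}, what is {target_name}?"


def row_to_example(features, target, target_name="y"):
    """Pair the question sentence with its answer as a text-only training example.

    The prompt/completion layout is an assumed format for illustration.
    """
    return {
        "prompt": row_to_prompt(features, target_name),
        "completion": f" {target}",
    }


if __name__ == "__main__":
    rows = [({"x1": 1, "x2": 2}, 5.0), ({"x1": 3, "x2": 4}, 11.0)]
    for features, target in rows:
        print(row_to_example(features, target))
```

Once every row is text, the same fine-tuning pipeline can in principle be reused for regression or classification without changing the model architecture, which is the point the posts make. How numeric values are rounded and how answers are parsed back out are design choices this sketch leaves open.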