I am pleased to share that I'll be joining @Harvard as a PhD student this Fall. Looking forward to working with @elmelis, @wattenberg, @viegasf, et al. at SEAS!
I'll be supported by a @KempnerInst fellowship, and am keen to further our understanding & usability of large ML models!
Rachit Bansal
PhD’ing @Harvard
- Extending an LLM for new knowledge sources is tedious—fine-tuning is expensive/causes forgetting, LoRA is restrictive. Excited to share our work where we show that an LLM can be efficiently *composed* with specialized (L)LMs to enable new tasks! arxiv.org/abs/2401.02412 🧵(1/8)
- For people in the grad school application cycle: I am reserving some time every day for the next month to review statements, discuss lists, talk about your thoughts & fears 🎎 Reserve some time here: calendly.com/rachitbansal-g Esp. keen to meet if you identify w/ a minority group.
- You have an exciting use-case, you train a neural network, but would your model work for the many kinds of (OOD) inputs it would see? In our #NeurIPS paper, we find answers by studying the relationship between information organization & memorization! w/ @danish037 & @boknilev (1/7)
- Looking forward to presenting this work (arxiv.org/abs/2401.02412) at #ICLR2024 next week in Vienna! 🇦🇹 Please stop by our poster on the 8th (10:45am) if you are interested in efficient, modular, decentralized development of large models!
- This is enraging. The outrageous application fee at these schools is a serious factor towards non-inclusivity. It is not a joke. Here, I am listing a set of analogies depicting the magnitude of this problem (especially as an international student)👇 (0/n)
- #NLPaperAlert: Our work "How Low is Too Low? A Computational Perspective on Extremely Low-Resource Languages" with @cdli_news was accepted at ACL SRW 2021 (@acl_srw). Elated. 📖 Read here: arxiv.org/abs/2105.14515 ⭐ Star here: linktr.ee/rachitbansal Thread 🔽 1/
- Personal update: I will be spending the next several months at Technion, working on exciting problems with @boknilev and @technionnlp. Grateful and looking forward to being a part of this beautiful, vibrant community.
- I am greatly indebted to an incredible set of mentors, collaborators, and idols: @partha_p_t, @jainprateek_, @boknilev, @nsaphra, @kchonyc, @danish037. I am grateful to my friends (@akankshat1701, @BadolaKartikeya, @tiwarishabh16, @_toolazyto_) for all their love over the years.
- Super excited to present this work at ICLR in Kigali w/ my super co-authors @jeevesh_juneja and @nsaphra! (So happy that the three of us finally met for the first time in person today). 🌟 Please do stop by our poster on Wednesday, 3rd May, if you are around.
  > We have been told that every training run goes to the same basin. (@jefrankle, 2019) That permutations will make everything connected. (@rahiment, 2021; @SamuelAinsworth, 2022) But is it really the case? Our work (@iclr) reveals, NO: arxiv.org/abs/2205.12411
- I had an incredible time working on this with @nsaphra. We took a deep dive into the loss surface connectivity of models that look similar ID yet behave drastically differently OOD, and were intrigued by how much there is to learn. Special shout-out to @JunejaJeevesh for steering it upfront.
  > - Mama, how does pretraining lead to high accuracy?
  > - Well, dear, transfer selects a good loss basin that contains all finetuning runs.
  > - But mama, why does OOD accuracy vary so much between models?
  > 🧵 w @JunejaJeevesh @deaddarkmatter @kchonyc @JoaoSedoc arxiv.org/abs/2205.12411
- We propose CALM (Composition to Augment Language Models): (i) Scales up LLMs on new tasks by *re-using* existing (L)LMs w/ very few new parameters & data, (ii) Keeps existing model weights intact, hence preserves original capabilities, (iii) Applies to diverse domains and settings.
- Consider a toy example: You have some key-value pairs {x1: 10, x2: 7,..., xn: 2} to reason upon. You have an LLM that excels at reasoning but has no knowledge of the KV pairs. Composing a model trained on the pairs with the LLM enables reasoning over the pairs (x1+x8*xn = 38)!
- Coding: We compose an LM trained on the entire set of open-source GitHub code w/ an LLM where code is under-represented in its training data. We see significant gains across all tasks: Code explanation (CodeXGLUE), completion (HumanEval), and generation (MBPP). Again, unlike FT.
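
For readers who want to see the toy key-value task from the thread above spelled out, here is a minimal Python sketch of the *task only*, not of CALM itself. It separates the two skills that the composition is meant to combine: knowing the key-value pairs (a lookup) and doing arithmetic over them (reasoning). The names `KV` and `evaluate` are hypothetical, and the value of `x8` is an assumption chosen so that the expression matches the 38 quoted in the post; the post itself does not state it.

```python
import ast
import operator

# Knowledge side: the key-value pairs that only the small, specialized
# model has seen. x1, x2 and xn come from the post; x8 = 14 is an
# ASSUMPTION picked so that x1 + x8 * xn equals the quoted result.
KV = {"x1": 10, "x2": 7, "x8": 14, "xn": 2}

# Reasoning side: the arithmetic operators the general model is good at.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def evaluate(expression, kv):
    """Evaluate an arithmetic expression whose variables are keys in kv.

    Solving the task needs both halves: resolving each key to its value
    and applying the arithmetic with normal operator precedence.
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return kv[node.id]  # knowledge lookup; KeyError if unknown
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(evaluate("x1 + x8 * xn", KV))
```

A model that lacks the `KV` table cannot resolve the names, and a model that only memorized the table cannot apply the arithmetic; the toy task is built so that neither half suffices alone.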











