Rachit Bansal
@rach_it_
PhD’ing @Harvard
Cambridge, MA
Joined March 2019
Posts
  • user avatar
    I am pleased to share that I'll be joining @Harvard as a PhD student this Fall. Looking forward to working with @elmelis, @wattenberg, @viegasf, et al. at SEAS! I'll be supported by a @KempnerInst fellowship, and am keen to further our understanding & usability of large ML models!
  • user avatar
    Extending an LLM for new knowledge sources is tedious—fine-tuning is expensive/causes forgetting, LoRA is restrictive. Excited to share our work where we show that an LLM can be efficiently *composed* with specialized (L)LMs to enable new tasks! arxiv.org/abs/2401.02412 🧵(1/8)
    The title of our paper: "LLM Augmented LLMs: Expanding Capabilities through Composition". All authors: "Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar", and their affiliations: Google Research, India and Google DeepMind.

The figure shows an overview of our framework, which augments an anchor LLM (mB) with new capabilities through composition with a specialized augmenting model (mA). It illustrates three mA with different capabilities: key-value mapping (left), low-resource languages (center), and code (right). Models mA and mB remain unchanged during composition. A few additional parameters are learnt over the models’ layer representations.
  • user avatar
    For people in the grad school application cycle: I am reserving some time every day for the next month to review statements, discuss lists, talk about your thoughts & fears 🎎 Reserve some time here: calendly.com/rachitbansal-g Esp. keen to meet if you identify w/ a minority group.
  • user avatar
    You have an exciting use-case, you train a neural network, but would your model work for the many kinds of (OOD) inputs it would see? In our #NeurIPS paper, we find answers studying the relationship between information organization & memorization! w/ @danish037 & @boknilev (1/7)
    The header of our paper showing the title, "Measures of Information Reflect Memorization Patterns", and the author names: Rachit Bansal, Danish Pruthi, and Yonatan Belinkov, along with their respective affiliations.
  • user avatar
    Looking forward to presenting this work at #ICLR2024 next week in Vienna! 🇦🇹 Please stop by our poster on 8th (10:45am) if you are interested in efficient, modular, decentralized development of large models!
    Extending an LLM for new knowledge sources is tedious—fine-tuning is expensive/causes forgetting, LoRA is restrictive. Excited to share our work where we show that an LLM can be efficiently *composed* with specialized (L)LMs to enable new tasks! arxiv.org/abs/2401.02412 🧵(1/8)
  • user avatar
    This is enraging. The outrageous application fee at these schools is a serious factor towards non-inclusivity. It is not a joke. Here, I am listing a set of analogies depicting the magnitude of this problem (especially as an international student)👇 (0/n)
  • user avatar
    #NLPaperAlert: Our work "How Low is Too Low? A Computational Perspective on Extremely Low-Resource Languages" with @cdli_news was accepted at ACL SRW 2021 (@acl_srw). Elated. 📖 Read here: arxiv.org/abs/2105.14515 ⭐ Star here: linktr.ee/rachitbansal Thread 🔽 \1
  • user avatar
    Personal update: I would be spending the next several months at Technion, working on exciting problems with @boknilev and @technionnlp. Grateful and looking forward to being a part of this beautiful, vibrant community.
  • user avatar
    Replying to @rach_it_ @Harvard and 4 others
    I am greatly indebted to an incredible set of mentors, collaborators, and idols: @partha_p_t, @jainprateek_, @boknilev, @nsaphra, @kchonyc, @danish037. I am grateful to my friends (@akankshat1701, @BadolaKartikeya, @tiwarishabh16, @_toolazyto_) for all their love over the years.
  • user avatar
    Super excited to present this work at ICLR in Kigali w/ my super co-authors @jeevesh_juneja and @nsaphra! (So happy that the three of us finally met for the first time in person today). 🌟 Please do stop by at our poster on Wednesday, 3rd May, if you are around.
    We have been told that every training run goes to the same basin. (@jefrankle, 2019) That permutations will make everything connected. (@rahiment, 2021; @SamuelAinsworth, 2022) But is it really the case? Our work (@iclr) reveals, NO: arxiv.org/abs/2205.12411
  • user avatar
    I had an incredible time working on this with @nsaphra. We took a deep dive into loss surface connectivity of seemingly similar models ID yet drastically different OOD, and were intrigued by how much there is to learn. Special shout-out to @JunejaJeevesh for steering it upfront.
    - Mama, how does pretraining lead to high accuracy? - Well, dear, transfer selects a good loss basin that contains all finetuning runs. - But mama—why does OOD accuracy vary so much between models? 🧵 w @JunejaJeevesh @deaddarkmatter @kchonyc @JoaoSedoc arxiv.org/abs/2205.12411
  • user avatar
    Replying to @rach_it_
    We propose CALM—Composition to Augment Language Models: (i) Scales up LLMs on new tasks by *re-using* existing (L)LMs w/ very few new parameters & data, (ii) Keeps existing model weights intact, hence preserves original capabilities, (iii) Applies to diverse domains and settings.
    The abstract of our work: Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment Language Models -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing cap
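The abstract describes cross-attention between the two models' representations. A minimal sketch of one such composition step follows, assuming a single attention head and one pair of layers. The function `compose_layer` and the weights `w_proj`, `w_q`, `w_k`, `w_v` are hypothetical names for the "few additional parameters"; they are not taken from the paper's code. The anchor states `h_b` act as queries, the augmenting states `h_a` supply keys and values, and the result is added back to `h_b`, so neither model's own weights are touched.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def compose_layer(h_b, h_a, w_proj, w_q, w_k, w_v):
    """Cross-attend from anchor states h_b to augmenting states h_a.

    Assumed shapes: h_b is (tokens_b, d_b), h_a is (tokens_a, d_a),
    w_proj is (d_a, d_b), and w_q, w_k, w_v are (d_b, d_b).
    """
    h_a_proj = matmul(h_a, w_proj)             # map mA's width onto mB's width
    q = matmul(h_b, w_q)                       # queries come from the anchor model
    k = matmul(h_a_proj, w_k)                  # keys come from the augmenting model
    v = matmul(h_a_proj, w_v)                  # values come from the augmenting model
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    attended = matmul(weights, v)
    # Residual add: the anchor representation is kept and only augmented.
    return [[x + y for x, y in zip(rb, ra)] for rb, ra in zip(h_b, attended)]

# Toy inputs (made up for illustration): 2 anchor tokens, 3 augmenting tokens.
eye = [[1.0, 0.0], [0.0, 1.0]]
h_b = [[1.0, 0.0], [0.0, 1.0]]
h_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
composed = compose_layer(h_b, h_a, w_proj, eye, eye, eye)
```

In this sketch only the projection and attention weights would be trained; `h_b` and `h_a` are treated as fixed outputs of the two frozen models.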
  • user avatar
    Replying to @rach_it_
    Consider a toy example: You have some key-value pairs {x1: 10, x2: 7,..., xn: 2} to reason upon. You have an LLM that excels at reasoning but has no knowledge of the KV pairs. Composing a model trained on the pairs with the LLM enables reasoning over the pairs (x1+x8*xn = 38)!
    An overview of our key-value task along with a table that shows the results. Evaluation (accuracy (%)) for a synthetic key-value (KV) task. mA is trained to memorize the KV mappings while mB excels at arithmetic. We see that a composition mA⊕B is able to perform arithmetic over held-out keys.
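To make the toy task concrete, here is a small sketch of what "arithmetic over keys" asks for: look up each key, then evaluate the expression. It only illustrates the task itself, not how the composed model solves it. The dictionary `kv_pairs` and the helpers `resolve` and `evaluate` are hypothetical, and the value `x8 = 14` is an assumption chosen so that the post's example x1+x8*xn comes out to 38.

```python
# Key-value pairs from the post; x8 = 14 is assumed so that x1 + x8 * xn = 38.
kv_pairs = {"x1": 10, "x2": 7, "x8": 14, "xn": 2}

def resolve(token, pairs):
    """Return the stored value for a key, or parse the token as an integer."""
    token = token.strip()
    return pairs[token] if token in pairs else int(token)

def evaluate(expression, pairs):
    """Evaluate an expression built from keys, '+' and '*' (no parentheses)."""
    total = 0
    for term in expression.split("+"):       # '+' binds loosest
        product = 1
        for factor in term.split("*"):       # '*' binds tighter than '+'
            product *= resolve(factor, pairs)
        total += product
    return total

print(evaluate("x1 + x8 * xn", kv_pairs))  # 10 + 14 * 2 = 38
```

The lookup step corresponds to what mA has memorized and the arithmetic step to what mB is good at; the composed model has to do both at once.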
  • user avatar
    Replying to @rach_it_
    Coding: We compose an LM trained on the entire set of open-source GitHub code w/ an LLM where code is under-represented in its training data. We see significant gains across all tasks: Code explanation (CodeXGLUE), completion (HumanEval), and generation (MBPP). Again, unlike FT.
    Results of CALM for coding evaluations. Evaluations for code generation and understanding across three tasks: Code Completion (CC), Text-to-Code (T2C), and Code-to-Text (C2T). Augmenting code understanding to mB using mA significantly improves performance across all datasets. mB^Code represents a skyline where mB is further pretrained on D_Code, which shows catastrophic forgetting of the text generation task.