Xiang Lisa Li
@XiangLisaLi2
PhD student at Stanford
Joined May 2019
Posts
  •
    arxiv.org/abs/2205.14217 We propose Diffusion-LM, a non-autoregressive language model based on continuous diffusions. It enables complex controllable generation. We can steer the LM to generate text with a desired syntactic structure ([S [NP...VP…]]) and semantic content (name=Coupa).
  •
    arxiv.org/abs/2210.15097 We propose contrastive decoding (CD), a more reliable search objective for text generation by contrasting LMs of different sizes. CD takes a large LM (expert LM, e.g. OPT-13b) and a small LM (amateur LM, e.g. OPT-125m) and maximizes their logprob difference.
  •
    arxiv.org/abs/2407.08351 LM performance on existing benchmarks is highly correlated. How do we build novel benchmarks that reveal previously unknown trends? We propose AutoBencher: it casts benchmark creation as an optimization problem with a novelty term in the objective.
  •
    Can we get language models to exhibit certain behaviors? We train investigator models to elicit target behaviors from LMs, which helps us proactively detect harmful responses and hallucination!
    Excited to finally share what I’ve been up to at @TransluceAI: training Investigator Agents to elicit behaviors in LMs (including harmful responses and hallucinations)!
  •
    Replying to @adveisner
    I am so happy and honored to be working with you. Thanks for introducing the field to me when I was a sophomore, and for letting me be part of Argo. Huge appreciation for all the research advising, enlightening discussions, writing tips, presentation tips, etc.
  •
    Replying to @XiangLisaLi2
    Exciting joint work with @jwthickstun @__ishaan @percyliang @tatsu_hashimoto 🙂 Code available at github.com/XiangLi1999/Di… Diffusion-LM shows strong performance in controllable generation, but it remains an open question whether it could match autoregressive LMs in PPL and speed.
  •
    I enjoyed chatting with @pdasigi and @anmarasovic about my paper with @percyliang on prefix-tuning. Thanks for the invitation and I am very grateful to have this opportunity to talk about my work! 😀
    #nlphighlights 126: We invited Lisa Li (@XiangLisaLi2) to talk about Prefix-tuning, her recently proposed efficient alternative to finetuning. @anmarasovic and I had a great time discussing this interesting work with Lisa. soundcloud.com/nlp-highlights…
  •
    Replying to @XiangLisaLi2
    Continuous diffusions have been successful for images (DDPM, DALL-E2), but text data is hard due to its discreteness. We add embedding and rounding to the standard diffusion model through an end-to-end objective for learning embeddings and a clamping technique for rounding.
  •
    Replying to @XiangLisaLi2
    Contrastive decoding is inspired by the observation that the failures of larger LMs are even more prevalent in smaller LMs (e.g., repetition, incoherence), and that this difference signals exactly which texts should be avoided/preferred.
  •
    Replying to @XiangLisaLi2
    We consider 6 control tasks (e.g., semantic content, syntactic structures). Our method yields 2x the success rate of previous plug-and-play methods, often matches the fine-tuning oracle, and can even compose multiple controls at once.
  •
    Replying to @XiangLisaLi2
    CD requires zero training, and produces higher quality text than decoding from the larger LM alone. It also generalizes across model types (OPT, GPT2) and scales (1.5b, 6.7b, 13b) and significantly outperforms four strong decoding algorithms in automatic and human evaluations.
  •
    Replying to @XiangLisaLi2
    Exciting joint work with evanliu, @percyliang @tatsu_hashimoto 🙂 Code available at
  •
    Replying to @XiangLisaLi2
    We use AutoBencher to find knowledge gaps in LLMs. It proposes evaluation topics; constructs high-quality QA datasets using additional information (e.g. retrieval and tools); and computes novelty scores of the datasets to inform the proposal of new evaluation topics.
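
The contrastive decoding posts above describe scoring text by the logprob difference between an expert LM and an amateur LM. A minimal sketch of that idea for a single next-token choice is below. It is not the released implementation: the function names `contrastive_scores` and `pick_token` are hypothetical, the probabilities are made-up toy numbers rather than model outputs, and it assumes both models assign nonzero probability to every candidate token.

```python
import math


def contrastive_scores(expert_probs, amateur_probs):
    """Score each candidate token by log p_expert - log p_amateur.

    Both arguments are dicts mapping token -> probability (toy stand-ins
    for the next-token distributions of a large and a small LM).
    """
    return {
        tok: math.log(expert_probs[tok]) - math.log(amateur_probs[tok])
        for tok in expert_probs
    }


def pick_token(expert_probs, amateur_probs):
    """Return the token with the largest expert-minus-amateur logprob gap."""
    scores = contrastive_scores(expert_probs, amateur_probs)
    return max(scores, key=scores.get)


# Hypothetical next-token distributions (assumed values for illustration).
expert = {"the": 0.5, "Paris": 0.4, "uh": 0.1}
amateur = {"the": 0.6, "Paris": 0.1, "uh": 0.3}

print(pick_token(expert, amateur))
```

Here "the" is the expert's most likely token, but the amateur likes it even more, so its score is negative; "Paris" is the token the expert prefers far more than the amateur does, so it gets the highest score.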