Amanda Bertsch (@abertsch72) / X

Amanda Bertsch

@abertsch72

PhD student @LTIatCMU / @SCSatCMU / student researcher @allen_ai, researching long context + decoding | she/her | @ abertsch on bsky or by email (cs.cmu.edu)

cs.cmu.edu/~abertsch/

Joined August 2014

Pinned
Amanda Bertsch
@abertsch72
Nov 7, 2025
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
Amanda Bertsch
@abertsch72
May 4, 2023
What if we could run Transformer models without worrying about context length? With our new work Unlimiformer, you can jailbreak your current models to use unlimited length inputs! Preprint: arxiv.org/abs/2305.01625 Thread 🧵 (1/6)
arxiv.org
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a...
Amanda Bertsch
@abertsch72
Oct 3, 2023
What do decoding methods including self-consistency, output ensembling, and range voting have in common? They’re all variants of Minimum Bayes Risk (MBR) decoding! This useful and easy-to-apply method generalizes many modern generation techniques! arxiv.org/abs/2310.01387 👇🧵
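Below is a minimal sketch of MBR selection. It assumes a finite list of sampled candidates that serve as their own pseudo-references, plus a caller-supplied utility function. With an exact-match utility over final answers, MBR reduces to majority voting, which is how self-consistency picks an answer. The names `mbr_select` and `exact_match` are hypothetical and do not come from the paper's code.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def mbr_select(candidates: Sequence[T], utility: Callable[[T, T], float]) -> T:
    """Pick the candidate with the highest total utility against the other
    candidates, which stand in for samples from the model's output distribution."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best_index, best_score = 0, float("-inf")
    for i, hyp in enumerate(candidates):
        # Expected utility up to a constant factor: sum over all *other* samples.
        score = sum(utility(hyp, ref) for j, ref in enumerate(candidates) if j != i)
        if score > best_score:  # strict '>' keeps the earliest candidate on ties
            best_index, best_score = i, score
    return candidates[best_index]


def exact_match(a: str, b: str) -> float:
    """Self-consistency's implicit utility: 1 if two final answers agree, else 0."""
    return 1.0 if a == b else 0.0


sampled_answers = ["42", "41", "42", "17", "42"]
print(mbr_select(sampled_answers, exact_match))  # prints 42
```

To get other variants, swap in a different utility. For example, a similarity metric between full generations moves the selection from answer voting toward output ensembling.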
Amanda Bertsch
@abertsch72
Sep 25, 2023
Now accepted to NeurIPS'23! Looking forward to talking to folks in New Orleans 🎉
Quoting @abertsch72, May 4, 2023: the Unlimiformer announcement thread (1/6), arxiv.org/abs/2305.01625
Amanda Bertsch
@abertsch72
May 4, 2023
Replying to @abertsch72
Unlimiformer is a retrieval-augmentation method for encoder-decoder models. It dynamically updates the context window at each decoding step, so that each head in each layer attends to its own top-k input tokens. Unlimiformer can be added to any pretrained encoder-decoder! (2/6)
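Below is a minimal NumPy sketch of the per-head top-k idea for a single query vector. It makes two simplifying assumptions: the search is a brute-force scan over every key, whereas Unlimiformer itself retrieves from a nearest-neighbor index, and `topk_attention` is a hypothetical name. Each query keeps only its k highest-scoring keys and applies softmax over those alone. When k equals the input length, the result matches ordinary dot-product attention.

```python
import numpy as np


def topk_attention(query: np.ndarray, keys: np.ndarray, values: np.ndarray, k: int):
    """Attend from one query to only its top-k keys.

    query: (d,), keys: (n, d), values: (n, d_v). Returns (output, chosen indices).
    """
    scores = keys @ query / np.sqrt(query.shape[-1])  # scaled dot products, shape (n,)
    k = min(k, scores.shape[0])
    # argpartition places the k largest scores (in no particular order) in the first k slots.
    top = np.argpartition(-scores, k - 1)[:k]
    s = scores[top]
    weights = np.exp(s - s.max())  # numerically stable softmax over retrieved keys only
    weights /= weights.sum()
    return weights @ values[top], top


rng = np.random.default_rng(0)
n, d = 1000, 64
keys, values = rng.normal(size=(n, d)), rng.normal(size=(n, d))
query = rng.normal(size=d)
out, chosen = topk_attention(query, keys, values, k=16)
print(out.shape, len(chosen))  # (64,) 16
```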
Amanda Bertsch
@abertsch72
May 4, 2023
Replying to @abertsch72
This is joint work with the wonderful @urialon1, @gneubig, and Matt Gormley! Preprint: arxiv.org/abs/2305.01625 Github: github.com/abertsch72/unl… (6/6)
Amanda Bertsch
@abertsch72
Oct 28, 2022
What if you could get the benefits of extractive summarization in the dialogue domain? Check out our #EMNLP2022 Findings paper "He Said, She Said: Style Transfer for Shifting the Perspective of Dialogues"! paper link: arxiv.org/abs/2210.15462 🧵👇 (1/9)
arxiv.org
He Said, She Said: Style Transfer for Shifting the Perspective of Dialogues
In this work, we define a new style transfer task: perspective shift, which reframes a dialogue from informal first person to a formal third person rephrasing of the text. This task requires...
Amanda Bertsch
@abertsch72
May 4, 2023
Replying to @abertsch72
At test time, Unlimiformer can summarize *books* of more than 300k tokens without truncation! And inference cost increases sub-linearly with input length, so summarizing 100k token inputs is only 1.5x slower than summarizing 1k token inputs. (5/6)
Amanda Bertsch
@abertsch72
Dec 12, 2023
I'm at NeurIPS this week! Would love to chat with folks interested in long contexts, fair evaluation, decoding, and/or philosophy of science. I'll also be presenting our paper Unlimiformer (below) in poster session 5: 10:45-12:45 Thursday. Check it out at board #524!
Quoting @abertsch72, May 4, 2023: the Unlimiformer announcement thread (1/6), arxiv.org/abs/2305.01625
Amanda Bertsch
@abertsch72
Dec 7, 2023
Excited to see new and old friends this week in Singapore! Happy to chat about long context reasoning, summarization, decoding, philosophy of science, or finding good vegetarian food at hawker centers :) My collaborators and I are presenting several papers, starting today! ⬇️
Amanda Bertsch
@abertsch72
May 4, 2023
Replying to @abertsch72
We reformulate attention for efficient retrieval using a single datastore, shared across all cross-attention heads in all decoder layers. This requires storing only a single vector per input token, allowing inputs of more than half a million tokens on a single GPU! (3/6)
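One way to see why a single shared datastore can serve every head is associativity. For a decoder state h_d and an encoder state h_e, the attention score (h_d W_q)·(h_e W_k) equals (h_d W_q W_k^T)·h_e. The per-head key projection can therefore be folded into the query, so only the raw encoder states need to be stored. Below is a minimal NumPy check of that identity. It assumes a single head with the standard query/key projection form and ignores bias terms; the variable names and dimensions are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 16, 4, 100

# One shared datastore: the raw encoder hidden states, one vector per input token.
h_enc = rng.normal(size=(n_tokens, d_model))
h_dec = rng.normal(size=d_model)  # current decoder hidden state

# Projections for a single cross-attention head (biases omitted in this sketch).
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))

# Standard scores: project every encoder state into this head's key space.
standard_scores = (h_enc @ W_k) @ (h_dec @ W_q)

# Reformulated: fold W_k into the query, then search the shared datastore directly.
folded_query = W_k @ (h_dec @ W_q)  # shape (d_model,)
reformulated_scores = h_enc @ folded_query

print(np.allclose(standard_scores, reformulated_scores))  # True
```

The encoder states are stored once, and each head only changes the query it sends to the index.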
Amanda Bertsch
@abertsch72
May 4, 2023
Replying to @abertsch72
Unlimiformer improves existing models without any fine-tuning, in cheap fine-tuning regimes, and with specialized training. Not only does it outperform other strong long-range transformers, but it can be applied on top of models such as Longformer for further improvements! (4/6)
Amanda Bertsch
@abertsch72
Nov 10, 2023
Replying to @nsaphra @enfleisig and @kchonyc
excited to read this, it looks really cool! you might also like our paper, which discusses some similar themes from a different methodological angle :) x.com/_sireesh/statu…
Amanda Bertsch
@abertsch72
May 12, 2023
Check out our survey on using human feedback for generation!
Patrick Fernandes
@psanfernandes
May 12, 2023
*Human feedback* was the necessary secret sauce in making #chatgpt so human-like. But what exactly is feedback? And how can we leverage it to improve our models? Check out our new survey on the use of (human) feedback in Natural Language Generation! arxiv.org/abs/2305.00955 1/16