Can LLMs accurately aggregate information over long, information-dense texts? Not yet…
We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
What if we could run Transformer models without worrying about context length?
With our new work Unlimiformer, you can jailbreak your current models to use unlimited length inputs!
Preprint: arxiv.org/abs/2305.01625
Thread 🧵 (1/6)
What do decoding methods including self-consistency, output ensembling, and range voting have in common? They’re all variants of Minimum Bayes Risk (MBR) decoding!
This useful and easy-to-apply method generalizes many modern generation techniques!
arxiv.org/abs/2310.01387 👇🧵
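To make the MBR connection concrete, here is a minimal sketch in Python. It assumes the model's samples double as both the hypotheses and the pseudo-references, and that every sample is weighted equally. The function names (`mbr_decode`, `exact_match`, `token_overlap`) are illustrative and not taken from the paper. With an exact-match utility, the rule reduces to majority voting over sampled answers, which is how self-consistency picks its output.

```python
from collections import Counter


def mbr_decode(candidates, utility):
    """Pick the candidate with the highest average utility against all samples.

    Assumption: the sampled candidates also serve as the pseudo-references,
    each weighted equally (a simple Monte Carlo estimate of expected utility).
    """
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)

    # max() returns the first candidate among ties.
    return max(candidates, key=expected_utility)


def exact_match(hyp, ref):
    """Utility of 1 when two outputs are identical, else 0."""
    return 1.0 if hyp == ref else 0.0


def token_overlap(hyp, ref):
    """Bag-of-words F1 between two strings (an illustrative soft utility)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


# Self-consistency style: sampled final answers, exact-match utility.
answers = ["42", "41", "42", "7", "42"]
best_answer = mbr_decode(answers, exact_match)

# Free-form generation: the same rule with a softer utility.
samples = ["the cat sat on the mat", "the cat sat", "a cat sat on the mat"]
best_sample = mbr_decode(samples, token_overlap)
```

Swapping the utility function is the only change between the two calls, which is the sense in which one decoding rule covers several techniques.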
Unlimiformer is a retrieval-augmentation method for encoder-decoder models. It dynamically updates the context window at each decoding step, so that each head in each layer attends to its own top-k input tokens. Unlimiformer can be added to any pretrained encoder-decoder! (2/6)
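A rough sketch of the top-k idea for one attention head at one decoding step, in plain Python. This is an assumption-laden illustration rather than the released implementation: it scores every input token by brute force where a real system would use a nearest-neighbor index, and the name `topk_attention` is hypothetical.

```python
import math


def topk_attention(query, keys, values, k):
    """Attend only to the k input tokens whose keys score highest for this query.

    Assumption: brute-force dot products stand in for a k-nearest-neighbor
    index lookup over the encoded input.
    """
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k best-scoring tokens, highest score first.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax restricted to the retrieved tokens (max subtracted for stability).
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)

    dim = len(values[0])
    output = [
        sum((w / total) * values[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return top, output


# Four input tokens with 2-d keys and 1-d values.
keys = [[1.0, 0.0], [0.0, 1.0], [10.0, 0.0], [0.0, -1.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
top_indices, attended = topk_attention([1.0, 0.0], keys, values, k=2)
```

Each head would run this with its own query, so different heads can end up attending to different parts of a long input.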
What if you could get the benefits of extractive summarization in the dialogue domain? Check out our #EMNLP2022 Findings paper "He Said, She Said: Style Transfer for Shifting the Perspectives of Dialogues"!
paper link: arxiv.org/abs/2210.15462
🧵👇 (1/9)
At test time, Unlimiformer can summarize *books* of more than 300k tokens without truncation!
And inference cost increases sub-linearly with input length, so summarizing 100k token inputs is only 1.5x slower than summarizing 1k token inputs. (5/6)
I'm at NeurIPS this week! Would love to chat with folks interested in long contexts, fair evaluation, decoding, and/or philosophy of science.
I'll also be presenting our paper Unlimiformer (below) in poster session 5: 10:45-12:45 Thursday. Check it out at board #524!
What if we could run Transformer models without worrying about context length?
With our new work Unlimiformer, you can jailbreak your current models to use unlimited length inputs!
Preprint: arxiv.org/abs/2305.01625
Thread 🧵 (1/6)
Excited to see new and old friends this week in Singapore! Happy to chat about long context reasoning, summarization, decoding, philosophy of science, or finding good vegetarian food at hawker centers :)
My collaborators and I are presenting several papers, starting today! ⬇️
We reformulate attention for efficient retrieval using a single datastore, shared across all cross-attention heads in all decoder layers. This requires storing only a single vector per input token, allowing inputs of more than half a million tokens on a single GPU! (3/6)
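One way to see why a single stored vector per token can be enough: the attention score (h_d W_q) · (h_e W_k) equals (h_d W_q W_k^T) · h_e, so the head-specific key projection can be folded into the query while the datastore keeps only the encoder state h_e. The Python sketch below checks that identity on toy numbers. It assumes this is the form the reformulation takes, and the helper names and matrices are made up for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def vec_mat(vec, mat):
    """Row vector times matrix (mat has len(vec) rows)."""
    return [
        sum(vec[i] * mat[i][j] for i in range(len(vec)))
        for j in range(len(mat[0]))
    ]


def transpose(mat):
    return [list(row) for row in zip(*mat)]


def standard_score(h_d, h_e, W_q, W_k):
    """Usual attention score: project the query and the key, then dot them."""
    return dot(vec_mat(h_d, W_q), vec_mat(h_e, W_k))


def folded_query(h_d, W_q, W_k):
    """Fold the key projection into the query: h_d W_q W_k^T."""
    return vec_mat(vec_mat(h_d, W_q), transpose(W_k))


# Toy sizes: hidden size 2, head size 3 (values chosen arbitrarily).
h_d = [1.0, 2.0]          # decoder hidden state
h_e = [0.5, -1.0]         # encoder hidden state for one input token
W_q = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
W_k = [[2.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

score_standard = standard_score(h_d, h_e, W_q, W_k)
score_folded = dot(folded_query(h_d, W_q, W_k), h_e)
```

Since `h_e` does not depend on the head or layer, the same stored vectors can be searched with a different folded query for every head.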
Unlimiformer improves existing models without any fine-tuning, in cheap fine-tuning regimes, and with specialized training. Not only does it outperform other strong long-range transformers, but it can also be applied on top of models such as Longformer for further improvements! (4/6)
excited to read this, it looks really cool! you might also like our paper, which discusses some similar themes from a different methodological angle :)
x.com/_sireesh/statu…
*Human feedback* was the necessary secret sauce in making #chatgpt so human-like
But what exactly is feedback? And how can we leverage it to improve our models?
Check out our new survey on the use of (human) feedback in Natural Language Generation!
arxiv.org/abs/2305.00955
1/16