Olga Golovneva (@OlgaNLP) / X

153 posts

Doing research at Meta AI

Joined October 2022

Pinned
Olga Golovneva
@OlgaNLP
Jan 30
While others claim to do RL in pretraining, we actually did it :) To fix safety, factuality, and hallucinations at *pretraining*, we ensure the model is trained to generate only high-quality, safe tokens, even for unsafe/corrupted prompts.
Jason Weston
@jaseweston
Jan 30
📈Self-Improving Pretraining 📈 ✍️: arxiv.org/abs/2601.21343 Reinvents pretraining: no more next token prediction! - Uses existing LM from last self-improvement iteration to give rewards to pretrain new model on *sequences* - Large gains in factuality, safety & quality 🧵1/5
15K
Olga Golovneva
@OlgaNLP
May 30, 2024
Context matters! In our new work we propose context-aware positional encodings. More details in the linked paper!
Jason Weston
@jaseweston
May 30, 2024
🚨 Contextual Position Encoding (CoPE) 🚨 Context matters! CoPE is a new positional encoding method for transformers that takes into account *context*. - Can "count" distances per head dependent on need, e.g. i-th sentence or paragraph, words, verbs, etc. Not just tokens. -
120K
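The "counting" idea in the quoted post can be illustrated with a toy sketch. It assumes every token carries a gate value in [0, 1] and that the position of a key relative to a query is the sum of the gates between them. Here the gates are hard-coded to fire on sentence-ending tokens, whereas in a real model they would be computed from the context. The helper name `contextual_positions` is hypothetical and this is not the released CoPE code.

```python
from typing import List


def contextual_positions(gates: List[float], query_index: int) -> List[float]:
    """Position of each key j <= query_index, measured as the sum of
    gate values from key j up to and including the query token."""
    positions = []
    running = 0.0
    # Walk backwards from the query, accumulating gates as we go.
    for j in range(query_index, -1, -1):
        running += gates[j]
        positions.append(running)
    positions.reverse()
    return positions


# Assumption for illustration: a gate fires (1.0) only on sentence-ending tokens.
tokens = ["A", "b", ".", "C", "d", ".", "E"]
gates = [1.0 if token == "." else 0.0 for token in tokens]

# Positions seen from the last token: they count sentences, not tokens.
positions = contextual_positions(gates, query_index=len(tokens) - 1)
print(list(zip(tokens, positions)))
```

With these gates, every token in the same sentence shares one position value, so a head can address "the previous sentence" regardless of how many tokens it contains.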
Olga Golovneva
@OlgaNLP
Oct 6, 2024
Our team at Meta AI (former FAIR Labs) is hiring 2025 research interns and postdocs. Research areas cover LLM reasoning, alignment, memory, and architectures. DM me if interested in chatting during @COLM_conf!
47K
Olga Golovneva
@OlgaNLP
Apr 2, 2025
We have been cooking! 👨‍🍳 🧵(1/6)
Jason Weston
@jaseweston
Apr 2, 2025
🚨Multi-Token Attention🚨 📝: arxiv.org/abs/2504.00927 Attention is critical for LLMs, but its weights are computed by single query & key vectors, limiting capability. MTA combines query, key & head operations over multiple tokens, improving performance in terms of PPL, std
53K
Olga Golovneva
@OlgaNLP
Sep 17, 2024
Today we released the code for Contextual Position Encodings. Please check it out in our GitHub repo: github.com/facebookresear… #opensource
Jason Weston
@jaseweston
May 30, 2024
🚨 Contextual Position Encoding (CoPE) 🚨 Context matters! CoPE is a new positional encoding method for transformers that takes into account *context*. - Can "count" distances per head dependent on need, e.g. i-th sentence or paragraph, words, verbs, etc. Not just tokens. -
8.7K
Olga Golovneva
@OlgaNLP
Apr 27, 2025
Thanks to all the organizers, it was a pleasure to attend and learn from great speakers!
Arthur Douillard
@Ar_Douillard
Apr 27, 2025
Replying to @Ar_Douillard @ahmetustun89 and 3 others
Collaborative and modular training by @OlgaNLP
4.1K
Olga Golovneva
@OlgaNLP
Jul 10, 2024
Happy to share that I have 2 papers accepted at COLM: Nursing the reversal curse, and Branch-Train-Mix! See you there in October! #COLM #COLM2024
6.9K
Olga Golovneva
@OlgaNLP
Apr 2, 2025
Replying to @OlgaNLP
MTA Recipe: - The high-level goal is to make it possible to use the similarities of multiple vector pairs to determine where attention must focus. - We add convolutions for keys, queries, and attention heads to allow conditioning on neighboring tokens! 🧵(3/6)
1.1K
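A toy sketch of the recipe above, under stated assumptions: the attention logits of a single head are mixed by a small convolution over neighboring query and key positions before the softmax, so one attention weight can depend on several query-key pairs. The kernel shape, the causal handling, and all function names here are illustrative assumptions, not the paper's implementation, and the convolution across heads is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention_logits(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product logits for a single head."""
    scale = math.sqrt(len(queries[0]))
    return [[sum(q * k for q, k in zip(qv, kv)) / scale for kv in keys]
            for qv in queries]


def key_query_conv(logits: Matrix, kernel: Matrix) -> Matrix:
    """Mix each logit with logits of preceding queries and keys.

    kernel[a][b] weights the logit at (i - a, j - b). Entries outside the
    matrix or above the causal diagonal (key after query) are skipped.
    """
    n = len(logits)
    mixed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # only causal positions are filled in
            total = 0.0
            for a, kernel_row in enumerate(kernel):
                for b, weight in enumerate(kernel_row):
                    qi, kj = i - a, j - b
                    if qi >= 0 and 0 <= kj <= qi:
                        total += weight * logits[qi][kj]
            mixed[i][j] = total
    return mixed


def causal_softmax(logits: Matrix) -> Matrix:
    """Row-wise softmax over visible keys; future keys get weight 0."""
    weights = []
    for i, row in enumerate(logits):
        visible = row[: i + 1]
        peak = max(visible)
        exps = [math.exp(x - peak) for x in visible]
        norm = sum(exps)
        weights.append([e / norm for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = attention_logits(queries, keys)
kernel = [[0.5, 0.25], [0.25, 0.0]]  # hypothetical 2x2 mixing weights
weights = causal_softmax(key_query_conv(logits, kernel))
print(weights)
```

Passing the 1x1 kernel `[[1.0]]` leaves the causal logits unchanged, which recovers ordinary single-pair attention.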
Olga Golovneva
@OlgaNLP
Apr 2, 2025
Replying to @OlgaNLP
Motivation: soft attention looks at two tokens at a time to weigh their importance. But often it’s not enough! Suppose you are reading a history book, and you want to find what happened in Rome in 1417. You need to match both city and date *mentioned together*. 🧵(2/6)
1.1K
Olga Golovneva
@OlgaNLP
Aug 6, 2024
It is very challenging to get the right recipe for synthetic data, but the results speak for themselves.
Jason Weston
@jaseweston
Aug 6, 2024
🚨New paper!🚨 Self-Taught Evaluators - Llama 3-70B trained w/ synthetic data *only* - Iteratively finds better judgments in training - Best LLM-as-a-Judge model on RewardBench (88.3, 88.7 w/ maj vote) - Outperforms bigger models or human labels arxiv.org/abs/2408.02666 🧵(1/4)
2K
Olga Golovneva
@OlgaNLP
May 2, 2024
Our large LM (1.4B) finally finished training, and we have updated the paper with more exciting results! TL;DR: mixing training data with random segment reversal not only resolves the reversal curse, but also improves performance on a variety of benchmarks w.r.t. data-matched models!
Olga Golovneva
@OlgaNLP
Mar 21, 2024
New paper! We propose a simple yet effective data augmentation method for training LLMs that improves model performance and resolves the reversal curse
15K
Olga Golovneva
@OlgaNLP
Mar 21, 2024
New paper! We propose a simple yet effective data augmentation method for training LLMs that improves model performance and resolves the reversal curse
Jason Weston
@jaseweston
Mar 21, 2024
🚨 Reverse Training to Nurse the Reversal Curse🚨 LLMs fail on “B is A” if only trained on "A is B". - Reverse training doubles training tokens by reversing strings - Outperforms data-matched standard baselines - Fixes issues on reversal tasks arxiv.org/pdf/2403.13799… 🧵(1/6)
3.5K
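The reverse-training idea in the quoted post can be sketched as follows. This is a minimal illustration that assumes word-level reversal on whitespace-separated tokens. The function names `reverse_words` and `augment_with_reversals` are hypothetical, and the paper's actual reversal variants (such as the random segment reversal mentioned in the May 2 post) may differ.

```python
from typing import List


def reverse_words(text: str) -> str:
    """Reverse the order of whitespace-separated words in a string."""
    return " ".join(reversed(text.split()))


def augment_with_reversals(corpus: List[str]) -> List[str]:
    """Return the original examples followed by their reversed copies.

    The output has twice as many examples as the input, mirroring the
    "doubles training tokens" description in the post.
    """
    return corpus + [reverse_words(example) for example in corpus]


corpus = ["Paris is the capital of France"]
augmented = augment_with_reversals(corpus)
print(augmented)
```

A model trained on both copies sees "France" before "Paris" as well as after it, which is the intuition behind fixing "B is A" queries.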
Olga Golovneva
@OlgaNLP
Apr 2, 2025
Replying to @OlgaNLP
Finally, we look at convolution kernels, many interesting patterns! This one for example is responsible for matching sequences. But what else did the model encode in kernels during pretraining? Check the paper for more & thanks for paying attention to all these tokens! 🙏 🧵(6/6)
1.1K
Olga Golovneva
@OlgaNLP
Apr 2, 2025
Replying to @OlgaNLP
We train an LLM and show that MTA improves… pretty much everywhere, but especially on long-range dependency tasks, where regular attention struggles. 🧵(5/6)
1.3K