Ofir Press (@OfirPress) / X

3,233 posts

Ofir Press

@OfirPress

I push the AI frontier by building tough benchmarks with amazing people. SWE-bench, SWE-agent, SciCode, AlgoTune. Postdoc @Princeton. PhD @nlpnoah @UW.

NYC

Joined June 2016

Pinned
Ofir Press
@OfirPress
May 5
1) Our team at Meta has a tough new coding benchmark challenging models to code entire programs, including ffmpeg and the PHP compiler, from scratch.
2) Top accuracy is 0%.
3) We will be making the benchmark harder.
John Yang
@jyangballin
May 5
How much of SQLite, FFmpeg, PHP compiler can LMs code from scratch? Given just an executable and no starter code or internet access. Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵
130K
Ofir Press
@OfirPress
Nov 19, 2023
My entire feed is OpenAI employees retweeting Sam with the heart emoji. If the board doesn't let him back, he's going to start a new company and take a large chunk of those people with him. If the board does let him back, Ilya is going to leave and start a competitor. (1/2)
760K
Ofir Press
@OfirPress
Oct 4, 2022
We've found a new way to prompt language models that improves their ability to answer complex questions. Our Self-ask prompt first has the model ask and answer simpler subquestions. This structure makes it easy to integrate Google Search into an LM. Watch our demo with GPT-3 🧵⬇️
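A minimal sketch of the loop the post describes: the model proposes a subquestion, a search tool answers it, and the answer is fed back into the prompt. The `lm` and `search` callables are hypothetical stand-ins, the marker strings follow the Self-ask prompt format as best recalled, and the few-shot exemplars a real prompt would start with are omitted.

```python
FOLLOW_UP = "Follow up:"
FINAL = "So the final answer is:"

def self_ask(question, lm, search, max_steps=5):
    """Answer `question` by alternating LM subquestions with search answers.

    `lm(prompt)` returns a text continuation; `search(query)` returns a short
    answer string. Both are hypothetical callables supplied by the caller.
    """
    prompt = f"Question: {question}\nAre follow up questions needed here:"
    for _ in range(max_steps):
        out = lm(prompt)
        prompt += out
        lines = out.strip().splitlines()
        last = lines[-1] if lines else ""
        if last.startswith(FINAL):
            return last[len(FINAL):].strip()
        if last.startswith(FOLLOW_UP):
            # Let the search tool, not the LM, answer the subquestion.
            sub_question = last[len(FOLLOW_UP):].strip()
            prompt += f"\nIntermediate answer: {search(sub_question)}\n"
        else:
            break  # Unexpected format: stop rather than loop blindly.
    return None

# Scripted stubs so the sketch runs without a model or a search API.
_scripted = iter([
    " Yes.\nFollow up: Where was Adele born?",
    "Follow up: What is the calling code of England?",
    "So the final answer is: +44",
])
_facts = {
    "Where was Adele born?": "England",
    "What is the calling code of England?": "+44",
}
answer = self_ask(
    "What is the calling code of the birthplace of Adele?",
    lm=lambda prompt: next(_scripted),
    search=lambda query: _facts[query],
)
```

Routing each subquestion through `search` is what makes the structure easy to pair with a search engine: the model only has to decompose the question and combine the intermediate answers.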
Ofir Press
@OfirPress
Dec 6, 2023
There's no moat. You just need $400M and a bunch of good engineers and you can build your own GPT-4. Now we gotta get someone to build an open version.
221K
Ofir Press
@OfirPress
Feb 19, 2024
I just discovered regional prompting for image generation and I'm so impressed (wait till the end). From: reddit.com/r/StableDiffus…
87K
Ofir Press
@OfirPress
Sep 20, 2023
New (1h32m) video lecture: Transformers From Scratch: Building 5 Language Models at Increasing Complexity Levels youtu.be/s09NPN1BSdE It's an intuitive way to learn what every component of a modern transformer LM does and why they're there.
78K
Ofir Press
@OfirPress
Feb 19, 2024
Cool new idea from DeepMind: They evaluate LMs by giving them a piece of code, having them describe it, and then asking the LM to rewrite that code given only the description. The metric is the similarity between the original code and the rewritten code. semanticscholar.org/paper/Unsuperv…
57K
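The describe-then-rewrite evaluation in the post above can be sketched as a round trip. This is only an illustration: `describe` and `rewrite` are hypothetical callables standing in for LM calls, and the character-level `difflib` ratio is an assumed similarity measure, not necessarily the metric used in the paper.

```python
from difflib import SequenceMatcher

def round_trip_score(code, describe, rewrite):
    """Score an LM by how well code survives a describe -> rewrite round trip.

    `describe(code)` returns a natural-language description and
    `rewrite(description)` returns regenerated code; both are hypothetical
    stand-ins for LM calls. The score is a similarity in [0, 1].
    """
    description = describe(code)
    regenerated = rewrite(description)  # The rewriter never sees `code`.
    return SequenceMatcher(None, code, regenerated).ratio()

original = "def add(a, b):\n    return a + b\n"

# Stub "models": one reproduces the code exactly, one drifts from it.
perfect = round_trip_score(
    original,
    describe=lambda code: "a function that adds two numbers",
    rewrite=lambda description: original,
)
drifted = round_trip_score(
    original,
    describe=lambda code: "a function that adds two numbers",
    rewrite=lambda description: "def total(x, y):\n    return x + y\n",
)
```

A surface-level similarity penalizes harmless renames, so a real evaluation would likely want a measure that is less sensitive to identifier choice.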
Ofir Press
@OfirPress
Nov 20, 2023
Can someone fix this table please? Satya should be at the top.
57K
Ofir Press
@OfirPress
Nov 19, 2023
Replying to @OfirPress
I'm sure this chaos and uncertainty sucks for all of those involved but if the world gets 2 strong competing LMing companies out of what used to be OpenAI, we'll all win... Especially if the Sam-led one ends up actually being a bit more open. (2/2)
67K
Ofir Press
@OfirPress
Aug 25, 2021
Since Transformer LMs were invented, we’ve wanted them to be able to read longer inputs during inference than they saw during training. Our Attention with Linear Biases enables this, in very few lines of code, without requiring extra params or runtime ofir.io/train_short_te… 🧵⬇
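A minimal sketch of the linear bias, following the ALiBi paper as best recalled: each head gets a fixed slope, and the attention score between a query at position i and a key at position j is penalized in proportion to the distance i - j. The function names are illustrative, the slope formula assumes the number of heads is a power of two, and plain Python lists stand in for tensors.

```python
def alibi_slopes(n_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/n_heads).

    Assumes n_heads is a power of two.
    """
    start = 2 ** (-8 / n_heads)
    return [start ** (h + 1) for h in range(n_heads)]

def alibi_bias(n_heads, seq_len):
    """bias[h][i][j], to be added to attention scores before the softmax.

    For keys at or before the query (j <= i) the bias is -slope_h * (i - j);
    future positions (j > i) are masked with -inf, as in causal attention.
    """
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(n_heads)
    ]

bias = alibi_bias(n_heads=8, seq_len=4)
```

The slopes are fixed rather than learned, which is why the method adds no parameters, and the same formula applies unchanged to sequences longer than those seen in training.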
Ofir Press
@OfirPress
Oct 11, 2022
As language models grow in size they know more, but do they get better at reasoning? To test GPT-3, we generated lots of questions such as "What is the calling code of the birthplace of Adele?". We show that as GPT size grows, it does not improve its compositional abilities🧵⬇️
Ofir Press
@OfirPress
Dec 30, 2020
Everyone thinks that you have to increase the input length of language models to improve their performance. Our new Shortformer model shows that by *shortening* inputs, performance improves while speed and memory efficiency go up. ⬇(1/n) ofir.io/shortformer.pdf (code below)
Ofir Press
@OfirPress
Apr 5, 2025
Transformers can work without using positional embeddings at all. Llama 4 uses positional embs for local attn but not globally. Our paper from 2022 shows why this works: the causal mask allows transformers to infer positions. arxiv.org/pdf/2203.16634
35K
Ofir Press
@OfirPress
Jun 9, 2023
Reddit launched in 2005. StackOverflow in 2008. Both are shutting off access to their data because they're annoyed that they aren't getting paid when it gets used for LM training. Silly move: the value of future data is minuscule given that we already have data from 2008-now.
133K