Jacob Buckman (@jacobmbuckman) / X

Jacob Buckman

2,436 posts

@jacobmbuckman

Formerly @jhuclsp, @GoogleAI, @SCSatCMU, @MilaMontreal, founder @manifest__ai.

Joined December 2016

Jacob Buckman
@jacobmbuckman
Oct 29, 2025
The end of the transformer era marches slowly closer: we trained a completely attention-free foundation model at the 14B scale for only $4,000. The performance matches other models of similar scale, including transformers and hybrid models.
Manifest AI
@manifest__ai
Oct 29, 2025
Today we are releasing Brumby-14B-Base, the strongest attention-free base model around. manifestai.com/articles/relea…
Readers added context they thought people might want to know
The post implies from-scratch training of an attention-free model for $4,000, but Brumby-14B repurposes pretrained Qwen3-14B weights via "power retention layers" for rapid adaptation. The author agreed that they should have used different wording. manifestai.com/articles/relea… x.com/jacobmbuckman/…
Jacob Buckman
@jacobmbuckman
Jan 10, 2023
Are you a PhD student struggling to get a job or internship? Jealous of the success of your more-cited peers? More concerned with your career than doing good science? Here is a thread of eight invaluable techniques to "improve" your publication and citation metrics. vv 🧵🧵🧵 vv
Jacob Buckman
@jacobmbuckman
Sep 23, 2025
Transformers are broken. Today, Manifest AI is releasing Power Retention, an open-source architecture to replace them. More below 🧵:
Manifest AI
@manifest__ai
Sep 23, 2025
Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. manifestai.com/articles/relea…
Jacob Buckman
@jacobmbuckman
May 30, 2021
New blog post, "Please Commit More Blatant Academic Fraud": jacobbuckman.com/2021-05-29-ple… Yes, I'm serious. Blatant academic fraud might be our best shot at developing the future of artificial intelligence.
jacobbuckman.com
Please Commit More Blatant Academic Fraud
This week, I was thrilled to read about the first well-documented case of explicit academic fraud in the artificial intelligence community. I hope that this is the beginning of a trend, and that...
Jacob Buckman
@jacobmbuckman
Jun 26, 2018
New blog post on understanding TensorFlow abstractions! jacobbuckman.com/post/tensorflo…
Jacob Buckman
@jacobmbuckman
Dec 27, 2021
I'm trying to write a good answer for "What is deep learning?" -- an answer that is specific but also complete. What's something that obviously deserves to be considered deep (supervised) learning, but doesn't fit this definition?
Jacob Buckman
@jacobmbuckman
Jan 18, 2020
New blog post with @carlesgelada -- "A Sober Look at Bayesian Neural Networks": jacobbuckman.com/2020-01-17-a-s… Without a good prior, Bayesian uncertainties are meaningless. We argue that BNN priors are likely quite poor, and concretely characterize one specific failure mode.
jacobbuckman.com
A Sober Look at Bayesian Neural Networks
by Carles Gelada and Jacob Buckman WARNING: This is an old version of this blogpost, and if you are a Bayesian, it might make you angry. Click here for an updated post with the same content. Context:...
Jacob Buckman
@jacobmbuckman
Jun 12, 2021
Paper writing tip: no matter the topic, always remember to cite (1) a random paper by Hinton from the 80s and (2) capsule networks, both within the first two paragraphs. Reviewers will assume that the paper is by Geoff Hinton and give you a free accept!
Jacob Buckman
@jacobmbuckman
May 9, 2021
The three worst ideas in deep learning are batchnorm, epochs, and overfitting
Jacob Buckman
@jacobmbuckman
Apr 9, 2023
This thread is, unfortunately, a pretty clear indication that he does *not* properly understand some of the concepts underlying DL. While "GPTs are not GANs" is true in the most literal sense, his description of the implications of this is totally off. 1/n
Jacob Buckman
@jacobmbuckman
Sep 24, 2019
New blog post: Automation via RL jacobbuckman.com/2019-09-23-aut… RL research should be oriented around the eventual goal of solving real-world tasks with less effort. To progress towards this goal, we need to change how we motivate and evaluate RL algorithms.
Jacob Buckman
@jacobmbuckman
Jun 15, 2022
New blog post, "An Actually-Good Argument Against Naive AI Scaling": jacobbuckman.com/2022-06-14-an-… A response to @slatestarcodex and @GaryMarcus, in which I point out that they are both wrong. The current paradigm is certainly limited, but not for the reasons that Gary claims.
jacobbuckman.com
An Actually-Good Argument Against Naive AI Scaling
The past few days have seen a back-and-forth between Scott Alexander and Gary Marcus on the topic of AI scaling (post1, post2, post3, post4). Specifically, the debate is whether scaled-up language...
Jacob Buckman
@jacobmbuckman
Jan 8, 2024
Anyone who has trained a Transformer has viscerally felt its O(T^2) cost. It is not tractable to train Transformers end-to-end on long contexts. Here's a writeup of the research direction I believe is most likely to solve this: linear transformers. manifestai.com/blogposts/fast… 1/7
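The tweet does not spell out the mechanism. As a minimal sketch, assuming the standard causal linear-attention formulation (softmax replaced by a positive feature map φ, here elu(x) + 1 as in Katharopoulos et al., 2020), the same outputs can be computed either by an O(T²) pass over all pairs of positions or by an O(T) recurrent pass that carries a running state S = Σ φ(k_j) v_jᵀ and normalizer z = Σ φ(k_j). This illustrates linear transformers in general; it is not Manifest AI's power retention method, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def phi(x: Vector) -> Vector:
    # Assumed feature map: elu(x) + 1, which is always positive.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def quadratic_linear_attention(Q, K, V):
    """O(T^2): position t explicitly weights every earlier position j <= t."""
    out = []
    d_v = len(V[0])
    for t in range(len(Q)):
        q = phi(Q[t])
        weights = [dot(q, phi(K[j])) for j in range(t + 1)]
        total = sum(weights)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights)) / total
                    for c in range(d_v)])
    return out


def recurrent_linear_attention(Q, K, V):
    """O(T): a fixed-size state (S, z) summarizes the whole prefix."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d_k                        # running sum of phi(k_j)
    out = []
    for q_raw, k_raw, v in zip(Q, K, V):
        k = phi(k_raw)
        for i in range(d_k):
            z[i] += k[i]
            for c in range(d_v):
                S[i][c] += k[i] * v[c]
        q = phi(q_raw)
        denom = dot(q, z)
        out.append([sum(q[i] * S[i][c] for i in range(d_k)) / denom
                    for c in range(d_v)])
    return out


# Usage: both forms produce the same outputs on a toy sequence.
Q = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
K = [[0.2, 0.1], [-0.3, 0.5], [0.4, -0.1]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a = quadratic_linear_attention(Q, K, V)
b = recurrent_linear_attention(Q, K, V)
print(a)
print(b)
```

The recurrent form is what removes the quadratic cost: each step updates a state whose size depends only on the head dimensions, not on how many tokens came before.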
Jacob Buckman
@jacobmbuckman
Jun 11, 2021
Permanent offer: if anyone wants a high-effort, public, non-anonymized, most-likely-critical review, please send a draft of your paper my way. I can't promise I will help you get into conferences, but I will do my best to help improve the quality of the science.