Charlie Snell

5,095 posts

@sea_snell

PhD student @berkeley_ai; research @cursor_ai; prev @GoogleDeepMind. My friend told me to tweet more. I stare at my computer a lot and make things

San Francisco, CA

sea-snell.github.io

Born April 19

Joined April 2013

Charlie Snell
@sea_snell
Jan 24, 2025
R1-zero is such a striking example of a discovery that’s blatantly obvious in retrospect, yet eluded so many for such a long time
285K
Charlie Snell
@sea_snell
Mar 9, 2025
What did Ilya’s investors see?
531K
Charlie Snell
@sea_snell
Dec 20, 2024
Need the Terence Tao vibe review of o3
37K
Charlie Snell
@sea_snell
Mar 2, 2025
> wake up
> launch yet another YOLO run (600M H100 hours, powered by 16 suns)
> spend entire day anxiously refreshing wandb
> fuck, learning rate too high again
> beg manager for just one more YOLO run tomorrow
> go to bed and repeat
49K
Charlie Snell
@sea_snell
Aug 7, 2024
On difficult problems, humans can think longer to improve their decisions. Can we instill a similar capability into LLMs? And can it do well? In our paper, we find that by optimally scaling test-time compute we can outperform *much* larger models in a FLOPs matched evaluation.
97K
Charlie Snell
@sea_snell
Jun 30, 2021
Recently my Twitter timeline has been completely taken over by artwork generated with @OpenAI's CLIP model. So I figured I'd write a blog post about it. In the blog I follow the evolution of this art scene and present some cool artwork along the way ml.berkeley.edu/blog/posts/cli…
Charlie Snell
@sea_snell
Nov 26, 2024
Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
156K
Charlie Snell
@sea_snell
Feb 14, 2025
When you leave the RL training overnight, only to wake up and find that llama has had enough
49K
Charlie Snell
@sea_snell
Oct 3, 2022
LM can “learn from itself”😉 We ask it to generate answers with extra info in prompts/scratchpads, and then fine-tune on the generations. We call this “context distillation”, and with it we can learn from:
- Instruction/explanations
- Training examples
- Step-by-step reasoning
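A rough sketch of the data-building step described in the post above, under loud assumptions: `build_distillation_pairs` and `toy_generate` are hypothetical names, and `toy_generate` is a stand-in for a real language model call, not anything from the paper. The idea shown is only this: generate with the extra context in the prompt, then keep (query, generation) pairs with the context stripped, so that fine-tuning on them would teach the model to answer without it.

```python
from typing import Callable, List, Tuple


def build_distillation_pairs(
    generate: Callable[[str], str],
    context: str,
    queries: List[str],
) -> List[Tuple[str, str]]:
    """Generate answers WITH the context, store them WITHOUT it.

    The returned (query, answer) pairs are the fine-tuning data:
    the input side no longer contains the context.
    """
    pairs = []
    for query in queries:
        # Prompt that includes the extra info (instruction, examples, ...).
        prompt_with_context = f"{context}\n{query}"
        answer = generate(prompt_with_context)
        # Drop the context from the training input.
        pairs.append((query, answer))
    return pairs


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model: obeys an 'upper case' instruction."""
    last_line = prompt.splitlines()[-1]
    return last_line.upper() if "upper case" in prompt else last_line


pairs = build_distillation_pairs(
    toy_generate,
    context="Instruction: answer in upper case.",
    queries=["hello", "bye"],
)
print(pairs)
```

The fine-tuning step itself is left out here; any supervised fine-tuning loop over these pairs would fill that role.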
Charlie Snell
@sea_snell
Oct 25, 2019
Oh shit officer @kennybeats is part of Kanye’s personal security now. Mans is moving up in the law enforcement world
Charlie Snell
@sea_snell
Nov 26, 2023
I still remember reading the Minerva paper one afternoon the summer before starting my PhD. I was shocked. Before then, a tiny part of me thought Gary Marcus could actually be right. Immediately after processing the paper, this shred of doubt dissolved.
345K
Charlie Snell
@sea_snell
Dec 8, 2023
Hot take: MoE is often not the optimal config if you want to run models locally. Locally, you’re usually memory constrained. To maximize capabilities you should use a big dense model that maxes out device memory.
280K
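A back-of-the-envelope sketch of the memory argument in the post above. The parameter counts and the 24 GB device are made-up illustrative numbers, and the helper names are hypothetical. It assumes all expert weights stay resident in device memory (no offloading) and counts weights only, ignoring activations and KV cache.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1e9 bytes); 2 bytes/param is fp16/bf16."""
    return params_billions * bytes_per_param


def max_dense_params_billions(device_memory_gb: float, bytes_per_param: float = 2.0) -> float:
    """Largest dense model (billions of params) whose weights fit on the device."""
    return device_memory_gb / bytes_per_param


# Hypothetical MoE: 47B total parameters, 13B active per token.
moe_total_b, moe_active_b = 47.0, 13.0

# Memory follows TOTAL parameters, since every expert must be loaded,
# even though per-token compute follows only the ACTIVE parameters.
print(weight_memory_gb(moe_total_b))    # 94.0
print(weight_memory_gb(moe_active_b))   # 26.0

# On a hypothetical 24 GB device, the memory budget buys this much dense model:
print(max_dense_params_billions(24.0))  # 12.0
```

Under these assumptions the MoE pays memory for all 47B parameters while using only 13B per token, which is the trade-off the post is pointing at when memory, not compute, is the binding constraint.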
Charlie Snell
@sea_snell
Dec 1, 2024
The problem with idea guys is that their ideas aren’t very good
35K
Charlie Snell
@sea_snell
Oct 9, 2022
Training LLMs across a bunch of devices is not easy. To contribute towards making this easier, I'm releasing my workflow for training LLMs in Jax with the JaxSeq repository. I've personally used this to train up to 20 billion parameter models. Link:
GitHub - Sea-Snell/JAXSeq: Train very large language models in Jax. (github.com)