Charlie Snell
@sea_snell
PhD student @berkeley_ai; research @cursor_ai; prev @GoogleDeepMind. My friend told me to tweet more. I stare at my computer a lot and make things
San Francisco, CA
Born April 19
Joined April 2013
Posts
  • R1-zero is such a striking example of a discovery that’s blatantly obvious in retrospect, yet eluded so many for such a long time
  • What did Ilya’s investors see?
  • Need the Terence Tao vibe review of o3
  • > wake up > launch yet another YOLO run (600M H100 hours, powered by 16 suns) > spend entire day anxiously refreshing wandb > fuck, learning rate too high again > beg manager for just one more YOLO run tomorrow > go to bed and repeat
  • On difficult problems, humans can think longer to improve their decisions. Can we instill a similar capability into LLMs? And can it do well? In our paper, we find that by optimally scaling test-time compute we can outperform *much* larger models in a FLOPs-matched evaluation.
  • Recently my Twitter timeline has been completely taken over by artwork generated with @OpenAI's CLIP model. So I figured I'd write a blog post about it. In the blog I follow the evolution of this art scene and present some cool artwork along the way: ml.berkeley.edu/blog/posts/cli…
  • Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
  • When you leave the RL training overnight, only to wake up and find that llama has had enough
  • LM can “learn from itself”😉 We ask it to generate answers with extra info in prompts/scratchpads, and then fine-tune on the generations. We call this “context distillation”, and with it we can learn from instructions/explanations, training examples, and step-by-step reasoning.
  • Oh shit officer @kennybeats is part of Kanye’s personal security now. Mans is moving up in the law enforcement world
  • I still remember reading the Minerva paper one afternoon the summer before starting my PhD. I was shocked. Before then, a tiny part of me thought Gary Marcus could actually be right. Immediately after processing the paper, this shred of doubt dissolved.
  • Hot take: MoE is often not the optimal config if you want to run models locally. Locally, you’re usually memory constrained. To maximize capabilities you should use a big dense model that maxes out device memory.
  • The problem with idea guys is that their ideas aren’t very good
  • Training LLMs across a bunch of devices is not easy. To contribute towards making this easier, I'm releasing my workflow for training LLMs in Jax with the JaxSeq repository. I've personally used this to train up to 20 billion parameter models. Link: