Tom Brown
@NotTomBrown
Co-founder and Chief Compute Officer @AnthropicAI
SF
Joined June 2009
Posts
  • user avatar
    (1/4) Learning ML engineering is a long slog even for legendary hackers like @gdb. IMO, the two hardest parts of ML eng are:
    1) Feedback loops are measured in minutes or days in ML (compared to seconds in normal eng)
    2) Errors are often silent in ML
    How I became a machine learning practitioner: blog.gregbrockman.com/how-i-became-a… (Spoiler alert: you can too!)
  • user avatar
    Training/eval'ing GPT-3 involved a bunch of gnarly distributed system problems (which I love, but are an acquired taste tbh). The API hides those messy details so you can use normal python w/ a tight feedback loop. Gave me the same tingles as switching from TF to pytorch 😊
    We're releasing an API for accessing new AI models developed by OpenAI. You can "program" the API in natural language with just a few examples of your task. See how companies are using the API today, or join our waitlist: beta.openai.com
  • user avatar
    An illustration of @OpenAI Universe. Each dot is a task in task space. You can measure the power of an AI by the range of tasks it solves.
  • user avatar
    Replying to @NotTomBrown
    ML dev speed hack #0 - Overfit a single batch
    - Before doing anything else, verify that your model can memorize the labels for a single batch and quickly bring the loss to zero
    - This is fast to run, and if the model can't do this, then you know it is broken
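The single-batch check above can be sketched roughly as follows. This is a minimal illustration and not code from the thread: the tiny logistic-regression model, the four-example batch, the step count, and the 0.1 loss threshold are all assumptions chosen so the example runs in pure Python.

```python
import math


def overfit_single_batch(xs, ys, steps=5000, lr=1.0):
    """Train a tiny logistic-regression model on ONE fixed batch.

    Returns the mean cross-entropy loss recorded at every step.
    """
    n_features = len(xs[0])
    w = [0.0] * n_features  # weights start at zero for determinism
    b = 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            err = p - y  # gradient of the loss with respect to z
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Plain full-batch gradient descent on the same batch every step.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(loss / n)
    return losses


# Hypothetical batch: the AND function, which a linear model can fit.
batch_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
batch_y = [0, 0, 0, 1]

losses = overfit_single_batch(batch_x, batch_y)

# The threshold is an arbitrary assumption; the point is that the loss on
# a memorizable batch should fall far below its starting value.
assert losses[-1] < 0.1, "model cannot memorize one batch; something is broken"
```

With a real model the same idea applies: hold one batch fixed, train on it repeatedly, and treat a loss that refuses to drop as a sign of a bug rather than a tuning problem.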
  • user avatar
    My personal experience with GPT-3 is similar to Max's. The model's surprisingly capable, but still has many weaknesses (which we tried our best to point out in the GPT-3 paper). I expect the future to be shiny, but getting there will need a lot of work from the whole community.
    New blog post up: so, you've probably seen all the tweets about GPT-3. GPT-3 is objectively a step forward in the field of AI text-generation, but the current hype on VC Twitter misrepresents the model's current capabilities. GPT-3 isn't magic. minimaxir.com/2020/07/gpt3-e…
  • user avatar
    Replying to @NotTomBrown
    ML dev speed hack #2 - Assert tensor shapes
    - Wrong shapes due to silent broadcasting or reduction is an extreme hot spot for silent errors, asserting on shapes (in torch or TF) makes them loud
    - If you're ever tempted to write shapes in a comment, make an assert instead
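A small sketch of the shape-assertion idea above. The post mentions torch and TF; NumPy is used here as a stand-in because it follows the same broadcasting rules, and the function names `mse_unchecked` and `mse_checked` are hypothetical.

```python
import numpy as np


def mse_unchecked(preds, targets):
    # No shape check: mismatched shapes broadcast without any error.
    return float(np.mean((preds - targets) ** 2))


def mse_checked(preds, targets):
    # The shape that might otherwise live in a comment becomes an assert.
    assert preds.shape == targets.shape, (
        f"shape mismatch: preds {preds.shape} vs targets {targets.shape}"
    )
    return float(np.mean((preds - targets) ** 2))


preds = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (4, 1)
targets = np.array([1.0, 2.0, 3.0, 4.0])        # shape (4,)

# Silent error: (4, 1) - (4,) broadcasts to (4, 4), so the "loss" is
# nonzero even though every prediction equals its target.
silent = mse_unchecked(preds, targets)

# Loud error: the same call now fails immediately.
raised = False
try:
    mse_checked(preds, targets)
except AssertionError:
    raised = True

# After fixing the shape, the loss is what it should be.
correct = mse_checked(preds, targets.reshape(-1, 1))
```

Here `silent` is a plausible-looking number that is simply wrong, which is the kind of failure the assert turns into an immediate crash.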
  • user avatar
    Excited to get to work with AWS and Annapurna Labs on optimizing Trainium from silicon to software. Our team’s been having fun going deep into the Neuron stack to get as close as possible to 100% peak theoretical performance.
    We're expanding our collaboration with AWS. This includes a new $4 billion investment from Amazon and establishes AWS as our primary cloud and training partner. anthropic.com/news/anthropic…
  • user avatar
    We've long had a culture of pair-programming at Anthropic, with one engineer as the Driver and one as the Navigator. It's been interesting to watch Claude rapidly becoming proficient in the Driver role. We're hiring for great Navigators :)
    Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.
  • user avatar
    I love these new models. Excited to see how the world will put them to work.
    Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
    A table of Claude 3 model family benchmarks. Claude 3 Opus, the most capable model, exceeds SOTA across reasoning, math, code, and other evaluations versus GPT-4 and Gemini Ultra.
  • user avatar
    Replying to @NotTomBrown
    (2/4) Most ML people deal with silent errors and slow feedback loops via the "ratchet" approach:
    1) Start with known working model
    2) Record learning curves on small task (~1min to train)
    3) Make a tiny code change
    4) Inspect curves
    5) Run full training after ~5 tiny changes
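Step 4 of the ratchet is described as inspecting curves by eye. One way to back that up with a quick automated check is sketched below. The helper `first_divergence`, the 5% tolerance, and the loss values are all made up for illustration and are not from the thread.

```python
def first_divergence(baseline, candidate, rel_tol=0.05):
    """Return the first step where candidate strays from baseline by more
    than rel_tol (relative to the baseline value), or None if it never does.
    """
    if len(baseline) != len(candidate):
        raise ValueError("curves must cover the same number of steps")
    for step, (b, c) in enumerate(zip(baseline, candidate)):
        if abs(c - b) > rel_tol * max(abs(b), 1e-8):
            return step
    return None


# Illustrative loss curves from a hypothetical ~1 minute training task.
baseline_curve = [1.00, 0.60, 0.40, 0.30, 0.25]    # known working model
after_refactor = [1.00, 0.61, 0.39, 0.30, 0.25]    # tiny change, same behavior
after_bad_change = [1.00, 0.62, 0.55, 0.54, 0.54]  # tiny change, broke learning

ok_step = first_divergence(baseline_curve, after_refactor)
bad_step = first_divergence(baseline_curve, after_bad_change)
```

Because each change is tiny, a divergence points at a very small diff, which is what keeps silent errors cheap to track down.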
  • user avatar
    This is awesome! Language models do a form of data compression, so they can help people who have limited bandwidth from their bodies due to mobility issues.
    Typing using only 4 keys is challenging! This is my first go at making a semantic keyboard, which works by guiding a language model to write a text for you. Using GPT-3:
  • user avatar
    Our work on the Adversarial Patch covered by @BBC. Glad to see mainstream media interested in ML security. Not sure what's going on with that photoshopped toast...
  • user avatar
    Replying to @NotTomBrown
    ML dev speed hack #1 - PyTorch over TF
    - Time to first step is faster b/c no static graph compilation
    - Easier to get loud errors via assertions within the code
    - Easier to drop into debugger and inspect tensors
    (TF2.0 may solve some of these problems but is still raw)
  • user avatar
    Now seems like a good time to mention that we’re always looking for ways to more efficiently turn raw compute into useful safety research. If you know of great software engineers who are interested in building big machines then have them message me at [email protected]
    We’ve raised $580 million in a Series B. This will help us further develop our research to build usable, reliable AI systems. Find out more: anthropic.com/news/announcem…