Colin White
445 posts
Colin White
@crwhite_ml
Evaluating generative AI models. Research Scientist at @MetaAI. Prev @abacusai, @SCSatCMU
Bay area
crwhite.ml
Joined June 2019
884 Following · 1,608 Followers

  • Colin White
    @crwhite_ml
    Jun 20, 2024
    Wow! 😮 claude-3.5 is an extremely impressive overall model! It achieves the top score in **every category**, and substantially improves in reasoning! See for yourself with our interactive leaderboard: livebench.ai
    80K
  • Colin White
    @crwhite_ml
    Apr 9, 2021
    Best paper title of 2020? arxiv.org/abs/2002.01764
  • Colin White
    @crwhite_ml
    May 2, 2022
    What if you could predict the performance of any neural net in just 5 sec? Too good to be true? Check out our #ICLR2022 blog post, A Deeper Look at Zero-Cost Proxies for Lightweight NAS iclr-blog-track.github.io/2022/03/25/zer… With @khodakmoments @tu_renbo @sytelus @SebastienBubeck @debadeepta
  • Colin White
    @crwhite_ml
    Jun 20, 2024
    Replying to @crwhite_ml
    Nearly all questions in LiveBench are brand new, so there is no contamination, even for claude-3.5. It performs especially well on house_traversal, a spatial reasoning task which is brand new!
    53K
  • Colin White
    @crwhite_ml
    Jan 23, 2023
    In the past two years, there have been more than 1000 papers on neural architecture search🤯. What are the key insights? Introducing our new survey! arxiv.org/abs/2301.08727 with Mahmoud, @RheaSukthanker, Robin, Thomas, @ZelaArber, @debadeepta, @FrankRHutter #AutoML #NAS
    23K
  • Colin White
    @crwhite_ml
    Sep 13, 2024
    🚨🚨Early findings for o1-preview and o1-mini!🚨🚨 (1) The o1 family is unbelievably strong at hard reasoning problems! o1 perfectly solves a reasoning task that my collaborators and I designed for LLMs to achieve <60% performance, just 3 months ago 🤯🤯 (1 / ?)
    31K
  • Colin White
    @crwhite_ml
    Apr 17, 2023
    I am thrilled to join @Caltech as a postdoc working with @AnimaAnandkumar on AutoML for Science! 🔬 Thank you to my amazing colleagues at Abacus.AI for four amazing years, and I can't wait to follow the future accomplishments at Abacus!
    21K
  • Colin White
    @crwhite_ml
    Oct 28, 2019
    BANANAS: Bayesian optimization with neural architectures for neural architecture search. Paper: arxiv.org/abs/1910.11858 Github: github.com/naszilla/banan… Blog: medium.com/reality-engine… With @willieneis and Yash Savani @realityengines
    BANANAS: Bayesian Optimization with Neural Architectures for...
    Over the past half-decade, many methods have been considered for neural architecture search (NAS). Bayesian optimization (BO), which has long had success in hyperparameter optimization, has...
  • Colin White
    @crwhite_ml
    Oct 7, 2021
    Replying to @karpathy
    "A note on paper length. Expecting more text in this paper? Wondering if it’s a workshop paper we hastily submitted to ICLR? No. This paper presents a simple idea, one where we genuinely believe that a short paper presentation is more effective."
  • Colin White
    @crwhite_ml
    Jul 24, 2024
    🚨Llama 3.1 405B eval just dropped🚨 🥇 in instruction following 🥈 in reasoning On par with GPT-4o in math and coding It’s a great day for the open-source community!! Full evals on the challenging, contamination-free benchmark ➡️ livebench.ai
    9.7K
  • Colin White
    @crwhite_ml
    Jun 20, 2024
    Replying to @crwhite_ml
    Many of the benchmarks Anthropic reported are nearly saturated, with models achieving 88-96% performance. LiveBench is not saturated, so it shows the true improvement of claude-3.5! Stay tuned for next month when we release harder tasks! 🔗: livebench.ai
    4.3K
  • Colin White
    @crwhite_ml
    Feb 3, 2022
    #ICML2022 in Baltimore, Maryland is the first in-person general ML conference since NeurIPS 2019. July 17 to 23. Save the date! And, stay a few extra days in Baltimore to check out automl.cc !
  • Colin White
    @crwhite_ml
    May 15, 2024
    Replying to @pfau
    He's been a main driver behind major OpenAI results since Dota 2, but he keeps out of the limelight / social media. I think that is commendable!
    Sam Altman
    OpenAI
    @sama
    Mar 14, 2023
    GPT-4 was truly a team effort from our entire company, but the overall leadership and technical vision of Jakub Pachocki for the pretraining effort was remarkable and we wouldn’t be here without it
    9K
  • Colin White
    @crwhite_ml
    Aug 14, 2024
    Replying to @arena @lmsysorg and @OpenAI
    Here are the LiveBench scores for chatgpt-4o-latest! It ties gpt-4o-2024-05-13, yet gpt-4o-2024-08-06 is still the best GPT model according to livebench.ai!
    8.3K