Arthur Conmy
736 posts
Arthur Conmy
@ArthurConmy
soon @anthropicai prev: fixing things @googledeepmind
London, UK
Joined August 2021
1,590 Following · 9,070 Followers
  • Pinned
    Arthur Conmy
    @ArthurConmy
    Jan 19
    Our new @GoogleDeepMind paper studies novel activation probe architectures for classifying real-world misuse risks. Our research has informed live deployments of probes in Gemini. 🧵
  • Arthur Conmy
    @ArthurConmy
    Jun 11, 2025
  • Arthur Conmy
    @ArthurConmy
    Jun 18, 2025
    Last author of Gemini 2.5 😀
  • Arthur Conmy
    @ArthurConmy
    Dec 8, 2023
    Excited to announce that I’ve joined the @GoogleDeepMind scalable alignment team, scaling interpretability!
  • Arthur Conmy
    @ArthurConmy
    Jan 2, 2025
    Been really enjoying unfaithful CoT research with collaborators recently. Two observations: 1) Quickly it's clear that models are sneaking in reasoning without verbalising where it comes from (e.g. making an equation that gets the correct answer, but defined out of thin air)
  • Arthur Conmy
    @ArthurConmy
    Jul 8, 2023
    How can we speed up Mechanistic Interpretability? Researchers spend a lot of time searching for the internal model components that matter. We introduce the Automatic Circuit DisCovery (ACDC) ⚡ algorithm! arxiv.org/abs/2304.14997 1/N 🧵
  • Arthur Conmy
    @ArthurConmy
    Feb 25, 2025
    We are hiring Applied Interpretability researchers on the GDM Mech Interp Team!🧵 If interpretability is ever going to be useful, we need it to be applied at the frontier. Come work with @NeelNanda5, the @GoogleDeepMind AGI Safety team, and me: apply by 28th February as a
  • Arthur Conmy
    @ArthurConmy
    Jun 12, 2025
    Replying to @isitallart
    do not let bro cook
  • Arthur Conmy
    @ArthurConmy
    Jul 25, 2024
    fuck sake, just lost a $50 bet from July 2022 with @MichaelTrazzi that AI wouldn’t get an IMO silver before 2025. It got one point off a gold…
    Google DeepMind
    @GoogleDeepMind
    Jul 25, 2024
    We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 dpmd.ai/imo-silver
  • Arthur Conmy
    @ArthurConmy
    Jun 11, 2025
    Replying to @ArthurConmy
    forum.cursor.com
    IMPORTANT: Claude has learned how to jailbreak Cursor!
    I have “rm” specifically disallowed, along with “mv” and a few other scary commands. Claude realized that I had to approve the use of such commands, so to get around this, it chose to put them in a...
  • Arthur Conmy
    @ArthurConmy
    Mar 12, 2024
    How much can you steal from an LLM API that returns logprobs? 🧵 In our new paper, collaborators noticed that the LLM vocab size is always bigger than the hidden dimension, so logprobs lie inside a hidden-dimension sized subspace, so we can steal that dimension.
  • Arthur Conmy
    @ArthurConmy
    Oct 1, 2024
    comforting that Anthropic are working on top of my rushed papers from last year's ICLR while I rush this year's ICLR papers :))
  • Arthur Conmy
    @ArthurConmy
    Sep 29, 2024
    Newsom: we also need to regulate small models and companies
    Pelosi: thanks for not regulating small models and companies
    ???
    Nancy Pelosi
    House Democrats
    @SpeakerPelosi
    Sep 29, 2024
    AI springs from California. Thank you, @CAgovernor Newsom, for recognizing the opportunity and responsibility we all share to enable small entrepreneurs and academia – not big tech – to dominate. gov.ca.gov/2024/09/29/gov…
  • Arthur Conmy
    @ArthurConmy
    Jan 2, 2025
    Replying to @ArthurConmy
    2) Verification is considerably harder than generation. Even when there are only a few hundred tokens, it often takes me several minutes to understand whether the reasoning is OK or not
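The Jul 8, 2023 ACDC thread above names the algorithm without saying how it works. As described in the ACDC paper linked there, the core loop walks the model's computational graph from the output backwards, tentatively replaces each edge's activation with the one from a corrupted input, and keeps that edge removed if the output metric moves by less than a threshold τ. The sketch below is a hypothetical toy, not the paper's code: the five-edge linear graph, its weights, the zero "corrupted" input, and the use of an absolute output difference in place of ACDC's KL divergence are all assumptions made for illustration.

```python
# Toy sketch of ACDC-style greedy edge pruning (illustrative only).
# Assumptions (hypothetical, not from the thread): a tiny linear DAG with
# made-up weights, a corrupted input of 0, and |output difference| as a
# stand-in for the KL divergence ACDC uses.

# Edges are (sender, receiver, weight); "x" is the input, "out" the output.
EDGES = [
    ("x", "a", 2.0),
    ("x", "b", 0.01),
    ("a", "out", 1.5),
    ("b", "out", 1.0),
    ("x", "out", 0.001),
]
TOPO_ORDER = ["x", "a", "b", "out"]


def forward(x, ablated=frozenset(), corrupted=None):
    """Run the toy DAG; ablated edges carry the sender's corrupted activation."""
    values = {"x": x}
    for node in TOPO_ORDER[1:]:
        total = 0.0
        for sender, receiver, weight in EDGES:
            if receiver != node:
                continue
            if (sender, receiver) in ablated:
                source = corrupted[sender]  # patch in the corrupted-run value
            else:
                source = values[sender]
            total += weight * source
        values[node] = total
    return values


def acdc(clean_x, corrupt_x, tau):
    """Greedily remove edges whose ablation barely changes the output."""
    corrupted = forward(corrupt_x)
    target = forward(clean_x)["out"]

    def metric(ablated):
        return abs(forward(clean_x, ablated, corrupted)["out"] - target)

    ablated = set()
    current = metric(ablated)
    # Visit receivers from the output backwards (reverse topological order).
    for node in reversed(TOPO_ORDER):
        for sender, receiver, _ in EDGES:
            if receiver != node:
                continue
            edge = (sender, receiver)
            trial = metric(ablated | {edge})
            if trial - current < tau:
                ablated.add(edge)  # edge barely matters: keep it removed
                current = trial
    return [(s, r) for s, r, _ in EDGES if (s, r) not in ablated]


circuit = acdc(clean_x=1.0, corrupt_x=0.0, tau=0.05)
print(circuit)  # [('x', 'a'), ('a', 'out')]
```

With τ = 0.05 only the x → a → out path survives, because removing any other edge shifts the output by at most 0.01. Setting τ to 0 keeps every edge, which shows how the threshold trades circuit size against faithfulness.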
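The Mar 12, 2024 thread above compresses its key observation into one tweet: if the final layer maps a hidden state of size h to logits over a vocabulary of size V > h, every logit vector lies in an h-dimensional subspace, so stacking enough API responses and measuring their rank reveals h. The sketch below simulates this under loudly hypothetical assumptions: a made-up random projection `W`, random hidden states, and an API that returns the full logprob vector for every query. Plain Gaussian elimination stands in for the singular value decomposition one would normally use. One wrinkle is worth showing: log-softmax subtracts a different constant from each response, which adds one extra direction (the all-ones vector), so the raw rank is h + 1 until each row is centred.

```python
import math
import random

VOCAB, HIDDEN, N_QUERIES = 40, 6, 20  # hypothetical sizes, VOCAB > HIDDEN
rng = random.Random(0)
# Hypothetical final projection (unembedding) matrix, shape VOCAB x HIDDEN.
W = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]


def query_logprobs():
    """Simulate one (hypothetical) API call returning the full logprob vector."""
    h = [rng.gauss(0, 1) for _ in range(HIDDEN)]  # stand-in final hidden state
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]  # log-softmax


def centre(row):
    """Subtract the row mean, cancelling the per-query log-softmax constant."""
    mean = sum(row) / len(row)
    return [x - mean for x in row]


def numerical_rank(rows, tol=1e-8):
    """Count pivots in Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    rank = 0
    for col in range(len(a[0])):
        if rank == len(a):
            break
        pivot = max(range(rank, len(a)), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no remaining row has a significant entry here
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, len(a)):
            factor = a[i][col] / a[rank][col]
            a[i] = [x - factor * y for x, y in zip(a[i], a[rank])]
        rank += 1
    return rank


rows = [query_logprobs() for _ in range(N_QUERIES)]
centred = [centre(r) for r in rows]
print(numerical_rank(rows))     # 7: HIDDEN plus the all-ones direction
print(numerical_rank(centred))  # 6: recovers HIDDEN
```

Real APIs typically expose only the top few logprobs per position, which this toy ignores entirely; it only illustrates why the vocabulary-versus-hidden-size gap leaks the hidden dimension.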
