Arthur Conmy (@ArthurConmy) / X

Arthur Conmy

736 posts

@ArthurConmy

soon @anthropicai prev: fixing things @googledeepmind

London, UK

Joined August 2021

Pinned
Arthur Conmy
@ArthurConmy
Jan 19
Our new @GoogleDeepMind paper studies novel activation probe architectures for classifying real-world misuse risks. Our research has informed live deployments of probes in Gemini. 🧵
157K
Arthur Conmy
@ArthurConmy
Jun 11, 2025
446K
Arthur Conmy
@ArthurConmy
Jun 18, 2025
Last author of Gemini 2.5 😀
105K
Arthur Conmy
@ArthurConmy
Dec 8, 2023
Excited to announce that I’ve joined the @GoogleDeepMind scalable alignment team, scaling interpretability!
76K
Arthur Conmy
@ArthurConmy
Jan 2, 2025
Been really enjoying unfaithful CoT research with collaborators recently. Two observations: 1) It quickly becomes clear that models are sneaking in reasoning without verbalising where it comes from (e.g. writing down an equation that gets the correct answer, but is defined out of thin air)
40K
Arthur Conmy
@ArthurConmy
Jul 8, 2023
How can we speed up Mechanistic Interpretability? Researchers spend a lot of time searching for the internal model components that matter. We introduce the Automatic Circuit DisCovery (ACDC) ⚡ algorithm! arxiv.org/abs/2304.14997 1/N 🧵
64K
Arthur Conmy
@ArthurConmy
Feb 25, 2025
We are hiring Applied Interpretability researchers on the GDM Mech Interp Team!🧵 If interpretability is ever going to be useful, we need it to be applied at the frontier. Come work with @NeelNanda5, the @GoogleDeepMind AGI Safety team, and me: apply by 28th February as a
53K
Arthur Conmy
@ArthurConmy
Jun 12, 2025
Replying to @isitallart
do not let bro cook
12K
Arthur Conmy
@ArthurConmy
Jul 25, 2024
fuck sake, just lost a $50 bet from July 2022 with @MichaelTrazzi that AI wouldn’t get an IMO silver before 2025. It got one point off a gold…
Google DeepMind
@GoogleDeepMind
Jul 25, 2024
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 dpmd.ai/imo-silver
22K
Arthur Conmy
@ArthurConmy
Jun 11, 2025
Replying to @ArthurConmy
forum.cursor.com
IMPORTANT: Claude has learned how to jailbreak Cursor!
I have “rm” specifically disallowed, along with “mv” and a few other scary commands. Claude realized that I had to approve the use of such commands, so to get around this, it chose to put them in a...
17K
Arthur Conmy
@ArthurConmy
Mar 12, 2024
How much can you steal from an LLM API that returns logprobs? 🧵 In our new paper, collaborators noticed that the LLM vocab size is always bigger than the hidden dimension, so logprobs lie inside a hidden-dimension-sized subspace, which means we can steal that dimension.
28K
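A toy illustration of the observation in the post above, not the paper's actual attack: if every output vector is a vocab-sized image of a hidden-dimension-sized state, then a stack of outputs from many queries has rank at most the hidden dimension. Everything here is an assumption made for the sketch: the random matrix `W` stands in for a model's final projection, `matrix_rank` and `simulate_logit_rank` are hypothetical helper names, and raw logits are used instead of logprobs for simplicity (logprobs additionally subtract a per-query normalising constant).

```python
import random


def matrix_rank(rows, tol=1e-8):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column is (numerically) dependent on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def simulate_logit_rank(vocab_size=40, hidden_dim=6, n_queries=25, seed=0):
    """Stack logits from many simulated queries and return their rank."""
    rng = random.Random(seed)
    # Hypothetical final projection: vocab_size x hidden_dim, vocab > hidden.
    W = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(vocab_size)]
    logits = []
    for _ in range(n_queries):
        # Each query produces a different hidden state h ...
        h = [rng.gauss(0, 1) for _ in range(hidden_dim)]
        # ... but the observed vocab-sized vector is always W @ h.
        logits.append([sum(w * x for w, x in zip(row, h)) for row in W])
    return matrix_rank(logits)


if __name__ == "__main__":
    print(simulate_logit_rank())
```

In this simulation the rank of the stacked outputs should match `hidden_dim` rather than `vocab_size` or `n_queries`, which is the sense in which an observer of enough outputs could read off the hidden dimension.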
Arthur Conmy
@ArthurConmy
Oct 1, 2024
comforting that Anthropic are working on top of my rushed papers from last year's ICLR while I rush this year's ICLR papers :))
29K
Arthur Conmy
@ArthurConmy
Sep 29, 2024
Newsom: we also need to regulate small models and companies
Pelosi: thanks for not regulating small models and companies
???
Nancy Pelosi
@SpeakerPelosi
Sep 29, 2024
AI springs from California. Thank you, @CAgovernor Newsom, for recognizing the opportunity and responsibility we all share to enable small entrepreneurs and academia – not big tech – to dominate. gov.ca.gov/2024/09/29/gov…
10K
Arthur Conmy
@ArthurConmy
Jan 2, 2025
Replying to @ArthurConmy
2) Verification is considerably harder than generation. Even when there are only a few hundred tokens, it often takes me several minutes to understand whether the reasoning is OK or not
20K