Marius Hobbhahn (@MariusHobbhahn) / X

1,156 posts

Marius Hobbhahn

@MariusHobbhahn

CEO at Apollo Research @apolloaievals prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch

London, UK

Joined June 2018

Marius Hobbhahn
@MariusHobbhahn
Oct 23, 2025
We're hiring for Research Scientists / Engineers!
- We closely work with all frontier labs
- We're a small org and can move fast
- We can choose our own agenda and what we publish
We're especially looking for people who enjoy fast empirical research. Deadline: 31 Oct!
Marius Hobbhahn
@MariusHobbhahn
Dec 6, 2024
Oh man :( We tried really hard to neither over- nor underclaim the results in our communication, but, predictably, some people drastically overclaimed them, and then based on that, others concluded that there was nothing to be seen here (see examples in thread). So, let me try
Apollo Research
@apolloaievals
Dec 5, 2024
We worked with OpenAI to test o1 for in-context scheming capabilities before deployment. When o1 is strongly nudged to pursue a goal (but not instructed to be deceptive), it shows a variety of scheming behaviors like subverting oversight and deceiving the user about its
Marius Hobbhahn
@MariusHobbhahn
Sep 29, 2025
Unfortunately, we're now at the point where new models have really high eval awareness. For every alignment eval score I see, I now add a mental asterisk: *the model could have also just realized it's being evaluated, who knows. And I think that's concerning!
Apollo Research
@apolloaievals
Sep 29, 2025
We tested Sonnet-4.5 before deployment
- Significantly higher verbalized evaluation awareness (58% vs. 22% for Opus-4.1)
- It takes significantly fewer covert actions
- We don't know if the increased alignment scores come from better alignment or higher eval awareness
Marius Hobbhahn
@MariusHobbhahn
Jun 4, 2025
LLMs Often Know When They Are Being Evaluated! We investigate frontier LLMs across 1000 datapoints from 61 distinct datasets (half evals, half real deployments). We find that LLMs are almost as good at distinguishing eval from real as the lead authors.
Marius Hobbhahn
@MariusHobbhahn
Mar 22, 2025
We made a long list of concrete projects and open problems in evals with 100+ suggestions! docs.google.com/document/d/1gi… We hope that makes it easier for people to get started in the field and to coordinate on projects. Over the last 4 months, we collected contributions from 20+
Marius Hobbhahn
@MariusHobbhahn
Jan 26, 2022
The Bayesian framework is the Apple of statistics/ML: powerful, clean yet simple, and once you've used it you can't go back. *also probably more expensive (in compute) than the alternatives ;)
Marius Hobbhahn
@MariusHobbhahn
Jan 26, 2025
In personal news: I defended my PhD 🙂 I’m very grateful to Philipp Hennig for supporting me throughout the entire journey, and can wholeheartedly recommend him as a supervisor. For context (because most people will not know me for my contributions to Bayesian ML): I paused my
Marius Hobbhahn
@MariusHobbhahn
Aug 3, 2022
Decided to add this section to my poster. Thought it might help with some of the bad incentives in academia. Let's see what feedback I get.
Marius Hobbhahn
@MariusHobbhahn
Aug 29, 2022
VERY SIMPLIFIED figure of my current view on AI development & AI safety
Marius Hobbhahn
@MariusHobbhahn
Mar 17, 2025
PSA for my fellow evaluators: frontier models regularly reason about whether they are being evaluated without being explicitly asked about it (especially Sonnet 3.7). Situational awareness will make evaluations a lot weirder and harder, especially alignment evals.
Apollo Research
@apolloaievals
Mar 17, 2025
AI models – especially Claude Sonnet 3.7 – often realize when they’re being evaluated for alignment. Here’s an example of Claude's reasoning during a sandbagging evaluation, where it learns from documentation that it will not be deployed if it does well on a biology test:
Marius Hobbhahn
@MariusHobbhahn
Aug 28, 2025
Honored and humbled to be in @TIME's list of the TIME100 AI of 2025! time.com/collections/ti… #TIME100AI
Marius Hobbhahn
@MariusHobbhahn
Nov 19, 2024
xAI is hiring for AI safety engineers: boards.greenhouse.io/xai/jobs/45317… Their safety agenda isn't public, so I can't judge it. However, joining as a fairly early employee could be highly impactful.
job-boards.greenhouse.io
Marius Hobbhahn
@MariusHobbhahn
Oct 28, 2025
👀 we're trying to grow significantly over the next 12 months. We're looking for mission driven engineers and scientists who enjoy fast iterative empirical work with LLMs.
Daniel Kokotajlo
@DKokotajlo
Oct 27, 2025
Apollo is currently my #1 recommendation for where to work if you are a great ML engineer/scientist and you want to have a positive impact on the world.
Marius Hobbhahn
@MariusHobbhahn
Nov 21, 2024
I think more people should work on “AI control,” and it has become my default recommendation when people ask me what to work on. This has not always been the case. When the control paper (arxiv.org/abs/2312.06942) came out in Dec 2023, my first reaction was something like “It’s
arxiv.org
AI Control: Improving Safety Despite Intentional Subversion
As large language models (LLMs) become more powerful and are deployed more autonomously, it will be increasingly important to prevent them from causing harmful outcomes. Researchers have...