Analyzing coding agent transcripts to upper bound productivity gains from AI agents
Feb 17
Frontier AI safety regulations: A reference for lab staff
Research Note by Miles Kodama and Michael Chen
Jan 29
Time Horizon 1.1
We’re releasing a new version of our time horizon estimates (TH1.1), based on more tasks and new eval infrastructure.
Jan 29
Early work on monitorability evaluations
Future AI systems may be capable enough to carry out sabotage, either by taking malicious real-world actions or by intentionally tampering with evaluations.
Jan 22
MALT: A Dataset of Natural and Prompted Behaviors That Threaten Eval Integrity
Access the dataset on Hugging Face:
Oct 14, 2025
Forecasting the Impacts of AI R&D Acceleration: Results of a Pilot Study
Aug 20, 2025
Research Update: Algorithmic vs. Holistic Evaluation
TL;DR: On 18 real tasks from two large open-source repositories, early-2025 AI agents often implement functionally correct code that cannot be easily…
Aug 13, 2025
Notes on Scientific Communication at METR
When writing our recent paper, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, we thought hard about how to…
Aug 12, 2025
METR
Research updates and other news from METR, a research nonprofit developing the science of autonomous AI evaluations.