Agents' Last Exam

A large-scale benchmark for evaluating AI agents on real-world professional workflows with verifiable success criteria. Built by UC Berkeley RDI.

Agents' Last Exam is a research project by UC Berkeley RDI. All contributions are used to advance agent evaluation research.

Dataset licensed under CC BY 4.0 · Code licensed under Apache-2.0 · See Contributor Terms