Documentation | Agents' Last Exam

Agents' Last Exam

A large-scale benchmark for evaluating AI agents on real-world professional workflows with verifiable success criteria. Built by UC Berkeley RDI.

Agents' Last Exam is a research project by UC Berkeley RDI. All contributions are used to advance agent evaluation research.

Dataset licensed under CC BY 4.0 · Code licensed under Apache-2.0 · See Contributor Terms