AI Agent Evaluations

Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.

Last run date: June 9, 2026

Agent Performance Results

Model	Agent	Execution time	Success rate	With AGENTS.md*
cursor-composer-2.5--agents-md	Cursor	147.95s	96%	—
claude-opus-4.8--agents-md	Claude Code	173.16s	96%	—
Claude Fable 5	Claude Code	224.32s	92%	96%
cursor-composer-2.5	Cursor	150.41s	92%	—
claude-opus-4.8	Claude Code	159.44s	88%	—
gpt-5.5-pro--agents-md	Codex	912.46s	83%	—
gpt-5.5-pro	Codex	643.13s	83%	—
GPT 5.4 (xhigh)	Codex	219.37s	83%	92%
GPT 5.3 Codex (xhigh)	Codex	178.20s	83%	96%
MiniMax M3	OpenCode	181.30s	75%	96%
GLM 5.1	OpenCode	254.36s	75%	100%
Claude Opus 4.7 (max)	Claude Code	142.63s	75%	100%
Gemini 3.1 Pro Preview	Gemini CLI	244.70s	75%	96%
Claude Opus 4.6	Claude Code	186.96s	75%	100%
Cursor Composer 2.0	Cursor	113.53s	75%	96%
Gemini 3.0 Pro Preview	Gemini CLI	256.87s	67%	88%
Cursor Composer 1.5	Cursor	120.63s	67%	88%
Claude Sonnet 4.6	Claude Code	156.89s	58%	100%
GPT 5.2 Codex (xhigh)	Codex	148.75s	58%	83%
MiniMax M2.7	OpenCode	294.01s	50%	63%
Claude Sonnet 4.5	Claude Code	149.24s	50%	88%
Kimi K2.5	OpenCode	135.42s	21%	58%

* AGENTS.md provides bundled Next.js documentation for AI coding agents. The last column shows the success rate when agents had access to this documentation, which reflects the additional evals that passed.
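
To compare how much the documentation helps each model, the two percentage columns can be subtracted. The following is a minimal sketch, not part of the evaluation harness. It assumes the last column is the overall success rate with AGENTS.md available rather than a delta, and the `ROWS` list and `agents_md_gain` helper are hypothetical names introduced here for illustration. The figures are copied from four rows of the table above.

```python
# Rows copied from the results table:
# (model, success rate in %, success rate in % with AGENTS.md).
# Assumption: the last column is a total rate, not an increment.
ROWS = [
    ("Claude Sonnet 4.6", 58, 100),
    ("GLM 5.1", 75, 100),
    ("Kimi K2.5", 21, 58),
    ("GPT 5.3 Codex (xhigh)", 83, 96),
]


def agents_md_gain(rows):
    """Return (model, gain in percentage points), largest gain first."""
    gains = [(model, with_docs - base) for model, base, with_docs in rows]
    return sorted(gains, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for model, gain in agents_md_gain(ROWS):
        print(f"{model}: +{gain} points")
```

Under that assumption, models with a low baseline success rate show the largest gains, while rows marked with a dash have no second figure to compare against and are left out.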