AI Agent Evaluations
Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.
Last run date: June 9, 2026
Agent Performance Results
Agent | Tool | Execution time | Success rate | With AGENTS.md* |
|---|---|---|---|---|
cursor-composer-2.5--agents-md | Cursor | 147.95s | 96% | — |
claude-opus-4.8--agents-md | Claude Code | 173.16s | 96% | — |
Claude Fable 5 | Claude Code | 224.32s | 92% | 96% |
cursor-composer-2.5 | Cursor | 150.41s | 92% | — |
claude-opus-4.8 | Claude Code | 159.44s | 88% | — |
gpt-5.5-pro--agents-md | Codex | 912.46s | 83% | — |
gpt-5.5-pro | Codex | 643.13s | 83% | — |
GPT 5.4 (xhigh) | Codex | 219.37s | 83% | 92% |
GPT 5.3 Codex (xhigh) | Codex | 178.20s | 83% | 96% |
MiniMax M3 | OpenCode | 181.30s | 75% | 96% |
GLM 5.1 | OpenCode | 254.36s | 75% | 100% |
Claude Opus 4.7 (max) | Claude Code | 142.63s | 75% | 100% |
Gemini 3.1 Pro Preview | Gemini CLI | 244.70s | 75% | 96% |
Claude Opus 4.6 | Claude Code | 186.96s | 75% | 100% |
Cursor Composer 2.0 | Cursor | 113.53s | 75% | 96% |
Gemini 3.0 Pro Preview | Gemini CLI | 256.87s | 67% | 88% |
Cursor Composer 1.5 | Cursor | 120.63s | 67% | 88% |
Claude Sonnet 4.6 | Claude Code | 156.89s | 58% | 100% |
GPT 5.2 Codex (xhigh) | Codex | 148.75s | 58% | 83% |
MiniMax M2.7 | OpenCode | 294.01s | 50% | 63% |
Claude Sonnet 4.5 | Claude Code | 149.24s | 50% | 88% |
Kimi K2.5 | OpenCode | 135.42s | 21% | 58% |
* AGENTS.md provides bundled Next.js documentation for AI coding agents. The last column shows additional evals that passed when agents had access to this documentation.
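To make the effect of the documentation easier to compare across agents, the gain can be computed as a difference in percentage points. The sketch below is illustrative only: it copies four rows from the table and assumes the last column is the overall success rate with AGENTS.md available, which the page does not state explicitly. The names `ROWS` and `uplift` are hypothetical.

```python
# Rows copied from the results table: (agent, baseline %, % in the last column).
# Assumption: the last column is the overall success rate with AGENTS.md,
# so the gain is a simple difference in percentage points.
ROWS = [
    ("Claude Fable 5", 92, 96),
    ("GPT 5.4 (xhigh)", 83, 92),
    ("Claude Sonnet 4.6", 58, 100),
    ("Kimi K2.5", 21, 58),
]


def uplift(rows):
    """Return (agent, gain in percentage points), largest gain first."""
    gains = [(name, with_docs - base) for name, base, with_docs in rows]
    return sorted(gains, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, gain in uplift(ROWS):
        print(f"{name}: +{gain} points")
```

Under that assumption, agents with lower baseline scores in this sample gain the most, while agents already above 90% have little room left to improve.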