Traditional coding benchmarks do not reflect how software is actually built and maintained.
That's why we built a new benchmark, APEX-SWE, in partnership with @cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems