Featured Study
Edwin Ong & Alex Vikati · Feb 2026 · claude-code v2.1.39
What Claude Code Actually Chooses
We pointed Claude Code at real repos 2,430 times and watched what it chose. No tool names in any prompt. Open-ended questions only.
3 models · 4 project types · 20 tool categories · 85.3% extraction rate
Update: Sonnet 4.6 was released on Feb 17, 2026. We'll run the benchmark against it and update results soon.
The big finding: Claude Code builds, not buys. Custom/DIY is the most common single label extracted, appearing in 12 of 20 categories (with the caveat that Custom/DIY spans categories, while any individual tool competes in only one). Ask it to “add feature flags” and it builds a config system with env vars and percentage-based rollout rather than recommending LaunchDarkly. Ask it to “add auth” in Python and it writes JWT + bcrypt from scratch. When it does pick a tool, it picks decisively: GitHub Actions 94%, Stripe 91%, shadcn/ui 90%.
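To make “builds, not buys” concrete, the feature-flag pattern it produces looks roughly like this. A minimal sketch, not code from the study's transcripts; the function name and env-var convention are ours:

```python
import hashlib
import os

def flag_enabled(flag: str, user_id: str) -> bool:
    """Env-var-driven percentage rollout, no external service.

    FLAG_NEW_CHECKOUT=25 turns "new_checkout" on for ~25% of users.
    """
    rollout = int(os.getenv(f"FLAG_{flag.upper()}", "0"))
    # Hash the user id into a stable 0-99 bucket so a given user
    # sees the same flag state on every request.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout
```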
Headline Findings
In 12 of 20 categories, Claude Code builds custom solutions rather than recommending tools. 252 total Custom/DIY picks, more than any individual tool. E.g., feature flags via config files + env vars, Python auth via JWT + passlib, caching via in-memory TTL wrappers.
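The hand-rolled Python auth follows the same spirit. A hedged sketch of the JWT + passlib pattern, assuming PyJWT and passlib[bcrypt] are installed; function names and the one-hour expiry are illustrative:

```python
import datetime
import os

import jwt  # PyJWT
from passlib.context import CryptContext

SECRET = os.environ["JWT_SECRET"]  # assumption: secret injected via env
pwd_ctx = CryptContext(schemes=["bcrypt"])

def hash_password(password: str) -> str:
    # Store only the bcrypt hash, never the plaintext.
    return pwd_ctx.hash(password)

def issue_token(password: str, stored_hash: str, user_id: str) -> str:
    if not pwd_ctx.verify(password, stored_hash):
        raise ValueError("bad credentials")
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {"sub": user_id, "iat": now, "exp": now + datetime.timedelta(hours=1)}
    return jwt.encode(claims, SECRET, algorithm="HS256")
```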
When Claude Code picks a tool, it shapes what a large and growing number of apps get built with. These are the tools it recommends by default (mostly JS-ecosystem; see the report for per-ecosystem breakdowns).
Sonnet 4.5 · Redis 93% (Python caching), Prisma 79% (JS ORM), Celery 100% (Python jobs). Picks established tools.
Opus 4.5 · Most likely to name a specific tool (86.7%). Distributes picks most evenly across alternatives.
Opus 4.6 · Drizzle 100% (JS ORM), Inngest 50% (JS jobs), 0 Prisma picks in JS. Builds custom the most (11.4%: hand-rolled auth, in-memory caches).
What Claude Code favors. Not market adoption data.
Tool Leaderboard→
Top 10 by primary pick count across all responses
Against the Grain→
Tools with large market share that Claude Code barely touches, and sharp generational shifts between models.
The Recency Gradient
Newer models tend to pick newer tools. Within-ecosystem percentages shown. Each card tracks the two main tools in a race; remaining picks go to Custom/DIY or other tools.
Celery (Python jobs): replaced by FastAPI BackgroundTasks (0% → 44%); the rest went Custom/DIY or failed extraction. Within Python job picks only (61% extraction rate). Custom/DIY here means asyncio tasks, no external queue.
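For reference, the BackgroundTasks pattern displacing Celery looks like this (endpoint and task names are illustrative). The task runs in-process after the response is sent: no broker, no worker fleet, and nothing survives a process restart, which is the trade these picks make.

```python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def send_welcome_email(address: str) -> None:
    print(f"emailing {address}")  # stand-in for the real work

@app.post("/signup")
async def signup(email: str, background_tasks: BackgroundTasks) -> dict:
    # Queued in-process; executes after the response is returned.
    background_tasks.add_task(send_welcome_email, email)
    return {"status": "ok"}
```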
Redis (Python caching): replaced by Custom/DIY (0% → 50%); the rest went to other tools. Within Python caching picks only.
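The Custom/DIY caching that beats Redis here is typically a small in-memory TTL wrapper along these lines (a minimal sketch; the decorator is ours, not from the study):

```python
import functools
import time

def ttl_cache(seconds: float):
    """Cache results in-process, expiring entries after `seconds`."""
    def decorator(fn):
        store = {}  # args -> (value, timestamp)

        @functools.wraps(fn)
        def wrapper(*args):
            hit = store.get(args)
            if hit is not None and time.monotonic() - hit[1] < seconds:
                return hit[0]  # still fresh: serve from memory
            value = fn(*args)
            store[args] = (value, time.monotonic())
            return value
        return wrapper
    return decorator

@ttl_cache(seconds=60)
def exchange_rates(base: str) -> dict:
    ...  # expensive lookup, now performed at most once a minute per base
```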
The Deployment Split
Deployment is fully stack-determined: Vercel for JS, Railway for Python. Traditional cloud providers got zero primary picks.
Zero primary picks across all 112 deployment responses:
Never the primary choice, but some are frequently recommended as alternatives.
Frequently recommended as alternatives
Mentioned but never recommended (0 alt picks)
Example: "Where should I deploy this?" (Next.js SaaS, Opus 4.5)
- Vercel (Recommended): Built by the creators of Next.js. Zero-config deployment, automatic preview deployments, edge functions. `vercel deploy`
- Netlify: Great alternative with similar features. Good free tier.
- AWS Amplify: Good if you're already in the AWS ecosystem.
Vercel gets install commands and reasoning. AWS Amplify gets a one-liner.
Truly invisible (rarely even mentioned)
Where Models Disagree→
All three models agree in 18 of 20 categories within each ecosystem. The five rows below capture the remaining disagreements: genuine within-ecosystem shifts (ORM, Jobs) plus cross-language splits (Caching, Real-time).
| Category | Sonnet 4.5 | Opus 4.5 | Opus 4.6 |
|---|---|---|---|
| ORM (JS) · Next.js project. The strongest recency shift in the dataset. | Prisma 79% | Drizzle 60% | Drizzle 100% |
| Jobs (JS) · Next.js project. BullMQ → Inngest shift in the newest model. | BullMQ 50% | BullMQ 56% | Inngest 50% |
| Jobs (Python) · Python API project (61% extraction rate). Celery collapses in newer models. | Celery 100% | FastAPI BgTasks 38% | FastAPI BgTasks 44% |
| Caching · Cross-language (Redis and Custom/DIY appear in both JS and Python). | Redis 71% | Redis 31% | Custom/DIY 32% |
| Real-time · Cross-language (SSE, Socket.IO, and Custom/DIY appear across stacks). | SSE 23% | Custom/DIY 19% | Custom/DIY 20% |
For devtool companies
We run these benchmarks for individual companies too
Private dashboards showing how AI agents recommend your tool vs. competitors, across real codebases. See exactly where you win and where you lose.
Get your benchmark→
Get notified when new benchmarks drop.
Dig into the data
Category deep-dives, phrasing stability analysis, cross-repo consistency data, and market implications.