Featured Study

Edwin Ong & Alex Vikati · feb-2026 · claude-code v2.1.39

What Claude Code Actually Chooses

We pointed Claude Code at real repos 2,430 times and watched what it chose. No tool names in any prompt. Open-ended questions only.

3 models · 4 project types · 20 tool categories · 85.3% extraction rate

Update: Sonnet 4.6 was released on Feb 17, 2026. We'll run the benchmark against it and update results soon.

The big finding: Claude Code builds, not buys. Custom/DIY is the most common single label extracted, appearing in 12 of 20 categories (with the caveat that Custom/DIY can appear in any category, while each individual tool competes in only one). When asked “add feature flags,” it builds a config system with env vars and percentage-based rollout instead of recommending LaunchDarkly. When asked “add auth” in Python, it writes JWT + bcrypt from scratch. When it does pick a tool, it picks decisively: GitHub Actions 94%, Stripe 91%, shadcn/ui 90%.
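To make the feature-flag pattern concrete, here is a minimal Python sketch of the kind of config-plus-env-var rollout described above. It is not code taken from any Claude Code response. The `FEATURE_<NAME>` variable convention and the `rollout_percent` and `is_enabled` helpers are hypothetical. The sketch hashes each user into a stable bucket from 0 to 99. As a result, the same user gets the same answer on every request, and raising the percentage only ever adds users.

```python
import hashlib
import os


def rollout_percent(flag: str) -> int:
    """Read a flag's rollout percentage (0-100) from a hypothetical env var.

    FEATURE_NEW_CHECKOUT=25 means 25% of users see the feature.
    Missing or malformed values fall back to 0 (off).
    """
    raw = os.environ.get(f"FEATURE_{flag.upper()}", "0")
    try:
        value = int(raw)
    except ValueError:
        return 0
    return max(0, min(100, value))  # clamp to [0, 100]


def is_enabled(flag: str, user_id: str) -> bool:
    """Deterministically bucket a user into 0-99 and compare to the rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent(flag)
```

Setting `FEATURE_NEW_CHECKOUT=25` would enable `is_enabled("new_checkout", user_id)` for roughly a quarter of users. Compared with a hosted service like LaunchDarkly, this approach has no vendor dependency. It also has no dashboard or targeting rules.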


2,430 responses · 3 models · 4 repos · 3 runs each

Models: Sonnet 4.5, Opus 4.5, Opus 4.6

Headline Findings

Build vs Buy

In 12 of 20 categories, Claude Code builds a custom solution at least some of the time instead of recommending a tool. Custom/DIY accounts for 252 picks in total, more than any individual tool. Examples: feature flags via config files + env vars, Python auth via JWT + passlib, caching via in-memory TTL wrappers.

Custom/DIY share of picks:

Feature Flags: 69%

Authentication (Python): 100%

Authentication (overall): 48%

Observability: 22%
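The study reports that hand-rolled Python auth pairs JWTs with bcrypt (via passlib). The sketch below covers only the token half. It issues and verifies an HS256 JWT using nothing but the Python standard library, assuming a single shared secret. Password hashing is omitted. The `encode_hs256` and `decode_hs256` names are illustrative. A production build would more likely lean on a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding (RFC 7515)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_hs256(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def decode_hs256(token: str, secret: bytes) -> dict:
    """Verify signature, algorithm, and expiry; return the claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")  # ValueError if malformed
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and time.time() >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

`decode_hs256` raises `ValueError` in three cases: a bad signature, an unexpected `alg`, or an expired `exp` claim. That is the minimum a custom implementation has to get right.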

The Default Stack

When Claude Code picks a tool, it shapes what a large and growing number of apps are built with. These are the tools it recommends by default:

Mostly JS-ecosystem. See the full report for per-ecosystem breakdowns.

Vercel · PostgreSQL · Drizzle · NextAuth.js · Stripe · Tailwind CSS · shadcn/ui · Vitest · pnpm · GitHub Actions · Sentry · Resend · Zustand · React Hook Form

Model Personalities

Sonnet 4.5: Conventional

Redis 93% (Python caching), Prisma 79% (JS ORM), Celery 100% (Python jobs). Picks established tools.

Opus 4.5: Balanced

Most likely to name a specific tool (86.7%). Distributes picks most evenly across alternatives.

Opus 4.6: Forward-looking

Drizzle 100% (JS ORM), Inngest 50% (JS jobs), 0 Prisma picks in JS. Builds custom the most (11.4%; e.g., hand-rolled auth, in-memory caches).

Preference Signals

What Claude Code favors, not market adoption data.

Frequently Picked

Resend over SendGrid
Vitest over Jest
pnpm over npm
Drizzle over Prisma (Opus 4.6; Sonnet 4.5 picks Prisma)
shadcn/ui over MUI
Zustand over Redux

Rarely Picked

Jest (31 alt picks)
Redux (23 mentions)
Prisma (18 alt picks)
Express (absent)
npm (40 alt picks)
LaunchDarkly (11 alt picks)

Tool Leaderboard

Top 10 by primary pick count across all responses


GitHub Actions (CI/CD) · Near-Monopoly · 93.8% (152/162 picks)

Stripe (Payments) · Near-Monopoly · 91.4% (64/70 picks)

shadcn/ui (UI Components) · Near-Monopoly · 90.1% (64/71 picks)

Vercel (Deployment) · Near-Monopoly · 100% (86/86 JS picks)

Tailwind CSS (Styling) · Strong Default · 68.4% (52/76 picks)

Zustand (State Management) · Strong Default · 64.8% (57/88 picks)

Sentry (Observability) · Strong Default · 63.1% (101/160 picks)

Resend (Email) · Strong Default · 62.7% (64/102 picks)

Vitest (Testing) · Strong Default · 59.1% (101/171 picks)

PostgreSQL (Databases) · Strong Default · 58.4% (73/125 picks)
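Each leaderboard percentage is primary picks divided by the category total, rounded to one decimal place. The short Python check below recomputes the shares from the listed counts. The `LEADERBOARD` table and `share` helper are ours, not part of the published dataset.

```python
# (tool, primary picks, category total), copied from the leaderboard
LEADERBOARD = [
    ("GitHub Actions", 152, 162),
    ("Stripe", 64, 70),
    ("shadcn/ui", 64, 71),
    ("Vercel", 86, 86),  # JS deployment picks only
    ("Tailwind CSS", 52, 76),
    ("Zustand", 57, 88),
    ("Sentry", 101, 160),
    ("Resend", 64, 102),
    ("Vitest", 101, 171),
    ("PostgreSQL", 73, 125),
]


def share(picks: int, total: int) -> float:
    """Primary-pick share as a percentage, rounded to one decimal place."""
    return round(100 * picks / total, 1)


if __name__ == "__main__":
    for tool, picks, total in LEADERBOARD:
        print(f"{tool}: {share(picks, total)}% ({picks}/{total})")
```

The same formula gives Jest's 7/171 ≈ 4.1%, which the Against the Grain section rounds to 4%.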


Against the Grain

Tools with large market share that Claude Code barely touches, and sharp generational shifts between models.

Redux (State Management): 0/88
0 primary picks, but 23 mentions. Zustand was picked 57 times instead.

Express (API Layer): 0/119
Absent entirely. Framework-native routing is preferred.

Jest (Testing): 7/171
Only 4% primary, but 31 alt picks. Known but not chosen.

yarn (Package Manager): 1/135
1 primary pick, but 51 alt picks. Still well-known.

The Recency Gradient

Newer models tend to pick newer tools. Within-ecosystem percentages shown. Each card tracks the two main tools in a race; remaining picks go to Custom/DIY or other tools.

Prisma (JS): 79% (Sonnet 4.5) → 0% (Opus 4.6)
Replaced by: Drizzle (21% → 100%)
Within JS ORM picks only.

Celery (Python): 100% (Sonnet 4.5) → 0% (Opus 4.6)
Replaced by: FastAPI BackgroundTasks (0% → 44%); the rest are Custom/DIY or non-extractions.
Within Python job picks only (61% extraction rate). Custom/DIY here means plain asyncio tasks with no external queue.
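The Custom/DIY pattern on the Celery card (asyncio tasks, no external queue) can look roughly like the minimal sketch below. The `run_in_background` and `send_welcome_email` names are hypothetical. The sketch keeps a reference to each task because the event loop holds only weak references to tasks. An unreferenced task can therefore be garbage-collected before it finishes.

```python
import asyncio

# Strong references so pending tasks are not garbage-collected mid-flight
_background_tasks: set[asyncio.Task] = set()


def run_in_background(coro) -> asyncio.Task:
    """Schedule a coroutine on the running event loop: no broker, no worker."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    # Drop the reference once the task finishes
    task.add_done_callback(_background_tasks.discard)
    return task


async def send_welcome_email(address: str, log: list) -> None:
    # Hypothetical job body; a real one would call an email provider
    await asyncio.sleep(0.01)
    log.append(f"sent to {address}")


async def main() -> list:
    log: list[str] = []
    run_in_background(send_welcome_email("a@example.com", log))
    # A request handler would return immediately; we wait only to show completion
    await asyncio.gather(*_background_tasks)
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The trade-off against Celery is durability. These tasks live in the web process's memory, so a restart or crash drops any that are still pending. There is also no built-in retry.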

Redis caching (Python): 93% (Sonnet 4.5) → 29% (Opus 4.6)
Replaced by: Custom/DIY (0% → 50%); the rest go to other tools.
Within Python caching picks only.
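The in-memory TTL wrappers that displace Redis in Opus 4.6's Python caching picks might look like the decorator below. It is a minimal sketch that assumes positional, hashable arguments and a single process. `ttl_cache` is a hypothetical name. The standard library's `functools.lru_cache` bounds cache size but has no time-based expiry.

```python
import functools
import time


def ttl_cache(seconds: float):
    """Cache results in a process-local dict; entries expire after `seconds`."""
    def decorator(fn):
        store: dict = {}  # args tuple -> (value, expires_at)

        @functools.wraps(fn)
        def wrapper(*args):
            now = time.monotonic()  # monotonic clock is immune to wall-clock jumps
            hit = store.get(args)
            if hit is not None and now < hit[1]:
                return hit[0]  # fresh entry: serve from cache
            value = fn(*args)
            store[args] = (value, now + seconds)
            return value

        wrapper.cache_clear = store.clear
        return wrapper
    return decorator
```

Unlike Redis, this cache is per-process and is lost on restart. Expired entries are also only overwritten when the same key is requested again.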

The Deployment Split

Deployment splits cleanly by stack: Vercel for JS, Railway for Python. Traditional cloud providers got zero primary picks.

Frontend (Next.js + React SPA): Vercel at 100%
86 of 86 frontend deployment picks. No runner-up.

Backend (Python / FastAPI): What you'd expect: AWS, GCP, Azure. What you get: Railway at 82%.

Zero primary picks across all 112 deployment responses. None of the providers below was ever the primary choice, though some are frequently recommended as alternatives.

Frequently recommended as alternatives

Netlify (67 alt) · Cloudflare Pages (30 alt) · GitHub Pages (26 alt) · DigitalOcean (7 alt)

Mentioned but never recommended (0 alt picks)

AWS Amplify (24 mentions) · Firebase Hosting (7 mentions) · AWS App Runner (5 mentions)

Example: "Where should I deploy this?" (Next.js SaaS, Opus 4.5)

Vercel (Recommended) — Built by the creators of Next.js. Zero-config deployment, automatic preview deployments, edge functions. vercel deploy

Netlify — Great alternative with similar features. Good free tier.

AWS Amplify — Good if you're already in the AWS ecosystem.

Vercel gets install commands and reasoning. AWS Amplify gets a one-liner.

Truly invisible (rarely even mentioned)

AWS (EC2/ECS) · Google Cloud · Azure · Heroku

Where Models Disagree

All three models agree in 18 of 20 categories within each ecosystem. The five categories below show either a genuine within-ecosystem shift or disagreement across languages.

Category	Sonnet 4.5	Opus 4.5	Opus 4.6	Notes
ORM (JS)	Prisma 79%	Drizzle 60%	Drizzle 100%	Next.js project. The strongest recency shift in the dataset.
Jobs (JS)	BullMQ 50%	BullMQ 56%	Inngest 50%	Next.js project. BullMQ → Inngest shift in the newest model.
Jobs (Python)	Celery 100%	FastAPI BackgroundTasks 38%	FastAPI BackgroundTasks 44%	Python API project (61% extraction rate). Celery collapses in newer models.
Caching (cross-language)	Redis 71%	Redis 31%	Custom/DIY 32%	Redis and Custom/DIY appear in both JS and Python.
Real-time (cross-language)	SSE 23%	Custom/DIY 19%	Custom/DIY 20%	SSE, Socket.IO, and Custom/DIY appear across stacks.


For devtool companies

We run these benchmarks for individual companies too

Private dashboards showing how AI agents recommend your tool vs. competitors, across real codebases. See exactly where you win and where you lose.


Dig into the data

Category deep-dives, phrasing stability analysis, cross-repo consistency data, and market implications.
