The Missing Index Crisis: A 40-Repo Scan and Five-Module Benchmark Study of Prisma and PostgreSQL

by Ko-Hsin Liang
  • prisma
  • postgresql
  • database-indexes
  • performance
  • typescript
  • benchmarking
  • static-analysis
  • empirical-study
  • query-optimization
  • foreign-key
  • composite-index
  • covering-index
  • cohens-d
  • power-law
  • orm

The Missing Index Crisis: A 40-Repo Scan and Five-Module Benchmark Study of Prisma and PostgreSQL

There’s a database index bug hiding in most Prisma projects, and it’s almost certainly in yours too.

It’s not a bug that throws an error. Your queries return the right data. Everything works fine in development, staging, even early production. Then your table crosses 100K rows and a query that used to take 2ms starts taking 400ms. You run EXPLAIN ANALYZE and see two words you didn’t want: Seq Scan.

The cause, almost every time: a foreign key column with no index.

Here’s the part that surprises most Prisma developers: Prisma does not create indexes on foreign key columns by default. Unlike MySQL/InnoDB, which auto-indexes foreign keys, PostgreSQL leaves that to you. And Prisma doesn’t add them. If you have a userId Int field with a @relation, there is no index on userId unless you explicitly add @@index([userId]). (Prisma v4.7.0 added a missing-index warning — but only when relationMode = "prisma" is set. The default relationMode = "foreignKeys", which all standard PostgreSQL projects use, produces no warning. If you haven’t explicitly opted into prisma relation mode, you’re in the silent case.)

I wanted to know how widespread this is and what it actually costs. So I scanned 40 production Prisma repositories from GitHub and benchmarked five missing-index scenarios against PostgreSQL with 30 trials each at four dataset sizes — 1K, 10K, 100K, and 1 million rows.

The results are more dramatic than I expected.


TL;DR

ScenarioNo indexWith indexSpeedup
Point lookup WHERE email = ? at 1M rows6.45 ms0.25 ms26×
Sort query ORDER BY created_at LIMIT 20~49 ms~0.27 ms190×
FK scan WHERE user_id = ?~42 ms~0.27 ms153×
Composite filter WHERE status AND created_at >~46 ms~0.28 ms166×
Covering index INCLUDE (email)~52 ms~49 msNo benefit

Real-world scan of 40 Prisma repos: 1,209 missing index patterns, 55% prevalence rate.

One important negative result: BM-05 (covering index with INCLUDE) showed zero benefit — speedup ~0.92–1.20×, p = 0.24. PostgreSQL chose sequential scan for both variants because the status column is low-cardinality. Not all index additions help.

Methodology note: All benchmarks use a warm shared buffer pool (5 warmup queries before each trial set). Baselines with parallel scan had CV > 15% — I use medians, not means, to account for this. The static detector uses naming-convention heuristics and may have false positives.


Part 1: The benchmarks

Setup

The benchmark database has two tables: bench_users (email, status, created_at) and bench_orders (user_id, status, amount, created_at). No application-layer indexes except primary key — all indexes are created and dropped programmatically per benchmark. For each module and dataset size:

  1. Seed exactly n rows, VACUUM ANALYZE
  2. 5 warmup queries to populate shared buffer pool
  3. 30 baseline trials (no index), performance.now() per trial
  4. CREATE INDEX + ANALYZE
  5. 5 warmup queries
  6. 30 optimized trials
  7. DROP INDEX to restore baseline state

CV threshold: 15%. Some baselines had higher variance due to parallel scan jitter — I’ll note those.


BM-01: Point lookup on an unindexed column

WHERE email = '[email protected]' with no index on email. PostgreSQL scans every row looking for a match.

// Both versions use the same Prisma query — difference is @@index([email]) in schema
const user = await prisma.benchUser.findFirst({ where: { email } });
nNo index (ms)Indexed (ms)Speedup
1K0.290.241.2×
10K0.560.252.2×
100K4.760.2518.8×
1M6.450.2526.0×

The baseline doesn’t grow linearly. Power-law regression gives exponent b = 0.50 (roughly O(√n)), not b = 1.0 as sequential scan theory predicts. This is because the data fits in PostgreSQL’s shared buffer pool — the scan is a memory traversal, not disk I/O. CPU cache effects compress the curve.

The optimized version: b = 0.003. Effectively O(1). A B-tree lookup takes ~0.25ms whether there are 1K or 1M rows.

Cohen’s d at n=1M: 11.0. For reference, d > 0.8 is “large.” At this magnitude, Cohen’s d stops adding granularity — it simply confirms total separation of distributions. The more practically useful number is the raw speedup: 26×. Every single optimized trial was faster than every single baseline trial, without exception.


BM-02: ORDER BY without an index

ORDER BY created_at DESC LIMIT 20. No index on created_at. PostgreSQL scans the full table, sorts everything, returns the top 20.

// Without index: full table scan + sort on every query
model BenchUser {
  id        Int      @id @default(autoincrement())
  createdAt DateTime @default(now())
  // No index — every ORDER BY query scans the full table
}

// With index: walks the first 20 index entries — done
model BenchUser {
  id        Int      @id @default(autoincrement())
  createdAt DateTime @default(now())
  
  @@index([createdAt(sort: Desc)])
}
nNo index (ms)Indexed (ms)Speedup
1K48.80.29170×
10K53.70.31171×
100K49.20.31158×
1M48.50.26190×

The baseline is approximately flat across all n — 48–54ms at every dataset size (the variation is within the CV range already disclosed). This surprises people. The reason: PostgreSQL uses parallel workers (plan type: Gather → Sort → Parallel Seq Scan). With a warm buffer pool, the bottleneck is parallel worker coordination and the sort, not I/O. Those costs don’t scale linearly with n when the data fits in shared buffers.

The optimized version reads exactly 20 rows in sorted order and stops. O(1). Whether the table has 1K or 1M rows, the answer is always in the first 20 index entries.

If you have a “list recent items” endpoint without an index on your timestamp column, you’re paying 170–190× right now — regardless of table size.


BM-03: The unindexed FK scan — Prisma’s default

This is the big one. When you define a relation in Prisma:

model BenchOrder {
  id     Int       @id @default(autoincrement())
  userId Int
  user   BenchUser @relation(fields: [userId], references: [id])
  // No @@index([userId]) — Prisma generates no index here
}

Every query like findMany({ where: { userId } }) is a full table scan. Prisma does not add @@index([userId]) automatically. This is intentional — Prisma’s docs note that indexes have write costs and leave the decision to you — but most developers don’t know to make this decision.

Prisma v4.7.0 note: Prisma added a missing-index validation warning in v4.7.0, but it only fires when relationMode = "prisma" is configured. The default relationMode = "foreignKeys" — the mode every standard PostgreSQL project uses — produces no warning. Unless you’ve explicitly set relationMode = "prisma" in your schema datasource block, this problem is completely silent.

nNo FK index (ms)With FK index (ms)Speedup
1K42.00.27153×
10K42.70.28153×
100K41.00.27150×
1M41.10.27153×

Again the baseline is constant — parallel Gather plan with warm buffers. The optimized plan switches to Bitmap Heap Scan: PostgreSQL builds a bitmap of matching row positions from the index, then fetches only those rows. A completely different access strategy.

The fix is one line in your Prisma schema:

model BenchOrder {
  userId Int
  user   BenchUser @relation(fields: [userId], references: [id])
  @@index([userId])  // Add this
}

One line. 153× faster. It’s the single highest-ROI change you can make to most Prisma schemas.


BM-04: Single-column index vs composite index

You have WHERE status = 'active' AND created_at > ?. You’ve added @@index([status]). Is that enough?

// Baseline state: @@index([status]) only
model BenchUser {
  id        Int      @id @default(autoincrement())
  status    String
  createdAt DateTime @default(now())
  
  @@index([status])  // Only indexes status — not enough!
}

// Optimized state: @@index([status, createdAt])
model BenchUser {
  id        Int      @id @default(autoincrement())
  status    String
  createdAt DateTime @default(now())
  
  @@index([status, createdAt])  // Composite index handles both predicates
}
nSingle-col index (ms)Composite index (ms)Speedup
1K46.40.29157×
10K45.30.27167×
100K45.90.30151×
1M45.80.28166×

With only @@index([status]), PostgreSQL won’t use the index — the planner determines a sequential scan is cheaper when the single-column index covers a large fraction of rows (around 40% match status='active' in this dataset) and doesn’t help narrow the date range. It still scans the full table with a parallel Gather plan. The composite index on (status, createdAt) handles both predicates: seek to status='active' in the index, walk forward for the date range. Zero heap rows outside the range are visited.

Rule of thumb: put equality fields first in the composite index, range fields last. @@index([status, createdAt]) follows this. @@index([createdAt, status]) would not help the range query as effectively.


BM-05: Covering index — the one that didn’t help

SELECT id, email FROM bench_users WHERE status = 'active'. A covering index adds the projected columns to the index itself via INCLUDE, so PostgreSQL never needs to touch the heap.

-- Optimized: covering index, no heap fetch needed
CREATE INDEX ON bench_users (status) INCLUDE (id, email);
nIndex on status (ms)Covering index (ms)Speedup
1K56.457.60.98×
10K52.156.30.93×
100K59.149.21.20×
1M52.456.70.92×

No benefit. p-value at n=1M: 0.24. Both variants show Seq Scan in EXPLAIN ANALYZE.

PostgreSQL’s planner chose sequential scan for both — because status is low-cardinality (a handful of distinct values). If 30-40% of rows match status='active', visiting each through an index + heap lookup is slower than a straight sequential scan. The covering index never gets used, so INCLUDE changes nothing.

Covering indexes help when your filter column is highly selective (e.g., WHERE userId = ? returns a tiny fraction of rows) and you want to avoid heap fetches for the projected columns. On a low-selectivity column like status, the planner correctly ignores the index.

This is a useful negative result: don’t blindly add INCLUDE to every index. Check your selectivity first with EXPLAIN ANALYZE and look for Heap Fetches in the index scan stats.


The full scaling picture

ModuleVariantExponent (b)Label
BM-01Baseline0.500.92O(√n) — warm buffer scan
BM-01Optimized0.0030.25O(1) — B-tree lookup
BM-02Baseline−0.0050.08O(1) — parallel scan constant
BM-02Optimized−0.0160.24O(1) — 190× faster constant
BM-03Baseline−0.0040.48O(1) — parallel Gather constant
BM-03Optimized−0.0030.36O(1) — Bitmap Heap Scan
BM-04Baseline−0.0010.13O(1) — parallel scan constant
BM-04Optimized−0.0040.04O(1) — Index Scan
BM-05Baseline−0.0040.04O(1) — Seq Scan
BM-05Optimized−0.0080.10O(1) — Seq Scan (same)

BM-01 is the only module where the baseline scales with n (b=0.50). The others are effectively constant because parallel sequential scan + warm buffer pool makes the bottleneck independent of table size in this range.

The practical implication: the performance gap between indexed and unindexed is not a “scaling problem” you can defer. BM-02/03/04 show 150–190× penalties at n=1K. You’re paying that cost right now, in production, regardless of table size. The indexed version is also constant — ~0.27ms — meaning your query time stays flat as your data grows. The unindexed version does too, but 150× slower.


Part 2: How common is this in real Prisma projects?

The scan

I built a static detector (prisma-index-detector.ts) that parses schema.prisma files and checks two rules:

  1. Missing FK index: Any field ending in Id with type Int or String that lacks a matching @@index
  2. Missing sort index: createdAt or updatedAt DateTime fields without @@index

Then I pointed it at 40 production Prisma repos from GitHub — 5 domains × 8 repos each, all with ≥1K stars, all actively maintained, all confirmed to have schema.prisma files present.

The numbers

40 repos scanned. 1,209 missing index patterns found. 22 repos (55%) had at least one.

PatternCountShare
Missing sort index (createdAt/updatedAt)84670%
Missing FK index (userId, orderId, etc.)36330%

Sort indexes dominate. Most Prisma schemas have createdAt DateTime @default(now()) on nearly every model, and ORDER BY createdAt DESC is one of the most common queries. The @@index is almost never there.

Repos with most missing indexes

RepoMissing indexesDomain
triggerdotdev/trigger.dev177Developer tools
calcom/cal.com171SaaS
amplication/amplication137Developer tools
prisma/prisma-examples125Examples
baptisteArno/typebot.io110SaaS
toeverything/AFFiNE107Collaboration
blitz-js/blitz72Framework
documenso/documenso68SaaS
formbricks/formbricks66SaaS
mfts/papermark36SaaS

A few things stand out here.

trigger.dev and cal.com have 170+ missing indexes each. These are serious, production-grade projects used by thousands of developers. Their Prisma schemas are large and complex. The missing sort indexes accumulate fast when you have 50+ models and none of them have @@index([createdAt]).

prisma/prisma-examples has 125. The official Prisma example repository — the one most developers look at when learning Prisma — has 125 missing indexes. If the canonical learning resource doesn’t model this, it’s not surprising that production apps don’t either.

Only 18 of 40 repos have zero findings. The other 22 range from 1 missing index to 177. If you pick a random Prisma project with more than a few models, there’s a 55% chance it’s missing at least one index.


The three fixes

Fix 1: Index every FK column

Every @relation in Prisma adds a scalar FK field. Add @@index for each one:

model Post {
  id       Int    @id @default(autoincrement())
  authorId Int
  author   User   @relation(fields: [authorId], references: [id])

  @@index([authorId])  // Required — Prisma won't add this for you
}

If you have 10 models with relations, you need 10 @@index additions. It’s tedious, which is probably why it doesn’t happen. A linter can catch these automatically — more on that below.

Fix 2: Index your timestamp columns

If you ever query ORDER BY createdAt DESC or filter by date range, index it:

model Order {
  id        Int      @id @default(autoincrement())
  createdAt DateTime @default(now())

  @@index([createdAt(sort: Desc)])  // Descending order matches most queries
}

Fix 3: Use composite indexes for multi-field WHERE

If you filter on status AND createdAt together, a composite index is 150× faster than a single-column index:

model Order {
  status    String
  createdAt DateTime

  // Instead of separate @@index([status]) and @@index([createdAt])
  @@index([status, createdAt])
}

How to detect missing indexes in your own schema

The fastest way to detect missing indexes — along with N+1 queries, memory leaks, and other performance issues — is to use Code Evolution Lab. It scans your entire codebase (not just Prisma schemas) and provides actionable reports with line numbers and fix recommendations.

Code Evolution Lab uses the same detection algorithms validated in this study, plus dozens more rules for React, Vue, Angular, Node.js, and database patterns.

Option 2: Run the open-source detector

You don’t need to audit by hand. The detector I built is open source:

git clone https://github.com/liangk/empirical-study.git
cd empirical-study/studies/05-missing-index
npm install
npm run detect -- --path /path/to/your/project

It will find all schema.prisma files under the path, check for missing FK and sort indexes, and list them with line numbers and recommended fixes. Running it on your project takes about 2 seconds.

You can also run the full benchmark suite if you have a local PostgreSQL instance:

cp .env.example .env  # Add your DATABASE_URL
npm run prisma:push
npm run seed
npm run bench:all

What I’d push back on

“This only matters at scale.”

The BM-02/03/04 data shows 150–190× penalties at n=1K. If you have 1,000 orders — which is nothing, a few days of data for a small SaaS — an unindexed WHERE user_id = ? takes 42ms instead of 0.27ms. That’s not a scale problem. That’s a problem right now.

“PostgreSQL will choose the best plan anyway.”

The planner can only work with the indexes that exist. If there’s no index on userId, the planner’s best option is a sequential scan. It will do that efficiently — parallel workers, warm buffers — but it’s still 150× slower than what’s possible. The planner is not magic; it needs something to work with.

“Our ORM handles this.”

Prisma and Sequelize do not add FK indexes automatically on PostgreSQL. TypeORM’s behavior depends on version and configuration — recent versions with createForeignKeyConstraints: true (the default) do create FK indexes, though behavior has varied across versions. If you’re on PostgreSQL, check your ORM’s documentation. In Prisma specifically, FK indexes are never automatic — the decision is always left to you.


The honest caveats

Warm buffer pool. I ran warmup queries before every measurement to populate shared buffers. Both variants benefit equally, but cold-start numbers would be worse for the unindexed cases because they scan more data. These benchmarks represent steady-state performance, not first-query-of-the-day performance.

CV exceeded 15% on some baselines. BM-02 and BM-03 baselines had CV of 35–60%. Parallel scan jitter — the timing of parallel worker scheduling introduces noise. The medians are stable and consistent; the variance is in the tails. I report medians, not means, to be robust to this.

Only two detection rules. The static detector checks FK columns and createdAt/updatedAt. A complete index audit would also check fields used in orderBy, where, and distinct in your query code — a much harder problem that requires analyzing TypeScript call sites, not just the schema file. The detector uses a naming-convention heuristic (fields ending in Id with type Int or String), which can produce false positives for non-FK fields like externalId or sequenceId, as well as false negatives for FK fields not following the convention. Treat the 1,209 count as an order-of-magnitude estimate rather than a precise figure.

Dataset size. My benchmark tables are simple two-model schemas. Real application schemas are more complex — foreign key chains, partial indexes, partial queries, more selective filters. Results will vary.


Try it yourself

git clone https://github.com/liangk/empirical-study.git
cd empirical-study/studies/05-missing-index
npm install
# Run the static detector on your own project:
npm run detect -- --path /path/to/your/prisma/project

For the full benchmark suite, you’ll need a local PostgreSQL instance and a .env with DATABASE_URL. See the README for setup instructions.


What’s next

This study focused on schema-level detection: missing @@index declarations. The harder problem is query-level detection: finding fields used in findMany where clauses that lack an index, even when the field is not a named FK or timestamp. That requires analyzing TypeScript call sites and cross-referencing the schema — much more complex static analysis.

I’m also curious how the BM-01 point lookup numbers change at n=10M, where the table no longer fits in shared buffers and the warm-cache assumption breaks down. At that scale, the unindexed sequential scan hits disk I/O, and the penalty should become much more dramatic.

The bottom line: missing indexes in Prisma projects are not an edge case. They’re the default. 55% prevalence across 40 production repos, 1,209 instances, projects with hundreds of thousands of users. The fix is one line per relation. The performance penalty for not fixing it starts at 150× and it’s already there at 1K rows.


About this research

This study is part of a series of empirical performance investigations that validate the detection rules used in Code Evolution Lab. Each article in this series:

  • Scans real-world production codebases to measure prevalence
  • Runs controlled benchmarks to quantify performance impact
  • Provides open-source detectors and raw data for reproducibility

Other studies in this series:

If you want these checks running automatically on your codebase, check out Code Evolution Lab.


The benchmark suite, static detector, and raw data are on GitHub. Built at StackInsight.