I still remember the moment a “simple” customer search took down an otherwise healthy production database. Nothing was on fire in the app layer: no deploy, no memory leak, no sudden traffic spike. The database server was the problem—CPU pinned, disks busy, and a queue of sessions all waiting behind one expensive query that had quietly gotten slower as the table grew.
That experience changed how I approach SQL performance. I don’t treat it as a one-time cleanup task; I treat it as a feedback loop: measure, pinpoint where time goes (I/O, CPU, waits, locks), make a narrow change, then verify the result under realistic concurrency. If you do this consistently, you get lower response times, steadier latency during peak hours, and fewer “mystery” incidents.
What follows is the workflow I use in 2026 to tune SQL safely: how I find slow queries, how I read plans without getting lost, which query rewrites typically matter, how I build indexes that actually pay off, and how I keep performance from regressing when schemas and data volumes evolve.
Where time goes: I/O, CPU, and waiting
When a query is slow, I start with a blunt question: is it doing too much work, or is it blocked from doing work?
- Too much work usually means excessive reads (scanning lots of pages), heavy CPU (sorting, hashing, spilling to temp storage), or large intermediate results (joins and aggregates that explode row counts).
- Being blocked usually means lock waits (another session holds a lock you need), latch waits (internal contention), or resource waits (memory grants, temp storage, disk throughput).
A few forces dominate the shape of “work” in SQL:
1) Table size
A query that is fine at 200k rows can become painful at 200 million. Even “fast” operators become expensive if they touch too many pages.
2) Joins
Joins are where row counts can multiply. A missing join predicate, a mismatched data type, or a non-sargable filter (more on that soon) can force huge join inputs.
3) Aggregations
GROUP BY over a large range can be CPU-heavy (hash aggregate) or sort-heavy (stream aggregate), and can spill when memory is short.
4) Concurrency
A query that runs in 150ms alone can become 3s when 200 sessions run it at once. Locks, temp storage, and memory are shared.
5) Indexes
Indexes are the main accelerator for point lookups and selective ranges. But extra or poorly chosen indexes can slow writes, bloat storage, and confuse plan selection.
A practical rule I use: before changing SQL text, I try to answer these three questions from data and telemetry:
- Are we waiting on something (locks, I/O, memory grants, temp storage)?
- Are we reading far more than we return?
- Did this query get worse after a data growth or stats change?
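The first of those questions can often be answered straight from wait statistics. A minimal sketch against SQL Server's sys.dm_os_wait_stats (counters accumulate since the last restart or manual clear, and the exclusion list below is illustrative, not exhaustive):

```sql
-- Top waits since the counters were last cleared (instance-wide)
SELECT TOP (10)
    wait_type,
    wait_time_ms / 1000.0 AS wait_time_s,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'XE_TIMER_EVENT', N'BROKER_TO_FLUSH')  -- benign idle waits, partial list
ORDER BY wait_time_ms DESC;
```

If lock or I/O waits dominate here, I investigate blocking and storage before touching any query text.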
Finding slow queries in SQL Server: plans, counters, and DMVs
On SQL Server, I rely on three pillars: execution plans, resource counters, and DMVs (dynamic management views). Each gives a different angle, and together they keep me from guessing.
1) Actual execution plan (and why I insist on “actual”)
In SSMS, I enable the actual execution plan and run the query. The plan tells me how SQL Server executed it: joins chosen, index access methods, sorts, memory grants, and where the big costs are.
What I look for first:
- Huge mismatches between estimated rows and actual rows
- Index or table scans on large tables when the predicate should be selective
- Sorts and hashes that spill (warnings)
- Key lookups that run millions of times
- “Residual predicates” (filters applied after reading, often due to non-sargable expressions)
A plan is a map, not a verdict. If I only read one thing, it’s the row counts at each step—those tell you where work balloons.
2) Monitor resource usage (CPU, memory, disk)
SQL performance is often a system story. I’ll watch:
- CPU: sustained high CPU during query peaks points to heavy joins, sorts, scalar functions, or parallelism overhead.
- Memory: pressure can cause temp storage spills, more physical reads, and longer waits for memory grants.
- Disk: high read latency makes scans and large range reads hurt; high write latency makes temp storage and logging hurt.
On Windows, Performance Monitor (PerfMon) is still useful because it correlates OS-level counters with SQL behavior. I like to pair Windows counters with SQL-side waits so I can tell whether a “slow query” is really “the system is starving.”
3) DMVs to identify expensive query patterns
DMVs help me find the high-impact queries by total CPU, total reads, average duration, or execution count. I rarely chase the slowest single execution; I chase the biggest total cost to the system.
T-SQL example (runnable) to list high-CPU cached queries:
-- Top cached queries by total CPU time
SELECT TOP (25)
    qs.total_worker_time / 1000.0 AS total_cpu_ms,
    qs.execution_count,
    (qs.total_worker_time / NULLIF(qs.execution_count, 0)) / 1000.0 AS avg_cpu_ms,
    (qs.total_logical_reads / NULLIF(qs.execution_count, 0)) AS avg_logical_reads,
    qs.last_execution_time,
    SUBSTRING(st.text,
        (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
Follow-ups I often run:
- Sort by total_logical_reads to find I/O hogs.
- Sort by execution_count to find “death by a thousand cuts” queries.
- Capture the plan via sys.dm_exec_query_plan when I need details.
Important caveat: the cache resets (service restarts, memory pressure, plan eviction). If you can, I recommend Query Store for durable history and regression detection.
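Enabling Query Store is a single ALTER DATABASE statement. The settings below are an illustrative starting point, not a recommendation for every workload:

```sql
-- Enable Query Store with a bounded history window
ALTER DATABASE CURRENT
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     MAX_STORAGE_SIZE_MB = 1024,
     CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30));
```

Once it's on, plan regressions survive restarts and plan-cache eviction, which makes before/after comparisons far more trustworthy.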
Reading execution plans without getting lost
Execution plans can look intimidating because they compress a lot of decision-making into one picture. I keep my reading routine simple: start at the biggest row count and work backward.
Cardinality mistakes are the root of many bad plans
When the plan expects 10 rows and gets 10 million, everything downstream becomes wrong:
- Join method choice becomes wrong (nested loops vs hash join)
- Memory grant becomes wrong (leading to spills)
- Parallelism choice becomes wrong
- Index choice becomes wrong
Common reasons estimates go off:
- Stale or low-quality statistics
- Skewed data distributions (hot customers, seasonal orders)
- Correlated predicates (City and State, or ProductCategory and Brand)
- Table variables (historically estimated poorly; newer versions improved, but it still bites)
- Parameter sensitivity (one parameter value is rare, another is common)
If I see a big estimate/actual mismatch, I don’t immediately rewrite the query. I first check whether stats are current and whether the predicate matches the index definitions.
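Checking stats currency is quick. A sketch using sys.dm_db_stats_properties; the table name is a placeholder from this article's examples:

```sql
-- Statistics freshness for one table (replace Sales.OrderHeader with yours)
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter  -- rows modified since the last stats update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'Sales.OrderHeader');
```

A large modification_counter relative to rows is my cue to refresh statistics before concluding the plan itself is at fault.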
The scan vs seek trap
People fixate on “scan bad, seek good.” Reality is more nuanced:
- A scan on a tiny table is fine.
- A scan can be the best choice when you truly need a large percentage of rows.
- A seek can still be expensive if it triggers millions of random page reads.
So I ask: did we read far more pages than we returned rows? If yes, I look for a better access path (index) or a more selective predicate (query rewrite).
Pay attention to lookups and sorts
Two plan patterns that repeatedly cause pain:
- Key lookups: A nonclustered index finds keys, then the engine fetches extra columns from the base table. A few lookups are fine; millions are not. The fix is often a covering index (include columns) or fetching fewer columns.
- Sorts: Sorting large sets is CPU-heavy and can spill to temp storage. Sometimes you remove the sort by changing an ORDER BY requirement; sometimes you support it with an index that matches the order.
I also check for:
- Warnings about spills
- “Implicit conversion” on join or filter columns
- Scalar UDF usage (can serialize execution and burn CPU)
Indexes that pay off (and the ones that quietly hurt you)
Indexes are the sharpest tool in tuning, but they’re also the easiest tool to misuse.
I start with a workload question, not an index question
Before I add an index, I write down:
- The query pattern (filter columns, join columns, order-by columns)
- The expected selectivity (how many rows match typically)
- The read/write ratio for the table
If the table is write-heavy (events, logs), I’m conservative: every extra index means extra write cost and more pages to keep hot.
Composite indexes: order matters
If you often filter by (TenantId, CreatedAt) and then sort by CreatedAt, a composite index on (TenantId, CreatedAt) is a strong candidate.
In practice:
- Put the most selective equality predicates first.
- Then put the range predicate (like CreatedAt between …) next.
- Then consider order-by needs.
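For the (TenantId, CreatedAt) pattern above, the candidate index might look like this (names are taken from the example, not a real schema):

```sql
-- Equality column first, then the range/order column
CREATE NONCLUSTERED INDEX IX_OrderHeader_TenantId_CreatedAt
ON Sales.OrderHeader (TenantId, CreatedAt);
```

Because CreatedAt is the second key column, the same index supports both the range filter and an ORDER BY CreatedAt within a tenant.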
Covering indexes: fewer lookups, fewer reads
Covering indexes help when:
- You filter/join on a narrow key
- You return a small set of columns
Instead of “SELECT *”, I keep selects narrow so the index can cover the query.
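As a sketch, a covering index for a query that filters on CustomerId and returns only OrderDate and TotalAmount (hypothetical columns reused from this article's examples) could be:

```sql
-- Seek on CustomerId; INCLUDE columns satisfy the SELECT list without key lookups
CREATE NONCLUSTERED INDEX IX_OrderHeader_CustomerId_Covering
ON Sales.OrderHeader (CustomerId)
INCLUDE (OrderDate, TotalAmount);
```

INCLUDE columns live only at the leaf level, so they widen the index less than adding them as key columns would.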
Filtered indexes: great for sparse predicates
If only 2% of rows are active, a filtered index on WHERE IsActive = 1 can be small, hot, and fast. I’ve seen this drop query latency from seconds to tens of milliseconds when the workload is right.
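A sketch of that filtered index, using this article's hypothetical Sales.Customer columns:

```sql
-- Small, hot index covering only the active 2% of rows
CREATE NONCLUSTERED INDEX IX_Customer_Active
ON Sales.Customer (LastPurchaseAt)
INCLUDE (Email)
WHERE IsActive = 1;
```

One caveat: queries must use a literal or recompiled predicate that matches the filter (IsActive = 1) for the optimizer to pick it.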
Index pitfalls I watch for
- Too many overlapping indexes: you pay for each on writes and maintenance.
- Low selectivity indexes: an index on a boolean column is often not useful unless it’s filtered.
- Stale stats: even a good index can be ignored if stats misrepresent reality.
- Missing index suggestions: they’re hints, not gospel. They can over-prescribe indexes that bloat your system.
If you’re in SQL Server, I also watch for index fragmentation where it matters (large range scans), but I don’t chase fragmentation as a hobby. I chase measurable read reduction.
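Before adding or dropping anything, I check how existing indexes are actually used. A sketch against sys.dm_db_index_usage_stats (these counters reset on service restart, so read them with that caveat):

```sql
-- Reads vs writes per index in the current database
SELECT
    OBJECT_NAME(ius.object_id) AS table_name,
    i.name AS index_name,
    ius.user_seeks + ius.user_scans + ius.user_lookups AS reads,
    ius.user_updates AS writes
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
  ON i.object_id = ius.object_id
 AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID()
ORDER BY ius.user_updates DESC;
```

An index with high writes and near-zero reads over a representative period is a candidate for removal, after a review of infrequent but critical workloads (month-end reports, for example).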
Query rewrites I reach for first (with runnable examples)
When the plan shows the engine doing unnecessary work, small rewrites can have outsized impact. These are the patterns I teach teams because they’re easy to apply and easy to review.
1) Select the columns you need (avoid SELECT *)
When you fetch every column, you increase I/O, memory, and network cost. You also make covering indexes less likely.
T-SQL example:
-- Inefficient for wide tables
SELECT *
FROM Sales.Customer;

-- Better: narrow result set
SELECT CustomerId, FirstName, LastName, Email, LastPurchaseAt
FROM Sales.Customer;
If you only need a count, don’t fetch rows at all:
SELECT COUNT_BIG(*)
FROM Sales.Customer
WHERE IsActive = 1;
2) Avoid DISTINCT when it masks a join problem
DISTINCT can be a legitimate need, but I often see it used as a bandage for accidental duplication from joins.
Instead of:
SELECT DISTINCT c.CustomerId, c.Email
FROM Sales.Customer AS c
JOIN Sales.OrderHeader AS o
ON o.CustomerId = c.CustomerId;
I prefer to be explicit about the intent (“customers with at least one order”) using EXISTS:
SELECT c.CustomerId, c.Email
FROM Sales.Customer AS c
WHERE EXISTS (
SELECT 1
FROM Sales.OrderHeader AS o
WHERE o.CustomerId = c.CustomerId
);
This often reduces row explosion and avoids big sorts/hashes needed to remove duplicates.
3) Use explicit JOIN syntax (and keep predicates sargable)
Old-style joins in the WHERE clause make it easier to miss join predicates and produce accidental cross joins.
Instead of:
SELECT c.CustomerId, c.Email, o.OrderDate
FROM Sales.Customer c, Sales.OrderHeader o
WHERE c.CustomerId = o.CustomerId;
Use:
SELECT c.CustomerId, c.Email, o.OrderDate
FROM Sales.Customer AS c
INNER JOIN Sales.OrderHeader AS o
ON o.CustomerId = c.CustomerId;
Then keep filters in a form that can use indexes:
-- Non-sargable: function on column
WHERE CONVERT(date, o.OrderDate) = @OrderDate;
Prefer a range:
WHERE o.OrderDate >= @OrderDate
AND o.OrderDate < DATEADD(day, 1, @OrderDate);
4) Use WHERE instead of HAVING when you can
HAVING filters after grouping, which can force the engine to aggregate more rows than needed.
Instead of:
SELECT CustomerId, COUNT_BIG(*) AS OrderCount
FROM Sales.OrderHeader
GROUP BY CustomerId
HAVING CustomerId = @CustomerId;
Filter first:
SELECT CustomerId, COUNT_BIG(*) AS OrderCount
FROM Sales.OrderHeader
WHERE CustomerId = @CustomerId
GROUP BY CustomerId;
5) Pagination: avoid “OFFSET a million” patterns
Large OFFSET forces the engine to walk many rows just to discard them.
If you page by a stable key (like OrderId), keyset pagination is usually better:
-- Keyset pagination
SELECT TOP (50)
OrderId, CustomerId, OrderDate, TotalAmount
FROM Sales.OrderHeader
WHERE OrderId > @LastSeenOrderId
ORDER BY OrderId;
This pairs nicely with an index on (OrderId) or (CustomerId, OrderId) depending on how you page.
6) Watch for implicit conversions
If you join NVARCHAR to INT, SQL Server may convert one side and prevent index use.
I fix this at the schema boundary (types aligned), not by sprinkling CAST in queries.
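The plan shows the symptom as a CONVERT_IMPLICIT warning on the join or filter column; the durable fix is aligning the types. A hypothetical sketch (the column name is invented for illustration; always test a type change in staging first):

```sql
-- If ExternalRef was created as NVARCHAR but only ever stores integer ids,
-- aligning the type removes the implicit conversion on every join and filter.
ALTER TABLE Sales.OrderHeader
ALTER COLUMN ExternalRef INT NOT NULL;  -- hypothetical column; verify data first
```

Until the schema can change, casting the parameter (not the column) keeps the predicate sargable.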
Aggregations and large datasets: make big work smaller
Aggregations and analytics are where teams accidentally build “perfectly correct” queries that are too expensive for production.
Reduce the scanned range
If you need “last 7 days,” make it a true range predicate. I often see:
WHERE CreatedAt >= DATEADD(day, -7, GETUTCDATE());
That’s fine, but only if CreatedAt is indexed and the query doesn’t wrap CreatedAt in a function.
Pre-aggregate when the business meaning is stable
If your dashboard always shows daily totals by tenant and product category, I prefer a summary table updated incrementally. You trade a small write path cost for large read path savings.
A pattern that works well:
- Raw events table (append-only)
- Daily summary table keyed by (TenantId, Date, CategoryId)
- A job (or streaming consumer) that updates summaries
This also reduces concurrency pressure because many dashboard users read a tiny summary table instead of hammering the raw events.
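A minimal sketch of the incremental update step, assuming hypothetical Sales.RawEvent and Sales.DailySummary tables keyed as described above:

```sql
-- Re-aggregate yesterday's events and upsert into the summary table
MERGE Sales.DailySummary AS tgt
USING (
    SELECT TenantId,
           CAST(CreatedAt AS date) AS SummaryDate,
           CategoryId,
           SUM(Amount) AS TotalAmount
    FROM Sales.RawEvent
    WHERE CreatedAt >= CAST(DATEADD(day, -1, SYSUTCDATETIME()) AS date)
      AND CreatedAt <  CAST(SYSUTCDATETIME() AS date)
    GROUP BY TenantId, CAST(CreatedAt AS date), CategoryId
) AS src
ON  tgt.TenantId = src.TenantId
AND tgt.SummaryDate = src.SummaryDate
AND tgt.CategoryId = src.CategoryId
WHEN MATCHED THEN
    UPDATE SET tgt.TotalAmount = src.TotalAmount
WHEN NOT MATCHED THEN
    INSERT (TenantId, SummaryDate, CategoryId, TotalAmount)
    VALUES (src.TenantId, src.SummaryDate, src.CategoryId, src.TotalAmount);
```

Re-aggregating a full day and upserting makes the job idempotent, so a failed run can simply be retried.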
Use window functions carefully
Window functions are powerful, but they can force sorts. If you need “top N per group,” I like patterns that keep partitions small and indexes aligned with the partition/order.
Example:
WITH RankedOrders AS (
SELECT
o.CustomerId,
o.OrderId,
o.OrderDate,
o.TotalAmount,
ROW_NUMBER() OVER (
PARTITION BY o.CustomerId
ORDER BY o.OrderDate DESC, o.OrderId DESC
) AS rn
FROM Sales.OrderHeader AS o
WHERE o.OrderDate >= @StartDate
AND o.OrderDate < @EndDate
)
SELECT CustomerId, OrderId, OrderDate, TotalAmount
FROM RankedOrders
WHERE rn <= 5;
Then I support it with an index that matches the access path for the partition and order.
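For the top-N-per-group query above, an index aligned with the partition and order columns could look like this (hypothetical, built from the example's columns):

```sql
-- Partition column first, then the ORDER BY columns in matching direction
CREATE NONCLUSTERED INDEX IX_OrderHeader_Customer_Recent
ON Sales.OrderHeader (CustomerId, OrderDate DESC, OrderId DESC)
INCLUDE (TotalAmount);
```

When the index order matches the window's PARTITION BY and ORDER BY, the engine can often avoid a separate sort entirely.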
Partitioning: helpful when it matches real access patterns
Table partitioning can help manage very large tables and maintenance windows, but it’s not a magic speed button. It helps most when your queries regularly filter on the partition key (like CreatedAt) and your indexes align.
If your workload rarely filters by the partition key, partitioning may just add complexity.
Concurrency and locking: making fast queries stay fast under load
A query that is “fast” alone can still cause incidents if it blocks or gets blocked.
I look at waits and blockers early
In SQL Server, I often start with “what are sessions waiting on right now?” and “who is blocking whom?” If lock waits dominate, reducing query time helps, but so does reducing lock scope and duration.
T-SQL example to find currently running requests and waits:
SELECT
    r.session_id,
    r.status,
    r.command,
    r.cpu_time,
    r.total_elapsed_time,
    r.reads,
    r.writes,
    r.wait_type,
    r.wait_time,
    r.blocking_session_id,
    SUBSTRING(t.text,
        (r.statement_start_offset / 2) + 1,
        ((CASE r.statement_end_offset
              WHEN -1 THEN DATALENGTH(t.text)
              ELSE r.statement_end_offset
          END - r.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;
Keep transactions short and predictable
Long transactions hold locks longer. If you do multiple steps in a transaction, I recommend:
- Do the minimum necessary inside the transaction
- Avoid user interaction inside the transaction
- Batch large updates (small chunks) to reduce lock duration
Batching pattern:
-- Batch update to reduce lock time per batch
WHILE 1 = 1
BEGIN
    UPDATE TOP (5000) Sales.OrderHeader
    SET Status = 'Archived'
    WHERE Status = 'Shipped'
      AND ShippedAt < DATEADD(day, -90, SYSUTCDATETIME());

    IF @@ROWCOUNT = 0 BREAK;
END
Choose isolation with intent
If you run high-read dashboards alongside heavy writes, row versioning (like read committed snapshot in SQL Server) can reduce reader/writer blocking. But it increases temp storage work, so I measure the temp storage impact before and after.
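Turning on read committed snapshot is one statement, but it needs a moment with no other active sessions in the database, so it belongs in a maintenance window (the database name is a placeholder):

```sql
-- ROLLBACK IMMEDIATE kills in-flight transactions; schedule this deliberately
ALTER DATABASE YourDatabase
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```

After the switch, I watch tempdb version store usage, since row versioning moves reader/writer isolation cost into temp storage.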
Deadlocks also show up in mixed workloads. My rule: don’t “hope” deadlocks go away. Fix ordering (consistent table access order), reduce lock footprints, and keep transactions short.
A 2026 workflow: preventing regressions with automation and AI assistance
In 2026, I treat performance tuning as part of the delivery pipeline, not a heroic late-night fix.
Here’s the contrast I draw when I explain the shift to teams. The traditional approach looks like:
- Spot-check SQL text
- Unit tests only
- Ad hoc screenshots
- Rewrite until it feels faster
- Manual plan reading
What “assisted” looks like in practice:
- I keep a small catalog of critical queries (checkout, search, pricing, permissions).
- I store baseline metrics: typical duration range, logical reads range, and expected plan shape.
- In CI, I run a targeted performance check against realistic seed data (not huge, just representative).
- When a query plan changes, I review the plan diff the same way I review code diff.
AI assistants help most with:
- Explaining plan operators in plain language
- Generating candidate indexes for review (I still decide)
- Suggesting rewrites that preserve semantics
I do not let an assistant push schema changes straight to production. Index changes are production-affecting, and they deserve the same discipline as application migrations.
Key takeaways and what I’d do next in your system
If you want a practical starting point, I’d pick one slow endpoint and run the loop end-to-end:
1) Capture reality: duration distribution (p50/p95), logical reads, and concurrency level during peak.
2) Identify the top cost driver: total reads, total CPU, or lock waits.
3) Inspect the actual plan and look for one of the repeat offenders: big estimate errors, scans over huge ranges, lookups at massive scale, spills, or implicit conversions.
4) Make one narrow change: tighten SELECT columns, rewrite a predicate into a sargable range, replace DISTINCT-with-join-explosion using EXISTS, or add a single index designed for one query pattern.
5) Verify under load, not just alone: I want to see stable latency when many sessions hit the query.
6) Lock it in: add a lightweight regression guardrail (Query Store alerting, a plan-change review step, or a small perf check in CI).
If you share your slow query text, table row counts, and the actual plan, I can walk through a concrete tuning proposal and the exact index definition I’d test first, along with how I’d validate it safely in staging before you roll it into production.