Does My SQL Server Need More Memory?

Work It


Memory is S-Tier crucial for most workloads to run reliably fast. It’s where SQL Server caches data pages, and it’s what it gives to queries to process Sorts and Hashes (among other things, but these are most common).

Without it, those two things, and many other caches, would be forced to live on disk. Horrible, slow disk. Talk about a good way to make databases less popular, eh?

With no offense to the SAN administrators of the world, I consider it my sworn duty to have databases avoid your domain as much as possible.

In this post, we’ll talk about how to figure out if your SQL Server needs more memory, and if there’s anything you can do to make better use of memory at the same time.

After all, you could be doing just fine.

(You’re probably not.)

Tale Of The Wait Stats


You can look at wait stats related to memory and query performance by running sp_PressureDetector.
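If you've never run it, the call is simple. The @what_to_check parameter shown here is how I remember the procedure's options; if it errors out, run it with no parameters or check the current documentation:

/* sp_PressureDetector is a free procedure from Erik Darling Data;
   'memory' narrows the output if you only care about one flavor of pressure */
EXEC dbo.sp_PressureDetector
    @what_to_check = 'memory';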

It’ll give you some details about wait stats that relate to CPU and memory pressure. You wanna pay attention to the memory and disk waits, here. I had to squish it a little, but if you’re unfamiliar you can use the “description” column to better understand which ones to pay attention to.

SQL Server Wait Stats

Some important metrics to note here:

  • How do wait times relate to server uptime?
  • How long on average do we wait on each of these?

This won’t tell the whole story, of course, but it is a reasonable data point to start with. If your workload isn’t a 24×7 slog, though, you might need to spend more time analyzing waits for queries as they run.

In this example, it’s my local SQL instance, so it hasn’t been doing much work since I restarted it. Sometimes, you gotta look at what currently running queries are waiting on.

For that, go grab sp_WhoIsActive. If you see queries constantly waiting on stuff like this, it might be a sign you need more memory, because you have to keep going out to disk to get what queries need to use.
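sp_WhoIsActive is Adam Machanic's free activity procedure. These are real parameters from it, though defaults shift between versions, so treat this as a sketch:

/* @get_task_info = 2 returns the full set of wait types for each running
   query, which is what you want for spotting sustained PAGEIOLATCH_*
   and RESOURCE_SEMAPHORE waits; @get_plans = 1 tacks on query plans */
EXEC dbo.sp_WhoIsActive
    @get_task_info = 2,
    @get_plans = 1;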

SQL Server Wait Stats
telling myself

It could also be a sign of other things, like queries and indexes that need tuning, but when waits are sustained like this, tuning alone isn’t likely to be the whole story.

It’s much more likely a memory deficiency, but it’s up to you to investigate further on your system.

How Is SQL Server Using Memory Now?


Current memory utilization can be a good way to find out if other things are using memory and taking valuable space away from your buffer pool.

A lot of folks out there don’t realize how many different things SQL Server has to share memory across.

And, hey, yeah, sp_PressureDetector will show you that, too. Here’s a “normal” run:

SQL Server’s buffer pool is uninfringed upon by other consumers! Great. But sometimes queries ask for memory grants, and that’s where things can get perilous.

SQL Server Memory Clerks
i feel good

You may sometimes see Ye Olde First Memory Bank Of Motherboard loan out a swath to one or more queries:

SQL Server Memory Clerks
dramarama

The difference here? The buffer pool is reduced by ~9GB to accommodate a query memory grant.

sp_PressureDetector will show you the queries doing that, too, along with query plans.

sp_WhoIsActive
everyone is gone

It’ll also show you memory available in resource pools for granting out to queries. On this server, Max Server Memory is set to 50GB.

If you’re shocked that SQL Server is willing to give out 37GB of that to query memory grants, you haven’t been hanging around SQL Server long enough.
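You can see those grant-able numbers yourself, straight from the DMV. A minimal sketch:

/* workspace memory SQL Server can hand out as query grants;
   waiter_count > 0 means queries are queued up waiting for grants */
SELECT
    deqrs.resource_semaphore_id,
    deqrs.pool_id,
    target_memory_gb = deqrs.target_memory_kb / 1024. / 1024.,
    available_memory_gb = deqrs.available_memory_kb / 1024. / 1024.,
    deqrs.grantee_count,
    deqrs.waiter_count
FROM sys.dm_exec_query_resource_semaphores AS deqrs;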

And you wonder why I worry about memory!

Does SQL Server Need More Memory?


If your queries are constantly waiting on:

  • Reading pages from disk (PAGEIOLATCH_…)
  • Query memory grants (RESOURCE_SEMAPHORE)
  • Compiling query plans (RESOURCE_SEMAPHORE_QUERY_COMPILE)

Then there’s a pretty good chance that it does, especially if data just plain outpaces memory by a good margin (like 3:1 or 4:1 or more).
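If you want a quick look at just those waits, measured against uptime, something like this will do it:

/* cumulative waits since startup (or since someone cleared them) */
SELECT
    dows.wait_type,
    wait_time_hours = dows.wait_time_ms / 1000. / 60. / 60.,
    avg_wait_ms =
        dows.wait_time_ms / NULLIF(dows.waiting_tasks_count, 0),
    uptime_hours =
        DATEDIFF(HOUR, dosi.sqlserver_start_time, SYSDATETIME())
FROM sys.dm_os_wait_stats AS dows
CROSS JOIN sys.dm_os_sys_info AS dosi
WHERE dows.wait_type IN
      (
          N'PAGEIOLATCH_SH', N'PAGEIOLATCH_EX',
          N'RESOURCE_SEMAPHORE', N'RESOURCE_SEMAPHORE_QUERY_COMPILE'
      )
ORDER BY dows.wait_time_ms DESC;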

You also have some options for making better use of your current memory, too.

  • Check critical queries for indexing opportunities  (there may not always be a missing index request, but seasoned query tuners can spot ones the optimizer doesn’t)
  • Apply PAGE compression to existing row store indexes to make them smaller on disk and in memory
  • Check the plan cache for queries asking for large memory grants, but not using all of what’s granted to them
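The compression bullet looks like this in practice. The table and index names are made up; swap in your own:

/* PAGE compression shrinks the index on disk and in the buffer pool;
   ONLINE = ON requires Enterprise Edition */
ALTER INDEX ix_whatever
    ON dbo.YourBigTable
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);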

You can check the plan cache using a query like this. It’ll look for queries that ask for more than 5GB of memory and leave more than a megabyte of it unused.

WITH 
    unused AS
(
    SELECT TOP (100)
        oldest_plan = 
            MIN(deqs.creation_time) OVER(),
        newest_plan = 
            MAX(deqs.creation_time) OVER(),
        deqs.statement_start_offset,
        deqs.statement_end_offset,
        deqs.plan_handle,
        deqs.execution_count,
        deqs.max_grant_kb,
        deqs.max_used_grant_kb,
        unused_grant = 
            deqs.max_grant_kb - deqs.max_used_grant_kb,
        deqs.min_spills,
        deqs.max_spills
    FROM sys.dm_exec_query_stats AS deqs
    WHERE (deqs.max_grant_kb - deqs.max_used_grant_kb) > 1024. /* >1MB of the grant unused */
    AND   deqs.max_grant_kb > 5242880. /* grants over 5GB */
    ORDER BY unused_grant DESC
)
SELECT      
    plan_cache_age_hours = 
        DATEDIFF
        (
            HOUR,
            u.oldest_plan,
            u.newest_plan
        ),
    query_text = 
        (
            SELECT [processing-instruction(query)] =
                SUBSTRING
                (
                    dest.text, 
                    ( u.statement_start_offset / 2 ) + 1,
                    (
                        ( 
                            CASE u.statement_end_offset 
                                 WHEN -1 
                                 THEN DATALENGTH(dest.text) 
                                 ELSE u.statement_end_offset 
                            END - u.statement_start_offset 
                        ) / 2 
                    ) + 1
                )
                FOR XML PATH(''), 
                    TYPE
        ),
    deqp.query_plan,
    u.execution_count,
    u.max_grant_kb,
    u.max_used_grant_kb,
    u.min_spills,
    u.max_spills,
    u.unused_grant
FROM unused AS u
OUTER APPLY sys.dm_exec_sql_text(u.plan_handle) AS dest
OUTER APPLY sys.dm_exec_query_plan(u.plan_handle) AS deqp
ORDER BY u.unused_grant DESC
OPTION (RECOMPILE, MAXDOP 1);

This will get you the top (up to!) 100 plans in the cache that have an unused memory grant, ordered by the largest difference between grant and usage.

What you wanna pay attention to here:

  • How old the plan cache is: if it’s not very old, you’re not getting the full picture
  • How big the memory grants are: by default, the max is ~25% of max server memory

Controlling Memory Grants


If you’re looking for ways to control memory grants that don’t involve a bunch of query and index tuning, you have a few options:

  • Resource Governor: Enterprise Edition only, and usually applies to the whole workload
  • MIN_GRANT_PERCENT and MAX_GRANT_PERCENT query hints: You usually wanna use both to set a proper memory grant, just setting an upper level isn’t always helpful
  • Batch Mode Memory Grant Feedback: Requires Batch Mode/Columnstore, only helps queries between executions, usually takes a few tries to get right
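The hints go on like this. The query itself is a made-up example, but the hints are real, and the percentages are of the largest grant the query would otherwise be eligible for:

SELECT
    u.UserId,
    total_score = SUM(u.Score)
FROM dbo.Users AS u
GROUP BY u.UserId
ORDER BY total_score DESC
OPTION (MIN_GRANT_PERCENT = 5, MAX_GRANT_PERCENT = 10);

Set the max too low and you’ll trade big grants for big spills, so test before you ship it.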

For Resource Governor, you’ll wanna do some analysis using the query in the previous section to see what a generally safe upper limit for memory grants is. The more memory you have, and the higher your max server memory is, the more insane 25% is.
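As a sketch, capping grants for the default workload group looks like this. You don’t need a classifier function just to change the default group:

/* cap any one query at 10% of workspace memory instead of
   the default ~25% (Resource Governor is Enterprise only) */
ALTER WORKLOAD GROUP [default] WITH
    (REQUEST_MAX_MEMORY_GRANT_PERCENT = 10);

ALTER RESOURCE GOVERNOR RECONFIGURE;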

SQL Server Resource Governor
signs and numbers

Again, just be cautious here. If you change this setting based on a not-very-old plan cache, you’re not gonna have a clear picture of which queries use memory, and how much they use. If you’re wondering why I’m not telling you to use Query Store for this, it’s because it only logs how much memory queries used, not how much they asked for. It’s pretty ridiculous.

After you make a change like this, or start using those query hints, you’ll wanna do some additional analysis to figure out if queries are spilling to disk. You can change the query above to something like this to look at those:

WITH 
    unused AS
(
    SELECT TOP (100)
        oldest_plan = 
            MIN(deqs.creation_time) OVER(),
        newest_plan = 
            MAX(deqs.creation_time) OVER(),
        deqs.statement_start_offset,
        deqs.statement_end_offset,
        deqs.plan_handle,
        deqs.execution_count,
        deqs.max_grant_kb,
        deqs.max_used_grant_kb,
        unused_grant = 
            deqs.max_grant_kb - deqs.max_used_grant_kb,
        deqs.min_spills,
        deqs.max_spills
    FROM sys.dm_exec_query_stats AS deqs
    WHERE deqs.min_spills > (128. * 1024.) /* 128K pages * 8KB/page = 1GB spilled */
    ORDER BY deqs.max_spills DESC
)
SELECT      
    plan_cache_age_hours = 
        DATEDIFF
        (
            HOUR,
            u.oldest_plan,
            u.newest_plan
        ),
    query_text = 
        (
            SELECT [processing-instruction(query)] =
                SUBSTRING
                (
                    dest.text, 
                    ( u.statement_start_offset / 2 ) + 1,
                    (
                        ( 
                            CASE u.statement_end_offset 
                                 WHEN -1 
                                 THEN DATALENGTH(dest.text) 
                                 ELSE u.statement_end_offset 
                            END - u.statement_start_offset 
                        ) / 2 
                    ) + 1
                )
                FOR XML PATH(''), 
                    TYPE
        ),
    deqp.query_plan,
    u.execution_count,
    u.max_grant_kb,
    u.max_used_grant_kb,
    u.min_spills,
    u.max_spills,
    u.unused_grant
FROM unused AS u
OUTER APPLY sys.dm_exec_sql_text(u.plan_handle) AS dest
OUTER APPLY sys.dm_exec_query_plan(u.plan_handle) AS deqp
ORDER BY u.max_spills DESC
OPTION (RECOMPILE, MAXDOP 1);

Small spills aren’t a big deal here, but you’ll definitely wanna pay attention to larger ones. This is set to find ones that are over 1GB, which is still pretty small, but could be meaningful.

If you notice a lot more queries spilling in a substantial way, you may have capped the high end of query memory grants too low.

Recap


Memory is something that I see people struggle to right-size, forecast, and understand the physics of in SQL Server. The worst part is that hardly anything in this post applies to Standard Edition, which is basically dead to me.

The main things to keep an eye on are:

  • Wait stats overall, and for running queries
  • Large unused memory grants in the plan cache
  • Size of data compared to size of memory

If you need help with this sort of thing, hit the link below to drop me a line about consulting.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Signs You Need More Memory In Your SQL Server

Guessotron


Most of these signs have to do with wait stats. One could venture out and say that if you have way less memory than data, you need more memory, but… If the server is sitting around bored, who cares?

If we’re going to spend money on memory, let’s make sure it’ll get used. When I’m talking to people with performance problems that memory would solve, here are some of the top reasons.

You’re In The Cloud Where Storage Sucks

Okay, okay, storage can totally suck other places, too. I’ve seen some JBOD setups that would make you cry, and some of them weren’t in the cloud. The thing to keep straight here is that memory isn’t going to help slow writes directly. If adding a bunch more memory means more reads come from the buffer pool, freeing up storage bandwidth for writes, it might help indirectly.

Look, just avoid disk as much as possible and you’ll be happy.

You’re Using Column Store And/Or Batch Mode

Good column store compression often relies on adequate memory, but you also need to account for the much larger memory grants that batch mode queries ask for. As more and more workloads move to SQL Server 2019 and beyond, query memory needs are going to go up, because Batch Mode On Row Store will become more common.

You’re Waiting On RESOURCE_SEMAPHORE A Lot

This wait shows up when a bunch of queries are contending for memory grants, but SQL Server has given out all it can. If you run into these a lot, it’s a pretty good sign you need more memory. Especially if you’ve already tuned queries and indexes a bunch, or you’re dealing with a vendor app where they refuse to fix anything.

Other things that might help? The MAX_GRANT_PERCENT hint, or Resource Governor.

You’re Waiting On RESOURCE_SEMAPHORE_QUERY_COMPILE A Lot

This is another “queue” wait, but it’s for query compilation rather than query execution. Having more memory can certainly help this quite a bit, but so can simplifying queries so that the amount of memory SQL Server has to throw at compiling them chills out a little. You can start by reconsidering those views nested 10 levels deep and the schema design that leads you to needing a 23 table join to construct one row.

You’re Waiting On PAGEIOLATCH_SH Or PAGEIOLATCH_EX A Lot

These waits show up when data pages your query needs aren’t already in memory. The more you see these, the more latency you’re adding to your workload by constantly shuffling out to disk to get them. Of course, there’s other stuff you can do, like clean up unused and overlapping indexes, compress your indexes, etc. But not everyone is comfortable with or able to do that.
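A quick way to see who’s hogging the buffer pool right now is counting cached pages per database. Fair warning: this can run slow on servers with a lot of memory:

SELECT
    database_name = DB_NAME(dobd.database_id),
    cached_gb = COUNT_BIG(*) * 8 / 1024. / 1024. /* pages are 8KB */
FROM sys.dm_os_buffer_descriptors AS dobd
GROUP BY DB_NAME(dobd.database_id)
ORDER BY cached_gb DESC;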

Thanks for reading!


You Probably Don’t Need To Offload SQL Server Reads To Another Server

Moneybags


Duplicating data for reporting, outside of moving it to a data warehouse where there are some transformations involved, can be an expensive and perilous task.

Your options come down to a native solution like AGs, Replication, or Log Shipping. You can brew something up yourself that relies on native stuff too, like Change Data Capture, Change Tracking, Temporal Tables, or triggers.

Outside of that, you’re getting into the world of SSIS/ADF, other third party vendors, etc. to get data where it ought to go. That’s none of my business, and good luck to you.

As soon as you let people read that data, you have to fully license the SQL Server that it’s sitting on.

Alt Rock


I’ve talked to a few dozen people about this over the years, too. Most were shocked to learn about the licensing implications, thinking that having Software Assurance from Microsoft covered their needs.

The most frequent desire for offloading reads is real-time reporting, and the most frequently cited reason is that reporting queries put too much load on the current server hardware.

You can see where the conflict sits, here. People think they’re getting a free SQL Server to report off of, so they don’t need to up their current hardware and pay for it. Microsoft may be dumb, but it ain’t that dumb.

Once you get people past the licensing issues, tuning the current hardware becomes a more approachable subject.

Gestalt Block


Of course, the idea of tuning the reporting queries has occurred to most people, but the attempts at tuning are usually flailing attempts to solve problems that aren’t there.

  • Maybe this local variable will do a thing for parameter sniffing
  • NOLOCK is better for reports, anyway
  • Don’t forget to recompile, too
  • Add a DISTINCT, there’s too many rows
  • Throw up hands, rebuild indexes

While everything has its place, it’s rare that this combination will get you past all of your performance issues. Reporting queries are especially interesting to me, because of how few people embrace batch mode to any degree for them.

The cost of tuning queries is far cheaper than the cost of licensing a whole other server, especially for Enterprise Edition customers. If you’re on Standard Edition, you have many other problems. Many, many other problems, standing in the way of scaling and performance issues.

If this is the kind of thing you need help with, drop me a line. This is exactly the kind of thing I do.

Thanks for reading!


Why Reporting Tables Aren’t A Great Idea In SQL Server Databases

Headway


I’ve seen a lot of crazy ways for people to run reports on live data, but one of the worst ideas is creating a reporting table based on the set of data a user wants, and then letting them query that data.

Here’s why:

  • You still have to run the query to populate them, which hits the server with load you’re averse to
  • The buffer pool gets taken up by all these reporting tables, which only one person uses
  • It’s hard to index the reporting tables, and column store isn’t useful on Standard Edition
  • They get stale quickly, and refreshing them means re-running the original query, or
  • A dynamic MERGE statement, because no two tables will have the same columns
  • Generic reporting tables will often use the wrong data types for columns and be very wide

Put all that together, and you’ve got a pretty bad solution for your server and your users.

Money Talks


I’m gonna say that there are better ways to do this, and I’m gonna be right, but they’re all gonna take work and/or money.

The ones that take money are reporting off a copy of the data, which you can do with Availability Groups, Replication, or Log Shipping. You have to fully license the other server that you’re reading from no matter which you choose.

I have no interest in deep-diving into any one of those here, but I’m gonna generalize a little bit to make sure you’re not totally in the dark:

  • Availability Groups are easy to set up, sometimes complicated to manage and patch, and require databases in full recovery model, but give you pretty up-to-date data on the secondary for users to get at.
  • Replication is a lot like the above, except it makes me way more nervous. Snapshot Replication gets bonus points for allowing you to create different indexes on the subscriber, which AGs and Log Shipping don’t let you do. The freshness of the data does depend on which type of Replication you choose, so choose wisely.
  • Log Shipping just ships transaction log backups, which is pretty dead simple unless you need to deal with servers that can’t directly see each other for whatever reason. Simple is nice, but simple means you’re gonna have stale data, since you have to kick users out to restore more transaction logs.

Manifesto


The solutions that require work are usually ones that you implement locally. I’m not going to talk about data warehouses here, because it’s a lot like the stuff up there, just with ETL in the middle. Or ELT. Whatever.

Getting reporting working from live data usually takes a few steps, depending on your situation:

  • Nonclustered column store indexes if you’re on Enterprise Edition
  • Indexed views if you’re on Standard Edition (I’ll talk more about this in a minute)
  • Enabling snapshot isolation just for reporting queries to not get blocked
  • Tuning the hell out of your report code to not beat the server up too bad
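For the snapshot isolation bullet, the setup is a two-parter: turn it on for the database, then opt reporting sessions into it. The database and table names here are placeholders:

ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

/* in the reporting session */
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

SELECT
    COUNT_BIG(*)
FROM dbo.SomeBigTable; /* readers no longer block, or get blocked by, writers */

The version store lives in tempdb, so keep an eye on that if your workload is write-heavy.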

Indexed views have a lot of rules, and if you’re lucky you can get some mileage out of them to speed up aggregations on Standard Edition. You can try using column store, but it’s really severely hobbled on Standard Edition. The DOP two limit is especially a killer in a lot of cases.
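Here’s a minimal indexed view sketch, with made-up names, showing the big rules: SCHEMABINDING, two-part names, and COUNT_BIG(*) alongside any aggregation:

CREATE VIEW dbo.SalesByDay
WITH SCHEMABINDING
AS
SELECT
    s.SaleDate,
    total_amount = SUM(s.Amount),
    row_count = COUNT_BIG(*) /* required when you use GROUP BY */
FROM dbo.Sales AS s
GROUP BY s.SaleDate;
GO

/* the unique clustered index is what materializes the view */
CREATE UNIQUE CLUSTERED INDEX cx_SalesByDay
    ON dbo.SalesByDay (SaleDate);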

All of this is work though, and it at least partially assumes your server can come anywhere near being able to handle the additional workload. If it’s just limping along now, you probably don’t wanna keep throwing straw at Joe Camel.

Out There


What I left out a little bit are tables that everyone can report from, because every time I’ve seen them implemented, it’s been an absolute mess of triggers or periodic data synchronizations that left users unhappy.

Either because their transactional workload slowed way down to keep things up to date synchronously, or because the background process that started out running every 10 minutes beat the crap out of the server every 10 minutes, so now it runs every hour but there’s another Agent job to kill that job if it runs more than five minutes because that means something bad is about to happen.

This is why most of the time, I’m happy to do the work to report off the live data.

Thanks for reading!


How Do I Estimate How Much Memory My SQL Server Needs?

It’s Brand New


If you’re looking to scope out hardware for SQL Server, whether it’s in the cloud, or… not in the cloud, you’re gonna wanna get memory right.

Lots of people complain about SQL Server being a memory hog, without pausing to reflect on the fact that they’re data hogs.

You don’t like SQL Server taking up all that memory? Drop all those big databases.

Problem solved.

Considerate


There are a number of things you can look at on your current server that tie in to how much memory you’ll need in the future:

  • Wait Stats
  • Size of data
  • Current indexing
  • Data growth rate

Waits

I have wait stats there because the first thing you should look at to see if you have adequate memory is PAGEIOLATCH_* waits. If they’re more than 10% of your server’s uptime, you might be losing a whole bunch of query performance to getting the data you need into memory. Of course, from here you need to monitor when those waits are happening to figure out if they’re affecting critical user queries, or just happening overnight during maintenance.
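A rough version of that 10% math, assuming nobody has cleared wait stats since startup:

SELECT
    pageiolatch_hours =
        SUM(dows.wait_time_ms) / 1000. / 60. / 60.,
    uptime_hours =
        DATEDIFF(HOUR, MAX(dosi.sqlserver_start_time), SYSDATETIME()),
    pct_of_uptime =
        (SUM(dows.wait_time_ms) / 1000. / 60. / 60.) * 100.0 /
        NULLIF(DATEDIFF(HOUR, MAX(dosi.sqlserver_start_time), SYSDATETIME()), 0)
FROM sys.dm_os_wait_stats AS dows
CROSS JOIN sys.dm_os_sys_info AS dosi
WHERE dows.wait_type LIKE N'PAGEIOLATCH%';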

Sizes

The size of your data is going to depend on how big your tables are, and how many indexes you have on them. If you have a lot of unused or duplicative indexes, you’re going to have a much bigger database than is necessary. That’s going to inflate the amount of memory you currently “need”, because all of those cruft indexes are going to be competing for space in the buffer pool.

Indexes

An easy first step to making the best possible use of the RAM you currently have is to make sure you have only the most beneficial indexes in place. Once you’ve got that done, you can go even further by looking at compression. Compressed indexes squish things down on disk and in the buffer pool.
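Before you rebuild anything, you can ask SQL Server to estimate what compression will buy you. The schema and table names here are placeholders:

EXEC sys.sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'YourBigTable',
    @index_id = NULL,          /* NULL = all indexes on the table */
    @partition_number = NULL,
    @data_compression = N'PAGE';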

Growths

Some databases have predictable growth patterns based on regular usage. It’s up to you to trend that stuff, for the most part.

There’s a disk usage report under each database’s standard reports that you can get to with a right click, but it only shows growths that are still in the default trace file. They could be aged out by the time you go looking for them.

Another idea would be to look at backup sizes over the past year or so to see how they’ve grown.
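Something like this against msdb will give you a monthly trend of full backup sizes for the last year:

SELECT
    bs.database_name,
    backup_month =
        DATEFROMPARTS(YEAR(bs.backup_start_date), MONTH(bs.backup_start_date), 1),
    max_backup_gb =
        MAX(bs.backup_size) / 1024. / 1024. / 1024. /* backup_size is in bytes */
FROM msdb.dbo.backupset AS bs
WHERE bs.type = 'D' /* full backups only */
AND   bs.backup_start_date >= DATEADD(YEAR, -1, SYSDATETIME())
GROUP BY
    bs.database_name,
    DATEFROMPARTS(YEAR(bs.backup_start_date), MONTH(bs.backup_start_date), 1)
ORDER BY
    bs.database_name,
    backup_month;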

Other databases are unpredictable based on acquisitions, large customer onboarding, big sales, sudden success, etc.

Of course, if you haven’t tended to indexes or compression like I listed above, these trends may not hold up well afterward. This is something I’d do last, after taking care of the index stuff.

All This And More


An additional consideration that adds quite a bit of complication is query memory grants.

The difficulty here is twofold, because you may have query and index tuning to do to reduce the memory grants queries ask for, and when memory pressure arises on a server the plan cache (where most people wanna go look for queries asking for large memory grants) will be getting cleared out constantly. That can make tracking them hard.

I know, you’re thinking that Query Store is awesome for this, but it’s not. It only tracks memory that queries used, not what they were granted. A query can ask for a huge memory grant, only use a small fraction of it, and there’s no sign of that in Query Store. Brick and mortar, indeed.

Most people aren’t ready to truly speculate on hardware needs when I talk to them, but they can get there. If you want a basic guess? Try to get enough memory to cache 50-75% of your server’s data in memory. That should get you enough to cache what you need, deal with memory growth, and deal with memory grants.
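For the back-of-napkin version of that guess, compare total data file size to Max Server Memory:

SELECT
    data_gb =
        SUM(mf.size) * 8. / 1024. / 1024., /* size is in 8KB pages */
    max_server_memory_gb =
        (
            SELECT CONVERT(bigint, c.value_in_use) / 1024.
            FROM sys.configurations AS c
            WHERE c.name = N'max server memory (MB)'
        )
FROM sys.master_files AS mf
WHERE mf.type_desc = N'ROWS';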

And you know, you’ll probably wanna pad that a little bit if you’re going to the cloud, where storage gear and networking is basically electric porridge.

Thanks for reading!


How Many Threads Can One Query Use In SQL Server?

Overlook


Something that many people overlook about parallel plans is that while they are limited to DOP active CPUs, they can suck up way more threads.

How many more depends on how many branches can be active at the same time.

Plus the coordinator thread.
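The arithmetic is roughly DOP times the number of concurrently active branches, plus one coordinator. You can check the server’s thread ceiling like so:

SELECT
    dosi.max_workers_count /* the worker thread limit */
FROM sys.dm_os_sys_info AS dosi;

/* e.g. a DOP 8 plan with 4 concurrently active branches
   reserves (8 * 4) + 1 = 33 threads */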

Big Query


In order to test these limits out, we need a rather large query.

Since I’m lazy, I’m using a bit of dynamic SQL to make the query I want. Along the way figuring out how many joins I could gin up before things went amok, I learned some fun things.

For example, if I used Hash Joins, I’d get an error that the Query Processor ran out of stack space, and if I used Nested Loops joins I wouldn’t get the requisite parallel exchanges necessary to have multiple parallel zones.

And if I don’t use a force order hint, I’ll end up spending a really long time waiting for a query plan to compile. It wasn’t a good time.

There’s also a tipping point with the number of joins, where if I go over a certain number, my query’s DOP gets downgraded to one.


Outcoming


After finding my sweet spot at 316 joins, I still had to toggle with DOP a little.

On my home server, I have 706 worker threads available.

With my 316 join query, I was able to reserve 634 worker threads.


But that’s where I topped off. After that, the query would get downgraded to DOP 1 and only ask for 1 thread.

Qurious


I’m not sure if there’s some built-in cap on how many threads a query can use, or if the limit is global before downgrades start happening.

What I found more interesting was that even though the query reserves 634 workers, those workers weren’t immediately subtracted from available workers on the server.

Technically, reserved threads don’t exist yet. They haven’t been created or attached to a task. All the reservation does is prevent other queries from reserving threads beyond the configured limit.

For example, if I run two copies of the big query at the same time, one is downgraded to DOP 1, likely because it’s hinted to DOP 2 and that’s the next lowest DOP, and it can’t run at DOP 2 and reserve 634 more threads that the first query has already reserved.

Thanks for reading!


SQL Server Queries Go Faster When They Don’t Touch Disk

A Rears



GitHub scripts

Thanks for watching!

Video Summary

In this video, I delve into the intricacies of how data caching affects query performance in SQL Server. Starting with a 37-gig table, I demonstrate how having all necessary data cached in memory can drastically reduce execution times from 14 seconds to just half a second, showcasing the power of efficient buffer pool utilization. As we move on to larger datasets, I explore scenarios where memory limitations and inappropriate indexing strategies can lead to prolonged page IO latch wait times, emphasizing the importance of optimizing both memory allocation and index design for optimal performance.

Full Transcript

Silly, shameful, slut of a charlatan. Silly, shame… Dang! My vocal coach is not going to be happy with me. Erik Darling here with Erik Darling Data, the Darlingest Data of them all, I hear. And on this rainy afternoon, I am enjoying a glass of champagne. If I were drinking a cocktail, I would go over the ingredients with you. Well, since I’m drinking a one-ingredient cocktail, it is a Paul Lenoir composition number three, and it is a 100% Chardonnay Grand Cru of some sort.

I did my best to memorize everything on the label, but I have probably failed you as miserably as I have failed my voice training coach. And anyway, getting to the point, today’s video… Actually, this is one of two videos I’m recording today. The other one is different, but I love when one demo spawns two things I can talk about.

So stay tuned for that useless information that has no bearing on your life whatsoever, or your Saturday afternoon. Unless you’re going to watch it, which… Thanks. Thanks for watching. But this video is going to continue on with a theme that I have been talking about in blog posts and other videos lately about how queries can steal space from the buffer pool, where you store your data pages that SQL Server gives out to queries, and how important it is to make sure that you have the appropriate hardware for your workload.

Because oftentimes when I’m working with clients, they go, well, how do I know if I have the right hardware? How do I know this out of the other thing? And I’m going to show you a little bit about that, and also a little bit about how to tell things are good, bad, or ugly for you. So, what we’re going to look at first, because I have to explain a little bit about this environment, is what it looks like.

So, this server, this VM, here we go, let’s go deep. This VM has 16 virtual processors. Mm-hmm. 16 of them.

This is a laptop. I’m very impressed. And this VM has 96 gigs of memory, because the right thing to do is to multiply 96 by 1024, and you get that number. Or else you’re a metric idiot.

So, we have that. And the laptop itself, my laptop, this 17-inch thing sitting next to me working very hard, is like so. So, we have this processor in there, with eight real cores, and let’s call it eight fake cores.

Actually, I don’t know which ones are real and fake. It could go this way, too. It could have real and fake in that direction.

I don’t really know. I don’t really care. Hyper-threaded like a loser anyway. It doesn’t matter which ones are fake. Half of them are fake. And so, I guess the 16 virtual cores that the VM have are like really virtual. I don’t know.

Extra virtual. Eight of them are fake. Eight of them are wrong. Eight of them are not things. And my laptop. Again, my laptop has 128 gigs of memory. If your production SQL Server has less than this, and you are concerned about performance, boy, howdy.

I will gladly open up my darling data’s cloud to you. Just watch out when I run demos. Things get a little hairy.

And you’ll notice that I have SSDs in this thing. I don’t know why disk D is zero. It’s a brave choice, but let’s move on. If we measure the disks that I have in here, and the only one I ran was the top line, because that’s the only one that I really care about for these purposes.

You can see that I can read, and this is gigabytes of, oh, I’m sorry. I forgot to hit a button. This is gigabytes a second.

All right. Gigabytes, not megabytes. I can read data at 3.2 gigs a second. I can write data right around 2.6 gigs a second. So that’s nice.

It’s pretty sweet, right? 3.2 gigs a second. I like that. I like the sound of that. That sounds good. And, you know, when I’m working with clients and talking about sort of like the correct hardware for SQL Server, at some point someone is always going to jump up on their desk and say, we need faster disks, damn it.

Well, technically the fastest disk out there is memory. So let’s focus on that. But most people who I talk to are not using direct attached storage. Most people who I talk to are virtualized in some way and using a SAN in some way.

And so they are not going to get this. They’re not going to get all this goodness here. They are going to get much different speeds.

And it’s not going to be the fault of their disks. When they talk about getting faster and faster disks, that’s great. But the data still has to get to those disks somehow. And it’s usually the getting to those disks that doesn’t work well.

I mean, you can make it work well, but most people don’t. And the point of this all is to say that I have very fast disks. They are undeniably fast disks.

And I want to show you two different things here. So on that 2019 server, I have two copies of the Stack Overflow database. I have a full, let’s call it a full Stack Overflow, a full stack over here.

And the full stack database, I forget, I think the last date is at the end of 2019 or so. Maybe, yeah, the end of 2019. So it’s a pretty recent copy and it’s a pretty big copy.

And then I have this other copy of Stack Overflow that ends in 2013. So the last date in here is Christmas Eve of, or New Year’s Eve. Not Christmas Eve.

New Year’s Eve of 2013. And that’s technically when the world should have ended anyway, if God still listened to me. Which I don’t know why God stopped listening to me. I give such great advice about everything else.

Getting rid of the planet Earth was right in the plan since day one, end in 2013. I don’t know why mine’s got changed. Anyway.

What we have here is a few things. And these are helper views that I use in some of my demonstrations. And I will have the GitHub links to these up in the YouTube description and in the blog post, hopefully, as long as I remember. And so what happens here?

What we’re going to do is we’re going to look at how big this index is. We’re going to clear out memory. We’re going to get the execution plan for this count query from the post table. We’re going to look at what’s in memory afterwards.

Right? We know that since we’re clearing out memory that nothing’s going to be in there. And then we’re going to run the query again afterwards, again getting the execution plan. And we’re going to do the same.

So this is in the 2013 database. That’s that context. And then we’re going to do that again in the big Stack Overflow database. And if you’ll notice, I have this hint on this query, because apparently the nice people who make the dynamic management views in SQL Server are not terribly good at designing them, and performance stinks unless you tinker with things a little bit.

So Microsoft, if you would like some consultation on how to make these things faster, I am available for you. I care about your health and wellness and happiness, especially that of Joe Sack. Everyone else?

I’m kidding. I’m kidding. You’re all fine people. So let’s look at what happens here. Let’s look. So at the very beginning, right, we have this. And this tells us how big the, the clustered index on the post table is.

Because I don’t have any nonclustered indexes right now. And by gosh, I’m a terrible DBA for that, huh? So this copy, right, 2013 is about 37 gigs.

And a number that I have a lot of physical agony trying to round between 4.7 gigs and 4.8 gigs. I just don’t know where to go with it. It’s just so in the middle.

But you can see there, you decide for yourself. It is 4741.96, yada, yada, yada, megabytes. You can go to gigabytes with that any way you want.
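For reference, a bare-bones version of that size check (a sketch only; the linked helper views do this more thoroughly) can be done against sys.dm_db_partition_stats:

```sql
/* Rough size of each index on the Posts table.
   used_page_count is in 8 KB pages, so * 8 / 1024 gives MB. */
SELECT
    i.index_id,
    i.name AS index_name,
    SUM(ps.used_page_count) * 8. / 1024. AS size_mb
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON  ps.object_id = i.object_id
    AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.Posts')
GROUP BY i.index_id, i.name;
```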

Any way you want, baby. So we run this query. We have the execution plan for it. We look at what’s in memory afterwards, which is basically the entire table, or at least all the pages that we needed to get a count. Which is great.
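If you don’t have the helper views handy, a minimal way to see what’s cached per database is a count over sys.dm_os_buffer_descriptors (sketch only, and be warned: this DMV can be slow on servers with a lot of memory):

```sql
/* Buffer pool contents by database, in GB.
   Each row in this DMV is one 8 KB page in memory. */
SELECT
    DB_NAME(bd.database_id) AS database_name,
    COUNT_BIG(*) * 8. / 1024. / 1024. AS cached_gb
FROM sys.dm_os_buffer_descriptors AS bd
GROUP BY DB_NAME(bd.database_id)
ORDER BY cached_gb DESC;
```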

The count again to make sure we didn’t cheat. We didn’t mess around here. We didn’t count fewer rows. That is the number of rows in the table if you look at the nice matching row count there. Wow, that database sure is consistent.

And then let’s look at the query plans. So the first time this runs, we get a query that takes about 14 seconds. That’s reading from clean Bufferville.

We had to get everything from disk. And it took about 14 seconds for us to read about 37 gigs from disk up into memory. You can see all that time spent right in here.

13.916 seconds. Ooh, wee, ooh, ah, ah. Charming, I’m sure.

And if you look at the properties over here, because we are on such a spankin’ new version of SQL Server. SQL Server 2019 probably patched up to the latest. Again, I’m a terrible DBA, so I don’t really know these things off the top of my head.

But if you look at the wait stats over here, a very important thing whenever you’re looking at query plans, especially actual execution plans, actual factual plans, is to be hitting the properties of different operators and looking at the stuff that comes up in this window.

Because all sorts of fun, interesting things show up there that just don’t show up in the tooltips. If you look at that tooltip, there’s hardly any information there. If you look at what’s in the properties pane, boy howdy.

Whoo! Whoo! If you’re data-driven, you could spend days driving around in there. So let’s look at the wait stats of this thing. And way up at the top, way, way, way up at the top, I think the rest of these, honestly, the rest of these wait stats in here are going to be completely useless, but way up at the top, we spend 10 seconds reading pages from disk into memory.

So for a roughly 37-gig table, reading data at 3.2 gigs a second takes about 10 seconds. Would you believe that? Would you believe that?

Would you believe that math? Would you believe that math to get 32 gigs of pages into memory? Well, I guess it’s a little bit worth a lot of pages.

I don’t know. Math works out. It’s there. It’s perfectly fine. Don’t worry about it. I’m sure there were other things involved. I’m sure there were other things involved. So that’s what happened there, right?

Cool. 14 seconds. And now let’s look at the execution plan for the second run when everything was already there. Quite a remarkable difference, isn’t it?

Hmm? Quite remarkable. About half a second to run that query. If you go look at the wait stats for this, we will no longer have 10 seconds of waits here. And on top of that, we will no longer have, well, I mean, crossing my fingers now that I’ve said it, we don’t have any waits on page IO.

I actually didn’t look at this before I ran it. Again, on top of being a terrible DBA, I’m also a terrible presenter. So just never watch anything I do.

You’ll be horrified. So we look at this and what do we have? No waits on reading pages from disk. We have some internal waits for SQL Server to do things that it has to do, but we don’t have any waits on disk anymore. Wonderful.

And that solved a 14 second problem for us by about 13 and a half seconds, having that data already in there. This gets worse when we have bigger data, bigger data, big, big data. What we’re going to see here is a slightly different scenario running through the exact same thing as in the other one.

We have slightly different information. This table, rather than being 37 gigs, is about 120 gigs with 22 gigs of lob data. I told you the world should have ended in 2013 and you didn’t listen to me.

And then after we read pages from disk into memory, notice now that we don’t have the entire table cached in memory anymore. We do not have that. We do not have enough space in the 96 gigs of memory that we have assigned to this server.

We’re hobbled a little bit by the max server memory setting. If you go into the properties and we look at memory, you will see that I have about 88 gigs. Again, if you divide 90112 by 1024 because you’re a smart person, you will get back 88 gigs.
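If you want to do that check on your own server, max server memory lives in sys.configurations (a minimal sketch):

```sql
/* value_in_use is in MB: 90112 / 1024 = 88 GB. */
SELECT
    c.name,
    c.value_in_use
FROM sys.configurations AS c
WHERE c.name = N'max server memory (MB)';
```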

So we have 88 gigs of memory assigned to this. We read about 83 gigs of this table up into memory. I’m sure there’s other memory needs on here.

There’s other stuff going on that SQL Server needs some memory for. So we use about 83 gigs of space for the buffer pool and we have that hanging about in there. Wonderful.

Perfect. Glorious. Wonderful. Like the champagne. I’m told that I’m not allowed to drink on camera by my lawyer. I’m just going to smell it.

Mmm. That smells delicious. So let’s look at this when we have a bigger table. All right.

So when we read this one from disk, that takes, well, rounding this isn’t bad. This takes about 43 seconds. I’m willing to round there. I’m willing to go the extra mile for you. And again, because we are very smart performance tuners (terrible DBAs, but very smart performance tuners).

If we go look at the properties of this, now we look at the wait stats here. We have, oh, that’s a tough rounder. Oh, it’s so close. It’s right in the middle.

Oh, I can’t make these decisions. We now have about 32-ish seconds of PAGEIOLATCH wait. So about 32-ish seconds of our lives were spent reading pages from disk up into memory. And if we go look at that second execution plan, this one’s going to be different.

Isn’t it? This one is going to have run for 34 seconds. Why?

Because we had to read stuff back into memory. We only had some of this stuff in memory. We didn’t have the right stuff in memory. If we go look at the wait stats and we look at the top one, we will have spent a little bit less time reading pages from disk into memory, but we still had to read a whole bunch of pages from disk into memory, right?

So less, but still not great. If we crack that first one back open, so we go from, oh, wait, that’s the wrong one. I went back too far. Demo over.

Leave. Leave. All of you. If we go back to the first one, it’s 42 seconds versus 34 seconds. So that didn’t turn out too much better, did it? And again, this is reading data very fast. This is not slow data.

This is fast data. I like my data big. I like my data fast. I like the smell of that champagne. These are a few of my favorite things. And so the point here is that if you are looking at your server, if you’re looking at the wait stats on your server and you see that you are waiting a lot on PAGEIOLATCH waits.

So again, we come back to these wait stats over here. If you find yourself waiting on PAGEIOLATCH_SH, you most likely have a deficiency. Now, your deficiency could be in one of three areas.

You could have too little memory. That would be an obvious deficiency. You could have inappropriate indexes for your workload: too many indexes, meaning too many things competing for space in the buffer pool that you have to keep reading in and flushing out, bringing back, on again, off again. And missing indexes, right?

So you could have a lack of opportune indexes for your queries. That’s another one. And you could also have queries that are battling your buffer pool for memory, for memory grants, right?

So things like sorts and hashes that require memory that will take memory away from your buffer pool. So those are three places where you could have some room to improve. Often when I look at servers, all three are true.

Often when I look at servers, they are laughably smaller than my production, my production laptop, which is again, this. Oh, wait, I should go back to the CPU graph so you can see all my fake CPUs again. So this and this, all right, that’s my laptop.

Cost me about four grand from Lenovo. It was a good sale, but you know. Put some money into your production SQL Server that runs your business.

I put some money into the production SQL Server that runs mine. It’s just what we do. Got to spend money to make money in here.

So anyway, what we talked about today, a little bit, sort of in a nutshell, is, God, I forget. Well, there was a champagne. That was good.

There was the size of my laptop, the size of the VM, the size of the two Stack Overflow databases, and the two different sizes of the Posts table. And how having the data fully in memory when we needed to read it was very, very helpful. That query went from 14 seconds down to about half a second.

But when we had a table that didn’t fully fit into memory, even reading from it again with some of the data in memory didn’t save us all that much time. We still ended up in a pretty tough spot. And we also talked about how if you see your servers waiting a lot on PAGEIOLATCH waits.

Now it could be more than _SH. There are also PAGEIOLATCH_EX and _UP, and I think _KP and _NL and some other ones. But the ones that you’ll see the most often are PAGEIOLATCH_SH and PAGEIOLATCH_EX.

That is an exclusive page IO latch, and that is when modification queries need data. The _SH is shared latches, for select queries for the most part. So that’s what you would look at.
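If you want to see those cumulative waits on your own server, a minimal sketch against sys.dm_os_wait_stats looks like this (numbers accumulate since the last restart, so weigh them against uptime):

```sql
/* Cumulative PAGEIOLATCH waits since the last restart. */
SELECT
    ws.wait_type,
    ws.waiting_tasks_count,
    ws.wait_time_ms / 1000. AS wait_time_seconds,
    ws.wait_time_ms * 1. /
        NULLIF(ws.waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type LIKE N'PAGEIOLATCH%'
ORDER BY ws.wait_time_ms DESC;
```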

And if you see a lot of those waits, if the amount of time that queries are waiting on those waits is significant, then you have some work to do. You have to look at how you have sized your server. You have to look at how you have designed your indexes.

And you have to look at how your queries are asking for memory. If you need help with that stuff, I guess that’s where someone like me comes in. But I don’t know, you’re watching this YouTube video for free.

Apparently you like free. So who knows? Anyway, thank you for watching. I hope you learned something. I’m going to take another sniff of my champagne and enjoy my Saturday. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Tuning I/O Is Often About Tuning Indexes In SQL Server

One Metric Ton Of Indexes


Let’s say you hate your storage. Let’s say you hate it so much that you want your SQL Server to touch it as little as possible.

You’re most of the people I talk to. Congratulations.

But how do you do that?

Let’s talk about a few things.

How SQL Server Works With Data


It doesn’t matter if a query wants to read or modify data, all those itty-bitty little data pages need to end up in memory.

How much ends up in memory depends on how big your tables are, and how helpful your indexes are.

Likewise, the more indexes you need to modify, the more of them need to be in memory for that to happen.

You need to design indexes so that you can support your queries by making it easy for them to locate data. That’s your where clause, and guess what?

Your modification queries have where clauses, too.

How You Can Make Indexing Better


Make sure you’re reviewing your indexes regularly. Things that you need to keep an eye on:

  • Duplicative indexes
  • Under-utilized indexes

Even when indexes are defined on the same columns, they’re separate sets of pages within your data files.

  • If you have indexes that are on very similar sets of columns, or supersets/subsets of columns, it’s probably time to start merging them
  • If you have indexes that just aren’t being read, or aren’t being read anywhere near as much as they’re written to, you should think about ditching them
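For that second bullet, a rough starting point is sys.dm_db_index_usage_stats. This is a sketch only, and these counters reset on restart, so check uptime before trusting them:

```sql
/* Nonclustered indexes written to far more than they're read.
   Counters reset on restart -- check uptime first. */
SELECT
    OBJECT_NAME(us.object_id) AS table_name,
    i.name AS index_name,
    us.user_seeks + us.user_scans + us.user_lookups AS reads,
    us.user_updates AS writes
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON  i.object_id = us.object_id
    AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
AND   i.index_id > 1 /* nonclustered only */
ORDER BY us.user_updates DESC;
```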

Cleaning up indexes like this gives you more breathing room to add in other indexes later.

It also gives you far fewer objects competing for space in memory.

That means the ones you have left stand a better chance of staying there, and your queries not having to go to disk for them.

How You Can Make Indexes Better


There are all sorts of things you can do to make indexes better, too. I don’t mean rebuilding them, either!

I mean getting smarter about what you’re indexing.

Things like filtered indexes and index compression can net you big wins when it comes to reducing the overall size of indexes.
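As a hypothetical example (the table, column, and filter here are made up for illustration), a filtered, page-compressed index might look like this:

```sql
/* Hypothetical: only index rows with a positive score,
   and page-compress what's left. */
CREATE INDEX ix_Posts_Score_Filtered
    ON dbo.Posts (Score DESC)
    WHERE Score > 0
    WITH (DATA_COMPRESSION = PAGE);
```

Both the filter and the compression shrink what has to live in memory; whether either is a good fit depends on your queries and your CPU headroom.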

My friend Andy Mallon has some Great Posts™ about compression over on his blog.

And of course, computed columns can help if you’ve got a wonky schema.

Smaller indexes that take up less space in memory make more efficient use of the space you have, which means you can fit more in there.

How You Can Make Tables Better


There are some obvious bits here, like being extra careful with choosing string length.

LOB data can lead to weird locking, and mess with memory grants.

And of course, overly-wide, non-normalized tables can also lead to issues.

If you’re running an OLTP workload, you may also want to make sure that your critical tables aren’t heaps.

Those things tend to take up more space in memory than they need to.
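A quick way to spot them, as a sketch: heaps show up in sys.indexes with an index_id of 0.

```sql
/* Find heaps: tables with no clustered index. */
SELECT
    SCHEMA_NAME(t.schema_id) AS schema_name,
    t.name AS table_name
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
WHERE i.index_id = 0;
```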

And of course, if you need any help fixing these types of issues, drop me a line!

Thanks for reading!


The Great Memory Heist: How Queries Steal Memory From SQL Server’s Buffer Pool

Best Buy



Thanks for watching!

Video Summary

In this video, I dive into the world of SQL Server memory management and how it can be a tricky beast to tame. I share my personal experience with a large table that exceeds the available memory on my VM, illustrating just how critical memory is for SQL operations, even when you think you have plenty. I also explore the often misunderstood relationship between buffer pool memory and non-buffer pool memory, highlighting why simply setting server memory might not be enough to ensure smooth query performance.

Full Transcript

Oh boy. Oh boy. Erik Darling here with Erik Darling Data to this very day. I’m thinking about taking data out of the company title. I don’t really, I don’t actually care for data. I actually, I actually sort of hate it. It seems to be a big problem for people. Either they have too much of it, they don’t know what they have, they have, they have stuff they shouldn’t have. I don’t know, it just seems like, seems like data is nothing but trouble. I think this would be the Erik Darling Fun Company where we only talk about fun, non-dreary things, because data is very dreary, isn’t it? There’s no good news. It’s all depressing. And speaking of depressing, today I’m going to talk about memory, something which I have very little of, personally. But thankfully, my laptop has a decent amount of it. My new laptop sitting over here to the right of me has 128 gigs of memory in it. And I have a single VM with 96 gigs of memory in it. Unheard of, I know. Unheard of. A single laptop with 128 gigs of memory. Amazing. But this memory is going to have a tough time in life. Because despite the fact that there are 96 gigs of memory in this VM, I have this table in this Stack Overflow database that is 120 gigs. 120.

And as we all know, SQL Server does not work with pages on disk. Doesn’t matter how good your disks are. Doesn’t matter how expensive they are. Doesn’t matter how much you paid your storage vendor for all-flash, NVMe, whatever other fine words they have for these disks. It’s before my 9am cocktails. So this might not be the smoothest take that I’ve ever done in my life. But, yeah, so SQL Server doesn’t care about pages on disk. Anything that you want to do with your data.

Whether it’s a read or a modification, it must end up in memory first. Hmm? Hmm. So even if we were to completely overtake memory on this VM with data from this table, it’s still not completely fitting there. Now, right now, in the buffer pool, well, it went down a little bit. I don’t know why it did. I didn’t do anything.

But actually, this is a good thing. So the buffer pool. Boy, oh boy. Monte Carlo. I might need to pour myself a Monte Carlo if I’m going to continue recording this video. So if we look at right now, what’s in the buffer pool is about 76 gigs of data.

Dreary, dreary data. And if I come over here and I run this count, I’ve been bullied a little bit into changing my query formatting. And so I’m going to give this a shot. It’s a little spidery for me.

I’m going to sneak a little alias in there. I don’t know if I like everything on a new line and indented over. I’m going to give it a shot. I’m going to see how it works out.

But if I run this, and I come back over here, and I start running this, we’re going to see the buffer pool gradually go up a little bit, right? 86. Oh, it’s climbing a little bit. Oh, it’s fluctuating. But it’s right around 86 gigs, which is fine because I have max server memory set to about 88 gigs here.

Now, the second query is looking for non-buffer pool memory, right? And I have a filter on here that’s looking, that’s filtering out stuff to make sure it has at least a gig of memory assigned to it. But, oops, I’m going to see this pre-9 a.m.

This pre-9 a.m. cocktail stuff is rough. But if I look over here, there’s a whole bunch of non-buffer pool memory that’s probably, I don’t know, about 2 gigs worth of stuff. I’m not a math guy. I don’t want to write another query to prove it to you.

But I would imagine that the 86 gigs of buffer pool is diminished by 2 gigs because of all the other stuff in here, right? But right now, the thing that I care about is anything, any non-buffer pool memory that has more than a gig of stuff in it, right? And if I run this and look, it’s going to be about 86 gigs.
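That second query is roughly this shape (a sketch, not the exact demo script):

```sql
/* Non-buffer-pool memory clerks holding at least 1 GB.
   MEMORYCLERK_SQLQERESERVATIONS is where query memory
   grants show up. */
SELECT
    mc.type,
    SUM(mc.pages_kb) / 1024. / 1024. AS memory_gb
FROM sys.dm_os_memory_clerks AS mc
WHERE mc.type <> N'MEMORYCLERK_SQLBUFFERPOOL'
GROUP BY mc.type
HAVING SUM(mc.pages_kb) / 1024. / 1024. >= 1.
ORDER BY memory_gb DESC;
```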

So that’s just about there. Okay, fine. Now, here’s where people get all messed up when it comes to SQL Server. They look at, you know, I don’t know, the amount of data they have, and they think, well, 64 gigs will do.

It doesn’t matter how much data is actually in there. It’s like, I don’t know. I’m looking at this cloud instance, and if I keep adding memory, then the cores keep going up, and it keeps getting more expensive.

So I’m just going to stop adding memory. But memory is important. And not just for this buffer pool thing, because I want you to watch what happens to the buffer pool as I run queries that need memory.

So I’m just going to select a top 1,000. See, this is very spidery looking to me. It’s quite spidery.

I’m going to select a top 1,000 from comments ordered by score descending. And, of course, I do not have an index on this comments table that puts the score column in order. So I am going to have to physically sort this.

And sorting data is one of those things where SQL Server’s tiny little baby hands come out and start putting the data in order. But we need memory to do that. We need a memory grant to do that.

So if I come over here and I run this, and I start looking at the buffer pool and what other memory is getting used, you can see that SQL Server has granted that query 16 gigs of memory. And as memory gets loaned out to that query, the size of the buffer pool goes down. And if I throw another one of those in the mix, we’ll see non-buffer pool memory go up again and buffer pool memory go down again.
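You can watch those grants get handed out with sys.dm_exec_query_memory_grants; a minimal sketch:

```sql
/* Queries currently holding (or waiting on) memory grants. */
SELECT
    mg.session_id,
    mg.requested_memory_kb / 1024. AS requested_mb,
    mg.granted_memory_kb / 1024. AS granted_mb,
    mg.used_memory_kb / 1024. AS used_mb
FROM sys.dm_exec_query_memory_grants AS mg
ORDER BY mg.granted_memory_kb DESC;
```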

This is getting into a rough situation. My buffer pool is severely limited, right? So now let’s get another one in there, right?

Let’s get a third one in there. Now we can see non-buffer pool memory. It’s up to 50. It’s almost half and half, right? Almost half of our buffer pool is gone to MEMORYCLERK_SQLQERESERVATIONS. So SQL Server has made a reservation at Chez Memory Clerk, and we have granted these queries memory, and we have taken that memory from the buffer pool.

All right? We have stolen that memory from the buffer pool. It’s ours now.

So this is something that a lot of people don’t plan for when they start designing or speccing SQL Servers or choosing cloud instances. The memory you choose is not purely for the buffer pool. The memory that you choose has many, many other tasks.

We saw that even before there was 50 gigs of memory granted out to these queries, what happened? There’s about 2 gigs to other stuff, right? Just other things.

I don’t know. It’s playing cache, other doodads, gizmos, whatever SQL Server has to do. I don’t really know. I’m not good at this stuff. But all these queries have finished running now, and what I want you to notice is that the buffer pool is down to 32 gigs. 32 gigs.

And it’s not immediately coming back up, is it? They can keep running this query, and it’s going to stay right where it is. Even though there’s nothing else down here, SQL Server isn’t immediately just like, oh, well, come on, get in the buffer pool. Come on, the water’s fine.

It’s beautiful in here. Is it? No, it’s not happening. It’s not happening. SQL Server is not immediately filling. Because what does it know? How does it know what we need in there?

It doesn’t. SQL Server is not that smart. SQL Server is dumb like me. That’s why I like it. But if we come over here and we run this count query again against the post table, we will slowly see the buffer pool start to fill back up. And here it comes.

Coming roaring back to life. But this is sort of a funny thing because this is a situation that people often confuse with parameter sniffing. Waiting for SQL Server to read a bunch of pages from disk into memory is not exactly a fast thing all the time.

Right? Depending on the size of your data or the type of the data that you’re reading in. Like all sorts of stuff.

Like what you have to read. You can end up waiting a very, very long time to read pages from disk off into memory. Now, this count against the post table just ran for about 40 seconds.

It doesn’t take that long. It doesn’t actually take that long to count records in the post table. It’s like it’s not a 40 second operation.

But going off to disk and reading stuff can be a not fun thing to do. And this is something that you need to think very, very carefully about when you’re designing hardware or picking hardware or picking an instance size for your SQL Server. It’s not only how much data you have.

What are the memory requirements of your queries? Right? Because this thing ran for 24 seconds this time. It ran for 40 seconds last time reading a bunch of stuff from disk into memory.

But we had a bunch more stuff in memory this time. So we actually had a slightly faster query. We look at the properties of this. And we look at the wait stats. Number one is going to be PAGEIOLATCH.

What do you know? About 16 seconds of the time we spent in this query was reading pages from disk into memory. Crazy, right?

Crazy. Well, it is about 9:05. And it is time for my 9 a.m. cocktail. So I am going to go have that.

And I am going to bid you a fond adieu. Thank you for watching. And I hope you learned something. And I will see you in another video another time. Goodbye. Thank you very much. Goodbye. Bye.


If You’re Using Under 20% CPU, You’re Wasting Money On SQL Server Licensing

Sensational


CPUs aren’t normally expensive, but ask them to run a query and the cost skyrockets. It doesn’t matter where you are in the world or in the cloud, prepare to get gouged.

You’d think SQL Server was an Hermès purse.

Life is a bit worse in the cloud, where CPU count and RAM amount are inexorably tied together. Sure, Azure offers constrained vCPU instances that help with that, but still.

Money is expensive, and it never goes on sale.

Slacker


If your CPU load stays consistently under 20%, and you don’t need a bunch of cushion for regularly scheduled tasks like ETL or ELT or LET me write a sentence before changing the acronym, then what’s the point of all those CPUs?

I’m not saying they have to run at 100% and you should be using them to air fry your Hot Pockets, but what’s wrong with running at 40-60%? That still leaves you a good bit of free ticks and cycles in case a parameter gets poorly sniffed or some other developer disaster befalls your server.
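If you want a quick look at where CPU has actually been sitting lately, one common sketch reads the scheduler monitor ring buffer. It’s undocumented and could change between versions, so treat it as a rough gauge, not gospel:

```sql
/* SQL Server and system CPU, one sample per minute,
   from the (undocumented) scheduler monitor ring buffer. */
SELECT TOP (10)
    x.record.value('(./Record/@id)[1]', 'int') AS record_id,
    x.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_percent,
    x.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_idle_percent
FROM
(
    SELECT CONVERT(xml, rb.record) AS record
    FROM sys.dm_os_ring_buffers AS rb
    WHERE rb.ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
) AS x
ORDER BY record_id DESC;
```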

When a workload is pretty well-tuned, you should focus on right-sizing hardware rather than staring at your now-oversized hardware.

Bragging Rights


It’s often quite an achievement to say that you tuned this and that and got CPU down from 80% to 20%, but now what?

Can you give back some cores to the VM host? Consolidate workloads? Move to a smaller instance size?

Fashion models are great, but they’re not great role models for servers. CPUs should not be sitting around being expensive and bored.

Thanks for reading!
