Alternative to RAND() for Random Number Generation

December 4, 2023Andy (@SQLBek) 1 Comment

This is a quick blog to “document” a T-SQL technique for generating random numbers. I’ve been using this for years, but don’t use it frequently enough to have it fully memorized. So whenever I do need it, I must constantly have to go look up whenever I need to use it.

TL;DR

CHECKSUM(NEWID())

SELECT a bunch of Random Numbers

Let’s say you need to generate random numbers as part of a resultset. Here’s a simple example.

/* Use RAND() */
SELECT TOP 10 

	name,

	RAND()

FROM sys.databases;

Unfortunately, if you use RAND(), you’ll get the same value for each record in the resultset. But what if I needed DIFFERENT random values for each record?

There’s a few different approaches you can take but here’s my favorite that I think is the cleanest:

/* Use CHECKSUM(NEWID()) */
SELECT TOP 10 

	name,

	CHECKSUM(NEWID())

FROM sys.databases;

GO 3  -- to loop the prior batch 3 times

Now we get a different random value per record AND different random values per execution!

At this point, you can use whatever other method to limit it down to values that you might want. I use the modulus % operator regularly for this.

/* Generate a random value between 1 and 10 */
SELECT TOP 10 

	name,

	ABS((CHECKSUM(NEWID()) % 10)) + 1

FROM sys.databases;

GO 2

Notice that I added one additional function to the column definition – ABS(). That’s because as you can see in the prior examples, the raw output of CHECKSUM(NEWID()) will return both positive and negative integer values. So if I wanted only positive values between 1 and 10, using ABS() to get the absolute value is a clean solution.

Hope someone else finds this helpful! Thanks for reading.

Backup Internals – Part 5: Balancing Performance

December 1, 2023Andy (@SQLBek) 3 Comments

Welcome back to Part 5 of my Backup Internals Series. In this blog, I want to explore some thoughts around maximizing the performance of your SQL Server backup operations.

Blog Series Table of Contents

Fast, Good, or Cheap…

Many of us have heard the tidbit that there’s 3 options to any endeavor: Fast, Good, and Cheap, but you can only ever pick 2. When it comes to accelerating your (FULL) backup performance, there’s a similar set of choices you can make, most of which have a trade-off of some sort.

Gather Requirements

If I were to have an opportunity to re-evaluate an environment’s backup strategy and schedule, there’s a number of questions I’d want to try to answer first. Remember, the scope of this thought exercise is thinking about FULL Backup strategy from a purely performance angle.

Questions to Ask

Backup maintenance windows: How long and how frequently?
Every night for 4 hours? Anytime over the weekend? For 3 minutes on Sunday, between 22:55 and 22:59?
How many databases to back up and how large are they?
Does the server just have 1 database, but it’s 15TB in size? Or do you have 800 databases, one per customer? And of those 800, what’s the individual size distribution? Maybe 700 databases are smaller than 100GB, another 75 databases are between 100GB and 500GB, and the last 25 databases are +500GB?
Exclusivity during backup maintenance window?
Do you have the luxury of no workload running during your maintenance window? Or do you also have to juggle index maintenance, DBCC CHECKDB, or other application processes like ETLs, nightly processing jobs, etc.? Or to put it another way, do you have free reign to max out your SQL Server resources during your window?

Assumptions

1 SQL Server with many databases
CPU = 12 cores; RAM = 256GB
Backup storage capacity and storage ingest throughput are not a concern (you bought “orange”)
Will ignore RPO/RTO business priorities (which I might tackle in a future blog)

Start Doing Math

Now that you know how much time you have and how many/how much you need to back up, you need to start weighing your options. Do you have the luxury to run one single backup job that will back up your databases one after another? Or do you need to split your databases across multiple backup jobs that run in parallel?

If server resources were infinite, one could theoretically kick off an individual backup job for each and every database, all starting at the exact same time. But of course, that makes no sense.

Revisiting What We’ve Learned

We now know that if our database’s data file(s) all reside on a single volume, we’ll only ever get one Reader Thread. So with a 12 core server, if I have full usage of the server’s resources, I may start with 8 backup output files to get 8 Writer Threads. And I might choose to use a large BUFFERCOUNT value and larger MAXTRANSFERSIZE value, which will result in more RAM consumption.

Now let’s pretend that to meet our requirements, we need to run backups in parallel. You might estimate that you need to run 4 backup jobs simultaneously. If you use the above parameters, you’ll may now start overrunning your CPU! Remember it’s not always about 100% CPU utilization either… all cores could be utilized at say 30%, but you could be context switching like crazy.

Who Doesn’t Love a Trace Flag?

So of course the above means you need to test different permutations. But when you’re doing testing, how can you determine the actual resource utilization of a given backup job? This is where some infrequently highlighted trace flags come into play.


Trace Flag 3213 - Generate diagnostic data about BACKUP & RESTORE operation (least)

Trace Flag 3004 - Generate diagnostic data about BACKUP & RESTORE operation (more)

Trace Flag 3014 - Generate diagnostic data about BACKUP & RESTORE operation (all)

These are officially undocumented



Trace Flag 3604 - Redirects output information to SSMS Results Pane

Trace Flag 3605 - Redirects output information to SQL Server Error Log: use if you need timestamps

You’ll see the above three Trace Flag are suffixed with least, more, and all. That’s because the output that each Trace Flag yields seems to have some overlap. And what’s more challenging is that I’ve found inconsistent documentation in older blogs that cover these Trace Flags as well. So I just opt to use them all to cover all of my bases.


-- Turn on Trace Flags

DBCC TRACEON(3604, 3004, 3014, 3213, -1);

GO

When turned on, you will receive a ton of diagnostic information after running a backup operation. Here’s an example (with prefix removed for brevity):

BACKUP DATABASE Sandbox_MultiFile_SingleVol TO

	DISK='NUL'

WITH COPY_ONLY, FORMAT, INIT, STATS = 15

GO 



--------------

[prefix]: BACKUP DATABASE started

[prefix]: Opening the database with S lock

[prefix]: Acquiring bulk-op lock on the database

[prefix]: Synchronizing with other operations on the database is complete

[prefix]: Opening the backup media set

[prefix]: The backup media set is open



Backup/Restore buffer configuration parameters



Memory limit:               32765 MB

BufferCount:                7

Sets Of Buffers:            1

MaxTransferSize:            1024 KB

Min MaxTransferSize:        64 KB

Total buffer space:         7 MB

Tabular data device count:  1

Fulltext data device count: 0

Filestream device count:    0

TXF device count:           0

Filesystem i/o alignment:   512

Media Buffer count:         7

Media Buffer size:          1024 KB



[prefix]: Preparing the media set for writing

[prefix]: The media set is ready for backup



[prefix]: Effective options: Checksum=0, Compression=0, Encryption=0, BufferCount=7, MaxTransferSize=1024 KB

[prefix]: Checkpoint LSN: 0:0:0

[prefix]: Checkpoint is complete (elapsed = 17 ms)

[prefix]: Start LSN: 3246:76679:230, SERepl LSN: 0:0:0

[prefix]: Last LSN: 3246:76775:1

[prefix]: Scanning allocation bitmaps

[prefix]: Data section: 204684263424 bytes in total

[prefix]: Scanning allocation bitmaps is complete



[prefix]: Calculating expected size of total data

[prefix]: FID=1, ExpectedExtents=780610, IsDifferentialMapAccurate=1

[prefix]: FID=3, ExpectedExtents=781546, IsDifferentialMapAccurate=1

[prefix]: FID=4, ExpectedExtents=780523, IsDifferentialMapAccurate=1

[prefix]: FID=5, ExpectedExtents=780555, IsDifferentialMapAccurate=1

[prefix]: TotalSize=204684263424 bytes

[prefix]: Estimated total size = 204684312576 bytes (data size = 204684263424 bytes, log size = 49152 bytes)



[prefix]: Work estimation is complete



[prefix]: Do the first force checkpoint before copying data section

[prefix]: Checkpoint LSN: 3246:76679:230

[prefix]: Checkpoint is complete (elapsed = 8 ms)

[prefix]: Writing the leading metadata



Shared Backup BufferQ count: 7



[prefix]: Calculating expected size of total data

[prefix]: FID=1, ExpectedExtents=780610, IsDifferentialMapAccurate=1

[prefix]: FID=3, ExpectedExtents=781546, IsDifferentialMapAccurate=1

[prefix]: FID=4, ExpectedExtents=780523, IsDifferentialMapAccurate=1

[prefix]: FID=5, ExpectedExtents=780555, IsDifferentialMapAccurate=1

[prefix]: TotalSize=204684263424 bytes



[prefix]: Copying data files

[prefix]: Data file backup process set to run in serial mode within a volume.

[prefix]: Number of data file readers = 1



15 percent processed.

30 percent processed.

45 percent processed.

60 percent processed.

75 percent processed.

90 percent processed.



[prefix]: InitialExpectedSize=204684263424 bytes, FinalSize=204684263424 bytes, ExcessMode=0



[prefix]: Calculating expected size of total data

[prefix]: FID=1, ExpectedExtents=780610, IsDifferentialMapAccurate=1

[prefix]: FID=3, ExpectedExtents=781546, IsDifferentialMapAccurate=1

[prefix]: FID=4, ExpectedExtents=780523, IsDifferentialMapAccurate=1

[prefix]: FID=5, ExpectedExtents=780555, IsDifferentialMapAccurate=1

[prefix]: TotalSize=204684263424 bytes



[prefix]: Do the second force checkpoint before copying diff section

[prefix]: Checkpoint LSN: 3246:76679:230

[prefix]: Checkpoint is complete (elapsed = 9 ms)

[prefix]: Start pin the log.

[prefix]: Start LSN: 3246:76778:1, SERepl LSN: 0:0:0

[prefix]: Offline the sparse bitmap

[prefix]: Scanning allocation bitmaps

[prefix]: Diff section: 3473408 bytes in total

[prefix]: Scanning allocation bitmaps is complete



[prefix]: Calculating expected size of total data

[prefix]: FID=1, ExpectedExtents=14, IsDifferentialMapAccurate=1

[prefix]: FID=3, ExpectedExtents=13, IsDifferentialMapAccurate=1

[prefix]: FID=4, ExpectedExtents=13, IsDifferentialMapAccurate=1

[prefix]: FID=5, ExpectedExtents=13, IsDifferentialMapAccurate=1

[prefix]: TotalSize=3473408 bytes



[prefix]: Copying data files

[prefix]: Data file backup process set to run in serial mode within a volume.

[prefix]: Number of data file readers = 1

[prefix]: InitialExpectedSize=3473408 bytes, FinalSize=3473408 bytes, ExcessMode=0



[prefix]: Calculating expected size of total data

[prefix]: FID=1, ExpectedExtents=14, IsDifferentialMapAccurate=1

[prefix]: FID=3, ExpectedExtents=13, IsDifferentialMapAccurate=1

[prefix]: FID=4, ExpectedExtents=13, IsDifferentialMapAccurate=1

[prefix]: FID=5, ExpectedExtents=13, IsDifferentialMapAccurate=1

[prefix]: TotalSize=3473408 bytes

[prefix]: Diff section copy finished

[prefix]: Last LSN: 3246:76781:1

[prefix]: Copying data files is complete



Processed 6244992 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile' on file 1.

Processed 6252472 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile2' on file 1.

Processed 6244288 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile3' on file 1.

Processed 6244544 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile4' on file 1.



[prefix]: Copying transaction log

[prefix]: MediaFamily(0): FID=2, VLFID=3246, DataStreamSize=65536 bytes

Processed 1 pages for database 'Sandbox_MultiFile_SingleVol', file 'Sandbox_MultiFile_log' on file 1.

[prefix]: Copying transaction log is complete



[prefix]: Writing the trailing metadata

[prefix]: Writing the end of backup set

[prefix]: Writing history records for NoteBackup

[prefix]: Writing history records for NoteBackup is complete (elapsed = 37 ms)



BACKUP DATABASE successfully processed 24986297 pages in 87.111 seconds (2240.881 MB/sec).

[prefix]: BACKUP DATABASE finished

I’ve grouped together interesting “sub-operations” like when the tail of the log backup is taken. Note the multiple size estimates that are taken throughout as well. Note that there are 4 data files, which is why you see 4 FID entries each time. But also note that all 4 data files are on 1 single volume!

Key TF Output Highlights

Backup/Restore buffer configuration parameters



Memory limit:               32765 MB

BufferCount:                7

Sets Of Buffers:            1

MaxTransferSize:            1024 KB

Min MaxTransferSize:        64 KB

Total buffer space:         7 MB



[prefix]: Effective options: Checksum=0, Compression=0, Encryption=0, BufferCount=7, MaxTransferSize=1024 KB



[prefix]: Number of data file readers = 1

Let’s compare and contrast this to a backup operation with multiple backup output targets (this different database’s data files are spread across 4 different data volumes).

BACKUP DATABASE [Sandbox_MultiFile_MultiVol] TO 

DISK = '\\10.21.200.27\ayun-sql-backups\Sandbox_MultiFile_MultiVol_1.bak', 

DISK = '\\10.21.200.28\ayun-sql-backups\Sandbox_MultiFile_MultiVol_2.bak', 

DISK = '\\10.21.200.70\ayun-sql-backups\Sandbox_MultiFile_MultiVol_3.bak', 

DISK = '\\10.21.200.71\ayun-sql-backups\Sandbox_MultiFile_MultiVol_4.bak', 

DISK = '\\10.21.200.72\ayun-sql-backups\Sandbox_MultiFile_MultiVol_5.bak', 

DISK = '\\10.21.200.73\ayun-sql-backups\Sandbox_MultiFile_MultiVol_6.bak', 

DISK = '\\10.21.200.74\ayun-sql-backups\Sandbox_MultiFile_MultiVol_7.bak', 

DISK = '\\10.21.200.75\ayun-sql-backups\Sandbox_MultiFile_MultiVol_8.bak' 

WITH FORMAT, INIT, STATS = 10, 

MAXTRANSFERSIZE = 2097152, 

BUFFERCOUNT = 500, COMPRESSION;

GO



--------------



Backup/Restore buffer configuration parameters

Memory limit:               98295 MB

BufferCount:                500

Sets Of Buffers:            3

MaxTransferSize:            2048 KB

Min MaxTransferSize:        64 KB

Total buffer space:         3000 MB

Filesystem i/o alignment:   512

Media Buffer count:         500

Media Buffer size:          2048 KB

Encode Buffer count:        500



Backup(Sandbox_MultiFile_MultiVol): Number of data file readers = 4

I think there’s a lot of really cool data in here. Remember that you can use Trace Flag 3605 to log this information to the Error Log. Thusly, you could use that to record what your BACKUP jobs are currently doing and review them later at your leisure (just remember it turn the TF off, lest you bloat your logs).

If you want to see more examples of these Trace Flags in action, visit my GitHub and check out my demo scripts from the original presentation. I introduce the Trace Flags in the 1_Baselining.sql script, then leverage them throughout the other demo scripts.

Workers vs CPU Cores

One more thing to keep in mind are the total number of worker threads you have available on your SQL Server. That’s different than the number of CPU cores you happen to have. I’m not going to dive deeper into this one today but will leave this old but still valid blog from Bob Dorr for reference instead: “How It Works: How many databases can be backed up simultaneously?“

Bringing It All Together

Hopefully now, with the above information, you can see that you really need to do an analysis and essentially create a matrix, based on your requirements and available resources. This will help you determine how much you want to crank up your database tuneables to best accelerate your FULL backups for your environment.

Hope You’ve Enjoyed This Series

This’ll most likely be the final blog of this series, though I might do an “addendum” with random tidbits. Let me know in the comments if there’s anything else you’d like me to explore in an epilogue.

Thanks for reading!

SQL Server Backup Internals Series

November 29, 2023Andy (@SQLBek) 5 Comments

Summary

This blog series is a companion piece for my How to Accelerate Your Database Backups presentation. You can find a full recording of that presentation on MSSQLTips.com.

Part 1: Introduction
https://sqlbek.wordpress.com/2023/07/18/sql-server-backup-internals-part-1/
Part 2: Backup Buffers
https://sqlbek.wordpress.com/2023/07/26/backup-internals-part-2-backup-buffers/
Part 3: CPU Threads
https://sqlbek.wordpress.com/2023/08/24/backup-internals-part-3-cpu-threads/
Part 4: Backup Compression
https://sqlbek.wordpress.com/2023/10/09/backup-internals-part-4-backup-compression/
Part 5: Balancing Performance
https://sqlbek.wordpress.com/2023/12/01/backup-internals-part-5-balancing-performance/

DMV Script for Diagnosing Storage Performance

November 29, 2023Andy (@SQLBek) Leave a comment

TL;DR

As part of my PASS Summit 2023 community session, I created a hybrid of two of my favorite sys.dm_io_virtual_file_stats scripts. It captures data from the DMV at two different intervals, to help you diagnose SQL Server I/O related performance data.

The Bigger Picture

A few weeks ago, I had the honor and pleasure of presenting at PASS Summit 2023. My session, Is Storage the Root Cause of Your Performance Woes… or Not? was part of a Learning Pathway called Solving Real World SQL Server Problems featuring sessions by myself, Deborah Melkin (b), and Jeff Iannucci (b).

The inspiration for my particular session was to share knowledge about storage related performance headaches, to Andy-3-years-before, before I joined Pure Storage. I just celebrated my 2 year anniversary with Pure Storage, and to say that I’ve learned a tremendous amount about storage in these two years is an understatement. So this presentation was trying to address some of the more interesting performance & storage related nuances, beyond just looking at perf counters and DMV queries.

Scripts That I Love Using

That being said, in the course of the presentation, I did share two of my favorite existing scripts for working with sys.dm_io_virtual_file_stats.

The first script has been my go-to for the majority of my career, written by Paul Randal of SQLskills. Paul has a standalone version which queries the DMV but also has one that queries the DMV twice over a period of time, to obtain a more focused delta of your I/O and workload behavior.

The second script was written by my friend and teammate Anthony Nocentino. I really like Anthony’s script, because he calculates some additional data points that Paul’s script does not.

So, I did what any good DBA would do with two useful scripts – I merged them together! (all done with prior permission from both Paul and Anthony.) And that’s why I’m writing this blog today – to highlight that it’s out there and for you to try it out if you want.

If you weren’t able to attend PASS Summit 2023, no worries – my slidedeck is also in the same github repository. But a slidedeck only tells part of a story, so I am looking out for opportunities to present this session virtually in the coming months!

Thanks for reading and let me know if you have any feedback the hybrid delta DMV the script.

T-SQL Tuesday #167: TDE and Data Protection

October 10, 2023Andy (@SQLBek) 4 Comments

Welcome to another edition of T-SQL Tuesday! This month’s blog party post is hosted by Matthew McGiffen. He asks bloggers to write about anything related to SQL Server Encryption and Data Protection.

Security Theater

I’ll start this blog off by saying that I despise “Security Theater. ”

If you’re not familiar with the term, it “is the practice of taking security measures that are considered to provide the feeling of improved security while doing little or nothing to achieve it.” Essentially it is doing something simply for the show of it, to make people feel better, while providing little to no actual security value.

And for us data professionals, especially on the operations side, security and data protection are topics that should be top of mind for all of us.

Transparent Data Encryption – Not a Fan

Of all of the various data protection options available to us in SQL Server, I argue that Transparent Data Encryption (aka TDE) is worthless Security Theater.

TDE encrypts your data files “at rest.” This means one cannot crack open a hex editor and just start reading your raw data. And did you know that you can use a hex editor and start reading string values right out of your MDF files? Go search on that – that’s a separate tangent you can go explore.

So Andy, what’s wrong with that?

I feel that TDE is worthless because it is meant to protect against relatively narrow attack vectors. One attack vector TDE is meant to address is if someone physically walks into your data center, grabs a hard drive, and walks out. Because your data files are encrypted-at-rest, they cannot then go home, hook up said hard drive to a PC, and read your data off of it.

But who stores their SQL Server data on singular spinning hard drives anymore? Okay, there may be some people that still use Direct Attached Storage (DAS) instead of a SAN, but those are few and far between. With a SAN, depending on how your data is laid out, it’ll most likely be distributed across a bunch of physical storage media. So you’d have to waltz out with an entire SAN chassis, as a singular drive would only contain fragments. And if people are waltzing out of your data centers with entire SANs, you have a much bigger problem already.

Fine, but Andy, if an intruder gets access to my Windows machine and copies the data files elsewhere, that’s still a problem that TDE solves, right?

Not so fast… if someone has access to your Windows OS, you’re already screwed. Why? Because then an attacker than very easily exfiltrate your Service Master Key from SQL Server. Once they have that, then can then use that key plus your copied data files and decrypt/open those data files on another SQL Server at their leisure.

I’m not going to link to the piece, but there is a piece out there that showcases all of this, with the exact script one needs. It’s really disgustingly easy.

And I haven’t even noted the trade-offs of TDE, like performance penalties and backup compression consequences.

Not a Sponsored Message – What about Pure?

I should note that all of the above is just my personal opinion. Yes I happen to work for a storage company now and TDE is especially pointless in my world. Why? Because Pure Storage ALREADY encrypts-at-rest all data that is written to our Direct Flash Modules!

The other reason why I dislike TDE, from a Pure Storage context, is that it also inhibits our storage array’s data reduction capabilities. If you’re writing encrypted randomized data down to a data-reducing array like FlashArray, there’s little that can be done with randomized data to data reduce or compress it – simple as that. So if your primary storage is a data reducing array like a FlashArray, you’ve just eliminated one of your primary cost savings… all for nothing.

What Should You Do Instead?

If you really need to have encrypted data, encrypt it WITHIN the application (like within SQL Server). Store your PII using column-level encryption methods. That has actual security value.

And if you are really concerned about security, you need to do a true threat assessment and figure out what attack vectors you want to protect from. Because after all, if someone is already on your Windows OS or physically in your data center, you’ve already failed hard at security and TDE isn’t going to save your butt.

Thanks for reading.

Backup Internals – Part 4: Backup Compression

October 9, 2023Andy (@SQLBek) 2 Comments

Welcome back to Part 4 of my Backup Internals series. Today, I’d like to spend a little time exploring backup compression.

When you take a regular FULL BACKUP, SQL Server is literally taking a byte-for-byte copy of your data files. Don’t believe me? Then go read this, then come back. Additionally, Microsoft architected BACKUP operations such that the resource utilization and impact would be minimal (when using default parameters).

Blog Series Table of Contents

Back in the Day

Back in the days of my youth, storage cost a heck of a lot more per megabyte (hahaha). And backups were extra copies of your data that had to live somewhere. So who would not want their backups to consume less (expensive) storage? But alas, Microsoft did not offer built-in BACKUP compression with SQL Server in its earlier days. Instead, data professionals had to resort to 3rd party products like Litespeed to have compressed backups. Investing in a software solution often outweighed the extra storage costs of an uncompressed backup, so it was a sensible expenditure. Unfortunately for backup software vendors, Microsoft introduced native backup compression in 2008.

Costs of Backup Compression

Regardless of how one does compression, it comes at a price. Compression reads and manipulates ALL OF YOUR DATA to make it smaller, which requires CPU and memory resources. The benefit is that SQL Server now has less data to send over the wire to your backup target, reducing required storage and (usually) overall execution time. And this (usually) made both DBAs and storage admins happy.

In our last two blog posts, we explored a BACKUP operation’s RAM and CPU worker thread utilization. But what happens when you turn on compression?

The answer for RAM and backup buffers is easy – compression requires 3x whatever backup buffers you were using already. So if a regular BACKUP operation needed 10MB of RAM for backup buffers, turning on compression will now use 30MB of RAM. The buffers themselves do not get larger, rather the number of “Sets of Buffers” goes from 1 to 3.

Digging Deeper: Compression and CPU Utilization

When utilizing compression, most of us know full well that SQL Server burns a lot more CPU. But how much CPU? Does it just burn more cycles on the same Reader & Writer threads that were invoked as normal? Or does it invoke additional threads to do compression work?

To dig a bit deeper, I turned to my good friend Sysinternals’ Process Explorer. If you’re not familiar with it, it’s essentially Task Explorer on steroids. One awesome thing you can do with it is right click a given process/.exe, go to Properties, and see Thread information!

Using this, I can at least see the number of threads that are actively using CPU at any given time. Here’s a baseline shot showing an otherwise idle SQL Server

Note that there are threads “using” CPU, but they’re all marked “< 0.01”, so for purposes of this simple analysis, we’ll consider them idle.

Next, this is what the output looks like when I execute a BACKUP operation, whose files are on a single volume, against a single backup target, with no compression:

Note that there are two threads (ThreadID or TID): 10636 & 5176, showing CPU utilization values that are not “< 0.01”. That’s indicative of my 1 Reader thread and 1 Writer threads for my BACKUP operation.

Now as a different control, I executed a BACKUP whose data files are on 4 separate volumes, against 1 backup target, with no compression.

Note here that I see activity on 5 threads: 4 reader threads and 1 writer thread.

Finally, here’s a BACKUP run with a single volume, single backup target, but with compression turned on.

That’s a heck of a bunch of threads that are not idle – over 20 in this screenshot!

How Many CPU Threads Does Compression Need?

I’ll come right out and say it; I’m not certain. Here’s the thing about the above observation methodology. It updates once per second and while it’s insightful, it’s not bulletproof.

For one, I cannot confirm with absolute certainty, that the given threads are related to the BACKUP operation. In fact, for my single volume, single backup target control, I’d occasionally get a sampling that would have a 3rd thread pop up with some activity, but only for a single snapshot. In multi-volume and/or multi-backup target examples, the number of threads active at any one time would occasionally drop below what I would expect. I attribute that to a CPU thread having to wait for a resource at that precise moment in time.

If I wanted to get deeper, I suspect I’d have to go attach a debugger to SQL Server and go digging to find the actual discrete calls that are invoking additional worker threads. But for the purposes of this blog, I’m content with the above observation that when using compression, many more CPU threads get spun up.

Compression: Yay or Nay?

As we’ve seen, Backup Compression requires more resources – CPU and memory. And unless someone else is willing to fill in the gaps, I cannot tell you the algorithm/methodology to better predict CPU thread utilization when it comes to utilizing Compression. But I would also argue that one should not let that deter you either.

In the end, everything is all about trade-offs… which is what Part 5 of this series will be all about – performance trade offs of using all of these different backup parameters.

Thanks for reading!

Learn From My Oversights: MFA

September 14, 2023Andy (@SQLBek) Leave a comment

Lately, I’ve been seeing a huge volume of Chinese LinkedIn profiles trying to connect with me. It’s quite obvious that these are fraudulent accounts, though they all have seemingly “real” profile information filled out. It’s been a nuisance at best… until earlier this month.

It’s Not a Matter of If… But When…

Earlier this month, I’d received a “New device registration for Remember me” e-mail from LinkedIn. But those e-mails get funneled to a separate folder that is rarely checked so I was totally oblivious to it.

A few days later, when I tried logging into LinkedIn, I got a strange “your account has been temporarily restricted” message and was asked to submit photos of physical ID. When I first encountered this, I thought I was getting phished! Switched browsers, VPNs, etc. and searched on the ‘net, and winds up that that part seemed legit. But I didn’t think about WHY it was happening (nor did the website actually say why), so I figured it was a routine thing and said screw it, I’ll deal with it later.

A few more days pass by and I finally start trying to deal with it. I submit my Driver’s License, and 30 seconds thereafter, I get an automated e-mail saying that my ID was rejected. Okay, now I’m getting pissed, because I’m effectively locked out of LinkedIn entirely.

While many out there wouldn’t shed any tears to never ever use LinkedIn again, I find that it has its uses, especially in my role and general community activity. So being locked out was definitely a bad thing.

Okay fine, so I try the ID verification again with my passport. But here’s where things get really obnoxious – I cannot submit ID again! The website returns an error saying that I’ve already submitted “recently” and to “wait before trying again.” Repeat this over 3 days and now I’m quite irate.

Digging Deeper

Digging deeper on Twitter and Reddit, I find others who have been hacked and hijacked by those lovely Chinese profiles. Even worse, I come to learn that LinkedIn Support is apparently swamped with a 2-3 WEEK response time… if you’re lucky!

So how does one open a Support Case with LinkedIn? You have to log in. But… I… CAN’T!!!

Reddit to the Rescue

I’d come close to resigning myself to waiting several weeks to MAYBE get my LinkedIn account back, when I stumble upon a peculiar Reddit thread. The poster shared that they were able to get their LinkedIn account resolution fast-tracked by… filing a complaint with the BBB.

Whuuut? Seriously?

I thought this was a fake post, but others chimed in claiming the same. I perused a few of the profiles & comment history of some of the individuals who claimed success, and they seemed like legit humans. Alright fine, what do I have to lose? So I filed a BBB complaint against LinkedIn with details and screenshots.

And I actually got a response within a few hours from LinkedIn!

Unfortunately it was a trash response, saying to check my Support Case correspondence. I “rejected the resolution” on the BBB, pointing out that the one cannot review a Support Case without logging in, and I couldn’t log in! A few hours later, I receive an automated Password Reset e-mail from LinkedIn!

Got My Account Back! But…

So yay, I got my account back! And upon logging in, the FIRST thing I did was enable MFA. I had thought I did so before, but obviously not.

And to my dismay and irritation, my entire profile had also been wiped. I was now “Cheryl,” a Chinese wedding dress maker out of Brooklyn, NY. Uhm… yeah, okay then.

So I’ve cleaned house and gotten profile back up and running. That was a few days of stressful irritation that I could have done without.

Epilogue

… but I think the hacker may be getting a bit of a last laugh. For whatever reason, all of my ads on LinkedIn are now in Chinese.

Backup Internals – Part 3: CPU Threads

August 24, 2023Andy (@SQLBek) 3 Comments

Welcome back to Part 3 of my SQL Server Backup Internals Series.

In Part 1, I introduced the “parts” of a BACKUP Operation and in Part 2, we delved into Backup Buffers. Today, we’re going to talk about what manipulates those Backup Buffers = CPU Threads. This’ll be a longer blog, so go refill your coffee now.

Blog Series Table of Contents

Many Hands Make Light Work

Let’s start by returning to our analogy. We’ve called in some friends to help out. First, what happens if we have all of our extra help on the Reader side loading Backup Buffers?

Let’s try working with three “Reader” workers

Assume that each worker processes widgets at the same rate of 1 per second. So each second that passes, 3 Reader workers are copying widgets into baskets. But our lone Writer worker can only process at a rate of 1 widget per second. So the Writer worker will be overwhelmed very quickly as more baskets are sent by the Reader workers. If the Writer worker happens to be much faster, say able to process at a rate of 30 widgets a second, then that worker can easily keep up with everything that the 3 Reader workers send over.

Now let’s give the “Writer” more workers

Alternatively, let’s shift our extra workers to help out on the Writer side. If our Reader worker can process at a rate of 3 widgets a second, and each Writer worker processes at a rate of 1 widget a second, we’ll have a processing balance on both sides, thus maximizing our throughput potential.

But like before, if processing rates differ, a bottleneck may occur. If our single Reader worker only processes at a rate of 1 widget a second, they will not be able to fill up baskets as fast as the three Writer workers can process them (a total of 3 widgets per second). This imbalance will result in Writer workers sitting idle and waiting.

Let’s add additional “Reader” and “Writer” workers

In an ideal situation, one would add more Readers and Writers, and supply enough baskets to the entire workflow. We want a balance where both Readers and Writers are always working, regardless of individual worker processing rate differences. Ideally, no one is ever waiting for a basket.

Adding More Reader Threads to BACKUP Operations

Okay, you jumped ahead to find out how to add additional Reader threads to your BACKUP operations. Unfortunately, the answer is a bit tricky. The number of Reader threads a BACKUP operation is tied to the number of database volumes that a given database exists on. It is critical to note that this is not synonymous with the number of data files, though you obviously need multiple data files in order for a database to be able to span multiple volumes.

In Windows, a common setup is each storage volume (aka Disk) that is presented to a server has its full capacity partitioned as a simple volume and mapped to a drive letter.

Three storage volumes (Disk 1, 2, & 3), fully partitioned as 3 simple volumes, and mapped to 3 different drive letters (D:, X:, & L:).

Now let’s say I have a database with multiple data files, and all of the data files reside on a single Windows Drive – also a common arrangement.

If I execute a BACKUP operation, no matter what parameters I use, my BACKUP operation will only ever get 1 Reader Thread.

That’s it.

If you want MORE Reader threads, you’ll need to spread your data files across multiple Windows volumes.

In this example, the Sandbox_MultiVol database has 3 data files spread across 3 different Windows Drives – M:, N:, and O:. As a result, any BACKUP operation on this database will get 3 Reader threads.

Edge Case – Multiple Windows Volumes on a Single Disk

There’s also uncommon cases when one might take a Disk (Storage Volume) presented to Windows and partition it into multiple simple volumes.

Here’s two examples, where each Disk is are partitioned into 4 simple volumes. Disk 10 is mapped to drive letters and Disk 11 is mapped to mount points.

This multi-File database is spread across 3 Windows volumes, mapped to drive letters H:, I:, and J:, but all reside on 1 Storage Volume – Disk 10.

This multi-file database spread across 3 Windows volumes , mapped to mount points, but all reside on one Storage Volume – Disk 11.

For these two examples, do you think a BACKUP operation against either of these databases will get 1 Reader thread or 3 ?

In both cases, you’ll get 3 Reader threads.

The code behind BACKUP is tied to the volumes visible to Windows. I’ll confess that before I tested this, I theorized that the BACKUP operation would associate with the physical storage presented, thus granting only get one Reader thread.

Adding More Reader Threads – TL;DR

Unless you want to go to the hassle of carving up your database into multiple data files, adding more Windows volumes, and moving your data files across multiple Windows volumes, you cannot add additional Reader threads to your BACKUP operations.

Adding More Writer Threads – aka Backup Striping

Thankfully adding more Writer threads is much easier. That is tied to the number of backup devices that you specify in your BACKUP command.

-- Syntax:
BACKUP sandbox TO 
     <backup_device> [ ,...n ]

-- 1 backup_device on 1 Windows volume
-- this will yield 1 Writer threads
BACKUP sandbox TO 
     DISK = 'S:\backups\sandbox-1.bak';

-- 4 backup_devices on 1 Windows volume
-- this will yield 4 Writer threads
BACKUP sandbox TO 
     DISK = 'S:\backups\sandbox-1.bak',
     DISK = 'S:\backups\sandbox-2.bak',
     DISK = 'S:\backups\sandbox-3.bak',
     DISK = 'S:\backups\sandbox-4.bak';

-- 4 backup_devices across 2 Windows volumes (S:, T:)
-- this will also yield 4 Writer threads
BACKUP sandbox TO 
     DISK = 'S:\backups\sandbox-1.bak',
     DISK = 'S:\backups\sandbox-2.bak',
     DISK = 'T:\backups\sandbox-3.bak',
     DISK = 'T:\backups\sandbox-4.bak';

More Examples

Example 1 – Sandbox_MultiVol

-- 3 Reader Threads, 1 Writer Threads
BACKUP Sandbox_MultiVol TO 
     DISK = 'S:\backups\Sandbox_MultiVol-1.bak';

-- 3 Reader Threads, 4 Writer Threads
BACKUP Sandbox_MultiVol TO 
     DISK = 'S:\backups\Sandbox_MultiVol-1.bak',
     DISK = 'S:\backups\Sandbox_MultiVol-2.bak',
     DISK = 'S:\backups\Sandbox_MultiVol-3.bak',
     DISK = 'S:\backups\Sandbox_MultiVol-4.bak';

Example 2 – Sandbox_SingleVol

-- 1 Reader Threads, 1 Writer Threads
BACKUP Sandbox_SingleVol TO 
     DISK = 'S:\backups\Sandbox_SingleVol-1.bak';

-- 1 Reader Threads, 4 Writer Threads
BACKUP Sandbox_SingleVol TO 
     DISK = 'S:\backups\Sandbox_SingleVol-1.bak',
     DISK = 'S:\backups\Sandbox_SingleVol-2.bak',
     DISK = 'S:\backups\Sandbox_SingleVol-3.bak',
     DISK = 'S:\backups\Sandbox_SingleVol-4.bak';

In Conclusion

Now you should have a solid understanding of how BACKUP operations can get more Reader and/or Writer threads allocated. Stay tuned for Part 4 where I’ll dive into the topic of BACKUP compression, then Part 5 where I’ll explore balancing all of these tuneables to maximize performance.

Thanks for reading!

T-SQL Tuesday #165: Job Titles… a.k.a. Meaningless Labels

August 8, 2023Andy (@SQLBek) Leave a comment

Welcome back to another edition of T-SQL Tuesday. In this month’s edition, our host Josephine Bush asks “What Do All the Database Job Titles Actually Mean?“

What Does Title [dev\null] Mean?

My answer is simple: nonsense. Seriously.

My problem with job titles is that at every organization I’ve ever worked at or with, a given job title has little real value or meaning. All a job title is, is an attempt to label a collection of job duties and responsibilities into something succinct. And in our neck of the woods, there’s a decent assortment of such titles – DB Admin, DB Dev, Dev DBA, DB Engineer, Architect, blah, blah, blah. But as I’m sure others are blogging about today, the actual duties wildly vary from company to company, for identical job titles.

I also argue that they can often be meaningless except from a prestige perspective. Early on in my career (omg, this anecdote was 20 years ago), I worked for one of the largest banks in the world. My title was “Assistant Vice President.” In reality, I was a database developer. And most of the other 100+ in my division had identical titles as well! It was in place due to salary bands being tied to titles at that organization and IT, being higher paying, had to be on the “president” track. Ridiculous.

What Does [dev\null] Mean to YOU?

One thing I started doing in the course of my career is that when I’d interview, one of my counter-questions would be “what does [job title] mean to you?” followed by “what are the duties, tasks, and day-to-day for [job title] at this organization?”

I remember one place I was interviewing for an “Architect” role. But the job description’s was more fitting for a T-SQL developer & performance tuning expert. And I was right. They defined “Architect” totally differently than how I would. So we pivoted our conversation on the actual needs and tasks and the day-to-day.

So What Does Matter?

In my opinion, job titles are meaningless. For me, the only thing that does matter is what set of duties are expected and most importantly, whether the pay is proper for what is being expected.

“Call me a data janitor for all I care, as long as the paycheck clears the bank.”

Thanks for reading – happy T-SQL Tuesday!

Backup Internals – Part 2: Backup Buffers

July 26, 2023Andy (@SQLBek) 6 Comments

Welcome to Part 2 of SQL Server Backup Internals. This blog series is a companion piece to my How to Accelerate Your Database Backups for MSSQLTips.com. In Part 1 of this blog series, I introduced the parts of a BACKUP operation.

Blog Series Tables of Contents

Now let’s start focusing on performance characteristics. We can impact the performance of a BACKUP operation by making changes to or more of the following:

Increase the Number of Backup Buffers
Increase the Size of Backup Buffers
Add more CPU Reader Threads
Add more CPU Writer Thread

Analogy Time!

I like to explain things with silly analogies and today will be no exception. Today, let’s pretend that we are in a shipping warehouse.

Track the flow of widgets from the left to the right, then back to the left, in a clockwise fashion.

The person on the left “reads” the contents of the main container and makes copies of each widget, tossing as many widgets as they can into an available yellow basket. When the basket is full, it is sent to the other employee, who takes the contents of each basket and places it into the truck. And they also return the basket back to the first employee to be refilled.

In this silly analogy, the left employee represents a single CPU Reader Thread. They are copying data pages out of a database file and placing them into a yellow basket, which represents a backup buffer. The stack of empty yellow baskets represents the other backup buffers waiting to be filled in the Free Queue. Once a basket is filled, it is sent over to the Data Queue, which the other employee picks up one by one. That employee takes the contents of the basket and loads it into the truck. This is analogous to a CPU Writer Thread consuming a backup buffer and writing out to the backup target.

Note: The duration of transferring backup buffers from one queue to another is effectively nil. “Transfer” is probably not the best word choice here as I don’t believe any actual data movement occurs.

Backup Buffers

To help accelerate our backup operations, we can make two adjustments to our backup buffers – adding more backup buffers and increasing the size of the backup buffers.

Add More Baskets

First, let’s explore adding additional backup buffers. That’d be like having more than seven baskets to pass blocks back and forth in the diagram above. In a perfect system, both employees would be processing baskets at perfectly equal speeds. But the real world isn’t perfect – one employee may be much faster than the other.

Scenario 1

Let’s pretend the first employee is the slower one. They can fill a basket at a rate of 1 basket every 10 seconds. The second employee can process a basket at a rate of 1 every 5 seconds. So when they begin work, the second employee is sitting idle waiting for a basket to arrive. It is able to process the received basket quickly, pass it back, then sits and waits for the first employee to complete filling another basket.

Scenario 2

Conversely, if the first employee can fill a basket at a rate of 1 per second, they’ll have sent all 7 available baskets to the other employee after 7 seconds. The second employee still takes 5 seconds to process a filled basket, but the first employee will have to wait 4 seconds between receiving free baskets after they get going. The second employee now has a backlog of baskets to process because it is much slower than the first employee.

Time lapse chart to help illustrate state of the Reader and Writer thread at different time intervals.

Check out the time lapse chart. One thing that is very clear is that due to the different processing rates, either the Reader or Writer will incur some idle time. Imagine what this might look like if the number of buckets was larger or smaller? And imagine what this might look like if the processing rates were orders of magnitude different?

Use Bigger Baskets

The other adjustment that can be made is to make each basket bigger. Depending on the rate of filling or processing a basket, making all baskets larger can increase overall throughput of the process. If we doubled the size of the basket, the fill rate and process rate would still increase proportionally. So in the first scenario above, the fill rate would be 1 basket every 20 seconds and the processing rate would be 1 basket every 10 seconds. And the second scenario, we’d see a fill rate of 1 basket every 2 seconds and a processing rate of 1 basket every 10 seconds.

Are you seeing a bit of a “so what’s the benefit, why bother?” conundrum here? Hold that thought for a bit.

What Are the T-SQL Parameters?

Skipped over the analogy to the good stuff? Here you go:

BUFFERCOUNT (BC) – Define the number of Backup Buffers to be used
MAXTRANSFERSIZE (MTS) – Define the size of each Backup Buffer

MAXTRANSFERSIZE can range anywhere from 64KB to 4MB. Frankly, I’m not sure what the allowed range of values is for BUFFERCOUNT. But because of the workflow above, there is a point where having too many Backup Buffers are a waste, because neither side can fill or process them fast enough.

Additionally, there’s a resource price to pay in the form of RAM consumed. Very simply, your BACKUP operation will consume BUFFERCOUNT * MAXTRANSFERSIZE in RAM. So if you run a BACKUP with BUFFERCOUNT = 10 and MAXTRANSFERSIZE = 1MB, your operation will consume 10MB of RAM. But if you crank things up, BUFFERCOUNT = 100 and MAXTRANSFERSIZE = 4MB, that’s now 400MB of RAM for your operation.

What about the existing SQL Server Buffer Pool? Backup Buffers are allocated outside of buffer pool memory. If you don’t have enough RAM available, SQL Server will shrink the buffer pool as needed to accommodate your backup buffer requirement. (I plan to talk about overall resource balancing in a future part of this series.)

What’s the Benefit?

So what’s the overall benefit of messing with these two parameters. If one is solely leveraging these two parameters and no other adjustments, not much really. The main bottleneck your BACKUP operation will experience is due to only having 1 Reader and 1 Writer thread. So adding a few more Backup Buffers can help your overall performance a bit, but as you saw with the time lapse examples, if you have different processing rates on one side or the other, the extra backup buffers don’t really help – they’ll just sit in one of the queues waiting on the CPU threads.

It is far more beneficial to make adjustments to BUFFERCOUNT and MAXTRANSFERSIZE if you also make adjustments to the number of Reader and/or Writer threads for your BACKUP operation. And that’s what I’ll be covering in Part 3 of this series, so stay tuned! Thanks for reading!