Indexes are critical for fast data retrieval and efficient query performance in SQL Server. A clustered index defines the order in which a table's data rows are stored, based on the index key values. This guide explores clustered indexes in SQL Server in depth: how they store data, when to use them, and how to create and maintain them.

What are Clustered Indexes?

A clustered index sorts and stores the rows of a table or view by the chosen index key values. The leaf level of a clustered index is the table's data pages themselves, so the index and the table are one structure.

Key Properties:

  • A table can have only one clustered index, because its rows can be stored in only one order.
  • Leaf-level nodes contain the data rows themselves, not pointers to other pages.
  • Inserts are cheap when new keys land at the end of the index, but a row whose key falls in the middle of a full page forces a page split, which is expensive.
  • Well suited to retrieving ranges of key values quickly.
  • By default, a primary key constraint creates a clustered index if the table does not already have one.

Having defined what clustered indexes are, let us now understand how they physically organize data.

How Clustered Indexes Physically Store Data

Consider the following simple inventory table with some starter rows:

CREATE TABLE Inventory (
    InventoryID int IDENTITY(1,1) PRIMARY KEY, 
    ProductName varchar(100) NOT NULL,
    Quantity int NOT NULL DEFAULT 0, 
    Price money NOT NULL
);

INSERT INTO Inventory (ProductName, Quantity, Price)
VALUES ('Product1', 10, 25.50),
       ('Product2', 5, 10.75),
       ('Product3', 8, 17.99);

Here, the InventoryID column is set as the primary key, so SQL Server automatically creates a clustered index on it.
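To confirm which index SQL Server created, the catalog view sys.indexes can be queried. The following is a minimal sketch that assumes the Inventory table above lives in the current database and default schema:

```sql
-- List the indexes on the Inventory table and show which one is clustered.
-- Assumes the table was created in the current database.
SELECT i.name           AS index_name,
       i.type_desc,                     -- HEAP, CLUSTERED, or NONCLUSTERED
       i.is_primary_key,
       i.is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'Inventory');
```

For a table created as shown, the result should contain a single row with type_desc = CLUSTERED and is_primary_key = 1. The index name is generated by SQL Server because the constraint was not named explicitly.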

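When rows are inserted, the following happens:

1. SQL Server locates the leaf page where each row belongs according to its InventoryID

2. Rows with adjacent key values are stored next to each other in leaf-level pages

3. Because IDENTITY values only increase, new rows are appended to the last page, and a new page is allocated when it fills up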

For example, the initial storage layout will be:

Page1:
InventoryID | ProductName | Quantity | Price
          1 | Product1    |       10 | 25.50
          2 | Product2    |        5 | 10.75
          3 | Product3    |        8 | 17.99

Because InventoryID is the clustering key, its value determines where each row is stored. An update that changes the key moves the row to its new position in the index.

Benefits of Clustered Indexes

Some major benefits provided by clustered indexes:

Rapid Data Access in Range Queries

Because the data is stored in key order, retrieving records whose InventoryID falls within a range, e.g. 5000-8000, is fast: SQL Server seeks to the first matching row and reads the following pages in order, with no extra sort step.

Optimized for OLTP Workloads

Online transaction processing (OLTP) systems run frequent inserts, updates, and deletes. With an ever-increasing clustering key, new rows are appended at the end of the index, which keeps page splits and fragmentation low.

More Efficient Disk I/O

Reads over a key range are largely sequential, whereas on a heap they can be random. Related rows sit on the same or neighboring pages, which improves buffer cache hit rates.

When Should We Use Clustered Indexes?

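Ever-Increasing Columns: Monotonically increasing columns such as identity, sequence, or insert-time datetime values make good clustering keys, because new rows are appended at the end of the index.

Frequently Range-Searched Columns: Columns such as dates or categories are often filtered with range predicates, so clustering on them speeds up those queries.

Columns Frequently Accessed in Joins: Joins can be faster when the join columns are the clustering key, since matching rows can be read in key order.

High Cardinality Columns: Unique or nearly unique keys are preferable. When a clustering key is not unique, SQL Server adds a hidden uniqueifier to duplicate key values, which widens the key.

Conversely, avoid wide clustering keys. The clustering key is copied into every nonclustered index as the row locator, so a wide key inflates all of them, and a clustered index key cannot exceed 900 bytes in any case.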
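The primary key and the clustering key do not have to be the same column. The sketch below uses a hypothetical OrderEvents table (all names are illustrative, not from this guide's schema) whose primary key is a GUID, while the clustered index is placed on a narrow, ever-increasing, frequently range-searched datetime column:

```sql
-- Hypothetical table: the primary key is enforced by a NONCLUSTERED index,
-- leaving the single clustered index free for a better clustering key.
CREATE TABLE dbo.OrderEvents (
    EventID   uniqueidentifier NOT NULL
        CONSTRAINT PK_OrderEvents PRIMARY KEY NONCLUSTERED,
    EventTime datetime2(3)     NOT NULL,
    Payload   varchar(400)     NULL
);

-- Cluster on the insert-time column: narrow, ever-increasing, range-searched.
-- It is not declared UNIQUE because two events may share a timestamp.
CREATE CLUSTERED INDEX CIX_OrderEvents_EventTime
    ON dbo.OrderEvents (EventTime);
```

Whether this split is worthwhile depends on the workload; it is most useful when range queries on the time column dominate and lookups by primary key are comparatively rare.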

Impact of Clustered Indexes on Table Queries

Let's analyze the performance difference clustered indexes can make with some examples.

1. Table Scan vs Clustered Index Seek

--Inventory table without clustered index (a heap)
CREATE TABLE Inventory_Heap (
    InventoryID int,
    ProductName varchar(100) NOT NULL,
    Quantity int NOT NULL DEFAULT 0,
    Price money NOT NULL
);

--Insert 1 million rows..

--Query to find a row by InventoryID
SELECT * FROM Inventory_Heap
WHERE InventoryID = 500000;

--Performs TABLE SCAN (~1 million rows scanned)

Now with clustered index on InventoryID column:

--Inventory table with clustered index
CREATE TABLE Inventory_Clustered (
    InventoryID int IDENTITY(1,1) PRIMARY KEY,
    ProductName varchar(100) NOT NULL,
    Quantity int NOT NULL,
    Price money NOT NULL
);

--Insert 1 million rows..

--Same query now performs clustered index SEEK
SELECT * FROM Inventory_Clustered
WHERE InventoryID = 500000;

--Reads only the few pages on the path from the root to the matching leaf

The seek touches a handful of pages instead of the whole table. This holds only because the filter is on the clustering key: a filter on ProductName would still scan the entire clustered index unless a nonclustered index on ProductName exists.

2. Improved Range Query Performance

-- fetching IDs 1-1000 without clustered index 

SELECT * FROM Inventory_Heap
WHERE InventoryID BETWEEN 1 AND 1000;

-- Table scan checks 1 million rows 

-- Now with clustered index
SELECT * FROM Inventory_Clustered
WHERE InventoryID BETWEEN 1 AND 1000;  

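-- Direct range seek on clustered key
-- Reads only the pages holding the 1000 matching rows

The seek reads far fewer pages than the table scan. The exact speedup depends on row size, caching, and hardware.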
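Rather than relying on an estimated speedup, the difference can be measured directly. A minimal sketch, assuming both tables above have been created and loaded with the same rows:

```sql
-- Report logical reads and elapsed time for each statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Heap: expect logical reads close to the total page count of the table.
SELECT * FROM Inventory_Heap
WHERE InventoryID BETWEEN 1 AND 1000;

-- Clustered: expect logical reads limited to the pages covering the range.
SELECT * FROM Inventory_Clustered
WHERE InventoryID BETWEEN 1 AND 1000;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the "logical reads" figures for the two statements gives a hardware-independent view of how much less work the range seek does.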

Guidelines for Designing Clustered Indexes

Here are some useful tips:

Choose Unique and Ever-Increasing Columns

Optimal choices are identity columns, incrementing integers, datetimes, etc. New rows are then appended at the end of the index, which keeps inserts fast.

Drop Unused Additional Nonclustered Indexes

Every extra index adds overhead to inserts and updates. Analyze index usage and remove nonclustered indexes that are never used.
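One way to find candidates for removal is the sys.dm_db_index_usage_stats dynamic management view. The query below is a sketch, not a definitive audit: the view's counters are reset when the instance restarts, so the results are only meaningful after the server has seen a representative workload.

```sql
-- Nonclustered indexes in the current database with no recorded reads.
-- Indexes backing primary key or unique constraints are excluded.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       ISNULL(us.user_updates, 0) AS write_count
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE i.type_desc = N'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.is_primary_key = 0
  AND i.is_unique_constraint = 0
  AND ISNULL(us.user_seeks, 0)
    + ISNULL(us.user_scans, 0)
    + ISNULL(us.user_lookups, 0) = 0
ORDER BY write_count DESC;
```

Indexes at the top of the list are maintained on every write but have not been read since the counters were last reset, which makes them the first ones to review.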

Start Small, Add Nonclustered Indexes Judiciously

Keep the clustered index key narrow, with only a few important columns. Add nonclustered indexes for the filters that reporting queries use.

Rewrite Queries Accessing Huge Range of Pages

If queries frequently scan large index ranges, review their WHERE clauses.

Rebuild Indexes Periodically

Frequent inserts and updates fragment indexes over time. Rebuild or reorganize them when fragmentation becomes significant.
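Fragmentation can be measured before deciding whether maintenance is needed. A minimal sketch using sys.dm_db_index_physical_stats, assuming the Inventory table from earlier in this guide:

```sql
-- Fragmentation and size of every index on the Inventory table.
-- 'LIMITED' is the cheapest scan mode; it reads only the pages above the leaf level.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'Inventory'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

A commonly quoted rule of thumb is to reorganize at moderate fragmentation (roughly 5-30%) and rebuild above that, and to ignore very small indexes. Treat those numbers as a starting point to tune for your own workload, not as fixed thresholds.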

Methods to Create Clustered Indexes

We can create clustered indexes in SQL Server using:

1. CREATE TABLE Statement

CREATE TABLE Orders 
(
    OrderID int IDENTITY(1,1) PRIMARY KEY,
    CustomerName varchar(100)
);

The primary key constraint implicitly creates a clustered index.

2. CREATE CLUSTERED INDEX Statement

CREATE TABLE Orders
(
    OrderID int,
    CustomerName varchar(100)
);

CREATE CLUSTERED INDEX idx_orderbyid 
ON Orders (OrderID);

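This explicitly defines a clustered index without a primary key. If the Orders table from the first method still exists, drop it before running this version.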

3. Table Designer in SSMS

Use the visual Table Designer in SSMS to define or modify the clustered index.

Now let us look at some detailed examples.

SQL Server Create Clustered Index: Examples

-- Sample table
CREATE TABLE Data 
(
    ID int IDENTITY(1,1),
    Name varchar(100),
    DateAdded date
);  

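-- A table can have only one clustered index, so the three examples
-- below are alternatives: drop the previous clustered index (or
-- re-create the table) before trying the next one.

-- Example 1: 
-- Create clustered index on primary key column

ALTER TABLE Data
ADD CONSTRAINT pk_id PRIMARY KEY CLUSTERED (ID);


-- Example 2:  
-- Composite clustered index combining 
-- the ID column + another column

CREATE CLUSTERED INDEX idx_data 
ON Data (ID, DateAdded);

-- Example 3: 
-- Unique clustered index 
-- on non-primary key
-- (fails if DateAdded contains duplicate values)

CREATE UNIQUE CLUSTERED INDEX idx_dateadded
ON Data (DateAdded);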

Now let us go over some key clustered index maintenance tasks.

Maintaining Clustered Indexes

Here are common admin tasks around managing clustered indexes:

1. Rebuild To Restore Order

Frequent inserts and updates fragment the clustered index. Rebuilding re-creates the index with its pages in key order.

ALTER INDEX ALL ON Data
REBUILD; 

2. Reorganize To Defrag Pages

Reorganizing defragments the leaf pages in place. It is lighter than a rebuild and always runs online.

ALTER INDEX ALL ON Data 
REORGANIZE;

3. Switch Clustered Indexes

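Re-creates an existing clustered index with a new key in a single operation, without data loss. The index name must match the existing clustered index (here idx_data from Example 2). The old definition is replaced, not kept as a nonclustered index, and the table's nonclustered indexes are rebuilt because their row locators change.

CREATE CLUSTERED INDEX idx_data  
ON Data (Name)  
WITH (DROP_EXISTING = ON);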

4. Disable Clustered Indexes

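Disabling a clustered index makes the whole table inaccessible: no reads or writes are possible until the index is rebuilt. It therefore does not help bulk inserts. For bulk loads, disable the nonclustered indexes instead and rebuild them afterwards.

ALTER INDEX idx_data ON Data DISABLE;  

-- Table data cannot be read or modified while the clustered index is disabled

-- REBUILD re-enables the clustered index; use ALTER INDEX ALL to
-- re-enable the nonclustered indexes in the same statement
ALTER INDEX idx_data ON Data REBUILD; 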

Clustered Index Architecture Internals

Understanding the internal structure of clustered indexes helps optimize design choices:

  • Clustered indexes use a balanced tree (B-tree) structure to organize data
  • The tree consists of a root page, intermediate index pages, and leaf pages holding the data rows
  • Leaf pages are doubly linked, so ordered scans can run in either direction
  • Rows with the same clustering key are stored next to each other, though they may span several pages; for non-unique keys SQL Server adds a hidden uniqueifier to tell duplicates apart
  • Nonclustered indexes use the clustering key as the row locator that points back to the clustered row

B-Tree Characteristics

  • Fewer levels (lower height) mean fewer page reads per seek
  • Value ranges are divided among the intermediate pages
  • Intermediate pages guide the search from the root to the correct leaf
  • The fill factor controls how full leaf pages are left when the index is built or rebuilt

Optimizing Performance

  • Monitor height by checking index metadata
  • Keep height low by choosing narrow clustering keys
  • Configure fill factor to balance page density against page splits
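The points above can be sketched as follows, assuming the Data table and the idx_data clustered index from the examples earlier in this guide. The fill factor of 90 is an illustrative value, not a recommendation.

```sql
-- Inspect the clustered index (index_id = 1) level by level.
-- index_depth is the height of the B-tree; index_level 0 is the leaf level.
-- 'DETAILED' reads every page, so run it off-peak on large tables.
SELECT ps.index_depth,
       ps.index_level,
       ps.page_count,
       ps.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'Data'), 1, NULL, 'DETAILED') AS ps
ORDER BY ps.index_level DESC;

-- Rebuild leaving 10% free space on each leaf page to absorb future inserts.
ALTER INDEX idx_data ON Data
REBUILD WITH (FILLFACTOR = 90);
```

A lower fill factor reduces page splits for keys inserted mid-range but makes the index larger, so scans read more pages. With an ever-increasing key there is little to gain from leaving free space.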

Nonclustered Index Overview

While we concentrated on clustered indexes, nonclustered indexes also serve an important role in SQL Server performance optimization. Below is a brief primer.

Key Properties

  • Do not affect the physical order of rows
  • Up to 999 nonclustered indexes per table (249 before SQL Server 2008)
  • Key columns are stored both in the index leaf level and in the actual rows
  • Extra storage for each index, in exchange for fast seeks on its key
  • Each index adds some overhead to inserts, updates, and deletes

When Useful

  • Search keys are different from clustered index definition
  • Want lightning fast row lookups
  • Queries need alternative access path
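As an illustration, a query that filters on ProductName cannot seek on a clustered index keyed by InventoryID. A nonclustered index provides the alternative access path. This sketch assumes the Inventory table from earlier in this guide; the index name is hypothetical.

```sql
-- Nonclustered index keyed on ProductName. The INCLUDE columns are stored
-- at the leaf level so the query below can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_Inventory_ProductName
    ON Inventory (ProductName)
    INCLUDE (Quantity, Price);

-- Can now use an index seek on IX_Inventory_ProductName
-- instead of scanning the whole clustered index.
SELECT ProductName, Quantity, Price
FROM Inventory
WHERE ProductName = 'Product1';
```

Including Quantity and Price avoids a key lookup back into the clustered index for each matching row, at the cost of a larger index that must be maintained on every write to those columns.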

By combining both index types effectively as per data usage patterns, we can build high performance SQL Server databases.

Summary

Some key takeaways:

1. Clustered indexes store data in key order for fast I/O
2. Carefully select narrow, ever-increasing columns for efficiency
3. Dropping unused indexes improves write throughput
4. Rebuild or reorganize indexes when fragmentation builds up
5. Nonclustered indexes complement the clustered index for fast seeks
6. Understand internals to optimize designs

I hope this guide helps you master clustered indexes in SQL Server! Let me know if you have any other related questions.
