Clustered Tables
Clustered Index
Clustered indexes sort and store the data rows in the table or view based on their key values. These key values are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
In a clustered index structure, data is organized in a B-tree (Balanced Tree), which is highly efficient for searches, insertions, and deletions. Unlike a heap, where data is stored in an unstructured format, a clustered index arranges the table data itself in a sorted order based on the indexed column, allowing quick access to rows.
Example : MS SQL
CREATE TABLE Customers ( CustomerID INT PRIMARY KEY, Name NVARCHAR(50), Age INT, City NVARCHAR(50) );
Information about the indexes
This line means that the Customers table has a clustered index on the CustomerID column, which is also the primary key, making it both unique and sorted. This clustered index is stored in the PRIMARY filegroup. Because it’s a clustered index, the data rows in the table are physically ordered according to the CustomerID values, allowing for efficient retrieval of rows based on this key.
- Use
sp_helpindex
when you need a quick, summary view of all indexes on a single table. - Use
sys.indexes
when you need detailed information or want to filter or analyze index data across the database.
EXECUTE sp_helpindex Customers ;
index_name | index_description | index_keys |
---|---|---|
PK__Customer__A4AE64B802A9C4B4 | clustered, unique, primary key located on PRIMARY | CustomerID |
SELECT i.name AS IndexName, t.name AS TableName, i.index_id, i.type_desc AS IndexType, i.is_unique, i.is_primary_key FROM sys.indexes i JOIN sys.tables t ON i.object_id = t.object_id WHERE i.type = 1;
IndexName | IndexType | index_id | object_id |
---|---|---|---|
PK__Customer__A4AE64B802A9C4B4 | CLUSTERED | 1 | 311672158 |
- The output indicates that a clustered index exists on the CustomerID column in the Customers table. This index was not created explicitly by us; instead, it was automatically generated when we defined the primary key constraint on CustomerID
- By default, SQL Server creates a clustered index for the primary key, ensuring that the CustomerID values are unique and that the data is stored in a sorted order based on this column. This clustered index helps optimize data retrieval by ordering rows physically according to CustomerID.
Example
CREATE TABLE clustTable( myid int IDENTITY(1,1) PRIMARY KEY CLUSTERED ,mychar VARCHAR(3500) DEFAULT REPLICATE('a',3500) ) INSERT INTO dbo.clustTable DEFAULT VALUES;
SELECT * FROM clustTable ;
Result
myid | mychar |
---|---|
1 | aaaa... |
2 | aaaa... |
3 | aaaa... |
4 | aaaa... |
5 | aaaa... |
6 | aaaa... |
7 | aaaa... |
Messages 3:29:29 PMStarted executing query at Line 15 (7 rows affected) Table 'clustTable'. Scan count 1, logical reads 6, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0. Total execution time: 00:00:00.011
DBCC ind(training_db, clustTable, 1) go
PageFID | PagePID | IAMFID | IAMPID | ObjectID | IndexID | PartitionNumber | PartitionID | iam_chain_type | PageType | IndexLevel | NextPageFID | NextPagePID | PrevPageFID | PrevPagePID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 305 | NULL | NULL | 901578250 | 1 | 1 | 72057594043695104 | In-row data | 10 | NULL | 0 | 0 | 0 | 0 |
1 | 312 | 1 | 305 | 901578250 | 1 | 1 | 72057594043695104 | In-row data | 1 | 0 | 1 | 314 | 0 | 0 |
1 | 313 | 1 | 305 | 901578250 | 1 | 1 | 72057594043695104 | In-row data | 2 | 1 | 0 | 0 | 0 | 0 |
1 | 314 | 1 | 305 | 901578250 | 1 | 1 | 72057594043695104 | In-row data | 1 | 0 | 1 | 315 | 1 | 312 |
1 | 315 | 1 | 305 | 901578250 | 1 | 1 | 72057594043695104 | In-row data | 1 | 0 | 1 | 316 | 1 | 314 |
1 | 316 | 1 | 305 | 901578250 | 1 | 1 | 72057594043695104 | In-row data | 1 | 0 | 0 | 0 | 1 | 315 |
The output of the DBCC IND command provides a detailed breakdown of the page-level information in SQL Server for the specified table (clustTable). This command is typically used to understand the structure and storage of data in SQL Server at the page level. Here’s what each of the columns in this output represents:
Column Breakdown:
-
PageFID (File ID): Indicates the file ID within the database where the page is located. In this case, it’s 1 for all rows, meaning all pages reside in the same file.
-
PagePID (Page ID): The page number within the specified file. This, combined with PageFID, uniquely identifies a page. For instance, page 305 would be located in file 1.
-
IAMFID (IAM File ID) and IAMPID (IAM Page ID): If a page is part of an IAM (Index Allocation Map) chain, these columns show the file ID and page ID of the IAM page that maps this page. In this case, only rows with PageType equal to 1 (data pages) have IAM mapping data (IAMFID = 1, IAMPID = 305).
-
ObjectID: The unique identifier for the table or index within the database. Here, 901578250 represents the clustTable object.
-
IndexID: Specifies the index ID on the object. IndexID = 1 denotes a clustered index (the primary key on myid in this case).
-
PartitionNumber: Indicates the partition number of the index. Here, 1 suggests all data is within a single partition.
-
PartitionID: The unique identifier for the partition within which these pages are stored.
-
iam_chain_type: Describes the storage type of the pages. In-row data signifies that data is stored within the rows on these pages.
-
PageType: Represents the type of page:
- 10: Indicates an index page, specifically the root page in the index (topmost level).
- 1: Represents a data page containing actual table data.
- IndexLevel: The level of the index where the page resides:
- 10: Root level in a clustered index (top of the index hierarchy).
- 2: Intermediate level (if applicable).
- 1: Leaf level (contains actual data rows).
-
NextPageFID and NextPagePID: Point to the next page in the same level of the index or data structure. For example, page 312 points to page 314 as its next page.
-
PrevPageFID and PrevPagePID: Point to the previous page in the same level. For example, page 314 points back to page 312.
Interpreting the Output:
- This output shows the structure of pages in the clustered index for clustTable, detailing the root page (with IndexLevel = 10), as well as intermediate and data pages (with IndexLevel = 1).
- Page 305 serves as a root page (or higher-level index page).
- Pages 312, 313, and 314 are data pages at the leaf level (IndexLevel = 1) containing the actual rows of data in clustTable.
Clustered Index Scan vs. Clustered Index Seek
Clustered Index Scan: Reads every row in the clustered index, typically resulting in a full table scan. Scans are generally less efficient than seeks, especially for large tables, as they may lead to higher I/O costs.
SELECT * FROM clustTable ;
Fig: Query Plan - Clustered Index Scan
Clustered Index Seek: Used when SQL Server can directly locate specific rows based on indexed column values, making it much faster than a scan. For instance, if you had a query like SELECT * FROM clustTable WHERE myid = 5;, SQL Server would likely perform a Clustered Index Seek because it can directly access the page containing myid = 5 instead of reading the entire table.
SELECT * FROM clustTable where myid=5 ;
Fig: Query Plan - Clustered Index Scan
Single Table Clustering
Single Table Clustering involves organizing the data of a single table based on the values of one or more columns, creating a clustered index on those columns. In a clustered table, the data rows themselves are physically ordered and stored according to the index key values of one or more columns. This type of clustering is particularly effective for queries that frequently search based on the clustered key values.
How Single Table Clustering Works:
- The clustered index is created on one or more columns of a single table.
- The data is stored on disk in an order that aligns with this index, so retrieval based on the indexed column(s) is fast.
- Each table can only have one clustered index because the data can be physically sorted in only one order.
- This approach reduces the number of I/O operations required when querying because the data is already sorted according to the clustered key, making range queries, joins, and sorting operations faster.
Example
CREATE TABLE People ( PersonID INT PRIMARY KEY IDENTITY(1,1), FirstName NVARCHAR(50), LastName NVARCHAR(50), Age INT, City NVARCHAR(50) ); -- INSERT DATA IN PEOPLE TABLE DECLARE @i INT = 1; WHILE @i <= 1000 BEGIN INSERT INTO People (FirstName, LastName, Age, City) VALUES ( 'First' + CAST(@i AS NVARCHAR(50)), 'Last' + CAST(@i AS NVARCHAR(50)), FLOOR(RAND() * 100) + 1, -- Random age between 1 and 100 'City' + CAST((@i % 10) AS NVARCHAR(50)) -- Random cities (City0 to City9) ); SET @i = @i + 1; END; -- IF ALREADY CREATED CLUSTERED INDEX, THEN REMOVE IT ALTER TABLE People DROP CONSTRAINT PK__People__AA2FFB859D450124; -- CREATE CLUSTERED INDEX CREATE CLUSTERED INDEX IX_People_Age ON People (Age);
EXECUTE sp_helpindex People ;
index_name | index_description | index_keys |
---|---|---|
IX_People_Age | clustered located on PRIMARY | Age |
DBCC ind(training_db, People, 1) GO
PageFID | PagePID | IAMFID | IAMPID | ObjectID | IndexID | PartitionNumber | PartitionID | iam_chain_type | PageType | IndexLevel | NextPageFID | NextPagePID | PrevPageFID | PrevPagePID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 308 | NULL | NULL | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 10 | NULL | 0 | 0 | 0 | 0 |
1 | 7872 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7880 | 0 | 0 |
1 | 7880 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7881 | 1 | 7872 |
1 | 7881 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7882 | 1 | 7880 |
1 | 7882 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7883 | 1 | 7881 |
1 | 7883 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7884 | 1 | 7882 |
1 | 7884 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7885 | 1 | 7883 |
1 | 7885 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7886 | 1 | 7884 |
1 | 7886 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7887 | 1 | 7885 |
1 | 7887 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 0 | 0 | 1 | 7886 |
1 | 7912 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 2 | 1 | 0 | 0 | 0 | 0 |
SELECT * FROM People WHERE Age = 7 ;
PersonID | FirstName | LastName | Age | City |
---|---|---|---|---|
62 | First62 | Last62 | 7 | City2 |
29 | First29 | Last29 | 7 | City9 |
58 | First58 | Last58 | 7 | City8 |
21 | First21 | Last21 | 7 | City1 |
229 | First229 | Last229 | 7 | City9 |
345 | First345 | Last345 | 7 | City5 |
302 | First302 | Last302 | 7 | City2 |
414 | First414 | Last414 | 7 | City4 |
431 | First431 | Last431 | 7 | City1 |
368 | First368 | Last368 | 7 | City8 |
572 | First572 | Last572 | 7 | City2 |
808 | First808 | Last808 | 7 | City8 |
650 | First650 | Last650 | 7 | City0 |
980 | First980 | Last980 | 7 | City0 |
889 | First889 | Last889 | 7 | City9 |
Query Plan - Single Table Clustering
DBCC ind(training_db, People, 1) GO
PageFID | PagePID | IAMFID | IAMPID | ObjectID | IndexID | PartitionNumber | PartitionID | iam_chain_type | PageType | IndexLevel | NextPageFID | NextPagePID | PrevPageFID | PrevPagePID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 308 | NULL | NULL | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 10 | NULL | 0 | 0 | 0 | 0 |
1 | 7872 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7880 | 0 | 0 |
1 | 7880 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7881 | 1 | 7872 |
1 | 7881 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7882 | 1 | 7880 |
1 | 7882 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7883 | 1 | 7881 |
1 | 7883 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7884 | 1 | 7882 |
1 | 7884 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7885 | 1 | 7883 |
1 | 7885 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7886 | 1 | 7884 |
1 | 7886 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 1 | 7887 | 1 | 7885 |
1 | 7887 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 1 | 0 | 0 | 0 | 1 | 7886 |
1 | 7912 | 1 | 308 | 997578592 | 1 | 1 | 72057594043957248 | In-row data | 2 | 1 | 0 | 0 | 0 | 0 |
Let's go through the key parts of the data you've posted to understand how SQL Server is organizing this index:
Key Columns in the Output
- PageFID and PagePID: File ID and Page ID of each page, uniquely identifying each page in the database.
- PageType: Type of the page, where:
- 10 generally represents an IAM (Index Allocation Map) page.
- 1 represents a data page at the leaf level (where actual data is stored).
- 2 typically represents an intermediate level page (pointer).
- IndexLevel: The level of the page within the B-tree. NULL for IAM pages, 1 for leaf-level pages, 2 for intermediate pages.
- NextPageFID and NextPagePID: Links to the next page at the same level. Used for scanning within the same level.
- PrevPageFID and PrevPagePID: Links to the previous page at the same level.
You can view the data stored in a specific leaf page in SQL Server, but it requires some low-level commands and careful handling. SQL Server provides the DBCC PAGE command to inspect the contents of a specific page in a database. This command allows you to look at the raw data stored on a page, including leaf pages.
DBCC PAGE (5, 1, 7887, 3);
SELECT *, sys.fn_PhysLocFormatter(%%physloc%%) AS PageInfo FROM People WHERE Age IN (7, 75);
PersonID | FirstName | LastName | Age | City | PageInfo |
---|---|---|---|---|---|
62 | First62 | Last62 | 7 | City2 | (1:7872:68) |
29 | First29 | Last29 | 7 | City9 | (1:7872:69) |
58 | First58 | Last58 | 7 | City8 | (1:7872:70) |
21 | First21 | Last21 | 7 | City1 | (1:7872:71) |
229 | First229 | Last229 | 7 | City9 | (1:7872:72) |
345 | First345 | Last345 | 7 | City5 | (1:7872:73) |
302 | First302 | Last302 | 7 | City2 | (1:7872:74) |
414 | First414 | Last414 | 7 | City4 | (1:7872:75) |
431 | First431 | Last431 | 7 | City1 | (1:7872:76) |
368 | First368 | Last368 | 7 | City8 | (1:7872:77) |
572 | First572 | Last572 | 7 | City2 | (1:7872:78) |
808 | First808 | Last808 | 7 | City8 | (1:7872:79) |
650 | First650 | Last650 | 7 | City0 | (1:7872:80) |
980 | First980 | Last980 | 7 | City0 | (1:7872:81) |
889 | First889 | Last889 | 7 | City9 | (1:7872:82) |
948 | First948 | Last948 | 75 | City8 | (1:7885:53) |
939 | First939 | Last939 | 75 | City9 | (1:7885:54) |
755 | First755 | Last755 | 75 | City5 | (1:7885:55) |
442 | First442 | Last442 | 75 | City2 | (1:7885:56) |
246 | First246 | Last246 | 75 | City6 | (1:7885:57) |
165 | First165 | Last165 | 75 | City5 | (1:7885:58) |
124 | First124 | Last124 | 75 | City4 | (1:7885:59) |
147 | First147 | Last147 | 75 | City7 | (1:7885:60) |
72 | First72 | Last72 | 75 | City2 | (1:7885:61) |
Multi-Level Clustering
Multi-Level Clustering (sometimes called clustered multi-level indexes or composite clustering) extends the concept of single table clustering to multiple tables or groups of data based on hierarchical or composite keys. This approach is beneficial when tables are frequently joined on common fields or when data is naturally hierarchical.
How Multi-Level Clustering Works:
- In multi-level clustering, tables that are frequently queried together (such as through joins) are stored together based on shared or related clustered keys.
- The primary clustering is done on the highest-level key, and additional clustering levels organize data within the clustered groups based on secondary key values.
- This approach minimizes I/O operations when accessing related data from multiple tables, as the data is co-located on disk in a way that matches common access patterns.
Example of Multi-Level Clustering:
Consider an e-commerce database with Customers and Orders tables:
- The Customers table could be clustered based on CustomerID.
- The Orders table could be clustered based on CustomerID and then OrderID.
Creating the table
- The Customers table has a primary key on CustomerID, which automatically creates a clustered index on that column.
- The Orders table has a primary key on OrderID, which automatically creates a clustered index on OrderID.
CREATE TABLE Customers ( CustomerID INT PRIMARY KEY CLUSTERED, LastName NVARCHAR(50), FirstName NVARCHAR(50), City NVARCHAR(50) ); CREATE TABLE Orders ( OrderID INT PRIMARY KEY CLUSTERED, CustomerID INT, OrderDate DATETIME, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) );
Creating a Multi-Column Clustered Index on CustomerID and OrderDate
CREATE CLUSTERED INDEX IX_Orders_CustomerID_OrderDate ON Orders (CustomerID, OrderDate);
EXECUTE sp_helpindex Customers ; EXECUTE sp_helpindex Orders ;
index_name | index_description | index_keys |
---|---|---|
PK__Customer__A4AE64B8703B9D53 | clustered, unique, primary key located on PRIMARY | CustomerID |
index_name | index_description | index_keys |
---|---|---|
IX_Orders_CustomerID_OrderDate | clustered located on PRIMARY | CustomerID, OrderDate |
DBCC ind(training_db, Customers, 1) GO
PageFID | PagePID | IAMFID | IAMPID | ObjectID | IndexID | PartitionNumber | PartitionID | iam_chain_type | PageType | IndexLevel | NextPageFID | NextPagePID | PrevPageFID | PrevPagePID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 307 | NULL | NULL | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 10 | NULL | 0 | 0 | 0 | 0 |
1 | 7840 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7842 | 0 | 0 |
1 | 7841 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 2 | 1 | 0 | 0 | 0 | 0 |
1 | 7842 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7843 | 1 | 7840 |
1 | 7843 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7844 | 1 | 7842 |
1 | 7844 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7845 | 1 | 7843 |
1 | 7845 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7846 | 1 | 7844 |
1 | 7846 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7847 | 1 | 7845 |
1 | 7847 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7848 | 1 | 7846 |
1 | 7848 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7849 | 1 | 7847 |
1 | 7849 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 1 | 7850 | 1 | 7848 |
1 | 7850 | 1 | 307 | 1621580815 | 1 | 1 | 72057594045857792 | In-row data | 1 | 0 | 0 | 0 | 1 | 7849 |
-- Insert 1000 dummy records into Customers DECLARE @i INT = 1; WHILE @i <= 1000 BEGIN INSERT INTO Customers (CustomerID, LastName, FirstName, City) VALUES (@i, CONCAT('LastName', @i), CONCAT('FirstName', @i), 'City' + CAST((@i % 10) AS NVARCHAR(10))); SET @i = @i + 1; END; -- Insert 1000 dummy records into the Orders table DECLARE @Counter INT = 1; WHILE @Counter <= 1000 BEGIN INSERT INTO Orders (OrderID, CustomerID, OrderDate) VALUES ( @Counter, -- OrderID is simply the counter (RAND() * 1000) + 1, -- Random CustomerID between 1 and 1000 DATEADD(DAY, (RAND() * 365), '2023-01-01') -- Random OrderDate in 2023 ); SET @Counter = @Counter + 1; END;
Now that we have created the multi-column clustered index on the Orders table, let's run some queries to see how the multi-column clustered index works.
Query 1: Find all orders for a specific customer and a range of order dates
When you run a SELECT query filtering on CustomerID and OrderDate, the database engine will use the multi-column clustered index to efficiently retrieve the data.
-- Query 1: Find all orders for a specific customer and a range of order dates SELECT o.OrderID, o.CustomerID, o.OrderDate, c.LastName, c.FirstName FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID WHERE o.CustomerID = 500 AND o.OrderDate BETWEEN '2023-01-01' AND '2023-12-31' ORDER BY o.OrderDate;
OrderID | CustomerID | OrderDate | LastName | FirstName |
---|---|---|---|---|
529 | 500 | 2023-03-02 00:00:00.000 | LastName500 | FirstName500 |
441 | 500 | 2023-08-22 00:00:00.000 | LastName500 | FirstName500 |
Query Plan : Query 1 Multi-table clustering
Reference : https://www.youtube.com/watch?v=FJTk310B4A8