In this post, I wanted to write about the covering index and key lookup and how adding covering indexes can help increase performance by reducing lookups to the table data.
It’s helpful to know what each of these terms mean and then we will look at an example scenario further down the post.
What is a covering index?
A covering index is a non-clustered index which includes all columns referenced in the query and therefore, the optimizer does not have to perform an additional lookup to the table in order to retrieve the data requested. As the data requested is all indexed by the covering index, it is a faster operation.
To be considered as a covering index, all columns referenced by the query must be contained in the index. So this means all columns in the SELECT, JOIN, WHERE, GROUP BY, HAVING etc etc.
What is a key lookup?
A key lookup is an extra read which the query has to make in the absence of a suitable covering index.
In the query, the number of columns involved in the select statement may exceed the number of columns in the non-clustered index and when this happens, for each row in the result set, a key lookup is performed. So 500 rows in the result set equates to 500 extra key lookups.
Key lookup medicine – AKA the covering index!
Lets look at a simple query to help explain what I was talking about above. Imagine that dbo.MyTable consists of just two columns “Col1” and “Col2”.
[sourcecode language=’sql’]SELECT Col1 FROM dbo.MyTable WHERE Col2 = 1;[/sourcecode]
For this query to be considered covered by an index, columns “Col1” and “Col2” need to be part of the same index and you could create different variants of indexes which would act as a covering index.
For example, you could do:
[sourcecode language=’sql’]CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col1, Col2);
CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col2, Col1);[/sourcecode]
You could also use included columns:
[sourcecode language=’sql’]CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col1) INCLUDE (Col2);
CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col2) INCLUDE (Col1);[/sourcecode]
So the columns need only be found in the index and the column order or whether they are an “included column” or not does not matter. It is important to remember that the execution plan and performance may vary considerably depending on which index is applied however, so choosing the correct covering index is important.
Lets look at how the covering index removes the need for a key lookup.
I am going to use this simple query against a copy of the AdventureWorks2012 database using SQL Server 2012 Express.
I’ll enable the execution plan output and enable STATISTICS IO output.
[sourcecode language=’sql’]SET STATISTICS IO ON;
SELECT CustomerID, OrderDate
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
WHERE CustomerID = 11000;[/sourcecode]
There is an non-clustered index on the CustomerID column. The clustered index on this table is on a column not referenced by the query called SalesOrderID.
The output looks like this:
STATISICS IO output….
Table 'SalesOrderHeader'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Now lets look at the execution plan
You can see that 68{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de} of the total cost of this query has been allocated to the key lookup against the clustered index.
Let’s see what happens to it after we modify the index to cover the columns in the query.
[sourcecode language=’sql’]DROP INDEX Sales.SalesOrderHeader.IX_SalesOrderHeader_CustomerID;
CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_CustomerID_OrderDate
ON Sales.SalesOrderHeader(CustomerID, OrderDate);[/sourcecode]
The key lookup is gone (well you were expecting that right 🙂 ) and now the query is retrieving the data entirely from the covering index.
Lets look at STATISTICS IO again
Table 'SalesOrderHeader'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
A big reduction can be seen in the number of logical reads compared to the first query. Logical reads are the pages read from the data cache regardless of whether the pages had to be read from disk or not. In both cases, they did not need to be read from disk (physical reads = 0), both queries read their pages direct from the data cache. As the second one read less pages from the data cache, that is still a good indicator that the covering index has improved performance.
If I ran DBCC DROPCLEANBUFFERS (do not do this on a production server as you could see massive performance reduction until the cache is re-populated) before each query and index change in order to force the optimizer to read from disk, then STATISTICS IO would look like:
--BEFORE covering index Table 'SalesOrderHeader'. Scan count 1, logical reads 11, physical reads 7, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --AFTER covering index Table 'SalesOrderHeader'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
You can see the differences in the number of pages read from disk and this helps to show the benefit of having the covering index in place. Once the data pages are read from disk, they enter the data cache. All queries get their results by reading the data cache.
The second query reads less data pages into the data cache and subsequently, there is more room in the data cache. More room means more pages can be read into cache and the frequency for disk reads becomes less with performance increasing.
Should you apply a covering index in all cases?
Indexing is a complex subject and there is no definitive “one size fits all” solution to indexing a database. You have to know what is possible and then assess your needs. You should always test your index changes properly before putting them into production.
An analysis of server workload will help you figure this out. A few things to consider:
- Is existing query performance acceptable?
- How often is the query running?
- How selective is the query?
- Is the table you are adding the index to write intensive?
Remember, a query may perform well and return results in good time. What if it is running many hundreds or thousands of times a minute? Cumulatively without a covering index, it could be generating many more extra IO’s than necessary.
You need to assess whether the frequency and selectivity of the query warrants adding a covering index. Covering indexes have the greatest benefit to non-selective queries.
In addition, if your system is write intensive, adding an index could have the opposite affect of what you are trying to do by slowing down overall database performance. This is because for each update to the data, the indexes have to be updated as well.
For more information on SQL Server index creation, you can visit this link which talks about the covering index and included columns.
Anandan Kanagarajan says
Excellent post and simple and clear definitions about Key Lookup & Covering Index. Thanks dude and please continue to blog…
Andy Hayes says
Many thanks Anandan, I’m glad you liked the post.
Anandan Kanagarajan says
Andy Hayes,
It will be great if you continue this topic with Filtered Index (from 2008 on wards) and Column store index (from 2012 on wards)
Also shed some light on RID Look up.
Andy Hayes says
Some good ideas there Anandan, keep checking back for further updates! 🙂
Jéssica says
Excellent Post. I’m Brasilian DBA and I didn’t find any good explanation about this, like i found here. Thank you for this blog. Please, continue the blog.
Andy Hayes says
Many thanks for your comment Jéssica, I’m glad you found the article useful! 🙂
sunil kumar anna says
Thanks Andy.. it’s is useful