DBA Diaries

Thoughts and experiences of a DBA working with SQL Server and MySQL

T-SQL – Determine the Difference in Values Between Columns in Different Rows

Posted on June 21, 2015 Written by Andy Hayes

Have you ever needed to calculate the difference between column values in two or more rows of a table using a T-SQL query? For example, you might want to run some trend analysis or audit data for changes. Thankfully, using ROW_NUMBER() and a Common Table Expression (CTE), this can easily be achieved.

--create our table
DECLARE @TableRows TABLE
(
DateAdded SMALLDATETIME,
Points TINYINT
);

--insert some data
INSERT INTO @TableRows
VALUES
('20150615 15:11:05',10),
('20150615 16:04:23',15),
('20150615 16:23:01',21),
('20150615 16:30:50',22),
('20150615 17:04:07',30);

--set up a CTE which we will perform a self join on
WITH ExampleCTE
AS
(SELECT 
	  ROW_NUMBER() OVER(ORDER BY DateAdded) AS RowNum
	, DateAdded
	, Points
FROM @TableRows)

--now query the CTE using the self join to get our result
SELECT 
	  t1.DateAdded AS CurrentRowDate
	, t2.DateAdded AS PreviousRowDate
	, t1.Points AS CurrentPoints
	, t2.Points AS PreviousPoints
	, t1.Points - t2.Points AS PointsDifference
FROM ExampleCTE t1
LEFT JOIN ExampleCTE t2 ON t1.RowNum = t2.RowNum + 1;
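
For the sample data above, the result looks like this (note that SMALLDATETIME rounds the inserted times to the nearest minute):

CurrentRowDate      PreviousRowDate     CurrentPoints PreviousPoints PointsDifference
------------------- ------------------- ------------- -------------- ----------------
2015-06-15 15:11:00 NULL                10            NULL           NULL
2015-06-15 16:04:00 2015-06-15 15:11:00 15            10             5
2015-06-15 16:23:00 2015-06-15 16:04:00 21            15             6
2015-06-15 16:31:00 2015-06-15 16:23:00 22            21             1
2015-06-15 17:04:00 2015-06-15 16:31:00 30            22             8

(5 row(s) affected)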

The resulting output above shows the calculated column “PointsDifference” containing the difference between each row’s Points value and that of the row before it.


The magic happens in the LEFT JOIN part of the query. By joining t2 on RowNum + 1, each row in t1 is paired with the row immediately before it, so the difference between the “Points” columns can be calculated in a single pass. Using a LEFT JOIN (rather than an INNER JOIN) ensures the first row is still returned, with NULLs where no previous row exists.
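
As an aside, if you are on SQL Server 2012 or later, the LAG() window function can produce the same difference without the self-join. A minimal sketch against the same table variable:

--LAG() returns the previous row's value directly (SQL Server 2012 and later)
SELECT
	  DateAdded
	, Points
	, Points - LAG(Points) OVER(ORDER BY DateAdded) AS PointsDifference
FROM @TableRows;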

So simple, so effective. I love T-SQL! 🙂

Filed Under: All Articles, SQL Tips and Tricks Tagged With: sql server, sql server 2008, sql server 2012, t-sql

SQL Server Certification – 2 for 1 exam offer

Posted on April 29, 2012 Written by Andy Hayes

I came across this today and wanted to share it with you. If you are interested in SQL Server certification, then for a limited time Microsoft are offering “2 for 1” on SQL Server exams.

Between April 11 2012 and June 30 2012, you can purchase a SQL Server exam at full price and then once you have taken it you will be emailed a voucher for the second exam at no additional cost.

Please note that the voucher is for the exam in the next version of your chosen technology.

So for example, you can purchase an exam in SQL Server 2008 and, once you have sat it, you will be emailed a voucher for a SQL Server 2012 exam when the new exams have been released.


The voucher code will be valid for 90 days after the availability of the next version certification.

If you sign up and sit one of the following exams, then the voucher will be available to you.

Exam 70-432 : TS: Microsoft SQL Server 2008, Installation and Maintenance
Exam 70-433 : TS: Microsoft SQL Server 2008, Database Development
Exam 70-448 : TS: Microsoft SQL Server 2008, Business Intelligence Development and Maintenance
Exam 70-450 : PRO: Designing, Optimizing and Maintaining a Database Server Infrastructure using Microsoft SQL Server 2008
Exam 70-451 : PRO: Designing Database Solutions and Data Access Using Microsoft SQL Server 2008
Exam 70-452 : PRO: Designing a Business Intelligence Infrastructure Using Microsoft SQL Server 2008

You can purchase up to five “2 for 1” exam vouchers per technology path and on the sign up page, there are other technologies to choose from.

The link to the sign up page is here

Filed Under: All Articles, Career Development Tagged With: certification, mcitp, mcsa, mcse, mcts, sql server 2008, sql server 2012

SQL Server covering index and key lookup performance

Posted on April 15, 2012 Written by Andy Hayes

In this post, I wanted to write about the covering index and the key lookup, and how adding covering indexes can help increase performance by reducing lookups to the table data.

It’s helpful to know what each of these terms means, and then we will look at an example scenario further down the post.

What is a covering index?

A covering index is a non-clustered index which includes all columns referenced in the query and therefore, the optimizer does not have to perform an additional lookup to the table in order to retrieve the data requested. As the data requested is all indexed by the covering index, it is a faster operation.

To be considered a covering index, all columns referenced by the query must be contained in the index. This means every column in the SELECT list, JOIN conditions, WHERE clause, GROUP BY, HAVING and so on.

What is a key lookup?

A key lookup is an extra read which the query has to make in the absence of a suitable covering index.

The query may reference columns which are not in the non-clustered index used to locate the rows, and when this happens, a key lookup is performed for each row in the result set. So 500 rows in the result set equates to 500 extra key lookups.

Key lookup medicine – AKA the covering index!

Let’s look at a simple query to help explain what I was talking about above. Imagine that dbo.MyTable consists of just two columns, “Col1” and “Col2”.

SELECT Col1 FROM dbo.MyTable WHERE Col2 = 1;

For this query to be covered by an index, columns “Col1” and “Col2” need to be part of the same index, and there are several variants of index which would act as a covering index.

For example, you could do:

CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col1, Col2);

CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col2, Col1);

You could also use included columns:

CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col1) INCLUDE (Col2);

CREATE NONCLUSTERED INDEX IX_MyCoveringIndex
ON dbo.MyTable (Col2) INCLUDE (Col1);

So for covering purposes, the columns need only be present somewhere in the index; the column order, and whether a column is a key column or an “included column”, does not matter. It is important to remember, however, that the execution plan and performance may vary considerably depending on which index you create (for the WHERE Col2 = 1 predicate, an index leading on Col2 supports a seek, whereas an index leading on Col1 would likely be scanned), so choosing the correct covering index is important.

Let’s look at how the covering index removes the need for a key lookup.

I am going to use this simple query against a copy of the AdventureWorks2012 database using SQL Server 2012 Express.

I’ll enable the execution plan output and enable STATISTICS IO output.

SET STATISTICS IO ON;

SELECT CustomerID, OrderDate
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
WHERE CustomerID = 11000;

There is a non-clustered index on the CustomerID column. The clustered index on this table is on SalesOrderID, a column not referenced by the query.
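
If you want to verify the indexes on the table yourself, sp_helpindex will list them:

--list the indexes and their key columns for the table
EXEC sp_helpindex 'Sales.SalesOrderHeader';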

Running the SELECT query produces the following:

STATISTICS IO output:

Table 'SalesOrderHeader'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now let’s look at the execution plan:

[Execution plan: index seek plus a key lookup against the clustered index]

You can see that 68% of the total cost of this query has been allocated to the key lookup against the clustered index.

Let’s see what happens to it after we modify the index to cover the columns in the query.

DROP INDEX Sales.SalesOrderHeader.IX_SalesOrderHeader_CustomerID;

CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_CustomerID_OrderDate
ON Sales.SalesOrderHeader(CustomerID, OrderDate);

[Execution plan: index seek on the covering index, with no key lookup]

The key lookup is gone (well, you were expecting that, right? 🙂 ) and the query now retrieves the data entirely from the covering index.

Let’s look at STATISTICS IO again:

Table 'SalesOrderHeader'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

A big reduction can be seen in the number of logical reads compared to the first query. Logical reads are the pages read from the data cache, regardless of whether those pages first had to be fetched from disk. In both cases they did not (physical reads = 0); both queries read their pages directly from the data cache. As the second query read fewer pages from the data cache, that is still a good indicator that the covering index has improved performance.

If I ran DBCC DROPCLEANBUFFERS (do not do this on a production server, as you could see a massive performance reduction until the cache is re-populated) before each query and index change, in order to force the data to be read from disk, then STATISTICS IO would look like this:

--BEFORE covering index
Table 'SalesOrderHeader'. Scan count 1, logical reads 11, physical reads 7, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

--AFTER covering index
Table 'SalesOrderHeader'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

You can see the difference in the number of pages read from disk, which helps to show the benefit of having the covering index in place. Once data pages are read from disk, they enter the data cache; all queries get their results by reading the data cache.

The second query reads fewer data pages into the data cache, leaving more room in the cache. More room means more pages can be held in cache, so disk reads become less frequent and performance increases.
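
As a side note on DBCC DROPCLEANBUFFERS: it only removes clean pages from the cache, so it is usually paired with CHECKPOINT, which writes dirty pages to disk first. A minimal sketch of the sequence, for a test server only:

--test servers only: flush dirty pages to disk, then empty the data cache
CHECKPOINT;
DBCC DROPCLEANBUFFERS;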

Should you apply a covering index in all cases?

Indexing is a complex subject and there is no definitive “one size fits all” solution to indexing a database. You have to know what is possible and then assess your needs. You should always test your index changes properly before putting them into production.

An analysis of server workload will help you figure this out. A few things to consider:

  1. Is existing query performance acceptable?
  2. How often is the query running?
  3. How selective is the query?
  4. Is the table you are adding the index to write intensive?

Remember, a query may perform well and return results in good time. But what if it is running many hundreds or thousands of times a minute? Cumulatively, without a covering index it could be generating far more IO than necessary; a query performing 500 key lookups and running 1,000 times a minute generates 500,000 avoidable reads every minute.

You need to assess whether the frequency and selectivity of the query warrant adding a covering index. Covering indexes give the greatest benefit to less selective queries, because the number of key lookups saved grows with the number of rows returned.

In addition, if your system is write intensive, adding an index could have the opposite effect to the one you are trying to achieve by slowing down overall database performance. This is because for each update to the data, the indexes have to be updated as well.
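
If you are unsure whether an existing index earns its keep, one way to weigh reads against writes is the sys.dm_db_index_usage_stats DMV (note the counters reset when the instance restarts). A sketch:

--compare how often each index is read (seeks/scans/lookups) versus written (updates)
SELECT OBJECT_NAME(s.object_id) AS TableName
, i.name AS IndexName
, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i
ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID();

Many updates against few reads suggests an index may be costing more than it returns.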

For more information on SQL Server index creation, you can visit this link which talks about the covering index and included columns.

Filed Under: All Articles, SQL Server Performance Tagged With: mysql, performance, sql server 2000, sql server 2005, sql server 2008, sql server 2012

GROUPING SETS performance versus UNION performance

Posted on January 2, 2012 Written by Andy Hayes

GROUPING SETS is a way of aggregating data to produce a single result set containing several different groupings of the data.

For the SQL Server platform, it was first introduced in SQL Server 2008. Prior to that, this type of result set could be achieved using UNION, WITH CUBE, WITH ROLLUP and a helper function called GROUPING.

The issues with WITH CUBE and WITH ROLLUP were that they were both non-standard and not flexible enough. GROUPING SETS is ISO standard, simplifies writing these queries and performs better than the equivalent UNION approach (UNION also being an ISO standard).

To help show this, I will use the “Staff” table which I created in a previous post on views, modified to include a region identifier relating to a new table called “Region”. I have also added data to the Staff table and changed some of the salary values.

The view “vwStaff” which I set up in that post also needs modifying to include the new columns. The data in the view now looks like this, and I want to produce some average salary aggregations from it.

SELECT * FROM vwStaff
StaffID     FirstName  LastName   Department Region     Salary
----------- ---------- ---------- ---------- ---------- -----------
1           John       Smith      Sales      South East 45000
2           Raymond    Jones      Finance    South East 40000
3           Tracey     Carter     Technology South East 42000
4           Andy       Milward    Sales      South West 41000
5           Joanne     Baldock    Finance    South West 37000
6           Simon      Jackson    Technology South West 40000
7           Greg       Benson     Sales      Midlands   47500
8           Jack       Andrews    Finance    Midlands   39000
9           James      Michaels   Technology Midlands   38000
10          Mark       Taylor     Sales      North West 39000
11          Frances    Keagan     Finance    North West 42000
12          Jane       Coleson    Technology North West 35000
13          Jennifer   Seymour    Sales      North East 35500
14          Rebecca    Morgan     Finance    North East 38000
15          Alex       Parker     Technology North East 38500

(15 row(s) affected)

To aggregate the data using GROUPING SETS, I have built this query below and you can see from the list that there are multiple result sets in the one list as defined in the GROUPING SETS part of the query. The “()” produces an aggregate for the entire result set based on the Salary column.

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
GROUPING SETS
(
(Department, Region)
, (Department)
, (Region)
, ()
)

Which returns the following results

Department      Region          AverageSalary
--------------- --------------- -----------
Finance         Midlands        39000
Sales           Midlands        47500
Technology      Midlands        38000
NULL            Midlands        41500
Finance         North East      38000
Sales           North East      35500
Technology      North East      38500
NULL            North East      37333
Finance         North West      42000
Sales           North West      39000
Technology      North West      35000
NULL            North West      38666
Finance         South East      40000
Sales           South East      45000
Technology      South East      42000
NULL            South East      42333
Finance         South West      37000
Sales           South West      41000
Technology      South West      40000
NULL            South West      39333
NULL            NULL            39833
Finance         NULL            39200
Sales           NULL            41600
Technology      NULL            38700

(24 row(s) affected)
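
One thing to watch in this output: NULL is doing double duty, meaning “all values” in the rollup rows. If your data could itself contain NULLs, the GROUPING() helper function mentioned earlier distinguishes the two cases; a minimal sketch labelling the rollup rows:

--GROUPING(col) returns 1 when that column has been aggregated away
SELECT
CASE WHEN GROUPING(Department) = 1 THEN 'All Departments' ELSE Department END AS Department
, CASE WHEN GROUPING(Region) = 1 THEN 'All Regions' ELSE Region END AS Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
GROUPING SETS
(
(Department, Region)
, (Department)
, (Region)
, ()
)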

Not to be confused with the older WITH CUBE syntax, the original GROUPING SETS query could also be written using GROUP BY CUBE (also ISO compliant):

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
CUBE
(
Region
, Department
)

GROUP BY CUBE works by accepting a list of elements and then generating all possible grouping sets from that list, so CUBE (Region, Department) expands to (Region, Department), (Region), (Department) and (). Personally I use GROUPING SETS, but it is a matter of preference.

Before SQL Server 2008, this query would have been written using UNION, like this:

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Department
, Region
UNION
SELECT
Department
, NULL AS Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Department
UNION
SELECT
NULL AS Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Region
UNION
SELECT
NULL AS Department
, NULL AS Region
, AVG(Salary) AS AverageSalary
FROM vwStaff

In regard to how the two queries perform, and without even looking at any performance statistics or execution plans, the obvious thing to note is that the UNION query makes multiple calls to the base data, as opposed to one call in the GROUPING SETS query. You might therefore estimate something like a 75% improvement in IO performance when using GROUPING SETS, but the reality is not quite that exact.

To get some idea of the differences between the two, there are a couple of useful ways to draw comparisons

SET STATISTICS IO ON tells SQL Server to display the amount of disk activity generated by the queries, and it produced these results:

---GROUPING SETS
Table 'Department'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Region'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Staff'. Scan count 2, logical reads 4, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

---UNION
Table 'Department'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0
Table 'Region'. Scan count 0, logical reads 120, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Staff'. Scan count 4, logical reads 8, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

It is worth noting that the number of scans (index or table scans) is greater for the UNION statement, which needed twice as many scans of the Staff table as the GROUPING SETS operation (4 versus 2). 50% fewer scans? That’s a nice performance gain.

Comparing the two execution plans also shows that the plan for the UNION query is considerably larger than the GROUPING SETS plan.

GROUPING SETS versus UNION conclusion

I can’t think of a reason not to convert existing UNION code of this kind to GROUPING SETS. It will simplify your code and perform faster, and for large datasets considerable performance benefits could be seen.

Filed Under: All Articles, SQL Server Performance Tagged With: performance, sql server 2008, t-sql
