• Skip to main content
  • Skip to primary sidebar

DBA Diaries

Thoughts and experiences of a DBA working with SQL Server and MySQL

GROUPING SETS performance versus UNION performance

January 2, 2012 by Andy Hayes 1 Comment

GROUPING SETS is a way of aggregating data to produce a result set containing different sets of columns.

For the SQL Server platform, it was first introduced in SQL 2008. Prior to SQL Server 2008, achieving this type of result set was possible using UNION, WITH CUBE, WITH ROLLUP and a helper function called GROUPING to achieve similar results.

The issues with WITH CUBE and WITH ROLLUP were that they were both non standard and not flexible enough. GROUPING SETS is ISO standard, simplifies writing these queries and performs better in comparison to UNION, also an ISO standard.

To help show this, I will use the table “Staff” which I created in a previous post on views and modify the table to include a region identifier relating to a new table called “Region”. I will also add data to the Staff table as well as modifying some of the salary values.

Modification of the view “vwStaff” which I set up in that post is required to include some new columns. The data in the view now looks like this and I will be wanting to get some aggregations based on average salary from this data.

SELECT * FROM vwStaff

 

StaffID     FirstName  LastName   Department Region     Salary
----------- ---------- ---------- ---------- ---------- -----------
1           John       Smith      Sales      South East 45000
2           Raymond    Jones      Finance    South East 40000
3           Tracey     Carter     Technology South East 42000
4           Andy       Milward    Sales      South West 41000
5           Joanne     Baldock    Finance    South West 37000
6           Simon      Jackson    Technology South West 40000
7           Greg       Benson     Sales      Midlands   47500
8           Jack       Andrews    Finance    Midlands   39000
9           James      Michaels   Technology Midlands   38000
10          Mark       Taylor     Sales      North West 39000
11          Frances    Keagan     Finance    North West 42000
12          Jane       Coleson    Technology North West 35000
13          Jennifer   Seymour    Sales      North East 35500
14          Rebecca    Morgan     Finance    North East 38000
15          Alex       Parker     Technology North East 38500

(15 row(s) affected)

To aggregate the data using GROUPING SETS, I have built this query below and you can see from the list that there are multiple result sets in the one list as defined in the GROUPING SETS part of the query. The “()” produces an aggregate for the entire result set based on the Salary column.

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
GROUPING SETS
(
, (Department,Region)
, (Department)
, (Region)
()
)

Which returns the following results

Department      Region          AverageSalary
--------------- --------------- -----------
Finance         Midlands        39000
Sales           Midlands        47500
Technology      Midlands        38000
NULL            Midlands        41500
Finance         North East      38000
Sales           North East      35500
Technology      North East      38500
NULL            North East      37333
Finance         North West      42000
Sales           North West      39000
Technology      North West      35000
NULL            North West      38666
Finance         South East      40000
Sales           South East      45000
Technology      South East      42000
NULL            South East      42333
Finance         South West      37000
Sales           South West      41000
Technology      South West      40000
NULL            South West      39333
NULL            NULL            39833
Finance         NULL            39200
Sales           NULL            41600
Technology      NULL            38700

(24 row(s) affected)

Not to be confused with WITH CUBE, the same query could be written using GROUP BY CUBE (also ISO compliant)

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
CUBE
(
Region
, Department
)

GROUP BY CUBE works by accepting a list of elements and then defining all possible GROUPING SETS from that supplied list. Personally I use GROUPING SETS but it is a matter of preference.

Before SQL 2008, this query would have been written using UNION like this

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Department
, Region
UNION
SELECT
Department
, NULL AS Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Department
UNION
SELECT
NULL AS Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Region
UNION
SELECT
NULL AS Department
, NULL AS Region
, AVG(Salary) AS AverageSalary
FROM vwStaff

In regard to how both queries perform and without even looking at any performance statistics or execution plans, the obvious thing to note about the query using UNION is that there multiple calls to the base data as opposed to one call in the GROUPING SETS query. So you could estimate that there would be a 75% improvement in IO performance when using GROUPING SETS but the reality is not quite that exact.

To get some idea of the differences between the two, there are a couple of useful ways to draw comparisons

SET STATISTICS IO ON – tells SQL Server to display the amount of disk activity generated which produced these results

---GROUPING SETS
Table 'Department'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Region'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Staff'. Scan count 2, logical reads 4, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

---UNION
Table 'Department'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0
Table 'Region'. Scan count 0, logical reads 120, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Staff'. Scan count 4, logical reads 8, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

It is worth nothing that the number of scans (index or table scans) is greater in the UNION statement with 50% more scans required than the GROUPING SETS operation. 50% less scans? That’s a nice performance gain.

Click the following links to see that the execution plan of the UNION query is considerably larger than the GROUPING SETS plan.

GROUPING SETS versus UNION conclusion

I can’t think of a reason not to modify any existing UNION code across to GROUPING SETS. It will simplify your code and perform faster. For large datasets, considerable performance benefits could be seen.

Related Posts:

  • sql grouping sets
    Using SQL GROUPING SETS for Multiple GROUP BY…
  • How to fix "conversion failed when converting date and/or time from character string"
    How to fix "conversion failed when converting date…

Filed Under: All Articles, SQL Server Performance Tagged With: performance, sql server 2008, t-sql

About Andy Hayes

Andy Hayes is a DBA working with SQL Server since version 7.0. He has a wonderful wife and two beautiful children. He loves database technology, playing cricket, and blogging. He is passionate about sharing his experiences as a DBA and learning more to further his understanding and knowledge. You can follow me on Twitter, check out my Facebook page or follow me on Google+

Reader Interactions

Comments

  1. Filipe says

    October 14, 2021 at 9:01 pm

    Very nice, congrats!!

    Reply

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Primary Sidebar

Categories

  • All Articles (84)
  • Career Development (8)
  • MySQL Administration (18)
  • MySQL Performance (2)
  • SQL Server Administration (24)
  • SQL Server News (3)
  • SQL Server Performance (14)
  • SQL Server Security (3)
  • SQL Tips and Tricks (21)

Top 10 Popular Posts

  • Using sp_change_users_login to fix SQL Server orphaned users
  • MySQL SHOW USERS? – How to List All MySQL Users and Privileges
  • How to shrink tempdb
  • How to Transfer Logins to Another SQL Server or Instance
  • How to Delete Millions of Rows using T-SQL with Reduced Impact
  • T-SQL – How to Select Top N Rows for Each Group Using ROW_NUMBER()
  • New T-SQL features in SQL Server 2012 – OFFSET and FETCH
  • How to Kill All MySQL Processes For a Specific User
  • Using exec sp_who2 to help with SQL Server troubleshooting
  • How to fix “conversion failed when converting date and/or time from character string”

Recent Posts

  • How to fix “conversion failed when converting date and/or time from character string”
  • Using SQL GROUPING SETS for Multiple GROUP BY Queries in a Single Query
  • How to Setup MySQL Master Master Replication
  • How To Use SQL to Convert a STRING to an INT
  • How to set up MySQL Replication Tutorial

Search

Connect

  • Twitter
  • Facebook
  • RSS

About

  • Cookie Policy
  • Disclaimer
  • About
Copyright ©