DBA Diaries

Thoughts and experiences of a DBA working with SQL Server and MySQL

10 simple tips on improving query and server performance

Posted on February 18, 2012 Written by Andy Hayes 3 Comments


The following tips do not promise that your database server will magically become blisteringly fast and go like a Ferrari, but they are important things to bear in mind and relatively simple to implement, either when building your database or when you have an existing performance problem and are looking for ways to make your database run faster.

1/ SELECT * FROM

The rest of my points are in no particular order, but I must place this one at the top of the list as it really annoys me. I’ve seen it many times: developers take the lazy approach to writing their queries by including all columns of the table or, where joins are involved, every column of every table in the query. It is far better to limit the query to bring back only the columns required, reducing I/O, memory pressure and bandwidth between the database and application layers.
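As a simple illustration, using the Advert table defined in the next tip, prefer an explicit column list over the wildcard:

-- Lazy: brings back every column whether the application needs it or not
SELECT * FROM Advert;

-- Better: only the columns the application actually uses
SELECT AdvertID, DatePlaced, AdvertText FROM Advert;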

2/ Incorrect sizing of data types – less is more

Database design is sometimes a neglected aspect of building an application. The rush to build a database to store the data and get coding results in tables containing incorrectly sized data types.

So let’s look at a simple example of what I mean using MS SQL Server.

The system we are designing for allows advertisements to be placed in one county in England per advert. Whilst this does not sound like a terribly flexible system, it serves the purpose of demonstrating the issue with badly sized data types 🙂

So here is the create table script. (I have omitted the Customer create table script which would theoretically exist here to ensure that the relationship on Advert.CustomerID could be created).

CREATE TABLE County
(
CountyID INT PRIMARY KEY IDENTITY(1,1),
County VARCHAR(20)
);

CREATE TABLE Advert
(
AdvertID INT PRIMARY KEY IDENTITY(1,1),
CustomerID INT NOT NULL REFERENCES Customer(CustomerID),
DatePlaced DATETIME DEFAULT GETDATE() NOT NULL,
CountyID INT NOT NULL REFERENCES County(CountyID),
AdvertText VARCHAR(MAX) NOT NULL
);

So, two tables. Where in this design are the issues?

How many counties are there in England? Certainly not 2,147,483,647, which is the upper limit of the INT data type used above (and yes, you can also store negatives, down to -2,147,483,648).

So what, they’re just numbers, right? What’s the problem? The problem is the storage needed. The INT data type requires 4 bytes of storage. The number of English counties is well below 255 and I can’t see it exceeding that any time soon, so it is better to design the County table with CountyID as a TINYINT and make the related column in Advert a TINYINT too. This data type uses 1 byte of storage. That’s a 75% reduction, and on an extremely large Advert table it all starts to add up.

The other field in question is DatePlaced. Always question whether you need the increased size and precision of the types you are choosing. In my example, I am not too fussy about the extra precision which the DATETIME data type gives me, so I should have designed that as a SMALLDATETIME (4 bytes instead of 8, accurate to the minute rather than to milliseconds).
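Putting both changes together, the revised script would look like this:

CREATE TABLE County
(
CountyID TINYINT PRIMARY KEY IDENTITY(1,1), -- 1 byte instead of 4
County VARCHAR(20)
);

CREATE TABLE Advert
(
AdvertID INT PRIMARY KEY IDENTITY(1,1),
CustomerID INT NOT NULL REFERENCES Customer(CustomerID),
DatePlaced SMALLDATETIME DEFAULT GETDATE() NOT NULL, -- 4 bytes instead of 8
CountyID TINYINT NOT NULL REFERENCES County(CountyID), -- matches the new County key
AdvertText VARCHAR(MAX) NOT NULL
);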

But storage is cheap, right? Yes, but that is not the point. It’s all about I/O and memory pressure. The smaller the data types, the fewer pages have to be read from disk into the memory buffers. If the memory buffers are not constantly being emptied to make way for other data, then your database server and application will be a lot happier as there will be fewer read requests to disk.

3/ Adding and Removing Indexes

Identify queries which are performing badly. Can you add indexes which would make them faster? SQL Server 2008 Management Studio will even recommend which indexes to add when you view the execution plan. Identify where you need to add clustered indexes and put them in.
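For example, a nonclustered index to support the LIKE query in tip 6 might look like this (the index name and the INCLUDE column are illustrative):

CREATE NONCLUSTERED INDEX IX_Employee_LastName
ON Employee (LastName)
INCLUDE (FirstName); -- covers the query, so no key lookup is needed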

Identify unused indexes and remove them. For each record inserted into a table, every index on it must be updated. If you do not need an index, get rid of it; it will help speed up inserts.
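On SQL Server, the index usage DMV is a good starting point for spotting candidates. A sketch (note the counters reset when the instance restarts, so judge them against a representative period of activity):

SELECT OBJECT_NAME(s.object_id) AS TableName,
       i.name AS IndexName,
       s.user_seeks, s.user_scans, s.user_lookups, -- read activity
       s.user_updates                              -- maintenance cost on writes
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i
  ON i.object_id = s.object_id
 AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_updates DESC;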

4/ Use stored procedures instead of in-line SQL

There are many advantages to using stored procedures. From a performance perspective, having them in place means that each time the procedure is executed, the execution plan can be pulled straight out of the plan cache. As the server does not need to work out the execution plan again, the process from calling the procedure to completing its execution is typically faster.
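A minimal example, wrapping the tip 6 query in a procedure (the procedure name and parameter are illustrative):

CREATE PROCEDURE dbo.GetEmployeesByLastNamePrefix
    @Prefix VARCHAR(50)
AS
BEGIN
    -- The plan for this statement is compiled once and reused from cache
    SELECT FirstName, LastName
    FROM Employee
    WHERE LastName LIKE @Prefix + '%';
END;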

5/ Set up maintenance plans to rebuild indexes and statistics

All indexes will suffer from fragmentation and statistics will get out of date. Set up maintenance plans in SQL Server or batch tasks in MySQL which will periodically rebuild statistics and indexes to help the optimizer choose the best execution plan.
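Outside of a full maintenance plan, the individual commands look like this, using the Advert table from tip 2 as an example:

-- SQL Server: rebuild all indexes on a table and refresh its statistics
ALTER INDEX ALL ON dbo.Advert REBUILD;
UPDATE STATISTICS dbo.Advert;

-- MySQL equivalents for a single table
OPTIMIZE TABLE Advert;
ANALYZE TABLE Advert;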

6/ Replace use of functions in WHERE clauses with SARGs

It’s all about the search arguments (SARGs).

Here’s an example: two queries, where the first uses LIKE and the second uses the SUBSTRING function. There is an index on the LastName column in my table; the first query reports an index seek, which should normally be faster than the second, which reports a clustered index scan.

SELECT FirstName, LastName
FROM Employee
WHERE LastName LIKE 'Hay%';

SELECT FirstName, LastName
FROM Employee
WHERE SUBSTRING(LastName,1,3) = 'Hay';

Note that use of LIKE will not always use an index. For example, this query would not, because the leading wildcard rules out an index seek:

WHERE LastName LIKE '%es'

Other examples of SARGs:

LastName = 'Hayes'
Amount > 0
Amount BETWEEN 10 AND 20

I’ve also seen UDFs used in WHERE clauses and, unsurprisingly, performance is poor.

Get the query using a SARG and there is a higher chance that the optimizer will use an index.

7/ Ensure that foreign keys are in place

Please refer to my post on the importance of foreign keys and how they can help improve query performance.

8/ Increasing chances of using the query cache (MySQL)

Minor changes to queries to make them deterministic can increase the query cache hit rate. For example, if a query uses CURDATE(), its results will not be cached.
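For example, if the application computes the date itself and passes it in as a literal, the statement becomes deterministic and cacheable (illustrative, against the Advert table from tip 2):

-- Not cacheable: CURDATE() is non-deterministic
SELECT AdvertID FROM Advert WHERE DatePlaced >= CURDATE();

-- Cacheable: the date arrives as a literal from the application
SELECT AdvertID FROM Advert WHERE DatePlaced >= '2012-02-18';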

9/ Deal in sets not cursors

Try to build SQL queries which deal with sets of data, leaving any row-by-row processing to the application layer.
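For example, instead of opening a cursor to apply a change one row at a time, express it as a single statement (illustrative; the Salary and Department columns on Employee are assumed here):

-- One set-based statement the optimizer can process efficiently,
-- rather than a cursor looping over each Sales employee in turn
UPDATE Employee
SET Salary = Salary * 1.05
WHERE Department = 'Sales';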

10/ Allocate more memory to buffers

You need a solution quickly and you have identified that your server is under memory pressure. You have spare RAM available, so it is an easy task to allocate more memory to the MS SQL or MySQL database memory buffers. Personally, I will always try to carry out a more exhaustive analysis of the server, but if time does not permit that, then allocate the memory to improve performance and analyse a typical workload on a staging server to determine where other performance improvements can be made.
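For reference, these are the relevant settings (the values below are placeholders; size them to your own server and workload):

-- SQL Server: raise the buffer pool ceiling via max server memory (value in MB)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;

-- MySQL (my.cnf): increase the InnoDB buffer pool; a restart is required
-- innodb_buffer_pool_size = 8G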

It is important to ensure that these issues are not brushed under the carpet after increasing hardware resources. Think about it for a moment. Your application has been slowing down and user complaints have increased. What’s an obvious cause of that? Growth.

So in order to effectively deal with growth, from a DBA perspective, the database server needs to be monitored for performance problems with issues reported to the relevant technical staff at the earliest opportunity. Adding hardware will always help improve performance but the application will never scale very well unless the issues “underneath the bonnet” in the database layer are effectively dealt with.

Filed Under: All Articles, MySQL Performance, SQL Server Performance Tagged With: mysql, performance, sql server, t-sql

GROUPING SETS performance versus UNION performance

Posted on January 2, 2012 Written by Andy Hayes Leave a Comment

GROUPING SETS is a way of aggregating data to produce a result set containing different sets of columns.

For the SQL Server platform, it was first introduced in SQL 2008. Prior to SQL Server 2008, achieving this type of result set was possible using UNION, WITH CUBE, WITH ROLLUP and a helper function called GROUPING to achieve similar results.

The issues with WITH CUBE and WITH ROLLUP were that they were both non-standard and not flexible enough. GROUPING SETS is ISO standard, simplifies writing these queries and performs better in comparison to UNION, which is also ISO standard.

To help show this, I will use the table “Staff” which I created in a previous post on views, and modify the table to include a region identifier relating to a new table called “Region”. I will also add data to the Staff table as well as modify some of the salary values.

Modification of the view “vwStaff” which I set up in that post is required to include the new columns. The data in the view now looks like this, and I will want to produce some aggregations of average salary from it.

SELECT * FROM vwStaff
StaffID     FirstName  LastName   Department Region     Salary
----------- ---------- ---------- ---------- ---------- -----------
1           John       Smith      Sales      South East 45000
2           Raymond    Jones      Finance    South East 40000
3           Tracey     Carter     Technology South East 42000
4           Andy       Milward    Sales      South West 41000
5           Joanne     Baldock    Finance    South West 37000
6           Simon      Jackson    Technology South West 40000
7           Greg       Benson     Sales      Midlands   47500
8           Jack       Andrews    Finance    Midlands   39000
9           James      Michaels   Technology Midlands   38000
10          Mark       Taylor     Sales      North West 39000
11          Frances    Keagan     Finance    North West 42000
12          Jane       Coleson    Technology North West 35000
13          Jennifer   Seymour    Sales      North East 35500
14          Rebecca    Morgan     Finance    North East 38000
15          Alex       Parker     Technology North East 38500

(15 row(s) affected)

To aggregate the data using GROUPING SETS, I have built the query below. You can see from the results that there are multiple result sets combined in the one list, as defined in the GROUPING SETS part of the query. The “()” entry produces an aggregate over the entire result set based on the Salary column.

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
GROUPING SETS
(
(Department, Region)
, (Department)
, (Region)
, ()
)

Which returns the following results

Department      Region          AverageSalary
--------------- --------------- -----------
Finance         Midlands        39000
Sales           Midlands        47500
Technology      Midlands        38000
NULL            Midlands        41500
Finance         North East      38000
Sales           North East      35500
Technology      North East      38500
NULL            North East      37333
Finance         North West      42000
Sales           North West      39000
Technology      North West      35000
NULL            North West      38666
Finance         South East      40000
Sales           South East      45000
Technology      South East      42000
NULL            South East      42333
Finance         South West      37000
Sales           South West      41000
Technology      South West      40000
NULL            South West      39333
NULL            NULL            39833
Finance         NULL            39200
Sales           NULL            41600
Technology      NULL            38700

(24 row(s) affected)

Not to be confused with WITH CUBE, the same query could be written using GROUP BY CUBE (also ISO compliant):

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
CUBE
(
Region
, Department
)

GROUP BY CUBE works by accepting a list of elements and then defining all possible GROUPING SETS from that supplied list. Personally I use GROUPING SETS but it is a matter of preference.
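To make the expansion concrete: for the two-column list above, CUBE (Region, Department) defines exactly the same four grouping sets as the first query, i.e.

GROUP BY GROUPING SETS
(
(Region, Department)
, (Region)
, (Department)
, ()
)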

Before SQL Server 2008, this query would have been written using UNION like this:

SELECT
Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Department
, Region
UNION
SELECT
Department
, NULL AS Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Department
UNION
SELECT
NULL AS Department
, Region
, AVG(Salary) AS AverageSalary
FROM vwStaff
GROUP BY
Region
UNION
SELECT
NULL AS Department
, NULL AS Region
, AVG(Salary) AS AverageSalary
FROM vwStaff

In regard to how both queries perform, and without even looking at any performance statistics or execution plans, the obvious thing to note about the query using UNION is that there are multiple calls to the base data as opposed to one call in the GROUPING SETS query. So you could estimate a 75% improvement in I/O performance when using GROUPING SETS, but the reality is not quite that exact.

To get some idea of the differences between the two, there are a couple of useful ways to draw comparisons

SET STATISTICS IO ON tells SQL Server to display the amount of disk activity generated by each query, which produced these results:

---GROUPING SETS
Table 'Department'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Region'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Staff'. Scan count 2, logical reads 4, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

---UNION
Table 'Department'. Scan count 0, logical reads 60, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0
Table 'Region'. Scan count 0, logical reads 120, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Staff'. Scan count 4, logical reads 8, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

It is worth noting that the number of scans (index or table scans) is greater for the UNION statement: the Staff table is scanned 4 times versus 2 for the GROUPING SETS operation. 50% fewer scans? That’s a nice performance gain.

The execution plan of the UNION query is also considerably larger than the GROUPING SETS plan.

GROUPING SETS versus UNION conclusion

I can’t think of a reason not to modify any existing UNION code across to GROUPING SETS. It will simplify your code and perform faster. For large datasets, considerable performance benefits could be seen.

Filed Under: All Articles, SQL Server Performance Tagged With: performance, sql server 2008, t-sql
