The following tips won't magically make your database server blisteringly fast and go like a Ferrari, but they are important things to bear in mind and relatively simple to implement, either when building your database or when you have an existing performance problem and are looking for ways to make your database run faster.
1/ SELECT * FROM
The rest of my points are in no particular order, but I must place this one at the top of the list as it really annoys me. I've seen this many times: developers take the lazy approach to writing their queries by including all columns from the table, or, where joins are involved, every column from every table in the query. It is far better to limit the query to only the columns required, reducing I/O, memory pressure and bandwidth between the database and application layers.
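As a sketch of the difference, using the Advert table from the example later in this post (the customer ID here is purely illustrative):

```sql
-- Lazy: drags every column across the wire, including the large AdvertText
SELECT * FROM Advert;

-- Better: only the columns the application actually needs
SELECT AdvertID, DatePlaced, CountyID
FROM Advert
WHERE CustomerID = 42;
```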
2/ Incorrect sizing of data types – less is more
Database design is sometimes a neglected aspect of building an application. The need to rapidly build a database to store the data and get coding often results in tables containing data types which are incorrectly sized.
So let's look at a simple example of what I mean using MS SQL Server.
The system we are designing for allows advertisements to be placed in one county in England per advert. Whilst this does not sound like a terribly flexible system, it serves the purpose of demonstrating the issue with badly sized data types 🙂
So here is the create table script. (I have omitted the Customer create table script which would theoretically exist here to ensure that the relationship on Advert.CustomerID could be created).
CREATE TABLE County
(
    CountyID INT PRIMARY KEY IDENTITY(1,1),
    County VARCHAR(20)
);

CREATE TABLE Advert
(
    AdvertID INT PRIMARY KEY IDENTITY(1,1),
    CustomerID INT NOT NULL REFERENCES Customer(CustomerID),
    DatePlaced DATETIME DEFAULT GETDATE() NOT NULL,
    CountyID INT NOT NULL REFERENCES County(CountyID),
    AdvertText VARCHAR(MAX) NOT NULL
);
So, two tables. Where in this design are the issues?
How many counties are there in England? Well, certainly not 2,147,483,647, which is the positive limit of the INT data type. Yes, you can store negatives too, down to -2,147,483,648.
So what, they're just numbers, right? What's the problem? The problem is the storage needed. The INT data type requires 4 bytes of storage. The number of English counties is less than 255. Is it ever going to grow beyond 255? I can't see that happening any time soon, so it's better to design the County table with CountyID as a TINYINT, and the related column in Advert as a TINYINT too. This data type uses 1 byte of storage. That's a 75% reduction, and on an extremely large Advert table it all starts to add up.
The other field in question is DatePlaced. Always question whether you need the increased size and precision of the types you are choosing. In my example, I am not too fussy about the extra precision which the DATETIME data type gives me, so I should have designed that as a SMALLDATETIME.
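Putting both changes together, a right-sized version of the schema might look like this (same tables as above, only the types changed):

```sql
CREATE TABLE County
(
    CountyID TINYINT PRIMARY KEY IDENTITY(1,1), -- 1 byte instead of 4
    County VARCHAR(20)
);

CREATE TABLE Advert
(
    AdvertID INT PRIMARY KEY IDENTITY(1,1),
    CustomerID INT NOT NULL REFERENCES Customer(CustomerID),
    DatePlaced SMALLDATETIME DEFAULT GETDATE() NOT NULL, -- 4 bytes instead of 8; minute precision
    CountyID TINYINT NOT NULL REFERENCES County(CountyID), -- must match County.CountyID
    AdvertText VARCHAR(MAX) NOT NULL
);
```

Note that the foreign key column in Advert must be resized along with the key it references.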
But storage is cheap, right? Yes, but that is not the point. It's all about I/O and memory pressure. The smaller the data types are, the fewer pages have to be read from disk into the memory buffers. If the memory buffers are not constantly being emptied to make way for other data, then your database server and application will be a lot happier, as there will be fewer read requests to disk.
3/ Adding and Removing Indexes
Identify queries which are performing badly. Can you add indexes which would make them faster? SQL Server 2008 Management Studio will even recommend which indexes to add when you view the execution plan. Identify where you need to add clustered indexes and put them in.
Identify unused indexes and remove them. For each record inserted into a table, every index is updated. If you do not need an index, get rid of it. It will help speed up inserts.
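On SQL Server, a starting point for spotting unused indexes is the index usage DMV. This is a sketch only; the "no reads at all" threshold is illustrative, and the counters reset on every server restart, so check over a representative period:

```sql
-- Indexes with no seeks, scans or lookups since the last restart
-- are candidates for removal (they still cost writes via user_updates)
SELECT OBJECT_NAME(s.object_id) AS TableName,
       i.name AS IndexName,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
  AND s.user_seeks + s.user_scans + s.user_lookups = 0
  AND i.is_primary_key = 0;
```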
4/ Use stored procedures instead of in-line SQL
There are many advantages to using stored procedures. From a performance perspective, once a procedure has been executed, its execution plan is cached and can be reused on subsequent calls. As the server does not need to work out the execution plan each time, the process from calling the procedure to it completing its execution is typically faster.
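For example, the advert lookup from earlier could be wrapped up as a procedure (the procedure name and parameter are illustrative, following the example schema above):

```sql
CREATE PROCEDURE GetAdvertsByCustomer
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT AdvertID, DatePlaced, CountyID
    FROM Advert
    WHERE CustomerID = @CustomerID;
END;
```

The application then calls `EXEC GetAdvertsByCustomer @CustomerID = 42;` instead of sending the raw SQL text each time.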
5/ Set up maintenance plans to rebuild indexes and statistics
All indexes will suffer from fragmentation and statistics will get out of date. Set up maintenance plans in SQL Server or batch tasks in MySQL which will periodically rebuild statistics and indexes to help the optimizer choose the best execution plan.
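Under the covers, such a maintenance plan boils down to commands like these (table name from the earlier example; schedule and scope to suit your own workload):

```sql
-- SQL Server: rebuild all indexes on a table and refresh its statistics
ALTER INDEX ALL ON Advert REBUILD;
UPDATE STATISTICS Advert;

-- MySQL equivalents, run from a scheduled batch task:
-- OPTIMIZE TABLE Advert;
-- ANALYZE TABLE Advert;
```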
6/ Replace use of functions in WHERE clauses with SARGs
It's all about the search arguments (SARGs).
Here's an example. Two queries: the first uses a LIKE, the second uses the SUBSTRING function. There is an index on the LastName column in my table, and the first query reports using an index seek, which should normally be faster than the second, which reports using a clustered index scan.
SELECT FirstName, LastName
FROM Employee
WHERE LastName LIKE 'Hay%';

SELECT FirstName, LastName
FROM Employee
WHERE SUBSTRING(LastName, 1, 3) = 'Hay';
Note that use of LIKE will not always reference an index. For example, this query would not use an index:
WHERE LastName LIKE '%es'
Other examples of SARGs:

LastName = 'Hayes'
Amount > 0
Amount BETWEEN 10 AND 20
I've also seen UDFs in place in WHERE clauses and, unsurprisingly, performance is poor.
Get the query using a SARG and there is a much higher chance that the optimizer will use an index.
7/ Ensure that foreign keys are in place
Please refer to my post on the importance of foreign keys and how they can help improve query performance.
8/ Increasing chances of using the query cache (MySQL)
Minor changes to queries to make them deterministic can increase the query cache hit rate. For example, if a query uses CURDATE(), its results will not be cached.
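A sketch of the kind of rewrite meant here: move the non-deterministic function out of the SQL and have the application substitute the value as a literal instead (the date shown is illustrative):

```sql
-- Not cacheable: CURDATE() is non-deterministic
SELECT AdvertID, AdvertText
FROM Advert
WHERE DatePlaced >= CURDATE();

-- Cacheable: the application layer supplies today's date as a literal
SELECT AdvertID, AdvertText
FROM Advert
WHERE DatePlaced >= '2009-06-01';
```

The trade-off is that identical query text is required for a cache hit, so this works best when the literal changes infrequently (e.g. once per day).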
9/ Deal in sets not cursors
Try and build SQL queries which deal with sets of data. Leave any row by row processing to the application layer.
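As an illustration, compare a row-by-row cursor with the equivalent single set-based statement. The Product table and price uplift here are hypothetical:

```sql
-- Row-by-row cursor version (slow): one UPDATE per fetched row
DECLARE @ProductID INT;
DECLARE cur CURSOR FOR
    SELECT ProductID FROM Product WHERE CategoryID = 3;
OPEN cur;
FETCH NEXT FROM cur INTO @ProductID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Product SET Price = Price * 1.10 WHERE ProductID = @ProductID;
    FETCH NEXT FROM cur INTO @ProductID;
END;
CLOSE cur;
DEALLOCATE cur;

-- Set-based version: the same result in one statement, one pass
UPDATE Product
SET Price = Price * 1.10
WHERE CategoryID = 3;
```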
10/ Allocate more memory to buffers
You need a solution quickly and you have identified that your server is under memory pressure. You have spare RAM available, so it is an easy task to allocate more memory to the MS SQL or MySQL database memory buffers. Personally, I will always try to carry out a more exhaustive analysis of the server, but if time does not permit that, allocate the memory to improve performance, then analyse a typical workload on a staging server to determine where other performance improvements can be made.
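The knobs in question look like this (the values shown are purely illustrative; size them to your own server, leaving headroom for the OS and other processes):

```sql
-- SQL Server: raise the buffer pool ceiling via max server memory (MB)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory', 6144;
RECONFIGURE;

-- MySQL (my.cnf / my.ini): size the InnoDB buffer pool, then restart
-- [mysqld]
-- innodb_buffer_pool_size = 4G
```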
It is important to ensure that these issues are not brushed under the carpet after increasing hardware resource. Think about it for a moment. Your application has been slowing down and user complaints have increased. What's an obvious cause of that? Growth.
So in order to effectively deal with growth, from a DBA perspective, the database server needs to be monitored for performance problems with issues reported to the relevant technical staff at the earliest opportunity. Adding hardware will always help improve performance but the application will never scale very well unless the issues “underneath the bonnet” in the database layer are effectively dealt with.