In this post, I talk about deleting millions of rows in SQL Server whilst keeping impact low.
Deleting millions of rows in one transaction can throttle a SQL Server
TRUNCATE TABLE – For this example, we will presume that TRUNCATE TABLE is not an option: either permissions do not allow it, foreign key constraints prevent it from being executed, or it is unsuitable because we do not want to remove all of the rows.
When you run something like the following to remove all rows from your table in a single transaction,
[sourcecode language=’sql’]DELETE FROM ExampleTable[/sourcecode]
SQL Server sets about writing all of the changes to be applied to the physical data to the transaction log. It also has to decide how to lock the data, and for a delete of this size it is highly likely that the engine will escalate to a full table lock as the most efficient way to handle the transaction.
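If you want to see this happening, you can watch the locks taken by the delete from another session while it runs. This is just a quick sketch (not from the original post) using the sys.dm_tran_locks DMV:
[sourcecode language=’sql’]
-- Run this from a second session while the large DELETE is executing.
-- An OBJECT lock in X mode against the table shows that SQL Server
-- has escalated to a full table lock.
SELECT request_session_id,
       resource_type,
       request_mode,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY request_session_id, resource_type, request_mode
ORDER BY request_session_id;
[/sourcecode]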
There are potentially some big problems here,
- Your transaction log may grow to accommodate the changes being written to it. If your table is huge, you run the risk of consuming all the space on your transaction log disk.
- If your application(s) still require access to the table and a table lock has been placed on it, your application has to wait until the table becomes available. This could take some time, resulting in application timeouts and frustrated users.
- Your transaction log disk will be working hard during this period as your transaction log grows. This could decrease performance across all databases that share that disk for their transaction logs.
- Depending on how much memory you have allocated to your SQL Server buffer pool, there could be significant drops in page life expectancy, reducing performance for other queries.
- The realisation that a big performance issue is occurring makes it tempting to kill the query. The trouble with that is it can delay things even more, because the server has to roll back the transaction. Depending on how far along the operation is, this could add even more time on top of what the delete had already taken.
For example, if you kill the query when it is 90% done, the server has to roll back a 90% completed transaction. This will vary, but the rollback can take as long as the delete had already been running! (You can check rollback progress using KILL n WITH STATUSONLY, as in the sketch below.)
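If you do find yourself in that position, you can monitor how far the rollback has progressed. The session id 52 below is hypothetical; substitute the SPID of the session running the delete (sp_who2 or sys.dm_exec_sessions will show you which one it is):
[sourcecode language=’sql’]
-- Reports the estimated percent complete and time remaining for a rollback
-- in progress. WITH STATUSONLY only reports; it does not terminate anything.
KILL 52 WITH STATUSONLY;
[/sourcecode]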
Some ways to delete millions of rows using T-SQL loops and TOP
Use a loop combined with TOP and delete rows in smaller transactions. Here are a couple of variations of the same thing. Note that I have arbitrarily chosen 1000 as a figure for demonstration purposes.
[sourcecode language=’sql’]
-- Prime @@ROWCOUNT so the WHILE condition is true on the first pass
SELECT 1;

-- Delete in batches of 1000 until no rows remain
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (1000)
    FROM LargeTable;
END
[/sourcecode]
And another way…
[sourcecode language=’sql’]
-- Label-based version: jump back while the last delete removed any rows
DoItAgain:
DELETE TOP (1000)
FROM ExampleTable;

IF @@ROWCOUNT > 0
    GOTO DoItAgain;
[/sourcecode]
These are simple examples just to demonstrate the pattern. You can add WHERE clauses and JOINs to filter down to the specific rows you want to remove, and you would also add error handling and explicit transactions (COMMIT/ROLLBACK), as in the sketch below.
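Here is one way that could look with the filter and error handling added. The table and column names (ExampleTable, CreatedDate), the cut-off date and the batch size are all purely illustrative:
[sourcecode language=’sql’]
-- Delete old rows in batches, each batch in its own transaction.
-- ExampleTable, CreatedDate and the cut-off date are illustrative only.
DECLARE @BatchSize INT = 1000;
DECLARE @RowsDeleted INT = 1;

WHILE @RowsDeleted > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        DELETE TOP (@BatchSize)
        FROM ExampleTable
        WHERE CreatedDate < '20120101';

        SET @RowsDeleted = @@ROWCOUNT;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW; -- THROW needs SQL Server 2012+; use RAISERROR on older versions
    END CATCH
END
[/sourcecode]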
Summary
It’s a bad idea to delete millions of rows in one transaction, and whilst this might sound like a no-brainer, people do try to do it and then wonder why things start to go bad.
Breaking the delete operation down into smaller transactions is better all round. It will help reduce contention for your table, reduce the probability of your transaction log becoming too large for its disk and reduce performance impact in general.
Your transaction log disk will still be working hard as your refined delete routine removes the rows from your table, so try to run this task during maintenance windows, which typically fall in off-peak periods.
Virendra Yaduvanshi says
Hi Andy,
Very Nice article… good work.
regards,
Virendra
Andy Hayes says
Hi Virendra
Thanks for your comment, I’m glad you liked the article.
All the best
Andy
Justice says
Hi!,
So what happens if I stop this query when it’s 80% of the way through?
TIRUMALESH says
Very Good Site…Nice articles…Keep it up…
admin says
Glad you like the site Tirumalesh and thank you for the comment.
All the best
Andy
Patrick says
Mate thanks a bunch, that is so much faster I’m very surprised!
Andy Hayes says
Hi Patrick
I’m glad this has helped.
Thanks for your comment.
All the best.
Andy
Milind says
Exactly what I was looking for. Thanks Andy!
Andy Hayes says
I’m glad you found this useful Milind.
Adnan Khan says
Thanks a lot Andy..This is exactly what I was looking for.. You saved me…
Once again thank you so much for sharing a nice article…
Andy Hayes says
My pleasure Adnan.
Ash Kumar says
Similar to your top rows solution, the following works as well:
set rowcount 50000
while (1=1)
begin
    delete mytable where
    if @@rowcount = 0
        break
end
set rowcount 0 -- reset the rowcount
Johnny says
Nice article. Assuming that disk space for the transaction log growth is not a concern, and assuming that other queries being blocked is not a concern, is a single large operation faster than multiple smaller “group type operations”? I’m not a DBA and I’m trying to get my head around the situation from a theoretical standpoint.
Andy Hayes says
Thanks very much Johnny. I’m glad you liked the article. You have raised an interesting question. Typically SQL Server is far more efficient at dealing with sets of data, so in the scenario you refer to, where disk space and locking are not a concern, I would expect a single large transaction to be faster. It’s something worthy of a blog post to try and prove.
Dilon says
Hi Andy
Great article, well explained. But can you explain why the SELECT 1 at the beginning of the code is necessary?
Thanks
Dilon
Andy Hayes says
Hi Dilon
SELECT 1 returns one row, which sets @@ROWCOUNT to 1 and gives the WHILE loop a true condition on its first pass.
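If it helps to see it written another way, here is a quick sketch (not from the post itself, and the variable name is my own) that primes the loop with a local variable instead of SELECT 1:
DECLARE @Rows INT = 1;
WHILE @Rows > 0
BEGIN
    DELETE TOP (1000)
    FROM LargeTable;
    SET @Rows = @@ROWCOUNT;
END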
Adam Feather says
I’ve just forwarded this article around my dept. as I suspect we suffered at the hands of a large delete the other day. Hope you’re well!
Andy Hayes says
Hello sir! I’m very good thanks. Hope all is well with you. Thanks for sharing my post.
Gavin Stevens says
I started down this route: 94,000,000 rows in a logging table. The data file is 95 GB in size and I’m nearing capacity on a 128 GB drive. Your method is better than a single delete command, but it would still take forever to run. I found it more effective to simply drop and re-create the logging table and then shrink the database; the file size went from 95 GB to 56 MB!
Duncan says
Hi Andy,
I learn a lot from this site, keep writing:-)
Imran Khan says
Hi Andy,
Thanks for your great article. It helped me delete 30 lakh (3 million) records in one looped query.
Well done and many thanks.
Michal says
Hi,
If it is not split up with BEGIN TRAN and COMMIT TRAN, it is still one transaction (just with more steps).
And if there is an ORDER BY clause that does not match the clustered index, it can be a huge problem.
Isn’t it better to use TRUNCATE TABLE instead?
Andy Hayes says
Hi Michal
Thanks for your question. TRUNCATE TABLE is great when you have permission to use it, when you want to clear the whole table (not a subset of rows) and when the table is not referenced by foreign key constraints. A quick example:
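(Using the same illustrative ExampleTable name as in the post:)
TRUNCATE TABLE ExampleTable;
-- Clears every row with minimal logging, cannot take a WHERE clause,
-- and fails if another table references ExampleTable via a foreign key.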
zaib tabs says
Hi,
It’s quite interesting, but in my case I have to display around 50,000 records in a report. Is there a way I can use a loop or a CTE to reduce the load on my CPU?
Currently my memory and CPU max out when a user fetches the data with
select * from items.
Can I use something like this?
DoItAgain:
Select * from items TOP (1000)
FROM ExampleTable
IF @@ROWCOUNT > 1000
GOTO DoItAgain
John says
Thanks for the article.
If you’ve got a script where you’re doing several of these, you can drop the begin/end for a single-statement WHILE:
WHILE @@ROWCOUNT > 0
BEGIN
DELETE TOP (1000)
FROM LargeTable
END
becomes
WHILE @@ROWCOUNT > 0 DELETE TOP (1000) FROM LargeTable
I hope my next database restore to development environment goes much faster when I’m purging huge tables.
Pavan says
Hi Andy,
Just to know: could we change the recovery model to bulk logged and perform the delete operation in a single transaction during a maintenance window?