In this post, I talk about deleting millions of rows in SQL Server whilst keeping impact low.
Deleting millions of rows in one transaction can throttle a SQL Server
TRUNCATE TABLE – we will presume that in this example TRUNCATE TABLE is not available: either permissions do not allow it, foreign keys referencing the table prevent the operation, or it is unsuitable because we do not want to remove all rows.
When you run something like the following to remove all rows from your table in a single transaction,
[sourcecode language=’sql’]DELETE FROM ExampleTable[/sourcecode]
SQL Server sets about writing to the transaction log all of the changes to be applied to the physical data. It will also decide how to lock the data. It’s highly likely that the optimizer will decide that a full table lock will be the most efficient way to handle the transaction.
There are potentially some big problems here,
- Your transaction log may grow to accommodate the changes being written to it. If your table is huge, you run the risk of consuming all the space on your transaction log disk.
- If your application(s) still require access to the table and a table lock has been placed on it, your application has to wait until the table becomes available. This could take some time, resulting in application timeouts and frustrated users.
- Your transaction log disk will be working hard during this period as your transaction log grows. This could decrease performance across all databases that share that disk for their transaction logs.
- Depending on how much memory you have allocated to your SQL Server buffer pool, there could be significant drops in page life expectancy, reducing performance for other queries.
- The realisation that a big performance issue is occurring lends temptation to kill the query. The trouble with that is that it can delay things even more, because the server has to roll back the transaction. Depending on how far along the operation is, this could add even more time to what the delete was originally going to take.
For example, if you kill the query when it is 90% done, the server has to roll back a 90% completed transaction. This will vary, but the rollback can take as much time as the delete had already been running! (Check progress using KILL n WITH STATUSONLY.)
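If you do end up watching a long-running delete, or the rollback after killing one, a few built-in commands can show you what is happening. This is a minimal sketch; session id 53 is just a placeholder for your own spid.
[sourcecode language='sql']
-- How full is each database's transaction log?
DBCC SQLPERF (LOGSPACE);

-- Progress of the rollback for a killed session (53 is a placeholder spid)
KILL 53 WITH STATUSONLY;

-- percent_complete is also populated for rollbacks in sys.dm_exec_requests
SELECT session_id, command, percent_complete, estimated_completion_time
FROM sys.dm_exec_requests
WHERE session_id = 53;
[/sourcecode]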
Some ways to delete millions of rows using T-SQL loops and TOP
Use a loop combined with TOP and delete rows in smaller transactions. Here are a couple of variations of the same thing. Note that I have arbitrarily chosen 1000 as a figure for demonstration purposes.
[sourcecode language=’sql’]
-- Prime @@ROWCOUNT so the WHILE condition is true on the first pass
SELECT 1;

WHILE @@ROWCOUNT > 0
BEGIN
    -- Delete in batches of 1,000 rows until no rows remain
    DELETE TOP (1000)
    FROM LargeTable;
END
[/sourcecode]
And another way…
[sourcecode language=’sql’]
DoItAgain:
-- Delete a batch of 1,000 rows, then loop back while rows are still being removed
DELETE TOP (1000)
FROM ExampleTable;

IF @@ROWCOUNT > 0
    GOTO DoItAgain;
[/sourcecode]
These are simple examples just to demonstrate the pattern. You can add WHERE clauses and JOINs to filter down to the specific rows you want to remove. You would also add error handling and explicit transactions (COMMIT/ROLLBACK); a sketch of what that might look like follows below.
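As a rough illustration of that, here is a hedged sketch of a batched delete with a filter, an explicit transaction per batch and basic error handling. ExampleTable comes from the post; the CreatedDate column and the 90-day cut-off are made up purely for the example, and THROW needs SQL Server 2012 or later.
[sourcecode language='sql']
DECLARE @BatchSize INT = 1000;
DECLARE @RowsDeleted INT = 1;

WHILE @RowsDeleted > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- CreatedDate and the 90-day cut-off are illustrative only
        DELETE TOP (@BatchSize)
        FROM ExampleTable
        WHERE CreatedDate < DATEADD(DAY, -90, GETDATE());

        SET @RowsDeleted = @@ROWCOUNT;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW; -- re-raise the error (SQL Server 2012+)
    END CATCH
END
[/sourcecode]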
Summary
It’s a bad idea to delete millions of rows in one transaction, and whilst this might sound like a no-brainer, people do try to do this and then wonder why things start to go bad.
Breaking the delete operation down into smaller transactions is better all round. It will help reduce contention on your table, reduce the probability of your transaction log outgrowing its disk, and reduce the overall performance impact.
Your transaction log disk will still be working hard as your refined delete routine removes the rows from your table. Try to run this task during a maintenance window, which is typically scheduled during off-peak periods.
Hi Andy,
Very Nice article… good work.
regards,
Virendra
Hi Virendra
Thanks for your comment, I’m glad you liked the article.
All the best
Andy
Hi!,
So what happens if I stop this query when it’s 80% of the way through?
Very Good Site…Nice articles…Keep it up…
Glad you like the site Tirumalesh and thank you for the comment.
All the best
Andy
Mate thanks a bunch, that is so much faster I’m very surprised!
Hi Patrick
I’m glad this has helped.
Thanks for your comment.
All the best.
Andy
Exactly what I was looking for. Thanks Andy!
I’m glad you found this useful, Milind.
Thanks a lot Andy..This is exactly what I was looking for.. You saved me…
Once again thank you so much for sharing a nice article…
My pleasure, Adnan.
Similar to your top rows solution, the following works as well:
set rowcount 50000
while (1=1)
begin
    delete mytable where
    if @@rowcount = 0
        break
end
set rowcount 0 -- Reset the rowcount
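A side note on that approach: Microsoft’s documentation deprecates using SET ROWCOUNT to limit DELETE, INSERT and UPDATE statements, so the TOP-based loops above are the safer long-term form. A rough TOP equivalent of the ROWCOUNT loop, keeping the 50,000 batch size and leaving the filter as a placeholder:
SELECT 1 -- primes @@ROWCOUNT so the loop starts

WHILE @@ROWCOUNT > 0
BEGIN
    -- add your own WHERE clause here to match the original filter
    DELETE TOP (50000)
    FROM mytable
END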
Nice article. Assuming that disk space for the transaction log growth is not a concern, and assuming that other queries being blocked is not a concern, is a single large operation faster than multiple smaller “group type operations”? I’m not a DBA and I’m trying to get my head around the situation from a theoretical standpoint.
Thanks very much Johnny. I’m glad you liked the article. You have raised an interesting question. Typically SQL Server is far more efficient at dealing with sets of data, so in the scenario you refer to where disk space and locking is not an issue, I would expect a single large transaction to be faster. It’s something worthy of a blog post to try and prove this.
Hi Andy
Great article, well explained. But can you explain why the SELECT 1 at the beginning of the code is necessary?
Thanks
Dilon
Hi Dilon
SELECT 1 returns one row, which sets @@ROWCOUNT to 1 and gives the WHILE loop a true condition on its first pass.
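If the SELECT 1 trick feels a little opaque, an alternative sketch is to drive the loop from a variable instead, so nothing has to prime @@ROWCOUNT; it is just another way of writing the same batching pattern:
DECLARE @RowsDeleted INT = 1; -- non-zero so the loop runs at least once

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (1000)
    FROM LargeTable;

    SET @RowsDeleted = @@ROWCOUNT;
END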
I’ve just forwarded this article around my dept. as I suspect we suffered at the hands of a large delete the other day. Hope you’re well!
Hello sir! I’m very good thanks. Hope all is well with you. Thanks for sharing my post.
I started this route. 94,000,000 rows in a logging table. The data file is 95 GB in size and I’m nearing capacity on a 128 GB drive. Your method is better than a single delete command, but it will still take forever to run. I found it more effective to simply drop and re-create the logging table and then shrink the database; the file size went from 95 GB to 56 MB!
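For anyone considering the drop-and-recreate route described in that comment, a very rough sketch of the idea follows. The table definition and logical file name are entirely made up, and bear in mind you lose all existing rows plus any indexes, triggers and permissions you do not script out first.
-- Hypothetical logging table; replace with your real definition
DROP TABLE dbo.AppLog;

CREATE TABLE dbo.AppLog
(
    LogId    BIGINT IDENTITY(1,1) PRIMARY KEY,
    LoggedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Message  NVARCHAR(4000) NULL
);

-- Optionally reclaim disk space afterwards (shrinking can fragment indexes);
-- the logical file name 'MyDatabase_Data' is a placeholder
DBCC SHRINKFILE (N'MyDatabase_Data', 1024);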
Hi Andy,
I learn a lot from this site, keep writing:-)
Hi Andy,
Thanks for your great article. It helped me delete 30 lakh (3 million) records with one loop query.
Well done and many thanks.
Hi,
If it’s not split up by BEGIN TRAN and COMMIT TRAN, it’s still one transaction (with more steps).
And if there is an ORDER BY clause that differs from the clustered index, it can be a huge problem.
Isn’t it better to use TRUNCATE TABLE instead?
Hi Michal
Thanks for your question. TRUNCATE TABLE is great when you have permission to use it, when you want to clear the whole table (not a subset of rows), and when the table is not referenced by foreign key constraints.
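For completeness, a minimal example of the statement being discussed, using the illustrative table from the post:
-- Requires ALTER permission on the table, removes ALL rows,
-- and fails if any foreign key references the table
TRUNCATE TABLE dbo.ExampleTable;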
Hi,
It’s quite interesting, but in my case I have to display around 50,000 records in a report. Is there a way I can use a loop or a CTE to reduce the load on my CPU?
Currently my memory and CPU max out when users fetch the data with
select * from items.
Can I use
DoItAgain:
Select * from items TOP (1000)
FROM ExampleTable
IF @@ROWCOUNT > 1000
GOTO DoItAgain
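That snippet will not run as written; if the goal is to read a large result set in chunks rather than delete it, one option on SQL Server 2012 and later is OFFSET/FETCH paging. A hedged sketch, where the ItemId key column is an assumption:
DECLARE @PageSize INT = 1000;
DECLARE @Offset   INT = 0;

-- ItemId is an assumed key column; adjust the ORDER BY to suit your table
SELECT *
FROM items
ORDER BY ItemId
OFFSET @Offset ROWS
FETCH NEXT @PageSize ROWS ONLY;

-- Increase @Offset by @PageSize to fetch each subsequent page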
Thanks for the article.
If you’ve got a script where you’re doing several of these, you can drop the BEGIN/END for a single-statement WHILE:
WHILE @@ROWCOUNT > 0
BEGIN
DELETE TOP (1000)
FROM LargeTable
END
becomes
WHILE @@ROWCOUNT > 0 DELETE TOP (1000) FROM LargeTable
I hope my next database restore to development environment goes much faster when I’m purging huge tables.
Hi Andy,
Just to know, can we change the recovery model to BULK_LOGGED and perform the delete operation in a single transaction during a maintenance window?