In this post, I talk about deleting millions of rows in SQL Server whilst keeping impact low.
Deleting millions of rows in one transaction can throttle a SQL Server
TRUNCATE TABLE – we will presume that in this example TRUNCATE TABLE is not available: either permissions do not allow it, foreign keys referencing the table prevent the operation, or it is unsuitable because we do not want to remove all rows.
When you run something like the following to remove all rows from your table in a single transaction,
[sourcecode language=’sql’]DELETE FROM ExampleTable[/sourcecode]
SQL Server sets about writing to the transaction log all of the changes to be applied to the physical data. It will also decide how to lock the data. It’s highly likely that the optimizer will decide that a full table lock will be the most efficient way to handle the transaction.
There are potentially some big problems here,
- Your transaction log may grow to accommodate the changes being written to it. If your table is huge, you run the risk of consuming all the space on your transaction log disk.
- If your application(s) still require access to the table and a table lock has been placed on it, your application has to wait until the table becomes available. This could take some time, resulting in application timeouts and frustrated users.
- Your transaction log disk will be working hard during this period as your transaction log grows. This could decrease performance across all databases that share that disk for their transaction logs.
- Depending on how much memory you have allocated to your SQL Server buffer pool, there could be significant drops in page life expectancy, reducing performance for other queries.
- The realisation that a big performance issue is occurring lends temptation to kill the query. The trouble with that is that it can delay things even more, because the server has to roll back the transaction. Depending on how far along the operation is, this could add even more time to what the delete was originally going to take.
For example, if you kill the query when it is 90% done, the server has to roll back a 90% completed transaction. This will vary, but the rollback can take as much time as the delete had already been running! (Check progress using KILL n WITH STATUSONLY.)
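If you do end up watching a long-running delete, or the rollback after killing one, a few built-in commands can show you what is happening. This is a minimal sketch; session id 53 is just a placeholder for your own spid.
[sourcecode language='sql']
-- How full is each database's transaction log?
DBCC SQLPERF (LOGSPACE);

-- Progress of the rollback for a killed session (53 is a placeholder spid)
KILL 53 WITH STATUSONLY;

-- percent_complete is also populated for rollbacks in sys.dm_exec_requests
SELECT session_id, command, percent_complete, estimated_completion_time
FROM sys.dm_exec_requests
WHERE session_id = 53;
[/sourcecode]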
Some ways to delete millions of rows using T-SQL loops and TOP
Use a loop combined with TOP and delete rows in smaller transactions. Here are a couple of variations of the same thing. Note that I have arbitrarily chosen 1000 as a figure for demonstration purposes.
[sourcecode language=’sql’]
-- Prime @@ROWCOUNT so the WHILE condition is true on the first pass
SELECT 1;

WHILE @@ROWCOUNT > 0
BEGIN
    -- Delete in batches of 1,000 rows until no rows remain
    DELETE TOP (1000)
    FROM LargeTable;
END
[/sourcecode]
And another way…
[sourcecode language=’sql’]
DoItAgain:
-- Delete a batch of 1,000 rows, then loop back while rows are still being removed
DELETE TOP (1000)
FROM ExampleTable;

IF @@ROWCOUNT > 0
    GOTO DoItAgain;
[/sourcecode]
These are simple examples just to demonstrate the pattern. You can add WHERE clauses and JOINs to filter down to the specific rows you want to remove. You would also add error handling and explicit transactions (COMMIT/ROLLBACK); a sketch of what that might look like follows below.
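As a rough illustration of that, here is a hedged sketch of a batched delete with a filter, an explicit transaction per batch and basic error handling. ExampleTable comes from the post; the CreatedDate column and the 90-day cut-off are made up purely for the example, and THROW needs SQL Server 2012 or later.
[sourcecode language='sql']
DECLARE @BatchSize INT = 1000;
DECLARE @RowsDeleted INT = 1;

WHILE @RowsDeleted > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- CreatedDate and the 90-day cut-off are illustrative only
        DELETE TOP (@BatchSize)
        FROM ExampleTable
        WHERE CreatedDate < DATEADD(DAY, -90, GETDATE());

        SET @RowsDeleted = @@ROWCOUNT;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW; -- re-raise the error (SQL Server 2012+)
    END CATCH
END
[/sourcecode]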
Summary
It’s a bad idea to delete millions of rows in one transaction, and whilst this might sound like a no-brainer, people do try to do this and then wonder why things start to go bad.
Breaking the delete operation down into smaller transactions is better all round. It will help reduce contention on your table, reduce the probability of your transaction log outgrowing its disk, and reduce the overall performance impact.
Your transaction log disk will still be working hard as your refined delete routine removes the rows from your table. Try to run this task during a maintenance window, which is typically scheduled during off-peak periods.
Hi Andy,
Very Nice article… good work.
regards,
Virendra
Hi Virendra
Thanks for your comment, I’m glad you liked the article.
All the best
Andy
Hi!,
So what happens if I stop this query when it’s 80% of the way through?
Very Good Site…Nice articles…Keep it up…
Glad you like the site Tirumalesh and thank you for the comment.
All the best
Andy
Mate thanks a bunch, that is so much faster I’m very surprised!
Hi Patrick
I’m glad this has helped.
Thanks for your comment.
All the best.
Andy
Exactly what I was looking for. Thanks Andy!
I’m glad you found this useful, Milind.
Thanks a lot Andy..This is exactly what I was looking for.. You saved me…
Once again thank you so much for sharing a nice article…
My pleasure, Adnan.
Similar to your top rows solution, the following works as well:
set rowcount 50000
while (1=1)
begin
    delete mytable where
    if @@rowcount = 0
        break
end
set rowcount 0 -- Reset the rowcount
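A side note on that approach: Microsoft’s documentation deprecates using SET ROWCOUNT to limit DELETE, INSERT and UPDATE statements, so the TOP-based loops above are the safer long-term form. A rough TOP equivalent of the ROWCOUNT loop, keeping the 50,000 batch size and leaving the filter as a placeholder:
SELECT 1 -- primes @@ROWCOUNT so the loop starts

WHILE @@ROWCOUNT > 0
BEGIN
    -- add your own WHERE clause here to match the original filter
    DELETE TOP (50000)
    FROM mytable
END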
Nice article. Assuming that disk space for the transaction log growth is not a concern, and assuming that other queries being blocked is not a concern, is a single large operation faster than multiple smaller “group type operations”? I’m not a DBA and I’m trying to get my head around the situation from a theoretical standpoint.
Thanks very much Johnny. I’m glad you liked the article. You have raised an interesting question. Typically SQL Server is far more efficient at dealing with sets of data, so in the scenario you refer to where disk space and locking is not an issue, I would expect a single large transaction to be faster. It’s something worthy of a blog post to try and prove this.
Hi Andy
Great article, well explained. But can you explain why the SELECT 1 at the beginning of the code is necessary?
Thanks
Dilon
Hi Dilon
SELECT 1 returns one row, which sets @@ROWCOUNT to 1 and gives the WHILE loop a true condition on its first pass.
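If the SELECT 1 trick feels a little opaque, an alternative sketch is to drive the loop from a variable instead, so nothing has to prime @@ROWCOUNT; it is just another way of writing the same batching pattern:
DECLARE @RowsDeleted INT = 1; -- non-zero so the loop runs at least once

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (1000)
    FROM LargeTable;

    SET @RowsDeleted = @@ROWCOUNT;
END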
I’ve just forwarded this article around my dept. as I suspect we suffered at the hands of a large delete the other day. Hope you’re well!
Hello sir! I’m very good thanks. Hope all is well with you. Thanks for sharing my post.
I started this route. 94,000,000 rows in a logging table. The data file is 95 GB in size and I’m nearing capacity on a 128 GB drive. Your method is better than a single delete command, but it will still take forever to run. I found it more effective to simply drop and re-create the logging table and then shrink the database; the file size went from 95 GB to 56 MB!
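For anyone considering the drop-and-recreate route described in that comment, a very rough sketch of the idea follows. The table definition and logical file name are entirely made up, and bear in mind you lose all existing rows plus any indexes, triggers and permissions you do not script out first.
-- Hypothetical logging table; replace with your real definition
DROP TABLE dbo.AppLog;

CREATE TABLE dbo.AppLog
(
    LogId    BIGINT IDENTITY(1,1) PRIMARY KEY,
    LoggedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Message  NVARCHAR(4000) NULL
);

-- Optionally reclaim disk space afterwards (shrinking can fragment indexes);
-- the logical file name 'MyDatabase_Data' is a placeholder
DBCC SHRINKFILE (N'MyDatabase_Data', 1024);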
Hi Andy,
I learn a lot from this site, keep writing:-)
Hi Andy,
Thanks for your great article. It helped me delete 30 lakh (3 million) records with one loop query.
Well done and many thanks.
Hi,
If it’s not split up by BEGIN TRAN and COMMIT TRAN, it’s still one transaction (with more steps).
And if there is an ORDER BY clause that differs from the clustered index, it can be a huge problem.
Isn’t it better to use TRUNCATE TABLE instead?
Hi Michal
Thanks for your question. TRUNCATE TABLE is great when you have permission to use it, when you want to clear the whole table (not a subset of rows), and when the table is not referenced by foreign key constraints.
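For completeness, a minimal example of the statement being discussed, using the illustrative table from the post:
-- Requires ALTER permission on the table, removes ALL rows,
-- and fails if any foreign key references the table
TRUNCATE TABLE dbo.ExampleTable;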
Hi,
It’s quite interesting, but in my case I have to display around 50,000 records in a report. Is there a way I can use a loop or a CTE to reduce the load on my CPU?
Currently my memory and CPU max out when users fetch the data with
select * from items.
Can I use
DoItAgain:
Select * from items TOP (1000)
FROM ExampleTable
IF @@ROWCOUNT > 1000
GOTO DoItAgain
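That snippet will not run as written; if the goal is to read a large result set in chunks rather than delete it, one option on SQL Server 2012 and later is OFFSET/FETCH paging. A hedged sketch, where the ItemId key column is an assumption:
DECLARE @PageSize INT = 1000;
DECLARE @Offset   INT = 0;

-- ItemId is an assumed key column; adjust the ORDER BY to suit your table
SELECT *
FROM items
ORDER BY ItemId
OFFSET @Offset ROWS
FETCH NEXT @PageSize ROWS ONLY;

-- Increase @Offset by @PageSize to fetch each subsequent page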
Thanks for the article.
If you’ve got a script where you’re doing several of these, you can drop the BEGIN/END for a single-statement WHILE:
WHILE @@ROWCOUNT > 0
BEGIN
DELETE TOP (1000)
FROM LargeTable
END
becomes
WHILE @@ROWCOUNT > 0 DELETE TOP (1000) FROM LargeTable
I hope my next database restore to development environment goes much faster when I’m purging huge tables.
Hi Andy,
Just to know, can we change the recovery model to BULK_LOGGED and perform the delete operation in a single transaction during a maintenance window?