There may be some requirements where you need to produce a csv output from a table. There are some good, efficient ways to do this and there are also some less efficient ways. Let us now look at one optimized approach and a couple of lesser optimized approaches 😉
SQL performs best working with sets
This is true and should always be the first thought when designing a query.
So instead of querying the database row by row by firing multiple select statements, using loops or cursors, always try and think how can you achieve the operation in a set based way.
To help demonstrate this thought process, lets take this example table and its data
T-SQL:
SET NOCOUNT ON; DECLARE @Guid UNIQUEIDENTIFIER = NEWID() CREATE TABLE CSVOutputDemo ( ID INT PRIMARY KEY IDENTITY(1,1), ReferenceId UNIQUEIDENTIFIER NOT NULL, ImageName VARCHAR(500) ) --add some data INSERT INTO CSVOutputDemo(ReferenceId,ImageName) VALUES (@Guid, 'image1.jpg'), (@Guid, 'image2.jpg'), (@Guid, 'image3.jpg') SELECT * FROM CSVOutputDemo ID ReferenceId ImageName ----------- ------------------------------------ -------------------------------------------------- 1 6083FFAF-8C29-489A-9486-07F2BABDC264 image1.jpg 2 6083FFAF-8C29-489A-9486-07F2BABDC264 image2.jpg 3 6083FFAF-8C29-489A-9486-07F2BABDC264 image3.jpg
The desired output is that we want all the values in the ImageURL field to be returned from the table as a comma separated value string, aka CSV output
image1,jpg,image2.jpg,image3.jpg
How to take three strings from three rows and turn them into one string on a single row?
You can take advantage of FOR XML which is used in conjunction with the SELECT statement. It’s really neat 🙂
By placing FOR XML PATH(”) at the end of the query, you can see that already, the rows returned have reduced from 3 to 1
SELECT ImageName FROM CSVOutputDemo FOR XML PATH('')
Returns…
XML_F52E2B61-18A1-11d1-B105-00805F49916B <ImageName>image1.jpg</ImageName<ImageName>image2.jpg</ImageName><ImageName>image3.jpg</ImageName>
However, the output isn’t quite as we want it just yet so we have to make some adjustments to add the comma’s
SELECT ',' + ImageName FROM CSVOutputDemo FOR XML PATH('') XML_F52E2B61-18A1-11d1-B105-00805F49916B ---------------------------------------- ,image1.jpg,image2.jpg,image3.jpg
We now have the CSV but there is a little bit of tidying up to do on the result to remove the comma at the start of the string and for this we will use STUFF to add an empty string in place of the first instance of the comma.
SELECT STUFF ((SELECT ',' + ImageName FROM CSVOutputDemo FOR XML PATH('')), 1, 1, '') AS ImageCSV ImageCSV --------------------------------- image1.jpg,image2.jpg,image3.jpg
FOR XML PATH(”) is an elegant solution but what about the alternative row by row approach?
This example uses a WHILE loop
DECLARE @ID INT DECLARE @ImageName VARCHAR(50) DECLARE @ImageCSV VARCHAR(500) = '' SET @ID = (SELECT MIN(ID) FROM CSVOutputDemo) WHILE @ID IS NOT NULL BEGIN SET @ImageName = (SELECT ImageName FROM CSVOutputDemo WHERE ID = @ID) SET @ImageCSV = @ImageCSV + ',' + @ImageName SET @ID = (SELECT MIN(ID) FROM CSVOutputDemo WHERE ID > @ID) END SELECT STUFF(@ImageCSV, 1, 1, '') AS ImageCSV
As you can see, this is a lot more code for the same output. There is an alternate row by row solution which uses a T-SQL cursor.
DECLARE @ID INT DECLARE @ImageName VARCHAR(50) DECLARE @ImageCSV VARCHAR(500) = '' DECLARE CursorImage CURSOR FOR SELECT ImageName FROM CSVOutputDemo OPEN CursorImage FETCH NEXT FROM CursorImage INTO @ImageName WHILE @@FETCH_STATUS <> -1 BEGIN SET @ImageCSV = @ImageCSV + ',' + @ImageName FETCH NEXT FROM CursorImage INTO @ImageName END CLOSE CursorImage DEALLOCATE CursorImage SELECT STUFF(@ImageCSV, 1, 1, '') AS ImageCSV
The performance comparison is best seen when STATISTICS IO is enabled.
Here are the results:
FOR XML PATH(”)
Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
T-SQL WHILE LOOP
Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
T-SQL CURSOR
Table 'Worktable'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'CSVOutputDemo'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The set based operation involving the FOR XML PATH(”) solution makes one trip to the database however the other two solutions involve many more requests which will have a negative effect on performance.
Leave a Reply