DBA Diaries

Thoughts and experiences of a DBA working with SQL Server and MySQL

How to Produce CSV Format Using T-SQL

Posted on May 30, 2015 Written by Andy Hayes Leave a Comment

There may be some requirements where you need to produce a csv output from a table. There are some good, efficient ways to do this and there are also some less efficient ways. Let us now look at one optimized approach and a couple of lesser optimized approaches 😉

SQL performs best working with sets

This is true and should always be the first thought when designing a query.

So instead of querying the database row by row by firing multiple select statements, using loops or cursors, always try and think how can you achieve the operation in a set based way.

To help demonstrate this thought process, lets take this example table and its data

T-SQL:

SET NOCOUNT ON;
DECLARE @Guid UNIQUEIDENTIFIER = NEWID()

CREATE TABLE CSVOutputDemo
(
ID INT PRIMARY KEY IDENTITY(1,1),
ReferenceId UNIQUEIDENTIFIER NOT NULL,
ImageName VARCHAR(500)
)

--add some data

INSERT INTO CSVOutputDemo(ReferenceId,ImageName)
VALUES
(@Guid, 'image1.jpg'),
(@Guid, 'image2.jpg'),
(@Guid, 'image3.jpg')

SELECT * FROM CSVOutputDemo
ID          ReferenceId                          ImageName
----------- ------------------------------------ --------------------------------------------------
1           6083FFAF-8C29-489A-9486-07F2BABDC264 image1.jpg
2           6083FFAF-8C29-489A-9486-07F2BABDC264 image2.jpg
3           6083FFAF-8C29-489A-9486-07F2BABDC264 image3.jpg

The desired output is that we want all the values in the ImageURL field to be returned from the table as a comma separated value string, aka CSV output

image1,jpg,image2.jpg,image3.jpg

How to take three strings from three rows and turn them into one string on a single row?

You can take advantage of FOR XML which is used in conjunction with the SELECT statement. It’s really neat 🙂

By placing FOR XML PATH(”) at the end of the query, you can see that already, the rows returned have reduced from 3 to 1

SELECT ImageName FROM CSVOutputDemo FOR XML PATH('')

Returns…

XML_F52E2B61-18A1-11d1-B105-00805F49916B
<ImageName>image1.jpg</ImageName<ImageName>image2.jpg</ImageName><ImageName>image3.jpg</ImageName>

However, the output isn’t quite as we want it just yet so we have to make some adjustments to add the comma’s

SELECT ',' + ImageName FROM CSVOutputDemo FOR XML PATH('')

XML_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------
,image1.jpg,image2.jpg,image3.jpg

We now have the CSV but there is a little bit of tidying up to do on the result to remove the comma at the start of the string and for this we will use STUFF to add an empty string in place of the first instance of the comma.

SELECT STUFF ((SELECT ',' + ImageName FROM CSVOutputDemo FOR XML PATH('')), 1, 1, '') AS ImageCSV

ImageCSV
---------------------------------
image1.jpg,image2.jpg,image3.jpg

FOR XML PATH(”) is an elegant solution but what about the alternative row by row approach?

This example uses a WHILE loop

DECLARE @ID INT
DECLARE @ImageName VARCHAR(50)
DECLARE @ImageCSV VARCHAR(500) = ''

SET @ID = (SELECT MIN(ID) FROM CSVOutputDemo)
WHILE @ID IS NOT NULL
BEGIN
  SET @ImageName = (SELECT ImageName FROM CSVOutputDemo WHERE ID = @ID)
  SET @ImageCSV = @ImageCSV + ',' + @ImageName
  SET @ID = (SELECT MIN(ID) FROM CSVOutputDemo WHERE ID > @ID)
END

SELECT STUFF(@ImageCSV, 1, 1, '') AS ImageCSV

As you can see, this is a lot more code for the same output. There is an alternate row by row solution which uses a T-SQL cursor.

DECLARE @ID INT
DECLARE @ImageName VARCHAR(50)
DECLARE @ImageCSV VARCHAR(500) = ''

DECLARE CursorImage CURSOR FOR
SELECT ImageName FROM CSVOutputDemo
OPEN CursorImage
FETCH NEXT FROM CursorImage INTO @ImageName
WHILE @@FETCH_STATUS <> -1
BEGIN
	SET @ImageCSV = @ImageCSV + ',' + @ImageName
	FETCH NEXT FROM CursorImage INTO @ImageName
END
CLOSE CursorImage
DEALLOCATE CursorImage

SELECT STUFF(@ImageCSV, 1, 1, '') AS ImageCSV

The performance comparison is best seen when STATISTICS IO is enabled.

Here are the results:

FOR XML PATH(”)

Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

T-SQL WHILE LOOP

Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

T-SQL CURSOR

Table 'Worktable'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CSVOutputDemo'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The set based operation involving the FOR XML PATH(”) solution makes one trip to the database however the other two solutions involve many more requests which will have a negative effect on performance.

Filed Under: All Articles, SQL Tips and Tricks Tagged With: performance, sql server, t-sql

T-SQL – Derived Tables Demonstrated and Explained

Posted on October 18, 2014 Written by Andy Hayes Leave a Comment

T-SQL is a very powerful language when dealing with sets and in this post, I am going to be demonstrating how to use derived tables.

Derived tables are not the same as a temporary or physical tables where the fields and data types are declared. Nor are they subqueries. Instead, as the name implies, the table is derived from an existing table. I liken them to views but a view is an object that is created and is re-usable. A derived table is only available at execution time and not a re-usable object within the query. In other words, you cannot reference it multiple times like you can with common table expressions (CTE’s).

Basic syntax of the derived table

SELECT Field1, Field2
FROM
(SELECT *
FROM Table) Table1

So lets put the syntax to something more real:

USE AdventureWorks2012
GO
SELECT TOP 10 FirstName, LastName
FROM
(SELECT * FROM Person.Person) AS PersonDerivedTable
WHERE LastName = 'Smith';
FirstName       LastName
--------------- ---------------
Abigail         Smith
Adriana         Smith
Alexander       Smith
Alexandra       Smith
Alexis          Smith
Allen           Smith
Alyssa          Smith
Andre           Smith
Andrew          Smith
Arthur          Smith

(10 row(s) affected)

You might be thinking that you can achieve the same thing without the derived table and you would be correct. However where derived tables start to become really useful are when you want to start using aggregated statements.

Derived tables used with aggregated queries

For example, the following query aggregates total sales orders by year as one query and combines the results of the total sales orders per sales rep, per year of the second query. The purpose is to get the percentage of sales orders by sales rep for each year.

USE AdventureWorks2012
GO

SELECT YearTotalsBySalesPerson.*
, YearTotals.TotalOrdersForYear
, CONVERT(DECIMAL(10,2), 100 * CONVERT(DECIMAL(10,2), YearTotalsBySalesPerson.SalesPersonTotalOrders)
/ CONVERT(DECIMAL(10,2), YearTotals.TotalOrdersForYear)) AS PercentageOfSalesForYear
FROM
--total sales orders per year
(SELECT YEAR(OrderDate) AS [Year]
, COUNT(SalesOrderID) AS TotalOrdersForYear
FROM Sales.SalesOrderHeader SOH
WHERE SalesPersonID IS NOT NULL
GROUP BY YEAR(OrderDate)) YearTotals

JOIN
--total sales orders by sales rep per year
(SELECT YEAR(OrderDate) AS [Year]
, SalesPersonID
, COUNT(SalesOrderID) AS SalesPersonTotalOrders
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
GROUP BY YEAR(OrderDate), SalesPersonID) YearTotalsBySalesPerson

ON YearTotals.[Year] = YearTotalsBySalesPerson.[Year]

Returning the results (sample)

Year        SalesPersonID SalesPersonTotalOrders TotalOrdersForYear PercentageOfSalesForYear
----------- ------------- ---------------------- ------------------ ---------------------------------------
2008        283           41                     901                4.55
2008        277           97                     901                10.77
2006        289           84                     1015               8.28
2007        279           158                    1524               10.37
2006        278           69                     1015               6.80
2007        285           9                      1524               0.59
2006        275           139                    1015               13.69
2008        286           56                     901                6.22
2005        274           4                      366                1.09
2008        280           12                     901                1.33
2005        282           51                     366                13.93

Derived tables can simplify code writing

Take the following example query which uses a CASE statement combined with GROUP BY

SELECT YEAR(OrderDate) AS [Year]
,CASE WHEN MONTH(OrderDate) IN (1,2,3) THEN 1
WHEN MONTH(OrderDate) IN (4,5,6) THEN 2
WHEN MONTH(OrderDate) IN (7,8,9) THEN 3
WHEN MONTH(OrderDate) IN (10,11,12) THEN 4 END AS [Quarter]
,COUNT(SalesOrderID) AS Orders
FROM Sales.SalesOrderHeader
GROUP BY
YEAR(OrderDate)
,CASE WHEN MONTH(OrderDate) IN (1,2,3) THEN 1
WHEN MONTH(OrderDate) IN (4,5,6) THEN 2
WHEN MONTH(OrderDate) IN (7,8,9) THEN 3
WHEN MONTH(OrderDate) IN (10,11,12) THEN 4 END
ORDER BY [Year], [Quarter]

This can simplified with a derived table.

SELECT [Year], [Quarter], COUNT(SalesOrderID) AS Orders
FROM
(SELECT YEAR(OrderDate) AS [Year]
,CASE WHEN MONTH(OrderDate) IN (1,2,3) THEN 1
WHEN MONTH(OrderDate) IN (4,5,6) THEN 2
WHEN MONTH(OrderDate) IN (7,8,9) THEN 3
WHEN MONTH(OrderDate) IN (10,11,12) THEN 4 END AS [Quarter]
,SalesOrderID
FROM Sales.SalesOrderHeader) YearQuarter
GROUP BY [Year], [Quarter]
ORDER BY [Year], [Quarter]

You can see that there is no need to write the CASE statement twice for the second version.

Summary

  • A derived table cannot be re-used within a query. Consider using a CTE or view in that instance.
  • A derived table is not a physical object within the database, it only exists at execution time.
  • Derived tables are not exclusive to SQL Server, other RDBMS systems support them, MySQL, Oracle, etc.
  • They are great for performing extra calculations on aggregated queries by enabling you to join sets of data together.
  • Can be used to help simply code writing, for example GROUP BY CASE WHEN statements as above.

Filed Under: All Articles, SQL Tips and Tricks Tagged With: sql, sql server, t-sql

Using SELECT TOP WITH TIES in SQL Server

Posted on June 11, 2014 Written by Andy Hayes 2 Comments

I’ve used SELECT TOP (x) many, many times. After learning how to use SELECT, it’s something I quickly learned to use and remains something which I continue to use often.

I’ve also used TOP (x) with UPDATE and DELETE statements to limit the number of rows updated or deleted in a statement. I’ve never had to use SELECT TOP WITH TIES though.

I’ve glanced through the documentation in the past and thought – “must learn that one day” but never had the time or need to. I guess some things you learn how to use only when you have to.

So this post is dedicated to my learning experience of using SELECT TOP WITH TIES 🙂

What does SELECT TOP WITH TIES do?

Let’s take an example. Say you wanted to retrieve the first 5 rows from a table and you ordered by a specific column. So you get back 10 rows ordered by your column.

Now apply SELECT TOP (5) WITH TIES and any row which has a column value that matches the last row’s column value as defined by the ORDER BY clause will also be returned in the results. So you get more rows than what is specified in TOP (x).

Let’s do an example 🙂

CREATE TABLE FamousTransformers
(ID SMALLINT PRIMARY KEY IDENTITY(1,1),
FirstName VARCHAR(100) NOT NULL,
LastName VARCHAR(100) NOT NULL,
FavouriteColour VARCHAR(50) NOT NULL);

--populate with some data (6 rows to demonstrate this working)
INSERT INTO FamousTransformers(FirstName, LastName, FavouriteColour)
VALUES('Clark','Kent','Blue');
INSERT INTO FamousTransformers(FirstName, LastName, FavouriteColour)
VALUES('Bruce','Wayne','Black');
INSERT INTO FamousTransformers(FirstName, LastName, FavouriteColour)
VALUES('Peter','Parker','Red');
INSERT INTO FamousTransformers(FirstName, LastName, FavouriteColour)
VALUES('Optimus','Prime','Red');
INSERT INTO FamousTransformers(FirstName, LastName, FavouriteColour)
VALUES('David','Banner','Green');
INSERT INTO FamousTransformers(FirstName, LastName, FavouriteColour)
VALUES('Rodimus','Prime','Orange');

--select using normal TOP(x)

SELECT TOP (5) FirstName, LastName, FavouriteColour
FROM FamousTransformers
ORDER BY FavouriteColour

Returns….

FirstName            LastName             FavouriteColour
-------------------- -------------------- --------------------
Bruce                Wayne                Black
Clark                Kent                 Blue
David                Banner               Green
Rodimus              Prime                Orange
Optimus              Prime                Red

(5 row(s) affected)
--now execute using WITH TIES
SELECT TOP (5) WITH TIES FirstName, LastName, FavouriteColour
FROM FamousTransformers
ORDER BY FavouriteColour

FirstName            LastName             FavouriteColour
-------------------- -------------------- --------------------
Bruce                Wayne                Black
Clark                Kent                 Blue
David                Banner               Green
Rodimus              Prime                Orange
Peter                Parker               Red
Optimus              Prime                Red

(6 row(s) affected)

So you can see the extra row appearing there where “Optimus Prime” is displayed because he has the same favourite colour as “Peter Parker”. The FavouriteColour column is the key here as defined in the ORDER BY statement.

Actual use cases for SELECT TOP WITH TIES

I can think of one straight off the top of my head. If you had a league table of top performing sales people and wanted to bring back the top three performers, what if two sales people were tied for 3rd and 4th place on revenue? No way to differentiate them, they should both be returned in the results and share 3rd place.

And finally….

You must use ORDER BY and TOP clauses together if you want to use WITH TIES.

Filed Under: All Articles, SQL Tips and Tricks Tagged With: sql server, t-sql

Comparing Ways to Get Table Row Counts in MySQL and SQL Server

Posted on January 27, 2014 Written by Andy Hayes Leave a Comment

Get Row Counts MySQL SQL ServerI work in a mixed database server environment where applications are either powered by SQL Server, MySQL and more recently MongoDB. I started my career as a DBA working with SQL Server back in around 2003. In the last 5-6 years I have also been working with MySQL and more recently MongoDB which came into our organization last year.

In my time working with SQL Server it has become a mammoth product full of features and functionality. MySQL is not the same beast, however it is one of the most popular open source RDBMS in use today.

It is when I see something in MySQL that is straight forward to achieve but not so in SQL Server, that I scratch my head, wonder and hope that it will be simplified in future versions. I guess I presume too much.

An example of this was when I wrote up about a new feature in SQL Server 2012 for paging result sets using LIMIT – something which has existed in MySQL for a long time prior to SQL Server 2012.

I had another head scratching moment last week…..

As part of a task I was doing, I wanted to write some T-SQL to pull back a list of tables and their row counts. This task was across all databases on one of our servers. Typically I use the object explorer details view in Management Studio but I wanted to get the results out into a spreadsheet.

I wasn’t too bothered if the counts were accurate, more that I wanted to know which were populated by data. I also wanted a quick answer. I therefore was not prepared to use SELECT COUNT as it would be slow.

So I thought that this should be straight forward to achieve using INFORMATION_SCHEMA.TABLES, sadly it wasn’t or rather it wasn’t as easy as using MySQL.

So lets look at some ways to do this.

In MySQL, you can simply write something like this which is fast and approximate:

SELECT table_name, table_schema, table_rows
FROM information_schema.tables
WHERE table_schema not in ('mysql','performance_schema','information_schema');

To get the results out to a file by the way, you can run this from the command line (I’ve wrapped the lines for viewing purposes):

mysql -uroot -pYourPassword -e
"SELECT table_name, table_schema, table_rows
FROM information_schema.tables
WHERE table_schema not in
('mysql'
,'performance_schema'
,'information_schema')" > /path/filename.csv;

In SQL Server, querying INFORMATION_SCHEMA.TABLES does not bear the same fruit. Sadly, row counts are not displayed.

This is how I did it. I referenced SQL Server expert Pinal Dave’s article 🙂

There is usually someone out there who has found the answer already and has been very kind to share their solution with the world. They still had to invest some of their own spare time figuring it out though.

SELECT sc.name +'.'+ ta.name TableName, SUM(pa.rows) RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa
ON pa.OBJECT_ID = ta.OBJECT_ID
INNER JOIN sys.schemas sc
ON ta.schema_id = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (1,0)
GROUP BY sc.name,ta.name
ORDER BY SUM(pa.rows) DESC

This is a more work to achieve the same result.

On the plus side, if you don’t do what I did by searching the web, you haven’t read this or any other article on the subject and you have the time, you can figure it out yourself which will be a lot more rewarding personally.

There are other ways to achieve this, take a look at this excellent article too.

I think SQL Server it is a great product and in many ways a superior one to MySQL. It’s just the simple things that I like in life (and to see in all database server products) that keep me a happy DBA 😉

Filed Under: All Articles, SQL Tips and Tricks Tagged With: mysql, sql server

  • « Previous Page
  • 1
  • 2
  • 3
  • 4
  • 5
  • Next Page »

Categories

  • All Articles (82)
  • Career Development (8)
  • MySQL Administration (18)
  • MySQL Performance (2)
  • SQL Server Administration (24)
  • SQL Server News (3)
  • SQL Server Performance (14)
  • SQL Server Security (3)
  • SQL Tips and Tricks (19)

Top 10 Popular Posts

  • Using sp_change_users_login to fix SQL Server orphaned users
  • How to shrink tempdb
  • How to Transfer Logins to Another SQL Server or Instance
  • How to Delete Millions of Rows using T-SQL with Reduced Impact
  • MySQL SHOW USERS? – How to List All MySQL Users and Privileges
  • New T-SQL features in SQL Server 2012 – OFFSET and FETCH
  • T-SQL – How to Select Top N Rows for Each Group Using ROW_NUMBER()
  • How to move tempdb
  • How to Kill All MySQL Processes For a Specific User
  • Using exec sp_who2 to help with SQL Server troubleshooting

Recent Posts

  • How to Setup MySQL Master Master Replication
  • How To Use SQL to Convert a STRING to an INT
  • How to set up MySQL Replication Tutorial
  • How to Use SQL CASE for Conditional Logic in Your SQL Queries
  • Using ISNULL in SQL Server to Replace NULL Values

Search

Connect

  • Twitter
  • Facebook
  • Google+
  • RSS

About

  • Cookie Policy
  • Disclaimer
  • About
Copyright © ‘2019’ DBA Diaries built on the Genesis Framework

This site uses cookies. We assume you are happy with cookies but click the link if you are not. Close