T-SQL – How to Select Top N Rows for Each Group Using ROW_NUMBER()

I’ve recently been working on a data migration project and have found myself utilizing the ROW_NUMBER() function in SQL Server a lot. This function has been around since SQL Server 2005 and at its core, provides a way to provide sequential numbering for rows returned by a query.

One of the requirements for an export process was to return the most recent two customer orders for each customer account and outputting this to CSV file.

As well as sequential numbering, ROW_NUMBER() provides a way to partition the results returned so this suited me just fine :)

Let’s take a look at the problem in more detail and I’m going to use our good friend the AdventureWorks database to help demonstrate.

The first query returns the data by customer and with the most recent orders for each customer at the top.

SELECT SalesOrderID
      ,OrderDate
      ,SalesOrderNumber
      ,AccountNumber
      ,CustomerID
      ,SubTotal
      ,TaxAmt
      ,TotalDue
  FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] 
ORDER BY CustomerID, OrderDate DESC

Here are the results of this query sampled.

t-sql return top n rows per group sql server

How to return the top two rows for each group in the result set?

We will use ROW_NUMBER() to place a sequential id on the results but also partition the results so that each instance of the customerID value has its sequential id reset.

SELECT SalesOrderID
      ,OrderDate
      ,SalesOrderNumber
      ,AccountNumber
      ,CustomerID
      ,SubTotal
      ,TaxAmt
      ,TotalDue
    ,ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY CustomerID, OrderDate DESC) AS RowNum
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]

This returns a new column called RowNum which is providing the sequential numbering for each instance of CustomerID based on the PARTITION BY and ORDER BY of the RowNum column.

t-sql return top n rows per group sql server with row_number

All that remains is to now select the first two rows for each CustomerID. So I am wrapping it up in a CTE and returning the first two rows using the WHERE clause.

WITH MyRowSet
AS
(
SELECT SalesOrderID
      ,OrderDate
      ,SalesOrderNumber
      ,AccountNumber
      ,CustomerID
      ,SubTotal
      ,TaxAmt
      ,TotalDue
    ,ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY CustomerID, OrderDate DESC) AS RowNum
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] 
)

SELECT * FROM MyRowSet WHERE RowNum <= 2

Which returns our desired result…

t-sql return top n rows per group sql server with row_number and cte

I hope you found this post useful. :)

Data Management – Finding, Removing Duplicate Rows Using SQL and Some Prevention Tips

find and remove duplicate data

Duplicate data is a common problem and in my experience comes in two main flavours; exact duplicate data and similar data. Exact duplicate data is when a data source contains the same values across all fields in the record. Similar data is when some … [Continue reading]

T-SQL – Determine the Difference in Values Between Columns in Different Rows

Calculate Difference Between Columns in Different Rows

Have you ever needed to calculate the difference between columns in two or more rows in a table using a T-SQL query? For example you might want to run some trend analysis or audit data for changes. Thankfully using ROW_NUMBER() and Common Table … [Continue reading]

SQL – Using CASE with ORDER BY in Stored Procedures

When writing SQL it is possible to use ORDER BY with CASE conditions to help provide more flexibility over the returned order of a record set and this is especially useful when writing stored procedures which require a dynamic ORDER BY clause. This … [Continue reading]

How to Produce CSV Format Using T-SQL

There may be some requirements where you need to produce a csv output from a table. There are some good, efficient ways to do this and there are also some less efficient ways. Let us now look at one optimized approach and a couple of lesser optimized … [Continue reading]