Count Total Rows while Paging and Avoid Two Queries

This is well covered in many other places, but I’m posting on my own blog so that I can more easily find this trick.

You can do paging in SQL Server 2012 and later using the OFFSET-FETCH syntax. This is useful for returning a single page’s worth of records from a larger (sorted) set of records. To count the total number of records, however, you’d normally do a second query. The T-SQL below shows how to do a total count as part of the query that returns the page in question. The count is repeated for every record, but it avoids having to do two queries.

select FirstName, LastName, COUNT(*) over () as TotalCount
  from Person
  where LastName like 'mc%'
  order by LastName, FirstName
  offset 500 rows
  fetch next 100 rows only

Here’s what the output looks like, returning records 501-600 out of 1,968 records having a last name that starts with “Mc”.

161109-1

SQL Server -Dump Info on Keys

Here’s a short stored procedure that could be useful when trying to understand a SQL Server database.

Let’s say that you have a simple schema like the one below. Person has GroupID as a foreign key (Group.GroupID), meaning that a person belongs to a single group. A person can also have multiple contact records, so PersonContact has a foreign key indicating which person the contact is for.

Keys

In a simple schema like this, you can quickly see the two relationships. You can also see some of this by looking at the Person table in Management Studio, under Columns and Keys. In the diagram below, we see that GroupID is a foreign key and, based on the naming convention, we infer that the corresponding primary key is in the Group table.

MgmtStudio

This notation relies on the naming convention used for the FK_Person_Group relationship. Also, while looking at the Person table, we’re unable to see which tables might contain foreign keys that refer to the primary key in the Person table.

Below is a simple stored procedure that dumps out all relationships that a specified table participates in. That is, given a table, it tells you:

  • All foreign keys that reference the specified table’s primary key
  • All foreign keys in the specified table and where the corresponding primary key is located
-- Given specified table, show all PK/FK relationships.  Show:
--   1) All foreign keys that reference this table's primary key
--   2) All foreign keys in this table and which table contains primary key
create procedure uKeys
(
	@TableName varchar(255)
)
as
begin
select tfk.name as PKTable, cfk.name as PKColumn, tpk.name as FKTable, cpk.name as FKColumn from sys.foreign_key_columns fk
  inner join sys.tables as tpk on fk.parent_object_id=tpk.object_id
  inner join sys.columns as cpk on fk.parent_object_id=cpk.object_id and fk.parent_column_id=cpk.column_id
  inner join sys.tables as tfk on fk.referenced_object_id=tfk.object_id
  inner join sys.columns as cfk on fk.referenced_object_id=cfk.object_id and fk.referenced_column_id=cfk.column_id
  where (tpk.name = @TableName) or (tfk.name = @TableName)
  order by PKTable, PKColumn, FKTable, FKColumn
end;

With this stored proc in place, you can now do the following in a query window, to ask about the keys for the Person table.

ukeys 'Person'

You’ll get some nice output that shows information about all of the relationships for the Person table.

Output

This is a simple example, but this procedure is considerably more useful when you’re dealing with very large databases, where you have a large number of tables and it’s not easy to see all of the relationships for a particular table.

Addendum: There’s already a built-in stored procedure in SQL Server that does basically the same thing (though perhaps in not as clean a format). sp_helpconstraint will dump out the same sort of information, as shown below.

sphelpconstraint

That Conference 2016 – Common TSQL Mistakes

That Conference 2016, Kalahari Resort, Lake Delton, WI
Common TSQL Mistakes – Kevin Boles

Day 1, 8 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • By fixing some of the most common mistakes made in SQL queries and stored procs, you can vastly improve the performance of your application
  • Kevin sees the same mistakes over and over again
  • In each case, Kevin presents the mistake and describes a proposed alternative

Miscellaneous Goodies

  • tsqlt.org  – Unit testing framework for SQL Server
  • Always specify NOT NULL for a column, unless you truly expect to handle NULL values
  • You want Index Seeks in query plans, not Index Scans. [SPS: Or, worse yet, Table Scans]
  • Don’t do formatting in SQL Server, but do in middle tier or client instead
  • Always SET NOCOUNT ON at the top of a stored procedure
  • Tune to the 2nd execution, not the first
  • Every trigger must be able to process n rows. The trigger is invoked only once when doing batch updates
  • Catch-all query leads to poor performance (e.g. WHERE @a=a or @a is null)
  • Don’t use table variables–no statistics on columns

NULLs and the NOT IN predicate

Problem arises when using NOT IN against list of values that contains NULL. In the example below, if there’s at least one Product with a NULL value for Color, the SELECT statement returns nothing.

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P);

The fix is to use NOT EXISTS instead of NOT IN.

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS(SELECT *
                 FROM Products AS P
                 WHERE C.color = P.color);

Functions on indexed columns in predicates

Using functions on indexed columns in a WHERE clause, the index is not used. I.e. This leads to an index scan, rather than a seek. In the example below, every row of the table is read.

SELECT customer_name
FROM Customers
WHERE LEFT(customer_name, 1) = 'L';

The fix is to not use functions on indexed columns in the predicate.

SELECT customer_name
FROM Customers
WHERE customer_name LIKE 'L%';

Incorrect subquery column

It’s easy to get mixed up when writing a subquery and use a column name from the main table, rather than from the table in the subquery. In the example below, the Sales table has a sale_date column and the Calendar table has a calendar_date column. The subquery uses the wrong column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date 
                    FROM Calendar AS C
                    WHERE holiday_name IS NOT NULL);

The fix is to use the correct column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.calendar_date 
                    FROM Calendar AS C
                    WHERE C.holiday_name IS NOT NULL);

Data type mismatch in predicates

When you have data types that don’t match between a parameter and a column used in a predicate, or between different columns when doing a join, SQL Server does an implicit conversion, which results in poor performance. Watch out for CONVERT_IMPLICIT in query plans.

For example, assume that last_name column in Customers table is VARCHAR(35), but you have a stored proc parameter of type NVARCHAR(35). In the example below, we get poor performance because SQL Server does an implicit conversion and the index is not used.

CREATE PROCEDURE GetCustomerByLastName
 @last_name NVARCHAR(35)
AS
 SELECT first_name, last_name
 FROM Customers
 WHERE last_name = @last_name;

The fix is to make sure that the data type of the parameter matches the column’s data type.

Predicate evaluation order

You can’t rely on evaluation order in predicates. That is, you can’t assume that sub-expressions in an AND clause are evaluated left to right and try to rely on short-circuiting the expression.

In the example below, assume that the account_reference column can be either numeric or character-based (bad design) and that checking account_type checks for this. The query fails because there’s no guarantee that the check against account_type happens before the CAST.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CAST(account_reference AS INT) > 20;

A workaround for this particular example would be to use a CASE statement.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CASE WHEN account_reference NOT LIKE '%[^0-9]%' 
           THEN CAST(account_reference AS INT)
      END > 20;

Outer joins and placement of predicates

When doing an outer join, the predicate (WHERE clause) is applied only after including the “outer” rows after applying the ON clause to do the join. This means that the WHERE clause may unexpectedly filter out rows that were intended to be included by virtue of doing the outer join.

In the example below, customers with no orders are not included in the final result, as intended. They get filtered out because they have an order_date of NULL after bringing the outer rows back into the query result.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101'
GROUP BY C.customer_name;

The solution is to move the date check inside the ON clause.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
 AND O.order_date >= '20090101'
GROUP BY C.customer_name;

Subqueries that return more than one value

Using a subquery to retrieve a value from a second table can backfire when data changes and the subquery no longer returns a single value, as expected. The query below will work only if the sku exists only once in ProductPlants.

SELECT sku, product_description,
      (SELECT plant_nbr
       FROM ProductPlants AS B
       WHERE B.sku = A.sku) AS plant_nbr
FROM Products AS A;

The fix is to do this in a JOIN.

SELECT A.sku, A.product_description, B.plant_nbr
FROM Products AS A
JOIN ProductPlants AS B
  ON A.sku = B.sku;

Use of SELECT *

Do not use SELECT * in production code. Your use may make assumptions about the schema, e.g. inserting data into a second table. A query might then break if the schema changes.

You can also run into problems when using SELECT * in a view, when columns are later added or deleted. It’s a good idea to use SCHEMABINDING when creating views.

Scalar user-defined functions

Using a scalar user-defined function in a query can result in very poor performance, because the function can get applied for every single row of a target table.

As an example, assume you have the following function:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS DECIMAL(15, 2)
AS
BEGIN
  RETURN(SELECT SUM(sale_amount)
         FROM Sales 
         WHERE sku = @sku);
END

and you use the function as follows:

SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales
FROM Products;

This will result in very poor performance. The solution is to rewrite the function as a table-valued function or to rewrite the main query.

SELECT P.sku, P.product_description, SUM(S.sale_amount) As total_sales
FROM Products AS P
JOIN Sales AS S
  ON P.sku = S.sku
GROUP BY P.sku, P.product_description;

The table-valued function would look like:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS TABLE
AS
RETURN(SELECT SUM(sale_amount) AS total_sales
       FROM Sales 
       WHERE sku = @sku);

Overuse of cursors

It’s generally a bad idea to use cursors. You can almost always do the same thing by writing a normal (i.e. set-based) query.