That Conference 2016 – Common TSQL Mistakes

That Conference 2016, Kalahari Resort, Lake Delton, WI
Common TSQL Mistakes – Kevin Boles

Day 1, 8 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

By fixing some of the most common mistakes made in SQL queries and stored procs, you can vastly improve the performance of your application
Kevin sees the same mistakes over and over again
In each case, Kevin presents the mistake and describes a proposed alternative

Miscellaneous Goodies

tsqlt.org – Unit testing framework for SQL Server
Always specify NOT NULL for a column, unless you truly expect to handle NULL values
You want Index Seeks in query plans, not Index Scans. [SPS: Or, worse yet, Table Scans]
Don’t do formatting in SQL Server, but do in middle tier or client instead
Always SET NOCOUNT ON at the top of a stored procedure
Tune to the 2nd execution, not the first
Every trigger must be able to process n rows. The trigger is invoked only once when doing batch updates
Catch-all query leads to poor performance (e.g. WHERE @a=a or @a is null)
Don’t use table variables–no statistics on columns

NULLs and the NOT IN predicate

Problem arises when using NOT IN against list of values that contains NULL. In the example below, if there’s at least one Product with a NULL value for Color, the SELECT statement returns nothing.

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P);

The fix is to use NOT EXISTS instead of NOT IN.

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS(SELECT *
                 FROM Products AS P
                 WHERE C.color = P.color);

Functions on indexed columns in predicates

Using functions on indexed columns in a WHERE clause, the index is not used. I.e. This leads to an index scan, rather than a seek. In the example below, every row of the table is read.

SELECT customer_name
FROM Customers
WHERE LEFT(customer_name, 1) = 'L';

The fix is to not use functions on indexed columns in the predicate.

SELECT customer_name
FROM Customers
WHERE customer_name LIKE 'L%';

Incorrect subquery column

It’s easy to get mixed up when writing a subquery and use a column name from the main table, rather than from the table in the subquery. In the example below, the Sales table has a sale_date column and the Calendar table has a calendar_date column. The subquery uses the wrong column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date 
                    FROM Calendar AS C
                    WHERE holiday_name IS NOT NULL);

The fix is to use the correct column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.calendar_date 
                    FROM Calendar AS C
                    WHERE C.holiday_name IS NOT NULL);

Data type mismatch in predicates

When you have data types that don’t match between a parameter and a column used in a predicate, or between different columns when doing a join, SQL Server does an implicit conversion, which results in poor performance. Watch out for CONVERT_IMPLICIT in query plans.

For example, assume that last_name column in Customers table is VARCHAR(35), but you have a stored proc parameter of type NVARCHAR(35). In the example below, we get poor performance because SQL Server does an implicit conversion and the index is not used.

CREATE PROCEDURE GetCustomerByLastName
 @last_name NVARCHAR(35)
AS
 SELECT first_name, last_name
 FROM Customers
 WHERE last_name = @last_name;

The fix is to make sure that the data type of the parameter matches the column’s data type.

Predicate evaluation order

You can’t rely on evaluation order in predicates. That is, you can’t assume that sub-expressions in an AND clause are evaluated left to right and try to rely on short-circuiting the expression.

In the example below, assume that the account_reference column can be either numeric or character-based (bad design) and that checking account_type checks for this. The query fails because there’s no guarantee that the check against account_type happens before the CAST.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CAST(account_reference AS INT) > 20;

A workaround for this particular example would be to use a CASE statement.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CASE WHEN account_reference NOT LIKE '%[^0-9]%' 
           THEN CAST(account_reference AS INT)
      END > 20;

Outer joins and placement of predicates

When doing an outer join, the predicate (WHERE clause) is applied only after including the “outer” rows after applying the ON clause to do the join. This means that the WHERE clause may unexpectedly filter out rows that were intended to be included by virtue of doing the outer join.

In the example below, customers with no orders are not included in the final result, as intended. They get filtered out because they have an order_date of NULL after bringing the outer rows back into the query result.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101'
GROUP BY C.customer_name;

The solution is to move the date check inside the ON clause.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
 AND O.order_date >= '20090101'
GROUP BY C.customer_name;

Subqueries that return more than one value

Using a subquery to retrieve a value from a second table can backfire when data changes and the subquery no longer returns a single value, as expected. The query below will work only if the sku exists only once in ProductPlants.

SELECT sku, product_description,
      (SELECT plant_nbr
       FROM ProductPlants AS B
       WHERE B.sku = A.sku) AS plant_nbr
FROM Products AS A;

The fix is to do this in a JOIN.

SELECT A.sku, A.product_description, B.plant_nbr
FROM Products AS A
JOIN ProductPlants AS B
  ON A.sku = B.sku;

Use of SELECT *

Do not use SELECT * in production code. Your use may make assumptions about the schema, e.g. inserting data into a second table. A query might then break if the schema changes.

You can also run into problems when using SELECT * in a view, when columns are later added or deleted. It’s a good idea to use SCHEMABINDING when creating views.

Scalar user-defined functions

Using a scalar user-defined function in a query can result in very poor performance, because the function can get applied for every single row of a target table.

As an example, assume you have the following function:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS DECIMAL(15, 2)
AS
BEGIN
  RETURN(SELECT SUM(sale_amount)
         FROM Sales 
         WHERE sku = @sku);
END

and you use the function as follows:

SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales
FROM Products;

This will result in very poor performance. The solution is to rewrite the function as a table-valued function or to rewrite the main query.

SELECT P.sku, P.product_description, SUM(S.sale_amount) As total_sales
FROM Products AS P
JOIN Sales AS S
  ON P.sku = S.sku
GROUP BY P.sku, P.product_description;

The table-valued function would look like:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS TABLE
AS
RETURN(SELECT SUM(sale_amount) AS total_sales
       FROM Sales 
       WHERE sku = @sku);

Overuse of cursors

It’s generally a bad idea to use cursors. You can almost always do the same thing by writing a normal (i.e. set-based) query.

4 thoughts on “That Conference 2016 – Common TSQL Mistakes”

Dave Froslie

9 August, 2016 at 2:47 pm

Awesome set of tips, Sean. Thanks for sharing!

- Sean
  
  10 August, 2016 at 2:19 am
  
  Hey Dave, nice to hear from you! No credit goes to me for those tips–got to love Msft MVPs, though! I just post my notes here so I don’t forget where I put them. :O)
  
- erpenbeck2014
  
  11 August, 2016 at 10:26 pm
  
  Sean, I attended this one as well. Thanks again for the summary. I found Kevin’s original material from a tweet at: https://t.co/LVK0PnIcPW
  
  - Sean
    
    11 August, 2016 at 11:30 pm
    
    Thanks!

Sean’s Stuff

Learning new software development technologies out loud

That Conference 2016 – Common TSQL Mistakes

4 thoughts on “That Conference 2016 – Common TSQL Mistakes”

Leave a comment Cancel reply

Share this:

Related

4 thoughts on “That Conference 2016 – Common TSQL Mistakes”

Leave a comment Cancel reply