That Conference 2016, Kalahari Resort, Lake Delton, WI
Common TSQL Mistakes – Kevin Boles
Day 1, 8 Aug 2016
Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.
Executive Summary
- By fixing some of the most common mistakes made in SQL queries and stored procs, you can vastly improve the performance of your application
- Kevin sees the same mistakes over and over again
- In each case, Kevin presents the mistake and describes a proposed alternative
Miscellaneous Goodies
- tsqlt.org – Unit testing framework for SQL Server
- Always specify NOT NULL for a column, unless you truly expect to handle NULL values
- You want Index Seeks in query plans, not Index Scans. [SPS: Or, worse yet, Table Scans]
- Don’t do formatting in SQL Server, but do in middle tier or client instead
- Always SET NOCOUNT ON at the top of a stored procedure
- Tune to the 2nd execution, not the first
- Every trigger must be able to process n rows. The trigger is invoked only once when doing batch updates
- Catch-all query leads to poor performance (e.g. WHERE @a=a or @a is null)
- Don’t use table variables–no statistics on columns
NULLs and the NOT IN predicate
Problem arises when using NOT IN against list of values that contains NULL. In the example below, if there’s at least one Product with a NULL value for Color, the SELECT statement returns nothing.
SELECT C.color FROM Colors AS C WHERE C.color NOT IN (SELECT P.color FROM Products AS P);
The fix is to use NOT EXISTS instead of NOT IN.
SELECT C.color FROM Colors AS C WHERE NOT EXISTS(SELECT * FROM Products AS P WHERE C.color = P.color);
Functions on indexed columns in predicates
Using functions on indexed columns in a WHERE clause, the index is not used. I.e. This leads to an index scan, rather than a seek. In the example below, every row of the table is read.
SELECT customer_name FROM Customers WHERE LEFT(customer_name, 1) = 'L';
The fix is to not use functions on indexed columns in the predicate.
SELECT customer_name FROM Customers WHERE customer_name LIKE 'L%';
Incorrect subquery column
It’s easy to get mixed up when writing a subquery and use a column name from the main table, rather than from the table in the subquery. In the example below, the Sales table has a sale_date column and the Calendar table has a calendar_date column. The subquery uses the wrong column name.
SELECT sale_date, sale_amount FROM Sales AS S WHERE sale_date IN (SELECT sale_date FROM Calendar AS C WHERE holiday_name IS NOT NULL);
The fix is to use the correct column name.
SELECT sale_date, sale_amount FROM Sales AS S WHERE sale_date IN (SELECT C.calendar_date FROM Calendar AS C WHERE C.holiday_name IS NOT NULL);
Data type mismatch in predicates
When you have data types that don’t match between a parameter and a column used in a predicate, or between different columns when doing a join, SQL Server does an implicit conversion, which results in poor performance. Watch out for CONVERT_IMPLICIT in query plans.
For example, assume that last_name column in Customers table is VARCHAR(35), but you have a stored proc parameter of type NVARCHAR(35). In the example below, we get poor performance because SQL Server does an implicit conversion and the index is not used.
CREATE PROCEDURE GetCustomerByLastName @last_name NVARCHAR(35) AS SELECT first_name, last_name FROM Customers WHERE last_name = @last_name;
The fix is to make sure that the data type of the parameter matches the column’s data type.
Predicate evaluation order
You can’t rely on evaluation order in predicates. That is, you can’t assume that sub-expressions in an AND clause are evaluated left to right and try to rely on short-circuiting the expression.
In the example below, assume that the account_reference column can be either numeric or character-based (bad design) and that checking account_type checks for this. The query fails because there’s no guarantee that the check against account_type happens before the CAST.
SELECT account_nbr, account_reference AS account_ref_nbr FROM Accounts WHERE account_type LIKE 'Business%' AND CAST(account_reference AS INT) > 20;
A workaround for this particular example would be to use a CASE statement.
SELECT account_nbr, account_reference AS account_ref_nbr FROM Accounts WHERE account_type LIKE 'Business%' AND CASE WHEN account_reference NOT LIKE '%[^0-9]%' THEN CAST(account_reference AS INT) END > 20;
Outer joins and placement of predicates
When doing an outer join, the predicate (WHERE clause) is applied only after including the “outer” rows after applying the ON clause to do the join. This means that the WHERE clause may unexpectedly filter out rows that were intended to be included by virtue of doing the outer join.
In the example below, customers with no orders are not included in the final result, as intended. They get filtered out because they have an order_date of NULL after bringing the outer rows back into the query result.
SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009 FROM Customers AS C LEFT OUTER JOIN Orders AS O ON C.customer_nbr = O.customer_nbr WHERE O.order_date >= '20090101' GROUP BY C.customer_name;
The solution is to move the date check inside the ON clause.
SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009 FROM Customers AS C LEFT OUTER JOIN Orders AS O ON C.customer_nbr = O.customer_nbr AND O.order_date >= '20090101' GROUP BY C.customer_name;
Subqueries that return more than one value
Using a subquery to retrieve a value from a second table can backfire when data changes and the subquery no longer returns a single value, as expected. The query below will work only if the sku exists only once in ProductPlants.
SELECT sku, product_description, (SELECT plant_nbr FROM ProductPlants AS B WHERE B.sku = A.sku) AS plant_nbr FROM Products AS A;
The fix is to do this in a JOIN.
SELECT A.sku, A.product_description, B.plant_nbr FROM Products AS A JOIN ProductPlants AS B ON A.sku = B.sku;
Use of SELECT *
Do not use SELECT * in production code. Your use may make assumptions about the schema, e.g. inserting data into a second table. A query might then break if the schema changes.
You can also run into problems when using SELECT * in a view, when columns are later added or deleted. It’s a good idea to use SCHEMABINDING when creating views.
Scalar user-defined functions
Using a scalar user-defined function in a query can result in very poor performance, because the function can get applied for every single row of a target table.
As an example, assume you have the following function:
CREATE FUNCTION dbo.GetTotalSales(@sku INT) RETURNS DECIMAL(15, 2) AS BEGIN RETURN(SELECT SUM(sale_amount) FROM Sales WHERE sku = @sku); END
and you use the function as follows:
SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales FROM Products;
This will result in very poor performance. The solution is to rewrite the function as a table-valued function or to rewrite the main query.
SELECT P.sku, P.product_description, SUM(S.sale_amount) As total_sales FROM Products AS P JOIN Sales AS S ON P.sku = S.sku GROUP BY P.sku, P.product_description;
The table-valued function would look like:
CREATE FUNCTION dbo.GetTotalSales(@sku INT) RETURNS TABLE AS RETURN(SELECT SUM(sale_amount) AS total_sales FROM Sales WHERE sku = @sku);
Overuse of cursors
It’s generally a bad idea to use cursors. You can almost always do the same thing by writing a normal (i.e. set-based) query.
Awesome set of tips, Sean. Thanks for sharing!
Hey Dave, nice to hear from you! No credit goes to me for those tips–got to love Msft MVPs, though! I just post my notes here so I don’t forget where I put them. :O)
Sean, I attended this one as well. Thanks again for the summary. I found Kevin’s original material from a tweet at: https://t.co/LVK0PnIcPW
Thanks!