That Conference 2018 – Know What Your Code Is Doing to SQL Server

That Conference 2018, Kalahari Resort, Lake Delton, WI
Know What Your Code Is Doing to SQL Server – Kevin Boles

Day 1, 6 Aug 2018

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2018 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Kevin Boles
TheSQLGuru@gmail.com

Executive Summary

  • Handful of specific examples of ways that you can end up with poorly performing SQL Server queries
  • Improper use of Entity Framework sometimes to blame
  • Always pay attention to the exact T-SQL that is being used

 

Miscellaneous

  • Recommend 192GB of RAM on the server–enough to go beyond the 128GB max memory limit
  • In-Memory OLTP is now available down in Standard Edition
  • SQL Server 2016–It Just Runs Faster
    • Good blog posts on performance improvements
  • Get on 2016 or 2017

 

Entity Framework is a tool that simplifies development, but can lower performance

  • Can be very productive
  • Several traps, easy to fall into
  • Other ORMs have similar problems

 

Being too greedy with rows

  • EF exposes objects without knowing values
  • E.g. pull data out to a List, then do the Where clause on the list
    • Pulls the entire table back before filtering
  • Do the filtering on SQL Server
    • Provided that you have the proper filter
  • Or keep it as an IQueryable so the filter stays in the generated query (see the sketch below)
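A minimal sketch of the difference, using a hypothetical Pupil/School model and SchoolContext (these classes are not from the talk; they just give the C# sketches in this post something concrete to query):

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

// Hypothetical model used by the C# sketches in these notes.
public class School
{
    public int SchoolId { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Pupil> Pupils { get; set; }
}

public class Pupil
{
    public int PupilId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string ZipCode { get; set; }
    public int SchoolId { get; set; }
    public virtual School School { get; set; }
}

public class SchoolContext : DbContext
{
    public DbSet<School> Schools { get; set; }
    public DbSet<Pupil> Pupils { get; set; }
}

public static class GreedyRowsDemo
{
    public static void Run()
    {
        using (var db = new SchoolContext())
        {
            // Too greedy: ToList() pulls the entire Pupils table to the client,
            // and the Where() then filters the list in memory.
            var smithsBad = db.Pupils.ToList()
                              .Where(p => p.LastName == "Smith")
                              .ToList();

            // Better: stay on IQueryable so the filter is translated into a
            // WHERE clause and only the matching rows come back from SQL Server.
            var smithsGood = db.Pupils
                               .Where(p => p.LastName == "Smith")
                               .ToList();
        }
    }
}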

 

N+1 Select Problem–minimize trips to DB

  • ANTS performance profiler
  • E.g. One query to parent entities, 2nd query for each child to get related entities
  • Round-trip for each child query
  • Lazy Loading
  • “N+1 select problem”
  • If you know you want child data ahead of time, do the original query to include it
  • Use Eager Loading
    • .Where, .Include
  • Make sure that you really need them
  • Don’t do aggregation in client–do it in SQL Server
  • E.g. if we’re just counting child objects, do it with an aggregate function in SQL Server (see the sketch after this list)
  • Don’t iterate and don’t get data that you don’t need
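A sketch of the lazy-loading trap and both fixes, reusing the hypothetical SchoolContext defined in the earlier sketch:

using System;
using System.Data.Entity;   // for the Include() lambda overload
using System.Linq;

public static class NPlusOneDemo
{
    public static void Run()
    {
        using (var db = new SchoolContext())
        {
            // N+1: one query for the schools, then one extra round-trip per
            // school the first time its Pupils collection is touched (lazy loading).
            foreach (var s in db.Schools.ToList())
            {
                Console.WriteLine(s.Name + ": " + s.Pupils.Count);
            }

            // Eager loading: a single query with a JOIN brings the children along.
            var schoolsWithPupils = db.Schools
                                      .Include(s => s.Pupils)
                                      .ToList();

            // If all we need is a count, let SQL Server do the aggregation.
            var counts = db.Schools
                           .Select(s => new { s.Name, PupilCount = s.Pupils.Count() })
                           .ToList();
        }
    }
}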

 

Being too Greedy with Columns

  • E.g. pull all Pupils from school, then iterate to dump out certain stuff
  • Selecting all columns when you just need subset of columns
  • EF doesn’t know what columns you want
  • Causes two problems
    • More data than we need
    • Impossible to index, since we pull everything back
  • SELECT WITH (NOLOCK)
    • If acceptable, can be faster because you don’t wait for concurrent transactions
  • If you’re pulling all columns, you’re locking the entire table for the length of that query
  • Use select new to get just the columns you want (see the sketch below)
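A sketch of the projection fix, again against the hypothetical SchoolContext:

using System.Linq;

public static class GreedyColumnsDemo
{
    public static void Run()
    {
        using (var db = new SchoolContext())
        {
            // Greedy: materializes every column of every Pupil row.
            var allColumns = db.Pupils.ToList();

            // Projection: EF generates SELECT LastName, FirstName (and nothing else),
            // which also gives the optimizer a chance to use a covering index.
            var names = db.Pupils
                          .Select(p => new { p.LastName, p.FirstName })
                          .ToList();
        }
    }
}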

 

Mismatched Data Types

  • If data types don’t match, even simple queries can perform poorly
  • E.g. search for entities with particular zip code
  • Dev system should match the production system in terms of scale
  • The hand-written query runs fast, but the EF version runs very slowly
  • Query plan warning
    • CONVERT_IMPLICIT
    • Function on a column in a WHERE clause
  • Column in the database is VARCHAR, but .NET strings are UTF-16 Unicode, which maps to NVARCHAR
  • SQL Server does
    • INDEX SCAN of every zip code
    • Big I/O cost
    • Then converts the data–CPU hit
    • Optimizer can’t make good estimates because the parameter’s data type is different from what’s in the index
  • Solution
    • Edit the model to tell EF to use VARCHAR, using a column annotation
    • In code, use an attribute (see the sketch below)
  • Do not just let EF code-first create your database objects for you
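A sketch of the attribute fix on a hypothetical entity–the annotation tells EF the column is VARCHAR, so the generated parameter matches the column and no implicit conversion is needed:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class PupilMapped
{
    public int PupilId { get; set; }

    // Map the property to varchar(10) instead of EF's default nvarchar,
    // so WHERE ZipCode = @p0 can seek on the index with no CONVERT_IMPLICIT.
    [Column(TypeName = "varchar")]
    [StringLength(10)]
    public string ZipCode { get; set; }
}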

 

ORM ?

  • He recommends not using one

 

ADO.NET

  • 3 things to remember about stuff on internet about SQL Server
    • Who wrote it
    • When was it written
    • Does it apply to my system
  • ADO.NET Parameterized Queries
  • AddWithValue–get rid of this; it infers the parameter type from the .NET value (NVARCHAR for strings), which causes the same implicit-conversion problem (see the sketch below)
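A sketch of a parameterized ADO.NET query that declares the parameter type explicitly instead of using AddWithValue (the table and column names are made up):

using System.Data;
using System.Data.SqlClient;

public static class AdoNetDemo
{
    public static int CountPupilsInZip(string connectionString, string zip)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT COUNT(*) FROM dbo.Pupils WHERE ZipCode = @ZipCode;", conn))
        {
            // Avoid: cmd.Parameters.AddWithValue("@ZipCode", zip);
            // It would infer NVARCHAR and mismatch a VARCHAR column.

            // Prefer: declare the exact type and length to match the column.
            cmd.Parameters.Add("@ZipCode", SqlDbType.VarChar, 10).Value = zip;

            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}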

 

Overly Generic Queries

  • Allowing searching on multiple fields
  • So we construct one query, where clauses for each search field
  • Each clause is of the form “parameter == null or column matches it”
  • “@param IS NULL OR column = @param” predicates end up in the query
  • Bad because it builds query plan just once for all the possible search combinations
  • Always runs same query plan, no matter what you search on
  • Can’t do a seek on index
  • Options
    • Stored procedure–see Gail Shaw’s “catch-all queries” posts
    • Conditionals on the EF side–only add the clauses you need (see the sketch below)
    • Make SQL Server recompile the plan each time–from within EF. Sometimes good
    • Could write an interceptor that adds OPTION (RECOMPILE)
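A sketch of the conditional approach, again with the hypothetical SchoolContext–each combination of supplied criteria produces its own SQL text, so each gets its own plan:

using System.Collections.Generic;
using System.Linq;

public static class SearchDemo
{
    public static List<Pupil> Search(SchoolContext db, string lastName, string zipCode)
    {
        IQueryable<Pupil> query = db.Pupils;

        // Only add WHERE clauses for criteria the caller actually supplied,
        // instead of one giant "@p IS NULL OR ..." catch-all predicate.
        if (!string.IsNullOrEmpty(lastName))
            query = query.Where(p => p.LastName == lastName);

        if (!string.IsNullOrEmpty(zipCode))
            query = query.Where(p => p.ZipCode == zipCode);

        return query.ToList();
    }
}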

 

Bloating the plan cache

  • Reuse of execution plans often a good thing, avoid regenerating plan
  • In order for a plan to be reused, the statement text must be textually identical, which, as we just saw, is the case for parameterized queries
    • Ad-hoc workloads

 

Skip / Take

  • OFFSET FETCH
  • With literal skip/take values, the next page produces different query text, so we get a different query plan each time
  • Enable the SQL Server setting ‘optimize for ad hoc workloads’
    • Less aggressive at caching plans, generally a good thing
  • EF6: pass in lambdas for Skip/Take (see the sketch below)
    • In the SQL, the values are parameterized, so we don’t create a new cached plan per page
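A sketch of the EF6 lambda-based Skip/Take overloads, using the hypothetical SchoolContext:

using System.Collections.Generic;
using System.Data.Entity;   // EF6 Skip/Take overloads that accept lambdas
using System.Linq;

public static class PagingDemo
{
    public static List<Pupil> GetPage(SchoolContext db, int pageIndex, int pageSize)
    {
        int skip = pageIndex * pageSize;

        return db.Pupils
                 .OrderBy(p => p.LastName)
                 .Skip(() => skip)      // values become parameters in the generated SQL
                 .Take(() => pageSize)  // instead of literals, so the plan is reused
                 .ToList();
    }
}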

 

Inserting data

  • EF will run separate INSERT statements for every row being added
  • Not good if you have to insert a lot of data
  • Can use EF.BulkInsert or use EF 7 (has this out-of-the-box)–see the sketch below
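A sketch of the contrast, using the hypothetical SchoolContext–the bulk call is shown commented out because the exact API depends on which bulk-insert package you use:

using System.Collections.Generic;

public static class InsertDemo
{
    public static void AddPupils(SchoolContext db, List<Pupil> pupils)
    {
        // Plain EF6: SaveChanges() still issues one INSERT statement per entity.
        db.Pupils.AddRange(pupils);
        db.SaveChanges();

        // With a bulk library (e.g. the EntityFramework.BulkInsert package),
        // the rows go over in one SqlBulkCopy operation instead:
        // db.BulkInsert(pupils);
    }
}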

 

Code First, Performance Last

  • Never allow your code to create your storage constructs
  • (He means: don’t use code-first to create or refresh database objects)

 

Other Potential Issues

  • Improper use of IN clauses
    • More than ~16 items in an IN clause is bad
  • Correlated subqueries instead of JOINs
  • Parameterized queries with parameters defined to be the length of the data they contain
  • Chatty apps
  • Security concerns
  • Increased I/O and app bloat due to asking for too much data

 

That Conference 2017 – Continuous Database Deployment

That Conference 2017, Kalahari Resort, Lake Delton, WI
Continuous Database Deployment – Mike Acord

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Important to have database assets under source control
  • Several different strategies, each with pros/cons
  • Most likely approaches–schema-based or script-based


Continuous Deployments

  • We continuously deploy our code
    • Automated deployments
    • Integration tests
    • Build pipeline
  • But we’re not doing this with databases
    • More difficult
    • Harder to avoid losing data
    • More risky


Databases – Instead

  • Manual change scripts
  • Schema comparisons (SQL Compare)
  • Create change script, run against database
  • Problems
    • Error prone
    • Risky
    • Slow


What Is Continuous Database Deployment

  • Automated deployments
  • Integration tests
  • Build pipeline
  • Diagram
    • Database should be in source control
    • Trigger from changes in CM leads to Continuous Integration
    • Leads to Release Management
    • Staged releases–QA, Test


Benefits

  • Less error prone, more repeatable
  • Anybody can do this
  • Testable–deploying to different environments
    • At each phase, you’re re-testing the deployment script
  • Faster to release
  • Easier to merge branches
    • Work item has unique code and unique database changes
  • Easier to refresh environments
    • And easier to refresh to different environments
    • Stuff already in DEV, ready to go


Methods

  • Schema-based
    • Current schema is always in source control
    • Script in CM for every object in your database
  • Script-based
    • Change scripts checked in and applied
    • You just create a delta script
    • Never have full history
  • Code-based
    • Apply database changes from code
    • E.g. Migrations in .NET


Schema Based

  • Provides history of schema
  • Can generate change scripts to reverse changes
  • Less likely to screw up change scripts
  • Compare and sync schema to deploy changes
  • Can make changes to the database and sync them back to the model
    • Can generate a blank database at any point
  • Challenges
    • Merging can be challenging
    • Black voodoo magic leads to nasty change scripts
    • Deployment process is more complex
    • Forget to check in changes and lose them


SQL Source Control

  • Shows changes
  • Create migration scripts
  • Deals with situation of adding non-nullable columns


Visual Studio Database Project

  • SQLCompare utility to apply schema changes


Example

  • Octopus Deploy used to deploy changes
  • Creates Powershell scripts (or uses scripts)
    • Applying .dacpac files using the SqlPackage command-line tool
  • NB: RedGate has plugins to SQL Source Control for various CI tools


Script Based

  • Create change scripts to apply
    • Script for every change
  • Simple to understand
  • Easy to implement
  • No black magic
  • Easy to test scripts
  • Challenges
    • May need to review change scripts
    • No ‘current’ database that you can deploy
      • Backup/restore to get copy of current


Example – DBUP + Octopus Deploy

  • Storing migration scripts in DB project
  • DbUp uses a journal (log) table in the database to see what needs to be run; applies migrations in order (see the sketch below)
  • Also then applies environment-specific scripts
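A rough sketch of a typical DbUp console runner (the connection string is a placeholder; scripts are assumed to be embedded resources in the same assembly):

using System;
using System.Reflection;
using DbUp;

public static class MigrationRunner
{
    public static int Main()
    {
        var connectionString = "Server=.;Database=AppDb;Trusted_Connection=True;";

        // DbUp records each script it has already run in a journal table and
        // applies any new embedded scripts, in order, on the next run.
        var upgrader = DeployChanges.To
            .SqlDatabase(connectionString)
            .WithScriptsEmbeddedInAssembly(Assembly.GetExecutingAssembly())
            .LogToConsole()
            .Build();

        var result = upgrader.PerformUpgrade();

        if (!result.Successful)
        {
            Console.Error.WriteLine(result.Error);
            return -1;
        }

        return 0;
    }
}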


Code Based

  • Quick setup for new developers
  • Allows for seed data
  • Simple to use
  • Can generate change scripts
  • Challenges
    • Testing changes can be awkward
      • Hard to test because migration only happens in the context of running your app
    • May feel unnatural
    • Rolling deployments more challenging
      • One app does the update, but other apps aren’t yet up to date


Example – Entity Framework Migrations
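A rough sketch of what an EF6 code-based migration class looks like (normally scaffolded with Add-Migration; the table and column names here are hypothetical):

using System.Data.Entity.Migrations;

public partial class AddPupilMiddleName : DbMigration
{
    public override void Up()
    {
        // Applied by Update-Database (or migrate.exe in a deployment pipeline).
        AddColumn("dbo.Pupils", "MiddleName", c => c.String(maxLength: 50));
    }

    public override void Down()
    {
        DropColumn("dbo.Pupils", "MiddleName");
    }
}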


Database Refresh – Keep Data Current

  • Quickly refresh environments from production
  • Can be scheduled to happen daily
  • Allows better diagnosis of data issues
  • Daily testing of deployment scripts
    • By virtue of pushing between DEV / STAGE / PRODUCTION
  • Challenges
    • Clustered servers
    • Large databases
      • Can be painful, >1TB


Example – Database Refresh

  • Using Octopus Deploy


Best Practices

  • Create backups or snapshots before doing anything else
  • Avoid data loss
  • Automated integration tests
  • Deploy breaking changes in steps
  • Example (field rename)
    • Add new field with synchronization trigger
    • Modify application to use new column
    • Remove old column and synchronization trigger
  • Do this in stages when you can’t take a time window to do everything at once
  • Lots of patterns for applying changes in non-breaking patterns like this
    • Agiledata.org/essays/renameColumn.html
    • Book – Refactoring Databases


Questions

  • Problem with database project – comparing two schemas, scripts that it generates just don’t work
    • Turn on setting to catch errors on project build

That Conference 2017 – Refactoring Monolith Database Stored Procedures

That Conference 2017, Kalahari Resort, Lake Delton, WI
Refactoring Monolith Database Stored Procedures – Riley Major (@RileyMajor)

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Re-factoring large stored procedures helps with testability, comprehensibility, performance
  • Main strategies–reorganize, factor out pure business logic into UDFs, do updates at the end
  • Testing strategy–use transactions to undo test calls to before/after versions of stored proc, store data to compare in table variables


Monowhat?

  • Evolves over time
  • Long
  • Does multiple things
  • Disorganized
  • Fragile
  • Untestable
  • Scary


Layers, Like an Onion

  • Programming layers
    • Presentation
    • Business Logic
    • Data Storage
  • Tiers
    • Client app (HTML)
    • Server (ASP.NET)
    • Database (SQL Server, NoSQL)
  • Best Practice
    • Presentation in client application
    • Business Logic in server


Oh Noes!

  • Monolith
    • Presentation in database
    • Business Logic in database
  • Bad
    • Database scaling is hard
    • Causes vendor lock-in
    • Database languages are primitive
  • But
    • Close to data, less overhead
    • Who really changes databases?
    • SQL more powerful than you think
      • Could be faster to make changes to multiple tables down in the database


Turtles All the Way Down

  • Separate layers even on the same tier
    • Browser: MVVM
    • Server: MVC
    • Database: TBD
  • Database layer
    • Presentation / Business Logic
      • IF / CASE / SET / SUM / DATEADD
      • If this logic is going to be here anyway, we should try to architect it
    • Data access
      • SELECT / UPDATE / INSERT / DELETE
  • Testability, Isolation, Portability
    • Reasons to structure business logic in the database


Make a plan

  • What are goals?
    • Better performance?
    • Maintenance?
    • Understandability?
    • Testability?
  • How will you know you’ve achieved the goals?
    • Speed benchmarks
    • Less repetition
    • Smaller sections of code
    • Actually having a testing suite


Survey the damage

  • Can’t avoid a thorough code review
  • Look for data modification
    • INSERT, UPDATE, DELETE
    • Note columns affected
  • Look for external effects
    • CLR
    • E-mail generation
    • SPS: Triggers
  • Look for transaction handling
    • BEGIN TRAN, COMMIT, ROLLBACK
    • Harder to re-factor if existing code uses transactions
    • Can have nested transactions
    • Rollback goes all the way up the stack


Don’t Break Anything

  • Build a development environment
    • Need to be able to play around
    • Need realistic data (volume and content)
    • Maybe not real data
  • Work in isolation
    • Were changes the result of you or somebody else?
    • You really need to isolate your changes
    • Slow because resources being used elsewhere?
  • How can you tell if you broke something?
    • Need to capture before and after state
      • Look across entire database potentially
    • Aim for deterministic process
    • Easy to know if you broke it if you know what it’s supposed to do


Deterministic

  • Function returns the same result, given the same inputs
  • Easy to test – send same values in before/after your changes
  • Things that break determinism
    • Random numbers
    • Time
    • Pulling data from database (underlying table’s contents can change)


Play It Again, Sam

  • Why we want determinism–so you can compare data before/after your changes


Good Luck with That

  • (Monolith stored proc) Likely not deterministic
  • Monoliths change state
  • Need to go back in time
  • Can use transactions to do this
    • Revert to previous state


Become a wrapper

  • To test impact of code changes, wrap your calls
    • Begin transaction
    • Run original code
    • Capture changed data
    • Rollback
    • Run new code
    • Capture changed data
  • Compare the 2 captured data sets


Oops

  • But need to save changes somewhere
  • Captured data is also rolled back
  • Need to preserve the changes that you captured, even during rollback
    • Could save to local storage
    • Could save to another database
    • Print results to database console


Build a Ghost House

  • How capture doomed data?
    • Outside SQL Server–hard
    • Another thread with NOLOCK–hard
  • What’s immune from transactions?
  • Variables
  • You can’t have a variable for every row
  • One big XML? Ouch
  • Table variables survive transactions
    • Written to disk, but not stored to database
  • They’re ghost houses


Spooky Playground – Create House

  • DECLARE @Orders TABLE (Field1 int, Field2 int);
  • Could use tricks here–store checksums or hashes instead of actual data
  • Typically create one table variable for each DB table that will get changed
    • And set only the columns that you expect to change


Spooky Playground – Fill House

  • BEGIN TRAN; EXEC monolith;
  • UPDATE @Orders SET x = x FROM Orders (copy the changed rows into the table variable)
  • ROLLBACK
  • BEGIN TRAN; EXEC monolith_New (capture again, ROLLBACK, then compare)


Spooky Playground – Compare

  • SELECT * FROM @Orders WHERE colA_Before <> colA_After


Mock Your Black Boxes

  • Transactions only work on the database
  • External effects aren’t rolled back
  • Replace external calls with “mocks”
  • They look and act like external calls
  • But you control the guts
  • Return hard-coded sample data
  • Have the mock log its inputs
    • You’ll need to see what was sent, to make sure it would have done the same thing


Make your time!

  • Date/Time functions kill determinism
  • You have to control “now”
  • Otherwise no two runs could be the same
  • So make your own time
  • And send it in as a parameter
    • Feed the monolith the current date/time


Your Petard Should Hoist

  • Move variable DECLAREs to the top
  • Reveals duplication
  • Reveals common data sources
  • Displays breadth of data required
  • Caution: DECLARE assignment
    • Leave SET down below when you pull the DECLARE up


One SELECT to Rule them All

  • Gather scattered SELECT statements to top
  • Reveals duplication
  • Prepares for separation
  • Prepares for shorter transactions
  • Use single SELECT with fancy SQL, if practical


Measure Twice, Cut Once

  • Find INSERT / UPDATE / DELETE
  • Replace with variables SETs
    • Store what these statements were supposed to do
  • Move data modification to end of proc
    • Shrinks amount of time when transactions are open
  • Results in 3 main sections
    • Data gathering
    • Computation
    • Data modification


Cases of CASES

  • What’s left in middle? Logic
  • Lots of Ifs, SETs, and calculations
  • Pull it all together in one giant statement
  • Usually performs better
  • Can be clearer
  • Can reduce code
  • Prepares for separation
  • CASE, Derived Tables, and CTEs are your friends


Building Blocks

  • Still one procedure = still a monolith
  • Separate
    • Data Gathering – inline UDFs
    • Calculation – inline UDF
  • Allows data gathering re-use
  • Allows testing suite for business rules
  • Allows read-only monolith actions
    • Most important benefit
    • Can tell people what the business logic will do
    • Data in, data out
    • May want to use this function elsewhere


It’s All Better Now

  • Reformed monolith
    • Recently written
    • Short
    • Orchestrates multiple things
    • Repeated code eliminated
    • Organized into functions
    • Vials of reagents to mix – based on pieces
    • Problems isolated
    • Testable
    • Benign–not scary


Note on functions

  • Scalar user-defined functions in SQL Server perform much worse than inline table-valued user-defined functions
  • Inline TVFs are treated by the engine just like a view
  • Especially bad performance when you use the result of a scalar function in a WHERE clause
  • Multi-statement table-valued functions have other problems as well


Demo – Walk through this process

  • Mock out stored proc that sends e-mail
    • Just store input data
  • To test the comparison harness, run the monolith twice, each run wrapped in a transaction
    • You’ll see differences for non-deterministic stuff, e.g. dates
  • Need to look through the differences and resolve them
    • e.g. by feeding in the date as a procedure parameter
    • Make a copy of the monolith, changing only the date-parameter handling
    • If you now see no differences, it’s deterministic
  • Now start re-factoring, but create different versions of the monolith
    • e.g. Monolith_HoistVariables
    • Move variable DECLAREs up to top
  • Beware of false positives
    • You might see no differences, but that could be due to having no data in database that exposes a bug that we just created
    • SPS: Or that test data doesn’t cause execution of a path where a bug is fixed
  • CROSS APPLY as performance improvement
    • Create new record sets on the fly
    • This is performance efficient
  • Do one UPDATE at bottom to make all of our changes
  • Move multiple IF statements into a single big SELECT statement
    • Keep re-running the compare harness, continue seeing no changes after compare
  • Move hard-coded strings into function that returns table having named columns with constant values
    • No performance hit to do this
  • Pull many statements together into one big SELECT
    • Can then move into its own UDF
  • Make giant list of test data
    • Then use CROSS APPLY to pass this data into new UDF
    • Can then do regression testing
    • After future changes, you can say explicitly what’s going to break (change)

T-SQL – Where Is a Table Used?

Another “put it in my blog so I don’t lose it” T-SQL nugget.

Nicer than just a raw text-based search, a where-used search for a named table.

select distinct [Found In] = sp.Name, sp.type_desc
  from sys.objects o inner join sys.sql_expression_dependencies  sd on o.object_id = sd.referenced_id
                inner join sys.objects sp on sd.referencing_id = sp.object_id
  where (o.name = 'Cow') 
  order by sp.Name

Handy T-SQL Merge

Inserts source records into the target that aren’t there yet and deletes target records that are no longer in the source.

MERGE CowTarget AS t
USING CowSource AS s
ON (t.TargetCowId = s.CowId)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TargetCowId, TargetName, TargetMotto)
    VALUES (CowId, Name, Motto)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

String Properties in ADO.NET Entity Data Model

(NOTE: This is Model First development in Entity Framework–start with a data model in the Entity Framework Designer and then generate database schema from the model).

When you add a scalar property of type string to an entity data model, you can fill in the values of several other properties within Visual Studio.

  • Fixed Length – (None), True, False
  • Max Length – (None), value, Max
  • Unicode – (None), True, False

[Screenshot: string property settings in the Entity Framework designer]

What does a value of (None) mean for these properties (when generating SQL Server Database from model)?

  • Fixed Length – (None) equivalent to False
  • Max Length – (None) equivalent to Max
  • Unicode – (None) equivalent to True

Here’s how choices for these properties map to SQL Server data types:

  • Fixed=True, Max=20 – nchar(20)
  • Fixed=(None), Max=20 – nvarchar(20)
  • Fixed=False, Max=20 – nvarchar(20)
  • Fixed=(None), Max=(None) – nvarchar(max)
  • Fixed=True, Max=(None) – nchar(4000)
  • Fixed=False, Max=(None) – nvarchar(max)
  • Fixed=(None), Max=Max – nvarchar(max)
  • Fixed=True, Max=Max – nchar(4000)
  • Fixed=False, Max=Max – nvarchar(max)
  • Fixed=(None), Max=20, Unicode=True – nvarchar(20)
  • Fixed=(None), Max=20, Unicode=False – varchar(20)

This means that if you just pick String as the type and set nothing else, you’ll get nvarchar(max) as the SQL Server type.

NOTE: The value of 4,000 is being used because the maximum length of a fixed length string in SQL Server is 4,000 characters (8,000 bytes when encoded as UTF-16).

 

Count Total Rows while Paging and Avoid Two Queries

This is well covered in many other places, but I’m posting on my own blog so that I can more easily find this trick.

You can do paging in SQL Server 2012 and later using the OFFSET-FETCH syntax. This is useful for returning a single page’s worth of records from a larger (sorted) set of records. To count the total number of records, however, you’d normally do a second query. The T-SQL below shows how to do a total count as part of the query that returns the page in question. The count is repeated for every record, but it avoids having to do two queries.

select FirstName, LastName, COUNT(*) over () as TotalCount
  from Person
  where LastName like 'mc%'
  order by LastName, FirstName
  offset 500 rows
  fetch next 100 rows only

Here’s what the output looks like, returning records 501-600 out of 1,968 records having a last name that starts with “Mc”.

[Screenshot: query results showing records 501–600, each row with a TotalCount of 1968]

SQL Server – Dump Info on Keys

Here’s a short stored procedure that could be useful when trying to understand a SQL Server database.

Let’s say that you have a simple schema like the one below. Person has GroupID as a foreign key (Group.GroupID), meaning that a person belongs to a single group. A person can also have multiple contact records, so PersonContact has a foreign key indicating which person the contact is for.

[Diagram: Group, Person, and PersonContact tables and their primary/foreign key relationships]

In a simple schema like this, you can quickly see the two relationships. You can also see some of this by looking at the Person table in Management Studio, under Columns and Keys. In the diagram below, we see that GroupID is a foreign key and, based on the naming convention, we infer that the corresponding primary key is in the Group table.

[Screenshot: Person table’s Columns and Keys nodes in Management Studio]

This notation relies on the naming convention used for the FK_Person_Group relationship. Also, while looking at the Person table, we’re unable to see which tables might contain foreign keys that refer to the primary key in the Person table.

Below is a simple stored procedure that dumps out all relationships that a specified table participates in. That is, given a table, it tells you:

  • All foreign keys that reference the specified table’s primary key
  • All foreign keys in the specified table and where the corresponding primary key is located
-- Given specified table, show all PK/FK relationships.  Show:
--   1) All foreign keys that reference this table's primary key
--   2) All foreign keys in this table and which table contains primary key
create procedure uKeys
(
	@TableName varchar(255)
)
as
begin
select tfk.name as PKTable, cfk.name as PKColumn, tpk.name as FKTable, cpk.name as FKColumn from sys.foreign_key_columns fk
  inner join sys.tables as tpk on fk.parent_object_id=tpk.object_id
  inner join sys.columns as cpk on fk.parent_object_id=cpk.object_id and fk.parent_column_id=cpk.column_id
  inner join sys.tables as tfk on fk.referenced_object_id=tfk.object_id
  inner join sys.columns as cfk on fk.referenced_object_id=cfk.object_id and fk.referenced_column_id=cfk.column_id
  where (tpk.name = @TableName) or (tfk.name = @TableName)
  order by PKTable, PKColumn, FKTable, FKColumn
end;

With this stored proc in place, you can now do the following in a query window, to ask about the keys for the Person table.

ukeys 'Person'

You’ll get some nice output that shows information about all of the relationships for the Person table.

[Screenshot: output of ukeys 'Person', listing the table’s PK/FK relationships]

This is a simple example, but this procedure is considerably more useful when you’re dealing with very large databases, where you have a large number of tables and it’s not easy to see all of the relationships for a particular table.

Addendum: There’s already a built-in stored procedure in SQL Server that does basically the same thing (though perhaps in not as clean a format). sp_helpconstraint will dump out the same sort of information, as shown below.

[Screenshot: sp_helpconstraint output for the Person table]

That Conference 2016 – What’s that Smell? It’s your Entity Framework!

That Conference 2016, Kalahari Resort, Lake Delton, WI
What’s that Smell? It’s your Entity Framework! – Russ Thomas

Day 2, 9 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Entity Framework often gets a bad rap when it appears to generate very poor T-SQL
  • You can generate far better T-SQL by paying attention to how you construct queries in LINQ
  • This talk describes several “code smells” that lead to poorly performing T-SQL and that can easily be improved
  • [Sean: Most of this also applies to LINQ to SQL]

All communication with SQL Server is via T-SQL

  • T-SQL is the only way to talk to the server
  • From T-SQL to database storage
  • Goes through Optimization (indexes, etc.)
  • There’s no such thing as a technology that bypasses T-SQL
  • Entity Framework adds 4th piece
    • EF -> T-SQL -> Optimization -> Database Storage
  • EF makes things easier because it quickly scaffolds T-SQL
  • Goal today–work towards higher performance in EF

Code smell #1 – too many columns

  • EF can bring back too many columns from database
  • Demo
    • Retrieve data from People table, do something with LastName, Firstname
    • Using simple LINQ statement, just do select p (e.g. from person table)
    • Then foreach on results, pull off LastName, FirstName
  • Look at what EF sends
    • Start up Profiler (yes, should use Extended Events)
    • Reminder: never do the trace in production
    • If writing this query manually in T-SQL, we’d do simple query
    • Look at trace from EF version of this
    • Query is selecting large list of columns–far more than we’re using
    • Can also look at Reads metric in Profiler
  • Recommendation
    • Specify exactly the columns that you want
    • Use Projection
    • instead of: select p,
    • use: select new {p.LastName, p.FirstName}
    • Goes from 544 reads to 4 reads
    • Query now looks like what we’d write ourselves

Code smell #2 – Non Set Based Work

  • Demo–Retrieving e-mails
  • Write a couple of loops, loop through People p, then loop through p.EmailAddresses (implicit join)
  • LINQ query is still returning entire People table
  • And then code is iterating through everything to get e-mail addresses
  • This results in a new T-SQL statement for each pass through the loop
  • Horrible performance
  • Don’t retrieve data procedurally
  • Relational DB works better if you retrieve using set-based query
  • Solution
    • In T-SQL, you’d just do a join
    • In LINQ
      • from people in db.People.Include(“EMailAddresses”)
      • Tells EF that we’ll be coming back later for e-mail address
    • Even better way than .Include
      • Do the join in LINQ (see the sketch below)
      • Then only a small iteration over the final results
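A sketch of the join-in-LINQ alternative, assuming hypothetical Person and EmailAddress entities on an EF context like the one in the demo:

using System.Linq;

public static class EmailJoinDemo
{
    // MyEfContext is a hypothetical DbContext exposing People and EmailAddresses sets.
    public static void Run(MyEfContext db)
    {
        // One JOIN query, only the projected columns, no per-row round-trips.
        var rows = (from p in db.People
                    join e in db.EmailAddresses on p.PersonId equals e.PersonId
                    select new { p.LastName, p.FirstName, e.Address })
                   .ToList();
    }
}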

Code first discussion and demo

  • Most people prefer model-first, but code-first is slick
  • Can write code and then generate DB from code
  • Discussion
    • By default, it sets the column type to nvarchar(max) for string
    • And decimal(18,2) for double (e.g. a salary field)
  • Conclusion
    • If you’re going to do code-first, go further than just column names and types
    • SQL Server does care about max string length; it makes educated guesses about storage
  • Better way
    • Override OnModelCreating in your DbContext and use fluent syntax on modelBuilder to set things up (see the sketch below)
    • Set various attributes for each column, e.g. maximum length
    • Re-gen the model; can see better attributes on the columns
  • Thoughts
    • Code-first, often starts as simple hobby project
    • But you really do need to set proper attributes ahead of time
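A sketch of the fluent configuration on a hypothetical code-first model:

using System.Data.Entity;

public class Person
{
    public int PersonId { get; set; }
    public string LastName { get; set; }
    public string ZipCode { get; set; }
    public decimal Salary { get; set; }
}

public class PeopleContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Person>()
            .Property(p => p.LastName)
            .HasMaxLength(50);              // nvarchar(50) instead of nvarchar(max)

        modelBuilder.Entity<Person>()
            .Property(p => p.ZipCode)
            .HasMaxLength(10)
            .IsUnicode(false);              // varchar(10) instead of nvarchar(max)

        modelBuilder.Entity<Person>()
            .Property(p => p.Salary)
            .HasPrecision(19, 4);           // instead of the default decimal(18,2)
    }
}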

How to see what EF is doing under-the-covers

  • Profiler
    • Don’t run on production server
  • So how do we see what’s going on in EF without using profiler?
    • SqlLogger class implementing IDbCommandInterceptor (see the sketch below)
    • Requires EF 6
    • Can write out to text file, capturing TSQL statements
    • You implement different methods to capture different kinds of things
    • Can do xxxExecuting or xxxExecuted
    • Then: DbInterception.Add(new MyCommandInterceptor());
  • Stuff is intercepted client-side
    • So doesn’t impact performance on server
  • [Sean: Can also set context.Database.Log property]
  • Other stuff you can do with interceptors
    • Intercept and change statement, e.g. “soft deletes” where they don’t actually delete but just set IsDeleted column
    • No trigger !
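A minimal sketch of such an interceptor–the log-file path is just an example:

using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;
using System.IO;

// Appends the text of every command EF executes to a local file, client-side.
public class SqlTextLogger : IDbCommandInterceptor
{
    private static void Log(DbCommand command)
    {
        File.AppendAllText(@"C:\temp\ef-sql.log", command.CommandText + ";\r\n\r\n");
    }

    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { Log(command); }
    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { }
    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext) { Log(command); }
    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext) { }
    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { Log(command); }
    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { }
}

// Registered once at startup:
// DbInterception.Add(new SqlTextLogger());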

Other comments

  • There are other optimizations you can do that have nothing to do with SQL Server

Wrapup

  • Will put zipped up demo on sqljudo.com

That Conference 2016 – Tell SQL Server Profiler to Take a Hike

That Conference 2016, Kalahari Resort, Lake Delton, WI
Tell SQL Server Profiler to Take a Hike – Jes Borland

Day 1, 8 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • SQL Server Profiler has been deprecated
  • We should start using Extended Events (XE)

1998

  • SQL Server 7.0, introduced trace
  • Profiler was GUI for trace

Profiler

  • 2012–deprecated
  • Replaced by Extended Events

Extended Events

  • Started out bad, but much more usable now

Why use

  • More thorough
  • Less resource intensive
    • Profiler hits system hard, can bring server to its knees
  • Covers new features
    • Can no longer use profiler for some features
    • Same number of events in Profiler forever; but XE keeps getting new ones
  • Azure
    • Can’t use Profiler, but can use XE

Where data is gathered

  • Post-processing vs. Pre-processing
  • Start session, Gather info, Output
  • Profiler
    • Filters after gathering information
  • XE
    • Stuff filtered before going into buffer

Common Profiler uses

  • Main uses
    • Capture queries executed, their performance
    • Capture deadlocks
    • Capture errors and warnings
  • Can use Extended Events for all the same things

Capturing all queries with Extended Events

  • Management | Extended Events
  • Sessions–already running sessions
  • system_health always running
  • Rt-click: New Session… (don’t use wizard)
  • Can start when server starts
  • Events
    • degree_of_parallelism – help capture every event on an object
    • sp_statement_completed, sql_statement_completed,
  • Must click Configure to pick database
  • Events have fields (properties)
    • e.g. cpu_time
    • Clicking on event, you see properties
  • Global fields
    • Properties that all events have in common
    • Common: client_app_name, database_name, query_hash, sql_text
  • Filter / Predicate
    • Select all and then filter applies to everything
    • e.g. database_name
    • duration > 5 sec
  • Targets
    • ring_buffer
      • Uses memory
      • Set # items
    • event_file
      • Saves to directory on server

Viewing data

  • New guy under sessions
  • Rt-click | Start
  • Expand, see targets
  • Rt-click | View Target Data
  • Choose Columns
    • Pick what you want to see
  • Sort
    • Don’t do on >1,000 rows
  • Can view data
    • Export to XEL (XML), CSV

Can script out the session

  • Readable text that looks similar to TSQL

XEL file

  • One copy per session, per run

sa permissions?

  • Don’t need
  • Profiler: Need ALTER TRACE
  • XE 2012+: ALTER ANY EVENT SESSION
  • XE is a bit more secure–can capture events but not necessarily read

Start/stop at particular time

  • Use SQL Server Agent

Existing traces

  • Can convert to XE sessions
  • Can find equivalents
  • Mapping

That Conference 2016 – Common TSQL Mistakes

That Conference 2016, Kalahari Resort, Lake Delton, WI
Common TSQL Mistakes – Kevin Boles

Day 1, 8 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • By fixing some of the most common mistakes made in SQL queries and stored procs, you can vastly improve the performance of your application
  • Kevin sees the same mistakes over and over again
  • In each case, Kevin presents the mistake and describes a proposed alternative

Miscellaneous Goodies

  • tsqlt.org  – Unit testing framework for SQL Server
  • Always specify NOT NULL for a column, unless you truly expect to handle NULL values
  • You want Index Seeks in query plans, not Index Scans. [SPS: Or, worse yet, Table Scans]
  • Don’t do formatting in SQL Server, but do in middle tier or client instead
  • Always SET NOCOUNT ON at the top of a stored procedure
  • Tune to the 2nd execution, not the first
  • Every trigger must be able to process n rows. The trigger is invoked only once when doing batch updates
  • Catch-all query leads to poor performance (e.g. WHERE @a=a or @a is null)
  • Don’t use table variables–no statistics on columns

NULLs and the NOT IN predicate

Problem arises when using NOT IN against list of values that contains NULL. In the example below, if there’s at least one Product with a NULL value for Color, the SELECT statement returns nothing.

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P);

The fix is to use NOT EXISTS instead of NOT IN.

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS(SELECT *
                 FROM Products AS P
                 WHERE C.color = P.color);

Functions on indexed columns in predicates

When you use a function on an indexed column in a WHERE clause, the index can’t be used effectively. I.e. this leads to an index scan rather than a seek. In the example below, every row of the table is read.

SELECT customer_name
FROM Customers
WHERE LEFT(customer_name, 1) = 'L';

The fix is to not use functions on indexed columns in the predicate.

SELECT customer_name
FROM Customers
WHERE customer_name LIKE 'L%';

Incorrect subquery column

It’s easy to get mixed up when writing a subquery and use a column name from the main table, rather than from the table in the subquery. In the example below, the Sales table has a sale_date column and the Calendar table has a calendar_date column. The subquery uses the wrong column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date 
                    FROM Calendar AS C
                    WHERE holiday_name IS NOT NULL);

The fix is to use the correct column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.calendar_date 
                    FROM Calendar AS C
                    WHERE C.holiday_name IS NOT NULL);

Data type mismatch in predicates

When you have data types that don’t match between a parameter and a column used in a predicate, or between different columns when doing a join, SQL Server does an implicit conversion, which results in poor performance. Watch out for CONVERT_IMPLICIT in query plans.

For example, assume that last_name column in Customers table is VARCHAR(35), but you have a stored proc parameter of type NVARCHAR(35). In the example below, we get poor performance because SQL Server does an implicit conversion and the index is not used.

CREATE PROCEDURE GetCustomerByLastName
 @last_name NVARCHAR(35)
AS
 SELECT first_name, last_name
 FROM Customers
 WHERE last_name = @last_name;

The fix is to make sure that the data type of the parameter matches the column’s data type.

Predicate evaluation order

You can’t rely on evaluation order in predicates. That is, you can’t assume that sub-expressions in an AND clause are evaluated left to right and try to rely on short-circuiting the expression.

In the example below, assume that the account_reference column can be either numeric or character-based (bad design) and that the account_type column indicates which. The query can fail because there’s no guarantee that the check against account_type happens before the CAST.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CAST(account_reference AS INT) > 20;

A workaround for this particular example would be to use a CASE statement.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CASE WHEN account_reference NOT LIKE '%[^0-9]%' 
           THEN CAST(account_reference AS INT)
      END > 20;

Outer joins and placement of predicates

When doing an outer join, the ON clause is applied first to do the join, the “outer” rows are added back, and only then is the predicate (WHERE clause) applied. This means that the WHERE clause may unexpectedly filter out rows that were intended to be included by virtue of doing the outer join.

In the example below, customers with no orders are not included in the final result, even though including them was the point of the outer join. They get filtered out because they have an order_date of NULL after the outer rows are brought back into the query result.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101'
GROUP BY C.customer_name;

The solution is to move the date check inside the ON clause.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
 AND O.order_date >= '20090101'
GROUP BY C.customer_name;

Subqueries that return more than one value

Using a subquery to retrieve a value from a second table can backfire when data changes and the subquery no longer returns a single value, as expected. The query below will work only if the sku exists only once in ProductPlants.

SELECT sku, product_description,
      (SELECT plant_nbr
       FROM ProductPlants AS B
       WHERE B.sku = A.sku) AS plant_nbr
FROM Products AS A;

The fix is to do this in a JOIN.

SELECT A.sku, A.product_description, B.plant_nbr
FROM Products AS A
JOIN ProductPlants AS B
  ON A.sku = B.sku;

Use of SELECT *

Do not use SELECT * in production code. Code that uses it may make assumptions about the schema, e.g. when inserting the results into a second table. Such a query can break when the schema changes.

You can also run into problems when using SELECT * in a view, when columns are later added or deleted. It’s a good idea to use SCHEMABINDING when creating views.

Scalar user-defined functions

Using a scalar user-defined function in a query can result in very poor performance, because the function can get applied for every single row of a target table.

As an example, assume you have the following function:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS DECIMAL(15, 2)
AS
BEGIN
  RETURN(SELECT SUM(sale_amount)
         FROM Sales 
         WHERE sku = @sku);
END

and you use the function as follows:

SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales
FROM Products;

This will result in very poor performance. The solution is to rewrite the function as a table-valued function or to rewrite the main query.

SELECT P.sku, P.product_description, SUM(S.sale_amount) As total_sales
FROM Products AS P
JOIN Sales AS S
  ON P.sku = S.sku
GROUP BY P.sku, P.product_description;

The table-valued function would look like:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS TABLE
AS
RETURN(SELECT SUM(sale_amount) AS total_sales
       FROM Sales 
       WHERE sku = @sku);

Overuse of cursors

It’s generally a bad idea to use cursors. You can almost always do the same thing by writing a normal (i.e. set-based) query.