That Conference 2018 – Know What Your Code Is Doing to SQL Server

That Conference 2018, Kalahari Resort, Lake Delton, WI
Know What Your Code Is Doing to SQL Server – Kevin Boles

Day 1, 6 Aug 2018

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2018 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Kevin Boles
TheSQLGuru@gmail.com

Executive Summary

  • Handful of specific examples of ways that you can end up with poorly performing SQL Server queries
  • Improper use of Entity Framework sometimes to blame
  • Always pay attention to the exact T-SQL that is being used

 

Miscellaneous

  • Recommend 192GB of RAM on the server–enough to go beyond the 128GB max memory limit
  • In-Memory OLTP is now available down in Standard Edition
  • SQL Server 2016–It Just Runs Faster
    • Good blog posts on performance improvements
  • Get on 2016 or 2017

 

Entity Framework is a tool that simplifies development, but can lower performance

  • Can be very productive
  • Several traps, easy to fall into
  • Other ORMs have similar problems

 

Being too greedy with rows

  • EF exposes objects without knowing values
  • E.g. pull data out to a List, then do the Where clause on the list
    • Pulls the entire table back before filtering
  • Do the filtering on SQL Server
    • Provided that you have the proper filter
  • Or keep it as an IQueryable so the filter stays in the generated query (see the sketch below)
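A minimal sketch of the difference, using a hypothetical Pupil/School model and SchoolContext (these classes are not from the talk; they just give the C# sketches in this post something concrete to query):

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

// Hypothetical model used by the C# sketches in these notes.
public class School
{
    public int SchoolId { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Pupil> Pupils { get; set; }
}

public class Pupil
{
    public int PupilId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string ZipCode { get; set; }
    public int SchoolId { get; set; }
    public virtual School School { get; set; }
}

public class SchoolContext : DbContext
{
    public DbSet<School> Schools { get; set; }
    public DbSet<Pupil> Pupils { get; set; }
}

public static class GreedyRowsDemo
{
    public static void Run()
    {
        using (var db = new SchoolContext())
        {
            // Too greedy: ToList() pulls the entire Pupils table to the client,
            // and the Where() then filters the list in memory.
            var smithsBad = db.Pupils.ToList()
                              .Where(p => p.LastName == "Smith")
                              .ToList();

            // Better: stay on IQueryable so the filter is translated into a
            // WHERE clause and only the matching rows come back from SQL Server.
            var smithsGood = db.Pupils
                               .Where(p => p.LastName == "Smith")
                               .ToList();
        }
    }
}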

 

N+1 Select Problem–minimize trips to DB

  • ANTS performance profiler
  • E.g. One query to parent entities, 2nd query for each child to get related entities
  • Round-trip for each child query
  • Lazy Loading
  • “N+1 select problem”
  • If you know you want child data ahead of time, do the original query to include it
  • Use Eager Loading
    • .Where, .Include
  • Make sure that you really need them
  • Don’t do aggregation in client–do it in SQL Server
  • E.g. if we’re just counting child objects, do it with an aggregate function in SQL Server (see the sketch after this list)
  • Don’t iterate and don’t get data that you don’t need
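A sketch of the lazy-loading trap and both fixes, reusing the hypothetical SchoolContext defined in the earlier sketch:

using System;
using System.Data.Entity;   // for the Include() lambda overload
using System.Linq;

public static class NPlusOneDemo
{
    public static void Run()
    {
        using (var db = new SchoolContext())
        {
            // N+1: one query for the schools, then one extra round-trip per
            // school the first time its Pupils collection is touched (lazy loading).
            foreach (var s in db.Schools.ToList())
            {
                Console.WriteLine(s.Name + ": " + s.Pupils.Count);
            }

            // Eager loading: a single query with a JOIN brings the children along.
            var schoolsWithPupils = db.Schools
                                      .Include(s => s.Pupils)
                                      .ToList();

            // If all we need is a count, let SQL Server do the aggregation.
            var counts = db.Schools
                           .Select(s => new { s.Name, PupilCount = s.Pupils.Count() })
                           .ToList();
        }
    }
}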

 

Being too Greedy with Columns

  • E.g. pull all Pupils from school, then iterate to dump out certain stuff
  • Selecting all columns when you just need subset of columns
  • EF doesn’t know what columns you want
  • Causes two problems
    • More data than we need
    • Impossible to index, since we pull everything back
  • SELECT WITH (NOLOCK)
    • If acceptable, can be faster because you don’t wait for concurrent transactions
  • If you’re pulling all columns, you’re locking the entire table for the length of that query
  • Use select new to get just the columns you want (see the sketch below)
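A sketch of the projection fix, again against the hypothetical SchoolContext:

using System.Linq;

public static class GreedyColumnsDemo
{
    public static void Run()
    {
        using (var db = new SchoolContext())
        {
            // Greedy: materializes every column of every Pupil row.
            var allColumns = db.Pupils.ToList();

            // Projection: EF generates SELECT LastName, FirstName (and nothing else),
            // which also gives the optimizer a chance to use a covering index.
            var names = db.Pupils
                          .Select(p => new { p.LastName, p.FirstName })
                          .ToList();
        }
    }
}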

 

Mismatched Data Types

  • If data types don’t match, even simple queries can perform poorly
  • E.g. search for entities with particular zip code
  • Dev system should match the production system in terms of scale
  • The hand-written query runs fast, but the EF version runs very slowly
  • Query plan warning
    • CONVERT_IMPLICIT
    • Function on a column in a WHERE clause
  • Column in the database is VARCHAR, but .NET strings are UTF-16 Unicode, which maps to NVARCHAR
  • SQL Server does
    • INDEX SCAN of every zip code
    • Big I/O cost
    • Then converts the data–CPU hit
    • Optimizer can’t make good estimates because the parameter’s data type is different from what’s in the index
  • Solution
    • Edit the model to tell EF to use VARCHAR, using a column annotation
    • In code, use an attribute (see the sketch below)
  • Do not just let EF code-first create your database objects for you
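A sketch of the attribute fix on a hypothetical entity–the annotation tells EF the column is VARCHAR, so the generated parameter matches the column and no implicit conversion is needed:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class PupilMapped
{
    public int PupilId { get; set; }

    // Map the property to varchar(10) instead of EF's default nvarchar,
    // so WHERE ZipCode = @p0 can seek on the index with no CONVERT_IMPLICIT.
    [Column(TypeName = "varchar")]
    [StringLength(10)]
    public string ZipCode { get; set; }
}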

 

ORM ?

  • He recommends not using one

 

ADO.NET

  • 3 things to remember about stuff on internet about SQL Server
    • Who wrote it
    • When was it written
    • Does it apply to my system
  • ADO.NET Parameterized Queries
  • AddWithValue–get rid of this; it infers the parameter type from the .NET value (NVARCHAR for strings), which causes the same implicit-conversion problem (see the sketch below)
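A sketch of a parameterized ADO.NET query that declares the parameter type explicitly instead of using AddWithValue (the table and column names are made up):

using System.Data;
using System.Data.SqlClient;

public static class AdoNetDemo
{
    public static int CountPupilsInZip(string connectionString, string zip)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT COUNT(*) FROM dbo.Pupils WHERE ZipCode = @ZipCode;", conn))
        {
            // Avoid: cmd.Parameters.AddWithValue("@ZipCode", zip);
            // It would infer NVARCHAR and mismatch a VARCHAR column.

            // Prefer: declare the exact type and length to match the column.
            cmd.Parameters.Add("@ZipCode", SqlDbType.VarChar, 10).Value = zip;

            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}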

 

Overly Generic Queries

  • Allowing searching on multiple fields
  • So we construct one query, where clauses for each search field
  • Each clause is of the form “parameter == null or column matches it”
  • “@param IS NULL OR column = @param” predicates end up in the query
  • Bad because it builds query plan just once for all the possible search combinations
  • Always runs same query plan, no matter what you search on
  • Can’t do a seek on index
  • Options
    • Stored procedure–see Gail Shaw’s “catch-all queries” posts
    • Conditionals on the EF side–only add the clauses you need (see the sketch below)
    • Make SQL Server recompile the plan each time–from within EF. Sometimes good
    • Could write an interceptor that adds OPTION (RECOMPILE)
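A sketch of the conditional approach, again with the hypothetical SchoolContext–each combination of supplied criteria produces its own SQL text, so each gets its own plan:

using System.Collections.Generic;
using System.Linq;

public static class SearchDemo
{
    public static List<Pupil> Search(SchoolContext db, string lastName, string zipCode)
    {
        IQueryable<Pupil> query = db.Pupils;

        // Only add WHERE clauses for criteria the caller actually supplied,
        // instead of one giant "@p IS NULL OR ..." catch-all predicate.
        if (!string.IsNullOrEmpty(lastName))
            query = query.Where(p => p.LastName == lastName);

        if (!string.IsNullOrEmpty(zipCode))
            query = query.Where(p => p.ZipCode == zipCode);

        return query.ToList();
    }
}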

 

Bloating the plan cache

  • Reuse of execution plans often a good thing, avoid regenerating plan
  • In order for a plan to be reused, the statement text must be textually identical, which, as we just saw, is the case for parameterized queries
    • Ad-hoc workloads

 

Skip / Take

  • OFFSET FETCH
  • With literal skip/take values, the next page produces different query text, so we get a different query plan each time
  • Enable the SQL Server setting ‘optimize for ad hoc workloads’
    • Less aggressive at caching plans, generally a good thing
  • EF6: pass in lambdas for Skip/Take (see the sketch below)
    • In the SQL, the values are parameterized, so we don’t create a new cached plan per page
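A sketch of the EF6 lambda-based Skip/Take overloads, using the hypothetical SchoolContext:

using System.Collections.Generic;
using System.Data.Entity;   // EF6 Skip/Take overloads that accept lambdas
using System.Linq;

public static class PagingDemo
{
    public static List<Pupil> GetPage(SchoolContext db, int pageIndex, int pageSize)
    {
        int skip = pageIndex * pageSize;

        return db.Pupils
                 .OrderBy(p => p.LastName)
                 .Skip(() => skip)      // values become parameters in the generated SQL
                 .Take(() => pageSize)  // instead of literals, so the plan is reused
                 .ToList();
    }
}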

 

Inserting data

  • EF will run separate INSERT statements for every row being added
  • Not good if you have to insert a lot of data
  • Can use EF.BulkInsert or use EF 7 (has this out-of-the-box)–see the sketch below
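A sketch of the contrast, using the hypothetical SchoolContext–the bulk call is shown commented out because the exact API depends on which bulk-insert package you use:

using System.Collections.Generic;

public static class InsertDemo
{
    public static void AddPupils(SchoolContext db, List<Pupil> pupils)
    {
        // Plain EF6: SaveChanges() still issues one INSERT statement per entity.
        db.Pupils.AddRange(pupils);
        db.SaveChanges();

        // With a bulk library (e.g. the EntityFramework.BulkInsert package),
        // the rows go over in one SqlBulkCopy operation instead:
        // db.BulkInsert(pupils);
    }
}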

 

Code First, Performance Last

  • Never allow your code to create your storage constructs
  • (He means: don’t use code-first to create or refresh database objects)

 

Other Potential Issues

  • Improper use of IN clauses
    • More than ~16 items in an IN clause is bad
  • Correlated subqueries instead of JOINs
  • Parameterized queries with parameters defined to be the length of the data they contain
  • Chatty apps
  • Security concerns
  • Increased I/O and app bloat due to asking for too much data

 

That Conference 2017 – Continuous Database Deployment

That Conference 2017, Kalahari Resort, Lake Delton, WI
Continuous Database Deployment – Mike Acord

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Important to have database assets under source control
  • Several different strategies, each with pros/cons
  • Most likely approaches–schema-based or script-based


Continuous Deployments

  • We continuously deploy our code
    • Automated deployments
    • Integration tests
    • Build pipeline
  • But we’re not doing this with databases
    • More difficult
    • Harder to avoid losing data
    • More risky


Databases – Instead

  • Manual change scripts
  • Schema comparisons (SQL Compare)
  • Create change script, run against database
  • Problems
    • Error prone
    • Risky
    • Slow


What Is Continuous Database Deployment

  • Automated deployments
  • Integration tests
  • Build pipeline
  • Diagram
    • Database should be in source control
    • Trigger from changes in CM leads to Continuous Integration
    • Leads to Release Management
    • Staged releases–QA, Test


Benefits

  • Less error prone, more repeatable
  • Anybody can do this
  • Testable–deploying to different environments
    • At each phase, you’re re-testing the deployment script
  • Faster to release
  • Easier to merge branches
    • Work item has unique code and unique database changes
  • Easier to refresh environments
    • And easier to refresh to different environments
    • Stuff already in DEV, ready to go


Methods

  • Schema-based
    • Current schema is always in source control
    • Script in CM for every object in your database
  • Script-based
    • Change scripts checked in and applied
    • You just create a delta script
    • Never have full history
  • Code-based
    • Apply database changes from code
    • E.g. Migrations in .NET


Schema Based

  • Provides history of schema
  • Can generate change scripts to reverse changes
  • Less likely to screw up change scripts
  • Compare and sync schema to deploy changes
  • Can make changes to the database and sync them back to the model
    • Can generate a blank database at any point
  • Challenges
    • Merging can be challenging
    • Black voodoo magic leads to nasty change scripts
    • Deployment process is more complex
    • Forget to check in changes and lose them


SQL Source Control

  • Shows changes
  • Create migration scripts
  • Deals with situation of adding non-nullable columns


Visual Studio Database Project

  • SQLCompare utility to apply schema changes


Example

  • Octopus Deploy used to deploy changes
  • Creates Powershell scripts (or uses scripts)
    • Applying .dacpac files using the SqlPackage command-line tool
  • NB: RedGate has plugins to SQL Source Control for various CI tools


Script Based

  • Create change scripts to apply
    • Script for every change
  • Simple to understand
  • Easy to implement
  • No black magic
  • Easy to test scripts
  • Challenges
    • May need to review change scripts
    • No ‘current’ database that you can deploy
      • Backup/restore to get copy of current


Example – DBUP + Octopus Deploy

  • Storing migration scripts in DB project
  • DbUp uses a journal (log) table in the database to see what needs to be run; applies migrations in order (see the sketch below)
  • Also then applies environment-specific scripts
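A rough sketch of a typical DbUp console runner (the connection string is a placeholder; scripts are assumed to be embedded resources in the same assembly):

using System;
using System.Reflection;
using DbUp;

public static class MigrationRunner
{
    public static int Main()
    {
        var connectionString = "Server=.;Database=AppDb;Trusted_Connection=True;";

        // DbUp records each script it has already run in a journal table and
        // applies any new embedded scripts, in order, on the next run.
        var upgrader = DeployChanges.To
            .SqlDatabase(connectionString)
            .WithScriptsEmbeddedInAssembly(Assembly.GetExecutingAssembly())
            .LogToConsole()
            .Build();

        var result = upgrader.PerformUpgrade();

        if (!result.Successful)
        {
            Console.Error.WriteLine(result.Error);
            return -1;
        }

        return 0;
    }
}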


Code Based

  • Quick setup for new developers
  • Allows for seed data
  • Simple to use
  • Can generate change scripts
  • Challenges
    • Testing changes can be awkward
      • Hard to test because migration only happens in the context of running your app
    • May feel unnatural
    • Rolling deployments more challenging
      • One app does the update, but other apps aren’t yet up to date


Example – Entity Framework Migrations
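A rough sketch of what an EF6 code-based migration class looks like (normally scaffolded with Add-Migration; the table and column names here are hypothetical):

using System.Data.Entity.Migrations;

public partial class AddPupilMiddleName : DbMigration
{
    public override void Up()
    {
        // Applied by Update-Database (or migrate.exe in a deployment pipeline).
        AddColumn("dbo.Pupils", "MiddleName", c => c.String(maxLength: 50));
    }

    public override void Down()
    {
        DropColumn("dbo.Pupils", "MiddleName");
    }
}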


Database Refresh – Keep Data Current

  • Quickly refresh environments from production
  • Can be scheduled to happen daily
  • Allows better diagnosis of data issues
  • Daily testing of deployment scripts
    • By virtue of pushing between DEV / STAGE / PRODUCTION
  • Challenges
    • Clustered servers
    • Large databases
      • Can be painful, >1TB


Example – Database Refresh

  • Using Octopus Deploy


Best Practices

  • Create backups or snapshots before doing anything else
  • Avoid data loss
  • Automated integration tests
  • Deploy breaking changes in steps
  • Example (field rename)
    • Add new field with synchronization trigger
    • Modify application to use new column
    • Remove old column and synchronization trigger
  • Do this in stages when you can’t take a time window to do everything at once
  • Lots of patterns for applying changes in non-breaking patterns like this
    • Agiledata.org/essays/renameColumn.html
    • Book – Refactoring Databases


Questions

  • Problem with database project – comparing two schemas, scripts that it generates just don’t work
    • Turn on setting to catch errors on project build

That Conference 2017 – Refactoring Monolith Database Stored Procedures

That Conference 2017, Kalahari Resort, Lake Delton, WI
Refactoring Monolith Database Stored Procedures – Riley Major (@RileyMajor)

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Re-factoring large stored procedures helps with testability, comprehensibility, performance
  • Main strategies–reorganize, factor out pure business logic into UDFs, do updates at the end
  • Testing strategy–use transactions to undo test calls to before/after versions of stored proc, store data to compare in table variables


Monowhat?

  • Evolves over time
  • Long
  • Does multiple things
  • Disorganized
  • Fragile
  • Untestable
  • Scary


Layers, Like an Onion

  • Programming layers
    • Presentation
    • Business Logic
    • Data Storage
  • Tiers
    • Client app (HTML)
    • Server (ASP.NET)
    • Database (SQL Server, NoSQL)
  • Best Practice
    • Presentation in client application
    • Business Logic in server


Oh Noes!

  • Monolith
    • Presentation in database
    • Business Logic in database
  • Bad
    • Database scaling is hard
    • Causes vendor lock-in
    • Database languages are primitive
  • But
    • Close to data, less overhead
    • Who really changes databases?
    • SQL more powerful than you think
      • Could be faster to make changes to multiple tables down in the database


Turtles All the Way Down

  • Separate layers even on the same tier
    • Browser: MVVM
    • Server: MVC
    • Database: TBD
  • Database layer
    • Presentation / Business Logic
      • IF / CASE / SET / SUM / DATEADD
      • If this logic is going to be here anyway, we should try to architect it
    • Data access
      • SELECT / UPDATE / INSERT / DELETE
  • Testability, Isolation, Portability
    • Reasons to structure business logic in the database


Make a plan

  • What are goals?
    • Better performance?
    • Maintenance?
    • Understandability?
    • Testability?
  • How will you know you’ve achieved the goals?
    • Speed benchmarks
    • Less repetition
    • Smaller sections of code
    • Actually having a testing suite


Survey the damage

  • Can’t avoid a thorough code review
  • Look for data modification
    • INSERT, UPDATE, DELETE
    • Note columns affected
  • Look for external effects
    • CLR
    • E-mail generation
    • SPS: Triggers
  • Look for transaction handling
    • BEGIN TRAN, COMMIT, ROLLBACK
    • Harder to re-factor if existing code uses transactions
    • Can have nested transactions
    • Rollback goes all the way up the stack


Don’t Break Anything

  • Build a development environment
    • Need to be able to play around
    • Need realistic data (volume and content)
    • Maybe not real data
  • Work in isolation
    • Were changes the result of you or somebody else?
    • You really need to isolate your changes
    • Slow because resources being used elsewhere?
  • How can you tell if you broke something?
    • Need to capture before and after state
      • Look across entire database potentially
    • Aim for deterministic process
    • Easy to know if you broke it if you know what it’s supposed to do


Deterministic

  • Function returns the same result, given the same inputs
  • Easy to test – send same values in before/after your changes
  • Things that break determinism
    • Random numbers
    • Time
    • Pulling data from database (underlying table’s contents can change)


Play It Again, Sam

  • Why we want determinism–so you can compare data before/after your changes


Good Luck with That

  • (Monolith stored proc) Likely not deterministic
  • Monoliths change state
  • Need to go back in time
  • Can use transactions to do this
    • Revert to previous state


Become a wrapper

  • To test impact of code changes, wrap your calls
    • Begin transaction
    • Run original code
    • Capture changed data
    • Rollback
    • Run new code
    • Capture changed data
  • Compare the 2 captured data sets


Oops

  • But need to save changes somewhere
  • Captured data is also rolled back
  • Need to preserve the changes that you captured, even during rollback
    • Could save to local storage
    • Could save to another database
    • Print results to database console


Build a Ghost House

  • How capture doomed data?
    • Outside SQL Server–hard
    • Another thread with NOLOCK–hard
  • What’s immune from transactions?
  • Variables
  • You can’t have a variable for every row
  • One big XML? Ouch
  • Table variables survive transactions
    • Written to disk, but not stored to database
  • They’re ghost houses


Spooky Playground – Create House

  • DECLARE @Orders TABLE (Field1 int, Field2 int);
  • Could use tricks here–store checksums or hashes instead of actual data
  • Typically create one table variable for each DB table that will get changed
    • And set only the columns that you expect to change


Spooky Playground – Fill House

  • BEGIN TRAN; EXEC monolith;
  • UPDATE @Orders SET x = x FROM Orders (copy the changed rows into the table variable)
  • ROLLBACK
  • BEGIN TRAN; EXEC monolith_New (capture again, ROLLBACK, then compare)


Spooky Playground – Compare

  • SELECT * FROM @Orders WHERE colA_Before <> colA_After


Mock Your Black Boxes

  • Transactions only work on the database
  • External effects aren’t rolled back
  • Replace external calls with “mocks”
  • They look and act like external calls
  • But you control the guts
  • Return hard-coded sample data
  • Have the mock log its inputs
    • You’ll need to see what was sent, to make sure it would have done the same thing


Make your time!

  • Date/Time functions kill determinism
  • You have to control “now”
  • Otherwise no two runs could be the same
  • So make your own time
  • And send it in as a parameter
    • Feed the monolith the current date/time


Your Petard Should Hoist

  • Move variable DECLAREs to the top
  • Reveals duplication
  • Reveals common data sources
  • Displays breadth of data required
  • Caution: DECLARE assignment
    • Leave SET down below when you pull the DECLARE up


One SELECT to Rule them All

  • Gather scattered SELECT statements to top
  • Reveals duplication
  • Prepares for separation
  • Prepares for shorter transactions
  • Use single SELECT with fancy SQL, if practical


Measure Twice, Cut Once

  • Find INSERT / UPDATE / DELETE
  • Replace with variables SETs
    • Store what these statements were supposed to do
  • Move data modification to end of proc
    • Shrinks amount of time when transactions are open
  • Results in 3 main sections
    • Data gathering
    • Computation
    • Data modification


Cases of CASES

  • What’s left in middle? Logic
  • Lots of Ifs, SETs, and calculations
  • Pull it all together in one giant statement
  • Usually performs better
  • Can be clearer
  • Can reduce code
  • Prepares for separation
  • CASE, Derived Tables, and CTEs are your friends


Building Blocks

  • Still one procedure = still a monolith
  • Separate
    • Data Gathering – inline UDFs
    • Calculation – inline UDF
  • Allows data gathering re-use
  • Allows testing suite for business rules
  • Allows read-only monolith actions
    • Most important benefit
    • Can tell people what the business logic will do
    • Data in, data out
    • May want to use this function elsewhere


It’s All Better Now

  • Reformed monolith
    • Recently written
    • Short
    • Orchestrates multiple things
    • Repeated code eliminated
    • Organized into functions
    • Vials of reagents to mix – based on pieces
    • Problems isolated
    • Testable
    • Benign–not scary


Note on functions

  • Scalar user-defined functions in SQL Server perform much worse than inline table-valued user-defined functions
  • Inline TVFs are treated by the engine just like a view
  • Especially bad performance when you use the result of a scalar function in a WHERE clause
  • Multi-statement table-valued functions have other problems as well


Demo – Walk through this process

  • Mock out stored proc that sends e-mail
    • Just store input data
  • To test the comparison harness, run the monolith twice, each run wrapped in a transaction
    • You’ll see differences for non-deterministic stuff, e.g. dates
  • Need to look through the differences and resolve them
    • e.g. by feeding in the date as a procedure parameter
    • Make a copy of the monolith, changing only the date-parameter handling
    • If you now see no differences, it’s deterministic
  • Now start re-factoring, but create different versions of the monolith
    • e.g. Monolith_HoistVariables
    • Move variable DECLAREs up to top
  • Beware of false positives
    • You might see no differences, but that could be due to having no data in database that exposes a bug that we just created
    • SPS: Or that test data doesn’t cause execution of a path where a bug is fixed
  • CROSS APPLY as performance improvement
    • Create new record sets on the fly
    • This is performance efficient
  • Do one UPDATE at bottom to make all of our changes
  • Move multiple IF statements into a single big SELECT statement
    • Keep re-running the compare harness, continue seeing no changes after compare
  • Move hard-coded strings into function that returns table having named columns with constant values
    • No performance hit to do this
  • Pull many statements together into one big SELECT
    • Can then move into its own UDF
  • Make giant list of test data
    • Then use CROSS APPLY to pass this data into new UDF
    • Can then do regression testing
    • After future changes, you can say explicitly what’s going to break (change)

T-SQL – Where Is a Table Used?

Another “put it in my blog so I don’t lose it” T-SQL nugget.

Nicer than just a raw text-based search, a where-used search for a named table.

select distinct [Found In] = sp.Name, sp.type_desc
  from sys.objects o inner join sys.sql_expression_dependencies  sd on o.object_id = sd.referenced_id
                inner join sys.objects sp on sd.referencing_id = sp.object_id
  where (o.name = 'Cow') 
  order by sp.Name

Handy T-SQL Merge

Inserts source records into the target that aren’t there yet and deletes target records that are no longer in the source.

MERGE CowTarget AS t
USING CowSource AS s
ON (t.TargetCowId = s.CowId)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TargetCowId, TargetName, TargetMotto)
    VALUES (CowId, Name, Motto)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

String Properties in ADO.NET Entity Data Model

(NOTE: This is Model First development in Entity Framework–start with a data model in the Entity Framework Designer and then generate database schema from the model).

When you add a scalar property of type string to an entity data model, you can fill in the values of several other properties within Visual Studio.

  • Fixed Length – (None), True, False
  • Max Length – (None), value, Max
  • Unicode – (None), True, False

[Screenshot: string property settings in the Entity Framework designer]

What does a value of (None) mean for these properties (when generating SQL Server Database from model)?

  • Fixed Length – (None) equivalent to False
  • Max Length – (None) equivalent to Max
  • Unicode – (None) equivalent to True

Here’s how choices for these properties map to SQL Server data types:

  • Fixed=True, Max=20 – nchar(20)
  • Fixed=(None), Max=20 – nvarchar(20)
  • Fixed=False, Max=20 – nvarchar(20)
  • Fixed=(None), Max=(None) – nvarchar(max)
  • Fixed=True, Max=(None) – nchar(4000)
  • Fixed=False, Max=(None) – nvarchar(max)
  • Fixed=(None), Max=Max – nvarchar(max)
  • Fixed=True, Max=Max – nchar(4000)
  • Fixed=False, Max=Max – nvarchar(max)
  • Fixed=(None), Max=20, Unicode=True – nvarchar(20)
  • Fixed=(None), Max=20, Unicode=False – varchar(20)

This means that if you just pick String as the type and set nothing else, you’ll get nvarchar(max) as the SQL Server type.

NOTE: The value of 4,000 is being used because the maximum length of a fixed length string in SQL Server is 4,000 characters (8,000 bytes when encoded as UTF-16).

 

Count Total Rows while Paging and Avoid Two Queries

This is well covered in many other places, but I’m posting on my own blog so that I can more easily find this trick.

You can do paging in SQL Server 2012 and later using the OFFSET-FETCH syntax. This is useful for returning a single page’s worth of records from a larger (sorted) set of records. To count the total number of records, however, you’d normally do a second query. The T-SQL below shows how to do a total count as part of the query that returns the page in question. The count is repeated for every record, but it avoids having to do two queries.

select FirstName, LastName, COUNT(*) over () as TotalCount
  from Person
  where LastName like 'mc%'
  order by LastName, FirstName
  offset 500 rows
  fetch next 100 rows only

Here’s what the output looks like, returning records 501-600 out of 1,968 records having a last name that starts with “Mc”.

[Screenshot: query results showing records 501–600, each row with a TotalCount of 1968]

SQL Server – Dump Info on Keys

Here’s a short stored procedure that could be useful when trying to understand a SQL Server database.

Let’s say that you have a simple schema like the one below. Person has GroupID as a foreign key (Group.GroupID), meaning that a person belongs to a single group. A person can also have multiple contact records, so PersonContact has a foreign key indicating which person the contact is for.

[Diagram: Group, Person, and PersonContact tables and their primary/foreign key relationships]

In a simple schema like this, you can quickly see the two relationships. You can also see some of this by looking at the Person table in Management Studio, under Columns and Keys. In the diagram below, we see that GroupID is a foreign key and, based on the naming convention, we infer that the corresponding primary key is in the Group table.

[Screenshot: Person table’s Columns and Keys nodes in Management Studio]

This notation relies on the naming convention used for the FK_Person_Group relationship. Also, while looking at the Person table, we’re unable to see which tables might contain foreign keys that refer to the primary key in the Person table.

Below is a simple stored procedure that dumps out all relationships that a specified table participates in. That is, given a table, it tells you:

  • All foreign keys that reference the specified table’s primary key
  • All foreign keys in the specified table and where the corresponding primary key is located
-- Given specified table, show all PK/FK relationships.  Show:
--   1) All foreign keys that reference this table's primary key
--   2) All foreign keys in this table and which table contains primary key
create procedure uKeys
(
	@TableName varchar(255)
)
as
begin
select tfk.name as PKTable, cfk.name as PKColumn, tpk.name as FKTable, cpk.name as FKColumn from sys.foreign_key_columns fk
  inner join sys.tables as tpk on fk.parent_object_id=tpk.object_id
  inner join sys.columns as cpk on fk.parent_object_id=cpk.object_id and fk.parent_column_id=cpk.column_id
  inner join sys.tables as tfk on fk.referenced_object_id=tfk.object_id
  inner join sys.columns as cfk on fk.referenced_object_id=cfk.object_id and fk.referenced_column_id=cfk.column_id
  where (tpk.name = @TableName) or (tfk.name = @TableName)
  order by PKTable, PKColumn, FKTable, FKColumn
end;

With this stored proc in place, you can now do the following in a query window, to ask about the keys for the Person table.

ukeys 'Person'

You’ll get some nice output that shows information about all of the relationships for the Person table.

[Screenshot: output of ukeys 'Person', listing the table’s PK/FK relationships]

This is a simple example, but this procedure is considerably more useful when you’re dealing with very large databases, where you have a large number of tables and it’s not easy to see all of the relationships for a particular table.

Addendum: There’s already a built-in stored procedure in SQL Server that does basically the same thing (though perhaps in not as clean a format). sp_helpconstraint will dump out the same sort of information, as shown below.

[Screenshot: sp_helpconstraint output for the Person table]

That Conference 2016 – What’s that Smell? It’s your Entity Framework!

That Conference 2016, Kalahari Resort, Lake Delton, WI
What’s that Smell? It’s your Entity Framework! – Russ Thomas

Day 2, 9 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Entity Framework often gets a bad rap when it appears to generate very poor T-SQL
  • You can generate far better T-SQL by paying attention to how you construct queries in LINQ
  • This talk describes several “code smells” that lead to poorly performing T-SQL and that can easily be improved
  • [Sean: Most of this also applies to LINQ to SQL]

All communication with SQL Server is via T-SQL

  • T-SQL is the only way to talk to the server
  • From T-SQL to database storage
  • Goes through Optimization (indexes, etc.)
  • There’s no such thing as a technology that bypasses T-SQL
  • Entity Framework adds 4th piece
    • EF -> T-SQL -> Optimization -> Database Storage
  • EF makes things easier because it quickly scaffolds T-SQL
  • Goal today–work towards higher performance in EF

Code smell #1 – too many columns

  • EF can bring back too many columns from database
  • Demo
    • Retrieve data from People table, do something with LastName, Firstname
    • Using simple LINQ statement, just do select p (e.g. from person table)
    • Then foreach on results, pull off LastName, FirstName
  • Look at what EF sends
    • Start up Profiler (yes, should use Extended Events)
    • Reminder: never do the trace in production
    • If writing this query manually in T-SQL, we’d do simple query
    • Look at trace from EF version of this
    • Query is selecting large list of columns–far more than we’re using
    • Can also look at Reads metric in Profiler
  • Recommendation
    • Specify exactly the columns that you want
    • Use Projection
    • instead of: select p,
    • use: select new {p.LastName, p.FirstName}
    • Goes from 544 reads to 4 reads
    • Query now looks like what we’d write ourselves

Code smell #2 – Non Set Based Work

  • Demo–Retrieving e-mails
  • Write a couple of loops, loop through People p, then loop through p.EmailAddresses (implicit join)
  • LINQ query is still returning entire People table
  • And then code is iterating through everything to get e-mail addresses
  • This results in a new T-SQL statement for each pass through the loop
  • Horrible performance
  • Don’t retrieve data procedurally
  • Relational DB works better if you retrieve using set-based query
  • Solution
    • In T-SQL, you’d just do a join
    • In LINQ
      • from people in db.People.Include(“EMailAddresses”)
      • Tells EF that we’ll be coming back later for e-mail address
    • Even better way than .Include
      • Do the join in LINQ (see the sketch below)
      • Then only a small iteration over the final results
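A sketch of the join-in-LINQ alternative, assuming hypothetical Person and EmailAddress entities on an EF context like the one in the demo:

using System.Linq;

public static class EmailJoinDemo
{
    // MyEfContext is a hypothetical DbContext exposing People and EmailAddresses sets.
    public static void Run(MyEfContext db)
    {
        // One JOIN query, only the projected columns, no per-row round-trips.
        var rows = (from p in db.People
                    join e in db.EmailAddresses on p.PersonId equals e.PersonId
                    select new { p.LastName, p.FirstName, e.Address })
                   .ToList();
    }
}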

Code first discussion and demo

  • Most people prefer model-first, but code-first is slick
  • Can write code and then generate DB from code
  • Discussion
    • By default, it sets the column type to nvarchar(max) for string
    • And decimal(18,2) for double (e.g. a salary field)
  • Conclusion
    • If you’re going to do code-first, go further than just column names and types
    • SQL Server does care about max string length; it makes educated guesses about storage
  • Better way
    • Override OnModelCreating in your DbContext and use fluent syntax on modelBuilder to set things up (see the sketch below)
    • Set various attributes for each column, e.g. maximum length
    • Re-gen the model; can see better attributes on the columns
  • Thoughts
    • Code-first, often starts as simple hobby project
    • But you really do need to set proper attributes ahead of time
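A sketch of the fluent configuration on a hypothetical code-first model:

using System.Data.Entity;

public class Person
{
    public int PersonId { get; set; }
    public string LastName { get; set; }
    public string ZipCode { get; set; }
    public decimal Salary { get; set; }
}

public class PeopleContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Person>()
            .Property(p => p.LastName)
            .HasMaxLength(50);              // nvarchar(50) instead of nvarchar(max)

        modelBuilder.Entity<Person>()
            .Property(p => p.ZipCode)
            .HasMaxLength(10)
            .IsUnicode(false);              // varchar(10) instead of nvarchar(max)

        modelBuilder.Entity<Person>()
            .Property(p => p.Salary)
            .HasPrecision(19, 4);           // instead of the default decimal(18,2)
    }
}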

How to see what EF is doing under-the-covers

  • Profiler
    • Don’t run on production server
  • So how do we see what’s going on in EF without using profiler?
    • SqlLogger class implementing IDbCommandInterceptor (see the sketch below)
    • Requires EF 6
    • Can write out to text file, capturing TSQL statements
    • You implement different methods to capture different kinds of things
    • Can do xxxExecuting or xxxExecuted
    • Then: DbInterception.Add(new MyCommandInterceptor());
  • Stuff is intercepted client-side
    • So doesn’t impact performance on server
  • [Sean: Can also set context.Database.Log property]
  • Other stuff you can do with interceptors
    • Intercept and change statement, e.g. “soft deletes” where they don’t actually delete but just set IsDeleted column
    • No trigger !
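A minimal sketch of such an interceptor–the log-file path is just an example:

using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;
using System.IO;

// Appends the text of every command EF executes to a local file, client-side.
public class SqlTextLogger : IDbCommandInterceptor
{
    private static void Log(DbCommand command)
    {
        File.AppendAllText(@"C:\temp\ef-sql.log", command.CommandText + ";\r\n\r\n");
    }

    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { Log(command); }
    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { }
    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext) { Log(command); }
    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext) { }
    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { Log(command); }
    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { }
}

// Registered once at startup:
// DbInterception.Add(new SqlTextLogger());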

Other comments

  • There are other optimizations you can do that have nothing to do with SQL Server

Wrapup

  • Will put zipped up demo on sqljudo.com

That Conference 2016 – Tell SQL Server Profiler to Take a Hike

That Conference 2016, Kalahari Resort, Lake Delton, WI
Tell SQL Server Profiler to Take a Hike – Jes Borland

Day 1, 8 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • SQL Server Profiler has been deprecated
  • We should start using Extended Events (XE)

1998

  • SQL Server 7.0, introduced trace
  • Profiler was GUI for trace

Profiler

  • 2012–deprecated
  • Replaced by Extended Events

Extended Events

  • Started out bad, but much more usable now

Why use

  • More thorough
  • Less resource intensive
    • Profiler hits system hard, can bring server to its knees
  • Covers new features
    • Can no longer use profiler for some features
    • Same number of events in Profiler forever; but XE keeps getting new ones
  • Azure
    • Can’t use Profiler, but can use XE

Where data is gathered

  • Post-processing vs. Pre-processing
  • Start session, Gather info, Output
  • Profiler
    • Filters after gathering information
  • XE
    • Stuff filtered before going into buffer

Common Profiler uses

  • Main uses
    • Capture queries executed, their performance
    • Capture deadlocks
    • Capture errors and warnings
  • Can use Extended Events for all the same things

Capturing all queries with Extended Events

  • Management | Extended Events
  • Sessions–already running sessions
  • system_health always running
  • Rt-click: New Session… (don’t use wizard)
  • Can start when server starts
  • Events
    • degree_of_parallelism – help capture every event on an object
    • sp_statement_completed, sql_statement_completed,
  • Must click Configure to pick database
  • Events have fields (properties)
    • e.g. cpu_time
    • Clicking on event, you see properties
  • Global fields
    • Properties that all events have in common
    • Common: client_app_name, database_name, query_hash, sql_text
  • Filter / Predicate
    • Select all and then filter applies to everything
    • e.g. database_name
    • duration > 5 sec
  • Targets
    • ring_buffer
      • Uses memory
      • Set # items
    • event_file
      • Saves to directory on server

Viewing data

  • New guy under sessions
  • Rt-click | Start
  • Expand, see targets
  • Rt-click | View Target Data
  • Choose Columns
    • Pick what you want to see
  • Sort
    • Don’t do on >1,000 rows
  • Can view data
    • Export to XEL (XML), CSV

Can script out the session

  • Readable text that looks similar to TSQL

XEL file

  • One copy per session, per run

sa permissions?

  • Don’t need
  • Profiler: Need ALTER TRACE
  • XE 2012+: ALTER ANY EVENT SESSION
  • XE is a bit more secure–can capture events but not necessarily read

Start/stop at particular time

  • Use SQL Server Agent

Existing traces

  • Can convert to XE sessions
  • Can find equivalents
  • Mapping

That Conference 2016 – Common TSQL Mistakes

That Conference 2016, Kalahari Resort, Lake Delton, WI
Common TSQL Mistakes – Kevin Boles

Day 1, 8 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • By fixing some of the most common mistakes made in SQL queries and stored procs, you can vastly improve the performance of your application
  • Kevin sees the same mistakes over and over again
  • In each case, Kevin presents the mistake and describes a proposed alternative

Miscellaneous Goodies

  • tsqlt.org  – Unit testing framework for SQL Server
  • Always specify NOT NULL for a column, unless you truly expect to handle NULL values
  • You want Index Seeks in query plans, not Index Scans. [SPS: Or, worse yet, Table Scans]
  • Don’t do formatting in SQL Server, but do in middle tier or client instead
  • Always SET NOCOUNT ON at the top of a stored procedure
  • Tune to the 2nd execution, not the first
  • Every trigger must be able to process n rows. The trigger is invoked only once when doing batch updates
  • Catch-all query leads to poor performance (e.g. WHERE @a=a or @a is null)
  • Don’t use table variables–no statistics on columns

NULLs and the NOT IN predicate

Problem arises when using NOT IN against list of values that contains NULL. In the example below, if there’s at least one Product with a NULL value for Color, the SELECT statement returns nothing.

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P);

The fix is to use NOT EXISTS instead of NOT IN.

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS(SELECT *
                 FROM Products AS P
                 WHERE C.color = P.color);

Functions on indexed columns in predicates

When you use a function on an indexed column in a WHERE clause, the index can’t be used effectively. I.e. this leads to an index scan rather than a seek. In the example below, every row of the table is read.

SELECT customer_name
FROM Customers
WHERE LEFT(customer_name, 1) = 'L';

The fix is to not use functions on indexed columns in the predicate.

SELECT customer_name
FROM Customers
WHERE customer_name LIKE 'L%';

Incorrect subquery column

It’s easy to get mixed up when writing a subquery and use a column name from the main table, rather than from the table in the subquery. In the example below, the Sales table has a sale_date column and the Calendar table has a calendar_date column. The subquery uses the wrong column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date 
                    FROM Calendar AS C
                    WHERE holiday_name IS NOT NULL);

The fix is to use the correct column name.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.calendar_date 
                    FROM Calendar AS C
                    WHERE C.holiday_name IS NOT NULL);

Data type mismatch in predicates

When you have data types that don’t match between a parameter and a column used in a predicate, or between different columns when doing a join, SQL Server does an implicit conversion, which results in poor performance. Watch out for CONVERT_IMPLICIT in query plans.

For example, assume that last_name column in Customers table is VARCHAR(35), but you have a stored proc parameter of type NVARCHAR(35). In the example below, we get poor performance because SQL Server does an implicit conversion and the index is not used.

CREATE PROCEDURE GetCustomerByLastName
 @last_name NVARCHAR(35)
AS
 SELECT first_name, last_name
 FROM Customers
 WHERE last_name = @last_name;

The fix is to make sure that the data type of the parameter matches the column’s data type.

Predicate evaluation order

You can’t rely on evaluation order in predicates. That is, you can’t assume that sub-expressions in an AND clause are evaluated left to right and try to rely on short-circuiting the expression.

In the example below, assume that the account_reference column can be either numeric or character-based (bad design) and that the account_type column indicates which. The query can fail because there’s no guarantee that the check against account_type happens before the CAST.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CAST(account_reference AS INT) > 20;

A workaround for this particular example would be to use a CASE statement.

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CASE WHEN account_reference NOT LIKE '%[^0-9]%' 
           THEN CAST(account_reference AS INT)
      END > 20;

Outer joins and placement of predicates

When doing an outer join, the ON clause is applied first to do the join, the “outer” rows are added back, and only then is the predicate (WHERE clause) applied. This means that the WHERE clause may unexpectedly filter out rows that were intended to be included by virtue of doing the outer join.

In the example below, customers with no orders are not included in the final result, even though including them was the point of the outer join. They get filtered out because they have an order_date of NULL after the outer rows are brought back into the query result.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101'
GROUP BY C.customer_name;

The solution is to move the date check inside the ON clause.

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
 AND O.order_date >= '20090101'
GROUP BY C.customer_name;

Subqueries that return more than one value

Using a subquery to retrieve a value from a second table can backfire when data changes and the subquery no longer returns a single value, as expected. The query below will work only if the sku exists only once in ProductPlants.

SELECT sku, product_description,
      (SELECT plant_nbr
       FROM ProductPlants AS B
       WHERE B.sku = A.sku) AS plant_nbr
FROM Products AS A;

The fix is to do this in a JOIN.

SELECT A.sku, A.product_description, B.plant_nbr
FROM Products AS A
JOIN ProductPlants AS B
  ON A.sku = B.sku;

Use of SELECT *

Do not use SELECT * in production code. Code that uses it may make assumptions about the schema, e.g. when inserting the results into a second table. Such a query can break when the schema changes.

You can also run into problems when using SELECT * in a view, when columns are later added or deleted. It’s a good idea to use SCHEMABINDING when creating views.

Scalar user-defined functions

Using a scalar user-defined function in a query can result in very poor performance, because the function can get applied for every single row of a target table.

As an example, assume you have the following function:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS DECIMAL(15, 2)
AS
BEGIN
  RETURN(SELECT SUM(sale_amount)
         FROM Sales 
         WHERE sku = @sku);
END

and you use the function as follows:

SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales
FROM Products;

This will result in very poor performance. The solution is to rewrite the function as a table-valued function or to rewrite the main query.

SELECT P.sku, P.product_description, SUM(S.sale_amount) As total_sales
FROM Products AS P
JOIN Sales AS S
  ON P.sku = S.sku
GROUP BY P.sku, P.product_description;

The table-valued function would look like:


CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS TABLE
AS
RETURN(SELECT SUM(sale_amount) AS total_sales
       FROM Sales 
       WHERE sku = @sku);

Overuse of cursors

It’s generally a bad idea to use cursors. You can almost always do the same thing by writing a normal (i.e. set-based) query.