That Conference 2017 – Continuous Database Deployment

That Conference 2017, Kalahari Resort, Lake Delton, WI
Continuous Database Deployment – Mike Acord

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Important to have database assets under source control
  • Several different strategies, each with pros/cons
  • Most likely approaches–schema-based or script-based


Continuous Deployments

  • We continuously deploy our code
    • Automated deployments
    • Integration tests
    • Build pipeline
  • But we’re not doing this with databases
    • More difficult
    • Harder to avoid losing data
    • More risky


Databases – Instead

  • Manual change scripts
  • Schema comparisons (SQL Compare)
  • Create change script, run against database
  • Problems
    • Error prone
    • Risky
    • Slow


What Is Continuous Database Deployment

  • Automated deployments
  • Integration tests
  • Build pipeline
  • Diagram
    • Database should be in source control
    • Trigger from changes in CM leads to Continuous Integration
    • Leads to Release Management
    • Staged releases–QA, Test


Benefits

  • Less error prone, more repeatable
  • Anybody can do this
  • Testable–deploying to different environments
    • At each phase, you’re re-testing the deployment script
  • Faster to release
  • Easier to merge branches
    • Work item has unique code and unique database changes
  • Easier to refresh environments
    • And easier to refresh to different environments
    • Stuff already in DEV, ready to go


Methods

  • Schema-based
    • Current schema is always in source control
    • Script in CM for every object in your database
  • Script-based
    • Change scripts checked in and applied
    • You just create a delta script
    • Never have a full snapshot of the current schema, only deltas
  • Code-based
    • Apply database changes from code
    • E.g. Migrations in .NET


Schema Based

  • Provides history of schema
  • Can generate change scripts to reverse changes
  • Less likely to screw up change scripts
  • Compare and sync schema to deploy changes
  • Can make changes to the database and sync back to the model
    • Can generate a blank database at any point
  • Challenges
    • Merging can be challenging
    • Black voodoo magic leads to nasty change scripts
    • Deployment process is more complex
    • Forget to check in changes and lose them


SQL Source Control

  • Shows changes
  • Create migration scripts
  • Deals with situation of adding non-nullable columns
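
The usual pain point: adding a NOT NULL column to a table that already has rows fails unless you supply values for those rows. A minimal sketch of the standard workaround (table and column names are hypothetical):

-- Add the column with a default so existing rows get a value
ALTER TABLE Person ADD FavoriteColor varchar(20) NOT NULL
    CONSTRAINT DF_Person_FavoriteColor DEFAULT 'Unknown';

-- Optionally drop the default afterward if it was only needed for the backfill
ALTER TABLE Person DROP CONSTRAINT DF_Person_FavoriteColor;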


Visual Studio Database Project

  • SQLCompare utility to apply schema changes


Example

  • Octopus Deploy used to deploy changes
  • Creates PowerShell scripts (or uses scripts)
    • Applying .dacpac files using the SqlPackage command-line tool
  • NB: RedGate has plugins to SQL Source Control for various CI tools


Script Based

  • Create change scripts to apply
    • Script for every change
  • Simple to understand
  • Easy to implement
  • No black magic
  • Easy to test scripts
  • Challenges
    • May need to review change scripts
    • No ‘current’ database that you can deploy
      • Backup/restore to get copy of current


Example – DBUP + Octopus Deploy

  • Storing migration scripts in DB project
  • DBUP uses a log table in the database to see what still needs to be run; applies migrations in order
  • Also then applies environment-specific scripts
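
For reference, DbUp's journal (with default settings) is a table named SchemaVersions; querying it shows which scripts have already been applied:

-- DbUp records one row per applied script in its journal table
SELECT ScriptName, Applied
  FROM SchemaVersions
  ORDER BY Applied;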


Code Based

  • Quick setup for new developers
  • Allows for seed data
  • Simple to use
  • Can generate change scripts
  • Challenges
    • Testing changes can be awkward
      • Hard to test because migration only happens in the context of running your app
    • May feel unnatural
    • Rolling deployments more challenging
      • One app does the update, but other apps aren’t yet up to date


Example – Entity Framework Migrations
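
EF6 code-first migrations keep their own journal, much like DbUp: applied migrations are recorded in a dbo.__MigrationHistory table, which you can inspect from T-SQL:

-- One row per applied EF6 migration
SELECT MigrationId, ProductVersion
  FROM dbo.__MigrationHistory
  ORDER BY MigrationId;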


Database Refresh – Keep Data Current

  • Quickly refresh environments from production
  • Can be scheduled to happen daily
  • Allows better diagnosis of data issues
  • Daily testing of deployment scripts
    • By virtue of pushing between DEV / STAGE / PRODUCTION
  • Challenges
    • Clustered servers
    • Large databases
      • Can be painful, >1TB


Example – Database Refresh

  • Using Octopus Deploy
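
A minimal sketch of the restore step such a refresh job runs (paths and logical file names are hypothetical):

-- Restore last night's production backup over the DEV copy
RESTORE DATABASE SalesDev
  FROM DISK = N'\\backups\Sales_Full.bak'
  WITH MOVE N'Sales_Data' TO N'D:\Data\SalesDev.mdf',
       MOVE N'Sales_Log'  TO N'D:\Logs\SalesDev.ldf',
       REPLACE;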


Best Practices

  • Create backups or snapshots before doing anything else
  • Avoid data loss
  • Automated integration tests
  • Deploy breaking changes in steps
  • Example (field rename; see the T-SQL sketch after this list)
    • Add new field with synchronization trigger
    • Modify application to use new column
    • Remove old column and synchronization trigger
  • Do this in stages when you can’t take time window to do everything at once
  • Lots of patterns for applying changes in non-breaking patterns like this
    • Agiledata.org/essays/renameColumn.html
    • Book – Refactoring Databases
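
A sketch of stage 1 of the rename example above, with the synchronization trigger (all names are hypothetical; the full pattern at agiledata.org also syncs writes made to the new column back to the old one, while this sketch only copies old to new):

-- Stage 1: add the new column, backfill it, and keep it in sync with the old one
ALTER TABLE Person ADD Surname varchar(50) NULL;
GO
UPDATE Person SET Surname = LastName;
GO
CREATE TRIGGER TR_Person_SyncSurname ON Person
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE p SET Surname = i.LastName
      FROM Person p INNER JOIN inserted i ON p.PersonId = i.PersonId
      WHERE ISNULL(p.Surname, '') <> ISNULL(i.LastName, '');
END;
GO
-- Stage 2: release application changes that use Surname
-- Stage 3: drop the trigger and the old LastName column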


Questions

  • Problem with database project – comparing two schemas, scripts that it generates just don’t work
    • Turn on setting to catch errors on project build

That Conference 2017 – Refactoring Monolith Database Stored Procedures

That Conference 2017, Kalahari Resort, Lake Delton, WI
Refactoring Monolith Database Stored Procedures – Riley Major (@RileyMajor)

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Refactoring large stored procedures helps with testability, comprehensibility, performance
  • Main strategies–reorganize, factor out pure business logic into UDFs, do updates at the end
  • Testing strategy–use transactions to undo test calls to before/after versions of stored proc, store data to compare in table variables


Monowhat?

  • Evolves over time
  • Long
  • Does multiple things
  • Disorganized
  • Fragile
  • Untestable
  • Scary


Layers, Like an Onion

  • Programming layers
    • Presentation
    • Business Logic
    • Data Storage
  • Tiers
    • Client app (HTML)
    • Server (ASP.NET)
    • Database (SQL Server, NoSQL)
  • Best Practice
    • Presentation in client application
    • Business Logic in server


Oh Noes!

  • Monolith
    • Presentation in database
    • Business Logic in database
  • Bad
    • Database scaling is hard
    • Causes vendor lock-in
    • Database languages are primitive
  • But
    • Close to data, less overhead
    • Who really changes database vendors?
    • SQL more powerful than you think
      • Could be faster to make changes to multiple tables down in the database


Turtles All the Way Down

  • Separate layers even on the same tier
    • Browser: MVVM
    • Server: MVC
    • Database: TBD
  • Database layer
    • Presentation / Business Logic
      • IF / CASE / SET / SUM / DATEADD
      • If this logic is going to be here anyway, we should try to architect it
    • Data access
      • SELECT / UPDATE / INSERT / DELETE
  • Testability, Isolation, Portability
    • Reasons to structure business logic in the database


Make a plan

  • What are goals?
    • Better performance?
    • Maintenance?
    • Understandability?
    • Testability?
  • How will you know you’ve achieved the goals?
    • Speed benchmarks
    • Less repetition
    • Smaller sections of code
    • Actually having a testing suite


Survey the damage

  • Can’t avoid a thorough code review
  • Look for data modification
    • INSERT, UPDATE, DELETE
    • Note columns affected
  • Look for external effects
    • CLR
    • E-mail generation
    • SPS: Triggers
  • Look for transaction handling
    • BEGIN TRAN, COMMIT, ROLLBACK
    • Harder to re-factor if existing code uses transactions
    • Can have nested transactions
    • Rollback goes all the way up the stack
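
That last point is easy to see in T-SQL: BEGIN TRAN only increments @@TRANCOUNT, an inner COMMIT only decrements it, and a single ROLLBACK undoes the entire stack (Orders is a hypothetical table):

BEGIN TRAN;                      -- @@TRANCOUNT = 1
  BEGIN TRAN;                    -- "nested" transaction: @@TRANCOUNT = 2
    UPDATE Orders SET Qty = 0;
  COMMIT;                        -- only decrements the count: @@TRANCOUNT = 1
ROLLBACK;                        -- undoes everything, including the inner "commit"
SELECT @@TRANCOUNT;              -- 0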


Don’t Break Anything

  • Build a development environment
    • Need to be able to play around
    • Need realistic data (volume and content)
    • Maybe not real data
  • Work in isolation
    • Were changes the result of you or somebody else?
    • You really need to isolate your changes
    • Slow because resources being used elsewhere?
  • How can you tell if you broke something?
    • Need to capture before and after state
      • Look across entire database potentially
    • Aim for deterministic process
    • Easy to know if you broke it if you know what it’s supposed to do


Deterministic

  • Function returns the same result, given the same inputs
  • Easy to test – send same values in before/after your changes
  • Things that break determinism
    • Random numbers
    • Time
    • Pulling data from database (underlying table’s contents can change)


Play It Again, Sam

  • Why we want determinism–so you can compare data before/after your changes


Good Luck with That

  • (Monolith stored proc) Likely not deterministic
  • Monoliths change state
  • Need to go back in time
  • Can use transactions to do this
    • Revert to previous state


Become a wrapper

  • To test impact of code changes, wrap your calls
    • Begin transaction
    • Run original code
    • Capture changed data
    • Rollback
    • Run new code
    • Capture changed data
  • Compare the 2 captured data sets


Oops

  • But need to save changes somewhere
  • Captured data is also rolled back
  • Need to preserve the changes that you captured, even during rollback
    • Could save to local storage
    • Could save to another database
    • Print results to database console


Build a Ghost House

  • How to capture the doomed data?
    • Outside SQL Server–hard
    • Another thread with NOLOCK–hard
  • What’s immune from transactions?
  • Variables
  • You can’t have a variable for every row
  • One big XML? Ouch
  • Table variables survive transactions
    • Written to disk, but not stored to database
  • They’re ghost houses


Spooky Playground – Create House

  • DECLARE @Orders TABLE (Field1 int, Field2 int);
  • Could use tricks here–store checksums or hashes instead of actual data
  • Typically create one table variable for each DB table that will get changed
    • And set only the columns that you expect to change


Spooky Playground – Fill House

  • BEGIN TRAN; EXEC monolith;
  • UPDATE @Orders SET x = x FROM Orders
  • ROLLBACK
  • BEGIN TRAN; EXEC monolith_New


Spooky Playground – Compare

  • SELECT * FROM @Orders WHERE colA_Before <> colA_After
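
Putting the three Spooky Playground steps together, a sketch of the whole harness (proc, table, and column names are hypothetical):

-- Ghost house: one Before/After pair per column we expect to change
DECLARE @Orders TABLE (OrderId int, Total_Before money, Total_After money);

BEGIN TRAN;
  EXEC Monolith;
  INSERT INTO @Orders (OrderId, Total_Before)    -- capture state after the ORIGINAL proc
    SELECT OrderId, Total FROM Orders;
ROLLBACK;

BEGIN TRAN;
  EXEC Monolith_New;
  UPDATE g SET Total_After = o.Total             -- capture state after the NEW proc
    FROM @Orders g INNER JOIN Orders o ON g.OrderId = o.OrderId;
ROLLBACK;

-- Any rows here mean the refactoring changed behavior
SELECT * FROM @Orders WHERE Total_Before <> Total_After;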


Mock Your Black Boxes

  • Transactions only work on the database
  • External effects aren’t rolled back
  • Replace external calls with “mocks”
  • They look and act like external calls
  • But you control the guts
  • Return hard-coded sample data
  • Have the mock log its inputs
    • You’ll need to see what was sent, to make sure it would have done the same thing
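
A sketch of the idea, assuming the monolith calls a SendOrderEmail proc (all names are hypothetical):

-- A log table the mock writes to, so you can check what WOULD have been sent
CREATE TABLE MockEmailLog (LoggedAt datetime, Recipient varchar(100), Subject varchar(200));
GO
-- Swap the real proc for the mock in the dev environment:
-- same name and parameters, but no external side effects
ALTER PROCEDURE SendOrderEmail (@Recipient varchar(100), @Subject varchar(200))
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO MockEmailLog VALUES (GETDATE(), @Recipient, @Subject);
END;

Note that the mock's log rows roll back with the harness transaction, so capture them into a table variable before the ROLLBACK, like any other changed data.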


Make your time!

  • Date/Time functions kill determinism
  • You have to control “now”
  • Otherwise no two runs could be the same
  • So make your own time
  • And send it in as a parameter
    • Feed the monolith the current date/time
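
A sketch of injecting "now" (parameter names are hypothetical):

ALTER PROCEDURE Monolith (@OrderId int, @Now datetime = NULL)
AS
BEGIN
    -- Production callers pass nothing; the test harness passes a fixed value
    SET @Now = ISNULL(@Now, GETDATE());
    -- ...every former GETDATE() in the body now uses @Now...
END;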


Your Petard Should Hoist

  • Move variable DECLAREs to the top
  • Reveals duplication
  • Reveals common data sources
  • Displays breadth of data required
  • Caution: DECLARE assignment
    • Leave SET down below when you pull the DECLARE up


One SELECT to Rule them All

  • Gather scattered SELECT statements to top
  • Reveals duplication
  • Prepares for separation
  • Prepares for shorter transactions
  • Use single SELECT with fancy SQL, if practical


Measure Twice, Cut Once

  • Find INSERT / UPDATE / DELETE
  • Replace with variables SETs
    • Store what these statements were supposed to do
  • Move data modification to end of proc
    • Shrinks amount of time when transactions are open
  • Results in 3 main sections
    • Data gathering
    • Computation
    • Data modification
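
A small sketch of the resulting three-section shape (names are hypothetical):

DECLARE @OrderId int = 42;  -- stand-in for the proc's parameter

-- Section 1: data gathering (all SELECTs up top)
DECLARE @Total money, @LineCount int;
SELECT @Total = SUM(Amount), @LineCount = COUNT(*)
  FROM OrderLines WHERE OrderId = @OrderId;

-- Section 2: computation (no reads, no writes)
DECLARE @Shipping money = 5 * @LineCount;

-- Section 3: data modification, all at the end, keeping transactions short
UPDATE Orders SET Total = @Total + @Shipping WHERE OrderId = @OrderId;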


Cases of CASES

  • What’s left in middle? Logic
  • Lots of Ifs, SETs, and calculations
  • Pull it all together in one giant statement
  • Usually performs better
  • Can be clearer
  • Can reduce code
  • Prepares for separation
  • CASE, Derived Tables, and CTEs are your friends
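
For instance, an IF/SET chain collapses into one expression (the logic is hypothetical):

DECLARE @Total money = 750, @Discount decimal(4,2);

-- Before: procedural
IF @Total > 1000 SET @Discount = 0.10;
ELSE IF @Total > 500 SET @Discount = 0.05;
ELSE SET @Discount = 0;

-- After: one declarative expression, ready to pull into a UDF later
SELECT @Discount = CASE WHEN @Total > 1000 THEN 0.10
                        WHEN @Total > 500  THEN 0.05
                        ELSE 0 END;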


Building Blocks

  • Still one procedure = still a monolith
  • Separate
    • Data Gathering – inline UDFs
    • Calculation – inline UDF
  • Allows data gathering re-use
  • Allows testing suite for business rules
  • Allows read-only monolith actions
    • Most important benefit
    • Can tell people what the business logic will do
    • Data in, data out
    • May want to use this function elsewhere
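
A sketch of a factored-out calculation as an inline table-valued UDF (the business rule is hypothetical):

-- Pure business logic: data in, data out, no reads and no writes
CREATE FUNCTION dbo.CalcOrderCharges (@Total money, @LineCount int)
RETURNS TABLE
AS
RETURN
(
    SELECT Shipping = 5.00 * @LineCount,
           Tax      = @Total * 0.055
);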


It’s All Better Now

  • Reformed monolith
    • Recently written
    • Short
    • Orchestrates multiple things
    • Repeated code eliminated
    • Organized into functions
    • Vials of reagents to mix – based on pieces
    • Problems isolated
    • Testable
    • Benign–not scary


Note on functions

  • Scalar user-defined functions in SQL Server perform much worse than inline table-valued user-defined functions
  • The inline version is treated by the engine just like a view
  • Performance is especially bad when you use the result of a scalar function in a WHERE clause
  • Multi-statement table-valued functions have other problems of their own
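
The difference shows up in how the two kinds are invoked (function names are hypothetical):

-- Scalar UDF: executed once per row, opaque to the optimizer
SELECT OrderId, dbo.CalcTax(Total) AS Tax FROM Orders;

-- Equivalent inline TVF: expanded into the plan like a view
SELECT o.OrderId, c.Tax
  FROM Orders o
  CROSS APPLY dbo.CalcTaxTable(o.Total) AS c;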


Demo – Walk through this process

  • Mock out stored proc that sends e-mail
    • Just store input data
  • To test the harness, run the monolith twice, each in a transaction
    • You’ll see differences for non-deterministic stuff, e.g. dates
  • Need to look through the differences and resolve them
    • e.g. by feeding in the date as a procedure parameter
    • Make a copy of the monolith, changing only the date parameter handling
    • If you now see no differences, it’s deterministic
  • Now start re-factoring, but create different versions of the monolith
    • e.g. Monolith_HoistVariables
    • Move variable DECLAREs up to top
  • Beware of false positives
    • You might see no differences, but that could be due to having no data in database that exposes a bug that we just created
    • SPS: Or that test data doesn’t cause execution of a path where a bug is fixed
  • CROSS APPLY as performance improvement
    • Create new record sets on the fly
    • This is performance efficient
  • Do one UPDATE at bottom to make all of our changes
  • Move multiple IF statements into single big SELECT statement
    • Keep re-running the compare harness, continue seeing no changes after compare
  • Move hard-coded strings into function that returns table having named columns with constant values
    • No performance hit to do this
  • Pull many statements together into one big SELECT
    • Can then move into its own UDF
  • Make giant list of test data (see the sketch after this list)
    • Then use CROSS APPLY to pass this data into new UDF
    • Can then do regression testing
    • After future changes, you can say explicitly what’s going to break (change)
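
A sketch of that regression harness, reusing the hypothetical dbo.CalcOrderCharges function from above:

-- Known inputs and expected outputs, kept with the test suite
SELECT cases.*, r.Shipping, r.Tax,
       Verdict = CASE WHEN r.Tax = cases.ExpectedTax THEN 'pass' ELSE 'FAIL' END
  FROM (VALUES ( 100.00, 2,  5.50),
               (   0.00, 0,  0.00),
               (1000.00, 1, 55.00)) AS cases (Total, LineCount, ExpectedTax)
  CROSS APPLY dbo.CalcOrderCharges(cases.Total, cases.LineCount) AS r;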

T-SQL – Where Is a Table Used?

Another “put it in my blog so I don’t lose it” T-SQL nugget.

Nicer than just a raw text-based search, a where-used search for a named table.

-- Find all objects (procs, views, functions, triggers) that reference the table named 'Cow'
select distinct [Found In] = sp.Name, sp.type_desc
  from sys.objects o inner join sys.sql_expression_dependencies sd on o.object_id = sd.referenced_id
                     inner join sys.objects sp on sd.referencing_id = sp.object_id
  where (o.name = 'Cow')
  order by sp.Name

Handy T-SQL Merge

Inserts into the target any records from the source that aren’t yet in the target, and deletes target records that are no longer in the source.

MERGE CowTarget AS t
USING CowSource AS s
ON (t.TargetCowId = s.CowId)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TargetCowId, TargetName, TargetMotto)
    VALUES (s.CowId, s.Name, s.Motto)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

String Properties in ADO.NET Entity Data Model

(NOTE: This is Model First development in Entity Framework–start with a data model in the Entity Framework Designer and then generate database schema from the model).

When you add a scalar property of type string to an entity data model, you can fill in the values of several other properties within Visual Studio.

  • Fixed Length – (None), True, False
  • Max Length – (None), value, Max
  • Unicode – (None), True, False

(Screenshot: string property settings in the Entity Framework Designer)

What does a value of (None) mean for these properties (when generating SQL Server Database from model)?

  • Fixed Length – (None) equivalent to False
  • Max Length – (None) equivalent to Max
  • Unicode – (None) equivalent to True

Here’s how choices for these properties map to SQL Server data types:

  • Fixed=True, Max=20 – nchar(20)
  • Fixed=(None), Max=20 – nvarchar(20)
  • Fixed=False, Max=20 – nvarchar(20)
  • Fixed=(None), Max=(None) – nvarchar(max)
  • Fixed=True, Max=(None) – nchar(4000)
  • Fixed=False, Max=(None) – nvarchar(max)
  • Fixed=(None), Max=Max – nvarchar(max)
  • Fixed=True, Max=Max – nchar(4000)
  • Fixed=False, Max=Max – nvarchar(max)
  • Fixed=(None), Max=20, Unicode=True – nvarchar(20)
  • Fixed=(None), Max=20, Unicode=False – varchar(20)

This means that if you just pick String as the type and set nothing else, you’ll get nvarchar(max) as the SQL Server type.

NOTE: The value of 4,000 is being used because the maximum length of a fixed length string in SQL Server is 4,000 characters (8,000 bytes when encoded as UTF-16).

 

Count Total Rows while Paging and Avoid Two Queries

This is well covered in many other places, but I’m posting on my own blog so that I can more easily find this trick.

You can do paging in SQL Server 2012 and later using the OFFSET-FETCH syntax. This is useful for returning a single page’s worth of records from a larger (sorted) set of records. To count the total number of records, however, you’d normally do a second query. The T-SQL below shows how to do a total count as part of the query that returns the page in question. The count is repeated for every record, but it avoids having to do two queries.

select FirstName, LastName, COUNT(*) over () as TotalCount
  from Person
  where LastName like 'mc%'
  order by LastName, FirstName
  offset 500 rows
  fetch next 100 rows only

Here’s what the output looks like, returning records 501-600 out of 1,968 records having a last name that starts with “Mc”.

(Screenshot: query output showing the TotalCount column alongside each row)

SQL Server – Dump Info on Keys

Here’s a short stored procedure that could be useful when trying to understand a SQL Server database.

Let’s say that you have a simple schema like the one below. Person has GroupID as a foreign key (Group.GroupID), meaning that a person belongs to a single group. A person can also have multiple contact records, so PersonContact has a foreign key indicating which person the contact is for.

(Diagram: Person, Group, and PersonContact tables and their foreign key relationships)

In a simple schema like this, you can quickly see the two relationships. You can also see some of this by looking at the Person table in Management Studio, under Columns and Keys. In the diagram below, we see that GroupID is a foreign key and, based on the naming convention, we infer that the corresponding primary key is in the Group table.

(Screenshot: Person table in Management Studio, expanded to show Columns and Keys)

This notation relies on the naming convention used for the FK_Person_Group relationship. Also, while looking at the Person table, we’re unable to see which tables might contain foreign keys that refer to the primary key in the Person table.

Below is a simple stored procedure that dumps out all relationships that a specified table participates in. That is, given a table, it tells you:

  • All foreign keys that reference the specified table’s primary key
  • All foreign keys in the specified table and where the corresponding primary key is located
-- Given specified table, show all PK/FK relationships.  Show:
--   1) All foreign keys that reference this table's primary key
--   2) All foreign keys in this table and which table contains primary key
create procedure uKeys
(
	@TableName varchar(255)
)
as
begin
select tpk.name as PKTable, cpk.name as PKColumn, tfk.name as FKTable, cfk.name as FKColumn
  from sys.foreign_key_columns fk
  -- the referencing (foreign key) side is the parent object
  inner join sys.tables as tfk on fk.parent_object_id = tfk.object_id
  inner join sys.columns as cfk on fk.parent_object_id = cfk.object_id and fk.parent_column_id = cfk.column_id
  -- the referenced (primary key) side
  inner join sys.tables as tpk on fk.referenced_object_id = tpk.object_id
  inner join sys.columns as cpk on fk.referenced_object_id = cpk.object_id and fk.referenced_column_id = cpk.column_id
  where (tpk.name = @TableName) or (tfk.name = @TableName)
  order by PKTable, PKColumn, FKTable, FKColumn
end;

With this stored proc in place, you can now do the following in a query window, to ask about the keys for the Person table.

ukeys 'Person'

You’ll get some nice output that shows information about all of the relationships for the Person table.

(Screenshot: uKeys output listing all relationships for the Person table)

This is a simple example, but this procedure is considerably more useful when you’re dealing with very large databases, where you have a large number of tables and it’s not easy to see all of the relationships for a particular table.

Addendum: There’s already a built-in stored procedure in SQL Server that does basically the same thing (though perhaps in not as clean a format). sp_helpconstraint will dump out the same sort of information, as shown below.

(Screenshot: sp_helpconstraint output for the Person table)