That Conference 2017 – Refactoring Monolith Database Stored Procedures

That Conference 2017, Kalahari Resort, Lake Delton, WI
Refactoring Monolith Database Stored Procedures – Riley Major (@RileyMajor)

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Refactoring large stored procedures helps with testability, comprehensibility, and performance
  • Main strategies–reorganize, factor out pure business logic into UDFs, do updates at the end
  • Testing strategy–use transactions to undo test calls to before/after versions of stored proc, store data to compare in table variables


Monowhat?

  • Evolves over time
  • Long
  • Does multiple things
  • Disorganized
  • Fragile
  • Untestable
  • Scary


Layers, Like an Onion

  • Programming layers
    • Presentation
    • Business Logic
    • Data Storage
  • Tiers
    • Client app (HTML)
    • Server (ASP.NET)
    • Database (SQL Server, NoSQL)
  • Best Practice
    • Presentation in client application
    • Business Logic in server


Oh Noes!

  • Monolith
    • Presentation in database
    • Business Logic in database
  • Bad
    • Database scaling is hard
    • Causes vendor lock-in
    • Database languages are primitive
  • But
    • Close to data, less overhead
    • Who really changes databases?
    • SQL more powerful than you think
      • Could be faster to make changes to multiple tables down in the database


Turtles All the Way Down

  • Separate layers even on the same tier
    • Browser: MVVM
    • Server: MVC
    • Database: TBD
  • Database layer
    • Presentation / Business Logic
      • IF / CASE / SET / SUM / DATEADD
      • If this logic is going to be here anyway, we should try to architect it
    • Data access
      • SELECT / UPDATE / INSERT / DELETE
  • Testability, Isolation, Portability
    • Reasons to structure business logic in the database


Make a plan

  • What are goals?
    • Better performance?
    • Maintenance?
    • Understandability?
    • Testability?
  • How will you know you’ve achieved the goals?
    • Speed benchmarks
    • Less repetition
    • Smaller sections of code
    • Actually having a testing suite


Survey the damage

  • Can’t avoid a thorough code review
  • Look for data modification
    • INSERT, UPDATE, DELETE
    • Note columns affected
  • Look for external effects
    • CLR
    • E-mail generation
    • SPS: Triggers
  • Look for transaction handling
    • BEGIN TRAN, COMMIT, ROLLBACK
    • Harder to re-factor if existing code uses transactions
    • Can have nested transactions
    • Rollback goes all the way up the stack


Don’t Break Anything

  • Build a development environment
    • Need to be able to play around
    • Need realistic data (volume and content)
    • Maybe not real data
  • Work in isolation
    • Were changes the result of you or somebody else?
    • You really need to isolate your changes
    • Slow because resources being used elsewhere?
  • How can you tell if you broke something?
    • Need to capture before and after state
      • Look across entire database potentially
    • Aim for deterministic process
    • Easy to know if you broke it if you know what it’s supposed to do


Deterministic

  • Function returns the same result, given the same inputs
  • Easy to test – send same values in before/after your changes
  • Things that break determinism
    • Random numbers
    • Time
    • Pulling data from database (underlying table’s contents can change)


Play It Again, Sam

  • Why we want determinism–so you can compare data before/after your changes


Good Luck with That

  • (Monolith stored proc) Likely not deterministic
  • Monoliths change state
  • Need to go back in time
  • Can use transactions to do this
    • Revert to previous state


Become a wrapper

  • To test impact of code changes, wrap your calls
    • Begin transaction
    • Run original code
    • Capture changed data
    • Rollback
    • Run new code
    • Capture changed data
  • Compare the 2 captured data sets


Oops

  • But need to save changes somewhere
  • Captured data is also rolled back
  • Need to preserve the changes that you captured, even during rollback
    • Could save to local storage
    • Could save to another database
    • Print results to database console


Build a Ghost House

  • How to capture doomed data?
    • Outside SQL Server–hard
    • Another thread with NOLOCK–hard
  • What’s immune from transactions?
  • Variables
  • You can’t have a variable for every row
  • One big XML? Ouch
  • Table variables survive transactions
    • May be written to disk (tempdb), but not stored in the database
  • They’re ghost houses


Spooky Playground – Create House

  • DECLARE @Orders TABLE (Field1 int, Field2 int);
  • Could use tricks here–store checksums or hashes instead of actual data
  • Typically create one table variable for each DB table that will get changed
    • And set only the columns that you expect to change


Spooky Playground – Fill House

  • BEGIN TRAN; EXEC monolith;
  • UPDATE @Orders SET x = x FROM Orders
  • ROLLBACK
  • BEGIN TRAN; EXEC monolith_New


Spooky Playground – Compare

  • SELECT * FROM @Orders WHERE colA_Before <> colA_After
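
Pulling the three Spooky Playground steps together, a hedged sketch of the whole harness (it assumes an Orders table with OrderID and Total columns, and Monolith / Monolith_New procedures; all names are illustrative):

  DECLARE @Orders TABLE (OrderID int, Total_Before money, Total_After money);

  -- Run the original and capture what it did
  BEGIN TRAN;
      EXEC Monolith;
      INSERT INTO @Orders (OrderID, Total_Before)
          SELECT OrderID, Total FROM Orders;
  ROLLBACK;

  -- Run the rewrite against the same starting state and capture again
  BEGIN TRAN;
      EXEC Monolith_New;
      UPDATE o SET Total_After = src.Total
      FROM @Orders o
      INNER JOIN Orders src ON src.OrderID = o.OrderID;
  ROLLBACK;

  -- Any rows returned here mean the rewrite changed behavior
  SELECT * FROM @Orders
  WHERE Total_Before <> Total_After
     OR (Total_Before IS NULL AND Total_After IS NOT NULL)
     OR (Total_Before IS NOT NULL AND Total_After IS NULL);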


Mock Your Black Boxes

  • Transactions only work on the database
  • External effects aren’t rolled back
  • Replace external calls with “mocks”
  • They look and act like external calls
  • But you control the guts
  • Return hard-coded sample data
  • Have the mock log its inputs
    • You’ll need to see what was sent, to make sure it would have done the same thing
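
A sketch of what a mock might look like, assuming the monolith sends mail through a procedure such as dbo.SendOrderEmail (the procedure and log-table names are hypothetical, not from the talk):

  -- Log table that records what the mock was asked to do
  CREATE TABLE dbo.MockEmailLog
  (
      LoggedAt  datetime2 NOT NULL DEFAULT SYSDATETIME(),
      Recipient nvarchar(256),
      Subject   nvarchar(256),
      Body      nvarchar(max)
  );
  GO

  -- Same name and parameters as the real proc, but no external side effect
  ALTER PROCEDURE dbo.SendOrderEmail
      @Recipient nvarchar(256),
      @Subject   nvarchar(256),
      @Body      nvarchar(max)
  AS
  BEGIN
      INSERT INTO dbo.MockEmailLog (Recipient, Subject, Body)
      VALUES (@Recipient, @Subject, @Body);   -- record what would have been sent
  END;

Note that MockEmailLog rows get rolled back along with everything else, so capture them into a table variable before the ROLLBACK just like any other changed table.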


Make your time!

  • Date/Time functions kill determinism
  • You have to control “now”
  • Otherwise no two runs could be the same
  • So make your own time
  • And send it in as a parameter
    • Feed the monolith the current date/time
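
A sketch of feeding the monolith its clock, assuming a default parameter so existing callers keep working (the parameter name is mine):

  ALTER PROCEDURE dbo.Monolith
      @OrderID int,
      @Now     datetime2 = NULL   -- the test harness passes a fixed value; production callers can omit it
  AS
  BEGIN
      SET @Now = ISNULL(@Now, SYSDATETIME());   -- fall back to the real clock only when nothing was supplied

      -- ...use @Now everywhere the proc previously called GETDATE() or SYSDATETIME()...
  END;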


Your Petard Should Hoist

  • Move variable DECLAREs to the top
  • Reveals duplication
  • Reveals common data sources
  • Displays breadth of data required
  • Caution: DECLARE assignment
    • Leave SET down below when you pull the DECLARE up
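
A quick sketch of that caution (the variable is hypothetical):

  -- Before: declaration and assignment buried mid-procedure
  --     DECLARE @ShipCutoff datetime2 = DATEADD(DAY, 3, @Now);

  -- After: the DECLARE is hoisted to the top of the proc...
  DECLARE @ShipCutoff datetime2;

  -- ...and the assignment stays down where the original logic performed it
  SET @ShipCutoff = DATEADD(DAY, 3, @Now);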


One SELECT to Rule them All

  • Gather scattered SELECT statements to top
  • Reveals duplication
  • Prepares for separation
  • Prepares for shorter transactions
  • Use single SELECT with fancy SQL, if practical
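
A sketch of consolidating scattered reads, assuming hypothetical Orders and Customers tables:

  -- Instead of several separate reads...
  --     SELECT @Total = Total FROM Orders WHERE OrderID = @OrderID;
  --     SELECT @DiscountRate = DiscountRate FROM Customers WHERE CustomerID = @CustomerID;

  -- ...gather everything the procedure needs in one statement near the top
  SELECT @Total        = o.Total,
         @DiscountRate = c.DiscountRate,
         @Region       = c.Region
  FROM Orders o
  INNER JOIN Customers c ON c.CustomerID = o.CustomerID
  WHERE o.OrderID = @OrderID;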


Measure Twice, Cut Once

  • Find INSERT / UPDATE / DELETE
  • Replace with variables SETs
    • Store what these statements were supposed to do
  • Move data modification to end of proc
    • Shrinks amount of time when transactions are open
  • Results in 3 main sections
    • Data gathering
    • Computation
    • Data modification
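
A sketch of the resulting three-section shape (tables, columns, and business rules are all made up for illustration):

  -- 1. Data gathering
  SELECT @Total = o.Total, @DiscountRate = c.DiscountRate
  FROM Orders o
  INNER JOIN Customers c ON c.CustomerID = o.CustomerID
  WHERE o.OrderID = @OrderID;

  -- 2. Computation: store what the old scattered UPDATEs would have done
  SET @NewTotal  = @Total * (1 - @DiscountRate);
  SET @NewStatus = CASE WHEN @NewTotal = 0 THEN 'Comped' ELSE 'Billed' END;

  -- 3. Data modification, all at the end, inside one short transaction
  BEGIN TRAN;
      UPDATE Orders SET Total = @NewTotal, OrderStatus = @NewStatus WHERE OrderID = @OrderID;
  COMMIT;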


Cases of CASES

  • What’s left in middle? Logic
  • Lots of Ifs, SETs, and calculations
  • Pull it all together in one giant statement
  • Usually performs better
  • Can be clearer
  • Can reduce code
  • Prepares for separation
  • CASE, Derived Tables, and CTEs are your friends
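
A small sketch of folding an IF ladder into a CASE inside one SELECT (the tiers are invented):

  -- An IF/ELSE ladder...
  --     IF @Total >= 1000 SET @Tier = 'Gold';
  --     ELSE IF @Total >= 100 SET @Tier = 'Silver';
  --     ELSE SET @Tier = 'Bronze';

  -- ...becomes one statement alongside the other calculations
  SELECT @Tier = CASE WHEN @Total >= 1000 THEN 'Gold'
                      WHEN @Total >= 100  THEN 'Silver'
                      ELSE 'Bronze' END,
         @NewTotal = @Total * (1 - @DiscountRate);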


Building Blocks

  • Still one procedure = still a monolith
  • Separate
    • Data Gathering – inline UDFs
    • Calculation – inline UDF
  • Allows data gathering re-use
  • Allows testing suite for business rules
  • Allows read-only monolith actions
    • Most important benefit
    • Can tell people what the business logic will do
    • Data in, data out
    • May want to use this function elsewhere
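
A sketch of extracting the computation into an inline table-valued function (the function name and rules are hypothetical, not from the talk):

  CREATE FUNCTION dbo.CalculateOrderCharges
      (@Total money, @DiscountRate decimal(5,4), @Now datetime2)
  RETURNS TABLE
  AS
  RETURN
      SELECT NewTotal = @Total * (1 - @DiscountRate),
             Tier     = CASE WHEN @Total >= 1000 THEN 'Gold' ELSE 'Standard' END,
             BilledOn = @Now;
  GO

  -- The slimmed-down monolith gathers data, calls the function, and applies the result
  SELECT @NewTotal = c.NewTotal, @Tier = c.Tier
  FROM dbo.CalculateOrderCharges(@Total, @DiscountRate, @Now) AS c;

Because the function is pure (data in, data out), it can be called read-only from anywhere, including a test harness.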


It’s All Better Now

  • Reformed monolith
    • Recently written
    • Short
    • Orchestrates multiple things
    • Repeated code eliminated
    • Organized into functions
    • Vials of reagents to mix – based on pieces
    • Problems isolated
    • Testable
    • Benign–not scary


Note on functions

  • Scalar user-defined functions in SQL Server perform much worse than inline table-valued user-defined functions
  • Inline table-valued functions are treated by the engine just like a view (expanded into the calling query)
  • Especially bad performance when you use result of a scalar function in a WHERE clause
  • Multi-statement table-valued functions have other problems as well
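
For illustration, a hedged sketch of the difference (both function names are mine):

  -- Scalar UDF: evaluated row by row and opaque to the optimizer, especially in a WHERE clause
  --     SELECT * FROM Orders WHERE dbo.TotalWithTax_Scalar(Total, 0.055) > 100;

  -- Inline table-valued equivalent: expanded into the calling query like a view
  CREATE FUNCTION dbo.TotalWithTax (@Total money, @TaxRate decimal(5,4))
  RETURNS TABLE
  AS
  RETURN SELECT TotalWithTax = @Total * (1 + @TaxRate);
  GO

  SELECT o.*
  FROM Orders o
  CROSS APPLY dbo.TotalWithTax(o.Total, 0.055) t
  WHERE t.TotalWithTax > 100;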


Demo – Walk through this process

  • Mock out stored proc that sends e-mail
    • Just store input data
  • To validate the harness, run the original monolith twice, each run wrapped in a transaction
    • You’ll see differences for non-deterministic stuff, e.g. dates
  • Need to look through the differences and resolve them
    • e.g. by feeding in the date as a parameter
    • Make copy of Monolith, but change only the date parameter business
    • If you now see no differences, it’s deterministic
  • Now start re-factoring, but create different versions of the monolith
    • e.g. Monolith_HoistVariables
    • Move variable DECLAREs up to top
  • Beware of false positives
    • You might see no differences simply because the database lacks data that would exercise a bug you just introduced
    • SPS: Or that test data doesn’t cause execution of a path where a bug is fixed
  • CROSS APPLY as performance improvement
    • Create new record sets on the fly
    • Performs efficiently
  • Do one UPDATE at bottom to make all of our changes
  • Move multiple IF statements into a single big SELECT statement
    • Keep re-running the compare harness, continue seeing no changes after compare
  • Move hard-coded strings into function that returns table having named columns with constant values
    • No performance hit to do this
  • Pull many statements together into one big SELECT
    • Can then move into its own UDF
  • Make giant list of test data
    • Then use CROSS APPLY to pass this data into new UDF
    • Can then do regression testing
    • After future changes, you can say explicitly what’s going to break (change)
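
A sketch of that regression-test idea, reusing the hypothetical dbo.CalculateOrderCharges function from the Building Blocks sketch above:

  -- A fixed set of representative inputs
  DECLARE @TestCases TABLE (CaseName varchar(50), Total money, DiscountRate decimal(5,4));
  INSERT INTO @TestCases VALUES
      ('Small order', 50,   0),
      ('Big order',   5000, 0.10),
      ('Free order',  0,    0);

  -- Run every case through the business-logic UDF in one read-only query
  SELECT t.CaseName, c.NewTotal, c.Tier
  FROM @TestCases t
  CROSS APPLY dbo.CalculateOrderCharges(t.Total, t.DiscountRate, '2017-08-08') AS c;

  -- Save this output as a baseline; rerun and diff it after every future change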
