That Conference 2017 – Refactoring Monolith Database Stored Procedures

That Conference 2017, Kalahari Resort, Lake Delton, WI
Refactoring Monolith Database Stored Procedures – Riley Major (@RileyMajor)

Day 2, 8 Aug 2017

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2017 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Refactoring large stored procedures helps with testability, comprehensibility, and performance
  • Main strategies–reorganize, factor out pure business logic into UDFs, do updates at the end
  • Testing strategy–use transactions to undo test calls to before/after versions of stored proc, store data to compare in table variables


Monowhat?

  • Evolves over time
  • Long
  • Does multiple things
  • Disorganized
  • Fragile
  • Untestable
  • Scary


Layers, Like an Onion

  • Programming layers
    • Presentation
    • Business Logic
    • Data Storage
  • Tiers
    • Client app (HTML)
    • Server (ASP.NET)
    • Database (SQL Server, NoSQL)
  • Best Practice
    • Presentation in client application
    • Business Logic in server


Oh Noes!

  • Monolith
    • Presentation in database
    • Business Logic in database
  • Bad
    • Database scaling is hard
    • Causes vendor lock-in
    • Database languages are primitive
  • But
    • Close to data, less overhead
    • Who really changes databases?
    • SQL more powerful than you think
      • Could be faster to make changes to multiple tables down in the database


Turtles All the Way Down

  • Separate layers even on the same tier
    • Browser: MVVM
    • Server: MVC
    • Database: TBD
  • Database layer
    • Presentation / Business Logic
      • IF / CASE / SET / SUM / DATEADD
      • If this logic is going to be here anyway, we should try to architect it
    • Data access
      • SELECT / UPDATE / INSERT / DELETE
  • Testability, Isolation, Portability
    • Reasons to structure business logic in the database


Make a plan

  • What are goals?
    • Better performance?
    • Maintenance?
    • Understandability?
    • Testability?
  • How will you know you’ve achieved the goals?
    • Speed benchmarks
    • Less repetition
    • Smaller sections of code
    • Actually having a testing suite


Survey the damage

  • Can’t avoid a thorough code review
  • Look for data modification
    • INSERT, UPDATE, DELETE
    • Note columns affected
  • Look for external effects
    • CLR
    • E-mail generation
    • SPS: Triggers
  • Look for transaction handling
    • BEGIN TRAN, COMMIT, ROLLBACK
    • Harder to re-factor if existing code uses transactions
    • Can have nested transactions
    • Rollback goes all the way up the stack


Don’t Break Anything

  • Build a development environment
    • Need to be able to play around
    • Need realistic data (volume and content)
    • Maybe not real data
  • Work in isolation
    • Were changes the result of you or somebody else?
    • You really need to isolate your changes
    • Slow because resources being used elsewhere?
  • How can you tell if you broke something?
    • Need to capture before and after state
      • Look across entire database potentially
    • Aim for deterministic process
    • Easy to know if you broke it if you know what it’s supposed to do


Deterministic

  • Function returns the same result, given the same inputs
  • Easy to test – send same values in before/after your changes
  • Things that break determinism
    • Random numbers
    • Time
    • Pulling data from database (underlying table’s contents can change)


Play It Again, Sam

  • Why we want determinism–so you can compare data before/after your changes


Good Luck with That

  • (Monolith stored proc) Likely not deterministic
  • Monoliths change state
  • Need to go back in time
  • Can use transactions to do this
    • Revert to previous state


Become a wrapper

  • To test impact of code changes, wrap your calls
    • Begin transaction
    • Run original code
    • Capture changed data
    • Rollback
    • Run new code
    • Capture changed data
  • Compare the 2 captured data sets


Oops

  • But need to save changes somewhere
  • Captured data is also rolled back
  • Need to preserve the changes that you captured, even during rollback
    • Could save to local storage
    • Could save to another database
    • Print results to database console


Build a Ghost House

  • How to capture doomed data?
    • Outside SQL Server–hard
    • Another thread with NOLOCK–hard
  • What’s immune from transactions?
  • Variables
  • You can’t have a variable for every row
  • One big XML? Ouch
  • Table variables survive transactions
    • May be written to disk (tempdb), but not stored in the database
  • They’re ghost houses


Spooky Playground – Create House

  • DECLARE @Orders TABLE (Field1 int, Field2 int);
  • Could use tricks here–store checksums or hashes instead of actual data
  • Typically create one table variable for each DB table that will get changed
    • And set only the columns that you expect to change


Spooky Playground – Fill House

  • BEGIN TRAN; EXEC monolith;
  • UPDATE @Orders SET x = x FROM Orders
  • ROLLBACK
  • BEGIN TRAN; EXEC monolith_New


Spooky Playground – Compare

  • SELECT * FROM @Orders WHERE colA_Before <> colA_After
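
Pulling the three Spooky Playground steps together, a hedged sketch of the whole harness (it assumes an Orders table with OrderID and Total columns, and Monolith / Monolith_New procedures; all names are illustrative):

  DECLARE @Orders TABLE (OrderID int, Total_Before money, Total_After money);

  -- Run the original and capture what it did
  BEGIN TRAN;
      EXEC Monolith;
      INSERT INTO @Orders (OrderID, Total_Before)
          SELECT OrderID, Total FROM Orders;
  ROLLBACK;

  -- Run the rewrite against the same starting state and capture again
  BEGIN TRAN;
      EXEC Monolith_New;
      UPDATE o SET Total_After = src.Total
      FROM @Orders o
      INNER JOIN Orders src ON src.OrderID = o.OrderID;
  ROLLBACK;

  -- Any rows returned here mean the rewrite changed behavior
  SELECT * FROM @Orders
  WHERE Total_Before <> Total_After
     OR (Total_Before IS NULL AND Total_After IS NOT NULL)
     OR (Total_Before IS NOT NULL AND Total_After IS NULL);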


Mock Your Black Boxes

  • Transactions only work on the database
  • External effects aren’t rolled back
  • Replace external calls with “mocks”
  • They look and act like external calls
  • But you control the guts
  • Return hard-coded sample data
  • Have the mock log its inputs
    • You’ll need to see what was sent, to make sure it would have done the same thing
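
A sketch of what a mock might look like, assuming the monolith sends mail through a procedure such as dbo.SendOrderEmail (the procedure and log-table names are hypothetical, not from the talk):

  -- Log table that records what the mock was asked to do
  CREATE TABLE dbo.MockEmailLog
  (
      LoggedAt  datetime2 NOT NULL DEFAULT SYSDATETIME(),
      Recipient nvarchar(256),
      Subject   nvarchar(256),
      Body      nvarchar(max)
  );
  GO

  -- Same name and parameters as the real proc, but no external side effect
  ALTER PROCEDURE dbo.SendOrderEmail
      @Recipient nvarchar(256),
      @Subject   nvarchar(256),
      @Body      nvarchar(max)
  AS
  BEGIN
      INSERT INTO dbo.MockEmailLog (Recipient, Subject, Body)
      VALUES (@Recipient, @Subject, @Body);   -- record what would have been sent
  END;

Note that MockEmailLog rows get rolled back along with everything else, so capture them into a table variable before the ROLLBACK just like any other changed table.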


Make your time!

  • Date/Time functions kill determinism
  • You have to control “now”
  • Otherwise no two runs could be the same
  • So make your own time
  • And send it in as a parameter
    • Feed the monolith the current date/time
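
A sketch of feeding the monolith its clock, assuming a default parameter so existing callers keep working (the parameter name is mine):

  ALTER PROCEDURE dbo.Monolith
      @OrderID int,
      @Now     datetime2 = NULL   -- the test harness passes a fixed value; production callers can omit it
  AS
  BEGIN
      SET @Now = ISNULL(@Now, SYSDATETIME());   -- fall back to the real clock only when nothing was supplied

      -- ...use @Now everywhere the proc previously called GETDATE() or SYSDATETIME()...
  END;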


Your Petard Should Hoist

  • Move variable DECLAREs to the top
  • Reveals duplication
  • Reveals common data sources
  • Displays breadth of data required
  • Caution: DECLARE assignment
    • Leave SET down below when you pull the DECLARE up
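
A quick sketch of that caution (the variable is hypothetical):

  -- Before: declaration and assignment buried mid-procedure
  --     DECLARE @ShipCutoff datetime2 = DATEADD(DAY, 3, @Now);

  -- After: the DECLARE is hoisted to the top of the proc...
  DECLARE @ShipCutoff datetime2;

  -- ...and the assignment stays down where the original logic performed it
  SET @ShipCutoff = DATEADD(DAY, 3, @Now);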


One SELECT to Rule them All

  • Gather scattered SELECT statements to top
  • Reveals duplication
  • Prepares for separation
  • Prepares for shorter transactions
  • Use single SELECT with fancy SQL, if practical
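
A sketch of consolidating scattered reads, assuming hypothetical Orders and Customers tables:

  -- Instead of several separate reads...
  --     SELECT @Total = Total FROM Orders WHERE OrderID = @OrderID;
  --     SELECT @DiscountRate = DiscountRate FROM Customers WHERE CustomerID = @CustomerID;

  -- ...gather everything the procedure needs in one statement near the top
  SELECT @Total        = o.Total,
         @DiscountRate = c.DiscountRate,
         @Region       = c.Region
  FROM Orders o
  INNER JOIN Customers c ON c.CustomerID = o.CustomerID
  WHERE o.OrderID = @OrderID;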


Measure Twice, Cut Once

  • Find INSERT / UPDATE / DELETE
  • Replace with variables SETs
    • Store what these statements were supposed to do
  • Move data modification to end of proc
    • Shrinks amount of time when transactions are open
  • Results in 3 main sections
    • Data gathering
    • Computation
    • Data modification
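
A sketch of the resulting three-section shape (tables, columns, and business rules are all made up for illustration):

  -- 1. Data gathering
  SELECT @Total = o.Total, @DiscountRate = c.DiscountRate
  FROM Orders o
  INNER JOIN Customers c ON c.CustomerID = o.CustomerID
  WHERE o.OrderID = @OrderID;

  -- 2. Computation: store what the old scattered UPDATEs would have done
  SET @NewTotal  = @Total * (1 - @DiscountRate);
  SET @NewStatus = CASE WHEN @NewTotal = 0 THEN 'Comped' ELSE 'Billed' END;

  -- 3. Data modification, all at the end, inside one short transaction
  BEGIN TRAN;
      UPDATE Orders SET Total = @NewTotal, OrderStatus = @NewStatus WHERE OrderID = @OrderID;
  COMMIT;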


Cases of CASES

  • What’s left in middle? Logic
  • Lots of Ifs, SETs, and calculations
  • Pull it all together in one giant statement
  • Usually performs better
  • Can be clearer
  • Can reduce code
  • Prepares for separation
  • CASE, Derived Tables, and CTEs are your friends
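
A small sketch of folding an IF ladder into a CASE inside one SELECT (the tiers are invented):

  -- An IF/ELSE ladder...
  --     IF @Total >= 1000 SET @Tier = 'Gold';
  --     ELSE IF @Total >= 100 SET @Tier = 'Silver';
  --     ELSE SET @Tier = 'Bronze';

  -- ...becomes one statement alongside the other calculations
  SELECT @Tier = CASE WHEN @Total >= 1000 THEN 'Gold'
                      WHEN @Total >= 100  THEN 'Silver'
                      ELSE 'Bronze' END,
         @NewTotal = @Total * (1 - @DiscountRate);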


Building Blocks

  • Still one procedure = still a monolith
  • Separate
    • Data Gathering – inline UDFs
    • Calculation – inline UDF
  • Allows data gathering re-use
  • Allows testing suite for business rules
  • Allows read-only monolith actions
    • Most important benefit
    • Can tell people what the business logic will do
    • Data in, data out
    • May want to use this function elsewhere
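
A sketch of extracting the computation into an inline table-valued function (the function name and rules are hypothetical, not from the talk):

  CREATE FUNCTION dbo.CalculateOrderCharges
      (@Total money, @DiscountRate decimal(5,4), @Now datetime2)
  RETURNS TABLE
  AS
  RETURN
      SELECT NewTotal = @Total * (1 - @DiscountRate),
             Tier     = CASE WHEN @Total >= 1000 THEN 'Gold' ELSE 'Standard' END,
             BilledOn = @Now;
  GO

  -- The slimmed-down monolith gathers data, calls the function, and applies the result
  SELECT @NewTotal = c.NewTotal, @Tier = c.Tier
  FROM dbo.CalculateOrderCharges(@Total, @DiscountRate, @Now) AS c;

Because the function is pure (data in, data out), it can be called read-only from anywhere, including a test harness.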


It’s All Better Now

  • Reformed monolith
    • Recently written
    • Short
    • Orchestrates multiple things
    • Repeated code eliminated
    • Organized into functions
    • Vials of reagents to mix – based on pieces
    • Problems isolated
    • Testable
    • Benign–not scary


Note on functions

  • Scalar user-defined functions in SQL Server perform much worse than inline table-valued user-defined functions
  • Inline table-valued functions are treated by the engine just like a view (expanded into the calling query)
  • Especially bad performance when you use result of a scalar function in a WHERE clause
  • Multi-statement table-valued functions have other problems as well
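
For illustration, a hedged sketch of the difference (both function names are mine):

  -- Scalar UDF: evaluated row by row and opaque to the optimizer, especially in a WHERE clause
  --     SELECT * FROM Orders WHERE dbo.TotalWithTax_Scalar(Total, 0.055) > 100;

  -- Inline table-valued equivalent: expanded into the calling query like a view
  CREATE FUNCTION dbo.TotalWithTax (@Total money, @TaxRate decimal(5,4))
  RETURNS TABLE
  AS
  RETURN SELECT TotalWithTax = @Total * (1 + @TaxRate);
  GO

  SELECT o.*
  FROM Orders o
  CROSS APPLY dbo.TotalWithTax(o.Total, 0.055) t
  WHERE t.TotalWithTax > 100;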


Demo – Walk through this process

  • Mock out stored proc that sends e-mail
    • Just store input data
  • To validate the harness, run the original monolith twice, each run wrapped in a transaction
    • You’ll see differences for non-deterministic stuff, e.g. dates
  • Need to look through the differences and resolve them
    • e.g. by feeding in the date as a parameter
    • Make copy of Monolith, but change only the date parameter business
    • If you now see no differences, it’s deterministic
  • Now start re-factoring, but create different versions of the monolith
    • e.g. Monolith_HoistVariables
    • Move variable DECLAREs up to top
  • Beware of false positives
    • You might see no differences simply because the database lacks data that would exercise a bug you just introduced
    • SPS: Or that test data doesn’t cause execution of a path where a bug is fixed
  • CROSS APPLY as performance improvement
    • Create new record sets on the fly
    • Performs efficiently
  • Do one UPDATE at bottom to make all of our changes
  • Move multiple IF statements into a single big SELECT statement
    • Keep re-running the compare harness, continue seeing no changes after compare
  • Move hard-coded strings into function that returns table having named columns with constant values
    • No performance hit to do this
  • Pull many statements together into one big SELECT
    • Can then move into its own UDF
  • Make giant list of test data
    • Then use CROSS APPLY to pass this data into new UDF
    • Can then do regression testing
    • After future changes, you can say explicitly what’s going to break (change)
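
A sketch of that regression-test idea, reusing the hypothetical dbo.CalculateOrderCharges function from the Building Blocks sketch above:

  -- A fixed set of representative inputs
  DECLARE @TestCases TABLE (CaseName varchar(50), Total money, DiscountRate decimal(5,4));
  INSERT INTO @TestCases VALUES
      ('Small order', 50,   0),
      ('Big order',   5000, 0.10),
      ('Free order',  0,    0);

  -- Run every case through the business-logic UDF in one read-only query
  SELECT t.CaseName, c.NewTotal, c.Tier
  FROM @TestCases t
  CROSS APPLY dbo.CalculateOrderCharges(t.Total, t.DiscountRate, '2017-08-08') AS c;

  -- Save this output as a baseline; rerun and diff it after every future change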
