That Conference 2016 – From Inception to Production: A Continuous Delivery Story

That Conference 2016, Kalahari Resort, Lake Delton, WI
From Inception to Production: A Continuous Delivery Story
Ian Randall

Day 3, 10 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Ian shares best practices of his development group at Pushpay, from inception to deployment

Tell a continuous delivery story in context

  • Pushpay company, SaaS business
  • ACMR Growth – committed monthly revenue stream
  • to $10 million in 3 quarters (5 considered fast)
  • Everyone at Pushpay paid to deliver value to business
    • Code not yet running on production server has no value

Hierarchy

  • Tools (top)
  • People, practices
  • Just culture & blameless postmortems
    • Bottom layer, most important

Our journey begins

  • Idea

Why?

  • Shared vision over the value to the business
  • Talk about why to build it
    • Better than talking about what

Who?

  • Who has conversation?
    • Product
    • QA – must involve QA in initial discussion
      • Can’t just test quality at end
      • “Bake quality in”
      • Build with tester (pairing)
    • Dev

Building a feature

  • Dev – how will I build this thing?
  • QA – how will I break this thing?
    • Very valuable insight
    • “Nobody will ever do that”
    • QA tester writes out what tests should be
    • Then Dev writes unit tests based on this
    • Quality Assistant (not Assurance)

Building a larger feature

  • Long-lived feature branch (git)
    • Delta gets too big
      • Smaller deltas are lower risk
    • No feedback – from user
    • (good) dry code
  • Feature switches (toggles)
    • Small deltas
    • Regular feedback
    • (bad) technical debt
      • Some code duplicatoin
      • But you have to go back later and yank out old code

Pushpay terminology

  • Delta – diff between production server & head of master branch
    • Code no yet running in production
  • Want to keep delta small and contained
  • Shipping in tinier increments, easier to figure out what the cause of the problem is

Feature Switches

  • Configuration per environment
    • Features.config
    • Features.Production.config
    • Features.Sandbox.config

Feature Switches

  • Url manipulation to toggle switches on/off
  • Deliver daily increments of (non-running) code
  • Light up a slice of feature
  • Measure – statsd
  • Re-think road-map to compmlete feature

Womm

  • Works on My Machine
  • Context switching after QA person finds problem
  • Pushpay, turned around
    • Must work on your machine
    • Tester decides this
    • Hand laptop to tester, let them test on the dev machine
    • Best value for company from dev point of view–watch tester break it right in front of them
    • Shortens the DEV/QA cycle
    • Pair testing

Code review

  • Every line of code gets reviewed
  • Code must reviewed and wommed before merging
  • “Roll forwards to victory”

Code Review

  • Do
    • Validate approach
    • Performance, security, operability
    • Cohesion, coupling
    • Be honest
  • Don’t
    • Be rude – e.g. “dude that’s gross”
      • Better – tell coder how you would have done it?
    • Seriously, don’t be rude
    • Sweat the small stuff, like bracing, spaces
      • This stuff is not important

Cross-Pollination

  • Someone else does it all again!
  • Pollination – not necessarily from your cell
  • Might not fully understand the context of your feature

4 Cotinuouses

(1) Continuous Integration

  • Source control
    • PR branch
    • Pushed early, for discussion
  • Build & Test
    • TeamCity does builds, using nUnit for unit testing
    • Integration testing – anything that crosses a boundary

CI – Source Control

  • PR-based workflow
  • Review happens in the PR
  • Everything reviewed

CI – build and test

  • PR Branch: Build, unit test and integration test
  • Merge into master – build, unit test, integration and acceptance test
    • Goes to QA – then rolled back or pushed to production
    • Human decision to push to production
    • Acceptance tests in Selenium – a bit brittle, but have some value
    • Create static Model and push to View, then test
      • If it breaks there, it’s not because of back-end
      • Verified that your site is intact, visually
      • Renders view against snapshot – if diff, either a bug or you accept as new snapshot

(2) Continuous Deployment

  • Automatically build, package and deploy to QA
    • Octopus deploy (great tool)
  • Manually promote package to production
    • One button click (because of Octopus)

(3) Continuous Delivery

  • Operability
  • Value

CD – Operability

  • Exception logging
  • App logging (Log4Net)
  • App metrics (statsd)
    • Measure absolutely everything
  • Incident

CD – Value

  • Add incremental bits of value to the product
    • Need to think – is there maybe value in shipping a portion of the feature?
  • Measuring the effectiveness

(4) Continuous Improvement

  • Actively seeking out opportunities to improve
    • Fix broken windows
    • Leave codebase in better state than when you found it
    • Improve the process

Bots

  • Shipbot, Beebot, Salesbot
    • Many, many, many more
  • @C3PR in action
    • Catalog of little commands
    • People joining PRs together

Just Culture

  • sidney dekker
  • Retributive culture
    • Clarity between acceptable and unacceptable
  • Restorative culture / model
    • Focuses on learning from what went wrong
    • Safe to fail

Fear of breaking things will paralyze an organization

Toyota’s Five Whys

  • Keep asking why until you get to root cause of problem
  • Doesn’t work for Pushpay
    • No single thing that is root cause
    • And often turns into “who”

Blameless Post-Mortems

  • Talk about how to stop the thing from happening again
  • When?
    • When there is an opportunity
    • Often, after break to production
    • Or even when something brings QA server down
  • How?
    • If we had a meeting, the loudest person in the room would do the most talking
    • So we do this asynchronously in a Wiki
    • Coordinated in slack channel #morgue
    • Co-ordinated with person closest to the incident
  • What?
    • Four sections in report
      • Scenario and impact
      • Timeline – write in real-time
      • Discussion
      • Mitigation – make sure this type of thing won’t happen again
        • Actionable ticket in Jira, highest priority possible
        • Slipping feature is better than having the incident happen again

Fault

  • Easy for manager to say to staff–when stuff happens, it’s not your fault
  • Much harder to go to CEO and say that stuff just happens
    • Board reads every post-mortem