That Conference 2018 – An Extended Explanation of Caching

That Conference 2018, Kalahari Resort, Lake Delton, WI
An Extended Explanation of Caching – Tom Cudd

Day 3, 8 Aug 2018  2:30 PM

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2018 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • A number of places where you can do caching
  • Various tools for various places
  • Really all about 1) populating, 2) invalidating

Caching examples

  • Network tab in browser, pages from memory or disk cache
  • Content from CDN
  • Special applications – e.g. Varnish
  • WordPress plugin – specify cache settings for blog

Caching is Like Regex

  • If you’re not careful, you’re going to have new problems

Application Architecture

  • “Christmas tree” showing server, cache servers, CDN, load balancers, etc.

Problem

  • Caching solves one problem–performance
  • 3-sec rule: user leaves if something takes longer than 3 secs
    • 40% of users leave

Another reason

  • Buy some time
    • e.g. before application crashes
  • Cost-benefit analysis–caching costs vs downtime

Measure First

  • User metrics
  • Load times
  • Site not crashing every 5 minutes
  • S.M.A.R.T. Goals – specific measurable achievable relevant time-bound

How to Not Suck at Caching

  • Caching is additive
    • Can’t just throw caching onto server that’s already overloaded
  • Don’t over-engineer
    • Use the simplest solution that satisfies the requirements
  • Measure, change, test, measure

Caching Doesn’t Help

  • Oversaturation
    • Network hardware
    • Thread death–“virus”, creating multiple threads for each request
  • Thundering herd
    • If a large number of requests arrive before the cache is populated, many of them go all the way back to the server, since it takes a little time to cache the data
  • Lack of information
  • Bad decisions
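One common way to soften the thundering herd (my own sketch, not something shown in the talk) is to let only one request per key go back to the origin while the others wait for its result. The class name `SingleFlightCache` and the `slow_origin` loader below are hypothetical, and the cache is a plain in-process dict rather than a real caching product.

```python
import threading
import time


class SingleFlightCache:
    """Cache that lets only one caller per key hit the origin at a time."""

    def __init__(self, loader):
        self._loader = loader            # hypothetical function: key -> value
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()   # protects the two dicts above

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
            lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one thread per key gets past this lock at a time.
        with lock:
            with self._guard:
                if key in self._values:  # filled in while we were waiting
                    return self._values[key]
            value = self._loader(key)    # the single trip to the origin
            with self._guard:
                self._values[key] = value
            return value


origin_calls = []


def slow_origin(key):
    """Stand-in for a slow origin server."""
    origin_calls.append(key)
    time.sleep(0.05)
    return key.upper()


cache = SingleFlightCache(slow_origin)
threads = [threading.Thread(target=cache.get, args=("home",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(origin_calls))  # 20 concurrent requests, one origin call
```

The trade-off is that waiting requests are only as fast as the one request doing the work, but the origin sees a single hit instead of twenty.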

End Users

Browser Caching

  • Setting response headers–the server's response tells the browser to cache a page
  • Requires a hit to operating system
  • Unique naming
    • Rename an asset to force download of a long-cached object
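A rough sketch of the unique-naming idea, assuming the build step can see each asset's bytes: put a short content hash in the file name, so the name changes whenever the content does. The function name `fingerprinted_name` and the 8-character digest length are my own choices, not from the talk.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_name = fingerprinted_name("css/site.css", b"body { color: red }")
new_name = fingerprinted_name("css/site.css", b"body { color: blue }")
print(old_name)
print(new_name)  # different content gives a different file name
```

Because the browser has never seen the new name, it has to download the file, even if the old one was cached for a year.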

Set with Web Server

  • E.g. set Cache-Control for certain pages
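To make this concrete, here is a small sketch using Python's built-in `http.server` as a stand-in for a real web server. The rule table, the `cache_control_for` helper, and the specific `max-age` values are illustrative assumptions; a production server such as IIS, Apache, or nginx would express the same idea in its own configuration.

```python
import posixpath
from http.server import SimpleHTTPRequestHandler

# Hypothetical policy: long-lived static assets, always-revalidated HTML.
CACHE_RULES = {
    ".css": "public, max-age=31536000, immutable",
    ".js": "public, max-age=31536000, immutable",
    ".png": "public, max-age=86400",
    ".html": "no-cache",
}
DEFAULT_RULE = "no-store"


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value from the file extension of a request path."""
    without_query = path.split("?", 1)[0]
    ext = posixpath.splitext(without_query)[1].lower()
    return CACHE_RULES.get(ext, DEFAULT_RULE)


class CachingHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds Cache-Control to every response."""

    def end_headers(self):
        self.send_header("Cache-Control", cache_control_for(self.path))
        super().end_headers()


print(cache_control_for("/static/app.css?v=2"))
print(cache_control_for("/index.html"))
```

To try it locally, pass `CachingHandler` to `http.server.HTTPServer(("localhost", 8000), CachingHandler)` and call `serve_forever()`, then watch the response headers in the browser's Network tab.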

External/Edge Services

  • CDN – Content Delivery Network
    • Point your URI to 3rd party server
    • They then pull files, as needed, from your server
  • Most big sites on the web run on CDN
  • Improve page load time
    • Serve request from server that has shortest travel time to client
  • High availability
    • Cached data could still be present even when your origin server hiccups
  • Can maybe buy CDN services at lower cost than scaling out actual servers

CDN Features

  • Setting TTLs, other configs based on extension
  • Region, language routing
  • Mobile detection
  • WAF/Security
    • Applications can do double duty–firewall + CDN
    • DDoS protection

CDN Providers

  • Cloudflare
  • Incapsula
  • Cloud providers
  • Akamai
  • Fastly

Akamai

  • Match file extensions
  • Honor cache control of origin
  • Can have configs for some stuff on your server

Measure and Test

Your Systems

  • Don’t over-engineer–get strange results
  • Spin up different instance if you need different purpose

Varnish

  • Separate servers
  • Both memory and disk paging

Web/App Layer

  • Baked vs fried
    • Building on demand vs preloading cache
    • Cache gen’d on demand when request hits (fried)
    • Preload cache ahead of time (baked)
  • Disk vs memory
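The baked vs fried distinction can be sketched with one small in-memory cache that supports both styles. Everything here is hypothetical: `render` stands in for an expensive page build, and `PageCache` is a toy, not how any particular CMS does it.

```python
render_count = 0


def render(page_id: str) -> str:
    """Stand-in for an expensive page render."""
    global render_count
    render_count += 1
    return f"<h1>{page_id}</h1>"


class PageCache:
    def __init__(self):
        self._pages = {}

    def bake(self, page_ids):
        """Baked: render pages ahead of time, before any request arrives."""
        for page_id in page_ids:
            self._pages[page_id] = render(page_id)

    def get(self, page_id: str) -> str:
        """Fried: render on the first request for a page not already cached."""
        if page_id not in self._pages:
            self._pages[page_id] = render(page_id)
        return self._pages[page_id]


pages = PageCache()
pages.bake(["home", "about"])   # two renders up front
pages.get("home")               # served from the baked copy, no render
pages.get("contact")            # not baked, so rendered on demand
pages.get("contact")            # now cached
print(render_count)             # 3
```

Baking moves the cost to deploy or publish time; frying means the first visitor to each page pays it.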

Products

  • Adobe Experience Manager
  • Sitecore
    • Prefetch, data, item, html caches
  • Drupal
    • Performance caching (SQL queries stored)

Output Caching

  • IIS – generate on request

App/Data Layer

  • Reducing database calls
  • Reducing API calls

Ehcache

  • Java based applications
  • Memory or disk

Memcached

  • In memory, key/value store
  • Reduce database load
  • Not synchronized
  • Scalable separately

Let’s Not Fight

  • Not worth arguing about which is better
  • Memcached, Redis, Mongo
  • All are good

Database

  • Redis
  • Firebase
  • Stored procedures – reduce network bandwidth by calling short sproc rather than sending long query

Challenges

  • Invalidating stale data

Caching Procedures

  • Only two things you’ll really need to do with caching
    • Populating
    • Invalidating

Debugging, need to remind people to clear their cache

  • [Sean] But should users have to worry about this? If you need to have them clear cache, then something’s not working properly

Populating Caches

  • Pull based – cache tool pulls from your server
  • Push based – your code has to push out data to the cache
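The difference is mostly about who initiates the write into the cache. A minimal sketch, with hypothetical names throughout (`PullCache`, `PushCache`, `save_article`, and a dict playing the role of the database):

```python
class PullCache:
    """Pull based: the cache fetches from the origin itself on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical function: key -> value
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]


class PushCache:
    """Push based: the cache holds only what application code sends it."""

    def __init__(self):
        self._store = {}

    def push(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def save_article(db, cache, slug, body):
    """Application write path that pushes the fresh copy to the cache."""
    db[slug] = body
    cache.push(slug, body)


database = {"intro": "Hello"}
pull_cache = PullCache(lambda slug: database[slug])
push_cache = PushCache()

print(pull_cache.get("intro"))   # fetched from the origin on first read
print(push_cache.get("intro"))   # None: nothing has been pushed yet
save_article(database, push_cache, "intro", "Hello again")
print(push_cache.get("intro"))   # the pushed copy
```

Note that after `save_article` the pull cache still holds the old "Hello", which is exactly the invalidation problem the notes keep coming back to.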

Cache clearing

  • Staggered approaches
  • Clearing individual items
  • Using APIs and automation
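A sketch of what staggered clearing might look like when automated, assuming the cache can be cleared one key at a time. The cache here is just a dict, and `clear_item`, `staggered_clear`, the batch size, and the pause are illustrative choices; a real CDN or cache server would be driven through its own purge API.

```python
import time


def clear_item(cache: dict, key) -> bool:
    """Clear a single item; returns True if something was actually removed."""
    if key in cache:
        del cache[key]
        return True
    return False


def batches(keys, size):
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def staggered_clear(cache, keys, batch_size, pause_seconds, sleep=time.sleep):
    """Clear keys in small batches, pausing between batches so the origin
    is not asked to regenerate everything at once."""
    cleared = 0
    for index, batch in enumerate(batches(list(keys), batch_size)):
        if index > 0:
            sleep(pause_seconds)
        for key in batch:
            if clear_item(cache, key):
                cleared += 1
    return cleared


page_cache = {f"page:{i}": f"<html>{i}</html>" for i in range(5)}
removed = staggered_clear(
    page_cache, list(page_cache), batch_size=2, pause_seconds=0.01
)
print(removed, len(page_cache))  # 5 0
```

Clearing everything in one shot would recreate the thundering herd from earlier in the talk; spacing out the batches gives the origin time to repopulate each slice.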

 
