That Conference 2018, Kalahari Resort, Lake Delton, WI
An Extended Explanation of Caching – Tom Cudd
Day 3, 8 Aug 2018 2:30 PM
Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2018 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.
Executive Summary
- A number of places where you can do caching
- Various tools for various places
- Really all about 1) populating, 2) invalidating
Caching examples
- Network tab in browser, pages from memory or disk cache
- Content from CDN
- Special applications – e.g. Varnish
- WordPress plugin – specify cache settings for blog
Caching is Like Regex
- If you’re not careful, you’re going to have new problems
Application Architecture
- “Christmas tree” showing server, cache servers, CDN, load balancers, etc.
Problem
- Caching solves one problem–performance
- 3-sec rule: user leaves if something takes longer than 3 secs
- 40% of users leave
Another reason
- Buy some time
- e.g. before application crashes
- Cost-benefit analysis–caching costs vs downtime
Measure First
- User metrics
- Load times
- Site not crashing every 5 minutes
- S.M.A.R.T. Goals – specific, measurable, achievable, relevant, time-bound
How to Not Suck at Caching
- Caching is additive
- Can’t just throw caching onto server that’s already overloaded
- Don’t over-engineer
- Use the simplest solution that satisfies the requirements
- Measure, change, test, measure
Caching Doesn’t Help
- Oversaturation
- Network hardware
- Thread death–“virus”, creating multiple threads for each request
- Thundering herd
- If you get a large number of initial requests, many of them go all the way back to the server, since it takes a little time to cache the data
- Lack of information
- Bad decisions
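One common way to soften the thundering herd problem described above is to let only one request per key go back to the origin while the others wait for its result. The sketch below is a minimal illustration of that idea, not something shown in the talk; `SingleFlightCache` and `slow_origin` are made-up names, and a plain in-process dict stands in for a real cache.

```python
import threading
import time


class SingleFlightCache:
    """Cache where concurrent misses for one key trigger a single origin call."""

    def __init__(self, loader):
        self._loader = loader           # hypothetical function that hits the origin
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the cache while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = self._loader(key)
            with self._guard:
                self._values[key] = value
            return value


origin_calls = []


def slow_origin(key):
    origin_calls.append(key)
    time.sleep(0.05)  # simulate a slow page render
    return f"page for {key}"


cache = SingleFlightCache(slow_origin)
threads = [threading.Thread(target=cache.get, args=("/home",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(origin_calls))
```

Twenty simultaneous requests for the same page result in one call to the origin; the other nineteen wait briefly and then read the cached value.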
End Users
Browser Caching
- Setting response headers–server tells browser to cache a page
- Requires a hit to operating system
- Unique naming
- Rename an asset to force download of a long-cached object
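The unique naming idea is often automated by putting a hash of the file's content into its name, so the name changes whenever the content does. A minimal sketch, assuming a hypothetical helper `fingerprinted_name` and an 8-character SHA-256 prefix (both are illustrative choices, not from the talk):

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes) -> str:
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = fingerprinted_name("css/site.css", b"body { color: black; }")
new = fingerprinted_name("css/site.css", b"body { color: navy; }")
print(old)
print(new)
```

Because the two names differ, a browser holding the old file under a long cache lifetime will still fetch the new one as soon as the page references it.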
Set with Web Server
- E.g. set Cache-Control for certain pages
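In practice this usually lives in the web server's own configuration. As a language-neutral illustration, the sketch below expresses one possible policy as a Python function; the extension list and the lifetimes are assumptions chosen for the example, not recommendations from the talk.

```python
from pathlib import PurePosixPath

# Hypothetical policy: static assets are cached for a long time,
# HTML is always revalidated, everything else gets a short lifetime.
LONG_LIVED = {".css", ".js", ".png", ".jpg", ".woff2"}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value based on the file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in LONG_LIVED:
        return "public, max-age=31536000, immutable"
    if suffix in {".html", ""}:
        return "no-cache"  # may be stored, but must be revalidated before reuse
    return "public, max-age=300"


for p in ["/css/site.css", "/index.html", "/", "/data.json"]:
    print(p, "->", cache_control_for(p))
```

A long `max-age` on static assets only works well when combined with unique naming, since otherwise there is no way to push a changed file to browsers that already cached it.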
External/Edge Services
- CDN – Content Delivery Network
- Point your URI to 3rd party server
- They then pull files, as needed, from your server
- Most big sites on the web run on CDN
- Improve page load time
- Serve request from server that has shortest travel time to client
- High availability
- Cached data could still be present even when your origin server hiccups
- Can maybe buy CDN services at lower cost than scaling out actual servers
CDN Features
- Setting TTLs, other configs based on extension
- Region, language routing
- Mobile detection
- WAF/Security
- Applications can do double duty–firewall + CDN
- DDOS protection
CDN Providers
- Cloudflare
- Incapsula
- Cloud providers
- Akamai
- Fastly
Akamai
- Match file extensions
- Honor cache control of origin
- Can have configs for some stuff on your server
Measure and Test
Your Systems
- Don’t over-engineer–get strange results
- Spin up different instance if you need different purpose
Varnish
- Separate servers
- Both memory and disk paging
Web/App Layer
- Baked vs fried
- Building on demand vs preloading cache
- Cache gen’d on demand when request hits (fried)
- Preload cache ahead of time (baked)
- Disk vs memory
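The baked vs fried distinction can be shown with one small cache that supports both styles. This is a rough sketch with made-up names (`PageCache`, `bake`), using an in-memory dict and a trivial render function in place of a real CMS.

```python
class PageCache:
    """Page cache that can be filled on demand (fried) or ahead of time (baked)."""

    def __init__(self, render):
        self._render = render  # hypothetical expensive page-build function
        self._pages = {}
        self.renders = 0       # counts how often the page was actually built

    def get(self, path):
        # Fried: the page is built the first time a request asks for it.
        if path not in self._pages:
            self._pages[path] = self._build(path)
        return self._pages[path]

    def bake(self, paths):
        # Baked: pages are built before any request arrives.
        for path in paths:
            self._pages[path] = self._build(path)

    def _build(self, path):
        self.renders += 1
        return self._render(path)


cache = PageCache(lambda path: f"<html>{path}</html>")
cache.bake(["/", "/about"])    # two renders up front
home = cache.get("/")          # served from the baked copy, no new render
blog = cache.get("/blog")      # not baked, so it is fried on this request
print(cache.renders)
```

Baking moves the cost to deploy or publish time, while frying means the first visitor to each page pays for the render.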
Products
- Adobe Experience Manager
- Sitecore
- Prefetch, data, item, html caches
- Drupal
- Performance caching (SQL queries stored)
Output Caching
- IIS – generate on request
App/Data Layer
- Reducing database calls
- Reducing API calls
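A common pattern for reducing database or API calls is to check a cache first and only call the backend on a miss, with entries expiring after a set time. The sketch below assumes an in-process dict as the store and hypothetical names (`TTLCache`, `query_user`); in a real deployment a tool such as Memcached or Redis would typically hold the entries instead.

```python
import time


class TTLCache:
    """Look-aside cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so the example can control time
        self._entries = {}    # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # hit: no backend call
        value = loader(key)                  # miss or expired: go to the backend
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)


db_calls = []


def query_user(user_id):
    # Hypothetical stand-in for a real database query.
    db_calls.append(user_id)
    return {"id": user_id, "name": "example"}


now = [0.0]  # fake clock for the demonstration
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.get_or_load(42, query_user)  # miss: hits the "database"
cache.get_or_load(42, query_user)  # hit: served from the cache
now[0] = 61.0
cache.get_or_load(42, query_user)  # entry expired: hits the "database" again
print(len(db_calls))
```

Three lookups cost two backend calls here; the TTL is the simplest form of invalidation, and `invalidate` covers the case where the data is known to have changed sooner.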
Ehcache
- Java based applications
- Memory or disk
Memcached
- In memory, key/value store
- Reduce database load
- Not synchronized
- Scalable separately
Let’s Not Fight
- Not worth arguing about which is better
- Memcached, Redis, Mongo
- All are good
Database
- Redis
- Firebase
- Stored procedures – reduce network bandwidth by calling short sproc rather than sending long query
Challenges
- Invalidating stale data
Caching Procedures
- Only two things you’ll really need to do with caching
- Populating
- Invalidating
Debugging, need to remind people to clear their cache
- [Sean] But should users have to worry about this? If you need to have them clear cache, then something’s not working properly
Populating Caches
- Pull based – cache tool pulls from your server
- Push based – your code has to push out data to the cache
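The difference between the two population styles can be sketched with toy classes. All names here (`Origin`, `PullCache`, `PushCache`) are invented for illustration, and plain dicts stand in for real servers and cache tools.

```python
class Origin:
    """Toy origin server; pushes each write to a cache if one is attached."""

    def __init__(self, cache=None):
        self._data = {}
        self._cache = cache

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value
        if self._cache is not None:
            self._cache.put(key, value)  # push based: origin code updates the cache


class PullCache:
    """Pull based: the cache fetches from the origin when it misses."""

    def __init__(self, origin):
        self._origin = origin
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._origin.read(key)
        return self._store[key]


class PushCache:
    """Push based: the cache only holds what the origin has sent it."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


origin = Origin()
pull = PullCache(origin)
origin.write("a", 1)
first_pull = pull.get("a")   # fetched from the origin on the first miss
origin.write("a", 2)
second_pull = pull.get("a")  # still the old value until it is invalidated

push = PushCache()
origin2 = Origin(cache=push)
origin2.write("a", 1)
origin2.write("a", 2)
pushed = push.get("a")       # reflects the latest write
print(first_pull, second_pull, pushed)
```

The pull cache needs no changes to the origin code but can serve stale data until something invalidates it; the push cache stays current but ties the application code to the cache.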
Cache clearing
- Staggered approaches
- Clearing individual items
- Using APIs and automation
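One reading of the staggered approach is to clear keys in small batches with a pause in between, so the origin does not have to rebuild everything at once. The sketch below assumes a dict-like cache and a hypothetical helper `clear_in_batches`; the batch size and pause are arbitrary example values.

```python
import time


def clear_in_batches(cache, keys, batch_size, pause_seconds, sleep=time.sleep):
    """Remove keys from a dict-like cache a few at a time, pausing between batches."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        for key in keys[start:start + batch_size]:
            cache.pop(key, None)      # clearing individual items
        if start + batch_size < len(keys):
            sleep(pause_seconds)      # give the origin time to repopulate


cache = {f"/page/{i}": "html" for i in range(5)}
pauses = []  # record pauses instead of really sleeping in this demonstration
clear_in_batches(cache, list(cache), batch_size=2, pause_seconds=0.5,
                 sleep=pauses.append)
print(len(cache), pauses)
```

With a real cache product, the `cache.pop` call would be replaced by that product's purge API, which is where the automation mentioned above comes in.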