#heweb10 - Providing More and Using Less with Caching

TPR / 3PM / Monday
Jason Fish @jasondfish, Purdue University

With 800 visitors on your site at the same moment, no caching means 800 database calls and 800 sets of returned data.

Reduce this to just ONE.

General Caching Rules

  • data used more than once
  • data not specific to a user
  • don’t overdo it
  • support failover
    • use cheaper, less reliable machines
  • Test, Test, Test

Technology Stack

Jason’s team is using ASP.NET MVC 2.0 and SQL Server - the code examples in the talk were for that setup

Output Caching (page caching)

  • page output
  • not user specific pages:
    • landing page
    • faq pages
  • partial page caching
    • menu
    • user profile
  • .Net Output Caching
    • Duration - time in cache
      • Required
      • Specified in seconds
      • not guaranteed to be in cache for that period of time
    • VaryByParam
      • required (can be set to none)
      • creates a different cached version for each value of parameter
      • specify multiple parameters as a semicolon-separated list (id;category)
      • results in many more items in the cache
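
To make Duration and VaryByParam concrete, here is a minimal sketch of the idea in Python rather than the ASP.NET setup from the talk. The `output_cache` decorator, the `product_page` view and the `render_count` counter are all hypothetical names made up for illustration; this is not the .NET API, just the same two knobs in a few lines.

```python
import time
from functools import wraps


def output_cache(duration, vary_by_param=()):
    """Cache a view's rendered output for `duration` seconds.

    One cached copy is kept per distinct combination of the parameters
    named in `vary_by_param` (an empty tuple plays the role of "none").
    """
    def decorator(view):
        store = {}  # cache key -> (expires_at, rendered output)

        @wraps(view)
        def wrapper(**params):
            key = tuple((name, params.get(name)) for name in vary_by_param)
            entry = store.get(key)
            now = time.monotonic()
            if entry is not None and entry[0] > now:
                return entry[1]          # cache hit: skip rendering
            output = view(**params)      # cache miss: render and store
            store[key] = (now + duration, output)
            return output

        wrapper.cache = store  # exposed for inspection only
        return wrapper
    return decorator


render_count = 0


@output_cache(duration=60, vary_by_param=("id", "category"))
def product_page(id, category):
    """Hypothetical page that is expensive to render."""
    global render_count
    render_count += 1
    return f"<h1>Product {id} in {category}</h1>"


product_page(id=1, category="books")
product_page(id=1, category="books")  # served from the cache
product_page(id=2, category="books")  # new parameter value, new cache entry
```

After these three calls the page has been rendered twice and the cache holds two entries, which is the "many more items in the cache" trade-off noted above.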

Data Caching

  • most performance bottlenecks under heavy load are related to the database
  • cache query results
  • good candidates:
    • most used
    • expensive (queries that take the most time in the database)
  • parameters — key, value and expiration date
    • Key - Where are you getting it from?
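
A rough sketch of caching a query result with a key, a value and an expiration, again in Python rather than the stack used in the talk. `DataCache`, `query_top_posts` and the `"top_posts"` key are hypothetical names for illustration, and the "database" is just a counter so the effect is easy to see.

```python
import time


class DataCache:
    """Tiny in-memory cache: each entry has a key, a value and an expiration."""

    def __init__(self):
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= time.monotonic():
            del self._items[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (time.monotonic() + ttl_seconds, value)


db_calls = 0


def query_top_posts():
    """Hypothetical stand-in for an expensive database query."""
    global db_calls
    db_calls += 1
    return ["post-1", "post-2", "post-3"]


cache = DataCache()


def get_top_posts():
    posts = cache.get("top_posts")
    if posts is None:  # miss: go to the database once
        posts = query_top_posts()
        cache.set("top_posts", posts, ttl_seconds=30)
    return posts


for _ in range(800):  # the 800 visitors from the opening example
    get_top_posts()

print(db_calls)  # 1
```

The sketch treats `None` as "not in the cache", so it cannot cache a query whose legitimate result is `None`; a real implementation would want a separate sentinel.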

Cache Removal

  • must be put everywhere that data can be changed
  • testing is critical
  • this is the hardest part to get right
  • test, test, test …
  • Did he mention testing??
  • Two methods:
    • Push Methodology
      • invalidate -> immediately repopulate
      • reduces time for first hit
      • cache contains more data than needed
    • Pull Methodology
      • invalidate cache on update and leave empty
      • longer first request
      • hottest items stay in the cache
      • items not in cache until accessed
  • Choose your method based on your specific application.
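
The difference between the two methods can be sketched in a few lines of Python. Everything here (`database`, `cache`, `update_pull`, `update_push`) is a made-up stand-in for illustration, not code from the talk.

```python
database = {"motd": "Welcome"}  # hypothetical backing store
cache = {}
db_reads = 0


def load_from_db(key):
    global db_reads
    db_reads += 1
    return database[key]


def read(key):
    """Read through the cache, filling it on a miss."""
    if key not in cache:
        cache[key] = load_from_db(key)
    return cache[key]


def update_pull(key, value):
    """Pull: write, then drop the entry; the next reader repopulates it."""
    database[key] = value
    cache.pop(key, None)


def update_push(key, value):
    """Push: write, invalidate, then repopulate the entry immediately."""
    database[key] = value
    cache.pop(key, None)
    cache[key] = load_from_db(key)


read("motd")                        # first read fills the cache
update_pull("motd", "Pull update")
pull_cached = "motd" in cache       # False: the cache is left empty
read("motd")                        # this reader pays for the database trip

update_push("motd", "Push update")
push_cached = "motd" in cache       # True: the entry is already warm
```

Either way, the update functions are the "everywhere that data can be changed" part: any write path that skips them leaves stale data in the cache.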

Web Cluster Problem

  • Cache is saved on each node
  • Twice as much space is needed
  • Validity of data is unknown (this should not stop you!)
  • MEMCACHED (https://memcached.org/about)
    • Free! We all have no budgets … so this is awesome.
    • Open source
    • Distributed
    • Libraries available for pretty much every language (PHP, Java, Ruby, .NET, etc)
    • Takes individual webservers (Node A and Node B) — and makes the cache pool singular
    • Twitter, YouTube, Wikipedia, Digg, WordPress, Flickr all use memcached!
  • Removal — takes the key and deletes it
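
To illustrate why a single pool helps, here is a small Python simulation. It is not a memcached client: `PoolClient` and the `cache-1`/`cache-2` server names are hypothetical, plain dicts stand in for the cache servers, and the simple modulo hashing is an assumption for brevity (real clients often use consistent hashing).

```python
import zlib

# Plain dicts stand in for the memory of two cache servers in the pool.
cache_servers = {"cache-1": {}, "cache-2": {}}


class PoolClient:
    """What each web node would run: maps a key to one server in the pool."""

    def __init__(self, servers):
        self._servers = servers
        self._names = sorted(servers)

    def _server_for(self, key):
        # Every client hashes the same key to the same server.
        index = zlib.crc32(key.encode("utf-8")) % len(self._names)
        return self._servers[self._names[index]]

    def get(self, key):
        return self._server_for(key).get(key)

    def set(self, key, value):
        self._server_for(key)[key] = value

    def delete(self, key):
        self._server_for(key).pop(key, None)


# Node A and Node B are configured with the same server list.
node_a = PoolClient(cache_servers)
node_b = PoolClient(cache_servers)

node_a.set("question:42", "cached by node A")
seen_by_b = node_b.get("question:42")  # node B sees node A's entry

# The value is stored once in the pool, not once per web node.
copies_stored = sum("question:42" in s for s in cache_servers.values())

node_b.delete("question:42")           # removal: take the key and delete it
seen_by_a_after_delete = node_a.get("question:42")  # None on every node
```

Because both nodes agree on where a key lives, a delete issued from one node is immediately visible to the other, which addresses both the duplicated space and the unknown validity listed above.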

Purdue is using memcached on HOTSEAT - backchannel discussion for large classes (purdue.edu/studio/hotseat)

  • Page is autorefreshed every 5 seconds
  • Without caching — approximately 300,000 database requests for a one hour class (with average of 300 students in class, page refreshing every 5 sec)
  • With caching — brought that to under 9000 requests
  • reducing traffic by 97%
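
A quick check of the 97% figure, using only the two request counts quoted above (the variable names are just for illustration):

```python
requests_without_cache = 300_000  # approximate figure quoted in the talk
requests_with_cache = 9_000       # upper bound quoted in the talk

reduction = 1 - requests_with_cache / requests_without_cache
print(f"{reduction:.0%}")  # 97%
```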

Loadstorm.com

Mixable — implemented caching

  • 3 times more requests but …
  • 50% FEWER database queries
  • 0 timeout errors
  • page load time down from 2.24 seconds to 0.35 seconds

Silver Bullet??

Will this magically solve all our problems?

  • Cache validation
  • Writing vs. reading — if heavy on writing, then caching isn’t going to fix your issues
  • dependencies - making sure things play nicely together
  • query and logic optimization is still needed; caching doesn’t fix a bad application. It makes a GOOD application BETTER.

Takeaways

  • Performance / Scalability / Cost
  • Easy to Implement
  • results are life changing
    • don’t wait until an application desperately needs caching — make this part of your development cycle
  • intelligent implementation is needed
    • don’t put it everywhere … put it only where it’s needed

This post was written by: Lacy Tite

Lacy is a web developer for Vanderbilt University (Go 'Dores!) in the University Web Communications office, which is responsible for the Vanderbilt homepage and all top level pages, as well as providing development, design, content management, and communication strategy assistance to the entire Vanderbilt community.