Skyrock Spot is an iPhone app whose UI mixes native components and web views.
And something was definitely wrong with the latter. Browsing web pages was dog slow, although they all share the same CSS and images.
How come? Nginx was properly configured to send caching headers with static content. And scripts producing dynamic content were correctly sending caching headers as well. To top it off, on the development platform, Webkit’s web inspector and other tools didn’t find any significant issue with cacheability.
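For reference, the static-content side of that setup looked roughly like this. A minimal sketch only: the extensions, lifetime and header values here are illustrative, not the actual production config.

```nginx
# Illustrative only: send far-future caching headers with static assets
location ~* \.(css|js|png|jpg|gif)$ {
    expires 30d;
    add_header Cache-Control "public";
}
```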
But on the production platform, it was a mess.
For some reason, elements were sometimes served with a 304 code, and on a subsequent request, like one second later, they were not, even though the expiration date was far in the future. And the pattern looked random.
It turned out there were two root causes.
Anyone with a clue knows that deploying code with something like “svn update” plenty sucks and can have loads of pesky side effects. But the fact that it can annihilate HTTP caching, hence ruin performance, is not a widely documented one.
The static files of the Spot application are served by two hosts. After a new release, the “svn update” command is run on both. Consistency apart (something trivial to fix with a symbolic link), it might sound acceptable.
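The symbolic-link fix mentioned above is the classic release-switch pattern: each revision lives in its own directory, and going live is a single symlink swap that is identical on every host. A sketch, with made-up paths:

```shell
# Each release goes into its own directory; "current" is just a symlink.
# Directory names are illustrative.
mkdir -p /tmp/releases/r41 /tmp/releases/r42

# Going live is one symlink swap, run the same way on every host:
ln -sfn /tmp/releases/r42 /tmp/current

readlink /tmp/current
```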
But there’s a catch. While “svn update” effectively deploys the same data on every host it is run on, it doesn’t keep the metadata. In particular, the modification times can differ. And they did.
Here’s how cacheable elements ended up being served with a 200 code:
1st request:
- Client, to Load balancer: “hey, gimme /main.css”
- Load balancer, to Host A: “hey, gimme /main.css”
- Host A: “Here it is. One last thing: Last-Modified at 13:29:12”
2nd request:
- Client, to Load balancer: “hey, gimme /main.css If-Modified-Since 13:29:12”
- Load balancer, to Host B: “hey, gimme /main.css If-Modified-Since 13:29:12”
- Host B: “Here’s the complete thing, since the local file’s modification date is 13:30:23 here”
A simple “touch” command with the same reference date on both hosts immediately solved the caching issues.
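The one-liner behind that fix might look like the following. The paths and the reference timestamp are illustrative; the point is that the same timestamp is applied on every host, so every host reports the same Last-Modified.

```shell
# Simulate two hosts whose deployed trees have different mtimes
# for the same content (directory names are illustrative).
mkdir -p /tmp/hostA /tmp/hostB
echo 'body{}' > /tmp/hostA/main.css
sleep 1
echo 'body{}' > /tmp/hostB/main.css   # same content, later mtime

# The fix: stamp every static file with the same reference date,
# run identically on both hosts.
find /tmp/hostA /tmp/hostB -type f -exec touch -t 202401011200.00 {} +
```

After this, both copies carry the same modification time, so whichever host answers a conditional request, the If-Modified-Since comparison gives the same result.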
Lessons learned:
- Deploying code means that data AND METADATA should be identical on every host you deploy to.
- Configure your load balancer for stickiness, both for static and for dynamic content. If, for some reason, an application server has its clock one second off, the same caching mess is bound to arise with dynamic content, just as it does with static content.
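With nginx acting as the balancer, for instance, stickiness can be approximated by hashing on the client address, so the same client keeps hitting the same backend. A sketch; the upstream name and hostnames are made up:

```nginx
# Illustrative only: pin each client address to one backend
upstream spot_backend {
    ip_hash;
    server hostA.example.com;
    server hostB.example.com;
}
```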