Frank DENIS random thoughts.

A transparent, client-side fragment cache.

Web sites share a great deal of markup between pages. Headers, footers, navigation bars, ads and search boxes are sent over and over again as part of every page.

A typical content-oriented web site like this one (Hi, Karen!) is actually made of very similar pages. Nothing but the central zone ever changes between recipes.

This wastes a lot of bandwidth, and the user experience suffers for it.

Using AJAX to refresh nothing but the required parts has proven to be very efficient, especially when coupled with proper HTTP caching.

However, converting a web site where everything is sent in a single shot into one where different parts are loaded asynchronously is not trivial work. It would probably require an almost complete rewrite.

Here’s another approach that might turn any web site designed the old way into a speed demon. Best of all, it requires almost no change to the existing code.

The idea is to provide hints about fragments in the HTML code itself. Simple hints like where a fragment starts and where it ends.

Let’s start with some basic HTML code:

<!doctype html>
<html>
<head>
  <title>Transparent client-side fragment cache demo</title>
</head>
<body>
  <nav>
    Navigation bar
  </nav>
  <div id="ads">
    Ads
  </div>
  <section id="hey">
    I ain't no <a href="/">fragment</a>!
  </section>
  <script src="disco.js"></script>
</body>
</html>

A simple web page, sent in a single shot. The navigation bar, the ads and the “hey” section are different parts that would actually require different caching strategies. It’s likely that the “hey” section changes on every page while the others remain the same.

So let’s provide some hints about fragments. Custom tags would violate HTML5 so let’s just add some comments:

<!doctype html>
<html>
<head>
  <title>Transparent client-side fragment cache demo</title>
</head>
<body>
<!--fragment navbar-->
  <nav>
    Navigation bar
  </nav>
<!--/fragment-->
<!--fragment ads-->
  <div id="ads">
    Ads
  </div>
<!--/fragment-->
  <section id="hey">
    I ain't no <a href="/">fragment</a>!
  </section>
  <script src="disco.js"></script>
</body>
</html>

Now, here comes the trick.

Server-side, we’re going to slightly alter these comments before sending them to clients in order to include a signature of the content.

Thus, the previous code is going to be sent as something like:

<!doctype html>
<html>
<head>
  <title>Transparent client-side fragment cache demo</title>
</head>
<body>
<!--fragment navbar-290d996311209f1897516b2caa2cc611-->
  <nav>
    Navigation bar
  </nav>
<!--/fragment-->
<!--fragment ads-bd779001f9cad4bfb74e563eb6bbf5c5-->
  <div id="ads">
    Ads
  </div>
<!--/fragment-->
  <section id="hey">
    I ain't no <a href="/">fragment</a>!
  </section>
  <script src="disco.js"></script>
</body>
</html>

In this example, a signature is just an MD5 digest of the fragment’s content, appended to the fragment name in the opening comment.
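
A minimal sketch of this signing step in JavaScript (Node.js), for illustration only. The helper name `signFragments` is hypothetical. It assumes fragments are delimited exactly as in the example above and are not nested.

```javascript
const crypto = require("crypto");

// Matches <!--fragment NAME-->CONTENT<!--/fragment-->, capturing NAME and CONTENT.
const FRAGMENT_RX = /<!--\s*fragment\s+(.+?)\s*-->([\s\S]+?)<!--\s*\/fragment\s*-->/gi;

// Hypothetical helper: append the MD5 digest of each fragment's content to its name.
function signFragments(html) {
  return html.replace(FRAGMENT_RX, (match, name, content) => {
    const digest = crypto.createHash("md5").update(content).digest("hex");
    return `<!--fragment ${name}-${digest}-->${content}<!--/fragment-->`;
  });
}

const unsignedPage = "<!--fragment navbar-->\n  <nav>Navigation bar</nav>\n<!--/fragment-->";
console.log(signFragments(unsignedPage));
```

Because the digest is computed from the content itself, editing the navigation bar changes the name+digest pair without any extra bookkeeping.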

Once the client has received the data, it parses it with some basic JavaScript code in order to retrieve two things about every fragment: its name+digest and its inner content.

Thanks to HTML5 localStorage, this data can be made persistent. It’s as easy as storing the inner content as the value of the name+digest key.

We also keep track of every name+digest we have found and stored. Related web pages can then reuse this content without downloading it.
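
The parsing and storing step might look roughly like the sketch below. The helper name `storeFragments` is hypothetical, and a `Map` stands in for `localStorage` so that the code also runs outside a browser.

```javascript
// Stand-in for window.localStorage so the sketch also runs outside a browser (assumption).
const fragmentStore = new Map();

// Matches signed fragments: <!--fragment NAME-DIGEST-->CONTENT<!--/fragment-->.
const SIGNED_RX = /<!--\s*fragment\s+(.+?)\s*-->([\s\S]+?)<!--\s*\/fragment\s*-->/g;

// Hypothetical helper: store each fragment's inner content under its name+digest key
// and return the list of keys that were found.
function storeFragments(html, storage) {
  const keys = [];
  for (const [, key, content] of html.matchAll(SIGNED_RX)) {
    storage.set(key, content);
    keys.push(key);
  }
  return keys;
}

const receivedPage =
  "<!--fragment navbar-290d996311209f1897516b2caa2cc611--><nav>Navigation bar</nav><!--/fragment-->" +
  "<!--fragment ads-bd779001f9cad4bfb74e563eb6bbf5c5--><div id=\"ads\">Ads</div><!--/fragment-->";
console.log(storeFragments(receivedPage, fragmentStore));
```

In a browser, `storage.set(key, content)` would become `localStorage.setItem(key, content)`.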

Without downloading it? But how would the server know? Easy. Some JavaScript code can rewrite every link in order to add the list of client-cached fragments.

The

  <section id="hey">
    I ain't no <a href="/">fragment</a>!
  </section>

part gets dynamically rewritten as:

  <section id="hey">
    I ain't no <a href="/?_fragments=navbar-290d996311209f1897516b2caa2cc611+ads-bd779001f9cad4bfb74e563eb6bbf5c5">fragment</a>!
  </section>
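
One way to sketch this rewriting, using the standard `URL` class rather than string concatenation. The helper name `addFragmentsParam` and the `http://example.com` base are assumptions made for the example.

```javascript
// Hypothetical helper: add the list of cached fragment keys to a link target.
function addFragmentsParam(href, keys, base) {
  const url = new URL(href, base);
  // URLSearchParams serializes spaces as "+", which gives the "+"-separated list shown above.
  url.searchParams.set("_fragments", keys.join(" "));
  return url.toString();
}

const cachedKeys = [
  "navbar-290d996311209f1897516b2caa2cc611",
  "ads-bd779001f9cad4bfb74e563eb6bbf5c5",
];
console.log(addFragmentsParam("/", cachedKeys, "http://example.com"));
```

Going through `URL` keeps an existing query string and any `#anchor` intact, which plain concatenation does not guarantee.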

With this information, the server can easily spot the fragments that the client already has in its local cache. Instead of sending the actual content, we’re just going to send references.
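
The server-side decision can be sketched as follows, assuming the page has already been signed and that the `_fragments` value arrives URL-decoded, so the `+` separators have become spaces. The helper names and the shortened digests (`aaa`, `bbb`) are placeholders for illustration.

```javascript
// Matches signed fragments: <!--fragment NAME-DIGEST-->CONTENT<!--/fragment-->.
const SIGNED_RX = /<!--\s*fragment\s+(.+?)\s*-->([\s\S]+?)<!--\s*\/fragment\s*-->/g;

// Hypothetical helper: turn the decoded "_fragments" value into a set of name+digest keys.
function parseFragmentsParam(value) {
  return new Set((value || "").split(/\s+/).filter(Boolean));
}

// Hypothetical helper: replace fragments the client already holds with a short reference.
function elideCached(signedHtml, clientKeys) {
  return signedHtml.replace(SIGNED_RX, (fragment, key) =>
    clientKeys.has(key) ? `<!--cached ${key}-->` : fragment
  );
}

const signedPage =
  "<!--fragment navbar-aaa--><nav>Nav</nav><!--/fragment-->" +
  "<!--fragment ads-bbb--><div>Ads</div><!--/fragment-->";
// The client holds the current navbar but a stale version of the ads.
console.log(elideCached(signedPage, parseFragmentsParam("navbar-aaa ads-stale")));
```

A stale digest simply fails to match, so the fresh fragment is sent in full.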

Following the previous link displays the very same page, but the code is slightly different:

<!doctype html>
<html>
<head>
  <title>Transparent client-side fragment cache demo</title>
</head>
<body>
<!--cached navbar-290d996311209f1897516b2caa2cc611-->
<!--cached ads-bd779001f9cad4bfb74e563eb6bbf5c5-->
  <section id="hey">
    I ain't no <a href="/">fragment</a>!
  </section>
  <script src="disco.js"></script>
</body>
</html>

Indeed, in addition to the fragment delimiters, the client can also parse insertion points and replace them with the actual content from its cache.
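
A rough sketch of that replacement step, again with a `Map` standing in for `localStorage`. The helper name `expandCached` and the shortened digests are placeholders. Reporting the keys that could not be resolved is an addition made here for illustration.

```javascript
// Matches insertion points: <!--cached NAME-DIGEST-->.
const CACHED_RX = /<!--\s*cached\s+(.+?)\s*-->/g;

// Hypothetical helper: swap insertion points for stored content and
// report the keys that were not found in the local store.
function expandCached(html, storage) {
  const missing = [];
  const expanded = html.replace(CACHED_RX, (placeholder, key) => {
    if (storage.has(key)) return storage.get(key);
    missing.push(key);
    return placeholder; // leave the insertion point untouched
  });
  return { html: expanded, missing };
}

const localFragments = new Map([["navbar-aaa", "<nav>Nav</nav>"]]);
const shortPage = "<body><!--cached navbar-aaa--><!--cached ads-bbb--><section>Hey</section></body>";
console.log(expandCached(shortPage, localFragments));
```

A non-empty `missing` list signals that the page should be requested again without those keys.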

The advantages of this technique are huge:

  • It is trivial to implement

  • It can easily be added to virtually any existing web site

  • There’s no need to keep track of fragment ages. Should the content change, the signatures change too, and browsers automatically get the correct versions

  • Unlike AJAXified pages that require duplicate efforts in order to be SEO friendly, this technique is totally SEO friendly by default.

Here’s a proof-of-concept implementation.

Server-side

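require 'sinatra'
require 'digest/md5'

# Matches <!--fragment NAME-->CONTENT<!--/fragment-->, across lines.
FRAGMENT_RX = %r{<!--\s*fragment\s+(.+?)\s*-->(.+?)<!--\s*/fragment\s*-->}mi

get '/' do
  # "_fragments" holds name-digest pairs; the "+" separators arrive decoded as spaces.
  # Split each pair on its last hyphen so that fragment names may contain hyphens.
  _fragments = params[:_fragments] || ''
  fragments = _fragments.split.map { |t| t.rpartition('-').values_at(0, 2) }.to_h
  page = File.read(File.join('public', 'disco.html'))
  # Use gsub rather than gsub!: gsub! returns nil when nothing matches,
  # which would produce an empty response for pages without fragments.
  page.gsub(FRAGMENT_RX) do
    fragment_name, fragment_content = $1, $2
    digest = Digest::MD5.hexdigest(fragment_content)
    if digest == fragments[fragment_name]
      "<!--cached #{fragment_name}-#{digest}-->"
    else
      "<!--fragment #{fragment_name}-#{digest}-->#{fragment_content}<!--/fragment-->"
    end
  end
end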

Client-side

(function() {
    var body_html = document.body.innerHTML;
    // [^] matches any character, newlines included.
    var fragment_rx = new RegExp(
        "<!--\\s*fragment\\s+(.+?)\\s*-->([^]+?)<!--\\s*/fragment\\s*-->", "g");
    var cache_rx = new RegExp("<!--cached (.+?)-->", "g");
    var need_update = false;
    var known_fragments_names = { };
    // Replace insertion points with the content held in the local cache.
    body_html = body_html.replace(cache_rx, function(str, fragment_name) {
        var cached_content = window.localStorage.getItem(fragment_name);
        if (cached_content !== null) {
            // Only advertise fragments that really are in the cache.
            known_fragments_names[fragment_name] = true;
            need_update = true;
            return cached_content;
        }
        return str;
    });
    // Store every fragment received in full under its name+digest.
    var match;
    while ((match = fragment_rx.exec(body_html))) {
        var fragment_name = match[1];
        window.localStorage.setItem(fragment_name, match[2]);
        known_fragments_names[fragment_name] = true;
    }
    if (need_update) {
        document.body.innerHTML = body_html;
    }
    // Build the "+"-separated list of cached fragments.
    var fragments_qstr = "";
    for (var known_name in known_fragments_names) {
        // escape() is deprecated; encodeURIComponent() is the standard replacement.
        fragments_qstr += (fragments_qstr ? "+" : "") +
            encodeURIComponent(known_name);
    }
    // Append the list to every link on the page.
    var link_nodes = document.getElementsByTagName("a");
    for (var i = 0, j = link_nodes.length; i < j; i++) {
        var href = link_nodes[i].href;
        href += (href.indexOf("?") == -1 ? "?" : "&") +
            "_fragments=" + fragments_qstr;
        link_nodes[i].href = href;
    }
})();

Of course there’s still room for improvement. Cached entries should expire. Pages with partials missing from the cache should be reloaded. localStorage is sweet, but a fallback is required for obsolete browsers.
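
As one possible direction for expiry, each entry could carry the time it was stored and be purged once it gets too old. This is only a sketch: the helper names, the JSON layout and the seven-day limit are all assumptions, and a `Map` stands in for `localStorage`.

```javascript
// Stand-in for window.localStorage (assumption). Values are JSON strings,
// since localStorage itself only holds strings.
const cache = new Map();

// Hypothetical helper: store a fragment together with the time it was stored.
function putFragment(storage, key, content, now) {
  storage.set(key, JSON.stringify({ content, storedAt: now }));
}

// Hypothetical helper: drop entries older than maxAgeMs and return their keys.
function purgeExpired(storage, now, maxAgeMs) {
  const removed = [];
  // Copy the entries first so that deleting while looping is unambiguous.
  for (const [key, raw] of [...storage]) {
    if (now - JSON.parse(raw).storedAt > maxAgeMs) {
      storage.delete(key);
      removed.push(key);
    }
  }
  return removed;
}

const DAY_MS = 24 * 60 * 60 * 1000;
putFragment(cache, "navbar-aaa", "<nav>Nav</nav>", 0);
putFragment(cache, "ads-bbb", "<div>Ads</div>", 6 * DAY_MS);
// At day 8, navbar-aaa is 8 days old and ads-bbb is 2 days old.
console.log(purgeExpired(cache, 8 * DAY_MS, 7 * DAY_MS));
```

Purged keys must also be left out of the `_fragments` list, so that the server sends those fragments in full again.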

But this technique seems to work decently well and may effortlessly give a serious boost to aging web sites.