Enterprise Caching Summary

Background

Caching is a very important topic in enterprise-level multi-tier applications, especially web applications. A general rule for applying caching is that you should consider doing the necessary caching at each tier of your application if possible. The other basic rule is that the closer the cache is to the user, the cheaper it is to implement. For example, implementing browser-side caching is much cheaper than caching at the server side.

For a B/S application, from the client to the server side, the possible tiers at which to implement caching include:

  • Browser: local sandbox cache, memory cache
  • Network Router: CDN
  • Web Server: HTTP output cache, memory cache, distributed cache
  • Application Server: memory cache, distributed cache
  • Database Server: memory cache, file system cache

The Cache-Control HTTP header standardizes the cache policy from the browser, through network routers, to the Web server. Different Web browsers implement their internal caching according to the standard Web protocols.
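For example, in ASP.NET server code you can set this header on a response so that the browser and any intermediate proxies or CDN nodes may cache it. A minimal sketch (the 10-minute lifetime is an arbitrary choice for illustration):

    // Emits "Cache-Control: public, max-age=600", telling the browser
    // and intermediate caches they may keep this response for 10 minutes.
    Response.Cache.SetCacheability(HttpCacheability.Public);
    Response.Cache.SetMaxAge(TimeSpan.FromMinutes(10));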

A CDN helps us cache and distribute content through the network routers between browsers and Web servers automatically.

Modern database systems can manage the caching on the database server well enough by themselves.

So the only places that need additional caching implementation are in our application code; for a B/S application this may include:

  • Browser side code (scripts, Flash/Silverlight, Java Applet/MS ActiveX/Smart Client, etc)
  • Application code
  • Database query code

I don’t want to talk much about caching in browser-side code or database query code: browser-side caching is mostly a question of how much memory you want the browser to occupy on the client machine, and there are no “thread synchronization” problems to consider; database-side caching has its own guidelines according to the database query language standards and the implementation of each database system.

So what is left, and worth discussing further, is caching in the main application code. For a .NET C# application, this means the C# code that is compiled into executables and deployed onto the servers of the different tiers. To keep the discussion simple, the sections below focus on .NET C# applications only.

Caching in Application Code

1. Output Cache

The term “output cache” seems to have been coined by Microsoft together with ASP.NET, but its principle is simple. It caches the entire HTTP response message under a key that consists of a specified set of fields in the HTTP request. With output cache, an HTTP request can get its response message without further execution if the key exists in the output cache's active key list, so the performance of the Web server can be improved a lot. Better still, since IIS 6, page-level output cache can be handled by IIS itself instead of by the ASP.NET ISAPI extension; therefore, the performance is even better than before.
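In ASP.NET Web Forms, the classic way to enable it is the page directive <%@ OutputCache Duration="60" VaryByParam="id" %>. The snippet below is the rough programmatic equivalent, written inside a page or handler; the 60-second duration and the "id" parameter are arbitrary choices for illustration:

    // Cache this page's output on the server for 60 seconds,
    // keeping one cached copy per distinct value of the "id" parameter.
    Response.Cache.SetCacheability(HttpCacheability.Server);
    Response.Cache.SetExpires(DateTime.Now.AddSeconds(60));
    Response.Cache.SetValidUntilExpires(true);
    Response.Cache.VaryByParams["id"] = true;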

So, as a general rule, for any request whose response can be reused to some extent, you should consider using output cache. But before using it, according to the “the closer, the cheaper” rule, you should first consider applying caching at places “closer” to the user than the Web server. For example, in practice, if possible, pay for a CDN to extend the benefit of output cache beyond your Web server to third-party CDN servers.

2. ASP.NET Cache Class and EntLib Caching Application Block

Behind the output cache is the caching of application data. ASP.NET provides the Cache class, and similarly, EntLib provides the Caching Application Block. Each of them implements not only a thread-safe hash table but also common cache expiration policies. So far, however, there are still no built-in cache replacement algorithms, such as FIFO and LRU, which most real enterprise applications require. That's why I implemented a set of cache classes with LRU algorithms in NIntegrate. Also, both of these Cache classes are still in-process caching only, which means that in a load-balanced farm, the application data held in Cache instances may exist duplicated on each server in the farm.
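The typical usage is the cache-aside pattern. Below is a minimal sketch against the built-in ASP.NET Cache; the "Products" key, the 10-minute absolute expiration, and LoadProductsFromDatabase are hypothetical placeholders:

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class ProductCache
    {
        // Cache-aside: return from cache if present; otherwise load
        // from the database and cache the result for 10 minutes.
        public static object GetProducts()
        {
            object products = HttpRuntime.Cache["Products"];
            if (products == null)
            {
                products = LoadProductsFromDatabase(); // the expensive path
                HttpRuntime.Cache.Insert(
                    "Products",
                    products,
                    null,                           // no cache dependency
                    DateTime.UtcNow.AddMinutes(10), // absolute expiration
                    Cache.NoSlidingExpiration);
            }
            return products;
        }

        private static object LoadProductsFromDatabase()
        {
            // ... run the real database query here ...
            return new object();
        }
    }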

3. Caching Service & Distributed Hash Table

If you want to share cached data among processes, or even among servers, you need a shared caching service running in a separate process, or a distributed hash table.

A caching service is a service, running in a separate process, that wraps a hash table and stores data shared among different processes on the same server. For example, I could implement it as a WCF service over the named pipe binding, exposing methods for operating an internal hash table that stores application data shared by three ASP.NET applications deployed on the same Web server.
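Such a service might look like the sketch below; the ICacheService contract, its Put/Get/Remove operations, and the net.pipe endpoint address are all made up for illustration:

    using System;
    using System.Collections;
    using System.ServiceModel;

    [ServiceContract]
    public interface ICacheService
    {
        [OperationContract]
        void Put(string key, byte[] value);

        [OperationContract]
        byte[] Get(string key); // returns null on a cache miss

        [OperationContract]
        void Remove(string key);
    }

    // A single service instance with concurrent access, so every
    // caller on this server shares one synchronized hash table.
    [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                     ConcurrencyMode = ConcurrencyMode.Multiple)]
    public class CacheService : ICacheService
    {
        private readonly Hashtable _table = Hashtable.Synchronized(new Hashtable());

        public void Put(string key, byte[] value) { _table[key] = value; }
        public byte[] Get(string key) { return (byte[])_table[key]; }
        public void Remove(string key) { _table.Remove(key); }
    }

    class CacheServiceHost
    {
        static void Main()
        {
            var host = new ServiceHost(new CacheService());
            host.AddServiceEndpoint(typeof(ICacheService),
                new NetNamedPipeBinding(), "net.pipe://localhost/CacheService");
            host.Open();
            Console.WriteLine("Cache service running. Press Enter to stop.");
            Console.ReadLine();
            host.Close();
        }
    }

Hosting the table as a singleton with ConcurrencyMode.Multiple is what makes it one shared, concurrently accessible store for all callers on the machine.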

Furthermore, a distributed hash table is more like a caching service deployed on a separate server or even a farm, storing application data shared by different applications deployed on different servers and farms. The biggest benefit of a distributed hash table is that the cached data is not duplicated on different servers, so you can expire cached data centrally. You should realize, though, that the performance of a distributed hash table is worse than that of a caching service deployed on the same server as the application, and of course far worse than in-process caching. In practice, however, the performance bottleneck of most enterprise applications is the database server; compared to querying the database, a distributed hash table is still much faster, and it can be scaled out more easily than a database. So a distributed hash table can still bring a big performance benefit, and it is an indispensable part of enterprise applications.
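A client would again use the cache-aside pattern against such a shared cache. Below is a minimal sketch calling the ICacheService contract sketched above; the key and LoadReportFromDatabase are hypothetical, and for a cache deployed on a separate server you would swap the named pipe binding for a TCP binding such as NetTcpBinding:

    using System;
    using System.ServiceModel;

    class CacheClient
    {
        static void Main()
        {
            var factory = new ChannelFactory<ICacheService>(
                new NetNamedPipeBinding(), "net.pipe://localhost/CacheService");
            ICacheService cache = factory.CreateChannel();

            // Try the shared cache first; fall back to the database on a miss.
            byte[] data = cache.Get("report:2010-02");
            if (data == null)
            {
                data = LoadReportFromDatabase(); // the slow path
                cache.Put("report:2010-02", data);
            }
            // ... use data ...

            ((IClientChannel)cache).Close();
            factory.Close();
        }

        static byte[] LoadReportFromDatabase()
        {
            return new byte[0]; // placeholder for the real query
        }
    }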

posted @ 2010-02-23 23:52  Teddy's Knowledge Base