(Translated) How content delivery networks (CDNs) work

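  This is my translation of an article by Nicholas C. Zakas; the original is here. Reading an article yourself and translating it are two different things. This is my first serious attempt at translation and the result reads somewhat stiffly, so readers who prefer the English original can go straight to it, or read the two side by side.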

  The article follows:

  Content delivery networks (CDNs) are an important part of Internet infrastructure that are frequently used without a full understanding of what’s happening behind the scenes. You’ll hear people saying, “oh, we put that on the CDN” or “make sure static assets go on the CDN,” when they have only a rudimentary idea of what CDNs are and how they work. As with most pieces of technology, CDNs are not magic and actually work in a pretty simple and straightforward manner.

  内容分发网络(CDN)是互联网基础设施中非常重要的一部分,被人们广泛使用,但人们对它幕后的运行机制往往缺乏全面的了解。你可能经常听到有人说"我们把那个放到CDN上了"或"确保静态资源都放到CDN上",而他们对CDN是什么、如何工作只有粗浅的认识。和大多数技术一样,CDN并没有什么神奇之处,它的工作方式其实相当简单直接。

  When a web browser makes a request for a resource, the first step is to make a DNS request. Making a DNS request is a lot like looking up a phone number in a phone book: the browser gives the domain name and expects to receive an IP address back. With the IP address, the browser can then contact the web server directly for subsequent requests (there are actually multiple layers of DNS caching, but that’s beyond the scope of this post). For your simple blog or small commercial web site, a domain name may have a single IP address; for large web applications, a single domain name may have multiple IP addresses.

  当web浏览器请求一个资源时,第一步是发送一个DNS请求。发送DNS请求很像在电话簿上查找电话号码:浏览器提供域名,期望得到一个IP地址。有了这个IP地址,浏览器在后续的请求中就可以直接与web服务器通讯(DNS缓存实际上有很多层,但这不在本文的讨论范围之内)。对于一个简单的博客或一个很小的商业网站,一个域名可能只有一个IP地址;但对于大型的web应用,一个域名就可能对应多个IP地址。
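
  The phone-book lookup can be tried from Python with the standard library's socket.getaddrinfo, which asks the operating system's resolver (and whatever caches sit behind it) for a host's addresses. This is a minimal sketch: the resolve helper is a hypothetical name, and what a real domain returns depends on your network and location.

```python
import socket


def resolve(host: str) -> list[str]:
    """Return the distinct IP addresses the system resolver reports for host."""
    # Restricting to SOCK_STREAM avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve("127.0.0.1"))      # an IP literal resolves to itself, no query needed
    # print(resolve("example.com"))  # needs network access; a large site may list several addresses
```
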

  Physics determines how fast one computer can contact another over physical connections, and so attempting to access a server in China from a computer in the United States will take longer than trying to access a U.S. server from within the U.S. To improve user experience and lower transmission costs, large companies set up servers with copies of data in strategic geographic locations around the world. This is called a CDN, and these servers are called edge servers, as they are closest on the company’s network to the end-user.

  物理规律决定了一台计算机通过物理连接访问另一台计算机的速度,所以从美国的计算机访问中国的服务器,要比在美国境内访问美国的服务器花更长的时间。为了提升用户体验、降低传输成本,大公司会在全球各个战略性的地理位置部署存放着数据副本的服务器。这就叫作CDN,这些服务器叫做边缘服务器,因为在公司的网络中,它们离终端用户最近。
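
  A back-of-the-envelope calculation shows why distance matters. This sketch assumes light travels through optical fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and uses rough, assumed distances; real routes are longer and add queuing and processing delays, so these figures are only lower bounds.

```python
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers ~200 km per millisecond (~2/3 of c)


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on one round trip: there and back at fiber speed, nothing else counted."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Rough, assumed distances; real cable paths are longer than straight lines.
for label, km in [("U.S. to China", 11_000), ("within the U.S.", 1_000)]:
    print(f"{label}: at least {min_rtt_ms(km):.0f} ms per round trip")
```

  A single page load typically needs several round trips (the DNS lookup, the TCP handshake, then the HTTP request itself), so the difference is multiplied.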

 

DNS解析(DNS resolution)

  When the browser makes a DNS request for a domain name that is handled by a CDN, there is a slightly different process than with small, one-IP sites. The server handling DNS requests for the domain name looks at the incoming request to determine the best set of servers to handle it. At its simplest, the DNS server does a geographic lookup based on the DNS resolver's IP address and then returns an IP address for an edge server that is physically closest to that area. So if I'm making a request and the DNS resolver I'm routed to is in Virginia, I'll be given an IP address for a server on the East coast; if I make the same request through a DNS resolver in California, I'll be given an IP address for a server on the West coast. You may not end up with a DNS resolver in the same geographic location from where you're making the request.

  当浏览器为一个由CDN处理的域名发送DNS请求时,过程和只有一个IP地址的小网站略有不同。负责处理该域名DNS请求的服务器会检查传入的请求,以决定由哪一组服务器来处理它最合适。最简单的做法是,DNS服务器根据DNS解析器的IP地址做地理位置查询,然后返回一个在地理上离该区域最近的边缘服务器的IP地址。所以,如果我发出请求时被路由到的DNS解析器位于弗吉尼亚州,我会得到一个东海岸服务器的IP地址;如果我通过加州的DNS解析器发出同样的请求,我会得到一个西海岸服务器的IP地址。注意,你被分配到的DNS解析器不一定和你发出请求的地方位于同一地理位置。
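
  The geographic lookup described above might be sketched like this. Everything in it is hypothetical: the resolver-to-location table, the edge server list, and the coordinates are invented, and the addresses come from the documentation-only ranges in RFC 5737. Note that the lookup is keyed on the resolver's address, not the user's, which is exactly why a user behind a distant resolver can be sent to a distant edge.

```python
import ipaddress
import math

# Hypothetical: where each resolver network is located (lat, lon).
RESOLVER_LOCATIONS = {
    ipaddress.ip_network("198.51.100.0/24"): (37.5, -77.4),  # a resolver in Virginia
    ipaddress.ip_network("203.0.113.0/24"): (37.4, -122.1),  # a resolver in California
}
# Hypothetical: edge server IP -> location (lat, lon).
EDGE_SERVERS = {
    "192.0.2.10": (39.0, -77.5),   # East coast edge
    "192.0.2.20": (37.3, -121.9),  # West coast edge
    "192.0.2.30": (32.8, -96.8),   # Central edge
}


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def answer_query(resolver_ip: str) -> str:
    """Return the edge IP physically closest to the resolver (not to the end user)."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, location in RESOLVER_LOCATIONS.items():
        if addr in network:
            return min(EDGE_SERVERS, key=lambda ip: distance_km(location, EDGE_SERVERS[ip]))
    raise LookupError(f"no location known for resolver {resolver_ip}")


print(answer_query("198.51.100.53"))  # resolver in Virginia -> East coast edge
print(answer_query("203.0.113.53"))   # resolver in California -> West coast edge
```
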

  

  That’s the first step of the process: getting the request to the closest server possible. Keep in mind that companies may optimize their CDNs in other ways as well, for instance, redirecting to a server that is cheaper to run or one that is sitting idle while another is almost at capacity. In any case, the CDN smartly returns the best possible IP address to handle the request.

  这是整个过程的第一步:把请求交给尽可能近的服务器。请记住,公司还可能用其他方式优化CDN,例如,重定向到运行成本更低的服务器,或者在一台服务器接近满负荷时重定向到另一台闲置的服务器。无论哪种情况,CDN都会智能地返回最适合处理该请求的IP地址。
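
  Choosing among edges on more than distance alone could look like the following sketch. The 90% load threshold and the fallback rule are assumptions made for illustration; real CDNs weigh many more signals (cost, health, link capacity) in ways they generally don't publish.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    ip: str
    distance_km: float
    load: float  # fraction of capacity in use, 0.0 to 1.0


def pick_edge(edges: list[Edge], max_load: float = 0.9) -> Edge:
    """Prefer the nearest edge that still has headroom; otherwise take the least loaded one."""
    available = [e for e in edges if e.load < max_load]
    if available:
        return min(available, key=lambda e: e.distance_km)
    return min(edges, key=lambda e: e.load)


edges = [
    Edge("192.0.2.10", distance_km=170, load=0.95),   # nearest, but almost at capacity
    Edge("192.0.2.30", distance_km=1900, load=0.20),  # farther, mostly idle
    Edge("192.0.2.20", distance_km=3800, load=0.10),
]
print(pick_edge(edges).ip)  # 192.0.2.30
```
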

 

访问内容(Accessing content)

  Edge servers are proxy caches that work in a manner similar to the browser caches. When a request comes into an edge server, it first checks the cache to see if the content is present. The cache key is the entire URL including query string (just like in a browser). If the content is in cache and the cache entry hasn’t expired, then the content is served directly from the edge server.

  边缘服务器是代理缓存,工作方式和浏览器缓存类似。当请求到达边缘服务器时,它首先检查缓存中是否有请求的内容。缓存的key是整个URL,包括查询字符串(和浏览器中一样)。如果内容在缓存中,并且该缓存条目没有过期,边缘服务器就直接提供该内容。
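
  A minimal sketch of the lookup step, assuming a simple in-memory dictionary and a per-entry expiry time. The EdgeCache class and the cdn.example.com URLs are hypothetical; the point is that two URLs differing only in their query string are different cache keys.

```python
import time
from typing import Dict, Optional, Tuple


class EdgeCache:
    """Minimal edge cache keyed on the full URL, query string included."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)

    def put(self, url: str, body: bytes, ttl_seconds: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries[url] = (body, now + ttl_seconds)

    def get(self, url: str, now: Optional[float] = None) -> Optional[bytes]:
        """Return the cached body if present and not expired; None signals a miss."""
        now = time.time() if now is None else now
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, expires_at = entry
        return body if now < expires_at else None


cache = EdgeCache()
cache.put("http://cdn.example.com/app.js?v=1", b"version 1", ttl_seconds=60, now=0)
print(cache.get("http://cdn.example.com/app.js?v=1", now=30))  # b'version 1'
print(cache.get("http://cdn.example.com/app.js?v=2", now=30))  # None: another query string, another key
print(cache.get("http://cdn.example.com/app.js?v=1", now=90))  # None: entry expired
```
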

  If, on the other hand, the content is not in the cache or the cache entry has expired, then the edge server makes a request to the origin server to retrieve the information. The origin server is the source of truth for content and is capable of serving all of the content that is available on the CDN. When the edge server receives the response from the origin server, it stores the content in cache based on the HTTP headers of the response.

  另一方面,如果请求的内容不在缓存中,或者缓存条目已经过期,边缘服务器就会向源服务器发送请求来获取内容。源服务器是内容的权威来源,能够提供CDN上所有可用的内容。边缘服务器收到源服务器的响应后,会根据响应中的HTTP头把内容存入缓存。
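
  The miss path can be sketched by giving the edge a function that fetches from the origin and a small parser that turns the response's Cache-Control header into a lifetime. This is a simplified model under stated assumptions: only the no-store, private and max-age directives are understood, and the origin is a hypothetical in-process function rather than a real HTTP call.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[bytes, Dict[str, str]]  # (body, headers) returned by the origin


def ttl_from_headers(headers: Dict[str, str]) -> Optional[int]:
    """Cache lifetime in seconds taken from Cache-Control; None means "do not cache".

    Simplification: only no-store, private and max-age are understood. A real
    proxy also honors Expires, s-maxage, Vary, validators and more.
    """
    lowered = {name.lower(): value for name, value in headers.items()}  # header names are case-insensitive
    directives = [d.strip().lower() for d in lowered.get("cache-control", "").split(",") if d.strip()]
    if "no-store" in directives or "private" in directives:
        return None
    for d in directives:
        if d.startswith("max-age="):
            value = d.split("=", 1)[1]
            return int(value) if value.isdigit() else None
    return None


class EdgeServer:
    def __init__(self, fetch_from_origin: Callable[[str], Response]):
        self.fetch_from_origin = fetch_from_origin
        self.cache: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)
        self.origin_requests = 0

    def handle(self, url: str, now: float) -> bytes:
        cached = self.cache.get(url)
        if cached is not None and now < cached[1]:
            return cached[0]                 # fresh hit: the origin is not contacted
        self.origin_requests += 1            # miss or expired: ask the origin
        body, headers = self.fetch_from_origin(url)
        ttl = ttl_from_headers(headers)
        if ttl:                              # store only if the headers allow it
            self.cache[url] = (body, now + ttl)
        return body


def origin(url: str) -> Response:  # hypothetical origin server
    return b"/* asset */", {"Cache-Control": "public, max-age=3600"}


edge = EdgeServer(origin)
edge.handle("http://cdn.example.com/app.js", now=0)     # miss: fetched from origin
edge.handle("http://cdn.example.com/app.js", now=60)    # fresh hit
edge.handle("http://cdn.example.com/app.js", now=4000)  # expired: fetched again
print(edge.origin_requests)  # 2
```
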

  Yahoo! created and open sourced the Apache Traffic Server, which is what Yahoo! uses in its CDN for managing this traffic. Reading through the Traffic Server documentation is highly recommended if you’d like to learn more about how cache proxies work.

  雅虎创建并开源了Apache Traffic Server,雅虎在自己的CDN中就是用它来管理这些流量的。如果你想进一步了解缓存代理的工作原理,强烈建议通读Traffic Server的文档。

 

举例(Example)

  For example, Yahoo! serves the YUI library files off of its CDN using a tool called the combo handler. The combo handler takes a request whose query string contains filenames and concatenates the files into a single response. Here’s a sample URL:

  例如,雅虎使用一个叫combo handler的工具从它的CDN上提供YUI库文件。combo handler接收的请求在查询字符串中包含若干文件名,它会把这些文件拼接成一个单一的响应。下面是一个示例网址:

  http://yui.yahooapis.com/combo?3.4.1/build/yui-base/yui-base-min.js&3.4.1/build/array-extras/array-extras-min.js

  The domain yui.yahooapis.com is part of the Yahoo! CDN and will redirect you to the closest edge server based on your location. This particular request combines two files, yui-base-min.js and array-extras-min.js, into a single response. The logic to perform this concatenation doesn't exist on the edge servers, it only exists on the origin server. So if an edge server receives this request and has no content, a request is made to the origin server to retrieve the content. The origin server is running the proprietary combo handler (specified by /combo? in the URL) and so it combines the files and returns the result to the edge server. The edge server can then serve up the appropriate content.

  域名yui.yahooapis.com属于雅虎CDN的一部分,会根据你的位置把你引导到最近的边缘服务器。这个请求把yui-base-min.js和array-extras-min.js两个文件合并成一个单一的响应。执行拼接的逻辑并不在边缘服务器上,只存在于源服务器上。因此,如果一台边缘服务器收到这个请求但没有相应的内容,它就向源服务器发送请求来获取内容。源服务器上运行着专有的combo handler(由URL中的 /combo? 指定),它把文件合并后将结果返回给边缘服务器,然后边缘服务器就可以提供相应的内容了。
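
  Yahoo!'s combo handler is proprietary, so the following is only a hypothetical sketch of the idea: read the query string, treat each &-separated item as a file path (not as a key=value pair), and concatenate the named files. The FILES dictionary and its placeholder contents are invented, and joining the files with a newline is an assumption.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical file store on the origin server; contents are placeholders.
FILES = {
    "3.4.1/build/yui-base/yui-base-min.js": "/* yui-base */",
    "3.4.1/build/array-extras/array-extras-min.js": "/* array-extras */",
}


def combo(url: str) -> str:
    """Concatenate every file named in the query string, in the order given.

    Each '&'-separated item is a path. Unknown paths raise KeyError; a real
    handler would answer with an error status instead.
    """
    query = urlsplit(url).query
    paths = [unquote(p) for p in query.split("&") if p]
    return "\n".join(FILES[p] for p in paths)  # newline separator is an assumption


url = ("http://yui.yahooapis.com/combo?3.4.1/build/yui-base/yui-base-min.js"
       "&3.4.1/build/array-extras/array-extras-min.js")
print(combo(url))
```

  Because the edge caches by full URL, the combined response is cached under that exact URL; asking for the same files in a different order creates a different cache entry.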


什么是静态(What does static mean?)

  I frequently get confused looks when I describe systems similar to the combo handler. There is a misconception that CDNs act like FTP repositories, where you simply upload static files so that others can retrieve them. I hope that it's clear from the previous section that this is not the case. An edge server is a proxy; the origin server is the one that tells the edge server exactly what content should be returned for a particular request. The origin server may be running Java, Ruby, Node.js, or any other type of web server and, therefore, can do anything it wants. The edge server does nothing but make requests and serve content. So the YUI combo handler exists only on the origin server and not on the edge servers.

  每当我描述类似combo handler这样的系统时,常常会看到别人困惑的表情。有一种误解认为CDN就像FTP仓库,你只需上传静态文件,其他人就能获取它们。希望上一节已经说明了,事实并非如此。边缘服务器只是一个代理,由源服务器准确地告诉边缘服务器某个请求应该返回什么内容。源服务器可以运行Java、Ruby、Node.js或任何其他类型的Web服务器,因此可以做任何它想做的事。边缘服务器除了发送请求(到源服务器)和提供内容之外,什么也不做。所以YUI combo handler只存在于源服务器上,而不在边缘服务器上。

  If that's the case, why not serve everything from the CDN? The CDN is a cache, meaning that it has value when it can serve data directly and not need to contact the origin server. If an edge server needs to make a request to the origin server for every request, then it has no value (and in fact, costs more than just making the request to the origin server itself).

  既然如此,为什么不把所有内容都放到CDN上提供呢? CDN是一个缓存,这意味着只有当它能直接提供数据、而不需要联系源服务器时才有价值。如果边缘服务器对每一个请求都要去请求源服务器,那它就毫无价值(事实上,成本比直接请求源服务器还要高)。
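
  The claim that an edge which always misses costs more than going straight to the origin can be checked with simple arithmetic. The timings in this sketch are hypothetical: 10 ms from the user to a nearby edge, 100 ms from the edge to the origin, and, for simplicity, 100 ms for a user going directly to the origin.

```python
def expected_latency_ms(hit_ratio: float, edge_ms: float, origin_ms: float) -> float:
    """Average time per request through the CDN.

    A hit costs only the trip to the edge; a miss costs the trip to the edge
    plus the edge's own trip to the origin.
    """
    return hit_ratio * edge_ms + (1 - hit_ratio) * (edge_ms + origin_ms)


DIRECT_MS = 100.0  # hypothetical cost of skipping the CDN entirely
for h in (0.0, 0.5, 0.95):
    via_cdn = expected_latency_ms(h, edge_ms=10, origin_ms=100)
    print(f"hit ratio {h:.2f}: {via_cdn:.1f} ms via CDN vs {DIRECT_MS:.1f} ms direct")
```

  With a hit ratio of zero, every request pays for the edge and the origin, which is worse than the direct path; the CDN only pays off as the hit ratio rises.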

  The reason JavaScript, CSS, images, Flash, audio, and video are frequently served from CDNs is precisely because they don’t change that frequently. That means not only will the same user receive content from cache, but all users will receive the same data from cache. Once the cache is primed with content, all users benefit. A site’s homepage is a poor candidate for edge caching because it’s frequently customized per user and needs to be updated several times throughout the day.

  JavaScript、CSS、图像、Flash、音频和视频之所以常常通过CDN提供,恰恰是因为它们不经常变化。这意味着不仅同一个用户能从缓存中获得内容,所有用户都能从缓存中获得相同的数据。一旦缓存中预先填充了内容,所有用户都会受益。网站首页则不太适合做边缘缓存,因为它经常针对每个用户定制,而且一天中需要更新好几次。
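
  A toy count of origin fetches illustrates why shared, rarely changing assets cache well. The model is hypothetical: the cache never expires, and a personalized page is represented as one distinct cache entry per user, so no user's copy is ever reused by anyone else.

```python
def origin_fetches(request_urls: list[str]) -> int:
    """Count origin fetches for a stream of requests through an edge cache that never expires."""
    cache: set[str] = set()
    fetches = 0
    for url in request_urls:
        if url not in cache:
            fetches += 1  # miss: the edge must go to the origin
            cache.add(url)
    return fetches


users = range(1000)
static = ["http://cdn.example.com/lib.js" for _ in users]                 # same URL for everyone
personalized = [f"http://www.example.com/home?user={u}" for u in users]  # hypothetical per-user entry
print(origin_fetches(static), origin_fetches(personalized))  # 1 1000
```
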

 

缓存过期(Cache expiration)

  Yahoo! performance guidelines specify that static assets should have far-future Expires headers. This is for two reasons: first, so the browser will cache the resources for a long time, and second, so the CDN will cache the resources for a long time. Doing so also means you can’t use the same filename twice, because it may be cached in at least two places and users will receive the cached version instead of the new one for quite a while.

  雅虎性能准则规定,静态资源应该设置远期的Expires头。这有两个原因:第一,让浏览器长时间缓存这些资源;第二,让CDN也长时间缓存这些资源。这样做也意味着同一个文件名不能用两次,因为它可能至少在两个地方(CDN和浏览器)被缓存,在相当长的一段时间内,用户拿到的会是旧的缓存版本而不是新版本。
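
  A minimal sketch of building far-future caching headers with Python's standard library. The one-year lifetime is an assumption (a common choice, not a value taken from the guidelines quoted above); Cache-Control max-age expresses the same lifetime in seconds, and HTTP/1.1 caches give it precedence over Expires when both are sent.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR = timedelta(days=365)  # assumed lifetime for a static asset


def far_future_headers(now: datetime) -> dict[str, str]:
    """Caching headers for a static asset that should stay cached for about a year."""
    expires = now.astimezone(timezone.utc) + ONE_YEAR  # HTTP dates must be in GMT/UTC
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"public, max-age={int(ONE_YEAR.total_seconds())}",
    }


print(far_future_headers(datetime(2011, 12, 3, 17, 17, tzinfo=timezone.utc)))
```
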

  There are several ways to work around this. The YUI library uses directories containing the version number of the library to differentiate file versions. It’s also common to append identifiers to the end of a filename, such as an MD5 hash or source control revision. Any of these techniques ensures that users are receiving the most up-to-date version of the file while maintaining far-future Expires headers on all requests.

  有几种方法可以解决这个问题。YUI库使用包含库版本号的目录来区分文件版本。另一种常见的方式是在文件名末尾追加标识符,例如MD5哈希或源代码控制的修订号。这些技术都能确保用户拿到的是文件的最新版本,同时所有请求仍然可以保留远期的Expires头。
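
  Both workarounds can be sketched in a few lines. fingerprinted_name puts a short MD5 digest of the file's content into its name, and versioned_path mirrors the directory layout of the YUI URL in the example above. Both helper names and the 8-character digest length are illustrative choices, not part of any particular tool.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short MD5 digest of the content before the extension (app.js -> app.<digest>.js).

    New content yields a new name, so pages that reference the new name bypass
    any copy still cached under the old one.
    """
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def versioned_path(version: str, filename: str) -> str:
    """YUI-style alternative: the library version is part of the directory."""
    return f"{version}/build/{filename}"


print(fingerprinted_name("js/app.js", b""))                # js/app.d41d8cd9.js
print(versioned_path("3.4.1", "yui-base/yui-base-min.js"))  # 3.4.1/build/yui-base/yui-base-min.js
```
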

 

结论(Conclusion)

  CDNs are an important part of today’s Internet, and they’re only going to become more important as time goes on. Even now, companies are hard at work trying to figure out ways to move more functionality to edge servers in order to provide users with the fastest possible experience. This includes a technique called Edge Side Includes (ESI) which is designed to serve partial pages from cache. A good understanding of CDNs and how they work is key to unlocking greater performance benefits for users.

  CDN是当今互联网的重要组成部分,而且会随着时间的推移变得越来越重要。即使在现在,各家公司也在努力想办法把更多的功能转移到边缘服务器上,以便为用户提供尽可能快的体验。这其中就包括一种叫作Edge Side Includes(ESI)的技术,它的目的是从缓存中提供部分页面。充分理解CDN及其工作原理,是为用户带来更大性能提升的关键。
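
  To give a flavor of Edge Side Includes, the sketch below expands <esi:include src="..."/> tags in a page template using fragments the edge already holds. It handles only the bare include tag; real ESI processors also support attributes such as alt and onerror as well as other elements, and the FRAGMENTS store here is hypothetical.

```python
import re

# Hypothetical fragment cache on the edge: shared pieces of a page.
FRAGMENTS = {
    "/fragments/header.html": "<header>Site header</header>",
    "/fragments/footer.html": "<footer>Site footer</footer>",
}

ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str, fragments: dict[str, str]) -> str:
    """Replace each <esi:include src="..."/> tag with the cached fragment it names."""
    return ESI_INCLUDE.sub(lambda m: fragments[m.group(1)], template)


page = ('<esi:include src="/fragments/header.html"/>'
        "<main>Per-user content</main>"
        '<esi:include src="/fragments/footer.html"/>')
print(assemble(page, FRAGMENTS))
```
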

posted @ 2011-12-03 17:17  慢热君Kevin