CSE 581 Internet Technologies Date: 01/16/02
Content Distribution Network (CDN) and Performance
Introduction
A Content Distribution Network (CDN) offloads some or all of the content delivery burden from the origin server. The origin server is the server primarily responsible for the content and its delivery. For example, for the URL www.cse.ogi.edu/index.html, the host www.cse.ogi.edu, which runs a web server, is responsible for delivering the content. With a CDN, a server that delivers content on behalf of the origin server is called a CDN server; effectively, the origin server redirects each request to an appropriate CDN server. Delivering content from a CDN server rather than from the origin server has several advantages. First, the requesting client and the origin server may be far apart (geographically or topologically), so the latency the client perceives could be unacceptable; in such cases the content is delivered from a CDN server that is, ideally, located near the client. Second, at peak times (e.g., daily peak hours or seasonal peaks during the year) an origin server tends to be overwhelmed by incoming requests and may be unable to deliver content in a reasonable amount of time, even to nearby clients. CDNs address these issues, with improved caching as a side benefit.
Request redirection
There are two main ways to redirect requests:
DNS redirection: The CDN provider controls an authoritative DNS server, and the (embedded) URLs in the origin server's content point to the CDN provider's servers (with the Akamai CDN, this process is called 'Akamaization'). At the client site, the DNS lookup resolves to one of the nearest (or least loaded) of the many CDN servers distributed geographically. The CDN provider's authoritative DNS server applies a load-balancing policy to spread load among the CDN servers while minimizing the latency seen by clients.
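The selection step performed by the authoritative DNS server can be sketched as follows. This is a minimal illustration under stated assumptions, not any provider's actual policy (those are proprietary): the server names and addresses, the per-server latency estimates (`rtt_ms`), the load figures, and the rule "exclude servers above a hypothetical load cap, then pick the lowest estimated latency" are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class CdnServer:
    name: str      # hypothetical server name
    ip: str        # address returned in the DNS answer
    rtt_ms: float  # assumed latency estimate from the client's resolver
    load: float    # assumed current utilisation, 0.0 to 1.0


def resolve(servers, load_cap=0.8):
    """Return the IP of the lowest-latency server whose load is below load_cap.

    If every server is at or above the cap, fall back to the least-loaded one.
    """
    eligible = [s for s in servers if s.load < load_cap]
    if eligible:
        best = min(eligible, key=lambda s: s.rtt_ms)
    else:
        best = min(servers, key=lambda s: s.load)
    return best.ip


servers = [
    CdnServer("cdn-west", "192.0.2.10", rtt_ms=12.0, load=0.95),
    CdnServer("cdn-central", "192.0.2.20", rtt_ms=35.0, load=0.40),
    CdnServer("cdn-east", "192.0.2.30", rtt_ms=80.0, load=0.10),
]
print(resolve(servers))  # 192.0.2.20: cdn-west is nearest but over the cap
```

The two criteria mirror the trade-off described above: proximity keeps client latency low, while the load cap spreads requests across servers.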
URL rewriting: The embedded URLs of a web page (e.g., index.html) are rewritten before the page is sent to the client. In practice, either a direct IP address or a fully qualified canonical server name is used. The advantage of a direct IP address is that it saves the time of a DNS lookup.
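Rewriting embedded URLs can be sketched with a regular expression, assuming the embedded objects appear as absolute `src="http://www.cse.ogi.edu/..."` attributes. The CDN host name `a1.cdn.example.net` is hypothetical, and a real rewriter would parse HTML properly rather than rely on a regex. Passing an IP address as `cdn_host` corresponds to the direct-IP variant.

```python
import re

ORIGIN = "www.cse.ogi.edu"       # origin host from the example in the introduction
CDN_HOST = "a1.cdn.example.net"  # hypothetical CDN server name (could also be a direct IP)

# Matches the start of src="http://www.cse.ogi.edu/ (single or double quotes).
_EMBEDDED = re.compile(r"(src=[\"'])https?://" + re.escape(ORIGIN) + r"/", re.IGNORECASE)


def rewrite_embedded_urls(html: str, cdn_host: str = CDN_HOST) -> str:
    """Point embedded objects (src attributes) at the CDN server; leave hyperlinks alone."""
    return _EMBEDDED.sub(lambda m: m.group(1) + "http://" + cdn_host + "/", html)


page = ('<a href="http://www.cse.ogi.edu/about.html">About</a>'
        '<img src="http://www.cse.ogi.edu/logo.gif">')
print(rewrite_embedded_urls(page))
```

Only the host part changes; the path is kept, so the CDN server is assumed to hold the object under the same path as the origin.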
CDN providers are third-party vendors, such as Akamai, Digital Island, Adero, and Clearway. These vendors employ either full-site DNS redirection (the entire web content is served from the CDN server) or partial-site DNS redirection (the main page comes from the origin server, while mostly static objects such as images are delivered from the CDN server). Some vendors employ URL rewriting, or a combination of URL rewriting and DNS redirection.
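The difference between full-site and partial-site redirection can be sketched as a function that decides which host serves a given path. Deciding what counts as a "static object" by file suffix, and the particular suffix list, are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# Assumed set of "static object" types for partial-site redirection.
STATIC_SUFFIXES = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js"}


def serving_host(path: str, mode: str, origin: str, cdn: str) -> str:
    """Pick the host for a request under 'full' or 'partial' site redirection.

    full:    every object, including the main page, comes from the CDN.
    partial: only static objects (judged by file suffix) come from the CDN.
    """
    if mode == "full":
        return cdn
    if mode == "partial":
        suffix = PurePosixPath(path).suffix.lower()
        return cdn if suffix in STATIC_SUFFIXES else origin
    raise ValueError(f"unknown mode: {mode}")


for p in ["/index.html", "/images/logo.gif"]:
    print(p, serving_host(p, "partial", "www.cse.ogi.edu", "cdn.example.net"))
```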
Analytical Model
[GA03] presents an analytical model for the cache hit ratio of the middle tier. For example, proxies between a client and a server form a middle tier; similarly, a CDN server between a client and the origin server is also a middle tier. The model takes various inputs, such as object popularity, the number of leaf nodes, and the client population. It was validated against the NLANR cache hierarchy at the root level (treating all root-level caches as a single unified cache); the reported cache hit ratio was 32% in October 1999.
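The model of [GA03] is not reproduced here. As a much simpler stand-in that shows how object popularity drives the hit ratio, assume requests follow a Zipf-like distribution (the i-th most popular of N objects is requested with probability proportional to 1/i^alpha) and that an ideal cache holds the C most popular objects with no expiry. The hit ratio is then the sum of the top C weights divided by the sum of all N weights. The value alpha = 0.8 is an assumed default; leaf nodes and client population, which [GA03] does model, are ignored.

```python
def zipf_hit_ratio(n_objects: int, cache_size: int, alpha: float = 0.8) -> float:
    """Fraction of requests served by an ideal cache holding the top cache_size objects.

    Assumes Zipf-like popularity p_i proportional to 1 / i**alpha (i = 1 is the most
    popular object) and no expiry or invalidation. This is NOT the [GA03] model.
    """
    if not 0 <= cache_size <= n_objects:
        raise ValueError("cache_size must be between 0 and n_objects")
    weights = [1.0 / i**alpha for i in range(1, n_objects + 1)]
    return sum(weights[:cache_size]) / sum(weights)


# Caching 1% of 1,000,000 objects under the assumed alpha.
print(round(zipf_hit_ratio(1_000_000, 10_000), 3))
```

Because popularity is skewed, a cache holding a small fraction of the objects can still serve a much larger fraction of the requests.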
Performance
Three metrics are relevant to CDN performance.
Only the first metric can be evaluated with black-box testing, i.e., without access to the internal workings of the CDN. Client-perceived latency is governed by two factors: the download time and the DNS lookup time.
Download time can be measured as the latency improvement gained by downloading from the CDN server rather than from the origin server. [JO01] compared the performance of Akamai and Digital Island and found that download time depends strongly on the client's location. Moreover, the best CDN server is not chosen every time, but about 90% of the time a 'reasonably good' CDN server is chosen. Tests in [KR02] show that a newly selected CDN server does not yield lower latency than the previously selected one.
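The two measurements mentioned above can be expressed as small functions. The "reasonably good" criterion used here, a chosen server whose download time is within a hypothetical factor of 1.5 of the best candidate measured in the same trial, is an assumption for illustration and not necessarily the definition used in [JO01]; the sample timings are invented.

```python
def speedup(origin_time: float, cdn_time: float) -> float:
    """Download-time improvement of the CDN server over the origin (2.0 = twice as fast)."""
    return origin_time / cdn_time


def fraction_reasonably_good(trials, tolerance: float = 1.5) -> float:
    """trials: list of (chosen_time, candidate_times) pairs, in seconds.

    A choice counts as 'reasonably good' if chosen_time <= tolerance * best candidate time.
    The tolerance is a hypothetical threshold, not one taken from [JO01].
    """
    good = sum(1 for chosen, candidates in trials if chosen <= tolerance * min(candidates))
    return good / len(trials)


# Invented sample: each trial lists the chosen server's time and all candidates' times.
trials = [
    (0.20, [0.20, 0.25, 0.40]),
    (0.25, [0.18, 0.25, 0.30]),
    (0.90, [0.25, 0.60, 0.90]),
]
print(fraction_reasonably_good(trials))  # 2 of the 3 choices pass
print(speedup(2.0, 0.5))
```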
CDN providers use small DNS TTLs, which give them finer control over CDN server load. Small TTLs, together with changes in the selected CDN server, impose the extra overhead of DNS lookups, especially for CDN vendors that do not use direct IP addresses; only Clearway uses direct IP addresses. A study by [KR02] found that DNS lookups could add up to 8 seconds to client latency.
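The effect of a small TTL can be sketched by counting how often a client must repeat the DNS lookup for one CDN host name. The model assumes a single client-side cache that keeps each answer for exactly `ttl` seconds, ignores intermediate resolver caches, and uses made-up request times and costs.

```python
def dns_lookups(request_times, ttl):
    """Count DNS lookups for one host name if each answer is cached for ttl seconds.

    request_times: sorted request timestamps in seconds. A lookup is needed on the
    first request and whenever the cached answer has expired.
    """
    lookups = 0
    expires_at = float("-inf")
    for t in request_times:
        if t >= expires_at:
            lookups += 1
            expires_at = t + ttl
    return lookups


def total_latency(request_times, ttl, dns_cost, download_cost):
    """Summed client latency: one download per request plus a DNS cost per lookup."""
    return dns_lookups(request_times, ttl) * dns_cost + len(request_times) * download_cost


times = list(range(0, 600, 30))  # one request every 30 s for 10 minutes (20 requests)
for ttl in (20, 60, 3600):
    print(ttl, dns_lookups(times, ttl))
```

With a 20-second TTL every request pays for a lookup, while a one-hour TTL needs only one; the provider trades this client-side overhead for finer control over server load.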
Summary
CDNs are a way to manage web workload and client latency. However, CDNs are mainly deployed for static content: 98% of the bytes delivered by CDNs are images and similar objects. CDN deployment appears to be rising steeply; according to [KR02], by December 2000, 25% of popular sites had already deployed a CDN. Improved caching is a further benefit: the hit ratio is 30-80% for CDN-served sites, compared with only 25-60% for non-CDN-served sites.
References
[JO01] K. Johnson, J. Carr, M. Day, M. Kaashoek. The Measured Performance of Content Distribution Networks.
[KR02] B. Krishnamurthy, C. Wills, Y. Zhang. On the Use and Performance of Content Distribution Networks.
[GA03] S. Gadde, J. Chase, M. Rabinovich. Web Caching and Content Distribution: A View from the Interior.