CSE 525 (Winter 2004)
Topic #16: Application Measurement
Bobin John
[1] Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble,
Henry M. Levy, John Zahorjan, "Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload", SOSP 2003
Summary: This paper characterizes multimedia workloads in peer-to-peer (P2P) file-sharing applications.
It can be divided into the following stages:
- Trace Collection: A trace of KaZaa traffic was collected for more than 200 days.
- Characteristics Inferred:
- User Characteristics:
- Users are patient in retrieving their requested multimedia files.
- Over time, users slow down, requesting fewer files and becoming less active overall.
- Client transfer activity was shown relative to both the client's lifetime & the entire trace.
- Object Characteristics:
- The KaZaa workload is a blend of workloads for "small", "medium" & "large" objects.
- Clients fetch a particular file at most once.
- Popularity of a particular file is often short-lived.
- The most popular objects tend to be recently born objects.
- Most client requests, however, are for older objects.
- KaZaa object popularity does not follow a Zipf distribution.
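- Non-Zipf distribution: This section of the paper explains in detail why the popularity distribution
of KaZaa objects does not follow Zipf. The two main reasons are the immutability of the objects and the fetch-at-most-once behavior
of clients: because an object never changes, a client has no reason to fetch it twice, so the number of requests for even the most
popular objects is capped by the number of clients, which flattens the head of the distribution. The KaZaa workload is also compared
against other non-Zipf workloads such as Web workloads, VoD server workloads, etc.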
- Model:
- A basic model of a P2P file-sharing system.
- Uses a set of parameters to simulate the P2P environment.
- Findings from the model simulation:
- Fetch-at-most-once client behavior reduces file-sharing effectiveness.
- As new objects are born, cache hit performance improves.
- The arrival of new clients does not stabilize performance; in fact, it reduces the hit rate.
- Model validation: The authors ran the same model using the object popularity distribution copied from the collected trace,
then compared the popularity distribution produced by the model with that of the trace.
- Locality-Awareness: How well can locality in a P2P file-sharing workload be exploited?
- Measuring locality in the workload: this can be done with the help of a proxy cache, but such a cache brings legal
and/or political problems with it.
- Methodology: a locality-aware scheme satisfies client requests from other clients within the same network. It has to keep
track of which peers are currently available & which peers hold the requested file.
- Benefits: saves bandwidth and improves the hit ratio.
- Increasing the availability of peers helps improve the hit rate & thereby the overall performance of the
P2P application.
- Drawbacks:
- A KaZaa trace should not be generalized to all P2P workloads.
- The behavior of KaZaa clients at a single university does not represent the behavior of clients worldwide.
- Locality-awareness was simulated and not implemented.
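A small simulation can illustrate the fetch-at-most-once argument from the Non-Zipf discussion above. This is a toy sketch, not the authors' model: the Zipf exponent (alpha = 1.0), the client and object counts, and the function names zipf_cum_weights and simulate_requests are illustrative assumptions. Every client draws objects from the same Zipf distribution; with fetch-at-most-once enabled, a client redraws until it picks an object it has not fetched before.

```python
import itertools
import random
from collections import Counter


def zipf_cum_weights(n_objects, alpha=1.0):
    """Cumulative Zipf weights: the object of rank r (1-based) has weight 1 / r**alpha."""
    return list(itertools.accumulate(1.0 / r ** alpha for r in range(1, n_objects + 1)))


def simulate_requests(n_clients, requests_per_client, n_objects,
                      fetch_at_most_once, alpha=1.0, seed=0):
    """Return a Counter mapping object index (0 = most popular) to its request count."""
    if fetch_at_most_once and requests_per_client > n_objects:
        raise ValueError("a fetch-at-most-once client cannot make more requests than there are objects")
    rng = random.Random(seed)
    cum = zipf_cum_weights(n_objects, alpha)
    objects = range(n_objects)
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum)[0]
            if fetch_at_most_once:
                # immutable objects: a client never needs the same object twice
                while obj in fetched:
                    obj = rng.choices(objects, cum_weights=cum)[0]
                fetched.add(obj)
            counts[obj] += 1
    return counts


if __name__ == "__main__":
    clients, per_client, n_objects = 200, 50, 1000
    repeat = simulate_requests(clients, per_client, n_objects, fetch_at_most_once=False)
    once = simulate_requests(clients, per_client, n_objects, fetch_at_most_once=True)
    for rank in (0, 1, 9, 99):
        print(f"rank {rank + 1}: repeated fetches {repeat[rank]}, fetch-at-most-once {once[rank]}")
```

Because no client fetches an object twice, the request count of the most popular object can never exceed the number of clients (200 here), while with repeated fetches it is limited only by the total number of requests. That cap is what bends the head of the popularity curve away from a straight Zipf line.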
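The model findings listed above (fetch-at-most-once hurts sharing, object births help) can be explored with a similar toy simulation of an organization-wide cache. The sketch assumes an infinite-capacity shared cache, Zipf popularity over a ranking in which newly born objects enter at the top, and fetch-at-most-once clients; these assumptions, the parameter values, and the name simulate_cache_hit_rate are illustrative and are not taken from the paper.

```python
import itertools
import random


def simulate_cache_hit_rate(n_clients, n_objects, n_steps, requests_per_step,
                            births_per_step, alpha=1.0, seed=0):
    """Return the hit rate of an infinite shared cache for each time step."""
    if n_steps * requests_per_step > n_objects:
        # guarantees every client always has an unfetched object left to request
        raise ValueError("total requests must not exceed the initial number of objects")
    rng = random.Random(seed)
    ranking = list(range(n_objects))  # ranking[0] is the most popular object
    next_id = n_objects
    cache = set()                     # every object fetched by anyone stays cached
    fetched = [set() for _ in range(n_clients)]
    hit_rates = []
    for _ in range(n_steps):
        for _ in range(births_per_step):  # newborn objects enter at the top
            ranking.insert(0, next_id)
            next_id += 1
        cum = list(itertools.accumulate(1.0 / r ** alpha
                                        for r in range(1, len(ranking) + 1)))
        hits = 0
        for _ in range(requests_per_step):
            client = rng.randrange(n_clients)
            obj = rng.choices(ranking, cum_weights=cum)[0]
            while obj in fetched[client]:  # fetch-at-most-once: redraw
                obj = rng.choices(ranking, cum_weights=cum)[0]
            fetched[client].add(obj)
            if obj in cache:
                hits += 1
            else:
                cache.add(obj)
        hit_rates.append(hits / requests_per_step)
    return hit_rates


if __name__ == "__main__":
    static = simulate_cache_hit_rate(50, 5000, 10, 400, births_per_step=0)
    births = simulate_cache_hit_rate(50, 5000, 10, 400, births_per_step=20)
    for step, (a, b) in enumerate(zip(static, births)):
        print(f"step {step}: no births {a:.2f}, with births {b:.2f}")
```

Comparing the per-step hit rates of the two runs is one way to see how object births interact with fetch-at-most-once behavior; the printed numbers depend entirely on the arbitrary parameters and are not the paper's results.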
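The locality-awareness idea summarized above (serve a request from another peer in the same network when a peer holding the file is currently available) can be sketched as follows. All of this is illustrative: a static object set with Zipf popularity, fetch-at-most-once clients, and each peer that holds the requested object being online independently with probability peer_availability are assumptions, and locality_hit_rate is a hypothetical name.

```python
import itertools
import random


def locality_hit_rate(n_clients, n_objects, n_requests, peer_availability,
                      alpha=1.0, seed=0):
    """Fraction of requests that could be served by an available peer inside
    the organization instead of by an external download."""
    if n_requests > n_objects:
        raise ValueError("need n_requests <= n_objects so no client runs out of new objects")
    req_rng = random.Random(seed)     # which client requests which object
    up_rng = random.Random(seed + 1)  # peer availability, kept separate on purpose
    cum = list(itertools.accumulate(1.0 / r ** alpha for r in range(1, n_objects + 1)))
    objects = range(n_objects)
    holders = [0] * n_objects         # number of peers in the organization holding each object
    fetched = [set() for _ in range(n_clients)]
    hits = 0
    for _ in range(n_requests):
        client = req_rng.randrange(n_clients)
        obj = req_rng.choices(objects, cum_weights=cum)[0]
        while obj in fetched[client]:  # fetch-at-most-once
            obj = req_rng.choices(objects, cum_weights=cum)[0]
        # P(at least one of the k holders is online) = 1 - (1 - p)**k
        k = holders[obj]
        if up_rng.random() < 1.0 - (1.0 - peer_availability) ** k:
            hits += 1                  # served locally by an available peer
        # either way, the requester now holds a copy it can share
        fetched[client].add(obj)
        holders[obj] += 1
    return hits / n_requests


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0):
        print(f"peer availability {p:.2f}: hit rate {locality_hit_rate(50, 3000, 2000, p):.3f}")
```

Because the request sequence and the availability draws come from separate random generators, runs with different peer_availability values see exactly the same requests, so in this sketch the hit rate can only grow as availability increases, consistent with the observation that more available peers improve the hit rate.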
[2] C. Dewes, A. Wichmann, A. Feldmann, "An Analysis of Internet Chat Systems", IMC 2003
Summary: This paper studies Internet chat traffic and presents a characterization of a collected chat trace.
The paper can be summarized as follows:
- Different types of chat systems:
- Internet Relay Chat (IRC): uses a set of servers, clients, discussion channels, channel operators, IRC operators, etc.
- Web-based chat systems: a single server accessed through a web-browser interface; more "social" than IRC.
- ICQ & AIM: stand-alone client applications that have to be downloaded & installed on the local host. They use UDP as the
transport protocol.
- Gale: encrypts its messages.
- Identifying chat traffic:
- IRC: monitor traffic on port 6667; the traffic consists mainly of small packets, and sessions last more than a few minutes
- Web-chat: much harder to identify; heuristics include monitoring cache-control headers, the scripting languages or
applets being used, and the occurrence of "chat" in the filename, script, page, path, etc.
- Overall strategy: monitor 9 properties to identify & remove non-chat traffic, ordering the steps so that early steps filter out
more traffic than later ones
- Experiment:
- Performed at Saarland University; it required substantial physical resources, and several different traces were collected
- Validation:
- Recall (the ability of the system to include all chat traffic in the final trace): calculated to be 91.7%
- Precision (the ability of the system to include only chat traffic in the final trace): calculated to be 93.1%
- Results from the trace collected:
- Session durations: IRC sessions were longer than Web-chat sessions
- Interarrival times of chat sessions: frequency with which users connect to chat servers was consistent with an exponential distribution
- Interarrival times of chat messages: typically between 1 & 10 seconds for Web-chat, and usually longer for IRC
- Packet sizes: Web-chat & IRC packets are much smaller than the average TCP packet
- Transmitted & received bytes per session: in a typical session, a client receives 10 times as much data as it sends
- Drawbacks:
- It was not clear why the information-retrieval metrics "recall" & "precision" are best suited for validating the collected trace.
- There does not seem to be any useful future work that could be derived from this research.
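The ordered filtering strategy described under "Identifying chat traffic" can be sketched as a pipeline of predicates over per-connection summaries. This is only an illustration: the Flow fields, the four filters (a small subset built from the properties listed above, not the paper's 9), and every threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flow:
    """Per-connection summary (hypothetical fields chosen for illustration)."""
    server_port: int
    mean_packet_bytes: float
    duration_s: float
    http_path: str = ""  # empty for non-HTTP flows


Filter = Callable[[Flow], bool]  # returns True to KEEP the flow as a chat candidate


# Illustrative filters; the thresholds are assumptions, not the paper's values.
def is_irc_port_or_http(f: Flow) -> bool:
    return f.server_port in (6667, 80)


def has_small_packets(f: Flow) -> bool:
    return f.mean_packet_bytes < 200


def lasts_minutes(f: Flow) -> bool:
    return f.duration_s > 180


def looks_like_chat(f: Flow) -> bool:
    # IRC flows pass; HTTP flows must mention "chat" in the requested path
    return f.server_port == 6667 or "chat" in f.http_path.lower()


def order_filters(sample: List[Flow], filters: List[Filter]) -> List[Filter]:
    """Sort filters so those that discard the most sample flows run first
    (ties keep their original order, since Python's sort is stable)."""
    def discarded(flt: Filter) -> int:
        return sum(1 for f in sample if not flt(f))
    return sorted(filters, key=discarded, reverse=True)


def identify_chat(flows: List[Flow], filters: List[Filter]) -> List[Flow]:
    """Apply the filters in sequence, removing non-chat flows at each step."""
    remaining = flows
    for flt in filters:
        remaining = [f for f in remaining if flt(f)]
    return remaining


if __name__ == "__main__":
    flows = [
        Flow(6667, 90, 1200),                    # long IRC session
        Flow(80, 120, 600, "/cgi-bin/chat.pl"),  # web-chat session
        Flow(80, 1400, 5, "/index.html"),        # ordinary web page
        Flow(22, 80, 3600),                      # SSH: small packets but not chat
        Flow(6667, 100, 20),                     # very short IRC connection
    ]
    filters = order_filters(flows, [is_irc_port_or_http, has_small_packets,
                                    lasts_minutes, looks_like_chat])
    print([flt.__name__ for flt in filters])
    print(identify_chat(flows, filters))  # keeps only the first two flows
```

Since each filter only discards flows, the final set does not depend on the order of the steps; ordering them by how much they remove simply shrinks the amount of data that the later, possibly more expensive, steps have to examine.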
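The recall and precision figures in the validation step follow the standard information-retrieval definitions: recall = |identified ∩ chat| / |chat| and precision = |identified ∩ chat| / |identified|. A minimal sketch, assuming traffic is counted per flow using hypothetical connection IDs and that a ground-truth set of chat flows is available (for example from manual inspection); the numbers in the example are made up and are not the paper's data.

```python
def recall_precision(identified, actual_chat):
    """Return (recall, precision) for a set of flows labeled as chat.

    identified:  IDs of flows the classifier kept in the final trace
    actual_chat: IDs of flows that really are chat (ground truth)
    """
    if not identified or not actual_chat:
        raise ValueError("both sets must be non-empty")
    true_positives = len(identified & actual_chat)
    recall = true_positives / len(actual_chat)    # share of all chat that was found
    precision = true_positives / len(identified)  # share of the final trace that is chat
    return recall, precision


if __name__ == "__main__":
    actual = set(range(100))                            # 100 real chat flows (made up)
    found = set(range(10, 100)) | set(range(100, 105))  # misses 10, adds 5 false positives
    r, p = recall_precision(found, actual)
    print(f"recall = {r:.3f}, precision = {p:.3f}")     # recall = 0.900, precision = 0.947
```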
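The result that chat-session interarrival times are consistent with an exponential distribution can be checked, for instance, by fitting the rate by maximum likelihood (one over the sample mean) and measuring the largest gap between the empirical and fitted CDFs (the Kolmogorov-Smirnov statistic). The paper's own fitting procedure is not described in these notes, so this is a swapped-in standard technique; the synthetic data, the 30-second mean, and the function names are assumptions.

```python
import math
import random


def fit_exponential_rate(samples):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    return len(samples) / sum(samples)


def ks_statistic(samples, rate):
    """Largest absolute gap between the empirical CDF and the exponential CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - math.exp(-rate * x)
        # the empirical CDF steps from i/n to (i+1)/n at x
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap


if __name__ == "__main__":
    rng = random.Random(1)
    # synthetic session interarrival times with a 30 s mean (illustrative only)
    gaps = [rng.expovariate(1 / 30) for _ in range(5000)]
    rate = fit_exponential_rate(gaps)
    print(f"fitted mean interarrival: {1 / rate:.1f} s")
    print(f"KS statistic: {ks_statistic(gaps, rate):.4f}")
```

A statistic that is small compared with the usual 5% critical value of about 1.36 / sqrt(n) is consistent with an exponential fit; strictly, estimating the rate from the same data makes that critical value conservative, which is what the Lilliefors correction addresses.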