CSE 525 (Winter 2004)
Topic #25:  Packet Classification
Robert Nesius

Papers

  1. Florin Baboescu, Sumeet Singh, George Varghese, "Packet Classification for Core Routers: Is there an alternative to CAMs?",INFOCOM 2003 paper
  2. Anindya Basu, Girija Narlikar, "Fast Incremental Updates for Pipelined Forwarding Engines", INFOCOM 2003, paper
  3. Girija Narlikar, Anindya Basu, Francis Zane, "CoolCAMs: Power-Efficient TCAMs for Forwarding Engines", INFOCOM 2003 paper

Summary of Paper 1

This paper focused on improving the performance of routing architectures that used SRAM as opposed to TCAMs (Ternary Content Addressable Memory). SRAM is faster and cooler than CAMs, allowing for lower manufacturing and power consumption costs, as well as higher densities of devices in a data center. After contrasting two other routing algorithms with each other in terms of complexity and space (memory) requirements, the authors describe a new algorithm they have devised that is unencumbered by patants. The algorithm is Extended Grid-of-Tries with Path Compression (EGT-PC).

The papers three contributions are:

  1. New Characteristic A new property observed in packet classification traces of traces from production routers.
  2. New Algorithm The previously mentioned EGT-PC.
  3. Standardized ComparisonsThe authors implement competing algorithms and perform normalized tests to allow Apples to Apples comparisons.

The new characteristic is the observation that regardless of the size of a the rule-set for packet classification, all source-destination address prefixes match five to twenty rules.

The idea then is to use an algorithm to quickly derive what rules correlate to a given source-address pair and then linearly search those rules. So this is a prune-search technique. The approach used is based on the Grid of Tries algorithm referenced in a contributing paper. The authors further refine the algorithm and introduce path-compression into the search tree yielded by the algorithm.

The authors testing shows this algorithm compares very favorably to HI-Cuts, a popular algorithm employed in many production routers, in both time and space. It seems HI-Cuts may be in danger of losing some royalty revenue.


Summary of Paper 2

This paper discusses techniques to minimize the impact of updates to route tables to packet classifiers using pipelined architectures. These type of forwarding engines have multiple classification engines with their own small SRAM memory cache. Packet headers flow through these pipelines, and each engine checks the packet against its set of rules. This is an architecture seen in network processors.

This paper focuses on then on two problems.

  1. Distributing the rules evenly across the SRAM caches for the forwarding engines in the pipeline.
  2. Minimizing the updates to those SRAM caches so the packet flow is not disrupted.

The first goal is accomplished via a new algorithm named MinMaxwhich, through successive refinements, does succeed in distributing the representation of the route tables and rules across the forwarding engines in a reasonable distributed fashion.

The second goal is accomplished using write bubbles. Write bubbles are packets of rule updates that are interleaved with packets being actively classified. Each write bubble can have an update for each stage, and each write bubble can modify each stage only once on its way through the pipeline. With some refinements, the number of write bubbles is dramatically reduced.

This paper discovered during its legwork that the vast majority of updates to BGP routers occur because a subnet drops offline and then returns within four minutes.


Summary of Paper 3

This paper focuses on making forwarding engines that use TCAMs more efficient, going so far as to derive worst-case bounds for power consumption. This is of particular use to network engineers and data-center planners.

Recent TCAM-based forwarding engines have recognized the high power and space consumption costs of these architectures in addition to the heat they generate. The latest generation of these products provide ways to only activate subsets of the CAM memory so only the CAM blocks that are needed are active. The authors use the knowledge of this hardware capability to hash the route table into "buckets" that correlate to CAM blocks.

Two primary techniques are employed for this.

  1. Bit Selection - just grabbing bits from the address to use as a MASK to turn on certain TCAM blocks.
  2. Trie-based splitting - using a Trie data-structure, and carving it intelligently into different buckets.

While in theory this approach should yield an 8x power savings, the authors are prove that their worst-case scenarios do in fact provide a 4x savings in power. The bit-selection technique suffered from worst-case scenarios that were dramatically higher than the average case. The second approach yielded worst-case power-consumption bounds that were tighter to the average case.


Discussion

One key item that came out of the discussion was that the amount of power consumed to maintain the Internet infrastructure is beginning to be viewed as an environmental issue. There was a paper written on the "Green Internet" by some professors at Portland State University. I was not able to locate this paper as a reference online to include in this summary. However, it is an interesting issue.

Regarding the comparisons of EGT-PC with HI-Cuts in the first paper, it was noted that the implementation of HI-Cuts being used was written by the authors of the paper. So one wonders how well their version compares to the real production version of the algorithm.