Switch Design for Optimal Service Quality

A large number of current-generation switches and routers operate in a multi-protocol, non-blocking, parallel-execution environment that performs packet segmentation and reassembly.  The system must support quality of service (QoS) levels that take into account video frame rate constraints, switch-to-switch throttling control, packet priorities, error checking/correction, and retransmission of overflow packets consistent with internal buffering.  Switches and routers also need to consider the topology and routing algorithms that can influence a switch fabric implementation or port buffering.  The refinement of a new switch or router in terms of number of ports, port buffering, protocol mix per switch port, QoS by switch subscriber, throughput, and maximum latency can all drive price-performance points in a switch design process.  This article discusses the use of performance modeling to explore and validate the design specification of a high-speed core router using a graphical modeling environment.  The graphical model and simulation analysis were developed in approximately one week.

Early design exploration using a statistical modeling methodology can follow different approaches, listed below:

  1. The most common approach is to use statistical measurements from prior designs, and apply the lessons learned to modified or new designs.  This approach starts with the assumption that most switch performance impacts are linearly correlated.  This may or may not be the case, as the contention for resources may be non-linear.
  2. Another approach is to use either code-based or spreadsheet-driven methods.  Both of these tend to be analytical queuing models that do not consider the effects of concurrency and contention.  These models also require considerable post-processing effort to generate analysis output.
  3. A third approach is to use a code-based simulation environment.  These environments have a good library of networking protocols and workload generators, and the capacity for large network models.  They do not provide resource modeling capability and are not suited to include implementation details.  As the models have to be written in code, they require considerable time and effort.

This article focuses on the requirements for quick model construction, the attributes to be monitored and workloads to be generated.  The design goal is to create a router that minimizes or eliminates congestion and ensures real-time data transfer.  The analysis will determine the right sizing for the system attributes to maximize the Quality of Service and maintain the latency between the ingress and egress ports below a set threshold.  A model of the switch will be constructed for exploration using networking and workload generator libraries provided in VisualSim.  The port balancing across the input ports and the interaction between the ingress and egress ports will be analyzed.  The system will be evaluated for different input packet rates, data sizes, queue depths and internal crossbar bus frequency. 

VisualSim is a concept engineering software application that enables rapid exploration of networking systems for performance and power trade-off.    Models in VisualSim can be constructed using the configurable, parameterized library blocks, application-specific functions, standard component generators (processors, memory, caches, bus and switches) and a template-driven SystemC.  In addition, there are co-simulation links to Verilog, VHDL, STK, Excel and MATLAB and an open, timed-API for integrating simulators.  VisualSim optimizes the initial concept through a series of modeling refinements and abstractions to allow the best architecture to become an executable specification.    

System Overview

The proposed switch internals are shown in Figure 1.  The number of ingress ports is variable, ranging from 64 to 1024, and the number of egress ports ranges from 8 to 64.  Frames of 12 bytes arrive at each ingress port at a rate of around 200 MHz, following a uniform distribution.  The input processing combines three frames into a single output frame before sending it out to the internal bus and the egress ports.  The internal bus runs at half the speed of the arrival rate of each individual port.  The ingress and egress ports have fixed-length queues.  The forward (ingress) queues are polled using a slot-weighted round-robin inspection.  Every forward port is mapped to exactly one output port, and multiple forward ports can be mapped to the same output port.  The mapping is stored in a routing table that is currently fixed.  Back pressure is applied when the queue on an output port exceeds a set threshold.


Figure 1: Block Diagram of the Traffic Management Switch

Model Parameters:

The following parameters are used for the model:

Incoming frame size = 12 B                  Incoming rate = 200 MHz
Outgoing frame size = 36 B                  Outgoing rate = 200 MHz

Ingress ports = variable, 64 to 1024        Ingress queue depth = 10 frames
Egress ports  = variable, 8 to 64           Egress queue depth  = 10 frames

Backward pressure trigger point = 7 frames in the output queue
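
As a rough illustration, the parameters above can be collected into one structure so that derived values and range checks live in a single place.  This is a minimal sketch in Python, not part of the VisualSim model.  The class name, the field names, and the reading of "200 MHz" as 200 million frames per second are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchParams:
    """Hypothetical container for the model parameters listed in the article."""
    ingress_ports: int = 64            # article range: 64 to 1024
    egress_ports: int = 8              # article range: 8 to 64
    frame_bytes_in: int = 12           # incoming frame size
    frames_per_output: int = 3         # three input frames form one output frame
    arrival_rate_hz: float = 200e6     # ASSUMPTION: read as frames per second
    ingress_queue_depth: int = 10      # frames
    egress_queue_depth: int = 10       # frames
    backpressure_trigger: int = 7      # frames in the output queue

    def __post_init__(self):
        # Reject values outside the ranges given in the parameter list.
        if not 64 <= self.ingress_ports <= 1024:
            raise ValueError("ingress_ports must be between 64 and 1024")
        if not 8 <= self.egress_ports <= 64:
            raise ValueError("egress_ports must be between 8 and 64")
        if self.backpressure_trigger > self.egress_queue_depth:
            raise ValueError("trigger point cannot exceed the egress queue depth")

    @property
    def frame_bytes_out(self) -> int:
        # 3 frames x 12 B = 36 B, matching the outgoing frame size.
        return self.frame_bytes_in * self.frames_per_output

    @property
    def ingress_bytes_per_second(self) -> float:
        # Offered load per ingress port under the frames-per-second assumption.
        return self.frame_bytes_in * self.arrival_rate_hz

    @property
    def ingress_per_egress(self) -> float:
        # Average number of ingress ports sharing one egress port.
        return self.ingress_ports / self.egress_ports


if __name__ == "__main__":
    p = SwitchParams(ingress_ports=1024, egress_ports=64)
    print(p.frame_bytes_out, p.ingress_per_egress, p.ingress_bytes_per_second)
```

Keeping the derived values as properties means a parameter sweep only has to change the primary fields.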

Figure 2: VisualSim Block Diagram of the Switch Model

System Model

The system model will consist of:

  1. Workload generator for the input data to the ports
  2. Input ports with associated queues
  3. Output ports with associated queues
  4. Many-to-one port mapping between input and output ports
  5. Slot-weighted polling inspection for the input ports
  6. Back pressure for the flow control
  7. Performance monitors
  8. Iteration over selected system parameters

Model Description:

VisualSim provides a set of modeling library blocks (modules) that are configurable to meet the specific system requirements.  The model for the described switch system is shown in Figure 2.  The model is constructed in the following parts using specific library modules.

  1. Workload Generator: The workload generates packets of fixed size (12 bytes) and at a uniform rate of 200 MHz.  This utilizes the data structure (Transaction) generator and expression processing block.  The random generator follows a uniform distribution.  The incoming packets are written into the respective ingress ports using the Write block.  The frame is a Data Structure in VisualSim.  This is a holder that contains field-value combinations.
  2. Input and Output Ports: The input ports are defined as an array of queues with a parameterized queue depth.  The output ports combine the latency for the transfer time on the bus and the queue processing using a single definition block.  The number of queues/ports is a separate parameter for the input and output.  Data is read from the input and output ports using the Read block.
  3. Port mapping:  Every input port is assigned to an output port.  For a randomized generation, modulo calculation could be employed to create the mapping.  The model determines this mapping at the start of the simulation.  The assignment is fixed throughout the simulation.
  4. Polling Process for the input ports: The model scans the content of each queue in order to identify one with at least three frames.  When the scan is successful, it provides the matched port information to a secondary scan.  The inspection algorithm uses a slot-weighted sequential polling process.  To speed up the simulation, the polling skips ports that do not match the three-frame criterion.  The latency associated with the polling must still be accounted for.
  5. Flow Control:  When an input port matches the three-frame criterion, the corresponding output port is checked to confirm that its queue is below the back-pressure trigger point.
  6. Transfer from input to output: The data is removed from the Ingress Queues using the Pop mechanism and placed in the Egress Queues after passing through the Xon-Xoff control queues.  Since the actual transfer time in the Egress Queue, excluding the queuing delay, is predetermined, it is calculated and provided to the output port along with the frame.  When the frame reaches the head of the queue in the output port, it is delayed by the pre-computed period and then sent out using the Read block.
  7. Performance Monitor: The input and output ports are defined using the resource library blocks in VisualSim.  These have pre-defined statistics that are generated on a trigger.  The latency is computed at the output of the output port.  This involves subtracting the frame's creation time from the current simulation time and then presenting the result to one of the Real-Time Viewers.
  8. Iterative parameters: A number of parameters are shown at the window level in Figure 2.  These parameters can be modified and the simulation can be repeated.
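
The interaction of the port mapping, the three-frame polling criterion, the back-pressure check and the latency measurement described above can be sketched in a few lines of Python.  This is a simplified, slot-based illustration under stated assumptions, not the VisualSim model.  It uses a plain round-robin scan rather than the slot-weighted variant (the weights are not given in the article), Bernoulli arrivals per slot, a modulo port mapping, at most one transfer per slot, and a fixed egress drain interval.  The function name and all of its parameters are hypothetical.

```python
import random
from collections import deque


def simulate(n_in=64, n_out=8, depth_in=10, threshold=7,
             arrival_prob=0.02, drain_every=4, slots=20_000, seed=1):
    """Return (latencies, dropped, blocked) for a slot-based switch sketch."""
    rng = random.Random(seed)
    ingress = [deque() for _ in range(n_in)]     # each entry is a creation time
    egress = [deque() for _ in range(n_out)]
    mapping = [i % n_out for i in range(n_in)]   # fixed many-to-one mapping
    pointer = 0                                  # where the next scan starts
    latencies, dropped, blocked = [], 0, 0

    for now in range(slots):
        # Workload: each ingress port receives at most one frame per slot.
        for queue in ingress:
            if rng.random() < arrival_prob:
                if len(queue) < depth_in:
                    queue.append(now)
                else:
                    dropped += 1                 # ingress queue overflow

        # Polling: find the next port holding at least three frames.
        for step in range(n_in):
            port = (pointer + step) % n_in
            if len(ingress[port]) < 3:
                continue                         # skip ports below the criterion
            out = mapping[port]
            if len(egress[out]) >= threshold:    # ASSUMPTION: trigger at >= threshold
                blocked += 1                     # back pressure holds this port
                continue
            # Combine three frames into one; keep the oldest creation time.
            created = ingress[port].popleft()
            ingress[port].popleft()
            ingress[port].popleft()
            egress[out].append(created)
            pointer = (port + 1) % n_in
            break                                # one transfer per slot

        # Egress: every drain_every slots each port sends its head frame.
        if now % drain_every == 0:
            for queue in egress:
                if queue:
                    latencies.append(now - queue.popleft())

    return latencies, dropped, blocked


if __name__ == "__main__":
    lat, dropped, blocked = simulate()
    print(f"frames out: {len(lat)}, mean latency: {sum(lat) / len(lat):.1f} slots")
    print(f"dropped: {dropped}, blocked scans: {blocked}")
```

Latency here is the current slot minus the creation slot of the oldest frame in the combined output frame, which mirrors the end-to-end measurement in item 7.  Changing `threshold`, `depth_in` or `drain_every` and re-running corresponds to the iteration over window parameters in item 8.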

Analysis

The statistics outputs and the packet latency plot are shown in Figures 3 to 5: Figures 3 and 4 show the standard statistics generated for each port, and Figure 5 shows the latency plot.  The statistics of the input and output queues indicate that the mean queue occupancy ranges between 1.09 and 2.02.  This suggests that the system can be scaled in terms of port buffering for significantly higher performance.  The latency plot shows a distinct randomness to packet delay, consistent with the random packet generation and the slot-weighted polling process in the switch.  As the number of input/output queues decreases, indicating reduced port processing requirements for the switch, the total latency range also decreases, although this is not explicitly shown.  This suggests that port processing might be performed in a different fashion to improve overall switch efficiency, while taking advantage of the available input queue capacity.  One possibility would be to have each port perform a distributed queue length check and pass along a packet if the input threshold is reached, simplifying the switch fabric operation and reducing the variance of packet delays.  The model run is repeated for various values of the window parameters.  The trends remain the same over a wide range of parameter values.
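
For readers who want to reproduce a figure such as mean queue occupancy from their own trace data, one common definition is the time-weighted average of the queue length.  The sketch below assumes a hypothetical trace of (time, queue length) pairs recorded at every change.  It does not claim to be how VisualSim computes its built-in statistics.

```python
def mean_occupancy(samples, end_time):
    """Time-weighted mean queue length.

    samples: list of (time, queue_length) pairs sorted by time, one per change.
    end_time: time at which the observation window closes.
    """
    if not samples:
        return 0.0
    area = 0.0
    # Each length holds from its own timestamp until the next change.
    for (t0, length), (t1, _) in zip(samples, samples[1:]):
        area += length * (t1 - t0)
    last_time, last_length = samples[-1]
    area += last_length * (end_time - last_time)
    span = end_time - samples[0][0]
    return area / span if span > 0 else float(last_length)


if __name__ == "__main__":
    # Queue holds 1 frame for 2 time units, 3 frames for 4, then 0 for 4.
    trace = [(0, 1), (2, 3), (6, 0)]
    print(mean_occupancy(trace, end_time=10))
```

A mean occupancy that stays well below the queue depth of 10 frames, as reported above, is what indicates unused buffering headroom.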

Figure 3: Statistics for one of the Output Ports- 3
Figure 4: Statistics for one of the Input ports- 63
Figure 5: Real-Time Viewing of Packet Latency

This article walked through the design and evaluation of a high-speed core switch using the VisualSim modeling environment, covering input and output port configuration, queue management, and latency analysis.  The simulation makes it possible to explore different parameters and their impact on performance metrics such as queue occupancy and packet latency.

Notably, the analysis reveals unused capacity in port buffering, which points to opportunities for improving system efficiency.  The same modeling approach can be used to evaluate alternative switch designs, such as the distributed queue length check suggested in the analysis.