Accelerating Architecture Exploration for FPGA Selection and System Design

You can create an optimized specification that achieves performance, reliability, and cost goals for use in production systems.

Performance analysis and early architecture exploration ensure that you select the right FPGA platform and achieve optimal partitioning of the application between the fabric and software. This early exploration is referred to as rapid visual prototyping. Mirabilis Design’s VisualSim software simulates the FPGA and board using models that are developed quickly from pre-built, parameterized modeling libraries in a graphical environment.

These library models resemble the elements available on AMD-Xilinx® FPGAs, including ARM Cortex and MicroBlaze™ processors; AMBA AXI bus; DMA; interrupt controllers; DDR; BRAM; LUTs; DSP48E; logic operators; and fabric devices. The components are connected to describe a given AMD-Xilinx Kintex/Ultrascale platform and simulated for different operating conditions such as traffic, user activity, and operating environment.

More than 200 standard analysis outputs include latency, utilization, throughput, hit ratio, state activity, context switching, power consumption, and processor stalls. VisualSim accelerates architecture exploration by reducing typical model development time from months to days.

I can illustrate the advantages of early architecture exploration with an example from one of our customers, who was experiencing difficulty with a streaming media processor implemented using a Kintex/Ultrascale™ device. The design could not achieve the required performance and was dropping every third frame, even though utilization at each of the individual devices was below 50%. A visual simulation that combined the peripheral and the FPGA revealed that the video frames were being transferred on the same clock sync as the audio frames along a shared internal bus.

As the project was in the final stages of development, making architecture changes to address the problem would have delayed shipment by an additional six months. Further refinement of the VisualSim model found that by giving the audio frames a higher priority, the design could achieve the desired performance, as the audio frames would also be available for processing. The project schedule was delayed by approximately 1.5 months.

If the architecture had been modeled early in the design cycle, the design cycle could have been reduced by 3 months and the 1.5-month re-spin eliminated, bringing the product to market approximately 5 months sooner. Moreover, with a utilization of 50%, control processing could have been moved to the same FPGA. This modification might have saved one external processor, one DDR controller, and one memory board.

Rapid Visual Prototyping

Rapid visual prototyping can help you make better partitioning decisions. Evaluations with performance and architectural models can help eliminate clearly inferior choices, point out major problem areas, and evaluate hardware/software trade-offs. Simulation is cheaper and faster than building hardware prototypes and can also help with software development, debugging, testing, documentation, and maintenance. Furthermore, early partnership with customers using visual prototypes improves feedback on design decisions, reducing time to market and increasing the likelihood of product success (Figure 1).

Figure 1 – Translating a system concept into rapid visual prototyping

A design-level specification captures a new or incremental approach to improve system throughput, power, latency, utilization, and cost; these improvements are typically referred to as price/power/performance trade-offs. At each step in the evolution of a design specification, well-intentioned modifications or improvements may significantly alter the system requirements. The time required to evaluate a design modification, whether before or after the system design process has started, can vary dramatically, and a visual prototype will reduce that evaluation time.

To illustrate the use of the rapid visual prototype, let’s consider a Layer 3 switch implemented using a Kintex/Ultrascale FPGA. The Layer 3 switch is a non-blocking switch, and the primary consideration is to maintain total utilization across the switch.

Current Situation

In product design, three factors are certain: specifications change, nondeterministic traffic creates performance uncertainty, and AMD-Xilinx FPGAs get faster. Products operate in environments where the processing and resource consumption are a function of the incoming data and user operations. FPGA-based systems used for production must meet quality, reliability, and performance metrics to address customer requirements. What is the optimal distribution of tasks into hardware acceleration and software on FPGAs and other board devices? How can you determine the best FPGA platform to meet your product requirements and attain the highest performance at the lowest cost?

Early Exploration Solution

VisualSim provides pre-built components that are graphically instantiated to describe hardware and software architectures. The applications and use cases are described as flow charts and simulated on the VisualSim model of the architecture using multiple traffic profiles. This approach reduces the model construction burden and allows you to focus on analysis and interpretation of results. It also helps you optimize product architectures by running simulations with application profiles to explore FPGA selection; hardware versus software decisions; peripheral devices versus performance; and partitioning of behavior on target architectures.

Design Optimization

You can use architecture exploration (Figure 2) to optimize every aspect of an FPGA specification, including:

  • Task distribution on MicroBlaze and ARM Cortex processors
  • Sizing the processors
Figure 2 – Architecture model of the FPGA platform and peripheral using VisualSim FPGA components

An analysis conducted using VisualSim includes packet size versus latency, protocol overhead versus effective bandwidth, and resource utilization.
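As a rough back-of-the-envelope illustration of two of these relationships, the sketch below computes effective bandwidth as a function of per-packet protocol overhead, and the serialization latency of a packet as a function of its size. It is not VisualSim output or VisualSim code; the link rate, the overhead figure, and the function names are assumptions chosen only for illustration.

```python
def effective_bandwidth_bps(link_rate_bps, payload_bytes, overhead_bytes):
    """Portion of the raw link rate that carries payload rather than overhead."""
    return link_rate_bps * payload_bytes / (payload_bytes + overhead_bytes)


def serialization_latency_s(link_rate_bps, payload_bytes, overhead_bytes):
    """Time to place one packet (payload plus overhead) on the link."""
    return (payload_bytes + overhead_bytes) * 8 / link_rate_bps


# Assumed values for illustration only; they are not taken from the article.
LINK_RATE_BPS = 1_000_000_000  # 1 Gbit/s link
OVERHEAD_BYTES = 64            # per-packet header and framing cost

for payload in (256, 1512):
    bandwidth = effective_bandwidth_bps(LINK_RATE_BPS, payload, OVERHEAD_BYTES)
    latency = serialization_latency_s(LINK_RATE_BPS, payload, OVERHEAD_BYTES)
    print(f"{payload:5d} B payload: {bandwidth / 1e6:7.1f} Mbit/s effective, "
          f"{latency * 1e6:6.2f} us per packet")
```

Larger packets amortize the fixed overhead and raise effective bandwidth, but each packet occupies the link longer. A simulation adds what this arithmetic leaves out: contention, buffering, and processing delays.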

In reference to the Layer 3 example, your decisions would include using:

  • The on-chip ARM Cortex A72 or external processor for routing operations
  • Encryption algorithms using the DSP function blocks or fabric multipliers and adders
  • A dedicated MicroBlaze processor for traffic management or fabric
  • ARM Cortex A72 for control or proxy rules processing
  • TCP offload using an external coprocessor or MicroBlaze processor

Can a set of parallel MicroBlaze processors with external SDRAM support inline spyware detection? What will the performance be when the packet size changes from 256 bytes to 1,512 bytes? How can you plan for future applications such as mobile IP? You can extend the exploration to consider the interfaces between the FPGA and board peripherals, such as SDRAM. As the ARM Cortex A72 will be sharing the bus with the MicroBlaze processor, the effective bus throughput is a function of the number of data requests and the size of the local block RAM buffers. For example, you could enhance the MicroBlaze processor with a coprocessor to do encryption at the bit level in the data path. You could also use the AMBA AXI bus to connect the peripheral SDRAM to the ARM A72 while the Network-on-chip (NoC) is used for the MicroBlaze processor.

You can reuse the VisualSim architecture model for exploration of the software design: identifying threads with high resource consumption, balancing load across multiple MicroBlaze processors, and splitting operations into smaller threads. If a new software task or thread has data-dependent priorities, the effect of the priorities and task-arrival times on the overall processing is a primary modeling question. If you change the priority on a critical task, will this be sufficient to improve throughput and reduce task latency?

In most cases, this will be true, but there may be a relative time aspect to a critical task that can reduce latencies on lower priority tasks such that both benefit from the new ordering.
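The priority question can be posed in miniature with a few lines of Python. The sketch below is a simplified, non-preemptive priority scheduler on a single processor; it is not how VisualSim models scheduling, and the task names, arrival times, and service times are hypothetical. It compares task latencies before and after the critical task's priority is raised.

```python
def simulate(tasks):
    """Non-preemptive priority scheduling on one processor.

    tasks: list of (name, arrival, service, priority); a lower priority
    value runs first. Returns {name: latency}, latency = finish - arrival.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # order by arrival time
    clock = 0
    latency = {}
    while pending:
        ready = [t for t in pending if t[1] <= clock]
        if not ready:
            clock = pending[0][1]  # idle until the next task arrives
            continue
        task = min(ready, key=lambda t: (t[3], t[1]))  # best priority, then oldest
        pending.remove(task)
        clock += task[2]  # run the task to completion
        latency[task[0]] = clock - task[1]
    return latency


# Hypothetical workload: (name, arrival, service time, priority).
baseline = [("bulk", 0, 4, 1), ("critical", 0, 1, 2)]
raised = [("bulk", 0, 4, 2), ("critical", 0, 1, 1)]

print("baseline:", simulate(baseline))
print("raised:  ", simulate(raised))
```

In this toy case the critical task's latency drops while the bulk task waits slightly longer. Whether lower-priority tasks also benefit depends on the relative arrival and service times, which is what the simulation is meant to reveal.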

If peak processing is above 80% for a system processing element, then the system may be vulnerable to tasks added at the last minute, or to future growth of the system itself.
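A simple way to apply this rule of thumb to simulation results is to project the measured peak utilization forward by an expected growth margin and compare it with the 80% threshold. The sketch below assumes utilization scales linearly with added load, which is a simplification; the function name and the growth figure are illustrative, not part of VisualSim.

```python
def has_headroom(peak_utilization, growth_fraction, threshold=0.80):
    """Project peak utilization under assumed growth and compare to a threshold.

    Assumes utilization grows linearly with load (a simplification).
    Returns (projected_utilization, within_threshold).
    """
    projected = peak_utilization * (1 + growth_fraction)
    return projected, projected <= threshold


# Hypothetical processing elements with peak utilization taken from a simulation run.
for name, peak in (("element_a", 0.65), ("element_b", 0.75)):
    projected, ok = has_headroom(peak, growth_fraction=0.20)
    status = "ok" if ok else "exceeds threshold"
    print(f"{name}: peak {peak:.0%}, projected {projected:.0%} ({status})")
```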

Model Construction

System modeling of the Layer 3 switch starts by compiling the list of functions (independent of implementation), expected processing time, resource consumption, and system performance metrics. The next step is to capture a flow diagram in VisualSim using a graphical block diagram editor (Figure 3). The flow diagrams are UML diagrams annotated with timing information. The functions in the flow are represented as delays; timed queues represent contention; and algorithms handle the data movement. The flow diagram comprises data processing, control, and any dependencies.
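To make the idea of functions as delays and timed queues as contention concrete, here is a minimal Python sketch of two chained stages, each a FIFO queue in front of a single resource with a fixed processing delay. This is a hand-written illustration rather than VisualSim's modeling API, and the stage names, delays, and arrival times are assumptions.

```python
def run_timed_queue(arrivals, service_delay):
    """FIFO queue in front of one resource with a fixed processing delay.

    Returns a list of (arrival, start, finish) tuples, one per packet.
    """
    records = []
    free_at = 0  # time at which the resource next becomes idle
    for arrival in sorted(arrivals):
        start = max(arrival, free_at)  # wait if the resource is still busy
        finish = start + service_delay
        records.append((arrival, start, finish))
        free_at = finish
    return records


# Hypothetical packet arrival times and per-function delays (arbitrary time units).
packet_arrivals = [0, 1, 2, 10]
flow_mgmt = run_timed_queue(packet_arrivals, service_delay=2)
routing = run_timed_queue([finish for _, _, finish in flow_mgmt], service_delay=3)

# End-to-end delay: exit time from the last stage minus original arrival time.
end_to_end = [r[2] - a for a, r in zip(sorted(packet_arrivals), routing)]
print("end-to-end delays:", end_to_end)
```

The first three packets arrive close together and queue behind one another, so their end-to-end delay grows; the fourth arrives after the burst has drained and sees only the two processing delays.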

Data flow includes flow and traffic management, encryption, compression, routing, proxy rules, and TCP protocol handling. The control path contains the controller algorithm, branch decision trees, and weighted polling policies. VisualSim builds scenarios to simulate the model and generate statistics. The scenarios are multiple concurrent data flows such as connection establishment (slow path); inline data transfer after setup of secure channel (fast path); and protocol and data-specific operation sequences based on data type identification.

Figure 3 – Flow chart describing the application flow diagram in VisualSim
Figure 4 – Analysis output for the Layer 3 switch design

You can use this model of the timed flow diagram for functional correctness and validation of the flows. VisualSim uses random traffic sequences to trigger the model.

The traffic sequences are defined as data structures in VisualSim; a traffic generator emulates application-specific traffic. You use this timed flow diagram to select the FPGA platform and conduct the initial hardware and software partitioning. The FPGA components and peripheral hardware are then defined using the FPGA Modeling Toolkit.

The functions of the flow diagram are mapped to these architecture components. For each function, VisualSim automatically collects the end-to-end delay and number of packets processed in a time period. For the architecture, VisualSim plots the average processing time, utilization, and effective throughput (Figure 4). These metrics are matched against the requirements. Exploration of the mapping and architecture is possible by varying the link and replacing the selected FPGA with other FPGAs.
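The metrics named above are straightforward to derive from a log of per-packet timestamps. The sketch below shows one plausible way to compute them in Python; the record layout, the observation window, and the requirement limits are assumptions for illustration and do not reflect VisualSim's actual output format.

```python
def summarize(records, window):
    """Summarize per-packet (arrival, start, finish) records for one resource.

    window is the observation period in the same time units as the records;
    all records are assumed to finish within it.
    """
    delays = [finish - arrival for arrival, _, finish in records]
    busy_time = sum(finish - start for _, start, finish in records)
    return {
        "mean_delay": sum(delays) / len(delays),
        "utilization": busy_time / window,
        "throughput": len(records) / window,  # packets per time unit
    }


# Hypothetical log and requirements (arbitrary time units).
log = [(0, 0, 2), (1, 2, 4), (2, 4, 6), (10, 10, 12)]
stats = summarize(log, window=20)
requirements = {"mean_delay": 3.0, "utilization": 0.80}  # upper limits

for metric, limit in requirements.items():
    verdict = "meets" if stats[metric] <= limit else "violates"
    print(f"{metric}: {stats[metric]:.2f} ({verdict} limit {limit})")
print(f"throughput: {stats['throughput']:.2f} packets per time unit")
```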

The outcome of this effort will be selecting the right FPGA family, correctly sizing the peripherals, and choosing the right number of BRAMs, LUTs, DSP blocks, and MicroBlaze processors. You can add overhead to the models to capture growth requirements and ensure adequate performance.

Conclusion

Early architecture exploration ensures a highly optimized product for quality, reliability, performance, and cost. It provides direction for implementation plans, reduces the number of tests you need to conduct, and can shrink the development cycle by almost 30%.

VisualSim libraries of standard FPGA components, flow charts defining the behavior, traffic models, and pre-built analysis probes ensure that system design is no longer time-consuming, difficult to perform, or prone to questionable results. The reduction in system modeling time and the availability of standard component models provide a single environment for designers to explore both hardware and software architectures.

 

For a free trial of the FPGA Modeling Toolkit, register at:

https://www.mirabilisdesign.com/download-login/

To learn more about VisualSim, visit www.mirabilisdesign.com, where there are models embedded in the HTML pages. You can modify parameters and execute from within your web browser without having to download custom software.

To access VisualSim Cloud, register at https://www.mirabilisdesign.com/visualsim-cloud-login/