Hardware Software Partitioning - Simulations and Discussions

Hardware/software partitioning is the process of dividing an application’s computations into two parts. One part executes sequential instructions on a microprocessor (the “software”). The other runs as parallel circuits on an IC fabric such as an ASIC or FPGA (the “hardware”). The split is chosen to meet design goals for metrics like performance, power, size, and cost. The circuit part commonly acts as a coprocessor for the microprocessor. Partitioning is done in the earliest stages of design, when there is the most freedom to make changes. Hardware/software partitioning aims to exploit the synergy between hardware and software. Using the VisualSim Architect simulation tool, designers can construct models of their proposed system and conduct performance, power, and functional analysis.
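The sketch below makes the idea concrete with one simple partitioning heuristic. It is not the method VisualSim Architect uses. It assumes the tasks run one after another and starts with every task in software. It then greedily moves the task that saves the most time per unit of hardware area onto the accelerator, and repeats until a deadline is met. The `Task` fields, the task names, and all timing and area numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    sw_time_us: float  # hypothetical execution time on the microprocessor
    hw_time_us: float  # hypothetical execution time on the accelerator
    hw_area: float     # hypothetical relative cost of implementing it in logic

def partition(tasks: list[Task], deadline_us: float) -> tuple[set[str], float]:
    """Greedy sketch: start all-software, then move tasks to hardware
    (best time saved per unit area first) until the deadline is met.
    Assumes tasks execute sequentially, so total time is a simple sum."""
    in_hw: set[str] = set()
    total_us = sum(t.sw_time_us for t in tasks)
    candidates = sorted(
        (t for t in tasks if t.hw_time_us < t.sw_time_us),
        key=lambda t: (t.sw_time_us - t.hw_time_us) / t.hw_area,
        reverse=True,
    )
    for t in candidates:
        if total_us <= deadline_us:
            break
        in_hw.add(t.name)
        total_us -= t.sw_time_us - t.hw_time_us
    return in_hw, total_us

# Hypothetical task set; only the name "rotate_frame" echoes the text.
tasks = [
    Task("decode", 40, 30, 5),
    Task("rotate_frame", 50, 5, 2),
    Task("encode", 30, 25, 4),
]
print(partition(tasks, deadline_us=80))
```

With an 80 µs deadline, only `rotate_frame` moves to hardware, and total time drops from 120 µs to 75 µs. A tighter deadline pulls in more tasks. Real partitioners also weigh power and communication cost, which this sketch ignores.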

In the hardware/software partitioning model, we analyzed the hardware and software partitioning schemes for performance, power consumption, and the number of video frames received. We considered two approaches to meeting the performance and power requirements. The first processes all application tasks in software. The second moves a task called rotate frame to a hardware accelerator to improve performance.

We focus mainly on two metrics: the number of video frames processed and the total power consumed.

The following were considered while modeling:

  1. The HW architecture, consisting of processor, cache, memory, bus, and peripheral modules.
  2. The SW architecture, consisting of dummy drivers mapped to C code and a real use case with SW blocks as models.
  3. The HW-SW partitioning with power consumption, and results that show the trade-off between the power consumed and the number of video frames received.

Simulation Model

The VisualSim Architect model for hardware-software partitioning.

VisualSim Architect is modeling and simulation software used for architecture exploration of electronic systems and semiconductors.

Hardware software partitioning is an example of performance modeling using a set of standard IP blocks and a custom behavior flow diagram. The purpose of this model is to select the hardware platform for an MPEG video application. The requirement is to process 13,000 video frames within 10 ms while consuming less than 1 W of power.
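Simple arithmetic turns the requirement into a per-frame time budget and an energy budget for the window. The sketch below assumes the 1 W limit applies to average power over the 10 ms window. The helper `meets_requirement` is hypothetical.

```python
# Targets stated in the text: 13,000 frames within 10 ms, below 1 W.
REQUIRED_FRAMES = 13_000
WINDOW_S = 10e-3
POWER_BUDGET_W = 1.0

frame_rate_hz = REQUIRED_FRAMES / WINDOW_S       # sustained frames per second
budget_per_frame_s = WINDOW_S / REQUIRED_FRAMES  # time available per frame
energy_budget_j = POWER_BUDGET_W * WINDOW_S      # energy = power x time

def meets_requirement(frames_in_window: int, avg_power_w: float) -> bool:
    """Hypothetical check: assumes the power limit applies to average power."""
    return frames_in_window >= REQUIRED_FRAMES and avg_power_w < POWER_BUDGET_W

print(f"{frame_rate_hz:,.0f} frames/s")
print(f"{budget_per_frame_s * 1e9:.1f} ns per frame")
print(f"{energy_budget_j * 1e3:.1f} mJ per window")
```

In other words, the platform must sustain 1.3 million frames per second. That leaves roughly 769 ns per frame, and the platform may spend at most 10 mJ per 10 ms window.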

The model has three parts: a behavior definition, a hardware architecture, and a mapping between them. A system’s behavior is how it responds to a request, whereas a use case describes the system’s functionality. In the hardware architecture, the power table block works in conjunction with the battery. Together they are used to analyze power consumption, battery discharge, dynamic system changes, and the power state changes of the devices, which affect system timing. The power table block helps study and model the power infrastructure. The hardware platform consists of the ARM processor, the AHB and AXI buses, SRAM, Flash, SDRAM, and a hardware accelerator.
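The sketch below illustrates what a power table captures. Each device has a power value per power state. The energy drawn from the battery is the sum of power multiplied by the time spent in each state. The state names, device names, power values, and battery capacity are all hypothetical. This is not the format of VisualSim's Power Table block.

```python
from enum import Enum

class PowerState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    OFF = "off"

# Hypothetical power table (mW per state); not VisualSim's values or format.
POWER_TABLE_MW = {
    "ARM": {PowerState.ACTIVE: 250.0, PowerState.STANDBY: 20.0, PowerState.OFF: 0.0},
    "HW_ACCEL": {PowerState.ACTIVE: 400.0, PowerState.STANDBY: 50.0, PowerState.OFF: 0.0},
}

Timeline = list[tuple[PowerState, float]]  # (state, duration_ms) intervals

def energy_j(device: str, timeline: Timeline) -> float:
    """Energy for a device's timeline. mW * ms = microjoules, so scale by 1e-6."""
    return sum(POWER_TABLE_MW[device][state] * ms for state, ms in timeline) * 1e-6

def remaining_charge_j(capacity_j: float, usage: dict[str, Timeline]) -> float:
    """Battery energy left after every device's timeline has run."""
    return capacity_j - sum(energy_j(dev, tl) for dev, tl in usage.items())

usage = {
    "ARM": [(PowerState.ACTIVE, 6.0), (PowerState.STANDBY, 4.0)],
    "HW_ACCEL": [(PowerState.ACTIVE, 2.0), (PowerState.OFF, 8.0)],
}
print(remaining_charge_j(10.0, usage))  # hypothetical 10 J battery
```

Keeping power per state in a table lets a state change (for example, switching the accelerator off) change the energy drawn without touching the behavior model.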

For the software architecture, we defined the behavior as a flow diagram, represented by the green blocks at the bottom of the model. Defining software behavior as a flow diagram is reliable and convenient for both representation and mapping. In the software architecture, use case A runs a simple traffic model.

In use case B, each green block represents a specific task of the application. The rotate frame block is where the partitioning scheme is set to hardware or software, according to the application. A faster processor can help the designer meet the performance targets, but power consumption varies with the processor used.
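The effect of the “Partitioning” parameter on the rotate frame task can be sketched as a dispatch. The dispatch sends that one task either to the ARM core or to the accelerator. This is a simplified sketch, not the VisualSim model. It assumes the tasks of a frame run strictly one after another with no pipelining. All latencies are hypothetical, as are the task names other than rotate frame.

```python
# Hypothetical per-frame task latencies in nanoseconds (integers avoid rounding).
SW_LATENCY_NS = {"decode": 300, "rotate_frame": 600, "display": 100}
HW_ROTATE_LATENCY_NS = 50  # hypothetical accelerator latency for rotate_frame

def frame_latency_ns(partitioning: str) -> int:
    """Latency of one frame when rotate_frame runs in 'SW' or 'HW'."""
    if partitioning not in ("SW", "HW"):
        raise ValueError(f"unknown partitioning: {partitioning!r}")
    latency = 0
    for task, sw_ns in SW_LATENCY_NS.items():
        if task == "rotate_frame" and partitioning == "HW":
            latency += HW_ROTATE_LATENCY_NS  # offloaded to the accelerator
        else:
            latency += sw_ns  # executed on the ARM core
    return latency

def frames_in_window(partitioning: str, window_ns: int = 10_000_000) -> int:
    """Frames completed in a 10 ms window, assuming no pipelining."""
    return window_ns // frame_latency_ns(partitioning)

for mode in ("SW", "HW"):
    print(mode, frame_latency_ns(mode), frames_in_window(mode))
```

Because the numbers are invented, the frame counts here do not reproduce the simulation results. The point is only that offloading the slowest task shortens every frame. A real model would also account for bus transfers to and from the accelerator.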

The simulations were run for three standard cases, defined as follows:

CASE 1 - Set the parameter “Partitioning” to “SW”.

CASE 2 - Set the parameter “Partitioning” to “HW”.

CASE 3 - Apply power gating to the hardware accelerator, turning it off when it is not in use.

Results

POWER USAGE

CASE 1 - Software partitioning.

CASE 2 - Hardware partitioning.

CASE 3 - Power gating for the hardware accelerator.

The results show that when all tasks are processed in software, the power consumption is 0.4 mW. We then move the rotate frame task to the hardware accelerator by setting the parameter “Partitioning” to “HW”. The power consumption increases to 8 mW. After power gating is introduced for the hardware accelerator, the power consumption decreases to 1.2 mW. When power consumption is the critical metric, running all tasks on the ARM processor gives the best result, since it has the lowest power consumption.
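A couple of ratios show the relative changes behind these numbers. The values below are the power figures reported above. The dictionary keys are just labels.

```python
# Power consumption reported in the text for the three cases, in mW.
power_mw = {"SW": 0.4, "HW": 8.0, "HW + gating": 1.2}

# How much more power hardware partitioning draws than software partitioning.
hw_vs_sw = power_mw["HW"] / power_mw["SW"]
# Fraction of the hardware-partitioned power removed by power gating.
gating_saving = 1 - power_mw["HW + gating"] / power_mw["HW"]

print(f"HW partitioning: {hw_vs_sw:.0f}x the SW power")
print(f"Power gating saves {gating_saving:.0%} of the HW power")
```

Moving rotate frame to hardware raises power twentyfold. Gating removes 85% of the hardware-partitioned power, although the result is still three times the software-only figure.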

VIDEO FRAMES COMPLETION TIME

CASE 1 - Software partitioning.

CASE 2 - Hardware partitioning.

CASE 3 - Power gating for the hardware accelerator.

The video frames completion time plot gives the number of video frames processed successfully. Over the 10 ms simulation window, processing all tasks in software completes 4880 frames. We then move the rotate frame task to the hardware accelerator by setting the parameter “Partitioning” to “HW”. This completes 10960 frames, an increase over software partitioning. After power gating is introduced for the hardware accelerator, 10840 frames are completed. Thus, if the designer requires the maximum number of frames, hardware partitioning gives the best result.
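The frame counts above can likewise be expressed as ratios and compared with the 13,000-frame target from the requirement. The numbers are those reported in the text.

```python
# Frames completed in the 10 ms window, as reported in the text.
frames = {"SW": 4880, "HW": 10960, "HW + gating": 10840}
TARGET_FRAMES = 13_000

speedup = frames["HW"] / frames["SW"]                      # HW vs SW throughput
gating_penalty = 1 - frames["HW + gating"] / frames["HW"]  # throughput lost to gating

for case, n in frames.items():
    print(f"{case:12s} {n:6d} frames ({n / TARGET_FRAMES:.1%} of target)")
print(f"HW/SW throughput ratio: {speedup:.2f}")
print(f"Throughput lost to gating: {gating_penalty:.2%}")
```

Hardware partitioning delivers about 2.25 times the software throughput. Power gating costs only about 1.1% of that throughput.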

AVERAGE POWER PLOT

CASE 1 - Software partitioning.

CASE 2 - Hardware partitioning.

CASE 3 - Power gating for the hardware accelerator.

For software partitioning, the average power is below 1 W. We then move the rotate frame task to the hardware accelerator by setting the parameter “Partitioning” to “HW”. The average power increases to 1.2 W. After power gating is introduced for the hardware accelerator, the average power drops to 0.98 W, which is below the 1 W target. Thus, power gating brings the hardware-partitioned design back under the average power target.
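Average power is total energy divided by elapsed time. It is measured in watts, while the energy itself is in joules. Below is a minimal sketch of computing it from a piecewise-constant instantaneous power trace. The trace values are hypothetical, not taken from the plots.

```python
def energy_and_average_power(samples: list[tuple[float, float]],
                             t_end_s: float) -> tuple[float, float]:
    """samples: (t_start_s, power_w) pairs sorted by time; each power value
    holds until the next sample (or t_end_s). Returns (energy_j, avg_power_w)."""
    energy_j = 0.0
    for i, (t0, p_w) in enumerate(samples):
        t1 = samples[i + 1][0] if i + 1 < len(samples) else t_end_s
        energy_j += p_w * (t1 - t0)  # watts x seconds = joules
    duration_s = t_end_s - samples[0][0]
    return energy_j, energy_j / duration_s

# Hypothetical trace: 1.5 W for the first 4 ms, then 0.6 W until 10 ms.
trace = [(0.0, 1.5), (0.004, 0.6)]
print(energy_and_average_power(trace, t_end_s=0.010))
```

In this trace, the 1.5 W peak still averages out to 0.96 W over the 10 ms window, for a total of 9.6 mJ.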

INSTANT POWER FOR HARDWARE ENGINE

CASE 1 - Software partitioning.

CASE 2 - Hardware partitioning.

CASE 3 - Power gating for the hardware accelerator.

For software partitioning, the instantaneous power of the hardware engine is 0.01 mW, which shows that all tasks are running on the ARM processor. After the rotate frame task is moved to the accelerator, the instantaneous power of the hardware engine changes in both the hardware partitioning and the power-gating cases. With hardware partitioning, it rises to 0.2 mW and remains constant. With power gating, it rises to 0.6 mW and then falls back to 0.01 mW.
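A small model can illustrate the difference between the gated and ungated accelerator traces. In this model, an idle accelerator either keeps drawing idle power or is switched off. When switched off, it pays a one-slot wake-up cost when work arrives again. This is a hypothetical sketch. The slot-based timing, the wake-up overhead, and all power values are assumptions, not parameters of the VisualSim model.

```python
def accel_power_trace(busy: list[bool], active_mw: float, idle_mw: float,
                      gated_mw: float, wake_mw: float, gating: bool) -> list[float]:
    """Per-slot accelerator power in mW for a busy/idle pattern (hypothetical model)."""
    trace = []
    powered = True  # assume the accelerator starts powered on
    for is_busy in busy:
        if is_busy:
            # Waking a gated accelerator costs extra power in the first busy slot.
            trace.append(active_mw + (0.0 if powered else wake_mw))
            powered = True
        elif gating:
            trace.append(gated_mw)  # switched off: only residual power
            powered = False
        else:
            trace.append(idle_mw)   # idle but still powered
    return trace

busy = [True, False, False, True, False]
params = dict(active_mw=0.5, idle_mw=0.2, gated_mw=0.01, wake_mw=0.1)
ungated = accel_power_trace(busy, gating=False, **params)
gated = accel_power_trace(busy, gating=True, **params)
print(sum(ungated) / len(busy), sum(gated) / len(busy))
```

The gated trace has a higher peak, because waking up adds to the active power. Its idle slots, however, sit near zero, so its average over the pattern is lower than the ungated trace.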

CONCLUSION

The goal is to achieve optimized power along with maximum throughput, where throughput is the number of video frames processed. The target is about 13,000 video frames while keeping power consumption under 1 W.

The results and plots show that when the tasks are processed in software, we get 4880 video frames at a power consumption of 0.4 mW. The desired number of video frames is not reached, so the performance criterion is not met, although the power target is. Because running all tasks on the ARM processor did not meet the performance requirement, we moved the rotate frame task to an accelerator. With the parameter “Partitioning” set to “HW”, performance improves substantially to 10960 video frames, but power consumption rises to 8 mW. We then introduced power gating for the hardware accelerator. Performance drops slightly to 10840 video frames, while power consumption falls to 1.2 mW.

This shows the trade-off clearly. The most important point is that performance and power are being optimized simultaneously. By giving up a small amount of performance, the designer can achieve much lower power consumption. The right balance depends on the application at hand. In smartphone applications, for example, this is crucial, because power consumption and performance are equally important. The VisualSim Architect model lets the designer evaluate both performance and power consumption against the desired targets.
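One way to make the trade-off explicit is to treat frames and power as two objectives. We then keep only the configurations that no other configuration beats on both. The sketch below uses the three results reported above. The `best_under_budget` selection rule and the 2 mW budget in the example are hypothetical.

```python
from typing import NamedTuple, Optional

class Result(NamedTuple):
    case: str
    frames: int      # frames completed in the 10 ms window (from the text)
    power_mw: float  # reported power consumption (from the text)

RESULTS = [
    Result("SW", 4880, 0.4),
    Result("HW", 10960, 8.0),
    Result("HW + gating", 10840, 1.2),
]

def dominates(a: Result, b: Result) -> bool:
    """a dominates b if it is no worse on both metrics and better on at least one."""
    return (a.frames >= b.frames and a.power_mw <= b.power_mw
            and (a.frames > b.frames or a.power_mw < b.power_mw))

def pareto_front(results: list[Result]) -> list[Result]:
    """Configurations not dominated by any other configuration."""
    return [r for r in results if not any(dominates(o, r) for o in results)]

def best_under_budget(results: list[Result], budget_mw: float) -> Optional[Result]:
    """Hypothetical rule: most frames among configurations within the power budget."""
    feasible = [r for r in results if r.power_mw <= budget_mw]
    return max(feasible, key=lambda r: r.frames) if feasible else None

print([r.case for r in pareto_front(RESULTS)])
print(best_under_budget(RESULTS, budget_mw=2.0))  # hypothetical 2 mW budget
```

All three configurations are on the Pareto front, since none is better on both metrics. The choice therefore depends on the budget. Under a hypothetical 2 mW limit, the power-gated configuration gives the most frames.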