Dual Processor Design - Part 1


Learning Objectives

Use the framework available in VisualSim to explore the architecture of a multi-core processor system for the following objectives:

Design Methodology


We consider a system with either a single-core or a dual-core processor. A traffic generator sends application-specific transactions to a target processor. The system also includes a main memory that the processor accesses across a shared bus.

Block diagrams of the system with a single-core and with a dual-core processor are shown below.


Figure 1: Single Core Block Diagram

Figure 2: Dual Core Block Diagram

VisualSim Model

The VisualSim models for this tutorial are available at the following locations:
$VS\doc\Training_Material\Tutorial\WebHelp\Tutorial\Architecture_Exploration\Processor_Modeling_Tutorial\Processor_Modeling_Part1.xml
$VS\doc\Training_Material\Tutorial\WebHelp\Tutorial\Architecture_Exploration\Processor_Modeling_Tutorial\Processor_Modeling_Part1_TaskGen.xml


Refer to Figure 3 (single core) and Figure 4 (dual core) for the intended VisualSim models.



Figure 3: Single Core VisualSim Model

Figure 4: Dual Core VisualSim Model

Building the VisualSim Model

Assumptions


The procedures outlined below make the following assumptions:


Blocks Used


The following blocks are used in this model.

1. Digital Simulator: Implements the discrete-event Model of Computation (MoC). The simulator maintains a notion of current time and processes events in chronological order. It is used to model elements that change with time, such as hardware, software, and networks.

2. Traffic: Outputs a new Data Structure (DS) at time intervals specified by the "Time_Distribution" setting. A Data Structure is also known as a transaction and contains a list of field names and values.

3. InstructionSet: Creates instruction references for the processor.

4. ExpressionList: Executes a sequence of expressions in order. The default block has one input and one output; the user can add more input and output ports.

5. Processor: Models variations of commercial and proprietary processors, providing accurate timing, data flow, throughput, and power consumption of the processor.

6. Dynamic Mapper: Enables the mapping of a behavior, task, or function onto a target processor.

7. Text_Display: Displays the values arriving on the input port in a text display dialog.

8. TimeDataPlotter: Plots the incoming data on the Y-axis against the current simulation time on the X-axis. Each wire connected to the block input is treated as a separate dataset and plotted separately.

9. BusArbiter: The arbiter for a BusInterface.

10. BusInterface: Connects devices to the BusArbiter and has a queue for each port.

11. RAM: Combines the operation of a basic memory controller (delay function) and the memory array. Handles prefetch, read, write, refresh, and controller operations.

12. ArchitectureSetup: Handles all the address mapping, routing, plotting, statistics, and debugging for the hardware modeling components.

13. Cache: Emulates a cache in an architectural model. It has interfaces on both sides of the block for connectivity.

14. ResourceStatistics: Outputs or resets the statistics for all the SystemResource, Channel, Channel_N, Server, and Queues in the model.

15. Timing Diagram: Enables the user to view a Gantt chart of the activity of the main architecture devices.

16. Task Generator: Generates new tasks to execute on the processor. The mix of instructions in a task is read from the file referenced by the parameter Read_My_Instruction_Mix_Table.
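The Digital Simulator and Traffic entries above describe a discrete-event engine that keeps a current time and processes time-stamped events in order, plus a source that emits a transaction (a set of field names and values) at fixed intervals. The Python sketch below illustrates that idea under simple assumptions; it is not VisualSim's implementation, and the class name, function name, and the `created_at` field are hypothetical. Only `A_Task_Name` and the 1 ms task period come from this tutorial.

```python
import heapq
import itertools


class DiscreteEventSimulator:
    """Minimal discrete-event kernel: a current time plus a time-ordered event queue."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps same-time events in FIFO order

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), action))

    def run(self, stop_time):
        # Pop events in chronological order until the queue is empty or stop_time passes.
        while self._queue and self._queue[0][0] <= stop_time:
            self.now, _, action = heapq.heappop(self._queue)
            action()


def traffic_source(sim, period, sink, task_name="My_Task_1"):
    """Emit one transaction (a dict of field names and values) every `period` seconds."""
    def emit():
        sink.append({"A_Task_Name": task_name, "created_at": sim.now})
        sim.schedule(period, emit)  # re-arm the source for the next interval
    sim.schedule(0.0, emit)


sim = DiscreteEventSimulator()
transactions = []
traffic_source(sim, period=1e-3, sink=transactions)  # one task every 1 ms
sim.run(stop_time=10.5e-3)                           # events at 0, 1, ..., 10 ms
print(len(transactions))
```

The sequence counter in the heap entries is a common design choice: it keeps events scheduled for the same instant in the order they were created, so the simulation is deterministic.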

Scenarios

We consider the following two scenarios while building this model.


Generic Scenario

Initial Setup


In the initial setup, you set the model parameters, drag and drop a Digital Simulator, and define an Instruction Set for the processors.


Sets the value for the simulation time and cannot be modified during a simulation. The Digital Simulator block uses this parameter.
Sets the clock speed of the processor and cannot be modified during a simulation. The Processor block uses this parameter.
Sets the clock speed of the bus and cannot be modified during a simulation. The BusArbiter block uses this parameter.
Sets the clock speed of the cache and cannot be modified during a simulation. The Cache block uses this parameter.
Sets the clock speed of the RAM and cannot be modified during a simulation. The RAM block uses this parameter.



Figure 5: Parameters




Figure 6: Digital Simulator


The rest of the parameters retain the default settings.


Figure 7: Digital Simulator Parameters

Figure 8: Instruction Set



/* Instruction Set or File Path. */
   Mnew Ra  Rb  Rc  Rd   Re ;   /* Label */
   EXE  IU  FPU     ;
   IU   INT_1       ;
   FPU  FP_1                ;
 
begin INT_1                 ;   /* Group */
   add 1                    ;
   mul 1                    ;
   div 2                    ;
   sub 1                    ;
end   INT_1                 ;
 
begin FP_1                  ;   /* Group */
   fadd 3                   ;
   fmul 3                   ;
   madd 3                   ;
   fdiv 17                  ;
   fsub 12                  ;
end   FP_1                  ;



Note that EXE is the label associated with the different execution units in a processor, for example integer, load/store, arithmetic, and floating-point units. In this case, there are two execution units, IU and FPU, representing the integer and floating-point execution units. Each unit has a group of mnemonics (INT_1 and FP_1), and each mnemonic has an associated cycle delay.
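To make the cycle delays concrete, the sketch below transcribes the Instruction_Set above into a Python lookup table and sums the execution-unit cycles of an instruction sequence, then converts cycles to time at a given clock. The instruction sequence and the 200 MHz clock are assumptions for illustration (the tutorial does not state the Processor_Clock value), and the calculation deliberately ignores pipelining, cache misses, and stalls, which the Processor block does model.

```python
# mnemonic -> (execution unit, cycle delay), transcribed from the Instruction_Set above
INSTRUCTION_SET = {
    "add": ("IU", 1), "mul": ("IU", 1), "div": ("IU", 2), "sub": ("IU", 1),
    "fadd": ("FPU", 3), "fmul": ("FPU", 3), "madd": ("FPU", 3),
    "fdiv": ("FPU", 17), "fsub": ("FPU", 12),
}


def execution_cycles(instructions):
    """Sum the execution-unit cycle delays for a sequence of mnemonics."""
    return sum(INSTRUCTION_SET[m][1] for m in instructions)


def cycles_to_seconds(cycles, clock_mhz):
    """Convert a cycle count to seconds at a clock frequency given in MHz."""
    return cycles / (clock_mhz * 1e6)


seq = ["add", "fmul", "fdiv", "sub"]  # hypothetical instruction sequence
cycles = execution_cycles(seq)        # 1 + 3 + 17 + 1 = 22 cycles
print(cycles, cycles_to_seconds(cycles, clock_mhz=200.0))  # 200 MHz is an assumed clock
```

A single fdiv dominates this sequence, which shows why the FP mix of a workload can matter more than its raw instruction count.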


Figure 9: Instruction Set Parameters


Architectural Elements


In this stage, you configure blocks to represent the architectural elements - Processors, Bus, Cache, and RAM.


Note: This tutorial uses a dual-core processor. The procedure for a single-core processor is the same, except that you use only one Processor block.



Figure 10: Processor 1




/* First row contains Column Names.                */
Parameter_Name                   Parameter_Value    ;
Processor_Instruction_Set:       Instruction_Set   ;
Number_of_Registers:             32                 ;
Processor_Speed_Mhz:             Processor_Clock    ;
Context_Switch_Cycles:           100                ; 
Instruction_Queue_Length:        6                  ;
Number_of_Pipeline_Stages:       4                 ; 
Number_of_INT_Execution_Units:   1                 ; 
Number_of_FP_Execution_Units:    1                ;  
Number_of_Cache_Execution_Units: 2                ;  
I_1:            {Cache_Speed_Mhz=Processor_Clock, Size_KBytes=16.0, Words_per_Cache_Line=16, Cache_Miss_Name=L2_Cache}     
D_1:            {Cache_Speed_Mhz= Processor_Clock, Size_KBytes=16.0, Words_per_Cache_Line=16, Cache_Miss_Name=L2_Cache} 


Note that Processor_Instruction_Set is set to Instruction_Set, and that Processor_Speed_Mhz and the Cache_Speed_Mhz of the I_1 and D_1 caches all use the Processor_Clock value set in the model parameters.


/* First row contains Column Names.                */
Stage_Name   Execution_Location  Action  Condition ;
1_PREFETCH   I_1                 instr   none      ;
1_PREFETCH   D_1                 read    none      ;
2_DECODE     I_1                 wait    none      ;
3_EXECUTE    D_1                 wait    none      ;
3_EXECUTE    EXE                 exec    none      ;
4_STORE      EXE                 wait    none      ;
4_STORE      D_1                 write   none      ;

   
The rest of the parameters retain the default settings.
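The pipeline table above lists, for each stage, where work happens (I_1, D_1, or EXE) and which action is taken. As a rough reference point, the Python sketch below groups the table rows by stage and computes the textbook cycle count for an ideal in-order pipeline with no stalls: fill the pipeline once, then complete one instruction per cycle. The function names are hypothetical, and real Processor block timing will be higher because of multi-cycle EXE delays, cache misses, and stalls.

```python
# Rows of the pipeline table above: (Stage_Name, Execution_Location, Action)
PIPELINE_TABLE = [
    ("1_PREFETCH", "I_1", "instr"),
    ("1_PREFETCH", "D_1", "read"),
    ("2_DECODE", "I_1", "wait"),
    ("3_EXECUTE", "D_1", "wait"),
    ("3_EXECUTE", "EXE", "exec"),
    ("4_STORE", "EXE", "wait"),
    ("4_STORE", "D_1", "write"),
]


def group_stages(table):
    """Group rows by stage name; dicts keep insertion order, so stage order is preserved."""
    stages = {}
    for stage, location, action in table:
        stages.setdefault(stage, []).append((location, action))
    return stages


def ideal_pipeline_cycles(num_instructions, num_stages):
    """Cycles for an in-order pipeline with single-cycle stages and no stalls."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


stages = group_stages(PIPELINE_TABLE)
print(list(stages))
print(ideal_pipeline_cycles(100, len(stages)))  # 4 + 100 - 1 = 103
```

Comparing this ideal figure with the simulated task delay is one way to see how much time goes to memory waits rather than useful pipeline work.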

Figure 11: Processor 1 Parameters

Note: In VisualSim, no two blocks can have the same name.



The BusArbiter acts as a communication channel between master and slave devices. It grants bus control to master devices according to the selected arbitration algorithm. The block can be configured for First Come First Serve, Round Robin, or Custom arbitration. This tutorial uses First Come First Serve.

The BusArbiter works in conjunction with the BusInterface block. The BusInterface defines the physical connection to the devices, whereas the BusArbiter controls the BusInterface according to the selected arbitration scheme.
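First Come First Serve arbitration can be sketched as a single queue served in arrival order: a request that arrives while the bus is busy waits until the current transfer finishes. The Python sketch below assumes one shared bus, a fixed transfer time per request, and requests submitted in arrival order; the class and its parameters are hypothetical and leave out the per-port BusInterface queues and the Round Robin and Custom options.

```python
from collections import deque


class FcfsArbiter:
    """Grants a single shared bus strictly in request-arrival order."""

    def __init__(self, transfer_time_ns):
        self.transfer_time_ns = transfer_time_ns  # assumed fixed bus occupancy per request
        self.pending = deque()

    def request(self, master, arrival_ns):
        # Requests are assumed to be submitted in non-decreasing arrival order.
        self.pending.append((master, arrival_ns))

    def run(self):
        """Return (master, arrival, grant, done) for each request, in grant order."""
        bus_free_ns = 0.0
        schedule = []
        while self.pending:
            master, arrival = self.pending.popleft()
            grant = max(arrival, bus_free_ns)  # wait while the bus is still busy
            bus_free_ns = grant + self.transfer_time_ns
            schedule.append((master, arrival, grant, bus_free_ns))
        return schedule


arbiter = FcfsArbiter(transfer_time_ns=10.0)
arbiter.request("Processor_1", 0.0)
arbiter.request("Processor_2", 2.0)   # arrives while Processor_1 holds the bus
arbiter.request("Processor_1", 25.0)  # the bus is idle again by then
grants = arbiter.run()
for entry in grants:
    print(entry)
```

Processor_2 waits 8 ns for the bus in this example; that waiting time is the kind of contention that shows up as bus delay and buffer occupancy in the statistics later in this tutorial.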

Figure 12: Bus Arbiter


The rest of the parameters retain the default settings.

Figure 13: Bus Arbiter Parameter


The BusInterface receives data traffic and sends it out over the BusArbiter. The block coordinates data transfer by handing control to the Linear Controller.

Figure 14: Bus Interface


For the second BusInterface, the rest of the parameters retain the default settings. The first BusInterface retains all the default parameter settings.

Figure 15: Bus Interface 2 Parameters



Figure 16: Bus Arbiter - Bus Interface Connection


Figure 17: Bus Interface - Bus Interface Connection


Figure 18: Cache


The Cache acts as an external shared L2 cache. On an I_1 or D_1 cache miss, the Processor tries to get the instruction or data from the L2 cache.

Set Miss_Memory_Name to “DRAM”. On an L2 miss, the data is accessed from DRAM. In this model, the L2 cache is assumed to have a 95% hit ratio.

Set the Cache_Speed_Mhz to L2_Cache_Clock. Note that this value was set as part of the initial parameters.

The rest of the parameters retain the default settings.
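The effect of the 95% L2 hit assumption can be estimated with the standard average memory access time formula, applied level by level: access time = hit time + miss ratio × miss penalty, where the miss penalty of one level is the average access time of the next. In the Python sketch below, only the 95% L2 hit ratio comes from this model; the latencies and the L1 hit ratio are hypothetical placeholders, and VisualSim's own timing accounts for more detail (bus contention, line transfers) than this formula.

```python
def average_access_time(hit_time, hit_ratio, miss_penalty):
    """Expected access time for one cache level: hit_time + miss_ratio * miss_penalty."""
    return hit_time + (1.0 - hit_ratio) * miss_penalty


# Hypothetical latencies in nanoseconds (not taken from the model).
L1_HIT_NS, L2_HIT_NS, DRAM_NS = 1.0, 10.0, 50.0
L1_HIT_RATIO = 0.90  # hypothetical
L2_HIT_RATIO = 0.95  # the 95% L2 hit assumption used in this model

# Work outward from memory: the L2 average time is the L1 miss penalty.
l2_effective = average_access_time(L2_HIT_NS, L2_HIT_RATIO, DRAM_NS)   # about 12.5 ns
overall = average_access_time(L1_HIT_NS, L1_HIT_RATIO, l2_effective)   # about 2.25 ns
print(l2_effective, overall)
```

Lowering L1_HIT_RATIO in this sketch shows quickly how a poor first-level hit ratio pushes more traffic onto the L2 cache and the shared bus.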

Figure 19: Cache Parameters


Figure 20: RAM


Here we set the access times for Read, Write, Prefetch, and Erase. The values are in nanoseconds.


The rest of the parameters retain the default settings.

Figure 21: RAM Parameters


Figure 22: RAM Connection

Architecture Setup



The Architecture_Setup block handles all the address mapping, routing, plotting, statistics, and debugging for the hardware modeling components. It configures the complete set of blocks that share the same Architecture_Name parameter, which is found in most blocks. A model can contain multiple Architecture_Setup blocks, and each instance must have a unique name. Every bus and hardware block must be associated with one of the Architecture_Setup blocks.

Figure 23: Architecture Setup

The Processor, Bus, and DRAM blocks require a minimum set of data structure fields, such as A_Destination, A_Hop, A_IDX, and A_Task_ID. These fields are predefined in Processor_DS, the data structure file used with the cycle-accurate library blocks.

The default Routing_Table parameter appears as follows. If a model is built only from hardware-accurate library blocks such as Processor, Cache, DRAM, DeviceInterface, and Bus, the Routing_Table entries are not used, because the Hello messages generated by these blocks define the routing information.


/* First row contains Column Names.                */
Source_Node    Destination_Node   Hop           Source_Port ;
Processor_1    Cache_1            Port_1        bus_out2    ;
Cache_1        Processor_1        Port_2        output      ;
Cache_1        SDRAM_1            Port_2        output      ;
SDRAM_1        Cache_1            Port_4        output      ;
SDRAM_1        Processor_1        Port_4        output      ;

   
Note that checking Enable_Hello Messages makes every block in the hardware architecture library send out Hello messages at simulation time 0.0 to determine node-to-node connectivity. Each bus in the topology builds an internal routing table from the Hello messages it receives, so the bus knows every end node it is connected to and the user does not have to construct a routing table for each bus.

Hello messages received by each to/from node are added to the routing table with the source, destination, and port information. User entries in the routing table supplement the Hello message entries, which simplifies routing table construction.
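Conceptually, a routing table maps a (source, destination) pair to the hop and source port to use. The Python sketch below builds such a table from Hello-message discoveries and then adds user entries, using the rows of the default Routing_Table shown above. The Hello entries are hypothetical, and the rule that a discovered route takes precedence over a manual row is an assumption: the text only says that user entries supplement the Hello message entries.

```python
# Rows of the default Routing_Table above: (Source_Node, Destination_Node, Hop, Source_Port)
USER_ENTRIES = [
    ("Processor_1", "Cache_1", "Port_1", "bus_out2"),
    ("Cache_1", "Processor_1", "Port_2", "output"),
    ("Cache_1", "SDRAM_1", "Port_2", "output"),
    ("SDRAM_1", "Cache_1", "Port_4", "output"),
    ("SDRAM_1", "Processor_1", "Port_4", "output"),
]


def build_routing_table(hello_entries, user_entries):
    """Map (source, destination) -> (hop, source_port)."""
    table = {}
    for src, dst, hop, port in hello_entries:
        table[(src, dst)] = (hop, port)
    # Assumption: user rows only fill in routes that Hello messages did not discover.
    for src, dst, hop, port in user_entries:
        table.setdefault((src, dst), (hop, port))
    return table


def next_hop(table, src, dst):
    """Look up the hop and port for a route, or fail with a clear message."""
    try:
        return table[(src, dst)]
    except KeyError:
        raise LookupError(f"no route from {src} to {dst}") from None


# Hypothetical Hello-message discoveries at simulation time 0.0.
hello = [("Processor_1", "Cache_1", "Port_1", "bus_out2")]
table = build_routing_table(hello, USER_ENTRIES)
print(next_hop(table, "Cache_1", "SDRAM_1"))
```

A missing route raising an error mirrors the behavior described later in this tutorial, where a task with a wrong destination name is rejected rather than silently dropped.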

Later in the Custom Hardware Routing definitions scenario, we disable the Hello Messages and manually construct a routing table.

Behavior Flow


In this tutorial, we consider a simple behavior sequence for a task, "My_Task_1". The task is generated every 1 ms. The task definitions, such as the instructions and the target processor, are set using ExpressionList blocks.

Because there are two processor cores, tasks are mapped onto Processor_1 50% of the time and onto Processor_2 the remaining 50% of the time. Software task mapping is done with the Dynamic Mapper blocks available in VisualSim.

The step-by-step explanations are given below.


Figure 24: Traffic


The rest of the parameters retain the default settings.

Figure 25: Traffic Parameters


Figure 26: Expression List 1


Use the Expression_List parameter to add the following fields to the datastructure and also assign values.
Generate random numbers to assist in the selection of processors.

Use random numbers to select a processor.


We consider that 50% of the time the tasks run on Processor_1 and remaining 50% on Processor_2.

It is important to provide the correct names for A_Destination and A_Hop. If a name is wrong, the task is not mapped to the target platform and is rejected.
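The 50/50 mapping amounts to drawing a uniform random number per task and comparing it with 0.5. The Python sketch below shows that selection and records the result on the task's data structure. The exact expressions used in the model are shown only in the figure, so the A_Random field name and the values written to A_Destination and A_Hop are assumptions; use the names your model expects, since a wrong name causes the task to be rejected.

```python
import random

PROCESSORS = ("Processor_1", "Processor_2")  # target processors in this model


def assign_processor(ds, rng):
    """Pick a target with equal probability and record it in the transaction."""
    r = rng.random()  # uniform in [0.0, 1.0)
    target = PROCESSORS[0] if r < 0.5 else PROCESSORS[1]
    ds["A_Random"] = r            # hypothetical field name
    ds["A_Destination"] = target  # value format is an assumption
    ds["A_Hop"] = target          # value format is an assumption
    return ds


rng = random.Random(42)  # fixed seed for a reproducible run
tasks = [assign_processor({"A_Task_Name": "My_Task_1"}, rng) for _ in range(10000)]
share_p1 = sum(t["A_Destination"] == "Processor_1" for t in tasks) / len(tasks)
print(f"share mapped to Processor_1: {share_p1:.3f}")
```

Over a long run the share approaches 0.5, which is consistent with the nearly identical Processor_1 and Processor_2 statistics reported later in this tutorial.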

The rest of the parameters retain the default settings.

Figure 27: Expression List 1 Parameters


In this step, we use a simple, fixed instruction sequence. Later in this tutorial, it is replaced by a synthetic instruction generator (the Task Generator) that produces a more realistic instruction sequence for the task.

Use the ExpressionList parameter to add the following field to the datastructure. This field sets the list of instructions.

Update the other parameters as given below.

Figure 28: Expression List 2


The DynamicMapper block accepts a data structure on the input and sends this along with the information in block parameters to the target resource (Processor_1 or Processor_2).

Figure 29:  Dynamic Mappers


Parameter             DynamicMapper                 DynamicMapper2
Block_Name            Task_Mapper_1                 Task_Mapper_2
Database_Lookup       none                          none
Database_Expression   none                          none
Task_Name             A_Task_Name                   A_Task_Name
Task_Destination      Architecture_1.Processor_1    Architecture_1.Processor_2
Task_Instruction      A_Instruction                 A_Instruction
Task_ID               1                             1
Task_Plot_ID          1                             1
Task_Priority         1                             1
Task_Time             none                          none

Figure 30: Dynamic Mapper 1 Parameters

Figure 31:  Dynamic Mapper 2 Parameters

Resource Statistics and Reports

Statistics and Reports Generation



Figure 32:  Expression List


Figure 33:  Expression List Parameter


Figure 34:  TimeDataPlotters


The other parameters retain the default settings.

Figure 35:  TimeDataPlotter Parameters


Figure 36:  Text Display


Figure 37:  Timing Diagram


Statistics and Reports


The report screenshots are shown below.

Figure 38:  Processor Unit Execution Activity

Figure 39:  Latency Plot

Figure 40:  MIPS Plot

DISPLAY AT TIME          ------ 100.0000000000 ms ------
{BLOCK                = ".Processor_Modeling_Part1.ArchitectureSetup",
Bus_1_Delay_Max            = 2.6154000007383E-8,
Bus_1_Delay_Mean        = 2.1950571428453E-8,
Bus_1_Delay_Min            = 2.1249999995754E-8,
Bus_1_Delay_StDev        = 1.7160425282895E-9,
Bus_1_IOs_per_sec_Max        = 16000.0,
Bus_1_IOs_per_sec_Mean        = 16000.0,
Bus_1_IOs_per_sec_Min        = 16000.0,
Bus_1_IOs_per_sec_StDev        = 0.0,
Bus_1_Input_Buffer_Occupancy_in_Words_Max    = 31.0,
Bus_1_Input_Buffer_Occupancy_in_Words_Mean    = 8.6266840504129,
Bus_1_Input_Buffer_Occupancy_in_Words_Min    = 0.0,
Bus_1_Input_Buffer_Occupancy_in_Words_StDev    = 8.0148255065604,
Bus_1_Preempt_Buffer_Occupancy_in_Words_Max    = 0.0,
Bus_1_Preempt_Buffer_Occupancy_in_Words_Mean    = 0.0,
Bus_1_Preempt_Buffer_Occupancy_in_Words_Min    = 0.0,
Bus_1_Preempt_Buffer_Occupancy_in_Words_StDev    = 0.0,
Bus_1_Throughput_MBs_Max    = 0.448,
Bus_1_Throughput_MBs_Mean    = 0.448,
Bus_1_Throughput_MBs_Min    = 0.448,
Bus_1_Throughput_MBs_StDev    = 0.0,
DELTA                = 0.0,
DRAM_Delay_Time_Max        = 2.3076923076923E-8,
DRAM_Delay_Time_Mean        = 1.3076923076923E-8,
DRAM_Delay_Time_Min        = 3.0769230769231E-9,
DRAM_Delay_Time_StDev        = 1.0E-8,
DRAM_Memory_Used_By_Processor_1_MB_Max    = 0.003712,
DRAM_Memory_Used_By_Processor_1_MB_Mean    = 0.003264,
DRAM_Memory_Used_By_Processor_1_MB_Min    = 0.002816,
DRAM_Memory_Used_By_Processor_1_MB_StDev    = 4.48E-4,
DRAM_Memory_Used_By_Processor_2_MB_Max    = 0.003584,
DRAM_Memory_Used_By_Processor_2_MB_Mean    = 0.003136,
DRAM_Memory_Used_By_Processor_2_MB_Min    = 0.002688,
DRAM_Memory_Used_By_Processor_2_MB_StDev    = 4.48E-4,
DRAM_Memory_Used_By_Total_MB_Max    = 0.0064,
DRAM_Memory_Used_By_Total_MB_Mean    = 0.0064,
DRAM_Memory_Used_By_Total_MB_Min    = 0.0064,
DRAM_Memory_Used_By_Total_MB_StDev    = 0.0,
DRAM_Throughput_MBs_Max        = 0.128,
DRAM_Throughput_MBs_Mean    = 0.128,
DRAM_Throughput_MBs_Min        = 0.128,
DRAM_Throughput_MBs_StDev    = 0.0,
DS_NAME                = "Architecture_Stats",
ID                = 2,
INDEX                = 0,
L2_Cache_Delay_Time_Max        = 2.0E-8,
L2_Cache_Delay_Time_Mean    = 1.4642857142857E-8,
L2_Cache_Delay_Time_Min        = 1.25E-9,
L2_Cache_Delay_Time_StDev    = 8.470386589737E-9,
L2_Cache_Hit_Ratio_Max        = 98.0,
L2_Cache_Hit_Ratio_Mean        = 97.7142857142857,
L2_Cache_Hit_Ratio_Min        = 97.4285714285714,
L2_Cache_Hit_Ratio_StDev    = 0.2857142857127,
L2_Cache_Memory_Used_By_DRAM_MB_Max    = 0.0064,
L2_Cache_Memory_Used_By_DRAM_MB_Mean    = 0.0064,
L2_Cache_Memory_Used_By_DRAM_MB_Min    = 0.0064,
L2_Cache_Memory_Used_By_DRAM_MB_StDev    = 0.0,
L2_Cache_Memory_Used_By_Processor_1_MB_Max    = 0.00928,
L2_Cache_Memory_Used_By_Processor_1_MB_Mean    = 0.00816,
L2_Cache_Memory_Used_By_Processor_1_MB_Min    = 0.00704,
L2_Cache_Memory_Used_By_Processor_1_MB_StDev    = 0.00112,
L2_Cache_Memory_Used_By_Processor_2_MB_Max    = 0.00896,
L2_Cache_Memory_Used_By_Processor_2_MB_Mean    = 0.00784,
L2_Cache_Memory_Used_By_Processor_2_MB_Min    = 0.00672,
L2_Cache_Memory_Used_By_Processor_2_MB_StDev    = 0.00112,
L2_Cache_Memory_Used_By_Total_MB_Max    = 0.0224,
L2_Cache_Memory_Used_By_Total_MB_Mean    = 0.0224,
L2_Cache_Memory_Used_By_Total_MB_Min    = 0.0224,
L2_Cache_Memory_Used_By_Total_MB_StDev    = 0.0,
L2_Cache_Throughput_MBs_Max    = 0.328,
L2_Cache_Throughput_MBs_Mean    = 0.328,
L2_Cache_Throughput_MBs_Min    = 0.328,
L2_Cache_Throughput_MBs_StDev    = 0.0,
Processor_1_Context_Switch_Time_Pct_Max    = 0.005858,
Processor_1_Context_Switch_Time_Pct_Mean    = 0.005151,
Processor_1_Context_Switch_Time_Pct_Min    = 0.004444,
Processor_1_Context_Switch_Time_Pct_StDev    = 7.07E-4,
Processor_1_D_1_Hit_Ratio_Max    = 0.0,
Processor_1_D_1_Hit_Ratio_Mean    = 0.0,
Processor_1_D_1_Hit_Ratio_Min    = 0.0,
Processor_1_D_1_Hit_Ratio_StDev    = 0.0,
Processor_1_D_1_KB_per_Thread_Max    = 0.0,
Processor_1_D_1_KB_per_Thread_Mean    = 0.0,
Processor_1_D_1_KB_per_Thread_Min    = 0.0,
Processor_1_D_1_KB_per_Thread_StDev    = 0.0,
Processor_1_I_1_Hit_Ratio_Max    = 100.0,
Processor_1_I_1_Hit_Ratio_Mean    = 44.4444444444444,
Processor_1_I_1_Hit_Ratio_Min    = 0.0,
Processor_1_I_1_Hit_Ratio_StDev    = 49.6903994999953,
Processor_1_I_1_KB_per_Thread_Max    = 0.0,
Processor_1_I_1_KB_per_Thread_Mean    = 0.0,
Processor_1_I_1_KB_per_Thread_Min    = 0.0,
Processor_1_I_1_KB_per_Thread_StDev    = 0.0,
Processor_1_Stall_Time_Pct_Max    = 0.001856,
Processor_1_Stall_Time_Pct_Mean    = 0.001632,
Processor_1_Stall_Time_Pct_Min    = 0.001408,
Processor_1_Stall_Time_Pct_StDev    = 2.24E-4,
Processor_1_Task_Delay_Max    = 1.4499900000131E-7,
Processor_1_Task_Delay_Mean    = 1.1955455555611E-7,
Processor_1_Task_Delay_Min    = 1.0499899999461E-7,
Processor_1_Task_Delay_StDev    = 1.5341382750109E-8,
Processor_2_Context_Switch_Time_Pct_Max    = 0.005656,
Processor_2_Context_Switch_Time_Pct_Mean    = 0.004949,
Processor_2_Context_Switch_Time_Pct_Min    = 0.004242,
Processor_2_Context_Switch_Time_Pct_StDev    = 7.07E-4,
Processor_2_D_1_Hit_Ratio_Max    = 0.0,
Processor_2_D_1_Hit_Ratio_Mean    = 0.0,
Processor_2_D_1_Hit_Ratio_Min    = 0.0,
Processor_2_D_1_Hit_Ratio_StDev    = 0.0,
Processor_2_D_1_KB_per_Thread_Max    = 0.0,
Processor_2_D_1_KB_per_Thread_Mean    = 0.0,
Processor_2_D_1_KB_per_Thread_Min    = 0.0,
Processor_2_D_1_KB_per_Thread_StDev    = 0.0,
Processor_2_I_1_Hit_Ratio_Max    = 100.0,
Processor_2_I_1_Hit_Ratio_Mean    = 44.4444444444444,
Processor_2_I_1_Hit_Ratio_Min    = 0.0,
Processor_2_I_1_Hit_Ratio_StDev    = 49.6903994999953,
Processor_2_I_1_KB_per_Thread_Max    = 0.0,
Processor_2_I_1_KB_per_Thread_Mean    = 0.0,
Processor_2_I_1_KB_per_Thread_Min    = 0.0,
Processor_2_I_1_KB_per_Thread_StDev    = 0.0,
Processor_2_Stall_Time_Pct_Max    = 0.001792,
Processor_2_Stall_Time_Pct_Mean    = 0.001568,
Processor_2_Stall_Time_Pct_Min    = 0.001344,
Processor_2_Stall_Time_Pct_StDev    = 2.24E-4,
Processor_2_Task_Delay_Max    = 1.4499900000131E-7,
Processor_2_Task_Delay_Mean    = 1.1955455555573E-7,
Processor_2_Task_Delay_Min    = 1.0499899999461E-7,
Processor_2_Task_Delay_StDev    = 1.5341382750299E-8,
TIME                = 0.1}
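A few derived figures help interpret these statistics. With one task every 1 ms (1000 tasks per second) and 16000 bus IOs per second, each task causes about 16 bus transactions, and 0.448 MB/s spread over 16000 IOs per second is about 28 bytes per transaction. The Python sketch below does this arithmetic with values copied from the display above. It assumes that every bus IO is caused by the generated tasks and that "MB" means 10^6 bytes; if your setup reports MiB, use 2**20 instead.

```python
# Values copied from the Architecture_Stats display above (100 ms run).
bus_ios_per_sec = 16000.0
bus_throughput_mb_s = 0.448
task_period_s = 1e-3  # Traffic generates My_Task_1 every 1 ms

tasks_per_sec = 1.0 / task_period_s
ios_per_task = bus_ios_per_sec / tasks_per_sec
# Assumes "MB" in the report means 10**6 bytes.
bytes_per_io = bus_throughput_mb_s * 1e6 / bus_ios_per_sec

print(f"bus transactions per task: {ios_per_task:.1f}")
print(f"average bytes per bus transaction: {bytes_per_io:.1f}")
```

The same approach works for the DRAM and L2 cache throughput figures and is a quick sanity check that the traffic seen by each device matches what the workload should generate.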

Figure 41:  Text Display

Note: The HW_DRAM_Bank and Bus Activity plots are not displayed because this model does not use HW_DRAM.

Replace Task Instruction Sequence with Task Generator

In this model, we replace the Task Instruction Sequence block with a Task Generator. The TaskGenerator block generates synthetic instructions for the target processor based on user inputs for the number of instructions and the mix of instruction types, such as arithmetic, logical, integer, and load/store.

An image of the proposed model is given below.

Figure 42:  VisualSim Model with Task Generator

Modifying the existing architecture exploration model



Figure 43:  Task Generator


Figure 44:  Sample Instruction Table

Here, the first part defines the instruction types and their associated mnemonics, and the second part defines the number of instructions for each task and the mix of instruction types as percentages.
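The two-part table (instruction types with mnemonics, then an instruction count and a percentage mix) is enough to synthesize a task. The Python sketch below splits the instruction count across types using largest-remainder rounding so the counts always add up, picks a mnemonic at random within each type, and shuffles the result. The INT/FP type names, the 60/40 mix, and the functions are hypothetical illustrations built from the Instruction_Set used earlier; the Task Generator's actual algorithm and file format may differ.

```python
import random

# Instruction types and mnemonics; these groups mirror INT_1 and FP_1 (an assumption).
INSTRUCTION_TYPES = {
    "INT": ["add", "mul", "div", "sub"],
    "FP": ["fadd", "fmul", "madd", "fdiv", "fsub"],
}


def allocate_counts(num_instructions, mix_pct):
    """Split num_instructions across types by percentage (largest-remainder rounding)."""
    if abs(sum(mix_pct.values()) - 100.0) > 1e-9:
        raise ValueError("instruction mix must sum to 100%")
    exact = {t: num_instructions * p / 100.0 for t, p in mix_pct.items()}
    counts = {t: int(v) for t, v in exact.items()}
    leftover = num_instructions - sum(counts.values())
    # Hand the remaining slots to the types with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


def generate_task(num_instructions, mix_pct, rng):
    """Build a shuffled instruction sequence matching the requested mix."""
    sequence = []
    for t, n in allocate_counts(num_instructions, mix_pct).items():
        sequence.extend(rng.choice(INSTRUCTION_TYPES[t]) for _ in range(n))
    rng.shuffle(sequence)
    return sequence


rng = random.Random(7)
task = generate_task(20, {"INT": 60, "FP": 40}, rng)  # hypothetical mix
print(task)
```

Exact allocation keeps every task at the requested mix; sampling each instruction independently (for example with `random.choices`) would instead let the mix vary from task to task.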



Figure 45:  Task Generator Parameters


Figure 46:  Task Generator Expression List




Figure 47:  Task Generator Expression List Parameters


Figure 48:  Task Generator Model

Statistics and Reports


The report screenshots are shown below.

Figure 49:  Task Generator Processor Execution Unit Activity

Figure 50:  Task Generator Latency

Figure 51:  Task Generator MIPS


DISPLAY AT TIME          ------ 100.0000000000 ms ------
{
Processor_1_Context_Switch_Time_Pct_Max    = 0.004848,
Processor_1_Context_Switch_Time_Pct_Mean    = 0.004848,
Processor_1_Context_Switch_Time_Pct_Min    = 0.004848,
Processor_1_Context_Switch_Time_Pct_StDev    = 0.0,
Processor_1_D_1_Hit_Ratio_Max    = 0.0,
Processor_1_D_1_Hit_Ratio_Mean    = 0.0,
Processor_1_D_1_Hit_Ratio_Min    = 0.0,
Processor_1_D_1_Hit_Ratio_StDev    = 0.0,
Processor_1_D_1_KB_per_Thread_Max    = 0.0,
Processor_1_D_1_KB_per_Thread_Mean    = 0.0,
Processor_1_D_1_KB_per_Thread_Min    = 0.0,
Processor_1_D_1_KB_per_Thread_StDev    = 0.0,
Processor_1_I_1_Hit_Ratio_Max    = 100.0,
Processor_1_I_1_Hit_Ratio_Mean    = 80.0355859590934,
Processor_1_I_1_Hit_Ratio_Min    = 0.0,
Processor_1_I_1_Hit_Ratio_StDev    = 39.7193069184094,
Processor_1_I_1_KB_per_Thread_Max    = 0.0,
Processor_1_I_1_KB_per_Thread_Mean    = 0.0,
Processor_1_I_1_KB_per_Thread_Min    = 0.0,
Processor_1_I_1_KB_per_Thread_StDev    = 0.0,
Processor_1_Stall_Time_Pct_Max    = 0.011006,
Processor_1_Stall_Time_Pct_Mean    = 0.010885,
Processor_1_Stall_Time_Pct_Min    = 0.010764,
Processor_1_Stall_Time_Pct_StDev    = 1.2100000000002E-4,
Processor_1_Task_Delay_Max    = 5.7299900000155E-7,
Processor_1_Task_Delay_Mean    = 3.2370287645205E-7,
Processor_1_Task_Delay_Min    = 1.0499899999461E-7,
Processor_1_Task_Delay_StDev    = 1.2443263817956E-7,
Processor_2_Context_Switch_Time_Pct_Max    = 0.005252,
Processor_2_Context_Switch_Time_Pct_Mean    = 0.005252,
Processor_2_Context_Switch_Time_Pct_Min    = 0.005252,
Processor_2_Context_Switch_Time_Pct_StDev    = 0.0,
Processor_2_D_1_Hit_Ratio_Max    = 0.0,
Processor_2_D_1_Hit_Ratio_Mean    = 0.0,
Processor_2_D_1_Hit_Ratio_Min    = 0.0,
Processor_2_D_1_Hit_Ratio_StDev    = 0.0,
Processor_2_D_1_KB_per_Thread_Max    = 0.0,
Processor_2_D_1_KB_per_Thread_Mean    = 0.0,
Processor_2_D_1_KB_per_Thread_Min    = 0.0,
Processor_2_D_1_KB_per_Thread_StDev    = 0.0,
Processor_2_I_1_Hit_Ratio_Max    = 100.0,
Processor_2_I_1_Hit_Ratio_Mean    = 80.166475315729,
Processor_2_I_1_Hit_Ratio_Min    = 0.0,
Processor_2_I_1_Hit_Ratio_StDev    = 39.6057602585533,
Processor_2_I_1_KB_per_Thread_Max    = 0.0,
Processor_2_I_1_KB_per_Thread_Mean    = 0.0,
Processor_2_I_1_KB_per_Thread_Min    = 0.0,
Processor_2_I_1_KB_per_Thread_StDev    = 0.0,
Processor_2_Stall_Time_Pct_Max    = 0.011776,
Processor_2_Stall_Time_Pct_Mean    = 0.01169,
Processor_2_Stall_Time_Pct_Min    = 0.011604,
Processor_2_Stall_Time_Pct_StDev    = 8.6000000000108E-5,
Processor_2_Task_Delay_Max    = 5.7299900000501E-7,
Processor_2_Task_Delay_Mean    = 3.2205382204324E-7,
Processor_2_Task_Delay_Min    = 1.0499899999461E-7,
Processor_2_Task_Delay_StDev    = 1.2373799043996E-7,
TIME                = 0.1}


Figure 52:  Task Generator Text Display (the full statistics are not shown here)

System Analysis

With the default configuration, we noticed that the hardware architecture was barely utilized, so we modified the system configuration to understand system performance.

We have used the model parameters listed below to modify system configurations.

We also increased the traffic rate by reducing the Traffic interval from 1 ms to 1 us.

We then looked at the MIPS. For both processors, MIPS stayed below 2 most of the time. The utilization statistics below show that Processor_1 was utilized only about 2.6% of the time and its pipeline only about 4.5%.


Processor_1_PROC_Utilization_Pct_Max    = 2.64,
Processor_1_PROC_Utilization_Pct_Mean    = 2.64,
Processor_1_PROC_Utilization_Pct_Min    = 2.64,
Processor_1_PROC_Utilization_Pct_StDev    = 0.0,
Processor_1_Pipeline_Utilization_Pct_Max    = 4.54,
Processor_1_Pipeline_Utilization_Pct_Mean    = 4.54,
Processor_1_Pipeline_Utilization_Pct_Min    = 4.54,
Processor_1_Pipeline_Utilization_Pct_StDev    = 0.0,
Processor_1_Stall_Time_Pct_Max    = 14.53,
Processor_1_Stall_Time_Pct_Mean    = 14.53,
Processor_1_Stall_Time_Pct_Min    = 14.53,
Processor_2_Stall_Time_Pct_Max    = 14.016,
Processor_2_Stall_Time_Pct_Mean    = 14.016,
Processor_2_Stall_Time_Pct_Min    = 14.016,



This indicates that the processors spend most of their time not processing; they are underutilized, with a stall time of about 14%.

Next, look at the statistics for the bus, L2 cache, and DRAM shown below.


Bus_1_Utilization_Pct_Max    = 93.7274999999925,
Bus_1_Utilization_Pct_Mean    = 93.4062499999925,
Bus_1_Utilization_Pct_Min    = 93.0849999999926,
Bus_1_Utilization_Pct_StDev    = 0.321249999998,
DRAM_Utilization_Pct_Max    = 47.1076923076937,
DRAM_Utilization_Pct_Mean    = 46.8430769230783,
DRAM_Utilization_Pct_Min    = 46.5784615384629,
DRAM_Utilization_Pct_StDev    = 0.2646153846157,
L2_Cache_Utilization_Pct_Max    = 80.4875000000015,
L2_Cache_Utilization_Pct_Mean    = 79.0325000000015,
L2_Cache_Utilization_Pct_Min    = 77.5775000000015,
L2_Cache_Utilization_Pct_StDev    = 1.4549999999998,


The bus (about 93%) and the L2 cache (about 79%) are both above 70% utilization, which indicates that they could become the bottleneck if the workload increases.
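The bottleneck reasoning can be expressed as a simple screen: list the resources whose mean utilization exceeds a threshold, busiest first. The Python sketch below uses the mean utilization values reported above and the 70% cut-off from the text; the function name is hypothetical, and the threshold is a rule of thumb (queueing delay grows sharply as utilization approaches 100%), not a VisualSim setting.

```python
# Mean utilization (%) copied from the statistics above.
mean_utilization_pct = {
    "Bus_1": 93.4062499999925,
    "L2_Cache": 79.0325000000015,
    "DRAM": 46.8430769230783,
    "Processor_1": 2.64,
}
THRESHOLD_PCT = 70.0  # cut-off used in the discussion above


def likely_bottlenecks(utilization, threshold):
    """Return resources above the threshold, busiest first."""
    busy = [(name, u) for name, u in utilization.items() if u > threshold]
    return [name for name, _ in sorted(busy, key=lambda item: item[1], reverse=True)]


print(likely_bottlenecks(mean_utilization_pct, THRESHOLD_PCT))
```

Re-running this screen after each configuration change gives a quick view of whether a change moved the bottleneck or only shifted load between the bus and the caches.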


Processor_1_I_1_Hit_Ratio_Max    = 100.0,
Processor_1_I_1_Hit_Ratio_Mean    = 44.4709772226304,
Processor_1_I_1_Hit_Ratio_Min    = 0.0,
Processor_1_I_1_Hit_Ratio_StDev    = 49.6887380194589,
Processor_2_I_1_Hit_Ratio_Max    = 100.0,
Processor_2_I_1_Hit_Ratio_Mean    = 44.4444444444444,
Processor_2_I_1_Hit_Ratio_Min    = 0.0,
Processor_2_I_1_Hit_Ratio_StDev    = 49.6903994999953,

We also noticed that the mean I_1 cache hit ratio is only about 44%, which suggests that the processor pipeline is waiting for instructions; this contributes to the stall time of nearly 14%.

This suggests increasing the I_1 cache size or adding a dedicated on-chip L2 cache to improve performance.

As part of continued analysis, we recommend that users modify the system parameters and workload scenarios and analyze the resulting reports.