Processor

ProcessorGenerator / Processor
Block Name: Processor

Code File Location: VisualSim/actor/arch/Processor

Block Overview

The Processor block is used to model variations of commercial and proprietary processors. To learn the details of the Processor block operation, review the Processor section in the Advanced Modeling Guide. The goal of the block is to provide accurate timing, data flow, throughput and power computation for the processor. The Processor block executes the instructions supplied to it as an array in the A_Instruction field of the transaction. It does not accept a binary executable. This block can work with other VisualSim library blocks to construct a board, SoC, commercial processor or custom pipeline. The advanced features of the Processor block are best explained using the standard demo examples available in VisualSim. The Basic Processor Model shows the use of the Processor and its connectivity to other blocks in a model. This model also shows the variable values for all the required fields and how the pipeline interacts with internal/external caches. The tutorial section has examples showing the assembly of systems using the Processor block.

Important considerations:

The minimum blocks to set up and configure are the Instruction_Set, ArchitectureSetup, DRAM and the Processor.

The keywords that cannot be modified are I_, D_, INT_ and FP_. If you define any of these elements, the keywords must be used as-is.

Instruction cycles can only be integer values.

The minimum set of Data Structure fields is A_Source, A_Destination, A_Variables, A_Hop, A_Instruction and A_Priority. It is best to use the Processor_DS when working with the Processor block. This ensures all the required fields are available.

Key Features:

Supports multiple levels of caches

Caches can be located within the same Processor block or in a different processor block.

Caches support single instruction, multiple instruction, single data and multiple data. The performance of the cache can be controlled using the Hit_Ratio parameter.

The DRAM block can be used as a TCM, SRAM, RAM, Flash and DRAM.

The pipeline can be configured to Read from Registers or Instruction cache, Write to Registers, wait for an activity in the previous stage to complete, execute an instruction in a specific Execution Unit, and execute an instruction outside of this block, such as in another processor or a Scheduler block.

Pipeline can execute multiple instructions in parallel

Multiple Execution Units can be defined. They can be of two types: Integer and Floating. All Units must fall under these two categories.

Large data transfers using a Load or Store operation can be performed using the external DMA block. In this model, the block will perform a Load and Store operation, not a traditional DMA.

A co-Processor can be defined using the task operator (external execution) in the pipeline. The external device can be either a Processor or a custom component such as a Scheduler or an assembly of multiple blocks.

Any number of ports can be added to the Processor. These can be used to connect additional memories and IOs without requiring a bus.

The supported instructions include integer, float, branch (taken or not taken), load/store, and clock speed change.

The Hit_Ratio parameter can be added to D_1 and I_1/D_1 to determine the miss. The performance and the request to the final DRAM are functionally modeled. The miss to the next level is not modeled. If accuracy is required for the miss being sent to the next level, use the Cache block.

For Registers, the Hit-Ratio = Number_of_Registers/A_Variables

Define multiple cores and multiple processors where the cores can share the caches, while the processors can share a DRAM.

Preemption of current task by a higher priority task
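The register hit ratio listed among the key features (Hit-Ratio = Number_of_Registers/A_Variables) can be illustrated with a short calculation. This is a minimal sketch, not the block's internal code: the function name is hypothetical, and capping the ratio at 1.0 is an assumption that the source does not state.

```python
def register_hit_ratio(number_of_registers: int, a_variables: int) -> float:
    """Hit ratio = Number_of_Registers / A_Variables.

    Assumption: the ratio is capped at 1.0 when a task uses fewer
    variables than there are registers; the source does not say so.
    """
    if a_variables <= 0:
        raise ValueError("A_Variables must be a positive integer")
    return min(1.0, number_of_registers / a_variables)


# 32 registers, task with 64 variables: half of the accesses hit.
print(register_hit_ratio(32, 64))
# 32 registers, task with 16 variables: capped at 1.0 (assumption).
print(register_hit_ratio(32, 16))
```

Under this reading, a task with more variables than registers lowers the register hit ratio proportionally.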

Processor Parameter:

Architecture Setup name: This is the name of the Architecture_Setup block that has been instantiated in the model. This block maintains the routing and statistics information. It can also generate debug information for the instruction operation (including pipeline activity, stall, cycle and instruction number) in this processor and order of timing relative to other processors, memory and peripherals, using Listen to Block.
Processor Name: This is a unique name for this block in the entire model and is used as the routing and statistics identifier.
Instruction Set: This is a separate block that maintains the list of instructions associated with each execution unit and the number of cycles to execute each one. All instructions are assumed to consume 2 registers and write out to a third register. The number of cycles associated with each instruction in the Instruction_Set can be a single value (fixed delay) or two numbers that define a range (uniform distribution). The pipeline also uses the instruction name to associate the instruction with the named execution unit in which it executes.
Processor Resources: These are defined in the Parameter_Value window of this block. These define the basic structure of the Processor. The Parameter_Values are typically extracted from the vendor data sheet.
Cache Hierarchy: This defines the location of the different levels of the cache. This is referenced in the pipeline for the fetch of instruction and data. The caches can be defined within this block or can be defined entirely in another processor block or just external blocks.
Pipeline: This defines the sequence of actions occurring in the Processor. A typical 4-stage pipeline is shown below. The demo examples of various processors created using this block show all available options. The pipeline stage is identified by the number that prefixes the name of each pipeline stage. One stage can have multiple lines.
Processor Width: This is a pull-down and sets the processor and internal cache widths.
Key attributes: There are some additional "hidden" attributes available for this block. These can be used to better describe the number of instructions between jumps (Cache_Loop_Words), which describes the size of a loop, and the number of lines to prefetch for the internal caches (Cache_Prefetch_Lines).
To change the clock speed, the user sends an instruction. There is an additional parameter called 'Processor_Incremental_Mhz' that must be set. This is used as the baseline speed when the user wants to dynamically adjust the core speed. The speed is adjusted by using an instruction. The instruction must start with 'CLOCK' + any prefix. The value in the instruction set must be the multiplier of the baseline value. You can have any number of these instructions. This instruction must be sent as a separate task and cannot be part of a task with other instructions. After this instruction has been sent, the clock runs at the new speed until it is changed again. View the example to see the details.
If a particular instruction is going to be repeated, then you can use the format ADD.n, where n is the number of times the instruction is repeated.
The Flush is based on the position in the pipeline, which is the definition of flushing the pipeline. The pipeline is flushed when there is a * preceding the instruction name. In any flush, which could dump the prior stages immediately in one cycle, the prior stages would need to re-engage the pipeline and logic. If the instruction and data has been prefetched, then these stages could be skipped. So, instead of three cycles, it will be one. So, the *b.n value becomes the flush value.
The 'Enable_Hello_Messages' parameter turns the generation of Hello Messages on or off. If all the blocks attached to the processor are custom, there is no need for the Hello Messages. To learn more about the Hello Messages, review the Advanced Modeling Guide and the Architecture_Setup documentation.
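The ADD.n repeat format described above can be pictured as a simple expansion of the instruction array. The sketch below is illustrative only: the helper name is hypothetical, and it assumes that the suffix after the last "." is a repeat count and that entries starting with * are left untouched, because the text says the value in *b.n acts as a flush value instead.

```python
def expand_repeats(instructions):
    """Expand entries such as "ADD.3" into ["ADD", "ADD", "ADD"].

    Assumptions: the text after the last "." is a positive repeat count;
    entries that start with "*" (flush) are passed through unchanged.
    """
    expanded = []
    for entry in instructions:
        name, dot, count = entry.rpartition(".")
        if entry.startswith("*") or not (dot and name and count.isdigit()):
            expanded.append(entry)
        else:
            expanded.extend([name] * int(count))
    return expanded


print(expand_repeats(["ADD.3", "SUB", "*b.2"]))
```

Whether the Processor expands the array internally or only counts repetitions is not stated in the source; the sketch shows the equivalent instruction stream.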

Required Field:
The following are required for the Processor to function correctly (Processor_DS data structure template).
A_Destination=Processor Name as a string
A_Priority= Field must exist and be an integer
A_Instruction=Array of instruction names as strings. The instructions must match the items in the Instruction_Set.

When the Task is completed, it is sent out the instr_out port. The value in A_Source is placed in A_Destination field.
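As an illustration of the required fields, the following sketch builds a task as a plain Python dictionary and checks it against the rules stated above. It is not VisualSim code: the dictionary stands in for the Processor_DS template, and the function name, the field values and the small instruction set are assumptions made for the example.

```python
REQUIRED_FIELDS = ("A_Source", "A_Destination", "A_Variables",
                   "A_Hop", "A_Instruction", "A_Priority")


def check_task(task, processor_name, instruction_set):
    """Return a list of problems; an empty list means the task looks valid."""
    problems = [f"missing field {name}" for name in REQUIRED_FIELDS
                if name not in task]
    if task.get("A_Destination") != processor_name:
        problems.append("A_Destination must be the Processor name")
    if not isinstance(task.get("A_Priority"), int):
        problems.append("A_Priority must be an integer")
    unknown = {i for i in task.get("A_Instruction", [])
               if i not in instruction_set}
    if unknown:
        problems.append(f"not in Instruction_Set: {sorted(unknown)}")
    return problems


# Hypothetical task: 500 ADD instructions sent to Processor_1.
task = {
    "A_Source": "Traffic_1",        # assumed source name
    "A_Destination": "Processor_1",
    "A_Hop": "Processor_1",
    "A_Variables": 16,
    "A_Priority": 1,
    "A_Instruction": ["ADD"] * 500,
}
print(check_task(task, "Processor_1", {"ADD", "SUB", "MUL", "DIV"}))
```

A check of this kind can be useful before a trace or generated instruction list is fed to the model, since an unmatched mnemonic has no entry in the Instruction_Set.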

Detailed Information
The ArchitectureSetup, the Instruction_Set, an input to the instr_in port and a DRAM attached to bus_out are the minimum required configuration. The instructions associated with the software code that is to be executed on the Processor block can be generated in many ways. They are ultimately placed in the A_Instruction field of the incoming Data Structure (DS). The field is an array of strings. The minimum additional fields of the DS required are A_Destination and A_Hop. Both of these fields must be a string with the Processor_Name. The DS can be sent to the Processor either through the instr_in port or directly from the Software_Mapper. When the processor has completed a transaction (DS), it returns the DS either to instr_out or to the Software_Mapper, depending on where the DS originated. Depending on the setting of the Pipeline, the processor may send requests out to memories, caches and DMA for instruction and data processing. The number of variables associated with a DS (task) is set in the A_Variables field. This has an impact on the data register hit ratio. The Basic Processor Model shows the generation of 500 ADD instructions in the A_Instruction field. The model Generating Synthetic Instructions, on the other hand, shows the use of the SoftGen block to generate the instructions. Finally, the SAR models show the use of a trace file generated from an actual program.

There are two types of execution units: Integer and Floating. All vector units come under Floating, while the branches/load/store come under Integer units. You can have any number of Integer and Floating units. Each unit must have an associated list of instructions in the Instruction_Set block. The number of integer and floating point units in the Instruction_Set must match the number listed in the resource section of the Processor.

The Processor block executes the instructions in the A_Instruction array of instruction mnemonics in sequential order, such as {"ADD", "SUB", "ADD", "MUL"}. The execution of the instruction stream is based on the user-defined Pipeline. There is an out-of-order array called A_Instruction_Reorder that has a one-to-one mapping with A_Instruction. If a single-value array is used, then the value is applied to all the instructions. Alternatively, assign a value to each matching instruction in A_Instruction: 0 means no out-of-order execution, 1 allows for one instruction back and 2 allows for two instructions back. The field A_IDX maintains the array index of the currently executing instruction. The field A_IDY contains the array index of the last instruction to execute in the same cycle. Thus A_IDX to A_IDY is the range for multiple instructions per cycle.
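The relationship between A_Instruction and A_Instruction_Reorder can be sketched as follows. This is an illustrative helper with a hypothetical name, assuming only what the paragraph states: a single-value array applies to every instruction, otherwise the two arrays map one-to-one.

```python
def expand_reorder(a_instruction, a_instruction_reorder):
    """Return one reorder depth per instruction.

    A single-value array is applied to all instructions; otherwise the
    array must have the same length as A_Instruction.
    """
    if len(a_instruction_reorder) == 1:
        return list(a_instruction_reorder) * len(a_instruction)
    if len(a_instruction_reorder) != len(a_instruction):
        raise ValueError("A_Instruction_Reorder must map one-to-one")
    return list(a_instruction_reorder)


instructions = ["ADD", "SUB", "ADD", "MUL"]
print(expand_reorder(instructions, [1]))           # same depth for all
print(expand_reorder(instructions, [0, 2, 1, 0]))  # per-instruction depth
```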

The Processor block can have multiple caches listed. Each cache has the name of the next level cache that it is associated with. The cache line has a hidden parameter (Hit_Ratio) to lock each cache to a specific performance. The actual hit ratio will be slightly different because of delays due to prefetch. All the caches do a continuous prefetch to keep up with the data requests. The miss at a cache is sent directly to the final DRAM. The prefetch during the context switching time is faster than the regular prefetch. This is because the first instructions do not require any intelligent prefetching.

The Processor block supports Multiple Instruction and Multiple Data access in the same cycle. For multiple instructions, the hidden parameter (Instructions_per_Cycle 4) can be added to the list in the Parameter_Values window. For multiple data accesses per cycle, each cache line can add an additional parameter (Words_per_Cache_Access=2). Multiple instructions are accessed in a cycle only if the next instructions can be sent to available execution units. If the multiple instruction value is 4 and the next 4 instructions access the same execution unit, only 1 will be sent out. If the next 4 instructions access 4 different execution units and they are all available, then 4 will be accessed.
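The issue rule described above can be sketched as a small counting function. The source only covers the two extreme cases (all instructions on the same unit, or all on different available units); the behavior in between, stopping at the first instruction whose unit is busy or already claimed in the cycle, is an assumption of this sketch, as is the function name.

```python
def issue_count(next_units, available_units, instructions_per_cycle):
    """Count how many upcoming instructions can be issued in this cycle.

    next_units: execution unit required by each upcoming instruction.
    Assumption: issue is in order and stops at the first conflict.
    """
    claimed = set()
    count = 0
    for unit in next_units[:instructions_per_cycle]:
        if unit not in available_units or unit in claimed:
            break
        claimed.add(unit)
        count += 1
    return count


units = {"INT_1", "INT_2", "FP_1", "FP_2"}
print(issue_count(["INT_1"] * 4, units, 4))                       # 1
print(issue_count(["INT_1", "INT_2", "FP_1", "FP_2"], units, 4))  # 4
```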

The Processor block has three pre-defined power mode states: Active (when the pipeline is busy), wait (when the pipeline is stalled) and standby (when no request is being processed). Additional power modes can be added to the Power_Manager. The getBlockStatus("Processor_name","length") call gives the length of the input queue, and getDeviceStatus(Architecture_Setup name, "Processor_Name") returns true/false depending on whether the Processor is busy.

The instruction sequence can be obtained in many ways. The TaskGenerator block can generate the instructions based on a profile. The annotate mechanism, documented in the Reference Guide Document, can also be used.

The Processor pipeline can make a request to a DMA. This is done from the Execution Unit. When the Instruction is listed with a prefix # in the Instruction Set, this indicates a load/store operation (this is a future feature). The specific operation is described in the Database ('Memory_Database_Reference' parameter) associated with this block and the target DMA block. When the task makes a DMA request, the request is sent to the DMA block and the task is placed in an interrupt queue. Any task waiting in the input queue will start processing after a context switching time. To see how a DMA can be connected to the processor in the current VisualSim version, refer to the Multi-Thread Processor demo.

The Context_Switch_Cycles parameter is the number of cycles of delay between the execution of two tasks and at the beginning of the simulation.
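To get a feel for what a Context_Switch_Cycles value costs in time, the cycles can be converted using the processor clock. This is a minimal sketch using the standard cycles-to-seconds relation; the function name and the 1000 MHz clock are assumptions for the example, not values from the block.

```python
def context_switch_seconds(context_switch_cycles, processor_speed_mhz):
    """Convert a context switch delay from cycles to seconds."""
    return context_switch_cycles / (processor_speed_mhz * 1.0e6)


# 200 cycles at an assumed 1000 MHz clock is 200 ns.
delay = context_switch_seconds(200, 1000.0)
print(f"{delay * 1e9:.1f} ns")
```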

If the instruction to be executed is prefixed with a *, it means that the branch needs to be taken. The pipeline is flushed and the sequence behaves like it came out of a loop. If the same jump instruction does not have a *, it means that it is still going through the loop.

If the user wants to add Branch Prediction to the pipeline flow, the user can make external request calls from the pipeline.

Preemption

The Processor block in VisualSim supports preemption at the instruction-level. Using this feature, the user can explore the impact of priority on the execution latency of tasks. To enable preemption, the user must add a hidden parameter called Preemption_Enable:true

Make sure to send one task at a time to the Processor. External to the processor, keep track of the A_Priority value of the currently executing task. The first task starts executing. When a task with a higher priority than the current task becomes available at the Proc_Preemption block, it is sent to the Processor block. The Processor interrupts the current task, sets certain fields in the current task and sends it out via the instr_out port. The fields set are A_Preempt:true (new field) and A_IDY=A_IDX, which is the last instruction executed. The new task starts executing. If the current task encounters a DMA (or large load operation), it starts the DMA by sending the task to the external DMA, and sends out the task on the instr_out port. The fields modified are A_Start_DMA=true and A_IDY=A_IDX. Another task can be sent in to start executing. When the DMA operation has completed and is returned to the Processor, it sends out the same data structure to the instr_out port. At that point, A_Start_DMA is false.
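The bookkeeping for a preempted task can be sketched with plain dictionaries. The field names come from the description above; the function names are hypothetical, and the rule that a larger A_Priority value means higher priority is an assumption, since the source does not state the ordering.

```python
def should_preempt(current_task, new_task):
    # Assumption: a larger A_Priority value means higher priority.
    return new_task["A_Priority"] > current_task["A_Priority"]


def preempt(current_task):
    """Return the task as it would leave instr_out after preemption."""
    preempted = dict(current_task)
    preempted["A_Preempt"] = True                # new field
    preempted["A_IDY"] = preempted["A_IDX"]      # last instruction executed
    return preempted


running = {"A_Priority": 1, "A_IDX": 42, "A_IDY": 44}
arriving = {"A_Priority": 5, "A_IDX": 0, "A_IDY": 0}
if should_preempt(running, arriving):
    print(preempt(running))
```

An external block that resubmits the preempted task later could use A_IDY to decide where execution resumes; how the resume point is applied is not described in the source.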

List of Usage Examples

17 examples are provided in the VS_AR/doc/Doc_Support/ directory of VisualSim. They are named Processor_Model_PPT_xx.xml.

List of Demonstration Templates

Parameter_Name               Parameter_Value
Processor_Instruction_Set:   MyInstructionSet    Name of the Instruction Set block associated with this Processor
Processor_Registers:         32                  Number of registers
Context_Switch_Cycles:       200                 Number of cycles between transactions: between two tasks, between a DMA request and the next task, or between a DMA return and the currently processing task. This must be a minimum of 10.
Processor_Speed_Mhz:         Processor_Speed     Speed of the processor in Mhz.
Instruction_Queue_Length:    6                   Length of the input queue, which holds multiple DS (tasks)
Instructions_per_Cycle:      6                   Optional parameter. Supports multiple instructions per cycle
ROB_Size:                    160                 Optional parameter. Specifies the reorder buffer size
I_1         {Processor_Name=Processor_1, Cache_Speed_Mhz=1000.0, Size_KBytes=16.0, Hit_Ratio=0.9999, Words_per_Cache_Access=1, Words_per_Cache_Line=16, Cache_Miss_Name=L2}
D_1         {Cache_Speed_Mhz=500.0, Size_KBytes=64.0, Words_per_Cache_Line=16, Cache_Miss_Name=L_2}
L_2         {Cache_Speed_Mhz=500.0, Size_KBytes=64.0, Words_per_Cache_Line=16, Cache_Miss_Name=Cache_1}

Explanation of the cache line: The I_1 line shows all the cache parameters. The D_1 line shows the minimum required parameters only.

I_1    Name of this cache
Processor_Name=Processor_1 Optional parameter. If cache is in another processor, then enter the name
Cache_Speed_Mhz=1000.0       Speed of the cache. Can be different from the Processor Speed
Size_KBytes=16.0            Size of the memory. Used to determine the cache boundary for miss request
Hit_Ratio=0.9999            Optional Parameter. Used to fix the hit ratio
Words_per_Cache_Access=1    Optional Parameter. This is the number of data access per cycle.
Words_per_Cache_Line=16    Required to identify the end of a line and generate miss, plus a prefetch

Outstanding_Req_Count=3 Optional Parameter. This is the number of outstanding requests that can be made to the corresponding memory. Used along with External cache.

Cache_Type=Load_Store Optional Parameter. This is used to specify that the cache is a Load_Store cache. This must be used only if none of the pipeline stages explicitly specifies the Load_Store cache name as an Execution_Location.

The Cache can be set up outside this block. If so, use the keyword "External_" + Cache_Name when setting the cache hierarchy. This cache can be a cycle-accurate cache.
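The cache entries in the parameter list follow a name plus {key=value, ...} layout, which is easy to check mechanically. The sketch below parses one such entry and reports which of the minimum parameters shown on the D_1 line are missing. The parser and its names are hypothetical helpers for illustration; they are not part of VisualSim.

```python
MINIMUM_KEYS = ("Cache_Speed_Mhz", "Size_KBytes",
                "Words_per_Cache_Line", "Cache_Miss_Name")


def parse_cache_line(line):
    """Split 'D_1 {A=1, B=2}' into ('D_1', {'A': '1', 'B': '2'})."""
    name, _, body = line.partition("{")
    fields = {}
    for item in body.strip().rstrip("}").split(","):
        key, _, value = item.strip().partition("=")
        if key:
            fields[key] = value
    return name.strip(), fields


def missing_keys(fields):
    return [key for key in MINIMUM_KEYS if key not in fields]


line = ("D_1         {Cache_Speed_Mhz=500.0, Size_KBytes=64.0, "
        "Words_per_Cache_Line=16, Cache_Miss_Name=L_2}")
name, fields = parse_cache_line(line)
print(name, fields["Cache_Miss_Name"], missing_keys(fields))
```

Values are kept as strings here; converting them to numbers is left out because the source does not specify the accepted formats.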

Instruction Set: The Processor_Instruction_Set is a separate block called Instruction_Set and in this case the name of the block is MyInstructionSet, see above. It contains information about each execution unit (INT_n, FP_n), where INT means integer, and FP means floating point.
"begin INT_1 ... end INT_1" defines the instructions for this execution unit. One can group Processor instructions, if they have the same number of cycles, or one can list the entire instruction set.

   Mnew Min   Max   ; /* Label */
   PROC   INT_1 FP_1 ;

   begin size_config ; /* Specify the Load store instructions */
   Read 3 32 LDR,LDUR ; /* <Command> <PipelineStage number> <Size in bits> <Instruction/Instructions/ Execution_Unit[startIndex:endIndex]> */
   Write 3 32 STR ;
   end size_config   ;

   begin execUnit_config           ;   /* Specify the Execution unit queue sizes */
   Queue_Size    INT_1     2 ;
   Queue_Size    FP_1        2    ;
   end execUnit_config             ;

   begin INT_1       ; /* Group */
   ADD   2           ;
   SUB   2           ;
   *b    2           ;
   MUL   4           ;
   DIV   4           ;
   LDR   1           ;
   LDUR 1           ;
   STR   1           ;
   end   INT_1       ;

   begin FP_1       ; /* Group */
   FADD   2           ;
   FSUB   2           ;
   FMUL   4    8       ;
   FDIV   4    12      ;
   end   FP_1       ;

The entry FADD 2 ; means that the instruction "FADD" will take 2 cycles to complete execution (not including the I_Cache access latency and pipeline transfer latency).

The entry FMUL 4 8 ; means that the instruction FMUL takes a random number of cycles between 4 and 8 (uniform distribution) to complete execution.
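The fixed and range entries can be read programmatically, which helps when checking a large instruction set. The following sketch parses one "begin ... end" group and draws a cycle count for an instruction. It is an illustration under assumptions: the parser is hypothetical, and the range is treated as an inclusive uniform integer range, consistent with the rule that instruction cycles are integers.

```python
import random

FP_1_TEXT = """
begin FP_1 ; /* Group */
FADD 2 ;
FSUB 2 ;
FMUL 4 8 ;
FDIV 4 12 ;
end FP_1 ;
"""


def parse_unit(text, unit):
    """Return {mnemonic: (min_cycles, max_cycles)} for one execution unit."""
    cycles, inside = {}, False
    for raw in text.splitlines():
        tokens = raw.split(";")[0].split()   # drop the trailing comment
        if tokens[:2] == ["begin", unit]:
            inside = True
        elif tokens[:2] == ["end", unit]:
            inside = False
        elif inside and len(tokens) >= 2:
            low = int(tokens[1])
            high = int(tokens[2]) if len(tokens) > 2 else low
            cycles[tokens[0]] = (low, high)
    return cycles


def sample_cycles(cycles, mnemonic, rng=random):
    """Fixed delay when min == max, otherwise a uniform integer draw."""
    low, high = cycles[mnemonic]
    return rng.randint(low, high)


fp_1 = parse_unit(FP_1_TEXT, "FP_1")
print(fp_1["FADD"], fp_1["FMUL"], sample_cycles(fp_1, "FMUL"))
```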

The Pipeline is a separate parameter window in the Processor block and can vary from two to twenty stages, depending on the processor being modeled.

Stage_Name Execute_Location Action Condition ;
1_PREFETCH I_1               instr   none      ; // I_Cache access
2_DECODE    I_1               wait    none      ;
3_DISPATCH D_1               issue   6         ; // from the dispatch stage, the width is set to be a max of 6 uop per cycle
3_EXECUTE   INT               exec    none      ; // instruction execution

The Pipeline shown above is the classic four stage pipeline for prefetch, decode, execute and store back results. More advanced pipeline execution can be modeled and references can be made to other processors and external blocks.

There are 4 columns to the pipeline.
The first column has the stage number followed by a "_" and an identifier. Multiple lines can be defined for a pipeline stage. The name can be a descriptive value. The stage number must be an integer and in order. The number of stages must match the Number_of_Pipelines_Stages parameter of the processor.

The second column is the location where the line must be executed. This can be a cache or execution unit of this block. In addition, it can be an execution unit of another processor or another custom block that has a path defined in the Routing Table. To learn more about the Routing Table, review the Architecture Setup document or the Advanced Modeling Guide.

The third column specifies what action needs to be performed. The possible options are instr, read, write, wait and exec. instr represents an instruction access and is a keyword. For a data access, the Action can be a read or write. When the request needs to wait for a response, the wait action is added. If the pipeline needs to do an external action, such as accessing a hardware engine or a co-processor or writing data via an external definition, add the "task" keyword here. In that case the fourth column, the Condition column, must specify the destination for the task. All other actions do not use the Condition column.

For an external task, the Execute_Location refers to a Scheduler or other named device outside of this Processor block. The Condition column refers to the Instruction Unit that has the list of instructions that will be executed in the external device.
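The four-column pipeline layout described above can be read into a per-stage structure, which makes it easy to see which lines share a stage number. This is a sketch with a hypothetical parser; it uses the example pipeline from this section and assumes that "//" starts a comment and ";" ends a row.

```python
PIPELINE_TEXT = """
Stage_Name Execute_Location Action Condition ;
1_PREFETCH I_1 instr none ; // I_Cache access
2_DECODE I_1 wait none ;
3_DISPATCH D_1 issue 6 ;
3_EXECUTE INT exec none ; // instruction execution
"""


def parse_pipeline(text):
    """Group pipeline rows by their integer stage number."""
    stages = {}
    for raw in text.splitlines():
        row = raw.split("//")[0].split(";")[0].split()
        if len(row) != 4 or row[0] == "Stage_Name":
            continue                     # skip blanks and the header row
        number, _, label = row[0].partition("_")
        stages.setdefault(int(number), []).append({
            "label": label,
            "location": row[1],
            "action": row[2],
            "condition": row[3],
        })
    return stages


for number, lines in sorted(parse_pipeline(PIPELINE_TEXT).items()):
    print(number, [line["label"] for line in lines])
```

In the example, the DISPATCH and EXECUTE lines both carry stage number 3, which matches the statement that one stage can have multiple lines.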

The instructions are received in a Data Structure arriving on the instr_in port on the left side. The Data Structure is a task and can contain multiple instructions. The instructions supported by this block are listed in the Instruction_Table. The tasks are stored in the Instruction Queue. The length is defined by the parameter Instruction_Queue_Length. The head of the Instruction Queue is sent to the pipeline. The instructions within a task are executed in sequence.
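The Instruction Queue behavior, together with the reject_out port documented for this block (an incoming task is rejected when the queue is full), can be pictured as a bounded first-in, first-out queue. The class below is a simplified sketch with hypothetical names; it ignores priorities, preemption and timing.

```python
from collections import deque


class InstructionQueue:
    """Bounded FIFO of tasks; overflow goes to a reject list."""

    def __init__(self, instruction_queue_length):
        self.length = instruction_queue_length
        self.tasks = deque()
        self.rejected = []               # stands in for reject_out

    def offer(self, task):
        if len(self.tasks) >= self.length:
            self.rejected.append(task)
            return False
        self.tasks.append(task)
        return True

    def next_task(self):
        """Head of the queue, which is sent to the pipeline."""
        return self.tasks.popleft() if self.tasks else None


queue = InstructionQueue(instruction_queue_length=2)
for name in ("task_a", "task_b", "task_c"):
    queue.offer(name)
print(queue.next_task(), queue.rejected)
```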

There are three sets of statistics and a series of Timing Diagrams available as standard for the Processor. The timing diagrams are for the Register Read, Register Write, I_1, D_1, L_2 (if available), INT_1, INT_2 (if available), FP_1 (if available), and FP_2 (if available). The first set is made up of the statistics that are added to the Data Structure when the task has completed processing. This update is available at the instr_out port or the Software_Mapper, depending on where the task originated. The list is:

CYCLES_IN_PROCESSOR           = 603.0 (Number of cycles in the processor for this task)
CYCLES_PER_INSTRUCTION        = 1.206 (CYCLES_IN_PROCESSOR/size of (A_Instruction))
MHZ_PROCESSOR                 = 2.0E9 (Final processor speed. This will be different if the clock speed has been modified by an instruction.)
MIPS_IN_PROCESSOR       = 1400.560224089636 (Millions of Instructions per second)
TIME_IN_PROCESSOR       = 3.57E-7 (Duration of time in the Processor)
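The sample values above are consistent with each other, which can be checked with a few lines of arithmetic. The instruction count of 500 is inferred from CYCLES_IN_PROCESSOR / CYCLES_PER_INSTRUCTION and is an assumption about the run that produced these numbers; the MIPS relation used here (instructions divided by time, in millions) is the standard definition and reproduces the listed value.

```python
cycles_in_processor = 603.0
time_in_processor = 3.57e-7      # seconds
instruction_count = 500          # inferred: 603 / 1.206

cycles_per_instruction = cycles_in_processor / instruction_count
mips_in_processor = instruction_count / time_in_processor / 1.0e6

print(round(cycles_per_instruction, 3))   # 1.206
print(round(mips_in_processor, 2))        # 1400.56
```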

The second set contains the utilization metrics of the caches, registers, pipeline and execution units. There is a difference between PROC (Processor) and Pipeline. The utilization of the pipeline indicates the percentage of time the pipeline is in the Active state. The processor utilization is the total number of instructions processed over time.

Processor_1_D_1_Utilization_Pct_Max    = 5.245,
Processor_1_D_1_Utilization_Pct_Mean    = 5.245,
Processor_1_D_1_Utilization_Pct_Min    = 5.245,
Processor_1_D_1_Utilization_Pct_StDev    = 0.0,
Processor_1_FP_1_Utilization_Pct_Max    = 1.145,
Processor_1_FP_1_Utilization_Pct_Mean    = 1.145,
Processor_1_FP_1_Utilization_Pct_Min    = 1.145,
Processor_1_FP_1_Utilization_Pct_StDev    = 0.0,
Processor_1_FP_2_Utilization_Pct_Max    = 0.57,
Processor_1_FP_2_Utilization_Pct_Mean    = 0.57,
Processor_1_FP_2_Utilization_Pct_Min    = 0.57,
Processor_1_FP_2_Utilization_Pct_StDev    = 0.0,
Processor_1_INT_1_Utilization_Pct_Max    = 2.87,
Processor_1_INT_1_Utilization_Pct_Mean    = 2.87,
Processor_1_INT_1_Utilization_Pct_Min    = 2.87,
Processor_1_INT_1_Utilization_Pct_StDev    = 0.0,
Processor_1_INT_2_Utilization_Pct_Max    = 1.145,
Processor_1_INT_2_Utilization_Pct_Mean    = 1.145,
Processor_1_INT_2_Utilization_Pct_Min    = 1.145,
Processor_1_INT_2_Utilization_Pct_StDev    = 0.0,
Processor_1_I_1_Utilization_Pct_Max    = 4.925,
Processor_1_I_1_Utilization_Pct_Mean    = 4.925,
Processor_1_I_1_Utilization_Pct_Min    = 4.925,
Processor_1_I_1_Utilization_Pct_StDev    = 0.0,
Processor_1_L_2_Utilization_Pct_Max    = 4.075,
Processor_1_L_2_Utilization_Pct_Mean    = 4.075,
Processor_1_L_2_Utilization_Pct_Min    = 4.075,
Processor_1_L_2_Utilization_Pct_StDev    = 0.0,
Processor_1_PROC_Utilization_Pct_Max    = 2.3733333333333,
Processor_1_PROC_Utilization_Pct_Mean    = 2.3733333333333,
Processor_1_PROC_Utilization_Pct_Min    = 2.3733333333333,
Processor_1_PROC_Utilization_Pct_StDev    = 0.0,
Processor_1_Pipeline_Utilization_Pct_Max    = 2.885,
Processor_1_Pipeline_Utilization_Pct_Mean    = 2.885,
Processor_1_Pipeline_Utilization_Pct_Min    = 2.885,
Processor_1_Pipeline_Utilization_Pct_StDev    = 0.0,
Processor_1_Register_Rd_Utilization_Pct_Max    = 0.955,
Processor_1_Register_Rd_Utilization_Pct_Mean    = 0.955,
Processor_1_Register_Rd_Utilization_Pct_Min    = 0.955,
Processor_1_Register_Rd_Utilization_Pct_StDev    = 0.0,
Processor_1_Register_Wr_Utilization_Pct_Max    = 0.43,
Processor_1_Register_Wr_Utilization_Pct_Mean    = 0.43,
Processor_1_Register_Wr_Utilization_Pct_Min    = 0.43,
Processor_1_Register_Wr_Utilization_Pct_StDev    = 0.0,

The last set contains the throughput metrics for the caches, registers, pipeline and execution units. The context switch time is defined in the Processor parameters. The statistics give the percentage of time that was consumed by context switching, that is, by switching between tasks. The KB_per_Thread gives a measure of the amount of cache needed to complete the processing, which gives an idea of the size of the cache required. The stall time is a statistic that shows the time spent by the Task in getting data or making IO calls. This is the time that the task is holding the pipeline but not doing anything with it. The Task delay is an average over all the tasks that are executed on this processor. The L_2 hit ratio field is not currently used; the plan is to add it in the future. The KB_per_Thread is also not used.

Processor_1_Context_Switch_Time_Pct_Max    = 45.095,
Processor_1_Context_Switch_Time_Pct_Mean    = 45.095,
Processor_1_Context_Switch_Time_Pct_Min    = 45.095,
Processor_1_Context_Switch_Time_Pct_StDev    = 0.0,
Processor_1_D_1_Hit_Ratio_Max    = 100.0,
Processor_1_D_1_Hit_Ratio_Mean    = 13.4920634920635,
Processor_1_D_1_Hit_Ratio_Min    = 0.0,
Processor_1_D_1_Hit_Ratio_StDev    = 29.3655668615204,
Processor_1_D_1_KB_per_Thread_Max    = 0.0,
Processor_1_D_1_KB_per_Thread_Mean    = 0.0,
Processor_1_D_1_KB_per_Thread_Min    = 0.0,
Processor_1_D_1_KB_per_Thread_StDev    = 0.0,
Processor_1_I_1_Hit_Ratio_Max    = 100.0,
Processor_1_I_1_Hit_Ratio_Mean    = 40.8549783549784,
Processor_1_I_1_Hit_Ratio_Min    = 0.0,
Processor_1_I_1_Hit_Ratio_StDev    = 44.307741292233,
Processor_1_I_1_KB_per_Thread_Max    = 0.0,
Processor_1_I_1_KB_per_Thread_Mean    = 0.0,
Processor_1_I_1_KB_per_Thread_Min    = 0.0,
Processor_1_I_1_KB_per_Thread_StDev    = 0.0,
Processor_1_L_2_Hit_Ratio_Max    = 0.0,
Processor_1_L_2_Hit_Ratio_Mean    = 0.0,
Processor_1_L_2_Hit_Ratio_Min    = 0.0,
Processor_1_L_2_Hit_Ratio_StDev    = 0.0,
Processor_1_L_2_KB_per_Thread_Max    = 0.0,
Processor_1_L_2_KB_per_Thread_Mean    = 0.0,
Processor_1_L_2_KB_per_Thread_Min    = 0.0,
Processor_1_L_2_KB_per_Thread_StDev    = 0.0,
Processor_1_Stall_Time_Pct_Max    = 50.715,
Processor_1_Stall_Time_Pct_Mean    = 50.715,
Processor_1_Stall_Time_Pct_Min    = 50.715,
Processor_1_Stall_Time_Pct_StDev    = 0.0,
Processor_1_Task_Delay_Max    = 2.737E-6,
Processor_1_Task_Delay_Mean    = 2.2243766233766E-6,
Processor_1_Task_Delay_Min    = 1.39E-7,
Processor_1_Task_Delay_StDev    = 4.1390570393254E-7,

Detailed Documentation:
The Advanced Modeling Guide has comprehensive information on the processor.

Block Keywords:
INT_n -- name of integer execution units, 1 through n
FP_n -- name of floating point execution units, 1 through n

Architectural_Name

This is the name of the ArchitectureSetup block that this Processor is associated with. The Architecture_Setup block maintains the routing table and statistics collection. Type is String

Processor_Name

This is a unique name of the Processor block. No other architecture component can have this name. Type is String

Processor_Setup

Pipeline_Stages

Processor_Bits

String attribute. Width of the Processor in bits, which forms a processor word. Pull-down selection of 16, 32 or 64.

instr_in

Instruction input port. This can be connected to any VisualSim library block, model of a RTOS or custom-code. This can also be connected to a Bus_port or other blocks. The type is general.

instr_out

Instruction output port. This can be connected to any VisualSim library block, model of a RTOS or custom-code. This can also be connected to a Bus_port or other blocks. The type is general.

bus_out

Bus output port. This is one of two Bus connections on the left side (East). The type is general.

bus_in

Bus input port. This is one of two Bus connections on the left side (East). The type is general.

bus_out2

Bus output2 port. This is one of two Bus connections on the left side (East). The type is general.

bus_in2

Bus input2 port. This is one of two Bus connections on the left side (East). The type is general.

reject_out

Reject output port. When the instruction queue is full, the incoming instruction is rejected and placed on this port. The type is general.

dma_out

This is an output port through which the processor sends out interrupts to the DMA. The Processor then continues its operation while the DMA carries out the functions intended for the particular instruction sent out by the processor. (Future feature, but ports have been allocated in this version.)

dma_in

This is an input port. When the DMA completes its task, the task comes back to the processor, so the processor knows that the task assigned to the DMA has been completed. (Future feature, but ports have been allocated in this version.)
