Block Overview
The Processor block in VisualSim supports preemption at the instruction level. Using this feature, the user can explore the impact of priority on the execution latency of tasks. To enable preemption, the user must add a hidden parameter called Preemption_Enable and set it to true (Preemption_Enable:true).
Make sure to send one task at a time to the Processor. External to the Processor, keep track of the A_Priority value of the currently executing task. The first task starts executing. When a task with a higher priority than the current task becomes available at the Proc_Preemption block, it is sent to the Processor block. The Processor interrupts the current task, sets certain fields in it and sends it out via the instr_out port. The fields set are A_Preempt:true (a new field) and A_IDY=A_IDX, which is the last instruction executed. The new task starts executing. If the current task encounters a DMA (or large load) operation, the Processor starts the DMA by sending the task to the external DMA and also sends the task out on the instr_out port. The fields modified are A_Start_DMA=true and A_IDY=A_IDX. Another task can then be sent in to start executing. When the DMA operation has completed and the task is returned to the Processor, the Processor sends the same data structure out on the instr_out port, this time with A_Start_DMA:false.
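The external bookkeeping described above can be sketched in Python. This is only an illustration and not VisualSim code: the Task class and the should_preempt and preempt helpers are hypothetical, and the sketch assumes that a larger A_Priority value means a higher priority. Only the field names A_Priority, A_Preempt, A_IDX and A_IDY come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task Data Structure."""
    name: str
    fields: dict = field(default_factory=dict)


def should_preempt(current, candidate):
    # Assumption: a larger A_Priority value means a higher priority.
    return candidate.fields["A_Priority"] > current.fields["A_Priority"]


def preempt(current):
    """Mark the interrupted task the way the text describes."""
    current.fields["A_Preempt"] = True
    current.fields["A_IDY"] = current.fields["A_IDX"]  # last instruction executed
    return current


running = Task("low", {"A_Priority": 1, "A_IDX": 42})
incoming = Task("high", {"A_Priority": 5, "A_IDX": 0})

if should_preempt(running, incoming):
    sent_out = preempt(running)  # would leave on the instr_out port
    running = incoming           # the new task starts executing
```

The interrupted task keeps A_IDY, so an external scheduler could later resubmit it and resume from that instruction.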
17 examples are provided in the VS_AR/doc/Doc_Support/ directory of the VisualSim installation. They are named Processor_Model_PPT_xx.xml.
Instruction Set Setup
Architecture Setup
Timing Diagram
Basic Processor Model - Shows the setup of the processor block
Multi-Thread Processor - Shows the definition of a multi-threaded processor
Multi-Core Processor - Shows the use of the Processor block for defining multi-core
SIMD Processor Model - Single Instruction, Multiple Data
MIMD Processor Model - Multiple Instruction, Multiple Data
Using SoftGen to generate profile-based synthetic instructions for the Processor
MIMD Processor Model - Multiple Instruction, Multiple Data
Changing the Clock Speed - Used when a specific operation or the stage of the pipeline needs to be expanded
Using multi cycle delay for the flush from annotate C-code
Preemption Enabled - Adding preemption to the Processor
Static Checker - Check the correctness of the Optional Parameters
Parameter_Value Window
The Parameter_Value column can also reference other model parameters.
Parameter_Name               Parameter_Value    Explanation
Processor_Instruction_Set:   MyInstructionSet   Name of the Instruction Set block associated with this Processor
Processor_Registers:         32                 Number of registers
Context_Switch_Cycles:       200                Number of cycles between transactions: between two tasks, between a DMA request and the next task, or between a DMA return and the currently processing task. This must be a minimum of 10.
Processor_Speed_Mhz:         Processor_Speed    Speed of the processor in MHz
Instruction_Queue_Length:    6                  Length of the input queue, which holds multiple Data Structures (tasks)
Instructions_per_Cycle:      6                  Optional parameter. Supports multiple instructions per cycle
Pipeline_Stages:             4                  Number of pipeline stages in the processor
INT_Execution_Units:         2                  Must match the number of INT units in the Instruction Set
FP_Execution_Units:          0                  Must match the number of FP units in the Instruction Set
Memory_Database_Reference:   MyDatabase         Name of the DMADatabase block
Cache_Execution_Units:       3                  Must match the number of caches defined in this block
I_1 {Processor_Name=Processor_1, Cache_Speed_Mhz=1000.0, Size_KBytes=16.0, Hit_Ratio=0.9999, Words_per_Cache_Access=1, Words_per_Cache_Line=16, Cache_Miss_Name=L2}
D_1 {Cache_Speed_Mhz=500.0, Size_KBytes=64.0, Words_per_Cache_Line=16, Cache_Miss_Name=L_2}
L_2 {Cache_Speed_Mhz=500.0, Size_KBytes=64.0, Words_per_Cache_Line=16, Cache_Miss_Name=Cache_1}
Explanation of the cache lines: the I_1 line shows all the cache parameters. The D_1 line shows only the minimum required parameters.
I_1                          Name of this cache
Processor_Name=Processor_1   Optional parameter. If the cache is in another processor, enter the name of that processor
Cache_Speed_Mhz=1000.0       Speed of the cache. Can be different from the Processor speed
Size_KBytes=16.0             Size of the memory. Used to determine the cache boundary for a miss request
Hit_Ratio=0.9999             Optional parameter. Used to fix the hit ratio
Words_per_Cache_Access=1     Optional parameter. This is the number of data accesses per cycle
Words_per_Cache_Line=16      Required to identify the end of a line and generate a miss, plus a prefetch
Cache_Miss_Name=L2           Next-level memory when there is a miss
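The cache definition format above lends itself to a simple check before a model is run. The following Python sketch parses one definition into a dictionary and reports missing parameters. It is not part of VisualSim: parse_cache and missing_required are hypothetical helpers, and the REQUIRED set is an assumption taken from the D_1 line, which the text describes as the minimum required.

```python
# Assumption: the four parameters on the D_1 line are the required ones.
REQUIRED = {"Cache_Speed_Mhz", "Size_KBytes", "Words_per_Cache_Line", "Cache_Miss_Name"}


def parse_cache(definition):
    """Turn '{A=1, B=x}' into a dict, converting numeric values to float."""
    body = definition.strip().strip("{}")
    params = {}
    for item in body.split(","):
        key, value = item.split("=", 1)
        key, value = key.strip(), value.strip()
        try:
            params[key] = float(value)
        except ValueError:
            params[key] = value  # non-numeric values such as cache names
    return params


def missing_required(params):
    """Return the required parameter names that are absent, sorted."""
    return sorted(REQUIRED - set(params))


d_1 = parse_cache(
    "{Cache_Speed_Mhz=500.0, Size_KBytes=64.0, "
    "Words_per_Cache_Line=16, Cache_Miss_Name=L_2}"
)
print(d_1["Cache_Miss_Name"], missing_required(d_1))
```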
Instruction Set: The Processor_Instruction_Set parameter refers to a separate block called Instruction_Set; in this case the name of the block is MyInstructionSet (see above). It contains information about each execution unit (INT_n, FP_n), where INT means integer and FP means floating point. "begin INT_1 ... end INT_1" defines the instructions for this execution unit. One can group Processor instructions if they have the same number of cycles, or one can list the entire instruction set.
Mnew         Min Max ;        /* Label */
PROC         INT_1 FP_1 ;
begin        INT_1 ;          /* Group */
ADD          2 ;
SUB          2 ;
CLOCK_2GHZ   20 ;
*b           2 ;
MUL          4 ;
DIV          4 ;
end          INT_1 ;
begin        FP_1 ;           /* Group */
FADD         2 ;
FSUB         2 ;
FMUL         4 ;
FDIV         4 ;
end          FP_1 ;
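To make the "begin ... end" grouping concrete, here is a small Python sketch that reads an instruction set listing into a per-unit table of cycle counts. It is an illustration only: parse_instruction_set is a hypothetical helper, not a VisualSim function, and it assumes the input contains no /* ... */ comments and that every instruction line has exactly a mnemonic and a cycle count.

```python
def parse_instruction_set(text):
    """Return {unit: {mnemonic: cycles}} from 'begin U ; ... end U ;' groups."""
    units = {}
    current = None
    for statement in text.split(";"):
        tokens = statement.split()
        if not tokens:
            continue
        if tokens[0] == "begin":
            current = tokens[1]
            units[current] = {}
        elif tokens[0] == "end":
            current = None
        elif current is not None and len(tokens) == 2:
            units[current][tokens[0]] = int(tokens[1])
        # Statements outside a group (the header rows) are ignored.
    return units


sample = """
PROC INT_1 FP_1 ;
begin INT_1 ;
ADD 2 ; SUB 2 ; MUL 4 ; DIV 4 ;
end INT_1 ;
begin FP_1 ;
FADD 2 ; FMUL 4 ;
end FP_1 ;
"""
table = parse_instruction_set(sample)
print(table["INT_1"]["MUL"])
```

A table of this shape makes it easy to confirm that the number of INT and FP groups agrees with the INT_Execution_Units and FP_Execution_Units parameters.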
The Pipeline is a separate parameter window in the Processor block and can vary from two to twenty stages, depending on the processor being modeled.
Stage_Name   Execute_Location   Action   Condition ;
1_PREFETCH   I_1                instr    none ;
1_PREFETCH   D_1                read     none ;
2_DECODE     I_1                wait     none ;
3_EXECUTE    D_1                wait     none ;
3_EXECUTE    INT                exec     none ;
4_STORE      D_1                write    none ;
2_External   Co_Pro             task     INT_2 ;
The Pipeline shown above is the classic four stage pipeline for prefetch, decode, execute and store back results. More advanced pipeline execution can be modeled and references can be made to other processors and external blocks.
There are four columns in the pipeline definition.
The first column has the stage number followed by a "_" and an identifier. Multiple lines can be defined for a pipeline stage. The identifier can be any descriptive name. The stage number must be an integer and the stages must be listed in order. The number of stages must match the Number_of_Pipelines_Stages parameter of the processor.
The second column is the location where the line must be executed. This can be a cache or an execution unit of this block. In addition, it can be an execution unit of another processor or another custom block that has a path defined in the Routing Table. To learn more about the Routing Table, review the Architecture Setup document or the Advanced Modeling Guide.
The third column specifies the action to be performed. The possible options are instr, read, write, wait, exec and task. instr is a keyword that represents an instruction access. For a data access, the action can be read or write. When the request needs to wait for a response, the wait action is added. If the pipeline needs to perform an external action, such as accessing a hardware engine or a co-processor, or writing data via an external definition, add the task keyword here. For a task action, the fourth column, i.e. the Condition column, must specify the destination for the task. All other actions do not use the Condition column.
For an external task, the Execute_Location refers to a Scheduler or other named device outside of this Processor block. The Condition column refers to the Instruction Unit that holds the list of instructions to be executed in the external device.
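The ordering and stage-count rules for the first column can be checked mechanically. The Python sketch below parses the first six rows of the pipeline table shown earlier and validates them. parse_pipeline and stages_valid are hypothetical helpers written for this illustration, and the assumption that stage numbers start at 1 is taken from the example rather than from a stated rule.

```python
def parse_pipeline(text):
    """Return (stage_number, label, location, action, condition) tuples."""
    rows = []
    for statement in text.split(";"):
        tokens = statement.split()
        if len(tokens) != 4:
            continue
        number, _, label = tokens[0].partition("_")
        if not number.isdigit():
            continue  # skips the column-header row
        rows.append((int(number), label, tokens[1], tokens[2], tokens[3]))
    return rows


def stages_valid(rows, expected_stages):
    """True when stage numbers never decrease and cover 1..expected_stages."""
    numbers = [row[0] for row in rows]
    in_order = all(a <= b for a, b in zip(numbers, numbers[1:]))
    return in_order and sorted(set(numbers)) == list(range(1, expected_stages + 1))


pipeline_text = """
Stage_Name Execute_Location Action Condition ;
1_PREFETCH I_1 instr none ;
1_PREFETCH D_1 read none ;
2_DECODE I_1 wait none ;
3_EXECUTE D_1 wait none ;
3_EXECUTE INT exec none ;
4_STORE D_1 write none ;
"""
rows = parse_pipeline(pipeline_text)
print(len(rows), stages_valid(rows, 4))
```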
Operation of the Pipeline:
The instructions are received in a Data Structure arriving on the Instr_In port on the left-side. The Data Structure is a task and can contain multiple instructions. The list of instructions supported by this block is listed in the Instruction_Table. The tasks are stored in the Instruction Queue. The length is defined by the parameter Instruction_Queue_Length. The head of the Instruction Queue is sent to the pipeline. The instructions within a task are executed in sequence.
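The queueing behaviour can be pictured with a short Python sketch. It models only what this page states: tasks wait in a queue of Instruction_Queue_Length entries, the head of the queue goes to the pipeline, and a task arriving at a full queue is rejected. The InstructionQueue class and its method names are hypothetical and do not correspond to any VisualSim API.

```python
from collections import deque


class InstructionQueue:
    """Toy model of the Processor input queue."""

    def __init__(self, length):
        self.length = length   # Instruction_Queue_Length
        self.tasks = deque()
        self.rejected = []     # stands in for the reject output port

    def offer(self, task):
        """Queue the task, or reject it when the queue is full."""
        if len(self.tasks) >= self.length:
            self.rejected.append(task)
            return False
        self.tasks.append(task)
        return True

    def next_task(self):
        """Pop the head of the queue for the pipeline, or None if empty."""
        return self.tasks.popleft() if self.tasks else None


queue = InstructionQueue(length=2)
results = [queue.offer(name) for name in ["t1", "t2", "t3"]]
head = queue.next_task()
print(results, head, queue.rejected)
```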
Statistics and Plotting
There are three sets of statistics and a series of Timing Diagrams available as standard for the Processor. The timing diagrams are for the Register Read, Register Write, I_1, D_1, L_2 (if available), INT_1, INT_2 (if available), FP_1 (if available) and FP_2 (if available). The first set consists of the statistics that are added to the Data Structure when the task has completed processing. This update is available at the instr_out port and the Software_Mapper, depending on where the task originated. The statistics are:
CYCLES_IN_PROCESSOR = 603.0 (Number of cycles in the processor for this task)
CYCLES_PER_INSTRUCTION = 1.206 (CYCLES_IN_PROCESSOR / size of (A_Instruction))
MHZ_PROCESSOR = 2.0E9 (Final processor speed. This will be different if the clock speed has been modified by an instruction.)
MIPS_IN_PROCESSOR = 1400.560224089636 (Millions of instructions per second)
TIME_IN_PROCESSOR = 3.57E-7 (Duration of time in the Processor)
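The relationship between these values can be reproduced with a few lines of Python. The instruction count of 500 is not listed on this page; it is inferred here from 603.0 / 1.206. The MIPS formula (instructions divided by time in the processor) is likewise an assumption, chosen because it matches the reported number.

```python
cycles_in_processor = 603.0
time_in_processor = 3.57e-7   # seconds
instruction_count = 500       # assumption: inferred from 603.0 / 1.206

# CYCLES_IN_PROCESSOR divided by the number of instructions in the task
cycles_per_instruction = cycles_in_processor / instruction_count

# Assumed definition: instructions per second, expressed in millions
mips_in_processor = instruction_count / time_in_processor / 1e6

print(round(cycles_per_instruction, 3))
print(round(mips_in_processor, 2))
```

Rounded to the precision shown, the results agree with the CYCLES_PER_INSTRUCTION and MIPS_IN_PROCESSOR values listed above.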
The second set contains the utilization metrics of the caches, registers, pipeline and execution units. There is a difference between PROC (Processor) and Pipeline. The utilization of the pipeline indicates the percentage of time the pipeline is in the Active state. The processor utilization is the total number of instructions processed over time.
Processor_1_D_1_Utilization_Pct_Max = 5.245,
Processor_1_D_1_Utilization_Pct_Mean = 5.245,
Processor_1_D_1_Utilization_Pct_Min = 5.245,
Processor_1_D_1_Utilization_Pct_StDev = 0.0,
Processor_1_FP_1_Utilization_Pct_Max = 1.145,
Processor_1_FP_1_Utilization_Pct_Mean = 1.145,
Processor_1_FP_1_Utilization_Pct_Min = 1.145,
Processor_1_FP_1_Utilization_Pct_StDev = 0.0,
Processor_1_FP_2_Utilization_Pct_Max = 0.57,
Processor_1_FP_2_Utilization_Pct_Mean = 0.57,
Processor_1_FP_2_Utilization_Pct_Min = 0.57,
Processor_1_FP_2_Utilization_Pct_StDev = 0.0,
Processor_1_INT_1_Utilization_Pct_Max = 2.87,
Processor_1_INT_1_Utilization_Pct_Mean = 2.87,
Processor_1_INT_1_Utilization_Pct_Min = 2.87,
Processor_1_INT_1_Utilization_Pct_StDev = 0.0,
Processor_1_INT_2_Utilization_Pct_Max = 1.145,
Processor_1_INT_2_Utilization_Pct_Mean = 1.145,
Processor_1_INT_2_Utilization_Pct_Min = 1.145,
Processor_1_INT_2_Utilization_Pct_StDev = 0.0,
Processor_1_I_1_Utilization_Pct_Max = 4.925,
Processor_1_I_1_Utilization_Pct_Mean = 4.925,
Processor_1_I_1_Utilization_Pct_Min = 4.925,
Processor_1_I_1_Utilization_Pct_StDev = 0.0,
Processor_1_L_2_Utilization_Pct_Max = 4.075,
Processor_1_L_2_Utilization_Pct_Mean = 4.075,
Processor_1_L_2_Utilization_Pct_Min = 4.075,
Processor_1_L_2_Utilization_Pct_StDev = 0.0,
Processor_1_PROC_Utilization_Pct_Max = 2.3733333333333,
Processor_1_PROC_Utilization_Pct_Mean = 2.3733333333333,
Processor_1_PROC_Utilization_Pct_Min = 2.3733333333333,
Processor_1_PROC_Utilization_Pct_StDev = 0.0,
Processor_1_Pipeline_Utilization_Pct_Max = 2.885,
Processor_1_Pipeline_Utilization_Pct_Mean = 2.885,
Processor_1_Pipeline_Utilization_Pct_Min = 2.885,
Processor_1_Pipeline_Utilization_Pct_StDev = 0.0,
Processor_1_Register_Rd_Utilization_Pct_Max = 0.955,
Processor_1_Register_Rd_Utilization_Pct_Mean = 0.955,
Processor_1_Register_Rd_Utilization_Pct_Min = 0.955,
Processor_1_Register_Rd_Utilization_Pct_StDev = 0.0,
Processor_1_Register_Wr_Utilization_Pct_Max = 0.43,
Processor_1_Register_Wr_Utilization_Pct_Mean = 0.43,
Processor_1_Register_Wr_Utilization_Pct_Min = 0.43,
Processor_1_Register_Wr_Utilization_Pct_StDev = 0.0,
The last set contains the throughput metrics for the caches, registers, pipeline and execution units. The context switch time is defined in the Processor parameters. The Context_Switch_Time_Pct statistic measures the percentage of time consumed by context switching, that is, by switching between tasks. The KB_per_Thread statistic is intended to measure the amount of cache needed to complete the processing, which gives an idea of the cache size required. The stall time statistic shows the time the Task spends getting data or making I/O calls; during this time the task holds the pipeline but does nothing with it. The Task delay is an average over all the tasks executed on this processor. The L_2 hit ratio field is not currently used; the plan is to add it in the future. The KB_per_Thread field is also not currently used.
Processor_1_Context_Switch_Time_Pct_Max = 45.095,
Processor_1_Context_Switch_Time_Pct_Mean = 45.095,
Processor_1_Context_Switch_Time_Pct_Min = 45.095,
Processor_1_Context_Switch_Time_Pct_StDev = 0.0,
Processor_1_D_1_Hit_Ratio_Max = 100.0,
Processor_1_D_1_Hit_Ratio_Mean = 13.4920634920635,
Processor_1_D_1_Hit_Ratio_Min = 0.0,
Processor_1_D_1_Hit_Ratio_StDev = 29.3655668615204,
Processor_1_D_1_KB_per_Thread_Max = 0.0,
Processor_1_D_1_KB_per_Thread_Mean = 0.0,
Processor_1_D_1_KB_per_Thread_Min = 0.0,
Processor_1_D_1_KB_per_Thread_StDev = 0.0,
Processor_1_I_1_Hit_Ratio_Max = 100.0,
Processor_1_I_1_Hit_Ratio_Mean = 40.8549783549784,
Processor_1_I_1_Hit_Ratio_Min = 0.0,
Processor_1_I_1_Hit_Ratio_StDev = 44.307741292233,
Processor_1_I_1_KB_per_Thread_Max = 0.0,
Processor_1_I_1_KB_per_Thread_Mean = 0.0,
Processor_1_I_1_KB_per_Thread_Min = 0.0,
Processor_1_I_1_KB_per_Thread_StDev = 0.0,
Processor_1_L_2_Hit_Ratio_Max = 0.0,
Processor_1_L_2_Hit_Ratio_Mean = 0.0,
Processor_1_L_2_Hit_Ratio_Min = 0.0,
Processor_1_L_2_Hit_Ratio_StDev = 0.0,
Processor_1_L_2_KB_per_Thread_Max = 0.0,
Processor_1_L_2_KB_per_Thread_Mean = 0.0,
Processor_1_L_2_KB_per_Thread_Min = 0.0,
Processor_1_L_2_KB_per_Thread_StDev = 0.0,
Processor_1_Stall_Time_Pct_Max = 50.715,
Processor_1_Stall_Time_Pct_Mean = 50.715,
Processor_1_Stall_Time_Pct_Min = 50.715,
Processor_1_Stall_Time_Pct_StDev = 0.0,
Processor_1_Task_Delay_Max = 2.737E-6,
Processor_1_Task_Delay_Mean = 2.2243766233766E-6,
Processor_1_Task_Delay_Min = 1.39E-7,
Processor_1_Task_Delay_StDev = 4.1390570393254E-7,
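For post-processing, a statistics listing in the "name = value," form shown above can be loaded into a dictionary. The Python sketch below is one possible way to do this outside of VisualSim. parse_statistics is a hypothetical helper, and it assumes every value is a plain decimal or scientific-notation number.

```python
import re

# A name, an equals sign, then a number; whitespace (including line breaks)
# is allowed around the equals sign, and the trailing comma is optional.
STAT_PATTERN = re.compile(r"(\w+)\s*=\s*([-+0-9.eE]+)\s*,?")


def parse_statistics(text):
    """Map each statistic name to its float value."""
    return {name: float(value) for name, value in STAT_PATTERN.findall(text)}


listing = """
Processor_1_Stall_Time_Pct_Mean = 50.715,
Processor_1_Task_Delay_Mean =
2.2243766233766E-6,
Processor_1_Task_Delay_Min = 1.39E-7,
"""
stats = parse_statistics(listing)
print(stats["Processor_1_Task_Delay_Mean"])
```

Because the pattern tolerates a line break after the equals sign, it also reads listings in which a long value has wrapped onto the next line.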
Detailed Documentation: Comprehensive information on the processor can be found in the Advanced Modeling Guide.
Block Keywords:
INT_n -- name of integer execution units, 1 through n
FP_n -- name of floating point execution units, 1 through n
Field Details
This is the name of the Architecture_Setup block with which this Processor is associated. The Architecture_Setup block maintains the routing table and statistics collection. Type is String.
This is a unique name of the Processor block. No other architecture component can have this name. Type is String
Setup Processor parameters. Type is text string.
Define Processor pipeline stage execution. Type is text string.
String attribute. Width of the Processor in bits, which forms a processor word: 16, 32 or 64 (pulldown selection).
Instruction input port. This can be connected to any VisualSim library block, model of a RTOS or custom-code. This can also be connected to a Bus_port or other blocks. The type is general.
Instruction output port. This can be connected to any VisualSim library block, model of a RTOS or custom-code. This can also be connected to a Bus_port or other blocks. The type is general.
Bus output port. This is one of two Bus connections on the left side (East). The type is general.
Bus input port. This is one of two Bus connections on the left side (East). The type is general.
Bus output2 port. This is one of two Bus connections on the left side (East). The type is general.
Bus input2 port. This is one of two Bus connections on the left side (East). The type is general.
Reject output port. When the instruction queue is full, the incoming instruction is rejected and placed on this port. The type is general.
This is an output port through which the Processor sends out interrupts to the DMA. The Processor then continues its operation while the DMA carries out the functions intended for the particular instruction sent out by the Processor. (Future feature, but the ports have been allocated in this version.)
This is an input port. When the DMA completes its task, the task comes back to the Processor on this port, so the Processor knows that the task assigned to the DMA has been completed. (Future feature, but the ports have been allocated in this version.)