Introduction to the Cache Library
Mirabilis Design provides the Cache block as a standard library in the VisualSim Architect modeling environment. Using this environment, systems engineers and product architects can evaluate the impact of design decisions on the performance of their system.
The VisualSim Cache modeling environment supports the requirements of the semiconductor market. Combined with the industry hardware, software and system libraries provided in VisualSim Architect, it lets designers assemble an end-to-end system and evaluate the throughput, latency and other performance attributes. The proposed system can be tested for various fault and error conditions to evaluate its susceptibility to real-world conditions.
The VisualSim Cache block can be used to design a complete architecture for processor designs and to integrate IO devices into that architecture.
Objective
- Build a model using the Cache block
- Analyse the latency, throughput, power and behavior flow of the block
- Analyse the overall behavior flow of the model
Tutorial System
A demonstration system is provided with an explanation of how to use this library. The block diagram is provided below:
Figure 2: Block diagram of Cache System
Figure 3: Cache Example system in VisualSim
The above block diagram shows the internal architecture of the CPU, cache and memory. Traffic generators for the I and D caches emulate processor traffic with the required fields in the data structure. The requests generated by the traffic are passed through the bus to the designated cache block. The L2 cache is shared by the I and D caches; it accesses main memory when the requested data is unavailable.
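The lookup path described above can be sketched in a few lines of Python. This is only an illustration of the hierarchy (separate I and D caches, a shared L2, and main memory as the final fallback), not the VisualSim implementation. The `Cache` class, the LRU replacement policy, the cache sizes and the 32-byte line size are all assumptions made for the example.

```python
from collections import OrderedDict

LINE_BYTES = 32  # assumed line size for this sketch


class Cache:
    """Tiny fully associative LRU cache that records hits and misses."""

    def __init__(self, name, num_lines, next_level=None):
        self.name = name
        self.num_lines = num_lines
        self.next_level = next_level  # None means "backed by main memory"
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address):
        """Return the name of the level that supplied the data."""
        tag = address // LINE_BYTES
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark line as most recently used
            self.hits += 1
            return self.name
        self.misses += 1
        # On a miss, ask the next level, or main memory if there is none.
        source = self.next_level.read(address) if self.next_level else "MainMemory"
        self.lines[tag] = True
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        return source


# Hypothetical sizes; the shared L2 sits behind both L1 caches.
l2 = Cache("L2", num_lines=64)
i_cache = Cache("I_1", num_lines=8, next_level=l2)
d_cache = Cache("D_1", num_lines=8, next_level=l2)

first = i_cache.read(0)   # cold miss in I_1 and in L2
second = i_cache.read(8)  # same 32-byte line, so it hits in I_1
shared = d_cache.read(0)  # misses D_1, but the line is already in the shared L2
```

Because the L2 is shared, a line fetched on behalf of the I cache can later satisfy a D-cache miss without going to main memory, which is what the last read shows.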
The VisualSim model of this design is at:
VS_AR/demo/memory/Cache_and_mem.xml
When you run the simulation, there are three text display outputs showing the statistics for each cache block and one plotter for power consumption. The statistics are generated twice during the simulation.
Model Description
The design consists of three major hardware modeling components and buses for interconnection. The major hardware modeling components are:
- Processor Traffic
- The Traffic, ExpressionList, Queue and DeviceInterface blocks together model the cache requests issued by the processor.
- L1 and L2 Cache
- The Instruction and Data caches are implemented as separate blocks that together form the L1 cache.
- Main Memory
- The RAM block is used as a stochastic model for main memory. For a cycle-accurate model, a combination of Memory Controller and Cycle Accurate DRAM can be used instead.
The processor traffic for I and D generates a data structure with the required fields and sends one packet at a time into the model. The blocks on the left-hand side of the CPU bus are responsible for generating data requests for the cache.
The back-side bus interconnects the L1 caches and the L2 cache. The front-side bus interconnects the L2 cache and main memory.
Blocks Required
- Model_Setup -> DigitalSimulator
- Model_Setup -> Parameter
- HardwareSetup -> ArchitectureSetup
- Traffic -> Traffic
- Model_Setup -> VariableList
- Behavior -> ExpressionList
- Resources -> Queues
- HardwareSetup -> DeviceInterface
- Full Library -> Math Operations -> Math and Trig -> Const
- HardwareDevices -> BusArbiter
- HardwareDevices -> BusInterface
- Interfaces_and_Buses -> AMBA -> AMBA_AXI
- Memory -> RAM
- Power -> PowerTable
- Memory -> Integrated_Cache
Construction Steps
Step 1:
- Drag DigitalSimulator from Model_Setup.
- Drag Parameter from Model_Setup, set name as "Sim_Time" (right click on parameter -> Customize_Name -> "Sim_Time").
- Configure DigitalSimulator
- stopTime -> Sim_Time
- writeStatsToFile -> true
- Drag ArchitectureSetup from HardwareSetup.
Step 2:
To Emulate Processor traffic:
- Drag Traffic block from Traffic, configure as follows:
- Data_Structure_Name -> "Hardware_DS"
- Value_1 -> 1.0e-6
- Time_Distribution -> Fixed(Value_1)
- Drag ExpressionList from Behavior, modify the flow as follows:
input.A_Source = "Proc_1_I"
input.A_Destination = "I_1"
input.A_Command = "Read_Instr"
input.A_Bytes = 32
input.A_Task_Flag = true
input.A_D_Addr = 0
input.A_I_Addr = (TNow>0.0)?(last+8):input.A_Address
last = input.A_I_Addr
- Drag Queues from Resources, configure as follows:
- Block_Name -> "I_Cache_Queue"
- Drag DeviceInterface from HardwareSetup, configure as follows:
- Drag Const from Full Library -> Math Operations -> Math and Trig
- Repeat the first step above (the Traffic block) for the D-cache processor traffic
- Drag ExpressionList from Behavior, modify the flow as follows:
input.A_Source = "Proc_1_D"
input.A_Destination = "D_1"
input.A_Command = "Read_Req"
input.A_Bytes = 32
input.A_Task_Flag = true
input.A_I_Addr = 0
input.A_D_Addr = (TNow>0.0)?(last+8):input.A_Address
last = input.A_D_Addr
- Drag DeviceInterface from HardwareSetup, configure as follows:
- Drag Queues from Resources, configure as follows:
- Block_Name -> "D_Cache_Queue"
- Drag Const from Full Library -> Math Operations -> Math and Trig
- Drag VariableList from Model_Setup, and define the following variable:
Name    Type     Value
last    local    0 ;
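To make the address sequence produced by the ExpressionList flow concrete, here is a minimal Python sketch of the instruction-side logic. It mirrors the field names from the listing, but the function name `make_instruction_request`, the `initial_address` default of 0, and the assumption that the first packet is generated at TNow = 0.0 are illustrative choices, not taken from the VisualSim model.

```python
def make_instruction_request(t_now, last, initial_address=0):
    """Build one request the way the I-cache ExpressionList flow does."""
    # Equivalent of: (TNow > 0.0) ? (last + 8) : input.A_Address
    i_addr = last + 8 if t_now > 0.0 else initial_address
    request = {
        "A_Source": "Proc_1_I",
        "A_Destination": "I_1",
        "A_Command": "Read_Instr",
        "A_Bytes": 32,
        "A_Task_Flag": True,
        "A_D_Addr": 0,
        "A_I_Addr": i_addr,
    }
    return request, i_addr  # the second value becomes the new 'last'


PERIOD = 1.0e-6  # Value_1 from the Traffic block (fixed inter-arrival time)

last = 0  # initial value from the VariableList
addresses = []
for n in range(4):
    # Assumption: packets arrive at 0, PERIOD, 2*PERIOD, ...
    request, last = make_instruction_request(n * PERIOD, last)
    addresses.append(request["A_I_Addr"])
```

Under these assumptions the first four instruction addresses are 0, 8, 16 and 24: the first request uses the incoming address, and every later request advances the stored `last` value by 8. The D-cache flow follows the same pattern with `A_D_Addr` and `A_I_Addr` swapped.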
Step 3:
To configure the bus and caches:
- Drag BusArbiter from HardwareDevices, Configure as follows:
- Bus_Name -> "Bus_1"
- Bus_Speed_Mhz -> 1000.0
- Burst_Size_Bytes -> 100.0
- Width_Bytes -> 4
- Drag BusInterface from HardwareDevices, configure as follows:
- Bus_Name -> "Bus_1"
- Port_Name_1 -> "Port_1"
- Port_Name_2 -> "Port_2"
- Drag BusInterface from HardwareDevices, configure as follows:
- Bus_Name -> "Bus_1"
- Port_Name_1 -> "Port_3"
- Port_Name_2 -> "Port_4"
- Drag Integrated_Cache from Memory (I-Cache), configure as follows:
- Drag Integrated_Cache from Memory (D-Cache), configure as follows:
- Drag Integrated_Cache from Memory (L2-Cache), configure as follows:
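As a rough sanity check on the bus settings, the following Python sketch estimates how long one 32-byte request occupies Bus_1. It assumes the bus moves one Width_Bytes word per clock cycle and ignores arbitration, burst handling (Burst_Size_Bytes) and any protocol overhead, so it is a back-of-the-envelope bound rather than what the BusArbiter block actually computes. The function name `bus_transfer_time_s` is made up for this example.

```python
import math


def bus_transfer_time_s(num_bytes, bus_speed_mhz, width_bytes):
    """Idealized transfer time: one bus word per cycle, no overhead."""
    cycles = math.ceil(num_bytes / width_bytes)
    return cycles / (bus_speed_mhz * 1.0e6)


# Values from the tutorial: 32-byte requests, 1000 MHz bus, 4-byte width.
transfer = bus_transfer_time_s(32, 1000.0, 4)

# Each traffic generator issues one request every 1.0e-6 s (Value_1),
# so this is the fraction of time one generator keeps the bus busy.
utilization = transfer / 1.0e-6
```

With these numbers a request takes 8 cycles, or about 8 ns, which is under 1% of the 1 microsecond request period per generator. In this idealized view the bus is far from saturated, so cache and memory latency should dominate the results.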
Step 4:
To configure AXI and memory (stochastic or cycle accurate):
- Drag AMBA_AXI from Interfaces_and_Buses -> AMBA, Configure as follows
- Bus_Name -> "AXI"
- AXI_Speed_Mhz -> 1000.0
- Bus_Width -> 4
- Threshold_Trans_T_Bytes_F -> true
- Number_Masters -> 1
- Number_Slaves -> 1
- Memory for Stochastic Model
- Drag RAM from Memory, configure as follows:
- Memory for Address Based model
- Set the "Stochastic_or_Address_Based" parameter in the cache blocks to "Address_Based"
- Set Miss_Memory_Name to "DRAM" in the L2 cache
- Drag Memory Controller from Memory
- Drag Cycle Accurate DRAM from Memory
- Configure as follows:
Step 5:
To configure the power manager:
- Drag PowerTable from Power, configure as follows:
Step 6:
Run the model and observe the statistics for each cache block and the power consumption of the overall model during the simulation period. The following images show the statistics and power consumption.
The statistics of the cache blocks show throughput, buffer occupancy, utilization, and hit and miss ratio. The power plot shows the instantaneous power and the average power consumed by the hardware blocks during the simulation.
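For readers who want to cross-check the reported numbers by hand, the sketch below shows one common way to derive a hit ratio from hit and miss counts and an average power from instantaneous power samples. The helper names and the sample data are hypothetical, and the time-weighted averaging (each sample held until the next one) is an assumption; VisualSim may compute its averages differently.

```python
def hit_ratio(hits, misses):
    """Fraction of accesses that hit; 0.0 when there were no accesses."""
    total = hits + misses
    return hits / total if total else 0.0


def average_power(samples):
    """Time-weighted average of (time_s, power_w) samples.

    Each power value is assumed to hold until the next sample time.
    """
    energy = 0.0
    for (t0, p0), (t1, _) in zip(samples, samples[1:]):
        energy += p0 * (t1 - t0)
    duration = samples[-1][0] - samples[0][0]
    return energy / duration if duration > 0 else samples[0][1]


# Made-up numbers purely for illustration.
ratio = hit_ratio(hits=3, misses=1)
avg = average_power([(0.0, 2.0), (1.0, 4.0), (2.0, 0.0)])
```

The miss ratio is simply one minus the hit ratio, so a cache with 3 hits and 1 miss has a 75% hit ratio and a 25% miss ratio.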
Latency and throughput of the cache blocks can be plotted using the ports available on the cache block. Users can attach time data plotters to view the latency and throughput.
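If the plotted values are exported for further analysis, latency and throughput can be recomputed from per-request timestamps as in this short Python sketch. The record layout (issue time, completion time, byte count) and the function name are assumptions for illustration and do not reflect the actual port format of the cache block.

```python
def latency_and_throughput(records):
    """records: list of (issue_time, done_time, num_bytes), one per request."""
    latencies = [done - issue for issue, done, _ in records]
    start = min(issue for issue, _, _ in records)
    end = max(done for _, done, _ in records)
    window = end - start
    total_bytes = sum(num_bytes for _, _, num_bytes in records)
    # Throughput is bytes delivered per unit time over the observed window.
    throughput = total_bytes / window if window > 0 else 0.0
    return latencies, throughput


# Made-up records with times in arbitrary units.
records = [(0.0, 2.0, 32), (1.0, 3.0, 32), (2.0, 4.0, 32)]
latencies, throughput = latency_and_throughput(records)
```

Here every request has a latency of 2 time units, and 96 bytes are delivered over a window of 4 time units, giving a throughput of 24 bytes per time unit.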