Block Name: Cache

Code File Location: VisualSim/actor/arch/Cache

Description

This block emulates a cache in an architecture model. It has interfaces on both sides for connectivity. The block handles request queuing, cache access latency, hit-miss evaluation, cache prefetch, Read/Write data responses, and miss activity to the next level of memory. It is typically connected to a Linear_Port or to a custom controller built from blocks or SystemC. It cannot be connected directly to the Processor block, because it requires a path outside the Processor to the next-level memory.

Refer to the DRAM and Cache Demo Model.


Operation

The following is the sequence of operation:

   1. Incoming Request:

  1. The Cache accepts an incoming request only if A_Destination is this Cache.
  2. A_Command contains the request type: Prefetch, Read or Write. Prefetch and Read are both treated as a Read.
  3. For a Read, the Cache checks that A_Bytes = A_Bytes_Remaining + A_Bytes_Sent. A_Bytes_Sent cannot be zero.
  4. For a Write, the Cache checks that A_Bytes_Remaining = 0.
  5. The cache will accept the first fragment of the transaction from the Bus.  All remaining fragments are ignored.

   2. Internal delay: delay = access_time * data request size, where access_time = Cache_Line / Cache_Speed.

  1. For a Read, the Cache starts the delay as soon as it receives the request, provided the Cache is free.

   3. Outgoing Response:

  1. For a Read, the data for the first word is sent back to the Source after the delay; the remaining words are delayed internally. In the response, A_Bytes_Sent = Cache_Width, A_Bytes = request data size, and A_Bytes = A_Bytes_Remaining + A_Bytes_Sent.
  2. For a Write, the Cache delays for access time * data request size.
  3. A_Source and A_Destination are swapped, and A_Hop is set to A_Destination.
  4. The Read response is returned to the Source with A_Command = Write.
  5. If the Write has A_Task_Flag = true, A_Command is changed to Read and the message is sent back to the Source. If A_Task_Flag = false, the message is written to output3. The Write response is sent only after all words have been delayed at the Cache.

   4. DMA case: when A_Task_Flag = true, both the Read and the Write are returned.

  1. The Read is returned after the first word, with A_Command = Write.
  2. The Write is returned after the last word has completed processing in the block, with A_Command = Read, A_Bytes = A_Bytes_Sent = Cache_Width, and A_Bytes_Remaining = 0.

   5. Hit-Miss: 

  1. When a miss occurs, a miss request is made to the device listed in the Next_Miss_Memory field.
  2. The response for the miss is sent directly back to the originating Source and is not sent back to this cache block.
  3. A miss will initiate a prefetch.

   6. Prefetch

  1. If a prefetch is requested, a request with A_Command = Prefetch is sent to the device listed in the Next_Miss_Memory field. The prefetch is returned to the cache with A_Command = Write.
  2. A cache prefetch is initiated if a miss occurred or if the number of cache transactions exceeds Words_per_Cache_Line.
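The acceptance checks, delay formula and first-word Read response in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the VisualSim implementation: the `Txn` class and the function names are hypothetical, the delay uses whatever units the model supplies, and `A_Hop` is assumed to take the new (swapped) A_Destination.

```python
from dataclasses import dataclass, replace


@dataclass
class Txn:
    # Hypothetical stand-in for the VisualSim data structure; only the
    # fields used by these steps are modeled.
    A_Source: str
    A_Destination: str
    A_Command: str          # "Read", "Write" or "Prefetch"
    A_Bytes: int
    A_Bytes_Remaining: int
    A_Bytes_Sent: int
    A_Hop: str = ""


def accepts(cache_name: str, t: Txn) -> bool:
    """Step 1: accept only requests addressed to this cache with consistent byte counts."""
    if t.A_Destination != cache_name:
        return False
    if t.A_Command in ("Read", "Prefetch"):  # Prefetch is treated as a Read
        return t.A_Bytes_Sent != 0 and t.A_Bytes == t.A_Bytes_Remaining + t.A_Bytes_Sent
    if t.A_Command == "Write":
        return t.A_Bytes_Remaining == 0
    return False


def internal_delay(cache_line: float, cache_speed: float, request_size: float) -> float:
    """Step 2: delay = access_time * request_size, with access_time = Cache_Line / Cache_Speed."""
    access_time = cache_line / cache_speed
    return access_time * request_size


def first_read_response(t: Txn, cache_width: int) -> Txn:
    """Step 3: first-word Read response, returned to the Source as a Write."""
    # Clamping to A_Bytes for requests smaller than one word is an assumption.
    sent = min(cache_width, t.A_Bytes)
    return replace(
        t,
        A_Source=t.A_Destination,    # source and destination are swapped
        A_Destination=t.A_Source,
        A_Hop=t.A_Source,            # assumed: A_Hop = the new A_Destination
        A_Command="Write",
        A_Bytes_Sent=sent,
        A_Bytes_Remaining=t.A_Bytes - sent,  # keeps A_Bytes = Remaining + Sent
    )
```

For example, a 64-byte Read from `"CPU"` addressed to `"Cache_1"` with a 4-byte width produces a first response carrying 4 bytes, with 60 bytes still remaining.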

Queuing

The Cache receives Read and Write requests for both data and instructions, on either of its two input ports. The block does not distinguish between data and instruction requests; if data and instructions need separate memories with unique access mechanisms, use two Cache blocks. An incoming request is processed immediately if the Cache is idle; otherwise it is placed in the FIFO. A single FIFO serves both ports, and its size is set by the FIFO_Buffers parameter.
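The queuing rule can be sketched as a single FIFO shared by both input ports. This is a hypothetical model, not VisualSim code: the class and method names are illustrative, and rejecting a request when the FIFO is full is an assumption, since the overflow behavior is not documented here.

```python
from collections import deque


class SharedFifo:
    """One FIFO for both input ports, sized by FIFO_Buffers (illustrative sketch)."""

    def __init__(self, fifo_buffers: int):
        self.capacity = fifo_buffers
        self.queue = deque()
        self.busy = False  # True while the cache is processing a request

    def arrive(self, port: str, request) -> str:
        # An idle cache starts on the request immediately.
        if not self.busy:
            self.busy = True
            return "processing"
        # Assumed overflow policy: drop the request when the FIFO is full.
        if len(self.queue) >= self.capacity:
            return "rejected"
        self.queue.append((port, request))
        return "queued"

    def finish(self):
        # The oldest queued request, from either port, is processed next.
        if self.queue:
            return self.queue.popleft()
        self.busy = False
        return None
```

Because both ports share one queue, requests are served strictly in arrival order regardless of which side of the block they entered.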

Cache Hit-Miss

When a new request is taken up for processing, the cache evaluates the Hit expression. If the expression evaluates to true, a hit occurs; otherwise a miss occurs. On a miss, the task is sent to the Next_Miss_Memory block after being delayed by access time * data request size. The miss task keeps the original A_Source, but its A_Destination becomes the next memory.
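A rough Python equivalent of the default hit expression `rand(0.0,1.0) <= 0.95`, together with the rewrite of a missed task, is shown below. The `Task` class and function names are hypothetical, and the access-time delay applied before forwarding is left out of this sketch.

```python
import random
from dataclasses import dataclass, replace


@dataclass
class Task:
    # Hypothetical minimal task with the routing fields used on a miss.
    A_Source: str
    A_Destination: str
    A_Command: str


def hit_expression(rng: random.Random) -> bool:
    # Mirrors the default Cache_Hit_Expression: a hit about 95% of the time.
    return rng.uniform(0.0, 1.0) <= 0.95


def evaluate(task: Task, next_miss_memory: str, hit: bool):
    """Return ("hit", task) or ("miss", forwarded_task)."""
    if hit:
        return "hit", task
    # On a miss, A_Source is unchanged and A_Destination becomes the next memory.
    return "miss", replace(task, A_Destination=next_miss_memory)
```

Passing the hit result into `evaluate` keeps the routing logic deterministic, so any expression (random or based on task fields) can drive it.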

Pre-fetch

An internal counter tracks the prior requests served from the currently active cache line. When a Hit occurs, a pre-fetch test is done: if the number of words remaining in the line is less than the number of words requested (Read or Write), a pre-fetch is initiated. The pre-fetch sends a request to the next memory for one cache line. This block does not support multiple-line pre-fetch; if that is required, make the cache line larger to emulate the same effect. Pre-fetch is performed on both Reads and Writes so that the model captures worst-case behavior. When a pre-fetch request arrives from a higher-level cache (I_1, D_1 or L2), this block either responds with a cache line or generates a miss to the next level of memory. A miss also generates a pre-fetch in addition to forwarding the incoming request to the next level of memory.
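The pre-fetch test on a hit can be sketched with a per-line word counter. This is an illustrative model under assumptions: `LinePrefetcher` is a hypothetical name, and treating the pre-fetched line as the new active line (with the leftover part of the request served from it) is an assumed interpretation.

```python
class LinePrefetcher:
    """Counts words served from the active cache line (hypothetical helper)."""

    def __init__(self, words_per_cache_line: int):
        self.line_words = words_per_cache_line
        self.used = 0  # words already served from the current line

    def on_hit(self, words_requested: int) -> bool:
        """Return True if this hit should trigger a one-line pre-fetch."""
        remaining = self.line_words - self.used
        if remaining < words_requested:
            # Pre-fetch exactly one line (multi-line pre-fetch is not supported).
            # Assumed: the new line becomes active and absorbs the overflow.
            self.used = words_requested - remaining
            return True
        self.used += words_requested
        return False
```

With Words_per_Cache_Line = 16 and 4-word requests, the first four hits consume the line and the fifth triggers a pre-fetch.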

Additional Cache Miss

Even when the Hit expression is true, a miss occurs if the number of words requested exceeds the cache boundary. This is in addition to misses produced by the Hit expression. For example, a 16 KB cache with a 4 B word size holds 16 * 1024 / 4 = 4096 words (4000 if 1 KB is taken as 1000 bytes). When the counter reaches this limit, a miss occurs even if the Hit expression evaluates to true.
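The boundary arithmetic and the combined hit decision can be written out as below. The function names are hypothetical, and the `bytes_per_kb` argument makes the 1024-versus-1000 choice explicit because the block's internal convention is not stated here.

```python
def words_in_cache(cache_size_kbytes: float, width_bytes: int, bytes_per_kb: int = 1024) -> int:
    # Total words = cache size in bytes / word width in bytes.
    return int(cache_size_kbytes * bytes_per_kb) // width_bytes


def is_hit(hit_expression_result: bool, word_counter: int, cache_words: int) -> bool:
    # A true Hit expression is overridden by a miss once the counter
    # reaches the cache boundary.
    return hit_expression_result and word_counter < cache_words
```

For the 16 KB, 4 B example this gives 4096 words with 1024-byte kilobytes and 4000 words with 1000-byte kilobytes.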

Cache Response

If the task is a Read, a response is sent back to the Source with A_Source set to this cache's name and A_Destination set to the task requester. If the task is a Write, no response is sent unless A_Task_Flag = true (see the DMA case under Operation).

Data Structure Fields

A_Source, A_Destination, A_Command, A_Bytes, A_Bytes_Remaining and A_Bytes_Sent are required fields. The block also requires A_Hop, A_Status, A_Task_Flag, A_Interrupt, A_Prefetch and A_Priority. A_Source and A_Destination are swapped before the response to a Read or Write is sent. For a pre-fetch, A_Source is set to the Processor and A_Destination to the next level of memory.
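The field list can be mirrored as a Python record, with helpers for the source/destination swap and the pre-fetch request. This is a sketch only: the field types and defaults, the helper names, and the byte-count values chosen for a pre-fetch request are assumptions, not documented behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CacheTransaction:
    # Required fields
    A_Source: str
    A_Destination: str
    A_Command: str
    A_Bytes: int
    A_Bytes_Remaining: int
    A_Bytes_Sent: int
    # Additional fields the block expects (types and defaults are assumptions)
    A_Hop: str = ""
    A_Status: str = ""
    A_Task_Flag: bool = False
    A_Interrupt: bool = False
    A_Prefetch: bool = False
    A_Priority: int = 0


def to_response(t: CacheTransaction) -> CacheTransaction:
    # A_Source and A_Destination are reversed before a Read/Write response.
    return replace(t, A_Source=t.A_Destination, A_Destination=t.A_Source)


def prefetch_request(processor: str, next_memory: str, line_bytes: int) -> CacheTransaction:
    # Pre-fetch: A_Source is the Processor, A_Destination the next memory.
    # The byte fields and A_Prefetch=True are assumed values for one cache line.
    return CacheTransaction(
        A_Source=processor,
        A_Destination=next_memory,
        A_Command="Prefetch",
        A_Bytes=line_bytes,
        A_Bytes_Remaining=0,
        A_Bytes_Sent=line_bytes,
        A_Prefetch=True,
    )
```

Making the record frozen means each response is a new object, so the original request is left intact for statistics or tracing.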

Additional information on this library block is available in chapter 3 - Advanced Modeling Topics.


Parameter | Explanation | Type | Example
Architectural_Name | The name of the Architecture_Setup block with which this Cache is associated. If that Architecture_Setup block does not exist, an error is generated and the model is terminated. | String | "Architecture_1"
Cache_Name | A unique name for this Cache. No other Architecture, Virtual_Machine, Scheduler or Smart_Resource in the model can use this name. The name identifies this block as a destination. | String | "Cache_1"
Miss_Memory_Name | Routes requests to the next level of memory when a cache miss occurs or a prefetch is requested. | String | "DRAM"
Cache_Speed_Mhz | Speed of the cache in MHz. | Double | 500.0
Cache_Size_KBytes | Size of the cache in KBytes. | Double | 64.0
Width_Bytes | Width of the cache in bytes. Options are 2, 4 and 8. | Integer | 4
Words_per_Cache_Line | The number of words per cache line. This is an important factor in determining whether a request hits or misses. | Integer | 16
FIFO_Buffers | The number of outstanding requests (both Read and Write) that can await processing. Under normal circumstances the queue is empty, but special designs can accommodate multiple outstanding requests. | Integer | 32
Cache_Address | The range of memory addresses associated with this Cache block, in the format "Min_Address,Max_Address", for example "201,300". See the example on Using Memory Controller in the BDE for an application. | String | "/* Format: Min_Address,Max_Address. Example:100,200 */"
Cache_Hit_Expression | A logic expression for a cache hit, which can use the full RegEx language. If it evaluates to Boolean true, the task has a cache hit; otherwise a miss occurs. For example, the expression can be (Mem_Writes * Task_Size + Fragment_Overhead_Factor) < Cache_Size_KBytes. | String | "rand(0.0,1.0) <= 0.95"
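Two of these parameters lend themselves to a quick check: the Cache_Address range string and the cycle time implied by Cache_Speed_Mhz. The helpers below are hypothetical and only illustrate the arithmetic; the default placeholder comment in Cache_Address must be replaced by a real range before it can be parsed this way.

```python
from typing import Tuple


def parse_cache_address(text: str) -> Tuple[int, int]:
    # "Min_Address,Max_Address", e.g. "201,300" -> (201, 300)
    lo, hi = (int(part.strip()) for part in text.split(","))
    if lo > hi:
        raise ValueError("Min_Address must not exceed Max_Address")
    return lo, hi


def owns_address(address: int, rng: Tuple[int, int]) -> bool:
    # Inclusive range check (inclusiveness is an assumption).
    lo, hi = rng
    return lo <= address <= hi


def cycle_time_ns(cache_speed_mhz: float) -> float:
    # One clock period: 1 / (MHz) microseconds = 1000 / MHz nanoseconds.
    return 1000.0 / cache_speed_mhz
```

At the example speed of 500.0 MHz, one cycle is 2 ns.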


Port | Explanation
input | This port connects this block to a Linear_Port or other device to accept requests. This is the set of ports on the left side (West).
output | This port connects this block to a Linear_Port or other device to send the Read_Response. This is the set of ports on the left side (West).
input2 | This port connects this block to a Linear_Port or other device to accept requests. This is the set of ports on the right side (East).
output2 | This port connects this block to a Linear_Port or other device to send the Read_Response. This is the set of ports on the right side (East).