Design and Analysis of Memory Subsystem Architecture


Learning Objectives

Use the framework available in VisualSim to build a model for the following objectives:

Introduction

A designer or architect treats memory subsystem performance as one of the crucial factors in achieving real-time application performance. They must also accurately evaluate the trade-offs among performance, power, cost, and reliability of the system, and weigh properties such as locality, interface technology, arbitration algorithm, and data width when making architecture decisions.


In this tutorial, we explore a memory system architecture at the following levels of abstraction.


  1. Statistical DRAM and Controller with an Abstract Bus Arbiter.
    Tutorial Model can be found at $VS\VS_AR\doc\Training_Material\Tutorial\WebHelp\Tutorial\Performance_Modeling\mem_bw_model.xml
  2. Statistical DRAM with a Memory Controller Logic and Single AXI.
    Tutorial Model can be found at $VS\VS_AR\doc\Training_Material\Tutorial\WebHelp\Tutorial\Performance_Modeling\mem_bw_model_V3_1.xml
  3. Statistical DRAM with a Memory Controller Logic and Single AXI and Dedicated Local Bus.
    Tutorial Model can be found at $VS\VS_AR\doc\Training_Material\Tutorial\WebHelp\Tutorial\Performance_Modeling\mem_bw_model_V3_2.xml
  4. Statistical DRAM with a Memory Controller Logic and Single AXI, Dedicated Local Bus and a DMA.
    Tutorial Model can be found at $VS\VS_AR\doc\Training_Material\Tutorial\WebHelp\Tutorial\Performance_Modeling\mem_bw_model_V3_5.xml
  5. Cycle Accurate DRAM and Memory Controller.
    Tutorial Model can be found at $VS\VS_AR\doc\Training_Material\Tutorial\WebHelp\Tutorial\Performance_Modeling\mem_bw_model_V3_6.xml

Note: Add an offset to every traffic block. In AXI_Bus, set Threshold_Trans_T_Bytes_F to true.

    Design Methodology

    Figure 1 depicts a simple block diagram of a memory system. Because the analysis focuses on the memory subsystem, the processor or external device that issues requests is abstracted out.

    In this tutorial, the user analyses how the memory responds to requests from different devices. The user chooses an arbitration algorithm that selects either the highest-priority request or the request that arrived first. The selected request is then placed in a Command Queue. Based on the address of each transaction, a DRAM Requestor decides which memory location the transaction is sent to.

    After the DRAM Read/Write/Erase activity completes, the response is sent to the Requestor in First In, First Out (FIFO) order.
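The two arbitration choices described above can be illustrated with a small Python sketch. It is not VisualSim code: the `Request` record, the `arbitrate` function, and the convention that a larger priority value wins are assumptions made only for this example.

```python
from collections import namedtuple

# Hypothetical request record; the field names are illustrative only.
Request = namedtuple("Request", ["source", "priority", "arrival_time"])


def arbitrate(pending, policy="priority"):
    """Return pending requests in the order they would enter the Command Queue.

    policy="priority": highest priority first (assumed: larger value wins),
                       ties broken by arrival time.
    policy="fcfs":     strictly by arrival time.
    """
    if policy == "priority":
        return sorted(pending, key=lambda r: (-r.priority, r.arrival_time))
    if policy == "fcfs":
        return sorted(pending, key=lambda r: r.arrival_time)
    raise ValueError(f"unknown policy: {policy}")


pending = [
    Request("Group1", priority=1, arrival_time=0.2),
    Request("Group2", priority=3, arrival_time=0.3),
    Request("Group3", priority=2, arrival_time=0.1),
]

priority_order = [r.source for r in arbitrate(pending, "priority")]
fcfs_order = [r.source for r in arbitrate(pending, "fcfs")]
print(priority_order)  # ['Group2', 'Group3', 'Group1']
print(fcfs_order)      # ['Group3', 'Group1', 'Group2']
```

The same three requests leave the arbiter in a different order under each policy, which is the effect the latency plots later in the tutorial make visible.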

    Figure 1: Memory Bandwidth Model Block Diagram

    Use the above block diagram to create enhanced models with additional blocks. The details are in the subsequent sections.

    Block Diagram Usage of VisualSim

    In VisualSim, you model the block diagram in five different variations. The details of each variation are given below:


    Variation
    Details
    Variation 1 (Abstract Arbitration Algorithm and Memory Controller)
    1. Traffic generators
    2. Processors to assign values to transactions
    3. Arbitration mechanism
    4. Command Queues
    5. DRAM Requestor
    6. DRAM
    Variation 2 (Single Bus Interface between devices and Memory)

    1. Traffic generators
    2. Processors to assign values to transactions
    3. Device Interface
    4. BUS
    5. Script
    6. DRAM
    Variation 3 (Local bus and AXI Bus)
    1. Traffic generators
    2. Processors to assign values to transactions
    3. Device Interface
    4. Local Bus
    5. Bridge
    6. BUS
    7. Script
    8. DRAM
    Variation 4 (Extension to Variation 3 with DMA)

    1. Traffic generators
    2. Processors to assign values to transactions
    3. Device Interface
    4. DMA Controller
    5. DMA Database
    6. BUS
    7. Script
    8. DRAM
    Variation 5 (Extension to Variation 4 and use Cycle Accurate Memory Controller and Memory)
    1. Traffic generators
    2. Processors to assign values to transactions
    3. Device Interface
    4. DMA Controller
    5. DMA Database
    6. BUS
    7. Memory Controller
    8. DRAM

    Apart from these parts, the following elements are unique to VisualSim:

    Variances from the Block Diagram in VisualSim Modeling

    Variances


    We factor in the following variances when we model the block diagram in VisualSim. These variances are simulation-specific, either for ease of modelling or to capture statistics in the model.

    Mapping of the block diagram to VisualSim Model


    The following table provides a mapping of the block diagram to the VisualSim Model.


    Block Diagram                VisualSim Model
    Groups 1, 2, and 3           Traffic, Processing
    Arbitration                  Hierarchical
    CMD Queue, DRAM Requestor    Queue, Const
    DRAM                         RAM
    FIFO                         TimeDataPlotter, Expression_List, Queue

    Building the VisualSim Model

    Use the block information as the base and build a VisualSim model with the Library Blocks listed in the following table (Table 1). Note that some of the steps are applicable only to a particular variation. Such information is given in the “Applicable for Variation” column.


    Initial Setup



    S.No.
    Process
    Library Block
    Applicable for Variation
    1.
    • Create a Digital Simulator.
    • Use the “Parameter=” block to define a parameter (TStop) and value (for example 10.0) for the Digital parameter “stopTime”.
    Digital
    Variation 1, 2, 3, 4, 5
    2.
    • Implement an Architecture setup
    Architecture_Setup

    Variation 1, 2, 3, 4, 5

    Create Traffic Generators


    S.No.
    Process
    Library Block
    Applicable for Variation
    1.
    • Create three traffic generators (Group 1, Group 2, Group 3) that send transactions to the memory.
    • Use the “Parameter=” block to define parameters (Mean_Interrarival_Group1, Mean_Interrarival_Group2, Mean_Interrarival_Group3) and a uniform value (for example 0.1) for the parameter “Value_1” of the traffic generators.
    • Specify the value “Fixed (Value_1)” for the parameter “Time_Distribution” of the traffic sources.

    Note: Create three modules of traffic generators, each comprising three different traffic controllers, for Variations 2, 3, 4, and 5.
    Traffic
    All the Variations
    2.
    • Build blocks to assign values to the transactions from the traffic sources.
    • Specify the following values for the parameter “Expression_List”.
    • input.A_Bytes = 400
    • input.A_Bytes_Remaining = 0
    • input.A_Bytes_Sent = input.A_Bytes
    • input.A_Command = "Write"
    • input.A_Destination = "DRAM"
    • input.A_Hop = "DRAM"
    • input.A_Status = ""
    • input.A_Task_Flag = true
    • input.A_Interrupt = false
    • input.A_Prefetch = false
    • input.A_Priority = 1
    • input.Time_Generated = TNow
    Processing
    All the Variations
    3.
    • Add the following additional instance-specific statements in the respective processing block.
    • “Processing1”
    • input.A_Source = "Group1"
    • input.Origin = "Group1"
    • “Processing2”
    • input.A_Source = "Group2"
    • input.Origin = "Group2"
    • “Processing3”
    • input.A_Source = "Group3"
    • input.Origin = "Group3"

    All the Variations
    4.
    • Create a block to make the transactions compatible with the proposed architecture.
    DeviceInterface
    Variations 2, 3, 4, and 5
    5.
    • Implement a block to receive the transactions from the Memory.
    OUT
    Variations 2, 3, 4, and 5
    6.
    • Create a DMA Controller.
    DMA Controller
    Variations 4 and 5
    7.
    • Create a DMA Database.
    DMA Database
    Variations 4 and 5
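The traffic generator and processing steps in the table above can be mimicked in plain Python to check how much traffic the example values produce. This is a rough sketch, not VisualSim script: `make_transaction` and `generate` are hypothetical helpers, the first transaction is assumed to arrive one inter-arrival interval after time zero, and time is assumed to be in seconds.

```python
# Common fields copied from the Expression_List in the table above.
COMMON_FIELDS = {
    "A_Bytes": 400,
    "A_Bytes_Remaining": 0,
    "A_Command": "Write",
    "A_Destination": "DRAM",
    "A_Hop": "DRAM",
    "A_Status": "",
    "A_Task_Flag": True,
    "A_Interrupt": False,
    "A_Prefetch": False,
    "A_Priority": 1,
}

GROUPS = ["Group1", "Group2", "Group3"]


def make_transaction(group, t_now):
    """Build one transaction with the common and instance-specific fields."""
    txn = dict(COMMON_FIELDS)
    txn["A_Bytes_Sent"] = txn["A_Bytes"]
    txn["Time_Generated"] = t_now
    txn["A_Source"] = group
    txn["Origin"] = group
    return txn


def generate(groups, interarrival, stop_time):
    """Fixed inter-arrival traffic: every group emits once per interval.

    Assumption: arrivals occur at interarrival, 2 * interarrival, ...,
    up to and including stop_time.
    """
    count = round(stop_time / interarrival)
    txns = []
    for k in range(1, count + 1):
        t_now = k * interarrival
        for group in groups:
            txns.append(make_transaction(group, t_now))
    return txns


# Example values from the tutorial: Value_1 = 0.1 and TStop = 10.0.
transactions = generate(GROUPS, interarrival=0.1, stop_time=10.0)
total_bytes = sum(t["A_Bytes"] for t in transactions)
offered_load = total_bytes / 10.0  # bytes per simulated time unit

print(len(transactions))  # 300
print(offered_load)       # 12000.0
```

With these example values the three groups together offer 12,000 bytes per time unit, a useful baseline when comparing the latency plots of the five variations.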

    Prioritize the transactions (Applicable only for Variation 1)


    S.No.
    Process
    Library Block
    Applicable for Variation
    1.
    • Create a block titled “Arbitration” to prioritize the transactions.
    Hierarchical
    Variation 1
    2.
    • Implement a Command_Queue to put the transactions in a queue.
    • Specify “A_Priority” as the value for the parameter “Priority_Field”.
    Queue
    Variation 1
    3.
    • Create a Const to POP the head of the queue:
    • Right-click on the “Const” block and select menu Appearance->Flip Ports Horizontally.
    Const
    Variation 1

    Implement BUS (Not applicable for Variation 1)


    S.No.
    Process
    Library Block
    Applicable for Variation
    1.
    • Create BUS to route the transactions.
    AMBA_AXI
    Variation 2
    2.
    • Implement a local BUS to route the transactions.
    AMBA_AHB
    Variations 3, 4, and 5
    3.
    • Create a bridge to connect the local bus to BUS.
    Bridge
    Variations 3, 4, and 5

    Implement a Memory


    S.No.
    Process
    Library Block
    1.
    • Implement a script to either randomly or sequentially write/read the transactions to the memory.
    Script
    2.
    • Implement a Cycle Accurate Memory Controller to either randomly or sequentially write/read the transactions to the memory.
    Memory Controller
    3.
    • Insert a memory to hold the transactions.
    • Use the “Parameter=” block to define:
    • “Access_Time” parameter with the value "Read 1000.0/Memory_Speed_Mhz, Prefetch 3.0, Write 1000.0/Memory_Speed_Mhz, ReadWrite 3.0, Erase 3.0"
    • “Memory_Speed_Mhz” with a value 256.0.
    • Specify the following values for other parameters:
    • Memory_Name: “DRAM”
    • Memory_Type: DDR
    • Deselect “Enable_Hello_Messages”
    • Add the parameter “Refresh” and specify false as the default value.
    RAM
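To see what the "Access_Time" string works out to for a given "Memory_Speed_Mhz", the expression can be evaluated outside the tool. The parser below is a hypothetical helper written for this tutorial's string only; it is not how VisualSim evaluates parameters. If the values are read as nanoseconds (an assumption), 1000.0/Memory_Speed_Mhz is one clock period.

```python
ACCESS_TIME = (
    "Read 1000.0/Memory_Speed_Mhz, Prefetch 3.0, "
    "Write 1000.0/Memory_Speed_Mhz, ReadWrite 3.0, Erase 3.0"
)


def parse_access_time(spec, memory_speed_mhz):
    """Evaluate an Access_Time style string into {operation: time}.

    Only the two forms used in this tutorial are handled:
    a plain number, or "<number>/Memory_Speed_Mhz".
    """
    result = {}
    for entry in spec.split(","):
        name, expr = entry.strip().split(" ", 1)
        if "/" in expr:
            numerator, denominator = expr.split("/")
            if denominator.strip() != "Memory_Speed_Mhz":
                raise ValueError(f"unsupported expression: {expr}")
            value = float(numerator) / memory_speed_mhz
        else:
            value = float(expr)
        result[name] = value
    return result


times = parse_access_time(ACCESS_TIME, memory_speed_mhz=256.0)
print(times)
# {'Read': 3.90625, 'Prefetch': 3.0, 'Write': 3.90625, 'ReadWrite': 3.0, 'Erase': 3.0}
```

Changing `memory_speed_mhz` shows directly how the Read and Write times scale with memory speed while the fixed entries stay at 3.0.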

    Illustration of the Model

    The VisualSim models are given below:


    Variation 1

    MBW Variation 1
    Figure 2: Type 1 VisualSim Model


    Variation 2

    MBW Variation 2

    Figure 3: Type 2 VisualSim Model

    Variation 3

    MBW Variation 3
    Figure 4: Type 3 VisualSim Model

    Variation 4

    MBW Variation 4
    Figure 5: Type 4 VisualSim Model

    Variation 5

    MBW Variation 5
    Figure 6: Type 5 VisualSim Model


    Gathering Resource Statistics and Reports


    Build a VisualSim model using the Library Blocks listed in the following tables to gather resource statistics and reports.


    Variation 1


    S.No.
    Process
    Library Block
    1.
    • Create a block titled “xTime_yData_Plotter”.
    • Right-click on the block and select menu Appearance->Flip Ports Horizontally.
    TimeDataPlotter
    2.
    • Build a Decision block.
    • Right-click on the block to select:
    • menu Appearance->Flip Ports Horizontally.
    • menu Customize Ports
    • Add two additional output ports
    • Click the “Add” button and specify “output2”, put a check in the “Output” column, and select double for the “Type” from the pull-down menu.
    • Click the “Add” button and specify “output3”, put a check in the “Output” column, and select double for the “Type” from the pull-down menu.
    Expression_List
    3.
    • Implement a queue titled “Smart_Timed_Resource”.
    • Use the “Parameter=” block to define “Exit_Queue_Service_Time” as the value for the parameter “Time_Field”.
    • Right-click on the block to select menu Appearance->Flip Ports Horizontally.
    Queue

    Other Variations



    S.No.
    Process
    Library Block
    1.
    • Create a block to send the transaction from the memory for analysis.
    IN
    2.
    • Build a Decision block.
    • Right-click on the block to select:
    • menu Appearance->Flip Ports Horizontally.
    • menu Customize Ports
    • Add two additional output ports
    • Click the “Add” button and specify “output2”, put a check in the “Output” column, and select double for the “Type” from the pull-down menu.
    • Click the “Add” button and specify “output3”, put a check in the “Output” column, and select double for the “Type” from the pull-down menu.
    Expression_List
    3.
    • Create a block titled “Latency”.
    • Right-click on the block and select menu Appearance->Flip Ports Horizontally.
    TimeDataPlotter


    Analysis and Results


    The latency graph shows the end-to-end latency for the complete system. The X-axis is the simulation time and the Y-axis is the latency in seconds. Latency increases gradually over the run, and the spread between the minimum and maximum latency is large, which may result in unpredictable system behaviour.
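The spread mentioned above can be quantified once the latency samples are exported from the plot. The sketch below assumes each sample is a pair of completion time and the transaction's Time_Generated value; `latency_stats` and the sample numbers are invented for illustration and are not results from the tutorial models.

```python
import statistics


def latency_stats(samples):
    """Summarize end-to-end latency.

    samples: list of (completion_time, time_generated) pairs.
    Latency of each transaction is completion_time - time_generated.
    """
    latencies = [done - generated for done, generated in samples]
    lowest, highest = min(latencies), max(latencies)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        "mean": statistics.mean(latencies),
        "stdev": statistics.pstdev(latencies),  # population standard deviation
    }


# Made-up samples in which latency grows over the run.
samples = [(1.0, 0.5), (2.0, 1.0), (4.0, 2.0), (8.0, 4.0)]
stats = latency_stats(samples)
print(stats["min"], stats["max"], stats["range"], stats["mean"])
# 0.5 4.0 3.5 1.875
```

A range or standard deviation that is large relative to the mean is a numerical sign of the unpredictable behaviour visible in the plot.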


    Variation 1


    MBW Analysis Variation 1
    Figure 7: Variation 1 - Latency vs. Simulation Time Plot

    Variation 2


    MBW Analysis Variation 2
    Figure 8: Variation 2 - Latency vs. Simulation Time Plot

    Variation 3


    MBW Analysis Variation 3
    Figure 9: Variation 3 - Latency vs. Simulation Time Plot

    Variation 4


    MBW Analysis Variation 4
    Figure 10: Variation 4 - Latency vs. Simulation Time Plot

    Variation 5


    MBW Analysis Variation 5
    Figure 11: Variation 5 - Latency vs. Simulation Time Plot

    Additional Analysis


    Build another model for three Traffic Generators with the following blocks:


    Use the following values for the Traffic Generators: