PCIe_Bus


Block Name: PCIe_Bus
Library location: Interfaces and Buses -> PCI -> PCIe_Bus
Code file location: $VS/VisualSim/actor/arch/Buses

Table of Contents

1. Block Overview
2. Description
3. Key features
4. Example Models
5. Configuration
    5.1 Required Modeling Fields
    5.2 Messages
    5.3 Packet Sizes
    5.4 Flow Control
    5.5 Fragmentation
    5.6 Read_Write_Ratio
    5.7 Statistics
6. Block Ports
7. Video tutorial
8. Parameters

1. Block Overview

PCIe is a high-performance bus protocol for connecting peripheral chips to independent processor and memory subsystems.

2. Description

This library emulates the functionality of the PCIe standard in a single block called the PCIe. The block has 12 Root Complex ports and 12 End Point ports. For this block to operate correctly, the model must also contain two prerequisite blocks: Architecture_Setup and Digital Simulator.

PCI Express (PCIe) provides a scalable, high-speed, serial I/O bus that maintains backward compatibility with PCI applications and drivers. Its layered architecture supports existing PCI applications and drivers by maintaining compatibility with the legacy PCI model. PCI Express replaces the shared parallel bus topology of PCI with multiple point-to-point connections. A switch may provide peer-to-peer communication between different endpoints. A PCI Express link consists of dual simplex channels, each implemented as a transmit pair and a receive pair for simultaneous transmission in each direction. Each pair consists of two low-voltage, differentially driven signals. A data clock is embedded in each pair, using an 8b/10b encoding scheme (in Gen 1 and Gen 2; Gen 3 and later use 128b/130b) to achieve very high data rates.
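
The effect of the line encoding on usable bandwidth can be worked out with a few lines of arithmetic. The sketch below is illustrative only: the signalling rates and encodings are the publicly documented PCIe figures, while the function names and the dictionary layout are assumptions made for this example and are not part of the VisualSim block.

```python
# Per-lane signalling rate (GT/s) and line encoding (data bits, line bits).
# Figures are from the public PCIe specifications; how the VisualSim block
# derives its own rate internally is not shown here.
GENERATIONS = {
    "PCIe_Gen_1": (2.5, 8, 10),
    "PCIe_Gen_2": (5.0, 8, 10),
    "PCIe_Gen_3": (8.0, 128, 130),
    "PCIe_Gen_4": (16.0, 128, 130),
}


def lane_bandwidth_mbytes(generation: str) -> float:
    """Usable bandwidth of one lane in one direction, in MB/s (1 MB = 1e6 bytes)."""
    gt_per_s, data_bits, line_bits = GENERATIONS[generation]
    usable_bits_per_s = gt_per_s * 1e9 * data_bits / line_bits
    return usable_bits_per_s / 8 / 1e6


def link_bandwidth_mbytes(generation: str, lanes: int) -> float:
    """Usable bandwidth of a whole link in one direction, in MB/s."""
    return lane_bandwidth_mbytes(generation) * lanes
```

For instance, `link_bandwidth_mbytes("PCIe_Gen_1", 16)` gives the one-direction bandwidth of a 16-lane Gen 1 link before any header or protocol overhead.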

For detailed information on the PCIe Bus, refer to $VS/doc/Chapter_8-Bus_Standards.pdf.


Figure: The PCIe Bus
Figure: The PCIe bus block in VisualSim


3. Key features
    1. The PCI Express library accommodates future performance enhancements through speed upgrades and advanced encoding techniques.
    2. Generation of bus statistics: throughput in Mbps, bus utilization in percent, input/output transactions per second, and Root Complex and End Point buffer occupancy.
    3. Block debug feature that lets the user visualize the active transaction, current time, and channel.
    4. Optional flow control for the Root Complex and the End Point.

4. Example Models

1. PCIe_2Masters_DRAM_model5_Rd_W.xml
2. PCIe_2Masters_2DRAM_model8_Rd_W.xml
3. PCIe_2Masters_HW_DRAM_model10_Rd_W.xml
4. PCIe_2Masters_2HW_DRAM_Rd_W_model18.xml
5. PCIe_12Master_12Slaves_model19.xml
6. PCIe_Proc_DRAM_Model20.xml
7. PCIe_VM_DRAM_model21.xml
8. PCIe_2PCIe_Bridge_model25.xml
9. PCIe_DMA.xml


5. Configuration
To include the PCIe block in a model, the following steps must be followed:


5.1 Required Modeling Fields

The incoming transaction must have the following basic fields to operate correctly:

A_Command                 “Read” or “Write” operation
A_Source                      Source or Root Complex device name
A_Destination              Destination or End Point device name
A_Bytes                        Number of data Bytes for this transaction
A_Bytes_Remaining     Set Internally to monitor burst transaction flow
A_Bytes_Sent               Set Internally to monitor burst transaction flow
A_Priority                      Priority for this transaction, used if switch is currently busy

Other fields used internally by the model:

INDEX                        Switch channel number field
PCI_Src (added)              Internal designation for the source port, as an integer
PCI_Des (added)              Internal designation for the destination port, as an integer
PCIe_ID (added)              Long value holding the unique ID assigned to a transaction,
                             allowing transactions to be traced in messages
A_Bus_Delay (added)          Switch delay
PCI_Message (added)          Indicates the PCIe internal message, such as Read_Request, Read_Data, etc.
A_CRCCheckSum (added)        Checksum processing

The PCIe Bus expects the Data Structure, Processor_DS, to be used. If you are using any other Data Structure template, make sure the basic fields are available.
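
As a quick way to reason about the required fields, the sketch below models a transaction as a plain Python dictionary and checks that the basic fields are present. It is not the VisualSim Data Structure API: the helper names `missing_fields` and `is_valid`, the device name "Processor_1", and the rule that A_Bytes must be positive are assumptions made for illustration.

```python
# Basic fields listed in section 5.1 (names taken from the table above).
REQUIRED_FIELDS = (
    "A_Command", "A_Source", "A_Destination", "A_Bytes",
    "A_Bytes_Remaining", "A_Bytes_Sent", "A_Priority",
)


def missing_fields(transaction: dict) -> list:
    """Return the required field names that are absent from the transaction."""
    return [name for name in REQUIRED_FIELDS if name not in transaction]


def is_valid(transaction: dict) -> bool:
    """Check field presence, the command value, and (assumed) a positive size."""
    if missing_fields(transaction):
        return False
    return transaction["A_Command"] in ("Read", "Write") and transaction["A_Bytes"] > 0


example = {
    "A_Command": "Read",
    "A_Source": "Processor_1",   # hypothetical Root Complex device name
    "A_Destination": "DRAM_1",
    "A_Bytes": 1024,
    "A_Bytes_Remaining": 0,      # set internally by the block
    "A_Bytes_Sent": 0,           # set internally by the block
    "A_Priority": 1,
}
```

A check of this kind can be useful when a custom Data Structure template is used instead of Processor_DS.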

Delays or Latency across the Bus

There are multiple delays across the bus.  The delays in the PCIe block are listed below.

Transaction Fragment at Master      A Read Request is fragmented to Max_Payload_Req_Size. A Write is fragmented to Max_Payload_Size.  There is a one-bit time between fragmented payloads.

Read_Start                          From Root Complex to End Point.  Requests access to the Switch.  The delay is the switch time plus a one-cycle processing delay at the End Point.

Read_Ack                            From End Point to Root Complex.  Response to a Read_Start.  The delay is the time for the header to cross the switch.

Read_Request message                From Root Complex to End Point.  When a Read_Ack is received, the Request is sent to the End Point.  The delay is the switch time plus one bit time at the Slave.

Write_Req                           From Root Complex to End Point.  Requests access to the Switch.  The delay is the switch time plus a one-cycle processing delay at the End Point.

Write_Ack                           From End Point to Root Complex.  Response to a Write_Req.  The delay is the time for the header to cross the switch.

Write_Data message                  From Root Complex to End Point.  Header + Payload delay across the Switch.

Read_Data message                   From End Point to Root Complex.  Header + Payload delay across the Switch.

Data Sizes                          Start and Req messages are header-only, 16 Bytes.  Read_Request is 16 Bytes.  Data is Header + Payload.

Where does queuing or buffering occur?
Each Root Complex has a Queue.  If there is no flow control, the fragmenter accepts all incoming transactions.  The Switch has a Queue.  The End Point has a Queue.  If there is no flow control with the Slave device, the fragment is sent out immediately.  There is a one-bit time between requests and data to be sent out.  For the Read data path, there is a single input Queue at the End Point and a Queue at the Master.

How are the buffer sizes set?
The buffer size is specified in bytes.  Buffer occupancy is based on the A_Bytes field of each transaction.  Requests are one double word, while a Write occupies the fragment size.  If there is flow control and the buffer is full, no additional transactions are accepted. If there is no flow control and the buffer is full, the incoming fragments are dropped and sent to the msg_out port.
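
The byte-based buffer rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the block's internal code: the class name `ByteBuffer`, the three return strings, and the `dropped` list (which stands in for fragments routed to msg_out) are invented for this example.

```python
class ByteBuffer:
    """Buffer whose capacity is counted in bytes, not in transactions."""

    def __init__(self, capacity_bytes: int, flow_control: bool):
        self.capacity_bytes = capacity_bytes
        self.flow_control = flow_control
        self.used_bytes = 0
        self.dropped = []  # stands in for fragments sent to msg_out

    def offer(self, fragment: dict) -> str:
        """Return 'accepted', 'held' (flow control) or 'dropped' (no flow control)."""
        size = fragment["A_Bytes"]
        if self.used_bytes + size <= self.capacity_bytes:
            self.used_bytes += size
            return "accepted"
        if self.flow_control:
            return "held"  # the sender keeps the fragment and tries again later
        self.dropped.append(fragment)
        return "dropped"

    def release(self, fragment: dict) -> None:
        """Free the bytes of a fragment that has left the buffer."""
        self.used_bytes -= fragment["A_Bytes"]
```

With a 512-byte buffer, one 512-byte fragment fills it completely; a following 64-byte fragment is then held when flow control is enabled and dropped when it is not.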

How is the credit policy set up for flow control?
A Read/Write ratio ensures that the proportion of Read to Write transactions is maintained. The result is not expected to be exact, but it is a close approximation.

Transactions support priority through the A_Priority field.  A higher number means a higher priority, and the queue is reordered accordingly.  Once a payload starts transmitting, it cannot be preempted. The priority is applied on each Master-Slave link.
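
The priority behaviour can be illustrated with a small queue for one Master-Slave link. This is a sketch, not the block's implementation: the class `PriorityLinkQueue` and its methods are hypothetical, and first-in-first-out ordering among equal priorities is an assumption.

```python
import heapq
import itertools


class PriorityLinkQueue:
    """Queue for one Master-Slave link: highest A_Priority first, no preemption."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: earlier arrival first
        self.in_flight = None              # payload currently transmitting

    def push(self, transaction: dict) -> None:
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        entry = (-transaction["A_Priority"], next(self._arrival), transaction)
        heapq.heappush(self._heap, entry)

    def start_next(self):
        """Start the next transaction; return None if the link is busy or empty."""
        if self.in_flight is not None or not self._heap:
            return None
        _, _, self.in_flight = heapq.heappop(self._heap)
        return self.in_flight

    def finish(self) -> None:
        """Mark the current payload as fully transmitted."""
        self.in_flight = None
```

A transaction with a higher priority that arrives while a payload is in flight moves to the front of the queue, but it only starts once `finish()` has been called for the current payload.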

A transaction bursts across all the lanes. When all the bytes from all the payloads have arrived at the Slave, the transaction is reassembled and the data exits the End Point port.  Data can be sent simultaneously from multiple Root Complexes to a single End Point.  The output from the Slave also decrements the internal credit byte array and the threshold array of outstanding transactions.

The model generates messages indicating the internal activity.  Each newly arriving transaction from any master port is assigned a unique ID.  This unique ID can be used to trace a particular transaction.  In addition, each transaction has a source (Master 1 to 12) and a destination (Slave 1 to 12) to clarify the link used.

      

5.2 Messages

·         Master_to_Slave

o   Read_Request, Read_Ack, Nack

o   Write_Request, Write_Response, Nack

o   Write_Data

·         Slave_to_Master

o   Read_Data, Read_Response, Nack

o   Write_Data, Write_Ack, Nack

5.3 Packet Sizes

·         Read

o   Read_Start and Read_Start_Ack- Header

o   Read_Request- Header + Request_Data

o   Read_Data- Header + Payload Data

o   Read_Request_Ack- Header

o   Nack- Header

·         Write

o   Write_Start and Write_Start_Ack- Header

o   Write_Data- Header + Payload Data

o   Nack- Header
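
The packet-size rules above reduce to a small lookup. The sketch below assumes the default Header_Bytes value of 16 from the parameter list; the function name `packet_bytes` and the set names are invented for illustration, and the size of Request_Data is left as a caller-supplied value because it is not specified here.

```python
HEADER_BYTES = 16  # default value of the Header_Bytes parameter

# Message types that carry only a header.
HEADER_ONLY = {
    "Read_Start", "Read_Start_Ack", "Read_Request_Ack",
    "Write_Start", "Write_Start_Ack", "Nack",
}
# Message types that carry a header plus a body (request data or payload).
HEADER_PLUS_BODY = {"Read_Request", "Read_Data", "Write_Data"}


def packet_bytes(message: str, body_bytes: int = 0,
                 header_bytes: int = HEADER_BYTES) -> int:
    """Return the on-bus size of one packet in bytes."""
    if message in HEADER_ONLY:
        return header_bytes
    if message in HEADER_PLUS_BODY:
        return header_bytes + body_bytes
    raise ValueError(f"unknown message type: {message}")
```

For example, a Write_Data packet carrying a 64-byte payload occupies 80 bytes with the default header size.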

5.4 Flow Control
Root Complex --> Master Queue
EndPoint         --> Slave Queue



Flow Sequence for Read request

·         Root Complex->Sends first request to Master Queue.  The packet is first fragmented and then sent to the Queue.

·         If Input Flow Control is enabled, the next packet waits for an EVENT from the Fragmenter.  When the Root Complex is available, the next transaction is sent.

·         After fragmentation, the first packet is placed in the Master Queue.

·         In Master Queue

o   If Master Queue is Empty, check Slave Queue. If available, send immediately.  The Slave buffer counter is incremented.

o   If Slave Queue is Not Empty but not full, then enqueue

o   If the Slave Queue is full, then enqueue and wait until an Event is received from the Slave Queue.

·         Delay in Switch. The switch provides point-to-point, full-duplex connections between all Root Complexes and End Points. The switch queue has the same size as the Slave buffer.

·         The CRC and PCIe_timeout fields are checked towards the EndPoint.

o   If CRC is correct, send Ack to Master

o   If the CRC has an error, or if the fragment transmission times out, drop the payload and send a NACK to the Master.

o   Every NACK is counted against the permissible number of retries.

o   If the number of retries exceeds the threshold given by the parameter NumOfRetry, the fragment is dropped.


·         If CRC is bad, the Slave Buffer Counter is immediately decremented.

·         Master Queue

o   If Ack is received from End Point side, send Data or Read Request

o   If Nack is received from the End Point side after a checksum verification, resend the request.

·         Slave/EndPoint Queue

o   If queue is empty, send out immediately.

o   If queue is not empty, then enqueue

o   If queue is empty but a previous packet has not been acknowledged (assuming slave flow control is enabled), then enqueue.
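
The CRC check and retry steps in the sequence above can be sketched as a loop. This is one possible reading, not the block's code: it uses the BER parameter description (a random draw below BER produces a Nack), treats NumOfRetry as the number of resends allowed after the first attempt, and ignores timeouts. The function names are hypothetical.

```python
import random


def crc_ok(ber: float, rng: random.Random) -> bool:
    """A random draw below BER is treated as a failed check (Nack)."""
    return rng.random() >= ber


def deliver_fragment(ber: float, num_of_retry: int, rng: random.Random):
    """Send one fragment, resending on Nack.

    Returns ("Ack", nacks) on success or ("Dropped", nacks) once the
    number of Nacks exceeds num_of_retry.
    """
    nacks = 0
    while True:
        if crc_ok(ber, rng):
            return "Ack", nacks
        nacks += 1
        if nacks > num_of_retry:
            return "Dropped", nacks
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when comparing retry counts across different BER settings.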

Notes:

·         The Slave buffer counter is used by all Masters to determine whether the Slave is available based on the Slave_Buffer_Counter < Slave_Buffer. 

·         When the Slave Buffer Counter < Slave Buffer, all Root Complex Master Queues are informed. When the Slave buffer is available, the first Request, irrespective of the Master gets access.

Write Data from Master to Slave and Read data from Slave to Master

·         Data is sent to the Queue. This can be from Root Complex or EndPoint.

o   If the Master (Write) and Slave (Read) queue is empty, check whether the opposite Queue is available.  If available send immediately.

o   If not empty, enqueue

o   If incoming queue is empty but opposite Queue is full (busy), then enqueue.

·         Delay at Switch

5.5 Fragmentation

Two parameters are used: A is Max_Payload_Size and B is Max_Payload_Req_Size (the maximum Read Request size).  For a Read, if Request <= B, the Request is not fragmented.  If Request > B, the Request is fragmented into pieces of size B.  Read and Write data packets are fragmented into pieces of size A.  Fragmentation occurs before the Queue input.  A fragmenter at the Master handles the Read Request and the Write Data.  A fragmenter at the Slave handles the Read Data.

Read example

A= 64

B = 256

Incoming Read_Request=1024 Bytes

Number of Request Fragments = 4

Number of Read Data returned = 16

 

Write example

A= 64

B = 256

Incoming Write=1024 Bytes

Number of Write sent = 16
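
The fragment counts in the two examples can be reproduced with a short calculation. The sketch below is illustrative: the function names are invented, and it assumes that the Read Data for each request fragment is split separately into pieces of size A, which gives the same totals as the examples above.

```python
def fragment_sizes(total_bytes: int, max_fragment: int) -> list:
    """Split total_bytes into pieces of at most max_fragment bytes."""
    full, rest = divmod(total_bytes, max_fragment)
    return [max_fragment] * full + ([rest] if rest else [])


def read_fragments(request_bytes: int, max_payload: int, max_read_request: int):
    """Return (number of Read Request fragments, number of Read Data fragments)."""
    requests = fragment_sizes(request_bytes, max_read_request)
    # Assumption: each request fragment's data is fragmented on its own.
    data = sum(len(fragment_sizes(r, max_payload)) for r in requests)
    return len(requests), data


def write_fragments(write_bytes: int, max_payload: int) -> int:
    """Return the number of Write Data fragments."""
    return len(fragment_sizes(write_bytes, max_payload))


print(read_fragments(1024, max_payload=64, max_read_request=256))  # (4, 16)
print(write_fragments(1024, max_payload=64))                       # 16
```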

5.6 Read Write Ratio

The purpose of this ratio is to ensure that Reads and Writes are sent in a given proportion.  The PCIe block counts the number of transactions, not the number of Bytes.  The idea is to evaluate the impact of this ratio on throughput and latency.  Internally, the block maintains two attributes, Read_Transactions and Write_Transaction, and increments the matching counter for each transaction.  If the observed ratio does not match the parameter value, the block balances it by increasing the output of the other command type.

For example, if the Ratio is 50% while Read_Transaction=100 and Write_Transaction=85, PCIe will try to transmit more Writes to reach the 50% ratio.
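
The balancing decision can be sketched as a comparison between the observed and target ratios. This is an illustration under assumptions: the ratio is read here as the fraction of Reads among all transactions, a value of 0 disables the function (as stated in the parameter list), and the function name `next_command` is hypothetical.

```python
def next_command(read_count: int, write_count: int, read_ratio: float) -> str:
    """Return which command type to favour next: 'Read', 'Write' or 'Any'."""
    total = read_count + write_count
    if read_ratio == 0 or total == 0:
        return "Any"  # ratio disabled, or nothing counted yet
    current = read_count / total
    if current > read_ratio:
        return "Write"  # too many Reads so far, favour Writes
    if current < read_ratio:
        return "Read"   # too many Writes so far, favour Reads
    return "Any"


print(next_command(100, 85, 0.5))  # Write
```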

A completion for  Read is when the Data is received at the Master. A completion for Write is when the Request is sent out of the Slave port.

The Read_Write_Ratio will be maintained as a local memory within each PCIe block.  This memory can be modified during the simulation.  The Read and Write Statistics can also be accessed as a local memory.

5.7 Statistics

o   Master Queues- Number of transactions, Buffer usage, latency

o   Slave Queues- Number of transactions, Buffer usage, latency

o   Switch-Buffer usage, latency, throughput, utilization

o   Read Statistics- Consolidated for all the Reads from this Master.  Statistics are for the fragment size A: number entered and latency. Read latency is measured from the Root Complex Read Request to the Data being sent from the Master Queue to the Root Complex.

o   Write Statistics- Consolidated for all the Write operations to this Slave.  Statistics are for the fragment size A: number entered and latency.

6. Block Ports

Expanding the PCIe        

To add more Masters and Slaves, the user needs to add the following into the Open Instance:

To add a Master, add a multiport on the left side. Inside the PCIe block, add an input port for Transaction_In and an output port for Transaction_Out.  Connect the input port to Transaction_In and the output port to Transaction_Out. The names of the Transaction_In ports must start at input13.  The names of the Transaction_Out ports must start at output13.

To add a Slave, add a multiport on the right side. Then add an input port for Read_Data and an output port for Write_Data.  Connect the input port to Read_Data and the output port to Write_Data. The names of the Read_Data ports must start at input13.  The names of the Write_Data ports must start at output13.

7. Video Tutorial

8. Parameters

Parameter: Architecture_Name
Example: “Architecture_1” (String)
Explanation: Name of the Architecture_Setup block.

Parameter: Bus_Name
Example: “PCIe_1” (String)
Explanation: Unique name for this Bus. Must differ from all architecture blocks and global model memories.

Parameter: Number of Lanes
Example: 16 (int)
Explanation: 1, 2, 4, 16, 32, and 64 are the only possible values. This can be a single value or an array with one entry per Master.

Parameter: Slave Buffer
Example: 512  /* Max Bytes @ Slave */
Explanation: Number of Bytes, irrespective of the number of transactions.

Parameter: Master Buffer
Example: 512  /* Max Bytes @ Master */
Explanation: Number of Bytes, irrespective of the number of transactions.

Parameter: Header_Bytes
Example: 16
Explanation: Packet header size. Single integer value.

Parameter: Number_of_Ports
Example: {12, 12}  /* Master, Endpoint  Ports */
Explanation: {Master, Slave}. Used for expansion only.

Parameter: BER
Example: 1.0E-11
Explanation: Range is 0.0 to 1.0.  During the check, a random number is generated.  If the number is below this BER, a Nack is returned.  If it is above, the transaction is accepted.

Parameter: Max_Payload_Size
Example: 512 /* Write, Read Data */
Explanation: Maximum transaction size, used in fragmentation. The transaction payload size can vary, so an incoming transaction can contain many fragments of Max_Payload_Size.

Parameter: Max_Payload_Req_Size
Example: 128  /* Read Requests */
Explanation: Maximum request size, used in fragmenting the Read Request.

Parameter: Read_to_Write_Ratio
Example: 0.5  /* 0.0 to 1.0 */
Explanation: If 0, this function is ignored.  The range is 0.0 to 1.0.

Parameter: Devices_Attached_to_Slaves
Example: {{"DRAM","DRAM_1"},{"DRAM_2"},{"DRAM_3"},{"Dev_4"}}
Explanation: Array of arrays.  Each index position is the list of Slaves that can be accessed via this Slave port.  If a single Slave, then a string, else a string array.

Parameter: Root_Complex_Flow_Control
Example: {false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false}
Explanation: Array of Booleans, one per Master device. True is enabled.

Parameter: Endpoint_Flow_Control
Example: {false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false}
Explanation: Array of Booleans, one per Slave device. True is enabled.

Parameter: Bit_64_Mode
Example: true
Explanation: Selects the 64-bit PCIe header size.

Parameter: PCIe_MBps
Example: PCIe_Gen_1
Explanation: A pulldown parameter with four possible values: PCIe_Gen_1, PCIe_Gen_2, PCIe_Gen_3, PCIe_Gen_4.

Parameter: Timeout
Example: 1E-6
Explanation: A user-configured timeout value.

Parameter: NumOfRetry
Example: 4
Explanation: A hardcoded value for the maximum number of retries on the PCIe bus.

Parameter: Enable_Plots
Explanation: Enabling this check box displays throughput and latency. A Boolean flag used globally to set viewPlot to true or false.
