Block Name: PCIe_Bus
Library location: Interfaces and Buses -> PCI -> PCIe_Bus
Code file location: $VS/VisualSim/actor/arch/Buses
Table of Contents
1. Block Overview
2. Description
3. Key features
4. Example Models
5. Configuration
5.1 Required Modeling Fields
5.2 Messages
5.3 Packet Sizes
5.4 Flow Control
5.5 Fragmentation
5.6 Read_Write_Ratio
5.7 Statistics
6. Block Ports
7. Video tutorial
8. Parameters
1. Block Overview
The PCIe bus protocol is a high-performance bus for interconnecting peripheral chips to any independent processor/memory subsystems.
2. Description
This library emulates all the functionality of the PCIe standard. There is a single block, called the PCIe, which has 12 Root Complex ports and 12 EndPoint ports. For this block to operate correctly, the model must contain two prerequisite blocks: Architecture_Setup and Digital Simulator.
PCI Express (PCIe) provides a scalable, high-speed, serial I/O bus that maintains backward compatibility with PCI applications and drivers. The PCI Express layered architecture supports existing PCI applications and drivers by maintaining compatibility with the legacy PCI model. PCI Express replaces the shared parallel bus topology with multiple point-to-point connections. A switch may provide peer-to-peer communication between different endpoints. A PCI Express link consists of dual simplex channels, each implemented as a transmit pair and a receive pair for simultaneous transmission in each direction. Each pair consists of two low-voltage, differentially driven signals. A data clock is embedded in each pair, using an 8b/10b clock-encoding scheme to achieve very high data rates.
For detailed information on the PCIe Bus, refer to $VS/doc/Chapter_8-Bus_Standards.pdf.
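The effective data rate implied by 8b/10b encoding can be sketched as follows. The figures used are the standard PCIe Gen 1 numbers, not values taken from the VisualSim block itself:

```python
# Effective per-lane data bandwidth under 8b/10b encoding, which
# transmits 10 bits on the wire for every 8 data bits.
def effective_gbps(raw_gt_per_s: float, lanes: int) -> float:
    """Raw transfer rate (GT/s) scaled by the 8b/10b coding efficiency."""
    return raw_gt_per_s * (8 / 10) * lanes

# PCIe Gen 1: 2.5 GT/s per lane -> 2.0 Gbps of data per lane
print(effective_gbps(2.5, 1))    # 2.0
print(effective_gbps(2.5, 16))   # 32.0
```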
3. Key features
1. The PCI Express library comprehends future performance enhancements via speed upgrades and advanced encoding techniques.
2. Generation of bus statistics: throughput in Mbps, bus utilization in percentage, input/output transactions per second, Root Complex buffer occupancy, and End Point buffer occupancy values.
3. A block debug feature that enables the user to visualize the active transaction, current time, and channel.
4. The user can apply flow control for the Root Complex and End Point.
4. Example model list
1. PCIe_2Masters_DRAM_model5_Rd_W.xml
2. PCIe_2Masters_2DRAM_model8_Rd_W.xml
3. PCIe_2Masters_HW_DRAM_model10_Rd_W.xml
4. PCIe_2Masters_2HW_DRAM_Rd_W_model18.xml
5. PCIe_12Master_12Slaves_model19.xml
6. PCIe_Proc_DRAM_Model20.xml
7. PCIe_VM_DRAM_model21.xml
8. PCIe_2PCIe_Bridge_model25.xml
9. PCIe_DMA.xml
5. Configuration
5.1 Required Modeling Fields
To include the PCIe block in a model, the incoming transaction must contain the following basic fields to operate correctly:
A_Command - "Read" or "Write" operation
A_Source - Source or Root Complex device name
A_Destination - Destination or End Point device name
A_Bytes - Number of data bytes for this transaction
A_Bytes_Remaining - Set internally to monitor burst transaction flow
A_Bytes_Sent - Set internally to monitor burst transaction flow
A_Priority - Priority for this transaction, used if the switch is currently busy
Other fields used internally by the model:
INDEX - Switch channel number field
PCI_Src (added) - Internal designation for the source port as an integer
PCI_Des (added) - Internal designation for the destination port as an integer
PCIe_ID (added) - Long value assigned as a unique ID to each transaction, allowing one to trace transactions in messages
A_Bus_Delay (added) - Switch delay
PCI_Message (added) - Indicates a PCIe internal message, such as Read_Request, Read_Data, etc.
A_CRCCheckSum (added) - Checksum processing
The PCIe Bus expects the Processor_DS Data Structure to be used. If you are using any other Data Structure template, make sure the basic fields are available.
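The required fields above can be pictured as a simple record. The field names are taken from this document; the values below are illustrative only, not defaults of the block:

```python
# Sketch of the basic fields a transaction must carry before entering
# the PCIe block. Values here are example placeholders.
transaction = {
    "A_Command":     "Read",    # "Read" or "Write"
    "A_Source":      "CPU_1",   # Root Complex device name (example name)
    "A_Destination": "DRAM_1",  # End Point device name (example name)
    "A_Bytes":       1024,      # number of data bytes in this transaction
    "A_Priority":    1,         # higher number = higher priority
    # Set internally by the block to monitor burst transaction flow:
    "A_Bytes_Remaining": 0,
    "A_Bytes_Sent":      0,
}

required = ["A_Command", "A_Source", "A_Destination", "A_Bytes"]
print(all(field in transaction for field in required))   # True
```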
Delays or Latency across the Bus
There are multiple delays across the bus. The following is the list of delays in the PCIe:
Transaction Fragment at Master - A Read Request is fragmented to the Max_Payload_Req_Size; a Write is fragmented to the Max_Payload_Size. There is one bit time between each fragmented payload.
Read_Start - From Root Complex to End Point. Requests access to the Switch. The delay is the switch time plus a one-cycle delay for End Point processing.
Read_Ack - From End Point to Root Complex. The response to a Read_Start; the delay is the delay across the switch for the header.
Read_Request message - From Root Complex to End Point. When a Read_Ack is received, the Request is sent to the End Point. The delay is the switch time plus one bit time at the Slave.
Write_Req - From Root Complex to End Point. Requests access to the Switch. The delay is the switch time plus a one-cycle delay for End Point processing.
Write_Ack - From End Point to Root Complex. The response to a Write_Req; the delay is the delay across the switch for the header.
Write_Data message - From Root Complex to End Point. Header + Payload delay across the Switch.
Read_Data message - From End Point to Root Complex. Header + Payload delay across the Switch.
Data Sizes
Start and Req messages have a Header size of 16 Bytes. Read_Request is 16 Bytes. Data is Header + Payload.
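The sizing rules above can be sketched as a small helper. The 16-byte header and the header-plus-payload rule are from this document; the function name is illustrative:

```python
# Message sizes per the rules above: Start/Req-type messages and
# Read_Request are header only; Data messages are header + payload.
HEADER_BYTES = 16

def message_bytes(kind: str, payload: int = 0) -> int:
    if kind in ("Read_Start", "Write_Req", "Read_Request", "Ack", "Nack"):
        return HEADER_BYTES              # header only
    if kind in ("Read_Data", "Write_Data"):
        return HEADER_BYTES + payload    # header + payload
    raise ValueError(f"unknown message kind: {kind}")

print(message_bytes("Read_Request"))     # 16
print(message_bytes("Write_Data", 64))   # 80
```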
Where is the queuing or buffering?
Each Root Complex has a queue. If there is no flow control, the fragmenter will accept all incoming transactions. The Switch has a queue, and each End Point has a queue. If there is no flow control with the Slave device, the fragment is sent out immediately. There is one bit time between requests and data to be sent out. For the Read data path, there is a single input queue at the End Point and a queue at the Master.
How are the buffer sizes set?
The buffer is sized in number of bytes. Buffer usage is based on the A_Bytes field of the transaction. Requests occupy one double word, while a Write occupies the fragment size. If flow control is enabled and the buffer is full, no additional transactions are accepted. If there is no flow control, the incoming fragments are dropped and sent to the msg_out port.
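The accept/block/drop behavior described above can be sketched as follows. This is a minimal illustration of the stated rules, not the block's internal implementation:

```python
# Byte-based buffer check: with flow control a full buffer blocks new
# fragments; without it, fragments that do not fit are dropped (and
# would exit via the msg_out port).
def offer_fragment(buffer_used: int, buffer_size: int,
                   fragment_bytes: int, flow_control: bool):
    if buffer_used + fragment_bytes <= buffer_size:
        return "accept", buffer_used + fragment_bytes
    if flow_control:
        return "block", buffer_used      # source must wait for space
    return "drop", buffer_used           # fragment sent to msg_out

print(offer_fragment(448, 512, 64, True))    # ('accept', 512)
print(offer_fragment(512, 512, 64, True))    # ('block', 512)
print(offer_fragment(512, 512, 64, False))   # ('drop', 512)
```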
What is the credit policy setup for flow control?
There is a Read/Write ratio that ensures the ratio of Read to Write transactions is maintained. It is not expected to be exact, but it is a very close approximation.
Transactions support priority using the A_Priority field. A higher number means a higher priority, and the queue will be reordered. Once a payload starts transmitting, it cannot be preempted. The priority is applied for each Master-Slave link.
A transaction bursts across all the lanes. When all the bytes from all the payloads have arrived at the slave, the transaction is reassembled and the data exits the End Point port. Data can be sent simultaneously from multiple Root Complexes to a single End Point. The output from the slave also decrements the internal credit byte array and the threshold array of outstanding transactions.
The model generates messages indicating the internal activity. Each newly arriving transaction from any master port is assigned a unique ID. This unique ID can be used to trace a particular transaction. In addition, each transaction has a source (Master 1 to 12) and a destination (Slave 1 to 12) to clarify the link used.
· Master_to_Slave
o Read_Request, Read_Ack, Nack
o Write_Request, Write_Response, Nack
o Write_Data
· Slave_to_Master
o Read_Data, Read_Response, Nack
o Write_Data, Write_Ack, Nack
· Read
o Read_Start and Read_Start_Ack- Header
o Read_Request- Header + Request_Data
o Read_Data- Header + Payload Data
o Read_Request_Ack- Header
o Nack- Header
· Write
o Write_Start and Write_Start_Ack- Header
o Write_Data- Header + Payload Data
o Nack- Header
5.4 Flow Control
Root Complex --> Master Queue
EndPoint --> Slave Queue
Flow Sequence for Read request
· Root Complex -> Sends the first request to the Master Queue. The packet is first fragmented and then sent to the Queue.
· If Input Flow Control is enabled, the next packet waits for an EVENT from the Fragmenter. When the Root Complex is available, the next transaction is sent.
· After fragmentation, the first packet is placed in the Master Queue.
· In Master Queue
o If Master Queue is Empty, check Slave Queue. If available, send immediately. The Slave buffer counter is incremented.
o If Slave Queue is Not Empty but not full, then enqueue
o If the Slave Queue is full, then enqueue and wait until an Event is received from the Slave Queue.
· Delay in Switch. The switch provides point-to-point, full-duplex connections between all Root Complexes and End Points. The switch queue has the same size as the Slave buffer.
· The CRC and PCIe_timeout fields are checked towards the EndPoint.
o If the CRC is correct, send an Ack to the Master.
o If the CRC has an error, or if the fragment transmission times out, drop the payload and send a NACK to the Master.
o Every NACK is logged against the permissible number of retries.
o If the number of retries exceeds the threshold given by the parameter NumOfRetry, the fragment is dropped.
· If the CRC is bad, the Slave Buffer Counter is immediately decremented.
· Master Queue
o If an Ack is received from the End Point side, send the Data or Read Request.
o If a Nack is received from the End Point side after checksum verification, resend the request.
· Slave/EndPoint Queue
o If queue is empty, send out immediately.
o If queue is not empty, then enqueue
o If queue is empty but a previous packet has not been acknowledged (assuming slave flow control is enabled), then enqueue.
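The Slave/EndPoint queue rules above can be condensed into a small decision function. This is a sketch of the stated rules, not the block's internal code:

```python
# Slave/EndPoint queue decision: send immediately only when the queue
# is empty and no previous packet is awaiting acknowledgement (the
# latter only matters when slave flow control is enabled).
def slave_queue_action(queue_len: int, awaiting_ack: bool,
                       slave_flow_control: bool) -> str:
    if queue_len == 0 and not (slave_flow_control and awaiting_ack):
        return "send_immediately"
    return "enqueue"

print(slave_queue_action(0, False, True))   # send_immediately
print(slave_queue_action(3, False, True))   # enqueue
print(slave_queue_action(0, True, True))    # enqueue (unacknowledged packet)
```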
Notes:
· The Slave buffer counter is used by all Masters to determine whether the Slave is available, based on Slave_Buffer_Counter < Slave_Buffer.
· When the Slave Buffer Counter < Slave Buffer, all Root Complex Master Queues are informed. When the Slave buffer is available, the first Request, irrespective of the Master, gets access.
Write Data from Master to Slave and Read data from Slave to Master
· Data is sent to the Queue. This can be from Root Complex or EndPoint.
o If the Master (Write) and Slave (Read) queue is empty, check whether the opposite Queue is available. If available send immediately.
o If not empty, enqueue
o If incoming queue is empty but opposite Queue is full (busy), then enqueue.
· Delay at Switch
5.5 Fragmentation
Two parameters are used: A is Maximum_Payload_Size and B is Maximum_Read_Request_Size. For a Read, if the Request <= B, the Request is not fragmented; if the Request > B, it is fragmented to the size of B. For Read and Write data, the packets are fragmented to the size of A. Fragmentation occurs prior to the Queue input. A fragmenter at the Master handles the fragmenting of the Read Request and the Write Data; a fragmenter at the Slave handles the Read Data.
Read example
A= 64
B = 256
Incoming Read_Request=1024 Bytes
Number of Request Fragments = 4
Number of Read Data returned = 16
Write example
A= 64
B = 256
Incoming Write=1024 Bytes
Number of Write sent = 16
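The two worked examples above can be reproduced as a calculation, where A is Max_Payload_Size and B is Max_Payload_Req_Size (function names are illustrative):

```python
from math import ceil

A, B = 64, 256   # Max_Payload_Size, Max_Payload_Req_Size

def read_fragments(request_bytes: int):
    requests = ceil(request_bytes / B)   # request fragmented to size B
    data = ceil(request_bytes / A)       # returned data fragmented to size A
    return requests, data

def write_fragments(write_bytes: int):
    return ceil(write_bytes / A)         # write data fragmented to size A

print(read_fragments(1024))    # (4, 16)
print(write_fragments(1024))   # 16
```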
5.6 Read_Write_Ratio
The purpose of this ratio is to ensure that Reads and Writes are sent in a certain ratio. The PCIe block keeps the count on the number of transactions, not on the number of bytes. The idea is to evaluate the impact of this ratio on throughput and latency. Internally, two attributes are maintained: Read_Transactions and Write_Transactions. The counts in these two memories keep incrementing. If the ratio does not match the parameter value, the block balances it out by increasing the output of the other command type.
For example, if the ratio is 50% while Read_Transactions = 100 and Write_Transactions = 85, the PCIe will try to transmit more Writes to achieve the 50% ratio.
A completion for Read is when the Data is received at the Master. A completion for Write is when the Request is sent out of the Slave port.
The Read_Write_Ratio is maintained as a local memory within each PCIe block. This memory can be modified during the simulation. The Read and Write statistics can also be accessed as a local memory.
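The balancing behavior described above can be sketched as follows. This illustrates the stated policy of favoring the lagging command type; the function name is illustrative:

```python
# Compare the achieved Read fraction against the target
# Read_to_Write_Ratio and favor whichever command type is behind.
def next_preferred(read_count: int, write_count: int,
                   read_to_write_ratio: float) -> str:
    total = read_count + write_count
    if read_to_write_ratio == 0 or total == 0:
        return "either"                  # ratio function disabled / no history
    actual = read_count / total
    return "Read" if actual < read_to_write_ratio else "Write"

# Ratio 0.5 with 100 Reads and 85 Writes -> favor Writes (as in the
# example above)
print(next_preferred(100, 85, 0.5))   # Write
```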
5.7 Statistics
o Master Queues- Number of transactions, Buffer usage, latency
o Slave Queues- Number of transactions, Buffer usage, latency
o Switch-Buffer usage, latency, throughput, utilization
o Read Statistics- Consolidated for all the Reads from this Master. Statistics are for the fragment size A: number entered and latency. Read latency is measured from the Root Complex Read Request until the Data is sent from the Master Queue to the Root Complex.
o Write Statistics- Consolidated for all the Write operations to this Slave. Statistics are for the fragment size A: number entered and latency.
Expanding the PCIe
To add more Masters and Slaves, the user needs to add the following into the Open Instance:
For adding a Master, add a multiport on the left side. Inside the PCIe, add an input port to the Transaction_In and an output port to the Transaction_Out. Connect the input port to the Transaction_In and the output port to the Transaction_Out. The names of the Transaction_In ports must start at input13; the names of the Transaction_Out ports must start at output13.
For adding a Slave, add a multiport on the right side. Then add an input port to the Read_Data and an output port to the Write_Data. Connect the input port to the Read_Data and the output port to the Write_Data. The names of the Read_Data ports must start at input13; the names of the Write_Data ports must start at output13.
8. Parameters
Parameter | Example | Explanation
Architecture_Name | "Architecture_1" (String) | Name of the Architecture_Setup block
Bus_Name | "PCIe_1" (String) | Unique name for this Bus. Must differ from all architecture blocks and global model memories.
Number of Lanes | 16 (int) | 1, 2, 4, 16, 32, and 64 are the only possible values; the user is restricted to these. This can be a single value or an array of the Number of Masters.
Slave Buffer | 512 /* Max Bytes @ Slave */ | Number of bytes, irrespective of the number of transactions.
Master Buffer | 512 /* Max Bytes @ Master */ | Number of bytes, irrespective of the number of transactions.
Header_Bytes | 16 | The packet header size. Single integer value.
Number_of_Ports | {12, 12} /* Master, Endpoint Ports */ | {Master, Slave}; used for expansion only.
BER | 1.0E-11 | Range is 0.0 to 1.0. During the check, a random number is generated. If the number is below this BER, a Nack is returned; if above, the transaction is accepted.
Max_Payload_Size | 512 /* Write, Read Data */ | Maximum transaction size, used in fragmentation. The transaction payload size can be variable, meaning the incoming transaction can contain many Max_Payload_Size fragments.
Max_Payload_Req_Size | 128 /* Read Requests */ | Maximum request size, used in fragmenting the Read Request.
Read_to_Write_Ratio | 0.5 /* 0.0 to 1.0 */ | If 0, this function is ignored. The range is 0.0 to 1.0.
Devices_Attached_to_Slaves | {{"DRAM","DRAM_1"},{"DRAM_2"},{"DRAM_3"},{"Dev_4"}} | Array of arrays. Each index position is the list of Slaves that can be accessed via this Slave port. If a single Slave, then a string; else a string array.
Root_Complex_Flow_Control | {false,false,false,false,false,false,false,false,false,false, | Array of Booleans, one per Master device. True is enabled.
Endpoint_Flow_Control | {false,false,false,false,false,false,false,false,false,false, | Array of Booleans, one per Slave device. True is enabled.
Bit_64_Mode | true | 64-bit type PCIe Header size.
PCIe_MBps | PCIe_Gen_1 | A pulldown parameter with four possible values: PCIe_Gen_1, PCIe_Gen_2, PCIe_Gen_3, PCIe_Gen_4.
Timeout | 1E-6 | A user-configured timeout value.
NumOfRetry | 4 | A hardcoded value for the maximum number of retries with the PCIe bus.
Enable_Plots | Enabling this check box displays throughput and latency. | A Boolean flag used globally for setting viewPlot to true or false.
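The accept/reject check described for the BER parameter can be sketched as follows. This is an illustration of the stated behavior, not the block's internal code:

```python
import random

# A uniform random draw below BER counts as a bit error and triggers a
# Nack; a draw at or above BER means the transaction is accepted.
def crc_check_passes(ber: float, rng: random.Random) -> bool:
    return rng.random() >= ber   # draw below BER -> error -> Nack

rng = random.Random(7)              # seeded for reproducibility
print(crc_check_passes(0.0, rng))   # True  (BER of 0.0 never errors)
print(crc_check_passes(1.0, rng))   # False (BER of 1.0 always errors)
```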