Table 1 is used to create the baseline Data Structure
Tables 2 and 3 are used to create an executable Data Structure
Warp states: 0 = Invalid or empty, 1 = Ready, 2 = Execute, 3 = Wait
Counter: outstanding memory instructions (memory count) and shared compute instructions (SCI count) per Warp. The format is an array.
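The state codes and per-Warp counters above could be represented roughly as follows. This is a minimal sketch, not the model's actual data structure: the names `WarpState`, `WarpCounters`, `make_counters` and `outstanding` are hypothetical, and indexing the arrays by Warp number is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class WarpState(IntEnum):
    """State codes as listed in the notes."""
    INVALID = 0  # invalid or empty
    READY = 1
    EXECUTE = 2
    WAIT = 3


@dataclass
class WarpCounters:
    """Hypothetical container: one array entry per Warp (assumed index = Warp number)."""
    memory_count: list  # outstanding memory instructions per Warp
    sci_count: list     # outstanding shared compute instructions per Warp

    def outstanding(self, warp):
        # Total outstanding work for one Warp.
        return self.memory_count[warp] + self.sci_count[warp]


def make_counters(num_warps):
    """Create zeroed counter arrays for num_warps Warps."""
    return WarpCounters([0] * num_warps, [0] * num_warps)
```

For example, after `c = make_counters(4)`, setting `c.memory_count[2] = 3` and `c.sci_count[2] = 1` makes `c.outstanding(2)` return 4.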
The data structures are placed in their respective Warps based on the number in the Warp Table. This is performed in Warp_Group_x. The order is not important.
Each Warp runs a program which is made of multiple traces.
Warps are assigned to the PEs in equal numbers
When a Warp finishes all the instructions in a trace, a round-robin selection starting from the next Warp finds the next “ready” Warp for execution at the PE
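The equal assignment and the round-robin selection can be sketched as below. The function names are hypothetical. The sketch assumes that the Warp count divides evenly by the PE count, that each PE receives a contiguous block of Warp numbers, and that the finishing Warp itself is considered last in the scan; the notes do not specify any of these.

```python
READY = 1  # state code from the Warp states list


def assign_warps(num_warps, num_pes):
    """Give each PE the same number of Warps (contiguous blocks are an assumption)."""
    per_pe = num_warps // num_pes
    return [list(range(pe * per_pe, (pe + 1) * per_pe)) for pe in range(num_pes)]


def next_ready_warp(pe_warps, states, current):
    """Scan the PE's Warps round-robin, starting after `current`.

    Returns the first Warp whose state is READY, or None if no Warp is ready.
    `states` maps Warp number -> state code.
    """
    n = len(pe_warps)
    start = pe_warps.index(current)
    for step in range(1, n + 1):
        candidate = pe_warps[(start + step) % n]
        if states[candidate] == READY:
            return candidate
    return None
```

With Warps 0 and 1 in Wait (3) and Warps 2 and 3 in Ready (1), a PE that just finished Warp 0 would pick Warp 2 next, and after Warp 3 it would wrap around to Warp 2 again.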
If the outstanding count is not equal to zero, the Warp is in the Wait state
Cycles in PE = Maximum number of instructions across all threads * 4 * Number of Threads / Lanes in PE
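As a worked instance of the PE cycle formula, the sketch below takes a list of per-thread instruction counts. The function name `pe_cycles` and its parameters are hypothetical; the factor 4 is copied from the formula as stated, without assuming what it represents.

```python
def pe_cycles(instr_per_thread, lanes_in_pe):
    """Cycles in PE = max instructions across threads * 4 * threads / lanes."""
    max_instr = max(instr_per_thread)    # longest thread sets the instruction count
    num_threads = len(instr_per_thread)  # threads in the Warp
    return max_instr * 4 * num_threads / lanes_in_pe
```

For a Warp of 32 threads where the longest thread has 10 instructions, on a PE with 16 lanes, this gives 10 * 4 * 32 / 16 = 80 cycles.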
All PEs execute in parallel and are independent
All Memory and Shared Compute instructions are sent from the PE to the Hopper. They are sent at the end of the delay at the PE for that Warp
Shared Compute Instruction cycles = Number of SCU instructions for all Threads in the Warp / 4
Memory Instruction delay = determined by sending the request through the L1/L2/Memory sequence
The address, size and type are attached to the execution Data Structure at the Hopper
When all SCI and Memory instructions are done, the Outstanding Count reaches zero and this information is sent via an event to Group_Warp_x
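The completion tracking described above might look like the following sketch. The class `WarpTracker`, its `complete` method and the `notify` callback are hypothetical stand-ins; in particular, the callback only imitates the event sent to Group_Warp_x, and moving the Warp back to Ready when the count reaches zero is an assumption, since the notes only state that a non-zero count means Wait.

```python
WAIT, READY = 3, 1  # state codes from the Warp states list


class WarpTracker:
    """Tracks outstanding memory and SCI instructions for one Warp."""

    def __init__(self, memory_count, sci_count, notify):
        self.counts = {"mem": memory_count, "sci": sci_count}
        self.notify = notify  # hypothetical stand-in for the event to Group_Warp_x
        # A non-zero outstanding count puts the Warp in Wait.
        self.state = WAIT if self.outstanding() else READY

    def outstanding(self):
        return self.counts["mem"] + self.counts["sci"]

    def complete(self, kind):
        """Record one finished instruction; kind is 'mem' or 'sci'."""
        if kind not in self.counts:
            raise ValueError(f"unknown instruction kind: {kind}")
        if self.counts[kind] == 0:
            raise ValueError(f"no outstanding {kind} instruction")
        self.counts[kind] -= 1
        if self.outstanding() == 0:
            self.state = READY  # assumed transition once nothing is outstanding
            self.notify()
```

A Warp created with two memory instructions and one SCI outstanding stays in Wait until the third call to `complete`, at which point the callback fires once.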