Processor Tutorial 


This tutorial walks you through some of the experiments that can be run on an existing Processor demo model.



The following screenshot shows the layout of the demo model (ARM Cortex A77):


model_layout.PNG








What do Behavior Flow and Hardware Architecture mean?


Behavior flow -> Sequence-flow representation of how tasks are executed
HW architecture -> Representation of how the hardware of a device is implemented
Example:

    Beh_Flow_Example.PNG



The task sequence is defined in the behavior flow:
    Task 3 depends on the outputs of Task 1 and Task 2
    The mapping of tasks to CPU cores is also done in the behavior flow:
        Task 1, 3 -> CPU_1
        Task 2    -> CPU_2


With this approach, changes in the behavior flow do not affect the hardware architecture, and vice versa.
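
The example above can be illustrated with a small Python sketch. The names used here (behavior_flow, task_to_core, execution_order) are illustrative assumptions and are not part of the demo model; the sketch only shows that the task order follows from the dependencies, while the cores are referenced by label without describing how they are built.

```python
# Behavior flow: each task lists the tasks whose output it depends on.
behavior_flow = {
    "Task_1": [],
    "Task_2": [],
    "Task_3": ["Task_1", "Task_2"],
}

# Mapping of tasks to cores, taken from the example above.
task_to_core = {"Task_1": "CPU_1", "Task_2": "CPU_2", "Task_3": "CPU_1"}


def execution_order(flow):
    """Return the tasks in an order that respects their dependencies."""
    done, order = set(), []
    while len(order) < len(flow):
        # A task is ready once all of its dependencies have completed.
        ready = [t for t in flow
                 if t not in done and all(d in done for d in flow[t])]
        if not ready:
            raise ValueError("cyclic dependency in behavior flow")
        for task in ready:
            done.add(task)
            order.append(task)
    return order


schedule = [(task, task_to_core[task]) for task in execution_order(behavior_flow)]
print(schedule)
```

Changing an entry in task_to_core (for example, moving Task_2 to CPU_1) leaves the result of execution_order unchanged, which mirrors the independence of the two views.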


Statistics and reports generated from the above demo model (default settings):

stats_1.PNG

The completed line number, along with the latency and MIPS for executing the instructions in the corresponding line, is also printed to the command line. When the end of the file is reached, the total time taken to execute the full trace file is printed as well.
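
As a reminder of how latency and MIPS relate, here is a minimal Python sketch using the conventional definition of MIPS (millions of instructions per second). The exact formula used by the demo model's report is not stated in this tutorial, so treat this as an assumption; the instruction count and latency below are made-up values.

```python
def mips(instruction_count, latency_seconds):
    """Millions of instructions per second for a block of executed instructions."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return instruction_count / latency_seconds / 1e6


# Hypothetical values: 3000 instructions completed in 2 microseconds.
print(round(mips(3000, 2e-6)))
```

With these hypothetical values the result is about 1500 MIPS.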


stats_2.PNG

Stats_Arch_Setup.PNG



Test case 1: Modify/use a new input Trace


Steps to generate executed-instruction traces for the processor (Arm Cortex-A77) from software code:

1. Install aarch64-none-linux-gnu-gcc
2. Compile the C code with the required flags
   
    C code screenshot:

                second_trace.PNG


To compile the code, the following command was used:

       aarch64-none-linux-gnu-gcc -mtune=cortex-a77 -mcpu=cortex-a77 -O3 -static -o data_parallel_bm data_parallel_bm.c -lm

                second_trace_compile.PNG

The binary file is generated in the same folder:

                second_trace_binary.PNG



             NOTE: The following commands were used to compile the Dhrystone code (which is used by the demo model by default):
                            aarch64-none-linux-gnu-gcc -O0 -mtune=cortex-a77 -mcpu=cortex-a77 --static -c -DHZ=60 -O2 -fno-inline dhry_1.c
                            aarch64-none-linux-gnu-gcc -O0 -mtune=cortex-a77 -mcpu=cortex-a77 --static -c -DHZ=60 -O2 -fno-inline dhry_2.c
                            aarch64-none-linux-gnu-gcc -O0 -mtune=cortex-a77 -mcpu=cortex-a77 --static -o dhrystone dhry_1.o dhry_2.o
             Both -O0 and -O2 appear in the compile commands; GCC applies the last -O option given, so dhry_1.c and dhry_2.c are compiled at -O2.

3. Run the binary obtained in the previous step on GEM5
    The following command runs the binary on GEM5 and produces the required execution trace:
       ./build/ARM/gem5.opt --debug-flags=Exec --debug-file=dptd_a77_trace ./configs/example/arm/starter_se.py --cpu minor --cpu-freq 3.0GHz --mem-type DDR4_2400_8x8 ./tests/data_parallel_bm

                second_trace_gem5_run.PNG

4. Once the GEM5 simulation completes, a file named dptd_a77_trace is generated in the m5out folder.
         
                second_trace_gem5_output.PNG

    The generated output file contains the following:

                second_trace_gem5_output_file.PNG

5. Use the Python script trace_parser_updated.py to convert the GEM5 output file into a VisualSim-readable format
    The following command generates the VisualSim-readable file:
       python trace_parser_updated.py ./m5out/dptd_a77_trace ./arm_isa_a77_gem5.txt
   
          second_trace_gem5_output_parse.PNG

    If the Python parser prints "No missing instructions", the conversion succeeded. If any warning is generated, please contact Mirabilis Design (info@mirabilisdesign.com).
   
    The output file generated by the Python parser is written to the folder that contains the parser script:
         
          second_trace_parsed_output.PNG

    This file contains the following:

          second_trace_parsed_output_file.PNG

6. Update the fileOrURL parameter of Trace_Mapper
    [*] Double click on Trace_Mapper
    [*] Click on Browse
    [*] Select the new csv file
   
    second_trace_vs_param.PNG

    Click on Commit and run the demo model.
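
The command-line part of the steps above (compile, GEM5 run, trace conversion) can be collected in a small Python helper when several traces need to be generated. This is a sketch under stated assumptions: the function names build_commands and parser_ok are hypothetical, the commands and flags are copied from this tutorial, and the script only prints the commands instead of running them because the working directory of each step is not specified here. The default outdir assumes the trace is written to the m5out folder mentioned in step 4; adjust it if your setup differs.

```python
import shlex


def build_commands(source="data_parallel_bm.c", binary="data_parallel_bm",
                   trace="dptd_a77_trace", outdir="m5out"):
    """Return the compile, GEM5 and parser commands as argument lists."""
    compile_cmd = ["aarch64-none-linux-gnu-gcc", "-mtune=cortex-a77",
                   "-mcpu=cortex-a77", "-O3", "-static",
                   "-o", binary, source, "-lm"]
    gem5_cmd = ["./build/ARM/gem5.opt", "--debug-flags=Exec",
                f"--debug-file={trace}",
                "./configs/example/arm/starter_se.py",
                "--cpu", "minor", "--cpu-freq", "3.0GHz",
                "--mem-type", "DDR4_2400_8x8", f"./tests/{binary}"]
    parse_cmd = ["python", "trace_parser_updated.py",
                 f"./{outdir}/{trace}", "./arm_isa_a77_gem5.txt"]
    return [compile_cmd, gem5_cmd, parse_cmd]


def parser_ok(parser_stdout):
    """True if the parser output contains the success message from step 5."""
    return "No missing instructions" in parser_stdout


# Print the commands for review; nothing is executed here.
for cmd in build_commands():
    print(shlex.join(cmd))
```

To generate a trace for another program, call build_commands with a different source and binary name, run the printed commands in the appropriate folders, and pass the parser's console output to parser_ok.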


Results:

Trace_2_Results_1.PNG

We can see that the new trace contains only 2239 lines. The latency for executing the complete trace file is 1.17 msec.


Trace_2_Results_2.PNG

The cache and network statistics are shown above.






Test case 2: Modify the Processor configuration

Processor Parameters:
Test_Case_2_Config.PNG

We have reduced the maximum number of instructions that can be fetched and processed per cycle, and also reduced the reorder buffer size. An increase in latency is expected for this configuration.



Results:
Test_Case_2_Results.PNG

The latency for executing this trace file increased to 1.54 msec, as expected.
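
To put the change in perspective, the relative increase can be computed directly. This short Python sketch assumes that the 1.17 msec reported in Test case 1 is the baseline for the same trace file.

```python
baseline_ms = 1.17   # latency from Test case 1 (assumed baseline)
reduced_ms = 1.54    # latency with the reduced processor configuration

# Relative increase in percent.
increase_pct = (reduced_ms - baseline_ms) / baseline_ms * 100
print(f"{increase_pct:.1f}% slower")
```

Under that assumption, the reduced configuration is about 31.6% slower.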


Test_Case_2_Results_2.PNG






Test case 3: Modify the Cache configuration

Cache parameters:

The cache width was reduced from 256 bits to 128 bits:

Test_Case_3_Config.PNG


Test_Case_3_Results_1.PNG


Test_Case_3_Results_2.PNG


A slight change in latency can be observed.
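
One way to reason about the effect of the width change is to count how many transfers are needed to move one cache line. The Python sketch below assumes a 64-byte cache line and that a line is moved in width-sized transfers; both are assumptions for illustration and may not match how the cache block in the demo model is implemented.

```python
def beats_per_line(line_bytes, width_bits):
    """Number of width-sized transfers needed to move one cache line."""
    width_bytes = width_bits // 8
    return -(-line_bytes // width_bytes)  # ceiling division


LINE_BYTES = 64  # assumed cache line size

for width in (256, 128):
    print(width, "bits ->", beats_per_line(LINE_BYTES, width), "transfers per line")
```

Under these assumptions, halving the width doubles the transfers per line from 2 to 4. This affects only accesses that actually move lines, which is consistent with a small overall change.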


Test case 4: Modify the AMBA AXI Bus configuration

Bus parameters:

Test_Case_4_Config.PNG

Test_Case_4_Results_1.PNG

Test_Case_4_Results_2.PNG


An increase in latency can be observed because the clock speed and width are reduced.
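
The combined effect of clock speed and width on a bus can be estimated from its peak bandwidth. The following Python sketch assumes one transfer per clock cycle and uses made-up parameter values, since the actual settings are only shown in the screenshot above.

```python
def peak_bandwidth_gb_per_s(width_bits, clock_mhz):
    """Peak bandwidth in GB/s, assuming one full-width transfer per clock."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 / 1e9


# Hypothetical settings, not the values used in the demo model.
before = peak_bandwidth_gb_per_s(width_bits=128, clock_mhz=1000)
after = peak_bandwidth_gb_per_s(width_bits=64, clock_mhz=500)
print(before, after, before / after)
```

With these hypothetical values, halving both the width and the clock cuts the peak bandwidth to a quarter, so memory-bound parts of the trace take longer to complete.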

Test case 5: Modify the DRAM configuration

DRAM parameters:

Test_Case_5_Config.PNG

Test_Case_5_Config_2.PNG

The Speed, Row, Col, and Bank parameters, as well as the DDR4 timings, were updated.


Results:

Test_Case_5_Results_1.PNG

Test_Case_5_Results_2.PNG

An increase in latency can be observed.

Test case 6: Replace the AXI Bus with a NoC

Test_Case_6_Config.PNG




Results:

Test_Case_6_Results_1.PNG


Test_Case_6_Results_2.PNG


In addition to the cache and processor statistics, the NoC statistics are printed. An increase in latency can be observed.