Processor_Tutorial
This tutorial walks you through some of the experiments that can be run on an existing Processor demo model.
The following screenshot shows the layout of the demo model (ARM Cortex A77):
data:image/s3,"s3://crabby-images/ede0a/ede0aad6ec9ce15c8c22174da6da7a9446703211" alt="model_layout.PNG model_layout.PNG"
What do Behavior Flow and Hardware Architecture mean?
Behavior flow -> sequence-flow representation of how tasks will be executed
HW architecture -> representation of how the hardware of a device is implemented
Example:
data:image/s3,"s3://crabby-images/9629d/9629d3bfc074fa826984bb18d8a34aa07107addf" alt="Beh_Flow_Example.PNG Beh_Flow_Example.PNG"
Task sequence defined from behavior flow
Task 3 dependent on the output of Task 1 and Task 2
Task mapping to CPU cores done from behavior flow
Task 1, 3 -> CPU_1
Task 2 -> CPU_2
With this approach, changes in the behavior flow do not affect the hardware architecture, and vice versa.
Statistics and reports generated from the above demo model (default settings):
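The separation between behavior flow and task mapping can be illustrated with a small Python sketch. This is not how VisualSim represents the model internally; the dictionary layout and the `execution_plan` helper are assumptions made for illustration, using only the task names and CPU assignments from the example above.

```python
from graphlib import TopologicalSorter

# Behavior flow: each task lists the tasks whose output it depends on.
behavior_flow = {
    "Task_1": set(),
    "Task_2": set(),
    "Task_3": {"Task_1", "Task_2"},
}

# Task-to-core mapping, kept separate from the behavior flow.
task_mapping = {"Task_1": "CPU_1", "Task_2": "CPU_2", "Task_3": "CPU_1"}


def execution_plan(flow, mapping):
    """Return (task, core) pairs in an order that respects the dependencies."""
    order = TopologicalSorter(flow).static_order()
    return [(task, mapping[task]) for task in order]


plan = execution_plan(behavior_flow, task_mapping)
for task, core in plan:
    print(f"{task} -> {core}")
```

Moving Task 2 to CPU_1 would only touch `task_mapping`; the dependency structure in `behavior_flow` stays unchanged, which is the point of keeping the two descriptions apart.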
data:image/s3,"s3://crabby-images/deb1c/deb1cca967d81a51d5584a157474b9795c402996" alt="stats_1.PNG stats_1.PNG"
The completed line number, along with the latency and MIPS for executing the
instructions in the corresponding line, is also printed to the
command line. When the end of the file is reached, the total time taken to
execute the full trace file is printed as well.
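For reference, MIPS is conventionally the number of instructions executed divided by the execution time in microseconds. The sketch below uses that standard definition; how the demo model computes the value internally is not documented here, and the `mips` helper and the sample numbers are hypothetical.

```python
def mips(instruction_count: int, latency_seconds: float) -> float:
    """Millions of instructions per second for a block of instructions."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return instruction_count / (latency_seconds * 1e6)


# Hypothetical numbers: 3000 instructions completed in 1 microsecond.
example = mips(3000, 1e-6)
print(f"{example:.0f} MIPS")
```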
data:image/s3,"s3://crabby-images/1a2e9/1a2e9096535ba0763f323d0f46f268e4c2f1efba" alt="stats_2.PNG stats_2.PNG"
data:image/s3,"s3://crabby-images/6e332/6e332fd8028e154dde003c05f532e263d4827108" alt="Stats_Arch_Setup.PNG Stats_Arch_Setup.PNG"
Test case 1: Modify/use a new input Trace
Steps to generate executed processor (ARM Cortex A77) traces for the software code:
1. Install aarch64-none-linux-gnu-gcc
2. Compile the C code by setting the required flags
C code screenshot:
data:image/s3,"s3://crabby-images/1756e/1756e223350afb7b0b1cc6d114c5e01d4deaf652" alt="second_trace.PNG second_trace.PNG"
To compile the code, the following command was used:
aarch64-none-linux-gnu-gcc -mtune=cortex-a77 -mcpu=cortex-a77 -O3 -static -o data_parallel_bm data_parallel_bm.c -lm
data:image/s3,"s3://crabby-images/0d00c/0d00c81dd92037c57fbdef24540b5872bab0bc31" alt="second_trace_compile.PNG second_trace_compile.PNG"
The binary file is generated in the same folder:
data:image/s3,"s3://crabby-images/645f1/645f1a475a1515c3b946d0afbfa8f031dfb44fcd" alt="second_trace_binary.PNG second_trace_binary.PNG"
NOTE: The following commands were used to compile the Dhrystone code (which is used by the demo model by default):
aarch64-none-linux-gnu-gcc -O0 -mtune=cortex-a77 -mcpu=cortex-a77 --static -c -DHZ=60 -O2 -fno-inline dhry_1.c
aarch64-none-linux-gnu-gcc -O0 -mtune=cortex-a77 -mcpu=cortex-a77 --static -c -DHZ=60 -O2 -fno-inline dhry_2.c
aarch64-none-linux-gnu-gcc -O0 -mtune=cortex-a77 -mcpu=cortex-a77 --static -o dhrystone dhry_1.o dhry_2.o
3. Run the binary obtained in the above step on GEM5
The following command runs the binary on GEM5 and produces the required execution traces:
./build/ARM/gem5.opt --debug-flags=Exec --debug-file=dptd_a77_trace ./configs/example/arm/starter_se.py --cpu minor --cpu-freq 3.0GHz --mem-type DDR4_2400_8x8 ./tests/data_parallel_bm
data:image/s3,"s3://crabby-images/11219/1121969cc29edef4a6c88938f5d2e0e76b5e59b2" alt="second_trace_gem5_run.PNG second_trace_gem5_run.PNG"
4. Once the GEM5 simulation completes, a file named dptd_a77_trace is generated in the m5out folder.
data:image/s3,"s3://crabby-images/355d6/355d635692ec328b42e2e4e19d242d0e2c0fd9a2" alt="second_trace_gem5_output.PNG second_trace_gem5_output.PNG"
The generated output file contains the following:
data:image/s3,"s3://crabby-images/b11c5/b11c5f09a07ec51f1c498b1da0fdb5a623a1cb07" alt="second_trace_gem5_output_file.PNG second_trace_gem5_output_file.PNG"
5. Use the Python script (trace_parser_updated.py) to convert the GEM5 output file into a VisualSim-readable format
The following command generates the VisualSim-readable file:
python trace_parser_updated.py ./m5_out/dptd_a77_trace ./arm_isa_a77_gem5.txt
data:image/s3,"s3://crabby-images/f186c/f186c805334afd79d00b82ac324657e88c9e577a" alt="second_trace_gem5_output_parse.PNG second_trace_gem5_output_parse.PNG"
If the Python parser prints "No missing instructions", everything looks
good. If any warning is generated, please contact Mirabilis Design
(info@mirabilisdesign.com).
The output file generated by the python parser can be found in the same folder where the python parser is placed:
data:image/s3,"s3://crabby-images/1ed81/1ed81dc60a16912bf412a09482d17d7741b8303f" alt="second_trace_parsed_output.PNG second_trace_parsed_output.PNG"
This file contains the following:
data:image/s3,"s3://crabby-images/d2a36/d2a36d466f7f4ecdfe70b41cf845e7ce2bf8cf1c" alt="second_trace_parsed_output_file.PNG second_trace_parsed_output_file.PNG"
6. Update the fileOrURL parameter of Trace_Mapper
[*] Double click on Trace_Mapper
[*] Click on Browse
[*] Select the new CSV file
data:image/s3,"s3://crabby-images/2c1a9/2c1a99579d99d1bc5d6e5144f7d8aec842d6524e" alt="second_trace_vs_param.PNG second_trace_vs_param.PNG"
Click on Commit and run the demo model.
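The command-line part of the steps above (compile, simulate, parse) can be collected in one place. The following Python sketch only assembles and prints the three commands with the flags and paths quoted in this tutorial; it does not run them. The `build_commands` helper and its parameter names are hypothetical, and in practice each command is run from its own working directory (source folder, GEM5 root, parser folder), with the trace path adjusted to wherever GEM5 wrote its output.

```python
import shlex


def build_commands(source="data_parallel_bm.c",
                   binary="data_parallel_bm",
                   trace="dptd_a77_trace"):
    """Return the compile, GEM5, and parser commands as argument lists."""
    compile_cmd = [
        "aarch64-none-linux-gnu-gcc", "-mtune=cortex-a77", "-mcpu=cortex-a77",
        "-O3", "-static", "-o", binary, source, "-lm",
    ]
    gem5_cmd = [
        "./build/ARM/gem5.opt", "--debug-flags=Exec", f"--debug-file={trace}",
        "./configs/example/arm/starter_se.py", "--cpu", "minor",
        "--cpu-freq", "3.0GHz", "--mem-type", "DDR4_2400_8x8",
        f"./tests/{binary}",
    ]
    # Trace path copied from the parser command in step 5.
    parse_cmd = [
        "python", "trace_parser_updated.py",
        f"./m5_out/{trace}", "./arm_isa_a77_gem5.txt",
    ]
    return [compile_cmd, gem5_cmd, parse_cmd]


for cmd in build_commands():
    print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later, once the working directories are known.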
Results:
data:image/s3,"s3://crabby-images/4b9e1/4b9e1c4660c605d2f2fba994aa5e507428c7bdbe" alt="Trace_2_Results_1.PNG Trace_2_Results_1.PNG"
We can see that there are only 2239 lines in the new trace. The latency for executing the complete trace file is 1.17 msec.
data:image/s3,"s3://crabby-images/50ba1/50ba1579449e3b90a09aa940dce1aa838707f6ae" alt="Trace_2_Results_2.PNG Trace_2_Results_2.PNG"
The cache and network stats can be seen above.
Test case 2: Modify the Processor configuration
Processor Parameters:
data:image/s3,"s3://crabby-images/f3e2c/f3e2c11d7913f27a5c8628e9cf05c551dd2463b7" alt="Test_Case_2_Config.PNG Test_Case_2_Config.PNG"
We have reduced the maximum number of instructions per cycle that can be
fetched and processed, and also reduced the reorder buffer size. An
increase in latency is expected for this configuration.
Results:
data:image/s3,"s3://crabby-images/94623/94623bd0422eb7309ae60686b12f26040fc5afc0" alt="Test_Case_2_Results.PNG Test_Case_2_Results.PNG"
The latency for executing this trace file increased to 1.54 msec, which matches the expected increase.
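To put the change in perspective, the relative increase can be computed directly. The sketch below assumes that the 1.17 msec result from Test case 1 is the baseline for this run, which the text implies but does not state explicitly; `percent_change` is a hypothetical helper.

```python
def percent_change(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# Latencies in msec, taken from Test case 1 and Test case 2.
increase = percent_change(1.17, 1.54)
print(f"Latency increase: {increase:.1f}%")
```

The same helper can be reused to compare the results of the remaining test cases against the baseline.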
data:image/s3,"s3://crabby-images/7e604/7e6042850f98cdbecc8d215f93194ed0b391b519" alt="Test_Case_2_Results_2.PNG Test_Case_2_Results_2.PNG"
Test case 3: Modify the Cache configuration
Cache parameters:
The cache width was reduced from 256 bits to 128 bits:
data:image/s3,"s3://crabby-images/d2172/d2172b474e00a198208aab31676793005490e670" alt="Test_Case_3_Config.PNG Test_Case_3_Config.PNG"
data:image/s3,"s3://crabby-images/24a9f/24a9f34864aa608e23223274e63d92b072c27525" alt="Test_Case_3_Results_1.PNG Test_Case_3_Results_1.PNG"
data:image/s3,"s3://crabby-images/df22b/df22be341c4cb141a61ccd411ab2094f2f0ffbe7" alt="Test_Case_3_Results_2.PNG Test_Case_3_Results_2.PNG"
A slight change in latency can be observed.
Test case 4: Modify the AMBA AXI Bus configuration
Bus parameters:
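One way to reason about the effect of the cache width is to count how many transfers are needed to move one cache line. The sketch below assumes a 64-byte line, which is the line size of the Cortex-A77 but may not match the value configured in the demo model; `beats_per_line` is a hypothetical helper.

```python
import math


def beats_per_line(line_bytes: int, width_bits: int) -> int:
    """Number of transfers needed to move one cache line over the interface."""
    return math.ceil(line_bytes * 8 / width_bits)


LINE_BYTES = 64  # assumption: 64-byte cache line

for width in (256, 128):
    print(f"{width}-bit width: {beats_per_line(LINE_BYTES, width)} transfers per line")
```

Under this assumption, halving the width doubles the transfers per line from 2 to 4. The effect on total latency stays small when most accesses hit in the cache and few full lines need to be moved.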
data:image/s3,"s3://crabby-images/5e45e/5e45e5dfc6addf87b793493880550b6a0f77663b" alt="Test_Case_4_Config.PNG Test_Case_4_Config.PNG"
data:image/s3,"s3://crabby-images/939ed/939ede5baf1bc0571daa421fd8a862c90d2a8391" alt="Test_Case_4_Results_1.PNG Test_Case_4_Results_1.PNG"
data:image/s3,"s3://crabby-images/7c787/7c787a2954928592af04bc9131e71faa48f2a1ec" alt="Test_Case_4_Results_2.PNG Test_Case_4_Results_2.PNG"
An increase in latency can be observed because the bus clock speed and width were reduced.
Test case 5: Modify the DRAM configuration
DRAM parameters:
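The combined effect of clock speed and width on the bus can be estimated with a first-order peak-bandwidth calculation. The values below are hypothetical and are not read from the demo model's parameter screenshot; the formula also ignores protocol overhead and arbitration.

```python
def peak_bandwidth_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak bus bandwidth in GB/s, assuming one transfer per clock cycle."""
    bytes_per_cycle = width_bits / 8
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


# Hypothetical before/after settings, for illustration only.
before = peak_bandwidth_gbps(width_bits=128, clock_mhz=1000)
after = peak_bandwidth_gbps(width_bits=64, clock_mhz=500)
print(f"before: {before:.1f} GB/s, after: {after:.1f} GB/s")
```

Because width and clock multiply, halving both reduces the peak bandwidth to a quarter, so cache misses that travel over the bus take correspondingly longer to service.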
data:image/s3,"s3://crabby-images/066cd/066cde42ba8fc281dc84c6b0c9fd4dacf808ebdd" alt="Test_Case_5_Config.PNG Test_Case_5_Config.PNG"
data:image/s3,"s3://crabby-images/57124/57124241eb000d3a6ffc2a45f64e09ca0a252c5a" alt="Test_Case_5_Config_2.PNG Test_Case_5_Config_2.PNG"
The Speed, Row, Col, and Bank parameters, as well as the DDR4 timings, were updated.
Results:
data:image/s3,"s3://crabby-images/2f161/2f1615785eb0e063f95878226efef0f34ab94750" alt="Test_Case_5_Results_1.PNG Test_Case_5_Results_1.PNG"
data:image/s3,"s3://crabby-images/b3299/b3299536c06e90af509a94c6f4f2f4ab37185500" alt="Test_Case_5_Results_2.PNG Test_Case_5_Results_2.PNG"
An increase in latency can be observed.
Test case 6: Replace AXI Bus with NoC
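To see why the DRAM speed and timing parameters matter, the first-order access latency can be estimated from the data rate and the CL, tRCD, and tRP timings. The numbers below are example values, not the ones used in the demo model, and the calculation ignores burst transfer time, refresh, and controller queueing.

```python
def access_latency_ns(data_rate_mtps: float, cl: int, trcd: int, trp: int,
                      row_hit: bool) -> float:
    """Approximate DRAM read latency in nanoseconds.

    A row hit only pays the CAS latency; a row conflict must precharge the
    open row (tRP), activate the new row (tRCD), and then read (CL).
    """
    clock_mhz = data_rate_mtps / 2  # DDR transfers data twice per clock
    tck_ns = 1000.0 / clock_mhz
    cycles = cl if row_hit else trp + trcd + cl
    return cycles * tck_ns


# Example values only: 2400 MT/s with CL = tRCD = tRP = 17 cycles.
hit = access_latency_ns(2400, 17, 17, 17, row_hit=True)
conflict = access_latency_ns(2400, 17, 17, 17, row_hit=False)
print(f"row hit: {hit:.2f} ns, row conflict: {conflict:.2f} ns")
```

Lowering the speed stretches every cycle, and larger timing values add cycles, so both changes increase the latency of accesses that reach the DRAM.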
data:image/s3,"s3://crabby-images/99218/992188b3872eb8ac87556efa5549c24508c4fc8d" alt="Test_Case_6_Config.PNG Test_Case_6_Config.PNG"
Results:
data:image/s3,"s3://crabby-images/576bb/576bbfc7f630375b36efdf941dc667648905e687" alt="Test_Case_6_Results_1.PNG Test_Case_6_Results_1.PNG"
data:image/s3,"s3://crabby-images/cc6dc/cc6dc04166a0b30e701e2969b73c9c9f61963712" alt="Test_Case_6_Results_2.PNG Test_Case_6_Results_2.PNG"
The NoC stats are printed in addition to the cache and processor stats. An increase in latency can be observed.