Cycle-Accurate Cache

Models a cycle-accurate cache hierarchy with a 4-way set-associative L1 cache and the snooping flag disabled at every cache level.

faux_Nehalem_14

Browsable image of the model.

faux_Nehalem_14model <h2>TextDisplay</h2><table border="1"><tr><td><b>Parameter</b></td><td><b>Expression</b></td><td><b>Value</b></td></tr><tr><td>Block_Documentation</td><td>Enter User Documentation Here</td><td>Enter User Documentation Here</td></tr><tr><td>rowsDisplayed</td><td>10</td><td>10</td></tr><tr><td>columnsDisplayed</td><td>40</td><td>40</td></tr><tr><td>suppressBlankLines</td><td>false</td><td>false</td></tr><tr><td>title</td><td>&quot;Mem_Stats&quot;</td><td>&quot;Mem_Stats&quot;</td></tr><tr><td>ViewText</td><td>true</td><td>true</td></tr><tr><td>saveText</td><td>false</td><td>false</td></tr><tr><td>fileName</td><td>Enter Filename to save text</td><td>&quot;Enter Filename to save text&quot;</td></tr><tr><td>Append_Time</td><td>true</td><td>true</td></tr></table> <h2>Processor_01</h2><table border="1"><tr><td><b>Parameter</b></td><td><b>Expression</b></td><td><b>Value</b></td></tr><tr><td>Block_Documentation</td><td>Enter User Documentation Here</td><td>Enter User Documentation Here</td></tr><tr><td>Sim_Time</td><td>Sim_Time</td><td>169.0</td></tr><tr><td>Data_Structure</td><td>Data_Structure</td><td>&quot;Processor_DS&quot;</td></tr><tr><td>Destination_Name</td><td>&quot;SDRAM&quot;</td><td>&quot;SDRAM&quot;</td></tr><tr><td>Data_Size</td><td>16</td><td>16</td></tr><tr><td>Priority</td><td>1</td><td>1</td></tr><tr><td>Architecture_Name</td><td>Architecture_Name</td><td>&quot;Architecture_1&quot;</td></tr><tr><td>Processor_ID</td><td>1</td><td>1</td></tr><tr><td>L2_cache_KB</td><td>256</td><td>256</td></tr><tr><td>L1_cache_KB</td><td>32</td><td>32</td></tr><tr><td>L2_cache_associativity</td><td>8  /* 0 (Full Associaitive,1(Direct),2,4,8,16,32 */</td><td>8</td></tr><tr><td>L1_cache_associativity</td><td>4  /* 0 (Full Associaitive,1(Direct),2,4,8,16,32 */</td><td>4</td></tr><tr><td>L1_cache_replacement_policy</td><td>&quot;Least_Recently_Used&quot;  /* Least_Recently_Used, Most_Recently_Used 
*/</td><td>&quot;Least_Recently_Used&quot;</td></tr><tr><td>L2_cache_replacement_policy</td><td>&quot;Least_Recently_Used&quot;  /* Least_Recently_Used, Most_Recently_Used */</td><td>&quot;Least_Recently_Used&quot;</td></tr><tr><td>L1_cache_line_words</td><td>16</td><td>16</td></tr><tr><td>L2_cache_line_words</td><td>16</td><td>16</td></tr><tr><td>L1_cache_write_policy</td><td>&quot;Write_Back&quot;  /* Write_Back, Write_Through */</td><td>&quot;Write_Back&quot;</td></tr><tr><td>L2_cache_write_policy</td><td>&quot;Write_Back&quot;  /* Write_Back, Write_Through */</td><td>&quot;Write_Back&quot;</td></tr><tr><td>L1_cache_prefetch_lines</td><td>1  /* 0,1,2,3... */</td><td>1</td></tr><tr><td>L2_cache_prefetch_lines</td><td>1  /* 0,1,2,3... */</td><td>1</td></tr><tr><td>L1_cache_bus_width_bytes</td><td>16</td><td>16</td></tr><tr><td>L2_cache_bus_width_bytes</td><td>16</td><td>16</td></tr><tr><td>L1_cache_speed_MHz</td><td>2660.0</td><td>2660.0</td></tr><tr><td>L2_cache_speed_MHz</td><td>2660.0</td><td>2660.0</td></tr><tr><td>L1_cache_DRAM_name</td><td>&quot;SDRAM&quot;</td><td>&quot;SDRAM&quot;</td></tr><tr><td>L1_cache_first_word_flag</td><td>false</td><td>false</td></tr><tr><td>L1_cache_snooping_flag</td><td>false</td><td>false</td></tr><tr><td>L1_cache_bytes_per_word</td><td>16</td><td>16</td></tr><tr><td>L2_cache_DRAM_name</td><td>&quot;SDRAM&quot;</td><td>&quot;SDRAM&quot;</td></tr><tr><td>L2_cache_first_word_flag</td><td>false</td><td>false</td></tr><tr><td>L2_cache_snooping_flag</td><td>false</td><td>false</td></tr><tr><td>L2_cache_bytes_per_word</td><td>16</td><td>16</td></tr><tr><td>Input_File</td><td>profile</td><td>&quot;cache_log.hello_profile.txt&quot;</td></tr><tr><td>Recycle_Input</td><td>false</td><td>false</td></tr><tr><td>L1_cache_overhead_cycles</td><td>4</td><td>4</td></tr><tr><td>L2_cache_overhead_cycles</td><td>15</td><td>15</td></tr><tr><td>view_plot</td><td>view_plot</td><td>true</td></tr><tr><td>save_plot</td><td>save_plot</td><td>false</td></tr>
<tr><td>Output_File</td><td>Model_Name + &quot;_&quot; + Processor_ID + &quot;.plt&quot;</td><td>&quot;Faux_Nehalem_x1_1.plt&quot;</td></tr></table> <h2>L3_Cache</h2><table border="1"><tr><td><b>Parameter</b></td><td><b>Expression</b></td><td><b>Value</b></td></tr><tr><td>Architecture_Name</td><td>&quot;Architecture_1&quot;</td><td>&quot;Architecture_1&quot;</td></tr><tr><td>Cache_Name</td><td>&quot;L3_Cache&quot;</td><td>&quot;L3_Cache&quot;</td></tr><tr><td>Cache_Size_KB</td><td>8192</td><td>8192</td></tr><tr><td>Cache_Speed_Mhz</td><td>2660.0</td><td>2660.0</td></tr><tr><td>Cache_Bytes_per_Word</td><td>16</td><td>16</td></tr><tr><td>Bus_Width_Bytes</td><td>16</td><td>16</td></tr><tr><td>Cache_Line_Words</td><td>16</td><td>16</td></tr><tr><td>Cache_N_Associativity</td><td>16  /* 0 (Full Associaitive,1(Direct),2,4,8,16,32 */</td><td>16</td></tr><tr><td>Cache_Replacement_Policy</td><td>&quot;Least_Recently_Used&quot;  /* Least_Recently_Used, Most_Recently_Used */</td><td>&quot;Least_Recently_Used&quot;</td></tr><tr><td>Cache_Write_Policy</td><td>&quot;Write_Back&quot;  /* Write_Back, Write_Through */</td><td>&quot;Write_Back&quot;</td></tr><tr><td>Cache_Prefetch_Lines</td><td>1  /* 0,1,2,3... 
*/</td><td>1</td></tr><tr><td>Overhead_Cycles</td><td>40</td><td>40</td></tr><tr><td>Next_Higher_Memory_Name</td><td>&quot;SDRAM&quot;</td><td>&quot;SDRAM&quot;</td></tr><tr><td>DRAM_Name</td><td>&quot;SDRAM&quot;</td><td>&quot;SDRAM&quot;</td></tr><tr><td>First_Word_Flag</td><td>false</td><td>false</td></tr><tr><td>Snooping_Flag</td><td>false</td><td>false</td></tr><tr><td>Read_File</td><td>&quot;none&quot;</td><td>&quot;none&quot;</td></tr><tr><td>Sim_Time</td><td>Sim_Time</td><td>169.0</td></tr><tr><td>_explanation</td><td>ProcessorGenerator-&gt;CycleAccurateCache</td><td>ProcessorGenerator-&gt;CycleAccurateCache</td></tr><tr><td>Number_Statistics_Samples</td><td>1</td><td>1</td></tr><tr><td>DEBUG</td><td>false</td><td>false</td></tr></table> <h2>Switch</h2><table border="1"><tr><td><b>Parameter</b></td><td><b>Expression</b></td><td><b>Value</b></td></tr><tr><td>Architecture_Name</td><td>Architecture_Name</td><td>&quot;Architecture_1&quot;</td></tr><tr><td>Switch_Name</td><td>&quot;Switch&quot;</td><td>&quot;Switch&quot;</td></tr><tr><td>Speed_Mhz</td><td>2660.0</td><td>2660.0</td></tr><tr><td>Width_Bytes</td><td>16</td><td>16</td></tr><tr><td>Blocking_Mode</td><td>false</td><td>false</td></tr><tr><td>_explanation</td><td>Hardware_Modeling-&gt;Bus_Switch_Ctrl-&gt;Switch</td><td>Hardware_Modeling-&gt;Bus_Switch_Ctrl-&gt;Switch</td></tr><tr><td>Overhead_Cycles</td><td>1</td><td>1</td></tr><tr><td>Address_Bits</td><td>32</td><td>32</td></tr><tr><td>Sim_Time</td><td>1.0</td><td>1.0</td></tr></table> <h2>HW_DRAM</h2><table 
border="1"><tr><td><b>Parameter</b></td><td><b>Expression</b></td><td><b>Value</b></td></tr><tr><td>Architecture_Name</td><td>&quot;Architecture_1&quot;</td><td>&quot;Architecture_1&quot;</td></tr><tr><td>HW_DRAM_Name</td><td>&quot;SDRAM&quot;</td><td>&quot;SDRAM&quot;</td></tr><tr><td>HW_DRAM_Speed_Mhz</td><td>Controller_Speed_Mhz</td><td>667.0</td></tr><tr><td>Number_of_Banks</td><td>8</td><td>8</td></tr><tr><td>Sim_Time</td><td>Sim_Time</td><td>169.0</td></tr><tr><td>_explanation</td><td>Hardware_Modeling-&gt;Memory-&gt;HW_DRAM</td><td>Hardware_Modeling-&gt;Memory-&gt;HW_DRAM</td></tr><tr><td>Memory_Width_Bytes</td><td>Memory_Width_Bytes</td><td>32</td></tr><tr><td>Burst_Length</td><td>8 /* 2, 4, 8 */</td><td>8</td></tr><tr><td>DRAM_Type</td><td>&quot;DDR3&quot; /* SDR, DDR, DDR2, LPDDR, LPDDR2_NV, LPDDR2_S2, LPDDR2_S4, DDR3 */</td><td>&quot;DDR3&quot;</td></tr><tr><td>Mfg_Suggest_Timing</td><td>{6,10,10,24} /* tCL, tRCD, tRP, tRAS */</td><td>{6, 10, 10, 24}</td></tr><tr><td>Extra_Timing</td><td>{1,4,4,10,5,5,1,1,0} /* DQSS, tWTR, tRRD,tWR, tRL, tWL, tDQSCK */</td><td>{1, 4, 4, 10, 5, 5, 1, 1, 0}</td></tr><tr><td>Fix_DQSS</td><td>true</td><td>true</td></tr><tr><td>Refresh_Rate_per_Bank_ms</td><td>64.0 /* 64.0 ms */</td><td>64.0</td></tr><tr><td>Refresh_Cycles_per_Bank</td><td>8192 /* 256 cycles per bank */</td><td>8192</td></tr><tr><td>Enable_External_Data</td><td>false</td><td>false</td></tr><tr><td>Address_Bit_Map</td><td>{{0,9},{10,24},{25,27}}  /* col, row, bank (min, max) Bit Position */</td><td>{{0, 9}, {10, 24}, {25, 27}}</td></tr><tr><td>Standard_Name</td><td>&quot;none&quot; /*reads DDR_Memory_Standards.txt */</td><td>&quot;none&quot;</td></tr><tr><td>Standard_File</td><td>VS/VisualSim/actor/arch/Memory/DDR_Memory_Standards.txt</td><td>&quot;VS/VisualSim/actor/arch/Memory/DDR_Memory_Standards.txt&quot;</td></tr><tr><td>Power_Manager_Name</td><td>&quot;none&quot;  /* Default 
*/</td><td>&quot;none&quot;</td></tr><tr><td>Memory_Controller</td><td>&quot;none&quot;  /* Default */</td><td>&quot;none&quot;</td></tr><tr><td>Bank_at_a_Time</td><td>true  /* false=all */</td><td>true</td></tr><tr><td>DEBUG</td><td>false</td><td>false</td></tr><tr><td>State_Plot_Enable</td><td>false</td><td>false</td></tr><tr><td>Bus_Width_Bytes</td><td>Bus_Width_Bytes</td><td>16</td></tr><tr><td>uDRAM_File</td><td>&quot;none&quot;</td><td>&quot;none&quot;</td></tr><tr><td>uDRAM_Path</td><td>&quot;none&quot;</td><td>&quot;none&quot;</td></tr></table> <h2>Memory_Controller</h2><table border="1"><tr><td><b>Parameter</b></td><td><b>Expression</b></td><td><b>Value</b></td></tr><tr><td>Architecture_Name</td><td>&quot;Architecture_1&quot;</td><td>&quot;Architecture_1&quot;</td></tr><tr><td>Controller_Name</td><td>&quot;LPDDR&quot;</td><td>&quot;LPDDR&quot;</td></tr><tr><td>DRAM_Type</td><td>&quot;DDR3&quot; /* SDR, DDR, DDR2, LPDDR, LPDDR2_NV, LPDDR2_S2, LPDDR2_S4, DDR3 */</td><td>&quot;DDR3&quot;</td></tr><tr><td>Controller_Speed_Mhz</td><td>Controller_Speed_Mhz</td><td>667.0</td></tr><tr><td>Memory_Width_Bytes</td><td>Memory_Width_Bytes</td><td>32</td></tr><tr><td>Bus_Width_Bytes</td><td>Bus_Width_Bytes</td><td>16</td></tr><tr><td>Command_Buffer_Length</td><td>Command_Buffer_Length</td><td>32</td></tr><tr><td>Commands_in_a_Row</td><td>16</td><td>16</td></tr><tr><td>Mfg_Suggest_Timing</td><td>{6,10,10,24} /* tCL, tRCD, tRP, tRAS */</td><td>{6, 10, 10, 24}</td></tr><tr><td>Extra_Timing</td><td>{1,4,4,10,5,5,1,1,0} /* DQSS, tWTR, tRRD,tWR, tRL, tWL, tDQSCK */</td><td>{1, 4, 4, 10, 5, 5, 1, 1, 0}</td></tr><tr><td>Burst_Length</td><td>8 /* 2, 4, 8 */</td><td>8</td></tr><tr><td>Memory_Column</td><td>{0,9} </td><td>{0, 9}</td></tr><tr><td>Memory_Row</td><td>{13,27} </td><td>{13, 27}</td></tr><tr><td>Memory_Bank</td><td>{10,12}</td><td>{10, 12}</td></tr><tr><td>Memory_Bank_Length</td><td>round(pow(2,(Memory_Bank(1) - Memory_Bank(0) + 
1)))</td><td>8L</td></tr><tr><td>DRAM_Return_Cycles</td><td>0</td><td>0</td></tr><tr><td>First_Word_Flag</td><td>false</td><td>false</td></tr><tr><td>Sim_Time</td><td>Sim_Time</td><td>169.0</td></tr><tr><td>Custom_Arbiter_File</td><td>&quot;none&quot;</td><td>&quot;none&quot;</td></tr><tr><td>Custom_Arbiter_Path</td><td>&quot;none&quot;</td><td>&quot;none&quot;</td></tr><tr><td>DEBUG</td><td>false</td><td>false</td></tr><tr><td>_explanation</td><td>Hardware_Modeling-&gt;Memory-&gt;Memory_Controller</td><td>Hardware_Modeling-&gt;Memory-&gt;Memory_Controller</td></tr><tr><td>HW_DRAM_Name</td><td>&quot;DDR0&quot;</td><td>&quot;DDR0&quot;</td></tr><tr><td>Power_Manager_Name</td><td>&quot;none&quot;  /* Default */</td><td>&quot;none&quot;</td></tr></table> <h2>ArchitectureSetup</h2><table border="1"><tr><td><b>Parameter</b></td><td><b>Expression</b></td><td><b>Value</b></td></tr><tr><td>Block_Documentation</td><td>Enter User Documentation Here</td><td>Enter User Documentation Here</td></tr><tr><td>Architecture_Name</td><td>&quot;Architecture_1&quot;</td><td>&quot;Architecture_1&quot;</td></tr><tr><td>Field_Name_Mapping</td><td>/* First row contains Column Names.                */\\nExternal_Field_Name          Internal_Field_Name   ; \\nA_Address                    A_Address             ; \\nA_Bytes                      A_Bytes               ; \\nA_Data                       A_Data                ; \\nA_IDX                        A_IDX                 ; \\nA_Instruction                A_Instruction         ; \\nA_Priority                   A_Priority            ; \\nA_Source                     A_Source              ; \\nA_Destination                A_Destination         ; \\nA_Task_ID                    A_Task_ID             ; \\nA_Time                       A_Time                ; \\n</td><td>/* First row contains Column Names.                
*/\\nExternal_Field_Name          Internal_Field_Name   ; \\nA_Address                    A_Address             ; \\nA_Bytes                      A_Bytes               ; \\nA_Data                       A_Data                ; \\nA_IDX                        A_IDX                 ; \\nA_Instruction                A_Instruction         ; \\nA_Priority                   A_Priority            ; \\nA_Source                     A_Source              ; \\nA_Destination                A_Destination         ; \\nA_Task_ID                    A_Task_ID             ; \\nA_Time                       A_Time                ; \\n</td></tr><tr><td>Routing_Table</td><td>Source_Node Destination_Node   Hop Source_Port ;\\nSDRAM        MM_i_1            L2_Cache_1    output      ;\\nSDRAM        MM_d_1            L2_Cache_1    output      ;\\nSwitch       MM_i_1            L2_Cache_1    output      ;\\nSwitch       MM_d_1            L2_Cache_1    output      ;\\nSDRAM        MM_i_2            L2_Cache_2    output4     ;\\nSDRAM        MM_d_2            L2_Cache_2    output4     ;\\nSwitch       MM_i_2            L2_Cache_2    output4     ;\\nSwitch       MM_d_2            L2_Cache_2    output4     ;\\n</td><td>Source_Node Destination_Node   Hop Source_Port ;\\nSDRAM        MM_i_1            L2_Cache_1    output      ;\\nSDRAM        MM_d_1            L2_Cache_1    output      ;\\nSwitch       MM_i_1            L2_Cache_1    output      ;\\nSwitch       MM_d_1            L2_Cache_1    output      ;\\nSDRAM        MM_i_2            L2_Cache_2    output4     ;\\nSDRAM        MM_d_2            L2_Cache_2    output4     ;\\nSwitch       MM_i_2            L2_Cache_2    output4     ;\\nSwitch       MM_d_2            L2_Cache_2    output4     ;\\n</td></tr><tr><td>Number_of_Samples</td><td>2</td><td>2</td></tr><tr><td>Statistics_to_Plot</td><td>&quot;Processor_1_PROC_Utilization_Min, Processor_1_PROC_Utilization_Mean, 
Processor_1_PROC_Utilization_Max&quot;</td><td>&quot;Processor_1_PROC_Utilization_Min, Processor_1_PROC_Utilization_Mean, Processor_1_PROC_Utilization_Max&quot;</td></tr><tr><td>Internal_Plot_Trace_Offset</td><td>2</td><td>2</td></tr><tr><td>Listen_to_Architecture_Options</td><td>None</td><td>None</td></tr></table>
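The parameter values in the tables above imply a cache geometry and a DRAM address layout that can be checked by hand. The sketch below is not VisualSim code. It assumes that a cache line holds `line_words * bytes_per_word` bytes (one reading of the `*_cache_line_words` and `*_cache_bytes_per_word` parameters) and that the `Memory_Column`, `Memory_Bank`, and `Memory_Row` pairs are inclusive bit positions. The helper names `cache_geometry`, `bit_field`, and `decode_address` are hypothetical.

```python
# Sketch only: textbook arithmetic applied to the parameter values listed in
# the tables. None of these helpers are part of VisualSim.

def cache_geometry(size_kb, line_words, bytes_per_word, ways):
    """Return (line_bytes, total_lines, sets) for a set-associative cache."""
    # Assumption: a line holds line_words words of bytes_per_word bytes each.
    line_bytes = line_words * bytes_per_word
    total_lines = (size_kb * 1024) // line_bytes
    sets = total_lines // ways
    return line_bytes, total_lines, sets


def bit_field(address, lo, hi):
    """Extract bits lo..hi (inclusive) from an integer address."""
    width = hi - lo + 1
    return (address >> lo) & ((1 << width) - 1)


# Bit ranges copied from the Memory_Controller table (assumed inclusive).
MEMORY_COLUMN = (0, 9)
MEMORY_BANK = (10, 12)
MEMORY_ROW = (13, 27)


def decode_address(address):
    """Split an address into column, bank, and row fields."""
    return {
        "column": bit_field(address, *MEMORY_COLUMN),
        "bank": bit_field(address, *MEMORY_BANK),
        "row": bit_field(address, *MEMORY_ROW),
    }


# Same arithmetic as the Memory_Bank_Length expression:
# round(pow(2, Memory_Bank(1) - Memory_Bank(0) + 1))
bank_count = 2 ** (MEMORY_BANK[1] - MEMORY_BANK[0] + 1)

l1 = cache_geometry(size_kb=32, line_words=16, bytes_per_word=16, ways=4)
l2 = cache_geometry(size_kb=256, line_words=16, bytes_per_word=16, ways=8)
l3 = cache_geometry(size_kb=8192, line_words=16, bytes_per_word=16, ways=16)

if __name__ == "__main__":
    print("L1 (line bytes, lines, sets):", l1)
    print("L2 (line bytes, lines, sets):", l2)
    print("L3 (line bytes, lines, sets):", l3)
    print("banks:", bank_count)
    print("0x2C05 ->", decode_address(0x2C05))
```

Under these assumptions the bank count comes out to 8, which matches both the `8L` value shown for `Memory_Bank_Length` and the `Number_of_Banks` setting of the HW_DRAM block. The HW_DRAM `Address_Bit_Map` lists different bit ranges for row and bank, so the decoder here follows only the Memory_Controller settings.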

"A model of the Nehalem processor is presented with the goal of analyzing the performance of legacy software applications in a multi-core environment. Multi-core architectures continue to expand, with 4 and 8 core systems readily available to the market, and with 16 and 24 core systems already starting to appear. A major concern is to know how software applications will scale and/or adjust to the increasing availability of multi-core processing systems. Previous research indicates that as the number of cores increases, a legacy application will actually realize a decrease in performance due to resource contention at L3 cache levels as well as main memory. Therefore, a model of the Nehalem processor was developed using the VisualSim tool in order to capture current performance and then predict behavior as the number of cores increases. This will allow software developers to be ready for new multi-core processing systems before they actually are available to the market. It will also allow for the definition of new software development paradigms applied to multi-core systems.

To address this problem, a model of the Nehalem processor [1] system was developed using a tool called VisualSim Architect from Mirabilis Design Inc. [3]. Nehalem is the codename for Intel’s micro-architecture for multi-core processing systems. The main focus of the model was to describe processing behavior at all levels of cache and main memory systems. The input to the model would be empirical instruction and data operations of a software application, from which the model would describe the performance. This would allow for the identification of bottlenecks. Parameters such as cache size, replacement policies, and bus width could then be adjusted to determine if performance can be improved. It would also indicate where an application was experiencing a bottleneck. The software could then be reviewed to determine if performance improvements could be realized. Because the model could be modified to implement more processing cores, this would allow for performance analysis and improvement before a specific multi-core system had been released to the market."
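The abstract's idea of adjusting cache parameters and observing the effect can be illustrated with a toy example. The sketch below is a functional hit/miss counter for a set-associative cache with least-recently-used replacement. It is not cycle-accurate and is not the VisualSim model. The class name `SetAssociativeCache`, the function `miss_rate`, the cache sizes, and the address trace are all hypothetical, chosen so that three addresses collide in the same set.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (hit/miss counts only)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set: keys are tags, ordered oldest to newest.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access and return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used tag
        lines[tag] = True
        self.misses += 1
        return False


def miss_rate(trace, size_bytes, line_bytes, ways):
    """Replay an address trace through a fresh cache and return its miss rate."""
    cache = SetAssociativeCache(size_bytes, line_bytes, ways)
    for address in trace:
        cache.access(address)
    return cache.misses / len(trace)


# Hypothetical trace: three addresses that map to the same set, repeated.
trace = [0, 1024, 2048] * 4

if __name__ == "__main__":
    for ways in (1, 2, 4):
        rate = miss_rate(trace, size_bytes=1024, line_bytes=64, ways=ways)
        print(f"{ways}-way miss rate: {rate:.2f}")
```

With this trace the 1-way and 2-way configurations miss on every access, because three conflicting lines cycle through a set that holds at most two, while the 4-way configuration misses only on the first touch of each line. A real study would replace the trace with recorded instruction and data operations, as the abstract describes.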