Add likwid collector

2026-02-05 02:41:44 +01:00 · 2021-03-25 14:47:10 +01:00
parent 4fddcb9741
commit a6ac0c5373
670 changed files with 24926 additions and 0 deletions
--- a/collectors/likwid/groups/pentiumm/BRANCH.txt
+++ b/collectors/likwid/groups/pentiumm/BRANCH.txt
@@ -0,0 +1,17 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+PMC0  BR_INST_EXEC
+PMC1  BR_MISSP_EXEC
+
+METRICS
+Runtime (RDTSC) [s] time
+Branch misprediction ratio  PMC1/PMC0
+
+LONG
+Formulas:
+Branch misprediction ratio = BR_MISSP_EXEC / BR_INST_EXEC
+-
+The branch misprediction ratio directly relates the two measured events: it
+states what fraction of all executed branch instructions (BR_INST_EXEC) were
+mispredicted (BR_MISSP_EXEC).
--- a/collectors/likwid/groups/pentiumm/CPI.txt
+++ b/collectors/likwid/groups/pentiumm/CPI.txt
@@ -0,0 +1,22 @@
+SHORT  Cycles per instruction
+
+EVENTSET
+PMC0  UOPS_RETIRED
+PMC1  CPU_CLK_UNHALTED
+
+METRICS
+Runtime (RDTSC) [s] time
+CPI   PMC1/PMC0
+IPC   PMC0/PMC1
+
+LONG
+Formulas:
+CPI  = CPU_CLK_UNHALTED/UOPS_RETIRED
+IPC  = UOPS_RETIRED/CPU_CLK_UNHALTED
+-
+This group measures how efficiently the processor works with
+regard to instruction throughput. Also important as a standalone
+metric is UOPS_RETIRED, as it tells you how many uops
+you need to execute for a task. An optimization might show very
+low CPI values but execute many more instructions for it.
+
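Note on CPI.txt: both metrics are plain ratios of the two counters, and the group divides by UOPS_RETIRED, so the values are effectively per uop. The following is a minimal Python sketch of that arithmetic, assuming the raw counter values have already been read. The function name and the sample counts are hypothetical and not part of LIKWID.

```python
from typing import Tuple


def cpi_ipc(uops_retired: int, clk_unhalted: int) -> Tuple[float, float]:
    """Return (CPI, IPC) as defined in the METRICS section of CPI.txt."""
    if uops_retired <= 0 or clk_unhalted <= 0:
        raise ValueError("counter values must be positive")
    cpi = clk_unhalted / uops_retired  # PMC1/PMC0
    ipc = uops_retired / clk_unhalted  # PMC0/PMC1
    return cpi, ipc


# Hypothetical counts: the "optimized" run has the lower CPI but retires
# three times as many uops and needs more cycles overall.
baseline = cpi_ipc(uops_retired=1_000_000_000, clk_unhalted=2_000_000_000)
optimized = cpi_ipc(uops_retired=3_000_000_000, clk_unhalted=3_000_000_000)
print("baseline  CPI/IPC:", baseline)
print("optimized CPI/IPC:", optimized)
```

The second run illustrates the caveat in the LONG text: a lower CPI alone does not mean less work or a shorter runtime, so UOPS_RETIRED should be read alongside it.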
--- a/collectors/likwid/groups/pentiumm/FLOPS_DP.txt
+++ b/collectors/likwid/groups/pentiumm/FLOPS_DP.txt
@@ -0,0 +1,20 @@
+SHORT Double Precision MFLOP/s
+
+EVENTSET
+PMC0 EMON_SSE_SSE2_COMP_INST_RETIRED_PACKED_DP
+PMC1 EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_DP
+
+METRICS
+Runtime (RDTSC) [s] time
+DP [MFLOP/s]  1.0E-06*(PMC0*2.0+PMC1)/time
+Packed [MUOPS/s]   1.0E-06*(PMC0)/time
+Scalar [MUOPS/s] 1.0E-06*PMC1/time
+
+LONG
+Formulas:
+DP [MFLOP/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_PACKED_DP*2.0+EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_DP)/time
+Packed [MUOPS/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_PACKED_DP)/time
+Scalar [MUOPS/s] = 1.0E-06*EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_DP/time
+-
+SSE scalar and packed double precision FLOP rates.
+
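Note on FLOPS_DP.txt: the DP metric weights each packed SSE2 instruction as two floating-point operations and each scalar instruction as one, then scales to MFLOP/s. The following is a minimal Python sketch of that formula, assuming raw counter values and a runtime in seconds are already available. The function name and the sample counts are hypothetical and not part of LIKWID.

```python
def dp_mflops(packed_dp: int, scalar_dp: int, runtime_s: float) -> float:
    """DP [MFLOP/s] as defined in the METRICS section of FLOPS_DP.txt."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    # PMC0 counts packed instructions (weighted 2.0 in the group formula),
    # PMC1 counts scalar instructions (weighted 1.0).
    return 1.0e-06 * (packed_dp * 2.0 + scalar_dp) / runtime_s


# Hypothetical counter readings for a 2-second measurement.
print(dp_mflops(packed_dp=3_000_000, scalar_dp=1_000_000, runtime_s=2.0))
```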
--- a/collectors/likwid/groups/pentiumm/FLOPS_SP.txt
+++ b/collectors/likwid/groups/pentiumm/FLOPS_SP.txt
@@ -0,0 +1,18 @@
+SHORT Single Precision MFLOP/s
+
+EVENTSET
+PMC0 EMON_SSE_SSE2_COMP_INST_RETIRED_ALL_SP
+PMC1 EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_SP
+
+METRICS
+Runtime (RDTSC) [s] time
+SP [MFLOP/s]  1.0E-06*(PMC0)/time
+Scalar [MUOPS/s] 1.0E-06*(PMC1)/time
+
+LONG
+Formulas:
+SP [MFLOP/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_ALL_SP)/time
+Scalar [MUOPS/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_SP)/time
+-
+SSE scalar and packed single precision FLOP rates.
+
--- a/collectors/likwid/groups/pentiumm/L3.txt
+++ b/collectors/likwid/groups/pentiumm/L3.txt
@@ -0,0 +1,30 @@
+SHORT L3 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0  L2_LINES_IN_ALL_ALL
+PMC1  L2_LINES_OUT_ALL_ALL
+
+METRICS
+Runtime (RDTSC) [s] time
+L3 load bandwidth [MBytes/s]  1.0E-06*PMC0*64.0/time
+L3 load data volume [GBytes]  1.0E-09*PMC0*64.0
+L3 evict bandwidth [MBytes/s]  1.0E-06*PMC1*64.0/time
+L3 evict data volume [GBytes]  1.0E-09*PMC1*64.0
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
+
+LONG
+Formulas:
+L3 load bandwidth [MBytes/s] = 1.0E-06*L2_LINES_IN_ALL_ALL*64.0/time
+L3 load data volume [GBytes] = 1.0E-09*L2_LINES_IN_ALL_ALL*64.0
+L3 evict bandwidth [MBytes/s] = 1.0E-06*L2_LINES_OUT_ALL_ALL*64.0/time
+L3 evict data volume [GBytes] = 1.0E-09*L2_LINES_OUT_ALL_ALL*64.0
+L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ALL_ALL+L2_LINES_OUT_ALL_ALL)*64.0/time
+L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ALL_ALL+L2_LINES_OUT_ALL_ALL)*64.0
+-
+Profiling group to measure L3 cache bandwidth. The bandwidth is computed from
+the number of cache lines allocated in the L2 and the number of modified cache
+lines evicted from the L2. The group also outputs the total data volume
+transferred into and out of the L2. Note that this bandwidth also includes data
+transfers due to a write-allocate load on a store miss in L2.
+
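Note on L3.txt: every metric in this group is a cache-line count multiplied by 64 bytes, scaled to MBytes/s or GBytes. The following is a minimal Python sketch of the six formulas, assuming the two counter values and the runtime in seconds are already known. The function name, the constant name, and the sample counts are hypothetical and not part of LIKWID.

```python
CACHE_LINE_BYTES = 64.0  # line size used by the group formulas


def l3_metrics(lines_in: int, lines_out: int, runtime_s: float) -> dict:
    """Compute the L3.txt metrics from L2_LINES_IN/OUT counts."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    load_bytes = lines_in * CACHE_LINE_BYTES    # lines allocated in L2
    evict_bytes = lines_out * CACHE_LINE_BYTES  # lines evicted from L2
    total_bytes = load_bytes + evict_bytes
    return {
        "L3 load bandwidth [MBytes/s]": 1.0e-06 * load_bytes / runtime_s,
        "L3 load data volume [GBytes]": 1.0e-09 * load_bytes,
        "L3 evict bandwidth [MBytes/s]": 1.0e-06 * evict_bytes / runtime_s,
        "L3 evict data volume [GBytes]": 1.0e-09 * evict_bytes,
        "L3 bandwidth [MBytes/s]": 1.0e-06 * total_bytes / runtime_s,
        "L3 data volume [GBytes]": 1.0e-09 * total_bytes,
    }


# Hypothetical counter readings for a 2-second measurement.
for name, value in l3_metrics(1_000_000, 500_000, 2.0).items():
    print(f"{name}: {value:.3f}")
```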