Add likwid collector

Thomas Roehl
2021-03-25 14:47:10 +01:00
parent 4fddcb9741
commit a6ac0c5373
670 changed files with 24926 additions and 0 deletions

@@ -0,0 +1,17 @@
SHORT Branch prediction miss rate/ratio
EVENTSET
PMC0 BR_INST_EXEC
PMC1 BR_MISSP_EXEC
METRICS
Runtime (RDTSC) [s] time
Branch misprediction ratio PMC1/PMC0
LONG
Formulas:
Branch misprediction ratio = BR_MISSP_EXEC / BR_INST_EXEC
-
The rates state how often on average a branch or a mispredicted branch occurred
per instruction retired in total. The branch misprediction ratio directly
states what fraction of all branch instructions were mispredicted.
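
As a minimal sketch of the PMC1/PMC0 metric above, the ratio can be computed from two raw counter readings. The function name and the zero-branch guard are assumptions for illustration and are not part of LIKWID.

```python
def branch_misprediction_ratio(br_inst_exec: int, br_missp_exec: int) -> float:
    """Return BR_MISSP_EXEC / BR_INST_EXEC for one measurement.

    Hypothetical helper: returns 0.0 when no branches were counted,
    which avoids a division by zero on empty regions.
    """
    if br_inst_exec == 0:
        return 0.0
    return br_missp_exec / br_inst_exec


# Illustrative counts: 50 mispredictions out of 1000 executed branches.
ratio = branch_misprediction_ratio(br_inst_exec=1000, br_missp_exec=50)
print(f"Branch misprediction ratio: {ratio:.3f}")
```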

@@ -0,0 +1,22 @@
SHORT Cycles per instruction
EVENTSET
PMC0 UOPS_RETIRED
PMC1 CPU_CLK_UNHALTED
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
IPC PMC0/PMC1
LONG
Formulas:
CPI = CPU_CLK_UNHALTED/UOPS_RETIRED
IPC = UOPS_RETIRED/CPU_CLK_UNHALTED
-
This group measures how efficiently the processor works with
regard to instruction throughput. Also important as a standalone
metric is UOPS_RETIRED, as it tells you how many uops
you need to execute for a task. An optimization might show very
low CPI values but execute many more instructions for it.
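
The caveat above can be sketched with made-up numbers: a variant with a lower CPI can still need more cycles if it retires many more uops. The counter values and variable names below are hypothetical and only illustrate the arithmetic of the two formulas.

```python
def cpi(cpu_clk_unhalted: int, uops_retired: int) -> float:
    """CPI = CPU_CLK_UNHALTED / UOPS_RETIRED (PMC1/PMC0)."""
    return cpu_clk_unhalted / uops_retired


def ipc(cpu_clk_unhalted: int, uops_retired: int) -> float:
    """IPC = UOPS_RETIRED / CPU_CLK_UNHALTED (PMC0/PMC1)."""
    return uops_retired / cpu_clk_unhalted


# Hypothetical readings for two variants of the same task.
baseline = {"cpu_clk_unhalted": 2_000_000_000, "uops_retired": 1_000_000_000}
optimized = {"cpu_clk_unhalted": 4_000_000_000, "uops_retired": 4_000_000_000}

for name, counts in (("baseline", baseline), ("optimized", optimized)):
    print(name, "CPI:", cpi(**counts), "IPC:", ipc(**counts),
          "cycles:", counts["cpu_clk_unhalted"])
```

Here the "optimized" variant has the better CPI but spends twice as many cycles, which is why UOPS_RETIRED is worth reading alongside the ratio.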

@@ -0,0 +1,20 @@
SHORT Double Precision MFLOP/s
EVENTSET
PMC0 EMON_SSE_SSE2_COMP_INST_RETIRED_PACKED_DP
PMC1 EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_DP
METRICS
Runtime (RDTSC) [s] time
DP [MFLOP/s] 1.0E-06*(PMC0*2.0+PMC1)/time
Packed [MUOPS/s] 1.0E-06*(PMC0)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_PACKED_DP*2.0+EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_DP)/time
Packed [MUOPS/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_PACKED_DP)/time
Scalar [MUOPS/s] = 1.0E-06*EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_DP/time
-
SSE scalar and packed double precision FLOP rates.
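
A minimal sketch of the DP metric, assuming the factor 2.0 reflects the two double precision values in one packed SSE2 operation. The function name, the constant name, and the counter values are illustrative assumptions.

```python
FLOPS_PER_PACKED_DP = 2.0  # assumption: two doubles per packed SSE2 operation


def dp_mflops(packed_dp: int, scalar_dp: int, time_s: float) -> float:
    """DP [MFLOP/s] = 1.0E-06*(PMC0*2.0+PMC1)/time."""
    return 1.0e-06 * (packed_dp * FLOPS_PER_PACKED_DP + scalar_dp) / time_s


# Hypothetical readings: 1,000,000 packed and 500,000 scalar ops in one second.
rate = dp_mflops(packed_dp=1_000_000, scalar_dp=500_000, time_s=1.0)
print(f"DP [MFLOP/s]: {rate:.2f}")
```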

@@ -0,0 +1,18 @@
SHORT Single Precision MFLOP/s
EVENTSET
PMC0 EMON_SSE_SSE2_COMP_INST_RETIRED_ALL_SP
PMC1 EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_SP
METRICS
Runtime (RDTSC) [s] time
SP [MFLOP/s] 1.0E-06*(PMC0)/time
Scalar [MUOPS/s] 1.0E-06*(PMC1)/time
LONG
Formulas:
SP [MFLOP/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_ALL_SP)/time
Scalar [MUOPS/s] = 1.0E-06*(EMON_SSE_SSE2_COMP_INST_RETIRED_SCALAR_SP)/time
-
SSE scalar and packed single precision FLOP rates.

@@ -0,0 +1,30 @@
SHORT L3 cache bandwidth in MBytes/s
EVENTSET
PMC0 L2_LINES_IN_ALL_ALL
PMC1 L2_LINES_OUT_ALL_ALL
METRICS
Runtime (RDTSC) [s] time
L3 load bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L3 load data volume [GBytes] 1.0E-09*PMC0*64.0
L3 evict bandwidth [MBytes/s] 1.0E-06*PMC1*64.0/time
L3 evict data volume [GBytes] 1.0E-09*PMC1*64.0
L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
LONG
Formulas:
L3 load bandwidth [MBytes/s] = 1.0E-06*L2_LINES_IN_ALL_ALL*64.0/time
L3 load data volume [GBytes] = 1.0E-09*L2_LINES_IN_ALL_ALL*64.0
L3 evict bandwidth [MBytes/s] = 1.0E-06*L2_LINES_OUT_ALL_ALL*64.0/time
L3 evict data volume [GBytes] = 1.0E-09*L2_LINES_OUT_ALL_ALL*64.0
L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ALL_ALL+L2_LINES_OUT_ALL_ALL)*64/time
L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ALL_ALL+L2_LINES_OUT_ALL_ALL)*64
-
Profiling group to measure L3 cache bandwidth. The bandwidth is computed from the
number of cache lines allocated in the L2 and the number of modified cache lines
evicted from the L2. The group also outputs the total data volume transferred between
L2 and L3. Note that this bandwidth also includes data transfers due to a write
allocate load on a store miss in L2.
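
The six metrics above follow from two counters and the 64-byte cache line size used in the formulas. The sketch below is an illustration only: the function name, dictionary layout, and counter values are assumptions, not LIKWID code.

```python
CACHE_LINE_BYTES = 64.0  # cache line size used in the group formulas


def l3_metrics(lines_in: int, lines_out: int, time_s: float) -> dict:
    """Derive L3 bandwidth and data volume from L2_LINES_IN_ALL_ALL (PMC0)
    and L2_LINES_OUT_ALL_ALL (PMC1)."""
    load_bytes = lines_in * CACHE_LINE_BYTES
    evict_bytes = lines_out * CACHE_LINE_BYTES
    total_bytes = load_bytes + evict_bytes
    return {
        "L3 load bandwidth [MBytes/s]": 1.0e-06 * load_bytes / time_s,
        "L3 load data volume [GBytes]": 1.0e-09 * load_bytes,
        "L3 evict bandwidth [MBytes/s]": 1.0e-06 * evict_bytes / time_s,
        "L3 evict data volume [GBytes]": 1.0e-09 * evict_bytes,
        "L3 bandwidth [MBytes/s]": 1.0e-06 * total_bytes / time_s,
        "L3 data volume [GBytes]": 1.0e-09 * total_bytes,
    }


# Hypothetical readings: 1,000,000 lines in, 500,000 lines out, over 2 seconds.
for metric, value in l3_metrics(1_000_000, 500_000, 2.0).items():
    print(f"{metric}: {value:.3f}")
```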