mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-08-01 09:00:35 +02:00
Add likwid collector
collectors/likwid/groups/core2/BRANCH.txt
@@ -0,0 +1,30 @@
SHORT Branch prediction miss rate/ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 BR_INST_RETIRED_ANY
PMC1 BR_INST_RETIRED_MISPRED

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Branch rate PMC0/FIXC0
Branch misprediction rate PMC1/FIXC0
Branch misprediction ratio PMC1/PMC0
Instructions per branch FIXC0/PMC0

LONG
Formulas:
Branch rate = BR_INST_RETIRED_ANY/INSTR_RETIRED_ANY
Branch misprediction rate = BR_INST_RETIRED_MISPRED/INSTR_RETIRED_ANY
Branch misprediction ratio = BR_INST_RETIRED_MISPRED/BR_INST_RETIRED_ANY
Instructions per branch = INSTR_RETIRED_ANY/BR_INST_RETIRED_ANY
-
The rates state how often, on average, a branch or a mispredicted branch occurred
per retired instruction. The branch misprediction ratio directly expresses
what fraction of all branch instructions were mispredicted.
Instructions per branch is 1/branch rate.
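The BRANCH group above shows the line-oriented layout shared by every file in this commit: a SHORT description, an EVENTSET mapping counter registers to events, a METRICS section whose lines end in a formula over those registers (plus `time` and `inverseClock`), and free-form LONG documentation. The following Python sketch parses that layout and evaluates the METRICS formulas for one set of counter readings. It is not LIKWID's own parser or calculator. The function names `parse_group` and `evaluate_metrics` and the sample readings are hypothetical, and the sketch assumes, as holds for these core2 files, that a formula is the last whitespace-separated token on its line. Formulas are evaluated through a small whitelist over Python's `ast` module rather than with `eval()`.

```python
import ast
import operator

# Operators this sketch accepts in METRICS formulas.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def parse_group(text):
    """Return (short, eventset, metrics) parsed from a group file's text.

    eventset maps registers (FIXC0, PMC0, ...) to event names; metrics is a
    list of (name, formula) pairs in file order.
    """
    short, eventset, metrics = "", {}, []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if section == "LONG":
            continue  # everything after LONG is free-form documentation
        if line.startswith("SHORT"):
            short = line[len("SHORT"):].strip()
        elif line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif not line:
            continue
        elif section == "EVENTSET":
            register, event = line.split(None, 1)
            eventset[register] = event
        elif section == "METRICS":
            # Assumption: the formula is the last whitespace-separated token.
            name, formula = line.rsplit(None, 1)
            metrics.append((name, formula))
    return short, eventset, metrics


def _eval(node, variables):
    """Evaluate a parsed formula, allowing only numbers, names and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, variables)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.Name):
        return float(variables[node.id])
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = _eval(node.left, variables)
        right = _eval(node.right, variables)
        return _BINOPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNOPS:
        return _UNOPS[type(node.op)](_eval(node.operand, variables))
    raise ValueError(f"unsupported formula element: {ast.dump(node)}")


def evaluate_metrics(metrics, variables):
    """Map each metric name to its value for the given readings.

    variables must hold every register a formula uses, plus 'time'
    (seconds) and 'inverseClock' (seconds per reference cycle).
    """
    return {name: _eval(ast.parse(formula, mode="eval"), variables)
            for name, formula in metrics}


if __name__ == "__main__":
    # Excerpt of the BRANCH group with hypothetical counter readings.
    group = "\n".join([
        "SHORT Branch prediction miss rate/ratio",
        "EVENTSET",
        "FIXC0 INSTR_RETIRED_ANY",
        "PMC0 BR_INST_RETIRED_ANY",
        "PMC1 BR_INST_RETIRED_MISPRED",
        "METRICS",
        "Branch misprediction ratio PMC1/PMC0",
        "Instructions per branch FIXC0/PMC0",
    ])
    _, events, metrics = parse_group(group)
    readings = {"FIXC0": 1e9, "PMC0": 2e8, "PMC1": 4e6,
                "time": 0.5, "inverseClock": 1 / 2.0e9}
    for name, value in evaluate_metrics(metrics, readings).items():
        print(f"{name}: {value:.4g}")
```

Keeping the parser separate from the evaluator means the same readings can be fed to any group in this directory, as long as every register named in its formulas is supplied.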
collectors/likwid/groups/core2/CACHE.txt
@@ -0,0 +1,34 @@
SHORT Data cache miss rate/ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 L1D_REPL
PMC1 L1D_ALL_REF

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
CPI FIXC1/FIXC0
data cache misses PMC0
data cache request rate PMC1/FIXC0
data cache miss rate PMC0/FIXC0
data cache miss ratio PMC0/PMC1

LONG
Formulas:
data cache request rate = L1D_ALL_REF / INSTR_RETIRED_ANY
data cache miss rate = L1D_REPL / INSTR_RETIRED_ANY
data cache miss ratio = L1D_REPL / L1D_ALL_REF
-
This group measures the locality of your data accesses with regard to the
L1 cache. The data cache request rate tells you how data-intensive your code is,
i.e. how many data accesses you have on average per instruction.
The data cache miss rate gives a measure of how often it was necessary to get
cache lines from higher levels of the memory hierarchy. Finally, the
data cache miss ratio tells you how many of your memory references required
a cache line to be loaded from a higher level. While the data cache miss rate
might be dictated by your algorithm, you should try to get the data cache miss ratio
as low as possible by increasing your cache reuse.
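The three CACHE metrics are linked: both rates are normalized by INSTR_RETIRED_ANY, so the data cache miss rate equals the request rate multiplied by the miss ratio (L1D_REPL/INSTR = L1D_ALL_REF/INSTR × L1D_REPL/L1D_ALL_REF). This is why the text treats the miss rate as largely fixed by the algorithm while the miss ratio reflects cache reuse. A minimal sketch that computes all three from raw counts; the `CacheMetrics` class, the `cache_metrics` function and the counter values are illustrative, not part of LIKWID.

```python
from dataclasses import dataclass


@dataclass
class CacheMetrics:
    request_rate: float  # L1D_ALL_REF / INSTR_RETIRED_ANY
    miss_rate: float     # L1D_REPL / INSTR_RETIRED_ANY
    miss_ratio: float    # L1D_REPL / L1D_ALL_REF


def cache_metrics(instr_retired, l1d_all_ref, l1d_repl):
    """CACHE group ratios from FIXC0, PMC1 and PMC0 counter values."""
    if instr_retired <= 0 or l1d_all_ref <= 0:
        raise ValueError("instruction and reference counts must be positive")
    return CacheMetrics(
        request_rate=l1d_all_ref / instr_retired,
        miss_rate=l1d_repl / instr_retired,
        miss_ratio=l1d_repl / l1d_all_ref,
    )


if __name__ == "__main__":
    # Hypothetical readings: 1e9 instructions, 4e8 L1D references, 1e7 line fills.
    m = cache_metrics(instr_retired=1e9, l1d_all_ref=4e8, l1d_repl=1e7)
    # The miss rate equals request rate * miss ratio, up to rounding.
    print(m, m.request_rate * m.miss_ratio)
```

For these sample counts there are 0.4 L1D references per instruction and 2.5 % of them miss, which gives 0.01 misses per instruction.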
collectors/likwid/groups/core2/CLOCK.txt
@@ -0,0 +1,19 @@
SHORT CPU clock information

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0

LONG
Formulas:
CPI = CPU_CLK_UNHALTED_CORE / INSTR_RETIRED_ANY
-
Most basic performance group, measuring the clock frequency of the machine.
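In the Clock [MHz] formula the measurement duration cancels out: CPU_CLK_UNHALTED_CORE (FIXC1) counts at the actual core clock and CPU_CLK_UNHALTED_REF (FIXC2) at the reference clock, so FIXC1/FIXC2 is the ratio of the effective to the reference frequency. Dividing by inverseClock, assumed here to be seconds per reference cycle (which is how the `Runtime unhalted [s]` metric uses it), turns that ratio back into hertz. A sketch with a hypothetical 2.4 GHz reference clock; the function names are illustrative.

```python
def clock_mhz(core_cycles, ref_cycles, inverse_clock):
    """Clock [MHz] = 1.E-06*(FIXC1/FIXC2)/inverseClock.

    core_cycles:   CPU_CLK_UNHALTED_CORE (FIXC1), ticks at the core clock
    ref_cycles:    CPU_CLK_UNHALTED_REF (FIXC2), ticks at the reference clock
    inverse_clock: seconds per reference cycle (assumed meaning)
    """
    return 1.0e-6 * (core_cycles / ref_cycles) / inverse_clock


def cpi(core_cycles, instructions):
    """CPI = CPU_CLK_UNHALTED_CORE / INSTR_RETIRED_ANY."""
    return core_cycles / instructions


if __name__ == "__main__":
    inverse_clock = 1.0 / 2.4e9  # hypothetical 2.4 GHz reference clock
    # 3.0e9 core cycles against 3.6e9 reference cycles: the core ran at
    # 3.0/3.6 of the reference clock, i.e. about 2000 MHz.
    print(clock_mhz(3.0e9, 3.6e9, inverse_clock))
    print(cpi(3.0e9, 4.0e9))
```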
collectors/likwid/groups/core2/DATA.txt
@@ -0,0 +1,22 @@
SHORT Load to store ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 INST_RETIRED_LOADS
PMC1 INST_RETIRED_STORES

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Load to store ratio PMC0/PMC1

LONG
Formulas:
Load to store ratio = INST_RETIRED_LOADS/INST_RETIRED_STORES
-
This is a simple metric to determine your load to store ratio.
collectors/likwid/groups/core2/DIVIDE.txt
@@ -0,0 +1,24 @@
SHORT Divide unit information

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 CYCLES_DIV_BUSY
PMC1 DIV


METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Number of divide ops PMC1
Avg. divide unit usage duration PMC0/PMC1

LONG
Formulas:
Number of divide ops = DIV
Avg. divide unit usage duration = CYCLES_DIV_BUSY/DIV
-
This performance group measures the average latency of divide operations.
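Avg. divide unit usage duration is a quotient of two counters, so when readings from several measurement intervals or threads are combined, summing CYCLES_DIV_BUSY and DIV first and dividing once gives the true cycles per divide. Averaging the per-interval quotients instead gives an interval with a handful of divides the same weight as one with thousands. A sketch of both approaches; `avg_divide_duration`, `combined_duration` and the sample readings are hypothetical.

```python
def avg_divide_duration(div_busy_cycles, div_ops):
    """Avg. divide unit usage duration = CYCLES_DIV_BUSY / DIV (cycles per divide)."""
    if div_ops == 0:
        raise ValueError("no divide operations were counted")
    return div_busy_cycles / div_ops


def combined_duration(samples):
    """Aggregate (CYCLES_DIV_BUSY, DIV) pairs by summing counters before dividing."""
    total_busy = sum(busy for busy, _ in samples)
    total_ops = sum(ops for _, ops in samples)
    return avg_divide_duration(total_busy, total_ops)


if __name__ == "__main__":
    # Hypothetical intervals: many 20-cycle divides, then a few 40-cycle ones.
    samples = [(2_000_000, 100_000), (40_000, 1_000)]
    naive = sum(avg_divide_duration(b, o) for b, o in samples) / len(samples)
    print(combined_duration(samples), naive)
```

Here the naive mean is 30 cycles, while the summed counters give roughly 20.2 cycles per divide, because almost all divides came from the first interval.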
collectors/likwid/groups/core2/FLOPS_DP.txt
@@ -0,0 +1,29 @@
SHORT Double Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 SIMD_COMP_INST_RETIRED_PACKED_DOUBLE
PMC1 SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
CPI FIXC1/FIXC0
DP [MFLOP/s] 1.0E-06*(PMC0*2.0+PMC1)/time
Packed [MUOPS/s] 1.0E-06*PMC0/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Vectorization ratio 100*PMC0/PMC1

LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(SIMD_COMP_INST_RETIRED_PACKED_DOUBLE*2+SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE)/time
Packed [MUOPS/s] = 1.0E-06*SIMD_COMP_INST_RETIRED_PACKED_DOUBLE/time
Scalar [MUOPS/s] = 1.0E-06*SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE/time
Vectorization ratio = 100*SIMD_COMP_INST_RETIRED_PACKED_DOUBLE/SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE
-
Profiling group to measure double-precision SSE FLOPs. Don't forget that your code might also execute X87 FLOPs.
The number of SIMD_COMP_INST_RETIRED_PACKED_DOUBLE instructions shows how well your code was vectorized.
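The DP [MFLOP/s] formula weights each packed double-precision SSE instruction by 2, because a 128-bit register holds two doubles, and each scalar instruction by 1. x87 operations are not included; they are covered by the FLOPS_X87 group. A sketch of the rate metrics in this group; `flops_dp` and the counts are illustrative.

```python
def flops_dp(packed, scalar, time_s):
    """FLOPS_DP rates from PMC0 (packed) and PMC1 (scalar) instruction counts."""
    return {
        # Two double-precision operations per packed 128-bit SSE instruction.
        "DP [MFLOP/s]": 1.0e-6 * (packed * 2.0 + scalar) / time_s,
        "Packed [MUOPS/s]": 1.0e-6 * packed / time_s,
        "Scalar [MUOPS/s]": 1.0e-6 * scalar / time_s,
    }


if __name__ == "__main__":
    # Hypothetical run: 3e8 packed and 1e8 scalar instructions in 0.5 s.
    for name, value in flops_dp(3e8, 1e8, 0.5).items():
        print(f"{name}: {value:.1f}")
```

With these counts the sketch reports 1400 MFLOP/s: 600 M packed instructions per second contribute 1200 MFLOP/s and 200 M scalar instructions per second contribute the remaining 200.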
collectors/likwid/groups/core2/FLOPS_SP.txt
@@ -0,0 +1,29 @@
SHORT Single Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 SIMD_COMP_INST_RETIRED_PACKED_SINGLE
PMC1 SIMD_COMP_INST_RETIRED_SCALAR_SINGLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
CPI FIXC1/FIXC0
SP [MFLOP/s] 1.0E-06*(PMC0*4.0+PMC1)/time
Packed [MUOPS/s] 1.0E-06*PMC0/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Vectorization ratio 100*PMC0/PMC1

LONG
Formulas:
SP [MFLOP/s] = 1.0E-06*(SIMD_COMP_INST_RETIRED_PACKED_SINGLE*4+SIMD_COMP_INST_RETIRED_SCALAR_SINGLE)/time
Packed [MUOPS/s] = 1.0E-06*SIMD_COMP_INST_RETIRED_PACKED_SINGLE/time
Scalar [MUOPS/s] = 1.0E-06*SIMD_COMP_INST_RETIRED_SCALAR_SINGLE/time
Vectorization ratio [%] = 100*SIMD_COMP_INST_RETIRED_PACKED_SINGLE/SIMD_COMP_INST_RETIRED_SCALAR_SINGLE
-
Profiling group to measure single-precision SSE FLOPs. Don't forget that your code might also execute X87 FLOPs.
The number of SIMD_COMP_INST_RETIRED_PACKED_SINGLE instructions shows how well your code was vectorized.
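As written, `Vectorization ratio` is 100*PMC0/PMC1, i.e. packed instructions per hundred scalar instructions. Despite the `[%]` label in the LONG section, it is not bounded by 100 and is undefined when no scalar instructions retire. If a percentage of packed instructions is wanted, 100*PMC0/(PMC0+PMC1) is one alternative. The sketch below computes both side by side, together with SP [MFLOP/s] and its factor of 4 single-precision values per 128-bit SSE instruction. `flops_sp`, `_ratio` and the "Packed share" metric are hypothetical additions for illustration, not part of the group file.

```python
import math


def _ratio(numerator, denominator):
    """numerator/denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else math.nan


def flops_sp(packed, scalar, time_s):
    """FLOPS_SP metrics from PMC0 (packed) and PMC1 (scalar) single-precision counts."""
    return {
        # Four single-precision operations per packed 128-bit SSE instruction.
        "SP [MFLOP/s]": 1.0e-6 * (packed * 4.0 + scalar) / time_s,
        # As defined in the group file: packed instructions per 100 scalar ones.
        "Vectorization ratio": 100.0 * _ratio(packed, scalar),
        # Alternative, not in the group file: percentage of packed instructions.
        "Packed share [%]": 100.0 * _ratio(packed, packed + scalar),
    }


if __name__ == "__main__":
    # Hypothetical run: 3e8 packed and 1e8 scalar instructions in 0.5 s.
    for name, value in flops_sp(3e8, 1e8, 0.5).items():
        print(f"{name}: {value:.1f}")
```

For these counts the group's ratio is 300 while the packed share is 75 %, so the defined value should not be read as a percentage of vectorized instructions.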
collectors/likwid/groups/core2/FLOPS_X87.txt
@@ -0,0 +1,21 @@
SHORT X87 MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 X87_OPS_RETIRED_ANY

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
CPI FIXC1/FIXC0
X87 [MFLOP/s] 1.0E-06*PMC0/time

LONG
Formulas:
X87 [MFLOP/s] = 1.0E-06*X87_OPS_RETIRED_ANY/time
-
Profiling group to measure X87 FLOPs. Note that this event also counts
non-computational operations.
collectors/likwid/groups/core2/L2.txt
@@ -0,0 +1,35 @@
SHORT L2 cache bandwidth in MBytes/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 L1D_REPL
PMC1 L1D_M_EVICT

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
CPI FIXC1/FIXC0
L2D load bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L2D load data volume [GBytes] 1.0E-09*PMC0*64.0
L2D evict bandwidth [MBytes/s] 1.0E-06*PMC1*64.0/time
L2D evict data volume [GBytes] 1.0E-09*PMC1*64.0
L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0

LONG
Formulas:
L2D load bandwidth [MBytes/s] = 1.0E-06*L1D_REPL*64.0/time
L2D load data volume [GBytes] = 1.0E-09*L1D_REPL*64.0
L2D evict bandwidth [MBytes/s] = 1.0E-06*L1D_M_EVICT*64.0/time
L2D evict data volume [GBytes] = 1.0E-09*L1D_M_EVICT*64.0
L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPL+L1D_M_EVICT)*64.0/time
L2 data volume [GBytes] = 1.0E-09*(L1D_REPL+L1D_M_EVICT)*64.0
-
Profiling group to measure L2 cache bandwidth. The bandwidth is
computed from the number of cache lines allocated in the L1 and the
number of modified cache lines evicted from the L1.
Note that this bandwidth also includes data transfers due to a
write-allocate load on a store miss in L1.
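The L2 group converts cache-line counts into bytes by multiplying by 64, the Core 2 cache line size, and then into MBytes/s (1.0E-06) or GBytes (1.0E-09) using decimal prefixes. The total L2 bandwidth is simply the sum of the load part (L1D_REPL, lines filled into L1) and the evict part (L1D_M_EVICT, modified lines written back from L1). A sketch with hypothetical counts; `l2_traffic` is an illustrative name.

```python
CACHE_LINE_BYTES = 64.0  # line size used by the group formulas


def l2_traffic(l1d_repl, l1d_m_evict, time_s):
    """L2 group metrics from PMC0 (L1D_REPL) and PMC1 (L1D_M_EVICT)."""
    load_bytes = l1d_repl * CACHE_LINE_BYTES
    evict_bytes = l1d_m_evict * CACHE_LINE_BYTES
    return {
        "L2D load bandwidth [MBytes/s]": 1.0e-6 * load_bytes / time_s,
        "L2D evict bandwidth [MBytes/s]": 1.0e-6 * evict_bytes / time_s,
        "L2 bandwidth [MBytes/s]": 1.0e-6 * (load_bytes + evict_bytes) / time_s,
        "L2 data volume [GBytes]": 1.0e-9 * (load_bytes + evict_bytes),
    }


if __name__ == "__main__":
    # Hypothetical: 5e7 lines loaded into L1 and 2.5e7 modified lines evicted in 1 s.
    for name, value in l2_traffic(5e7, 2.5e7, 1.0).items():
        print(f"{name}: {value:.1f}")
```

For these counts the load and evict parts are 3200 and 1600 MBytes/s, giving 4800 MBytes/s and 4.8 GBytes in total.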
collectors/likwid/groups/core2/L2CACHE.txt
@@ -0,0 +1,34 @@
SHORT L2 cache miss rate/ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 L2_RQSTS_THIS_CORE_ALL_MESI
PMC1 L2_RQSTS_SELF_I_STATE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
L2 request rate PMC0/FIXC0
L2 miss rate PMC1/FIXC0
L2 miss ratio PMC1/PMC0

LONG
Formulas:
L2 request rate = L2_RQSTS_THIS_CORE_ALL_MESI / INSTR_RETIRED_ANY
L2 miss rate = L2_RQSTS_SELF_I_STATE / INSTR_RETIRED_ANY
L2 miss ratio = L2_RQSTS_SELF_I_STATE / L2_RQSTS_THIS_CORE_ALL_MESI
-
This group measures the locality of your data accesses with regard to the
L2 cache. The L2 request rate tells you how data-intensive your code is,
i.e. how many L2 requests you have on average per instruction.
The L2 miss rate gives a measure of how often it was necessary to get
cache lines from memory. Finally, the L2 miss ratio tells you how many of your
memory references required a cache line to be loaded from a higher level.
While the L2 miss rate might be dictated by your algorithm, you should
try to get the L2 miss ratio as low as possible by increasing your cache reuse.
collectors/likwid/groups/core2/MEM.txt
@@ -0,0 +1,23 @@
SHORT Main memory bandwidth in MBytes/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 BUS_TRANS_MEM_THIS_CORE_THIS_A
PMC1 BUS_TRANS_WB_THIS_CORE_ALL_A

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0

LONG
Formulas:
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_TRANS_MEM_THIS_CORE_THIS_A+BUS_TRANS_WB_THIS_CORE_ALL_A)*64.0/time
Memory data volume [GBytes] = 1.0E-09*(BUS_TRANS_MEM_THIS_CORE_THIS_A+BUS_TRANS_WB_THIS_CORE_ALL_A)*64.0
-
Profiling group to measure the memory bandwidth drawn by this core.
collectors/likwid/groups/core2/TLB.txt
@@ -0,0 +1,29 @@
SHORT TLB miss rate/ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 DTLB_MISSES_ANY
PMC1 L1D_ALL_REF

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
CPI FIXC1/FIXC0
L1 DTLB request rate PMC1/FIXC0
DTLB miss rate PMC0/FIXC0
L1 DTLB miss ratio PMC0/PMC1

LONG
Formulas:
L1 DTLB request rate = L1D_ALL_REF / INSTR_RETIRED_ANY
DTLB miss rate = DTLB_MISSES_ANY / INSTR_RETIRED_ANY
L1 DTLB miss ratio = DTLB_MISSES_ANY / L1D_ALL_REF
-
The L1 DTLB request rate tells you how data-intensive your code is,
i.e. how many data accesses you have on average per instruction.
The DTLB miss rate gives a measure of how often a TLB miss occurred
per instruction. Finally, the L1 DTLB miss ratio tells you how many
of your memory references caused a TLB miss on average.
collectors/likwid/groups/core2/UOPS.txt
@@ -0,0 +1,26 @@
SHORT UOPs execution info

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 RS_UOPS_DISPATCHED_ALL
PMC1 UOPS_RETIRED_ANY



METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Executed UOPs PMC0
Retired UOPs PMC1

LONG
Formulas:
Executed UOPs = RS_UOPS_DISPATCHED_ALL
Retired UOPs = UOPS_RETIRED_ANY
-
This performance group measures the executed and retired micro-ops. The difference
between executed and retired uOPs is the number of speculatively executed uOPs that never retired.
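Taking the group's description at face value, the number of micro-ops that were executed speculatively but never retired is the executed count minus the retired count, and dividing by the executed count gives the share of execution work that was thrown away. A sketch; `speculative_uops`, `speculative_fraction` and the counts are hypothetical, and clamping negative differences to zero is an assumption of this sketch (the two values come from different events), not documented LIKWID behavior.

```python
def speculative_uops(executed, retired):
    """Executed (RS_UOPS_DISPATCHED_ALL) minus retired (UOPS_RETIRED_ANY) micro-ops.

    Clamped at zero: a small negative difference between the two events is
    treated as "no discarded work" (sketch assumption).
    """
    return max(executed - retired, 0)


def speculative_fraction(executed, retired):
    """Share of executed micro-ops that did not retire, in [0, 1]."""
    return speculative_uops(executed, retired) / executed if executed else 0.0


if __name__ == "__main__":
    executed, retired = 1_200_000_000, 1_000_000_000  # hypothetical counts
    print(speculative_uops(executed, retired))  # 200000000
    print(speculative_fraction(executed, retired))
```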
collectors/likwid/groups/core2/UOPS_RETIRE.txt
@@ -0,0 +1,25 @@
SHORT UOPs retirement

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 UOPS_RETIRED_USED_CYCLES
PMC1 UOPS_RETIRED_STALL_CYCLES

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Used cycles ratio PMC0/FIXC1
Unused cycles ratio PMC1/FIXC1


LONG
Formulas:
Used cycles ratio = UOPS_RETIRED_USED_CYCLES/CPU_CLK_UNHALTED_CORE
Unused cycles ratio = UOPS_RETIRED_STALL_CYCLES/CPU_CLK_UNHALTED_CORE
-
This performance group returns the ratios of used and unused CPU cycles. Here
unused cycles are cycles where no operation is performed due to some stall.