Mirror of https://github.com/ClusterCockpit/cc-metric-collector.git (synced 2025-07-31 08:56:06 +02:00)
Add likwid collector
31
collectors/likwid/groups/ICL/BRANCH.txt
Normal file
@@ -0,0 +1,31 @@
SHORT Branch prediction miss rate/ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 BR_INST_RETIRED_ALL_BRANCHES
PMC1 BR_MISP_RETIRED_ALL_BRANCHES

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Branch rate PMC0/FIXC0
Branch misprediction rate PMC1/FIXC0
Branch misprediction ratio PMC1/PMC0
Instructions per branch FIXC0/PMC0

LONG
Formulas:
Branch rate = BR_INST_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES/BR_INST_RETIRED_ALL_BRANCHES
Instructions per branch = INSTR_RETIRED_ANY/BR_INST_RETIRED_ALL_BRANCHES
-
The rates state how often, on average, a branch or a mispredicted branch occurred
per instruction retired in total. The branch misprediction ratio states
what fraction of all branch instructions were mispredicted.
Instructions per branch is 1/branch rate.

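As a rough illustration of the BRANCH formulas above, the following Python sketch derives the four branch metrics from raw counter values. The function name `branch_metrics` and the sample counts are hypothetical; they are not part of LIKWID or cc-metric-collector.

```python
def branch_metrics(instr_retired, branches, mispredicted):
    """Derive the BRANCH group metrics from raw event counts.

    The arguments correspond to FIXC0 (INSTR_RETIRED_ANY),
    PMC0 (BR_INST_RETIRED_ALL_BRANCHES) and
    PMC1 (BR_MISP_RETIRED_ALL_BRANCHES).
    """
    if instr_retired <= 0 or branches <= 0:
        raise ValueError("instruction and branch counts must be positive")
    return {
        "Branch rate": branches / instr_retired,
        "Branch misprediction rate": mispredicted / instr_retired,
        "Branch misprediction ratio": mispredicted / branches,
        "Instructions per branch": instr_retired / branches,
    }


# Made-up counts, for illustration only.
metrics = branch_metrics(instr_retired=1_000_000, branches=200_000, mispredicted=5_000)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

With these sample counts, one in five retired instructions is a branch and 2.5 % of all branches are mispredicted.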
22
collectors/likwid/groups/ICL/DATA.txt
Normal file
@@ -0,0 +1,22 @@
SHORT Load to store ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 MEM_INST_RETIRED_ALL_LOADS
PMC1 MEM_INST_RETIRED_ALL_STORES

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Load to store ratio PMC0/PMC1

LONG
Formulas:
Load to store ratio = MEM_INST_RETIRED_ALL_LOADS/MEM_INST_RETIRED_ALL_STORES
-
This is a metric to determine your load to store ratio.

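The DATA group combines the fixed-counter metrics shared by every group in this commit with a single group-specific ratio. The Python sketch below shows how these formulas might evaluate. It assumes that `inverseClock` is the reciprocal of the nominal (base) clock frequency in Hz; the function name `data_metrics`, the 2.4 GHz base clock and all counter values are hypothetical.

```python
def data_metrics(fixc0, fixc1, fixc2, pmc0, pmc1, inverse_clock):
    """Evaluate the DATA group formulas for one hardware thread.

    fixc0: INSTR_RETIRED_ANY, fixc1: CPU_CLK_UNHALTED_CORE,
    fixc2: CPU_CLK_UNHALTED_REF, pmc0: loads, pmc1: stores.
    inverse_clock is assumed to be 1 / (base clock in Hz).
    """
    return {
        "Runtime unhalted [s]": fixc1 * inverse_clock,
        "Clock [MHz]": 1.0e-06 * (fixc1 / fixc2) / inverse_clock,
        "CPI": fixc1 / fixc0,
        "Load to store ratio": pmc0 / pmc1,
    }


# Hypothetical measurement on a core with an assumed 2.4 GHz base clock.
result = data_metrics(
    fixc0=2.0e9,   # instructions retired
    fixc1=3.0e9,   # unhalted core cycles
    fixc2=2.4e9,   # unhalted reference cycles
    pmc0=6.0e8,    # retired loads
    pmc1=2.0e8,    # retired stores
    inverse_clock=1.0 / 2.4e9,
)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this example the core ran at about 3000 MHz on average (1.25 times the base clock), needed 1.5 cycles per instruction and issued three loads per store.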
24
collectors/likwid/groups/ICL/DIVIDE.txt
Normal file
@@ -0,0 +1,24 @@
SHORT Divide unit information

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 ARITH_DIVIDER_COUNT
PMC1 ARITH_DIVIDER_ACTIVE


METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Number of divide ops PMC0
Avg. divide unit usage duration PMC1/PMC0

LONG
Formulas:
Number of divide ops = ARITH_DIVIDER_COUNT
Avg. divide unit usage duration = ARITH_DIVIDER_ACTIVE/ARITH_DIVIDER_COUNT
-
This performance group measures the average latency of divide operations.
35
collectors/likwid/groups/ICL/ENERGY.txt
Normal file
@@ -0,0 +1,35 @@
SHORT Power and Energy consumption

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
TMP0 TEMP_CORE
PWR0 PWR_PKG_ENERGY
PWR1 PWR_PP0_ENERGY
PWR3 PWR_DRAM_ENERGY



METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Temperature [C] TMP0
Energy [J] PWR0
Power [W] PWR0/time
Energy PP0 [J] PWR1
Power PP0 [W] PWR1/time
Energy DRAM [J] PWR3
Power DRAM [W] PWR3/time

LONG
Formulas:
Power = PWR_PKG_ENERGY / time
Power PP0 = PWR_PP0_ENERGY / time
Power DRAM = PWR_DRAM_ENERGY / time
-
Icelake implements the RAPL interface. This interface makes it possible to
monitor the consumed energy at the package (socket), PP0 and DRAM level.

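The power metrics of the ENERGY group are simply the measured energy divided by the measurement interval. A minimal Python sketch, assuming the energy values have already been converted to Joules; the helper `average_power` and the sample readings are hypothetical:

```python
def average_power(energy_joules, runtime_seconds):
    """Return the average power in Watt over the measurement interval."""
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return energy_joules / runtime_seconds


# Hypothetical energy readings in Joules, keyed by counter name.
energy = {"PWR0": 150.0, "PWR1": 90.0, "PWR3": 30.0}
runtime = 2.0  # the "time" variable of the formulas, in seconds

# Map each counter to the metric name used in the METRICS section.
metric_names = {"PWR0": "Power [W]", "PWR1": "Power PP0 [W]", "PWR3": "Power DRAM [W]"}
power = {metric_names[c]: average_power(e, runtime) for c, e in energy.items()}
for name, watts in power.items():
    print(f"{name}: {watts}")
```

Because the result is an average over `time`, short power spikes inside the interval are not visible in these metrics.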
25
collectors/likwid/groups/ICL/FLOPS_AVX.txt
Normal file
@@ -0,0 +1,25 @@
SHORT Packed AVX MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
PMC1 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE
PMC2 FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Packed SP [MFLOP/s] 1.0E-06*(PMC0*8.0+PMC2*16.0)/time
Packed DP [MFLOP/s] 1.0E-06*(PMC1*4.0+PMC3*8.0)/time

LONG
Formulas:
Packed SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
Packed DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
-
Packed single (32b) and double (64b) precision AVX and AVX-512 FLOP rates.
34
collectors/likwid/groups/ICL/FLOPS_DP.txt
Normal file
@@ -0,0 +1,34 @@
SHORT Double Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_DOUBLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
DP [MFLOP/s] 1.0E-06*(PMC0*2.0+PMC1+PMC2*4.0+PMC3*8.0)/time
AVX DP [MFLOP/s] 1.0E-06*(PMC2*4.0+PMC3*8.0)/time
AVX512 DP [MFLOP/s] 1.0E-06*(PMC3*8.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2+PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Vectorization ratio 100*(PMC0+PMC2+PMC3)/(PMC0+PMC1+PMC2+PMC3)

LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE*2+FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
AVX DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
AVX512 DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_DOUBLE/runtime
Vectorization ratio = 100*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)/(FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)
-
Scalar and packed (SSE, AVX, AVX-512) double precision FLOP rates.

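The FLOPS_DP formulas weight each instruction count by the number of double precision elements per vector: 1 for scalar, 2 for 128 bit, 4 for 256 bit and 8 for 512 bit. The Python sketch below evaluates a subset of the metrics under that reading. The function name `flops_dp` and the counter values are hypothetical.

```python
def flops_dp(pmc0, pmc1, pmc2, pmc3, time):
    """Evaluate selected FLOPS_DP metrics.

    pmc0: 128-bit packed, pmc1: scalar, pmc2: 256-bit packed,
    pmc3: 512-bit packed double precision instruction counts;
    time: measured runtime in seconds.
    """
    packed = pmc0 + pmc2 + pmc3
    total = packed + pmc1
    return {
        "DP [MFLOP/s]": 1.0e-06 * (pmc0 * 2.0 + pmc1 + pmc2 * 4.0 + pmc3 * 8.0) / time,
        "AVX DP [MFLOP/s]": 1.0e-06 * (pmc2 * 4.0 + pmc3 * 8.0) / time,
        "AVX512 DP [MFLOP/s]": 1.0e-06 * (pmc3 * 8.0) / time,
        # Guard against a region that executed no floating-point instructions.
        "Vectorization ratio": 100.0 * packed / total if total else 0.0,
    }


# Made-up counts for a one-second measurement.
dp = flops_dp(pmc0=2_000_000, pmc1=1_000_000, pmc2=3_000_000, pmc3=4_000_000, time=1.0)
for name, value in dp.items():
    print(f"{name}: {value}")
```

Note that the vectorization ratio counts instructions, not floating-point operations, so 90 % packed instructions in this example account for 48 of the 49 MFLOP/s. The zero guard is an addition of this sketch and is not part of the group file formula.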
34
collectors/likwid/groups/ICL/FLOPS_SP.txt
Normal file
@@ -0,0 +1,34 @@
SHORT Single Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_SINGLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
SP [MFLOP/s] 1.0E-06*(PMC0*4.0+PMC1+PMC2*8.0+PMC3*16.0)/time
AVX SP [MFLOP/s] 1.0E-06*(PMC2*8.0+PMC3*16.0)/time
AVX512 SP [MFLOP/s] 1.0E-06*(PMC3*16.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2+PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Vectorization ratio 100*(PMC0+PMC2+PMC3)/(PMC0+PMC1+PMC2+PMC3)

LONG
Formulas:
SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE*4+FP_ARITH_INST_RETIRED_SCALAR_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
AVX SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
AVX512 SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_SINGLE/runtime
Vectorization ratio [%] = 100*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE)/(FP_ARITH_INST_RETIRED_SCALAR_SINGLE+FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE)
-
Scalar and packed (SSE, AVX, AVX-512) single precision FLOP rates.

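All group files in this commit share the same layout: a SHORT line, an EVENTSET block of counter/event pairs, a METRICS block in which the last whitespace-separated token of each line is the formula, and a free-text LONG block. The Python sketch below parses that layout as it appears in these files. It is an illustration only and not the parser used by LIKWID or cc-metric-collector; `parse_group` and `SAMPLE` are hypothetical names.

```python
SAMPLE = """\
SHORT Single Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE

METRICS
Runtime (RDTSC) [s] time
AVX512 SP [MFLOP/s] 1.0E-06*(PMC3*16.0)/time

LONG
Formulas:
AVX512 SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
"""


def parse_group(text):
    """Split a group file into short description, events, metrics and long text."""
    group = {"short": "", "events": {}, "metrics": [], "long": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if section == "LONG":
            # Everything after LONG is free text and kept verbatim.
            group["long"].append(line)
        elif line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif line.startswith("SHORT"):
            group["short"] = line[len("SHORT"):].strip()
        elif not line:
            continue
        elif section == "EVENTSET":
            counter, event = line.split(None, 1)
            group["events"][counter] = event
        elif section == "METRICS":
            # Formulas contain no whitespace, so the last token is the formula.
            name, formula = line.rsplit(None, 1)
            group["metrics"].append((name, formula))
    return group


group = parse_group(SAMPLE)
print(group["short"])
print(group["events"])
print(group["metrics"])
```

Splitting metric lines from the right keeps names such as `Runtime (RDTSC) [s]` intact, since metric names may contain spaces while the formulas in these files do not.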