mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-08-01 00:56:26 +02:00
Add likwid collector
31
collectors/likwid/groups/knl/BRANCH.txt
Normal file
@@ -0,0 +1,31 @@
|
||||
SHORT Branch prediction miss rate/ratio
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 BR_INST_RETIRED_ALL_BRANCHES
|
||||
PMC1 BR_MISP_RETIRED_ALL_BRANCHES
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Branch rate PMC0/FIXC0
|
||||
Branch misprediction rate PMC1/FIXC0
|
||||
Branch misprediction ratio PMC1/PMC0
|
||||
Instructions per branch FIXC0/PMC0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Branch rate = BR_INST_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
|
||||
Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
|
||||
Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES/BR_INST_RETIRED_ALL_BRANCHES
|
||||
Instructions per branch = INSTR_RETIRED_ANY/BR_INST_RETIRED_ALL_BRANCHES
|
||||
-
|
||||
The rates state how often, on average, a branch or a mispredicted branch occurred
per retired instruction. The branch misprediction ratio states what fraction of
all branch instructions were mispredicted.
|
||||
Instructions per branch is 1/branch rate.
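As a rough illustration of the BRANCH formulas above, the following Python sketch derives the four metrics from raw counter values. The function name `branch_metrics` and the sample counts are made up for this example; they are not part of LIKWID or cc-metric-collector.

```python
def branch_metrics(instr_retired, branches, mispredicted):
    """Derive the BRANCH group metrics from raw counter values."""
    return {
        # PMC0/FIXC0: branches per retired instruction
        "branch_rate": branches / instr_retired,
        # PMC1/FIXC0: mispredicted branches per retired instruction
        "misprediction_rate": mispredicted / instr_retired,
        # PMC1/PMC0: fraction of branches that were mispredicted
        "misprediction_ratio": mispredicted / branches,
        # FIXC0/PMC0: reciprocal of the branch rate
        "instructions_per_branch": instr_retired / branches,
    }


# Hypothetical counter values, chosen only to make the arithmetic easy to follow.
m = branch_metrics(instr_retired=1_000_000, branches=200_000, mispredicted=10_000)
print(m)
```

Note that the misprediction rate and ratio differ only in their denominator: the rate is relative to all instructions, the ratio to branch instructions only.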
|
||||
|
23
collectors/likwid/groups/knl/CLOCK.txt
Normal file
@@ -0,0 +1,23 @@
|
||||
SHORT Power and Energy consumption
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PWR0 PWR_PKG_ENERGY
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Energy [J] PWR0
|
||||
Power [W] PWR0/time
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Power = PWR_PKG_ENERGY / time
|
||||
-
|
||||
The Xeon Phi (Knights Landing) implements the new RAPL interface. This interface makes it possible to
monitor the consumed energy at the package (socket) level.
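The clock and power metrics of this group can be sketched in Python as follows. This is a minimal sketch under one assumption: `inverseClock` is taken to be the reciprocal of the nominal CPU clock in Hz. The function name, parameter names and sample values are hypothetical.

```python
def clock_and_power(core_cycles, ref_cycles, energy_joules, runtime_s, base_clock_hz):
    """Evaluate the CLOCK group metrics for one measurement interval."""
    # Assumption: inverseClock = 1 / nominal clock frequency in Hz.
    inverse_clock = 1.0 / base_clock_hz
    return {
        # Runtime unhalted [s] = FIXC1*inverseClock
        "runtime_unhalted_s": core_cycles * inverse_clock,
        # Clock [MHz] = 1.E-06*(FIXC1/FIXC2)/inverseClock
        "clock_mhz": 1.0e-06 * (core_cycles / ref_cycles) / inverse_clock,
        # Power [W] = PWR0/time
        "power_w": energy_joules / runtime_s,
    }


# Hypothetical values: a 1.3 GHz nominal clock and a core running above it.
r = clock_and_power(core_cycles=3.0e9, ref_cycles=2.6e9,
                    energy_joules=400.0, runtime_s=2.0, base_clock_hz=1.3e9)
print(r)
```

The ratio FIXC1/FIXC2 scales the nominal clock, so a value above 1 indicates the core ran faster than its reference frequency during the interval.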
|
||||
|
22
collectors/likwid/groups/knl/DATA.txt
Normal file
@@ -0,0 +1,22 @@
|
||||
SHORT Load to store ratio
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 MEM_UOPS_RETIRED_ALL_LOADS
|
||||
PMC1 MEM_UOPS_RETIRED_ALL_STORES
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Load to store ratio PMC0/PMC1
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Load to store ratio = MEM_UOPS_RETIRED_ALL_LOADS/MEM_UOPS_RETIRED_ALL_STORES
|
||||
-
|
||||
This is a metric to determine your load to store ratio.
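All group files in this commit share the same layout: a SHORT line, an EVENTSET section, a METRICS section and a free-text LONG section. The sketch below shows one way to read such a file in Python. It is not the parser used by LIKWID or cc-metric-collector; `parse_group` is a hypothetical helper, and it assumes that metric formulas contain no whitespace, which holds for the files shown here.

```python
def parse_group(text):
    """Split a performance group file into its four parts."""
    group = {"short": "", "eventset": {}, "metrics": [], "long": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if section == "LONG":
            # Everything after LONG is free text; keep it verbatim.
            group["long"].append(line)
            continue
        if not line:
            continue
        if section is None and line.startswith("SHORT "):
            group["short"] = line[len("SHORT "):].strip()
        elif line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif section == "EVENTSET":
            # "<counter> <event>", e.g. "PMC0 MEM_UOPS_RETIRED_ALL_LOADS"
            counter, event = line.split(None, 1)
            group["eventset"][counter] = event
        elif section == "METRICS":
            # The formula is the last whitespace-separated token.
            name, formula = line.rsplit(None, 1)
            group["metrics"].append((name, formula))
    return group


SAMPLE = """SHORT Load to store ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
PMC0  MEM_UOPS_RETIRED_ALL_LOADS
PMC1  MEM_UOPS_RETIRED_ALL_STORES

METRICS
Runtime (RDTSC) [s] time
Load to store ratio PMC0/PMC1

LONG
Formulas:
Load to store ratio = MEM_UOPS_RETIRED_ALL_LOADS/MEM_UOPS_RETIRED_ALL_STORES
"""

group = parse_group(SAMPLE)
print(group["metrics"])
```

Splitting metric lines from the right keeps names such as "Runtime (RDTSC) [s]" intact even though they contain spaces.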
|
||||
|
24
collectors/likwid/groups/knl/DIVIDE.txt
Normal file
@@ -0,0 +1,24 @@
|
||||
SHORT Divide unit information
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 CYCLES_DIV_BUSY_COUNT
|
||||
PMC1 CYCLES_DIV_BUSY
|
||||
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Number of divide ops PMC0
|
||||
Avg. divide unit usage duration PMC1/PMC0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Number of divide ops = CYCLES_DIV_BUSY_COUNT
|
||||
Avg. divide unit usage duration = CYCLES_DIV_BUSY/CYCLES_DIV_BUSY_COUNT
|
||||
-
|
||||
This performance group measures the average latency of divide operations.
|
33
collectors/likwid/groups/knl/ENERGY.txt
Normal file
@@ -0,0 +1,33 @@
|
||||
SHORT Power and Energy consumption
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
TMP0 TEMP_CORE
|
||||
PWR0 PWR_PKG_ENERGY
|
||||
PWR1 PWR_PP0_ENERGY
|
||||
PWR3 PWR_DRAM_ENERGY
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Temperature [C] TMP0
|
||||
Energy [J] PWR0
|
||||
Power [W] PWR0/time
|
||||
Energy PP0 [J] PWR1
|
||||
Power PP0 [W] PWR1/time
|
||||
Energy DRAM [J] PWR3
|
||||
Power DRAM [W] PWR3/time
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Power = PWR_PKG_ENERGY / time
|
||||
Power PP0 = PWR_PP0_ENERGY / time
|
||||
Power DRAM = PWR_DRAM_ENERGY / time
|
||||
-
|
||||
Knights Landing implements the new RAPL interface. This interface makes it possible to
monitor the consumed energy at the package (socket) level.
|
||||
|
34
collectors/likwid/groups/knl/FLOPS_DP.txt
Normal file
@@ -0,0 +1,34 @@
|
||||
SHORT Double Precision MFLOP/s
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 UOPS_RETIRED_SCALAR_SIMD
|
||||
PMC1 UOPS_RETIRED_PACKED_SIMD
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
DP [MFLOP/s] (SSE assumed) 1.0E-06*((PMC1*2.0)+PMC0)/time
|
||||
DP [MFLOP/s] (AVX assumed) 1.0E-06*((PMC1*4.0)+PMC0)/time
|
||||
DP [MFLOP/s] (AVX512 assumed) 1.0E-06*((PMC1*8.0)+PMC0)/time
|
||||
Packed [MUOPS/s] 1.0E-06*(PMC1)/time
|
||||
Scalar [MUOPS/s] 1.0E-06*PMC0/time
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
DP [MFLOP/s] (SSE assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*2+UOPS_RETIRED_SCALAR_SIMD)/runtime
|
||||
DP [MFLOP/s] (AVX assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*4+UOPS_RETIRED_SCALAR_SIMD)/runtime
|
||||
DP [MFLOP/s] (AVX512 assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*8+UOPS_RETIRED_SCALAR_SIMD)/runtime
|
||||
Packed [MUOPS/s] = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD)/runtime
|
||||
Scalar [MUOPS/s] = 1.0E-06*UOPS_RETIRED_SCALAR_SIMD/runtime
|
||||
-
|
||||
AVX/SSE scalar and packed double-precision FLOP rates. The Xeon Phi (Knights Landing) provides
no way to differentiate between double- and single-precision FLOP/s. Therefore, we
assume that the printed [MFLOP/s] value is for double-precision code. Moreover, there is no way
to distinguish between SSE, AVX or AVX512 packed SIMD operations. Therefore, this group prints
the [MFLOP/s] for each SIMD width.
|
||||
WARNING: The events also count integer arithmetic.
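Because the hardware cannot tell the SIMD width apart, the group reports one estimate per assumed width. A minimal Python sketch of that calculation follows; the function name `dp_mflops` and the sample counts are illustrative assumptions, while the lane counts (2, 4, 8) are taken from the METRICS lines above.

```python
# Double-precision lanes per packed uop, as assumed by the three metrics.
DP_LANES = {"SSE": 2, "AVX": 4, "AVX512": 8}


def dp_mflops(packed_uops, scalar_uops, runtime_s):
    """Return one DP MFLOP/s estimate per assumed SIMD width."""
    return {
        isa: 1.0e-06 * (packed_uops * lanes + scalar_uops) / runtime_s
        for isa, lanes in DP_LANES.items()
    }


# Hypothetical counts for UOPS_RETIRED_PACKED_SIMD and UOPS_RETIRED_SCALAR_SIMD.
est = dp_mflops(packed_uops=1.0e9, scalar_uops=5.0e8, runtime_s=2.0)
print(est)
```

Only one of the three values can be correct for a given code, and all of them overestimate if the measured region also executes integer SIMD instructions.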
|
34
collectors/likwid/groups/knl/FLOPS_SP.txt
Normal file
@@ -0,0 +1,34 @@
|
||||
SHORT Single Precision MFLOP/s
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 UOPS_RETIRED_SCALAR_SIMD
|
||||
PMC1 UOPS_RETIRED_PACKED_SIMD
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
SP [MFLOP/s] (SSE assumed) 1.0E-06*(PMC1*4.0+PMC0)/time
|
||||
SP [MFLOP/s] (AVX assumed) 1.0E-06*(PMC1*8.0+PMC0)/time
|
||||
SP [MFLOP/s] (AVX512 assumed) 1.0E-06*(PMC1*16.0+PMC0)/time
|
||||
Packed [MUOPS/s] 1.0E-06*(PMC1)/time
|
||||
Scalar [MUOPS/s] 1.0E-06*PMC0/time
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
SP [MFLOP/s] (SSE assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*4+UOPS_RETIRED_SCALAR_SIMD)/runtime
|
||||
SP [MFLOP/s] (AVX assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*8+UOPS_RETIRED_SCALAR_SIMD)/runtime
|
||||
SP [MFLOP/s] (AVX512 assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*16+UOPS_RETIRED_SCALAR_SIMD)/runtime
|
||||
Packed [MUOPS/s] = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD)/runtime
|
||||
Scalar [MUOPS/s] = 1.0E-06*UOPS_RETIRED_SCALAR_SIMD/runtime
|
||||
-
|
||||
AVX/SSE scalar and packed single-precision FLOP rates. The Xeon Phi (Knights Landing) provides
no way to differentiate between double- and single-precision FLOP/s. Therefore, we
assume that the printed MFLOP/s value is for single-precision code. Moreover, there is no way
to distinguish between SSE, AVX or AVX512 packed SIMD operations. Therefore, this group prints
the MFLOP/s for each SIMD width.
|
||||
WARNING: The events also count integer arithmetic.
|
25
collectors/likwid/groups/knl/FRONTEND_STALLS.txt
Normal file
@@ -0,0 +1,25 @@
|
||||
SHORT Frontend stalls
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 NO_ALLOC_CYCLES_ALL
|
||||
PMC1 NO_ALLOC_CYCLES_ALL_COUNT
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Frontend stalls PMC1
|
||||
Avg. frontend stall duration [cyc] PMC0/PMC1
|
||||
Frontend stall ratio PMC0/FIXC1
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Frontend stalls = NO_ALLOC_CYCLES_ALL_COUNT
|
||||
Avg. frontend stall duration [cyc] = NO_ALLOC_CYCLES_ALL/NO_ALLOC_CYCLES_ALL_COUNT
|
||||
Frontend stall ratio = NO_ALLOC_CYCLES_ALL/CPU_CLK_UNHALTED_CORE
|
||||
-
|
||||
Frontend stalls
|
46
collectors/likwid/groups/knl/HBM.txt
Normal file
@@ -0,0 +1,46 @@
|
||||
SHORT Memory bandwidth in MBytes/s for High Bandwidth Memory (HBM)
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
EDBOX0C0 EDC_RPQ_INSERTS
|
||||
EDBOX1C0 EDC_RPQ_INSERTS
|
||||
EDBOX2C0 EDC_RPQ_INSERTS
|
||||
EDBOX3C0 EDC_RPQ_INSERTS
|
||||
EDBOX4C0 EDC_RPQ_INSERTS
|
||||
EDBOX5C0 EDC_RPQ_INSERTS
|
||||
EDBOX6C0 EDC_RPQ_INSERTS
|
||||
EDBOX7C0 EDC_RPQ_INSERTS
|
||||
EDBOX0C1 EDC_WPQ_INSERTS
|
||||
EDBOX1C1 EDC_WPQ_INSERTS
|
||||
EDBOX2C1 EDC_WPQ_INSERTS
|
||||
EDBOX3C1 EDC_WPQ_INSERTS
|
||||
EDBOX4C1 EDC_WPQ_INSERTS
|
||||
EDBOX5C1 EDC_WPQ_INSERTS
|
||||
EDBOX6C1 EDC_WPQ_INSERTS
|
||||
EDBOX7C1 EDC_WPQ_INSERTS
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Memory read bandwidth [MBytes/s] 1.0E-06*(EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0)*64.0/time
|
||||
Memory read data volume [GBytes] 1.0E-09*(EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0)*64.0
|
||||
Memory writeback bandwidth [MBytes/s] 1.0E-06*(EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)*64.0/time
|
||||
Memory writeback data volume [GBytes] 1.0E-09*(EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)*64.0
|
||||
Memory bandwidth [MBytes/s] 1.0E-06*(EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0+EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)*64.0/time
|
||||
Memory data volume [GBytes] 1.0E-09*(EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0+EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)*64.0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Memory read bandwidth [MBytes/s] = 1.0E-06*(sum(EDC_RPQ_INSERTS))*64/time
|
||||
Memory read data volume [GBytes] = 1.0E-09*(sum(EDC_RPQ_INSERTS))*64
|
||||
Memory writeback bandwidth [MBytes/s] = 1.0E-06*(sum(EDC_WPQ_INSERTS))*64/time
|
||||
Memory writeback data volume [GBytes] = 1.0E-09*(sum(EDC_WPQ_INSERTS))*64
|
||||
Memory bandwidth [MBytes/s] = 1.0E-06*(sum(EDC_RPQ_INSERTS)+sum(EDC_WPQ_INSERTS))*64/time
|
||||
Memory data volume [GBytes] = 1.0E-09*(sum(EDC_RPQ_INSERTS)+sum(EDC_WPQ_INSERTS))*64
|
||||
-
|
||||
Profiling group to measure data transfers from and to the high bandwidth memory (HBM).
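The HBM metrics sum the per-controller counters and multiply by 64 bytes per transfer, as in the METRICS lines above. The Python sketch below reproduces that arithmetic; `hbm_bandwidth` and the sample counter values are assumptions made for illustration.

```python
BYTES_PER_TRANSFER = 64.0  # factor used by every bandwidth formula in this group


def hbm_bandwidth(rpq_inserts, wpq_inserts, runtime_s):
    """Aggregate per-EDC read/write queue inserts into bandwidth and volume."""
    reads = sum(rpq_inserts)    # EDBOX<0-7>C0: EDC_RPQ_INSERTS
    writes = sum(wpq_inserts)   # EDBOX<0-7>C1: EDC_WPQ_INSERTS
    return {
        "read_mbytes_s": 1.0e-06 * reads * BYTES_PER_TRANSFER / runtime_s,
        "write_mbytes_s": 1.0e-06 * writes * BYTES_PER_TRANSFER / runtime_s,
        "total_mbytes_s": 1.0e-06 * (reads + writes) * BYTES_PER_TRANSFER / runtime_s,
        "total_gbytes": 1.0e-09 * (reads + writes) * BYTES_PER_TRANSFER,
    }


# Hypothetical counts: eight EDC units with identical traffic over one second.
bw = hbm_bandwidth([1_000_000] * 8, [500_000] * 8, runtime_s=1.0)
print(bw)
```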
|
||||
|
87
collectors/likwid/groups/knl/HBM_CACHE.txt
Normal file
@@ -0,0 +1,87 @@
|
||||
SHORT Memory bandwidth in MBytes/s for High Bandwidth Memory (HBM)
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
EDBOX0C0 EDC_RPQ_INSERTS
|
||||
EDBOX1C0 EDC_RPQ_INSERTS
|
||||
EDBOX2C0 EDC_RPQ_INSERTS
|
||||
EDBOX3C0 EDC_RPQ_INSERTS
|
||||
EDBOX4C0 EDC_RPQ_INSERTS
|
||||
EDBOX5C0 EDC_RPQ_INSERTS
|
||||
EDBOX6C0 EDC_RPQ_INSERTS
|
||||
EDBOX7C0 EDC_RPQ_INSERTS
|
||||
EDBOX0C1 EDC_WPQ_INSERTS
|
||||
EDBOX1C1 EDC_WPQ_INSERTS
|
||||
EDBOX2C1 EDC_WPQ_INSERTS
|
||||
EDBOX3C1 EDC_WPQ_INSERTS
|
||||
EDBOX4C1 EDC_WPQ_INSERTS
|
||||
EDBOX5C1 EDC_WPQ_INSERTS
|
||||
EDBOX6C1 EDC_WPQ_INSERTS
|
||||
EDBOX7C1 EDC_WPQ_INSERTS
|
||||
EUBOX0C0 EDC_MISS_CLEAN
|
||||
EUBOX1C0 EDC_MISS_CLEAN
|
||||
EUBOX2C0 EDC_MISS_CLEAN
|
||||
EUBOX3C0 EDC_MISS_CLEAN
|
||||
EUBOX4C0 EDC_MISS_CLEAN
|
||||
EUBOX5C0 EDC_MISS_CLEAN
|
||||
EUBOX6C0 EDC_MISS_CLEAN
|
||||
EUBOX7C0 EDC_MISS_CLEAN
|
||||
EUBOX0C1 EDC_MISS_DIRTY
|
||||
EUBOX1C1 EDC_MISS_DIRTY
|
||||
EUBOX2C1 EDC_MISS_DIRTY
|
||||
EUBOX3C1 EDC_MISS_DIRTY
|
||||
EUBOX4C1 EDC_MISS_DIRTY
|
||||
EUBOX5C1 EDC_MISS_DIRTY
|
||||
EUBOX6C1 EDC_MISS_DIRTY
|
||||
EUBOX7C1 EDC_MISS_DIRTY
|
||||
MBOX0C0 MC_CAS_READS
|
||||
MBOX0C1 MC_CAS_WRITES
|
||||
MBOX1C0 MC_CAS_READS
|
||||
MBOX1C1 MC_CAS_WRITES
|
||||
MBOX2C0 MC_CAS_READS
|
||||
MBOX2C1 MC_CAS_WRITES
|
||||
MBOX4C0 MC_CAS_READS
|
||||
MBOX4C1 MC_CAS_WRITES
|
||||
MBOX5C0 MC_CAS_READS
|
||||
MBOX5C1 MC_CAS_WRITES
|
||||
MBOX6C0 MC_CAS_READS
|
||||
MBOX6C1 MC_CAS_WRITES
|
||||
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
MCDRAM Memory read bandwidth [MBytes/s] 1.0E-06*((EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0)-(EUBOX0C0+EUBOX1C0+EUBOX2C0+EUBOX3C0+EUBOX4C0+EUBOX5C0+EUBOX6C0+EUBOX7C0)-(EUBOX0C1+EUBOX1C1+EUBOX2C1+EUBOX3C1+EUBOX4C1+EUBOX5C1+EUBOX6C1+EUBOX7C1))*64.0/time
|
||||
MCDRAM Memory read data volume [GBytes] 1.0E-09*((EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0)-(EUBOX0C0+EUBOX1C0+EUBOX2C0+EUBOX3C0+EUBOX4C0+EUBOX5C0+EUBOX6C0+EUBOX7C0)-(EUBOX0C1+EUBOX1C1+EUBOX2C1+EUBOX3C1+EUBOX4C1+EUBOX5C1+EUBOX6C1+EUBOX7C1))*64.0
|
||||
MCDRAM Memory writeback bandwidth [MBytes/s] 1.0E-06*((EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)-(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0))*64.0/time
|
||||
MCDRAM Memory writeback data volume [GBytes] 1.0E-09*((EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)-(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0))*64.0
|
||||
MCDRAM Memory bandwidth [MBytes/s] 1.0E-06*(((EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0)-(EUBOX0C0+EUBOX1C0+EUBOX2C0+EUBOX3C0+EUBOX4C0+EUBOX5C0+EUBOX6C0+EUBOX7C0)-(EUBOX0C1+EUBOX1C1+EUBOX2C1+EUBOX3C1+EUBOX4C1+EUBOX5C1+EUBOX6C1+EUBOX7C1))+((EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)-(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0)))*64.0/time
|
||||
MCDRAM Memory data volume [GBytes] 1.0E-09*(((EDBOX0C0+EDBOX1C0+EDBOX2C0+EDBOX3C0+EDBOX4C0+EDBOX5C0+EDBOX6C0+EDBOX7C0)-(EUBOX0C0+EUBOX1C0+EUBOX2C0+EUBOX3C0+EUBOX4C0+EUBOX5C0+EUBOX6C0+EUBOX7C0)-(EUBOX0C1+EUBOX1C1+EUBOX2C1+EUBOX3C1+EUBOX4C1+EUBOX5C1+EUBOX6C1+EUBOX7C1))+((EDBOX0C1+EDBOX1C1+EDBOX2C1+EDBOX3C1+EDBOX4C1+EDBOX5C1+EDBOX6C1+EDBOX7C1)-(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0)))*64.0
|
||||
DDR Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0)*64.0/time
|
||||
DDR Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0)*64.0
|
||||
DDR Memory writeback bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0/time
|
||||
DDR Memory writeback data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0
|
||||
DDR Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0/time
|
||||
DDR Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0
|
||||
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
MCDRAM Memory read bandwidth [MBytes/s] = 1.0E-06*(sum(EDC_RPQ_INSERTS)-sum(EDC_MISS_CLEAN)-sum(EDC_MISS_DIRTY))*64/time
MCDRAM Memory read data volume [GBytes] = 1.0E-09*(sum(EDC_RPQ_INSERTS)-sum(EDC_MISS_CLEAN)-sum(EDC_MISS_DIRTY))*64
MCDRAM Memory writeback bandwidth [MBytes/s] = 1.0E-06*(sum(EDC_WPQ_INSERTS)-sum(MC_CAS_READS))*64/time
MCDRAM Memory writeback data volume [GBytes] = 1.0E-09*(sum(EDC_WPQ_INSERTS)-sum(MC_CAS_READS))*64
MCDRAM Memory bandwidth [MBytes/s] = 1.0E-06*((sum(EDC_RPQ_INSERTS)-sum(EDC_MISS_CLEAN)-sum(EDC_MISS_DIRTY))+(sum(EDC_WPQ_INSERTS)-sum(MC_CAS_READS)))*64/time
MCDRAM Memory data volume [GBytes] = 1.0E-09*((sum(EDC_RPQ_INSERTS)-sum(EDC_MISS_CLEAN)-sum(EDC_MISS_DIRTY))+(sum(EDC_WPQ_INSERTS)-sum(MC_CAS_READS)))*64
|
||||
DDR Memory read bandwidth [MBytes/s] = 1.0E-06*(sum(MC_CAS_READS))*64/time
|
||||
DDR Memory read data volume [GBytes] = 1.0E-09*(sum(MC_CAS_READS))*64
|
||||
DDR Memory writeback bandwidth [MBytes/s] = 1.0E-06*(sum(MC_CAS_WRITES))*64/time
|
||||
DDR Memory writeback data volume [GBytes] = 1.0E-09*(sum(MC_CAS_WRITES))*64
|
||||
DDR Memory bandwidth [MBytes/s] = 1.0E-06*(sum(MC_CAS_READS)+sum(MC_CAS_WRITES))*64/time
|
||||
DDR Memory data volume [GBytes] = 1.0E-09*(sum(MC_CAS_READS)+sum(MC_CAS_WRITES))*64
|
||||
-
|
||||
Profiling group to measure data transfers from and to the high bandwidth memory (HBM).
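Unlike the plain HBM group, the MCDRAM metrics in this group subtract other counters from the raw queue inserts before converting to bytes. The following Python sketch mirrors the METRICS lines of this file; the function name `mcdram_lines` and all sample values are hypothetical, and the sketch makes no claim about why the subtraction is needed beyond what the formulas show.

```python
def mcdram_lines(rpq, wpq, miss_clean, miss_dirty, ddr_reads):
    """Return corrected MCDRAM (read, writeback) transfer counts."""
    # Reads: EDC_RPQ_INSERTS minus EDC_MISS_CLEAN and EDC_MISS_DIRTY
    read_lines = sum(rpq) - sum(miss_clean) - sum(miss_dirty)
    # Writebacks: EDC_WPQ_INSERTS minus the DDR MC_CAS_READS
    write_lines = sum(wpq) - sum(ddr_reads)
    return read_lines, write_lines


def to_mbytes_s(lines, runtime_s):
    """Convert a transfer count to MBytes/s using the 64-byte factor."""
    return 1.0e-06 * lines * 64.0 / runtime_s


# Hypothetical counts: eight EDC/EUBOX units and six DDR channels.
reads, writes = mcdram_lines(rpq=[1000] * 8, wpq=[500] * 8,
                             miss_clean=[100] * 8, miss_dirty=[50] * 8,
                             ddr_reads=[300] * 6)
print(to_mbytes_s(reads, 1.0), to_mbytes_s(writes, 1.0))
```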
|
32
collectors/likwid/groups/knl/HBM_OFFCORE.txt
Normal file
@@ -0,0 +1,32 @@
|
||||
SHORT Memory bandwidth in MBytes/s for High Bandwidth Memory (HBM)
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0:MATCH0=0x4908:MATCH1=0x3F8060 OFFCORE_RESPONSE_0_OPTIONS
|
||||
PMC1:MATCH0=0x32F7:MATCH1=0x3F8060 OFFCORE_RESPONSE_1_OPTIONS
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC1)*64.0/time
|
||||
Memory read data volume [GBytes] 1.0E-09*(PMC1)*64.0
|
||||
Memory writeback bandwidth [MBytes/s] 1.0E-06*(PMC0)*64.0/time
|
||||
Memory writeback data volume [GBytes] 1.0E-09*(PMC0)*64.0
|
||||
Memory bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
|
||||
Memory data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Memory read bandwidth [MBytes/s] = 1.0E-06*(sum(OFFCORE_RESPONSE_1_OPTIONS:MATCH0=0x32F7:MATCH1=0x3F8060))*64/time
|
||||
Memory read data volume [GBytes] = 1.0E-09*(sum(OFFCORE_RESPONSE_1_OPTIONS:MATCH0=0x32F7:MATCH1=0x3F8060))*64
|
||||
Memory writeback bandwidth [MBytes/s] = 1.0E-06*(sum(OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x4908:MATCH1=0x3F8060))*64/time
|
||||
Memory writeback data volume [GBytes] = 1.0E-09*(sum(OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x4908:MATCH1=0x3F8060))*64
|
||||
Memory bandwidth [MBytes/s] = 1.0E-06*(sum(OFFCORE_RESPONSE_1_OPTIONS:MATCH0=0x32F7:MATCH1=0x3F8060)+sum(OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x4908:MATCH1=0x3F8060))*64/time
|
||||
Memory data volume [GBytes] = 1.0E-09*(sum(OFFCORE_RESPONSE_1_OPTIONS:MATCH0=0x32F7:MATCH1=0x3F8060)+sum(OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x4908:MATCH1=0x3F8060))*64
|
||||
-
|
||||
Profiling group to measure data transfers from and to the high bandwidth memory (HBM).
|
||||
If possible, use the HBM or HBM_CACHE group because they provide more accurate counts.
|
25
collectors/likwid/groups/knl/ICACHE.txt
Normal file
@@ -0,0 +1,25 @@
|
||||
SHORT Instruction cache miss rate/ratio
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 ICACHE_ACCESSES
|
||||
PMC1 ICACHE_MISSES
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
L1I request rate PMC0/FIXC0
|
||||
L1I miss rate PMC1/FIXC0
|
||||
L1I miss ratio PMC1/PMC0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
L1I request rate = ICACHE_ACCESSES / INSTR_RETIRED_ANY
|
||||
L1I miss rate = ICACHE_MISSES / INSTR_RETIRED_ANY
|
||||
L1I miss ratio = ICACHE_MISSES / ICACHE_ACCESSES
|
||||
-
|
||||
This group measures some L1 instruction cache metrics.
|
36
collectors/likwid/groups/knl/L2.txt
Normal file
@@ -0,0 +1,36 @@
|
||||
SHORT L2 cache bandwidth in MBytes/s
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 L2_REQUESTS_REFERENCE
|
||||
PMC1:MATCH0=0x0002:MATCH1=0x1 OFFCORE_RESPONSE_0_OPTIONS
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
L2 non-RFO bandwidth [MBytes/s] 1.E-06*(PMC0)*64.0/time
|
||||
L2 non-RFO data volume [GByte] 1.E-09*PMC0*64.0
|
||||
L2 RFO bandwidth [MBytes/s] 1.E-06*(PMC1)*64.0/time
|
||||
L2 RFO data volume [GByte] 1.E-09*(PMC1)*64.0
|
||||
L2 bandwidth [MBytes/s] 1.E-06*(PMC0+PMC1)*64.0/time
|
||||
L2 data volume [GByte] 1.E-09*(PMC0+PMC1)*64.0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
L2 non-RFO bandwidth [MBytes/s] = 1.E-06*L2_REQUESTS_REFERENCE*64.0/time
|
||||
L2 non-RFO data volume [GByte] = 1.E-09*L2_REQUESTS_REFERENCE*64.0
|
||||
L2 RFO bandwidth [MBytes/s] = 1.E-06*(OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x0002:MATCH1=0x1)*64.0/time
|
||||
L2 RFO data volume [GByte] = 1.E-09*(OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x0002:MATCH1=0x1)*64.0
|
||||
L2 bandwidth [MBytes/s] = 1.E-06*(L2_REQUESTS_REFERENCE+OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x0002:MATCH1=0x1)*64.0/time
|
||||
L2 data volume [GByte] = 1.E-09*(L2_REQUESTS_REFERENCE+OFFCORE_RESPONSE_0_OPTIONS:MATCH0=0x0002:MATCH1=0x1)*64.0
|
||||
--
|
||||
The non-RFO L2 bandwidth and data volume do not contain RFOs (also called
write-allocates). The RFO bandwidth and data volume are only accurate when all
used data fits in the L2 cache. As soon as the data exceeds the L2 cache size,
the RFO metrics are too high.
Moreover, with an increasing count of measured cores, the non-RFO metrics overcount
but commonly stay within 10% error.
|
34
collectors/likwid/groups/knl/L2CACHE.txt
Normal file
@@ -0,0 +1,34 @@
|
||||
SHORT L2 cache miss rate/ratio
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 MEM_UOPS_RETIRED_L2_HIT_LOADS
|
||||
PMC1 MEM_UOPS_RETIRED_L2_MISS_LOADS
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
L2 request rate (PMC0+PMC1)/FIXC0
|
||||
L2 miss rate PMC1/FIXC0
|
||||
L2 miss ratio PMC1/(PMC0+PMC1)
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
L2 request rate = (MEM_UOPS_RETIRED_L2_HIT_LOADS+MEM_UOPS_RETIRED_L2_MISS_LOADS)/INSTR_RETIRED_ANY
|
||||
L2 miss rate = MEM_UOPS_RETIRED_L2_MISS_LOADS/INSTR_RETIRED_ANY
|
||||
L2 miss ratio = MEM_UOPS_RETIRED_L2_MISS_LOADS/(MEM_UOPS_RETIRED_L2_HIT_LOADS+MEM_UOPS_RETIRED_L2_MISS_LOADS)
|
||||
-
|
||||
This group measures the locality of your data accesses with regard to the
L2 cache. The L2 request rate tells you how data intensive your code is,
i.e. how many data accesses you have on average per instruction.
The L2 miss rate gives a measure of how often it was necessary to get
cache lines from memory. Finally, the L2 miss ratio tells you how many of your
memory references required a cache line to be loaded from a higher level.
While the data cache miss rate might be given by your algorithm, you should
try to get the data cache miss ratio as low as possible by increasing your cache
reuse.
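The METRICS formulas in these files are plain arithmetic over counter names, so they can be evaluated generically once the counter values are known. The Python sketch below does this with the standard `ast` module, restricted to the four basic operators. It is an illustration only and not how LIKWID or cc-metric-collector evaluates formulas; `eval_metric` and the sample values are hypothetical.

```python
import ast
import operator

# Only the operators that occur in the group files are allowed.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def eval_metric(formula, values):
    """Evaluate a formula such as 'PMC1/(PMC0+PMC1)' with the given counter values."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return float(values[node.id])  # KeyError if a counter is missing
        raise ValueError("unsupported expression in formula")
    return walk(ast.parse(formula, mode="eval"))


# Hypothetical counter values for the L2CACHE group.
counters = {"FIXC0": 1000.0, "PMC0": 90.0, "PMC1": 10.0}
request_rate = eval_metric("(PMC0+PMC1)/FIXC0", counters)
miss_rate = eval_metric("PMC1/FIXC0", counters)
miss_ratio = eval_metric("PMC1/(PMC0+PMC1)", counters)
print(request_rate, miss_rate, miss_ratio)
```

Walking the syntax tree instead of calling `eval` keeps arbitrary code in a group file from being executed.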
|
||||
|
47
collectors/likwid/groups/knl/MEM.txt
Normal file
@@ -0,0 +1,47 @@
|
||||
SHORT Memory bandwidth in MBytes/s
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
MBOX0C0 MC_CAS_READS
|
||||
MBOX0C1 MC_CAS_WRITES
|
||||
MBOX1C0 MC_CAS_READS
|
||||
MBOX1C1 MC_CAS_WRITES
|
||||
MBOX2C0 MC_CAS_READS
|
||||
MBOX2C1 MC_CAS_WRITES
|
||||
MBOX4C0 MC_CAS_READS
|
||||
MBOX4C1 MC_CAS_WRITES
|
||||
MBOX5C0 MC_CAS_READS
|
||||
MBOX5C1 MC_CAS_WRITES
|
||||
MBOX6C0 MC_CAS_READS
|
||||
MBOX6C1 MC_CAS_WRITES
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0)*64.0/time
|
||||
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0)*64.0
|
||||
Memory writeback bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0/time
|
||||
Memory writeback data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0
|
||||
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0/time
|
||||
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX4C1+MBOX5C1+MBOX6C1)*64.0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Memory read bandwidth [MBytes/s] = 1.0E-06*(sum(MC_CAS_READS))*64/time
|
||||
Memory read data volume [GBytes] = 1.0E-09*(sum(MC_CAS_READS))*64
|
||||
Memory writeback bandwidth [MBytes/s] = 1.0E-06*(sum(MC_CAS_WRITES))*64/time
|
||||
Memory writeback data volume [GBytes] = 1.0E-09*(sum(MC_CAS_WRITES))*64
|
||||
Memory bandwidth [MBytes/s] = 1.0E-06*(sum(MC_CAS_READS)+sum(MC_CAS_WRITES))*64/time
|
||||
Memory data volume [GBytes] = 1.0E-09*(sum(MC_CAS_READS)+sum(MC_CAS_WRITES))*64
|
||||
-
|
||||
Profiling group to measure L2 to MEM load cache bandwidth. The bandwidth is computed from the
number of cache lines allocated in the L2 cache. Since there is no possibility to retrieve
the evicted cache lines, this group measures only the load cache bandwidth. The
writeback metrics count only modified cache lines that are written back to go to
exclusive state.
The group also outputs the total load and writeback data volume transferred between memory and L2.
|
||||
|
27
collectors/likwid/groups/knl/TLB_DATA.txt
Normal file
@@ -0,0 +1,27 @@
|
||||
SHORT L2 data TLB miss rate/ratio
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 PAGE_WALKS_DTLB_COUNT
|
||||
PMC1 PAGE_WALKS_DTLB_CYCLES
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
L1 DTLB misses PMC0
|
||||
L1 DTLB miss rate PMC0/FIXC0
|
||||
L1 DTLB miss duration [Cyc] PMC1/PMC0
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
L1 DTLB misses = PAGE_WALKS_DTLB_COUNT
|
||||
L1 DTLB miss rate = PAGE_WALKS_DTLB_COUNT / INSTR_RETIRED_ANY
|
||||
L1 DTLB miss duration [Cyc] = PAGE_WALKS_DTLB_CYCLES / PAGE_WALKS_DTLB_COUNT
|
||||
-
|
||||
The DTLB miss rate gives a measure of how often a DTLB miss (load or store) occurred
per instruction. The duration measures how long a page walk took, in cycles.
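The duration formula divides by the number of page walks, which can be zero for short or TLB-friendly code regions. The Python sketch below shows one way to guard against that. Returning 0.0 in this case is a choice made for this example, not documented LIKWID behavior, and `dtlb_metrics` is a hypothetical name.

```python
def dtlb_metrics(walk_count, walk_cycles, instr_retired):
    """Compute the TLB_DATA metrics, avoiding division by zero."""
    # L1 DTLB miss rate = PAGE_WALKS_DTLB_COUNT / INSTR_RETIRED_ANY
    miss_rate = walk_count / instr_retired if instr_retired else 0.0
    # L1 DTLB miss duration [Cyc] = PAGE_WALKS_DTLB_CYCLES / PAGE_WALKS_DTLB_COUNT
    duration = walk_cycles / walk_count if walk_count else 0.0
    return {"misses": walk_count, "miss_rate": miss_rate, "miss_duration_cyc": duration}


# Hypothetical samples: one interval with page walks and one without.
busy = dtlb_metrics(walk_count=50, walk_cycles=1500, instr_retired=100_000)
idle = dtlb_metrics(walk_count=0, walk_cycles=0, instr_retired=1000)
print(busy, idle)
```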
|
||||
|
27
collectors/likwid/groups/knl/TLB_INSTR.txt
Normal file
@@ -0,0 +1,27 @@
|
||||
SHORT L1 Instruction TLB miss rate/ratio
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 PAGE_WALKS_ITLB_COUNT
|
||||
PMC1 PAGE_WALKS_ITLB_CYCLES
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
L1 ITLB misses PMC0
|
||||
L1 ITLB miss rate PMC0/FIXC0
|
||||
L1 ITLB miss duration [Cyc] PMC1/PMC0
|
||||
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
L1 ITLB misses = PAGE_WALKS_ITLB_COUNT
|
||||
L1 ITLB miss rate = PAGE_WALKS_ITLB_COUNT / INSTR_RETIRED_ANY
|
||||
L1 ITLB miss duration [Cyc] = PAGE_WALKS_ITLB_CYCLES / PAGE_WALKS_ITLB_COUNT
|
||||
-
|
||||
The ITLB miss rate gives a measure of how often an ITLB miss occurred
per instruction. The duration measures how long a page walk took, in cycles.
|
25
collectors/likwid/groups/knl/UOPS_STALLS.txt
Normal file
@@ -0,0 +1,25 @@
|
||||
SHORT UOP retirement stalls
|
||||
|
||||
EVENTSET
|
||||
FIXC0 INSTR_RETIRED_ANY
|
||||
FIXC1 CPU_CLK_UNHALTED_CORE
|
||||
FIXC2 CPU_CLK_UNHALTED_REF
|
||||
PMC0 UOPS_RETIRED_STALLED_CYCLES
|
||||
PMC1 UOPS_RETIRED_STALLS
|
||||
|
||||
METRICS
|
||||
Runtime (RDTSC) [s] time
|
||||
Runtime unhalted [s] FIXC1*inverseClock
|
||||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||||
CPI FIXC1/FIXC0
|
||||
Number of stalls PMC1
|
||||
Avg. stall duration [cyc] PMC0/PMC1
|
||||
Stall ratio PMC0/FIXC1
|
||||
|
||||
LONG
|
||||
Formulas:
|
||||
Number of stalls = UOPS_RETIRED_STALLS
|
||||
Avg. stall duration [cyc] = UOPS_RETIRED_STALLED_CYCLES/UOPS_RETIRED_STALLS
|
||||
Stall ratio = UOPS_RETIRED_STALLED_CYCLES/CPU_CLK_UNHALTED_CORE
|
||||
-
|
||||
This group measures stalls in the UOP retirement.
|