Mirror of https://github.com/ClusterCockpit/cc-metric-collector.git (synced 2025-07-31 08:56:06 +02:00)
Add likwid collector
30
collectors/likwid/groups/power9/BRANCH.txt
Normal file
@@ -0,0 +1,30 @@
SHORT Branch prediction miss rate/ratio

EVENTSET
PMC1 PM_BR_PRED
PMC2 PM_IOPS_CMPL
PMC3 PM_BR_MPRED_CMPL
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
Branch rate (PMC1)/PMC4
Branch misprediction rate PMC3/PMC4
Branch misprediction ratio PMC3/(PMC1)
Instructions per branch PMC4/(PMC1)
Operations per branch PMC2/PMC1

LONG
Formulas:
Branch rate = PM_BR_PRED/PM_RUN_INST_CMPL
Branch misprediction rate = PM_BR_MPRED_CMPL/PM_RUN_INST_CMPL
Branch misprediction ratio = PM_BR_MPRED_CMPL/PM_BR_PRED
Instructions per branch = PM_RUN_INST_CMPL/PM_BR_PRED
-
The rates state how often on average a branch or a mispredicted branch occurred
per instruction retired in total. The branch misprediction ratio directly relates
the number of mispredicted branches to the number of all branch instructions.
Instructions per branch is 1/Branch rate.
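The group files in this commit all share one layout: a `SHORT` line, then `EVENTSET`, `METRICS` and `LONG` sections. The following is a minimal Python sketch of how such a file could be split into its parts. It is not the collector's actual parser. The function name `parse_group` is hypothetical, and the code assumes that each metric formula is the last whitespace-separated token on its line, which holds for every file shown in this commit.

```python
def parse_group(text):
    """Split a LIKWID-style group file into its four parts (illustrative sketch)."""
    group = {"short": "", "eventset": {}, "metrics": [], "long": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate the sections
        if section is None and line.startswith("SHORT "):
            group["short"] = line[len("SHORT "):].strip()
        elif line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif section == "EVENTSET":
            # "<counter> <event>", e.g. "PMC1 PM_BR_PRED"
            counter, event = line.split(None, 1)
            group["eventset"][counter] = event
        elif section == "METRICS":
            # Assumption: the formula contains no spaces and is the last token.
            name, formula = line.rsplit(None, 1)
            group["metrics"].append((name, formula))
        elif section == "LONG":
            group["long"].append(line)
    return group
```

Splitting metric lines from the right keeps names such as `Runtime (RDTSC) [s]` intact even though they contain spaces.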
23
collectors/likwid/groups/power9/DATA.txt
Normal file
@@ -0,0 +1,23 @@
SHORT Load to store ratio

EVENTSET
PMC3 PM_LD_CMPL
PMC1 PM_ST_CMPL
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
Load to store ratio PMC3/PMC1
Load rate PMC3/PMC4
Store rate PMC1/PMC4

LONG
Formulas:
Load to store ratio = PM_LD_CMPL/PM_ST_CMPL
Load rate = PM_LD_CMPL/PM_RUN_INST_CMPL
Store rate = PM_ST_CMPL/PM_RUN_INST_CMPL
-
This group measures the ratio of completed loads to completed stores and the load and store rates per completed instruction.
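Each `METRICS` formula is plain arithmetic over counter names (`PMC3`, `MBOX0C0`, ...) and the special name `time`. The sketch below shows one way such a formula string could be evaluated once counter values are known. It is an illustration only and says nothing about how the collector evaluates formulas internally. The function name `eval_formula` is hypothetical, and the code assumes that formulas use only `+`, `-`, `*`, `/`, parentheses, numeric literals and identifier-like names.

```python
import ast
import operator

# Only the four basic arithmetic operators are accepted.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_formula(formula, values):
    """Evaluate a metric formula such as 'PMC3/PMC1' against a dict of counter values."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return float(values[node.id])  # KeyError if a counter is missing
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported element in formula: " + ast.dump(node))
    return walk(ast.parse(formula, mode="eval"))
```

For the DATA group above, `eval_formula("PMC3/PMC1", {"PMC3": 300, "PMC1": 100})` yields the load to store ratio for 300 completed loads and 100 completed stores. Walking the syntax tree instead of calling `eval` rejects anything that is not simple arithmetic.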
25
collectors/likwid/groups/power9/FLOPS.txt
Normal file
@@ -0,0 +1,25 @@
SHORT SP/DP scalar/vector MFlops/s

EVENTSET
PMC3 PM_FLOP_CMPL
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
SP/DP [MFLOP/s] (scalar assumed) 1.0E-06*PMC3*2.0/time
SP [MFLOP/s] (vector assumed) 1.0E-06*PMC3*8.0/time
DP [MFLOP/s] (vector assumed) 1.0E-06*PMC3*4.0/time

LONG
Formulas:
CPI = PM_RUN_CYC/PM_RUN_INST_CMPL
SP/DP [MFLOP/s] (scalar assumed) = 1.0E-06*PM_FLOP_CMPL*2.0/runtime
SP [MFLOP/s] (vector assumed) = 1.0E-06*PM_FLOP_CMPL*8.0/runtime
DP [MFLOP/s] (vector assumed) = 1.0E-06*PM_FLOP_CMPL*4.0/runtime
--
This group counts floating-point operations. All metrics are derived from a
single event, PM_FLOP_CMPL, so if your code mixes SP and DP or
scalar and vector operations, the counts won't be exact. For pure codes
the counts are fairly accurate (e.g. when using likwid-bench).
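Because all three MFLOP/s metrics of the FLOPS group scale the same `PM_FLOP_CMPL` count by a fixed factor (2, 8 or 4), they are three readings of one measurement rather than three independent measurements. A small Python sketch of that arithmetic follows. The function name `flops_metrics` is hypothetical and the factors are copied from the formulas above.

```python
def flops_metrics(pm_flop_cmpl, runtime_s):
    """Return the three FLOPS-group estimates derived from one event count."""
    base = 1.0e-06 * pm_flop_cmpl / runtime_s  # counted instructions per microsecond
    return {
        "SP/DP [MFLOP/s] (scalar assumed)": base * 2.0,
        "SP [MFLOP/s] (vector assumed)": base * 8.0,
        "DP [MFLOP/s] (vector assumed)": base * 4.0,
    }
```

Only the value whose assumption matches the code under test is meaningful. For a kernel that mixes precisions or scalar and vector instructions, none of the three is exact.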
21
collectors/likwid/groups/power9/FLOPS_FMA.txt
Normal file
@@ -0,0 +1,21 @@
SHORT Floating-point operations with scalar FMA instructions

EVENTSET
PMC3 PM_FMA_CMPL
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
Scalar FMAs PMC3
Scalar FMA [MFLOP/s] 1E-6*(PMC3)*2.0/time

LONG
Formulas:
Scalar FMAs = PM_FMA_CMPL
Scalar FMA [MFLOP/s] = 1E-6*(PM_FMA_CMPL)*2.0/runtime
--
This group counts scalar FMA operations.
PM_FMA_CMPL: Two-flops instruction completed (fmadd, fnmadd, fmsub,
fnmsub). Scalar instructions only.
23
collectors/likwid/groups/power9/FLOPS_VSX.txt
Normal file
@@ -0,0 +1,23 @@
SHORT Vectorized MFlops/s

EVENTSET
PMC1 PM_VSU_FIN
PMC3 PM_VECTOR_FLOP_CMPL
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
SP [MFLOP/s] (assumed) 1.0E-06*(PMC3*8.0)/time
DP [MFLOP/s] (assumed) 1.0E-06*(PMC3*4.0)/time
Vector MIOPS/s 1.0E-06*(PMC1)/time

LONG
Formulas:
CPI = PM_RUN_CYC/PM_RUN_INST_CMPL
SP [MFLOP/s] (assumed) = 1.0E-06*(PM_VECTOR_FLOP_CMPL*8)/runtime
DP [MFLOP/s] (assumed) = 1.0E-06*(PM_VECTOR_FLOP_CMPL*4)/runtime
Vector MIOPS/s = 1.0E-06*(PM_VSU_FIN)/runtime
--
This group measures vector operations. It is not possible to differentiate between SP and DP.
22
collectors/likwid/groups/power9/ICACHE.txt
Normal file
@@ -0,0 +1,22 @@
SHORT Instruction cache miss rate/ratio

EVENTSET
PMC0 PM_INST_FROM_L1
PMC1 PM_L1_ICACHE_MISS
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
L1I request rate PMC0/PMC4
L1I miss rate PMC1/PMC4
L1I miss ratio PMC1/PMC0

LONG
Formulas:
L1I request rate = PM_INST_FROM_L1/PM_RUN_INST_CMPL
L1I miss rate = PM_L1_ICACHE_MISS/PM_RUN_INST_CMPL
L1I miss ratio = PM_L1_ICACHE_MISS/PM_INST_FROM_L1
-
This group measures the request rate, miss rate and miss ratio of the L1 instruction cache.
33
collectors/likwid/groups/power9/L2CACHE.txt
Normal file
@@ -0,0 +1,33 @@
SHORT L2 cache miss rate/ratio

EVENTSET
PMC1 PM_L2_LD_MISS
PMC2 PM_L2_LD_DISP
PMC3 PM_L2_ST_DISP
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC


METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
L2 request rate (PMC2+PMC3)/PMC4
L2 load miss rate PMC1/PMC4
L2 load miss ratio PMC1/(PMC2+PMC3)

LONG
Formulas:
L2 request rate = (PM_L2_LD_DISP+PM_L2_ST_DISP)/PM_RUN_INST_CMPL
L2 load miss rate = (PM_L2_LD_MISS)/PM_RUN_INST_CMPL
L2 load miss ratio = (PM_L2_LD_MISS)/(PM_L2_LD_DISP+PM_L2_ST_DISP)
-
This group measures the locality of your data accesses with regard to the
L2 cache. The L2 request rate tells you how data intensive your code is,
i.e. how many data accesses you have on average per instruction.
The L2 load miss rate measures how often it was necessary to get
cache lines from memory. Finally, the L2 load miss ratio tells you how many of your
memory references required a cache line to be loaded from a higher level.
While the data cache miss rate might be given by your algorithm, you should
try to get the data cache miss ratio as low as possible by increasing your cache reuse.
23
collectors/likwid/groups/power9/L2LOAD.txt
Normal file
@@ -0,0 +1,23 @@
SHORT L2 cache bandwidth in MBytes/s

EVENTSET
PMC0 PM_L2_LD
PMC2 PM_L2_INST
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC


METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
L2 load bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC2)*128.0/time
L2 load data volume [GBytes] 1.0E-09*(PMC0+PMC2)*128.0

LONG
Formulas:
CPI = PM_RUN_CYC/PM_RUN_INST_CMPL
L2 load bandwidth [MBytes/s] = 1.0E-06*(PM_L2_LD+PM_L2_INST)*128.0/runtime
L2 load data volume [GBytes] = 1.0E-09*(PM_L2_LD+PM_L2_INST)*128.0
-
Profiling group to measure L2 load cache bandwidth. The bandwidth is computed from the
number of cache lines loaded from the L2 cache to L1.
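The L2LOAD, L2STORE and L3 groups all turn a count of transferred cache lines into bytes by multiplying with 128.0, the cache line size the formulas use. The Python sketch below works through the L2LOAD case. The function name `l2_load_metrics` and the constant name are hypothetical, and the factor is taken directly from the formulas above.

```python
CACHELINE_BYTES = 128.0  # factor used by the L2LOAD formulas

def l2_load_metrics(pm_l2_ld, pm_l2_inst, runtime_s):
    """Compute L2 load bandwidth and data volume from cache line counts."""
    lines = pm_l2_ld + pm_l2_inst              # data plus instruction loads
    volume_bytes = lines * CACHELINE_BYTES
    return {
        "L2 load bandwidth [MBytes/s]": 1.0e-06 * volume_bytes / runtime_s,
        "L2 load data volume [GBytes]": 1.0e-09 * volume_bytes,
    }
```

The data volume does not depend on the runtime, so it can be compared across runs of different length, while the bandwidth is an average over the measured interval.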
22
collectors/likwid/groups/power9/L2STORE.txt
Normal file
@@ -0,0 +1,22 @@
SHORT L2 cache bandwidth in MBytes/s

EVENTSET
PMC0 PM_L2_ST
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC


METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
L2 store bandwidth [MBytes/s] 1.0E-06*(PMC0)*128.0/time
L2 store data volume [GBytes] 1.0E-09*(PMC0)*128.0

LONG
Formulas:
CPI = PM_RUN_CYC/PM_RUN_INST_CMPL
L2 store bandwidth [MBytes/s] = 1.0E-06*(PM_L2_ST)*128.0/runtime
L2 store data volume [GBytes] = 1.0E-09*(PM_L2_ST)*128.0
-
Profiling group to measure L2 store cache bandwidth. The bandwidth is computed from the
number of cache lines stored from the L1 cache to L2.
29
collectors/likwid/groups/power9/L3.txt
Normal file
@@ -0,0 +1,29 @@
SHORT L3 cache bandwidth in MBytes/s

EVENTSET
PMC0 PM_L3_LD_PREF
PMC3 PM_DATA_FROM_L3
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC


METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
L3D load bandwidth [MBytes/s] 1.0E-06*(PMC3+PMC0)*128.0/time
L3D load data volume [GBytes] 1.0E-09*(PMC3+PMC0)*128.0
Loads from local L3 per cycle 100.0*(PMC3+PMC0)/PMC5

LONG
Formulas:
CPI = PM_RUN_CYC/PM_RUN_INST_CMPL
L3D load bandwidth [MBytes/s] = 1.0E-06*(PM_DATA_FROM_L3+PM_L3_LD_PREF)*128.0/runtime
L3D load data volume [GBytes] = 1.0E-09*(PM_DATA_FROM_L3+PM_L3_LD_PREF)*128.0
Loads from local L3 per cycle = 100.0*(PM_DATA_FROM_L3+PM_L3_LD_PREF)/PM_RUN_CYC
-
Profiling group to measure L3 cache bandwidth. The bandwidth is computed from the
number of cache lines loaded from the L3 to the L2 data cache. There is currently no
event to get the evicted data volume.
47
collectors/likwid/groups/power9/MEM.txt
Normal file
@@ -0,0 +1,47 @@
SHORT Main memory bandwidth in MBytes/s

EVENTSET
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC
MBOX0C0 PM_MBA0_READ_BYTES
MBOX0C1 PM_MBA0_WRITE_BYTES
MBOX1C0 PM_MBA1_READ_BYTES
MBOX1C1 PM_MBA1_WRITE_BYTES
MBOX2C0 PM_MBA2_READ_BYTES
MBOX2C1 PM_MBA2_WRITE_BYTES
MBOX3C0 PM_MBA3_READ_BYTES
MBOX3C1 PM_MBA3_WRITE_BYTES
MBOX4C0 PM_MBA4_READ_BYTES
MBOX4C1 PM_MBA4_WRITE_BYTES
MBOX5C0 PM_MBA5_READ_BYTES
MBOX5C1 PM_MBA5_WRITE_BYTES
MBOX6C0 PM_MBA6_READ_BYTES
MBOX6C1 PM_MBA6_WRITE_BYTES
MBOX7C0 PM_MBA7_READ_BYTES
MBOX7C1 PM_MBA7_WRITE_BYTES



METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0

LONG
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(PM_MBAx_READ_BYTES))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(PM_MBAx_READ_BYTES))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(PM_MBAx_WRITE_BYTES))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(PM_MBAx_WRITE_BYTES))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(PM_MBAx_READ_BYTES)+SUM(PM_MBAx_WRITE_BYTES))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(PM_MBAx_READ_BYTES)+SUM(PM_MBAx_WRITE_BYTES))*64.0
-
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Since this group is based on uncore events, measurements are only possible on a
per-socket basis. Some of the counters may not be available on your system.
The group also outputs the total data volume transferred from main memory.
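The MEM formulas sum the read counters (`MBOXxC0`) and write counters (`MBOXxC1`) of all eight memory channels, and the description notes that some counters may be missing on a given system. The Python sketch below shows how the sums could be formed over whichever counters are actually present. The function name `mem_metrics` is hypothetical, and treating an absent counter as contributing nothing is an assumption of this sketch, not documented behaviour.

```python
import re

def mem_metrics(counters, runtime_s):
    """Aggregate MBOX<n>C0 (read) and MBOX<n>C1 (write) values into the MEM metrics."""
    # Other counters in the dict (e.g. PMC4, PMC5) do not match and are ignored.
    reads = sum(v for k, v in counters.items() if re.fullmatch(r"MBOX\d+C0", k))
    writes = sum(v for k, v in counters.items() if re.fullmatch(r"MBOX\d+C1", k))
    scale = 64.0  # factor taken from the MEM formulas
    return {
        "Memory read bandwidth [MBytes/s]": 1.0e-06 * reads * scale / runtime_s,
        "Memory read data volume [GBytes]": 1.0e-09 * reads * scale,
        "Memory write bandwidth [MBytes/s]": 1.0e-06 * writes * scale / runtime_s,
        "Memory write data volume [GBytes]": 1.0e-09 * writes * scale,
        "Memory bandwidth [MBytes/s]": 1.0e-06 * (reads + writes) * scale / runtime_s,
        "Memory data volume [GBytes]": 1.0e-09 * (reads + writes) * scale,
    }
```

Since the counters are per socket, a collector would compute these values once per socket rather than once per hardware thread.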
42
collectors/likwid/groups/power9/TLB_DATA.txt
Normal file
@@ -0,0 +1,42 @@
SHORT L1 Data TLB miss rate/ratio

EVENTSET
PMC0 PM_LSU_DTLB_MISS_16G_1G
PMC1 PM_LSU_DTLB_MISS_4K
PMC2 PM_LSU_DTLB_MISS_64K
PMC3 PM_LSU_DTLB_MISS_16M_2M
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
L1 DTLB 4K misses PMC1
L1 DTLB 4K miss rate PMC1/PMC4
L1 DTLB 4K miss ratio [%] (PMC1/(PMC0+PMC1+PMC2+PMC3))*100.0
L1 DTLB 64K misses PMC2
L1 DTLB 64K miss rate PMC2/PMC4
L1 DTLB 64K miss ratio [%] (PMC2/(PMC0+PMC1+PMC2+PMC3))*100.0
L1 DTLB 16M/2M misses PMC3
L1 DTLB 16M/2M miss rate PMC3/PMC4
L1 DTLB 16M/2M miss ratio [%] (PMC3/(PMC0+PMC1+PMC2+PMC3))*100.0
L1 DTLB 16G/1G misses PMC0
L1 DTLB 16G/1G miss rate PMC0/PMC4
L1 DTLB 16G/1G miss ratio [%] (PMC0/(PMC0+PMC1+PMC2+PMC3))*100.0

LONG
Formulas:
L1 DTLB 4K misses = PM_LSU_DTLB_MISS_4K
L1 DTLB 4K miss rate = PM_LSU_DTLB_MISS_4K/PM_RUN_INST_CMPL
L1 DTLB 4K miss ratio [%] = (PM_LSU_DTLB_MISS_4K/(PM_LSU_DTLB_MISS_4K+PM_LSU_DTLB_MISS_64K+PM_LSU_DTLB_MISS_16M_2M+PM_LSU_DTLB_MISS_16G_1G))*100
L1 DTLB 64K misses = PM_LSU_DTLB_MISS_64K
L1 DTLB 64K miss rate = PM_LSU_DTLB_MISS_64K/PM_RUN_INST_CMPL
L1 DTLB 64K miss ratio [%] = (PM_LSU_DTLB_MISS_64K/(PM_LSU_DTLB_MISS_4K+PM_LSU_DTLB_MISS_64K+PM_LSU_DTLB_MISS_16M_2M+PM_LSU_DTLB_MISS_16G_1G))*100
L1 DTLB 16M/2M misses = PM_LSU_DTLB_MISS_16M_2M
L1 DTLB 16M/2M miss rate = PM_LSU_DTLB_MISS_16M_2M/PM_RUN_INST_CMPL
L1 DTLB 16M/2M miss ratio [%] = (PM_LSU_DTLB_MISS_16M_2M/(PM_LSU_DTLB_MISS_4K+PM_LSU_DTLB_MISS_64K+PM_LSU_DTLB_MISS_16M_2M+PM_LSU_DTLB_MISS_16G_1G))*100
L1 DTLB 16G/1G misses = PM_LSU_DTLB_MISS_16G_1G
L1 DTLB 16G/1G miss rate = PM_LSU_DTLB_MISS_16G_1G/PM_RUN_INST_CMPL
L1 DTLB 16G/1G miss ratio [%] = (PM_LSU_DTLB_MISS_16G_1G/(PM_LSU_DTLB_MISS_4K+PM_LSU_DTLB_MISS_64K+PM_LSU_DTLB_MISS_16M_2M+PM_LSU_DTLB_MISS_16G_1G))*100
-
This group measures the data TLB misses for different page sizes.
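In the TLB_DATA metrics, every "miss ratio" divides the misses of one page size by the sum of the misses of all four page sizes, so each value is that page size's share of all DTLB misses. The Python sketch below makes this explicit. The function name `dtlb_miss_shares` is hypothetical, and returning zeros when no misses were counted is a choice made for this sketch to avoid a division by zero.

```python
def dtlb_miss_shares(misses):
    """Map page-size label -> share [%] of all DTLB misses, given label -> miss count."""
    total = sum(misses.values())
    if total == 0:
        # No misses at all: every share is defined as 0 here (sketch-level choice).
        return {size: 0.0 for size in misses}
    return {size: (count / total) * 100.0 for size, count in misses.items()}
```

Because the four values share one denominator, they add up to 100 whenever at least one miss was counted.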
21
collectors/likwid/groups/power9/TLB_INSTR.txt
Normal file
@@ -0,0 +1,21 @@
SHORT L1 Instruction TLB miss rate/ratio

EVENTSET
PMC3 PM_ITLB_MISS
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
Runtime (RDTSC) [s] time
CPI PMC5/PMC4
L1 ITLB misses PMC3
L1 ITLB miss rate PMC3/PMC4

LONG
Formulas:
L1 ITLB misses = PM_ITLB_MISS
L1 ITLB miss rate = PM_ITLB_MISS/PM_RUN_INST_CMPL
-
This group measures the reloads of the instruction TLB.
Misses to the HPT are counted once while misses in the Radix
tree count the number of tree levels traversed.
22
collectors/likwid/groups/power9/USEFUL.txt
Normal file
@@ -0,0 +1,22 @@
SHORT Rate of useful instructions

EVENTSET
PMC0 PM_RUN_SPURR
PMC1 PM_INST_DISP
PMC3 PM_RUN_PURR
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC

METRICS
CPI PMC5/PMC4
Useful instr. rate [%] (PMC4/PMC1)*100.0
Processor Utilization [%] (PMC0/PMC3)*100.0


LONG
Formulas:
Useful instr. rate [%] = (PM_RUN_INST_CMPL/PM_INST_DISP)*100
Processor Utilization [%] = (PM_RUN_SPURR/PM_RUN_PURR)*100
--
This performance group shows the overhead of speculative
execution of instructions and the processor utilization.