mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-08-01 00:56:26 +02:00
Add likwid collector
32
collectors/likwid/groups/arm8_tx2/BRANCH.txt
Normal file
@@ -0,0 +1,32 @@
SHORT Branch prediction miss rate/ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BR_PRED
PMC3 BR_MIS_PRED
PMC4 INST_SPEC


METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
Branch rate PMC2/PMC0
Branch misprediction rate PMC3/PMC0
Branch misprediction ratio PMC3/(PMC2+PMC3)
Instructions per branch PMC0/(PMC2+PMC3)

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
Branch rate = BR_PRED/INST_RETIRED
Branch misprediction rate = BR_MIS_PRED/INST_RETIRED
Branch misprediction ratio = BR_MIS_PRED/(BR_PRED+BR_MIS_PRED)
Instructions per branch = INST_RETIRED/(BR_PRED+BR_MIS_PRED)
-
The rates state how often on average a branch or a mispredicted branch occurred
per instruction retired in total. The branch misprediction ratio directly
states what fraction of all branch instructions were mispredicted.
Instructions per branch is the inverse of the total branch rate, (BR_PRED+BR_MIS_PRED)/INST_RETIRED.

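As an illustration only, the BRANCH metrics can be reproduced from raw counter values with a few lines of Python. The function name and the counter readings are hypothetical and are not part of this commit; the formulas are the ones from the METRICS section of BRANCH.txt.

```python
def branch_metrics(inst_retired, br_pred, br_mis_pred):
    """Derive the BRANCH group metrics from raw counter values."""
    # The group treats BR_PRED + BR_MIS_PRED as the total number of branches.
    branches = br_pred + br_mis_pred
    return {
        "Branch rate": br_pred / inst_retired,
        "Branch misprediction rate": br_mis_pred / inst_retired,
        "Branch misprediction ratio": br_mis_pred / branches,
        "Instructions per branch": inst_retired / branches,
    }


# Hypothetical readings, chosen only to keep the arithmetic easy to follow.
metrics = branch_metrics(inst_retired=1_000_000, br_pred=180_000, br_mis_pred=20_000)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the rates are normalized to retired instructions, while the ratio is normalized to branches, so the two can move independently.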
25
collectors/likwid/groups/arm8_tx2/DATA.txt
Normal file
@@ -0,0 +1,25 @@
SHORT Load to store ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 LD_RETIRED
PMC3 ST_RETIRED

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
Load to store ratio PMC2/PMC3
Load ratio PMC2/PMC0
Store ratio PMC3/PMC0

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
Load to store ratio = LD_RETIRED / ST_RETIRED
Load ratio = LD_RETIRED / INST_RETIRED
Store ratio = ST_RETIRED / INST_RETIRED
-
This is a metric to determine your load to store ratio.

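All group files in this commit share the same layout: a SHORT line, an EVENTSET section mapping counters to events, a METRICS section with one formula per line, and a free-text LONG section. The following is a minimal sketch of reading that layout in Python. It is not the parser used by LIKWID or by the collector; `parse_group` is a hypothetical helper, and it assumes that a metric formula contains no whitespace, which holds for the files shown here.

```python
def parse_group(text):
    """Split a performance group file into SHORT, EVENTSET, METRICS and LONG parts."""
    group = {"short": "", "events": {}, "metrics": [], "long": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if section is None and line.startswith("SHORT"):
            group["short"] = line[len("SHORT"):].strip()
        elif section != "LONG" and line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif section == "LONG":
            group["long"].append(raw)          # keep the description verbatim
        elif not line:
            continue                           # blank separator line
        elif section == "EVENTSET":
            counter, event = line.split()      # e.g. "PMC2 LD_RETIRED"
            group["events"][counter] = event
        elif section == "METRICS":
            # Assumption: the formula is the last whitespace-separated token.
            name, formula = line.rsplit(None, 1)
            group["metrics"].append((name, formula))
    return group


DATA_GROUP = """SHORT Load to store ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 LD_RETIRED
PMC3 ST_RETIRED

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
Load to store ratio PMC2/PMC3
Load ratio PMC2/PMC0
Store ratio PMC3/PMC0

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
"""

group = parse_group(DATA_GROUP)
print(group["events"])
print(group["metrics"])
```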
28
collectors/likwid/groups/arm8_tx2/FLOPS_DP.txt
Normal file
@@ -0,0 +1,28 @@
SHORT Double Precision MFLOP/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 VFP_SPEC
PMC3 ASE_SPEC

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
DP [MFLOP/s] 1.0E-06*(PMC3*2.0+PMC2)/time
NEON DP [MFLOP/s] 1.0E-06*(PMC3*2.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC2/time
Vectorization ratio 100*(PMC3)/(PMC2+PMC3)

LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2+VFP_SPEC)/runtime
NEON DP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2)/runtime
Packed [MUOPS/s] = 1.0E-06*(ASE_SPEC)/runtime
Scalar [MUOPS/s] = 1.0E-06*VFP_SPEC/runtime
Vectorization ratio = 100*(ASE_SPEC)/(ASE_SPEC+VFP_SPEC)
-
NEON scalar and packed double precision FLOP rates.

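A small sketch of how the FLOPS_DP metrics follow from the two speculative-operation counters. The function and the sample values are hypothetical; the factor of 2.0 FLOPs per packed operation is the one used in the METRICS lines of FLOPS_DP.txt.

```python
def dp_flop_metrics(vfp_spec, ase_spec, runtime_s, flops_per_packed_op=2.0):
    """Compute the FLOPS_DP metrics from VFP_SPEC (scalar) and ASE_SPEC (packed)."""
    packed_flops = ase_spec * flops_per_packed_op
    return {
        "DP [MFLOP/s]": 1.0e-6 * (packed_flops + vfp_spec) / runtime_s,
        "NEON DP [MFLOP/s]": 1.0e-6 * packed_flops / runtime_s,
        "Packed [MUOPS/s]": 1.0e-6 * ase_spec / runtime_s,
        "Scalar [MUOPS/s]": 1.0e-6 * vfp_spec / runtime_s,
        # The group reports this value as a percentage.
        "Vectorization ratio": 100.0 * ase_spec / (ase_spec + vfp_spec),
    }


# Hypothetical readings: 1e6 scalar and 3e6 packed operations in 2 seconds.
flops = dp_flop_metrics(vfp_spec=1_000_000, ase_spec=3_000_000, runtime_s=2.0)
for name, value in flops.items():
    print(f"{name}: {value:.2f}")
```

Both counters are `*_SPEC` events, so they count speculatively executed operations; the resulting rates are best read as an upper estimate of the useful floating-point work.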
28
collectors/likwid/groups/arm8_tx2/FLOPS_SP.txt
Normal file
@@ -0,0 +1,28 @@
SHORT Single Precision MFLOP/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 VFP_SPEC
PMC3 ASE_SPEC

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
SP [MFLOP/s] 1.0E-06*(PMC3*2.0+PMC2)/time
NEON SP [MFLOP/s] 1.0E-06*(PMC3*2.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC2/time
Vectorization ratio 100*(PMC3)/(PMC2+PMC3)

LONG
Formulas:
SP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2+VFP_SPEC)/runtime
NEON SP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2)/runtime
Packed [MUOPS/s] = 1.0E-06*(ASE_SPEC)/runtime
Scalar [MUOPS/s] = 1.0E-06*VFP_SPEC/runtime
Vectorization ratio = 100*(ASE_SPEC)/(ASE_SPEC+VFP_SPEC)
-
NEON scalar and packed single precision FLOP rates.

23
collectors/likwid/groups/arm8_tx2/ICACHE.txt
Normal file
@@ -0,0 +1,23 @@
SHORT Instruction cache miss rate/ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1I_CACHE
PMC3 L1I_CACHE_REFILL

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L1I request rate PMC2/PMC0
L1I miss rate PMC3/PMC0
L1I miss ratio PMC3/PMC2

LONG
Formulas:
L1I request rate = L1I_CACHE / INST_RETIRED
L1I miss rate = L1I_CACHE_REFILL / INST_RETIRED
L1I miss ratio = L1I_CACHE_REFILL / L1I_CACHE
-
This group measures some L1 instruction cache metrics.
41
collectors/likwid/groups/arm8_tx2/L2.txt
Normal file
@@ -0,0 +1,41 @@
SHORT L2 cache bandwidth in MBytes/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1D_CACHE_REFILL
PMC3 L1D_CACHE_WB
PMC4 L1I_CACHE_REFILL


METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L2D load bandwidth [MBytes/s] 1.0E-06*PMC2*64.0/time
L2D load data volume [GBytes] 1.0E-09*PMC2*64.0
L2D evict bandwidth [MBytes/s] 1.0E-06*PMC3*64.0/time
L2D evict data volume [GBytes] 1.0E-09*PMC3*64.0
L2I load bandwidth [MBytes/s] 1.0E-06*PMC4*64.0/time
L2I load data volume [GBytes] 1.0E-09*PMC4*64.0
L2 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3+PMC4)*64.0/time
L2 data volume [GBytes] 1.0E-09*(PMC2+PMC3+PMC4)*64.0

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L2D load bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_REFILL*64.0/time
L2D load data volume [GBytes] = 1.0E-09*L1D_CACHE_REFILL*64.0
L2D evict bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_WB*64.0/time
L2D evict data volume [GBytes] = 1.0E-09*L1D_CACHE_WB*64.0
L2I load bandwidth [MBytes/s] = 1.0E-06*L1I_CACHE_REFILL*64.0/time
L2I load data volume [GBytes] = 1.0E-09*L1I_CACHE_REFILL*64.0
L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*64.0/time
L2 data volume [GBytes] = 1.0E-09*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*64.0
-
Profiling group to measure L2 cache bandwidth. The bandwidth is computed from the
number of cache lines loaded from the L2 to the L1 data cache and the writebacks from
the L1 data cache to the L2 cache. The group also outputs the total data volume transferred between
L2 and L1. Note that this bandwidth also includes data transfers due to a write
allocate load on a store miss in L1 and cache lines transferred to the instruction
cache.
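The L2 formulas all follow one pattern: a count of cache lines is multiplied by the 64-byte line size, then scaled to GBytes for a volume or divided by the runtime for a bandwidth. A minimal Python sketch of that pattern follows; the helper names and the counter readings are hypothetical.

```python
CACHE_LINE_BYTES = 64.0  # line size assumed by the group formulas


def volume_gbytes(cache_lines):
    """Data volume in GBytes for a number of transferred cache lines."""
    return 1.0e-9 * cache_lines * CACHE_LINE_BYTES


def bandwidth_mbytes_per_s(cache_lines, runtime_s):
    """Bandwidth in MBytes/s for cache lines transferred within runtime_s."""
    return 1.0e-6 * cache_lines * CACHE_LINE_BYTES / runtime_s


# Hypothetical readings for L1D_CACHE_REFILL, L1D_CACHE_WB and L1I_CACHE_REFILL.
l1d_refill, l1d_wb, l1i_refill = 2_000_000, 1_000_000, 500_000
runtime_s = 0.5

total_lines = l1d_refill + l1d_wb + l1i_refill
l2_bandwidth = bandwidth_mbytes_per_s(total_lines, runtime_s)
l2_volume = volume_gbytes(total_lines)
print(f"L2 bandwidth [MBytes/s]: {l2_bandwidth:.1f}")
print(f"L2 data volume [GBytes]: {l2_volume:.3f}")
```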
32
collectors/likwid/groups/arm8_tx2/L2CACHE.txt
Normal file
@@ -0,0 +1,32 @@
SHORT L2 cache miss rate/ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L2D_CACHE
PMC3 L2D_CACHE_REFILL

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L2 request rate PMC2/PMC0
L2 miss rate PMC3/PMC0
L2 miss ratio PMC3/PMC2

LONG
Formulas:
L2 request rate = L2D_CACHE/INST_RETIRED
L2 miss rate = L2D_CACHE_REFILL/INST_RETIRED
L2 miss ratio = L2D_CACHE_REFILL/L2D_CACHE
-
This group measures the locality of your data accesses with regard to the
L2 cache. The L2 request rate tells you how data intensive your code is,
that is, how many data accesses you have on average per instruction.
The L2 miss rate gives a measure of how often it was necessary to get
cache lines from memory. Finally, the L2 miss ratio tells you how many of your
memory references required a cache line to be loaded from a higher level.
While the data cache miss rate might be given by your algorithm, you should
try to get the data cache miss ratio as low as possible by increasing your cache reuse.


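To make the difference between miss rate and miss ratio concrete, the sketch below evaluates the three L2CACHE formulas for two made-up runs. The function name and all numbers are hypothetical. Both runs have the same miss ratio, but the second one touches the L2 far more often per instruction, so its miss rate is higher.

```python
def l2_locality(inst_retired, l2d_cache, l2d_cache_refill):
    """Evaluate the L2CACHE group formulas for one set of counter readings."""
    return {
        "L2 request rate": l2d_cache / inst_retired,
        "L2 miss rate": l2d_cache_refill / inst_retired,
        "L2 miss ratio": l2d_cache_refill / l2d_cache,
    }


# Hypothetical runs with identical instruction counts.
sparse = l2_locality(inst_retired=1_000_000, l2d_cache=10_000, l2d_cache_refill=1_000)
dense = l2_locality(inst_retired=1_000_000, l2d_cache=400_000, l2d_cache_refill=40_000)

for label, run in (("sparse", sparse), ("dense", dense)):
    print(label, {name: round(value, 4) for name, value in run.items()})
```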
38
collectors/likwid/groups/arm8_tx2/L3.txt
Normal file
@@ -0,0 +1,38 @@
SHORT L3 cache bandwidth in MBytes/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L2D_CACHE_REFILL
PMC3 L2D_CACHE_WB
PMC4 L2D_CACHE_ALLOCATE


METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L3 load bandwidth [MBytes/s] 1.0E-06*(PMC2-PMC4)*64.0/time
L3 load data volume [GBytes] 1.0E-09*(PMC2-PMC4)*64.0
L3 evict bandwidth [MBytes/s] 1.0E-06*PMC3*64.0/time
L3 evict data volume [GBytes] 1.0E-09*PMC3*64.0
L3 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3-PMC4)*64.0/time
L3 data volume [GBytes] 1.0E-09*(PMC2+PMC3-PMC4)*64.0

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L3 load bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0/time
L3 load data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0
L3 evict bandwidth [MBytes/s] = 1.0E-06*L2D_CACHE_WB*64.0/time
L3 evict data volume [GBytes] = 1.0E-09*L2D_CACHE_WB*64.0
L3 bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE)*64.0/time
L3 data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE)*64.0
-
Profiling group to measure L2 <-> L3 cache bandwidth. The bandwidth is computed from the
number of cache lines loaded from the L3 to the L2 data cache and the writebacks from
the L2 data cache to the L3 cache. The group also outputs the total data volume transferred between
L3 and L2. For streaming stores, the cache lines are allocated in L2, so there
is no traffic between L3 and L2 in this case. However, the L2D_CACHE_REFILL event counts these
allocated cache lines, which is why the value of L2D_CACHE_REFILL is reduced
by L2D_CACHE_ALLOCATE.
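The correction described in the L3 group text, subtracting L2D_CACHE_ALLOCATE from L2D_CACHE_REFILL before converting to bytes, can be sketched as follows. The function name and the counter readings are hypothetical; the 64-byte line size and the formulas are those of L3.txt.

```python
CACHE_LINE_BYTES = 64.0  # line size assumed by the group formulas


def l3_traffic(l2d_refill, l2d_wb, l2d_allocate, runtime_s):
    """Compute L2 <-> L3 traffic, excluding lines that were only allocated in L2."""
    # Lines allocated in L2 without an L3 transfer are counted by the refill
    # event, so they are removed from the load side.
    loaded_lines = l2d_refill - l2d_allocate
    load_bytes = loaded_lines * CACHE_LINE_BYTES
    evict_bytes = l2d_wb * CACHE_LINE_BYTES
    total_bytes = load_bytes + evict_bytes
    return {
        "L3 load bandwidth [MBytes/s]": 1.0e-6 * load_bytes / runtime_s,
        "L3 evict bandwidth [MBytes/s]": 1.0e-6 * evict_bytes / runtime_s,
        "L3 bandwidth [MBytes/s]": 1.0e-6 * total_bytes / runtime_s,
        "L3 data volume [GBytes]": 1.0e-9 * total_bytes,
    }


# Hypothetical readings: one third of the refills were pure allocations.
traffic = l3_traffic(l2d_refill=3_000_000, l2d_wb=1_000_000,
                     l2d_allocate=1_000_000, runtime_s=2.0)
for name, value in traffic.items():
    print(f"{name}: {value:.3f}")
```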
32
collectors/likwid/groups/arm8_tx2/MEM.txt
Normal file
@@ -0,0 +1,32 @@
SHORT Main memory bandwidth in MBytes/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
MBOX0C0 MEMORY_READS
MBOX0C1 MEMORY_WRITES
MBOX1C0 MEMORY_READS
MBOX1C1 MEMORY_WRITES

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX0C1+MBOX1C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX0C1+MBOX1C1)*64.0

LONG
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_READS))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(MEMORY_READS))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_WRITES))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(MEMORY_WRITES))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_READS)+SUM(MEMORY_WRITES))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(MEMORY_READS)+SUM(MEMORY_WRITES))*64.0
-
Profiling group to measure memory bandwidth. It uses the performance monitoring
hardware of the memory controllers.
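The `SUM(...)` terms in the MEM formulas add the read and write counters of both memory controllers. A short Python sketch of that aggregation follows. The function name and the readings are hypothetical; the mapping of `MBOX<n>C0` to MEMORY_READS and `MBOX<n>C1` to MEMORY_WRITES is taken from the EVENTSET section of MEM.txt.

```python
CACHE_LINE_BYTES = 64.0  # bytes per counted memory transaction in the formulas


def memory_bandwidth(counters, runtime_s):
    """Aggregate per-controller counters into read, write and total bandwidth."""
    # In this group, counter C0 of each MBOX holds reads and C1 holds writes.
    reads = sum(v for name, v in counters.items() if name.endswith("C0"))
    writes = sum(v for name, v in counters.items() if name.endswith("C1"))
    scale = 1.0e-6 * CACHE_LINE_BYTES / runtime_s
    return {
        "Memory read bandwidth [MBytes/s]": reads * scale,
        "Memory write bandwidth [MBytes/s]": writes * scale,
        "Memory bandwidth [MBytes/s]": (reads + writes) * scale,
    }


# Hypothetical readings from the two memory controllers.
readings = {
    "MBOX0C0": 1_000_000, "MBOX0C1": 500_000,
    "MBOX1C0": 1_000_000, "MBOX1C1": 500_000,
}
mem = memory_bandwidth(readings, runtime_s=1.0)
for name, value in mem.items():
    print(f"{name}: {value:.1f}")
```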
44
collectors/likwid/groups/arm8_tx2/SPEC.txt
Normal file
@@ -0,0 +1,44 @@
SHORT Information about speculative execution

EVENTSET
PMC0 INST_SPEC
PMC1 LD_SPEC
PMC2 ST_SPEC
PMC3 DP_SPEC
PMC4 VFP_SPEC
PMC5 ASE_SPEC


METRICS
Runtime (RDTSC) [s] time
Operations spec. executed PMC0
Load ops spec. executed PMC1
Store ops spec. executed PMC2
Integer data ops spec. executed PMC3
Scalar FP ops spec. executed PMC4
Vector FP ops spec. executed PMC5
Other ops spec. executed (PMC0-PMC1-PMC2-PMC3-PMC4-PMC5)
Load ops spec. ratio PMC1/PMC0
Store ops spec. ratio PMC2/PMC0
Integer data ops spec. ratio PMC3/PMC0
Scalar FP ops spec. ratio PMC4/PMC0
Vector FP ops spec. ratio PMC5/PMC0
Other ops spec. ratio (PMC0-PMC1-PMC2-PMC3-PMC4-PMC5)/PMC0




LONG
Formulas:
Load ops spec. ratio = LD_SPEC / INST_SPEC
Store ops spec. ratio = ST_SPEC / INST_SPEC
Integer data ops spec. ratio = DP_SPEC / INST_SPEC
Scalar FP ops spec. ratio = VFP_SPEC / INST_SPEC
Vector FP ops spec. ratio = ASE_SPEC / INST_SPEC
Other ops spec. ratio = (INST_SPEC-LD_SPEC-ST_SPEC-DP_SPEC-VFP_SPEC-ASE_SPEC) / INST_SPEC
-
This group gives information about the speculative execution of micro-ops.
It is currently unclear why Other ops spec. executed and its ratio are negative
in some cases. Although the documentation lists an OP_RETIRED event, there is no
equivalent OP_SPEC event, which could be a better reference in this group than
INST_SPEC.
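Since the "Other ops" remainder can come out negative, a consumer of this group may want to detect that case instead of reporting it silently. The sketch below shows one way to do so in Python; the function name, the return shape and the counter readings are hypothetical and not part of the collector.

```python
def spec_breakdown(inst_spec, ld_spec, st_spec, dp_spec, vfp_spec, ase_spec):
    """Return per-class ratios, the 'other' remainder and a plausibility flag."""
    classes = {
        "Load": ld_spec,
        "Store": st_spec,
        "Integer data": dp_spec,
        "Scalar FP": vfp_spec,
        "Vector FP": ase_spec,
    }
    # Remainder after subtracting all classified operations from INST_SPEC.
    other = inst_spec - sum(classes.values())
    ratios = {name: count / inst_spec for name, count in classes.items()}
    ratios["Other"] = other / inst_spec
    # A negative remainder means the classes sum to more than INST_SPEC.
    return ratios, other, other >= 0


# Hypothetical readings where the classified operations exceed INST_SPEC.
ratios, other, plausible = spec_breakdown(
    inst_spec=1_000, ld_spec=300, st_spec=200, dp_spec=450, vfp_spec=50, ase_spec=100
)
print(f"other ops: {other}, plausible: {plausible}")
```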
27
collectors/likwid/groups/arm8_tx2/TLB_DATA.txt
Normal file
@@ -0,0 +1,27 @@
SHORT L1 data TLB miss rate/ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1D_TLB_REFILL_RD
PMC3 L1D_TLB_REFILL_WR

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L1 DTLB load misses PMC2
L1 DTLB load miss rate PMC2/PMC0
L1 DTLB store misses PMC3
L1 DTLB store miss rate PMC3/PMC0

LONG
Formulas:
L1 DTLB load misses = L1D_TLB_REFILL_RD
L1 DTLB load miss rate = L1D_TLB_REFILL_RD / INST_RETIRED
L1 DTLB store misses = L1D_TLB_REFILL_WR
L1 DTLB store miss rate = L1D_TLB_REFILL_WR / INST_RETIRED
-
The DTLB load and store miss rates give a measure of how often a TLB miss occurred
per instruction.

23
collectors/likwid/groups/arm8_tx2/TLB_INSTR.txt
Normal file
@@ -0,0 +1,23 @@
SHORT L1 Instruction TLB miss rate/ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1I_TLB_REFILL

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L1 ITLB misses PMC2
L1 ITLB miss rate PMC2/PMC0


LONG
Formulas:
L1 ITLB misses = L1I_TLB_REFILL
L1 ITLB miss rate = L1I_TLB_REFILL / INST_RETIRED
-
The ITLB miss rate gives a measure of how often a TLB miss occurred
per instruction.
