Add likwid collector

Thomas Roehl
2021-03-25 14:47:10 +01:00
parent 4fddcb9741
commit a6ac0c5373
670 changed files with 24926 additions and 0 deletions

@@ -0,0 +1,32 @@
SHORT Branch prediction miss rate/ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BR_PRED
PMC3 BR_MIS_PRED
PMC4 INST_SPEC
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
Branch rate PMC2/PMC0
Branch misprediction rate PMC3/PMC0
Branch misprediction ratio PMC3/(PMC2+PMC3)
Instructions per branch PMC0/(PMC2+PMC3)
LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
Branch rate = BR_PRED/INST_RETIRED
Branch misprediction rate = BR_MIS_PRED/INST_RETIRED
Branch misprediction ratio = BR_MIS_PRED/(BR_PRED+BR_MIS_PRED)
Instructions per branch = INST_RETIRED/(BR_PRED+BR_MIS_PRED)
-
The rates state how often, on average, a branch or a mispredicted branch occurred
per retired instruction. The branch misprediction ratio directly expresses what
fraction of all branch instructions were mispredicted.
Instructions per branch is approximately 1/Branch rate. Unlike the branch rate,
its denominator also includes the mispredicted branches (BR_MIS_PRED).
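As a minimal sketch of how a METRICS line turns counter values into a result, the Python helper below evaluates the branch group's formulas for a set of made-up counts. `evaluate_metrics` is a hypothetical name and is not part of LIKWID. It assumes that the formula is the last whitespace-separated token of each line and that `time` is the measured runtime in seconds.

```python
def evaluate_metrics(metric_lines, counts, runtime_s):
    """Return {metric name: value} for lines of the form '<name> <formula>'.

    Hypothetical helper, not LIKWID's own parser. Formulas may only use
    counter names (PMC0, PMC1, ...), 'time', numbers and + - * / ( ).
    """
    env = dict(counts)
    env["time"] = runtime_s
    results = {}
    for line in metric_lines:
        name, formula = line.rsplit(None, 1)  # formula is the last token
        # eval() without builtins is only acceptable for trusted, local strings.
        results[name] = eval(formula, {"__builtins__": {}}, env)
    return results


branch_metrics = [
    "Runtime (RDTSC) [s] time",
    "Clock [MHz] 1.E-06*PMC1/time",
    "CPI PMC1/PMC0",
    "Branch rate PMC2/PMC0",
    "Branch misprediction rate PMC3/PMC0",
    "Branch misprediction ratio PMC3/(PMC2+PMC3)",
    "Instructions per branch PMC0/(PMC2+PMC3)",
]

# Made-up counts for illustration: PMC0=INST_RETIRED, PMC1=CPU_CYCLES,
# PMC2=BR_PRED, PMC3=BR_MIS_PRED.
counts = {"PMC0": 1_000_000, "PMC1": 2_000_000, "PMC2": 180_000, "PMC3": 20_000}
for name, value in evaluate_metrics(branch_metrics, counts, runtime_s=0.001).items():
    print(f"{name}: {value:g}")
```

With these illustrative counts, Instructions per branch evaluates to 5, while 1/Branch rate is about 5.56. The difference comes from BR_MIS_PRED, which appears only in the first denominator.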

@@ -0,0 +1,25 @@
SHORT Load to store ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 LD_RETIRED
PMC3 ST_RETIRED
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
Load to store ratio PMC2/PMC3
Load ratio PMC2/PMC0
Store ratio PMC3/PMC0
LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
Load to store ratio = LD_RETIRED / ST_RETIRED
Load ratio = LD_RETIRED / INST_RETIRED
Store ratio = ST_RETIRED / INST_RETIRED
-
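This group determines the ratio of retired loads to retired stores, as well as the
fraction of retired instructions that are loads (Load ratio) or stores (Store ratio).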
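The three ratios are plain quotients of the retired-load, retired-store and retired-instruction counts. The sketch below wraps them in an illustrative container. `LoadStoreCounts` is a hypothetical name, not a LIKWID type, and returning infinity when no stores retired is a choice made by this sketch only.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadStoreCounts:
    """Raw counts for the load/store group (hypothetical container)."""
    inst_retired: int  # PMC0: INST_RETIRED
    ld_retired: int    # PMC2: LD_RETIRED
    st_retired: int    # PMC3: ST_RETIRED

    def load_to_store_ratio(self) -> float:
        # LD_RETIRED / ST_RETIRED; this sketch returns inf if no stores retired.
        return self.ld_retired / self.st_retired if self.st_retired else math.inf

    def load_ratio(self) -> float:
        # Fraction of retired instructions that were loads.
        return self.ld_retired / self.inst_retired

    def store_ratio(self) -> float:
        # Fraction of retired instructions that were stores.
        return self.st_retired / self.inst_retired


# Made-up counts for illustration only.
c = LoadStoreCounts(inst_retired=1_000_000, ld_retired=300_000, st_retired=100_000)
print(c.load_to_store_ratio(), c.load_ratio(), c.store_ratio())
```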

@@ -0,0 +1,28 @@
SHORT Double Precision MFLOP/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 VFP_SPEC
PMC3 ASE_SPEC
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
DP [MFLOP/s] 1.0E-06*(PMC3*2.0+PMC2)/time
NEON DP [MFLOP/s] 1.0E-06*(PMC3*2.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC2/time
Vectorization ratio 100*(PMC3)/(PMC2+PMC3)
LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2+VFP_SPEC)/runtime
NEON DP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2)/runtime
Packed [MUOPS/s] = 1.0E-06*(ASE_SPEC)/runtime
Scalar [MUOPS/s] = 1.0E-06*VFP_SPEC/runtime
Vectorization ratio = 100*(ASE_SPEC)/(ASE_SPEC+VFP_SPEC)
-
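Scalar (VFP_SPEC) and packed NEON (ASE_SPEC) double precision FLOP rates. Each
packed NEON operation is counted as two double-precision FLOPs. The counts come from
speculatively executed operations.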
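The following Python function is a sketch that evaluates the FLOPS_DP metrics from raw counts. It uses the group's own assumption that one packed NEON operation (ASE_SPEC) performs two double-precision FLOPs and one scalar operation (VFP_SPEC) performs one. The function name is hypothetical.

```python
def flops_dp_metrics(vfp_spec: int, ase_spec: int, runtime_s: float) -> dict:
    """Evaluate the FLOPS_DP metrics from raw counts (illustrative sketch)."""
    DP_PER_NEON_OP = 2  # 128-bit NEON register / 64-bit double
    return {
        "DP [MFLOP/s]": 1.0e-6 * (ase_spec * DP_PER_NEON_OP + vfp_spec) / runtime_s,
        "NEON DP [MFLOP/s]": 1.0e-6 * (ase_spec * DP_PER_NEON_OP) / runtime_s,
        "Packed [MUOPS/s]": 1.0e-6 * ase_spec / runtime_s,
        "Scalar [MUOPS/s]": 1.0e-6 * vfp_spec / runtime_s,
        # Operation-based ratio: counts packed ops, not FLOPs.
        "Vectorization ratio": 100.0 * ase_spec / (ase_spec + vfp_spec),
    }


# Made-up counts: 1M scalar and 3M packed operations in 1 second.
for name, value in flops_dp_metrics(1_000_000, 3_000_000, 1.0).items():
    print(f"{name}: {value:.2f}")
```

The vectorization ratio counts operations, not FLOPs. In this example, a 75% share of packed operations corresponds to 6 of the 7 MFLOP/s coming from NEON.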

@@ -0,0 +1,28 @@
SHORT Single Precision MFLOP/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 VFP_SPEC
PMC3 ASE_SPEC
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
SP [MFLOP/s] 1.0E-06*(PMC3*4.0+PMC2)/time
NEON SP [MFLOP/s] 1.0E-06*(PMC3*4.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC2/time
Vectorization ratio 100*(PMC3)/(PMC2+PMC3)
LONG
Formulas:
SP [MFLOP/s] = 1.0E-06*(ASE_SPEC*4+VFP_SPEC)/runtime
NEON SP [MFLOP/s] = 1.0E-06*(ASE_SPEC*4)/runtime
Packed [MUOPS/s] = 1.0E-06*(ASE_SPEC)/runtime
Scalar [MUOPS/s] = 1.0E-06*VFP_SPEC/runtime
Vectorization ratio = 100*(ASE_SPEC)/(ASE_SPEC+VFP_SPEC)
-
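Scalar (VFP_SPEC) and packed NEON (ASE_SPEC) single precision FLOP rates. Each
packed NEON operation is counted as four single-precision FLOPs. The counts come from
speculatively executed operations.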
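The factor of four in the NEON SP formula is the 128-bit NEON register width divided by the 32-bit size of a single-precision value. For 64-bit doubles, the same division gives the factor of two. The sketch below makes that derivation explicit. It assumes every ASE_SPEC operation is a full-width floating-point operation, which the counter itself does not guarantee. The function names are hypothetical.

```python
NEON_REGISTER_BITS = 128  # width of an Advanced SIMD (NEON) register


def lanes_per_neon_op(element_bits: int) -> int:
    """Number of elements one full-width NEON operation processes."""
    return NEON_REGISTER_BITS // element_bits


def sp_mflops(vfp_spec: int, ase_spec: int, runtime_s: float) -> float:
    """SP [MFLOP/s] sketch: packed ops weighted by 4 lanes, scalar ops by 1."""
    return 1.0e-6 * (ase_spec * lanes_per_neon_op(32) + vfp_spec) / runtime_s


print(lanes_per_neon_op(64), lanes_per_neon_op(32))  # double vs. single precision
print(sp_mflops(1_000_000, 3_000_000, 1.0))          # made-up counts
```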

@@ -0,0 +1,23 @@
SHORT Instruction cache miss rate/ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1I_CACHE
PMC3 L1I_CACHE_REFILL
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L1I request rate PMC2/PMC0
L1I miss rate PMC3/PMC0
L1I miss ratio PMC3/PMC2
LONG
Formulas:
L1I request rate = L1I_CACHE / INST_RETIRED
L1I miss rate = L1I_CACHE_REFILL / INST_RETIRED
L1I miss ratio = L1I_CACHE_REFILL / L1I_CACHE
-
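This group measures L1 instruction cache metrics: the request rate and the miss rate
per retired instruction, and the miss ratio, i.e. the fraction of L1I accesses that missed.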
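Because the request rate and the miss rate share the denominator INST_RETIRED, the miss ratio equals miss rate divided by request rate. The short Python sketch below checks this identity on made-up counts. The function name is hypothetical.

```python
def l1i_metrics(inst_retired: int, l1i_cache: int, l1i_refill: int) -> dict:
    """L1I request rate, miss rate and miss ratio (illustrative sketch)."""
    return {
        "request rate": l1i_cache / inst_retired,  # L1I accesses per instruction
        "miss rate": l1i_refill / inst_retired,    # L1I refills per instruction
        "miss ratio": l1i_refill / l1i_cache,      # fraction of L1I accesses that missed
    }


m = l1i_metrics(inst_retired=2_000_000, l1i_cache=500_000, l1i_refill=5_000)
# INST_RETIRED cancels out, so miss rate / request rate equals the miss ratio.
print(m, m["miss rate"] / m["request rate"])
```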

@@ -0,0 +1,41 @@
SHORT L2 cache bandwidth in MBytes/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1D_CACHE_REFILL
PMC3 L1D_CACHE_WB
PMC4 L1I_CACHE_REFILL
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L2D load bandwidth [MBytes/s] 1.0E-06*PMC2*64.0/time
L2D load data volume [GBytes] 1.0E-09*PMC2*64.0
L2D evict bandwidth [MBytes/s] 1.0E-06*PMC3*64.0/time
L2D evict data volume [GBytes] 1.0E-09*PMC3*64.0
L2I load bandwidth [MBytes/s] 1.0E-06*PMC4*64.0/time
L2I load data volume [GBytes] 1.0E-09*PMC4*64.0
L2 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3+PMC4)*64.0/time
L2 data volume [GBytes] 1.0E-09*(PMC2+PMC3+PMC4)*64.0
LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L2D load bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_REFILL*64.0/time
L2D load data volume [GBytes] = 1.0E-09*L1D_CACHE_REFILL*64.0
L2D evict bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_WB*64.0/time
L2D evict data volume [GBytes] = 1.0E-09*L1D_CACHE_WB*64.0
L2I load bandwidth [MBytes/s] = 1.0E-06*L1I_CACHE_REFILL*64.0/time
L2I load data volume [GBytes] = 1.0E-09*L1I_CACHE_REFILL*64.0
L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*64.0/time
L2 data volume [GBytes] = 1.0E-09*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*64.0
-
Profiling group to measure L2 cache bandwidth. The bandwidth is computed from the
number of cache lines loaded from the L2 into the L1 data cache and the writebacks from
the L1 data cache to the L2 cache, assuming 64 bytes per cache line. The group also outputs
the total data volume transferred between L2 and L1. Note that this bandwidth also includes
data transfers due to write-allocate loads on store misses in L1 and cache lines transferred
to the instruction cache.
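The following Python sketch shows how the group's bandwidth and volume follow from the three line counts, using the 64-byte line size that appears in the formulas. The function name and counts are illustrative only.

```python
CACHE_LINE_BYTES = 64  # line size assumed by the group's formulas


def l2_bandwidth(l1d_refill, l1d_wb, l1i_refill, runtime_s):
    """Return (L2 bandwidth in MBytes/s, L2 data volume in GBytes)."""
    lines = l1d_refill + l1d_wb + l1i_refill  # all L1 <-> L2 line transfers
    volume_bytes = lines * CACHE_LINE_BYTES
    return 1.0e-6 * volume_bytes / runtime_s, 1.0e-9 * volume_bytes


# Made-up counts for illustration.
bw, vol = l2_bandwidth(l1d_refill=10_000_000, l1d_wb=5_000_000, l1i_refill=0, runtime_s=0.5)
print(f"L2 bandwidth: {bw:.1f} MBytes/s, L2 data volume: {vol:.3f} GBytes")
```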

@@ -0,0 +1,32 @@
SHORT L2 cache miss rate/ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L2D_CACHE
PMC3 L2D_CACHE_REFILL
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L2 request rate PMC2/PMC0
L2 miss rate PMC3/PMC0
L2 miss ratio PMC3/PMC2
LONG
Formulas:
L2 request rate = L2D_CACHE/INST_RETIRED
L2 miss rate = L2D_CACHE_REFILL/INST_RETIRED
L2 miss ratio = L2D_CACHE_REFILL/L2D_CACHE
-
This group measures the locality of your data accesses with regard to the
L2 cache. The L2 request rate tells you how data intensive your code is,
i.e. how many L2 data accesses you have on average per instruction.
The L2 miss rate gives a measure of how often per instruction it was necessary
to get cache lines from a higher level of the memory hierarchy. Finally, the L2 miss
ratio tells you what fraction of your L2 requests required a cache line to be loaded
from a higher level. While the data cache miss rate might be dictated by your algorithm,
you should try to keep the data cache miss ratio as low as possible by increasing cache reuse.
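The following Python sketch illustrates the distinction between request rate and miss ratio using two made-up runs. Both runs issue the same number of L2 requests per instruction, but the second reuses cache lines better, so fewer requests need a refill. The function name and counts are hypothetical.

```python
def l2_cache_metrics(inst_retired, l2d_cache, l2d_refill):
    """L2 request rate, miss rate and miss ratio from raw counts (sketch)."""
    return {
        "L2 request rate": l2d_cache / inst_retired,
        "L2 miss rate": l2d_refill / inst_retired,
        "L2 miss ratio": l2d_refill / l2d_cache,
    }


# Made-up runs with the same request rate but different cache reuse.
before = l2_cache_metrics(inst_retired=1_000_000, l2d_cache=100_000, l2d_refill=40_000)
after = l2_cache_metrics(inst_retired=1_000_000, l2d_cache=100_000, l2d_refill=10_000)
print(before["L2 miss ratio"], after["L2 miss ratio"])
```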

@@ -0,0 +1,38 @@
SHORT L3 cache bandwidth in MBytes/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L2D_CACHE_REFILL
PMC3 L2D_CACHE_WB
PMC4 L2D_CACHE_ALLOCATE
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L3 load bandwidth [MBytes/s] 1.0E-06*(PMC2-PMC4)*64.0/time
L3 load data volume [GBytes] 1.0E-09*(PMC2-PMC4)*64.0
L3 evict bandwidth [MBytes/s] 1.0E-06*PMC3*64.0/time
L3 evict data volume [GBytes] 1.0E-09*PMC3*64.0
L3 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3-PMC4)*64.0/time
L3 data volume [GBytes] 1.0E-09*(PMC2+PMC3-PMC4)*64.0
LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L3 load bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0/time
L3 load data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0
L3 evict bandwidth [MBytes/s] = 1.0E-06*L2D_CACHE_WB*64.0/time
L3 evict data volume [GBytes] = 1.0E-09*L2D_CACHE_WB*64.0
L3 bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE)*64.0/time
L3 data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE)*64.0
-
Profiling group to measure L2 <-> L3 cache bandwidth. The bandwidth is computed from the
number of cache lines loaded from the L3 into the L2 data cache and the writebacks from
the L2 data cache to the L3 cache. The group also outputs the total data volume transferred
between L3 and L2. For streaming stores, the cache lines are allocated in L2, so there
is no traffic between L3 and L2 in this case. However, the L2D_CACHE_REFILL event also counts
these allocated cache lines, which is why L2D_CACHE_ALLOCATE is subtracted from
L2D_CACHE_REFILL.
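The following Python sketch applies the allocate correction to a made-up run that consists only of streaming stores. Because every refill in this run is an allocation, the computed load bandwidth drops to zero and only the evictions remain. The function name is hypothetical, and the 64-byte line size is taken from the group's formulas.

```python
CACHE_LINE_BYTES = 64  # line size assumed by the group's formulas


def l3_bandwidth(l2d_refill, l2d_wb, l2d_allocate, runtime_s):
    """L2 <-> L3 bandwidth [MBytes/s] and volume [GBytes] (illustrative sketch).

    Lines allocated in L2 by streaming stores are counted by L2D_CACHE_REFILL
    but cause no L3 -> L2 traffic, so they are subtracted from the refills.
    """
    load_lines = l2d_refill - l2d_allocate
    total_lines = load_lines + l2d_wb
    return {
        "L3 load bandwidth [MBytes/s]": 1.0e-6 * load_lines * CACHE_LINE_BYTES / runtime_s,
        "L3 evict bandwidth [MBytes/s]": 1.0e-6 * l2d_wb * CACHE_LINE_BYTES / runtime_s,
        "L3 bandwidth [MBytes/s]": 1.0e-6 * total_lines * CACHE_LINE_BYTES / runtime_s,
        "L3 data volume [GBytes]": 1.0e-9 * total_lines * CACHE_LINE_BYTES,
    }


# Made-up pure streaming-store run: every refill is an allocation.
print(l3_bandwidth(l2d_refill=1_000_000, l2d_wb=1_000_000, l2d_allocate=1_000_000, runtime_s=1.0))
```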

@@ -0,0 +1,32 @@
SHORT Main memory bandwidth in MBytes/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
MBOX0C0 MEMORY_READS
MBOX0C1 MEMORY_WRITES
MBOX1C0 MEMORY_READS
MBOX1C1 MEMORY_WRITES
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX0C1+MBOX1C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX0C1+MBOX1C1)*64.0
LONG
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_READS))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(MEMORY_READS))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_WRITES))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(MEMORY_WRITES))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_READS)+SUM(MEMORY_WRITES))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(MEMORY_READS)+SUM(MEMORY_WRITES))*64.0
-
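Profiling group to measure memory bandwidth. It uses the performance monitoring
hardware of the memory controllers and sums the read and write events of both
memory controller units (MBOX0 and MBOX1), counting 64 bytes per event.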
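The SUM() in the formulas runs over the memory controller units. The following Python sketch takes one (MEMORY_READS, MEMORY_WRITES) pair per unit and treats each event as one 64-byte transfer, as the group's formulas do. The function name, the input layout and the counts are illustrative assumptions.

```python
CACHE_LINE_BYTES = 64  # bytes per MEMORY_READS / MEMORY_WRITES event, per the formulas


def memory_bandwidth(mbox_counts, runtime_s):
    """Memory bandwidth [MBytes/s] summed over all memory controller units (sketch).

    mbox_counts: list of (MEMORY_READS, MEMORY_WRITES) pairs, one per MBOX.
    """
    reads = sum(r for r, _ in mbox_counts)
    writes = sum(w for _, w in mbox_counts)
    return {
        "Memory read bandwidth [MBytes/s]": 1.0e-6 * reads * CACHE_LINE_BYTES / runtime_s,
        "Memory write bandwidth [MBytes/s]": 1.0e-6 * writes * CACHE_LINE_BYTES / runtime_s,
        "Memory bandwidth [MBytes/s]": 1.0e-6 * (reads + writes) * CACHE_LINE_BYTES / runtime_s,
    }


# Made-up counts for MBOX0 and MBOX1.
print(memory_bandwidth([(3_000_000, 1_000_000), (2_000_000, 2_000_000)], runtime_s=2.0))
```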

@@ -0,0 +1,44 @@
SHORT Information about speculative execution
EVENTSET
PMC0 INST_SPEC
PMC1 LD_SPEC
PMC2 ST_SPEC
PMC3 DP_SPEC
PMC4 VFP_SPEC
PMC5 ASE_SPEC
METRICS
Runtime (RDTSC) [s] time
Operations spec. executed PMC0
Load ops spec. executed PMC1
Store ops spec. executed PMC2
Integer data ops spec. executed PMC3
Scalar FP ops spec. executed PMC4
Vector FP ops spec. executed PMC5
Other ops spec. executed (PMC0-PMC1-PMC2-PMC3-PMC4-PMC5)
Load ops spec. ratio PMC1/PMC0
Store ops spec. ratio PMC2/PMC0
Integer data ops spec. ratio PMC3/PMC0
Scalar FP ops spec. ratio PMC4/PMC0
Vector FP ops spec. ratio PMC5/PMC0
Other ops spec. ratio (PMC0-PMC1-PMC2-PMC3-PMC4-PMC5)/PMC0
LONG
Formulas:
Load ops spec. ratio = LD_SPEC / INST_SPEC
Store ops spec. ratio = ST_SPEC / INST_SPEC
Integer data ops spec. ratio = DP_SPEC / INST_SPEC
Scalar FP ops spec. ratio = VFP_SPEC / INST_SPEC
Vector FP ops spec. ratio = ASE_SPEC / INST_SPEC
Other ops spec. ratio = (INST_SPEC-LD_SPEC-ST_SPEC-DP_SPEC-VFP_SPEC-ASE_SPEC) / INST_SPEC
-
This group gives information about the speculative execution of micro-ops.
It is currently unclear why the Other ops spec. executed count and ratio are negative
in some cases. Although the documentation lists an OP_RETIRED event, there is no
equivalent OP_SPEC event, which could serve as a better reference in this group than
INST_SPEC.
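The "Other" figures are a remainder, INST_SPEC minus the five classified counts. They therefore become negative whenever the classified counts add up to more than INST_SPEC. The following Python sketch reproduces only this arithmetic on made-up counts and does not explain why that happens on real hardware. The function name is hypothetical.

```python
def spec_breakdown(inst_spec, ld_spec, st_spec, dp_spec, vfp_spec, ase_spec):
    """Share of each speculative operation class relative to INST_SPEC (sketch)."""
    classified = ld_spec + st_spec + dp_spec + vfp_spec + ase_spec
    other = inst_spec - classified  # negative if the classes exceed INST_SPEC
    return {
        "Load ops spec. ratio": ld_spec / inst_spec,
        "Store ops spec. ratio": st_spec / inst_spec,
        "Integer data ops spec. ratio": dp_spec / inst_spec,
        "Scalar FP ops spec. ratio": vfp_spec / inst_spec,
        "Vector FP ops spec. ratio": ase_spec / inst_spec,
        "Other ops spec. executed": other,
        "Other ops spec. ratio": other / inst_spec,
    }


# Made-up counts whose classes sum to 1050 > INST_SPEC = 1000.
print(spec_breakdown(1000, 300, 100, 400, 150, 100))
```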

@@ -0,0 +1,27 @@
SHORT L1 data TLB miss rate/ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1D_TLB_REFILL_RD
PMC3 L1D_TLB_REFILL_WR
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L1 DTLB load misses PMC2
L1 DTLB load miss rate PMC2/PMC0
L1 DTLB store misses PMC3
L1 DTLB store miss rate PMC3/PMC0
LONG
Formulas:
L1 DTLB load misses = L1D_TLB_REFILL_RD
L1 DTLB load miss rate = L1D_TLB_REFILL_RD / INST_RETIRED
L1 DTLB store misses = L1D_TLB_REFILL_WR
L1 DTLB store miss rate = L1D_TLB_REFILL_WR / INST_RETIRED
-
The DTLB load and store miss rates give a measure of how often a DTLB miss occurred
per instruction.
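The following Python sketch computes the group's four DTLB metrics from raw counts. It reports load and store misses separately, as the group does. The function name and counts are illustrative.

```python
def dtlb_metrics(inst_retired, l1d_tlb_refill_rd, l1d_tlb_refill_wr):
    """L1 DTLB load/store misses and per-instruction miss rates (sketch)."""
    return {
        "L1 DTLB load misses": l1d_tlb_refill_rd,
        "L1 DTLB load miss rate": l1d_tlb_refill_rd / inst_retired,
        "L1 DTLB store misses": l1d_tlb_refill_wr,
        "L1 DTLB store miss rate": l1d_tlb_refill_wr / inst_retired,
    }


# Made-up counts for illustration.
print(dtlb_metrics(inst_retired=4_000_000, l1d_tlb_refill_rd=2_000, l1d_tlb_refill_wr=1_000))
```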

@@ -0,0 +1,23 @@
SHORT L1 Instruction TLB miss rate/ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1I_TLB_REFILL
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
L1 ITLB misses PMC2
L1 ITLB miss rate PMC2/PMC0
LONG
Formulas:
L1 ITLB misses = L1I_TLB_REFILL
L1 ITLB miss rate = L1I_TLB_REFILL / INST_RETIRED
-
The ITLB miss rate gives a measure of how often an ITLB miss occurred
per instruction.
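The following Python sketch computes the ITLB metrics from counts keyed by event name. The dictionary input format is an assumption made for illustration and is not LIKWID's output format. The function name is hypothetical.

```python
def itlb_metrics(counts):
    """L1 ITLB misses and miss rate; counts maps event names to raw values."""
    misses = counts["L1I_TLB_REFILL"]
    return {
        "L1 ITLB misses": misses,
        "L1 ITLB miss rate": misses / counts["INST_RETIRED"],  # misses per instruction
    }


# Made-up counts for illustration.
print(itlb_metrics({"INST_RETIRED": 1_000_000, "L1I_TLB_REFILL": 500}))
```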