Add likwid collector

2025-12-18 21:26:18 +01:00 · 2021-03-25 14:47:10 +01:00
parent 4fddcb9741
commit a6ac0c5373
670 changed files with 24926 additions and 0 deletions
--- a/collectors/likwid/groups/arm8_tx2/BRANCH.txt
+++ b/collectors/likwid/groups/arm8_tx2/BRANCH.txt
@@ -0,0 +1,32 @@
+SHORT Branch prediction miss rate/ratio
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  BR_PRED
+PMC3  BR_MIS_PRED
+PMC4  INST_SPEC
+
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+Branch rate   PMC2/PMC0
+Branch misprediction rate  PMC3/PMC0
+Branch misprediction ratio  PMC3/(PMC2+PMC3)
+Instructions per branch  PMC0/(PMC2+PMC3)
+
+LONG
+Formulas:
+CPI = CPU_CYCLES/INST_RETIRED
+Branch rate = BR_PRED/INST_RETIRED
+Branch misprediction rate =  BR_MIS_PRED/INST_RETIRED
+Branch misprediction ratio = BR_MIS_PRED/(BR_PRED+BR_MIS_PRED)
+Instructions per branch = INST_RETIRED/(BR_PRED+BR_MIS_PRED)
+-
+The rates state how often on average a branch or a mispredicted branch occurred
+per instruction retired in total. The branch misprediction ratio states directly
+what fraction of all branch instructions were mispredicted.
+Instructions per branch is 1/(Branch rate + Branch misprediction rate).
+
--- a/collectors/likwid/groups/arm8_tx2/DATA.txt
+++ b/collectors/likwid/groups/arm8_tx2/DATA.txt
@@ -0,0 +1,25 @@
+SHORT Load to store ratio
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  LD_RETIRED
+PMC3  ST_RETIRED
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+Load to store ratio PMC2/PMC3
+Load ratio PMC2/PMC0
+Store ratio PMC3/PMC0
+
+LONG
+Formulas:
+CPI = CPU_CYCLES/INST_RETIRED
+Load to store ratio = LD_RETIRED / ST_RETIRED
+Load ratio = LD_RETIRED / INST_RETIRED
+Store ratio = ST_RETIRED / INST_RETIRED
+-
+This is a metric to determine your load to store ratio.
+
--- a/collectors/likwid/groups/arm8_tx2/FLOPS_DP.txt
+++ b/collectors/likwid/groups/arm8_tx2/FLOPS_DP.txt
@@ -0,0 +1,28 @@
+SHORT Double Precision MFLOP/s
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  VFP_SPEC
+PMC3  ASE_SPEC
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+DP [MFLOP/s]  1.0E-06*(PMC3*2.0+PMC2)/time
+NEON DP [MFLOP/s]  1.0E-06*(PMC3*2.0)/time
+Packed [MUOPS/s]   1.0E-06*(PMC3)/time
+Scalar [MUOPS/s] 1.0E-06*PMC2/time
+Vectorization ratio 100*(PMC3)/(PMC2+PMC3)
+
+LONG
+Formulas:
+DP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2+VFP_SPEC)/runtime
+NEON DP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2)/runtime
+Packed [MUOPS/s] = 1.0E-06*(ASE_SPEC)/runtime
+Scalar [MUOPS/s] = 1.0E-06*VFP_SPEC/runtime
+Vectorization ratio = 100*(ASE_SPEC)/(ASE_SPEC+VFP_SPEC)
+-
+NEON scalar and packed double precision FLOP rates.
+
--- a/collectors/likwid/groups/arm8_tx2/FLOPS_SP.txt
+++ b/collectors/likwid/groups/arm8_tx2/FLOPS_SP.txt
@@ -0,0 +1,28 @@
+SHORT Single Precision MFLOP/s
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  VFP_SPEC
+PMC3  ASE_SPEC
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+SP [MFLOP/s]  1.0E-06*(PMC3*2.0+PMC2)/time
+NEON SP [MFLOP/s]  1.0E-06*(PMC3*2.0)/time
+Packed [MUOPS/s]   1.0E-06*(PMC3)/time
+Scalar [MUOPS/s] 1.0E-06*PMC2/time
+Vectorization ratio 100*(PMC3)/(PMC2+PMC3)
+
+LONG
+Formulas:
+SP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2+VFP_SPEC)/runtime
+NEON SP [MFLOP/s] = 1.0E-06*(ASE_SPEC*2)/runtime
+Packed [MUOPS/s] = 1.0E-06*(ASE_SPEC)/runtime
+Scalar [MUOPS/s] = 1.0E-06*VFP_SPEC/runtime
+Vectorization ratio = 100*(ASE_SPEC)/(ASE_SPEC+VFP_SPEC)
+-
+NEON scalar and packed single precision FLOP rates.
+
--- a/collectors/likwid/groups/arm8_tx2/ICACHE.txt
+++ b/collectors/likwid/groups/arm8_tx2/ICACHE.txt
@@ -0,0 +1,23 @@
+SHORT  Instruction cache miss rate/ratio
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  L1I_CACHE
+PMC3  L1I_CACHE_REFILL
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+L1I request rate PMC2/PMC0
+L1I miss rate PMC3/PMC0
+L1I miss ratio PMC3/PMC2
+
+LONG
+Formulas:
+L1I request rate = L1I_CACHE / INST_RETIRED
+L1I miss rate = L1I_CACHE_REFILL / INST_RETIRED
+L1I miss ratio = L1I_CACHE_REFILL / L1I_CACHE
+-
+This group measures some L1 instruction cache metrics.
--- a/collectors/likwid/groups/arm8_tx2/L2.txt
+++ b/collectors/likwid/groups/arm8_tx2/L2.txt
@@ -0,0 +1,41 @@
+SHORT  L2 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  L1D_CACHE_REFILL
+PMC3  L1D_CACHE_WB
+PMC4  L1I_CACHE_REFILL
+
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+L2D load bandwidth [MBytes/s]  1.0E-06*PMC2*64.0/time
+L2D load data volume [GBytes]  1.0E-09*PMC2*64.0
+L2D evict bandwidth [MBytes/s]  1.0E-06*PMC3*64.0/time
+L2D evict data volume [GBytes]  1.0E-09*PMC3*64.0
+L2I load bandwidth [MBytes/s]  1.0E-06*PMC4*64.0/time
+L2I load data volume [GBytes]  1.0E-09*PMC4*64.0
+L2 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3+PMC4)*64.0/time
+L2 data volume [GBytes] 1.0E-09*(PMC2+PMC3+PMC4)*64.0
+
+LONG
+Formulas:
+CPI = CPU_CYCLES/INST_RETIRED
+L2D load bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_REFILL*64.0/time
+L2D load data volume [GBytes] = 1.0E-09*L1D_CACHE_REFILL*64.0
+L2D evict bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_WB*64.0/time
+L2D evict data volume [GBytes] = 1.0E-09*L1D_CACHE_WB*64.0
+L2I load bandwidth [MBytes/s] = 1.0E-06*L1I_CACHE_REFILL*64.0/time
+L2I load data volume [GBytes] = 1.0E-09*L1I_CACHE_REFILL*64.0
+L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*64.0/time
+L2 data volume [GBytes] = 1.0E-09*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*64.0
+-
+Profiling group to measure L2 cache bandwidth. The bandwidth is computed from the
+number of cache lines loaded from the L2 to the L1 data cache and the writebacks from
+the L1 data cache to the L2 cache. The group also outputs total data volume transferred between
+L2 and L1. Note that this bandwidth also includes data transfers due to a write
+allocate load on a store miss in L1 and cache lines transferred to the instruction
+cache.
--- a/collectors/likwid/groups/arm8_tx2/L2CACHE.txt
+++ b/collectors/likwid/groups/arm8_tx2/L2CACHE.txt
@@ -0,0 +1,32 @@
+SHORT L2 cache miss rate/ratio
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  L2D_CACHE
+PMC3  L2D_CACHE_REFILL
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+L2 request rate PMC2/PMC0
+L2 miss rate PMC3/PMC0
+L2 miss ratio PMC3/PMC2
+
+LONG
+Formulas:
+L2 request rate = L2D_CACHE/INST_RETIRED
+L2 miss rate = L2D_CACHE_REFILL/INST_RETIRED
+L2 miss ratio = L2D_CACHE_REFILL/L2D_CACHE
+-
+This group measures the locality of your data accesses with regard to the
+L2 cache. L2 request rate tells you how data intensive your code is
+or how many data accesses you have on average per instruction.
+The L2 miss rate gives a measure of how often it was necessary to get
+cache lines from memory. And finally L2 miss ratio tells you how many of your
+memory references required a cache line to be loaded from a higher level.
+While the data cache miss rate might be given by your algorithm you should
+try to get data cache miss ratio as low as possible by increasing your cache reuse.
+
+
--- a/collectors/likwid/groups/arm8_tx2/L3.txt
+++ b/collectors/likwid/groups/arm8_tx2/L3.txt
@@ -0,0 +1,38 @@
+SHORT  L3 cache bandwidth in MBytes/s
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  L2D_CACHE_REFILL
+PMC3  L2D_CACHE_WB
+PMC4  L2D_CACHE_ALLOCATE
+
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+L3 load bandwidth [MBytes/s]  1.0E-06*(PMC2-PMC4)*64.0/time
+L3 load data volume [GBytes]  1.0E-09*(PMC2-PMC4)*64.0
+L3 evict bandwidth [MBytes/s]  1.0E-06*PMC3*64.0/time
+L3 evict data volume [GBytes]  1.0E-09*PMC3*64.0
+L3 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3-PMC4)*64.0/time
+L3 data volume [GBytes] 1.0E-09*(PMC2+PMC3-PMC4)*64.0
+
+LONG
+Formulas:
+CPI = CPU_CYCLES/INST_RETIRED
+L3 load bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0/time
+L3 load data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0
+L3 evict bandwidth [MBytes/s] = 1.0E-06*L2D_CACHE_WB*64.0/time
+L3 evict data volume [GBytes] = 1.0E-09*L2D_CACHE_WB*64.0
+L3 bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE)*64.0/time
+L3 data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE)*64.0
+-
+Profiling group to measure L2 <-> L3 cache bandwidth. The bandwidth is computed by the
+number of cache lines loaded from the L3 to the L2 data cache and the writebacks from
+the L2 data cache to the L3 cache. The group also outputs total data volume transferred between
+L3 and L2. For streaming stores, the cache lines are allocated in L2, so there
+is no traffic between L3 and L2 in this case. But the L2D_CACHE_REFILL event counts these
+allocated cache lines, which is why the value of L2D_CACHE_REFILL is reduced
+by L2D_CACHE_ALLOCATE.
--- a/collectors/likwid/groups/arm8_tx2/MEM.txt
+++ b/collectors/likwid/groups/arm8_tx2/MEM.txt
@@ -0,0 +1,32 @@
+SHORT Main memory bandwidth in MBytes/s
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+MBOX0C0  MEMORY_READS
+MBOX0C1  MEMORY_WRITES
+MBOX1C0  MEMORY_READS
+MBOX1C1  MEMORY_WRITES
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0)*64.0/time
+Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0)*64.0
+Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1)*64.0/time
+Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1)*64.0
+Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX0C1+MBOX1C1)*64.0/time
+Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX0C1+MBOX1C1)*64.0
+
+LONG
+Formulas:
+Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_READS))*64.0/runtime
+Memory read data volume [GBytes] = 1.0E-09*(SUM(MEMORY_READS))*64.0
+Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_WRITES))*64.0/runtime
+Memory write data volume [GBytes] = 1.0E-09*(SUM(MEMORY_WRITES))*64.0
+Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MEMORY_READS)+SUM(MEMORY_WRITES))*64.0/runtime
+Memory data volume [GBytes] = 1.0E-09*(SUM(MEMORY_READS)+SUM(MEMORY_WRITES))*64.0
+-
+Profiling group to measure memory bandwidth. It uses the performance monitoring
+hardware of the memory controllers.
--- a/collectors/likwid/groups/arm8_tx2/SPEC.txt
+++ b/collectors/likwid/groups/arm8_tx2/SPEC.txt
@@ -0,0 +1,44 @@
+SHORT Information about speculative execution
+
+EVENTSET
+PMC0 INST_SPEC
+PMC1 LD_SPEC
+PMC2 ST_SPEC
+PMC3 DP_SPEC
+PMC4 VFP_SPEC
+PMC5 ASE_SPEC
+
+
+METRICS
+Runtime (RDTSC) [s] time
+Operations spec. executed PMC0
+Load ops spec. executed PMC1
+Store ops spec. executed PMC2
+Integer data ops spec. executed PMC3
+Scalar FP ops spec. executed PMC4
+Vector FP ops spec. executed PMC5
+Other ops spec. executed (PMC0-PMC1-PMC2-PMC3-PMC4-PMC5)
+Load ops spec. ratio PMC1/PMC0
+Store ops spec. ratio PMC2/PMC0
+Integer data ops spec. ratio PMC3/PMC0
+Scalar FP ops spec. ratio PMC4/PMC0
+Vector FP ops spec. ratio PMC5/PMC0
+Other ops spec. ratio (PMC0-PMC1-PMC2-PMC3-PMC4-PMC5)/PMC0
+
+
+
+
+LONG
+Formulas:
+Load ops spec. ratio = LD_SPEC / INST_SPEC
+Store ops spec. ratio = ST_SPEC / INST_SPEC
+Integer data ops spec. ratio = DP_SPEC / INST_SPEC
+Scalar FP ops spec. ratio = VFP_SPEC / INST_SPEC
+Vector FP ops spec. ratio = ASE_SPEC / INST_SPEC
+Other ops spec. ratio = (INST_SPEC-LD_SPEC-ST_SPEC-DP_SPEC-VFP_SPEC-ASE_SPEC) / INST_SPEC
+-
+This group gives information about the speculative execution of micro-ops.
+It is currently unclear why Other ops spec. executed and ratio are negative
+in some cases. Although the documentation contains an OP_RETIRED event, there is no
+equivalent OP_SPEC which could be a better reference in this group instead of
+INST_SPEC.
--- a/collectors/likwid/groups/arm8_tx2/TLB_DATA.txt
+++ b/collectors/likwid/groups/arm8_tx2/TLB_DATA.txt
@@ -0,0 +1,27 @@
+SHORT  L1 data TLB miss rate/ratio
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  L1D_TLB_REFILL_RD
+PMC3  L1D_TLB_REFILL_WR
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+L1 DTLB load misses     PMC2
+L1 DTLB load miss rate  PMC2/PMC0
+L1 DTLB store misses     PMC3
+L1 DTLB store miss rate  PMC3/PMC0
+
+LONG
+Formulas:
+L1 DTLB load misses = L1D_TLB_REFILL_RD
+L1 DTLB load miss rate = L1D_TLB_REFILL_RD / INST_RETIRED
+L1 DTLB store misses = L1D_TLB_REFILL_WR
+L1 DTLB store miss rate = L1D_TLB_REFILL_WR / INST_RETIRED
+-
+The DTLB load and store miss rates give a measure of how often a TLB miss occurred
+per instruction.
+
--- a/collectors/likwid/groups/arm8_tx2/TLB_INSTR.txt
+++ b/collectors/likwid/groups/arm8_tx2/TLB_INSTR.txt
@@ -0,0 +1,23 @@
+SHORT  L1 Instruction TLB miss rate/ratio
+
+EVENTSET
+PMC0  INST_RETIRED
+PMC1  CPU_CYCLES
+PMC2  L1I_TLB_REFILL
+
+METRICS
+Runtime (RDTSC) [s] time
+Clock [MHz] 1.E-06*PMC1/time
+CPI  PMC1/PMC0
+L1 ITLB misses     PMC2
+L1 ITLB miss rate  PMC2/PMC0
+
+
+LONG
+Formulas:
+L1 ITLB misses = L1I_TLB_REFILL
+L1 ITLB miss rate = L1I_TLB_REFILL / INST_RETIRED
+-
+The ITLB miss rate gives a measure of how often a TLB miss occurred
+per instruction.
+