Mirror of https://github.com/ClusterCockpit/cc-metric-collector.git (synced 2025-07-31 08:56:06 +02:00)
Add likwid collector
30
collectors/likwid/groups/arm64fx/BRANCH.txt
Normal file
@@ -0,0 +1,30 @@
SHORT Branch prediction miss rate/ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BR_PRED
PMC3 BR_MIS_PRED


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
Branch rate PMC2/PMC0
Branch misprediction rate PMC3/PMC0
Branch misprediction ratio PMC3/(PMC2+PMC3)
Instructions per branch PMC0/(PMC2+PMC3)

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
Branch rate = BR_PRED/INST_RETIRED
Branch misprediction rate = BR_MIS_PRED/INST_RETIRED
Branch misprediction ratio = BR_MIS_PRED/(BR_PRED+BR_MIS_PRED)
Instructions per branch = INST_RETIRED/(BR_PRED+BR_MIS_PRED)
-
The rates state how often on average a branch or a mispredicted branch occurred
per instruction retired in total. The branch misprediction ratio states directly
what fraction of all branch instructions were mispredicted.
Instructions per branch is 1/(Branch rate + Branch misprediction rate).
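As a rough illustration of how the BRANCH formulas combine the raw counts, the following Python sketch evaluates them for one set of counter readings. The function name `branch_metrics` and the sample counts are hypothetical and chosen only to keep the arithmetic readable; this is not how LIKWID itself evaluates a group file.

```python
def branch_metrics(inst_retired, cpu_cycles, br_pred, br_mis_pred):
    """Evaluate the BRANCH group formulas for one set of raw counter values."""
    # The group file treats BR_PRED + BR_MIS_PRED as the total branch count.
    branches = br_pred + br_mis_pred
    return {
        "CPI": cpu_cycles / inst_retired,
        "Branch rate": br_pred / inst_retired,
        "Branch misprediction rate": br_mis_pred / inst_retired,
        "Branch misprediction ratio": br_mis_pred / branches,
        "Instructions per branch": inst_retired / branches,
    }


# Hypothetical counter readings (not measured data).
metrics = branch_metrics(inst_retired=1_000_000, cpu_cycles=1_500_000,
                         br_pred=190_000, br_mis_pred=10_000)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the rates are normalized to retired instructions, while the ratio is normalized to the branch count, so the two misprediction numbers are not interchangeable.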
24
collectors/likwid/groups/arm64fx/DATA.txt
Normal file
@@ -0,0 +1,24 @@
SHORT Load to store ratio

EVENTSET
PMC0 INST_SPEC
PMC1 CPU_CYCLES
PMC2 LD_SPEC
PMC3 ST_SPEC

METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
Load to store ratio PMC2/PMC3
Load ratio PMC2/PMC0
Store ratio PMC3/PMC0

LONG
Formulas:
CPI = CPU_CYCLES/INST_SPEC
Load to store ratio = LD_SPEC / ST_SPEC
Load ratio = LD_SPEC / INST_SPEC
Store ratio = ST_SPEC / INST_SPEC
-
This is a metric to determine your load to store ratio.
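All group files in this commit share the same four-part layout (SHORT, EVENTSET, METRICS, LONG). The Python sketch below shows one way such a file could be split into its parts. It is not LIKWID's own parser: `parse_group` is a hypothetical helper, and it assumes that each METRICS line ends with a formula that contains no whitespace.

```python
def parse_group(text):
    """Split a performance group file into SHORT, EVENTSET, METRICS and LONG parts."""
    group = {"short": "", "eventset": {}, "metrics": [], "long": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if section == "LONG":
            group["long"].append(line)  # free-form description, kept line by line
            continue
        if not line:
            continue
        if line.startswith("SHORT"):
            group["short"] = line[len("SHORT"):].strip()
        elif line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif section == "EVENTSET":
            counter, event = line.split(None, 1)
            group["eventset"][counter] = event
        elif section == "METRICS":
            # Assumption: the formula is the last whitespace-separated token.
            name, formula = line.rsplit(None, 1)
            group["metrics"].append((name, formula))
    return group


DATA_GROUP = """\
SHORT Load to store ratio

EVENTSET
PMC0 INST_SPEC
PMC1 CPU_CYCLES
PMC2 LD_SPEC
PMC3 ST_SPEC

METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
Load to store ratio PMC2/PMC3
Load ratio PMC2/PMC0
Store ratio PMC3/PMC0

LONG
Formulas:
CPI = CPU_CYCLES/INST_SPEC
-
This is a metric to determine your load to store ratio.
"""

group = parse_group(DATA_GROUP)
print(group["short"])
print(group["eventset"])
print(group["metrics"])
```

Splitting each metric line from the right keeps names such as `Runtime (RDTSC) [s]` intact even though they contain spaces.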
26
collectors/likwid/groups/arm64fx/FLOPS_DP.txt
Normal file
@@ -0,0 +1,26 @@
SHORT Double Precision MFLOP/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC3 FP_DP_FIXED_OPS_SPEC
PMC4 FP_DP_SCALE_OPS_SPEC

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
DP (FP) [MFLOP/s] 1E-06*(PMC3)/time
DP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC4*128)/128)+PMC3)/time
DP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC4*256)/128)+PMC3)/time
DP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC4*512)/128)+PMC3)/time

LONG
Formulas:
DP (FP) [MFLOP/s] = 1E-06*FP_DP_FIXED_OPS_SPEC/time
DP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*128)/128))/time
DP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*256)/128))/time
DP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*512)/128))/time
-
Double-precision FP rate for scalar and SVE vector operations with different widths. The events for
the SVE metrics assume that all vector elements are active.
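The three SVE metrics differ only in the factor applied to the scalable-operations count (vector width divided by 128). A minimal Python sketch of that scaling follows; `dp_mflops` and the sample counts are hypothetical, and the code simply mirrors the group formulas rather than any LIKWID internals.

```python
def dp_mflops(fixed_ops, scale_ops, runtime_s, vector_bits=512):
    """DP MFLOP/s: fixed-width ops plus scalable ops scaled by vector_bits/128."""
    if vector_bits not in (128, 256, 512):
        raise ValueError("the group file only defines metrics for 128, 256 and 512 bits")
    flops = fixed_ops + (scale_ops * vector_bits) / 128
    return 1e-6 * flops / runtime_s


# Hypothetical counter readings for a 2-second measurement (not measured data).
fixed_ops = 2_000_000   # stands in for FP_DP_FIXED_OPS_SPEC
scale_ops = 4_000_000   # stands in for FP_DP_SCALE_OPS_SPEC
for bits in (128, 256, 512):
    rate = dp_mflops(fixed_ops, scale_ops, runtime_s=2.0, vector_bits=bits)
    print(f"DP (FP+SVE{bits}) [MFLOP/s]: {rate:.2f}")
```

Only the metric whose width matches the vector length actually in use on the machine is meaningful; the other two are reported because the group file cannot know that length.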
26
collectors/likwid/groups/arm64fx/FLOPS_HP.txt
Normal file
@@ -0,0 +1,26 @@
SHORT Half-Precision MFLOP/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC3 FP_HP_FIXED_OPS_SPEC
PMC4 FP_HP_SCALE_OPS_SPEC

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
HP (FP) [MFLOP/s] 1E-06*(PMC3)/time
HP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC4*128)/128)+PMC3)/time
HP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC4*256)/128)+PMC3)/time
HP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC4*512)/128)+PMC3)/time

LONG
Formulas:
HP (FP) [MFLOP/s] = 1E-06*FP_HP_FIXED_OPS_SPEC/time
HP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*128)/128))/time
HP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*256)/128))/time
HP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*512)/128))/time
-
Half-precision FP rate for scalar and SVE vector operations with different widths. The events for
the SVE metrics assume that all vector elements are active.
26
collectors/likwid/groups/arm64fx/FLOPS_SP.txt
Normal file
@@ -0,0 +1,26 @@
SHORT Single Precision MFLOP/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC3 FP_SP_FIXED_OPS_SPEC
PMC4 FP_SP_SCALE_OPS_SPEC

METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
SP (FP) [MFLOP/s] 1E-06*(PMC3)/time
SP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC4*128)/128)+PMC3)/time
SP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC4*256)/128)+PMC3)/time
SP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC4*512)/128)+PMC3)/time

LONG
Formulas:
SP (FP) [MFLOP/s] = 1E-06*FP_SP_FIXED_OPS_SPEC/time
SP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*128)/128))/time
SP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*256)/128))/time
SP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*512)/128))/time
-
Single-precision FP rate for scalar and SVE vector operations with different widths. The events for
the SVE metrics assume that all vector elements are active.
33
collectors/likwid/groups/arm64fx/FP_PIPE.txt
Normal file
@@ -0,0 +1,33 @@
SHORT Utilization of FP pipelines

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 FLA_VAL
PMC3 FLA_VAL_PRD_CNT
PMC4 FLB_VAL
PMC5 FLB_VAL_PRD_CNT

METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
FP operation pipeline A busy rate [%] (PMC2/PMC1)*100.0
FP pipeline A active element rate [%] (PMC3/(PMC2*16))*100.0
FP operation pipeline B busy rate [%] (PMC4/PMC1)*100.0
FP pipeline B active element rate [%] (PMC5/(PMC4*16))*100.0


LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
FP operation pipeline A busy rate [%] = (FLA_VAL/CPU_CYCLES)*100.0
FP pipeline A active element rate [%] = (FLA_VAL_PRD_CNT/(FLA_VAL*16))*100.0
FP operation pipeline B busy rate [%] = (FLB_VAL/CPU_CYCLES)*100.0
FP pipeline B active element rate [%] = (FLB_VAL_PRD_CNT/(FLB_VAL*16))*100.0
-
FLx_VAL: This event counts valid cycles of the FLx pipeline.
FLx_VAL_PRD_CNT: This event counts the number of 1's in the predicate bits of
                 requests in the FLx pipeline, corrected so that it
                 becomes 16 when all bits are 1.
Each predicate mask therefore has 16 slots, so there are 16 slots per cycle in FLA and
FLB. FLA is partly used by other instructions such as SVE stores.
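To make the two kinds of percentages concrete, here is a small Python sketch of the busy-rate and active-element-rate formulas for a single pipeline. The helper `fp_pipe_metrics` and the counter values are hypothetical; the constant 16 is the number of predicate slots per valid cycle stated in the group description.

```python
def fp_pipe_metrics(cpu_cycles, fl_val, fl_val_prd_cnt, slots_per_cycle=16):
    """Return (busy rate [%], active element rate [%]) for one FP pipeline."""
    # Share of all cycles in which the pipeline held a valid request.
    busy_rate = 100.0 * fl_val / cpu_cycles
    # Share of predicate slots that were active during those valid cycles.
    active_rate = 100.0 * fl_val_prd_cnt / (fl_val * slots_per_cycle)
    return busy_rate, active_rate


# Hypothetical readings for pipeline A (not measured data).
busy, active = fp_pipe_metrics(cpu_cycles=1_000_000,
                               fl_val=400_000,
                               fl_val_prd_cnt=4_800_000)
print(f"FP operation pipeline A busy rate [%]: {busy:.1f}")
print(f"FP pipeline A active element rate [%]: {active:.1f}")
```

A high busy rate combined with a low active element rate would suggest that the pipeline is occupied by heavily predicated (partially masked) vector operations.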
24
collectors/likwid/groups/arm64fx/ICACHE.txt
Normal file
@@ -0,0 +1,24 @@
SHORT Instruction cache miss rate/ratio

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1I_CACHE
PMC3 L1I_CACHE_REFILL


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
L1I request rate PMC2/PMC0
L1I miss rate PMC3/PMC0
L1I miss ratio PMC3/PMC2

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L1I request rate = L1I_CACHE / INST_RETIRED
L1I miss rate = L1I_CACHE_REFILL / INST_RETIRED
L1I miss ratio = L1I_CACHE_REFILL / L1I_CACHE
-
This group measures some L1 instruction cache metrics.
40
collectors/likwid/groups/arm64fx/L2.txt
Normal file
@@ -0,0 +1,40 @@
SHORT L2 cache bandwidth in MBytes/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1D_CACHE_REFILL
PMC3 L1D_CACHE_WB
PMC4 L1I_CACHE_REFILL


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
L1D<-L2 load bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
L1D<-L2 load data volume [GBytes] 1.0E-09*(PMC2)*256.0
L1D->L2 evict bandwidth [MBytes/s] 1.0E-06*PMC3*256.0/time
L1D->L2 evict data volume [GBytes] 1.0E-09*PMC3*256.0
L1I<-L2 load bandwidth [MBytes/s] 1.0E-06*PMC4*256.0/time
L1I<-L2 load data volume [GBytes] 1.0E-09*PMC4*256.0
L1<->L2 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3+PMC4)*256.0/time
L1<->L2 data volume [GBytes] 1.0E-09*(PMC2+PMC3+PMC4)*256.0

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L1D<-L2 load bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_REFILL*256.0/time
L1D<-L2 load data volume [GBytes] = 1.0E-09*L1D_CACHE_REFILL*256.0
L1D->L2 evict bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_WB*256.0/time
L1D->L2 evict data volume [GBytes] = 1.0E-09*L1D_CACHE_WB*256.0
L1I<-L2 load bandwidth [MBytes/s] = 1.0E-06*L1I_CACHE_REFILL*256.0/time
L1I<-L2 load data volume [GBytes] = 1.0E-09*L1I_CACHE_REFILL*256.0
L1<->L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*256.0/time
L1<->L2 data volume [GBytes] = 1.0E-09*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*256.0
-
Profiling group to measure L2 cache bandwidth. The bandwidth is computed from the
number of cache lines loaded from the L2 to the L1 data cache and the writebacks from
the L1 data cache to the L2 cache. The group also outputs the total data volume transferred between
L2 and L1. Note that this bandwidth also includes data transfers due to a write
allocate load on a store miss in L1 and cache lines transferred into the L1 instruction
cache.
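Every bandwidth and volume metric in this group is a cache-line count multiplied by the 256-byte line size, optionally divided by the runtime. The Python sketch below works through the combined L1<->L2 numbers; `l2_traffic` and the sample counts are hypothetical and only mirror the formulas above.

```python
CACHE_LINE_BYTES = 256  # line size used by the group formulas


def l2_traffic(l1d_refill, l1d_wb, l1i_refill, runtime_s):
    """Total L1<->L2 bandwidth [MBytes/s] and data volume [GBytes]."""
    lines = l1d_refill + l1d_wb + l1i_refill
    volume_bytes = lines * CACHE_LINE_BYTES
    return {
        "bandwidth_mbytes_s": 1e-6 * volume_bytes / runtime_s,
        "volume_gbytes": 1e-9 * volume_bytes,
    }


# Hypothetical counter readings for a 2-second measurement (not measured data).
result = l2_traffic(l1d_refill=3_000_000, l1d_wb=900_000,
                    l1i_refill=100_000, runtime_s=2.0)
print(f"L1<->L2 bandwidth [MBytes/s]: {result['bandwidth_mbytes_s']:.1f}")
print(f"L1<->L2 data volume [GBytes]: {result['volume_gbytes']:.3f}")
```

The units are decimal (1 MByte = 10^6 bytes), matching the 1.0E-06 and 1.0E-09 factors in the group file.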
29
collectors/likwid/groups/arm64fx/MEM.txt
Normal file
@@ -0,0 +1,29 @@
SHORT Main memory bandwidth in MBytes/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0

LONG
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
-
Profiling group to measure memory bandwidth. The cache line size is 256 bytes.
50
collectors/likwid/groups/arm64fx/MEM_DP.txt
Normal file
@@ -0,0 +1,50 @@
SHORT Overview of arithmetic and main memory performance

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM
PMC4 FP_DP_FIXED_OPS_SPEC
PMC5 FP_DP_SCALE_OPS_SPEC


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
DP (FP) [MFLOP/s] 1E-06*(PMC4)/time
DP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC5*128.0)/128.0)+PMC4)/time
DP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC5*256.0)/128.0)+PMC4)/time
DP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC5*512.0)/128.0)+PMC4)/time
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
Operational intensity (FP) PMC4/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE128) (((PMC5*128.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE256) (((PMC5*256.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE512) (((PMC5*512.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)


LONG
Formulas:
DP (FP) [MFLOP/s] = 1E-06*FP_DP_FIXED_OPS_SPEC/time
DP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*128)/128))/time
DP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*256)/128))/time
DP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*512)/128))/time
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
Operational intensity (FP) = FP_DP_FIXED_OPS_SPEC/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE128) = (FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*128)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE256) = (FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*256)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE512) = (FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*512)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
-
Profiling group to measure memory bandwidth and double-precision FP rate for scalar and SVE vector
operations with different widths. The events for the SVE metrics assume that all vector elements
are active. The cache line size is 256 bytes.
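Operational intensity relates the floating-point work to the bytes moved to and from main memory, which is the quantity a roofline-style analysis needs. The following Python sketch mirrors the group formula; `operational_intensity` and the sample counts are hypothetical, and the result is in FLOP per byte.

```python
def operational_intensity(fixed_ops, scale_ops, bus_reads, bus_writes,
                          vector_bits=512, line_bytes=256):
    """FLOP per byte of main-memory traffic, following the MEM_DP formulas."""
    # Scalable ops are scaled by vector_bits/128, as in the FLOP rate metrics.
    flops = fixed_ops + (scale_ops * vector_bits) / 128
    # Each bus read or write is counted as one full cache line.
    bytes_moved = (bus_reads + bus_writes) * line_bytes
    return flops / bytes_moved


# Hypothetical counter readings (not measured data).
oi = operational_intensity(fixed_ops=0, scale_ops=1_024_000,
                           bus_reads=3_000, bus_writes=1_000)
print(f"Operational intensity (FP+SVE512): {oi:.2f} FLOP/byte")
```

Because the runtime cancels out, the operational intensity can be computed from the raw counts alone; a code with no memory traffic at all would make the denominator zero, which a production version would have to guard against.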
50
collectors/likwid/groups/arm64fx/MEM_HP.txt
Normal file
@@ -0,0 +1,50 @@
SHORT Overview of arithmetic and main memory performance

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM
PMC4 FP_HP_FIXED_OPS_SPEC
PMC5 FP_HP_SCALE_OPS_SPEC


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
HP (FP) [MFLOP/s] 1E-06*(PMC4)/time
HP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC5*128.0)/128.0)+PMC4)/time
HP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC5*256.0)/128.0)+PMC4)/time
HP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC5*512.0)/128.0)+PMC4)/time
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
Operational intensity (FP) PMC4/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE128) (((PMC5*128.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE256) (((PMC5*256.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE512) (((PMC5*512.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)


LONG
Formulas:
HP (FP) [MFLOP/s] = 1E-06*FP_HP_FIXED_OPS_SPEC/time
HP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*128)/128))/time
HP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*256)/128))/time
HP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*512)/128))/time
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
Operational intensity (FP) = FP_HP_FIXED_OPS_SPEC/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE128) = (FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*128)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE256) = (FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*256)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE512) = (FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*512)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
-
Profiling group to measure memory bandwidth and half-precision FP rate for scalar and SVE vector
operations with different widths. The events for the SVE metrics assume that all vector elements
are active. The cache line size is 256 bytes.
50
collectors/likwid/groups/arm64fx/MEM_SP.txt
Normal file
@@ -0,0 +1,50 @@
SHORT Overview of arithmetic and main memory performance

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM
PMC4 FP_SP_FIXED_OPS_SPEC
PMC5 FP_SP_SCALE_OPS_SPEC


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
SP (FP) [MFLOP/s] 1E-06*(PMC4)/time
SP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC5*128.0)/128.0)+PMC4)/time
SP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC5*256.0)/128.0)+PMC4)/time
SP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC5*512.0)/128.0)+PMC4)/time
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
Operational intensity (FP) PMC4/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE128) (((PMC5*128.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE256) (((PMC5*256.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE512) (((PMC5*512.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)


LONG
Formulas:
SP (FP) [MFLOP/s] = 1E-06*FP_SP_FIXED_OPS_SPEC/time
SP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*128)/128))/time
SP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*256)/128))/time
SP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*512)/128))/time
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
Operational intensity (FP) = FP_SP_FIXED_OPS_SPEC/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE128) = (FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*128)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE256) = (FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*256)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE512) = (FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*512)/128))/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
-
Profiling group to measure memory bandwidth and single-precision FP rate for scalar and SVE vector
operations with different widths. The events for the SVE metrics assume that all vector elements
are active. The cache line size is 256 bytes.
29
collectors/likwid/groups/arm64fx/PCI.txt
Normal file
@@ -0,0 +1,29 @@
SHORT PCI bandwidth in MBytes/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_PCI
PMC3 BUS_WRITE_TOTAL_PCI


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
PCI read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
PCI read data volume [GBytes] 1.0E-09*(PMC2)*256.0
PCI write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
PCI write data volume [GBytes] 1.0E-09*(PMC3)*256.0
PCI bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
PCI data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0

LONG
Formulas:
PCI read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_PCI)*256.0/runtime
PCI read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_PCI)*256.0
PCI write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_PCI)*256.0/runtime
PCI write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_PCI)*256.0
PCI bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_PCI+BUS_WRITE_TOTAL_PCI)*256.0/runtime
PCI data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_PCI+BUS_WRITE_TOTAL_PCI)*256.0
-
Profiling group to measure PCI bandwidth. The cache line size is 256 bytes.
29
collectors/likwid/groups/arm64fx/TOFU.txt
Normal file
@@ -0,0 +1,29 @@
SHORT TOFU bandwidth in MBytes/s

EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_TOFU
PMC3 BUS_WRITE_TOTAL_TOFU


METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
TOFU read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
TOFU read data volume [GBytes] 1.0E-09*(PMC2)*256.0
TOFU write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
TOFU write data volume [GBytes] 1.0E-09*(PMC3)*256.0
TOFU bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
TOFU data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0

LONG
Formulas:
TOFU read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_TOFU)*256.0/runtime
TOFU read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_TOFU)*256.0
TOFU write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_TOFU)*256.0/runtime
TOFU write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_TOFU)*256.0
TOFU bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_TOFU+BUS_WRITE_TOTAL_TOFU)*256.0/runtime
TOFU data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_TOFU+BUS_WRITE_TOTAL_TOFU)*256.0
-
Profiling group to measure TOFU bandwidth. The cache line size is 256 bytes.