Add likwid collector

This commit is contained in:
Thomas Roehl
2021-03-25 14:47:10 +01:00
parent 4fddcb9741
commit a6ac0c5373
670 changed files with 24926 additions and 0 deletions

View File

@@ -0,0 +1,30 @@
SHORT Branch prediction miss rate/ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BR_PRED
PMC3 BR_MIS_PRED
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
Branch rate PMC2/PMC0
Branch misprediction rate PMC3/PMC0
Branch misprediction ratio PMC3/(PMC2+PMC3)
Instructions per branch PMC0/(PMC2+PMC3)
LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
Branch rate = BR_PRED/INST_RETIRED
Branch misprediction rate = BR_MIS_PRED/INST_RETIRED
Branch misprediction ratio = BR_MIS_PRED/(BR_PRED+BR_MIS_PRED)
Instructions per branch = INSTR_RETIRED_ANY/(BR_PRED+BR_MIS_PRED)
-
The rates state how often in average a branch or a mispredicted branch occured
per instruction retired in total. The Branch misprediction ratio sets directly
into relation what ratio of all branch instruction where mispredicted.
Instructions per branch is 1/Branch rate.

View File

@@ -0,0 +1,24 @@
SHORT Load to store ratio
EVENTSET
PMC0 INST_SPEC
PMC1 CPU_CYCLES
PMC2 LD_SPEC
PMC3 ST_SPEC
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
Load to store ratio PMC2/PMC3
Load ratio PMC2/PMC0
Store ratio PMC3/PMC0
LONG
Formulas:
CPI = CPU_CYCLES/INST_SPEC
Load to store ratio = LD_SPEC / ST_SPEC
Load ratio = LD_SPEC / INST_SPEC
Store ratio = ST_SPEC / INST_SPEC
-
This is a metric to determine your load to store ratio.

View File

@@ -0,0 +1,26 @@
SHORT Double Precision MFLOP/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC3 FP_DP_FIXED_OPS_SPEC
PMC4 FP_DP_SCALE_OPS_SPEC
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
DP (FP) [MFLOP/s] 1E-06*(PMC3)/time
DP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC4*128)/128)+PMC3)/time
DP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC4*256)/128)+PMC3)/time
DP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC4*512)/128)+PMC3)/time
LONG
Formulas:
DP (FP) [MFLOP/s] = 1E-06*FP_DP_FIXED_OPS_SPEC/time
DP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*128)/128))/time
DP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*256)/128))/time
DP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*512)/128))/time
-
Double-precision FP rate for scalar and SVE vector operations with different widths. The events for
the SVE metrics assumes that all vector elements are active.

View File

@@ -0,0 +1,26 @@
SHORT Half-Precision MFLOP/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC3 FP_HP_FIXED_OPS_SPEC
PMC4 FP_HP_SCALE_OPS_SPEC
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
HP (FP) [MFLOP/s] 1E-06*(PMC3)/time
HP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC4*128)/128)+PMC3)/time
HP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC4*256)/128)+PMC3)/time
HP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC4*512)/128)+PMC3)/time
LONG
Formulas:
HP (FP) [MFLOP/s] = 1E-06*FP_HP_FIXED_OPS_SPEC/time
HP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*128)/128))/time
HP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*256)/128))/time
HP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_SPEC+((FP_HP_SCALE_OPS_SPEC*512)/128))/time
-
Half-precision FP rate for scalar and SVE vector operations with different widths. The events for
the SVE metrics assumes that all vector elements are active.

View File

@@ -0,0 +1,26 @@
SHORT Single Precision MFLOP/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC3 FP_SP_FIXED_OPS_SPEC
PMC4 FP_SP_SCALE_OPS_SPEC
METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI PMC1/PMC0
SP (FP) [MFLOP/s] 1E-06*(PMC3)/time
SP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC4*128)/128)+PMC3)/time
SP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC4*256)/128)+PMC3)/time
SP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC4*512)/128)+PMC3)/time
LONG
Formulas:
SP (FP) [MFLOP/s] = 1E-06*FP_SP_FIXED_OPS_SPEC/time
SP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*128)/128))/time
SP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*256)/128))/time
SP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*512)/128))/time
-
Single-precision FP rate for scalar and SVE vector operations with different widths. The events for
the SVE metrics assumes that all vector elements are active.

View File

@@ -0,0 +1,33 @@
SHORT Utilization of FP pipelines
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 FLA_VAL
PMC3 FLA_VAL_PRD_CNT
PMC4 FLB_VAL
PMC5 FLB_VAL_PRD_CNT
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
FP operation pipeline A busy rate [%] (PMC2/PMC1)*100.0
FP pipeline A active element rate [%] (PMC3/(PMC2*16))*100.0
FP operation pipeline B busy rate [%] (PMC4/PMC1)*100.0
FP pipeline B active element rate [%] (PMC5/(PMC4*16))*100.0
LONG
Formulas:
CPI = CPU_CYCLES/INST_SPEC
FP operation pipeline A busy rate [%] = (FLA_VAL/CPU_CYCLES)*100.0
FP pipeline A active element rate [%] = (FLA_VAL_PRD_CNT/(FLA_VAL*16))*100.0
FP operation pipeline B busy rate [%] = (FLB_VAL/CPU_CYCLES)*100.0
FP pipeline B active element rate [%] = (FLB_VAL_PRD_CNT/(FLB_VAL*16))*100.0
-
FLx_VAL: This event counts valid cycles of FLx pipeline.
FLx_VAL_PRD_CNT: This event counts the number of 1's in the predicate bits of
request in FLA pipeline, where it is corrected so that it
becomes 16 when all bits are 1.
So each predicate mask has 16 slots, so there are 16 slots per cycle in FLA and
FLB. FLA is partly used by other instructions like SVE stores.

View File

@@ -0,0 +1,24 @@
SHORT Instruction cache miss rate/ratio
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1I_CACHE
PMC3 L1I_CACHE_REFILL
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
L1I request rate PMC2/PMC0
L1I miss rate PMC3/PMC0
L1I miss ratio PMC3/PMC2
LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L1I request rate = L1I_CACHE / INST_RETIRED
L1I miss rate = L1I_CACHE_REFILL / INST_RETIRED
L1I miss ratio = L1I_CACHE_REFILL / L1I_CACHE
-
This group measures some L1 instruction cache metrics.

View File

@@ -0,0 +1,40 @@
SHORT L2 cache bandwidth in MBytes/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 L1D_CACHE_REFILL
PMC3 L1D_CACHE_WB
PMC4 L1I_CACHE_REFILL
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
L1D<-L2 load bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
L1D<-L2 load data volume [GBytes] 1.0E-09*(PMC2)*256.0
L1D->L2 evict bandwidth [MBytes/s] 1.0E-06*PMC3*256.0/time
L1D->L2 evict data volume [GBytes] 1.0E-09*PMC3*256.0
L1I<-L2 load bandwidth [MBytes/s] 1.0E-06*PMC4*256.0/time
L1I<-L2 load data volume [GBytes] 1.0E-09*PMC4*256.0
L1<->L2 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3+PMC4)*256.0/time
L1<->L2 data volume [GBytes] 1.0E-09*(PMC2+PMC3+PMC4)*256.0
LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L1D<-L2 load bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_REFILL*256.0/time
L1D<-L2 load data volume [GBytes] = 1.0E-09*L1D_CACHE_REFILL*256.0
L1D->L2 evict bandwidth [MBytes/s] = 1.0E-06*L1D_CACHE_WB*256.0/time
L1D->L2 evict data volume [GBytes] = 1.0E-09*L1D_CACHE_WB*256.0
L1I<-L2 load bandwidth [MBytes/s] = 1.0E-06*L1I_CACHE_REFILL*256.0/time
L1I<-L2 load data volume [GBytes] = 1.0E-09*L1I_CACHE_REFILL*256.0
L1<->L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*256.0/time
L1<->L2 data volume [GBytes] = 1.0E-09*(L1D_CACHE_REFILL+L1D_CACHE_WB+L1I_CACHE_REFILL)*256.0
-
Profiling group to measure L2 cache bandwidth. The bandwidth is computed by the
number of cacheline loaded from the L2 to the L1 data cache and the writebacks from
the L1 data cache to the L2 cache. The group also outputs total data volume transfered between
L2 and L1. Note that this bandwidth also includes data transfers due to a write
allocate load on a store miss in L1 and cachelines transfered in the L1 instruction
cache.

View File

@@ -0,0 +1,29 @@
SHORT Main memory bandwidth in MBytes/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
LONG
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
-
Profiling group to measure memory bandwidth. The cache line size is 256 Byte.

View File

@@ -0,0 +1,50 @@
SHORT Overview of arithmetic and main memory performance
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM
PMC4 FP_DP_FIXED_OPS_SPEC
PMC5 FP_DP_SCALE_OPS_SPEC
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
DP (FP) [MFLOP/s] 1E-06*(PMC4)/time
DP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC5*128.0)/128.0)+PMC4)/time
DP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC5*256.0)/128.0)+PMC4)/time
DP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC5*512.0)/128.0)+PMC4)/time
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
Operational intensity (FP) PMC4/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE128) (((PMC5*128.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE256) (((PMC5*256.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE512) (((PMC5*512.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
LONG
Formulas:
DP (FP) [MFLOP/s] = 1E-06*FP_DP_FIXED_OPS_SPEC/time
DP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*128)/128))/time
DP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*256)/128))/time
DP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*512)/128))/time
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
Operational intensity (FP) = FP_DP_FIXED_OPS_SPEC/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE128) = (FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*128)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE256) = (FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*256)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE512) = (FP_DP_FIXED_OPS_SPEC+((FP_DP_SCALE_OPS_SPEC*512)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
-
Profiling group to measure memory bandwidth and double-precision FP rate for scalar and SVE vector
operations with different widths. The events for the SVE metrics assumes that all vector elements
are active. The cache line size is 256 Byte.

View File

@@ -0,0 +1,50 @@
SHORT Overview of arithmetic and main memory performance
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM
PMC4 FP_HP_FIXED_OPS_HPEC
PMC5 FP_HP_SCALE_OPS_HPEC
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
HP (FP) [MFLOP/s] 1E-06*(PMC4)/time
HP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC5*128.0)/128.0)+PMC4)/time
HP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC5*256.0)/128.0)+PMC4)/time
HP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC5*512.0)/128.0)+PMC4)/time
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
Operational intensity (FP) PMC4/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE128) (((PMC5*128.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE256) (((PMC5*256.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE512) (((PMC5*512.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
LONG
Formulas:
HP (FP) [MFLOP/s] = 1E-06*FP_HP_FIXED_OPS_HPEC/time
HP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_HPEC+((FP_HP_SCALE_OPS_HPEC*128)/128))/time
HP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_HPEC+((FP_HP_SCALE_OPS_HPEC*256)/128))/time
HP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_HP_FIXED_OPS_HPEC+((FP_HP_SCALE_OPS_HPEC*512)/128))/time
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
Operational intensity (FP) = FP_HP_FIXED_OPS_HPEC/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE128) = (FP_HP_FIXED_OPS_HPEC+((FP_HP_SCALE_OPS_HPEC*128)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE256) = (FP_HP_FIXED_OPS_HPEC+((FP_HP_SCALE_OPS_HPEC*256)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE512) = (FP_HP_FIXED_OPS_HPEC+((FP_HP_SCALE_OPS_HPEC*512)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
-
Profiling group to measure memory bandwidth and half-precision FP rate for scalar and SVE vector
operations with different widths. The events for the SVE metrics assumes that all vector elements
are active. The cache line size is 256 Byte.

View File

@@ -0,0 +1,50 @@
SHORT Overview of arithmetic and main memory performance
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_MEM
PMC3 BUS_WRITE_TOTAL_MEM
PMC4 FP_SP_FIXED_OPS_SPEC
PMC5 FP_SP_SCALE_OPS_SPEC
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
SP (FP) [MFLOP/s] 1E-06*(PMC4)/time
SP (FP+SVE128) [MFLOP/s] 1E-06*(((PMC5*128.0)/128.0)+PMC4)/time
SP (FP+SVE256) [MFLOP/s] 1E-06*(((PMC5*256.0)/128.0)+PMC4)/time
SP (FP+SVE512) [MFLOP/s] 1E-06*(((PMC5*512.0)/128.0)+PMC4)/time
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC2)*256.0
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC3)*256.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
Memory data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
Operational intensity (FP) PMC4/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE128) (((PMC5*128.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE256) (((PMC5*256.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
Operational intensity (FP+SVE512) (((PMC5*512.0)/128.0)+PMC4)/((PMC2+PMC3)*256.0)
LONG
Formulas:
SP (FP) [MFLOP/s] = 1E-06*FP_SP_FIXED_OPS_SPEC/time
SP (FP+SVE128) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*128)/128))/time
SP (FP+SVE256) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*256)/128))/time
SP (FP+SVE512) [MFLOP/s] = 1.0E-06*(FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*512)/128))/time
Memory read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM)*256.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM)*256.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_MEM)*256.0
Memory bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0/runtime
Memory data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0
Operational intensity (FP) = FP_SP_FIXED_OPS_SPEC/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE128) = (FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*128)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE256) = (FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*256)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
Operational intensity (FP+SVE512) = (FP_SP_FIXED_OPS_SPEC+((FP_SP_SCALE_OPS_SPEC*512)/128)/((BUS_READ_TOTAL_MEM+BUS_WRITE_TOTAL_MEM)*256.0)
-
Profiling group to measure memory bandwidth and single-precision FP rate for scalar and SVE vector
operations with different widths. The events for the SVE metrics assumes that all vector elements
are active. The cache line size is 256 Byte.

View File

@@ -0,0 +1,29 @@
SHORT PCI bandwidth in MBytes/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_PCI
PMC3 BUS_WRITE_TOTAL_PCI
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
PCI read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
PCI read data volume [GBytes] 1.0E-09*(PMC2)*256.0
PCI write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
PCI write data volume [GBytes] 1.0E-09*(PMC3)*256.0
PCI bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
PCI data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
LONG
Formulas:
PCI read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_PCI)*256.0/runtime
PCI read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_PCI)*256.0
PCI write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_PCI)*256.0/runtime
PCI write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_PCI)*256.0
PCI bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_PCI+BUS_WRITE_TOTAL_PCI)*256.0/runtime
PCI data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_PCI+BUS_WRITE_TOTAL_PCI)*256.0
-
Profiling group to measure PCI bandwidth. The cache line size is 256 Byte.

View File

@@ -0,0 +1,29 @@
SHORT TOFU bandwidth in MBytes/s
EVENTSET
PMC0 INST_RETIRED
PMC1 CPU_CYCLES
PMC2 BUS_READ_TOTAL_TOFU
PMC3 BUS_WRITE_TOTAL_TOFU
METRICS
Runtime (RDTSC) [s] time
CPI PMC1/PMC0
TOFU read bandwidth [MBytes/s] 1.0E-06*(PMC2)*256.0/time
TOFU read data volume [GBytes] 1.0E-09*(PMC2)*256.0
TOFU write bandwidth [MBytes/s] 1.0E-06*(PMC3)*256.0/time
TOFU write data volume [GBytes] 1.0E-09*(PMC3)*256.0
TOFU bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3)*256.0/time
TOFU data volume [GBytes] 1.0E-09*(PMC2+PMC3)*256.0
LONG
Formulas:
TOFU read bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_TOFU)*256.0/runtime
TOFU read data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_TOFU)*256.0
TOFU write bandwidth [MBytes/s] = 1.0E-06*(BUS_WRITE_TOTAL_TOFU)*256.0/runtime
TOFU write data volume [GBytes] = 1.0E-09*(BUS_WRITE_TOTAL_TOFU)*256.0
TOFU bandwidth [MBytes/s] = 1.0E-06*(BUS_READ_TOTAL_TOFU+BUS_WRITE_TOTAL_TOFU)*256.0/runtime
TOFU data volume [GBytes] = 1.0E-09*(BUS_READ_TOTAL_TOFU+BUS_WRITE_TOTAL_TOFU)*256.0
-
Profiling group to measure TOFU bandwidth. The cache line size is 256 Byte.