mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2024-12-28 16:19:05 +01:00
60 lines
2.8 KiB
Plaintext
60 lines
2.8 KiB
Plaintext
|
SHORT Overview of arithmetic and main memory performance
|
||
|
|
||
|
EVENTSET
|
||
|
FIXC0 INSTR_RETIRED_ANY
|
||
|
FIXC1 CPU_CLK_UNHALTED_CORE
|
||
|
FIXC2 CPU_CLK_UNHALTED_REF
|
||
|
PWR0 PWR_PKG_ENERGY
|
||
|
PWR3 PWR_DRAM_ENERGY
|
||
|
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE
|
||
|
PMC1 FP_ARITH_INST_RETIRED_SCALAR_DOUBLE
|
||
|
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE
|
||
|
MBOX0C1 DRAM_READS
|
||
|
MBOX0C2 DRAM_WRITES
|
||
|
|
||
|
METRICS
|
||
|
Runtime (RDTSC) [s] time
|
||
|
Runtime unhalted [s] FIXC1*inverseClock
|
||
|
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||
|
CPI FIXC1/FIXC0
|
||
|
Energy [J] PWR0
|
||
|
Power [W] PWR0/time
|
||
|
Energy DRAM [J] PWR3
|
||
|
Power DRAM [W] PWR3/time
|
||
|
DP [MFLOP/s] 1.0E-06*(PMC0*2.0+PMC1+PMC2*4.0)/time
|
||
|
AVX DP [MFLOP/s] 1.0E-06*(PMC2*4.0)/time
|
||
|
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2)/time
|
||
|
Scalar [MUOPS/s] 1.0E-06*PMC1/time
|
||
|
Memory load bandwidth [MBytes/s] 1.0E-06*MBOX0C1*64.0/time
|
||
|
Memory load data volume [GBytes] 1.0E-09*MBOX0C1*64.0
|
||
|
Memory evict bandwidth [MBytes/s] 1.0E-06*MBOX0C2*64.0/time
|
||
|
Memory evict data volume [GBytes] 1.0E-09*MBOX0C2*64.0
|
||
|
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX0C2)*64.0/time
|
||
|
Memory data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX0C2)*64.0
|
||
|
Operational intensity (PMC0*2.0+PMC1+PMC2*4.0)/((MBOX0C1+MBOX0C2)*64.0)
|
||
|
|
||
|
LONG
|
||
|
Formulas:
|
||
|
Power [W] = PWR_PKG_ENERGY/runtime
|
||
|
Power DRAM [W] = PWR_DRAM_ENERGY/runtime
|
||
|
DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE*2+FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4)/runtime
|
||
|
AVX DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4)/runtime
|
||
|
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE)/runtime
|
||
|
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_DOUBLE/runtime
|
||
|
Memory read bandwidth [MBytes/s] = 1.0E-06*DRAM_READS*64.0/runtime
|
||
|
Memory read data volume [GBytes] = 1.0E-09*DRAM_READS*64.0
|
||
|
Memory write bandwidth [MBytes/s] = 1.0E-06*DRAM_WRITES*64.0/runtime
|
||
|
Memory write data volume [GBytes] = 1.0E-09*DRAM_WRITES*64.0
|
||
|
Memory bandwidth [MBytes/s] = 1.0E-06*(DRAM_READS+DRAM_WRITES)*64.0/runtime
|
||
|
Memory data volume [GBytes] = 1.0E-09*(DRAM_READS+DRAM_WRITES)*64.0
|
||
|
Operational intensity = (FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE*2+FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4)/(DRAM_READS+DRAM_WRITES)*64.0)
|
||
|
--
|
||
|
Profiling group to measure memory bandwidth drawn by all cores of a socket.
|
||
|
Since this group is based on Uncore events it is only possible to measure on
|
||
|
a per socket base. Also outputs total data volume transferred from main memory.
|
||
|
SSE scalar and packed double precision FLOP rates. Also reports on packed AVX
|
||
|
32b instructions.
|
||
|
The operational intensity is calculated using the FP values of the cores and the
|
||
|
memory data volume of the whole socket. The actual operational intensity for
|
||
|
multiple CPUs can be found in the statistics table in the Sum column.
|