mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2024-12-28 16:19:05 +01:00
SHORT Overview of arithmetic and main memory performance

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PWR0 PWR_PKG_ENERGY
PWR3 PWR_DRAM_ENERGY
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_SINGLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR
MBOX4C0 CAS_COUNT_RD
MBOX4C1 CAS_COUNT_WR
MBOX5C0 CAS_COUNT_RD
MBOX5C1 CAS_COUNT_WR

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Energy [J] PWR0
Power [W] PWR0/time
Energy DRAM [J] PWR3
Power DRAM [W] PWR3/time
SP [MFLOP/s] 1.0E-06*(PMC0*4.0+PMC1+PMC2*8.0+PMC3*16.0)/time
AVX SP [MFLOP/s] 1.0E-06*(PMC2*8.0+PMC3*16.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2+PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0
Operational intensity (PMC0*4.0+PMC1+PMC2*8.0+PMC3*16.0)/((MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0)

LONG
Formulas:
Power [W] = PWR_PKG_ENERGY/runtime
Power DRAM [W] = PWR_DRAM_ENERGY/runtime
SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE*4+FP_ARITH_INST_RETIRED_SCALAR_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
AVX SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_SINGLE/runtime
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(CAS_COUNT_RD))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(CAS_COUNT_RD))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(CAS_COUNT_WR))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(CAS_COUNT_WR))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(CAS_COUNT_RD)+SUM(CAS_COUNT_WR))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(CAS_COUNT_RD)+SUM(CAS_COUNT_WR))*64.0
Operational intensity = (FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE*4+FP_ARITH_INST_RETIRED_SCALAR_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/((SUM(CAS_COUNT_RD)+SUM(CAS_COUNT_WR))*64.0)
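The formulas above can be checked by evaluating them on raw counter readings. The following Python sketch assumes the raw counts for one socket are already available in a dictionary keyed by the counter names from EVENTSET; the function name `sp_mem_metrics`, the dictionary layout and the sample counts are hypothetical and not part of LIKWID. As in the formulas, each CAS count is treated as one 64-byte cache line.

```python
from typing import Dict

CACHE_LINE_BYTES = 64.0  # each CAS_COUNT_RD/WR event is counted as one 64-byte line


def sp_mem_metrics(c: Dict[str, float], runtime_s: float) -> Dict[str, float]:
    """Hypothetical helper (not a LIKWID API) that evaluates the group's formulas.

    `c` maps counter names (PMC0..PMC3, MBOX0C0..MBOX5C1) to raw counts for one
    socket; this input layout is an assumption made for illustration.
    """
    # Single-precision FLOPs: 128-bit packed = 4, scalar = 1, 256-bit = 8, 512-bit = 16
    flops = c["PMC0"] * 4.0 + c["PMC1"] + c["PMC2"] * 8.0 + c["PMC3"] * 16.0
    avx_flops = c["PMC2"] * 8.0 + c["PMC3"] * 16.0
    # Sum read (C0) and write (C1) CAS counts over the six memory channels
    read_bytes = sum(c[f"MBOX{i}C0"] for i in range(6)) * CACHE_LINE_BYTES
    write_bytes = sum(c[f"MBOX{i}C1"] for i in range(6)) * CACHE_LINE_BYTES
    total_bytes = read_bytes + write_bytes
    # Avoid dividing by zero if no memory traffic was recorded
    oi = flops / total_bytes if total_bytes > 0 else float("nan")
    return {
        "SP [MFLOP/s]": 1.0e-06 * flops / runtime_s,
        "AVX SP [MFLOP/s]": 1.0e-06 * avx_flops / runtime_s,
        "Packed [MUOPS/s]": 1.0e-06 * (c["PMC0"] + c["PMC2"] + c["PMC3"]) / runtime_s,
        "Scalar [MUOPS/s]": 1.0e-06 * c["PMC1"] / runtime_s,
        "Memory read bandwidth [MBytes/s]": 1.0e-06 * read_bytes / runtime_s,
        "Memory write bandwidth [MBytes/s]": 1.0e-06 * write_bytes / runtime_s,
        "Memory bandwidth [MBytes/s]": 1.0e-06 * total_bytes / runtime_s,
        "Memory data volume [GBytes]": 1.0e-09 * total_bytes,
        # FLOPs per byte moved between the socket and main memory
        "Operational intensity": oi,
    }


# Made-up counts for illustration only
counts = {"PMC0": 0.0, "PMC1": 1.0e9, "PMC2": 0.0, "PMC3": 1.0e9}
counts.update({f"MBOX{i}C0": 1.0e8 for i in range(6)})
counts.update({f"MBOX{i}C1": 5.0e7 for i in range(6)})
for name, value in sp_mem_metrics(counts, runtime_s=2.0).items():
    print(f"{name}: {value:.4f}")
```

The operational intensity divides all single-precision FLOPs by all bytes moved to and from main memory, so a value below 1 means fewer than one FLOP per byte of DRAM traffic.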
--
Profiling group to measure memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events, it can only measure on a per-socket
basis. It also outputs the total data volume transferred from main memory, the
SSE scalar and packed single-precision FLOP rates, and a separate FLOP rate for
packed AVX and AVX-512 (256-bit and 512-bit) single-precision instructions.
The operational intensity is calculated using the FP values of the cores and the
memory data volume of the whole socket. For measurements spanning multiple CPUs,
the actual operational intensity is reported in the Sum column of the statistics
table.
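The last paragraph relies on a simple identity: if every core's operational intensity is its own FLOP count divided by the same socket-wide data volume, then summing those per-core values gives the socket's total FLOPs divided by its data volume. A minimal Python sketch of that identity follows, for a single socket only; the function names `per_core_oi` and `socket_oi` and all numbers are hypothetical illustrations, not LIKWID output or LIKWID's internal implementation.

```python
from typing import List


def per_core_oi(core_flops: List[float], socket_bytes: float) -> List[float]:
    """Hypothetical: each core's FLOPs divided by the socket-wide byte count."""
    return [f / socket_bytes for f in core_flops]


def socket_oi(core_flops: List[float], socket_bytes: float) -> float:
    """Hypothetical: all FLOPs of the socket divided by the same byte count."""
    return sum(core_flops) / socket_bytes


# Made-up per-core single-precision FLOP counts for one socket
flops = [4.0e9, 2.0e9, 1.0e9, 1.0e9]
bytes_moved = 1.6e10  # made-up socket-wide memory data volume in bytes

# Summing the per-core values (as a "Sum" column does) yields the socket value,
# because every term shares the same denominator.
print(sum(per_core_oi(flops, bytes_moved)))  # 0.5
print(socket_oi(flops, bytes_moved))         # 0.5
```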