mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2024-11-10 12:37:25 +01:00
SHORT Overview of arithmetic and main memory performance

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PWR0 PWR_PKG_ENERGY
PWR3 PWR_DRAM_ENERGY
PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE
PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE
PMC2 SIMD_FP_256_PACKED_SINGLE
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Energy [J] PWR0
Power [W] PWR0/time
Energy DRAM [J] PWR3
Power DRAM [W] PWR3/time
MFLOP/s 1.0E-06*(PMC0*4.0+PMC1+PMC2*8.0)/time
AVX [MFLOP/s] 1.0E-06*(PMC2*8.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0
Operational intensity (PMC0*4.0+PMC1+PMC2*8.0)/((MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0)

LONG
Formulas:
Power [W] = PWR_PKG_ENERGY/runtime
Power DRAM [W] = PWR_DRAM_ENERGY/runtime
MFLOP/s = 1.0E-06*(FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE*4+FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE+SIMD_FP_256_PACKED_SINGLE*8)/runtime
AVX [MFLOP/s] = 1.0E-06*(SIMD_FP_256_PACKED_SINGLE*8)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE+SIMD_FP_256_PACKED_SINGLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE/runtime
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC1))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(MBOXxC1))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0
Operational intensity = (FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE*4+FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE+SIMD_FP_256_PACKED_SINGLE*8)/((SUM(MBOXxC0)+SUM(MBOXxC1))*64.0)
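The formulas above can be evaluated by hand from raw counter values. The following is a minimal sketch, not LIKWID code: the function name mem_sp_metrics and the dictionary layout (raw values keyed by the EVENTSET register names, e.g. "PMC0", "MBOX2C1") are hypothetical, and PWR0/PWR3 are assumed to be already converted to joules, as the Energy [J] metrics imply.

```python
# Hypothetical helper (not part of LIKWID): evaluates the MEM_SP formulas for
# one socket from raw counter values keyed by the EVENTSET register names.
CACHE_LINE_BYTES = 64.0  # each CAS_COUNT_RD/WR event stands for one 64-byte line


def mem_sp_metrics(c: dict[str, float], runtime: float) -> dict[str, float]:
    # Weighted FLOPs: 4 per packed SSE uop, 1 per scalar uop, 8 per 256-bit AVX uop
    flops = c["PMC0"] * 4.0 + c["PMC1"] + c["PMC2"] * 8.0
    # SUM(MBOXxC0) and SUM(MBOXxC1) over the four memory channels MBOX0..MBOX3
    rd_bytes = sum(c[f"MBOX{i}C0"] for i in range(4)) * CACHE_LINE_BYTES
    wr_bytes = sum(c[f"MBOX{i}C1"] for i in range(4)) * CACHE_LINE_BYTES
    total_bytes = rd_bytes + wr_bytes
    return {
        # PWR0/PWR3 assumed to be in joules already (see Energy [J] metrics)
        "Power [W]": c["PWR0"] / runtime,
        "Power DRAM [W]": c["PWR3"] / runtime,
        "MFLOP/s": 1.0e-6 * flops / runtime,
        "AVX [MFLOP/s]": 1.0e-6 * c["PMC2"] * 8.0 / runtime,
        "Packed [MUOPS/s]": 1.0e-6 * (c["PMC0"] + c["PMC2"]) / runtime,
        "Scalar [MUOPS/s]": 1.0e-6 * c["PMC1"] / runtime,
        "Memory read bandwidth [MBytes/s]": 1.0e-6 * rd_bytes / runtime,
        "Memory read data volume [GBytes]": 1.0e-9 * rd_bytes,
        "Memory write bandwidth [MBytes/s]": 1.0e-6 * wr_bytes / runtime,
        "Memory write data volume [GBytes]": 1.0e-9 * wr_bytes,
        "Memory bandwidth [MBytes/s]": 1.0e-6 * total_bytes / runtime,
        "Memory data volume [GBytes]": 1.0e-9 * total_bytes,
        "Operational intensity": flops / total_bytes,
    }


# Usage with made-up counter values for a 2-second run (illustrative only)
counts = {"PMC0": 1.0e9, "PMC1": 2.0e9, "PMC2": 5.0e8, "PWR0": 50.0, "PWR3": 10.0}
for i in range(4):
    counts[f"MBOX{i}C0"] = 2.5e8
    counts[f"MBOX{i}C1"] = 1.25e8
for name, value in mem_sp_metrics(counts, 2.0).items():
    print(f"{name}: {value:.4g}")
```

Keeping the weighted FLOP count and the byte counts as named intermediates makes it visible that MFLOP/s, the bandwidths and the operational intensity are all built from the same two quantities.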
--
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events, it can only measure on a per-socket
basis. It also outputs the total data volume transferred from main memory.
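Because the MBOX counters live in the Uncore, the data volume exists only as a per-socket figure; a node-level total has to be assembled by adding up sockets. A minimal sketch, assuming four memory channels per socket as in the EVENTSET (MBOX0 to MBOX3); the function names and the input layout are hypothetical.

```python
# Hypothetical sketch: memory traffic is only visible per socket (Uncore MBOX
# counters), so node-level totals are built by summing socket values.
BYTES_PER_CAS = 64.0  # one cache line per CAS_COUNT_RD/WR event


def socket_volume_gbytes(cas_rd: list[float], cas_wr: list[float]) -> float:
    """Data volume of one socket from its per-channel CAS counts."""
    return 1.0e-9 * (sum(cas_rd) + sum(cas_wr)) * BYTES_PER_CAS


def node_volume_gbytes(sockets: list[tuple[list[float], list[float]]]) -> float:
    """Total main-memory data volume of all sockets in a node."""
    return sum(socket_volume_gbytes(rd, wr) for rd, wr in sockets)


# Two sockets, four channels each (made-up counts)
socket0 = ([2.5e8] * 4, [1.25e8] * 4)
socket1 = ([1.0e8] * 4, [5.0e7] * 4)
print(socket_volume_gbytes(*socket0), node_volume_gbytes([socket0, socket1]))
```

There is no per-core breakdown in this model: a socket's value already includes the traffic caused by all of its cores.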
The group reports SSE scalar and packed single-precision FLOP rates, as well as
packed AVX single-precision (32-bit) instructions. Please note that the FLOP
measurements on SandyBridge are currently potentially wrong, so these counters
cannot be trusted at the moment!
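The weights 4, 1 and 8 in the MFLOP/s formula follow from how many 32-bit single-precision values fit into a 128-bit SSE and a 256-bit AVX register. The sketch below derives them under the same assumption the formula makes (one FP operation per lane per counted uop); sp_lanes, FLOP_WEIGHTS and mflops are hypothetical names. Since the weights are applied to the raw counts unchanged, any error in those counts carries straight into the FLOP rate.

```python
# Sketch: where the weights 4, 1 and 8 in the MFLOP/s formula come from,
# assuming one FP operation per single-precision lane per counted uop.
SP_BITS = 32  # width of one single-precision float


def sp_lanes(register_bits: int) -> int:
    """Number of single-precision values in a vector register."""
    return register_bits // SP_BITS


FLOP_WEIGHTS = {
    "FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE": sp_lanes(128),  # 128-bit SSE register
    "FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE": 1,              # one value per uop
    "SIMD_FP_256_PACKED_SINGLE": sp_lanes(256),             # 256-bit AVX register
}


def mflops(counts: dict[str, float], runtime: float) -> float:
    """Weighted FLOP rate; any error in the raw counts propagates linearly."""
    return 1.0e-6 * sum(w * counts[e] for e, w in FLOP_WEIGHTS.items()) / runtime


# Made-up counts for a 2-second run
sample = {
    "FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE": 1.0e9,
    "FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE": 2.0e9,
    "SIMD_FP_256_PACKED_SINGLE": 5.0e8,
}
print(FLOP_WEIGHTS, mflops(sample, 2.0))
```

The Packed [MUOPS/s] metric, by contrast, adds SSE and AVX packed uops without these weights, so it counts instructions rather than FLOPs.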
The operational intensity is calculated using the FP operation counts of the
cores and the memory data volume of the whole socket. The actual operational
intensity for multiple CPUs can be found in the Sum column of the statistics table.
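The socket-level quantity described above is the FLOPs of all cores on a socket divided by that socket's memory data volume, in FLOP/Byte. If every core's value is divided by the same socket volume B, summing the per-core ratios gives (sum of f_i)/B, i.e. the same number; that is one way to read why a summed column can yield the socket figure. Whether LIKWID distributes the socket volume to each core in exactly this way is an assumption here, and the function names are hypothetical.

```python
# Sketch: socket-level operational intensity [FLOP/Byte], combining the FLOPs
# of every core on the socket with the socket's DRAM data volume.


def socket_intensity(core_flops: list[float], socket_bytes: float) -> float:
    """Total FLOPs of the socket's cores divided by the socket's bytes."""
    if socket_bytes <= 0:
        raise ValueError("socket_bytes must be positive")
    return sum(core_flops) / socket_bytes


def summed_per_core_intensity(core_flops: list[float], socket_bytes: float) -> float:
    """Sum of per-core ratios f_i / B, where every core uses the same B."""
    if socket_bytes <= 0:
        raise ValueError("socket_bytes must be positive")
    return sum(f / socket_bytes for f in core_flops)


flops = [1.0e9, 2.0e9, 3.0e9]  # made-up per-core FLOP counts
volume = 1.2e10                # made-up socket data volume in bytes
print(socket_intensity(flops, volume), summed_per_core_intensity(flops, volume))
```

Note that the identity only holds within one socket; across sockets, each with its own volume, the node-level intensity is the total FLOPs divided by the total bytes, not the sum of the per-socket intensities.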