cc-metric-collector/collectors/likwid/groups/sandybridgeEP/MEM_SP.txt

SHORT Overview of arithmetic and main memory performance

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PWR0  PWR_PKG_ENERGY
PWR3  PWR_DRAM_ENERGY
PMC0  FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE
PMC1  FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE
PMC2  SIMD_FP_256_PACKED_SINGLE
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz]  1.E-06*(FIXC1/FIXC2)/inverseClock
CPI  FIXC1/FIXC0
Energy [J]  PWR0
Power [W] PWR0/time
Energy DRAM [J]  PWR3
Power DRAM [W] PWR3/time
MFLOP/s  1.0E-06*(PMC0*4.0+PMC1+PMC2*8.0)/time
AVX [MFLOP/s] 1.0E-06*(PMC2*8.0)/time
Packed [MUOPS/s]   1.0E-06*(PMC0+PMC2)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0
Operational intensity (PMC0*4.0+PMC1+PMC2*8.0)/((MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0)

LONG
Formulas:
Power [W] = PWR_PKG_ENERGY/runtime
Power DRAM [W] = PWR_DRAM_ENERGY/runtime
MFLOP/s = 1.0E-06*(FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE*4+FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE+SIMD_FP_256_PACKED_SINGLE*8)/runtime
AVX [MFLOP/s] = 1.0E-06*(SIMD_FP_256_PACKED_SINGLE*8)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE+SIMD_FP_256_PACKED_SINGLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE/runtime
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC1))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(MBOXxC1))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0
Operational intensity = (FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE*4+FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE+SIMD_FP_256_PACKED_SINGLE*8)/((SUM(MBOXxC0)+SUM(MBOXxC1))*64.0)
--
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events, it can only be measured on a
per-socket basis. It also outputs the total data volume transferred to and from
main memory, the SSE scalar and packed single-precision FLOP rates, and the rate
of packed 32-bit AVX instructions. Please note that the current FLOP
measurements on SandyBridge are potentially wrong, so you cannot trust these
counters at the moment!
The operational intensity is calculated from the FP values of the individual
cores and the memory data volume of the whole socket. The actual operational
intensity for multiple CPUs can be found in the Sum column of the statistics
table.
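
As a rough illustration of how the formulas above combine, the following Python
sketch recomputes a few of the derived metrics from raw counter values. It is
not part of LIKWID; the function name mem_sp_metrics and its parameters are
hypothetical, and the input numbers in the usage note are made up. The factor
64.0 is the bytes-per-CAS factor used in the formulas above.

```python
# Hypothetical helper, not part of LIKWID: mirrors the MEM_SP formulas.
BYTES_PER_CAS = 64.0  # factor applied to every CAS_COUNT_* value in this group


def mem_sp_metrics(pmc0, pmc1, pmc2, cas_rd, cas_wr, runtime):
    """Derive MEM_SP metrics from raw counts.

    pmc0/pmc1/pmc2: packed SSE, scalar SSE and packed AVX single-precision counts
    cas_rd/cas_wr:  per-MBOX CAS_COUNT_RD / CAS_COUNT_WR values (MBOX0..MBOX3)
    runtime:        measurement time in seconds
    """
    # 4 floats per packed SSE op, 1 per scalar op, 8 per packed AVX op
    flops = pmc0 * 4.0 + pmc1 + pmc2 * 8.0
    total_bytes = (sum(cas_rd) + sum(cas_wr)) * BYTES_PER_CAS
    return {
        "MFLOP/s": 1.0e-06 * flops / runtime,
        "Memory bandwidth [MBytes/s]": 1.0e-06 * total_bytes / runtime,
        "Memory data volume [GBytes]": 1.0e-09 * total_bytes,
        # Undefined when no memory traffic was counted
        "Operational intensity": flops / total_bytes if total_bytes else float("nan"),
    }
```

For example, mem_sp_metrics(1_000_000, 2_000_000, 500_000, [250_000] * 4,
[125_000] * 4, 2.0) sums the per-core FLOP count against the socket-wide
traffic, which is the same combination the Sum column provides for multiple
CPUs.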