SHORT Overview of arithmetic and main memory performance

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PWR0 PWR_PKG_ENERGY
PWR3 PWR_DRAM_ENERGY
PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE
PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE
PMC2 SIMD_FP_256_PACKED_DOUBLE
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Energy [J] PWR0
Power [W] PWR0/time
Energy DRAM [J] PWR3
Power DRAM [W] PWR3/time
MFLOP/s 1.0E-06*(PMC0*2.0+PMC1+PMC2*4.0)/time
AVX [MFLOP/s] 1.0E-06*(PMC2*4.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0
Operational intensity (PMC0*2.0+PMC1+PMC2*4.0)/((MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0)

LONG
Formulas:
Power [W] = PWR_PKG_ENERGY/runtime
Power DRAM [W] = PWR_DRAM_ENERGY/runtime
MFLOP/s = 1.0E-06*(FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE*2+FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE+SIMD_FP_256_PACKED_DOUBLE*4)/runtime
AVX [MFLOP/s] = 1.0E-06*(SIMD_FP_256_PACKED_DOUBLE*4)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE+SIMD_FP_256_PACKED_DOUBLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE/runtime
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC1))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(MBOXxC1))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0
Operational intensity = (FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE*2+FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE+SIMD_FP_256_PACKED_DOUBLE*4)/((SUM(MBOXxC0)+SUM(MBOXxC1))*64.0)
--
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Because this group is based on Uncore events, measurements are only possible on
a per-socket basis. The group also outputs the total data volume transferred
from main memory, the SSE scalar and packed double-precision FLOP rates, and the
rate of packed AVX (256-bit) instructions.
Please note that the current FLOP measurements on SandyBridge are potentially
wrong, so you cannot trust these counters at the moment!
The operational intensity is calculated using the FP values of the cores and
the memory data volume of the whole socket. The actual operational intensity
for multiple CPUs can be found in the statistics table in the Sum column.
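
As a worked illustration of the formulas above, the following Python sketch evaluates a few of the derived metrics from raw counter values. The function name `derived_metrics` and all counter values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measured data and not part of the tool itself.

```python
CACHE_LINE_BYTES = 64.0  # the formulas scale each CAS count by 64 bytes


def derived_metrics(counts, time):
    """Evaluate a subset of the METRICS formulas.

    counts: dict mapping counter names (e.g. "PMC0", "MBOX2C1") to raw counts.
    time:   measured runtime in seconds.
    """
    # Packed SSE double counts 2 FLOPs, scalar 1, packed AVX double 4.
    flops = counts["PMC0"] * 2.0 + counts["PMC1"] + counts["PMC2"] * 4.0
    # Sum read (C0) and write (C1) CAS counts over the four MBOX units.
    reads = sum(counts[f"MBOX{i}C0"] for i in range(4))
    writes = sum(counts[f"MBOX{i}C1"] for i in range(4))
    volume_bytes = (reads + writes) * CACHE_LINE_BYTES
    return {
        "MFLOP/s": 1.0e-06 * flops / time,
        "Memory bandwidth [MBytes/s]": 1.0e-06 * volume_bytes / time,
        "Memory data volume [GBytes]": 1.0e-09 * volume_bytes,
        "Operational intensity": flops / volume_bytes,
    }


# Hypothetical counter values for illustration only.
example = {
    "PMC0": 1_000_000, "PMC1": 500_000, "PMC2": 2_000_000,
    "MBOX0C0": 100_000, "MBOX1C0": 100_000,
    "MBOX2C0": 100_000, "MBOX3C0": 100_000,
    "MBOX0C1": 50_000, "MBOX1C1": 50_000,
    "MBOX2C1": 50_000, "MBOX3C1": 50_000,
}
metrics = derived_metrics(example, time=2.0)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

With these made-up inputs the FLOP count is 10.5 million and the data volume is 38.4 million bytes, so the operational intensity comes out at roughly 0.27 FLOP per byte. As the description notes, the FP counts come from the cores while the memory volume is per socket, so the ratio is only meaningful when summed over all cores of the socket.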