SHORT Overview of arithmetic and main memory performance

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PWR0 PWR_PKG_ENERGY
PWR3 PWR_DRAM_ENERGY
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_DOUBLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR
MBOX4C0 CAS_COUNT_RD
MBOX4C1 CAS_COUNT_WR
MBOX5C0 CAS_COUNT_RD
MBOX5C1 CAS_COUNT_WR

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Energy [J] PWR0
Power [W] PWR0/time
Energy DRAM [J] PWR3
Power DRAM [W] PWR3/time
DP [MFLOP/s] 1.0E-06*(PMC0*2.0+PMC1+PMC2*4.0+PMC3*8.0)/time
AVX DP [MFLOP/s] 1.0E-06*(PMC2*4.0+PMC3*8.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2+PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0
Operational intensity (PMC0*2.0+PMC1+PMC2*4.0+PMC3*8.0)/((MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1)*64.0)

LONG
Formulas:
Power [W] = PWR_PKG_ENERGY/runtime
Power DRAM [W] = PWR_DRAM_ENERGY/runtime
DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE*2+FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
AVX DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_DOUBLE/runtime
Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(CAS_COUNT_RD))*64.0/runtime
Memory read data volume [GBytes] = 1.0E-09*(SUM(CAS_COUNT_RD))*64.0
Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(CAS_COUNT_WR))*64.0/runtime
Memory write data volume [GBytes] = 1.0E-09*(SUM(CAS_COUNT_WR))*64.0
Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(CAS_COUNT_RD)+SUM(CAS_COUNT_WR))*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(SUM(CAS_COUNT_RD)+SUM(CAS_COUNT_WR))*64.0
Operational intensity = (FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE*2+FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/((SUM(CAS_COUNT_RD)+SUM(CAS_COUNT_WR))*64.0)
--
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events, it can only be measured on a
per-socket basis. It also outputs the total data volume transferred from main
memory and the scalar and packed double precision FLOP rates, including those
of packed AVX (256-bit and 512-bit) instructions.
The operational intensity is calculated from the FP values of the cores and
the memory data volume of the whole socket. The actual operational intensity
for multiple CPUs can be found in the Sum column of the statistics table.
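
The formulas above can be checked by hand with a short script. The following is
a minimal Python sketch, not part of the group definition: the function names
and the counter values are hypothetical, and it assumes, as the formulas do,
that each CAS_COUNT_RD/CAS_COUNT_WR event corresponds to one 64-byte transfer.

```python
# Illustrative only: recomputes a few metrics of this group from raw counts.
# All names and sample values below are hypothetical, not LIKWID output.

CACHE_LINE_BYTES = 64.0  # bytes moved per CAS_COUNT_RD / CAS_COUNT_WR event


def dp_flops(packed_128, scalar, packed_256, packed_512):
    """Double precision FLOPs: each instruction weighted by its vector width."""
    return packed_128 * 2.0 + scalar + packed_256 * 4.0 + packed_512 * 8.0


def memory_bytes(cas_rd, cas_wr):
    """Bytes transferred, summing CAS counts over all memory channels."""
    return (sum(cas_rd) + sum(cas_wr)) * CACHE_LINE_BYTES


def derived_metrics(packed_128, scalar, packed_256, packed_512,
                    cas_rd, cas_wr, runtime):
    """Return a subset of the group's metrics; assumes nonzero memory traffic."""
    flops = dp_flops(packed_128, scalar, packed_256, packed_512)
    nbytes = memory_bytes(cas_rd, cas_wr)
    return {
        "DP [MFLOP/s]": 1.0e-06 * flops / runtime,
        "Memory bandwidth [MBytes/s]": 1.0e-06 * nbytes / runtime,
        "Memory data volume [GBytes]": 1.0e-09 * nbytes,
        "Operational intensity": flops / nbytes,
    }


# Hypothetical counts: six memory channels, 2 seconds of runtime.
example = derived_metrics(
    packed_128=0, scalar=1_000_000, packed_256=2_000_000, packed_512=500_000,
    cas_rd=[100_000] * 6, cas_wr=[50_000] * 6, runtime=2.0,
)

for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Note that the byte count in the operational intensity covers the whole socket,
so the FLOP counts passed in should be summed over all cores of that socket for
the ratio to be meaningful.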