SHORT L3 cache bandwidth in MBytes/s EVENTSET PMC0 INST_RETIRED PMC1 CPU_CYCLES PMC2 L2D_CACHE_REFILL PMC3 L2D_CACHE_WB PMC4 L2D_CACHE_ALLOCATE METRICS Runtime (RDTSC) [s] time Clock [MHz] 1.E-06*PMC1/time CPI PMC1/PMC0 L3 load bandwidth [MBytes/s] 1.0E-06*(PMC2-PMC4)*64.0/time L3 load data volume [GBytes] 1.0E-09*(PMC2-PMC4)*64.0 L3 evict bandwidth [MBytes/s] 1.0E-06*PMC3*64.0/time L3 evict data volume [GBytes] 1.0E-09*PMC3*64.0 L3 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3-PMC4)*64.0/time L3 data volume [GBytes] 1.0E-09*(PMC2+PMC3-PMC4)*64.0 LONG Formulas: CPI = CPU_CYCLES/INST_RETIRED L3 load bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0/time L3 load data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0 L3 evict bandwidth [MBytes/s] = 1.0E-06*L2D_CACHE_WB*64.0/time L3 evict data volume [GBytes] = 1.0E-09*L2D_CACHE_WB*64.0 L3 bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE))*64.0/time L3 data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE))*64.0 - Profiling group to measure L2 <-> L3 cache bandwidth. The bandwidth is computed by the number of cache lines loaded from the L3 to the L2 data cache and the writebacks from the L2 data cache to the L3 cache. The group also outputs total data volume transfered between L3 and L2. For streaming-stores, the cache lines are allocated in L2, consequently there is no traffic between L3 and L2 in this case. But the L2D_CACHE_REFILL event counts these allocated cache lines, that's why the value of L2D_CACHE_REFILL is reduced by L2D_CACHE_ALLOCATE.