SHORT  L3 cache bandwidth in MBytes/s

EVENTSET
PMC0  INST_RETIRED
PMC1  CPU_CYCLES
PMC2  L2D_CACHE_REFILL
PMC3  L2D_CACHE_WB
PMC4  L2D_CACHE_ALLOCATE


METRICS
Runtime (RDTSC) [s] time
Clock [MHz] 1.E-06*PMC1/time
CPI  PMC1/PMC0
L3 load bandwidth [MBytes/s]  1.0E-06*(PMC2-PMC4)*64.0/time
L3 load data volume [GBytes]  1.0E-09*(PMC2-PMC4)*64.0
L3 evict bandwidth [MBytes/s]  1.0E-06*PMC3*64.0/time
L3 evict data volume [GBytes]  1.0E-09*PMC3*64.0
L3 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3-PMC4)*64.0/time
L3 data volume [GBytes] 1.0E-09*(PMC2+PMC3-PMC4)*64.0

LONG
Formulas:
CPI = CPU_CYCLES/INST_RETIRED
L3 load bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0/time
L3 load data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL-L2D_CACHE_ALLOCATE)*64.0
L3 evict bandwidth [MBytes/s] = 1.0E-06*L2D_CACHE_WB*64.0/time
L3 evict data volume [GBytes] = 1.0E-09*L2D_CACHE_WB*64.0
L3 bandwidth [MBytes/s] = 1.0E-06*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE))*64.0/time
L3 data volume [GBytes] = 1.0E-09*(L2D_CACHE_REFILL+L2D_CACHE_WB-L2D_CACHE_ALLOCATE))*64.0
-
Profiling group to measure L2 <-> L3 cache bandwidth. The bandwidth is computed by the
number of cache lines loaded from the L3 to the L2 data cache and the writebacks from
the L2 data cache to the L3 cache. The group also outputs total data volume transfered between
L3 and L2. For streaming-stores, the cache lines are allocated in L2, consequently there
is no traffic between L3 and L2 in this case. But the L2D_CACHE_REFILL event counts these
allocated cache lines, that's why the value of L2D_CACHE_REFILL is reduced
by L2D_CACHE_ALLOCATE.