mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2024-11-10 12:37:25 +01:00
49 lines
2.1 KiB
Plaintext
49 lines
2.1 KiB
Plaintext
SHORT L3 cache bandwidth in MBytes/s
|
|
|
|
EVENTSET
|
|
FIXC0 INSTR_RETIRED_ANY
|
|
FIXC1 CPU_CLK_UNHALTED_CORE
|
|
FIXC2 CPU_CLK_UNHALTED_REF
|
|
PMC0 L2_LINES_IN_ALL
|
|
PMC1 L2_TRANS_L2_WB
|
|
PMC2 IDI_MISC_WB_DOWNGRADE
|
|
PMC3 IDI_MISC_WB_UPGRADE
|
|
|
|
|
|
METRICS
|
|
Runtime (RDTSC) [s] time
|
|
Runtime unhalted [s] FIXC1*inverseClock
|
|
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
|
CPI FIXC1/FIXC0
|
|
L3 load bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
|
|
L3 load data volume [GBytes] 1.0E-09*PMC0*64.0
|
|
L3 evict bandwidth [MBytes/s] 1.0E-06*PMC3*64.0/time
|
|
L3 evict data volume [GBytes] 1.0E-09*PMC3*64.0
|
|
L3|MEM evict bandwidth [MBytes/s] 1.0E-06*PMC1*64.0/time
|
|
L3|MEM evict data volume [GBytes] 1.0E-09*PMC1*64.0
|
|
Dropped CLs bandwidth [MBytes/s] 1.0E-6*PMC2*64.0/time
|
|
Dropped CLs data volume [GBytes] 1.0E-9*PMC2*64.0
|
|
L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
|
|
L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
|
|
|
|
LONG
|
|
Formulas:
|
|
L3 load bandwidth [MBytes/s] = 1.0E-06*L2_LINES_IN_ALL*64.0/time
|
|
L3 load data volume [GBytes] = 1.0E-09*L2_LINES_IN_ALL*64.0
|
|
L3 evict bandwidth [MBytes/s] = 1.0E-06*IDI_MISC_WB_UPGRADE*64.0/time
|
|
L3 evict data volume [GBytes] = 1.0E-09*IDI_MISC_WB_UPGRADE*64.0
|
|
Dropped CLs bandwidth [MBytes/s] = 1.0E-6*IDI_MISC_WB_DOWNGRADE*64.0/time
|
|
Dropped CLs data volume [GBytes] = 1.0E-9*IDI_MISC_WB_DOWNGRADE*64.0
|
|
L3|MEM evict bandwidth [MBytes/s] = 1.0E-06*L2_TRANS_L2_WB*64.0/time
|
|
L3|MEM evict data volume [GBytes] = 1.0E-09*L2_TRANS_L2_WB*64.0
|
|
L3 bandwidth [MBytes/s] = 1.0E-06*(L2_LINES_IN_ALL+L2_TRANS_L2_WB)*64/time
|
|
L3 data volume [GBytes] = 1.0E-09*(L2_LINES_IN_ALL+L2_TRANS_L2_WB)*64
|
|
--
|
|
Profiling group to measure L3 cache bandwidth and data volume. For Intel Skylake
|
|
or Cascadelake, the L3 is a victim cache. This means that all data is loaded
|
|
from memory directly into the L2 cache (if L3 prefetcher is inactive). Modified
|
|
data in L2 is evicted to L3 (additional data transfer due to non-inclusivenss of
|
|
L3 can be measured). Clean cache lines (only loaded data) might get dropped in
|
|
L2 to reduce traffic. If amount of clean cache lines is smaller than L3, it
|
|
might be evicted to L3 due to some heuristic.
|