mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2024-11-14 05:57:25 +01:00
40 lines
1.5 KiB
Plaintext
40 lines
1.5 KiB
Plaintext
|
SHORT Overview of arithmetic and main memory performance
|
||
|
|
||
|
EVENTSET
|
||
|
FIXC1 ACTUAL_CPU_CLOCK
|
||
|
FIXC2 MAX_CPU_CLOCK
|
||
|
PMC0 RETIRED_INSTRUCTIONS
|
||
|
PMC1 CPU_CLOCKS_UNHALTED
|
||
|
PMC2 RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL
|
||
|
PMC3 MERGE
|
||
|
DFC0 DATA_FROM_LOCAL_DRAM_CHANNEL
|
||
|
DFC1 DATA_TO_LOCAL_DRAM_CHANNEL
|
||
|
|
||
|
METRICS
|
||
|
Runtime (RDTSC) [s] time
|
||
|
Runtime unhalted [s] FIXC1*inverseClock
|
||
|
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
|
||
|
CPI PMC1/PMC0
|
||
|
DP [MFLOP/s] 1.0E-06*(PMC2)/time
|
||
|
Memory bandwidth [MBytes/s] 1.0E-06*(DFC0+DFC1)*64.0/time
|
||
|
Memory data volume [GBytes] 1.0E-09*(DFC0+DFC1)*64.0
|
||
|
Operational intensity PMC2/((DFC0+DFC1)*64.0)
|
||
|
|
||
|
LONG
|
||
|
Formulas:
|
||
|
DP [MFLOP/s] = 1.0E-06*(RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL)/time
|
||
|
Memory bandwidth [MBytes/s] = 1.0E-06*(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*64.0/runtime
|
||
|
Memory data volume [GBytes] = 1.0E-09*(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*64.0
|
||
|
Operational intensity = RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL/((DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*64.0)
|
||
|
-
|
||
|
Profiling group to measure memory bandwidth drawn by all cores of a socket.
|
||
|
Since this group is based on Uncore events it is only possible to measure on a
|
||
|
per socket base.
|
||
|
The group provides almost accurate results for the total bandwidth and data volume.
|
||
|
AMD describes this metric as "approximate" in the documentation for AMD Rome.
|
||
|
|
||
|
Be aware that despite the events imply a traffic direction (FROM and TO), the events
|
||
|
cannot be used to differentiate between read and write traffic. The events will be
|
||
|
renamed to avoid that confusion in the future.
|
||
|
|