cc-metric-collector/collectors/likwid/groups/zen/MEM_DP.txt

SHORT Overview of arithmetic and main memory performance

EVENTSET
FIXC1 ACTUAL_CPU_CLOCK
FIXC2 MAX_CPU_CLOCK
PMC0  RETIRED_INSTRUCTIONS
PMC1  CPU_CLOCKS_UNHALTED
PMC2  RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL
PMC3  MERGE
DFC0  DATA_FROM_LOCAL_DRAM_CHANNEL
DFC1  DATA_TO_LOCAL_DRAM_CHANNEL

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz]  1.E-06*(FIXC1/FIXC2)/inverseClock
CPI  PMC1/PMC0
DP [MFLOP/s]   1.0E-06*(PMC2)/time
Memory bandwidth [MBytes/s] 1.0E-06*(DFC0+DFC1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(DFC0+DFC1)*64.0
Operational intensity PMC2/((DFC0+DFC1)*64.0)

LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL)/time
Memory bandwidth [MBytes/s] = 1.0E-06*(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*64.0/runtime
Memory data volume [GBytes] = 1.0E-09*(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*64.0
Operational intensity = RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL/((DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*64.0)
-
Profiling group to measure memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events it is only possible to measure on a
per socket base.
The group provides almost accurate results for the total bandwidth and data volume.
AMD describes this metric as "approximate" in the documentation for AMD Rome.

Be aware that despite the events imply a traffic direction (FROM and TO), the events
cannot be used to differentiate between read and write traffic. The events will be
renamed to avoid that confusion in the future.
Add likwid collector 2021-03-25 14:47:10 +01:00			`SHORT Overview of arithmetic and main memory performance`

			`EVENTSET`
			`FIXC1 ACTUAL_CPU_CLOCK`
			`FIXC2 MAX_CPU_CLOCK`
			`PMC0 RETIRED_INSTRUCTIONS`
			`PMC1 CPU_CLOCKS_UNHALTED`
			`PMC2 RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL`
			`PMC3 MERGE`
			`DFC0 DATA_FROM_LOCAL_DRAM_CHANNEL`
			`DFC1 DATA_TO_LOCAL_DRAM_CHANNEL`

			`METRICS`
			`Runtime (RDTSC) [s] time`
			`Runtime unhalted [s] FIXC1*inverseClock`
			`Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock`
			`CPI PMC1/PMC0`
			`DP [MFLOP/s] 1.0E-06*(PMC2)/time`
			`Memory bandwidth [MBytes/s] 1.0E-06(DFC0+DFC1)64.0/time`
			`Memory data volume [GBytes] 1.0E-09(DFC0+DFC1)64.0`
			`Operational intensity PMC2/((DFC0+DFC1)*64.0)`

			`LONG`
			`Formulas:`
			`DP [MFLOP/s] = 1.0E-06*(RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL)/time`
			`Memory bandwidth [MBytes/s] = 1.0E-06(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)64.0/runtime`
			`Memory data volume [GBytes] = 1.0E-09(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)64.0`
			`Operational intensity = RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL/((DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*64.0)`
			`-`
			`Profiling group to measure memory bandwidth drawn by all cores of a socket.`
			`Since this group is based on Uncore events it is only possible to measure on a`
			`per socket base.`
			`The group provides almost accurate results for the total bandwidth and data volume.`
			`AMD describes this metric as "approximate" in the documentation for AMD Rome.`

			`Be aware that despite the events imply a traffic direction (FROM and TO), the events`
			`cannot be used to differentiate between read and write traffic. The events will be`
			`renamed to avoid that confusion in the future.`