cc-metric-collector/collectors/likwid/groups/knl/FLOPS_DP.txt

SHORT Double Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0  UOPS_RETIRED_SCALAR_SIMD
PMC1  UOPS_RETIRED_PACKED_SIMD

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz]  1.E-06*(FIXC1/FIXC2)/inverseClock
CPI  FIXC1/FIXC0
DP [MFLOP/s] (SSE assumed) 1.0E-06*((PMC1*2.0)+PMC0)/time
DP [MFLOP/s] (AVX assumed) 1.0E-06*((PMC1*4.0)+PMC0)/time
DP [MFLOP/s] (AVX512 assumed) 1.0E-06*((PMC1*8.0)+PMC0)/time
Packed [MUOPS/s]   1.0E-06*(PMC1)/time
Scalar [MUOPS/s] 1.0E-06*PMC0/time

LONG
Formulas:
DP [MFLOP/s] (SSE assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*2+UOPS_RETIRED_SCALAR_SIMD)/runtime
DP [MFLOP/s] (AVX assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*4+UOPS_RETIRED_SCALAR_SIMD)/runtime
DP [MFLOP/s] (AVX512 assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*8+UOPS_RETIRED_SCALAR_SIMD)/runtime
Packed [MUOPS/s] = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD)/runtime
Scalar [MUOPS/s] = 1.0E-06*UOPS_RETIRED_SCALAR_SIMD/runtime
-
AVX/SSE scalar and packed double precision FLOP rates. The Xeon Phi (Knights Landing) provides
no possibility to differentiate between double and single precision FLOP/s. Therefore, we only
assume that the printed [MFLOP/s] value is for double-precision code. Moreover, there is no way
to distinguish between SSE, AVX or AVX512 packed SIMD operations. Therefore, this group prints
out the [MFLOP/s] for different SIMD techniques.
WARNING: The events also count for integer arithmetics
Add likwid collector 2021-03-25 14:47:10 +01:00			`SHORT Double Precision MFLOP/s`

			`EVENTSET`
			`FIXC0 INSTR_RETIRED_ANY`
			`FIXC1 CPU_CLK_UNHALTED_CORE`
			`FIXC2 CPU_CLK_UNHALTED_REF`
			`PMC0 UOPS_RETIRED_SCALAR_SIMD`
			`PMC1 UOPS_RETIRED_PACKED_SIMD`

			`METRICS`
			`Runtime (RDTSC) [s] time`
			`Runtime unhalted [s] FIXC1*inverseClock`
			`Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock`
			`CPI FIXC1/FIXC0`
			`DP [MFLOP/s] (SSE assumed) 1.0E-06((PMC12.0)+PMC0)/time`
			`DP [MFLOP/s] (AVX assumed) 1.0E-06((PMC14.0)+PMC0)/time`
			`DP [MFLOP/s] (AVX512 assumed) 1.0E-06((PMC18.0)+PMC0)/time`
			`Packed [MUOPS/s] 1.0E-06*(PMC1)/time`
			`Scalar [MUOPS/s] 1.0E-06*PMC0/time`

			`LONG`
			`Formulas:`
			`DP [MFLOP/s] (SSE assumed) = 1.0E-06(UOPS_RETIRED_PACKED_SIMD2+UOPS_RETIRED_SCALAR_SIMD)/runtime`
			`DP [MFLOP/s] (AVX assumed) = 1.0E-06(UOPS_RETIRED_PACKED_SIMD4+UOPS_RETIRED_SCALAR_SIMD)/runtime`
			`DP [MFLOP/s] (AVX512 assumed) = 1.0E-06(UOPS_RETIRED_PACKED_SIMD8+UOPS_RETIRED_SCALAR_SIMD)/runtime`
			`Packed [MUOPS/s] = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD)/runtime`
			`Scalar [MUOPS/s] = 1.0E-06*UOPS_RETIRED_SCALAR_SIMD/runtime`
			`-`
			`AVX/SSE scalar and packed double precision FLOP rates. The Xeon Phi (Knights Landing) provides`
			`no possibility to differentiate between double and single precision FLOP/s. Therefore, we only`
			`assume that the printed [MFLOP/s] value is for double-precision code. Moreover, there is no way`
			`to distinguish between SSE, AVX or AVX512 packed SIMD operations. Therefore, this group prints`
			`out the [MFLOP/s] for different SIMD techniques.`
			`WARNING: The events also count for integer arithmetics`