SHORT Utilization of FP pipelines EVENTSET PMC0 INST_RETIRED PMC1 CPU_CYCLES PMC2 FLA_VAL PMC3 FLA_VAL_PRD_CNT PMC4 FLB_VAL PMC5 FLB_VAL_PRD_CNT METRICS Runtime (RDTSC) [s] time CPI PMC1/PMC0 FP operation pipeline A busy rate [%] (PMC2/PMC1)*100.0 FP pipeline A active element rate [%] (PMC3/(PMC2*16))*100.0 FP operation pipeline B busy rate [%] (PMC4/PMC1)*100.0 FP pipeline B active element rate [%] (PMC5/(PMC4*16))*100.0 LONG Formulas: CPI = CPU_CYCLES/INST_SPEC FP operation pipeline A busy rate [%] = (FLA_VAL/CPU_CYCLES)*100.0 FP pipeline A active element rate [%] = (FLA_VAL_PRD_CNT/(FLA_VAL*16))*100.0 FP operation pipeline B busy rate [%] = (FLB_VAL/CPU_CYCLES)*100.0 FP pipeline B active element rate [%] = (FLB_VAL_PRD_CNT/(FLB_VAL*16))*100.0 - FLx_VAL: This event counts valid cycles of FLx pipeline. FLx_VAL_PRD_CNT: This event counts the number of 1's in the predicate bits of request in FLA pipeline, where it is corrected so that it becomes 16 when all bits are 1. So each predicate mask has 16 slots, so there are 16 slots per cycle in FLA and FLB. FLA is partly used by other instructions like SVE stores.