iwia021h@testfront1:~/MD-Bench/asm$ iaca -arch SKX force_soa_geq1200_markers.o Intel(R) Architecture Code Analyzer Version - v3.0-28-g1ba2cbb build date: 2017-10-23;16:42:45 Analyzed File - force_soa_geq1200_markers.o Binary Format - 64Bit Architecture - SKX Analysis Type - Throughput Throughput Analysis Report -------------------------- Block Throughput: 31.47 Cycles Throughput Bottleneck: Backend Loop Count: 22 Port Binding In Cycles Per Iteration: -------------------------------------------------------------------------------------------------- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | -------------------------------------------------------------------------------------------------- | Cycles | 18.0 0.0 | 3.0 | 13.0 13.0 | 13.0 13.0 | 0.0 | 18.0 | 3.0 | 0.0 | -------------------------------------------------------------------------------------------------- DV - Divider pipe (on port 0) D - Data fetch pipe (on ports 2 and 3) F - Macro Fusion with the previous instruction occurred * - instruction micro-ops not bound to a port ^ - Micro Fusion occurred # - ESP Tracking sync uop was issued @ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected X - instruction not supported, was not accounted in Analysis | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | ----------------------------------------------------------------------------------------- | 1 | | | | | | 1.0 | | | vpcmpgtd k5, ymm3, ymm4 | 1 | | 1.0 | | | | | | | vpaddd ymm4, ymm4, ymm18 | 2 | | 1.0 | 1.0 1.0 | | | | | | vmovdqu32 ymm20{k5}{z}, ymmword ptr [rcx+r15*4] | 1* | | | | | | | | | vmovaps zmm22, zmm19 | 1 | | 1.0 | | | | | | | add r15, 0x8 | 1 | 1.0 | | | | | | | | kmovw k2, k5 | 1* | | | | | | | | | vmovaps zmm21, zmm19 | 1 | 1.0 | | | | | | | | kmovw k1, k5 | 1* | | | | | | | | | vmovaps zmm23, zmm19 | 1 | 1.0 | | | | | | | | kmovw k3, k5 | 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm23, k3, zmmword ptr [rsi+ymm20*8] | 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm22, k2, zmmword ptr [rax+ymm20*8] | 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm21, k1, zmmword ptr [rdx+ymm20*8] | 1 | | | | | | 1.0 | | | vsubpd zmm0, zmm5, zmm22 | 1 | | | | | | 1.0 | | | vsubpd zmm1, zmm2, zmm21 | 1 | | | | | | 1.0 | | | vsubpd zmm21, zmm6, zmm23 | 1 | 1.0 | | | | | | | | vmulpd zmm20, zmm0, zmm0 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm20, zmm1, zmm1 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm20, zmm21, zmm21 | 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm31, zmm20 | 1 | | | | | | 1.0 | | | vcmppd k6{k5}, zmm20, zmm16, 0x1 | 1 | | | | | | 1.0 | | | vfpclasspd k0, zmm31, 0x1e | 1* | | | | | | | | | vmovaps zmm24, zmm20 | 2^ | 1.0 | | | 1.0 1.0 | | | | | vfnmadd213pd zmm24, zmm31, qword ptr [rip]{1to8} | 1 | 1.0 | | | | | | | | knotw k4, k0 | 1 | | | | | | 1.0 | | | vmulpd zmm25, zmm24, zmm24 | 1 | | | | | | 1.0 | | | vfmadd213pd zmm31{k4}, zmm24, zmm31 | 1 | 1.0 | | | | | | | | vfmadd213pd zmm31{k4}, zmm25, zmm31 | 1 | | | | | | 1.0 | | | vmulpd zmm26, zmm31, zmm15 | 1 | 1.0 | | | | | | | | vmulpd zmm28, zmm31, zmm14 | 1 | | | | | | 1.0 | | | vmulpd zmm29, zmm31, zmm26 | 1 | 1.0 | | | | | | | | vmulpd zmm27, zmm31, zmm29 | 1 | | | | | | 1.0 | | | vfmsub213pd zmm31, zmm29, zmm7 | 1 | 1.0 | | | | | | | | vmulpd zmm30, zmm27, zmm28 | 1 | | | | | | 1.0 | | | vmulpd zmm24, zmm30, zmm31 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm13{k6}, zmm24, zmm1 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm12{k6}, zmm24, zmm0 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm11{k6}, zmm24, zmm21 | 1* | | | | | | | | | cmp r15, r14 | 0*F | | | | | | | | | jb 0xffffffffffffff19 Total Num Of Uops: 55 Analysis Notes: Backend allocation was stalled due to unavailable allocation resources. There were bubbles in the frontend.