Intel(R) Architecture Code Analyzer Version - v3.0-28-g1ba2cbb build date: 2017-10-23;16:42:45 Analyzed File - gromacs-icc-avx512-dp.o Binary Format - 64Bit Architecture - SKX Analysis Type - Throughput Throughput Analysis Report -------------------------- Block Throughput: 62.00 Cycles Throughput Bottleneck: Backend Loop Count: 22 Port Binding In Cycles Per Iteration: -------------------------------------------------------------------------------------------------- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | -------------------------------------------------------------------------------------------------- | Cycles | 58.0 0.0 | 16.0 | 16.0 15.0 | 16.0 15.0 | 2.0 | 58.0 | 16.0 | 0.0 | -------------------------------------------------------------------------------------------------- DV - Divider pipe (on port 0) D - Data fetch pipe (on ports 2 and 3) F - Macro Fusion with the previous instruction occurred * - instruction micro-ops not bound to a port ^ - Micro Fusion occurred # - ESP Tracking sync uop was issued @ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected X - instruction not supported, was not accounted in Analysis | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | ----------------------------------------------------------------------------------------- | 1 | | | 1.0 1.0 | | | | | | mov edx, dword ptr [r10+rsi*4] | 1 | | | | | | | 1.0 | | inc rsi | 1 | | | | 1.0 1.0 | | | | | vmovups zmm20, zmmword ptr [rsp+0x380] | 1 | | | 1.0 1.0 | | | | | | vmovups zmm25, zmmword ptr [rsp+0x340] | 1 | | | | 1.0 1.0 | | | | | vmovups zmm24, zmmword ptr [rsp+0x1c0] | 1 | | | 1.0 1.0 | | | | | | vmovups zmm23, zmmword ptr [rsp+0x2c0] | 1 | | | | 1.0 1.0 | | | | | vmovups zmm16, zmmword ptr [rsp+0x3c0] | 1 | | | 1.0 1.0 | | | | | | vmovups zmm14, zmmword ptr [rsp+0x300] | 1 | | | | 1.0 1.0 | | | | | vmovups zmm15, zmmword ptr [rsp+0x240] | 1 | | | 1.0 1.0 | | | | | | vmovups zmm12, zmmword ptr [rsp+0x180] | 1 | | | | 1.0 1.0 | | | | | vmovups zmm21, zmmword ptr [rsp+0x200] | 1 | | | 1.0 1.0 | | | | | | vmovups zmm18, zmmword ptr [rsp+0x140] | 1 | | | | 1.0 1.0 | | | | | vmovups zmm22, zmmword ptr [rsp+0x100] | 1 | | | 1.0 1.0 | | | | | | vmovups zmm17, zmmword ptr [rsp+0x280] | 1 | | 1.0 | | | | | | | lea r12d, ptr [rdx+rdx*2] | 1 | | | | | | | 1.0 | | shl r12d, 0x3 | 1 | | 1.0 | | | | | | | lea r13d, ptr [rdx+rdx*1] | 1 | | | | | | | 1.0 | | movsxd r12, r12d | 1 | | 1.0 | | | | | | | cmp r13d, r11d | 1 | | 1.0 | | | | | | | lea eax, ptr [rdx+rdx*1+0x1] | 1 | | | | | | | 1.0 | | mov edx, 0x0 | 1 | | | | | | | 1.0 | | setz dl | 1 | | 1.0 | | | | | | | cmp eax, r11d | 1 | | | | | | | 1.0 | | mov eax, 0x0 | 1* | | | | | | | | | mov r13d, edx | 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm29, zmm20, zmmword ptr [r8+r12*8+0x80] | 1 | | | | | | | 1.0 | | setz al | 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm26, zmm25, zmmword ptr [r8+r12*8+0x80] | 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm25, zmm24, zmmword ptr [r8+r12*8] | 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm24, zmm23, zmmword ptr [r8+r12*8+0x40] | 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm23, zmm16, zmmword ptr [r8+r12*8+0x80] | 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm20, zmm14, zmmword ptr [r8+r12*8+0x80] | 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm27, zmm12, zmmword ptr [r8+r12*8+0x40] | 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm28, zmm15, zmmword ptr [r8+r12*8] | 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm30, zmm21, zmmword ptr [r8+r12*8+0x40] | 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm21, zmm18, zmmword ptr [r8+r12*8+0x40] | 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm31, zmm22, zmmword ptr [r8+r12*8] | 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm22, zmm17, zmmword ptr [r8+r12*8] | 1 | 1.0 | | | | | | | | vmulpd zmm13, zmm29, zmm29 | 1 | | | | | | 1.0 | | | vmulpd zmm15, zmm26, zmm26 | 1 | 1.0 | | | | | | | | vmulpd zmm12, zmm23, zmm23 | 1 | | | | | | 1.0 | | | vmulpd zmm14, zmm20, zmm20 | 1 | | | | 1.0 1.0 | | | | | vmovups zmm16, zmmword ptr [rsp+0xc0] | 1 | 1.0 | | | | | | | | vfmadd231pd zmm13, zmm30, zmm30 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm15, zmm27, zmm27 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm12, zmm24, zmm24 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm14, zmm21, zmm21 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm13, zmm31, zmm31 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm15, zmm28, zmm28 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm12, zmm25, zmm25 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm14, zmm22, zmm22 | 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm19, zmm13 | 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm18, zmm15 | 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm17, zmm12 | 1 | | | | | | 1.0 | | | vcmppd k1, zmm13, zmm16, 0x11 | 1 | | | | | | 1.0 | | | vcmppd k6, zmm15, zmm16, 0x11 | 1 | | | | | | 1.0 | | | vcmppd k7, zmm12, zmm16, 0x11 | 1 | | | | | | 1.0 | | | vcmppd k0, zmm14, zmm16, 0x11 | 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm15, zmm14 | 1 | | | 1.0 1.0 | | | | | | vmovups zmm16, zmmword ptr [rsp+0x40] | 1 | | | | 1.0 1.0 | | | | | vmovups zmm12, zmmword ptr [rsp+0x80] | 1 | 1.0 | | | | | | | | vmulpd zmm13, zmm19, zmm16 | 1 | | | | | | 1.0 | | | vmulpd zmm13, zmm19, zmm13 | 1 | | 1.0 | | | | | | | neg r13d | 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm19, zmm13 | 1* | | | | | | | | | mov r12d, eax | 1 | | | | | | 1.0 | | | vfmsub213pd zmm13, zmm19, zmm1 | 1 | 1.0 | | | | | | | | vmulpd zmm19, zmm19, zmm12 | 1 | | | | | | 1.0 | | | vmulpd zmm13, zmm13, zmm19 | 1 | | 1.0 | | | | | | | add r13d, 0xff | 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm14, zmm13 | 1 | | | | | | | 1.0 | | nop | 1 | | | 1.0 1.0 | | | | | | vmovups zmm13, zmmword ptr [rsp+0x400] | 1 | | | | | | 1.0 | | | vmulpd zmm19, zmm10, zmm14 | 1 | | | | | | | 1.0 | | shl r12d, 0x4 | 1 | | 1.0 | | | | | | | sub r13d, r12d | 1 | | | | | | 1.0 | | | kmovb k5, r13d | 1 | 1.0 | | | | | | | | kmovw r13d, k1 | 1 | 1.0 | | | | | | | | kmovb r12d, k5 | 1 | | | | | | 1.0 | | | kmovb k5, r12d | 1 | | | | | | 1.0 | | | kmovb k1, r13d | 1* | | | | | | | | | mov r13d, eax | 1 | 1.0 | | | | | | | | kandb k5, k5, k1 | 1 | 1.0 | | | | | | | | kmovb r12d, k5 | 1 | | | | | | 1.0 | | | kmovw k5, r12d | 1 | | 1.0 | | | | | | | lea r12d, ptr [rdx+rdx*1] | 1 | 1.0 | | | | | | | | vfmadd231pd zmm9{k5}, zmm19, zmm29 | 1 | | | | | | | 1.0 | | neg r12d | 1 | | | | | | 1.0 | | | vfmadd231pd zmm13{k5}, zmm19, zmm31 | 1 | | | | 1.0 1.0 | | | | | vmovups zmm31, zmmword ptr [rsp+0x440] | 1 | 1.0 | | | | | | | | vmulpd zmm29, zmm18, zmm16 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm31{k5}, zmm19, zmm30 | 2^ | | | 1.0 | | 1.0 | | | | vmovups zmmword ptr [rsp+0x400], zmm13 | 1 | 1.0 | | | | | | | | vmulpd zmm30, zmm18, zmm29 | 2^ | | | | 1.0 | 1.0 | | | | vmovups zmmword ptr [rsp+0x440], zmm31 | 1 | | | | | | 1.0 | | | vmulpd zmm13, zmm18, zmm30 | 1 | 1.0 | | | | | | | | vfmsub213pd zmm30, zmm18, zmm1 | 1 | | | | | | 1.0 | | | vmulpd zmm18, zmm18, zmm12 | 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm30, zmm18 | 1 | | 1.0 | | | | | | | add r12d, 0xff | 1 | | | | | | 1.0 | | | vmulpd zmm19, zmm13, zmm14 | 1 | 1.0 | | | | | | | | vmulpd zmm29, zmm10, zmm19 | 1 | | | | | | | 1.0 | | shl r13d, 0x5 | 1 | | 1.0 | | | | | | | sub r12d, r13d | 1 | | | | | | 1.0 | | | kmovb k1, r12d | 1 | 1.0 | | | | | | | | kmovw r12d, k6 | 1 | 1.0 | | | | | | | | kmovb r13d, k1 | 1 | | | | | | 1.0 | | | kmovb k1, r13d | 1 | | | | | | 1.0 | | | kmovb k6, r12d | 1* | | | | | | | | | mov r12d, eax | 1 | 1.0 | | | | | | | | kandb k1, k1, k6 | 1 | 1.0 | | | | | | | | kmovb r13d, k1 | 1 | | | | | | 1.0 | | | kmovw k1, r13d | 1 | | 1.0 | | | | | | | lea r13d, ptr [rdx*4] | 1 | | | | | | 1.0 | | | vfmadd231pd zmm6{k1}, zmm29, zmm26 | 1 | | | | | | | 1.0 | | neg r13d | 1 | 1.0 | | | | | | | | vfmadd231pd zmm7{k1}, zmm29, zmm27 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm8{k1}, zmm29, zmm28 | 1 | 1.0 | | | | | | | | vmulpd zmm26, zmm17, zmm16 | 1 | | | | | | 1.0 | | | vmulpd zmm28, zmm17, zmm12 | 1 | 1.0 | | | | | | | | vmulpd zmm16, zmm15, zmm16 | 1 | | | | | | 1.0 | | | vmulpd zmm12, zmm15, zmm12 | 1 | 1.0 | | | | | | | | vmulpd zmm27, zmm17, zmm26 | 1 | | | | | | 1.0 | | | vmulpd zmm19, zmm15, zmm16 | 1 | 1.0 | | | | | | | | vmulpd zmm13, zmm17, zmm27 | 1 | | | | | | 1.0 | | | vfmsub213pd zmm27, zmm17, zmm1 | 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm27, zmm28 | 1 | | | | | | | 1.0 | | add r13d, 0xff | 1 | | | | | | 1.0 | | | vmulpd zmm17, zmm13, zmm14 | 1 | | | | | | | 1.0 | | shl edx, 0x3 | 1 | | | | | | | 1.0 | | shl r12d, 0x6 | 1 | | 1.0 | | | | | | | neg edx | 1 | 1.0 | | | | | | | | vmulpd zmm18, zmm10, zmm17 | 1 | | 1.0 | | | | | | | sub r13d, r12d | 1 | | | | | | 1.0 | | | kmovb k6, r13d | 1 | | 1.0 | | | | | | | add edx, 0xff | 1 | | | | | | | 1.0 | | shl eax, 0x7 | 1 | | 1.0 | | | | | | | sub edx, eax | 1 | 1.0 | | | | | | | | kmovb eax, k6 | 1 | | | | | | 1.0 | | | kmovb k6, eax | 1 | 1.0 | | | | | | | | kmovw eax, k7 | 1 | | | | | | 1.0 | | | kmovb k7, eax | 1 | 1.0 | | | | | | | | kandb k7, k6, k7 | 1 | | | | | | 1.0 | | | kmovb k6, edx | 1 | 1.0 | | | | | | | | kmovb edx, k7 | 1 | | | | | | 1.0 | | | kmovw k7, edx | 1 | 1.0 | | | | | | | | kmovw edx, k0 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm11{k7}, zmm18, zmm23 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm4{k7}, zmm18, zmm24 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm5{k7}, zmm18, zmm25 | 1 | 1.0 | | | | | | | | vmulpd zmm23, zmm15, zmm19 | 1 | | | | | | 1.0 | | | vfmsub213pd zmm19, zmm15, zmm1 | 1 | 1.0 | | | | | | | | vmulpd zmm15, zmm19, zmm12 | 1 | | | | | | 1.0 | | | vmulpd zmm24, zmm23, zmm15 | 1 | 1.0 | | | | | | | | vmulpd zmm25, zmm10, zmm24 | 1 | 1.0 | | | | | | | | kmovb eax, k6 | 1 | | | | | | 1.0 | | | kmovb k6, eax | 1 | | | | | | 1.0 | | | kmovb k0, edx | 1 | 1.0 | | | | | | | | kandb k0, k6, k0 | 1 | 1.0 | | | | | | | | kmovb r12d, k0 | 1 | | | | | | 1.0 | | | kmovw k6, r12d | 1 | | | | | | 1.0 | | | vfmadd231pd zmm3{k6}, zmm25, zmm22 | 1 | 1.0 | | | | | | | | vfmadd231pd zmm2{k6}, zmm25, zmm21 | 1 | | | | | | 1.0 | | | vfmadd231pd zmm0{k6}, zmm25, zmm20 | 1* | | | | | | | | | cmp rsi, rdi | 0*F | | | | | | | | | jl 0xfffffffffffffc6f Total Num Of Uops: 187 Analysis Notes: Backend allocation was stalled due to unavailable allocation resources.