MD-Bench/static_analysis/jan/analyses/gromacs-icc-avx512-dp-iaca.out
2023-02-13 14:15:08 +01:00

199 lines
20 KiB
Plaintext

Intel(R) Architecture Code Analyzer Version - v3.0-28-g1ba2cbb build date: 2017-10-23;16:42:45
Analyzed File - gromacs-icc-avx512-dp.o
Binary Format - 64Bit
Architecture - SKX
Analysis Type - Throughput
Throughput Analysis Report
--------------------------
Block Throughput: 62.00 Cycles Throughput Bottleneck: Backend
Loop Count: 22
Port Binding In Cycles Per Iteration:
--------------------------------------------------------------------------------------------------
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------------------------
| Cycles | 58.0 0.0 | 16.0 | 16.0 15.0 | 16.0 15.0 | 2.0 | 58.0 | 16.0 | 0.0 |
--------------------------------------------------------------------------------------------------
DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3)
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion occurred
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected
X - instruction not supported, was not accounted in Analysis
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
-----------------------------------------------------------------------------------------
| 1 | | | 1.0 1.0 | | | | | | mov edx, dword ptr [r10+rsi*4]
| 1 | | | | | | | 1.0 | | inc rsi
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm20, zmmword ptr [rsp+0x380]
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm25, zmmword ptr [rsp+0x340]
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm24, zmmword ptr [rsp+0x1c0]
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm23, zmmword ptr [rsp+0x2c0]
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm16, zmmword ptr [rsp+0x3c0]
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm14, zmmword ptr [rsp+0x300]
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm15, zmmword ptr [rsp+0x240]
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm12, zmmword ptr [rsp+0x180]
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm21, zmmword ptr [rsp+0x200]
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm18, zmmword ptr [rsp+0x140]
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm22, zmmword ptr [rsp+0x100]
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm17, zmmword ptr [rsp+0x280]
| 1 | | 1.0 | | | | | | | lea r12d, ptr [rdx+rdx*2]
| 1 | | | | | | | 1.0 | | shl r12d, 0x3
| 1 | | 1.0 | | | | | | | lea r13d, ptr [rdx+rdx*1]
| 1 | | | | | | | 1.0 | | movsxd r12, r12d
| 1 | | 1.0 | | | | | | | cmp r13d, r11d
| 1 | | 1.0 | | | | | | | lea eax, ptr [rdx+rdx*1+0x1]
| 1 | | | | | | | 1.0 | | mov edx, 0x0
| 1 | | | | | | | 1.0 | | setz dl
| 1 | | 1.0 | | | | | | | cmp eax, r11d
| 1 | | | | | | | 1.0 | | mov eax, 0x0
| 1* | | | | | | | | | mov r13d, edx
| 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm29, zmm20, zmmword ptr [r8+r12*8+0x80]
| 1 | | | | | | | 1.0 | | setz al
| 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm26, zmm25, zmmword ptr [r8+r12*8+0x80]
| 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm25, zmm24, zmmword ptr [r8+r12*8]
| 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm24, zmm23, zmmword ptr [r8+r12*8+0x40]
| 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm23, zmm16, zmmword ptr [r8+r12*8+0x80]
| 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm20, zmm14, zmmword ptr [r8+r12*8+0x80]
| 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm27, zmm12, zmmword ptr [r8+r12*8+0x40]
| 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm28, zmm15, zmmword ptr [r8+r12*8]
| 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm30, zmm21, zmmword ptr [r8+r12*8+0x40]
| 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm21, zmm18, zmmword ptr [r8+r12*8+0x40]
| 2 | 1.0 | | | 1.0 1.0 | | | | | vsubpd zmm31, zmm22, zmmword ptr [r8+r12*8]
| 2 | | | 1.0 1.0 | | | 1.0 | | | vsubpd zmm22, zmm17, zmmword ptr [r8+r12*8]
| 1 | 1.0 | | | | | | | | vmulpd zmm13, zmm29, zmm29
| 1 | | | | | | 1.0 | | | vmulpd zmm15, zmm26, zmm26
| 1 | 1.0 | | | | | | | | vmulpd zmm12, zmm23, zmm23
| 1 | | | | | | 1.0 | | | vmulpd zmm14, zmm20, zmm20
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm16, zmmword ptr [rsp+0xc0]
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm13, zmm30, zmm30
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm15, zmm27, zmm27
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm12, zmm24, zmm24
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm14, zmm21, zmm21
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm13, zmm31, zmm31
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm15, zmm28, zmm28
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm12, zmm25, zmm25
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm14, zmm22, zmm22
| 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm19, zmm13
| 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm18, zmm15
| 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm17, zmm12
| 1 | | | | | | 1.0 | | | vcmppd k1, zmm13, zmm16, 0x11
| 1 | | | | | | 1.0 | | | vcmppd k6, zmm15, zmm16, 0x11
| 1 | | | | | | 1.0 | | | vcmppd k7, zmm12, zmm16, 0x11
| 1 | | | | | | 1.0 | | | vcmppd k0, zmm14, zmm16, 0x11
| 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm15, zmm14
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm16, zmmword ptr [rsp+0x40]
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm12, zmmword ptr [rsp+0x80]
| 1 | 1.0 | | | | | | | | vmulpd zmm13, zmm19, zmm16
| 1 | | | | | | 1.0 | | | vmulpd zmm13, zmm19, zmm13
| 1 | | 1.0 | | | | | | | neg r13d
| 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm19, zmm13
| 1* | | | | | | | | | mov r12d, eax
| 1 | | | | | | 1.0 | | | vfmsub213pd zmm13, zmm19, zmm1
| 1 | 1.0 | | | | | | | | vmulpd zmm19, zmm19, zmm12
| 1 | | | | | | 1.0 | | | vmulpd zmm13, zmm13, zmm19
| 1 | | 1.0 | | | | | | | add r13d, 0xff
| 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm14, zmm13
| 1 | | | | | | | 1.0 | | nop
| 1 | | | 1.0 1.0 | | | | | | vmovups zmm13, zmmword ptr [rsp+0x400]
| 1 | | | | | | 1.0 | | | vmulpd zmm19, zmm10, zmm14
| 1 | | | | | | | 1.0 | | shl r12d, 0x4
| 1 | | 1.0 | | | | | | | sub r13d, r12d
| 1 | | | | | | 1.0 | | | kmovb k5, r13d
| 1 | 1.0 | | | | | | | | kmovw r13d, k1
| 1 | 1.0 | | | | | | | | kmovb r12d, k5
| 1 | | | | | | 1.0 | | | kmovb k5, r12d
| 1 | | | | | | 1.0 | | | kmovb k1, r13d
| 1* | | | | | | | | | mov r13d, eax
| 1 | 1.0 | | | | | | | | kandb k5, k5, k1
| 1 | 1.0 | | | | | | | | kmovb r12d, k5
| 1 | | | | | | 1.0 | | | kmovw k5, r12d
| 1 | | 1.0 | | | | | | | lea r12d, ptr [rdx+rdx*1]
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm9{k5}, zmm19, zmm29
| 1 | | | | | | | 1.0 | | neg r12d
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm13{k5}, zmm19, zmm31
| 1 | | | | 1.0 1.0 | | | | | vmovups zmm31, zmmword ptr [rsp+0x440]
| 1 | 1.0 | | | | | | | | vmulpd zmm29, zmm18, zmm16
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm31{k5}, zmm19, zmm30
| 2^ | | | 1.0 | | 1.0 | | | | vmovups zmmword ptr [rsp+0x400], zmm13
| 1 | 1.0 | | | | | | | | vmulpd zmm30, zmm18, zmm29
| 2^ | | | | 1.0 | 1.0 | | | | vmovups zmmword ptr [rsp+0x440], zmm31
| 1 | | | | | | 1.0 | | | vmulpd zmm13, zmm18, zmm30
| 1 | 1.0 | | | | | | | | vfmsub213pd zmm30, zmm18, zmm1
| 1 | | | | | | 1.0 | | | vmulpd zmm18, zmm18, zmm12
| 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm30, zmm18
| 1 | | 1.0 | | | | | | | add r12d, 0xff
| 1 | | | | | | 1.0 | | | vmulpd zmm19, zmm13, zmm14
| 1 | 1.0 | | | | | | | | vmulpd zmm29, zmm10, zmm19
| 1 | | | | | | | 1.0 | | shl r13d, 0x5
| 1 | | 1.0 | | | | | | | sub r12d, r13d
| 1 | | | | | | 1.0 | | | kmovb k1, r12d
| 1 | 1.0 | | | | | | | | kmovw r12d, k6
| 1 | 1.0 | | | | | | | | kmovb r13d, k1
| 1 | | | | | | 1.0 | | | kmovb k1, r13d
| 1 | | | | | | 1.0 | | | kmovb k6, r12d
| 1* | | | | | | | | | mov r12d, eax
| 1 | 1.0 | | | | | | | | kandb k1, k1, k6
| 1 | 1.0 | | | | | | | | kmovb r13d, k1
| 1 | | | | | | 1.0 | | | kmovw k1, r13d
| 1 | | 1.0 | | | | | | | lea r13d, ptr [rdx*4]
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm6{k1}, zmm29, zmm26
| 1 | | | | | | | 1.0 | | neg r13d
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm7{k1}, zmm29, zmm27
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm8{k1}, zmm29, zmm28
| 1 | 1.0 | | | | | | | | vmulpd zmm26, zmm17, zmm16
| 1 | | | | | | 1.0 | | | vmulpd zmm28, zmm17, zmm12
| 1 | 1.0 | | | | | | | | vmulpd zmm16, zmm15, zmm16
| 1 | | | | | | 1.0 | | | vmulpd zmm12, zmm15, zmm12
| 1 | 1.0 | | | | | | | | vmulpd zmm27, zmm17, zmm26
| 1 | | | | | | 1.0 | | | vmulpd zmm19, zmm15, zmm16
| 1 | 1.0 | | | | | | | | vmulpd zmm13, zmm17, zmm27
| 1 | | | | | | 1.0 | | | vfmsub213pd zmm27, zmm17, zmm1
| 1 | 1.0 | | | | | | | | vmulpd zmm14, zmm27, zmm28
| 1 | | | | | | | 1.0 | | add r13d, 0xff
| 1 | | | | | | 1.0 | | | vmulpd zmm17, zmm13, zmm14
| 1 | | | | | | | 1.0 | | shl edx, 0x3
| 1 | | | | | | | 1.0 | | shl r12d, 0x6
| 1 | | 1.0 | | | | | | | neg edx
| 1 | 1.0 | | | | | | | | vmulpd zmm18, zmm10, zmm17
| 1 | | 1.0 | | | | | | | sub r13d, r12d
| 1 | | | | | | 1.0 | | | kmovb k6, r13d
| 1 | | 1.0 | | | | | | | add edx, 0xff
| 1 | | | | | | | 1.0 | | shl eax, 0x7
| 1 | | 1.0 | | | | | | | sub edx, eax
| 1 | 1.0 | | | | | | | | kmovb eax, k6
| 1 | | | | | | 1.0 | | | kmovb k6, eax
| 1 | 1.0 | | | | | | | | kmovw eax, k7
| 1 | | | | | | 1.0 | | | kmovb k7, eax
| 1 | 1.0 | | | | | | | | kandb k7, k6, k7
| 1 | | | | | | 1.0 | | | kmovb k6, edx
| 1 | 1.0 | | | | | | | | kmovb edx, k7
| 1 | | | | | | 1.0 | | | kmovw k7, edx
| 1 | 1.0 | | | | | | | | kmovw edx, k0
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm11{k7}, zmm18, zmm23
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm4{k7}, zmm18, zmm24
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm5{k7}, zmm18, zmm25
| 1 | 1.0 | | | | | | | | vmulpd zmm23, zmm15, zmm19
| 1 | | | | | | 1.0 | | | vfmsub213pd zmm19, zmm15, zmm1
| 1 | 1.0 | | | | | | | | vmulpd zmm15, zmm19, zmm12
| 1 | | | | | | 1.0 | | | vmulpd zmm24, zmm23, zmm15
| 1 | 1.0 | | | | | | | | vmulpd zmm25, zmm10, zmm24
| 1 | 1.0 | | | | | | | | kmovb eax, k6
| 1 | | | | | | 1.0 | | | kmovb k6, eax
| 1 | | | | | | 1.0 | | | kmovb k0, edx
| 1 | 1.0 | | | | | | | | kandb k0, k6, k0
| 1 | 1.0 | | | | | | | | kmovb r12d, k0
| 1 | | | | | | 1.0 | | | kmovw k6, r12d
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm3{k6}, zmm25, zmm22
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm2{k6}, zmm25, zmm21
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm0{k6}, zmm25, zmm20
| 1* | | | | | | | | | cmp rsi, rdi
| 0*F | | | | | | | | | jl 0xfffffffffffffc6f
Total Num Of Uops: 187
Analysis Notes:
Backend allocation was stalled due to unavailable allocation resources.