MD-Bench/arch_analysis/avx512/iaca_force_soa_lt8.txt
Rafael Ravedutti 9c28ff1e9e Add arch_analysis directory and first AVX2 results
Signed-off-by: Rafael Ravedutti <rafaelravedutti@gmail.com>
2021-05-06 23:18:28 +02:00

79 lines
6.7 KiB
Plaintext

iwia021h@testfront1:~/MD-Bench/asm$ iaca -arch SKX force_soa_lt8_markers.o
Intel(R) Architecture Code Analyzer Version - v3.0-28-g1ba2cbb build date: 2017-10-23;16:42:45
Analyzed File - force_soa_lt8_markers.o
Binary Format - 64Bit
Architecture - SKX
Analysis Type - Throughput
Throughput Analysis Report
--------------------------
Block Throughput: 35.00 Cycles Throughput Bottleneck: Backend
Loop Count: 22
Port Binding In Cycles Per Iteration:
--------------------------------------------------------------------------------------------------
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
--------------------------------------------------------------------------------------------------
| Cycles | 20.0 0.0 | 4.0 | 13.0 13.0 | 13.0 13.0 | 0.0 | 20.0 | 4.0 | 0.0 |
--------------------------------------------------------------------------------------------------
DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3)
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion occurred
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected
X - instruction not supported, was not accounted in Analysis
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
-----------------------------------------------------------------------------------------
| 1 | | 1.0 | | | | | | | imul r8, r12
| 1 | | | | | | 1.0 | | | vbroadcastsd zmm9, xmm9
| 1 | | | | | | 1.0 | | | vbroadcastsd zmm2, xmm8
| 1 | | | | | | 1.0 | | | vbroadcastsd zmm10, xmm10
| 1 | | 1.0 | | | | | | | sub r13d, ebx
| 1 | | 1.0 | | | | | | | add r8, r11
| 1 | | | | | | 1.0 | | | vpbroadcastd ymm0, r13d
| 1 | | | | | | 1.0 | | | vpcmpgtd k5, ymm0, ymm17
| 1 | | | | | | | 1.0 | | movsxd rbx, ebx
| 1* | | | | | | | | | vmovaps zmm4, zmm19
| 1 | 1.0 | | | | | | | | kmovw k2, k5
| 1* | | | | | | | | | vmovaps zmm3, zmm19
| 2 | | 1.0 | 1.0 1.0 | | | | | | vmovdqu32 ymm1{k5}{z}, ymmword ptr [r8+rbx*4]
| 1 | 1.0 | | | | | | | | kmovw k1, k5
| 1* | | | | | | | | | vmovaps zmm5, zmm19
| 1 | 1.0 | | | | | | | | kmovw k3, k5
| 5^ | 2.0 | | 4.0 4.0 | 4.0 4.0 | | | 1.0 | | vgatherdpd zmm5, k3, zmmword ptr [rsi+ymm1*8]
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm4, k2, zmmword ptr [rax+ymm1*8]
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm3, k1, zmmword ptr [rdx+ymm1*8]
| 1 | 1.0 | | | | | | | | vsubpd zmm30, zmm10, zmm5
| 1 | | | | | | 1.0 | | | vsubpd zmm28, zmm9, zmm4
| 1 | 1.0 | | | | | | | | vsubpd zmm27, zmm2, zmm3
| 1 | | | | | | 1.0 | | | vmulpd zmm26, zmm28, zmm28
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm26, zmm27, zmm27
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm26, zmm30, zmm30
| 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm25, zmm26
| 1 | | | | | | 1.0 | | | vcmppd k6{k5}, zmm26, zmm16, 0x1
| 1 | | | | | | 1.0 | | | vfpclasspd k0, zmm25, 0x1e
| 1* | | | | | | | | | vmovaps zmm6, zmm26
| 2^ | 1.0 | | | 1.0 1.0 | | | | | vfnmadd213pd zmm6, zmm25, qword ptr [rip]{1to8}
| 1 | 1.0 | | | | | | | | knotw k4, k0
| 1 | | | | | | 1.0 | | | vmulpd zmm8, zmm6, zmm6
| 1 | 1.0 | | | | | | | | vfmadd213pd zmm25{k4}, zmm6, zmm25
| 1 | | | | | | 1.0 | | | vfmadd213pd zmm25{k4}, zmm8, zmm25
| 1 | 1.0 | | | | | | | | vmulpd zmm20, zmm25, zmm15
| 1 | | | | | | 1.0 | | | vmulpd zmm22, zmm25, zmm14
| 1 | 1.0 | | | | | | | | vmulpd zmm23, zmm25, zmm20
| 1 | | | | | | 1.0 | | | vmulpd zmm21, zmm25, zmm23
| 1 | 1.0 | | | | | | | | vfmsub213pd zmm25, zmm23, zmm7
| 1 | | | | | | 1.0 | | | vmulpd zmm24, zmm21, zmm22
| 1 | 1.0 | | | | | | | | vmulpd zmm29, zmm24, zmm25
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm13{k6}, zmm29, zmm27
| 1 | 1.0 | | | | | | | | vfmadd231pd zmm12{k6}, zmm29, zmm28
| 1 | | | | | | 1.0 | | | vfmadd231pd zmm11{k6}, zmm29, zmm30
Total Num Of Uops: 60
Analysis Notes:
Backend allocation was stalled due to unavailable allocation resources.
There were bubbles in the frontend.