80 lines
6.9 KiB
Plaintext
80 lines
6.9 KiB
Plaintext
|
iwia021h@testfront1:~/MD-Bench/asm$ iaca -arch SKX force_aos_geq1200_markers.o
|
||
|
Intel(R) Architecture Code Analyzer Version - v3.0-28-g1ba2cbb build date: 2017-10-23;16:42:45
|
||
|
Analyzed File - force_aos_geq1200_markers.o
|
||
|
Binary Format - 64Bit
|
||
|
Architecture - SKX
|
||
|
Analysis Type - Throughput
|
||
|
|
||
|
Throughput Analysis Report
|
||
|
--------------------------
|
||
|
Block Throughput: 33.05 Cycles Throughput Bottleneck: Backend
|
||
|
Loop Count: 22
|
||
|
Port Binding In Cycles Per Iteration:
|
||
|
--------------------------------------------------------------------------------------------------
|
||
|
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
|
||
|
--------------------------------------------------------------------------------------------------
|
||
|
| Cycles | 20.0 0.0 | 4.5 | 13.0 13.0 | 13.0 13.0 | 0.0 | 18.0 | 4.5 | 0.0 |
|
||
|
--------------------------------------------------------------------------------------------------
|
||
|
|
||
|
DV - Divider pipe (on port 0)
|
||
|
D - Data fetch pipe (on ports 2 and 3)
|
||
|
F - Macro Fusion with the previous instruction occurred
|
||
|
* - instruction micro-ops not bound to a port
|
||
|
^ - Micro Fusion occurred
|
||
|
# - ESP Tracking sync uop was issued
|
||
|
@ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected
|
||
|
X - instruction not supported, was not accounted in Analysis
|
||
|
|
||
|
| Num Of | Ports pressure in cycles | |
|
||
|
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
|
||
|
-----------------------------------------------------------------------------------------
|
||
|
| 1 | | | | | | 1.0 | | | vpcmpgtd k3, ymm2, ymm3
|
||
|
| 2 | | 1.0 | 1.0 1.0 | | | | | | vmovdqu32 ymm17{k3}{z}, ymmword ptr [r15+r13*4]
|
||
|
| 1 | 1.0 | | | | | | | | kmovw r9d, k3
|
||
|
| 1 | | 1.0 | | | | | | | vpaddd ymm18, ymm17, ymm17
|
||
|
| 1 | | 1.0 | | | | | | | vpaddd ymm17, ymm17, ymm18
|
||
|
| 1 | 1.0 | | | | | | | | kmovw k1, k3
|
||
|
| 1 | 1.0 | | | | | | | | kmovw k2, k3
|
||
|
| 1* | | | | | | | | | vpxord zmm18, zmm18, zmm18
|
||
|
| 1* | | | | | | | | | vpxord zmm19, zmm19, zmm19
|
||
|
| 1* | | | | | | | | | vpxord zmm20, zmm20, zmm20
|
||
|
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm18, k1, zmmword ptr [rdi+ymm17*8+0x10]
|
||
|
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm19, k2, zmmword ptr [rdi+ymm17*8+0x8]
|
||
|
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm20, k3, zmmword ptr [rdi+ymm17*8]
|
||
|
| 1 | | 0.5 | | | | | 0.5 | | add r13, 0x8
|
||
|
| 1 | | 1.0 | | | | | | | vpaddd ymm3, ymm3, ymm16
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vsubpd zmm29, zmm4, zmm18
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vsubpd zmm27, zmm0, zmm19
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vsubpd zmm26, zmm1, zmm20
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm25, zmm27, zmm27
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm25, zmm26, zmm26
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm25, zmm29, zmm29
|
||
|
| 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm24, zmm25
|
||
|
| 1 | | | | | | 1.0 | | | vcmppd k2, zmm25, zmm14, 0x1
|
||
|
| 1 | | | | | | 1.0 | | | vfpclasspd k0, zmm24, 0x1e
|
||
|
| 1 | 1.0 | | | | | | | | kmovw edx, k2
|
||
|
| 1 | 1.0 | | | | | | | | knotw k1, k0
|
||
|
| 1* | | | | | | | | | vmovaps zmm17, zmm25
|
||
|
| 1 | | | | | | | 1.0 | | and r9d, edx
|
||
|
| 2^ | | | | 1.0 1.0 | | 1.0 | | | vfnmadd213pd zmm17, zmm24, qword ptr [rip]{1to8}
|
||
|
| 1 | | | | | | 1.0 | | | kmovw k3, r9d
|
||
|
| 1 | 1.0 | | | | | | | | vmulpd zmm18, zmm17, zmm17
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd213pd zmm24{k1}, zmm17, zmm24
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd213pd zmm24{k1}, zmm18, zmm24
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm19, zmm24, zmm13
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm21, zmm24, zmm10
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm22, zmm24, zmm19
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm20, zmm24, zmm22
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmsub213pd zmm24, zmm22, zmm5
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm23, zmm20, zmm21
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm28, zmm23, zmm24
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm9{k3}, zmm28, zmm26
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm8{k3}, zmm28, zmm27
|
||
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm11{k3}, zmm28, zmm29
|
||
|
| 1* | | | | | | | | | cmp r13, rbx
|
||
|
| 0*F | | | | | | | | | jb 0xfffffffffffffef7
|
||
|
Total Num Of Uops: 60
|
||
|
Analysis Notes:
|
||
|
Backend allocation was stalled due to unavailable allocation resources.
|
||
|
There were bubbles in the frontend.
|