9c28ff1e9e
Signed-off-by: Rafael Ravedutti <rafaelravedutti@gmail.com>
83 lines
7.2 KiB
Plaintext
83 lines
7.2 KiB
Plaintext
Intel(R) Architecture Code Analyzer Version - v3.0-28-g1ba2cbb build date: 2017-10-23;16:42:45
|
|
Analyzed File - force_aos_lt8_markers.o
|
|
Binary Format - 64Bit
|
|
Architecture - SKX
|
|
Analysis Type - Throughput
|
|
|
|
Throughput Analysis Report
|
|
--------------------------
|
|
Block Throughput: 69.79 Cycles Throughput Bottleneck: Backend
|
|
Loop Count: 22
|
|
Port Binding In Cycles Per Iteration:
|
|
--------------------------------------------------------------------------------------------------
|
|
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
|
|
--------------------------------------------------------------------------------------------------
|
|
| Cycles | 21.0 0.0 | 5.5 | 13.0 13.0 | 13.0 13.0 | 0.0 | 21.0 | 5.5 | 0.0 |
|
|
--------------------------------------------------------------------------------------------------
|
|
|
|
DV - Divider pipe (on port 0)
|
|
D - Data fetch pipe (on ports 2 and 3)
|
|
F - Macro Fusion with the previous instruction occurred
|
|
* - instruction micro-ops not bound to a port
|
|
^ - Micro Fusion occurred
|
|
# - ESP Tracking sync uop was issued
|
|
@ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected
|
|
X - instruction not supported, was not accounted in Analysis
|
|
|
|
| Num Of | Ports pressure in cycles | |
|
|
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 |
|
|
-----------------------------------------------------------------------------------------
|
|
| 1 | | 1.0 | | | | | | | imul rcx, r8
|
|
| 1 | | | | | | 1.0 | | | vbroadcastsd zmm4, xmm6
|
|
| 1 | | | | | | | 1.0 | | sub r11d, r14d
|
|
| 1 | | 0.5 | | | | | 0.5 | | add rcx, r10
|
|
| 1 | | | | | | 1.0 | | | vpbroadcastd ymm0, r11d
|
|
| 1 | | | | | | 1.0 | | | vpcmpgtd k3, ymm0, ymm15
|
|
| 1 | | 0.5 | | | | | 0.5 | | movsxd r14, r14d
|
|
| 1 | 1.0 | | | | | | | | kmovw ebx, k3
|
|
| 2 | | 1.0 | 1.0 1.0 | | | | | | vmovdqu32 ymm1{k3}{z}, ymmword ptr [rcx+r14*4]
|
|
| 1 | | 1.0 | | | | | | | vpaddd ymm2, ymm1, ymm1
|
|
| 1 | | 1.0 | | | | | | | vpaddd ymm0, ymm1, ymm2
|
|
| 1 | 1.0 | | | | | | | | kmovw k1, k3
|
|
| 1 | 1.0 | | | | | | | | kmovw k2, k3
|
|
| 1* | | | | | | | | | vpxord zmm1, zmm1, zmm1
|
|
| 1* | | | | | | | | | vpxord zmm2, zmm2, zmm2
|
|
| 1* | | | | | | | | | vpxord zmm3, zmm3, zmm3
|
|
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm1, k1, zmmword ptr [rdi+ymm0*8+0x10]
|
|
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm2, k2, zmmword ptr [rdi+ymm0*8+0x8]
|
|
| 5^ | 1.0 | | 4.0 4.0 | 4.0 4.0 | | 1.0 | 1.0 | | vgatherdpd zmm3, k3, zmmword ptr [rdi+ymm0*8]
|
|
| 1 | | | | | | 1.0 | | | vbroadcastsd zmm7, xmm7
|
|
| 1 | | | | | | 1.0 | | | vbroadcastsd zmm12, xmm12
|
|
| 1 | 1.0 | | | | | | | | vsubpd zmm23, zmm12, zmm1
|
|
| 1 | 1.0 | | | | | | | | vsubpd zmm21, zmm7, zmm2
|
|
| 1 | 0.5 | | | | | 0.5 | | | vsubpd zmm20, zmm4, zmm3
|
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm19, zmm21, zmm21
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm19, zmm20, zmm20
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm19, zmm23, zmm23
|
|
| 3 | 2.0 | | | | | 1.0 | | | vrcp14pd zmm18, zmm19
|
|
| 1 | | | | | | 1.0 | | | vcmppd k2, zmm19, zmm14, 0x1
|
|
| 1 | | | | | | 1.0 | | | vfpclasspd k0, zmm18, 0x1e
|
|
| 1 | 1.0 | | | | | | | | kmovw ecx, k2
|
|
| 1 | 1.0 | | | | | | | | knotw k1, k0
|
|
| 1* | | | | | | | | | vmovaps zmm0, zmm19
|
|
| 1 | | 0.5 | | | | | 0.5 | | and ebx, ecx
|
|
| 2^ | | | | 1.0 1.0 | | 1.0 | | | vfnmadd213pd zmm0, zmm18, qword ptr [rip]{1to8}
|
|
| 1 | | | | | | 1.0 | | | kmovw k3, ebx
|
|
| 1 | 1.0 | | | | | | | | vmulpd zmm1, zmm0, zmm0
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd213pd zmm18{k1}, zmm0, zmm18
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd213pd zmm18{k1}, zmm1, zmm18
|
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm2, zmm18, zmm13
|
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm4, zmm18, zmm10
|
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm6, zmm18, zmm2
|
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm3, zmm18, zmm6
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmsub213pd zmm18, zmm6, zmm5
|
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm17, zmm3, zmm4
|
|
| 1 | 0.5 | | | | | 0.5 | | | vmulpd zmm22, zmm17, zmm18
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm9{k3}, zmm22, zmm20
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm8{k3}, zmm22, zmm21
|
|
| 1 | 0.5 | | | | | 0.5 | | | vfmadd231pd zmm11{k3}, zmm22, zmm23
|
|
Total Num Of Uops: 65
|
|
Analysis Notes:
|
|
Backend allocation was stalled due to unavailable allocation resources.
|
|
There were bubbles in the frontend.
|