added static analysis and likwid files
88
likwid-outputs/csx-lammps-dp-mem_dp-stub.out
Normal file
@@ -0,0 +1,88 @@
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
CPU type: Intel Cascadelake SP processor
CPU clock: 2.49 GHz
--------------------------------------------------------------------------------
Initializing parameters...
Initializing atoms...
Creating atoms...
Pattern: seq
Number of timesteps: 200
Number of atoms: 256
Number of neighbors per atom: 1024
Number of times to replicate neighbor lists: 1
Estimated total data volume (kB): 1062.9120
Estimated atom data volume (kB): 6.1440
Estimated neighborlist data volume (kB): 1050.6240
Initializing neighbor lists...
Creating neighbor lists...
Computing forces...
Total time: 0.2735, Mega atom updates/s: 0.1872
Cycles per atom: 10682.8568, Cycles per neighbor: 10.4325
Statistics:
Vector width: 8, Processor frequency: 2.0000 GHz
Average neighbors per atom: 1018.9055
Average SIMD iterations per atom: 127.3632
Total number of computed pair interactions: 52428800
Total number of SIMD iterations: 6553600
Useful read data volume for force computation: 1.47GB
Cycles/SIMD iteration: 83.4598
--------------------------------------------------------------------------------
Region force, Group 1: MEM_DP
+-------------------+------------+
| Region Info       | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   0.110776 |
| call count        |        200 |
+-------------------+------------+

+------------------------------------------+---------+------------+
| Event                                    | Counter | HWThread 0 |
+------------------------------------------+---------+------------+
| INSTR_RETIRED_ANY                        | FIXC0   |  267036300 |
| CPU_CLK_UNHALTED_CORE                    | FIXC1   |  219034500 |
| CPU_CLK_UNHALTED_REF                     | FIXC2   |  273793400 |
| PWR_PKG_ENERGY                           | PWR0    |    10.9296 |
| PWR_DRAM_ENERGY                          | PWR3    |          0 |
| FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE | PMC0    |          0 |
| FP_ARITH_INST_RETIRED_SCALAR_DOUBLE      | PMC1    |     159400 |
| FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE | PMC2    |          0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE | PMC3    |  197068800 |
| CAS_COUNT_RD                             | MBOX0C0 |       8643 |
| CAS_COUNT_WR                             | MBOX0C1 |       1367 |
| CAS_COUNT_RD                             | MBOX1C0 |       9124 |
| CAS_COUNT_WR                             | MBOX1C1 |       1354 |
| CAS_COUNT_RD                             | MBOX2C0 |       9138 |
| CAS_COUNT_WR                             | MBOX2C1 |       1356 |
| CAS_COUNT_RD                             | MBOX3C0 |       5586 |
| CAS_COUNT_WR                             | MBOX3C1 |       1297 |
| CAS_COUNT_RD                             | MBOX4C0 |       5328 |
| CAS_COUNT_WR                             | MBOX4C1 |       1269 |
| CAS_COUNT_RD                             | MBOX5C0 |       5280 |
| CAS_COUNT_WR                             | MBOX5C1 |       1295 |
+------------------------------------------+---------+------------+

+-----------------------------------+------------+
| Metric                            | HWThread 0 |
+-----------------------------------+------------+
| Runtime (RDTSC) [s]               |     0.1108 |
| Runtime unhalted [s]              |     0.0878 |
| Clock [MHz]                       |  1995.2564 |
| CPI                               |     0.8202 |
| Energy [J]                        |    10.9296 |
| Power [W]                         |    98.6643 |
| Energy DRAM [J]                   |          0 |
| Power DRAM [W]                    |          0 |
| DP [MFLOP/s]                      | 14233.3287 |
| AVX DP [MFLOP/s]                  | 14231.8898 |
| Packed [MUOPS/s]                  |  1778.9862 |
| Scalar [MUOPS/s]                  |     1.4389 |
| Memory read bandwidth [MBytes/s]  |    24.9001 |
| Memory read data volume [GBytes]  |     0.0028 |
| Memory write bandwidth [MBytes/s] |     4.5861 |
| Memory write data volume [GBytes] |     0.0005 |
| Memory bandwidth [MBytes/s]       |    29.4863 |
| Memory data volume [GBytes]       |     0.0033 |
| Operational intensity             |   482.7104 |
+-----------------------------------+------------+
168
likwid-outputs/csx-lammps-dp-mem_dp.out
Normal file
@@ -0,0 +1,168 @@
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
CPU type: Intel Cascadelake SP processor
CPU clock: 2.49 GHz
--------------------------------------------------------------------------------
Parameters:
Force field: lj
Kernel: plain-C
Data layout: AoS
Floating-point precision: double
Unit cells (nx, ny, nz): 32, 32, 32
Domain box sizes (x, y, z): 5.374708e+01, 5.374708e+01, 5.374708e+01
Periodic (x, y, z): 1, 1, 1
Lattice size: 1.679596e+00
Epsilon: 1.000000e+00
Sigma: 1.000000e+00
Spring constant: 1.000000e+00
Damping constant: 1.000000e+00
Temperature: 1.440000e+00
RHO: 8.442000e-01
Mass: 1.000000e+00
Number of types: 4
Number of timesteps: 200
Report stats every (timesteps): 100
Reneighbor every (timesteps): 20
Prune every (timesteps): 1000
Output positions every (timesteps): 20
Output velocities every (timesteps): 5
Delta time (dt): 5.000000e-03
Cutoff radius: 2.500000e+00
Skin: 3.000000e-01
Half neighbor lists: 0
Processor frequency (GHz): 2.0000
----------------------------------------------------------------------------
step temp pressure
0 1.440000e+00 1.215639e+00
100 8.200895e-01 6.923143e-01
200 7.961495e-01 6.721043e-01
----------------------------------------------------------------------------
System: 131072 atoms 47265 ghost atoms, Steps: 200
TOTAL 11.50s FORCE 5.28s NEIGH 5.91s REST 0.31s
----------------------------------------------------------------------------
Performance: 2.28 million atom updates per second
Statistics:
Vector width: 8, Processor frequency: 2.0000 GHz
Average neighbors per atom: 76.0352
Average SIMD iterations per atom: 9.9181
Total number of computed pair interactions: 2003182862
Total number of SIMD iterations: 261297661
Useful read data volume for force computation: 57.46GB
Cycles/SIMD iteration: 40.4432
--------------------------------------------------------------------------------
Region force, Group 1: MEM_DP
+-------------------+------------+
| Region Info       | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   5.115807 |
| call count        |        201 |
+-------------------+------------+

+------------------------------------------+---------+-------------+
| Event                                    | Counter | HWThread 0  |
+------------------------------------------+---------+-------------+
| INSTR_RETIRED_ANY                        | FIXC0   | 12592470000 |
| CPU_CLK_UNHALTED_CORE                    | FIXC1   | 10196910000 |
| CPU_CLK_UNHALTED_REF                     | FIXC2   | 12746120000 |
| PWR_PKG_ENERGY                           | PWR0    |    307.9429 |
| PWR_DRAM_ENERGY                          | PWR3    |           0 |
| FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE | PMC0    |           0 |
| FP_ARITH_INST_RETIRED_SCALAR_DOUBLE      | PMC1    |    79042240 |
| FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE | PMC2    |           0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE | PMC3    |  8076039000 |
| CAS_COUNT_RD                             | MBOX0C0 |    22734550 |
| CAS_COUNT_WR                             | MBOX0C1 |     1147714 |
| CAS_COUNT_RD                             | MBOX1C0 |    22755180 |
| CAS_COUNT_WR                             | MBOX1C1 |     1144415 |
| CAS_COUNT_RD                             | MBOX2C0 |    22762780 |
| CAS_COUNT_WR                             | MBOX2C1 |     1129051 |
| CAS_COUNT_RD                             | MBOX3C0 |    22905660 |
| CAS_COUNT_WR                             | MBOX3C1 |     1143324 |
| CAS_COUNT_RD                             | MBOX4C0 |    22914860 |
| CAS_COUNT_WR                             | MBOX4C1 |     1169116 |
| CAS_COUNT_RD                             | MBOX5C0 |    22890220 |
| CAS_COUNT_WR                             | MBOX5C1 |     1180739 |
+------------------------------------------+---------+-------------+

+-----------------------------------+------------+
| Metric                            | HWThread 0 |
+-----------------------------------+------------+
| Runtime (RDTSC) [s]               |     5.1158 |
| Runtime unhalted [s]              |     4.0885 |
| Clock [MHz]                       |  1995.2508 |
| CPI                               |     0.8098 |
| Energy [J]                        |   307.9429 |
| Power [W]                         |    60.1944 |
| Energy DRAM [J]                   |          0 |
| Power DRAM [W]                    |          0 |
| DP [MFLOP/s]                      | 12644.6041 |
| AVX DP [MFLOP/s]                  | 12629.1535 |
| Packed [MUOPS/s]                  |  1578.6442 |
| Scalar [MUOPS/s]                  |    15.4506 |
| Memory read bandwidth [MBytes/s]  |  1713.4438 |
| Memory read data volume [GBytes]  |     8.7656 |
| Memory write bandwidth [MBytes/s] |    86.5003 |
| Memory write data volume [GBytes] |     0.4425 |
| Memory bandwidth [MBytes/s]       |  1799.9442 |
| Memory data volume [GBytes]       |     9.2082 |
| Operational intensity             |     7.0250 |
+-----------------------------------+------------+
Region reneighbour, Group 1: MEM_DP
+-------------------+------------+
| Region Info       | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   5.897385 |
| call count        |         10 |
+-------------------+------------+

+------------------------------------------+---------+-------------+
| Event                                    | Counter | HWThread 0  |
+------------------------------------------+---------+-------------+
| INSTR_RETIRED_ANY                        | FIXC0   | 18212540000 |
| CPU_CLK_UNHALTED_CORE                    | FIXC1   | 11728500000 |
| CPU_CLK_UNHALTED_REF                     | FIXC2   | 14660630000 |
| PWR_PKG_ENERGY                           | PWR0    |    338.9000 |
| PWR_DRAM_ENERGY                          | PWR3    |           0 |
| FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE | PMC0    |           0 |
| FP_ARITH_INST_RETIRED_SCALAR_DOUBLE      | PMC1    |  6240402000 |
| FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE | PMC2    |           0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE | PMC3    |      983040 |
| CAS_COUNT_RD                             | MBOX0C0 |     2086787 |
| CAS_COUNT_WR                             | MBOX0C1 |     1115626 |
| CAS_COUNT_RD                             | MBOX1C0 |     2089964 |
| CAS_COUNT_WR                             | MBOX1C1 |     1117021 |
| CAS_COUNT_RD                             | MBOX2C0 |     2103832 |
| CAS_COUNT_WR                             | MBOX2C1 |     1117965 |
| CAS_COUNT_RD                             | MBOX3C0 |     2086930 |
| CAS_COUNT_WR                             | MBOX3C1 |     1102471 |
| CAS_COUNT_RD                             | MBOX4C0 |     2094688 |
| CAS_COUNT_WR                             | MBOX4C1 |     1103018 |
| CAS_COUNT_RD                             | MBOX5C0 |     2097438 |
| CAS_COUNT_WR                             | MBOX5C1 |     1102525 |
+------------------------------------------+---------+-------------+

+-----------------------------------+------------+
| Metric                            | HWThread 0 |
+-----------------------------------+------------+
| Runtime (RDTSC) [s]               |     5.8974 |
| Runtime unhalted [s]              |     4.7026 |
| Clock [MHz]                       |  1995.2473 |
| CPI                               |     0.6440 |
| Energy [J]                        |   338.9000 |
| Power [W]                         |    57.4661 |
| Energy DRAM [J]                   |          0 |
| Power DRAM [W]                    |          0 |
| DP [MFLOP/s]                      |  1059.4978 |
| AVX DP [MFLOP/s]                  |     1.3335 |
| Packed [MUOPS/s]                  |     0.1667 |
| Scalar [MUOPS/s]                  |  1058.1643 |
| Memory read bandwidth [MBytes/s]  |   136.3006 |
| Memory read data volume [GBytes]  |     0.8038 |
| Memory write bandwidth [MBytes/s] |    72.2612 |
| Memory write data volume [GBytes] |     0.4262 |
| Memory bandwidth [MBytes/s]       |   208.5618 |
| Memory data volume [GBytes]       |     1.2300 |
| Operational intensity             |     5.0800 |
+-----------------------------------+------------+
88
likwid-outputs/csx-lammps-sp-mem_sp-stub.out
Normal file
@@ -0,0 +1,88 @@
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
CPU type: Intel Cascadelake SP processor
CPU clock: 2.49 GHz
--------------------------------------------------------------------------------
Initializing parameters...
Initializing atoms...
Creating atoms...
Pattern: seq
Number of timesteps: 200
Number of atoms: 256
Number of neighbors per atom: 1024
Number of times to replicate neighbor lists: 1
Estimated total data volume (kB): 1056.7680
Estimated atom data volume (kB): 3.0720
Estimated neighborlist data volume (kB): 1050.6240
Initializing neighbor lists...
Creating neighbor lists...
Computing forces...
Total time: 0.2466, Mega atom updates/s: 0.2076
Cycles per atom: 9631.9934, Cycles per neighbor: 9.4062
Statistics:
Vector width: 16, Processor frequency: 2.0000 GHz
Average neighbors per atom: 1018.9055
Average SIMD iterations per atom: 63.6816
Total number of computed pair interactions: 52428800
Total number of SIMD iterations: 3276800
Useful read data volume for force computation: 0.84GB
Cycles/SIMD iteration: 150.4999
--------------------------------------------------------------------------------
Region force, Group 1: MEM_SP
+-------------------+------------+
| Region Info       | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   0.085843 |
| call count        |        200 |
+-------------------+------------+

+------------------------------------------+---------+------------+
| Event                                    | Counter | HWThread 0 |
+------------------------------------------+---------+------------+
| INSTR_RETIRED_ANY                        | FIXC0   |  129769100 |
| CPU_CLK_UNHALTED_CORE                    | FIXC1   |  172300100 |
| CPU_CLK_UNHALTED_REF                     | FIXC2   |  215371300 |
| PWR_PKG_ENERGY                           | PWR0    |     9.2849 |
| PWR_DRAM_ENERGY                          | PWR3    |          0 |
| FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE | PMC0    |          0 |
| FP_ARITH_INST_RETIRED_SCALAR_SINGLE      | PMC1    |     154000 |
| FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE | PMC2    |          0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE | PMC3    |   89088000 |
| CAS_COUNT_RD                             | MBOX0C0 |       8354 |
| CAS_COUNT_WR                             | MBOX0C1 |       1126 |
| CAS_COUNT_RD                             | MBOX1C0 |       7863 |
| CAS_COUNT_WR                             | MBOX1C1 |       1105 |
| CAS_COUNT_RD                             | MBOX2C0 |       7990 |
| CAS_COUNT_WR                             | MBOX2C1 |       1113 |
| CAS_COUNT_RD                             | MBOX3C0 |       4775 |
| CAS_COUNT_WR                             | MBOX3C1 |       1112 |
| CAS_COUNT_RD                             | MBOX4C0 |       4201 |
| CAS_COUNT_WR                             | MBOX4C1 |       1127 |
| CAS_COUNT_RD                             | MBOX5C0 |       4035 |
| CAS_COUNT_WR                             | MBOX5C1 |       1120 |
+------------------------------------------+---------+------------+

+-----------------------------------+------------+
| Metric                            | HWThread 0 |
+-----------------------------------+------------+
| Runtime (RDTSC) [s]               |     0.0858 |
| Runtime unhalted [s]              |     0.0691 |
| Clock [MHz]                       |  1995.2787 |
| CPI                               |     1.3277 |
| Energy [J]                        |     9.2849 |
| Power [W]                         |   108.1610 |
| Energy DRAM [J]                   |          0 |
| Power DRAM [W]                    |          0 |
| SP [MFLOP/s]                      | 16606.5397 |
| AVX SP [MFLOP/s]                  | 16604.7458 |
| Packed [MUOPS/s]                  |  1037.7966 |
| Scalar [MUOPS/s]                  |     1.7940 |
| Memory read bandwidth [MBytes/s]  |    27.7476 |
| Memory read data volume [GBytes]  |     0.0024 |
| Memory write bandwidth [MBytes/s] |     4.9974 |
| Memory write data volume [GBytes] |     0.0004 |
| Memory bandwidth [MBytes/s]       |    32.7450 |
| Memory data volume [GBytes]       |     0.0028 |
| Operational intensity             |   507.1471 |
+-----------------------------------+------------+
168
likwid-outputs/csx-lammps-sp-mem_sp.out
Normal file
@@ -0,0 +1,168 @@
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
CPU type: Intel Cascadelake SP processor
CPU clock: 2.49 GHz
--------------------------------------------------------------------------------
Parameters:
Force field: lj
Kernel: plain-C
Data layout: AoS
Floating-point precision: single
Unit cells (nx, ny, nz): 32, 32, 32
Domain box sizes (x, y, z): 5.374708e+01, 5.374708e+01, 5.374708e+01
Periodic (x, y, z): 1, 1, 1
Lattice size: 1.679596e+00
Epsilon: 1.000000e+00
Sigma: 1.000000e+00
Spring constant: 1.000000e+00
Damping constant: 1.000000e+00
Temperature: 1.440000e+00
RHO: 8.442000e-01
Mass: 1.000000e+00
Number of types: 4
Number of timesteps: 200
Report stats every (timesteps): 100
Reneighbor every (timesteps): 20
Prune every (timesteps): 1000
Output positions every (timesteps): 20
Output velocities every (timesteps): 5
Delta time (dt): 5.000000e-03
Cutoff radius: 2.500000e+00
Skin: 3.000000e-01
Half neighbor lists: 0
Processor frequency (GHz): 2.0000
----------------------------------------------------------------------------
step temp pressure
0 1.440000e+00 1.215639e+00
100 8.200897e-01 6.923144e-01
200 7.961481e-01 6.721031e-01
----------------------------------------------------------------------------
System: 131072 atoms 47265 ghost atoms, Steps: 200
TOTAL 10.83s FORCE 4.62s NEIGH 5.94s REST 0.26s
----------------------------------------------------------------------------
Performance: 2.42 million atom updates per second
Statistics:
Vector width: 16, Processor frequency: 2.0000 GHz
Average neighbors per atom: 76.0351
Average SIMD iterations per atom: 5.0875
Total number of computed pair interactions: 2003181259
Total number of SIMD iterations: 134032075
Useful read data volume for force computation: 32.79GB
Cycles/SIMD iteration: 68.9511
--------------------------------------------------------------------------------
Region force, Group 1: MEM_SP
+-------------------+------------+
| Region Info       | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   4.452877 |
| call count        |        201 |
+-------------------+------------+

+------------------------------------------+---------+-------------+
| Event                                    | Counter | HWThread 0  |
+------------------------------------------+---------+-------------+
| INSTR_RETIRED_ANY                        | FIXC0   |  7428719000 |
| CPU_CLK_UNHALTED_CORE                    | FIXC1   |  8875251000 |
| CPU_CLK_UNHALTED_REF                     | FIXC2   | 11094050000 |
| PWR_PKG_ENERGY                           | PWR0    |    265.5057 |
| PWR_DRAM_ENERGY                          | PWR3    |           0 |
| FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE | PMC0    |           0 |
| FP_ARITH_INST_RETIRED_SCALAR_SINGLE      | PMC1    |    79036820 |
| FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE | PMC2    |           0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE | PMC3    |  3935012000 |
| CAS_COUNT_RD                             | MBOX0C0 |    19716700 |
| CAS_COUNT_WR                             | MBOX0C1 |      595747 |
| CAS_COUNT_RD                             | MBOX1C0 |    19734880 |
| CAS_COUNT_WR                             | MBOX1C1 |      597090 |
| CAS_COUNT_RD                             | MBOX2C0 |    19732800 |
| CAS_COUNT_WR                             | MBOX2C1 |      595219 |
| CAS_COUNT_RD                             | MBOX3C0 |    19886430 |
| CAS_COUNT_WR                             | MBOX3C1 |      632443 |
| CAS_COUNT_RD                             | MBOX4C0 |    19887210 |
| CAS_COUNT_WR                             | MBOX4C1 |      633169 |
| CAS_COUNT_RD                             | MBOX5C0 |    19935560 |
| CAS_COUNT_WR                             | MBOX5C1 |      634112 |
+------------------------------------------+---------+-------------+

+-----------------------------------+------------+
| Metric                            | HWThread 0 |
+-----------------------------------+------------+
| Runtime (RDTSC) [s]               |     4.4529 |
| Runtime unhalted [s]              |     3.5585 |
| Clock [MHz]                       |  1995.2693 |
| CPI                               |     1.1947 |
| Energy [J]                        |   265.5057 |
| Power [W]                         |    59.6257 |
| Energy DRAM [J]                   |          0 |
| Power DRAM [W]                    |          0 |
| SP [MFLOP/s]                      | 14156.9661 |
| AVX SP [MFLOP/s]                  | 14139.2165 |
| Packed [MUOPS/s]                  |   883.7010 |
| Scalar [MUOPS/s]                  |    17.7496 |
| Memory read bandwidth [MBytes/s]  |  1708.8254 |
| Memory read data volume [GBytes]  |     7.6092 |
| Memory write bandwidth [MBytes/s] |    53.0035 |
| Memory write data volume [GBytes] |     0.2360 |
| Memory bandwidth [MBytes/s]       |  1761.8288 |
| Memory data volume [GBytes]       |     7.8452 |
| Operational intensity             |     8.0354 |
+-----------------------------------+------------+
Region reneighbour, Group 1: MEM_SP
+-------------------+------------+
| Region Info       | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   5.935627 |
| call count        |         10 |
+-------------------+------------+

+------------------------------------------+---------+-------------+
| Event                                    | Counter | HWThread 0  |
+------------------------------------------+---------+-------------+
| INSTR_RETIRED_ANY                        | FIXC0   | 18208530000 |
| CPU_CLK_UNHALTED_CORE                    | FIXC1   | 11805500000 |
| CPU_CLK_UNHALTED_REF                     | FIXC2   | 14756870000 |
| PWR_PKG_ENERGY                           | PWR0    |    340.7903 |
| PWR_DRAM_ENERGY                          | PWR3    |           0 |
| FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE | PMC0    |           0 |
| FP_ARITH_INST_RETIRED_SCALAR_SINGLE      | PMC1    |  6240406000 |
| FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE | PMC2    |           0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE | PMC3    |      491520 |
| CAS_COUNT_RD                             | MBOX0C0 |     1772377 |
| CAS_COUNT_WR                             | MBOX0C1 |      975760 |
| CAS_COUNT_RD                             | MBOX1C0 |     1770611 |
| CAS_COUNT_WR                             | MBOX1C1 |      977433 |
| CAS_COUNT_RD                             | MBOX2C0 |     1771722 |
| CAS_COUNT_WR                             | MBOX2C1 |      979122 |
| CAS_COUNT_RD                             | MBOX3C0 |     1782901 |
| CAS_COUNT_WR                             | MBOX3C1 |      967621 |
| CAS_COUNT_RD                             | MBOX4C0 |     1780789 |
| CAS_COUNT_WR                             | MBOX4C1 |      967179 |
| CAS_COUNT_RD                             | MBOX5C0 |     1784733 |
| CAS_COUNT_WR                             | MBOX5C1 |      969349 |
+------------------------------------------+---------+-------------+

+-----------------------------------+------------+
| Metric                            | HWThread 0 |
+-----------------------------------+------------+
| Runtime (RDTSC) [s]               |     5.9356 |
| Runtime unhalted [s]              |     4.7334 |
| Clock [MHz]                       |  1995.2675 |
| CPI                               |     0.6483 |
| Energy [J]                        |   340.7903 |
| Power [W]                         |    57.4144 |
| Energy DRAM [J]                   |          0 |
| Power DRAM [W]                    |          0 |
| SP [MFLOP/s]                      |  1052.6723 |
| AVX SP [MFLOP/s]                  |     1.3249 |
| Packed [MUOPS/s]                  |     0.0828 |
| Scalar [MUOPS/s]                  |  1051.3474 |
| Memory read bandwidth [MBytes/s]  |   114.9736 |
| Memory read data volume [GBytes]  |     0.6824 |
| Memory write bandwidth [MBytes/s] |    62.9308 |
| Memory write data volume [GBytes] |     0.3735 |
| Memory bandwidth [MBytes/s]       |   177.9044 |
| Memory data volume [GBytes]       |     1.0560 |
| Operational intensity             |     5.9171 |
+-----------------------------------+------------+