mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-08-01 09:00:35 +02:00
Add likwid collector
22
collectors/likwid/groups/phi/CACHE.txt
Normal file
@@ -0,0 +1,22 @@
SHORT L1 compute to data access ratio

EVENTSET
PMC0 VPU_ELEMENTS_ACTIVE
PMC1 DATA_READ_OR_WRITE

METRICS
Runtime (RDTSC) [s] time
L1 compute intensity PMC0/PMC1

LONG
Formulas:
L1 compute intensity = VPU_ELEMENTS_ACTIVE/DATA_READ_OR_WRITE
-
This metric is a way to measure the computational density of an
application, or how many computations it performs on average for each
piece of data loaded. The L1 compute to data access ratio should be
used to judge the suitability of an application for running on the Intel MIC
architecture. Applications that perform well on the Intel MIC
architecture should be vectorized, and ideally be able to perform multiple
operations on the same pieces of data (or same cache lines).
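All group files in this commit share the same plain-text layout (SHORT, EVENTSET, METRICS, LONG). The following is a minimal sketch of how such a file could be read into a dictionary. It is not the parser used by LIKWID or by the collector; the function name `parse_group` and the returned keys are hypothetical, and it assumes that a metric formula never contains whitespace, which holds for the files shown here.

```python
def parse_group(text):
    """Split a group file into its four sections (illustrative sketch only)."""
    group = {"short": "", "events": {}, "metrics": [], "long": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SHORT"):
            group["short"] = line[len("SHORT"):].strip()
            section = None
        elif line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif not line and section != "LONG":
            continue  # blank lines only separate the sections
        elif section == "EVENTSET":
            counter, event = line.split(None, 1)
            group["events"][counter] = event
        elif section == "METRICS":
            # Assumption: the formula is the last whitespace-separated token.
            name, formula = line.rsplit(None, 1)
            group["metrics"].append((name, formula))
        elif section == "LONG":
            group["long"].append(line)
    return group


CACHE_GROUP = """SHORT L1 compute to data access ratio

EVENTSET
PMC0 VPU_ELEMENTS_ACTIVE
PMC1 DATA_READ_OR_WRITE

METRICS
Runtime (RDTSC) [s] time
L1 compute intensity PMC0/PMC1

LONG
Formulas:
L1 compute intensity = VPU_ELEMENTS_ACTIVE/DATA_READ_OR_WRITE
"""

group = parse_group(CACHE_GROUP)
print(group["events"])
print(group["metrics"])
```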
22
collectors/likwid/groups/phi/COMPUTE_TO_DATA_RATIO.txt
Normal file
@@ -0,0 +1,22 @@
SHORT L2 compute to data access ratio

EVENTSET
PMC0 VPU_ELEMENTS_ACTIVE
PMC1 DATA_READ_MISS_OR_WRITE_MISS

METRICS
Runtime (RDTSC) [s] time
L2 compute intensity PMC0/PMC1

LONG
Formulas:
L2 compute intensity = VPU_ELEMENTS_ACTIVE/DATA_READ_MISS_OR_WRITE_MISS
-
This metric is a way to measure the computational density of an
application, or how many computations it performs on average for each
piece of data loaded. The L2 compute to data access ratio should be
used to judge the suitability of an application for running on the Intel MIC
architecture. Applications that perform well on the Intel MIC
architecture should be vectorized, and ideally be able to perform multiple
operations on the same pieces of data (or same cache lines).
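To make the difference between the L1 and L2 variants concrete, here is a small worked example. The counter values are invented for illustration, and `compute_intensity` is a hypothetical helper, not part of LIKWID. The only difference between the two metrics is the denominator: all L1 data accesses versus only the accesses that miss L1.

```python
def compute_intensity(vpu_elements_active, data_accesses):
    """Active vector elements per data access; NaN if nothing was accessed."""
    if data_accesses == 0:
        return float("nan")
    return vpu_elements_active / data_accesses


# Hypothetical counter readings for one measurement interval.
vpu_elements_active = 8_000_000
data_read_or_write = 4_000_000            # denominator of the L1 metric
data_read_miss_or_write_miss = 250_000    # denominator of the L2 metric

l1_intensity = compute_intensity(vpu_elements_active, data_read_or_write)
l2_intensity = compute_intensity(vpu_elements_active, data_read_miss_or_write_miss)
print(l1_intensity, l2_intensity)
```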
23
collectors/likwid/groups/phi/CPI.txt
Normal file
@@ -0,0 +1,23 @@
SHORT Cycles per instruction

EVENTSET
PMC0 INSTRUCTIONS_EXECUTED
PMC1 CPU_CLK_UNHALTED

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] PMC1*inverseClock
CPI PMC1/PMC0
IPC PMC0/PMC1

LONG
Formulas:
CPI = CPU_CLK_UNHALTED/INSTRUCTIONS_EXECUTED
IPC = INSTRUCTIONS_EXECUTED/CPU_CLK_UNHALTED
-
This group measures how efficiently the processor works with
regard to instruction throughput. Also important as a standalone
metric is INSTRUCTIONS_EXECUTED, as it tells you how many instructions
you need to execute for a task. An optimization might show very
low CPI values but execute many more instructions for it.
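The caveat at the end of the description can be illustrated with two made-up code variants. In this sketch the helper names, the counter values, and the 1.1 GHz clock are all assumptions chosen for the example; the point is only that a lower CPI does not imply a shorter runtime when the instruction count grows.

```python
def cpi(cycles, instructions):
    return cycles / instructions


def ipc(cycles, instructions):
    return instructions / cycles


def unhalted_runtime_s(cycles, clock_hz):
    # Corresponds to PMC1*inverseClock in the METRICS section.
    return cycles / clock_hz


CLOCK_HZ = 1.1e9  # hypothetical core clock

# Hypothetical measurements of the same task in two variants.
baseline = {"instructions": 1_000_000, "cycles": 2_000_000}
variant = {"instructions": 4_000_000, "cycles": 3_000_000}

for name, run in (("baseline", baseline), ("variant", variant)):
    print(name,
          "CPI", cpi(run["cycles"], run["instructions"]),
          "runtime [s]", unhalted_runtime_s(run["cycles"], CLOCK_HZ))
```

The variant has the better CPI (0.75 versus 2.0) but needs more cycles in total, so it is slower.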
18
collectors/likwid/groups/phi/MEM.txt
Normal file
@@ -0,0 +1,18 @@
SHORT Memory bandwidth

EVENTSET
PMC0 DATA_READ_MISS_OR_WRITE_MISS
PMC1 DATA_CACHE_LINES_WRITTEN_BACK


METRICS
Runtime (RDTSC) [s] time
Memory bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0

LONG
Formulas:
Memory bandwidth [MBytes/s] = 1.0E-06*(DATA_READ_MISS_OR_WRITE_MISS+DATA_CACHE_LINES_WRITTEN_BACK)*64.0/time
Memory data volume [GBytes] = 1.0E-09*(DATA_READ_MISS_OR_WRITE_MISS+DATA_CACHE_LINES_WRITTEN_BACK)*64.0
-
Total memory bandwidth and data volume.
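All MEM* groups use the same arithmetic: each counted event is treated as one 64-byte cache line transfer. A minimal sketch of that conversion follows; the helper names and the counter values are hypothetical.

```python
CACHE_LINE_BYTES = 64.0  # factor 64.0 in the group formulas


def data_volume_gbytes(*cache_line_counts):
    """Data volume in GBytes from one or more cache line counters."""
    return 1.0e-09 * sum(cache_line_counts) * CACHE_LINE_BYTES


def bandwidth_mbytes_per_s(runtime_s, *cache_line_counts):
    """Bandwidth in MBytes/s over the measured runtime."""
    return 1.0e-06 * sum(cache_line_counts) * CACHE_LINE_BYTES / runtime_s


# Hypothetical readings: PMC0 (misses), PMC1 (write-backs), runtime in seconds.
misses, writebacks, runtime_s = 1_000_000, 562_500, 0.5

print(data_volume_gbytes(misses, writebacks))                 # about 0.1 GBytes
print(bandwidth_mbytes_per_s(runtime_s, misses, writebacks))  # about 200 MBytes/s
```

The single-counter groups (MEM1 to MEM6) correspond to calling the same helpers with one count.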
18
collectors/likwid/groups/phi/MEM1.txt
Normal file
@@ -0,0 +1,18 @@
SHORT L2 write misses

EVENTSET
PMC0 L2_DATA_WRITE_MISS_MEM_FILL

METRICS
Runtime (RDTSC) [s] time
L2 RFO bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L2 RFO data volume [GBytes] 1.0E-09*PMC0*64.0

LONG
Formulas:
L2 RFO bandwidth [MBytes/s] = 1.0E-06*L2_DATA_WRITE_MISS_MEM_FILL*64.0/time
L2 RFO data volume [GBytes] = 1.0E-09*L2_DATA_WRITE_MISS_MEM_FILL*64.0
-
Bandwidth and data volume fetched from memory due to L2 data write misses. These
fetches are commonly called write-allocate loads or read-for-ownership (RFO).
17
collectors/likwid/groups/phi/MEM2.txt
Normal file
@@ -0,0 +1,17 @@
SHORT L2 read misses

EVENTSET
PMC0 L2_DATA_READ_MISS_MEM_FILL

METRICS
Runtime (RDTSC) [s] time
L2 read bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L2 read data volume [GBytes] 1.0E-09*PMC0*64.0

LONG
Formulas:
L2 read bandwidth [MBytes/s] = 1.0E-06*L2_DATA_READ_MISS_MEM_FILL*64.0/time
L2 read data volume [GBytes] = 1.0E-09*L2_DATA_READ_MISS_MEM_FILL*64.0
-
The data volume and bandwidth caused by read misses in the L2 cache.
17
collectors/likwid/groups/phi/MEM3.txt
Normal file
@@ -0,0 +1,17 @@
SHORT HW prefetch transfers

EVENTSET
PMC0 HWP_L2MISS

METRICS
Runtime (RDTSC) [s] time
Prefetch bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
Prefetch data volume [GBytes] 1.0E-09*PMC0*64.0

LONG
Formulas:
Prefetch bandwidth [MBytes/s] = 1.0E-06*HWP_L2MISS*64.0/time
Prefetch data volume [GBytes] = 1.0E-09*HWP_L2MISS*64.0
-
The bandwidth and data volume caused by L2 misses from the hardware prefetcher.
17
collectors/likwid/groups/phi/MEM4.txt
Normal file
@@ -0,0 +1,17 @@
SHORT L2 victim requests

EVENTSET
PMC0 L2_VICTIM_REQ_WITH_DATA

METRICS
Runtime (RDTSC) [s] time
Victim bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
Victim data volume [GBytes] 1.0E-09*PMC0*64.0

LONG
Formulas:
Victim bandwidth [MBytes/s] = 1.0E-06*L2_VICTIM_REQ_WITH_DATA*64.0/time
Victim data volume [GBytes] = 1.0E-09*L2_VICTIM_REQ_WITH_DATA*64.0
-
Data volume and bandwidth caused by cache line victims.
19
collectors/likwid/groups/phi/MEM5.txt
Normal file
@@ -0,0 +1,19 @@
SHORT L2 snoop hits

EVENTSET
PMC0 SNP_HITM_L2

METRICS
Runtime (RDTSC) [s] time
Snoop bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
Snoop data volume [GBytes] 1.0E-09*PMC0*64.0

LONG
Formulas:
Snoop bandwidth [MBytes/s] = 1.0E-06*SNP_HITM_L2*64.0/time
Snoop data volume [GBytes] = 1.0E-09*SNP_HITM_L2*64.0
-
Snoop traffic caused by HITM requests. HITM requests are L2 requests that
are served by another core's L2 cache where the remote cache line is in
modified state.
17
collectors/likwid/groups/phi/MEM6.txt
Normal file
@@ -0,0 +1,17 @@
SHORT L2 read misses

EVENTSET
PMC0 L2_READ_MISS

METRICS
Runtime (RDTSC) [s] time
L2 read bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L2 read data volume [GBytes] 1.0E-09*PMC0*64.0

LONG
Formulas:
L2 read bandwidth [MBytes/s] = 1.0E-06*L2_READ_MISS*64.0/time
L2 read data volume [GBytes] = 1.0E-09*L2_READ_MISS*64.0
-
Data volume and bandwidth caused by read misses in the L2 cache.
20
collectors/likwid/groups/phi/MEM_READ.txt
Normal file
@@ -0,0 +1,20 @@
SHORT Memory read bandwidth

EVENTSET
PMC0 DATA_READ_MISS
PMC1 HWP_L2MISS


METRICS
Runtime (RDTSC) [s] time
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0

LONG
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(DATA_READ_MISS+HWP_L2MISS)*64.0/time
Memory read data volume [GBytes] = 1.0E-09*(DATA_READ_MISS+HWP_L2MISS)*64.0
-
Bandwidth and data volume of read operations from memory to the L2 cache. The
metric is introduced in the book 'Intel Xeon Phi Coprocessor High-Performance
Programming' by James Jeffers and James Reinders.
20
collectors/likwid/groups/phi/MEM_WRITE.txt
Normal file
@@ -0,0 +1,20 @@
SHORT Memory write bandwidth

EVENTSET
PMC0 L2_VICTIM_REQ_WITH_DATA
PMC1 SNP_HITM_L2


METRICS
Runtime (RDTSC) [s] time
Memory write bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0

LONG
Formulas:
Memory write bandwidth [MBytes/s] = 1.0E-06*(L2_VICTIM_REQ_WITH_DATA+SNP_HITM_L2)*64.0/time
Memory write data volume [GBytes] = 1.0E-09*(L2_VICTIM_REQ_WITH_DATA+SNP_HITM_L2)*64.0
-
Bandwidth and data volume of write operations from the L2 cache to memory. The
metric is introduced in the book 'Intel Xeon Phi Coprocessor High-Performance
Programming' by James Jeffers and James Reinders.
21
collectors/likwid/groups/phi/PAIRING.txt
Normal file
@@ -0,0 +1,21 @@
SHORT Pairing ratio

EVENTSET
PMC0 INSTRUCTIONS_EXECUTED
PMC1 INSTRUCTIONS_EXECUTED_V_PIPE

METRICS
Runtime (RDTSC) [s] time
V-pipe ratio PMC1/PMC0
Pairing ratio PMC1/(PMC0-PMC1)

LONG
Formulas:
V-pipe ratio = INSTRUCTIONS_EXECUTED_V_PIPE/INSTRUCTIONS_EXECUTED
Pairing ratio = INSTRUCTIONS_EXECUTED_V_PIPE/(INSTRUCTIONS_EXECUTED-INSTRUCTIONS_EXECUTED_V_PIPE)
-
Each hardware thread on the Xeon Phi can execute two instructions simultaneously,
one in the U-pipe and one in the V-pipe, but this is only possible if the
instructions can be paired. The instructions executed in paired fashion are counted
by the event INSTRUCTIONS_EXECUTED_V_PIPE. The event INSTRUCTIONS_EXECUTED increments
for each instruction, hence the maximal increase per cycle can be 2.
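The two ratios differ only in their denominators, which is easy to miss when reading the formulas. The sketch below uses hypothetical helper names and invented counts. It also shows the limiting case implied by the definitions: if every instruction were paired, half of them would run in the V-pipe, giving a V-pipe ratio of 0.5 and a pairing ratio of 1.0.

```python
def v_pipe_ratio(instructions_executed, instructions_v_pipe):
    """Share of all instructions that ran in the V-pipe."""
    return instructions_v_pipe / instructions_executed


def pairing_ratio(instructions_executed, instructions_v_pipe):
    """V-pipe instructions relative to the remaining (U-pipe) instructions."""
    u_pipe = instructions_executed - instructions_v_pipe
    return instructions_v_pipe / u_pipe if u_pipe else float("nan")


# Hypothetical readings: 1000 instructions, 250 of them paired into the V-pipe.
print(v_pipe_ratio(1000, 250), pairing_ratio(1000, 250))

# Limiting case under the definitions above: every instruction is paired.
print(v_pipe_ratio(1000, 500), pairing_ratio(1000, 500))
```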
15
collectors/likwid/groups/phi/READ_MISS_RATIO.txt
Normal file
@@ -0,0 +1,15 @@
SHORT Miss ratio for data reads

EVENTSET
PMC0 DATA_READ
PMC1 DATA_READ_MISS

METRICS
Runtime (RDTSC) [s] time
Read miss ratio PMC1/PMC0

LONG
Formulas:
Read miss ratio = DATA_READ_MISS/DATA_READ
--
Miss ratio for data reads.
23
collectors/likwid/groups/phi/TLB.txt
Normal file
@@ -0,0 +1,23 @@
SHORT TLB Misses

EVENTSET
PMC0 LONG_DATA_PAGE_WALK
PMC1 DATA_PAGE_WALK

METRICS
Runtime (RDTSC) [s] time
L1 TLB misses [misses/s] PMC1/time
L2 TLB misses [misses/s] PMC0/time
L1 TLB misses per L2 TLB miss PMC1/PMC0

LONG
Formulas:
L1 TLB misses [misses/s] = DATA_PAGE_WALK/time
L2 TLB misses [misses/s] = LONG_DATA_PAGE_WALK/time
L1 TLB misses per L2 TLB miss = DATA_PAGE_WALK/LONG_DATA_PAGE_WALK
-
Analysis of the layered TLB of the Intel Xeon Phi. According to the book
'Intel Xeon Phi Coprocessor High-Performance Programming' by James Jeffers and
James Reinders, a high ratio of L1 TLB misses per L2 TLB miss suggests that your
working set fits into the L2 TLB but not into the L1 TLB. Using large pages may
be beneficial.
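As a rough illustration of the three TLB metrics, the following sketch derives them from two page-walk counters and a runtime. The function name `tlb_metrics`, the dictionary keys, and the numbers are hypothetical.

```python
def tlb_metrics(data_page_walk, long_data_page_walk, runtime_s):
    """Return the three metrics of the TLB group for one measurement."""
    return {
        "l1_misses_per_s": data_page_walk / runtime_s,
        "l2_misses_per_s": long_data_page_walk / runtime_s,
        "l1_per_l2": (data_page_walk / long_data_page_walk
                      if long_data_page_walk else float("inf")),
    }


# Hypothetical readings: many L1 TLB misses, few of which also miss the L2 TLB.
result = tlb_metrics(data_page_walk=400_000, long_data_page_walk=4_000, runtime_s=2.0)
print(result)
```

In this invented case there are 100 L1 TLB misses per L2 TLB miss, which is the situation the description associates with a working set that fits the L2 TLB but not the L1 TLB.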
23
collectors/likwid/groups/phi/TLB_L1.txt
Normal file
@@ -0,0 +1,23 @@
SHORT L1 TLB misses

EVENTSET
PMC0 DATA_PAGE_WALK
PMC1 DATA_READ_OR_WRITE

METRICS
Runtime (RDTSC) [s] time
L1 TLB misses [misses/s] PMC0/time
L1 TLB miss ratio PMC0/PMC1

LONG
Formulas:
L1 TLB misses [misses/s] = DATA_PAGE_WALK/time
L1 TLB miss ratio = DATA_PAGE_WALK/DATA_READ_OR_WRITE
-
This performance group measures the L1 TLB misses. An L1 TLB miss that hits the
L2 TLB has a penalty of about 25 cycles for 4kB pages. For 2MB pages, the penalty
for an L1 TLB miss that hits the L2 TLB is about 8 cycles. The minimal L1 TLB miss
ratio is about 1/64, so a high ratio indicates bad spatial locality: the data of a
page is only partly accessed. It can also indicate thrashing, because when multiple
pages are accessed in a loop iteration, the size and associativity of the TLB are
not sufficient to hold all pages.
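The description does not say where the 1/64 lower bound comes from. One plausible reading, stated here as an assumption, is a streaming access pattern in which every access touches one full 64-byte cache line of a 4kB page, so that one page walk is amortized over 4096/64 = 64 accesses. The sketch below encodes that reading together with the penalty figures quoted above; both helper names are hypothetical.

```python
def min_l1_tlb_miss_ratio(page_bytes=4096, bytes_per_access=64):
    """Lower bound assuming each access touches a distinct 64-byte line
    and each page causes exactly one L1 TLB miss (assumption, see text)."""
    accesses_per_page = page_bytes // bytes_per_access
    return 1 / accesses_per_page


def l1_tlb_penalty_cycles(l1_tlb_misses, large_pages=False):
    """Approximate stall cycles for L1 TLB misses that hit the L2 TLB,
    using the per-miss figures from the group description."""
    cycles_per_miss = 8 if large_pages else 25
    return l1_tlb_misses * cycles_per_miss


print(min_l1_tlb_miss_ratio())
print(l1_tlb_penalty_cycles(1000), l1_tlb_penalty_cycles(1000, large_pages=True))
```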
21
collectors/likwid/groups/phi/TLB_L2.txt
Normal file
@@ -0,0 +1,21 @@
SHORT L2 TLB misses

EVENTSET
PMC0 LONG_DATA_PAGE_WALK
PMC1 DATA_READ_OR_WRITE

METRICS
Runtime (RDTSC) [s] time
L2 TLB misses [misses/s] PMC0/time
L2 TLB miss ratio PMC0/PMC1

LONG
Formulas:
L2 TLB misses [misses/s] = LONG_DATA_PAGE_WALK/time
L2 TLB miss ratio = LONG_DATA_PAGE_WALK/DATA_READ_OR_WRITE
-
This performance group measures the L2 TLB misses. An L2 TLB miss has a penalty
of at least 100 cycles, hence it is important to avoid them. A high ratio can
indicate thrashing, because when multiple pages are accessed in a loop iteration,
the size and associativity of the TLB are not sufficient to hold all pages. This
would also result in a bad ratio for the L1 TLB.
21
collectors/likwid/groups/phi/VECTOR.txt
Normal file
@@ -0,0 +1,21 @@
SHORT Vectorization intensity

EVENTSET
PMC0 VPU_INSTRUCTIONS_EXECUTED
PMC1 VPU_ELEMENTS_ACTIVE

METRICS
Runtime (RDTSC) [s] time
Vectorization intensity PMC1/PMC0

LONG
Formulas:
Vectorization intensity = VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED
-
Vector instructions include instructions that perform floating-point
operations, instructions that load vector registers from memory and store them
to memory, instructions to manipulate vector mask registers, and other special
purpose instructions such as vector shuffle.
According to the book 'Intel Xeon Phi Coprocessor High-Performance Programming'
by James Jeffers and James Reinders, the vectorization intensity should be >=8
for double precision and >=16 for single precision.
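A short sketch of how the guideline quoted above could be checked against measured counters. The helper names and the counter values are hypothetical; the thresholds 8 and 16 are the ones given in the description.

```python
def vectorization_intensity(vpu_elements_active, vpu_instructions_executed):
    """Average number of active vector elements per VPU instruction."""
    return vpu_elements_active / vpu_instructions_executed


def meets_guideline(intensity, double_precision=True):
    """Compare against the thresholds from the description:
    >=8 for double precision, >=16 for single precision."""
    threshold = 8 if double_precision else 16
    return intensity >= threshold


# Hypothetical readings: on average 6 of the vector lanes are active.
intensity = vectorization_intensity(6_000_000, 1_000_000)
print(intensity, meets_guideline(intensity, double_precision=True))
```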
20
collectors/likwid/groups/phi/VECTOR2.txt
Normal file
@@ -0,0 +1,20 @@
SHORT Vector unit usage

EVENTSET
PMC0 VPU_INSTRUCTIONS_EXECUTED
PMC1 VPU_STALL_REG

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] PMC1*inverseClock
VPU stall ratio [%] 100*(PMC1/PMC0)

LONG
Formulas:
VPU stall ratio [%] = 100*(VPU_STALL_REG/VPU_INSTRUCTIONS_EXECUTED)
--
This group measures how efficiently the processor works with
regard to vector instruction throughput. The event VPU_STALL_REG counts
the VPU stalls due to data dependencies. Dependencies are read-after-write,
write-after-write and write-after-read.
18
collectors/likwid/groups/phi/VPU_FILL_RATIO_DBL.txt
Normal file
@@ -0,0 +1,18 @@
SHORT VPU filling for double precision data

EVENTSET
PMC0 VPU_INSTRUCTIONS_EXECUTED
PMC1 VPU_ELEMENTS_ACTIVE

METRICS
Runtime (RDTSC) [s] time
VPU fill ratio PMC0*8/PMC1

LONG
Formulas:
VPU fill ratio = VPU_INSTRUCTIONS_EXECUTED*8/VPU_ELEMENTS_ACTIVE
--
This performance group measures the number of vector instructions that are
performed on each vector loaded to the VPU. It is important to increase the
ratio to get a high throughput because memory accesses (loading data to the VPU)
are expensive.
20
collectors/likwid/groups/phi/VPU_PAIRING.txt
Normal file
@@ -0,0 +1,20 @@
SHORT VPU pairing ratio

EVENTSET
PMC0 VPU_INSTRUCTIONS_EXECUTED
PMC1 VPU_INSTRUCTIONS_EXECUTED_V_PIPE

METRICS
Runtime (RDTSC) [s] time
V-pipe ratio PMC1/PMC0
Pairing ratio PMC1/(PMC0-PMC1)

LONG
Formulas:
V-pipe ratio = VPU_INSTRUCTIONS_EXECUTED_V_PIPE/VPU_INSTRUCTIONS_EXECUTED
Pairing ratio = VPU_INSTRUCTIONS_EXECUTED_V_PIPE/(VPU_INSTRUCTIONS_EXECUTED-VPU_INSTRUCTIONS_EXECUTED_V_PIPE)
--
This performance group measures the pairing ratio of vector instructions. The
V-pipe can only execute a subset of all instructions; the main workload is done
by the U-pipe. A higher throughput can be achieved if the pairing ratio is
increased.
16
collectors/likwid/groups/phi/VPU_READ_MISS_RATIO.txt
Normal file
@@ -0,0 +1,16 @@
SHORT Miss ratio for VPU data reads

EVENTSET
PMC0 VPU_DATA_READ
PMC1 VPU_DATA_READ_MISS

METRICS
Runtime (RDTSC) [s] time
VPU read miss ratio PMC1/PMC0

LONG
Formulas:
VPU read miss ratio = VPU_DATA_READ_MISS/VPU_DATA_READ
--
This performance group determines the ratio of VPU-issued reads that miss
the cache to all reads issued by the VPU.
16
collectors/likwid/groups/phi/VPU_WRITE_MISS_RATIO.txt
Normal file
@@ -0,0 +1,16 @@
SHORT Miss ratio for VPU data writes

EVENTSET
PMC0 VPU_DATA_WRITE
PMC1 VPU_DATA_WRITE_MISS

METRICS
Runtime (RDTSC) [s] time
VPU write miss ratio PMC1/PMC0

LONG
Formulas:
VPU write miss ratio = VPU_DATA_WRITE_MISS/VPU_DATA_WRITE
--
This performance group determines the ratio of VPU-issued writes that miss
the cache to all writes issued by the VPU.
15
collectors/likwid/groups/phi/WRITE_MISS_RATIO.txt
Normal file
@@ -0,0 +1,15 @@
SHORT Miss ratio for data writes

EVENTSET
PMC0 DATA_WRITE
PMC1 DATA_WRITE_MISS

METRICS
Runtime (RDTSC) [s] time
Write miss ratio PMC1/PMC0

LONG
Formulas:
Write miss ratio = DATA_WRITE_MISS/DATA_WRITE
--
Miss ratio for data writes.