Add likwid collector

This commit is contained in:
Thomas Roehl
2021-03-25 14:47:10 +01:00
parent 4fddcb9741
commit a6ac0c5373
670 changed files with 24926 additions and 0 deletions

View File

@@ -0,0 +1,31 @@
SHORT Branch prediction miss rate/ratio
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 BR_INST_RETIRED_ALL_BRANCHES
PMC1 BR_MISP_RETIRED_ALL_BRANCHES
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Branch rate PMC0/FIXC0
Branch misprediction rate PMC1/FIXC0
Branch misprediction ratio PMC1/PMC0
Instructions per branch FIXC0/PMC0
LONG
Formulas:
Branch rate = BR_INST_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES/BR_INST_RETIRED_ALL_BRANCHES
Instructions per branch = INSTR_RETIRED_ANY/BR_INST_RETIRED_ALL_BRANCHES
-
The rates state how often on average a branch or a mispredicted branch occurred
per instruction retired in total. The branch misprediction ratio sets directly
into relation what ratio of all branch instruction where mispredicted.
Instructions per branch is 1/branch rate.

View File

@@ -0,0 +1,23 @@
SHORT Power and Energy consumption
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PWR0 PWR_PKG_ENERGY
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Energy [J] PWR0
Power [W] PWR0/time
LONG
Formulas:
Power = PWR_PKG_ENERGY / time
-
Silvermont implements the new RAPL interface. This interface enables to
monitor the consumed energy on the package (socket) level.

View File

@@ -0,0 +1,22 @@
SHORT Load to store ratio
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 MEM_UOPS_RETIRED_ALL_LOADS
PMC1 MEM_UOPS_RETIRED_ALL_STORES
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Load to store ratio PMC0/PMC1
LONG
Formulas:
Load to store ratio = MEM_UOPS_RETIRED_ALL_LOADS/MEM_UOPS_RETIRED_ALL_STORES
-
This is a metric to determine your load to store ratio.

View File

@@ -0,0 +1,24 @@
SHORT Divide unit information
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 CYCLES_DIV_BUSY_ANY
PMC1:EDGEDETECT CYCLES_DIV_BUSY_ANY
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Number of divide ops PMC1:EDGEDETECT
Avg. divide unit usage duration PMC0/PMC1:EDGEDETECT
LONG
Formulas:
Number of divide ops = CYCLES_DIV_BUSY_ANY:EDGEDETECT
Avg. divide unit usage duration = CYCLES_DIV_BUSY_ANY/CYCLES_DIV_BUSY_ANY:EDGEDETECT
-
This performance group measures the average latency of divide operations

View File

@@ -0,0 +1,29 @@
SHORT Power and Energy consumption
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
TMP0 TEMP_CORE
PWR0 PWR_PKG_ENERGY
PWR1 PWR_PP0_ENERGY
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Temperature [C] TMP0
Energy [J] PWR0
Power [W] PWR0/time
Energy PP0 [J] PWR1
Power PP0 [W] PWR1/time
LONG
Formulas:
Power = PWR_PKG_ENERGY / time
Power PP0 = PWR_PP0_ENERGY / time
-
Silvermont implements the new RAPL interface. This interface enables to
monitor the consumed energy on the package (socket) level.

View File

@@ -0,0 +1,25 @@
SHORT Instruction cache miss rate/ratio
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 ICACHE_ACCESSES
PMC1 ICACHE_MISSES
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
L1I request rate PMC0/FIXC0
L1I miss rate PMC1/FIXC0
L1I miss ratio PMC1/PMC0
LONG
Formulas:
L1I request rate = ICACHE_ACCESSES / INSTR_RETIRED_ANY
L1I miss rate = ICACHE_MISSES / INSTR_RETIRED_ANY
L1I miss ratio = ICACHE_MISSES / ICACHE_ACCESSES
-
This group measures some L1 instruction cache metrics.

View File

@@ -0,0 +1,34 @@
SHORT L2 cache miss rate/ratio
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 LONGEST_LAT_CACHE_REFERENCE
PMC1 LONGEST_LAT_CACHE_MISS
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
L2 request rate PMC0/FIXC0
L2 miss rate PMC1/FIXC0
L2 miss ratio PMC1/PMC0
LONG
Formulas:
L2 request rate = LONGEST_LAT_CACHE_REFERENCE/INSTR_RETIRED_ANY
L2 miss rate = LONGEST_LAT_CACHE_MISS/INSTR_RETIRED_ANY
L2 miss ratio = LONGEST_LAT_CACHE_MISS/LONGEST_LAT_CACHE_REFERENCE
-
This group measures the locality of your data accesses with regard to the
L2 cache. L2 request rate tells you how data intensive your code is
or how many data accesses you have on average per instruction.
The L2 miss rate gives a measure how often it was necessary to get
cache lines from memory. And finally L2 miss ratio tells you how many of your
memory references required a cache line to be loaded from a higher level.
While the data cache miss rate might be given by your algorithm you should
try to get data cache miss ratio as low as possible by increasing your cache
reuse.

View File

@@ -0,0 +1,37 @@
SHORT Memory load bandwidth in MBytes/s
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 LONGEST_LAT_CACHE_MISS
PMC1 OFFCORE_RESPONSE_1_WB_ANY
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Memory read bandwidth [MBytes/s] 1.0E-06*(PMC0)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(PMC0)*64.0
Memory writeback bandwidth [MBytes/s] 1.0E-06*(PMC1)*64.0/time
Memory writeback data volume [GBytes] 1.0E-09*(PMC1)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
LONG
Formulas:
Memory read bandwidth [MBytes/s] = 1.0E-06*(LONGEST_LAT_CACHE_MISS)*64/time
Memory read data volume [GBytes] = 1.0E-09*(LONGEST_LAT_CACHE_MISS)*64
Memory writeback bandwidth [MBytes/s] = 1.0E-06*(OFFCORE_RESPONSE_1_WB_ANY)*64/time
Memory writeback data volume [GBytes] = 1.0E-09*(OFFCORE_RESPONSE_1_WB_ANY)*64
Memory bandwidth [MBytes/s] = 1.0E-06*(LONGEST_LAT_CACHE_MISS+OFFCORE_RESPONSE_1_WB_ANY)*64/time
Memory data volume [GBytes] = 1.0E-09*(LONGEST_LAT_CACHE_MISS+OFFCORE_RESPONSE_1_WB_ANY)*64
-
Profiling group to measure L2 to MEM load cache bandwidth. The bandwidth is computed by the
number of cache line allocated in the L2 cache. Since there is no possibility to retrieve
the evicted cache lines, this group measures only the load cache bandwidth. The
writeback metrics count only modified cache lines that are written back to go to
exclusive state
The group also output totally load and writeback data volume transferred between memory and L2.

View File

@@ -0,0 +1,27 @@
SHORT L2 data TLB miss rate/ratio
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 PAGE_WALKS_DTLB_COUNT
PMC1 PAGE_WALKS_DTLB_CYCLES
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
L1 DTLB misses PMC0
L1 DTLB miss rate PMC0/FIXC0
L1 DTLB miss duration [Cyc] PMC1/PMC0
LONG
Formulas:
L1 DTLB misses = PAGE_WALKS_DTLB_COUNT
L1 DTLB miss rate = PAGE_WALKS_DTLB_COUNT / INSTR_RETIRED_ANY
L1 DTLB miss duration [Cyc] = PAGE_WALKS_DTLB_CYCLES / PAGE_WALKS_DTLB_COUNT
-
The DTLB load and store miss rates gives a measure how often a TLB miss occurred
per instruction. The duration measures the time in cycles how long a walk did take.

View File

@@ -0,0 +1,27 @@
SHORT L1 Instruction TLB miss rate/ratio
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0 PAGE_WALKS_ITLB_COUNT
PMC1 PAGE_WALKS_ITLB_CYCLES
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
L1 ITLB misses PMC0
L1 ITLB miss rate PMC0/FIXC0
L1 ITLB miss duration [Cyc] PMC1/PMC0
LONG
Formulas:
L1 ITLB misses = PAGE_WALKS_ITLB_COUNT
L1 ITLB miss rate = PAGE_WALKS_ITLB_COUNT / INSTR_RETIRED_ANY
L1 ITLB miss duration [Cyc] = PAGE_WALKS_ITLB_CYCLES / PAGE_WALKS_ITLB_COUNT
-
The ITLB miss rates gives a measure how often a TLB miss occurred
per instruction. The duration measures the time in cycles how long a walk did take.