Add likwid collector

This commit is contained in:
Thomas Roehl
2021-03-25 14:47:10 +01:00
parent 4fddcb9741
commit a6ac0c5373
670 changed files with 24926 additions and 0 deletions

@@ -0,0 +1,25 @@
SHORT Branch prediction miss rate/ratio
EVENTSET
PMC0 INSTRUCTIONS_RETIRED
PMC1 BRANCH_RETIRED
PMC2 BRANCH_MISPREDICT_RETIRED
METRICS
Runtime (RDTSC) [s] time
Branch rate PMC1/PMC0
Branch misprediction rate PMC2/PMC0
Branch misprediction ratio PMC2/PMC1
Instructions per branch PMC0/PMC1
LONG
Formulas:
Branch rate = BRANCH_RETIRED/INSTRUCTIONS_RETIRED
Branch misprediction rate = BRANCH_MISPREDICT_RETIRED/INSTRUCTIONS_RETIRED
Branch misprediction ratio = BRANCH_MISPREDICT_RETIRED/BRANCH_RETIRED
Instructions per branch = INSTRUCTIONS_RETIRED/BRANCH_RETIRED
-
The rates state how often, on average, a branch or a mispredicted branch occurred
per retired instruction. The branch misprediction ratio directly relates the
mispredicted branches to all branches, i.e. it gives the fraction of all branch
instructions that were mispredicted. Instructions per branch is 1/branch rate.
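A minimal Python sketch of how the METRICS above follow from the three raw counts. It assumes the counter values have already been collected (for example from a likwid-perfctr run); the function name `branch_metrics` and the sample counts are hypothetical, not measured data.

```python
# Sketch: derive the branch group's metrics from raw counter values.
# Assumption: counts were already read elsewhere; branch_metrics is a
# hypothetical helper, not part of LIKWID.


def branch_metrics(instructions_retired: int,
                   branch_retired: int,
                   branch_mispredict_retired: int) -> dict:
    """Return the group's metrics keyed by their METRICS names."""
    return {
        # PMC1/PMC0: branches per retired instruction
        "Branch rate": branch_retired / instructions_retired,
        # PMC2/PMC0: mispredicted branches per retired instruction
        "Branch misprediction rate": branch_mispredict_retired / instructions_retired,
        # PMC2/PMC1: fraction of all branches that were mispredicted
        "Branch misprediction ratio": branch_mispredict_retired / branch_retired,
        # PMC0/PMC1: reciprocal of the branch rate
        "Instructions per branch": instructions_retired / branch_retired,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    m = branch_metrics(1_000_000, 200_000, 5_000)
    for name, value in m.items():
        print(f"{name}: {value:g}")
```

With these sample counts the branch rate is 0.2 (one branch every 5 instructions) and 2.5% of the branches were mispredicted (ratio 0.025).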

@@ -0,0 +1,33 @@
SHORT Data cache miss rate/ratio
EVENTSET
PMC0 INSTRUCTIONS_RETIRED
PMC1 DATA_CACHE_ACCESSES
PMC2 DATA_CACHE_REFILLS_L2_ALL
PMC3 DATA_CACHE_REFILLS_NORTHBRIDGE_ALL
METRICS
Runtime (RDTSC) [s] time
data cache misses PMC2+PMC3
data cache request rate PMC1/PMC0
data cache miss rate (PMC2+PMC3)/PMC0
data cache miss ratio (PMC2+PMC3)/PMC1
LONG
Formulas:
data cache misses = DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL
data cache request rate = DATA_CACHE_ACCESSES / INSTRUCTIONS_RETIRED
data cache miss rate = (DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL)/INSTRUCTIONS_RETIRED
data cache miss ratio = (DATA_CACHE_REFILLS_L2_ALL + DATA_CACHE_REFILLS_NORTHBRIDGE_ALL)/DATA_CACHE_ACCESSES
-
This group measures the locality of your data accesses with regard to the
L1 cache. The data cache request rate tells you how data intensive your code is,
i.e. how many data accesses you have on average per instruction.
The data cache miss rate gives a measure of how often it was necessary to get
cache lines from higher levels of the memory hierarchy. Finally, the
data cache miss ratio tells you what fraction of your memory references required
a cache line to be loaded from a higher level. While the data cache miss rate
might be dictated by your algorithm, you should try to keep the data cache miss ratio
as low as possible by increasing your cache reuse.
This group was taken from the whitepaper "Basic Performance Measurements for AMD Athlon 64,
AMD Opteron and AMD Phenom Processors" by Paul J. Drongowski.
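A minimal Python sketch of the data cache formulas, assuming the four counter values are already available (e.g. read from a likwid-perfctr run). The helper name `data_cache_metrics` and the sample counts are hypothetical. It also shows that the miss ratio equals the miss rate divided by the request rate, which is why the ratio, not the rate, reflects cache reuse independently of how data intensive the code is.

```python
# Sketch: compute the data cache group's metrics from raw counts.
# Assumption: counts were collected elsewhere; data_cache_metrics is a
# hypothetical helper, not a LIKWID API.


def data_cache_metrics(instructions_retired: int,
                       data_cache_accesses: int,
                       refills_l2_all: int,
                       refills_northbridge_all: int) -> dict:
    """Return the group's metrics keyed by their METRICS names."""
    # A miss is a refill from L2 or from the northbridge (PMC2+PMC3).
    misses = refills_l2_all + refills_northbridge_all
    return {
        "data cache misses": misses,
        # PMC1/PMC0: data accesses per retired instruction
        "data cache request rate": data_cache_accesses / instructions_retired,
        # (PMC2+PMC3)/PMC0: misses per retired instruction
        "data cache miss rate": misses / instructions_retired,
        # (PMC2+PMC3)/PMC1: fraction of data accesses that missed L1
        "data cache miss ratio": misses / data_cache_accesses,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    m = data_cache_metrics(1_000_000, 400_000, 15_000, 5_000)
    for name, value in m.items():
        print(f"{name}: {value:g}")
    # The miss ratio is the miss rate normalized by the request rate.
    print(m["data cache miss rate"] / m["data cache request rate"])
```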

@@ -0,0 +1,26 @@
SHORT Cycles per instruction
EVENTSET
PMC0 INSTRUCTIONS_RETIRED
PMC1 CPU_CLOCKS_UNHALTED
PMC2 UOPS_RETIRED
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] PMC1*inverseClock
CPI PMC1/PMC0
CPI (based on uops) PMC1/PMC2
IPC PMC0/PMC1
LONG
Formulas:
CPI = CPU_CLOCKS_UNHALTED/INSTRUCTIONS_RETIRED
CPI (based on uops) = CPU_CLOCKS_UNHALTED/UOPS_RETIRED
IPC = INSTRUCTIONS_RETIRED/CPU_CLOCKS_UNHALTED
-
This group measures how efficiently the processor works with
regard to instruction throughput. INSTRUCTIONS_RETIRED is also important
as a standalone metric, as it tells you how many instructions
you need to execute for a task. An optimization might show very
low CPI values but execute many more instructions to achieve them.
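The caveat above can be illustrated with a short Python sketch. It assumes that `inverseClock` in the METRICS section is the reciprocal of the core clock frequency in Hz; the helper `cpi_metrics`, the 2 GHz clock and both code "variants" are hypothetical numbers chosen for illustration, not measurements.

```python
# Sketch: CPI-group metrics, plus a comparison showing that a lower CPI
# does not imply a shorter runtime. All names and counts are hypothetical.


def cpi_metrics(instructions_retired: int,
                cpu_clocks_unhalted: int,
                uops_retired: int,
                clock_hz: float) -> dict:
    """Return the group's metrics keyed by their METRICS names."""
    # Assumption: inverseClock = 1 / clock frequency in Hz.
    inverse_clock = 1.0 / clock_hz
    return {
        "Runtime unhalted [s]": cpu_clocks_unhalted * inverse_clock,
        "CPI": cpu_clocks_unhalted / instructions_retired,
        "CPI (based on uops)": cpu_clocks_unhalted / uops_retired,
        "IPC": instructions_retired / cpu_clocks_unhalted,
    }


if __name__ == "__main__":
    clock_hz = 2.0e9  # hypothetical 2 GHz core
    # Variant A: fewer instructions, higher CPI.
    a = cpi_metrics(1_000_000_000, 1_500_000_000, 1_200_000_000, clock_hz)
    # Variant B: lower CPI, but twice as many instructions.
    b = cpi_metrics(2_000_000_000, 2_000_000_000, 2_400_000_000, clock_hz)
    for label, m in (("A", a), ("B", b)):
        print(label, f"CPI={m['CPI']:.2f}",
              f"runtime={m['Runtime unhalted [s]']:.2f} s")
```

Variant B has the better CPI (1.00 vs. 1.50) yet runs longer (1.00 s vs. 0.75 s), because it retires twice as many instructions.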

@@ -0,0 +1,23 @@
SHORT Instruction cache miss rate/ratio
EVENTSET
PMC0 INSTRUCTIONS_RETIRED
PMC1 ICACHE_FETCHES
PMC2 ICACHE_REFILLS_L2
PMC3 ICACHE_REFILLS_MEM
METRICS
Runtime (RDTSC) [s] time
L1I request rate PMC1/PMC0
L1I miss rate (PMC2+PMC3)/PMC0
L1I miss ratio (PMC2+PMC3)/PMC1
LONG
Formulas:
L1I request rate = ICACHE_FETCHES / INSTRUCTIONS_RETIRED
L1I miss rate = (ICACHE_REFILLS_L2+ICACHE_REFILLS_MEM)/INSTRUCTIONS_RETIRED
L1I miss ratio = (ICACHE_REFILLS_L2+ICACHE_REFILLS_MEM)/ICACHE_FETCHES
-
This group measures the locality of your instruction code with regard to the
L1 I-Cache.
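A minimal Python sketch of the instruction cache formulas, assuming the four counter values were already read (e.g. from a likwid-perfctr run). As in the formulas, a miss is counted as an I-cache refill from either L2 or memory. The helper `icache_metrics` and the sample counts are hypothetical.

```python
# Sketch: compute the L1 instruction cache metrics from raw counts.
# Assumption: counts were collected elsewhere; icache_metrics is a
# hypothetical helper, not a LIKWID API.


def icache_metrics(instructions_retired: int,
                   icache_fetches: int,
                   icache_refills_l2: int,
                   icache_refills_mem: int) -> dict:
    """Return the group's metrics keyed by their METRICS names."""
    # Misses: lines refilled from L2 plus lines refilled from memory.
    misses = icache_refills_l2 + icache_refills_mem
    return {
        # PMC1/PMC0: instruction fetches per retired instruction
        "L1I request rate": icache_fetches / instructions_retired,
        # (PMC2+PMC3)/PMC0: I-cache misses per retired instruction
        "L1I miss rate": misses / instructions_retired,
        # (PMC2+PMC3)/PMC1: fraction of fetches that missed L1I
        "L1I miss ratio": misses / icache_fetches,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in icache_metrics(1_000_000, 300_000, 3_000, 600).items():
        print(f"{name}: {value:g}")
```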

@@ -0,0 +1,31 @@
SHORT L2 cache bandwidth in MBytes/s
EVENTSET
PMC0 DATA_CACHE_REFILLS_L2_ALL
PMC1 DATA_CACHE_EVICTED_ALL
PMC2 CPU_CLOCKS_UNHALTED
METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] PMC2*inverseClock
L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0
L2 refill bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L2 evict [MBytes/s] 1.0E-06*PMC1*64.0/time
LONG
Formulas:
L2 bandwidth [MBytes/s] = 1.0E-06*(DATA_CACHE_REFILLS_L2_ALL+DATA_CACHE_EVICTED_ALL)*64/time
L2 data volume [GBytes] = 1.0E-09*(DATA_CACHE_REFILLS_L2_ALL+DATA_CACHE_EVICTED_ALL)*64
L2 refill bandwidth [MBytes/s] = 1.0E-06*DATA_CACHE_REFILLS_L2_ALL*64/time
L2 evict [MBytes/s] = 1.0E-06*DATA_CACHE_EVICTED_ALL*64/time
-
Profiling group to measure L2 cache bandwidth. The bandwidth is
computed from the number of cache lines loaded from L2 into L1 and the
number of modified cache lines evicted from L1.
Note that this bandwidth also includes data transfers caused by
write-allocate loads on L1 store misses, as well as copy-back transfers
if they originated from L2.
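A minimal Python sketch of the bandwidth formulas above. It assumes 64-byte cache lines, matching the `*64.0` factor in the METRICS, and that `time` is the RDTSC runtime in seconds. The helper `l2_bandwidth_metrics` and the sample counts are hypothetical.

```python
# Sketch: L2 bandwidth and data volume from refill/evict counts.
# Assumptions: 64-byte cache lines; time_s is the measured runtime in
# seconds. l2_bandwidth_metrics is a hypothetical helper.

CACHE_LINE_BYTES = 64.0


def l2_bandwidth_metrics(refills_l2_all: int,
                         evicted_all: int,
                         time_s: float) -> dict:
    """Return the group's metrics keyed by their METRICS names."""
    # Each refill (L2 -> L1) and each eviction (L1 -> L2) moves one line.
    refill_bytes = refills_l2_all * CACHE_LINE_BYTES
    evict_bytes = evicted_all * CACHE_LINE_BYTES
    total_bytes = refill_bytes + evict_bytes
    return {
        "L2 bandwidth [MBytes/s]": 1.0e-6 * total_bytes / time_s,
        "L2 data volume [GBytes]": 1.0e-9 * total_bytes,
        "L2 refill bandwidth [MBytes/s]": 1.0e-6 * refill_bytes / time_s,
        "L2 evict [MBytes/s]": 1.0e-6 * evict_bytes / time_s,
    }


if __name__ == "__main__":
    # Hypothetical counts: 10M refills and 5M evictions in 0.5 s.
    for name, value in l2_bandwidth_metrics(10_000_000, 5_000_000, 0.5).items():
        print(f"{name}: {value:g}")
```

With these sample counts, 15 million lines of 64 bytes amount to 0.96 GBytes, i.e. 1920 MBytes/s over 0.5 s, split into 1280 MBytes/s of refills and 640 MBytes/s of evictions.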