https://github.com/ClusterCockpit/cc-metric-collector.git

Add likwid collector
collectors/likwid/groups/interlagos/BRANCH.txt (new file, 26 lines)
@@ -0,0 +1,26 @@
SHORT Branch prediction miss rate/ratio

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 RETIRED_BRANCH_INSTR
PMC2 RETIRED_MISPREDICTED_BRANCH_INSTR

METRICS
Runtime (RDTSC) [s] time
Branch rate PMC1/PMC0
Branch misprediction rate PMC2/PMC0
Branch misprediction ratio PMC2/PMC1
Instructions per branch PMC0/PMC1

LONG
Formulas:
Branch rate = RETIRED_BRANCH_INSTR/RETIRED_INSTRUCTIONS
Branch misprediction rate = RETIRED_MISPREDICTED_BRANCH_INSTR/RETIRED_INSTRUCTIONS
Branch misprediction ratio = RETIRED_MISPREDICTED_BRANCH_INSTR/RETIRED_BRANCH_INSTR
Instructions per branch = RETIRED_INSTRUCTIONS/RETIRED_BRANCH_INSTR
-
The rates state how often, on average, a branch or a mispredicted branch occurred
per retired instruction. The branch misprediction ratio directly expresses what
fraction of all branch instructions were mispredicted.
Instructions per branch is 1/branch rate.
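As an aside on how these group files are consumed: each METRICS line pairs a metric name with an arithmetic expression over the configured event counters and the measurement time. Below is a minimal sketch of that evaluation in Go, the collector's implementation language. The evalBranchMetrics helper and all counter values are invented for illustration; this is not the collector's actual evaluation code.

package main

import "fmt"

// evalBranchMetrics is a hypothetical sketch: it derives the BRANCH group
// metrics from raw counter readings, mirroring the METRICS expressions above.
func evalBranchMetrics(pmc0, pmc1, pmc2 float64) map[string]float64 {
	return map[string]float64{
		"Branch rate":                pmc1 / pmc0, // RETIRED_BRANCH_INSTR/RETIRED_INSTRUCTIONS
		"Branch misprediction rate":  pmc2 / pmc0, // mispredictions per retired instruction
		"Branch misprediction ratio": pmc2 / pmc1, // fraction of branches mispredicted
		"Instructions per branch":    pmc0 / pmc1, // inverse of the branch rate
	}
}

func main() {
	// Example counter values, made up for illustration.
	metrics := evalBranchMetrics(1.0e9, 2.0e8, 1.0e7)
	for name, value := range metrics {
		fmt.Printf("%-28s %.4f\n", name, value)
	}
}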
collectors/likwid/groups/interlagos/CACHE.txt (new file, 32 lines)
@@ -0,0 +1,32 @@
SHORT Data cache miss rate/ratio

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 DATA_CACHE_ACCESSES
PMC2 DATA_CACHE_REFILLS_VALID
PMC3 DATA_CACHE_MISSES_ALL

METRICS
Runtime (RDTSC) [s] time
data cache misses PMC3
data cache request rate PMC1/PMC0
data cache miss rate PMC2/PMC0
data cache miss ratio PMC2/PMC1

LONG
Formulas:
data cache misses = DATA_CACHE_MISSES_ALL
data cache request rate = DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS
data cache miss rate = DATA_CACHE_REFILLS_VALID / RETIRED_INSTRUCTIONS
data cache miss ratio = DATA_CACHE_REFILLS_VALID / DATA_CACHE_ACCESSES
-
This group measures the locality of your data accesses with regard to the
L1 cache. The data cache request rate tells you how data-intensive your code is,
i.e. how many data accesses you have on average per instruction.
The data cache miss rate gives a measure of how often it was necessary to get
cache lines from higher levels of the memory hierarchy, and the
data cache miss ratio tells you how many of your memory references required
a cache line to be loaded from a higher level. While the data cache miss rate
might be given by your algorithm, you should try to get the data cache miss
ratio as low as possible by increasing your cache reuse.
collectors/likwid/groups/interlagos/CPI.txt (new file, 26 lines)
@@ -0,0 +1,26 @@
SHORT Cycles per instruction

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 CPU_CLOCKS_UNHALTED
PMC2 RETIRED_UOPS

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] PMC1*inverseClock
CPI PMC1/PMC0
CPI (based on uops) PMC1/PMC2
IPC PMC0/PMC1

LONG
Formulas:
CPI = CPU_CLOCKS_UNHALTED/RETIRED_INSTRUCTIONS
CPI (based on uops) = CPU_CLOCKS_UNHALTED/RETIRED_UOPS
IPC = RETIRED_INSTRUCTIONS/CPU_CLOCKS_UNHALTED
-
This group measures how efficiently the processor works with
regard to instruction throughput. RETIRED_INSTRUCTIONS is also important
as a standalone metric, as it tells you how many instructions
you need to execute for a task. An optimization might show very
low CPI values but execute many more instructions overall.
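As a quick sanity check of these formulas, a short worked example with invented counter values shows the reciprocal relation between CPI and IPC:

CPU_CLOCKS_UNHALTED  = 2.0e9
RETIRED_INSTRUCTIONS = 1.0e9
CPI = 2.0e9/1.0e9 = 2.0
IPC = 1.0e9/2.0e9 = 0.5 = 1/CPI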
collectors/likwid/groups/interlagos/DATA.txt (new file, 16 lines)
@@ -0,0 +1,16 @@
SHORT Load to store ratio

EVENTSET
PMC0 LS_DISPATCH_LOADS
PMC1 LS_DISPATCH_STORES

METRICS
Runtime (RDTSC) [s] time
Load to store ratio PMC0/PMC1

LONG
Formulas:
Load to store ratio = LS_DISPATCH_LOADS/LS_DISPATCH_STORES
-
This is a simple metric to determine your load-to-store ratio.
collectors/likwid/groups/interlagos/FLOPS_DP.txt (new file, 23 lines)
@@ -0,0 +1,23 @@
SHORT Double Precision MFLOP/s

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 CPU_CLOCKS_UNHALTED
PMC2 RETIRED_UOPS
PMC3 RETIRED_FLOPS_DOUBLE_ALL

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] PMC1*inverseClock
DP [MFLOP/s] 1.0E-06*(PMC3)/time
CPI PMC1/PMC0
CPI (based on uops) PMC1/PMC2
IPC PMC0/PMC1

LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(RETIRED_FLOPS_DOUBLE_ALL)/time
-
Profiling group to measure the double precision FLOP rate.
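The 1.0E-06 factor simply rescales FLOP/s to MFLOP/s. As a worked example with an invented count of RETIRED_FLOPS_DOUBLE_ALL = 4.0e9 over time = 2.0 s:

DP [MFLOP/s] = 1.0E-06 * 4.0e9 / 2.0 = 2000 MFLOP/s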
collectors/likwid/groups/interlagos/FLOPS_SP.txt (new file, 23 lines)
@@ -0,0 +1,23 @@
SHORT Single Precision MFLOP/s

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 CPU_CLOCKS_UNHALTED
PMC2 RETIRED_UOPS
PMC3 RETIRED_FLOPS_SINGLE_ALL

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] PMC1*inverseClock
SP [MFLOP/s] 1.0E-06*(PMC3)/time
CPI PMC1/PMC0
CPI (based on uops) PMC1/PMC2
IPC PMC0/PMC1

LONG
Formulas:
SP [MFLOP/s] = 1.0E-06*(RETIRED_FLOPS_SINGLE_ALL)/time
-
Profiling group to measure the single precision FLOP rate.
collectors/likwid/groups/interlagos/FPU_EXCEPTION.txt (new file, 21 lines)
@@ -0,0 +1,21 @@
SHORT Floating point exceptions

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 RETIRED_FP_INSTRUCTIONS_ALL
PMC2 FPU_EXCEPTION_ALL

METRICS
Runtime (RDTSC) [s] time
Overall FP exception rate PMC2/PMC0
FP exception rate PMC2/PMC1

LONG
Formulas:
Overall FP exception rate = FPU_EXCEPTION_ALL / RETIRED_INSTRUCTIONS
FP exception rate = FPU_EXCEPTION_ALL / RETIRED_FP_INSTRUCTIONS_ALL
-
Floating point exceptions occur, e.g., in the treatment of denormal numbers.
There might be a large penalty if there are too many floating point
exceptions.
collectors/likwid/groups/interlagos/ICACHE.txt (new file, 23 lines)
@@ -0,0 +1,23 @@
SHORT Instruction cache miss rate/ratio

EVENTSET
PMC0 INSTRUCTION_CACHE_FETCHES
PMC1 INSTRUCTION_CACHE_L2_REFILLS
PMC2 INSTRUCTION_CACHE_SYSTEM_REFILLS
PMC3 RETIRED_INSTRUCTIONS

METRICS
Runtime (RDTSC) [s] time
L1I request rate PMC0/PMC3
L1I miss rate (PMC1+PMC2)/PMC3
L1I miss ratio (PMC1+PMC2)/PMC0

LONG
Formulas:
L1I request rate = INSTRUCTION_CACHE_FETCHES / RETIRED_INSTRUCTIONS
L1I miss rate = (INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS)/RETIRED_INSTRUCTIONS
L1I miss ratio = (INSTRUCTION_CACHE_L2_REFILLS + INSTRUCTION_CACHE_SYSTEM_REFILLS)/INSTRUCTION_CACHE_FETCHES
-
This group measures the locality of your instruction code with regard to the
L1 I-cache.
collectors/likwid/groups/interlagos/L2.txt (new file, 29 lines)
@@ -0,0 +1,29 @@
SHORT L2 cache bandwidth in MBytes/s

EVENTSET
PMC0 DATA_CACHE_REFILLS_ALL
PMC1 DATA_CACHE_REFILLS_SYSTEM
PMC2 CPU_CLOCKS_UNHALTED

METRICS
Runtime (RDTSC) [s] time
L2 bandwidth [MBytes/s] 1.0E-06*(PMC0-PMC1)*64.0/time
L2 data volume [GBytes] 1.0E-09*(PMC0-PMC1)*64.0
Cache refill bandwidth system/L2 [MBytes/s] 1.0E-06*PMC0*64.0/time
Cache refill bandwidth system [MBytes/s] 1.0E-06*PMC1*64.0/time

LONG
Formulas:
L2 bandwidth [MBytes/s] = 1.0E-06*(DATA_CACHE_REFILLS_ALL-DATA_CACHE_REFILLS_SYSTEM)*64/time
L2 data volume [GBytes] = 1.0E-09*(DATA_CACHE_REFILLS_ALL-DATA_CACHE_REFILLS_SYSTEM)*64
Cache refill bandwidth system/L2 [MBytes/s] = 1.0E-06*DATA_CACHE_REFILLS_ALL*64/time
Cache refill bandwidth system [MBytes/s] = 1.0E-06*DATA_CACHE_REFILLS_SYSTEM*64/time
-
Profiling group to measure L2 cache bandwidth. The bandwidth is
computed from the number of cache lines loaded from L2 to L1 and the
number of modified cache lines evicted from L1.
Note that this bandwidth also includes data transfers due to a
write-allocate load on a store miss in L1, and copy-back transfers if they
originated from L2. The L2 data volume is the total data volume transferred
between L2 and L1.
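The bandwidth formulas in this and the following groups all follow the same pattern: a cache-line transfer count times 64 bytes per line, rescaled to MBytes/s. A minimal Go sketch of that arithmetic follows; the counter readings are invented for illustration, and lineBW is a hypothetical helper, not part of the collector.

package main

import "fmt"

// lineBW converts a cache-line transfer count into MBytes/s, assuming
// 64-byte cache lines, exactly as the group formulas above do.
func lineBW(lines, seconds float64) float64 {
	return 1.0e-6 * lines * 64.0 / seconds
}

func main() {
	refillsAll := 5.0e8    // hypothetical DATA_CACHE_REFILLS_ALL reading
	refillsSystem := 1.0e8 // hypothetical DATA_CACHE_REFILLS_SYSTEM reading
	seconds := 2.0
	// L2 bandwidth = (all refills - system refills) * 64 bytes / time
	fmt.Printf("L2 bandwidth: %.1f MBytes/s\n", lineBW(refillsAll-refillsSystem, seconds))
}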
collectors/likwid/groups/interlagos/L2CACHE.txt (new file, 31 lines)
@@ -0,0 +1,31 @@
SHORT L2 cache miss rate/ratio

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 REQUESTS_TO_L2_DC_FILL
PMC2 L2_CACHE_MISS_DC_FILL

METRICS
Runtime (RDTSC) [s] time
L2 request rate PMC1/PMC0
L2 miss rate PMC2/PMC0
L2 miss ratio PMC2/PMC1

LONG
Formulas:
L2 request rate = REQUESTS_TO_L2_DC_FILL/RETIRED_INSTRUCTIONS
L2 miss rate = L2_CACHE_MISS_DC_FILL/RETIRED_INSTRUCTIONS
L2 miss ratio = L2_CACHE_MISS_DC_FILL/REQUESTS_TO_L2_DC_FILL
-
This group measures the locality of your data accesses with regard to the L2
cache. The L2 request rate tells you how data-intensive your code is, i.e. how many
data accesses you have on average per instruction. The L2 miss rate gives a
measure of how often it was necessary to get cache lines from memory, and the
L2 miss ratio tells you how many of your memory references required a cache line
to be loaded from a higher level. While the L2 miss rate might be
given by your algorithm, you should try to get the L2 miss ratio as low as
possible by increasing your cache reuse. This group is inspired by the
whitepaper "Basic Performance Measurements for AMD Athlon 64, AMD Opteron and
AMD Phenom Processors" by Paul J. Drongowski.
collectors/likwid/groups/interlagos/L3.txt (new file, 29 lines)
@@ -0,0 +1,29 @@
SHORT L3 cache bandwidth in MBytes/s

EVENTSET
PMC0 L2_FILL_WB_FILL
PMC1 L2_FILL_WB_WB
PMC2 CPU_CLOCKS_UNHALTED

METRICS
Runtime (RDTSC) [s] time
L3 load bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L3 load data volume [GBytes] 1.0E-09*PMC0*64.0
L3 evict bandwidth [MBytes/s] 1.0E-06*PMC1*64.0/time
L3 evict data volume [GBytes] 1.0E-09*PMC1*64.0
L3 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
L3 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0

LONG
Formulas:
L3 load bandwidth [MBytes/s] = 1.0E-06*L2_FILL_WB_FILL*64.0/time
L3 load data volume [GBytes] = 1.0E-09*L2_FILL_WB_FILL*64.0
L3 evict bandwidth [MBytes/s] = 1.0E-06*L2_FILL_WB_WB*64.0/time
L3 evict data volume [GBytes] = 1.0E-09*L2_FILL_WB_WB*64.0
L3 bandwidth [MBytes/s] = 1.0E-06*(L2_FILL_WB_FILL+L2_FILL_WB_WB)*64/time
L3 data volume [GBytes] = 1.0E-09*(L2_FILL_WB_FILL+L2_FILL_WB_WB)*64
-
Profiling group to measure L3 cache bandwidth. The bandwidth is
computed from the number of cache lines loaded from L3 to L2 and the
number of modified cache lines evicted from L2.
collectors/likwid/groups/interlagos/L3CACHE.txt (new file, 35 lines)
@@ -0,0 +1,35 @@
SHORT L3 cache miss rate/ratio

EVENTSET
PMC0 RETIRED_INSTRUCTIONS
UPMC0 UNC_READ_REQ_TO_L3_ALL
UPMC1 UNC_L3_CACHE_MISS_ALL
UPMC2 UNC_L3_LATENCY_CYCLE_COUNT
UPMC3 UNC_L3_LATENCY_REQUEST_COUNT

METRICS
Runtime (RDTSC) [s] time
L3 request rate UPMC0/PMC0
L3 miss rate UPMC1/PMC0
L3 miss ratio UPMC1/UPMC0
L3 average access latency [cycles] UPMC2/UPMC3

LONG
Formulas:
L3 request rate = UNC_READ_REQ_TO_L3_ALL/RETIRED_INSTRUCTIONS
L3 miss rate = UNC_L3_CACHE_MISS_ALL/RETIRED_INSTRUCTIONS
L3 miss ratio = UNC_L3_CACHE_MISS_ALL/UNC_READ_REQ_TO_L3_ALL
L3 average access latency = UNC_L3_LATENCY_CYCLE_COUNT/UNC_L3_LATENCY_REQUEST_COUNT
-
This group measures the locality of your data accesses with regard to the L3
cache. The L3 request rate tells you how data-intensive your code is, i.e. how many
data accesses you have on average per instruction. The L3 miss rate gives a
measure of how often it was necessary to get cache lines from memory, and the
L3 miss ratio tells you how many of your memory references required a cache line
to be loaded from a higher level. While the L3 miss rate might be
given by your algorithm, you should try to get the L3 miss ratio as low as
possible by increasing your cache reuse. This group was inspired by the
whitepaper "Basic Performance Measurements for AMD Athlon 64, AMD Opteron and
AMD Phenom Processors" by Paul J. Drongowski.
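The latency metric is just cycles divided by requests. As a worked example with invented uncore readings of UNC_L3_LATENCY_CYCLE_COUNT = 8.0e8 and UNC_L3_LATENCY_REQUEST_COUNT = 2.0e7:

L3 average access latency = 8.0e8 / 2.0e7 = 40 cycles per L3 access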
collectors/likwid/groups/interlagos/LINKS.txt (new file, 26 lines)
@@ -0,0 +1,26 @@
SHORT Bandwidth on the HyperTransport links

EVENTSET
UPMC0 UNC_LINK_TRANSMIT_BW_L0_USE
UPMC1 UNC_LINK_TRANSMIT_BW_L1_USE
UPMC2 UNC_LINK_TRANSMIT_BW_L2_USE
UPMC3 UNC_LINK_TRANSMIT_BW_L3_USE

METRICS
Runtime (RDTSC) [s] time
Link bandwidth L0 [MBytes/s] 1.0E-06*UPMC0*4.0/time
Link bandwidth L1 [MBytes/s] 1.0E-06*UPMC1*4.0/time
Link bandwidth L2 [MBytes/s] 1.0E-06*UPMC2*4.0/time
Link bandwidth L3 [MBytes/s] 1.0E-06*UPMC3*4.0/time

LONG
Formulas:
Link bandwidth L0 [MBytes/s] = 1.0E-06*UNC_LINK_TRANSMIT_BW_L0_USE*4.0/time
Link bandwidth L1 [MBytes/s] = 1.0E-06*UNC_LINK_TRANSMIT_BW_L1_USE*4.0/time
Link bandwidth L2 [MBytes/s] = 1.0E-06*UNC_LINK_TRANSMIT_BW_L2_USE*4.0/time
Link bandwidth L3 [MBytes/s] = 1.0E-06*UNC_LINK_TRANSMIT_BW_L3_USE*4.0/time
-
Profiling group to measure the HyperTransport link bandwidth for the four links
of a local node. This indicates the data flow between different ccNUMA nodes.
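Note that, unlike the cache groups, these formulas scale by 4.0 rather than 64.0: each counted link-use event accounts for 4 bytes instead of a full cache line, as implied by the formulas above. A worked example with an invented counter value:

UNC_LINK_TRANSMIT_BW_L0_USE = 2.5e9, time = 1.0 s
Link bandwidth L0 = 1.0E-06 * 2.5e9 * 4.0 / 1.0 = 10000 MBytes/s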
collectors/likwid/groups/interlagos/MEM.txt (new file, 20 lines)
@@ -0,0 +1,20 @@
SHORT Main memory bandwidth in MBytes/s

EVENTSET
UPMC0 UNC_DRAM_ACCESSES_DCT0_ALL
UPMC1 UNC_DRAM_ACCESSES_DCT1_ALL

METRICS
Runtime (RDTSC) [s] time
Memory bandwidth [MBytes/s] 1.0E-06*(UPMC0+UPMC1)*64.0/time
Memory data volume [GBytes] 1.0E-09*(UPMC0+UPMC1)*64.0

LONG
Formulas:
Memory bandwidth [MBytes/s] = 1.0E-06*(UNC_DRAM_ACCESSES_DCT0_ALL+UNC_DRAM_ACCESSES_DCT1_ALL)*64/time
Memory data volume [GBytes] = 1.0E-09*(UNC_DRAM_ACCESSES_DCT0_ALL+UNC_DRAM_ACCESSES_DCT1_ALL)*64
-
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Note: As this group measures the accesses from all cores, it only makes sense
to measure with one core per socket, similar to the Intel Nehalem uncore events.
collectors/likwid/groups/interlagos/NUMA.txt (new file, 28 lines)
@@ -0,0 +1,28 @@
SHORT Read/Write Events between the ccNUMA nodes

EVENTSET
UPMC0 UNC_CPU_TO_DRAM_LOCAL_TO_0
UPMC1 UNC_CPU_TO_DRAM_LOCAL_TO_1
UPMC2 UNC_CPU_TO_DRAM_LOCAL_TO_2
UPMC3 UNC_CPU_TO_DRAM_LOCAL_TO_3

METRICS
Runtime (RDTSC) [s] time
DRAM read/write local to 0 [MegaEvents/s] 1.0E-06*UPMC0/time
DRAM read/write local to 1 [MegaEvents/s] 1.0E-06*UPMC1/time
DRAM read/write local to 2 [MegaEvents/s] 1.0E-06*UPMC2/time
DRAM read/write local to 3 [MegaEvents/s] 1.0E-06*UPMC3/time

LONG
Formulas:
DRAM read/write local to 0 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_0/time
DRAM read/write local to 1 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_1/time
DRAM read/write local to 2 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_2/time
DRAM read/write local to 3 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_3/time
-
Profiling group to measure the traffic from the local CPU to the different
DRAM NUMA nodes. This group allows you to detect NUMA problems in a threaded
code. You must first determine on which memory domains your code is running.
A code should only have significant traffic to its own memory domain.
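One way to read these numbers: compare the event rate toward the code's own memory domain against the total across all domains. A hypothetical reading for a thread pinned to node 0:

DRAM read/write local to 0      = 9.5 MegaEvents/s
DRAM read/write local to 1..3   = 0.1 + 0.2 + 0.2 = 0.5 MegaEvents/s
local fraction = 9.5 / (9.5 + 0.5) = 95%, i.e. the traffic is mostly node-local, as desired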
collectors/likwid/groups/interlagos/NUMA_0_3.txt (new file, 28 lines)
@@ -0,0 +1,28 @@
SHORT Read/Write Events between the ccNUMA nodes

EVENTSET
UPMC0 UNC_CPU_TO_DRAM_LOCAL_TO_0
UPMC1 UNC_CPU_TO_DRAM_LOCAL_TO_1
UPMC2 UNC_CPU_TO_DRAM_LOCAL_TO_2
UPMC3 UNC_CPU_TO_DRAM_LOCAL_TO_3

METRICS
Runtime (RDTSC) [s] time
DRAM read/write local to 0 [MegaEvents/s] 1.0E-06*UPMC0/time
DRAM read/write local to 1 [MegaEvents/s] 1.0E-06*UPMC1/time
DRAM read/write local to 2 [MegaEvents/s] 1.0E-06*UPMC2/time
DRAM read/write local to 3 [MegaEvents/s] 1.0E-06*UPMC3/time

LONG
Formulas:
DRAM read/write local to 0 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_0/time
DRAM read/write local to 1 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_1/time
DRAM read/write local to 2 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_2/time
DRAM read/write local to 3 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_3/time
-
Profiling group to measure the traffic from the local CPU to the different
DRAM NUMA nodes. This group allows you to detect NUMA problems in a threaded
code. You must first determine on which memory domains your code is running.
A code should only have significant traffic to its own memory domain.
collectors/likwid/groups/interlagos/NUMA_4_7.txt (new file, 28 lines)
@@ -0,0 +1,28 @@
SHORT Read/Write Events between the ccNUMA nodes

EVENTSET
UPMC0 UNC_CPU_TO_DRAM_LOCAL_TO_4
UPMC1 UNC_CPU_TO_DRAM_LOCAL_TO_5
UPMC2 UNC_CPU_TO_DRAM_LOCAL_TO_6
UPMC3 UNC_CPU_TO_DRAM_LOCAL_TO_7

METRICS
Runtime (RDTSC) [s] time
DRAM read/write local to 4 [MegaEvents/s] 1.0E-06*UPMC0/time
DRAM read/write local to 5 [MegaEvents/s] 1.0E-06*UPMC1/time
DRAM read/write local to 6 [MegaEvents/s] 1.0E-06*UPMC2/time
DRAM read/write local to 7 [MegaEvents/s] 1.0E-06*UPMC3/time

LONG
Formulas:
DRAM read/write local to 4 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_4/time
DRAM read/write local to 5 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_5/time
DRAM read/write local to 6 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_6/time
DRAM read/write local to 7 [MegaEvents/s] = 1.0E-06*UNC_CPU_TO_DRAM_LOCAL_TO_7/time
-
Profiling group to measure the traffic from the local CPU to the different
DRAM NUMA nodes. This group allows you to detect NUMA problems in a threaded
code. You must first determine on which memory domains your code is running.
A code should only have significant traffic to its own memory domain.