mirror of https://github.com/ClusterCockpit/cc-metric-collector.git, synced 2025-11-04 02:35:07 +01:00

Modularize the whole thing (#16)

* Use channels, add a metric router, split up configuration and use extended version of Influx line protocol internally
* Use central timer for collectors and router. Add expressions to router
* Add expression to router config
* Update entry points
* Start with README
* Update README for CCMetric
* Formatting
* Update README.md
* Add README for MultiChanTicker
* Add README for MultiChanTicker
* Update README.md
* Add README to metric router
* Update main README
* Remove SinkEntity type
* Update README for sinks
* Update go files
* Update README for receivers
* Update collectors README
* Update collectors README
* Use seperate page per collector
* Fix for tempstat page
* Add docs for customcmd collector
* Add docs for ipmistat collector
* Add docs for topprocs collector
* Update customCmdMetric.md
* Use seconds when calculating LIKWID metrics
* Add IB metrics ib_recv_pkts and ib_xmit_pkts
* Drop domain part of host name
* Updated to latest stable version of likwid
* Define source code dependencies in Makefile
* Add GPFS / IBM Spectrum Scale collector
* Add vet and staticcheck make targets
* Add vet and staticcheck make targets
* Avoid go vet warning: struct field tag `json:"..., omitempty"` not compatible with reflect.StructTag.Get: suspicious space in struct tag value struct field tag `json:"...", omitempty` not compatible with reflect.StructTag.Get: key:"value" pairs not separated by spaces
* Add sample collector to README.md
* Add CPU frequency collector
* Avoid staticcheck warning: redundant return statement
* Avoid staticcheck warning: unnecessary assignment to the blank identifier
* Simplified code
* Add CPUFreqCollectorCpuinfo a metric collector to measure the current frequency of the CPUs as obtained from /proc/cpuinfo Only measure on the first hyperthread
* Add collector for NFS clients
* Move publication of metrics into Flush() for NatsSink
* Update GitHub actions
* Refactoring
* Avoid vet warning: Println arg list ends with redundant newline
* Avoid vet warning struct field commands has json tag but is not exported
* Avoid vet warning: return copies lock value.
* Corrected typo
* Refactoring
* Add go sources in internal/...
* Bad separator in Makefile
* Fix Infiniband collector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

collectors/likwidMetric.md (Normal file, 119 lines)

@@ -0,0 +1,119 @@

## `likwid` collector

```json
  "likwid": {
    "eventsets": [
      {
        "events": {
          "FIXC1": "ACTUAL_CPU_CLOCK",
          "FIXC2": "MAX_CPU_CLOCK",
          "PMC0": "RETIRED_INSTRUCTIONS",
          "PMC1": "CPU_CLOCKS_UNHALTED",
          "PMC2": "RETIRED_SSE_AVX_FLOPS_ALL",
          "PMC3": "MERGE",
          "DFC0": "DRAM_CHANNEL_0",
          "DFC1": "DRAM_CHANNEL_1",
          "DFC2": "DRAM_CHANNEL_2",
          "DFC3": "DRAM_CHANNEL_3"
        },
        "metrics": [
          {
            "name": "ipc",
            "calc": "PMC0/PMC1",
            "socket_scope": false,
            "publish": true
          },
          {
            "name": "flops_any",
            "calc": "0.000001*PMC2/time",
            "socket_scope": false,
            "publish": true
          },
          {
            "name": "clock_mhz",
            "calc": "0.000001*(FIXC1/FIXC2)/inverseClock",
            "socket_scope": false,
            "publish": true
          },
          {
            "name": "mem1",
            "calc": "0.000001*(DFC0+DFC1+DFC2+DFC3)*64.0/time",
            "socket_scope": true,
            "publish": false
          }
        ]
      },
      {
        "events": {
          "DFC0": "DRAM_CHANNEL_4",
          "DFC1": "DRAM_CHANNEL_5",
          "DFC2": "DRAM_CHANNEL_6",
          "DFC3": "DRAM_CHANNEL_7",
          "PWR0": "RAPL_CORE_ENERGY",
          "PWR1": "RAPL_PKG_ENERGY"
        },
        "metrics": [
          {
            "name": "pwr_core",
            "calc": "PWR0/time",
            "socket_scope": false,
            "publish": true
          },
          {
            "name": "pwr_pkg",
            "calc": "PWR1/time",
            "socket_scope": true,
            "publish": true
          },
          {
            "name": "mem2",
            "calc": "0.000001*(DFC0+DFC1+DFC2+DFC3)*64.0/time",
            "socket_scope": true,
            "publish": false
          }
        ]
      }
    ],
    "globalmetrics": [
      {
        "name": "mem_bw",
        "calc": "mem1+mem2",
        "socket_scope": true,
        "publish": true
      }
    ]
  }
```

_Example config suitable for AMD Zen3_

The `likwid` collector reads hardware performance counters at **hwthread** and **socket** level. The configuration looks complicated, but it is essentially copied from [LIKWID's performance groups](https://github.com/RRZE-HPC/likwid/tree/master/groups). Earlier iterations of the collector tried to use the performance groups directly, but that approach lacked flexibility. The current configuration scheme provides the most flexibility.

The logic is as follows: there are multiple eventsets, each consisting of a list of counter/event pairs and a list of metrics. If you compare a common performance group with the example configuration above, there is not much difference:
```
EVENTSET                         ->   "events": {
FIXC1 ACTUAL_CPU_CLOCK           ->     "FIXC1": "ACTUAL_CPU_CLOCK",
FIXC2 MAX_CPU_CLOCK              ->     "FIXC2": "MAX_CPU_CLOCK",
PMC0  RETIRED_INSTRUCTIONS       ->     "PMC0": "RETIRED_INSTRUCTIONS",
PMC1  CPU_CLOCKS_UNHALTED        ->     "PMC1": "CPU_CLOCKS_UNHALTED",
PMC2  RETIRED_SSE_AVX_FLOPS_ALL  ->     "PMC2": "RETIRED_SSE_AVX_FLOPS_ALL",
PMC3  MERGE                      ->     "PMC3": "MERGE"
                                 ->   }
```

The metrics follow the same procedure:

```
METRICS                          ->   "metrics": [
IPC   PMC0/PMC1                  ->     {
                                 ->       "name": "IPC",
                                 ->       "calc": "PMC0/PMC1",
                                 ->       "socket_scope": false,
                                 ->       "publish": true
                                 ->     }
                                 ->   ]
```
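
The two mappings above are mechanical, so a first draft of an eventset can be generated from a performance group file. The following Python sketch is not part of the collector: `group_to_eventset` is a hypothetical helper, and it assumes that each metric line ends with a formula that contains no whitespace. The `socket_scope` and `publish` values are only defaults passed in by the caller and still need to be adjusted by hand for each metric.

```python
import json


def group_to_eventset(group_text, socket_scope=False, publish=True):
    """Convert the EVENTSET and METRICS sections of a performance group
    into one entry for the "eventsets" list (hypothetical helper)."""
    events = {}
    metrics = []
    section = None
    for raw in group_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("EVENTSET", "METRICS"):
            section = line
            continue
        tokens = line.split()
        if tokens[0] in ("SHORT", "LONG"):
            # Description sections carry no counters or formulas.
            section = None
            continue
        if len(tokens) < 2:
            continue
        if section == "EVENTSET":
            # "<counter> <event>", e.g. "PMC0  RETIRED_INSTRUCTIONS"
            events[tokens[0]] = tokens[1]
        elif section == "METRICS":
            # Assumption: the formula is the last whitespace-free token,
            # everything before it is the metric name.
            name, calc = line.rsplit(None, 1)
            metrics.append({
                "name": name.strip(),
                "calc": calc,
                "socket_scope": socket_scope,
                "publish": publish,
            })
    return {"events": events, "metrics": metrics}


GROUP = """\
EVENTSET
FIXC1 ACTUAL_CPU_CLOCK
PMC0  RETIRED_INSTRUCTIONS
PMC1  CPU_CLOCKS_UNHALTED

METRICS
IPC   PMC0/PMC1
"""

if __name__ == "__main__":
    print(json.dumps(group_to_eventset(GROUP), indent=2))
```

The generated entry still has to be reviewed: metric names taken from a group file may contain spaces or units, and metrics that belong to a socket need `"socket_scope": true`.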

The `socket_scope` option determines whether a metric is submitted per socket (`true`) or per hwthread (`false`). If a metric is only used for internal calculations, set `"publish": false`.

Some metrics can only be gathered in multiple measurements (like the memory bandwidth on AMD Zen3 chips). In that case, configure multiple eventsets as in the example config and use the `globalmetrics` section to combine them. **Be aware** that the combined value might be misleading: the behavior of a metric changes over time, and the separate measurements might cover different computing phases.
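
To make the roles of `publish` and `globalmetrics` concrete, the following Python sketch evaluates `calc` expressions in two passes: first the metrics of each eventset, then the global metrics that combine them. It is an illustration only, not the collector's Go implementation. `evaluate` and `compute` are hypothetical names, the counter values are made up, all eventsets are assumed to share one measurement time, and the hwthread/socket distinction is ignored.

```python
import ast
import operator

# Supported binary operators; anything else in a "calc" string is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(calc, variables):
    """Evaluate an arithmetic expression such as "PMC0/PMC1" (hypothetical helper)."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return float(variables[node.id])
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {calc!r}")

    return walk(ast.parse(calc, mode="eval").body)


def compute(config, counts_per_eventset, time_s):
    """Return the published metrics for one measurement round."""
    derived = {}    # every eventset metric, published or not
    published = {}  # only metrics with "publish": true
    for eventset, counts in zip(config["eventsets"], counts_per_eventset):
        variables = dict(counts, time=time_s)
        for metric in eventset["metrics"]:
            value = evaluate(metric["calc"], variables)
            derived[metric["name"]] = value
            if metric["publish"]:
                published[metric["name"]] = value
    # Global metrics are calculated from the eventset metrics.
    for metric in config.get("globalmetrics", []):
        value = evaluate(metric["calc"], derived)
        if metric["publish"]:
            published[metric["name"]] = value
    return published


MEM_CALC = "0.000001*(DFC0+DFC1+DFC2+DFC3)*64.0/time"
CONFIG = {
    "eventsets": [
        {"metrics": [{"name": "mem1", "calc": MEM_CALC, "publish": False}]},
        {"metrics": [{"name": "mem2", "calc": MEM_CALC, "publish": False}]},
    ],
    "globalmetrics": [{"name": "mem_bw", "calc": "mem1+mem2", "publish": True}],
}
# Made-up counter readings, one dictionary per eventset.
COUNTS = [
    {"DFC0": 1_000_000, "DFC1": 1_000_000, "DFC2": 1_000_000, "DFC3": 1_000_000},
    {"DFC0": 1_000_000, "DFC1": 1_000_000, "DFC2": 1_000_000, "DFC3": 1_000_000},
]

if __name__ == "__main__":
    print(compute(CONFIG, COUNTS, time_s=2.0))
```

Only `mem_bw` appears in the result, because `mem1` and `mem2` are marked `"publish": false` and exist solely as inputs for the global metric. The sketch also shows why the combination can mislead: `mem1` and `mem2` come from different measurements, yet they are added as if they described the same interval.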