mirror of https://github.com/ClusterCockpit/cc-metric-collector.git, synced 2025-11-04 02:35:07 +01:00
	Add READMEs and Makefile to build and integrate LIKWID
README.md
							@@ -9,33 +9,31 @@ Open questions:
# Configuration

Configuration is implemented as a single JSON document that is distributed over the network and may be persisted as a file.
Granularity can be either `node` or `core`. Frequency can be set on a per-measurement basis.
Supported metrics are documented [here](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol.md).

``` json
 {
-   "sink": "db.monitoring.center.de",
-   "report": {
-      levels: ["core","node"],
-      interval: 120
+    "sink": {
+        "user": "admin",
+        "password": "12345",
+        "host": "localhost",
+        "port": "8080",
+        "database": "testdb",
+        "type": "stdout"
+    },
-   "schedule": {
-      "core": {
-         "frequency": 30,
-         "duration": 10},
-      "node":{
-         "frequency": 60,
-         "duration": 20}
-   },
-   "metrics": [
-      "ipc",
-      "flops_any",
-      "clock",
-      "load",
-      "mem_bw",
-      "mem_used",
-      "net_bw",
-      "file_bw"
+    "interval" : 3,
+    "duration" : 1,
+    "collectors": [
+        "memstat",
+        "likwid",
+        "loadavg",
+        "netstat",
+        "ibstat",
+        "lustrestat"
     ]
 }
```

All available collectors are listed in the JSON above. Two sinks are currently supported: `influxdb` and `stdout`. The `interval` defines how often the metrics are read and sent to the sink. The `duration` tells collectors how long one measurement takes. An example is the `likwid` collector, which starts the hardware performance counters, waits for `duration` seconds, and stops the counters again. On most systems, the `likwid` collector has to perform two measurements, so `interval` must be larger than two times `duration`.
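
The rules in this paragraph can be checked when the configuration is loaded. The following is a minimal sketch, not the collector's actual code: the type names `Config` and `SinkConfig` and the function `ParseConfig` are hypothetical, and the JSON keys are taken from the added lines of the example above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// SinkConfig mirrors the "sink" object of the example configuration.
// The Go type and field names are assumptions made for this sketch.
type SinkConfig struct {
	User     string `json:"user"`
	Password string `json:"password"`
	Host     string `json:"host"`
	Port     string `json:"port"`
	Database string `json:"database"`
	Type     string `json:"type"`
}

// Config mirrors the top-level keys of the example configuration.
type Config struct {
	Sink       SinkConfig `json:"sink"`
	Interval   int        `json:"interval"`
	Duration   int        `json:"duration"`
	Collectors []string   `json:"collectors"`
}

// ParseConfig decodes the JSON document and applies the two rules stated
// in the README: the sink type must be "influxdb" or "stdout", and
// interval must be larger than two times duration.
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	if cfg.Sink.Type != "influxdb" && cfg.Sink.Type != "stdout" {
		return cfg, fmt.Errorf("unsupported sink type %q", cfg.Sink.Type)
	}
	if cfg.Interval <= 2*cfg.Duration {
		return cfg, errors.New("interval must be larger than two times duration")
	}
	return cfg, nil
}

// exampleConfig repeats the configuration added by this commit.
const exampleConfig = `{
    "sink": {
        "user": "admin",
        "password": "12345",
        "host": "localhost",
        "port": "8080",
        "database": "testdb",
        "type": "stdout"
    },
    "interval" : 3,
    "duration" : 1,
    "collectors": ["memstat", "likwid", "loadavg", "netstat", "ibstat", "lustrestat"]
}`

func main() {
	cfg, err := ParseConfig([]byte(exampleConfig))
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println(len(cfg.Collectors), "collectors, sink type", cfg.Sink.Type)
}
```

With `interval` set to 3 and `duration` set to 1, the example passes the check; lowering `interval` to 2 would be rejected.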
@@ -1,7 +1,13 @@

# LIKWID version
LIKWID_VERSION = 5.1.0
# Target user for LIKWID's accessdaemon
DAEMON_USER=root
# Target group for LIKWID's accessdaemon
DAEMON_GROUP=root

#################################################
# No need to change anything below this line
#################################################
INSTALL_FOLDER = ./likwid
BUILD_FOLDER = ./likwid/build
collectors/README.md
							@@ -0,0 +1,53 @@
This folder contains the collectors for the cc-metric-collector.

# `metricCollector.go`
The base class/configuration is located in `metricCollector.go`.

# Collectors

* `memstatMetric.go`: Reads `/proc/meminfo` to calculate the **node** metric `mem_used`
* `loadavgMetric.go`: Reads `/proc/loadavg` and submits the **node** metrics:
   * `load_one`
   * `load_five`
   * `load_fifteen`
* `netstatMetric.go`: Reads `/proc/net/dev` and submits the following **node** metrics for all network devices (except loopback `lo`):
   * `<dev>_bytes_in`
   * `<dev>_bytes_out`
   * `<dev>_pkts_in`
   * `<dev>_pkts_out`
* `lustreMetric.go`: Reads Lustre's stats file `/proc/fs/lustre/llite/lnec-XXXXXX/stats` and submits the **node** metrics:
   * `read_bytes`
   * `read_requests`
   * `write_bytes`
   * `write_requests`
   * `open`
   * `close`
   * `getattr`
   * `setattr`
   * `statfs`
   * `inode_permission`
* `infinibandMetric.go`: Reads the InfiniBand LID from `/sys/class/infiniband/mlx4_0/ports/1/lid` and uses the `perfquery` command to read the **node** metrics:
   * `ib_recv`
   * `ib_xmit`
* `likwidMetric.go`: Reads hardware performance events using LIKWID. It submits **socket** and **cpu** metrics:
   * `mem_bw` (socket)
   * `power` (socket, sum of RAPL domains PKG and DRAM)
   * `flops_dp` (cpu)
   * `flops_sp` (cpu)
   * `flops_any` (cpu, `2*flops_dp + flops_sp`)
   * `cpi` (cpu)
   * `clock` (cpu)

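To illustrate what one of the simpler collectors in this list does, here is a minimal sketch of turning the content of `/proc/loadavg` into the three load metrics. It is not taken from `loadavgMetric.go`; the function name `ParseLoadavg` is hypothetical, and a sample line stands in for the real file so the sketch runs anywhere.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseLoadavg extracts the 1, 5 and 15 minute load averages from the
// content of /proc/loadavg, for example "0.42 0.36 0.30 1/123 4567".
// Only the first three whitespace-separated fields are used.
func ParseLoadavg(content string) (map[string]float64, error) {
	names := []string{"load_one", "load_five", "load_fifteen"}
	fields := strings.Fields(content)
	if len(fields) < len(names) {
		return nil, fmt.Errorf("unexpected loadavg content: %q", content)
	}
	metrics := make(map[string]float64, len(names))
	for i, name := range names {
		value, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", name, err)
		}
		metrics[name] = value
	}
	return metrics, nil
}

func main() {
	// A sample line is used here; a collector would read /proc/loadavg instead.
	metrics, err := ParseLoadavg("0.42 0.36 0.30 1/123 4567\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(metrics["load_one"], metrics["load_five"], metrics["load_fifteen"])
}
```
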
# Installation
Only `likwidMetric.go` requires preparation steps, for which the `Makefile` can be used. The LIKWID build needs to be configured:
* Version of LIKWID in `LIKWID_VERSION`
* Target user for LIKWID's accessdaemon in `DAEMON_USER`. The user needs sufficient permissions to read the `msr` and `pci` device files
* Target group for LIKWID's accessdaemon in `DAEMON_GROUP`

The `Makefile` performs the following steps:
* Downloads the LIKWID tarball
* Unpacks it
* Adjusts the configuration for the LIKWID build
* Builds LIKWID
* Copies all required files into `collectors/likwid`
* Installs the accessdaemon, with the suid bit set using `sudo`, also into `collectors/likwid`
sinks/README.md
							@@ -0,0 +1,12 @@
This folder contains the sinks for the cc-metric-collector.

# `sink.go`
The base class/configuration is located in `sink.go`.

# Sinks
There are currently two sinks shipped with the cc-metric-collector:
* `stdoutSink.go`: Writes all metrics to `stdout` in InfluxDB line protocol. To reduce the amount of code executed when debugging, the sink does not use https://github.com/influxdata/line-protocol
* `influxSink.go`: Writes all metrics to an InfluxDB database instance using a blocking writer. It uses https://github.com/influxdata/influxdb-client-go . Configuration for the server, port, user, password and database name is in the global configuration file

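Since the `stdout` sink formats InfluxDB line protocol without the library, a hand-written formatter is enough for simple metrics. The sketch below is an illustration, not the code in `stdoutSink.go`: the function `FormatLine`, the tag keys `hostname` and `type`, and the field key `value` are assumptions, and escaping of spaces, commas and equal signs is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// FormatLine renders one metric as a single line of InfluxDB line protocol:
//
//	<measurement>,<tag>=<value>,... <field>=<value> <timestamp in ns>
//
// Tags are written in sorted key order. Escaping of special characters is
// omitted, so names and tag values must not contain spaces, commas or '='.
func FormatLine(name string, tags map[string]string, value float64, ts time.Time) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	sb.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&sb, ",%s=%s", k, tags[k])
	}
	// The field key "value" is a hypothetical choice for this sketch.
	fmt.Fprintf(&sb, " value=%v %d", value, ts.UnixNano())
	return sb.String()
}

// exampleLine formats one node-level metric with a fixed timestamp.
func exampleLine() string {
	tags := map[string]string{"hostname": "node01", "type": "node"}
	return FormatLine("load_one", tags, 0.42, time.Unix(1600000000, 0))
}

func main() {
	fmt.Println(exampleLine())
}
```

Sorting the tag keys keeps the output deterministic, which makes the printed lines easier to compare while debugging.
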
# Installation
Nothing to do; all sinks are pure Go code.