mirror of https://github.com/ClusterCockpit/cc-metric-collector.git (synced 2024-11-10 12:37:25 +01:00)
Add READMEs and Makefile to build and integrate LIKWID
parent 5c5e8b8f2d
commit bc5d4b288e
42
README.md
@@ -9,33 +9,31 @@ Open questions:
 # Configuration
 
 Configuration is implemented using a single json document that is distributed over network and may be persisted as file.
-Granularity can be either `node`, or `core`. Frequency can be set on a per measurement basis.
 Supported metrics are documented [here](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol.md).
 
 ``` json
 {
-    "sink": "db.monitoring.center.de",
-    "report": {
-        levels: ["core","node"],
-        interval: 120
+    "sink": {
+        "user": "admin",
+        "password": "12345",
+        "host": "localhost",
+        "port": "8080",
+        "database": "testdb",
+        "type": "stdout"
     },
-    "schedule": {
-        "core": {
-            "frequency": 30,
-            "duration": 10},
-        "node":{
-            "frequency": 60,
-            "duration": 20}
-    },
-    "metrics": [
-        "ipc",
-        "flops_any",
-        "clock",
-        "load",
-        "mem_bw",
-        "mem_used",
-        "net_bw",
-        "file_bw"
+    "interval" : 3,
+    "duration" : 1,
+    "collectors": [
+        "memstat",
+        "likwid",
+        "loadavg",
+        "netstat",
+        "ibstat",
+        "lustrestat"
     ]
 }
 ```
+
+All available collectors are listed in the above JSON. Two sinks are currently supported: `influxdb` and `stdout`. The `interval` defines how often the metrics are read and sent to the sink. The `duration` tells collectors how long one measurement has to take. An example is the `likwid` collector, which starts the hardware performance counters, waits for `duration` seconds and stops the counters again. On most systems, the `likwid` collector has to do two measurements, so `interval` must be larger than two times `duration`.
+
+
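The configuration format and the `interval`/`duration` rule described in the added README text can be sketched in Go as follows. This is only an illustration: the type names `Config` and `SinkConfig` and the function `ParseConfig` are assumptions made for this sketch and are not taken from the repository. The sketch applies the "larger than two times `duration`" rule only when the `likwid` collector is enabled, which is one possible reading of the README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SinkConfig mirrors the "sink" object of the README example.
// The Go type and field names are assumptions made for this sketch.
type SinkConfig struct {
	User     string `json:"user"`
	Password string `json:"password"`
	Host     string `json:"host"`
	Port     string `json:"port"` // a string in the README example
	Database string `json:"database"`
	Type     string `json:"type"`
}

// Config mirrors the top-level keys of the README example.
type Config struct {
	Sink       SinkConfig `json:"sink"`
	Interval   int        `json:"interval"`
	Duration   int        `json:"duration"`
	Collectors []string   `json:"collectors"`
}

// ParseConfig decodes the JSON document and checks the two rules
// stated in the README: the sink type must be `influxdb` or `stdout`,
// and with `likwid` enabled, interval must exceed 2*duration.
func ParseConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return c, fmt.Errorf("parsing configuration: %w", err)
	}
	if c.Sink.Type != "influxdb" && c.Sink.Type != "stdout" {
		return c, fmt.Errorf("unsupported sink type %q", c.Sink.Type)
	}
	for _, name := range c.Collectors {
		if name == "likwid" && c.Interval <= 2*c.Duration {
			return c, fmt.Errorf("interval (%d) must be larger than 2*duration (%d)",
				c.Interval, 2*c.Duration)
		}
	}
	return c, nil
}

func main() {
	example := []byte(`{
		"sink": {"user": "admin", "password": "12345", "host": "localhost",
		         "port": "8080", "database": "testdb", "type": "stdout"},
		"interval": 3,
		"duration": 1,
		"collectors": ["memstat", "likwid", "loadavg", "netstat", "ibstat", "lustrestat"]
	}`)
	cfg, err := ParseConfig(example)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("sink=%s interval=%d duration=%d collectors=%v\n",
		cfg.Sink.Type, cfg.Interval, cfg.Duration, cfg.Collectors)
}
```

With the README values (`interval` 3, `duration` 1) the check passes, since 3 is larger than 2.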
@@ -1,7 +1,13 @@
+# LIKWID version
 LIKWID_VERSION = 5.1.0
+# Target user for LIKWID's accessdaemon
 DAEMON_USER=root
+# Target group for LIKWID's accessdaemon
 DAEMON_GROUP=root
 
+#################################################
+# No need to change anything below this line
+#################################################
 INSTALL_FOLDER = ./likwid
 BUILD_FOLDER = ./likwid/build
 
53
collectors/README.md
Normal file
@@ -0,0 +1,53 @@
+This folder contains the collectors for the cc-metric-collector.
+
+# `metricCollector.go`
+The base class/configuration is located in `metricCollector.go`.
+
+# Collectors
+
+* `memstatMetric.go`: Reads `/proc/meminfo` to calculate the **node** metric `mem_used`
+* `loadavgMetric.go`: Reads `/proc/loadavg` and submits the **node** metrics:
+  * `load_one`
+  * `load_five`
+  * `load_fifteen`
+* `netstatMetric.go`: Reads `/proc/net/dev` and submits for all network devices (except loopback `lo`) the **node** metrics:
+  * `<dev>_bytes_in`
+  * `<dev>_bytes_out`
+  * `<dev>_pkts_in`
+  * `<dev>_pkts_out`
+* `lustreMetric.go`: Reads Lustre's stats file `/proc/fs/lustre/llite/lnec-XXXXXX/stats` and submits the **node** metrics:
+  * `read_bytes`
+  * `read_requests`
+  * `write_bytes`
+  * `write_requests`
+  * `open`
+  * `close`
+  * `getattr`
+  * `setattr`
+  * `statfs`
+  * `inode_permission`
+* `infinibandMetric.go`: Reads the InfiniBand LID from `/sys/class/infiniband/mlx4_0/ports/1/lid` and uses the `perfquery` command to read the **node** metrics:
+  * `ib_recv`
+  * `ib_xmit`
+* `likwidMetric.go`: Reads hardware performance events using LIKWID. It submits **socket** and **cpu** metrics:
+  * `mem_bw` (socket)
+  * `power` (socket, sum of the RAPL domains PKG and DRAM)
+  * `flops_dp` (cpu)
+  * `flops_sp` (cpu)
+  * `flops_any` (cpu, `2*flops_dp + flops_sp`)
+  * `cpi` (cpu)
+  * `clock` (cpu)
+
+# Installation
+Only `likwidMetric.go` requires preparation steps, for which the `Makefile` can be used. The LIKWID build needs to be configured:
+* Version of LIKWID in `LIKWID_VERSION`
+* Target user for LIKWID's accessdaemon in `DAEMON_USER`. The user needs sufficient permissions to read the `msr` and `pci` device files
+* Target group for LIKWID's accessdaemon in `DAEMON_GROUP`
+
+The `Makefile` performs the following steps:
+* Downloads the LIKWID tarball
+* Unpacks it
+* Adjusts the configuration for the LIKWID build
+* Builds LIKWID
+* Copies all required files into `collectors/likwid`
+* Installs the accessdaemon, with the suid bit set using `sudo`, also into `collectors/likwid`
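As an illustration of what a collector such as `loadavgMetric.go` has to do, the following Go sketch parses the content of `/proc/loadavg` into the three **node** metrics named above. It is not the repository's implementation: the function name `ParseLoadavg` and the map-based return type are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ParseLoadavg extracts the 1, 5 and 15 minute load averages from the
// content of /proc/loadavg (e.g. "0.50 1.25 2.00 1/234 5678").
// The function name and return type are assumptions for this sketch.
func ParseLoadavg(content string) (map[string]float64, error) {
	fields := strings.Fields(content)
	if len(fields) < 3 {
		return nil, fmt.Errorf("unexpected loadavg format: %q", content)
	}
	// Metric names as listed in collectors/README.md.
	names := []string{"load_one", "load_five", "load_fifteen"}
	metrics := make(map[string]float64, len(names))
	for i, name := range names {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", name, err)
		}
		metrics[name] = v
	}
	return metrics, nil
}

func main() {
	// /proc/loadavg only exists on Linux; elsewhere this reports an error.
	data, err := os.ReadFile("/proc/loadavg")
	if err != nil {
		fmt.Println("cannot read /proc/loadavg:", err)
		return
	}
	metrics, err := ParseLoadavg(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(metrics)
}
```

Keeping the parsing separate from the file access makes the logic testable with a fixed string, independent of the machine's current load.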
12
sinks/README.md
Normal file
@@ -0,0 +1,12 @@
+This folder contains the sinks for the cc-metric-collector.
+
+# `sink.go`
+The base class/configuration is located in `sink.go`.
+
+# Sinks
+There are currently two sinks shipped with the cc-metric-collector:
+* `stdoutSink.go`: Writes all metrics to `stdout` in InfluxDB line protocol. To keep the amount of executed code small for debugging, the sink does not use https://github.com/influxdata/line-protocol
+* `influxSink.go`: Writes all metrics to an InfluxDB database instance using a blocking writer. It uses https://github.com/influxdata/influxdb-client-go . The server, port, user, password and database name are configured in the global configuration file
+
+# Installation
+Nothing to do; all sinks are pure Go code
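A sink that writes InfluxDB line protocol without an external library, as described for `stdoutSink.go`, can be approximated with plain string formatting. The Go sketch below is only an illustration under stated assumptions: the function name `FormatLine`, the field key `value` and the tag keys `hostname` and `type` are chosen for this example, and escaping of special characters (spaces, commas, equals signs) is deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"time"
)

// FormatLine renders one metric as an InfluxDB line protocol line:
//   <measurement>,<tag>=<val>,... value=<float> <timestamp in ns>
// Tags are sorted by key so the output is deterministic.
// Simplification for this sketch: no escaping of special characters.
func FormatLine(name string, tags map[string]string, value float64, tsNano int64) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		b.WriteString("," + k + "=" + tags[k])
	}
	// Single field named "value" (an assumption for this sketch).
	b.WriteString(" value=" + strconv.FormatFloat(value, 'f', -1, 64))
	b.WriteString(" " + strconv.FormatInt(tsNano, 10))
	return b.String()
}

func main() {
	// Hypothetical tag values; a real sink would take them from the metric.
	tags := map[string]string{"hostname": "node01", "type": "node"}
	fmt.Println(FormatLine("load_one", tags, 0.5, time.Now().UnixNano()))
}
```

Sorting the tag keys costs little and makes the output stable across runs, which helps when the `stdout` sink is used for debugging.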