diff --git a/README.md b/README.md
index 015dc40..92e4a48 100644
--- a/README.md
+++ b/README.md
@@ -9,33 +9,31 @@ Open questions:
 # Configuration
 
 Configuration is implemented using a single json document that is distributed over network and may be persisted as file.
-Granularity can be either `node`, or `core`. Frequency can be set on a per measurement basis.
 Supported metrics are documented [here](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol.md).
 
 ``` json
 {
-    "sink": "db.monitoring.center.de",
-    "report": {
-        levels: ["core","node"],
-        interval: 120
-    },
-    "schedule": {
-        "core": {
-            "frequency": 30,
-            "duration": 10},
-        "node":{
-            "frequency": 60,
-            "duration": 20}
-    },
-    "metrics": [
-        "ipc",
-        "flops_any",
-        "clock",
-        "load",
-        "mem_bw",
-        "mem_used",
-        "net_bw",
-        "file_bw"
-    ]
+    "sink": {
+        "user": "admin",
+        "password": "12345",
+        "host": "localhost",
+        "port": "8080",
+        "database": "testdb",
+        "type": "stdout"
+    },
+    "interval" : 3,
+    "duration" : 1,
+    "collectors": [
+        "memstat",
+        "likwid",
+        "loadavg",
+        "netstat",
+        "ibstat",
+        "lustrestat"
+    ]
 }
 ```
+
+All available collectors are listed in the above JSON. There are currently two sinks supported: `influxdb` and `stdout`. The `interval` defines how often the metrics should be read and sent to the sink. The `duration` tells collectors how long one measurement has to take. An example for this is the `likwid` collector, which starts the hardware performance counters, waits for `duration` seconds and stops the counters again. For most systems, the `likwid` collector has to do two measurements, thus `interval` must be larger than two times `duration`.
+
+
diff --git a/collectors/Makefile b/collectors/Makefile
index 406e55a..965ec05 100644
--- a/collectors/Makefile
+++ b/collectors/Makefile
@@ -1,7 +1,13 @@
-
+# LIKWID version
 LIKWID_VERSION = 5.1.0
+# Target user for LIKWID's accessdaemon
 DAEMON_USER=root
+# Target group for LIKWID's accessdaemon
 DAEMON_GROUP=root
+
+#################################################
+# No need to change anything below this line
+#################################################
 INSTALL_FOLDER = ./likwid
 BUILD_FOLDER = ./likwid/build
 
diff --git a/collectors/README.md b/collectors/README.md
new file mode 100644
index 0000000..639af05
--- /dev/null
+++ b/collectors/README.md
@@ -0,0 +1,53 @@
+This folder contains the collectors for the cc-metric-collector.
+
+# `metricCollector.go`
+The base class/configuration is located in `metricCollector.go`.
+
+# Collectors
+
+* `memstatMetric.go`: Reads `/proc/meminfo` to calculate the **node** metric `mem_used`
+* `loadavgMetric.go`: Reads `/proc/loadavg` and submits the **node** metrics:
+  * `load_one`
+  * `load_five`
+  * `load_fifteen`
+* `netstatMetric.go`: Reads `/proc/net/dev` and submits for all network devices (except loopback `lo`) the **node** metrics:
+  * `_bytes_in`
+  * `_bytes_out`
+  * `_pkts_in`
+  * `_pkts_out`
+* `lustreMetric.go`: Reads Lustre's stats file `/proc/fs/lustre/llite/lnec-XXXXXX/stats` and submits the **node** metrics:
+  * `read_bytes`
+  * `read_requests`
+  * `write_bytes`
+  * `write_bytes`
+  * `open`
+  * `close`
+  * `getattr`
+  * `setattr`
+  * `statfs`
+  * `inode_permission`
+* `infinibandMetric.go`: Reads InfiniBand LID from `/sys/class/infiniband/mlx4_0/ports/1/lid` and uses the `perfquery` command to read the **node** metrics:
+  * `ib_recv`
+  * `ib_xmit`
+* `likwidMetric.go`: Reads hardware performance events using LIKWID. It submits **socket** and **cpu** metrics:
+  * `mem_bw` (socket)
+  * `power` (socket, Sum of RAPL domains PKG and DRAM)
+  * `flops_dp` (cpu)
+  * `flops_sp` (cpu)
+  * `flops_any` (cpu, `2*flops_dp + flops_sp`)
+  * `cpi` (cpu)
+  * `clock` (cpu)
+
+# Installation
+Only the `likwidMetric.go` requires preparation steps. For this, the `Makefile` can be used. The LIKWID build needs to be configured:
+* Version of LIKWID in `LIKWID_VERSION`
+* Target user for LIKWID's accessdaemon in `DAEMON_USER`. The user has to have enough permissions to read the `msr` and `pci` device files
+* Target group for LIKWID's accessdaemon in `DAEMON_GROUP`
+
+It performs the following steps:
+* Download LIKWID tarball
+* Unpacking
+* Adjusting configuration for LIKWID build
+* Build it
+* Copy all required files into `collectors/likwid`
+* The accessdaemon is installed with the suid bit set using `sudo` also into `collectors/likwid`
diff --git a/sinks/README.md b/sinks/README.md
new file mode 100644
index 0000000..6c4b999
--- /dev/null
+++ b/sinks/README.md
@@ -0,0 +1,12 @@
+This folder contains the sinks for the cc-metric-collector.
+
+# `sink.go`
+The base class/configuration is located in `sink.go`.
+
+# Sinks
+There are currently two sinks shipped with the cc-metric-collector:
+* `stdoutSink.go`: Writes all metrics to `stdout` in InfluxDB line protocol. The sink does not use https://github.com/influxdata/line-protocol to reduce the executed code for debugging
+* `influxSink.go`: Writes all metrics to an InfluxDB database instance using a blocking writer. It uses https://github.com/influxdata/influxdb-client-go . Configuration for the server, port, user, password and database name are in the global configuration file
+
+# Installation
+Nothing to do, all sinks are pure Go code
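
The configuration format introduced in the `README.md` hunk can be illustrated with a small Go sketch that loads the JSON document and checks the constraints the README states: a sink type of `influxdb` or `stdout`, and an `interval` larger than two times `duration`. The type and function names (`Config`, `SinkConfig`, `ParseConfig`) are hypothetical and not taken from the repository. Applying the interval check only when `likwid` is listed in `collectors` is likewise an assumption, based on the README giving the `likwid` collector as the reason for the rule.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SinkConfig mirrors the "sink" object of the README example.
// Hypothetical type; the project's real struct may differ.
type SinkConfig struct {
	User     string `json:"user"`
	Password string `json:"password"`
	Host     string `json:"host"`
	Port     string `json:"port"`
	Database string `json:"database"`
	Type     string `json:"type"`
}

// Config mirrors the top-level JSON document (hypothetical type).
type Config struct {
	Sink       SinkConfig `json:"sink"`
	Interval   int        `json:"interval"`
	Duration   int        `json:"duration"`
	Collectors []string   `json:"collectors"`
}

// ParseConfig decodes the JSON document and validates it against the
// rules described in the README text.
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("decoding config: %w", err)
	}
	// The README lists exactly two supported sinks.
	switch cfg.Sink.Type {
	case "influxdb", "stdout":
	default:
		return cfg, fmt.Errorf("unsupported sink type %q", cfg.Sink.Type)
	}
	if cfg.Duration <= 0 || cfg.Interval <= 0 {
		return cfg, fmt.Errorf("interval and duration must be positive")
	}
	// Assumption: enforce interval > 2*duration only if likwid is enabled,
	// since likwid's two measurements are the stated reason for the rule.
	for _, name := range cfg.Collectors {
		if name == "likwid" && cfg.Interval <= 2*cfg.Duration {
			return cfg, fmt.Errorf("interval (%d) must be larger than 2*duration (%d)",
				cfg.Interval, 2*cfg.Duration)
		}
	}
	return cfg, nil
}

func main() {
	// Shortened version of the README example.
	doc := []byte(`{
		"sink": {"host": "localhost", "port": "8080", "type": "stdout"},
		"interval": 3,
		"duration": 1,
		"collectors": ["memstat", "likwid"]
	}`)
	cfg, err := ParseConfig(doc)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("sink:", cfg.Sink.Type, "collectors:", cfg.Collectors)
}
```

With the README values (`interval` 3, `duration` 1) the check passes because 3 is larger than 2; setting `interval` to 2 with `likwid` enabled would be rejected by this sketch.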