From ac41f02dfc8d36f23a3604c315889bd2d00fffaa Mon Sep 17 00:00:00 2001
From: Thomas Roehl
Date: Tue, 18 May 2021 15:42:11 +0200
Subject: [PATCH] Update READMEs

---
 README.md            | 20 ++++++++++++++++++--
 collectors/README.md |  5 +++++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index bea783e..a7ec515 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ Supported metrics are documented [here](https://github.com/ClusterCockpit/cc-spe
     "host": "localhost",
     "port": "8080",
     "database": "testdb",
+    "organisation": "testorg",
     "type": "stdout"
   },
   "interval" : 3,
@@ -29,12 +30,21 @@ Supported metrics are documented [here](https://github.com/ClusterCockpit/cc-spe
     "loadavg",
     "netstat",
     "ibstat",
-    "lustrestat"
+    "lustrestat",
+    "topprocs",
+    "cpustat",
+    "nvidia"
   ]
+  "receiver": {
+    "type": "none"
+    "address": "127.0.0.1",
+    "port": "4222",
+    "database": "testdb",
+  }
 }
 ```
 
-All available collectors are listed in the above JSON. There are currently three sinks supported `influxdb`, `nats` and `stdout`. The `interval` defines how often the metrics should be read and send to the sink. The `duration` tells collectors how long one measurement has to take. An example for this is the `likwid` collector which starts the hardware performance counter, waits for `duration` seconds and stops the counters again. For most systems, the `likwid` collector has to do two measurements, thus `interval` must be larger than two times `duration`.
+All available collectors are listed in the above JSON. There are currently three sinks supported `influxdb`, `nats` and `stdout`. The `interval` defines how often the metrics should be read and send to the sink. The `duration` tells collectors how long one measurement has to take. An example for this is the `likwid` collector which starts the hardware performance counter, waits for `duration` seconds and stops the counters again. For most systems, the `likwid` collector has to do two measurements, thus `interval` must be larger than two times `duration`. With `receiver`, the collector can be used as a router by receiving metrics and forwarding them to the configured sink. There are currently only types `none` (for no receiver) and `nats`.
 
 # Installation
 
@@ -59,3 +69,9 @@ Usage of metric-collector:
         Path for logfile (default "stderr")
 ```
 
+# Internals
+The metric collector sends (and receives) metric in the [InfluxDB line protocol](https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/) as it provides flexibility while providing a separation between tags (like index columns in relational databases) and fields (like data columns).
+
+There is a single timer loop that triggers all collectors serially, collects the collectors' data and sends the metrics to the sink. The sinks currently use blocking APIs.
+
+The receiver runs as a go routine side-by-side with the timer loop and asynchronously forwards received metrics to the sink.
diff --git a/collectors/README.md b/collectors/README.md
index ab77281..364fa2d 100644
--- a/collectors/README.md
+++ b/collectors/README.md
@@ -37,6 +37,11 @@ The base class/configuration is located in `metricCollector.go`.
   * `flops_any` (cpu, `2*flops_dp + flops_sp`)
   * `cpi` (cpu)
   * `clock` (cpu)
+* `cpustatMetric.go`: Read CPU specific values from `/proc/stat`
+* `topprocsMetric.go`: Reads the Top5 processes by their CPU usage
+* `nvidiaMetric.go`: Read data about Nvidia GPUs using the NVML library
+
+If any of the collectors cannot be initialized, it is excluded from all further reads. Like if the Lustre stat file is not a valid path, no Lustre specific metrics will be recorded.
 
 # InfiniBand collector
 The InfiniBand collector requires the LID file to read the data. It has to be configured in the collector itself (`LIDFILE` in `infinibandMetric.go`)
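
The `# Internals` text added by this patch describes a single timer loop that reads all collectors serially and writes to a blocking sink, with the receiver running as a goroutine next to it. The following is a minimal Go sketch of that structure, not the project's actual code. All names in it (`Metric`, `Collector`, `Sink`, `memSink`, `constCollector`, `collectOnce`, `runTimerLoop`, `startReceiver`) are hypothetical and chosen only for illustration; the in-memory sink and the channel stand in for a real sink and a real NATS subscription.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Metric is a simplified stand-in for one line-protocol point (hypothetical type).
type Metric struct {
	Name  string
	Tags  map[string]string
	Value float64
}

// Collector and Sink are assumed interfaces, not the project's real ones.
type Collector interface {
	Read() []Metric
}

type Sink interface {
	Write(m Metric) error // blocking call, as described in the README
}

// memSink stores metrics in memory; the mutex matters because the timer loop
// and the receiver goroutine both write to the same sink.
type memSink struct {
	mu      sync.Mutex
	metrics []Metric
}

func (s *memSink) Write(m Metric) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.metrics = append(s.metrics, m)
	return nil
}

func (s *memSink) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.metrics)
}

// constCollector always reports one fixed value (illustration only).
type constCollector struct {
	name  string
	value float64
}

func (c constCollector) Read() []Metric {
	return []Metric{{Name: c.name, Value: c.value}}
}

// collectOnce triggers all collectors one after another and forwards their
// metrics to the sink. It returns the number of metrics written.
func collectOnce(collectors []Collector, sink Sink) int {
	n := 0
	for _, c := range collectors {
		for _, m := range c.Read() {
			if err := sink.Write(m); err == nil {
				n++
			}
		}
	}
	return n
}

// runTimerLoop calls collectOnce on every tick until done is closed.
func runTimerLoop(interval time.Duration, collectors []Collector, sink Sink, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			collectOnce(collectors, sink)
		}
	}
}

// startReceiver forwards every metric arriving on in to the sink in its own
// goroutine. The returned channel is closed once in has been drained.
func startReceiver(in <-chan Metric, sink Sink) <-chan struct{} {
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		for m := range in {
			_ = sink.Write(m)
		}
	}()
	return finished
}

func main() {
	sink := &memSink{}
	collectors := []Collector{constCollector{"load_one", 0.5}, constCollector{"mem_used", 1024}}

	in := make(chan Metric)
	finished := startReceiver(in, sink)

	done := make(chan struct{})
	go func() {
		in <- Metric{Name: "remote_metric", Value: 42} // simulated received metric
		close(in)
		<-finished
		time.Sleep(50 * time.Millisecond) // let the timer loop tick a few times
		close(done)
	}()

	runTimerLoop(10*time.Millisecond, collectors, sink, done)
	fmt.Printf("sink received %d metrics\n", sink.Len())
}
```

Because the sink call blocks, a slow sink delays the next collector read in this sketch, which matches the serial behaviour the README text describes. The sketch leaves out the `duration` handling of collectors such as `likwid`.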