Update READMEs

Thomas Roehl 2021-05-18 15:42:11 +02:00
parent 97c31fbcd0
commit ac41f02dfc
2 changed files with 23 additions and 2 deletions

@ -19,6 +19,7 @@ Supported metrics are documented [here](https://github.com/ClusterCockpit/cc-spe
"host": "localhost",
"port": "8080",
"database": "testdb",
"organisation": "testorg",
"type": "stdout"
},
"interval" : 3,
@ -29,12 +30,21 @@ Supported metrics are documented [here](https://github.com/ClusterCockpit/cc-spe
"loadavg",
"netstat",
"ibstat",
"lustrestat",
"topprocs",
"cpustat",
"nvidia"
],
"receiver": {
"type": "none",
"address": "127.0.0.1",
"port": "4222",
"database": "testdb"
}
}
```
All available collectors are listed in the JSON above. Three sinks are currently supported: `influxdb`, `nats` and `stdout`. The `interval` defines how often the metrics are read and sent to the sink. The `duration` tells collectors how long a single measurement takes. An example is the `likwid` collector, which starts the hardware performance counters, waits for `duration` seconds and stops the counters again. On most systems, the `likwid` collector has to perform two measurements, so `interval` must be larger than two times `duration`. With `receiver`, the collector can be used as a router that receives metrics and forwards them to the configured sink. The only receiver types currently available are `none` (no receiver) and `nats`.
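The constraints described above can be checked when the configuration is loaded. The following is a minimal sketch, not the collector's actual code: the `Config` struct, its field names and `ParseConfig` are assumptions made for illustration, and the `interval` rule is applied unconditionally here although the text only requires it when the `likwid` collector is used.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config mirrors only the keys discussed above. The struct and its field
// names are assumptions for this sketch, not the collector's real types.
type Config struct {
	Interval int `json:"interval"`
	Duration int `json:"duration"`
	Receiver struct {
		Type string `json:"type"`
	} `json:"receiver"`
}

// ParseConfig decodes the JSON and applies the two rules from the text:
// interval > 2*duration, and receiver type is either "none" or "nats".
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	if cfg.Interval <= 2*cfg.Duration {
		return cfg, errors.New("interval must be larger than two times duration")
	}
	switch cfg.Receiver.Type {
	case "none", "nats":
		// supported receiver types
	default:
		return cfg, fmt.Errorf("unknown receiver type %q", cfg.Receiver.Type)
	}
	return cfg, nil
}

func main() {
	good := []byte(`{"interval": 3, "duration": 1, "receiver": {"type": "none"}}`)
	if _, err := ParseConfig(good); err != nil {
		fmt.Println("unexpected error:", err)
	}

	// interval is exactly two times duration, so the check rejects it.
	bad := []byte(`{"interval": 2, "duration": 1, "receiver": {"type": "none"}}`)
	if _, err := ParseConfig(bad); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Rejecting the configuration at load time is one possible design; logging a warning and continuing would be an equally valid choice.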
# Installation
@ -59,3 +69,9 @@ Usage of metric-collector:
Path for logfile (default "stderr")
```
# Internals
The metric collector sends (and receives) metrics in the [InfluxDB line protocol](https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/), which is flexible while keeping tags (comparable to index columns in relational databases) separate from fields (comparable to data columns).
A single timer loop triggers all collectors serially, gathers their data and sends the metrics to the sink. The sinks currently use blocking APIs.
The receiver runs as a goroutine alongside the timer loop and asynchronously forwards received metrics to the sink.

@ -37,6 +37,11 @@ The base class/configuration is located in `metricCollector.go`.
* `flops_any` (cpu, `2*flops_dp + flops_sp`)
* `cpi` (cpu)
* `clock` (cpu)
* `cpustatMetric.go`: Read CPU-specific values from `/proc/stat`
* `topprocsMetric.go`: Read the top 5 processes by CPU usage
* `nvidiaMetric.go`: Read data about Nvidia GPUs using the NVML library

If a collector cannot be initialized, it is excluded from all further reads. For example, if the Lustre stat file is not a valid path, no Lustre-specific metrics are recorded.
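The exclusion of collectors that fail to initialize can be sketched as a simple filter. The `Collector` interface, `fileCollector` and `InitCollectors` below are assumptions for illustration and do not reflect the actual types in `metricCollector.go`; the file paths are placeholders.

```go
package main

import (
	"fmt"
	"os"
)

// Collector is a hypothetical interface for this sketch.
type Collector interface {
	Name() string
	Init() error
}

// fileCollector depends on a file being present, similar to a collector
// that reads a stat file (illustration only).
type fileCollector struct {
	name string
	path string
}

func (c fileCollector) Name() string { return c.name }

func (c fileCollector) Init() error {
	if _, err := os.Stat(c.path); err != nil {
		return fmt.Errorf("%s: %w", c.name, err)
	}
	return nil
}

// InitCollectors initializes every collector once and returns only those
// that succeeded, so failed collectors are never read later on.
func InitCollectors(all []Collector) []Collector {
	active := make([]Collector, 0, len(all))
	for _, c := range all {
		if err := c.Init(); err != nil {
			fmt.Fprintln(os.Stderr, "skipping collector:", err)
			continue
		}
		active = append(active, c)
	}
	return active
}

func main() {
	all := []Collector{
		fileCollector{name: "present", path: "."},
		fileCollector{name: "lustrestat", path: "/nonexistent/lustre/stats"},
	}
	for _, c := range InitCollectors(all) {
		fmt.Println("active collector:", c.Name())
	}
}
```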
# InfiniBand collector
The InfiniBand collector requires the LID file to read the data. The path to this file has to be configured in the collector itself (`LIDFILE` in `infinibandMetric.go`).