
cc-metric-collector

A node agent for measuring, processing and forwarding node level metrics.

Open questions:

  • Are hostnames unique within a computing center, or is it required to store the cluster name in addition to the hostname?
  • What about memory domain granularity?

Configuration

Configuration is implemented using a single JSON document that is distributed over the network and may be persisted as a file. Supported metrics are documented here.

{
    "sink": {
        "user": "admin",
        "password": "12345",
        "host": "localhost",
        "port": "8080",
        "database": "testdb",
        "type": "stdout"
    },
    "interval" : 3,
    "duration" : 1,
    "collectors": [
        "memstat",
        "likwid",
        "loadavg",
        "netstat",
        "ibstat",
        "lustrestat"
    ]
}

All available collectors are listed in the JSON above. Two sinks are currently supported: influxdb and stdout. The interval defines how often the metrics are read and sent to the sink. The duration tells collectors how long one measurement takes. An example is the likwid collector, which starts the hardware performance counters, waits for duration seconds, and stops the counters again. On most systems, the likwid collector has to perform two measurements, so interval must be larger than two times duration.
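
The constraints described above can be checked when the configuration is loaded. The following is a minimal sketch, not the project's actual loader: the type names `Config` and `SinkConfig` and the functions `ParseConfig` and `ValidateConfig` are hypothetical, and the field layout is simply assumed from the example JSON. It applies the "interval larger than two times duration" rule unconditionally, which is a conservative assumption because the text only states it for the likwid collector on most systems.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SinkConfig mirrors the "sink" object of the example JSON (assumed layout).
type SinkConfig struct {
	User     string `json:"user"`
	Password string `json:"password"`
	Host     string `json:"host"`
	Port     string `json:"port"`
	Database string `json:"database"`
	Type     string `json:"type"`
}

// Config mirrors the top-level JSON document (assumed layout).
type Config struct {
	Sink       SinkConfig `json:"sink"`
	Interval   int        `json:"interval"`
	Duration   int        `json:"duration"`
	Collectors []string   `json:"collectors"`
}

// ValidateConfig checks the rules stated in the README text.
func ValidateConfig(cfg Config) error {
	// The README names exactly two supported sinks.
	switch cfg.Sink.Type {
	case "influxdb", "stdout":
	default:
		return fmt.Errorf("unsupported sink type %q", cfg.Sink.Type)
	}
	if cfg.Duration <= 0 {
		return fmt.Errorf("duration must be positive, got %d", cfg.Duration)
	}
	// Conservative assumption: enforce the likwid rule for every configuration.
	if cfg.Interval <= 2*cfg.Duration {
		return fmt.Errorf("interval (%d) must be larger than 2 * duration (%d)",
			cfg.Interval, cfg.Duration)
	}
	return nil
}

// ParseConfig decodes a JSON document and validates the result.
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("parsing config: %w", err)
	}
	return cfg, ValidateConfig(cfg)
}

func main() {
	// A shortened version of the example configuration above.
	example := []byte(`{
		"sink": {"host": "localhost", "port": "8080", "type": "stdout"},
		"interval": 3,
		"duration": 1,
		"collectors": ["memstat", "likwid", "loadavg"]
	}`)

	cfg, err := ParseConfig(example)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("sink=%s interval=%d duration=%d collectors=%v\n",
		cfg.Sink.Type, cfg.Interval, cfg.Duration, cfg.Collectors)
}
```

With the example values (interval 3, duration 1) the check passes, since 3 is larger than 2 * 1. Setting interval to 2 would be rejected under this sketch.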