cc-metric-collector
A node agent for measuring, processing and forwarding node level metrics. It is part of the ClusterCockpit ecosystem.
The metric collector sends (and receives) metrics in the InfluxDB line protocol, which is flexible and cleanly separates tags (like index columns in relational databases) from fields (like data columns).
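To illustrate the tag/field separation, here is a minimal sketch that renders one metric as a line-protocol string. The function `formatLine`, the metric name `mem_used`, the tag keys and the field key `value` are assumptions made for this example and are not taken from the collector's code. The sketch also skips the escaping of special characters that the line protocol requires.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatLine renders one metric in InfluxDB line protocol:
//   measurement,tag1=v1,tag2=v2 field=value timestamp
// Tags are sorted by key so the output is stable.
// Hypothetical helper: it does NOT escape spaces, commas or equals signs.
func formatLine(name string, tags map[string]string, value float64, unixNano int64) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	sb.WriteString(name)
	for _, k := range keys {
		sb.WriteString("," + k + "=" + tags[k])
	}
	// A single field named "value" is an assumption of this sketch.
	fmt.Fprintf(&sb, " value=%g %d", value, unixNano)
	return sb.String()
}

func main() {
	tags := map[string]string{"hostname": "node01", "type": "node"}
	fmt.Println(formatLine("mem_used", tags, 1024.5, 1640000000000000000))
}
```

The tags identify where a value comes from, while the field carries the measured value itself.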
A single timer loop triggers all collectors serially, gathers their data and sends the metrics to the sink. This serial design is used because all data of one cycle is submitted with a single timestamp. The sinks currently use mostly blocking APIs.
The receiver runs as a goroutine alongside the timer loop and asynchronously forwards received metrics to the sink.
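The interplay of timer loop and receiver can be sketched roughly as follows. This is not the collector's actual code: the types `Metric`, `Collector`, `Sink` and the functions `collectOnce` and `runReceiver` are hypothetical names chosen for the example, and the 10 ms ticker only stands in for the configured interval.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Metric is a simplified stand-in for a collected value (hypothetical type).
type Metric struct {
	Name  string
	Value float64
	Time  int64 // Unix timestamp in nanoseconds
}

// Collector is a hypothetical interface: each collector returns its metrics on Read.
type Collector interface {
	Read() []Metric
}

type constCollector struct {
	name  string
	value float64
}

func (c constCollector) Read() []Metric { return []Metric{{Name: c.name, Value: c.value}} }

// Sink stores metrics in memory; the mutex protects it against concurrent writers.
type Sink struct {
	mu  sync.Mutex
	got []Metric
}

func (s *Sink) Write(m Metric) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.got = append(s.got, m)
}

// collectOnce triggers all collectors one after another and stamps every
// metric of this cycle with the same timestamp before writing it to the sink.
func collectOnce(cs []Collector, sink *Sink, now int64) {
	for _, c := range cs {
		for _, m := range c.Read() {
			m.Time = now
			sink.Write(m)
		}
	}
}

// runReceiver forwards received metrics to the sink until the channel is
// closed, then signals completion by closing done.
func runReceiver(in <-chan Metric, sink *Sink, done chan<- struct{}) {
	for m := range in {
		sink.Write(m)
	}
	close(done)
}

func main() {
	sink := &Sink{}
	collectors := []Collector{constCollector{"load_one", 0.5}, constCollector{"mem_used", 1024}}

	received := make(chan Metric)
	done := make(chan struct{})
	go runReceiver(received, sink, done) // runs side by side with the timer loop

	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		now := <-ticker.C
		collectOnce(collectors, sink, now.UnixNano())
	}

	received <- Metric{Name: "remote_metric", Value: 1, Time: time.Now().UnixNano()}
	close(received)
	<-done
	fmt.Println(len(sink.got), "metrics written")
}
```

Because the collectors run serially inside one loop iteration, a single timestamp per cycle is straightforward; the receiver needs no such coordination and simply forwards whatever arrives.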
Configuration
Configuration is implemented using a single JSON document that is distributed over the network and may be persisted as a file. Supported metrics are documented here.
There is a main configuration file with basic settings that point to the other configuration files for the different components.
{
"sinks": "sinks.json",
"collectors" : "collectors.json",
"receivers" : "receivers.json",
"router" : "router.json",
"interval": 10,
"duration": 1
}
The interval defines how often the metrics should be read and sent to the sink. The duration tells collectors how long one measurement has to take. This is important for some collectors, like the likwid collector.
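Reading the main configuration file could look like the sketch below. The struct `mainConfig` and the function `parseConfig` are hypothetical and only mirror the keys shown above. The sanity check that the duration must not exceed the interval is an assumption of this example, not documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mainConfig mirrors the keys of the main configuration file.
// Struct and field names are assumptions made for this sketch.
type mainConfig struct {
	Sinks      string `json:"sinks"`
	Collectors string `json:"collectors"`
	Receivers  string `json:"receivers"`
	Router     string `json:"router"`
	Interval   int    `json:"interval"`
	Duration   int    `json:"duration"`
}

// parseConfig decodes the JSON document and applies basic sanity checks.
func parseConfig(data []byte) (mainConfig, error) {
	var cfg mainConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	if cfg.Interval <= 0 || cfg.Duration <= 0 {
		return cfg, fmt.Errorf("interval and duration must be positive")
	}
	// Assumption of this sketch: a measurement has to fit into one interval.
	if cfg.Duration > cfg.Interval {
		return cfg, fmt.Errorf("duration %d exceeds interval %d", cfg.Duration, cfg.Interval)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
		"sinks": "sinks.json",
		"collectors": "collectors.json",
		"receivers": "receivers.json",
		"router": "router.json",
		"interval": 10,
		"duration": 1
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("interval=%d duration=%d sinks=%s\n", cfg.Interval, cfg.Duration, cfg.Sinks)
}
```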
See the component READMEs for their configuration:
Installation
$ git clone git@github.com:ClusterCockpit/cc-metric-collector.git
$ make    # downloads LIKWID, builds it as static library with 'direct' accessmode and copies all required files for the collector
$ go get  # requires at least golang 1.16
$ make
Running
$ ./cc-metric-collector --help
Usage of metric-collector:
-config string
Path to configuration file (default "./config.json")
-log string
Path for logfile (default "stderr")
-once
Run all collectors only once
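For readers who want to see how such options map to code, the following sketch defines the three flags with Go's standard `flag` package. The `options` struct and the `parseArgs` function are hypothetical; only the flag names, defaults and help texts are taken from the usage output above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command line (hypothetical type for this sketch).
type options struct {
	ConfigFile string
	LogFile    string
	Once       bool
}

// parseArgs defines the flags from the usage text and parses args.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("metric-collector", flag.ContinueOnError)
	fs.StringVar(&o.ConfigFile, "config", "./config.json", "Path to configuration file")
	fs.StringVar(&o.LogFile, "log", "stderr", "Path for logfile")
	fs.BoolVar(&o.Once, "once", false, "Run all collectors only once")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```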
Scenarios
The metric collector was designed with flexibility in mind, so it can be used in many scenarios. Here are a few:
flowchart TD
subgraph a ["Cluster A"]
nodeA[NodeA with CC collector]
nodeB[NodeB with CC collector]
nodeC[NodeC with CC collector]
end
a --> db[(Database)]
db <--> ccweb("Webfrontend")
flowchart TD
subgraph a [ClusterA]
direction LR
nodeA[NodeA with CC collector]
nodeB[NodeB with CC collector]
nodeC[NodeC with CC collector]
end
subgraph b [ClusterB]
direction LR
nodeD[NodeD with CC collector]
nodeE[NodeE with CC collector]
nodeF[NodeF with CC collector]
end
a --> ccrecv{"CC collector as receiver"}
b --> ccrecv
ccrecv --> db[("Database1")]
ccrecv -.-> db2[("Database2")]
db <-.-> ccweb("Webfrontend")
Contributing
The ClusterCockpit ecosystem is designed to be used by different HPC centers. Since configurations and setups differ between the centers, each center will likely have to put some work into the cc-metric-collector to gather all desired metrics.
You are free to open an issue to request a collector, but we are also happy to receive PRs.