cc-metric-collector

A node agent for measuring, processing and forwarding node-level metrics. It is part of the ClusterCockpit ecosystem.

The metric collector sends (and receives) metrics in the InfluxDB line protocol because it is flexible and separates tags (like index columns in relational databases) from fields (like data columns).

A single timer loop triggers all collectors serially, gathers their data and sends the metrics to the sink. This is done because all data of one iteration is submitted with a single timestamp. Currently, the sinks mostly use blocking APIs.
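The following Go sketch illustrates the idea of one timer loop that takes a single timestamp per tick and calls every collector serially with it. The `Collector` interface, `Point` type and `constCollector` are hypothetical stand-ins, not the collector's real API; a buffered channel plays the role of the sink.

```go
package main

import (
	"fmt"
	"time"
)

// Point is a hypothetical metric sample.
type Point struct {
	Name  string
	Value float64
	Time  time.Time
}

// Collector is a hypothetical interface: Read sends the collector's
// samples, stamped with ts, to out.
type Collector interface {
	Read(ts time.Time, out chan<- Point)
}

// constCollector is a toy collector that always reports the same value.
type constCollector struct {
	name  string
	value float64
}

func (c constCollector) Read(ts time.Time, out chan<- Point) {
	out <- Point{Name: c.name, Value: c.value, Time: ts}
}

// runOnce is one iteration of the timer loop: a single timestamp is
// taken and every collector is called serially with it.
func runOnce(collectors []Collector, out chan<- Point) {
	ts := time.Now()
	for _, c := range collectors {
		c.Read(ts, out)
	}
}

// run calls runOnce on each tick until `ticks` iterations have completed.
func run(collectors []Collector, interval time.Duration, ticks int, out chan<- Point) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < ticks; i++ {
		<-ticker.C
		runOnce(collectors, out)
	}
}

func main() {
	collectors := []Collector{constCollector{"mem_used", 42}, constCollector{"cpu_load", 0.5}}
	out := make(chan Point, 16) // buffered stand-in for the sink
	run(collectors, 10*time.Millisecond, 2, out)
	close(out)
	for p := range out {
		fmt.Println(p.Name, p.Value, p.Time.UnixNano())
	}
}
```

Within each iteration, both points carry exactly the same timestamp, which is the property the serial design is meant to preserve.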

The receiver runs as a goroutine alongside the timer loop and asynchronously forwards received metrics to the sink.

Configuration

Configuration is implemented using a single JSON document that is distributed over the network and may be persisted as a file. Supported metrics are documented here.

A main configuration file contains basic settings and points to the configuration files of the individual components.

{
  "sinks": "sinks.json",
  "collectors": "collectors.json",
  "receivers": "receivers.json",
  "router": "router.json",
  "interval": 10,
  "duration": 1
}

The interval defines how often the metrics should be read and sent to the sink. The duration tells collectors how long one measurement has to take. This is important for some collectors, like the LIKWID collector.

See the READMEs of the individual components (collectors, sinks, receivers, router) for their configuration.

Installation

$ git clone git@github.com:ClusterCockpit/cc-metric-collector.git
$ cd cc-metric-collector
$ make    # downloads LIKWID, builds it as a static library with 'direct' access mode and copies all required files for the collector
$ go get  # requires at least Go 1.16
$ make

Running

$ ./cc-metric-collector --help
Usage of metric-collector:
  -config string
    	Path to configuration file (default "./config.json")
  -log string
    	Path for logfile (default "stderr")
  -once
    	Run all collectors only once
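The help output above has the layout printed by Go's standard `flag` package. The sketch below shows how the same three options and defaults could be declared with `flag.FlagSet`; it is an illustration, not the collector's actual `main` function, and `options` and `parseFlags` are hypothetical names.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the three command-line options from the help output.
type options struct {
	config  string
	logfile string
	once    bool
}

// parseFlags declares the flags with the defaults shown above and parses args.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("cc-metric-collector", flag.ContinueOnError)
	fs.StringVar(&o.config, "config", "./config.json", "Path to configuration file")
	fs.StringVar(&o.logfile, "log", "stderr", "Path for logfile")
	fs.BoolVar(&o.once, "once", false, "Run all collectors only once")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseFlags(os.Args[1:])
	if errors.Is(err, flag.ErrHelp) {
		os.Exit(0) // usage text was already printed by the flag package
	} else if err != nil {
		os.Exit(2) // the flag package already reported the parse error
	}
	fmt.Printf("config=%s log=%s once=%v\n", o.config, o.logfile, o.once)
}
```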

Scenarios

The metric collector was designed with flexibility in mind, so it can be used in many scenarios. Here are a few:

flowchart TD
  subgraph a ["Cluster A"]
  nodeA[NodeA with CC collector]
  nodeB[NodeB with CC collector]
  nodeC[NodeC with CC collector]
  end
  a --> db[(Database)]
  db <--> ccweb("Webfrontend")

flowchart TD
  subgraph a [ClusterA]
  direction LR
  nodeA[NodeA with CC collector]
  nodeB[NodeB with CC collector]
  nodeC[NodeC with CC collector]
  end
  subgraph b [ClusterB]
  direction LR
  nodeD[NodeD with CC collector]
  nodeE[NodeE with CC collector]
  nodeF[NodeF with CC collector]
  end
  a --> ccrecv{"CC collector as receiver"}
  b --> ccrecv
  ccrecv --> db[("Database1")]
  ccrecv -.-> db2[("Database2")]
  db <-.-> ccweb("Webfrontend")
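The second scenario combines a fan-in (metrics from several clusters arriving at one collector acting as receiver) with a fan-out (forwarding to Database1 and, optionally, Database2). A minimal Go sketch of that data flow with channels follows; `fanIn`, `fanOut` and `cluster` are hypothetical helpers, and the string channels stand in for real receivers and sinks.

```go
package main

import (
	"fmt"
	"sync"
)

// fanIn merges the metric streams of several clusters into one channel.
func fanIn(inputs ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, in := range inputs {
		wg.Add(1)
		go func(in <-chan string) {
			defer wg.Done()
			for m := range in {
				out <- m
			}
		}(in)
	}
	// Close the merged channel once all inputs are drained.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fanOut forwards every metric to all sinks and closes them at the end.
func fanOut(in <-chan string, sinks ...chan<- string) {
	for m := range in {
		for _, s := range sinks {
			s <- m
		}
	}
	for _, s := range sinks {
		close(s)
	}
}

// cluster emits the given metrics on a new channel, standing in for its nodes.
func cluster(metrics ...string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, m := range metrics {
			ch <- m
		}
	}()
	return ch
}

func main() {
	db1 := make(chan string, 8) // Database1
	db2 := make(chan string, 8) // optional Database2
	fanOut(fanIn(cluster("nodeA", "nodeB"), cluster("nodeD")), db1, db2)
	fmt.Println(len(db1), len(db2)) // 3 3
}
```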

Contributing

The ClusterCockpit ecosystem is designed to be used by different HPC centers. Since configurations and setups differ between the centers, the centers likely have to put some work into the cc-metric-collector to gather all desired metrics.

Feel free to open an issue to request a collector, but we would also be happy about PRs.