cc-metric-collector

A node agent for measuring, processing and forwarding node-level metrics. It is part of the ClusterCockpit ecosystem.

The metric collector sends (and receives) metrics in the InfluxDB line protocol, which offers flexibility while keeping a clear separation between tags (comparable to index columns in relational databases) and fields (comparable to data columns).
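An illustrative metric in line protocol form (measurement, tags, fields and a nanosecond timestamp; the names and values here are made up for the example) could look like this:

cpu_load,hostname=node001,type=node value=1.23 1643121463000000000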

A single timer loop triggers all collectors serially, gathers their data and sends the metrics to the sink, so that all data of one interval is submitted with a single timestamp. The sinks currently use mostly blocking APIs.

The receiver runs as a goroutine side-by-side with the timer loop and asynchronously forwards received metrics to the sink.
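As a rough sketch of this architecture (the Metric, Collector and Sink types below are simplified stand-ins for illustration, not the actual interfaces used in cc-metric-collector):

package agent

import "time"

// Metric is a simplified stand-in for the internal metric type
// (the real collector uses an extended Influx line protocol representation).
type Metric struct {
    Name   string
    Tags   map[string]string
    Fields map[string]interface{}
    Time   time.Time
}

// Collector and Sink are hypothetical, reduced interfaces for illustration.
type Collector interface {
    Read(duration time.Duration, out chan Metric)
}

type Sink interface {
    Write(m Metric) error
    Flush() error
}

// Run drives all collectors from one central ticker and forwards
// received metrics asynchronously to the same sink.
func Run(collectors []Collector, sink Sink, received chan Metric, interval, duration time.Duration) {
    metrics := make(chan Metric)

    // Receiver path: forward incoming metrics side-by-side with the timer loop.
    go func() {
        for m := range received {
            metrics <- m
        }
    }()

    // Sink writer: consumes everything that collectors and receivers produce.
    go func() {
        for m := range metrics {
            sink.Write(m)
        }
    }()

    ticker := time.NewTicker(interval)
    for range ticker.C {
        // Collectors run serially so all metrics of one interval
        // carry (roughly) the same timestamp.
        for _, c := range collectors {
            c.Read(duration, metrics)
        }
        sink.Flush()
    }
}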

Configuration

Configuration is implemented using a single JSON document that is distributed over the network and may be persisted as a file. Supported metrics are documented here.

There is a main configuration file with basic settings that point to the other configuration files for the different components.

{
  "sinks": "sinks.json",
  "collectors" : "collectors.json",
  "receivers" : "receivers.json",
  "router" : "router.json",
  "interval": 10,
  "duration": 1
}

The interval defines how often the metrics should be read and sent to the sink. The duration tells collectors how long a single measurement should take, which is important for some collectors, such as the LIKWID collector.

See the READMEs of the individual components (sinks, collectors, receivers, router) for their configuration.

Installation

$ git clone git@github.com:ClusterCockpit/cc-metric-collector.git
$ make (downloads LIKWID, builds it as a static library with 'direct' accessmode and copies all required files for the collector)
$ go get (requires at least golang 1.13)
$ go build metric-collector

Running

$ ./metric-collector --help
Usage of metric-collector:
  -config string
    	Path to configuration file (default "./config.json")
  -log string
    	Path for logfile (default "stderr")
  -once
    	Run all collectors only once
  -pidfile string
    	Path for PID file (default "/var/run/cc-metric-collector.pid")
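For example, a single test run reading an explicit configuration file (the path is only illustrative) could be started with:

$ ./metric-collector -once -config ./config.json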

Contributing

The ClusterCockpit ecosystem is designed to be used by different HPC computing centers. Since configurations and setups differ between centers, each center will likely have to put some work into cc-metric-collector to gather all desired metrics.

You are free to open an issue to request a collector, but we would also be happy about PRs.

Contact