mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2026-03-25 17:37:29 +01:00
Compare commits
3 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 9eaf77db4f | | |
| | 7cb5d1b47a | | |
| | 319e71a853 | | |
18
README.md
@@ -11,13 +11,9 @@ hugo_path: docs/reference/cc-metric-collector/_index.md
# cc-metric-collector

A node agent for measuring, processing and forwarding node level metrics. It is part of the [ClusterCockpit ecosystem](https://clustercockpit.org/docs/overview/).
A node agent for measuring, processing and forwarding node level metrics. It is part of the [ClusterCockpit ecosystem](https://clustercockpit.org/docs/overview/).

The metric collector sends (and receives) metric in the [InfluxDB line protocol](https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/) as it provides flexibility while providing a separation between tags (like index columns in relational databases) and fields (like data columns).

There is a single timer loop that triggers all collectors serially, collects the collectors' data and sends the metrics to the sink. This is done as all data is submitted with a single time stamp. The sinks currently use mostly blocking APIs.

The receiver runs as a go routine side-by-side with the timer loop and asynchronously forwards received metrics to the sink.
The `cc-metric-collector` sends (and optionally receives) metrics in the [InfluxDB line protocol](https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/), which is flexible while keeping tags (like index columns in relational databases) separate from fields (like data columns). The `cc-metric-collector` consists of four components: collectors, router, sinks and receivers. The collectors read data from the local system and submit metrics to the router. The router can be configured to manipulate the metrics before forwarding them to the sinks. Like the collectors, the receivers are attached to the router, but they receive data from external sources such as other `cc-metric-collector` instances.
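To make the tag/field separation concrete, here is a minimal Go sketch that renders one metric as an InfluxDB line protocol line. It is not taken from the `cc-metric-collector` sources: the function `formatLine`, the metric name `cpu_load`, the tags `hostname` and `type`, and the single field named `value` are all assumptions chosen for illustration, and escaping of special characters is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatLine renders one metric as an InfluxDB line protocol line:
//
//	<measurement>,<tag>=<value>,... <field>=<value> <timestamp in ns>
//
// Assumption: names and tag values contain no commas, spaces or equals
// signs, so no escaping is performed. The field name "value" is illustrative.
func formatLine(name string, tags map[string]string, value float64, tsNano int64) string {
	// Sort the tag keys so the output is deterministic.
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		b.WriteString("," + k + "=" + tags[k])
	}
	b.WriteString(" value=" + strconv.FormatFloat(value, 'f', -1, 64))
	b.WriteString(" " + strconv.FormatInt(tsNano, 10))
	return b.String()
}

func main() {
	tags := map[string]string{"hostname": "node01", "type": "node"}
	fmt.Println(formatLine("cpu_load", tags, 1.5, 1700000000000000000))
	// prints: cpu_load,hostname=node01,type=node value=1.5 1700000000000000000
}
```

The tags identify where a value comes from, while the field carries the measurement itself, which is the separation the paragraph above describes.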

[DOI: 10.5281/zenodo.7438287](https://doi.org/10.5281/zenodo.7438287)
@@ -43,7 +39,7 @@ There is a main configuration file with basic settings that point to the other c
}
```

The `interval` defines how often the metrics should be read and send to the sink. The `duration` tells collectors how long one measurement has to take. This is important for some collectors, like the `likwid` collector. For more information, see [here](./docs/configuration.md).
The `interval` defines how often the metrics should be read and sent to the sink(s). The `duration` tells the collectors how long one measurement has to take. This is important for some collectors, like the `likwid` collector. For more information, see the [configuration documentation](./docs/configuration.md).
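As a rough illustration of how the two settings relate, the following Go sketch decodes them and checks that one measurement fits into one interval. The type `mainConfig`, the function `parseTiming`, the duration-string format (`"10s"`) and the rule that `duration` must be shorter than `interval` are assumptions made for this example, not statements about the actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// mainConfig is a hypothetical subset of the main configuration file.
// Assumption: both values are duration strings such as "10s".
type mainConfig struct {
	Interval string `json:"interval"`
	Duration string `json:"duration"`
}

// parseTiming decodes both settings and rejects configurations in which a
// single measurement would not fit into one interval (an assumed rule).
func parseTiming(raw []byte) (interval, duration time.Duration, err error) {
	var cfg mainConfig
	if err = json.Unmarshal(raw, &cfg); err != nil {
		return 0, 0, err
	}
	if interval, err = time.ParseDuration(cfg.Interval); err != nil {
		return 0, 0, err
	}
	if duration, err = time.ParseDuration(cfg.Duration); err != nil {
		return 0, 0, err
	}
	if duration >= interval {
		return 0, 0, fmt.Errorf("duration %s must be shorter than interval %s", duration, interval)
	}
	return interval, duration, nil
}

func main() {
	interval, duration, err := parseTiming([]byte(`{"interval": "10s", "duration": "1s"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(interval, duration) // prints: 10s 1s
}
```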

See the component READMEs for their configuration:

@@ -57,7 +53,7 @@ See the component READMEs for their configuration:
```
$ git clone git@github.com:ClusterCockpit/cc-metric-collector.git
$ make (downloads LIKWID, builds it as static library with 'direct' accessmode and copies all required files for the collector)
$ go get (requires at least golang 1.16)
$ go get
$ make
```

@@ -67,11 +63,13 @@ For more information, see [here](./docs/building.md).

```
$ ./cc-metric-collector --help
Usage of metric-collector:
Usage of ./cc-metric-collector:
  -config string
        Path to configuration file (default "./config.json")
  -log string
        Path for logfile (default "stderr")
  -loglevel string
        Set log level (default "info")
  -once
        Run all collectors only once
```
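The help output above maps naturally onto Go's standard `flag` package. The sketch below reproduces the four options and their defaults; the `options` type and the `parseArgs` function are hypothetical names for this example and are not taken from the project's source.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the command line settings listed in the help output.
type options struct {
	ConfigFile string
	LogFile    string
	LogLevel   string
	Once       bool
}

// parseArgs parses the given arguments with the defaults from the help text.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("cc-metric-collector", flag.ContinueOnError)
	fs.StringVar(&o.ConfigFile, "config", "./config.json", "Path to configuration file")
	fs.StringVar(&o.LogFile, "log", "stderr", "Path for logfile")
	fs.StringVar(&o.LogLevel, "loglevel", "info", "Set log level")
	fs.BoolVar(&o.Once, "once", false, "Run all collectors only once")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseArgs([]string{"-once", "-loglevel", "debug"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", o)
	// prints: {ConfigFile:./config.json LogFile:stderr LogLevel:debug Once:true}
}
```

Using a dedicated `flag.FlagSet` instead of the global flag set keeps the parser easy to test with arbitrary argument lists.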
@@ -114,7 +112,7 @@ flowchart TD

# Contributing

The ClusterCockpit ecosystem is designed to be used by different HPC computing centers. Since configurations and setups differ between the centers, the centers likely have to put some work into the cc-metric-collector to gather all desired metrics.
The ClusterCockpit ecosystem is designed to be used by different HPC computing centers. Since configurations and setups differ between the centers, the centers likely have to put some work into `cc-metric-collector` to gather all desired metrics.

You are free to open an issue to request a collector, but we would also welcome PRs.

@@ -81,3 +81,16 @@ Metrics:
* `gpfs_metaops_rate` (if `send_total_values == true` and `send_derived_values == true`)

The collector adds a `filesystem` tag to all metrics.

`mmpmon` typically requires root to run.
In order to run `cc-metric-collector` without root privileges, you can enable `use_sudo`.
Add a file like this in `/etc/sudoers.d/` to allow `cc-metric-collector` to run the required command:

```
# Do not log the following sudo commands from monitoring, since this causes a lot of log spam.
# However keep log_denied enabled, to detect failures
Defaults: monitoring !log_allowed, !pam_session

# Allow to use mmpmon
monitoring ALL = (root) NOPASSWD:/absolute/path/to/mmpmon -p -s
```
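Because sudoers rules match the full command line, the command issued by the collector has to agree with the rule word for word. The Go sketch below shows one way a `use_sudo` switch could be turned into an argument vector; `buildCommand` is a hypothetical helper for illustration and not the collector's actual code.

```go
package main

import "fmt"

// buildCommand returns the argument vector for an external tool.
// With useSudo set, the call is prefixed with "sudo"; the remaining words
// must then match the command given in the sudoers rule.
func buildCommand(useSudo bool, path string, args ...string) []string {
	argv := make([]string, 0, len(args)+2)
	if useSudo {
		argv = append(argv, "sudo")
	}
	argv = append(argv, path)
	return append(argv, args...)
}

func main() {
	fmt.Println(buildCommand(true, "/absolute/path/to/mmpmon", "-p", "-s"))
	// prints: [sudo /absolute/path/to/mmpmon -p -s]
	fmt.Println(buildCommand(false, "/absolute/path/to/mmpmon", "-p", "-s"))
	// prints: [/absolute/path/to/mmpmon -p -s]
}
```

The resulting slice could be handed to `exec.Command(argv[0], argv[1:]...)` from the standard `os/exec` package.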
@@ -28,7 +28,6 @@ type IpmiCollector struct {
	metricCollector

	config struct {
		ExcludeDevices []string `json:"exclude_devices"`
		IpmitoolPath string `json:"ipmitool_path"`
		IpmisensorsPath string `json:"ipmisensors_path"`
		Sudo bool `json:"use_sudo"`
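The JSON tags in the hunk above indicate how the collector's options are named in its configuration. As a small sketch, the following Go program decodes a subset of those fields with `encoding/json`; the type `ipmiConfig`, the function `parseIpmiConfig` and the path `/usr/bin/ipmitool` are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ipmiConfig mirrors a subset of the fields shown in the hunk above.
// The type name is hypothetical; the JSON tags are copied from the hunk.
type ipmiConfig struct {
	IpmitoolPath    string `json:"ipmitool_path"`
	IpmisensorsPath string `json:"ipmisensors_path"`
	Sudo            bool   `json:"use_sudo"`
}

// parseIpmiConfig decodes the JSON options; missing keys keep their zero values.
func parseIpmiConfig(raw []byte) (ipmiConfig, error) {
	var cfg ipmiConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// The path below is an example value, not a documented default.
	raw := []byte(`{"ipmitool_path": "/usr/bin/ipmitool", "use_sudo": true}`)
	cfg, err := parseIpmiConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.IpmitoolPath, cfg.Sudo)
	// prints: /usr/bin/ipmitool true
}
```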
@@ -23,9 +23,9 @@ The `ipmistat` collector reads data from `ipmitool` (`ipmitool sensor`) or `ipmi

The metrics depend on the output of the underlying tools but contain temperature, power and energy metrics.

ipmitool and ipmi-sensors typically require root to run.
In order to cc-metric-collector without root priviliges, you can enable `use_sudo`.
Add a file like this in /etc/sudoers.d/ to allow cc-metric-collector to run this command:
`ipmitool` and `ipmi-sensors` typically require root to run.
In order to run `cc-metric-collector` without root privileges, you can enable `use_sudo`.
Add a file like this in `/etc/sudoers.d/` to allow `cc-metric-collector` to run the required commands:

```
# Do not log the following sudo commands from monitoring, since this causes a lot of log spam.
@@ -55,3 +55,16 @@ Metrics:
* `lustre_inode_permission_diff` (if `send_diff_values == true`)

This collector adds a `device` tag.

`lctl` typically requires root to run.
In order to run `cc-metric-collector` without root privileges, you can enable `use_sudo`.
Add a file like this in `/etc/sudoers.d/` to allow `cc-metric-collector` to run the required command:

```
# Do not log the following sudo commands from monitoring, since this causes a lot of log spam.
# However keep log_denied enabled, to detect failures
Defaults: monitoring !log_allowed, !pam_session

# Allow to use lctl
monitoring ALL = (root) NOPASSWD:/absolute/path/to/lctl get_param llite.*.stats
```
@@ -50,3 +50,18 @@ Metrics:
* `smartmon_errlog_entries`: Error log entries
* `smartmon_warn_temp_time`: Time above the warning temperature threshold
* `smartmon_crit_comp_time`: Time above the critical composite temperature threshold

`smartctl` typically requires root to run.
In order to run `cc-metric-collector` without root privileges, you can enable `use_sudo`.
Add a file like this in `/etc/sudoers.d/` to allow `cc-metric-collector` to run the required command:

```
# Do not log the following sudo commands from monitoring, since this causes a lot of log spam.
# However keep log_denied enabled, to detect failures
Defaults: monitoring !log_allowed, !pam_session

# Allow to use smartctl
monitoring ALL = (root) NOPASSWD:/absolute/path/to/smartctl --json=c --device=* "--all" *
# Or add individual rules for each device
# monitoring ALL = (root) NOPASSWD:/absolute/path/to/smartctl --json=c --device=<device_type> "--all" <device>
```