From 03cd965099a25c8495ccdb164e4d6cd7807572d7 Mon Sep 17 00:00:00 2001 From: Thomas Gruber Date: Thu, 17 Apr 2025 11:37:47 +0200 Subject: [PATCH] Merge develop into main for documentation (#143) * Fix Release part * Fix Release part * Update Hugo integration (#142) --- README.md | 17 ++++++++++++++--- collectors/README.md | 14 ++++++++++++-- collectors/beegfsmetaMetric.md | 16 ++++++++++++++-- collectors/beegfsstorageMetric.md | 13 ++++++++++++- collectors/cpufreqCpuinfoMetric.md | 11 +++++++++++ collectors/cpufreqMetric.md | 11 +++++++++++ collectors/cpustatMetric.md | 13 ++++++++++++- collectors/customCmdMetric.md | 10 ++++++++++ collectors/diskstatMetric.md | 10 ++++++++++ collectors/gpfsMetric.md | 11 +++++++++++ collectors/infinibandMetric.md | 10 ++++++++++ collectors/iostatMetric.md | 10 ++++++++++ collectors/ipmiMetric.md | 10 ++++++++++ collectors/likwidMetric.md | 10 ++++++++++ collectors/loadavgMetric.md | 11 +++++++++++ collectors/lustreMetric.md | 13 ++++++++++++- collectors/memstatMetric.md | 11 +++++++++++ collectors/netstatMetric.md | 12 +++++++++++- collectors/nfs3Metric.md | 11 +++++++++++ collectors/nfs4Metric.md | 11 +++++++++++ collectors/nfsiostatMetric.md | 11 +++++++++++ collectors/numastatsMetric.md | 10 ++++++++++ collectors/nvidiaMetric.md | 12 +++++++++++- collectors/raplMetric.md | 11 +++++++++++ collectors/rocmsmiMetric.md | 11 +++++++++++ collectors/schedstatMetric.md | 12 +++++++++++- collectors/selfMetric.md | 11 +++++++++++ collectors/tempMetric.md | 11 +++++++++++ collectors/topprocsMetric.md | 12 ++++++++++++ internal/metricAggregator/README.md | 13 ++++++++++++- internal/metricRouter/README.md | 19 +++++++++++++++---- pkg/multiChanTicker/README.md | 11 +++++++++++ 32 files changed, 361 insertions(+), 18 deletions(-) diff --git a/README.md b/README.md index 07f5fd4..4bb04c2 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,17 @@ + + # cc-metric-collector -A node agent for measuring, processing and forwarding node level metrics. 
It is part of the [ClusterCockpit ecosystem](./docs/introduction.md). +A node agent for measuring, processing and forwarding node level metrics. It is part of the [ClusterCockpit ecosystem](https://clustercockpit.org/docs/overview/). The metric collector sends (and receives) metric in the [InfluxDB line protocol](https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/) as it provides flexibility while providing a separation between tags (like index columns in relational databases) and fields (like data columns). @@ -35,8 +46,8 @@ The `interval` defines how often the metrics should be read and send to the sink See the component READMEs for their configuration: * [`collectors`](./collectors/README.md) -* [`sinks`](./sinks/README.md) -* [`receivers`](./receivers/README.md) +* [`sinks`](https://github.com/ClusterCockpit/cc-lib/blob/main/sinks/README.md) +* [`receivers`](https://github.com/ClusterCockpit/cc-lib/blob/main/receivers/README.md) * [`router`](./internal/metricRouter/README.md) # Installation diff --git a/collectors/README.md b/collectors/README.md index b8db043..ebe2114 100644 --- a/collectors/README.md +++ b/collectors/README.md @@ -1,3 +1,14 @@ + + # CCMetric collectors This folder contains the collectors for the cc-metric-collector. @@ -23,7 +34,6 @@ In contrast to the configuration files for sinks and receivers, the collectors c * [`loadavg`](./loadavgMetric.md) * [`netstat`](./netstatMetric.md) * [`ibstat`](./infinibandMetric.md) -* [`ibstat_perfquery`](./infinibandPerfQueryMetric.md) * [`tempstat`](./tempMetric.md) * [`lustrestat`](./lustreMetric.md) * [`likwid`](./likwidMetric.md) @@ -53,7 +63,7 @@ A collector reads data from any source, parses it to metrics and submits these m * `Name() string`: Return the name of the collector * `Init(config json.RawMessage) error`: Initializes the collector using the given collector-specific config in JSON. Check if needed files/commands exists, ... 
* `Initialized() bool`: Check if a collector is successfully initialized -* `Read(duration time.Duration, output chan ccMetric.CCMetric)`: Read, parse and submit data to the `output` channel as [`CCMetric`](../internal/ccMetric/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`. +* `Read(duration time.Duration, output chan ccMessage.CCMessage)`: Read, parse and submit data to the `output` channel as [`CCMessage`](https://github.com/ClusterCockpit/cc-lib/blob/main/ccMessage/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`. * `Close()`: Closes down the collector. It is recommanded to call `setup()` in the `Init()` function. diff --git a/collectors/beegfsmetaMetric.md b/collectors/beegfsmetaMetric.md index 932e72f..bdaadc7 100644 --- a/collectors/beegfsmetaMetric.md +++ b/collectors/beegfsmetaMetric.md @@ -1,5 +1,17 @@ + + + ## `BeeGFS on Demand` collector -This Collector is to collect BeeGFS on Demand (BeeOND) metadata clientstats. +This Collector is to collect `BeeGFS on Demand` (BeeOND) metadata clientstats. ```json "beegfs_meta": { @@ -72,4 +84,4 @@ Available Metrics: * setXA * mirror -The collector adds a `filesystem` tag to all metrics \ No newline at end of file +The collector adds a `filesystem` tag to all metrics diff --git a/collectors/beegfsstorageMetric.md b/collectors/beegfsstorageMetric.md index 519b5bf..fb9c817 100644 --- a/collectors/beegfsstorageMetric.md +++ b/collectors/beegfsstorageMetric.md @@ -1,3 +1,14 @@ + + ## `BeeGFS on Demand` collector This Collector is to collect BeeGFS on Demand (BeeOND) storage stats. 
@@ -52,4 +63,4 @@ Available Metrics: * "unlnk" -The collector adds a `filesystem` tag to all metrics \ No newline at end of file +The collector adds a `filesystem` tag to all metrics diff --git a/collectors/cpufreqCpuinfoMetric.md b/collectors/cpufreqCpuinfoMetric.md index fe43bd8..b8868f7 100644 --- a/collectors/cpufreqCpuinfoMetric.md +++ b/collectors/cpufreqCpuinfoMetric.md @@ -1,3 +1,14 @@ + + ## `cpufreq_cpuinfo` collector ```json diff --git a/collectors/cpufreqMetric.md b/collectors/cpufreqMetric.md index 14f9a00..7064b62 100644 --- a/collectors/cpufreqMetric.md +++ b/collectors/cpufreqMetric.md @@ -1,3 +1,14 @@ + + ## `cpufreq_cpuinfo` collector ```json diff --git a/collectors/cpustatMetric.md b/collectors/cpustatMetric.md index f4e0616..f7d291e 100644 --- a/collectors/cpustatMetric.md +++ b/collectors/cpustatMetric.md @@ -1,3 +1,14 @@ + + ## `cpustat` collector @@ -24,4 +35,4 @@ Metrics: * `cpu_guest` with `unit=Percent` * `cpu_guest_nice` with `unit=Percent` * `cpu_used` = `cpu_* - cpu_idle` with `unit=Percent` -* `num_cpus` \ No newline at end of file +* `num_cpus` diff --git a/collectors/customCmdMetric.md b/collectors/customCmdMetric.md index 011135d..7f43947 100644 --- a/collectors/customCmdMetric.md +++ b/collectors/customCmdMetric.md @@ -1,3 +1,13 @@ + ## `customcmd` collector diff --git a/collectors/diskstatMetric.md b/collectors/diskstatMetric.md index 5a4b7a8..2108a21 100644 --- a/collectors/diskstatMetric.md +++ b/collectors/diskstatMetric.md @@ -1,3 +1,13 @@ + ## `diskstat` collector diff --git a/collectors/gpfsMetric.md b/collectors/gpfsMetric.md index ece9a1f..975d5c5 100644 --- a/collectors/gpfsMetric.md +++ b/collectors/gpfsMetric.md @@ -1,3 +1,14 @@ + + ## `gpfs` collector ```json diff --git a/collectors/infinibandMetric.md b/collectors/infinibandMetric.md index c965ea8..de767f8 100644 --- a/collectors/infinibandMetric.md +++ b/collectors/infinibandMetric.md @@ -1,3 +1,13 @@ + ## `ibstat` collector diff --git a/collectors/iostatMetric.md 
b/collectors/iostatMetric.md index 74172db..400fd7b 100644 --- a/collectors/iostatMetric.md +++ b/collectors/iostatMetric.md @@ -1,3 +1,13 @@ + ## `iostat` collector diff --git a/collectors/ipmiMetric.md b/collectors/ipmiMetric.md index 3c976d2..d6c2307 100644 --- a/collectors/ipmiMetric.md +++ b/collectors/ipmiMetric.md @@ -1,3 +1,13 @@ + ## `ipmistat` collector diff --git a/collectors/likwidMetric.md b/collectors/likwidMetric.md index 0bd5b2b..ef435b2 100644 --- a/collectors/likwidMetric.md +++ b/collectors/likwidMetric.md @@ -1,3 +1,13 @@ + ## `likwid` collector diff --git a/collectors/loadavgMetric.md b/collectors/loadavgMetric.md index d2b3f50..4882820 100644 --- a/collectors/loadavgMetric.md +++ b/collectors/loadavgMetric.md @@ -1,3 +1,14 @@ + + ## `loadavg` collector diff --git a/collectors/lustreMetric.md b/collectors/lustreMetric.md index f11b85f..428841b 100644 --- a/collectors/lustreMetric.md +++ b/collectors/lustreMetric.md @@ -1,3 +1,14 @@ + + ## `lustrestat` collector @@ -43,4 +54,4 @@ Metrics: * `lustre_statfs_diff` (if `send_diff_values == true`) * `lustre_inode_permission_diff` (if `send_diff_values == true`) -This collector adds an `device` tag. \ No newline at end of file +This collector adds an `device` tag. diff --git a/collectors/memstatMetric.md b/collectors/memstatMetric.md index 4b7b8c7..abe9eb0 100644 --- a/collectors/memstatMetric.md +++ b/collectors/memstatMetric.md @@ -1,3 +1,14 @@ + + ## `memstat` collector diff --git a/collectors/netstatMetric.md b/collectors/netstatMetric.md index fc5ee4d..2b3f4e1 100644 --- a/collectors/netstatMetric.md +++ b/collectors/netstatMetric.md @@ -1,3 +1,13 @@ + ## `netstat` collector @@ -28,4 +38,4 @@ Metrics: * `net_pkts_in_bw` (`unit=packets/sec` if `send_derived_values == true`) * `net_pkts_out_bw` (`unit=packets/sec` if `send_derived_values == true`) -The device name is added as tag `stype=network,stype-id=`. \ No newline at end of file +The device name is added as tag `stype=network,stype-id=`. 
diff --git a/collectors/nfs3Metric.md b/collectors/nfs3Metric.md index 63937ea..02d88b0 100644 --- a/collectors/nfs3Metric.md +++ b/collectors/nfs3Metric.md @@ -1,3 +1,14 @@ + + ## `nfs3stat` collector diff --git a/collectors/nfs4Metric.md b/collectors/nfs4Metric.md index 71d9613..80882cd 100644 --- a/collectors/nfs4Metric.md +++ b/collectors/nfs4Metric.md @@ -1,3 +1,14 @@ + + ## `nfs4stat` collector diff --git a/collectors/nfsiostatMetric.md b/collectors/nfsiostatMetric.md index 3f02b0c..e7a2e99 100644 --- a/collectors/nfsiostatMetric.md +++ b/collectors/nfsiostatMetric.md @@ -1,3 +1,14 @@ + + ## `nfsiostat` collector ```json diff --git a/collectors/numastatsMetric.md b/collectors/numastatsMetric.md index b7e038d..bbdceac 100644 --- a/collectors/numastatsMetric.md +++ b/collectors/numastatsMetric.md @@ -1,3 +1,13 @@ + ## `numastat` collector diff --git a/collectors/nvidiaMetric.md b/collectors/nvidiaMetric.md index 7f0c416..93a7491 100644 --- a/collectors/nvidiaMetric.md +++ b/collectors/nvidiaMetric.md @@ -1,3 +1,13 @@ + ## `nvidia` collector @@ -73,4 +83,4 @@ Metrics: * `nv_nvlink_replay_errors` * `nv_nvlink_recovery_errors` -Some metrics add the additional sub type tag (`stype`) like the `nv_nvlink_*` metrics set `stype=nvlink,stype-id=`. \ No newline at end of file +Some metrics add the additional sub type tag (`stype`) like the `nv_nvlink_*` metrics set `stype=nvlink,stype-id=`. diff --git a/collectors/raplMetric.md b/collectors/raplMetric.md index 8eb792f..3cb9595 100644 --- a/collectors/raplMetric.md +++ b/collectors/raplMetric.md @@ -1,3 +1,14 @@ + + ## `rapl` collector This collector reads running average power limit (RAPL) monitoring attributes to compute average power consumption metrics. See . 
diff --git a/collectors/rocmsmiMetric.md b/collectors/rocmsmiMetric.md index 9c4da5e..ca440ab 100644 --- a/collectors/rocmsmiMetric.md +++ b/collectors/rocmsmiMetric.md @@ -1,3 +1,14 @@ + + ## `rocm_smi` collector diff --git a/collectors/schedstatMetric.md b/collectors/schedstatMetric.md index 6369eca..1cff2a9 100644 --- a/collectors/schedstatMetric.md +++ b/collectors/schedstatMetric.md @@ -1,3 +1,13 @@ + ## `schedstat` collector ```json @@ -8,4 +18,4 @@ The `schedstat` collector reads data from /proc/schedstat and calculates a load value, separated by hwthread. This might be useful to detect bad cpu pinning on shared nodes etc. Metric: -* `cpu_load_core` \ No newline at end of file +* `cpu_load_core` diff --git a/collectors/selfMetric.md b/collectors/selfMetric.md index ab8e50b..ec52a0e 100644 --- a/collectors/selfMetric.md +++ b/collectors/selfMetric.md @@ -1,3 +1,14 @@ + + ## `self` collector ```json diff --git a/collectors/tempMetric.md b/collectors/tempMetric.md index 1e3d979..844fd82 100644 --- a/collectors/tempMetric.md +++ b/collectors/tempMetric.md @@ -1,3 +1,14 @@ + + ## `tempstat` collector diff --git a/collectors/topprocsMetric.md b/collectors/topprocsMetric.md index ca47582..bb55770 100644 --- a/collectors/topprocsMetric.md +++ b/collectors/topprocsMetric.md @@ -1,3 +1,15 @@ + + + ## `topprocs` collector diff --git a/internal/metricAggregator/README.md b/internal/metricAggregator/README.md index bc07663..4d03155 100644 --- a/internal/metricAggregator/README.md +++ b/internal/metricAggregator/README.md @@ -1,3 +1,14 @@ + + # The MetricAggregator In some cases, further combination of metrics or raw values is required. For that strings like `foo + 1` with runtime dependent `foo` need to be evaluated. The MetricAggregator relies on the [`gval`](https://github.com/PaesslerAG/gval) Golang package to perform all expression evaluation. The `gval` package provides the basic arithmetic operations but the MetricAggregator defines additional ones. 
@@ -35,4 +46,4 @@ The MetricAggregator provides these functions additional to the `Full` language ## Limitations - Since the metrics are written in JSON files which do not allow `""` without proper escaping inside of JSON strings, you have to use `''` for strings. -- Since `\` is interpreted by JSON as escape character, it cannot be used in metrics. But it is required to write regular expressions. So instead of `/`, use `%` and the MetricAggregator replaces them after reading the JSON file. \ No newline at end of file +- Since `\` is interpreted by JSON as escape character, it cannot be used in metrics. But it is required to write regular expressions. So instead of `/`, use `%` and the MetricAggregator replaces them after reading the JSON file. diff --git a/internal/metricRouter/README.md b/internal/metricRouter/README.md index 546ac62..0ab75be 100644 --- a/internal/metricRouter/README.md +++ b/internal/metricRouter/README.md @@ -1,11 +1,22 @@ + + # CC Metric Router -The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMessages](https://pkg.go.dev/github.com/ClusterCockpit/cc-energy-manager@v0.0.0-20240919152819-92a17f2da4f7/pkg/cc-message. +The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMessages](https://pkg.go.dev/github.com/ClusterCockpit/cc-lib/ccMessage). # Configuration -**Note**: Use the [message processor configuration](../../pkg/messageProcessor/README.md) with option `process_messages`. +**Note**: Use the [message processor configuration](https://github.com/ClusterCockpit/cc-lib/blob/main/messageProcessor/README.md) with option `process_messages`. ```json { @@ -69,7 +80,7 @@ The CCMetric router sits in between the collectors and the sinks and can be used There are three main options `add_tags`, `delete_tags` and `interval_timestamp`. 
`add_tags` and `delete_tags` are lists consisting of dicts with `key`, `value` and `if`. The `value` can be omitted in the `delete_tags` part as it only uses the `key` for removal. The `interval_timestamp` setting means that a unique timestamp is applied to all metrics traversing the router during an interval. -**Note**: Use the [message processor configuration](../../pkg/messageProcessor/README.md) (option `process_messages`) instead of `add_tags`, `delete_tags`, `drop_metrics`, `drop_metrics_if`, `rename_metrics`, `normalize_units` and `change_unit_prefix`. These options are deprecated and will be removed in future versions. Until then, they are added to the message processor. +**Note**: Use the [message processor configuration](https://github.com/ClusterCockpit/cc-lib/blob/main/messageProcessor/README.md) (option `process_messages`) instead of `add_tags`, `delete_tags`, `drop_metrics`, `drop_metrics_if`, `rename_metrics`, `normalize_units` and `change_unit_prefix`. These options are deprecated and will be removed in future versions. Until then, they are added to the message processor. # Processing order in the router @@ -263,7 +274,7 @@ The above configuration, collects all metric values for metrics evaluating `if` If you are not interested in the input metrics `sub_metric_%d+` at all, you can add the same condition used here to the `drop_metrics_if` section to drop them. 
Use cases for `interval_aggregates`: -- Combine multiple metrics of the a collector to a new one like the [MemstatCollector](../../collectors/memstatMetric.md) does it for `mem_used`)): +- Combine multiple metrics of the a collector to a new one like the [MemstatCollector](../../collectors/memstatMetric.md) does it for `mem_used`: ```json { "name" : "mem_used", diff --git a/pkg/multiChanTicker/README.md b/pkg/multiChanTicker/README.md index 30deb4f..782c240 100644 --- a/pkg/multiChanTicker/README.md +++ b/pkg/multiChanTicker/README.md @@ -1,3 +1,14 @@ + + # MultiChanTicker The idea of this ticker is to multiply the output channels. The original Golang `time.Ticker` provides only a single output channel, so the signal can only be received by a single other class. This ticker allows to add multiple channels which get all notified about the time tick.
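
The fan-out idea described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the package's actual implementation: the names `fanOutTicker`, `newFanOutTicker`, `AddChannel`, `Close` and `demoFanOut` are hypothetical, and dropping a tick when a receiver is busy is an assumed design choice (it mirrors how `time.Ticker` itself treats slow receivers).

```go
package main

import (
	"sync"
	"time"
)

// fanOutTicker is a hypothetical stand-in for the idea behind MultiChanTicker:
// a single time.Ticker whose ticks are forwarded to several output channels.
type fanOutTicker struct {
	mu       sync.Mutex
	channels []chan time.Time
	done     chan struct{} // closed by Close() to stop the forwarding goroutine
	stopped  chan struct{} // closed by the goroutine once it has exited
}

// newFanOutTicker starts a goroutine that forwards every tick to all
// channels registered so far.
func newFanOutTicker(interval time.Duration) *fanOutTicker {
	t := &fanOutTicker{
		done:    make(chan struct{}),
		stopped: make(chan struct{}),
	}
	go func() {
		defer close(t.stopped)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-t.done:
				return
			case ts := <-ticker.C:
				t.mu.Lock()
				for _, c := range t.channels {
					select {
					case c <- ts:
					default: // assumption: drop the tick if the receiver is busy
					}
				}
				t.mu.Unlock()
			}
		}
	}()
	return t
}

// AddChannel registers another channel that gets notified on every tick.
func (t *fanOutTicker) AddChannel(c chan time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.channels = append(t.channels, c)
}

// Close stops the ticker and waits until the forwarding goroutine has exited.
func (t *fanOutTicker) Close() {
	close(t.done)
	<-t.stopped
}

// demoFanOut registers two channels and reports how many of them
// received a tick within one second.
func demoFanOut() int {
	t := newFanOutTicker(10 * time.Millisecond)
	defer t.Close()
	a := make(chan time.Time, 1)
	b := make(chan time.Time, 1)
	t.AddChannel(a)
	t.AddChannel(b)
	received := 0
	for _, c := range []chan time.Time{a, b} {
		select {
		case <-c:
			received++
		case <-time.After(time.Second):
		}
	}
	return received
}
```

Calling `demoFanOut()` from a `main` function or a test shows both receivers being notified by the same underlying ticker. The non-blocking send keeps one slow receiver from delaying the others.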