Compare commits

..

26 Commits

Author SHA1 Message Date
Thomas Gruber
4c05557941 Merge branch 'develop' into nfsio_remount_fix 2025-04-16 13:31:22 +02:00
Thomas Gruber
02623f8c9d Change to cc-lib (#135)
* Change to ccMessage from cc-lib

* Remove local development path

* Use receiver, sinks, ccLogger and ccConfig from cc-lib

* Fix ccLogger import path

* Update CI
2025-04-16 13:00:13 +02:00
Thomas Roehl
e080a82b55 Merge branch 'develop' into nfsio_remount_fix 2025-03-17 17:50:29 +01:00
Thomas Roehl
b5520efc25 Fix artifacts in netstat collector of not done cc-lib switch 2025-03-15 04:02:26 +01:00
Thomas Roehl
d2b1bad1b8 Fix artifacts of not done cc-lib switch 2025-03-15 04:01:01 +01:00
Thomas Roehl
5762afa40b Delete mountpoint when it vanishes, not just its data 2025-03-15 03:48:38 +01:00
brinkcoder
0e57c8db1c Add derived_values for numastats (#134)
* Check creation of CCMessage in NATS receiver

* add derived_values for numastats

* change to ccMessage

* remove vim command artefact

---------

Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
Co-authored-by: exterr2f <Robert.Externbrink@rub.de>
Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>
2025-02-19 11:35:32 +01:00
brinkcoder
f2f38c81af Add exclude_devices to iostat (#133)
* Check creation of CCMessage in NATS receiver

* add exclude_device for iostatMetric

* add md file

---------

Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
Co-authored-by: exterr2f <Robert.Externbrink@rub.de>
Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>
2025-02-19 11:34:56 +01:00
brinkcoder
f9acc51a50 Add derived values for nfsiostat (#132)
* Check creation of CCMessage in NATS receiver

* add derived_values for nfsiostatMetric

---------

Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
Co-authored-by: exterr2f <Robert.Externbrink@rub.de>
Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>
2025-02-19 11:34:06 +01:00
brinkcoder
87346e2eae Fix excluded metrics for diskstat and add exclude_mounts (#131)
* Check creation of CCMessage in NATS receiver

* fix excluded metrics and add optional mountpoint exclude

---------

Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
Co-authored-by: exterr2f <Robert.Externbrink@rub.de>
Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>
2025-02-19 11:33:13 +01:00
brinkcoder
0f92f10b66 Add optional interface alias in netstat (#130)
* Check creation of CCMessage in NATS receiver

* add optional interface aliases for netstatMetric

* small fix

---------

Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
Co-authored-by: exterr2f <Robert.Externbrink@rub.de>
Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>
2025-02-19 11:32:15 +01:00
Michael Panzlaff
6901b06e44 Rename 'process_message' to 'process_messages' in metricRouter config
This makes the behavior more consistent with the other modules, which
have their MessageProcessor named 'process_messages'. This most likely
was just a typo.
2025-02-03 15:23:51 +01:00
Thomas Roehl
7b343d0bab Use CCMessage FromBytes instead of Influx's decoder 2024-12-27 15:22:59 +00:00
Thomas Roehl
7d3180b526 Check creation of CCMessage in NATS receiver 2024-12-27 15:00:48 +00:00
Thomas Roehl
70a6afc549 Generate HUGO inputs out of Markdown files 2024-12-23 17:55:48 +01:00
Thomas Roehl
e02a018327 Mark all JSON config fields of message processor as omitempty 2024-12-23 17:52:34 +01:00
Thomas Roehl
bcecdd033b Fix documentation of RAPL collector 2024-12-23 17:51:43 +01:00
Thomas Roehl
2645ffeff3 Merge branch 'main' into develop 2024-12-21 02:39:08 +01:00
Thomas Roehl
e968aa1991 Fix wrongly named packages 2024-12-20 20:33:10 +01:00
Thomas Gruber
d2a38e3844 Merge branch 'main' into develop 2024-12-20 20:27:48 +01:00
Thomas Roehl
1f35f6d3ca Fix wrongly named packages 2024-12-20 20:26:38 +01:00
Thomas Roehl
7e6870c7b3 Add golang-race for UBI9 and Alma9 2024-12-20 20:15:59 +01:00
Thomas Roehl
d881093524 Install go-toolkit to fulfill build requirements for RPM 2024-12-20 20:12:03 +01:00
Thomas Roehl
c01096c157 use go-toolkit for RPM builds 2024-12-20 18:49:28 +01:00
Thomas Roehl
3d70c8afc9 Remove condition around BuildRequires and use go-toolkit for RPM builds 2024-12-20 18:43:21 +01:00
Thomas Roehl
7ee85a07dc Remove go-toolkit as build requirement for RPM builds if run in CI 2024-12-20 18:28:32 +01:00
33 changed files with 24 additions and 365 deletions

View File

@@ -1,17 +1,6 @@
<!--
---
title: cc-metric-collector
description: Metric collecting node agent
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/_index.md
---
-->
# cc-metric-collector
A node agent for measuring, processing and forwarding node level metrics. It is part of the [ClusterCockpit ecosystem](https://clustercockpit.org/docs/overview/).
A node agent for measuring, processing and forwarding node level metrics. It is part of the [ClusterCockpit ecosystem](./docs/introduction.md).
The metric collector sends (and receives) metric in the [InfluxDB line protocol](https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/) as it provides flexibility while providing a separation between tags (like index columns in relational databases) and fields (like data columns).
@@ -46,8 +35,8 @@ The `interval` defines how often the metrics should be read and send to the sink
See the component READMEs for their configuration:
* [`collectors`](./collectors/README.md)
* [`sinks`](https://github.com/ClusterCockpit/cc-lib/blob/main/sinks/README.md)
* [`receivers`](https://github.com/ClusterCockpit/cc-lib/blob/main/receivers/README.md)
* [`sinks`](./sinks/README.md)
* [`receivers`](./receivers/README.md)
* [`router`](./internal/metricRouter/README.md)
# Installation

View File

@@ -1,14 +1,3 @@
<!--
---
title: Metric Collectors
description: Metric collectors for cc-metric-collector
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/_index.md
---
-->
# CCMetric collectors
This folder contains the collectors for the cc-metric-collector.
@@ -34,6 +23,7 @@ In contrast to the configuration files for sinks and receivers, the collectors c
* [`loadavg`](./loadavgMetric.md)
* [`netstat`](./netstatMetric.md)
* [`ibstat`](./infinibandMetric.md)
* [`ibstat_perfquery`](./infinibandPerfQueryMetric.md)
* [`tempstat`](./tempMetric.md)
* [`lustrestat`](./lustreMetric.md)
* [`likwid`](./likwidMetric.md)
@@ -43,10 +33,8 @@ In contrast to the configuration files for sinks and receivers, the collectors c
* [`topprocs`](./topprocsMetric.md)
* [`nfs3stat`](./nfs3Metric.md)
* [`nfs4stat`](./nfs4Metric.md)
* [`nfsiostat`](./nfsiostatMetric.md)
* [`cpufreq`](./cpufreqMetric.md)
* [`cpufreq_cpuinfo`](./cpufreqCpuinfoMetric.md)
* [`schedstat`](./schedstatMetric.md)
* [`numastats`](./numastatsMetric.md)
* [`gpfs`](./gpfsMetric.md)
* [`beegfs_meta`](./beegfsmetaMetric.md)
@@ -63,7 +51,7 @@ A collector reads data from any source, parses it to metrics and submits these m
* `Name() string`: Return the name of the collector
* `Init(config json.RawMessage) error`: Initializes the collector using the given collector-specific config in JSON. Check if needed files/commands exists, ...
* `Initialized() bool`: Check if a collector is successfully initialized
* `Read(duration time.Duration, output chan ccMessage.CCMessage)`: Read, parse and submit data to the `output` channel as [`CCMessage`](https://github.com/ClusterCockpit/cc-lib/blob/main/ccMessage/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`.
* `Read(duration time.Duration, output chan ccMetric.CCMetric)`: Read, parse and submit data to the `output` channel as [`CCMetric`](../internal/ccMetric/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`.
* `Close()`: Closes down the collector.
It is recommanded to call `setup()` in the `Init()` function.

View File

@@ -1,17 +1,5 @@
<!--
---
title: BeeGFS metadata metric collector
description: Collect metadata clientstats for `BeeGFS on Demand`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/beegfsmeta.md
---
-->
## `BeeGFS on Demand` collector
This Collector is to collect `BeeGFS on Demand` (BeeOND) metadata clientstats.
This Collector is to collect BeeGFS on Demand (BeeOND) metadata clientstats.
```json
"beegfs_meta": {

View File

@@ -1,14 +1,3 @@
<!--
---
title: "BeeGFS on Demand metric collector"
description: Collect performance metrics for BeeGFS filesystems
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/beegfsstorage.md
---
-->
## `BeeGFS on Demand` collector
This Collector is to collect BeeGFS on Demand (BeeOND) storage stats.

View File

@@ -1,14 +1,3 @@
<!--
---
title: CPU frequency metric collector through cpuinfo
description: Collect the CPU frequency from `/proc/cpuinfo`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/cpufreq_cpuinfo.md
---
-->
## `cpufreq_cpuinfo` collector
```json

View File

@@ -1,14 +1,3 @@
<!--
---
title: CPU frequency metric collector through sysfs
description: Collect the CPU frequency metrics from `/sys/.../cpu/.../cpufreq`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/cpufreq.md
---
-->
## `cpufreq_cpuinfo` collector
```json

View File

@@ -1,14 +1,3 @@
<!--
---
title: CPU usage metric collector
description: Collect CPU metrics from `/proc/stat`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/cpustat.md
---
-->
## `cpustat` collector

View File

@@ -1,13 +1,3 @@
<!--
---
title: CustomCommand metric collector
description: Collect messages from custom command or files
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/customcmd.md
---
-->
## `customcmd` collector

View File

@@ -1,13 +1,3 @@
<!--
---
title: Disk usage statistics metric collector
description: Collect metrics for various filesystems from `/proc/self/mounts`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/diskstat.md
---
-->
## `diskstat` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: GPFS collector
description: Collect infos about GPFS filesystems
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/gpfs.md
---
-->
## `gpfs` collector
```json

View File

@@ -1,13 +1,3 @@
<!--
---
title: InfiniBand Metric collector
description: Collect metrics for InfiniBand devices
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/infiniband.md
---
-->
## `ibstat` collector

View File

@@ -1,13 +1,3 @@
<!--
---
title: IOStat Metric collector
description: Collect metrics from `/proc/diskstats`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/iostat.md
---
-->
## `iostat` collector

View File

@@ -1,13 +1,3 @@
<!--
---
title: IPMI Metric collector
description: Collect metrics using ipmitool or ipmi-sensors
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/ipmi.md
---
-->
## `ipmistat` collector

View File

@@ -190,8 +190,12 @@ func getBaseFreq() float64 {
}
if math.IsNaN(freq) {
C.timer_init()
freq = float64(C.timer_getCycleClock()) / 1e3
C.power_init(0)
info := C.get_powerInfo()
if float64(info.baseFrequency) != 0 {
freq = float64(info.baseFrequency)
}
C.power_finalize()
}
return freq * 1e3
}

View File

@@ -1,13 +1,3 @@
<!--
---
title: LIKWID collector
description: Collect hardware performance events and metrics using LIKWID
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/likwid.md
---
-->
## `likwid` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: Load average metric collector
description: Collect metrics from `/proc/loadavg`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/loadavg.md
---
-->
## `loadavg` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: Lustre filesystem metric collector
description: Collect metrics for Lustre filesystems
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/lustre.md
---
-->
## `lustrestat` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: Memory statistics metric collector
description: Collect metrics from `/proc/meminfo`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/memstat.md
---
-->
## `memstat` collector

View File

@@ -1,13 +1,3 @@
<!--
---
title: Network device metric collector
description: Collect metrics for network devices through procfs
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/netstat.md
---
-->
## `netstat` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: NFS network filesystem (v3) metric collector
description: Collect metrics for NFS network filesystems in version 3
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/nfs3.md
---
-->
## `nfs3stat` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: NFS network filesystem (v4) metric collector
description: Collect metrics for NFS network filesystems in version 4
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/nfs4.md
---
-->
## `nfs4stat` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: NFS network filesystem metrics from procfs
description: Collect NFS network filesystem metrics for mounts from `/proc/self/mountstats`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/nfsio.md
---
-->
## `nfsiostat` collector
```json

View File

@@ -1,13 +1,3 @@
<!--
---
title: NUMAStat collector
description: Collect infos about NUMA domains
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/numastat.md
---
-->
## `numastat` collector

View File

@@ -1,13 +1,3 @@
<!--
---
title: "Nvidia NVML metric collector"
description: Collect metrics for Nvidia GPUs using the NVML
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/nvidia.md
---
-->
## `nvidia` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: RAPL metric collector
description: Collect energy data through the RAPL sysfs interface
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/rapl.md
---
-->
## `rapl` collector
This collector reads running average power limit (RAPL) monitoring attributes to compute average power consumption metrics. See <https://www.kernel.org/doc/html/latest/power/powercap/powercap.html#monitoring-attributes>.

View File

@@ -1,14 +1,3 @@
<!--
---
title: "ROCm SMI metric collector"
description: Collect metrics for AMD GPUs using the SMI library
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/rocmsmi.md
---
-->
## `rocm_smi` collector

View File

@@ -1,13 +1,3 @@
<!--
---
title: SchedStat Metric collector
description: Collect metrics from `/proc/schedstat`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/schedstat.md
---
-->
## `schedstat` collector
```json

View File

@@ -1,14 +1,3 @@
<!--
---
title: Self-monitoring metric collector
description: Collect metrics from the execution of cc-metric-collector itself
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/self.md
---
-->
## `self` collector
```json

View File

@@ -1,14 +1,3 @@
<!--
---
title: Temperature metric collector
description: Collect thermal metrics from `/sys/class/hwmon/*`
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/temp.md
---
-->
## `tempstat` collector

View File

@@ -1,15 +1,3 @@
<!--
---
title: TopProcs collector
description: Collect infos about most CPU-consuming processes
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/topprocs.md
---
-->
## `topprocs` collector

View File

@@ -1,14 +1,3 @@
<!--
---
title: Metric Aggregator
description: Subsystem for evaluating expressions on metrics (deprecated)
categories: [cc-metric-collector]
tags: ['Developer']
weight: 1
hugo_path: docs/reference/cc-metric-collector/internal/metricaggregator/_index.md
---
-->
# The MetricAggregator
In some cases, further combination of metrics or raw values is required. For that strings like `foo + 1` with runtime dependent `foo` need to be evaluated. The MetricAggregator relies on the [`gval`](https://github.com/PaesslerAG/gval) Golang package to perform all expression evaluation. The `gval` package provides the basic arithmetic operations but the MetricAggregator defines additional ones.

View File

@@ -1,22 +1,11 @@
<!--
---
title: Message Router
description: Routing component inside cc-metric-collector
categories: [cc-metric-collector]
tags: ['Developer']
weight: 1
hugo_path: docs/reference/cc-metric-collector/internal/metricrouter/_index.md
---
-->
# CC Metric Router
The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMessages](https://pkg.go.dev/github.com/ClusterCockpit/cc-lib/ccMessage).
The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMessages](https://pkg.go.dev/github.com/ClusterCockpit/cc-energy-manager@v0.0.0-20240919152819-92a17f2da4f7/pkg/cc-message.
# Configuration
**Note**: Use the [message processor configuration](https://github.com/ClusterCockpit/cc-lib/blob/main/messageProcessor/README.md) with option `process_messages`.
**Note**: Use the [message processor configuration](../../pkg/messageProcessor/README.md) with option `process_messages`.
```json
{
@@ -80,7 +69,7 @@ The CCMetric router sits in between the collectors and the sinks and can be used
There are three main options `add_tags`, `delete_tags` and `interval_timestamp`. `add_tags` and `delete_tags` are lists consisting of dicts with `key`, `value` and `if`. The `value` can be omitted in the `delete_tags` part as it only uses the `key` for removal. The `interval_timestamp` setting means that a unique timestamp is applied to all metrics traversing the router during an interval.
**Note**: Use the [message processor configuration](https://github.com/ClusterCockpit/cc-lib/blob/main/messageProcessor/README.md) (option `process_messages`) instead of `add_tags`, `delete_tags`, `drop_metrics`, `drop_metrics_if`, `rename_metrics`, `normalize_units` and `change_unit_prefix`. These options are deprecated and will be removed in future versions. Until then, they are added to the message processor.
**Note**: Use the [message processor configuration](../../pkg/messageProcessor/README.md) (option `process_messages`) instead of `add_tags`, `delete_tags`, `drop_metrics`, `drop_metrics_if`, `rename_metrics`, `normalize_units` and `change_unit_prefix`. These options are deprecated and will be removed in future versions. Until then, they are added to the message processor.
# Processing order in the router
@@ -274,7 +263,7 @@ The above configuration, collects all metric values for metrics evaluating `if`
If you are not interested in the input metrics `sub_metric_%d+` at all, you can add the same condition used here to the `drop_metrics_if` section to drop them.
Use cases for `interval_aggregates`:
- Combine multiple metrics of the a collector to a new one like the [MemstatCollector](../../collectors/memstatMetric.md) does it for `mem_used`:
- Combine multiple metrics of the a collector to a new one like the [MemstatCollector](../../collectors/memstatMetric.md) does it for `mem_used`)):
```json
{
"name" : "mem_used",

View File

@@ -1,14 +1,3 @@
<!--
---
title: Multi-channel Ticker
description: Timer ticker that sends out the tick to multiple channels
categories: [cc-metric-collector]
tags: ['Developer']
weight: 1
hugo_path: docs/reference/cc-metric-collector/pkg/multichanticker/_index.md
---
-->
# MultiChanTicker
The idea of this ticker is to multiply the output channels. The original Golang `time.Ticker` provides only a single output channel, so the signal can only be received by a single other class. This ticker allows to add multiple channels which get all notified about the time tick.