Merge branch 'develop' into cc_lib_switch

Update CI
Fix ccLogger import path
2025-12-14 03:26:17 +01:00 · 2025-04-16 12:58:29 +02:00 · 2025-03-18 14:32:29 +01:00 · 2025-03-18 14:28:46 +01:00 · 2025-03-18 14:21:58 +01:00 · 2025-03-15 04:02:26 +01:00
35 changed files with 54 additions and 457 deletions
--- a/README.md
+++ b/README.md
@@ -1,17 +1,6 @@
 <!--
 ---
 title: cc-metric-collector
 description: Metric collecting node agent
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/_index.md
 ---
 -->
 # cc-metric-collector
-A node agent for measuring, processing and forwarding node level metrics. It is part of the [ClusterCockpit ecosystem](https://clustercockpit.org/docs/overview/).
+A node agent for measuring, processing and forwarding node level metrics. It is part of the [ClusterCockpit ecosystem](./docs/introduction.md).
 The metric collector sends (and receives) metric in the [InfluxDB line protocol](https://docs.influxdata.com/influxdb/cloud/reference/syntax/line-protocol/) as it provides flexibility while providing a separation between tags (like index columns in relational databases) and fields (like data columns).
@@ -46,8 +35,8 @@ The `interval` defines how often the metrics should be read and send to the sink
 See the component READMEs for their configuration:
 * [`collectors`](./collectors/README.md)
-* [`sinks`](https://github.com/ClusterCockpit/cc-lib/blob/main/sinks/README.md)
+* [`sinks`](./sinks/README.md)
-* [`receivers`](https://github.com/ClusterCockpit/cc-lib/blob/main/receivers/README.md)
+* [`receivers`](./receivers/README.md)
 * [`router`](./internal/metricRouter/README.md)
 # Installation
--- a/collectors/README.md
+++ b/collectors/README.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Metric Collectors
 description: Metric collectors for cc-metric-collector
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/_index.md
 ---
 -->
 # CCMetric collectors
 This folder contains the collectors for the cc-metric-collector.
@@ -34,6 +23,7 @@ In contrast to the configuration files for sinks and receivers, the collectors c
 * [`loadavg`](./loadavgMetric.md)
 * [`netstat`](./netstatMetric.md)
 * [`ibstat`](./infinibandMetric.md)
 * [`ibstat_perfquery`](./infinibandPerfQueryMetric.md)
 * [`tempstat`](./tempMetric.md)
 * [`lustrestat`](./lustreMetric.md)
 * [`likwid`](./likwidMetric.md)
@@ -43,10 +33,8 @@ In contrast to the configuration files for sinks and receivers, the collectors c
 * [`topprocs`](./topprocsMetric.md)
 * [`nfs3stat`](./nfs3Metric.md)
 * [`nfs4stat`](./nfs4Metric.md)
 * [`nfsiostat`](./nfsiostatMetric.md)
 * [`cpufreq`](./cpufreqMetric.md)
 * [`cpufreq_cpuinfo`](./cpufreqCpuinfoMetric.md)
 * [`schedstat`](./schedstatMetric.md)
 * [`numastats`](./numastatsMetric.md)
 * [`gpfs`](./gpfsMetric.md)
 * [`beegfs_meta`](./beegfsmetaMetric.md)
@@ -63,7 +51,7 @@ A collector reads data from any source, parses it to metrics and submits these m
 * `Name() string`: Return the name of the collector
 * `Init(config json.RawMessage) error`: Initializes the collector using the given collector-specific config in JSON. Check if needed files/commands exists, ...
 * `Initialized() bool`: Check if a collector is successfully initialized
-* `Read(duration time.Duration, output chan ccMessage.CCMessage)`: Read, parse and submit data to the `output` channel as [`CCMessage`](https://github.com/ClusterCockpit/cc-lib/blob/main/ccMessage/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`.
+* `Read(duration time.Duration, output chan ccMetric.CCMetric)`: Read, parse and submit data to the `output` channel as [`CCMetric`](../internal/ccMetric/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`.
 * `Close()`: Closes down the collector.
 It is recommanded to call `setup()` in the `Init()` function.
--- a/collectors/beegfsmetaMetric.md
+++ b/collectors/beegfsmetaMetric.md
@@ -1,17 +1,5 @@
 <!--
 ---
 title: BeeGFS metadata metric collector
 description: Collect metadata clientstats for `BeeGFS on Demand`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/beegfsmeta.md
 ---
 -->
 ## `BeeGFS on Demand` collector
-This Collector is to collect `BeeGFS on Demand` (BeeOND) metadata clientstats.
+This Collector is to collect BeeGFS on Demand (BeeOND) metadata clientstats.
 ```json
  "beegfs_meta": {
@@ -84,4 +72,4 @@ Available Metrics:
 * setXA
 * mirror
-The collector adds a `filesystem` tag to all metrics
+The collector adds a `filesystem` tag to all metrics
--- a/collectors/beegfsstorageMetric.md
+++ b/collectors/beegfsstorageMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: "BeeGFS on Demand metric collector"
 description: Collect performance metrics for BeeGFS filesystems
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/beegfsstorage.md
 ---
 -->
 ## `BeeGFS on Demand` collector
 This Collector is to collect BeeGFS on Demand (BeeOND) storage stats.
@@ -63,4 +52,4 @@ Available Metrics:
 * "unlnk"
-The collector adds a `filesystem` tag to all metrics
+The collector adds a `filesystem` tag to all metrics
--- a/collectors/cpufreqCpuinfoMetric.md
+++ b/collectors/cpufreqCpuinfoMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: CPU frequency metric collector through cpuinfo
 description: Collect the CPU frequency from `/proc/cpuinfo`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/cpufreq_cpuinfo.md
 ---
 -->
 ## `cpufreq_cpuinfo` collector
 ```json
--- a/collectors/cpufreqMetric.md
+++ b/collectors/cpufreqMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: CPU frequency metric collector through sysfs
 description: Collect the CPU frequency metrics from `/sys/.../cpu/.../cpufreq`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/cpufreq.md
 ---
 -->
 ## `cpufreq_cpuinfo` collector
 ```json
--- a/collectors/cpustatMetric.md
+++ b/collectors/cpustatMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: CPU usage metric collector
 description: Collect CPU metrics from `/proc/stat`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/cpustat.md
 ---
 -->
 ## `cpustat` collector
@@ -35,4 +24,4 @@ Metrics:
 * `cpu_guest` with `unit=Percent`
 * `cpu_guest_nice` with `unit=Percent`
 * `cpu_used` = `cpu_* - cpu_idle` with `unit=Percent`
-* `num_cpus`
+* `num_cpus`
--- a/collectors/customCmdMetric.md
+++ b/collectors/customCmdMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: CustomCommand metric collector
 description: Collect messages from custom command or files
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/customcmd.md
 ---
 -->
 ## `customcmd` collector
--- a/collectors/diskstatMetric.md
+++ b/collectors/diskstatMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: Disk usage statistics metric collector
 description: Collect metrics for various filesystems from `/proc/self/mounts`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/diskstat.md
 ---
 -->
 ## `diskstat` collector
--- a/collectors/gpfsMetric.md
+++ b/collectors/gpfsMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: GPFS collector
 description: Collect infos about GPFS filesystems
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/gpfs.md
 ---
 -->
 ## `gpfs` collector
 ```json
--- a/collectors/infinibandMetric.md
+++ b/collectors/infinibandMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: InfiniBand Metric collector
 description: Collect metrics for InfiniBand devices
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/infiniband.md
 ---
 -->
 ## `ibstat` collector
--- a/collectors/iostatMetric.md
+++ b/collectors/iostatMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: IOStat Metric collector
 description: Collect metrics from `/proc/diskstats`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/iostat.md
 ---
 -->
 ## `iostat` collector
--- a/collectors/ipmiMetric.md
+++ b/collectors/ipmiMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: IPMI Metric collector
 description: Collect metrics using ipmitool or ipmi-sensors
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/ipmi.md
 ---
 -->
 ## `ipmistat` collector
--- a/collectors/likwidMetric.go
+++ b/collectors/likwidMetric.go
@@ -190,8 +190,12 @@ func getBaseFreq() float64 {
 	}
 	if math.IsNaN(freq) {
-		C.timer_init()
+		C.power_init(0)
-		freq = float64(C.timer_getCycleClock()) / 1e3
+		info := C.get_powerInfo()
 		if float64(info.baseFrequency) != 0 {
 			freq = float64(info.baseFrequency)
 		}
 		C.power_finalize()
 	}
 	return freq * 1e3
 }
--- a/collectors/likwidMetric.md
+++ b/collectors/likwidMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: LIKWID collector
 description: Collect hardware performance events and metrics using LIKWID
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/likwid.md
 ---
 -->
 ## `likwid` collector
--- a/collectors/loadavgMetric.md
+++ b/collectors/loadavgMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Load average metric collector
 description: Collect metrics from `/proc/loadavg`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/loadavg.md
 ---
 -->
 ## `loadavg` collector
--- a/collectors/lustreMetric.md
+++ b/collectors/lustreMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Lustre filesystem metric collector
 description: Collect metrics for Lustre filesystems
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/lustre.md
 ---
 -->
 ## `lustrestat` collector
@@ -54,4 +43,4 @@ Metrics:
 * `lustre_statfs_diff` (if `send_diff_values == true`)
 * `lustre_inode_permission_diff` (if `send_diff_values == true`)
-This collector adds an `device` tag.
+This collector adds an `device` tag.
--- a/collectors/memstatMetric.md
+++ b/collectors/memstatMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Memory statistics metric collector
 description: Collect metrics from `/proc/meminfo`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/memstat.md
 ---
 -->
 ## `memstat` collector
--- a/collectors/netstatMetric.md
+++ b/collectors/netstatMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: Network device metric collector
 description: Collect metrics for network devices through procfs
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/netstat.md
 ---
 -->
 ## `netstat` collector
@@ -38,4 +28,4 @@ Metrics:
 * `net_pkts_in_bw` (`unit=packets/sec` if `send_derived_values == true`)
 * `net_pkts_out_bw` (`unit=packets/sec` if `send_derived_values == true`)
-The device name is added as tag `stype=network,stype-id=<device>`.
+The device name is added as tag `stype=network,stype-id=<device>`.
--- a/collectors/nfs3Metric.md
+++ b/collectors/nfs3Metric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: NFS network filesystem (v3) metric collector
 description: Collect metrics for NFS network filesystems in version 3
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/nfs3.md
 ---
 -->
 ## `nfs3stat` collector
--- a/collectors/nfs4Metric.md
+++ b/collectors/nfs4Metric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: NFS network filesystem (v4) metric collector
 description: Collect metrics for NFS network filesystems in version 4
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/nfs4.md
 ---
 -->
 ## `nfs4stat` collector
--- a/collectors/nfsiostatMetric.go
+++ b/collectors/nfsiostatMetric.go
@@ -171,7 +171,7 @@ func (m *NfsIOStatCollector) Read(interval time.Duration, output chan lp.CCMessa
 			}
 		}
 		if !found {
-			delete(m.data, mntpoint)
+			m.data[mntpoint] = nil
 		}
 	}
 }
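Review note: the two lines in this hunk are not equivalent. `delete` removes the key from the map, while assigning `nil` keeps the key with a nil value, so later iterations and `len` still see the mount point. The sketch below shows the difference in isolation; `mountStats` is a hypothetical stand-in, since the actual type of `m.data` is not visible in this diff.

```go
package main

import "fmt"

// mountStats is a hypothetical stand-in for the per-mount data of the collector.
type mountStats map[string]int64

// dropWithDelete removes the entry for mnt entirely.
func dropWithDelete(data map[string]mountStats, mnt string) {
	delete(data, mnt)
}

// dropWithNil keeps the key but replaces its value with nil.
func dropWithNil(data map[string]mountStats, mnt string) {
	data[mnt] = nil
}

func main() {
	a := map[string]mountStats{"/home": {"read": 1}, "/scratch": {"read": 2}}
	b := map[string]mountStats{"/home": {"read": 1}, "/scratch": {"read": 2}}

	dropWithDelete(a, "/scratch")
	dropWithNil(b, "/scratch")

	_, inA := a["/scratch"]
	_, inB := b["/scratch"]
	fmt.Println(len(a), inA) // key is gone
	fmt.Println(len(b), inB) // key is still present, value is nil
}
```

Which variant is wanted depends on whether code that ranges over the map should still visit unmounted paths.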
--- a/collectors/nfsiostatMetric.md
+++ b/collectors/nfsiostatMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: NFS network filesystem metrics from procfs
 description: Collect NFS network filesystem metrics for mounts from `/proc/self/mountstats`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/nfsio.md
 ---
 -->
 ## `nfsiostat` collector
 ```json
--- a/collectors/numastatsMetric.md
+++ b/collectors/numastatsMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: NUMAStat collector
 description: Collect infos about NUMA domains
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/numastat.md
 ---
 -->
 ## `numastat` collector
--- a/collectors/nvidiaMetric.go
+++ b/collectors/nvidiaMetric.go
@@ -27,12 +27,10 @@ type NvidiaCollectorConfig struct {
 }
 type NvidiaCollectorDevice struct {
-	device              nvml.Device
+	device         nvml.Device
-	excludeMetrics      map[string]bool
+	excludeMetrics map[string]bool
-	tags                map[string]string
+	tags           map[string]string
-	meta                map[string]string
+	meta           map[string]string
 	lastEnergyReading   uint64
 	lastEnergyTimestamp time.Time
 }
 type NvidiaCollector struct {
@@ -151,8 +149,6 @@ func (m *NvidiaCollector) Init(config json.RawMessage) error {
 		// Add device handle
 		g.device = device
 		g.lastEnergyReading = 0
 		g.lastEnergyTimestamp = time.Now()
 		// Add tags
 		g.tags = map[string]string{
@@ -210,7 +206,7 @@ func (m *NvidiaCollector) Init(config json.RawMessage) error {
 	return nil
 }
-func readMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readMemoryInfo(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_fb_mem_total"] || !device.excludeMetrics["nv_fb_mem_used"] || !device.excludeMetrics["nv_fb_mem_reserved"] {
 		var total uint64
 		var used uint64
@@ -254,7 +250,7 @@ func readMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
 	return nil
 }
-func readBarMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readBarMemoryInfo(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_bar1_mem_total"] || !device.excludeMetrics["nv_bar1_mem_used"] {
 		meminfo, ret := nvml.DeviceGetBAR1MemoryInfo(device.device)
 		if ret != nvml.SUCCESS {
@@ -281,7 +277,7 @@ func readBarMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage)
 	return nil
 }
-func readUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readUtilization(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	isMig, ret := nvml.DeviceIsMigDeviceHandle(device.device)
 	if ret != nvml.SUCCESS {
 		err := errors.New(nvml.ErrorString(ret))
@@ -323,7 +319,7 @@ func readUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
 	return nil
 }
-func readTemp(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readTemp(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_temp"] {
 		// Retrieves the current temperature readings for the device, in degrees C.
 		//
@@ -342,7 +338,7 @@ func readTemp(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	return nil
 }
-func readFan(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readFan(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_fan"] {
 		// Retrieves the intended operating speed of the device's fan.
 		//
@@ -365,7 +361,7 @@ func readFan(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	return nil
 }
-// func readFans(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+// func readFans(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 // 	if !device.excludeMetrics["nv_fan"] {
 // 		numFans, ret := nvml.DeviceGetNumFans(device.device)
 // 		if ret == nvml.SUCCESS {
@@ -386,7 +382,7 @@ func readFan(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
 // 	return nil
 // }
-func readEccMode(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readEccMode(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_ecc_mode"] {
 		// Retrieves the current and pending ECC modes for the device.
 		//
@@ -420,7 +416,7 @@ func readEccMode(device *NvidiaCollectorDevice, output chan lp.CCMessage) error
 	return nil
 }
-func readPerfState(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readPerfState(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_perf_state"] {
 		// Retrieves the current performance state for the device.
 		//
@@ -440,16 +436,13 @@ func readPerfState(device *NvidiaCollectorDevice, output chan lp.CCMessage) erro
 	return nil
 }
-func readPowerUsage(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readPowerUsage(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_power_usage"] {
 		// Retrieves power usage for this GPU in milliwatts and its associated circuitry (e.g. memory)
 		//
 		// On Fermi and Kepler GPUs the reading is accurate to within +/- 5% of current power draw.
 		// On Ampere (except GA100) or newer GPUs, the API returns power averaged over 1 sec interval.
 		// On GA100 and older architectures, instantaneous power is returned.
 		//
-		// It is only available if power management mode is supported.
+		// It is only available if power management mode is supported
 		mode, ret := nvml.DeviceGetPowerManagementMode(device.device)
 		if ret != nvml.SUCCESS {
 			return nil
@@ -468,54 +461,7 @@ func readPowerUsage(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
 	return nil
 }
-func readEnergyConsumption(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readClocks(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	// Retrieves total energy consumption for this GPU in millijoules (mJ) since the driver was last reloaded
 	// For Volta or newer fully supported devices.
 	if (!device.excludeMetrics["nv_energy"]) && (!device.excludeMetrics["nv_energy_abs"]) && (!device.excludeMetrics["nv_average_power"]) {
 		now := time.Now()
 		mode, ret := nvml.DeviceGetPowerManagementMode(device.device)
 		if ret != nvml.SUCCESS {
 			return nil
 		}
 		if mode == nvml.FEATURE_ENABLED {
 			energy, ret := nvml.DeviceGetTotalEnergyConsumption(device.device)
 			if ret == nvml.SUCCESS {
 				if device.lastEnergyReading != 0 {
 					if !device.excludeMetrics["nv_energy"] {
 						y, err := lp.NewMetric("nv_energy", device.tags, device.meta, (energy-device.lastEnergyReading)/1000, now)
 						if err == nil {
 							y.AddMeta("unit", "Joules")
 							output <- y
 						}
 					}
 					if !device.excludeMetrics["nv_average_power"] {
 						energyDiff := (energy - device.lastEnergyReading) / 1000
 						timeDiff := now.Sub(device.lastEnergyTimestamp)
 						y, err := lp.NewMetric("nv_average_power", device.tags, device.meta, energyDiff/uint64(timeDiff.Seconds()), now)
 						if err == nil {
 							y.AddMeta("unit", "watts")
 							output <- y
 						}
 					}
 				}
 				if !device.excludeMetrics["nv_energy_abs"] {
 					y, err := lp.NewMetric("nv_energy_abs", device.tags, device.meta, energy/1000, now)
 					if err == nil {
 						y.AddMeta("unit", "Joules")
 						output <- y
 					}
 				}
 				device.lastEnergyReading = energy
 				device.lastEnergyTimestamp = time.Now()
 			}
 		}
 	}
 	return nil
 }
 func readClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	// Retrieves the current clock speeds for the device.
 	//
 	// Available clock information:
@@ -567,7 +513,7 @@ func readClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	return nil
 }
-func readMaxClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readMaxClocks(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	// Retrieves the maximum clock speeds for the device.
 	//
 	// Available clock information:
@@ -625,7 +571,7 @@ func readMaxClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) erro
 	return nil
 }
-func readEccErrors(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readEccErrors(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_ecc_uncorrected_error"] {
 		// Retrieves the total ECC error counts for the device.
 		//
@@ -656,7 +602,7 @@ func readEccErrors(device *NvidiaCollectorDevice, output chan lp.CCMessage) erro
 	return nil
 }
-func readPowerLimit(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readPowerLimit(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_power_max_limit"] {
 		// Retrieves the power management limit associated with this device.
 		//
@@ -676,7 +622,7 @@ func readPowerLimit(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
 	return nil
 }
-func readEncUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readEncUtilization(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	isMig, ret := nvml.DeviceIsMigDeviceHandle(device.device)
 	if ret != nvml.SUCCESS {
 		err := errors.New(nvml.ErrorString(ret))
@@ -703,7 +649,7 @@ func readEncUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage)
 	return nil
 }
-func readDecUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readDecUtilization(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	isMig, ret := nvml.DeviceIsMigDeviceHandle(device.device)
 	if ret != nvml.SUCCESS {
 		err := errors.New(nvml.ErrorString(ret))
@@ -730,7 +676,7 @@ func readDecUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage)
 	return nil
 }
-func readRemappedRows(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readRemappedRows(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_remapped_rows_corrected"] ||
 		!device.excludeMetrics["nv_remapped_rows_uncorrected"] ||
 		!device.excludeMetrics["nv_remapped_rows_pending"] ||
@@ -783,7 +729,7 @@ func readRemappedRows(device *NvidiaCollectorDevice, output chan lp.CCMessage) e
 	return nil
 }
-func readProcessCounts(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readProcessCounts(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	if !device.excludeMetrics["nv_compute_processes"] {
 		// Get information about processes with a compute context on a device
 		//
@@ -875,7 +821,7 @@ func readProcessCounts(device *NvidiaCollectorDevice, output chan lp.CCMessage)
 	return nil
 }
-func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readViolationStats(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	var violTime nvml.ViolationTime
 	var ret nvml.Return
@@ -989,7 +935,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
 	return nil
 }
-func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
+func readNVLinkStats(device NvidiaCollectorDevice, output chan lp.CCMessage) error {
 	// Retrieves the specified error counter value
 	// Please refer to \a nvmlNvLinkErrorCounter_t for error counters that are available
 	//
@@ -1124,7 +1070,7 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
 		return
 	}
-	readAll := func(device *NvidiaCollectorDevice, output chan lp.CCMessage) {
+	readAll := func(device NvidiaCollectorDevice, output chan lp.CCMessage) {
 		name, ret := nvml.DeviceGetName(device.device)
 		if ret != nvml.SUCCESS {
 			name = "NoName"
@@ -1164,11 +1110,6 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
 			cclog.ComponentDebug(m.name, "readPowerUsage for device", name, "failed")
 		}
 		err = readEnergyConsumption(device, output)
 		if err != nil {
 			cclog.ComponentDebug(m.name, "readEnergyConsumption for device", name, "failed")
 		}
 		err = readClocks(device, output)
 		if err != nil {
 			cclog.ComponentDebug(m.name, "readClocks for device", name, "failed")
@@ -1228,7 +1169,7 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
 	// Actual read loop over all attached Nvidia GPUs
 	for i := 0; i < m.num_gpus; i++ {
-		readAll(&m.gpus[i], output)
+		readAll(m.gpus[i], output)
 		// Iterate over all MIG devices if any
 		if m.config.ProcessMigDevices {
@@ -1302,7 +1243,7 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
 					}
 				}
-				readAll(&migDevice, output)
+				readAll(migDevice, output)
 			}
 		}
 	}
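Review note: the hunks in this file switch the helper signatures between `*NvidiaCollectorDevice` and `NvidiaCollectorDevice`. That matters for the struct's scalar state: a helper that assigns `device.lastEnergyReading` only updates the caller's device when it receives a pointer. Map fields such as `tags` are shared either way. A minimal sketch, with `gpuDevice` as a hypothetical stand-in for the real struct:

```go
package main

import "fmt"

// gpuDevice is a hypothetical stand-in that keeps only the state field of interest.
type gpuDevice struct {
	lastEnergyReading uint64
}

// recordByValue works on a copy; the caller's struct is unchanged.
func recordByValue(d gpuDevice, energy uint64) {
	d.lastEnergyReading = energy
}

// recordByPointer writes through to the caller's struct.
func recordByPointer(d *gpuDevice, energy uint64) {
	d.lastEnergyReading = energy
}

func main() {
	gpus := []gpuDevice{{}}

	recordByValue(gpus[0], 1500)
	fmt.Println(gpus[0].lastEnergyReading) // still the zero value

	recordByPointer(&gpus[0], 1500)
	fmt.Println(gpus[0].lastEnergyReading) // updated
}
```

Any metric that is computed as a difference between two reads therefore needs the pointer form, or the state has to live outside the copied struct.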
--- a/collectors/nvidiaMetric.md
+++ b/collectors/nvidiaMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: "Nvidia NVML metric collector"
 description: Collect metrics for Nvidia GPUs using the NVML
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/nvidia.md
 ---
 -->
 ## `nvidia` collector
@@ -82,8 +72,5 @@ Metrics:
 * `nv_nvlink_ecc_errors`
 * `nv_nvlink_replay_errors`
 * `nv_nvlink_recovery_errors`
 * `nv_energy`
 * `nv_energy_abs`
 * `nv_average_power`
-Some metrics add the additional sub type tag (`stype`) like the `nv_nvlink_*` metrics set `stype=nvlink,stype-id=<link_number>`. 
+Some metrics add the additional sub type tag (`stype`) like the `nv_nvlink_*` metrics set `stype=nvlink,stype-id=<link_number>`. 
--- a/collectors/raplMetric.md
+++ b/collectors/raplMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: RAPL metric collector
 description: Collect energy data through the RAPL sysfs interface
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/rapl.md
 ---
 -->
 ## `rapl` collector
 This collector reads running average power limit (RAPL) monitoring attributes to compute average power consumption metrics. See <https://www.kernel.org/doc/html/latest/power/powercap/powercap.html#monitoring-attributes>.
--- a/collectors/rocmsmiMetric.md
+++ b/collectors/rocmsmiMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: "ROCm SMI metric collector"
 description: Collect metrics for AMD GPUs using the SMI library
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/rocmsmi.md
 ---
 -->
 ## `rocm_smi` collector
--- a/collectors/schedstatMetric.md
+++ b/collectors/schedstatMetric.md
@@ -1,13 +1,3 @@
 <!--
 ---
 title: SchedStat Metric collector
 description: Collect metrics from `/proc/schedstat`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/schedstat.md
 ---
 -->
 ## `schedstat` collector
 ```json
@@ -18,4 +8,4 @@ hugo_path: docs/reference/cc-metric-collector/collectors/schedstat.md
 The `schedstat` collector reads data from /proc/schedstat and calculates a load value, separated by hwthread. This might be useful to detect bad cpu pinning on shared nodes etc. 
 Metric:
-* `cpu_load_core`
+* `cpu_load_core`
--- a/collectors/selfMetric.md
+++ b/collectors/selfMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Self-monitoring metric collector
 description: Collect metrics from the execution of cc-metric-collector itself
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/self.md
 ---
 -->
 ## `self` collector
 ```json
--- a/collectors/tempMetric.md
+++ b/collectors/tempMetric.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Temperature metric collector
 description: Collect thermal metrics from `/sys/class/hwmon/*`
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/temp.md
 ---
 -->
 ## `tempstat` collector
--- a/collectors/topprocsMetric.md
+++ b/collectors/topprocsMetric.md
@@ -1,15 +1,3 @@
 <!--
 ---
 title: TopProcs collector
 description: Collect infos about most CPU-consuming processes
 categories: [cc-metric-collector]
 tags: ['Admin']
 weight: 2
 hugo_path: docs/reference/cc-metric-collector/collectors/topprocs.md
 ---
 -->
 ## `topprocs` collector
--- a/internal/metricAggregator/README.md
+++ b/internal/metricAggregator/README.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Metric Aggregator
 description: Subsystem for evaluating expressions on metrics (deprecated)
 categories: [cc-metric-collector]
 tags: ['Developer']
 weight: 1
 hugo_path: docs/reference/cc-metric-collector/internal/metricaggregator/_index.md
 ---
 -->
 # The MetricAggregator
 In some cases, further combination of metrics or raw values is required. For that strings like `foo + 1` with runtime dependent `foo` need to be evaluated. The MetricAggregator relies on the [`gval`](https://github.com/PaesslerAG/gval) Golang package to perform all expression evaluation. The `gval` package provides the basic arithmetic operations but the MetricAggregator defines additional ones.
@@ -46,4 +35,4 @@ The MetricAggregator provides these functions additional to the `Full` language
 ## Limitations
 - Since the metrics are written in JSON files which do not allow `""` without proper escaping inside of JSON strings, you have to use `''` for strings.
- Since `\` is interpreted by JSON as escape character, it cannot be used in metrics. But it is required to write regular expressions. So instead of `/`, use `%` and the MetricAggregator replaces them after reading the JSON file.
+- Since `\` is interpreted by JSON as escape character, it cannot be used in metrics. But it is required to write regular expressions. So instead of `/`, use `%` and the MetricAggregator replaces them after reading the JSON file.
--- a/internal/metricRouter/README.md
+++ b/internal/metricRouter/README.md
@@ -1,22 +1,11 @@
 <!--
 ---
 title: Message Router
 description: Routing component inside cc-metric-collector
 categories: [cc-metric-collector]
 tags: ['Developer']
 weight: 1
 hugo_path: docs/reference/cc-metric-collector/internal/metricrouter/_index.md
 ---
 -->
 # CC Metric Router
-The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMessages](https://pkg.go.dev/github.com/ClusterCockpit/cc-lib/ccMessage).
+The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMessages](https://pkg.go.dev/github.com/ClusterCockpit/cc-energy-manager@v0.0.0-20240919152819-92a17f2da4f7/pkg/cc-message.
 # Configuration
-**Note**: Use the [message processor configuration](https://github.com/ClusterCockpit/cc-lib/blob/main/messageProcessor/README.md) with option `process_messages`.
+**Note**: Use the [message processor configuration](../../pkg/messageProcessor/README.md) with option `process_messages`.
 ```json
 {
@@ -80,7 +69,7 @@ The CCMetric router sits in between the collectors and the sinks and can be used
 There are three main options `add_tags`, `delete_tags` and `interval_timestamp`. `add_tags` and `delete_tags` are lists consisting of dicts with `key`, `value` and `if`. The `value` can be omitted in the `delete_tags` part as it only uses the `key` for removal. The `interval_timestamp` setting means that a unique timestamp is applied to all metrics traversing the router during an interval.
-**Note**: Use the [message processor configuration](https://github.com/ClusterCockpit/cc-lib/blob/main/messageProcessor/README.md) (option `process_messages`) instead of `add_tags`, `delete_tags`, `drop_metrics`, `drop_metrics_if`, `rename_metrics`, `normalize_units` and `change_unit_prefix`. These options are deprecated and will be removed in future versions. Until then, they are added to the message processor.
+**Note**: Use the [message processor configuration](../../pkg/messageProcessor/README.md) (option `process_messages`) instead of `add_tags`, `delete_tags`, `drop_metrics`, `drop_metrics_if`, `rename_metrics`, `normalize_units` and `change_unit_prefix`. These options are deprecated and will be removed in future versions. Until then, they are added to the message processor.
 # Processing order in the router
@@ -236,13 +225,13 @@ __deprecated__
 The cc-metric-collector tries to read the data from the system as it is reported. If available, it tries to read the metric unit from the system as well (e.g. from `/proc/meminfo`). The problem is that, depending on the source, the metric units are named differently. Just think about `byte`, `Byte`, `B`, `bytes`, ...
-The [cc-units](https://github.com/ClusterCockpit/cc-lib/ccUnits) package provides a normalization option to use the same metric unit name for all metrics. If this option is set to true, all `unit` meta tags are normalized.
+The [cc-units](https://github.com/ClusterCockpit/cc-units) package provides a normalization option to use the same metric unit name for all metrics. If this option is set to true, all `unit` meta tags are normalized.
 ## The `change_unit_prefix` section
 __deprecated__
-Metrics are often reported by the system with a rather outdated unit prefix (`/proc/meminfo` still uses kByte even though current memory sizes are in the GByte range). If you want to change the prefix of a unit, you can do that with the help of [cc-units](https://github.com/ClusterCockpit/cc-lib/ccUnits). The setting works on the metric name and requires the new prefix for the metric. The cc-units package determines the scaling factor.
+Metrics are often reported by the system with a rather outdated unit prefix (`/proc/meminfo` still uses kByte even though current memory sizes are in the GByte range). If you want to change the prefix of a unit, you can do that with the help of [cc-units](https://github.com/ClusterCockpit/cc-units). The setting works on the metric name and requires the new prefix for the metric. The cc-units package determines the scaling factor.
 # Aggregate metric values of the current interval with the `interval_aggregates` option
@@ -274,7 +263,7 @@ The above configuration, collects all metric values for metrics evaluating `if`
 If you are not interested in the input metrics `sub_metric_%d+` at all, you can add the same condition used here to the `drop_metrics_if` section to drop them.
 Use cases for `interval_aggregates`:
- Combine multiple metrics of a collector into a new one, as the [MemstatCollector](../../collectors/memstatMetric.md) does for `mem_used`:
+- Combine multiple metrics of a collector into a new one, as the [MemstatCollector](../../collectors/memstatMetric.md) does for `mem_used`:
 ```json
  {
    "name" : "mem_used",
--- a/pkg/multiChanTicker/README.md
+++ b/pkg/multiChanTicker/README.md
@@ -1,14 +1,3 @@
 <!--
 ---
 title: Multi-channel Ticker
 description: Timer ticker that sends out the tick to multiple channels
 categories: [cc-metric-collector]
 tags: ['Developer']
 weight: 1
 hugo_path: docs/reference/cc-metric-collector/pkg/multichanticker/_index.md
 ---
 -->
 # MultiChanTicker
 The idea of this ticker is to multiply the output channels. The original Golang `time.Ticker` provides only a single output channel, so each tick can be received by only a single consumer. This ticker allows adding multiple channels, all of which are notified of each time tick.
Author	SHA1	Message	Date
Thomas Gruber	9e321e0766	Merge branch 'develop' into cc_lib_switch	2025-04-16 12:58:29 +02:00
Thomas Roehl	813804ae2d	Update CI	2025-03-18 14:32:29 +01:00
Thomas Roehl	da91813a81	Fix ccLogger import path	2025-03-18 14:28:46 +01:00
Thomas Roehl	6ea79b0099	Use receiver, sinks, ccLogger and ccConfig from cc-lib	2025-03-18 14:21:58 +01:00
Thomas Roehl	b5520efc25	Fix artifacts in netstat collector of not done cc-lib switch	2025-03-15 04:02:26 +01:00
Thomas Roehl	d2b1bad1b8	Fix artifacts of not done cc-lib switch	2025-03-15 04:01:01 +01:00
Thomas Roehl	01ff8b2e9b	Remove local development path	2025-02-24 18:35:16 +01:00
Thomas Roehl	a476f1753e	Change to ccMessage from cc-lib	2025-02-24 18:29:27 +01:00
brinkcoder	0e57c8db1c	Add derived_values for numastats (#134 ) * Check creation of CCMessage in NATS receiver * add derived_values for numastats * change to ccMessage * remove vim command artefact --------- Co-authored-by: Thomas Roehl <thomas.roehl@fau.de> Co-authored-by: exterr2f <Robert.Externbrink@rub.de> Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>	2025-02-19 11:35:32 +01:00
brinkcoder	f2f38c81af	Add exclude_devices to iostat (#133 ) * Check creation of CCMessage in NATS receiver * add exclude_device for iostatMetric * add md file --------- Co-authored-by: Thomas Roehl <thomas.roehl@fau.de> Co-authored-by: exterr2f <Robert.Externbrink@rub.de> Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>	2025-02-19 11:34:56 +01:00
brinkcoder	f9acc51a50	Add derived values for nfsiostat (#132 ) * Check creation of CCMessage in NATS receiver * add derived_values for nfsiostatMetric --------- Co-authored-by: Thomas Roehl <thomas.roehl@fau.de> Co-authored-by: exterr2f <Robert.Externbrink@rub.de> Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>	2025-02-19 11:34:06 +01:00
brinkcoder	87346e2eae	Fix excluded metrics for diskstat and add exclude_mounts (#131 ) * Check creation of CCMessage in NATS receiver * fix excluded metrics and add optional mountpoint exclude --------- Co-authored-by: Thomas Roehl <thomas.roehl@fau.de> Co-authored-by: exterr2f <Robert.Externbrink@rub.de> Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>	2025-02-19 11:33:13 +01:00
brinkcoder	0f92f10b66	Add optional interface alias in netstat (#130 ) * Check creation of CCMessage in NATS receiver * add optional interface aliases for netstatMetric * small fix --------- Co-authored-by: Thomas Roehl <thomas.roehl@fau.de> Co-authored-by: exterr2f <Robert.Externbrink@rub.de> Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>	2025-02-19 11:32:15 +01:00
Michael Panzlaff	6901b06e44	Rename 'process_message' to 'process_messages' in metricRouter config This makes the behavior more consistent with the other modules, which have their MessageProcessor named 'process_messages'. This most likely was just a typo.	2025-02-03 15:23:51 +01:00
Thomas Roehl	7b343d0bab	Use CCMessage FromBytes instead of Influx's decoder	2024-12-27 15:22:59 +00:00
Thomas Roehl	7d3180b526	Check creation of CCMessage in NATS receiver	2024-12-27 15:00:48 +00:00
Thomas Roehl	70a6afc549	Generate HUGO inputs out of Markdown files	2024-12-23 17:55:48 +01:00
Thomas Roehl	e02a018327	Mark all JSON config fields of message processor as omitempty	2024-12-23 17:52:34 +01:00
Thomas Roehl	bcecdd033b	Fix documentation of RAPL collector	2024-12-23 17:51:43 +01:00
Thomas Roehl	2645ffeff3	Merge branch 'main' into develop	2024-12-21 02:39:08 +01:00
Thomas Roehl	e968aa1991	Fix wrongly named packages	2024-12-20 20:33:10 +01:00
Thomas Gruber	d2a38e3844	Merge branch 'main' into develop	2024-12-20 20:27:48 +01:00
Thomas Roehl	1f35f6d3ca	Fix wrongly named packages	2024-12-20 20:26:38 +01:00
Thomas Roehl	7e6870c7b3	Add golang-race for UBI9 and Alma9	2024-12-20 20:15:59 +01:00
Thomas Roehl	d881093524	Install go-toolkit to fulfill build requirements for RPM	2024-12-20 20:12:03 +01:00
Thomas Roehl	c01096c157	use go-toolkit for RPM builds	2024-12-20 18:49:28 +01:00
Thomas Roehl	3d70c8afc9	Remove condition around BuildRequires and use go-toolkit for RPM builds	2024-12-20 18:43:21 +01:00
Thomas Roehl	7ee85a07dc	Remove go-toolkit as build requirement for RPM builds if run in CI	2024-12-20 18:28:32 +01:00