Commit Graph

62 Commits

Author SHA1 Message Date
Thomas Gruber
7840de7b82
Merge develop branch into main ()
* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update cc-metric-collector.init

* Allow selection of timestamp precision in HttpSink

* Add comment about precision requirement for cc-metric-store

* Fix for API changes in gofish@v0.15.0

* Update requirements to latest version

* Read sensors through redfish

* Update golang toolchain to 1.21

* Remove stray error check

* Update main config in configuration.md

* Update Release action to use golang 1.22 stable release, no golang RPMs anymore

* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore

* Update README.md

Use right JSON type in configuration

* Update sink's README

* Test whether ipmitool or ipmi-sensors can be executed without errors

* Little fixes to the prometheus sink ()

* Add uint64 to float64 cast option

* Add prometheus sink to the list of available sinks

* Add aggregated counters by gpu for nvlink errors

---------

Co-authored-by: Michael Schwarz <schwarz@uni-paderborn.de>

* Ccmessage migration ()

* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update cc-metric-collector.init

* Allow selection of timestamp precision in HttpSink

* Add comment about precision requirement for cc-metric-store

* Fix for API changes in gofish@v0.15.0

* Update requirements to latest version

* Read sensors through redfish

* Update golang toolchain to 1.21

* Remove stray error check

* Update main config in configuration.md

* Update Release action to use golang 1.22 stable release, no golang RPMs anymore

* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore

* Switch to CCMessage for all files.

---------

Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

* Switch to ccmessage also for latest additions in nvidiaMetric

* New Message processor ()

* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update cc-metric-collector.init

* Allow selection of timestamp precision in HttpSink

* Add comment about precision requirement for cc-metric-store

* Fix for API changes in gofish@v0.15.0

* Update requirements to latest version

* Read sensors through redfish

* Update golang toolchain to 1.21

* Remove stray error check

* Update main config in configuration.md

* Update Release action to use golang 1.22 stable release, no golang RPMs anymore

* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore

* New message processor to check whether a message should be dropped or manipulate it in flight

* Create a copy of message before manipulation

---------

Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

* Update collector's Makefile and go.mod/sum files

* Use message processor in router, all sinks and all receivers

* Add support for credential file (NKEY) to NATS sink and receiver

* Fix JSON keys in message processor configuration

* Update docs for message processor, router and the default router config file

* Add link to expr syntax and fix regex matching docs

* Update sample collectors

* Minor style change in collector manager

* Some helpers for ccTopology

* LIKWID collector: write log owner change only once

* Fix for metrics without units and reduce debugging messages for messageProcessor

* Use shorted hostname for hostname added by router

* Define default port for NATS

* CPUstat collector: only add unit for applicable metrics

* Add precision option to all sinks using Influx's encoder

* Add message processor to all sink documentation

* Add units to documentation of cpustat collector

---------

Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: oscarminus <me@oscarminus.de>
Co-authored-by: Michael Schwarz <schwarz@uni-paderborn.de>
2024-12-19 23:00:14 +01:00
Thomas Gruber
6ab45dd3ec
Merge develop into main ()
* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update to line-protocol/v2

* Update runonce.yml with Golang 1.20

* Update fsnotify in LIKWID Collector

* Use not a pointer to line-protocol.Encoder

* Simplify Makefile

* Use only as many arguments as required

* Allow sum function to handle non float types

* Allow values to be a slice of type float64, float32, int, int64, int32, bool

* Use generic function to simplify code

* Add missing case for type []int32

* Use generic function to compute minimum

* Use generic function to compute maximum

* Use generic function to compute average

* Add error value to sumAnyType

* Use generic function to compute median

* For older versions of go slices is not part of the installation

* Remove old entries from go.sum

* Use simpler sort function

* Compute metrics ib_total and ib_total_pkts

* Add aggregated metrics.
Add missing units

* Update likwidMetric.go

Fixes a potential bug when `fsnotify.NewWatcher()` fails with an error

* Completly avoid memory allocations in infinibandMetric read()

* Fixed initialization: Initalization and measurements should run in the same thread

* Add safe.directory to Release action

* Fix path after installation to /usr/bin after installation

* ioutil.ReadFile is deprecated: As of Go 1.16, this function simply calls os.ReadFile

* Switch to package slices from the golang 1.21 default library

* Read file line by line

* Read file line by line

* Read file line by line

* Use CamelCase

* Use CamelCase

* Fix function getNumaDomain, it always returned 0

* Avoid type conversion by using Atoi
Avoid copying structs by using pointer access
Increase readability with CamelCase variable names

* Add caching

* Cache CpuData

* Cleanup

* Use init function to initalize cache structure to avoid multi threading problems

* Reuse information from /proc/cpuinfo

* Avoid slice cloning. Directly use the cache

* Add DieList

* Add NumaDomainList and SMTList

* Cleanup

* Add comment

* Lookup core ID from /sys/devices/system/cpu, /proc/cpuinfo is not portable

* Lookup all information from /sys/devices/system/cpu, /proc/cpuinfo is not portable

* Correctly handle lists from /sys

* Add Simultaneous Multithreading siblings

* Replace deprecated thread_siblings_list by core_cpus_list

* Reduce number of required slices

* Allow to send total values per core, socket and node

* Send all metrics with same time stamp
calcEventsetMetrics does only computiation, counter measurement is done before

* Input parameters should be float64 when evaluating to float64

* Send all metrics with same time stamp
calcGlobalMetrics does only computiation, counter measurement is done before

* Remove unused variable gmresults

* Add comments

* Updated go packages

* Add build with golang 1.21

* Switch to checkout action version 4

* Switch to setup-go action version 4

* Add workflow_dispatch to allow manual run of workflow

* Add workflow_dispatch to allow manual run of workflow

* Add release build jobs to runonce.yml

* Switch to golang 1.20 for RHEL based distributions

* Use dnf to download golang

* Remove golang versions before 1.20

* Upgrade Ubuntu focal -> jammy

* Pipe golang tar package directly to tar

* Update golang version

* Fix Ubuntu version number

* Add links to ipmi and redfish receivers

* Fix http server addr format

* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol

* Corrected spelling

* Add some comments

* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol

* Allow other fields not only field "value"

* Add some basic debugging documentation

* Add some basic debugging documentation

* Use a lock for the flush timer

* Add tags in lexical order as required by AddTag()

* Only access meta data, when it gets used as tag

* Use slice to store lexialicly orderd key value pairs

* Increase golang version requirement to 1.20.

* Avoid package cmp to allow builds with golang v1.20

* Fix: Error NVML library not found did crash
cc-metric-collector with "SIGSEGV: segmentation violation"

* Add config option idle_timeout

* Add basic authentication support

* Add basic authentication support

* Avoid unneccessary memory allocations

* Add documentation for send_*_total values

* Use generic package maps to clone maps

* Reuse flush timer

* Add Influx client options

* Reuse ccTopology functionality

* Do not store unused topology information

* Add batch_size config

* Cleanup

* Use stype and stype-id for the NIC in NetstatCollector

* Wait for concurrent flush operations to finish

* Be more verbose in error messages

* Reverted previous changes.
Made the code to complex without much advantages

* Use line protocol encoder

* Go pkg update

* Stop flush timer, when immediatelly flushing

* Fix: Corrected unlock access to batch slice

* Add config option to specify whether to use GZip compression in influx write requests

* Add asynchron send of encoder metrics

* Use DefaultServeMux instead of github.com/gorilla/mux

* Add config option for HTTP keep-alives

* Be more strict, when parsing json

* Add config option for HTTP request timeout and Retry interval

* Allow more then one background send operation

* Fix %sysusers_create_package args ()

%sysusers_create_package requires two arguments. See: https://github.com/systemd/systemd/blob/main/src/rpm/macros.systemd.in#L165

* Add nfsiostat to list of collectors

---------

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Holger Obermaier <holgerob@gmx.de>
Co-authored-by: Obihörnchen <obihoernchende@gmail.com>
2023-12-04 12:21:26 +01:00
Thomas Gruber
195d0794b0
Merge develop branch into main ()
* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update to line-protocol/v2

* Update runonce.yml with Golang 1.20

* Update fsnotify in LIKWID Collector

* Use not a pointer to line-protocol.Encoder

* Simplify Makefile

* Use only as many arguments as required

* Allow sum function to handle non float types

* Allow values to be a slice of type float64, float32, int, int64, int32, bool

* Use generic function to simplify code

* Add missing case for type []int32

* Use generic function to compute minimum

* Use generic function to compute maximum

* Use generic function to compute average

* Add error value to sumAnyType

* Use generic function to compute median

* For older versions of go slices is not part of the installation

* Remove old entries from go.sum

* Use simpler sort function

* Compute metrics ib_total and ib_total_pkts

* Add aggregated metrics.
Add missing units

* Update likwidMetric.go

Fixes a potential bug when `fsnotify.NewWatcher()` fails with an error

* Completly avoid memory allocations in infinibandMetric read()

* Fixed initialization: Initalization and measurements should run in the same thread

---------

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2023-08-29 14:12:49 +02:00
Thomas Gruber
4bd71224df
move maybe-usable-by-other-cc-components to pkg. Fix all files to use the new paths () 2022-10-10 11:53:11 +02:00
Thomas Roehl
32bb9c5fc0 Update ccMetric README and FromMetric copy 2022-07-27 18:06:41 +02:00
Thomas Roehl
7ddc889f06 MetricRouter: Fix JSON in README 2022-05-20 16:06:54 +02:00
Thomas Gruber
826f364772
CC topology module update ()
* Rename CPU to hardware thread, write some comments

* Do renaming in other parts

* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
2022-05-13 14:28:07 +02:00
Thomas Gruber
1db5f3b29a
Rename cpu type to hwthread ()
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
2022-05-13 14:09:45 +02:00
Thomas Gruber
80d92d6d28
Units with cc-units ()
* Add option to normalize units with cc-unit

* Add unit conversion to router

* Add option to change unit prefix in the router

* Add to MetricRouter README

* Add order of operations in router to README

* Use second add_tags/del_tags only if metric gets renamed
2022-05-13 13:30:02 +02:00
Thomas Roehl
ecdb4c1bcf Add debug message when updating interval_timestep 2022-04-04 02:55:44 +02:00
Thomas Roehl
4d5b1adbc8 Fix for interval_timestamp option 2022-04-04 02:26:04 +02:00
Thomas Roehl
622e94ae0e Fix DieList() if system does not support dies. Explicitly set entries in CpuData list 2022-03-22 15:58:10 +01:00
Thomas Roehl
c506114480 Add processing order to MetricRouter README and add missing options 2022-03-18 12:29:00 +01:00
Thomas Roehl
657543dded Ensure max_forward is at least 1 2022-03-18 12:28:52 +01:00
Thomas Gruber
57629a2e0a
Meta to tags list and map for sinks ()
* Change ccMetric->Influx functions

* Use a meta_as_tags string list in config but create a lookup map afterwards

* Add meta as tag logic to sampleSink
2022-03-15 16:16:26 +01:00
Thomas Roehl
3cf2f69a07 Make maxForward configurable, save old name in meta in rename metrics and make the hostname tag key configurable 2022-03-09 11:23:17 +01:00
Holger Obermaier
154b56000e Allow concurrent access to condition map 2022-02-16 14:30:11 +01:00
Holger Obermaier
b44e226496 Add caching for condition evaluation 2022-02-16 13:59:32 +01:00
Holger Obermaier
248c815a1c Make created language persistant 2022-02-16 08:40:57 +01:00
Holger Obermaier
d6154ff35b Simplification 2022-02-15 13:51:46 +01:00
Holger Obermaier
2f0e229936 Simplify EvalBoolCondition 2022-02-15 13:35:24 +01:00
Holger Obermaier
a8821b7ac5 Add comments 2022-02-15 12:39:54 +01:00
Holger Obermaier
2031f35d9b Cleanup 2022-02-15 11:36:17 +01:00
Holger Obermaier
69b31e87e7 Fix: Start cache manager only when NumCacheIntervals > 0 2022-02-15 11:27:42 +01:00
Thomas Roehl
247fb23de1 Try to operate on multiple metrics if channels if filled 2022-02-14 18:12:50 +01:00
Thomas Roehl
184d60cc58 Locking in MetricCache 2022-02-10 15:21:26 +01:00
Holger Obermaier
c47ac2ebc3 Cleanup 2022-02-08 12:22:56 +01:00
Holger Obermaier
6d55c376bd Refactoring: Remove all *List() functions from CCMetric 2022-02-08 11:23:19 +01:00
Holger Obermaier
e1a7379c2e Generate influxDB point for data type ccMetric 2022-02-08 09:31:08 +01:00
Holger Obermaier
af051b5e7e Replace FieldList() by Fields() 2022-02-07 22:52:39 +01:00
Holger Obermaier
fe42e8bb95 Switch fields data type from []*lp.Field to map[string]interface{} 2022-02-07 22:41:31 +01:00
Holger Obermaier
627163d4df Add method ToLineProtocol which generates influxDB line protocol for data type ccMetric 2022-02-07 21:14:23 +01:00
Thomas Roehl
6dd95d6fed Export all ccMetric functions 2022-02-07 16:20:42 +01:00
Thomas Roehl
25bb395f02 Fix for NumaDomain getter in ccTopology 2022-02-07 13:22:26 +01:00
Thomas Gruber
fdb58b0be2
Sink specific configuration maps ()
* Use sink-specific configurations to have more flexibility. Adjust sample sink configuration files

* Add documentation

* Add links to individual sink readmes

* Fix link in README

* HTTPS for HttpSink

* If no CPU die id available, use the socket id instead
2022-02-04 18:12:24 +01:00
Thomas Gruber
92d4a9c2b9
Split MetricRouter and MetricAggregator ()
* Split MetricRouter and MetricAggregator

* Missing change in MetricCache

* Add README for MetricAggregator
2022-02-03 16:52:55 +01:00
Thomas Roehl
1222f7a32f Configuration option to disable MetricCache completely 2022-02-02 15:30:14 +01:00
Thomas Roehl
2c13cecf13 Fix link in MetricRouter README 2022-02-02 14:52:19 +01:00
Thomas Roehl
af8654d325 Update MetricRouter README 2022-02-01 18:28:20 +01:00
Thomas Roehl
a4bd141786 Use the MetricAggregator for all calculations in the MetricRouter 2022-02-01 18:27:59 +01:00
Thomas Roehl
64a12b80bb Add and export SetName() function for CCMetric 2022-02-01 18:27:16 +01:00
Thomas Roehl
8319d3de43 Add some more helper functions to ccTopology 2022-02-01 18:26:54 +01:00
Thomas Gruber
6ff6cb7219
Change CCMetric's internal data structure ()
* package ccmetric rewrite

* Create deep copy in New() to avoid access conflicts

* Renamed TagMap() -> Tags(), MetaMap() -> Meta

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-02-01 14:54:34 +01:00
Holger Obermaier
fd3c7ed573 Add documentation 2022-01-31 14:02:00 +01:00
Thomas Roehl
df31df149b Fix for missing math.MaxInt in go 1.16 2022-01-30 15:16:46 +01:00
Thomas Gruber
cf810b1c0c
Add Cache and Aggregator to MetricRouter ()
* Add Cache and Aggregator to MetricRouter

* Close done channel in MetricCache
2022-01-30 15:03:21 +01:00
Thomas Gruber
11844d9d5d
Add common topology module for MetricCollectors and MetricRouter () 2022-01-30 14:59:26 +01:00
Thomas Roehl
e4a2927b96 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-01-30 14:31:19 +01:00
Thomas Roehl
d3f5611541 Add functions to get the fields of a CCMetric and export some more CCMetric functions 2022-01-30 14:30:06 +01:00
Thomas Roehl
4541e50bea Minor fixes in ccLogger 2022-01-30 14:29:25 +01:00