* Add cpu_used (all-cpu_idle) to CpustatCollector
* Update cc-metric-collector.init
* Allow selection of timestamp precision in HttpSink
* Add comment about precision requirement for cc-metric-store
* Fix for API changes in gofish@v0.15.0
* Update requirements to latest version
* Read sensors through redfish
* Update golang toolchain to 1.21
* Remove stray error check
* Update main config in configuration.md
* Update Release action to use golang 1.22 stable release, no golang RPMs anymore
* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore
* Update README.md
Use right JSON type in configuration
* Update sink's README
* Test whether ipmitool or ipmi-sensors can be executed without errors
* Little fixes to the prometheus sink (#115)
* Add uint64 to float64 cast option
* Add prometheus sink to the list of available sinks
* Add aggregated counters by gpu for nvlink errors
---------
Co-authored-by: Michael Schwarz <schwarz@uni-paderborn.de>
* Ccmessage migration (#119)
* Add cpu_used (all-cpu_idle) to CpustatCollector
* Update cc-metric-collector.init
* Allow selection of timestamp precision in HttpSink
* Add comment about precision requirement for cc-metric-store
* Fix for API changes in gofish@v0.15.0
* Update requirements to latest version
* Read sensors through redfish
* Update golang toolchain to 1.21
* Remove stray error check
* Update main config in configuration.md
* Update Release action to use golang 1.22 stable release, no golang RPMs anymore
* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore
* Switch to CCMessage for all files.
---------
Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Switch to ccmessage also for latest additions in nvidiaMetric
* New Message processor (#118)
* Add cpu_used (all-cpu_idle) to CpustatCollector
* Update cc-metric-collector.init
* Allow selection of timestamp precision in HttpSink
* Add comment about precision requirement for cc-metric-store
* Fix for API changes in gofish@v0.15.0
* Update requirements to latest version
* Read sensors through redfish
* Update golang toolchain to 1.21
* Remove stray error check
* Update main config in configuration.md
* Update Release action to use golang 1.22 stable release, no golang RPMs anymore
* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore
* New message processor to check whether a message should be dropped or manipulate it in flight
* Create a copy of message before manipulation
---------
Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Update collector's Makefile and go.mod/sum files
* Use message processor in router, all sinks and all receivers
* Add support for credential file (NKEY) to NATS sink and receiver
* Fix JSON keys in message processor configuration
* Update docs for message processor, router and the default router config file
* Add link to expr syntax and fix regex matching docs
* Update sample collectors
* Minor style change in collector manager
* Some helpers for ccTopology
* LIKWID collector: write log owner change only once
* Fix for metrics without units and reduce debugging messages for messageProcessor
* Use shortened hostname for hostname added by router
* Define default port for NATS
* CPUstat collector: only add unit for applicable metrics
* Add precision option to all sinks using Influx's encoder
* Add message processor to all sink documentation
* Add units to documentation of cpustat collector
---------
Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: oscarminus <me@oscarminus.de>
Co-authored-by: Michael Schwarz <schwarz@uni-paderborn.de>
* Add cpu_used (all-cpu_idle) to CpustatCollector
* Update cc-metric-collector.init
* Allow selection of timestamp precision in HttpSink
* Add comment about precision requirement for cc-metric-store
* Fix for API changes in gofish@v0.15.0
* Update requirements to latest version
* Read sensors through redfish
* Update golang toolchain to 1.21
* Remove stray error check
* Update main config in configuration.md
* Update Release action to use golang 1.22 stable release, no golang RPMs anymore
* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore
* Update README.md
Use right JSON type in configuration
* Update sink's README
* Test whether ipmitool or ipmi-sensors can be executed without errors
---------
Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Add cpu_used (all-cpu_idle) to CpustatCollector
* Update to line-protocol/v2
* Update runonce.yml with Golang 1.20
* Update fsnotify in LIKWID Collector
* Do not use a pointer to line-protocol.Encoder
* Simplify Makefile
* Use only as many arguments as required
* Allow sum function to handle non float types
* Allow values to be a slice of type float64, float32, int, int64, int32, bool
* Use generic function to simplify code
* Add missing case for type []int32
* Use generic function to compute minimum
* Use generic function to compute maximum
* Use generic function to compute average
* Add error value to sumAnyType
* Use generic function to compute median
* For older versions of Go, package slices is not part of the installation
* Remove old entries from go.sum
* Use simpler sort function
* Compute metrics ib_total and ib_total_pkts
* Add aggregated metrics.
Add missing units
* Update likwidMetric.go
Fixes a potential bug when `fsnotify.NewWatcher()` fails with an error
* Completely avoid memory allocations in infinibandMetric read()
* Fixed initialization: Initialization and measurements should run in the same thread
* Add safe.directory to Release action
* Fix path to /usr/bin after installation
* ioutil.ReadFile is deprecated: As of Go 1.16, this function simply calls os.ReadFile
* Switch to package slices from the golang 1.21 default library
* Read file line by line
* Read file line by line
* Read file line by line
* Use CamelCase
* Use CamelCase
* Fix function getNumaDomain, it always returned 0
* Avoid type conversion by using Atoi
Avoid copying structs by using pointer access
Increase readability with CamelCase variable names
* Add caching
* Cache CpuData
* Cleanup
* Use init function to initialize cache structure to avoid multi-threading problems
* Reuse information from /proc/cpuinfo
* Avoid slice cloning. Directly use the cache
* Add DieList
* Add NumaDomainList and SMTList
* Cleanup
* Add comment
* Lookup core ID from /sys/devices/system/cpu, /proc/cpuinfo is not portable
* Lookup all information from /sys/devices/system/cpu, /proc/cpuinfo is not portable
* Correctly handle lists from /sys
* Add Simultaneous Multithreading siblings
* Replace deprecated thread_siblings_list by core_cpus_list
* Reduce number of required slices
* Allow to send total values per core, socket and node
* Send all metrics with same time stamp
calcEventsetMetrics only does computation, counter measurement is done before
* Input parameters should be float64 when evaluating to float64
* Send all metrics with same time stamp
calcGlobalMetrics only does computation, counter measurement is done before
* Remove unused variable gmresults
* Add comments
* Updated go packages
* Add build with golang 1.21
* Switch to checkout action version 4
* Switch to setup-go action version 4
* Add workflow_dispatch to allow manual run of workflow
* Add workflow_dispatch to allow manual run of workflow
* Add release build jobs to runonce.yml
* Switch to golang 1.20 for RHEL based distributions
* Use dnf to download golang
* Remove golang versions before 1.20
* Upgrade Ubuntu focal -> jammy
* Pipe golang tar package directly to tar
* Update golang version
* Fix Ubuntu version number
* Add links to ipmi and redfish receivers
* Fix http server addr format
* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol
* Corrected spelling
* Add some comments
* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol
* Allow other fields not only field "value"
* Add some basic debugging documentation
* Add some basic debugging documentation
* Use a lock for the flush timer
* Add tags in lexical order as required by AddTag()
* Only access meta data, when it gets used as tag
* Use slice to store lexically ordered key-value pairs
* Increase golang version requirement to 1.20.
* Avoid package cmp to allow builds with golang v1.20
* Fix: Error "NVML library not found" crashed
cc-metric-collector with "SIGSEGV: segmentation violation"
* Add config option idle_timeout
* Add basic authentication support
* Add basic authentication support
* Avoid unnecessary memory allocations
* Add documentation for send_*_total values
* Use generic package maps to clone maps
* Reuse flush timer
* Add Influx client options
* Reuse ccTopology functionality
* Do not store unused topology information
* Add batch_size config
* Cleanup
* Use stype and stype-id for the NIC in NetstatCollector
* Wait for concurrent flush operations to finish
* Be more verbose in error messages
* Reverted previous changes.
Made the code too complex without much advantage
* Use line protocol encoder
* Go pkg update
* Stop flush timer when immediately flushing
* Fix: Corrected unlock access to batch slice
* Add config option to specify whether to use GZip compression in influx write requests
* Add asynchronous send of encoder metrics
* Use DefaultServeMux instead of github.com/gorilla/mux
* Add config option for HTTP keep-alives
* Be more strict when parsing JSON
* Add config option for HTTP request timeout and Retry interval
* Allow more than one background send operation
* Fix %sysusers_create_package args (#108)
%sysusers_create_package requires two arguments. See: https://github.com/systemd/systemd/blob/main/src/rpm/macros.systemd.in#L165
* Add nfsiostat to list of collectors
---------
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Holger Obermaier <holgerob@gmx.de>
Co-authored-by: Obihörnchen <obihoernchende@gmail.com>
* Add cpu_used (all-cpu_idle) to CpustatCollector
* Update to line-protocol/v2
* Update runonce.yml with Golang 1.20
* Update fsnotify in LIKWID Collector
* Do not use a pointer to line-protocol.Encoder
* Simplify Makefile
* Use only as many arguments as required
* Allow sum function to handle non float types
* Allow values to be a slice of type float64, float32, int, int64, int32, bool
* Use generic function to simplify code
* Add missing case for type []int32
* Use generic function to compute minimum
* Use generic function to compute maximum
* Use generic function to compute average
* Add error value to sumAnyType
* Use generic function to compute median
* For older versions of Go, package slices is not part of the installation
* Remove old entries from go.sum
* Use simpler sort function
* Compute metrics ib_total and ib_total_pkts
* Add aggregated metrics.
Add missing units
* Update likwidMetric.go
Fixes a potential bug when `fsnotify.NewWatcher()` fails with an error
* Completely avoid memory allocations in infinibandMetric read()
* Fixed initialization: Initialization and measurements should run in the same thread
---------
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Cleanup: Remove unused code
* Use Golang duration parser for 'interval' and 'duration'
in main config
* Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73
* Units with cc-units (#64)
* Add option to normalize units with cc-unit
* Add unit conversion to router
* Add option to change unit prefix in the router
* Add to MetricRouter README
* Add order of operations in router to README
* Use second add_tags/del_tags only if metric gets renamed
* Skip disks in DiskstatCollector that have size=0
* Check readability of sensor files in TempCollector
* Fix for --once option
* Rename `cpu` type to `hwthread` (#69)
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
* Collectors in parallel (#74)
* Provide info to CollectorManager whether the collector can be executed in parallel with others
* Split serial and parallel collectors. Read in parallel first
* Update NvidiaCollector with new metrics, MIG and NvLink support (#75)
* CC topology module update (#76)
* Rename CPU to hardware thread, write some comments
* Do renaming in other parts
* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
* Option to use MIG UUID as subtype-id in NvidiaCollector
* Option to use MIG slice name as subtype-id in NvidiaCollector
* MetricRouter: Fix JSON in README
* Fix for Github Action to really use the selected version
* Remove Ganglia installation in runonce Action and add Go 1.18
* Fix daemon options in init script
* Add separate go.mod files to use it with deprecated 1.16
* Minor updates for Makefiles
* fix string comparison
* AMD ROCm SMI collector (#77)
* Add collector for AMD ROCm SMI metrics
* Fix import path
* Fix imports
* Remove Board Number
* store GPU index explicitly
* Remove board number from description
* Use http instead of ftp to download likwid
* Fix serial number in rocmCollector
* Improved http sink (#78)
* automatic flush in NatsSink
* tweak default options of HttpSink
* shorter critical section and retries for HttpSink
* fix error handling
* Remove file added by mistake.
* Use http instead of ftp to download likwid
* Fix serial number in rocmCollector
Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Lou <lou.knauer@gmx.de>
* Update configuration.md
Add an additional receiver to have better alignment of components
* Change default GpfsCollector command to `mmpmon` (#53)
* Set default cmd to 'mmpmon'
* Reuse looked up path
* Cast const to string
* Just download LIKWID to get the headers (#54)
* Just download LIKWID to get the headers
* Remove perl-Data-Dumper from BuildRequires, only required by LIKWID build
* Add HttpReceiver as counterpart to the HttpSink (#49)
* Use GBytes as unit for large memory numbers
* Make maxForward configurable, save old name in meta in rename metrics and make the hostname tag key configurable
* Single release action (#55)
Building all RPMs and releasing in a single workflow
* Makefile target to build binary-only Debian packages (#61)
* Add 'install' and 'DEB' make targets to build binary-only Debian packages
* Add control file for DEB builds
* Use a single line for bash loop in make clean
* Add config options for retry intervals of InfluxDB clients (#59)
* Refactoring of LikwidCollector and metric units (#62)
* Reduce complexity of LikwidCollector and allow metric units
* Add unit to LikwidCollector documentation and fix some typos
* Make library path configurable
* Use old metric name in Ganglia if rename has happened in the router (#60)
* Use old metric name if rename has happened in the router
* Also check for Ganglia renames for the oldname
* Derived metrics (#57)
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Simplified iota usage
* Move unit tag to meta data tags
* Derived metrics (#65)
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
* Update LustreCollector
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Meta to tags list and map for sinks (#63)
* Change ccMetric->Influx functions
* Use a meta_as_tags string list in config but create a lookup map afterwards
* Add meta as tag logic to sampleSink
* Fix staticcheck warnings (#66)
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Add sink directly using libganglia.so
* Remove unneeded confuse header
* add submodule init to build action
* add submodule init to runonce action
* add installation of ganglia to runonce
* add installation of ganglia to runonce
* add installation of ganglia to runonce
* libconfuse not required
* Remove ganglia submodule
* Remove ganglia.h
* Add Makefile to help creating the libganglia.so link
* Fix cgo header
* Rename new Ganglia sink to 'libgangliaSink'
* Add documentation for libgangliaSink
* Extend make buildsystem with find&symlink helper for libgangliaSink
* Add metric renaming function
* Add build tag 'ganglia' and create corresponding files
* Fix config for Github Actions
* Fix paths
* Add CentOS Latest and AlmaLinux 8.5 to RPM action
* Fix ID
* Reduce min Go version to 1.16 and use time.Unix in gpfsMetric