* InfiniBandCollector: Scale raw readings from octets to bytes
* Fix clock frequency coming from LikwidCollector and update docs
* Build DEB package for Ubuntu 20.04 for releases
* Fix memstat collector with numa_stats option
* Remove useless prints from MemstatCollector
* Replace ioutils with os and io (#87)
* Use lower case for error strings in RocmSmiCollector
* move maybe-usable-by-other-cc-components to pkg. Fix all files to use the new paths (#88)
* Add collector for monitoring the execution of cc-metric-collector itself (#81)
* Add collector to monitor execution of cc-metric-collector itself
* Register SelfCollector
* Fix import paths for moved packages
* Check if at least one CPU with frequency information was detected
* Correct typo: /proc/stats -> /proc/stat
* Update README.md
* Run ipmitool asynchronously. Improved error handling.
* Corrected some typos
* Add running average power limit (RAPL) metric collector
* Do not mess with the original configuration
* Corrected JSON config in numastatsMetric.md
* Added some debug output to numastatsMetric.go
* Fixed computing number of physical packages for non-continuous physical package IDs (e.g. on Ampere Altra Q80-30)
* Fix kernel panic for receiver config with missing receiver type
* Add receiver to gather remote IPMI sensor metrics
* Added config option to add ipmi-sensors command line options
* Add documentation for IPMI receiver
* Update to latest version of included go modules
* Add go.mod to App dependency
* Try to use common metric tags across hardware vendors
* Add IPMI metric: current
* remove prefix enumeration like 01-...
* Add IPMI receiver example configuration to receivers.json
* Minimal formatting changes
* Add hostlist package
* Added tests for hostlist Expand()
* Use package hostlist to expand a host list
* Some servers return "ConsumedPowerWatt":65535 instead of "ConsumedPowerWatt":null
* Updated to latest package versions
* Do not allow unknown fields in JSON configuration file
* Add workflow to customize packages to docs
* NFS I/O Stats Collector (#91)
* Initial version
* Delete values for vanished mount points and comments
* Fix for Likwid collector (#95)
* Run LIKWID in separate thread and check metric type
* Change LIKWID collector documentation to use 'type' instead of 'scope'
* Re-initialize LIKWID after one read is missing due to lock toggle
* Register cc-metric-collector at Zenodo (#93)
* Add initial version of Zenodo project file
* Orcid ID added
* Update .zenodo.json
Co-authored-by: Holger Obermaier <holger.obermaier@kit.edu>
* Update ipmiMetric.go
* Use latest LIKWID version for builds
* Update README.md
* Remove development stuff from Makefile
* Add Requires(pre) to RPM SPEC file
* Use curly brackets in packaging make targets
* Fix for LIKWID collector with separate measurement thread and inotify watcher on the LIKWID lock (#97)
* Debian does not like underscores in the version
* Update cc-metric-collector.service
Remove dependency services not used by cc-metric-collector
* Add new requirements to module file
* Use customcmd commands if they did not error. (#101)
* Merge develop and main (#99)
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
* Update likwid_perfgroup_to_cc_config.py
* Use customcmd commands if they did not error.
---------
Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>
Co-authored-by: fodinabor <5982050+fodinabor@users.noreply.github.com>
* cpustatMetric.go: Use derived values instead of absolute values
The values in /proc/stat are absolute counters accumulated since the
system was booted. To obtain the CPU utilization, the change in the
counters over time must be derived. Using only the absolute values
hides changes in utilization, especially once the counters have
grown large.
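The difference between absolute and derived values can be sketched as follows. This is a minimal illustration, not the code in cpustatMetric.go: the `cpuTimes` struct, its reduced set of fields, and the `busyPercent` helper are assumptions made for this example.

```go
package main

import "fmt"

// cpuTimes holds cumulative counters from one "cpu" line of /proc/stat.
// The struct and its reduced field selection are illustrative assumptions.
type cpuTimes struct {
	user, nice, system, idle, iowait uint64
}

// total returns the sum of all counters in the sample.
func (t cpuTimes) total() uint64 {
	return t.user + t.nice + t.system + t.idle + t.iowait
}

// busyPercent derives the utilization between two samples from the
// counter differences instead of the absolute values.
func busyPercent(prev, cur cpuTimes) float64 {
	totalDelta := cur.total() - prev.total()
	if totalDelta == 0 {
		return 0 // no time passed between the samples
	}
	idleDelta := (cur.idle + cur.iowait) - (prev.idle + prev.iowait)
	return 100 * float64(totalDelta-idleDelta) / float64(totalDelta)
}

func main() {
	prev := cpuTimes{user: 1000, system: 500, idle: 8500}
	cur := cpuTimes{user: 1060, system: 520, idle: 8520}

	// Ratio of the absolute counters: dominated by the history since boot.
	absolute := 100 * float64(cur.user+cur.system) / float64(cur.total())
	fmt.Printf("absolute: %.1f%%\n", absolute)
	fmt.Printf("derived:  %.1f%%\n", busyPercent(prev, cur))
}
```

With these sample numbers the absolute ratio stays near 16 percent although the CPU was 80 percent busy during the last interval, which is the effect described above.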
* Add new collector for /proc/schedstat
The `schedstat` collector reads data from /proc/schedstat and calculates
a load value per hwthread. This can help to detect bad CPU pinning,
for example on shared nodes.
Co-authored-by: Michael Schwarz <post@michael-schwarz.name>
* Cleanup: Remove unused code
* Use Golang duration parser for 'interval' and 'duration' in main config
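What parsing these two options with Go's duration parser looks like can be sketched as below. The `mainConfig` struct and the `parseTimes` helper are hypothetical and only mirror the two option names mentioned here; `time.ParseDuration` is the standard library parser and expects a unit suffix such as `s` or `ms`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// mainConfig is a hypothetical, reduced stand-in for the main config file.
type mainConfig struct {
	Interval string `json:"interval"`
	Duration string `json:"duration"`
}

// parseTimes decodes the JSON and converts both strings with time.ParseDuration.
func parseTimes(raw []byte) (interval, duration time.Duration, err error) {
	var cfg mainConfig
	if err = json.Unmarshal(raw, &cfg); err != nil {
		return 0, 0, err
	}
	if interval, err = time.ParseDuration(cfg.Interval); err != nil {
		return 0, 0, fmt.Errorf("invalid interval: %w", err)
	}
	if duration, err = time.ParseDuration(cfg.Duration); err != nil {
		return 0, 0, fmt.Errorf("invalid duration: %w", err)
	}
	return interval, duration, nil
}

func main() {
	interval, duration, err := parseTimes([]byte(`{"interval": "10s", "duration": "1s"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(interval, duration)
}
```

In this sketch a bare number such as "10" is rejected because it carries no unit.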
* Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73
* Units with cc-units (#64)
* Add option to normalize units with cc-unit
* Add unit conversion to router
* Add option to change unit prefix in the router
* Add to MetricRouter README
* Add order of operations in router to README
* Use second add_tags/del_tags only if metric gets renamed
* Skip disks in DiskstatCollector that have size=0
* Check readability of sensor files in TempCollector
* Fix for --once option
* Rename `cpu` type to `hwthread` (#69)
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
* Collectors in parallel (#74)
* Provide info to CollectorManager whether the collector can be executed in parallel with others
* Split serial and parallel collectors. Read in parallel first
* Update NvidiaCollector with new metrics, MIG and NvLink support (#75)
* CC topology module update (#76)
* Rename CPU to hardware thread, write some comments
* Do renaming in other parts
* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
* Option to use MIG UUID as subtype-id in NvidiaCollector
* Option to use MIG slice name as subtype-id in NvidiaCollector
* MetricRouter: Fix JSON in README
* Fix for Github Action to really use the selected version
* Remove Ganglia installation in runonce Action and add Go 1.18
* Fix daemon options in init script
* Add separate go.mod files to use it with deprecated 1.16
* Minor updates for Makefiles
* fix string comparison
* AMD ROCm SMI collector (#77)
* Add collector for AMD ROCm SMI metrics
* Fix import path
* Fix imports
* Remove Board Number
* store GPU index explicitly
* Remove board number from description
* Use http instead of ftp to download likwid
* Fix serial number in rocmCollector
* Improved http sink (#78)
* automatic flush in NatsSink
* tweak default options of HttpSink
* shorter critical section and retries for HttpSink
* fix error handling
* Remove file added by mistake.
Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
* Fix: When sending metrics failed the batch size could be exceeded
* Improved dropping of metrics failed to send
* Add memstats and topprocs metric
* Updated to latest modules
* Check that at least one sink is running
* Add drop rate, when send buffer is full
* Allow only one timer at a time
* Use mutex to ensure only one flush timer is running
* Fix for NvidiaCollector when devices are not in MIG mode
* Remove Golang version 1.16 and 1.17 from Action. Latest commits require Golang 1.18
* Use Golang 1.18 in Release action to build RPMs
* Change unit of CpufreqCollector to Hz. That's what the sysfs outputs
* Make wget quiet in Release action to reduce log size
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Lou <lou.knauer@gmx.de>