* InfiniBandCollector: Scale raw readings from octets to bytes
* Fix clock frequency coming from LikwidCollector and update docs
* Build DEB package for Ubuntu 20.04 for releases
* Fix memstat collector with numa_stats option
* Remove useless prints from MemstatCollector
* Replace ioutils with os and io (#87)
* Use lower case for error strings in RocmSmiCollector
* Move packages that may be usable by other cc components to pkg. Fix all files to use the new paths (#88)
* Add collector for monitoring the execution of cc-metric-collector itself (#81)
* Add collector to monitor execution of cc-metric-collector itself
* Register SelfCollector
* Fix import paths for moved packages
* Check if at least one CPU with frequency information was detected
* Correct typo: /proc/stats -> /proc/stat
* Update README.md
* Run ipmitool asynchronously. Improve error handling.
* Corrected some typos
* Add running average power limit (RAPL) metric collector
* Do not mess with the original configuration
* Corrected JSON config in numastatsMetric.md
* Added some debug output to numastatsMetric.go
* Fixed computing the number of physical packages for non-contiguous physical package IDs (e.g. on Ampere Altra Q80-30)
* Fix kernel panic for receiver config with missing receiver type
* Add receiver to gather remote IPMI sensor metrics
* Added config option to add ipmi-sensors command line options
* Add documentation for IPMI receiver
* Update to latest version of included go modules
* Add go.mod to App dependency
* Try to use common metric tags across hardware vendors
* Add IPMI metric: current
* remove prefix enumeration like 01-...
* Add IPMI receiver example configuration to receivers.json
* Minimal formatting changes
* Add hostlist package
* Added tests for hostlist Expand()
* Use package hostlist to expand a host list
* Some servers return "ConsumedPowerWatt":65535 instead of "ConsumedPowerWatt":null
* Updated to latest package versions
* Do not allow unknown fields in JSON configuration file
* Add workflow to customize packages to docs
* NFS I/O Stats Collector (#91)
* Initial version
* Delete values for vanished mount points and comments
* Fix for Likwid collector (#95)
* Run LIKWID in separate thread and check metric type
* Change LIKWID collector documentation to use 'type' instead of 'scope'
* Re-initialize LIKWID after one read is missing due to lock toggle
* Register cc-metric-collector at Zenodo (#93)
* Add initial version of Zenodo project file
* Orcid ID added
* Update .zenodo.json
Co-authored-by: Holger Obermaier <holger.obermaier@kit.edu>
* Update ipmiMetric.go
* Use latest LIKWID version for builds
* Update README.md
* Remove development stuff from Makefile
* Add Requires(pre) to RPM SPEC file
* Use curly brackets in packaging make targets
* Fix for LIKWID collector with separate measurement thread and inotify watcher on the LIKWID lock (#97)
* Debian does not like underscores in the version
* Update cc-metric-collector.service
Remove dependency services not used by cc-metric-collector
* Add new requirements to module file
* Use customcmd commands if they did not error. (#101)
* Merge develop and main (#99)
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
* Update likwid_perfgroup_to_cc_config.py
* Use customcmd commands if they did not error.
---------
Co-authored-by: Thomas Gruber <Thomas.Roehl@googlemail.com>
Co-authored-by: fodinabor <5982050+fodinabor@users.noreply.github.com>
* cpustatMetric.go: Use derived values instead of absolute values
The values in /proc/stat are absolute counters accumulated since the
system was booted. To obtain the CPU utilization, the change in the
counters over time must be computed. Using only the absolute values
hides changes in utilization, especially once the counters have
grown large.
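The derivation described for /proc/stat can be illustrated with a minimal sketch. It is not the code in cpustatMetric.go: the names `cpuTimes`, `parseCPULine` and `utilization` are hypothetical, and the sketch assumes that the busy share of the elapsed jiffies between two samples is an acceptable definition of utilization.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuTimes holds cumulative jiffy counters taken from one "cpu" line of
// /proc/stat (hypothetical helper type, not part of cc-metric-collector).
type cpuTimes struct {
	busy  uint64 // all counted fields except idle and iowait
	total uint64 // all counted fields
}

// parseCPULine parses a line such as "cpu0 100 0 50 800 50 0 0 0 0 0".
// Only the first eight counters (user .. steal) are summed, because the
// guest counters are already included in user and nice.
func parseCPULine(line string) (cpuTimes, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || !strings.HasPrefix(fields[0], "cpu") {
		return cpuTimes{}, fmt.Errorf("not a cpu line: %q", line)
	}
	var t cpuTimes
	for i, f := range fields[1:] {
		if i >= 8 {
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuTimes{}, err
		}
		t.total += v
		if i != 3 && i != 4 { // index 3 is idle, index 4 is iowait
			t.busy += v
		}
	}
	return t, nil
}

// utilization returns the busy share in percent between two samples.
// It reports false when there is no usable previous state, for example
// on the first read or after a counter reset.
func utilization(prev, cur cpuTimes) (float64, bool) {
	if cur.total <= prev.total || cur.busy < prev.busy {
		return 0, false
	}
	return 100 * float64(cur.busy-prev.busy) / float64(cur.total-prev.total), true
}

func main() {
	prev, _ := parseCPULine("cpu0 100 0 50 800 50 0 0 0 0 0")
	cur, _ := parseCPULine("cpu0 160 0 80 900 60 0 0 0 0 0")
	if u, ok := utilization(prev, cur); ok {
		fmt.Printf("cpu0 utilization: %.1f%%\n", u) // prints "cpu0 utilization: 45.0%"
	}
}
```

With the absolute counters alone, both samples would report a mostly idle CPU; only the difference between them shows that 90 of the last 200 jiffies were busy.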
* Add new collector for /proc/schedstat
The `schedstat` collector reads data from /proc/schedstat and calculates
a load value, separated by hwthread. This might be useful to detect bad
cpu pinning on shared nodes etc.
Co-authored-by: Michael Schwarz <post@michael-schwarz.name>
* Add collector for AMD ROCm SMI metrics
* Fix import path
* Fix imports
* Remove Board Number
* store GPU index explicitly
* Remove board number from description
* Rename CPU to hardware thread, write some comments
* Do renaming in other parts
* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
* Provide info to CollectorManager whether the collector can be executed in parallel with others
* Split serial and parallel collectors. Read in parallel first
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
* Update LustreCollector
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* added beegfs collectors to collectors/README.md
* added beegfs collectors and docs
* added new beegfs collectors to AvailableCollectors list
* Feedback implemented
* changed error type
* changed error to only return
* changed beegfs lookup path
* fixed typo in md files
Co-authored-by: Mehmet Soysal <mehmet.soysal@kit.edu>