Commit Graph

236 Commits

Author SHA1 Message Date
Holger Obermaier 7bb80780e0 Fixed computing number of physical packages for non-continuous physical package IDs (e.g. on Ampere Altra Q80-30) 2022-11-16 14:58:11 +01:00
Holger Obermaier e66d52bb32 * Corrected json config in numastatsMetric.md
* Added some debug output to numastatsMetric.go
2022-11-16 14:10:25 +01:00
Holger Obermaier 9840d0193d Do not mess with the original configuration 2022-11-16 09:37:40 +01:00
Holger Obermaier ce7eef8d30 Add running average power limit (RAPL) metric collector 2022-11-15 17:15:27 +01:00
Holger Obermaier 92e45ca62c Add running average power limit (RAPL) metric collector 2022-11-15 17:09:26 +01:00
Holger Obermaier fd10a279fc Corrected some typos 2022-11-14 09:35:02 +01:00
Holger Obermaier 9e63d0ea59 Run ipmitool asynchronously. Improved error handling. 2022-11-11 16:16:14 +01:00
Holger Obermaier deb1bcfa2f Correct typo: /proc/stats -> /proc/stat 2022-10-13 15:01:39 +02:00
Holger Obermaier 7a67d5e25f Check if at least one CPU with frequency information was detected 2022-10-13 14:53:55 +02:00
Thomas Gruber 9ae0806aa9 Add collector for monitoring the execution of cc-metric-collector itself (#81)
* Add collector to monitor execution of cc-metric-collector itself

* Register SelfCollector

* Fix import paths for moved packages
2022-10-10 12:18:52 +02:00
Thomas Gruber 4bd71224df move maybe-usable-by-other-cc-components to pkg. Fix all files to use the new paths (#88) 2022-10-10 11:53:11 +02:00
Thomas Roehl 6bf3bfd10a Use lower case for error strings in RocmSmiCollector 2022-10-09 17:05:49 +02:00
Thomas Gruber 0fbff00996 Replace ioutils with os and io (#87) 2022-10-09 17:03:38 +02:00
Thomas Roehl 8849824ba9 Remove useless prints from MemstatCollector 2022-10-09 02:56:15 +02:00
Thomas Roehl ed511b7c09 Fix memstat collector with numa_stats option 2022-09-28 15:09:36 +02:00
Thomas Roehl 58461f1f72 Fix clock frequency coming from LikwidCollector and update docs 2022-09-09 20:01:21 +02:00
Thomas Röhl c09d8fb118 InfiniBandCollector: Scale raw readings from octets to bytes 2022-09-09 19:27:20 +02:00
oscarminus 8a3446a596 cpustatMetric.go: Use derived values instead of absolute values (#83)
* cpustatMetric.go: Use derived values instead of absolute values

  The values in /proc/stat are absolute counters accumulated since the
  system booted. To obtain the CPU utilization, the change in each
  counter must be taken over the elapsed time. Reporting only the
  absolute values hides changes in utilization, especially once the
  counters have grown large.

* Add new collector for /proc/schedstat

  The `schedstat` collector reads data from /proc/schedstat and calculates
  a load value, separated by hwthread. This might be useful to detect bad
  cpu pinning on shared nodes etc.

Co-authored-by: Michael Schwarz <post@michael-schwarz.name>
2022-09-07 14:13:06 +02:00
Thomas Roehl ea33d45d8e Fix link to docs of NumastatsCollector 2022-07-27 17:50:15 +02:00
Thomas Roehl b2bc7b95d3 Change unit of CpufreqCollector to Hz. That's what the sysfs outputs 2022-07-12 11:58:37 +02:00
Thomas Roehl addbfd40a1 Fix for NvidiaCollector when devices are not in MiG mode 2022-07-11 13:05:15 +02:00
Thomas Roehl bef807dd44 Fix serial number in rocmCollector 2022-06-05 15:53:39 +02:00
Thomas Roehl 659d0115c0 Use http instead of ftp to download likwid 2022-06-05 15:50:04 +02:00
Thomas Gruber e13695307f AMD ROCm SMI collector (#77)
* Add collector for AMD ROCm SMI metrics

* Fix import path

* Fix imports

* Remove Board Number

* store GPU index explicitly

* Remove board number from description
2022-05-25 15:55:43 +02:00
Thomas Roehl ad5dbd85ea Minor updates for Makefiles 2022-05-25 15:45:21 +02:00
Thomas Roehl 500685672b Option to use MIG slice name as subtype-id in NvidiaCollector 2022-05-13 15:26:47 +02:00
Thomas Roehl d4c89a4206 Option to use MIG UUID as subtype-id in NvidiaCollector 2022-05-13 14:34:32 +02:00
Thomas Gruber 826f364772 CC topology module update (#76)
* Rename CPU to hardware thread, write some comments

* Do renaming in other parts

* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
2022-05-13 14:28:07 +02:00
Thomas Gruber 5df550b208 Update NvidiaCollector with new metrics, MIG and NvLink support (#75) 2022-05-13 14:11:55 +02:00
Thomas Gruber 5c34805918 Collectors in parallel (#74)
* Provide info to CollectorManager whether the collector can be executed in parallel with others

* Split serial and parallel collectors. Read in parallel first
2022-05-13 14:10:39 +02:00
Thomas Gruber 1db5f3b29a Rename cpu type to hwthread (#69)
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
2022-05-13 14:09:45 +02:00
Thomas Roehl 9886f14d14 Check readability of sensor files in TempCollector 2022-05-13 13:32:54 +02:00
Thomas Roehl 857903be2b Skip disks in DiskstatCollector that have size=0 2022-05-13 13:31:22 +02:00
Thomas Roehl 8068e59818 Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73 2022-05-13 13:14:47 +02:00
Thomas Roehl 38d4e0a730 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-05-04 11:54:55 +02:00
Thomas Roehl 54d14519ca Skip mount points in DiskstatCollector if statfs() call does not work (bind mounts, ...) 2022-05-04 11:54:34 +02:00
Holger Obermaier fb6f6a4daa Fix GPFS collector last state handling 2022-05-02 16:57:19 +02:00
Thomas Roehl 017cd58247 Updating page for LikwidCollector 2022-04-05 10:57:09 +02:00
Thomas Roehl 7b098e0b1b Fix for missing metrics in LikwidCollector if hwthread is inactive 2022-04-04 15:16:11 +02:00
Thomas Roehl 5d25a7bf12 Add units to InfiniBandCollector 2022-04-01 17:14:26 +02:00
Thomas Roehl 83b4343310 Likwid receives signal at first Read, check when re-initializing 2022-04-01 17:10:31 +02:00
Thomas Gruber 2a014b6fba Read unit of values from /proc/meminfo (#68) 2022-03-31 11:56:31 +02:00
Thomas Roehl 50479f9325 Move all LIKWID related stuff to late initialization routine 2022-03-24 18:12:23 +01:00
Thomas Roehl e0e91844bc Use late initialization of LIKWID and catch access daemon death. Fixes #70 and fixes #71. 2022-03-24 17:56:51 +01:00
Thomas Roehl 296225f3a8 Always export all metrics in NfsCollectors 2022-03-24 13:50:35 +01:00
Thomas Roehl b66fdd1436 Add missing socket->thread_id map for LikwidCollector 2022-03-16 19:04:39 +01:00
Thomas Gruber c182d295f4 Fix staticcheck warnings (#66) 2022-03-15 16:38:20 +01:00
Thomas Gruber aa1afd745e Derived metrics (#65)
* Add time-based derivatives (e.g. bandwidth) to some collectors

* Add documentation

* Add comments

* Fix: Only compute rates with a valid previous state

* Only compute rates with a valid previous state

* Define const values for net/dev fields

* Set default config values

* Add comments

* Refactor: Consolidate data structures

* Refactor: Consolidate data structures

* Refactor: Avoid struct deep copy

* Refactor: Avoid redundant tag maps

* Refactor: Use int64 type for absolute values

* Update LustreCollector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-15 16:09:47 +01:00
Holger Obermaier 992b19d354 Move unit tag to meta data tags 2022-03-11 14:47:18 +01:00
Holger Obermaier 0b08ca9ae0 Simplified iota usage 2022-03-11 14:09:22 +01:00