Holger Obermaier
7bb80780e0
Fixed computing number of physical packages for non-continuous physical package IDs (e.g. on Ampere Altra Q80-30)
2022-11-16 14:58:11 +01:00
Holger Obermaier
e66d52bb32
* Corrected json config in numastatsMetric.md
* Added some debug output to numastatsMetric.go
2022-11-16 14:10:25 +01:00
Holger Obermaier
9840d0193d
Do not mess with the original configuration
2022-11-16 09:37:40 +01:00
Holger Obermaier
ce7eef8d30
Add running average power limit (RAPL) metric collector
2022-11-15 17:15:27 +01:00
Holger Obermaier
92e45ca62c
Add running average power limit (RAPL) metric collector
2022-11-15 17:09:26 +01:00
Holger Obermaier
fd10a279fc
Corrected some typos
2022-11-14 09:35:02 +01:00
Holger Obermaier
9e63d0ea59
Run ipmitool asynchronously. Improved error handling.
2022-11-11 16:16:14 +01:00
Holger Obermaier
deb1bcfa2f
Correct typo: /proc/stats -> /proc/stat
2022-10-13 15:01:39 +02:00
Holger Obermaier
7a67d5e25f
Check if at least one CPU with frequency information was detected
2022-10-13 14:53:55 +02:00
Thomas Gruber
9ae0806aa9
Add collector for monitoring the execution of cc-metric-collector itself ( #81 )
* Add collector to monitor execution of cc-metric-collector itself
* Register SelfCollector
* Fix import paths for moved packages
2022-10-10 12:18:52 +02:00
Thomas Gruber
4bd71224df
move maybe-usable-by-other-cc-components to pkg. Fix all files to use the new paths ( #88 )
2022-10-10 11:53:11 +02:00
Thomas Roehl
6bf3bfd10a
Use lower case for error strings in RocmSmiCollector
2022-10-09 17:05:49 +02:00
Thomas Gruber
0fbff00996
Replace ioutils with os and io ( #87 )
2022-10-09 17:03:38 +02:00
Thomas Roehl
8849824ba9
Remove useless prints from MemstatCollector
2022-10-09 02:56:15 +02:00
Thomas Roehl
ed511b7c09
Fix memstat collector with numa_stats option
2022-09-28 15:09:36 +02:00
Thomas Roehl
58461f1f72
Fix clock frequency coming from LikwidCollector and update docs
2022-09-09 20:01:21 +02:00
Thomas Röhl
c09d8fb118
InfiniBandCollector: Scale raw readings from octets to bytes
2022-09-09 19:27:20 +02:00
oscarminus
8a3446a596
cpustatMetric.go: Use derived values instead of absolute values ( #83 )
* cpustatMetric.go: Use derived values instead of absolute values
The values in /proc/stat are absolute counters accumulated since the
system booted. To obtain the CPU utilization, the change in the
counters must be computed over the measurement interval. Using only the
absolute values hides changes in utilization, especially once the
counters have grown large.
* Add new collector for /proc/schedstat
The `schedstat` collector reads data from /proc/schedstat and calculates
a load value per hwthread. This can help detect bad CPU pinning on
shared nodes, for example.
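The reasoning behind the cpustatMetric.go change above can be illustrated with a minimal sketch. It is not taken from the collector itself: the `cpuSample` type, the `utilization` function, and the choice to count `iowait` as idle time are assumptions made for this example, and parsing of /proc/stat is left out.

```go
package main

import "fmt"

// cpuSample is a hypothetical container for the cumulative counters of one
// "cpu" line in /proc/stat. The values only ever grow after boot.
type cpuSample struct {
	user, nice, system, idle, iowait, irq, softirq, steal uint64
}

// total returns the sum of all counters in the sample.
func (s cpuSample) total() uint64 {
	return s.user + s.nice + s.system + s.idle + s.iowait + s.irq + s.softirq + s.steal
}

// utilization returns the busy fraction between two samples.
// The second return value is false when no valid delta exists, for
// example on the first read or after a counter reset.
// Assumption: idle and iowait together count as "not busy".
func utilization(prev, cur cpuSample) (float64, bool) {
	t0, t1 := prev.total(), cur.total()
	idle0, idle1 := prev.idle+prev.iowait, cur.idle+cur.iowait
	if t1 <= t0 || idle1 < idle0 {
		return 0, false
	}
	dTotal := float64(t1 - t0)
	dIdle := float64(idle1 - idle0)
	return (dTotal - dIdle) / dTotal, true
}

func main() {
	// Two made-up samples taken one interval apart.
	prev := cpuSample{user: 1000, system: 500, idle: 8500}
	cur := cpuSample{user: 1075, system: 525, idle: 8600}

	if u, ok := utilization(prev, cur); ok {
		fmt.Printf("busy fraction in interval: %.2f\n", u)
	}
}
```

With these made-up numbers the interval is half busy, while the ratio of the absolute counters in `cur` would suggest roughly 16%, which is the effect the commit message describes.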
Co-authored-by: Michael Schwarz <post@michael-schwarz.name>
2022-09-07 14:13:06 +02:00
Thomas Roehl
ea33d45d8e
Fix link to docs of NumastatsCollector
2022-07-27 17:50:15 +02:00
Thomas Roehl
b2bc7b95d3
Change unit of CpufreqCollector to Hz. That's what the sysfs outputs
2022-07-12 11:58:37 +02:00
Thomas Roehl
addbfd40a1
Fix for NvidiaCollector when devices are not in MIG mode
2022-07-11 13:05:15 +02:00
Thomas Roehl
bef807dd44
Fix serial number in rocmCollector
2022-06-05 15:53:39 +02:00
Thomas Roehl
659d0115c0
Use http instead of ftp to download likwid
2022-06-05 15:50:04 +02:00
Thomas Gruber
e13695307f
AMD ROCm SMI collector ( #77 )
* Add collector for AMD ROCm SMI metrics
* Fix import path
* Fix imports
* Remove Board Number
* store GPU index explicitly
* Remove board number from description
2022-05-25 15:55:43 +02:00
Thomas Roehl
ad5dbd85ea
Minor updates for Makefiles
2022-05-25 15:45:21 +02:00
Thomas Roehl
500685672b
Option to use MIG slice name as subtype-id in NvidiaCollector
2022-05-13 15:26:47 +02:00
Thomas Roehl
d4c89a4206
Option to use MIG UUID as subtype-id in NvidiaCollector
2022-05-13 14:34:32 +02:00
Thomas Gruber
826f364772
CC topology module update ( #76 )
* Rename CPU to hardware thread, write some comments
* Do renaming in other parts
* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
2022-05-13 14:28:07 +02:00
Thomas Gruber
5df550b208
Update NvidiaCollector with new metrics, MIG and NvLink support ( #75 )
2022-05-13 14:11:55 +02:00
Thomas Gruber
5c34805918
Collectors in parallel ( #74 )
* Provide info to CollectorManager whether the collector can be executed in parallel with others
* Split serial and parallel collectors. Read in parallel first
2022-05-13 14:10:39 +02:00
Thomas Gruber
1db5f3b29a
Rename cpu type to hwthread ( #69 )
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
2022-05-13 14:09:45 +02:00
Thomas Roehl
9886f14d14
Check readability of sensor files in TempCollector
2022-05-13 13:32:54 +02:00
Thomas Roehl
857903be2b
Skip disks in DiskstatCollector that have size=0
2022-05-13 13:31:22 +02:00
Thomas Roehl
8068e59818
Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73
2022-05-13 13:14:47 +02:00
Thomas Roehl
38d4e0a730
Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop
2022-05-04 11:54:55 +02:00
Thomas Roehl
54d14519ca
Skip mount points in DiskstatCollector if statfs() call does not work (bind mounts, ...)
2022-05-04 11:54:34 +02:00
Holger Obermaier
fb6f6a4daa
Fix GPFS collector last state handling
2022-05-02 16:57:19 +02:00
Thomas Roehl
017cd58247
Updating page for LikwidCollector
2022-04-05 10:57:09 +02:00
Thomas Roehl
7b098e0b1b
Fix for missing metrics in LikwidCollector if hwthread is inactive
2022-04-04 15:16:11 +02:00
Thomas Roehl
5d25a7bf12
Add units to InfiniBandCollector
2022-04-01 17:14:26 +02:00
Thomas Roehl
83b4343310
Likwid receives signal at first Read, check when re-initializing
2022-04-01 17:10:31 +02:00
Thomas Gruber
2a014b6fba
Read unit of values from /proc/meminfo ( #68 )
2022-03-31 11:56:31 +02:00
Thomas Roehl
50479f9325
Move all LIKWID related stuff to late initialization routine
2022-03-24 18:12:23 +01:00
Thomas Roehl
e0e91844bc
Use late initialization of LIKWID and catch access daemon death. Fixes #70 and fixes #71 .
2022-03-24 17:56:51 +01:00
Thomas Roehl
296225f3a8
Always export all metrics in NfsCollectors
2022-03-24 13:50:35 +01:00
Thomas Roehl
b66fdd1436
Add missing socket->thread_id map for LikwidCollector
2022-03-16 19:04:39 +01:00
Thomas Gruber
c182d295f4
Fix staticcheck warnings ( #66 )
2022-03-15 16:38:20 +01:00
Thomas Gruber
aa1afd745e
Derived metrics ( #65 )
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
* Update LustreCollector
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-15 16:09:47 +01:00
Holger Obermaier
992b19d354
Move unit tag to meta data tags
2022-03-11 14:47:18 +01:00
Holger Obermaier
0b08ca9ae0
Simplified iota usage
2022-03-11 14:09:22 +01:00