Commit Graph

216 Commits

Author SHA1 Message Date
Thomas Roehl
addbfd40a1 Fix for NvidiaCollector when devices are not in MiG mode 2022-07-11 13:05:15 +02:00
Thomas Roehl
bef807dd44 Fix serial number in rocmCollector 2022-06-05 15:53:39 +02:00
Thomas Roehl
659d0115c0 Use http instead of ftp to download likwid 2022-06-05 15:50:04 +02:00
Thomas Gruber
e13695307f
AMD ROCm SMI collector (#77)
* Add collector for AMD ROCm SMI metrics

* Fix import path

* Fix imports

* Remove Board Number

* store GPU index explicitly

* Remove board number from description
2022-05-25 15:55:43 +02:00
Thomas Roehl
ad5dbd85ea Minor updates for Makefiles 2022-05-25 15:45:21 +02:00
Thomas Roehl
500685672b Option to use MIG slice name as subtype-id in NvidiaCollector 2022-05-13 15:26:47 +02:00
Thomas Roehl
d4c89a4206 Option to use MIG UUID as subtype-id in NvidiaCollector 2022-05-13 14:34:32 +02:00
Thomas Gruber
826f364772
CC topology module update (#76)
* Rename CPU to hardware thread, write some comments

* Do renaming in other parts

* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
2022-05-13 14:28:07 +02:00
Thomas Gruber
5df550b208
Update NvidiaCollector with new metrics, MIG and NvLink support (#75) 2022-05-13 14:11:55 +02:00
Thomas Gruber
5c34805918
Collectors in parallel (#74)
* Provide info to CollectorManager whether the collector can be executed in parallel with others

* Split serial and parallel collectors. Read in parallel first
2022-05-13 14:10:39 +02:00
Thomas Gruber
1db5f3b29a
Rename cpu type to hwthread (#69)
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
2022-05-13 14:09:45 +02:00
Thomas Roehl
9886f14d14 Check readability of sensor files in TempCollector 2022-05-13 13:32:54 +02:00
Thomas Roehl
857903be2b Skip disks in DiskstatCollector that have size=0 2022-05-13 13:31:22 +02:00
Thomas Roehl
8068e59818 Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73 2022-05-13 13:14:47 +02:00
Thomas Roehl
38d4e0a730 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-05-04 11:54:55 +02:00
Thomas Roehl
54d14519ca Skip mount points in DiskstatCollector if statfs() call does not work (bind mounts, ...) 2022-05-04 11:54:34 +02:00
Holger Obermaier
fb6f6a4daa Fix GPFS collector last state handling 2022-05-02 16:57:19 +02:00
Thomas Roehl
017cd58247 Updating page for LikwidCollector 2022-04-05 10:57:09 +02:00
Thomas Roehl
7b098e0b1b Fix for missing metrics in LikwidCollector if hwthread is inactive 2022-04-04 15:16:11 +02:00
Thomas Roehl
5d25a7bf12 Add units to InfiniBandCollector 2022-04-01 17:14:26 +02:00
Thomas Roehl
83b4343310 Likwid receives signal at first Read, check when re-initializing 2022-04-01 17:10:31 +02:00
Thomas Gruber
2a014b6fba
Read unit of values from /proc/meminfo (#68) 2022-03-31 11:56:31 +02:00
Thomas Roehl
50479f9325 Move all LIKWID related stuff to late initialization routine 2022-03-24 18:12:23 +01:00
Thomas Roehl
e0e91844bc Use late initialization of LIKWID and catch access daemon death. Fixes #70 and fixes #71. 2022-03-24 17:56:51 +01:00
Thomas Roehl
296225f3a8 Always export all metrics in NfsCollectors 2022-03-24 13:50:35 +01:00
Thomas Roehl
b66fdd1436 Add missing socket->thread_id map for LikwidCollector 2022-03-16 19:04:39 +01:00
Thomas Gruber
c182d295f4
Fix staticcheck warnings (#66) 2022-03-15 16:38:20 +01:00
Thomas Gruber
aa1afd745e
Derived metrics (#65)
* Add time-based derivatives (e.g. bandwidth) to some collectors

* Add documentation

* Add comments

* Fix: Only compute rates with a valid previous state

* Only compute rates with a valid previous state

* Define const values for net/dev fields

* Set default config values

* Add comments

* Refactor: Consolidate data structures

* Refactor: Consolidate data structures

* Refactor: Avoid struct deep copy

* Refactor: Avoid redundant tag maps

* Refactor: Use int64 type for absolute values

* Update LustreCollector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-15 16:09:47 +01:00
Holger Obermaier
992b19d354 Move unit tag to meta data tags 2022-03-11 14:47:18 +01:00
Holger Obermaier
0b08ca9ae0 Simplified iota usage 2022-03-11 14:09:22 +01:00
Thomas Gruber
f6dae7c013
Derived metrics (#57)
* Add time-based derivatives (e.g. bandwidth) to some collectors

* Add documentation

* Add comments

* Fix: Only compute rates with a valid previous state

* Only compute rates with a valid previous state

* Define const values for net/dev fields

* Set default config values

* Add comments

* Refactor: Consolidate data structures

* Refactor: Consolidate data structures

* Refactor: Avoid struct deep copy

* Refactor: Avoid redundant tag maps

* Refactor: Use int64 type for absolute values

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-11 13:48:18 +01:00
Thomas Gruber
73f22c1041
Refactoring of LikwidCollector and metric units (#62)
* Reduce complexity of LikwidCollector and allow metric units

* Add unit to LikwidCollector docu and fix some typos

* Make library path configurable
2022-03-11 13:43:17 +01:00
Thomas Roehl
e7f7e68095 Use GBytes as unit for large memory numbers 2022-03-09 11:05:26 +01:00
Thomas Gruber
f2486abeab
Just download LIKWID to get the headers (#54)
* Just download LIKWID to get the headers

* Remove perl-Data-Dumper from BuildRequires, only required by LIKWID build
2022-03-05 17:30:40 +01:00
Thomas Gruber
21864e0ac4
Change default GpfsCollector command to mmpmon (#53)
* Set default cmd to 'mmpmon'

* Reuse looked up path

* Cast const to string
2022-03-05 14:42:04 +01:00
Mehmet Soysal
547bc0461f
Beegfs collector (#50)
* added beegfs collectors to collectors/README.md

* added beegfs collectors and docs

* added new beegfs collectors to AvailableCollectors list

* Feedback implemented

* changed error type

* changed error to only return

* changed beegfs lookup path

* fixed typo in md files

Co-authored-by: Mehmet Soysal <mehmet.soysal@kit.edu>
2022-03-04 14:35:47 +01:00
Thomas Roehl
f1d2828e1d Fix error print in LustreCollector 2022-03-04 11:32:10 +01:00
Holger Obermaier
db04c8fbae Removed infinibandPerfQueryMetric.go. infinibandMetric.go offers the same functionality without requiring root privileges. 2022-03-03 15:52:50 +01:00
Thomas Roehl
60de21c41e Switch access mode of LikwidCollector in config file 2022-03-03 13:03:58 +01:00
Thomas Roehl
276c00442a Add option to LustreCollector to call lctl with sudo 2022-03-03 13:02:00 +01:00
Thomas Roehl
092e7f6a71 Add section on how to temporarily disable LIKWID access to page 2022-03-02 13:54:43 +01:00
Holger Obermaier
a5325a6535
GitHub actions (#51)
Create new GitHub action which uses unmodified AlmaLinux Docker image
2022-03-01 15:39:26 +01:00
Holger Obermaier
33fec95eac Additional comments 2022-02-28 12:16:48 +01:00
Holger Obermaier
2c08e53be4 Additional comments 2022-02-28 09:57:26 +01:00
Thomas Roehl
bac1f18b1d Add samples for collectors, sinks and receivers 2022-02-25 13:47:19 +01:00
Thomas Gruber
c8bca59de4
Numa-aware memstat collector (#45) 2022-02-24 18:27:05 +01:00
Thomas Roehl
d542f32baa Mention likwid config script in LikwidCollector README 2022-02-22 17:46:44 +01:00
Thomas Roehl
66275ecf74 DiskstatCollector: cast part_max_used metric to int 2022-02-22 15:50:49 +01:00
Thomas Roehl
eed9cd227c Remove doubled import and remove merge artifacts 2022-02-21 14:50:11 +01:00
Thomas Roehl
24a2c9992f Merge branch 'develop' into main 2022-02-21 14:32:24 +01:00