Commit Graph

271 Commits

Author SHA1 Message Date
Thomas Roehl
58461f1f72 Fix clock frequency coming from LikwidCollector and update docs 2022-09-09 20:01:21 +02:00
Thomas Röhl
c09d8fb118 InfiniBandCollector: Scale raw readings from octets to bytes 2022-09-09 19:27:20 +02:00
oscarminus
8a3446a596 cpustatMetric.go: Use derived values instead of absolute values (#83)
* cpustatMetric.go: Use derived values instead of absolute values

  The values in /proc/stat are absolute counters accumulated since the
  system was booted. To obtain the CPU utilization, the change in the
  counters must be derived over time. Reporting only the absolute
  values hides changes in utilization, especially once the counters
  have grown large.

* Add new collector for /proc/schedstat

  The `schedstat` collector reads data from /proc/schedstat and calculates
  a load value, separated by hwthread. This might be useful to detect bad
  cpu pinning on shared nodes etc.

Co-authored-by: Michael Schwarz <post@michael-schwarz.name>
2022-09-07 14:13:06 +02:00
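The derivation described in the commit message above can be sketched as follows. This is a minimal illustration, not the code in cpustatMetric.go: the helper names `parseCPULine` and `busyPercent` are hypothetical, the sample lines are made up, and treating only the `idle` field as idle time is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPULine parses one "cpu" line of /proc/stat, for example
// "cpu0 100 0 50 800 0 0 0 0 0 0", into its name and absolute counters.
// (Hypothetical helper for illustration.)
func parseCPULine(line string) (string, []uint64, error) {
	parts := strings.Fields(line)
	if len(parts) < 5 || !strings.HasPrefix(parts[0], "cpu") {
		return "", nil, fmt.Errorf("not a cpu line: %q", line)
	}
	counters := make([]uint64, 0, len(parts)-1)
	for _, p := range parts[1:] {
		v, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return "", nil, err
		}
		counters = append(counters, v)
	}
	return parts[0], counters, nil
}

// busyPercent derives the utilization between two samples of the same line.
// Only the first eight fields (user .. steal) are summed, because the guest
// fields are already accounted for in user and nice. Field 3 is idle.
func busyPercent(prev, cur []uint64) (float64, bool) {
	n := len(cur)
	if len(prev) < n {
		n = len(prev)
	}
	if n > 8 {
		n = 8
	}
	if n < 4 {
		return 0, false
	}
	var total, idle uint64
	for i := 0; i < n; i++ {
		if cur[i] < prev[i] {
			return 0, false // counter went backwards, skip this interval
		}
		d := cur[i] - prev[i]
		total += d
		if i == 3 {
			idle = d
		}
	}
	if total == 0 {
		return 0, false // no time passed between the samples
	}
	return 100 * float64(total-idle) / float64(total), true
}

func main() {
	// Two made-up samples of the same hardware thread, taken some time apart.
	_, prev, _ := parseCPULine("cpu0 100 0 50 800 0 0 0 0 0 0")
	_, cur, _ := parseCPULine("cpu0 160 0 70 820 0 0 0 0 0 0")
	if busy, ok := busyPercent(prev, cur); ok {
		fmt.Printf("cpu0 busy: %.1f%%\n", busy)
	}
}
```

Because only the differences between samples enter the result, the size of the absolute counters no longer matters.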
Thomas Roehl
ea33d45d8e Fix link to docs of NumastatsCollector 2022-07-27 17:50:15 +02:00
Thomas Roehl
b2bc7b95d3 Change unit of CpufreqCollector to Hz. That's what the sysfs outputs 2022-07-12 11:58:37 +02:00
Thomas Roehl
addbfd40a1 Fix for NvidiaCollector when devices are not in MiG mode 2022-07-11 13:05:15 +02:00
Thomas Roehl
bef807dd44 Fix serial number in rocmCollector 2022-06-05 15:53:39 +02:00
Thomas Roehl
659d0115c0 Use http instead of ftp to download likwid 2022-06-05 15:50:04 +02:00
Thomas Gruber
e13695307f
AMD ROCm SMI collector (#77)
* Add collector for AMD ROCm SMI metrics

* Fix import path

* Fix imports

* Remove Board Number

* store GPU index explicitly

* Remove board number from description
2022-05-25 15:55:43 +02:00
Thomas Roehl
ad5dbd85ea Minor updates for Makefiles 2022-05-25 15:45:21 +02:00
Thomas Roehl
500685672b Option to use MIG slice name as subtype-id in NvidiaCollector 2022-05-13 15:26:47 +02:00
Thomas Roehl
d4c89a4206 Option to use MIG UUID as subtype-id in NvidiaCollector 2022-05-13 14:34:32 +02:00
Thomas Gruber
826f364772
CC topology module update (#76)
* Rename CPU to hardware thread, write some comments

* Do renaming in other parts

* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
2022-05-13 14:28:07 +02:00
Thomas Gruber
5df550b208
Update NvidiaCollector with new metrics, MIG and NvLink support (#75) 2022-05-13 14:11:55 +02:00
Thomas Gruber
5c34805918
Collectors in parallel (#74)
* Provide info to CollectorManager whether the collector can be executed in parallel with others

* Split serial and parallel collectors. Read in parallel first
2022-05-13 14:10:39 +02:00
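One way to read "split serial and parallel collectors, read in parallel first" is sketched below. It is an assumption-laden illustration rather than the CollectorManager code: the `collector` interface, its `Parallel()` and `Read()` methods, and `fakeCollector` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// collector is a hypothetical interface; Parallel reports whether the
// collector may be read concurrently with others.
type collector interface {
	Parallel() bool
	Read() string
}

// fakeCollector stands in for a real collector in this sketch.
type fakeCollector struct {
	name     string
	parallel bool
}

func (f fakeCollector) Parallel() bool { return f.parallel }
func (f fakeCollector) Read() string   { return f.name }

// readAll reads all parallel-capable collectors concurrently, waits for
// them to finish, and then reads the remaining collectors one by one.
func readAll(cs []collector) []string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		out    []string
		serial []collector
	)
	for _, c := range cs {
		if !c.Parallel() {
			serial = append(serial, c)
			continue
		}
		wg.Add(1)
		go func(c collector) {
			defer wg.Done()
			r := c.Read()
			mu.Lock()
			out = append(out, r)
			mu.Unlock()
		}(c)
	}
	wg.Wait() // all parallel reads are done before serial ones start
	for _, c := range serial {
		out = append(out, c.Read())
	}
	return out
}

func main() {
	results := readAll([]collector{
		fakeCollector{"netstat", true},
		fakeCollector{"likwid", false},
		fakeCollector{"memstat", true},
	})
	fmt.Println(len(results), "results, last:", results[len(results)-1])
}
```

The order of the parallel results is not deterministic; only the position of the serial results after them is guaranteed in this sketch.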
Thomas Gruber
1db5f3b29a
Rename cpu type to hwthread (#69)
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
2022-05-13 14:09:45 +02:00
Thomas Roehl
9886f14d14 Check readability of sensor files in TempCollector 2022-05-13 13:32:54 +02:00
Thomas Roehl
857903be2b Skip disks in DiskstatCollector that have size=0 2022-05-13 13:31:22 +02:00
Thomas Roehl
8068e59818 Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73 2022-05-13 13:14:47 +02:00
Thomas Roehl
38d4e0a730 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-05-04 11:54:55 +02:00
Thomas Roehl
54d14519ca Skip mount points in DiskstatCollector if statfs() call does not work (bind mounts, ...) 2022-05-04 11:54:34 +02:00
Holger Obermaier
fb6f6a4daa Fix GPFS collector last state handling 2022-05-02 16:57:19 +02:00
Thomas Roehl
017cd58247 Updating page for LikwidCollector 2022-04-05 10:57:09 +02:00
Thomas Roehl
7b098e0b1b Fix for missing metrics in LikwidCollector if hwthread is inactive 2022-04-04 15:16:11 +02:00
Thomas Roehl
5d25a7bf12 Add units to InfiniBandCollector 2022-04-01 17:14:26 +02:00
Thomas Roehl
83b4343310 Likwid receives signal at first Read, check when re-initializing 2022-04-01 17:10:31 +02:00
Thomas Gruber
2a014b6fba
Read unit of values from /proc/meminfo (#68) 2022-03-31 11:56:31 +02:00
Thomas Roehl
50479f9325 Move all LIKWID related stuff to late initialization routine 2022-03-24 18:12:23 +01:00
Thomas Roehl
e0e91844bc Use late initialization of LIKWID and catch access daemon death. Fixes #70 and fixes #71. 2022-03-24 17:56:51 +01:00
Thomas Roehl
296225f3a8 Always export all metrics in NfsCollectors 2022-03-24 13:50:35 +01:00
Thomas Roehl
b66fdd1436 Add missing socket->thread_id map for LikwidCollector 2022-03-16 19:04:39 +01:00
Thomas Gruber
c182d295f4
Fix staticcheck warnings (#66) 2022-03-15 16:38:20 +01:00
Thomas Gruber
aa1afd745e
Derived metrics (#65)
* Add time-based derivatives (e.g. bandwidth) to some collectors

* Add documentation

* Add comments

* Fix: Only compute rates with a valid previous state

* Only compute rates with a valid previous state

* Define const values for net/dev fields

* Set default config values

* Add comments

* Refactor: Consolidate data structures

* Refactor: Consolidate data structures

* Refactor: Avoid struct deep copy

* Refactor: Avoid redundant tag maps

* Refactor: Use int64 type for absolute values

* Update LustreCollector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-15 16:09:47 +01:00
Holger Obermaier
992b19d354 Move unit tag to meta data tags 2022-03-11 14:47:18 +01:00
Holger Obermaier
0b08ca9ae0 Simplified iota usage 2022-03-11 14:09:22 +01:00
Thomas Gruber
f6dae7c013
Derived metrics (#57)
* Add time-based derivatives (e.g. bandwidth) to some collectors

* Add documentation

* Add comments

* Fix: Only compute rates with a valid previous state

* Only compute rates with a valid previous state

* Define const values for net/dev fields

* Set default config values

* Add comments

* Refactor: Consolidate data structures

* Refactor: Consolidate data structures

* Refactor: Avoid struct deep copy

* Refactor: Avoid redundant tag maps

* Refactor: Use int64 type for absolute values

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-11 13:48:18 +01:00
Thomas Gruber
73f22c1041
Refactoring of LikwidCollector and metric units (#62)
* Reduce complexity of LikwidCollector and allow metric units

* Add unit to LikwidCollector docu and fix some typos

* Make library path configurable
2022-03-11 13:43:17 +01:00
Thomas Roehl
e7f7e68095 Use GBytes as unit for large memory numbers 2022-03-09 11:05:26 +01:00
Thomas Gruber
f2486abeab
Just download LIKWID to get the headers (#54)
* Just download LIKWID to get the headers

* Remove perl-Data-Dumper from BuildRequires, only required by LIKWID build
2022-03-05 17:30:40 +01:00
Thomas Gruber
21864e0ac4
Change default GpfsCollector command to mmpmon (#53)
* Set default cmd to 'mmpmon'

* Reuse looked up path

* Cast const to string
2022-03-05 14:42:04 +01:00
Mehmet Soysal
547bc0461f
Beegfs collector (#50)
* added beegfs collectors to collectors/README.md

* added beegfs collectors and docs

* added new beegfs collectors to AvailableCollectors list

* Feedback implemented

* changed error type

* changed error to only return

* changed beegfs lookup path

* fixed typo in md files

Co-authored-by: Mehmet Soysal <mehmet.soysal@kit.edu>
2022-03-04 14:35:47 +01:00
Thomas Roehl
f1d2828e1d Fix error print in LustreCollector 2022-03-04 11:32:10 +01:00
Holger Obermaier
db04c8fbae Removed infinibandPerfQueryMetric.go. infinibandMetric.go offers the same functionality without requiring root privileges. 2022-03-03 15:52:50 +01:00
Thomas Roehl
60de21c41e Switch access mode of LikwidCollector in config file 2022-03-03 13:03:58 +01:00
Thomas Roehl
276c00442a Add option to LustreCollector to call lctl with sudo 2022-03-03 13:02:00 +01:00
Thomas Roehl
092e7f6a71 Add section on how to temporarily disable LIKWID access to page 2022-03-02 13:54:43 +01:00
Holger Obermaier
a5325a6535
GitHub actions (#51)
Create new GitHub action which uses unmodified AlmaLinux Docker image
2022-03-01 15:39:26 +01:00
Holger Obermaier
33fec95eac Additional comments 2022-02-28 12:16:48 +01:00
Holger Obermaier
2c08e53be4 Additional comments 2022-02-28 09:57:26 +01:00
Thomas Roehl
bac1f18b1d Add samples for collectors, sinks and receivers 2022-02-25 13:47:19 +01:00
Thomas Gruber
c8bca59de4
Numa-aware memstat collector (#45) 2022-02-24 18:27:05 +01:00
Thomas Roehl
d542f32baa Mention likwid config script in LikwidCollector README 2022-02-22 17:46:44 +01:00
Thomas Roehl
66275ecf74 DiskstatCollector: cast part_max_used metric to int 2022-02-22 15:50:49 +01:00
Thomas Roehl
eed9cd227c Remove doubled import and remove merge artifacts 2022-02-21 14:50:11 +01:00
Thomas Roehl
24a2c9992f Merge branch 'develop' into main 2022-02-21 14:32:24 +01:00
Thomas Gruber
f683f2e6da
Dynamically load liblikwid (#40)
* Check whether LIKWID library is present

* Generalize nan_to_zero option to invalid_to_zero including -Inf, +Inf and NaN

* Remove double error printing and return if measurements do not work
2022-02-21 13:29:33 +01:00
Thomas Gruber
435528fa97
Split diskstat Collector (#38)
* Split diskstats (free, total space) and iostats (reads, writes, ...

* Add iostat Collector to CollectorManager
2022-02-21 12:44:26 +01:00
Holger Obermaier
65c3106af2 Remove tags for num cores and packages 2022-02-18 16:59:59 +01:00
Holger Obermaier
635a75c64b Report maximum and critical temperature 2022-02-18 16:56:41 +01:00
Thomas Roehl
4e8ee59211 Update NetstatCollector to derive bandwidths and use an include list 2022-02-18 02:25:23 +01:00
Thomas Gruber
0152c0dc1e
Update CpustatCollector (#36)
* Update cpustat collector

* Update CpustatCollector to use percentages and add 'num_cpus' metric
2022-02-17 15:46:06 +01:00
Holger Obermaier
542520d2c0 Refactoring: Use array of pointers 2022-02-15 15:37:25 +01:00
Holger Obermaier
01faa3b531 Add comments and units to all nvidia metrics 2022-02-15 10:57:32 +01:00
Holger Obermaier
14c9d6f792 Fixed: All nvidia metrics were excluded 2022-02-15 09:47:24 +01:00
Holger Obermaier
fcfb58c31c Use slice element of m.gpus without slice index 2022-02-15 09:23:57 +01:00
Holger Obermaier
5060497abd Cleanup 2022-02-14 22:14:06 +01:00
Holger Obermaier
342f09fabf Cleanup 2022-02-14 11:19:19 +01:00
Holger Obermaier
09b1ea130e Add error handling. Cleanup. 2022-02-14 10:46:05 +01:00
Holger Obermaier
6b12baff6e Use sensor name and sensor label as metric name 2022-02-12 10:13:38 +01:00
Thomas Roehl
bd246bdacf Fix group for netstat collector 2022-02-11 18:18:10 +01:00
Thomas Roehl
23d13b2ceb Fix group for netstat collector 2022-02-11 18:09:39 +01:00
Holger Obermaier
cfc5279958 Move sensor detection to Init() 2022-02-11 17:17:25 +01:00
Thomas Roehl
b15fdf72b9 Exclude metrics and devices in Init() for NvidiaCollector 2022-02-11 14:20:06 +01:00
Holger Obermaier
82138df48e Refactor: Replace readOneLine() by ioutil.ReadFile() 2022-02-10 09:28:06 +01:00
Thomas Gruber
1ea63332d3
Update README.md 2022-02-08 13:49:48 +01:00
Thomas Roehl
7e4c35e224 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-02-08 13:46:48 +01:00
Thomas Roehl
fcc25f7d30 Add collector documentation 2022-02-08 13:46:44 +01:00
Thomas Roehl
cc86fc00a0 Add missing error check in InfiniBandPerfQueryMetric 2022-02-08 13:46:19 +01:00
Thomas Roehl
9e73dcd437 Fix type tag for numastat 2022-02-08 13:40:27 +01:00
Thomas Roehl
006b9f91f6 Excluding NaN values in Likwid metrics from sending 2022-02-08 13:39:58 +01:00
Thomas Gruber
e1cf682989
Add other collectors to README 2022-02-08 13:22:20 +01:00
Holger Obermaier
4e0782d66b Use FromInfluxMetric() to convert influx to cc metric 2022-02-08 10:58:53 +01:00
Thomas Roehl
a6bec61b1e LikwidCollector: Filter out NaNs or set them to zero if 'nan_to_zero' option is set 2022-02-07 18:35:08 +01:00
Thomas Roehl
7182b339b9 Respect the publish option in the LikwidCollector 2022-02-07 17:41:35 +01:00
Thomas Roehl
d8ab3b0eb0 Use LookPath in IpmiCollector 2022-02-07 15:44:29 +01:00
Thomas Roehl
b19ae7a4db Fix initialization of InfinibandCollector 2022-02-07 15:43:57 +01:00
Thomas Gruber
5263a974d1
Split NfsCollector in Nfs3Collector and Nfs4Collector (#28)
* Split NfsCollector in Nfs3Collector and Nfs4Collector

* Add documentation
2022-02-07 15:43:01 +01:00
Thomas Roehl
b7ee125942 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-02-07 13:47:06 +01:00
Holger Obermaier
ead7117cad Add skip_filesystem configuration 2022-02-07 13:30:42 +01:00
Thomas Roehl
52458ce5a1 Fix for LustreCollector. Check for root user 2022-02-07 13:27:35 +01:00
Holger Obermaier
a534f16685 Add documentation for GPFS metric 2022-02-07 11:37:34 +01:00
Holger Obermaier
25c2ae4910 Avoid int -> int64 conversions 2022-02-07 11:12:03 +01:00
Holger Obermaier
3c10c6b340 Add error handling to Read() 2022-02-07 10:02:38 +01:00
Holger Obermaier
79b25ddbee Add markdown documentation for metric collector ibstat_perfquery 2022-02-07 09:46:19 +01:00
Holger Obermaier
5ac3af895d Moved documentation to markdown file 2022-02-07 09:22:59 +01:00
Holger Obermaier
9ab7a6424b Moved check which metric to skip to Init() 2022-02-04 19:22:42 +01:00
Holger Obermaier
f719f1915c Add error handling 2022-02-04 16:11:56 +01:00
Holger Obermaier
76b69c59b4 Switched to cclog.ComponentError() for error reporting in Read() 2022-02-04 14:42:53 +01:00
Thomas Roehl
66b9a25a88 Prefix metrics from NetstatCollector with 'net' 2022-02-04 12:39:59 +01:00
Thomas Roehl
db02c89683 Update LustreCollector to use lctl. Sysfs version is commented out 2022-02-03 22:05:16 +01:00