Thomas Roehl
58461f1f72
Fix clock frequency coming from LikwidCollector and update docs
2022-09-09 20:01:21 +02:00
Thomas Röhl
c09d8fb118
InfiniBandCollector: Scale raw readings from octets to bytes
2022-09-09 19:27:20 +02:00
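Below is a minimal Go sketch of the scaling in c09d8fb118. It assumes the Linux sysfs semantics documented for InfiniBand port counters: `port_xmit_data` and `port_rcv_data` report data octets divided by 4 (one unit per lane), so a raw reading is multiplied by 4 to get bytes. The helper names `ibBytes` and `readIBCounter` are hypothetical and are not taken from the collector source.

```go
package main

import (
	"os"
	"strconv"
	"strings"
)

// ibLaneScale: per the kernel's sysfs-class-infiniband ABI, port_xmit_data and
// port_rcv_data count data octets divided by 4 (lanes), so one raw unit is 4 bytes.
const ibLaneScale = 4

// ibBytes (hypothetical helper) converts a raw counter string as read from
// sysfs, e.g. "12345\n", into a byte count.
func ibBytes(raw string) (int64, error) {
	v, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return v * ibLaneScale, nil
}

// readIBCounter (hypothetical helper) reads one counter file such as
// /sys/class/infiniband/<device>/ports/<port>/counters/port_rcv_data.
func readIBCounter(path string) (int64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return ibBytes(string(data))
}
```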
oscarminus
8a3446a596
cpustatMetric.go: Use derived values instead of absolute values (#83)
* cpustatMetric.go: Use derived values instead of absolute values
The values in /proc/stat are absolute counters relative to the boot
time of the system. To obtain the CPU utilization, the changes in the
counters must be divided by the elapsed time. Using only the absolute
values means that changes in utilization, especially once the counters
have grown large, do not become visible.
* Add new collector for /proc/schedstat
The `schedstat` collector reads data from /proc/schedstat and calculates
a load value per hwthread. This might be useful to detect bad
CPU pinning on shared nodes etc.
Co-authored-by: Michael Schwarz <post@michael-schwarz.name>
2022-09-07 14:13:06 +02:00
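The idea behind #83 can be sketched in Go as follows. This is an illustration under assumptions, not the collector's code:

- `parseCPUStat`, `utilization`, `schedBusyNs` and `schedLoad` are hypothetical names.
- The load formula is one plausible definition, not necessarily the one the `schedstat` collector uses. It takes the growth of running plus waiting time per elapsed nanosecond, read from fields 7 and 8 of a `cpuN` line in /proc/schedstat.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUStat parses a /proc/stat line such as "cpu0 4705 150 1120 16250 520 0 30 0 0 0"
// and returns user, nice, system, idle, iowait, irq, softirq and steal.
// guest/guest_nice are skipped because the kernel already counts them in user/nice.
func parseCPUStat(line string) ([8]uint64, error) {
	var v [8]uint64
	f := strings.Fields(line)
	if len(f) < 9 || !strings.HasPrefix(f[0], "cpu") {
		return v, fmt.Errorf("unexpected /proc/stat line: %q", line)
	}
	for i := 0; i < 8; i++ {
		n, err := strconv.ParseUint(f[i+1], 10, 64)
		if err != nil {
			return v, err
		}
		v[i] = n
	}
	return v, nil
}

// utilization returns the share (percent) of each counter between two samples.
// The counters only grow since boot, so only their differences describe the interval.
func utilization(prev, cur [8]uint64) [8]float64 {
	var pct [8]float64
	var total uint64
	for i := range cur {
		total += cur[i] - prev[i]
	}
	if total == 0 {
		return pct
	}
	for i := range cur {
		pct[i] = 100 * float64(cur[i]-prev[i]) / float64(total)
	}
	return pct
}

// schedBusyNs returns running + waiting time (ns) from one "cpuN" line of
// /proc/schedstat (fields 7 and 8 after the "cpuN" label).
func schedBusyNs(line string) (int64, error) {
	f := strings.Fields(line)
	if len(f) < 9 || !strings.HasPrefix(f[0], "cpu") {
		return 0, fmt.Errorf("unexpected /proc/schedstat line: %q", line)
	}
	run, err := strconv.ParseInt(f[7], 10, 64)
	if err != nil {
		return 0, err
	}
	wait, err := strconv.ParseInt(f[8], 10, 64)
	if err != nil {
		return 0, err
	}
	return run + wait, nil
}

// schedLoad is an assumed load definition: growth of running + waiting time
// on one hwthread divided by the elapsed wall-clock time (both in ns).
func schedLoad(prevLine, curLine string, elapsedNs int64) (float64, error) {
	if elapsedNs <= 0 {
		return 0, fmt.Errorf("elapsed time must be positive")
	}
	prev, err := schedBusyNs(prevLine)
	if err != nil {
		return 0, err
	}
	cur, err := schedBusyNs(curLine)
	if err != nil {
		return 0, err
	}
	return float64(cur-prev) / float64(elapsedNs), nil
}
```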
Thomas Roehl
ea33d45d8e
Fix link to docs of NumastatsCollector
2022-07-27 17:50:15 +02:00
Thomas Roehl
b2bc7b95d3
Change unit of CpufreqCollector to Hz. That's what the sysfs outputs
2022-07-12 11:58:37 +02:00
Thomas Roehl
addbfd40a1
Fix for NvidiaCollector when devices are not in MiG mode
2022-07-11 13:05:15 +02:00
Thomas Roehl
bef807dd44
Fix serial number in rocmCollector
2022-06-05 15:53:39 +02:00
Thomas Roehl
659d0115c0
Use http instead of ftp to download likwid
2022-06-05 15:50:04 +02:00
Thomas Gruber
e13695307f
AMD ROCm SMI collector (#77)
* Add collector for AMD ROCm SMI metrics
* Fix import path
* Fix imports
* Remove Board Number
* store GPU index explicitly
* Remove board number from description
2022-05-25 15:55:43 +02:00
Thomas Roehl
ad5dbd85ea
Minor updates for Makefiles
2022-05-25 15:45:21 +02:00
Thomas Roehl
500685672b
Option to use MIG slice name as subtype-id in NvidiaCollector
2022-05-13 15:26:47 +02:00
Thomas Roehl
d4c89a4206
Option to use MIG UUID as subtype-id in NvidiaCollector
2022-05-13 14:34:32 +02:00
Thomas Gruber
826f364772
CC topology module update (#76)
* Rename CPU to hardware thread, write some comments
* Do renaming in other parts
* Remove CpuList and SocketList function from metricCollector. Available in ccTopology
2022-05-13 14:28:07 +02:00
Thomas Gruber
5df550b208
Update NvidiaCollector with new metrics, MIG and NvLink support (#75)
2022-05-13 14:11:55 +02:00
Thomas Gruber
5c34805918
Collectors in parallel (#74)
* Provide info to CollectorManager whether the collector can be executed in parallel with others
* Split serial and parallel collectors. Read in parallel first
2022-05-13 14:10:39 +02:00
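The split into parallel and serial collectors from #74 could look roughly like the Go sketch below. Several pieces are hypothetical simplifications: the `Collector` interface with `Parallel()` and `Read()`, the `funcCollector` adapter, and `readAll`. The real CollectorManager interface may differ.

```go
package main

import "sync"

// Collector is a hypothetical, trimmed-down collector interface.
type Collector interface {
	Parallel() bool         // may Read run concurrently with other collectors?
	Read(out chan<- string) // emit metrics (plain strings here for brevity)
}

// funcCollector adapts a plain function to Collector (illustration only).
type funcCollector struct {
	parallel bool
	read     func(out chan<- string)
}

func (f funcCollector) Parallel() bool          { return f.parallel }
func (f funcCollector) Read(out chan<- string) { f.read(out) }

// readAll starts all parallel-capable collectors concurrently, waits for them,
// and only then runs the remaining collectors one after another.
func readAll(collectors []Collector, out chan<- string) {
	var wg sync.WaitGroup
	var serial []Collector
	for _, c := range collectors {
		if !c.Parallel() {
			serial = append(serial, c)
			continue
		}
		wg.Add(1)
		go func(c Collector) {
			defer wg.Done()
			c.Read(out)
		}(c)
	}
	wg.Wait()
	for _, c := range serial {
		c.Read(out)
	}
}
```

Reading the parallel group first has a consequence. Because the code waits on the `sync.WaitGroup` before the serial loop, collectors that must not overlap with others still run strictly one after another.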
Thomas Gruber
1db5f3b29a
Rename cpu type to hwthread (#69)
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
2022-05-13 14:09:45 +02:00
Thomas Roehl
9886f14d14
Check readability of sensor files in TempCollector
2022-05-13 13:32:54 +02:00
Thomas Roehl
857903be2b
Skip disks in DiskstatCollector that have size=0
2022-05-13 13:31:22 +02:00
Thomas Roehl
8068e59818
Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73
2022-05-13 13:14:47 +02:00
Thomas Roehl
38d4e0a730
Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop
2022-05-04 11:54:55 +02:00
Thomas Roehl
54d14519ca
Skip mount points in DiskstatCollector if statfs() call does not work (bind mounts, ...)
2022-05-04 11:54:34 +02:00
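The skip-on-error behavior from 54d14519ca can be sketched as follows. The sketch assumes Linux and Go's `syscall.Statfs`. `usedPercent` is a hypothetical helper, and treating a filesystem that reports zero blocks as skippable is an added assumption.

```go
package main

import "syscall"

// usedPercent returns how full the filesystem mounted at path is.
// If statfs() fails (e.g. for some bind mounts), ok is false and the caller
// skips this mount point instead of aborting the whole Read().
func usedPercent(path string) (pct float64, ok bool) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, false
	}
	if st.Blocks == 0 {
		return 0, false // assumption: no size information, nothing to report
	}
	used := st.Blocks - st.Bfree
	return 100 * float64(used) / float64(st.Blocks), true
}
```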
Holger Obermaier
fb6f6a4daa
Fix GPFS collector last state handling
2022-05-02 16:57:19 +02:00
Thomas Roehl
017cd58247
Updating page for LikwidCollector
2022-04-05 10:57:09 +02:00
Thomas Roehl
7b098e0b1b
Fix for missing metrics in LikwidCollector if hwthread is inactive
2022-04-04 15:16:11 +02:00
Thomas Roehl
5d25a7bf12
Add units to InfiniBandCollector
2022-04-01 17:14:26 +02:00
Thomas Roehl
83b4343310
Likwid receives signal at first Read, check when re-initializing
2022-04-01 17:10:31 +02:00
Thomas Gruber
2a014b6fba
Read unit of values from /proc/meminfo (#68)
2022-03-31 11:56:31 +02:00
Thomas Roehl
50479f9325
Move all LIKWID related stuff to late initialization routine
2022-03-24 18:12:23 +01:00
Thomas Roehl
e0e91844bc
Use late initialization of LIKWID and catch access daemon death. Fixes #70 and fixes #71.
2022-03-24 17:56:51 +01:00
Thomas Roehl
296225f3a8
Always export all metrics in NfsCollectors
2022-03-24 13:50:35 +01:00
Thomas Roehl
b66fdd1436
Add missing socket->thread_id map for LikwidCollector
2022-03-16 19:04:39 +01:00
Thomas Gruber
c182d295f4
Fix staticcheck warnings (#66)
2022-03-15 16:38:20 +01:00
Thomas Gruber
aa1afd745e
Derived metrics (#65)
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
* Update LustreCollector
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-15 16:09:47 +01:00
Holger Obermaier
992b19d354
Move unit tag to meta data tags
2022-03-11 14:47:18 +01:00
Holger Obermaier
0b08ca9ae0
Simplified iota usage
2022-03-11 14:09:22 +01:00
Thomas Gruber
f6dae7c013
Derived metrics (#57)
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-11 13:48:18 +01:00
Thomas Gruber
73f22c1041
Refactoring of LikwidCollector and metric units (#62)
* Reduce complexity of LikwidCollector and allow metric units
* Add unit to LikwidCollector documentation and fix some typos
* Make library path configurable
2022-03-11 13:43:17 +01:00
Thomas Roehl
e7f7e68095
Use GBytes as unit for large memory numbers
2022-03-09 11:05:26 +01:00
Thomas Gruber
f2486abeab
Just download LIKWID to get the headers (#54)
* Just download LIKWID to get the headers
* Remove perl-Data-Dumper from BuildRequires, only required by LIKWID build
2022-03-05 17:30:40 +01:00
Thomas Gruber
21864e0ac4
Change default GpfsCollector command to mmpmon (#53)
* Set default cmd to 'mmpmon'
* Reuse looked up path
* Cast const to string
2022-03-05 14:42:04 +01:00
Mehmet Soysal
547bc0461f
Beegfs collector (#50)
* added beegfs collectors to collectors/README.md
* added beegfs collectors and docs
* added new beegfs collectors to AvailableCollectors list
* Feedback implemented
* changed error type
* changed error to only return
* changed beegfs lookup path
* fixed typo in md files
Co-authored-by: Mehmet Soysal <mehmet.soysal@kit.edu>
2022-03-04 14:35:47 +01:00
Thomas Roehl
f1d2828e1d
Fix error print in LustreCollector
2022-03-04 11:32:10 +01:00
Holger Obermaier
db04c8fbae
Removed infinibandPerfQueryMetric.go. infinibandMetric.go offers the same functionality without requiring root privileges.
2022-03-03 15:52:50 +01:00
Thomas Roehl
60de21c41e
Switch access mode of LikwidCollector in config file
2022-03-03 13:03:58 +01:00
Thomas Roehl
276c00442a
Add option to LustreCollector to call lctl with sudo
2022-03-03 13:02:00 +01:00
Thomas Roehl
092e7f6a71
Add section to page on how to temporarily disable LIKWID access
2022-03-02 13:54:43 +01:00
Holger Obermaier
a5325a6535
GitHub actions (#51)
Create new GitHub action which uses unmodified AlmaLinux Docker image
2022-03-01 15:39:26 +01:00
Holger Obermaier
33fec95eac
Additional comments
2022-02-28 12:16:48 +01:00
Holger Obermaier
2c08e53be4
Additional comments
2022-02-28 09:57:26 +01:00
Thomas Roehl
bac1f18b1d
Add samples for collectors, sinks and receivers
2022-02-25 13:47:19 +01:00
Thomas Gruber
c8bca59de4
Numa-aware memstat collector (#45)
2022-02-24 18:27:05 +01:00
Thomas Roehl
d542f32baa
Mention likwid config script in LikwidCollector README
2022-02-22 17:46:44 +01:00
Thomas Roehl
66275ecf74
DiskstatCollector: cast part_max_used metric to int
2022-02-22 15:50:49 +01:00
Thomas Roehl
eed9cd227c
Remove doubled import and remove merge artifacts
2022-02-21 14:50:11 +01:00
Thomas Roehl
24a2c9992f
Merge branch 'develop' into main
2022-02-21 14:32:24 +01:00
Thomas Gruber
f683f2e6da
Dynamically load liblikwid (#40)
* Check whether LIKWID library is present
* Generalize nan_to_zero option to invalid_to_zero including -Inf, +Inf and NaN
* Remove double error printing and return if measurements do not work
2022-02-21 13:29:33 +01:00
Thomas Gruber
435528fa97
Split diskstat Collector (#38)
* Split diskstats (free, total space) and iostats (reads, writes, ...
* Add iostat Collector to CollectorManager
2022-02-21 12:44:26 +01:00
Holger Obermaier
65c3106af2
Remove tags for num cores and packages
2022-02-18 16:59:59 +01:00
Holger Obermaier
635a75c64b
Report maximum and critical temperature
2022-02-18 16:56:41 +01:00
Thomas Roehl
4e8ee59211
Update NetstatCollector to derive bandwidths and use an include list
2022-02-18 02:25:23 +01:00
Thomas Gruber
0152c0dc1e
Update CpustatCollector (#36)
* Update cpustat collector
* Update CpustatCollector to use percentages and add 'num_cpus' metric
2022-02-17 15:46:06 +01:00
Holger Obermaier
542520d2c0
Refactoring: Use array of pointers
2022-02-15 15:37:25 +01:00
Holger Obermaier
01faa3b531
Add comments and units to all nvidia metrics
2022-02-15 10:57:32 +01:00
Holger Obermaier
14c9d6f792
Fixed: All nvidia metrics were excluded
2022-02-15 09:47:24 +01:00
Holger Obermaier
fcfb58c31c
Use slice element of m.gpus without slice index
2022-02-15 09:23:57 +01:00
Holger Obermaier
5060497abd
Cleanup
2022-02-14 22:14:06 +01:00
Holger Obermaier
342f09fabf
Cleanup
2022-02-14 11:19:19 +01:00
Holger Obermaier
09b1ea130e
Add error handling. Cleanup.
2022-02-14 10:46:05 +01:00
Holger Obermaier
6b12baff6e
Use sensor name and sensor label as metric name
2022-02-12 10:13:38 +01:00
Thomas Roehl
bd246bdacf
Fix group for netstat collector
2022-02-11 18:18:10 +01:00
Thomas Roehl
23d13b2ceb
Fix group for netstat collector
2022-02-11 18:09:39 +01:00
Holger Obermaier
cfc5279958
Move sensor detection to Init()
2022-02-11 17:17:25 +01:00
Thomas Roehl
b15fdf72b9
Exclude metrics and devices in Init() for NvidiaCollector
2022-02-11 14:20:06 +01:00
Holger Obermaier
82138df48e
Refactor: Replace readOneLine() by ioutil.ReadFile()
2022-02-10 09:28:06 +01:00
Thomas Gruber
1ea63332d3
Update README.md
2022-02-08 13:49:48 +01:00
Thomas Roehl
7e4c35e224
Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop
2022-02-08 13:46:48 +01:00
Thomas Roehl
fcc25f7d30
Add collector documentation
2022-02-08 13:46:44 +01:00
Thomas Roehl
cc86fc00a0
Add missing error check in InfiniBandPerfQueryMetric
2022-02-08 13:46:19 +01:00
Thomas Roehl
9e73dcd437
Fix type tag for numastat
2022-02-08 13:40:27 +01:00
Thomas Roehl
006b9f91f6
Excluding NaN values in Likwid metrics from sending
2022-02-08 13:39:58 +01:00
Thomas Gruber
e1cf682989
Add other collectors to README
2022-02-08 13:22:20 +01:00
Holger Obermaier
4e0782d66b
Use FromInfluxMetric() to convert influx to cc metric
2022-02-08 10:58:53 +01:00
Thomas Roehl
a6bec61b1e
LikwidCollector: Filter out NaNs or set them to zero if 'nan_to_zero' option is set
2022-02-07 18:35:08 +01:00
Thomas Roehl
7182b339b9
Respect the publish option in the LikwidCollector
2022-02-07 17:41:35 +01:00
Thomas Roehl
d8ab3b0eb0
Use LookPath in IpmiCollector
2022-02-07 15:44:29 +01:00
Thomas Roehl
b19ae7a4db
Fix initialization of InfinibandCollector
2022-02-07 15:43:57 +01:00
Thomas Gruber
5263a974d1
Split NfsCollector in Nfs3Collector and Nfs4Collector (#28)
* Split NfsCollector in Nfs3Collector and Nfs4Collector
* Add documentation
2022-02-07 15:43:01 +01:00
Thomas Roehl
b7ee125942
Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop
2022-02-07 13:47:06 +01:00
Holger Obermaier
ead7117cad
Add skip_filesystem configuration
2022-02-07 13:30:42 +01:00
Thomas Roehl
52458ce5a1
Fix for LustreCollector. Check for root user
2022-02-07 13:27:35 +01:00
Holger Obermaier
a534f16685
Add documentation for GPFS metric
2022-02-07 11:37:34 +01:00
Holger Obermaier
25c2ae4910
Avoid int -> int64 conversions
2022-02-07 11:12:03 +01:00
Holger Obermaier
3c10c6b340
Add error handling to Read()
2022-02-07 10:02:38 +01:00
Holger Obermaier
79b25ddbee
Add markdown documentation for metric collector ibstat_perfquery
2022-02-07 09:46:19 +01:00
Holger Obermaier
5ac3af895d
Moved documentation to markdown file
2022-02-07 09:22:59 +01:00
Holger Obermaier
9ab7a6424b
Moved check which metric to skip to Init()
2022-02-04 19:22:42 +01:00
Holger Obermaier
f719f1915c
Add error handling
2022-02-04 16:11:56 +01:00
Holger Obermaier
76b69c59b4
Switched to cclog.ComponentError() for error reporting in Read()
2022-02-04 14:42:53 +01:00
Thomas Roehl
66b9a25a88
Prefix metrics from NetstatCollector with 'net'
2022-02-04 12:39:59 +01:00
Thomas Roehl
db02c89683
Update LustreCollector to use lctl. Sysfs version is commented out
2022-02-03 22:05:16 +01:00