228 Commits

Author SHA1 Message Date
Thomas Roehl
60de21c41e Switch access mode of LikwidCollector in config file 2022-03-03 13:03:58 +01:00
Thomas Roehl
276c00442a Add option to LustreCollector to call lctl with sudo 2022-03-03 13:02:00 +01:00
Thomas Roehl
092e7f6a71 Add section how to temporarily disable LIKWID access to page 2022-03-02 13:54:43 +01:00
Holger Obermaier
a5325a6535
GitHub actions (#51)
Create new GitHub action which uses unmodified AlmaLinux Docker image
2022-03-01 15:39:26 +01:00
Holger Obermaier
33fec95eac Additional comments 2022-02-28 12:16:48 +01:00
Holger Obermaier
2c08e53be4 Additional comments 2022-02-28 09:57:26 +01:00
Thomas Roehl
bac1f18b1d Add samples for collectors, sinks and receivers 2022-02-25 13:47:19 +01:00
Thomas Gruber
c8bca59de4
Numa-aware memstat collector (#45) 2022-02-24 18:27:05 +01:00
Thomas Roehl
d542f32baa Mention likwid config script in LikwidCollector README 2022-02-22 17:46:44 +01:00
Thomas Roehl
66275ecf74 DiskstatCollector: cast part_max_used metric to int 2022-02-22 15:50:49 +01:00
Thomas Roehl
eed9cd227c Remove doubled import and remove merge artifacts 2022-02-21 14:50:11 +01:00
Thomas Roehl
24a2c9992f Merge branch 'develop' into main 2022-02-21 14:32:24 +01:00
Thomas Gruber
f683f2e6da
Dynamically load liblikwid (#40)
* Check whether LIKWID library is present

* Generalize nan_to_zero option to invalid_to_zero including +Inf, -Inf and NaN

* Remove double error printing and return if measurements do not work
2022-02-21 13:29:33 +01:00
Thomas Gruber
435528fa97
Split diskstat Collector (#38)
* Split diskstats (free, total space) and iostats (reads, writes, ...

* Add iostat Collector to CollectorManager
2022-02-21 12:44:26 +01:00
Holger Obermaier
65c3106af2 Remove tags for num cores and packages 2022-02-18 16:59:59 +01:00
Holger Obermaier
635a75c64b Report maximum and critical temperature 2022-02-18 16:56:41 +01:00
Thomas Roehl
4e8ee59211 Update NetstatCollector to derive bandwidths and use an include list 2022-02-18 02:25:23 +01:00
Thomas Gruber
0152c0dc1e
Update CpustatCollector (#36)
* Update cpustat collector

* Update CpustatCollector to use percentages and add 'num_cpus' metric
2022-02-17 15:46:06 +01:00
Holger Obermaier
542520d2c0 Refactoring: Use array of pointers 2022-02-15 15:37:25 +01:00
Holger Obermaier
01faa3b531 Add comments and units to all nvidia metrics 2022-02-15 10:57:32 +01:00
Holger Obermaier
14c9d6f792 Fixed: All nvidia metrics were excluded 2022-02-15 09:47:24 +01:00
Holger Obermaier
fcfb58c31c Use slice element of m.gpus without slice index 2022-02-15 09:23:57 +01:00
Holger Obermaier
5060497abd Cleanup 2022-02-14 22:14:06 +01:00
Holger Obermaier
342f09fabf Cleanup 2022-02-14 11:19:19 +01:00
Holger Obermaier
09b1ea130e Add error handling. Cleanup. 2022-02-14 10:46:05 +01:00
Holger Obermaier
6b12baff6e Use sensor name and sensor label as metric name 2022-02-12 10:13:38 +01:00
Thomas Roehl
bd246bdacf Fix group for netstat collector 2022-02-11 18:18:10 +01:00
Thomas Roehl
23d13b2ceb Fix group for netstat collector 2022-02-11 18:09:39 +01:00
Holger Obermaier
cfc5279958 Move sensor detection to Init() 2022-02-11 17:17:25 +01:00
Thomas Roehl
b15fdf72b9 Exclude metrics and devices in Init() for NvidiaCollector 2022-02-11 14:20:06 +01:00
Holger Obermaier
82138df48e Refactor: Replace readOneLine() by ioutil.ReadFile() 2022-02-10 09:28:06 +01:00
Thomas Gruber
1ea63332d3
Update README.md 2022-02-08 13:49:48 +01:00
Thomas Roehl
7e4c35e224 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-02-08 13:46:48 +01:00
Thomas Roehl
fcc25f7d30 Add collector documentation 2022-02-08 13:46:44 +01:00
Thomas Roehl
cc86fc00a0 Add missing error check in InfiniBandPerfQueryMetric 2022-02-08 13:46:19 +01:00
Thomas Roehl
9e73dcd437 Fix type tag for numastat 2022-02-08 13:40:27 +01:00
Thomas Roehl
006b9f91f6 Excluding NaN values in Likwid metrics from sending 2022-02-08 13:39:58 +01:00
Thomas Gruber
e1cf682989
Add other collectors to README 2022-02-08 13:22:20 +01:00
Holger Obermaier
4e0782d66b Use FromInfluxMetric() to convert influx to cc metric 2022-02-08 10:58:53 +01:00
Thomas Roehl
a6bec61b1e LikwidCollector: Filter out NaNs or set them to zero if 'nan_to_zero' option is set 2022-02-07 18:35:08 +01:00
Thomas Roehl
7182b339b9 Respect the publish option in the LikwidCollector 2022-02-07 17:41:35 +01:00
Thomas Roehl
d8ab3b0eb0 Use LookPath in IpmiCollector 2022-02-07 15:44:29 +01:00
Thomas Roehl
b19ae7a4db Fix initialization of InfinibandCollector 2022-02-07 15:43:57 +01:00
Thomas Gruber
5263a974d1
Split NfsCollector in Nfs3Collector and Nfs4Collector (#28)
* Split NfsCollector in Nfs3Collector and Nfs4Collector

* Add documentation
2022-02-07 15:43:01 +01:00
Thomas Roehl
b7ee125942 Merge branch 'develop' of github.com:ClusterCockpit/cc-metric-collector into develop 2022-02-07 13:47:06 +01:00
Holger Obermaier
ead7117cad Add skip_filesystem configuration 2022-02-07 13:30:42 +01:00
Thomas Roehl
52458ce5a1 Fix for LustreCollector. Check for root user 2022-02-07 13:27:35 +01:00
Holger Obermaier
a534f16685 Add documentation for GPFS metric 2022-02-07 11:37:34 +01:00
Holger Obermaier
25c2ae4910 Avoid int -> int64 conversions 2022-02-07 11:12:03 +01:00
Holger Obermaier
3c10c6b340 Add error handling to Read() 2022-02-07 10:02:38 +01:00
Holger Obermaier
79b25ddbee Add markdown documentation for metric collector ibstat_perfquery 2022-02-07 09:46:19 +01:00
Holger Obermaier
5ac3af895d Moved documentation to markdown file 2022-02-07 09:22:59 +01:00
Holger Obermaier
9ab7a6424b Moved check which metric to skip to Init() 2022-02-04 19:22:42 +01:00
Holger Obermaier
f719f1915c Add error handling 2022-02-04 16:11:56 +01:00
Holger Obermaier
76b69c59b4 Switched to cclog.ComponentError() for error reporting in Read() 2022-02-04 14:42:53 +01:00
Thomas Roehl
66b9a25a88 Prefix metrics from NetstatCollector with 'net' 2022-02-04 12:39:59 +01:00
Thomas Roehl
db02c89683 Update LustreCollector to use lctl. Sysfs version is commented out 2022-02-03 22:05:16 +01:00
Thomas Gruber
92d4a9c2b9
Split MetricRouter and MetricAggregator (#24)
* Split MetricRouter and MetricAggregator

* Missing change in MetricCache

* Add README for MetricAggregator
2022-02-03 16:52:55 +01:00
Holger Obermaier
d5ff5b83ce Add NUMA metric collector 2022-02-03 16:19:45 +01:00
Holger Obermaier
a016483012 Add NUMA metric collector. 2022-02-03 15:02:13 +01:00
Thomas Roehl
2806b1e7cc Remove debugging artifacts 2022-02-02 17:14:29 +01:00
Thomas Roehl
e59852be03 Fix LikwidCollector, merge artifact causes problems 2022-02-02 16:55:15 +01:00
Thomas Roehl
6f399d5f08 Add scope guidelines in LikwidCollector page 2022-02-02 16:46:35 +01:00
Thomas Roehl
5bf538bf97 Update LikwidCollector page 2022-02-02 16:40:20 +01:00
Thomas Roehl
ed62e952ce Use MetricAggregator to calculate metrics in LIKWID collector. 2022-02-02 14:52:07 +01:00
Thomas Roehl
e550226416 Use gval in LikwidCollector 2022-02-01 16:01:31 +01:00
Holger Obermaier
9e99e47d73 Wait for close of done channel, to ensure manager finished. 2022-01-30 12:08:33 +01:00
Holger Obermaier
8df58c051f Lower minimum required golang version to 1.16. 2022-01-29 10:04:31 +01:00
Holger Obermaier
4e408f9490 Add documentation 2022-01-28 15:16:58 +01:00
Holger Obermaier
82f5c1c5d0 Minimum requirement go version 1.17 2022-01-28 09:42:19 +01:00
Holger Obermaier
db5b4e4f65 Add type=node to gpf metric tags 2022-01-28 09:14:25 +01:00
Holger Obermaier
aea3e2c6b1 Place wait group Add() and Done() near to each other 2022-01-27 20:45:22 +01:00
Holger Obermaier
b9236dcc31 Handle shutdown sequentially 2022-01-27 17:43:00 +01:00
Holger Obermaier
e1d0aacd1e Moved as much work as possible to Init() 2022-01-27 11:08:27 +01:00
Holger Obermaier
7077452a5d Split InfiniBand metric collector, one using
/sys filesystem reads and one using perfquery.
2022-01-26 20:18:47 +01:00
Thomas Roehl
76884c3380 Prefix Nvidia metrics with 'nv_' 2022-01-26 18:45:23 +01:00
Thomas Roehl
86e9b55bc9 Fix for documentation 2022-01-26 18:41:25 +01:00
Thomas Roehl
78834337b0 Fix for documentation 2022-01-26 18:37:59 +01:00
Thomas Roehl
babd7a9af8 Use non-blocking send at close 2022-01-26 16:52:56 +01:00
Holger Obermaier
09b7538479 Avoid labels in collector manager loop 2022-01-26 15:54:49 +01:00
Holger Obermaier
c193b80083 Add documentation 2022-01-26 12:31:04 +01:00
Thomas Roehl
2925ad9f40 Use ccLogger anywhere 2022-01-25 17:43:10 +01:00
Holger Obermaier
b4fde31626 Add documentation 2022-01-25 17:20:20 +01:00
Thomas Roehl
bafc6322e6 Change to own Logger 2022-01-25 16:40:02 +01:00
Thomas Gruber
200af84c54
Modularize the whole thing (#16)
* Use channels, add a metric router, split up configuration and use extended version of Influx line protocol internally

* Use central timer for collectors and router. Add expressions to router

* Add expression to router config

* Update entry points

* Start with README

* Update README for CCMetric

* Formatting

* Update README.md

* Add README for MultiChanTicker

* Add README for MultiChanTicker

* Update README.md

* Add README to metric router

* Update main README

* Remove SinkEntity type

* Update README for sinks

* Update go files

* Update README for receivers

* Update collectors README

* Update collectors README

* Use separate page per collector

* Fix for tempstat page

* Add docs for customcmd collector

* Add docs for ipmistat collector

* Add docs for topprocs collector

* Update customCmdMetric.md

* Use seconds when calculating LIKWID metrics

* Add IB metrics ib_recv_pkts and ib_xmit_pkts

* Drop domain part of host name

* Updated to latest stable version of likwid

* Define source code dependencies in Makefile

* Add GPFS / IBM Spectrum Scale collector

* Add vet and staticcheck make targets

* Add vet and staticcheck make targets

* Avoid go vet warning:
struct field tag `json:"..., omitempty"` not compatible with reflect.StructTag.Get: suspicious space in struct tag value
struct field tag `json:"...", omitempty` not compatible with reflect.StructTag.Get: key:"value" pairs not separated by spaces

* Add sample collector to README.md

* Add CPU frequency collector

* Avoid staticcheck warning: redundant return statement

* Avoid staticcheck warning: unnecessary assignment to the blank identifier

* Simplified code

* Add CPUFreqCollectorCpuinfo
a metric collector to measure the current frequency of the CPUs
as obtained from /proc/cpuinfo
Only measure on the first hyperthread

* Add collector for NFS clients

* Move publication of metrics into Flush() for NatsSink

* Update GitHub actions

* Refactoring

* Avoid vet warning: Println arg list ends with redundant newline

* Avoid vet warning struct field commands has json tag but is not exported

* Avoid vet warning: return copies lock value.

* Corrected typo

* Refactoring

* Add go sources in internal/...

* Bad separator in Makefile

* Fix Infiniband collector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-01-25 15:37:43 +01:00
Holger Obermaier
d903fc6daa Avoid vet warning struct field commands has json tag but is not exported 2022-01-25 11:16:46 +01:00
Holger Obermaier
222862af32 Avoid vet warning struct field commands has json tag but is not exported 2022-01-25 11:15:36 +01:00
Holger Obermaier
9f8d3ddbd3 Avoid vet warning: Println arg list ends with redundant newline 2022-01-25 10:34:02 +01:00
Holger Obermaier
df77c3fd60 Avoid vet warning: Println arg list ends with redundant newline 2022-01-25 10:33:20 +01:00
Holger Obermaier
ae6ffd4974 Refactoring 2022-01-25 09:48:22 +01:00
Holger Obermaier
e095e4f202 Refactoring 2022-01-25 09:47:24 +01:00
Holger Obermaier
3d377760b8 Refactoring 2022-01-24 22:04:05 +01:00
Holger Obermaier
be8c92676a Refactoring 2022-01-24 22:03:13 +01:00
Holger Obermaier
9157fdbab2 Fixed topology detection 2022-01-24 20:23:24 +01:00
Holger Obermaier
2026c3acd9 Fixed topology detection 2022-01-24 20:22:08 +01:00
Holger Obermaier
f84f7de05c Add CPUFreqCollectorCpuinfo
a metric collector to measure the current frequency of the CPUs
as obtained from /proc/cpuinfo
Only measure on the first hyperthread
2022-01-24 13:12:25 +01:00
Holger Obermaier
8d314ecb19 Add CPUFreqCollectorCpuinfo
a metric collector to measure the current frequency of the CPUs
as obtained from /proc/cpuinfo
Only measure on the first hyperthread
2022-01-24 13:10:33 +01:00
Holger Obermaier
bcce471b27 Simplified code 2022-01-24 11:33:04 +01:00
Holger Obermaier
daa7c6bf99 Simplified code 2022-01-24 11:31:45 +01:00
Holger Obermaier
5987901005 Avoid staticcheck warning: unnecessary assignment to the blank identifier 2022-01-21 15:25:13 +01:00