Compare commits

48 Commits

Author SHA1 Message Date
Thomas Röhl
1937ef2587 Update cc-lib to 2.8.2 2026-03-13 18:00:26 +01:00
Holger Obermaier
35510d3d39 Use strict JSON decoding 2026-03-13 17:57:33 +01:00
Holger Obermaier
ef5e4c2604 Corrected json config 2026-03-13 17:57:33 +01:00
Holger Obermaier
44401318e4 Enable same linters as in CI pipeline 2026-03-13 17:57:33 +01:00
Holger Obermaier
2e60d3111c Add config option to exclude metrics 2026-03-13 17:57:33 +01:00
Holger Obermaier
e8734c02db Add config option for manual device configuration 2026-03-13 17:57:33 +01:00
Holger Obermaier
54650d40a6 Store query command for later reuse 2026-03-13 17:57:33 +01:00
Holger Obermaier
e7050834f5 * Honor config option excluded devices
* Use device type in read command
2026-03-13 17:57:33 +01:00
Holger Obermaier
893a0d69de Improve error reporting 2026-03-13 17:57:33 +01:00
Holger Obermaier
345119866a Switch from lp.NewMessage to lp.NewMetric 2026-03-13 17:57:33 +01:00
Holger Obermaier
ec917cf802 Switch from lp.NewMessage to lp.NewMetric 2026-03-13 17:57:33 +01:00
Holger Obermaier
c7cfc0723b Fix all linter warnings 2026-03-13 17:57:33 +01:00
Holger Obermaier
4f2685f4c4 Adapt to new ccMessage syntax 2026-03-13 17:57:33 +01:00
Thomas Roehl
439bfacfd9 Add SmartMonCollector to CollectorManager 2026-03-13 17:57:33 +01:00
Thomas Roehl
cd4ac9c885 Add Collector for S.M.A.R.T disk data 2026-03-13 17:57:33 +01:00
Holger Obermaier
eeb60ba0df Add target to build stripped executable 2026-03-12 11:39:43 +01:00
Holger Obermaier
a481a34dcd Avoid duplicate error printing 2026-03-12 10:08:23 +01:00
Holger Obermaier
b65576431e Stricter json parsing (#204) 2026-03-11 15:59:14 +01:00
Holger Obermaier
a927565868 Fix router config syntax 2026-03-10 13:51:06 +01:00
dependabot[bot]
0b67993eb0 Bump github.com/ClusterCockpit/cc-lib/v2 from 2.7.0 to 2.8.0
Bumps [github.com/ClusterCockpit/cc-lib/v2](https://github.com/ClusterCockpit/cc-lib) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/ClusterCockpit/cc-lib/releases)
- [Commits](https://github.com/ClusterCockpit/cc-lib/compare/v2.7.0...v2.8.0)

---
updated-dependencies:
- dependency-name: github.com/ClusterCockpit/cc-lib/v2
  dependency-version: 2.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 07:58:27 +01:00
dependabot[bot]
4164e3d1a3 Bump golang.org/x/sys from 0.41.0 to 0.42.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.41.0 to 0.42.0.
- [Commits](https://github.com/golang/sys/compare/v0.41.0...v0.42.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-09 07:58:11 +01:00
Holger Obermaier
ddb504c5c6 Fix: Do not overwrite hostname tag if already set (e.g. by receivers) 2026-03-02 15:47:57 +01:00
dependabot[bot]
367d365a85 Bump github.com/ClusterCockpit/cc-lib/v2 from 2.4.0 to 2.7.0
Bumps [github.com/ClusterCockpit/cc-lib/v2](https://github.com/ClusterCockpit/cc-lib) from 2.4.0 to 2.7.0.
- [Release notes](https://github.com/ClusterCockpit/cc-lib/releases)
- [Commits](https://github.com/ClusterCockpit/cc-lib/compare/v2.4.0...v2.7.0)

---
updated-dependencies:
- dependency-name: github.com/ClusterCockpit/cc-lib/v2
  dependency-version: 2.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-02 12:42:57 +01:00
Michael Panzlaff
e1bdd025f5 .gitignore += cc-metric-collector 2026-02-24 17:54:40 +01:00
Michael Panzlaff
315f2750ea Update to LIKWID 5.5.1 2026-02-24 17:54:40 +01:00
Michael Panzlaff
4a8159ef82 Fix possibly missing metrics in nfs collector
We should not rely on the availability of certain nfs metrics at just the
collector start time. Allow new metrics to come in at any point.
2026-02-24 17:54:40 +01:00
Holger Obermaier
b1d6388624 Add modernize tool to Makefile 2026-02-24 15:38:13 +01:00
Holger Obermaier
d639c942d5 Fix: Close file /proc/cpuinfo only once 2026-02-20 14:01:51 +01:00
Holger Obermaier
539581f952 Format with gofumpt 2026-02-16 14:16:03 +01:00
Holger Obermaier
9bb21b807a Remove deprecated function ccTopology.CpuList() 2026-02-16 11:34:54 +01:00
Holger Obermaier
47e68dfd2f Remove debug output 2026-02-16 11:02:50 +01:00
Holger Obermaier
40fe94cabb Embedded types should be at the top of the field list of a struct.
And there should be an empty line separating embedded fields from regular fields
2026-02-16 10:54:12 +01:00
Holger Obermaier
83720aa5be Use cc-lib lp.FromBytes, do not use influxdb client directly 2026-02-16 09:57:24 +01:00
Holger Obermaier
5829f86f4a Goroutine creation can be simplified using WaitGroup.Go (modernize) 2026-02-13 15:52:36 +01:00
Holger Obermaier
3eaea4ca62 upgraded go 1.24.0 => 1.25.0 2026-02-13 15:41:50 +01:00
Holger Obermaier
64dab777a5 Replace deprecated functions (#198)
Replace deprecated functions
2026-02-13 15:15:17 +01:00
Holger Obermaier
9908d76aac Cleanup (#197)
* Removed unused code
* Use cclog for logging
* Wrap errors so that they can be unwrapped
* Revert wrong use of slices.Delete()
* Fix derivative values should be float
* Suggestions from the gocritic linter
* Fixed: interface method AddChannel must have all named params (inamedparam)
* Enable linter: errorlint
* Replace fmt.Sprintf("%d", i) by strconv.Itoa(i) for improved performance
* Correct misspelled words
* Break up very long lines into multiple lines
* lp.NewMessage -> lp.NewMetric
* Preallocate slices of known length
2026-02-13 09:36:14 +01:00
dependabot[bot]
b69281dae6 Bump github.com/ClusterCockpit/cc-lib/v2 from 2.1.0 to 2.2.1 (#193)
Bumps [github.com/ClusterCockpit/cc-lib/v2](https://github.com/ClusterCockpit/cc-lib) from 2.1.0 to 2.2.1.
- [Release notes](https://github.com/ClusterCockpit/cc-lib/releases)
- [Commits](https://github.com/ClusterCockpit/cc-lib/compare/v2.1.0...v2.2.1)

---
updated-dependencies:
- dependency-name: github.com/ClusterCockpit/cc-lib/v2
  dependency-version: 2.2.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 13:45:21 +01:00
boesr
053eb27463 fixes rpm config paths (#190) 2026-02-10 13:42:36 +01:00
dependabot[bot]
665db57a11 Bump golang.org/x/sys from 0.40.0 to 0.41.0 (#194)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.40.0 to 0.41.0.
- [Commits](https://github.com/golang/sys/compare/v0.40.0...v0.41.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 13:40:56 +01:00
Holger Obermaier
fc297854d2 Golangci modernize fixes (#196)
* Fix: Loop can be simplified using slices.Contains
* Fix: for loop can be modernized using range over int
* Fix: interface{} can be replaced by any
* Fix: Replace m[k]=v loop with maps.Copy
* Run all linters with golangci-lint
2026-02-10 13:33:04 +01:00
Holger Obermaier
cca0d23efa Golangci lint fixes (#195)
* Add golangci-lint as make target
* Fix: could omit type ... from declaration; it will be inferred from the right-hand side (staticcheck)
* Fix: func intArrayContains is unused (unused)
* Fix: could use strings.ReplaceAll instead (staticcheck)
* Fix: could expand call to math.Pow (staticcheck)
* Fix: could use tagged switch on `...` (staticcheck)
* Fix: Error return value of `...` is not checked (errcheck)
* Fix: ineffectual assignment to err (ineffassign)
* Fix: There is no need to wait for command completion
* Add cpustat, diskstat and schedstat config
* Use slices to exclude metrics
* Replaced stringArrayContains by slices.Contains
* Replace m[k]=v loop with maps.Copy
* Use module slices from the standard library. Remove use of golang.org/x/exp/slices
* Use SplitSeq and max to modernize code
2026-02-09 14:51:31 +01:00
Holger Obermaier
7cff283001 Update ci (#192)
Add static analysis with GolangCI-Lint, govet and staticcheck
2026-01-23 14:39:39 +01:00
Holger Obermaier
fa45d0d973 Update ci (#191)
* Add UBI 10 build
* Add Almalinux 10 build
* Use Appstream Repository from Red Hat Universal Base Image
* Use Appstream Repository from Almalinux
2026-01-21 15:20:12 +01:00
Holger Obermaier
e70fd658f0 Update CI pipeline (#189)
* Updated Action "checkout" and "Setup golang"
* Update go-toolset to latest version
* Add golang-race dependency
* Update download-artifact and upload-artifact
2026-01-15 14:35:43 +01:00
Holger Obermaier
c58790cd54 Switch to cc-lib v2 2026-01-15 11:30:50 +01:00
dependabot[bot]
67ee09ffef Bump golang.org/x/sys from 0.38.0 to 0.39.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.38.0 to 0.39.0.
- [Commits](https://github.com/golang/sys/compare/v0.38.0...v0.39.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-22 11:59:49 +01:00
dependabot[bot]
7f575269eb Bump github.com/ClusterCockpit/cc-lib from 0.11.0 to 1.0.2
Bumps [github.com/ClusterCockpit/cc-lib](https://github.com/ClusterCockpit/cc-lib) from 0.11.0 to 1.0.2.
- [Release notes](https://github.com/ClusterCockpit/cc-lib/releases)
- [Commits](https://github.com/ClusterCockpit/cc-lib/compare/v0.11.0...v1.0.2)

---
updated-dependencies:
- dependency-name: github.com/ClusterCockpit/cc-lib
  dependency-version: 1.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-22 11:59:33 +01:00
60 changed files with 2620 additions and 2043 deletions
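
Two of the commits listed above (35510d3d39 "Use strict JSON decoding" and b65576431e "Stricter json parsing") tighten how configuration files are parsed. Their diffs are not part of this excerpt, so the following is only a minimal sketch of what strict decoding usually looks like with Go's standard encoding/json package. The Config type, its fields and the helper decodeStrict are hypothetical names chosen for illustration and are not taken from cc-metric-collector.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// Config is a hypothetical collector configuration (illustration only).
type Config struct {
	Interval       string   `json:"interval"`
	ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
}

// decodeStrict decodes raw into v. Unlike json.Unmarshal with default
// settings, it rejects keys that do not map to a struct field and any
// data following the first top-level JSON value.
func decodeStrict(raw []byte, v any) error {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(v); err != nil {
		return fmt.Errorf("strict JSON decode failed: %w", err)
	}
	// A second Decode must report io.EOF; any other result means
	// there was trailing content after the configuration object.
	if err := dec.Decode(&struct{}{}); !errors.Is(err, io.EOF) {
		return errors.New("strict JSON decode failed: unexpected data after top-level value")
	}
	return nil
}

func main() {
	var cfg Config
	// "exclude_metric" is a deliberate misspelling of "exclude_metrics".
	err := decodeStrict([]byte(`{"interval": "10s", "exclude_metric": ["temp"]}`), &cfg)
	fmt.Println(err)
}
```

With lenient decoding the misspelled key would be ignored silently and the option would simply have no effect; with the strict variant the call returns an error, so the mistake surfaces when the configuration is loaded.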


@@ -5,10 +5,10 @@ name: Release
  # Run on tag push
  on:
  push:
  tags:
  - '**'
  workflow_dispatch:
  jobs:
@@ -36,22 +36,14 @@ jobs:
  # fetch-depth must be 0 to use git describe
  # See: https://github.com/marketplace/actions/checkout
  - name: Checkout
- uses: actions/checkout@v4
+ uses: actions/checkout@v6
  with:
  submodules: recursive
  fetch-depth: 0
- # - name: Setup Golang
- # uses: actions/setup-go@v5
- # with:
- # go-version: 'stable'
  - name: Setup Golang
  run: |
- dnf --assumeyes --disableplugin=subscription-manager install \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/go-toolset-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-bin-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-src-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.noarch.rpm
+ dnf --assumeyes --disableplugin=subscription-manager --enablerepo appstream install go-toolset
  - name: RPM build MetricCollector
  id: rpmbuild
@@ -78,13 +70,13 @@ jobs:
  # See: https://github.com/actions/upload-artifact
  - name: Save RPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector RPM for AlmaLinux 8
  path: ${{ steps.rpmrename.outputs.RPM }}
  overwrite: true
  - name: Save SRPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector SRPM for AlmaLinux 8
  path: ${{ steps.rpmrename.outputs.SRPM }}
@@ -114,23 +106,14 @@ jobs:
  # fetch-depth must be 0 to use git describe
  # See: https://github.com/marketplace/actions/checkout
  - name: Checkout
- uses: actions/checkout@v4
+ uses: actions/checkout@v6
  with:
  submodules: recursive
  fetch-depth: 0
- # - name: Setup Golang
- # uses: actions/setup-go@v5
- # with:
- # go-version: 'stable'
  - name: Setup Golang
  run: |
- dnf --assumeyes --disableplugin=subscription-manager install \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/go-toolset-1.25.3-1.el9_7.x86_64.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-1.25.3-1.el9_7.x86_64.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-bin-1.25.3-1.el9_7.x86_64.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-src-1.25.3-1.el9_7.noarch.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-race-1.25.3-1.el9_7.x86_64.rpm
+ dnf --assumeyes --disableplugin=subscription-manager --enablerepo appstream install go-toolset
  - name: RPM build MetricCollector
  id: rpmbuild
@@ -157,25 +140,26 @@ jobs:
  # See: https://github.com/actions/upload-artifact
  - name: Save RPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector RPM for AlmaLinux 9
  path: ${{ steps.rpmrename.outputs.RPM }}
  overwrite: true
  - name: Save SRPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector SRPM for AlmaLinux 9
  path: ${{ steps.rpmrename.outputs.SRPM }}
  overwrite: true
  #
- # Build on UBI 8 using go-toolset
+ # Build on Red Hat Universal Base Image (UBI 8) using go-toolset
  #
  UBI-8-RPM-build:
  runs-on: ubuntu-latest
- # See: https://catalog.redhat.com/software/containers/ubi8/ubi/5c35984d70cc534b3a3784e?container-tabs=gti
- container: registry.access.redhat.com/ubi8/ubi:8.8-1032.1692772289
+ # See: https://catalog.redhat.com/en/search?searchType=Containers&q=Red+Hat+Universal+Base+Image+8
+ # https://hub.docker.com/r/redhat/ubi8
+ container: redhat/ubi8
  # The job outputs link to the outputs of the 'rpmbuild' step
  outputs:
  rpm : ${{steps.rpmbuild.outputs.RPM}}
@@ -190,22 +174,14 @@ jobs:
  # fetch-depth must be 0 to use git describe
  # See: https://github.com/marketplace/actions/checkout
  - name: Checkout
- uses: actions/checkout@v4
+ uses: actions/checkout@v6
  with:
  submodules: recursive
  fetch-depth: 0
- # - name: Setup Golang
- # uses: actions/setup-go@v5
- # with:
- # go-version: 'stable'
  - name: Setup Golang
  run: |
- dnf --assumeyes --disableplugin=subscription-manager install \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/go-toolset-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-bin-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-src-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.noarch.rpm
+ dnf --assumeyes --disableplugin=subscription-manager --enablerepo ubi-8-appstream-rpms install go-toolset
  - name: RPM build MetricCollector
  id: rpmbuild
@@ -215,24 +191,25 @@ jobs:
  # See: https://github.com/actions/upload-artifact
  - name: Save RPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector RPM for UBI 8
  path: ${{ steps.rpmbuild.outputs.RPM }}
  overwrite: true
  - name: Save SRPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector SRPM for UBI 8
  path: ${{ steps.rpmbuild.outputs.SRPM }}
  overwrite: true
  #
- # Build on UBI 9 using go-toolset
+ # Build on Red Hat Universal Base Image (UBI 9) using go-toolset
  #
  UBI-9-RPM-build:
  runs-on: ubuntu-latest
- # See: https://catalog.redhat.com/software/containers/ubi8/ubi/5c359854d70cc534b3a3784e?container-tabs=gti
+ # See: https://catalog.redhat.com/en/search?searchType=Containers&q=Red+Hat+Universal+Base+Image+9
+ # https://hub.docker.com/r/redhat/ubi9
  container: redhat/ubi9
  # The job outputs link to the outputs of the 'rpmbuild' step
  # The job outputs link to the outputs of the 'rpmbuild' step
@@ -243,30 +220,20 @@ jobs:
  # Use dnf to install development packages
  - name: Install development packages
  run: dnf --assumeyes --disableplugin=subscription-manager install rpm-build go-srpm-macros gcc make python39 git wget openssl-devel diffutils delve
  # Checkout git repository and submodules
  # fetch-depth must be 0 to use git describe
  # See: https://github.com/marketplace/actions/checkout
  - name: Checkout
- uses: actions/checkout@v4
+ uses: actions/checkout@v6
  with:
  submodules: recursive
  fetch-depth: 0
- # See: https://github.com/marketplace/actions/setup-go-environment
- # - name: Setup Golang
- # uses: actions/setup-go@v5
- # with:
- # go-version: 'stable'
  - name: Setup Golang
  run: |
- dnf --assumeyes --disableplugin=subscription-manager install \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/go-toolset-1.25.3-1.el9_7.x86_64.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-1.25.3-1.el9_7.x86_64.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-bin-1.25.3-1.el9_7.x86_64.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-src-1.25.3-1.el9_7.noarch.rpm \
- https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-race-1.25.3-1.el9_7.x86_64.rpm
+ dnf --assumeyes --disableplugin=subscription-manager --enablerepo ubi-9-appstream-rpms install go-toolset
  - name: RPM build MetricCollector
  id: rpmbuild
@@ -276,13 +243,13 @@ jobs:
  # See: https://github.com/actions/upload-artifact
  - name: Save RPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector RPM for UBI 9
  path: ${{ steps.rpmbuild.outputs.RPM }}
  overwrite: true
  - name: Save SRPM as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector SRPM for UBI 9
  path: ${{ steps.rpmbuild.outputs.SRPM }}
@@ -308,13 +275,14 @@ jobs:
  # fetch-depth must be 0 to use git describe
  # See: https://github.com/marketplace/actions/checkout
  - name: Checkout
- uses: actions/checkout@v4
+ uses: actions/checkout@v6
  with:
  submodules: recursive
  fetch-depth: 0
+ # Use official golang package
+ # See: https://github.com/marketplace/actions/setup-go-environment
  - name: Setup Golang
- uses: actions/setup-go@v5
+ uses: actions/setup-go@v6
  with:
  go-version: 'stable'
@@ -332,13 +300,13 @@ jobs:
  echo "DEB=${NEW_DEB_FILE}" >> $GITHUB_OUTPUT
  # See: https://github.com/actions/upload-artifact
  - name: Save DEB as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector DEB for Ubuntu 22.04
  path: ${{ steps.debrename.outputs.DEB }}
  overwrite: true
  #
  # Build on Ubuntu 24.04 using official go package
  #
  Ubuntu-noblenumbat-build:
@@ -358,13 +326,14 @@ jobs:
  # fetch-depth must be 0 to use git describe
  # See: https://github.com/marketplace/actions/checkout
  - name: Checkout
- uses: actions/checkout@v4
+ uses: actions/checkout@v6
  with:
  submodules: recursive
  fetch-depth: 0
+ # Use official golang package
+ # See: https://github.com/marketplace/actions/setup-go-environment
  - name: Setup Golang
- uses: actions/setup-go@v5
+ uses: actions/setup-go@v6
  with:
  go-version: 'stable'
@@ -382,7 +351,7 @@ jobs:
  echo "DEB=${NEW_DEB_FILE}" >> $GITHUB_OUTPUT
  # See: https://github.com/actions/upload-artifact
  - name: Save DEB as artifact
- uses: actions/upload-artifact@v4
+ uses: actions/upload-artifact@v6
  with:
  name: cc-metric-collector DEB for Ubuntu 24.04
  path: ${{ steps.debrename.outputs.DEB }}
@@ -400,48 +369,48 @@ jobs:
  steps:
  # See: https://github.com/actions/download-artifact
  - name: Download AlmaLinux 8 RPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector RPM for AlmaLinux 8
  - name: Download AlmaLinux 8 SRPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector SRPM for AlmaLinux 8
  - name: Download AlmaLinux 9 RPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector RPM for AlmaLinux 9
  - name: Download AlmaLinux 9 SRPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector SRPM for AlmaLinux 9
  - name: Download UBI 8 RPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector RPM for UBI 8
  - name: Download UBI 8 SRPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector SRPM for UBI 8
  - name: Download UBI 9 RPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector RPM for UBI 9
  - name: Download UBI 9 SRPM
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector SRPM for UBI 9
  - name: Download Ubuntu 22.04 DEB
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector DEB for Ubuntu 22.04
  - name: Download Ubuntu 24.04 DEB
- uses: actions/download-artifact@v4
+ uses: actions/download-artifact@v7
  with:
  name: cc-metric-collector DEB for Ubuntu 24.04


@@ -16,15 +16,7 @@ jobs:
  #
  build-latest:
  runs-on: ubuntu-latest
- container: ubuntu:24.04
- env:
- CGO_LDFLAGS : "-L/usr/lib"
  steps:
- # Use apt to install development packages
- - name: Install development packages
- run: |
- apt -qq update && apt -qq --assume-yes upgrade
- apt --assume-yes -qq install build-essential sed git wget bash hwloc libhwloc-dev
  # See: https://github.com/marketplace/actions/checkout
  # Checkout git repository and submodules
  - name: Checkout
@@ -36,7 +28,17 @@ jobs:
  - name: Setup Golang
  uses: actions/setup-go@v6
  with:
- go-version-file: 'go.mod'
+ go-version: 'stable'
+ check-latest: true
+ - name: Install reviewdog
+ run: |
+ go install github.com/reviewdog/reviewdog/cmd/reviewdog@latest
+ # See: https://golangci-lint.run
+ - name: Install GolangCI-Lint
+ run: |
+ go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@latest
  - name: Build MetricCollector
  run: make
@@ -44,247 +46,287 @@ jobs:
  - name: Run MetricCollector once
  run: ./cc-metric-collector --once --config .github/ci-config.json
- # #
- # # Build on AlmaLinux 8
- # #
- # AlmaLinux8-RPM-build:
- # runs-on: ubuntu-latest
- # # See: https://hub.docker.com/_/almalinux
- # container: almalinux:8
- # # The job outputs link to the outputs of the 'rpmrename' step
- # # Only job outputs can be used in child jobs
- # steps:
+ # Running the linter requires likwid.h, which gets downloaded in the build step
+ - name: Static Analysis with GolangCI-Lint and Upload Report with reviewdog
+ run: |
+ golangci-lint run --enable errorlint,govet,misspell,modernize,prealloc,staticcheck,unconvert,wastedassign | reviewdog -f=golangci-lint -name "Check golangci-lint on build-latest" -reporter=github-check -filter-mode=nofilter -fail-level none
+ env:
+ REVIEWDOG_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- # # Use dnf to install development packages
- # - name: Install development packages
- # run: |
- # dnf --assumeyes group install "Development Tools" "RPM Development Tools"
- # dnf --assumeyes install wget openssl-devel diffutils delve which
+ #
+ # Build on AlmaLinux 8 using go-toolset
+ #
+ AlmaLinux8-RPM-build:
+ runs-on: ubuntu-latest
+ # See: https://hub.docker.com/_/almalinux
+ container: almalinux:8
+ # The job outputs link to the outputs of the 'rpmrename' step
+ # Only job outputs can be used in child jobs
+ steps:
- # # Checkout git repository and submodules
- # # fetch-depth must be 0 to use git describe
- # # See: https://github.com/marketplace/actions/checkout
- # - name: Checkout
- # uses: actions/checkout@v4
- # with:
- # submodules: recursive
- # fetch-depth: 0
+ # Use dnf to install development packages
+ - name: Install development packages
+ run: |
+ dnf --assumeyes group install "Development Tools" "RPM Development Tools"
+ dnf --assumeyes install wget openssl-devel diffutils delve which
- # # See: https://github.com/marketplace/actions/setup-go-environment
- # # - name: Setup Golang
- # # uses: actions/setup-go@v5
- # # with:
- # # go-version: 'stable'
- # - name: Setup Golang
- # run: |
- # dnf --assumeyes --disableplugin=subscription-manager install \
- # https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/go-toolset-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- # https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- # https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-bin-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
- # https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-src-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.noarch.rpm
+ # Checkout git repository and submodules
+ # fetch-depth must be 0 to use git describe
+ # See: https://github.com/marketplace/actions/checkout
+ - name: Checkout
+ uses: actions/checkout@v6
+ with:
+ submodules: recursive
+ fetch-depth: 0
- # - name: RPM build MetricCollector
- # id: rpmbuild
- # run: |
- # git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
- # make RPM
+ - name: Setup Golang
+ run: |
+ dnf --assumeyes --disableplugin=subscription-manager --enablerepo appstream install go-toolset
- # #
- # # Build on AlmaLinux 9
- # #
- # AlmaLinux9-RPM-build:
- # runs-on: ubuntu-latest
- # # See: https://hub.docker.com/_/almalinux
- # container: almalinux:9
- # # The job outputs link to the outputs of the 'rpmrename' step
- # # Only job outputs can be used in child jobs
- # steps:
+ - name: RPM build MetricCollector
+ id: rpmbuild
+ run: |
+ git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
+ make RPM
- # # Use dnf to install development packages
- # - name: Install development packages
- # run: |
- # dnf --assumeyes group install "Development Tools" "RPM Development Tools"
- # dnf --assumeyes install wget openssl-devel diffutils delve which
+ #
+ # Build on AlmaLinux 9 using go-toolset
+ #
+ AlmaLinux9-RPM-build:
+ runs-on: ubuntu-latest
+ # See: https://hub.docker.com/_/almalinux
+ container: almalinux:9
+ # The job outputs link to the outputs of the 'rpmrename' step
+ # Only job outputs can be used in child jobs
+ steps:
- # # Checkout git repository and submodules
- # # fetch-depth must be 0 to use git describe
- # # See: https://github.com/marketplace/actions/checkout
- # - name: Checkout
- # uses: actions/checkout@v4
- # with:
- # submodules: recursive
- # fetch-depth: 0
+ # Use dnf to install development packages
+ - name: Install development packages
+ run: |
+ dnf --assumeyes group install "Development Tools" "RPM Development Tools"
+ dnf --assumeyes install wget openssl-devel diffutils delve which
- # # See: https://github.com/marketplace/actions/setup-go-environment
- # # - name: Setup Golang
- # # uses: actions/setup-go@v5
- # # with:
- # # go-version: 'stable'
- # - name: Setup Golang
- # run: |
- # dnf --assumeyes --disableplugin=subscription-manager install \
- # https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/go-toolset-1.25.3-1.el9_7.x86_64.rpm \
- # https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-1.25.3-1.el9_7.x86_64.rpm \
- # https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-bin-1.25.3-1.el9_7.x86_64.rpm \
- # https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-src-1.25.3-1.el9_7.noarch.rpm \
- # https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-race-1.25.3-1.el9_7.x86_64.rpm
+ # Checkout git repository and submodules
+ # fetch-depth must be 0 to use git describe
+ # See: https://github.com/marketplace/actions/checkout
+ - name: Checkout
+ uses: actions/checkout@v6
+ with:
+ submodules: recursive
+ fetch-depth: 0
- # - name: RPM build MetricCollector
- # id: rpmbuild
- # run: |
- # git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
- # make RPM
+ - name: Setup Golang
+ run: |
+ dnf --assumeyes --disableplugin=subscription-manager --enablerepo appstream install go-toolset
+ - name: RPM build MetricCollector
+ id: rpmbuild
+ run: |
+ git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
+ make RPM
- # #
- # # Build on UBI 8 using go-toolset
- # #
- # UBI-8-RPM-build:
- # runs-on: ubuntu-latest
- # # See: https://catalog.redhat.com/software/containers/ubi8/ubi/5c359854d70cc534b3a3784e?container-tabs=gti
- # container: redhat/ubi8
- # # The job outputs link to the outputs of the 'rpmbuild' step
- # steps:
+ #
+ # Build on AlmaLinux 10 using go-toolset
+ #
+ AlmaLinux10-RPM-build:
+ runs-on: ubuntu-latest
+ # See: https://hub.docker.com/_/almalinux
+ container: almalinux:10
+ # The job outputs link to the outputs of the 'rpmrename' step
+ # Only job outputs can be used in child jobs
+ steps:
- # # Use dnf to install development packages
- # - name: Install development packages
- # run: dnf --assumeyes --disableplugin=subscription-manager install rpm-build go-srpm-macros rpm-build-libs rpm-libs gcc make python38 git wget openssl-devel diffutils delve which
+ # Use dnf to install development packages
+ - name: Install development packages
+ run: |
+ dnf --assumeyes group install "Development Tools" "RPM Development Tools"
+ dnf --assumeyes install wget openssl-devel diffutils delve which
- # # Checkout git repository and submodules
- # # fetch-depth must be 0 to use git describe
+ # Checkout git repository and submodules
+ # fetch-depth must be 0 to use git describe
# # See: https://github.com/marketplace/actions/checkout # See: https://github.com/marketplace/actions/checkout
# - name: Checkout - name: Checkout
# uses: actions/checkout@v4 uses: actions/checkout@v6
# with: with:
# submodules: recursive submodules: recursive
# fetch-depth: 0 fetch-depth: 0
# # See: https://github.com/marketplace/actions/setup-go-environment - name: Setup Golang
# # - name: Setup Golang run: |
# # uses: actions/setup-go@v5 dnf --assumeyes --disableplugin=subscription-manager --enablerepo appstream install go-toolset
# # with:
# # go-version: 'stable'
# - name: Setup Golang
# run: |
# dnf --assumeyes --disableplugin=subscription-manager install \
# https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/go-toolset-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
# https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
# https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-bin-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.x86_64.rpm \
# https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/golang-src-1.23.9-1.module_el8.10.0+4000+1ad1b2cc.noarch.rpm
# - name: RPM build MetricCollector - name: RPM build MetricCollector
# id: rpmbuild id: rpmbuild
# run: | run: |
# git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
# make RPM make RPM
# # #
# # Build on UBI 9 using go-toolset # Build on Red Hat Universal Base Image (UBI 8) using go-toolset
# # #
# UBI-9-RPM-build: UBI-8-RPM-build:
# runs-on: ubuntu-latest runs-on: ubuntu-latest
# # See: https://catalog.redhat.com/software/containers/ubi8/ubi/5c359854d70cc534b3a3784e?container-tabs=gti # See: https://catalog.redhat.com/en/search?searchType=Containers&q=Red+Hat+Universal+Base+Image+8
# container: redhat/ubi9 # https://hub.docker.com/r/redhat/ubi8
# # The job outputs link to the outputs of the 'rpmbuild' step container: redhat/ubi8
# steps: # The job outputs link to the outputs of the 'rpmbuild' step
steps:
# # Use dnf to install development packages # Use dnf to install development packages
# - name: Install development packages - name: Install development packages
# run: dnf --assumeyes --disableplugin=subscription-manager install rpm-build go-srpm-macros gcc make python39 git wget openssl-devel diffutils delve run: dnf --assumeyes --disableplugin=subscription-manager install rpm-build go-srpm-macros rpm-build-libs rpm-libs gcc make python38 git wget openssl-devel diffutils delve which
# # Checkout git repository and submodules # Checkout git repository and submodules
# # fetch-depth must be 0 to use git describe # fetch-depth must be 0 to use git describe
# # See: https://github.com/marketplace/actions/checkout # See: https://github.com/marketplace/actions/checkout
# - name: Checkout - name: Checkout
# uses: actions/checkout@v4 uses: actions/checkout@v6
# with: with:
# submodules: recursive submodules: recursive
# fetch-depth: 0 fetch-depth: 0
# # See: https://github.com/marketplace/actions/setup-go-environment - name: Setup Golang
# # - name: Setup Golang run: |
# # uses: actions/setup-go@v5 dnf --assumeyes --disableplugin=subscription-manager --enablerepo ubi-8-appstream-rpms install go-toolset
# # with:
# # go-version: 'stable'
# - name: Setup Golang
# run: |
# dnf --assumeyes --disableplugin=subscription-manager install \
# https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/go-toolset-1.25.3-1.el9_7.x86_64.rpm \
# https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-1.25.3-1.el9_7.x86_64.rpm \
# https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-bin-1.25.3-1.el9_7.x86_64.rpm \
# https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-src-1.25.3-1.el9_7.noarch.rpm \
# https://repo.almalinux.org/almalinux/9/AppStream/x86_64/os/Packages/golang-race-1.25.3-1.el9_7.x86_64.rpm
# - name: RPM build MetricCollector - name: RPM build MetricCollector
# id: rpmbuild id: rpmbuild
# run: | run: |
# git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
# make RPM make RPM
# # #
# # Build on Ubuntu 22.04 using official go package # Build on Red Hat Universal Base Image (UBI 9) using go-toolset
# # #
# Ubuntu-jammy-build: UBI-9-RPM-build:
# runs-on: ubuntu-latest runs-on: ubuntu-latest
# container: ubuntu:22.04 # See: https://catalog.redhat.com/en/search?searchType=Containers&q=Red+Hat+Universal+Base+Image+9
# https://hub.docker.com/r/redhat/ubi9
container: redhat/ubi9
# The job outputs link to the outputs of the 'rpmbuild' step
steps:
# steps: # Use dnf to install development packages
# # Use apt to install development packages - name: Install development packages
# - name: Install development packages run: dnf --assumeyes --disableplugin=subscription-manager install rpm-build go-srpm-macros gcc make python39 git wget openssl-devel diffutils delve
# run: |
# apt update && apt --assume-yes upgrade
# apt --assume-yes install build-essential sed git wget bash
# # Checkout git repository and submodules
# # fetch-depth must be 0 to use git describe
# # See: https://github.com/marketplace/actions/checkout
# - name: Checkout
# uses: actions/checkout@v4
# with:
# submodules: recursive
# fetch-depth: 0
# # Use official golang package
# # See: https://github.com/marketplace/actions/setup-go-environment
# - name: Setup Golang
# uses: actions/setup-go@v5
# with:
# go-version: 'stable'
# - name: DEB build MetricCollector # Checkout git repository and submodules
# id: dpkg-build # fetch-depth must be 0 to use git describe
# run: | # See: https://github.com/marketplace/actions/checkout
# export PATH=/usr/local/go/bin:/usr/local/go/pkg/tool/linux_amd64:$PATH - name: Checkout
# make DEB uses: actions/checkout@v6
with:
submodules: recursive
fetch-depth: 0
# # - name: Setup Golang
# # Build on Ubuntu 24.04 using official go package run: |
# # dnf --assumeyes --disableplugin=subscription-manager --enablerepo ubi-9-appstream-rpms install go-toolset
# Ubuntu-noblenumbat-build:
# runs-on: ubuntu-latest
# container: ubuntu:24.04
# steps: - name: RPM build MetricCollector
# # Use apt to install development packages id: rpmbuild
# - name: Install development packages run: |
# run: | git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
# apt update && apt --assume-yes upgrade make RPM
# apt --assume-yes install build-essential sed git wget bash
# # Checkout git repository and submodules
# # fetch-depth must be 0 to use git describe
# # See: https://github.com/marketplace/actions/checkout
# - name: Checkout
# uses: actions/checkout@v4
# with:
# submodules: recursive
# fetch-depth: 0
# # Use official golang package
# # See: https://github.com/marketplace/actions/setup-go-environment
# - name: Setup Golang
# uses: actions/setup-go@v5
# with:
# go-version: 'stable'
# - name: DEB build MetricCollector #
# id: dpkg-build # Build on Red Hat Universal Base Image (UBI 10) using go-toolset
# run: | #
# export PATH=/usr/local/go/bin:/usr/local/go/pkg/tool/linux_amd64:$PATH UBI-10-RPM-build:
# make DEB runs-on: ubuntu-latest
# See: https://catalog.redhat.com/en/search?searchType=Containers&q=Red+Hat+Universal+Base+Image+10
# https://hub.docker.com/r/redhat/ubi10
container: redhat/ubi10
# The job outputs link to the outputs of the 'rpmbuild' step
steps:
# Use dnf to install development packages
- name: Install development packages
run: dnf --assumeyes --disableplugin=subscription-manager install rpm-build go-srpm-macros gcc make python3 git wget openssl-devel diffutils delve
# Checkout git repository and submodules
# fetch-depth must be 0 to use git describe
# See: https://github.com/marketplace/actions/checkout
- name: Checkout
uses: actions/checkout@v6
with:
submodules: recursive
fetch-depth: 0
- name: Setup Golang
run: |
dnf --assumeyes --disableplugin=subscription-manager --enablerepo ubi-10-for-x86_64-appstream-rpms install go-toolset
- name: RPM build MetricCollector
id: rpmbuild
run: |
git config --global --add safe.directory /__w/cc-metric-collector/cc-metric-collector
make RPM
#
# Build on Ubuntu 22.04 using official go package
#
Ubuntu-jammy-build:
runs-on: ubuntu-latest
container: ubuntu:22.04
steps:
# Use apt to install development packages
- name: Install development packages
run: |
apt update && apt --assume-yes upgrade
apt --assume-yes install build-essential sed git wget bash
# Checkout git repository and submodules
# fetch-depth must be 0 to use git describe
# See: https://github.com/marketplace/actions/checkout
- name: Checkout
uses: actions/checkout@v6
with:
submodules: recursive
fetch-depth: 0
# Use official golang package
# See: https://github.com/marketplace/actions/setup-go-environment
- name: Setup Golang
uses: actions/setup-go@v6
with:
go-version: 'stable'
- name: DEB build MetricCollector
id: dpkg-build
run: |
export PATH=/usr/local/go/bin:/usr/local/go/pkg/tool/linux_amd64:$PATH
make DEB
#
# Build on Ubuntu 24.04 using official go package
#
Ubuntu-noblenumbat-build:
runs-on: ubuntu-latest
container: ubuntu:24.04
steps:
# Use apt to install development packages
- name: Install development packages
run: |
apt update && apt --assume-yes upgrade
apt --assume-yes install build-essential sed git wget bash
# Checkout git repository and submodules
# fetch-depth must be 0 to use git describe
# See: https://github.com/marketplace/actions/checkout
- name: Checkout
uses: actions/checkout@v6
with:
submodules: recursive
fetch-depth: 0
# Use official golang package
# See: https://github.com/marketplace/actions/setup-go-environment
- name: Setup Golang
uses: actions/setup-go@v6
with:
go-version: 'stable'
- name: DEB build MetricCollector
id: dpkg-build
run: |
export PATH=/usr/local/go/bin:/usr/local/go/pkg/tool/linux_amd64:$PATH
make DEB

.gitignore

@@ -1,4 +1,5 @@
# Binaries for programs and plugins # Binaries for programs and plugins
/cc-metric-collector
*.exe *.exe
*.exe~ *.exe~
*.dll *.dll
@@ -13,3 +14,6 @@
# Dependency directories (remove the comment below to include it) # Dependency directories (remove the comment below to include it)
# vendor/ # vendor/
# Local copy of LIKWID headers
/collectors/likwid


@@ -27,6 +27,17 @@ $(APP): $(GOSRC) go.mod
$(GOBIN) get $(GOBIN) get
$(GOBIN) build -o $(APP) $(GOSRC_APP) $(GOBIN) build -o $(APP) $(GOSRC_APP)
# -ldflags:
# -s : drops the OS symbol table
# -w : drops DWARF
# -> Panic stack traces still show function names and file:line
.PHONY: build-stripped
build-stripped:
make -C collectors
$(GOBIN) get
$(GOBIN) build -ldflags "-s -w" -trimpath -o $(APP) $(GOSRC_APP)
.PHONY: install
install: $(APP) install: $(APP)
@WORKSPACE=$(PREFIX) @WORKSPACE=$(PREFIX)
@if [ -z "$${WORKSPACE}" ]; then exit 1; fi @if [ -z "$${WORKSPACE}" ]; then exit 1; fi
@@ -58,12 +69,26 @@ fmt:
$(GOBIN) fmt $(GOSRC_APP) $(GOBIN) fmt $(GOSRC_APP)
@for F in $(GOSRC_INTERNAL); do $(GOBIN) fmt $$F; done @for F in $(GOSRC_INTERNAL); do $(GOBIN) fmt $$F; done
# gofumpt <https://github.com/mvdan/gofumpt>:
# Enforce a stricter format than gofmt
.PHONY: gofumpt
gofumpt:
$(GOBIN) install mvdan.cc/gofumpt@latest
gofumpt -w $(GOSRC_COLLECTORS)
gofumpt -w $(GOSRC_SINKS)
gofumpt -w $(GOSRC_RECEIVERS)
gofumpt -w $(GOSRC_APP)
@for F in $(GOSRC_INTERNAL); do gofumpt -w $$F; done
# Examine Go source code and reports suspicious constructs # Examine Go source code and reports suspicious constructs
.PHONY: vet .PHONY: vet
vet: vet:
$(GOBIN) vet ./... $(GOBIN) vet ./...
.PHONY: modernize
modernize:
$(GOBIN) run golang.org/x/tools/go/analysis/passes/modernize/cmd/modernize@latest ./...
# Run linter for the Go programming language. # Run linter for the Go programming language.
# Using static analysis, it finds bugs and performance issues, offers simplifications, and enforces style rules # Using static analysis, it finds bugs and performance issues, offers simplifications, and enforces style rules
@@ -72,6 +97,11 @@ staticcheck:
$(GOBIN) install honnef.co/go/tools/cmd/staticcheck@latest $(GOBIN) install honnef.co/go/tools/cmd/staticcheck@latest
$$($(GOBIN) env GOPATH)/bin/staticcheck ./... $$($(GOBIN) env GOPATH)/bin/staticcheck ./...
.PHONY: golangci-lint
golangci-lint:
$(GOBIN) install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@latest
$$($(GOBIN) env GOPATH)/bin/golangci-lint run --enable errorlint,govet,misspell,modernize,prealloc,staticcheck,unconvert,wastedassign
.ONESHELL: .ONESHELL:
.PHONY: RPM .PHONY: RPM
RPM: scripts/cc-metric-collector.spec RPM: scripts/cc-metric-collector.spec


@@ -36,9 +36,10 @@ There is a main configuration file with basic settings that point to the other c
"collectors-file" : "collectors.json", "collectors-file" : "collectors.json",
"receivers-file" : "receivers.json", "receivers-file" : "receivers.json",
"router-file" : "router.json", "router-file" : "router.json",
"startup-file": "startup.json", "main": {
"interval": "10s", "interval": "10s",
"duration": "1s" "duration": "1s"
}
} }
``` ```
@@ -50,7 +51,6 @@ See the component READMEs for their configuration:
* [`sinks`](https://github.com/ClusterCockpit/cc-lib/blob/main/sinks/README.md) * [`sinks`](https://github.com/ClusterCockpit/cc-lib/blob/main/sinks/README.md)
* [`receivers`](https://github.com/ClusterCockpit/cc-lib/blob/main/receivers/README.md) * [`receivers`](https://github.com/ClusterCockpit/cc-lib/blob/main/receivers/README.md)
* [`router`](./internal/metricRouter/README.md) * [`router`](./internal/metricRouter/README.md)
* [`startup`](https://github.com/ClusterCockpit/cc-lib/blob/main/ccStartup/README.md)
# Installation # Installation


@@ -8,24 +8,22 @@
package main package main
import ( import (
"bytes"
"encoding/json" "encoding/json"
"flag" "flag"
"os" "os"
"os/signal" "os/signal"
"syscall"
"github.com/ClusterCockpit/cc-lib/receivers"
"github.com/ClusterCockpit/cc-lib/sinks"
"github.com/ClusterCockpit/cc-metric-collector/collectors"
// "strings"
"sync" "sync"
"syscall"
"time" "time"
ccconf "github.com/ClusterCockpit/cc-lib/ccConfig" "github.com/ClusterCockpit/cc-lib/v2/receivers"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" "github.com/ClusterCockpit/cc-lib/v2/sinks"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" "github.com/ClusterCockpit/cc-metric-collector/collectors"
start "github.com/ClusterCockpit/cc-lib/ccStartup"
ccconf "github.com/ClusterCockpit/cc-lib/v2/ccConfig"
cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
mr "github.com/ClusterCockpit/cc-metric-collector/internal/metricRouter" mr "github.com/ClusterCockpit/cc-metric-collector/internal/metricRouter"
mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker" mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker"
) )
@@ -51,65 +49,25 @@ type RuntimeConfig struct {
Sync sync.WaitGroup Sync sync.WaitGroup
} }
//// Structure of the configuration file // ReadCli reads the command line arguments
//type GlobalConfig struct {
// Sink sinks.SinkConfig `json:"sink"`
// Interval int `json:"interval"`
// Duration int `json:"duration"`
// Collectors []string `json:"collectors"`
// Receiver receivers.ReceiverConfig `json:"receiver"`
// DefTags map[string]string `json:"default_tags"`
// CollectConfigs map[string]json.RawMessage `json:"collect_config"`
//}
//// Load JSON configuration file
//func LoadConfiguration(file string, config *GlobalConfig) error {
// configFile, err := os.Open(file)
// defer configFile.Close()
// if err != nil {
// fmt.Println(err.Error())
// return err
// }
// jsonParser := json.NewDecoder(configFile)
// err = jsonParser.Decode(config)
// return err
//}
func ReadCli() map[string]string { func ReadCli() map[string]string {
var m map[string]string
cfg := flag.String("config", "./config.json", "Path to configuration file") cfg := flag.String("config", "./config.json", "Path to configuration file")
logfile := flag.String("log", "stderr", "Path for logfile") logfile := flag.String("log", "stderr", "Path for logfile")
once := flag.Bool("once", false, "Run all collectors only once") once := flag.Bool("once", false, "Run all collectors only once")
loglevel := flag.String("loglevel", "info", "Set log level") loglevel := flag.String("loglevel", "info", "Set log level")
flag.Parse() flag.Parse()
m = make(map[string]string) m := map[string]string{
m["configfile"] = *cfg "configfile": *cfg,
m["logfile"] = *logfile "logfile": *logfile,
"once": "false",
"loglevel": *loglevel,
}
if *once { if *once {
m["once"] = "true" m["once"] = "true"
} else {
m["once"] = "false"
} }
m["loglevel"] = *loglevel
return m return m
} }
//func SetLogging(logfile string) error {
// var file *os.File
// var err error
// if logfile != "stderr" {
// file, err = os.OpenFile(logfile, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
// if err != nil {
// log.Fatal(err)
// return err
// }
// } else {
// file = os.Stderr
// }
// log.SetOutput(file)
// return nil
//}
// General shutdownHandler function that gets executed in case of interrupt or graceful shutdownHandler // General shutdownHandler function that gets executed in case of interrupt or graceful shutdownHandler
func shutdownHandler(config *RuntimeConfig, shutdownSignal chan os.Signal) { func shutdownHandler(config *RuntimeConfig, shutdownSignal chan os.Signal) {
defer config.Sync.Done() defer config.Sync.Done()
@@ -163,9 +121,10 @@ func mainFunc() int {
// Load and check configuration // Load and check configuration
main := ccconf.GetPackageConfig("main") main := ccconf.GetPackageConfig("main")
err = json.Unmarshal(main, &rcfg.ConfigFile) d := json.NewDecoder(bytes.NewReader(main))
if err != nil { d.DisallowUnknownFields()
cclog.Error("Error reading configuration file ", rcfg.CliArgs["configfile"], ": ", err.Error()) if err := d.Decode(&rcfg.ConfigFile); err != nil {
cclog.Errorf("Error reading configuration file %s: %v", rcfg.CliArgs["configfile"], err)
return 1 return 1
} }
@@ -217,19 +176,6 @@ func mainFunc() int {
return 1 return 1
} }
startupConf := ccconf.GetPackageConfig("startup")
if startupConf != nil && len(startupConf) > 0 {
err := start.CCStartup(startupConf)
if err != nil {
cclog.Errorf("Sending startup topology failed: %s", err.Error())
}
}
// Set log file
// if logfile := rcfg.CliArgs["logfile"]; logfile != "stderr" {
// cclog.SetOutput(logfile)
// }
// Creat new multi channel ticker // Creat new multi channel ticker
rcfg.MultiChanTicker = mct.NewTicker(rcfg.Interval) rcfg.MultiChanTicker = mct.NewTicker(rcfg.Interval)
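The hunk in `mainFunc()` above replaces `json.Unmarshal` with a decoder that calls `DisallowUnknownFields()`. The sketch below illustrates what that buys, assuming a hypothetical `mainConfig` struct with only `interval` and `duration` fields (the real type behind `rcfg.ConfigFile` is not shown in this diff) and a hypothetical helper `decodeStrict`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// mainConfig is a hypothetical stand-in for the "main" config section.
// The real struct in cc-metric-collector may have more fields.
type mainConfig struct {
	Interval string `json:"interval"`
	Duration string `json:"duration"`
}

// decodeStrict rejects JSON objects that contain keys mainConfig does not declare.
func decodeStrict(raw []byte) (mainConfig, error) {
	var cfg mainConfig
	d := json.NewDecoder(bytes.NewReader(raw))
	d.DisallowUnknownFields()
	if err := d.Decode(&cfg); err != nil {
		return mainConfig{}, fmt.Errorf("decoding main config: %w", err)
	}
	return cfg, nil
}

func main() {
	good := []byte(`{"interval": "10s", "duration": "1s"}`)
	typo := []byte(`{"interval": "10s", "duraton": "1s"}`)

	cfg, err := decodeStrict(good)
	fmt.Println(cfg.Interval, cfg.Duration, err)

	// json.Unmarshal would silently ignore the misspelled key;
	// the strict decoder reports it as an error instead.
	_, err = decodeStrict(typo)
	fmt.Println("typo rejected:", err != nil)
}
```

With plain `json.Unmarshal`, a misspelled option would leave the field at its zero value without any warning, which is presumably what the "Use strict JSON decoding" commit is meant to prevent.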


@@ -1,5 +1,5 @@
# LIKWID version # LIKWID version
LIKWID_VERSION := 5.4.1 LIKWID_VERSION := 5.5.1
LIKWID_INSTALLED_FOLDER := $(shell dirname $$(which likwid-topology 2>/dev/null) 2>/dev/null) LIKWID_INSTALLED_FOLDER := $(shell dirname $$(which likwid-topology 2>/dev/null) 2>/dev/null)
LIKWID_FOLDER := $(CURDIR)/likwid LIKWID_FOLDER := $(CURDIR)/likwid


@@ -59,6 +59,7 @@ In contrast to the configuration files for sinks and receivers, the collectors c
* [ ] Aggregate metrics to higher topology entity (sum hwthread metrics to socket metric, ...). Needs to be configurable * [ ] Aggregate metrics to higher topology entity (sum hwthread metrics to socket metric, ...). Needs to be configurable
# Contributing own collectors # Contributing own collectors
A collector reads data from any source, parses it to metrics and submits these metrics to the `metric-collector`. A collector provides three functions: A collector reads data from any source, parses it to metrics and submits these metrics to the `metric-collector`. A collector provides three functions:
* `Name() string`: Return the name of the collector * `Name() string`: Return the name of the collector
@@ -67,7 +68,7 @@ A collector reads data from any source, parses it to metrics and submits these m
* `Read(duration time.Duration, output chan ccMessage.CCMessage)`: Read, parse and submit data to the `output` channel as [`CCMessage`](https://github.com/ClusterCockpit/cc-lib/blob/main/ccMessage/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`. * `Read(duration time.Duration, output chan ccMessage.CCMessage)`: Read, parse and submit data to the `output` channel as [`CCMessage`](https://github.com/ClusterCockpit/cc-lib/blob/main/ccMessage/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`.
* `Close()`: Closes down the collector. * `Close()`: Closes down the collector.
It is recommanded to call `setup()` in the `Init()` function. It is recommended to call `setup()` in the `Init()` function.
Finally, the collector needs to be registered in the `collectorManager.go`. There is a list of collectors called `AvailableCollectors` which is a map (`collector_type_string` -> `pointer to MetricCollector interface`). Add a new entry with a descriptive name and the new collector. Finally, the collector needs to be registered in the `collectorManager.go`. There is a list of collectors called `AvailableCollectors` which is a map (`collector_type_string` -> `pointer to MetricCollector interface`). Add a new entry with a descriptive name and the new collector.
@@ -100,11 +101,14 @@ func (m *SampleCollector) Init(config json.RawMessage) error {
} }
m.name = "SampleCollector" m.name = "SampleCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.meta = map[string]string{"source": m.name, "group": "Sample"} m.meta = map[string]string{"source": m.name, "group": "Sample"}


@@ -17,25 +17,27 @@ import (
"os/exec" "os/exec"
"os/user" "os/user"
"regexp" "regexp"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const DEFAULT_BEEGFS_CMD = "beegfs-ctl" const DEFAULT_BEEGFS_CMD = "beegfs-ctl"
// Struct for the collector-specific JSON config // Struct for the collector-specific JSON config
type BeegfsMetaCollectorConfig struct { type BeegfsMetaCollectorConfig struct {
Beegfs string `json:"beegfs_path"` Beegfs string `json:"beegfs_path"`
ExcludeMetrics []string `json:"exclude_metrics,omitempty"` ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
ExcludeFilesystem []string `json:"exclude_filesystem"` ExcludeFilesystems []string `json:"exclude_filesystem"`
} }
type BeegfsMetaCollector struct { type BeegfsMetaCollector struct {
metricCollector metricCollector
tags map[string]string tags map[string]string
matches map[string]string matches map[string]string
config BeegfsMetaCollectorConfig config BeegfsMetaCollectorConfig
@@ -48,7 +50,7 @@ func (m *BeegfsMetaCollector) Init(config json.RawMessage) error {
return nil return nil
} }
// Metrics // Metrics
var nodeMdstat_array = [39]string{ nodeMdstat_array := [39]string{
"sum", "ack", "close", "entInf", "sum", "ack", "close", "entInf",
"fndOwn", "mkdir", "create", "rddir", "fndOwn", "mkdir", "create", "rddir",
"refrEn", "mdsInf", "rmdir", "rmLnk", "refrEn", "mdsInf", "rmdir", "rmLnk",
@@ -58,10 +60,13 @@ func (m *BeegfsMetaCollector) Init(config json.RawMessage) error {
"lookLI", "statLI", "revalLI", "openLI", "lookLI", "statLI", "revalLI", "openLI",
"createLI", "hardlnk", "flckAp", "flckEn", "createLI", "hardlnk", "flckAp", "flckEn",
"flckRg", "dirparent", "listXA", "getXA", "flckRg", "dirparent", "listXA", "getXA",
"rmXA", "setXA", "mirror"} "rmXA", "setXA", "mirror",
}
m.name = "BeegfsMetaCollector" m.name = "BeegfsMetaCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
// Set default beegfs-ctl binary // Set default beegfs-ctl binary
@@ -69,17 +74,17 @@ func (m *BeegfsMetaCollector) Init(config json.RawMessage) error {
// Read JSON configuration // Read JSON configuration
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Failed to decode JSON config: %w", m.name, err)
} }
} }
//create map with possible variables // Create map with possible variables
m.matches = make(map[string]string) m.matches = make(map[string]string)
for _, value := range nodeMdstat_array { for _, value := range nodeMdstat_array {
_, skip := stringArrayContains(m.config.ExcludeMetrics, value) if slices.Contains(m.config.ExcludeMetrics, value) {
if skip {
m.matches["other"] = "0" m.matches["other"] = "0"
} else { } else {
m.matches["beegfs_cmeta_"+value] = "0" m.matches["beegfs_cmeta_"+value] = "0"
@@ -95,23 +100,23 @@ func (m *BeegfsMetaCollector) Init(config json.RawMessage) error {
"filesystem": "", "filesystem": "",
} }
m.skipFS = make(map[string]struct{}) m.skipFS = make(map[string]struct{})
for _, fs := range m.config.ExcludeFilesystem { for _, fs := range m.config.ExcludeFilesystems {
m.skipFS[fs] = struct{}{} m.skipFS[fs] = struct{}{}
} }
// Beegfs file system statistics can only be queried by user root // Beegfs file system statistics can only be queried by user root
user, err := user.Current() user, err := user.Current()
if err != nil { if err != nil {
return fmt.Errorf("BeegfsMetaCollector.Init(): Failed to get current user: %v", err) return fmt.Errorf("%s Init(): Failed to get current user: %w", m.name, err)
} }
if user.Uid != "0" { if user.Uid != "0" {
return fmt.Errorf("BeegfsMetaCollector.Init(): BeeGFS file system statistics can only be queried by user root") return fmt.Errorf("%s Init(): BeeGFS file system statistics can only be queried by user root", m.name)
} }
// Check if beegfs-ctl is in executable search path // Check if beegfs-ctl is in executable search path
_, err = exec.LookPath(m.config.Beegfs) _, err = exec.LookPath(m.config.Beegfs)
if err != nil { if err != nil {
return fmt.Errorf("BeegfsMetaCollector.Init(): Failed to find beegfs-ctl binary '%s': %v", m.config.Beegfs, err) return fmt.Errorf("%s Init(): Failed to find beegfs-ctl binary '%s': %w", m.name, m.config.Beegfs, err)
} }
m.init = true m.init = true
return nil return nil
@@ -121,7 +126,7 @@ func (m *BeegfsMetaCollector) Read(interval time.Duration, output chan lp.CCMess
if !m.init { if !m.init {
return return
} }
//get mounpoint // Get mounpoint
buffer, _ := os.ReadFile(string("/proc/mounts")) buffer, _ := os.ReadFile(string("/proc/mounts"))
mounts := strings.Split(string(buffer), "\n") mounts := strings.Split(string(buffer), "\n")
var mountpoints []string var mountpoints []string
@@ -151,7 +156,6 @@ func (m *BeegfsMetaCollector) Read(interval time.Duration, output chan lp.CCMess
// --nodetype=meta: The node type to query (meta, storage). // --nodetype=meta: The node type to query (meta, storage).
// --interval: // --interval:
// --mount=/mnt/beeond/: Which mount point // --mount=/mnt/beeond/: Which mount point
//cmd := exec.Command(m.config.Beegfs, "/root/mc/test.txt")
mountoption := "--mount=" + mountpoint mountoption := "--mount=" + mountpoint
cmd := exec.Command(m.config.Beegfs, "--clientstats", cmd := exec.Command(m.config.Beegfs, "--clientstats",
"--nodetype=meta", mountoption, "--allstats") "--nodetype=meta", mountoption, "--allstats")
@@ -162,26 +166,27 @@ func (m *BeegfsMetaCollector) Read(interval time.Duration, output chan lp.CCMess
cmd.Stderr = cmdStderr cmd.Stderr = cmdStderr
err := cmd.Run() err := cmd.Run()
if err != nil { if err != nil {
fmt.Fprintf(os.Stderr, "BeegfsMetaCollector.Read(): Failed to execute command \"%s\": %s\n", cmd.String(), err.Error()) dataStdErr, _ := io.ReadAll(cmdStderr)
fmt.Fprintf(os.Stderr, "BeegfsMetaCollector.Read(): command exit code: \"%d\"\n", cmd.ProcessState.ExitCode()) dataStdOut, _ := io.ReadAll(cmdStdout)
data, _ := io.ReadAll(cmdStderr) cclog.ComponentError(
fmt.Fprintf(os.Stderr, "BeegfsMetaCollector.Read(): command stderr: \"%s\"\n", string(data)) m.name,
data, _ = io.ReadAll(cmdStdout) fmt.Sprintf("Read(): Failed to execute command \"%s\": %v\n", cmd.String(), err),
fmt.Fprintf(os.Stderr, "BeegfsMetaCollector.Read(): command stdout: \"%s\"\n", string(data)) fmt.Sprintf("Read(): command exit code: \"%d\"\n", cmd.ProcessState.ExitCode()),
fmt.Sprintf("Read(): command stderr: \"%s\"\n", string(dataStdErr)),
fmt.Sprintf("Read(): command stdout: \"%s\"\n", string(dataStdOut)),
)
return return
} }
// Read I/O statistics // Read I/O statistics
scanner := bufio.NewScanner(cmdStdout) scanner := bufio.NewScanner(cmdStdout)
sumLine := regexp.MustCompile(`^Sum:\s+\d+\s+\[[a-zA-Z]+\]+`) sumLine := regexp.MustCompile(`^Sum:\s+\d+\s+\[[a-zA-Z]+\]+`)
//Line := regexp.MustCompile(`^(.*)\s+(\d)+\s+\[([a-zA-Z]+)\]+`)
statsLine := regexp.MustCompile(`^(.*?)\s+?(\d.*?)$`) statsLine := regexp.MustCompile(`^(.*?)\s+?(\d.*?)$`)
singleSpacePattern := regexp.MustCompile(`\s+`) singleSpacePattern := regexp.MustCompile(`\s+`)
removePattern := regexp.MustCompile(`[\[|\]]`) removePattern := regexp.MustCompile(`[\[|\]]`)
for scanner.Scan() { for scanner.Scan() {
readLine := scanner.Text() readLine := scanner.Text()
//fmt.Println(readLine)
// Jump few lines, we only want the I/O stats from nodes // Jump few lines, we only want the I/O stats from nodes
if !sumLine.MatchString(readLine) { if !sumLine.MatchString(readLine) {
continue continue
@@ -190,7 +195,7 @@ func (m *BeegfsMetaCollector) Read(interval time.Duration, output chan lp.CCMess
match := statsLine.FindStringSubmatch(readLine) match := statsLine.FindStringSubmatch(readLine)
// nodeName = "Sum:" or would be nodes // nodeName = "Sum:" or would be nodes
// nodeName := match[1] // nodeName := match[1]
//Remove multiple whitespaces // Remove multiple whitespaces
dummy := removePattern.ReplaceAllString(match[2], " ") dummy := removePattern.ReplaceAllString(match[2], " ")
metaStats := strings.TrimSpace(singleSpacePattern.ReplaceAllString(dummy, " ")) metaStats := strings.TrimSpace(singleSpacePattern.ReplaceAllString(dummy, " "))
split := strings.Split(metaStats, " ") split := strings.Split(metaStats, " ")
@@ -216,14 +221,13 @@ func (m *BeegfsMetaCollector) Read(interval time.Duration, output chan lp.CCMess
fmt.Sprintf("Metric (other): Failed to convert str written '%s' to float: %v", m.matches["other"], err)) fmt.Sprintf("Metric (other): Failed to convert str written '%s' to float: %v", m.matches["other"], err))
continue continue
} }
//mdStat["other"] = fmt.Sprintf("%f", f1+f2)
m.matches["beegfs_cstorage_other"] = fmt.Sprintf("%f", f1+f2) m.matches["beegfs_cstorage_other"] = fmt.Sprintf("%f", f1+f2)
} }
} }
for key, data := range m.matches { for key, data := range m.matches {
value, _ := strconv.ParseFloat(data, 32) value, _ := strconv.ParseFloat(data, 32)
y, err := lp.NewMessage(key, m.tags, m.meta, map[string]interface{}{"value": value}, time.Now()) y, err := lp.NewMessage(key, m.tags, m.meta, map[string]any{"value": value}, time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }


@@ -17,23 +17,25 @@ import (
"os/exec" "os/exec"
"os/user" "os/user"
"regexp" "regexp"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
// Struct for the collector-specific JSON config // Struct for the collector-specific JSON config
type BeegfsStorageCollectorConfig struct { type BeegfsStorageCollectorConfig struct {
Beegfs string `json:"beegfs_path"` Beegfs string `json:"beegfs_path"`
ExcludeMetrics []string `json:"exclude_metrics,omitempty"` ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
ExcludeFilesystem []string `json:"exclude_filesystem"` ExcludeFilesystems []string `json:"exclude_filesystem"`
} }
type BeegfsStorageCollector struct { type BeegfsStorageCollector struct {
metricCollector metricCollector
tags map[string]string tags map[string]string
matches map[string]string matches map[string]string
config BeegfsStorageCollectorConfig config BeegfsStorageCollectorConfig
@@ -46,15 +48,18 @@ func (m *BeegfsStorageCollector) Init(config json.RawMessage) error {
return nil return nil
} }
// Metrics // Metrics
var storageStat_array = [18]string{ storageStat_array := [18]string{
"sum", "ack", "sChDrct", "getFSize", "sum", "ack", "sChDrct", "getFSize",
"sAttr", "statfs", "trunc", "close", "sAttr", "statfs", "trunc", "close",
"fsync", "ops-rd", "MiB-rd/s", "ops-wr", "fsync", "ops-rd", "MiB-rd/s", "ops-wr",
"MiB-wr/s", "gendbg", "hrtbeat", "remNode", "MiB-wr/s", "gendbg", "hrtbeat", "remNode",
"storInf", "unlnk"} "storInf", "unlnk",
}
m.name = "BeegfsStorageCollector" m.name = "BeegfsStorageCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
// Set default beegfs-ctl binary // Set default beegfs-ctl binary
@@ -62,17 +67,17 @@ func (m *BeegfsStorageCollector) Init(config json.RawMessage) error {
// Read JSON configuration // Read JSON configuration
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
println(m.config.Beegfs)
//create map with possible variables // Create map with possible variables
m.matches = make(map[string]string) m.matches = make(map[string]string)
for _, value := range storageStat_array { for _, value := range storageStat_array {
_, skip := stringArrayContains(m.config.ExcludeMetrics, value) if slices.Contains(m.config.ExcludeMetrics, value) {
if skip {
m.matches["other"] = "0" m.matches["other"] = "0"
} else { } else {
m.matches["beegfs_cstorage_"+value] = "0" m.matches["beegfs_cstorage_"+value] = "0"
@@ -88,23 +93,23 @@ func (m *BeegfsStorageCollector) Init(config json.RawMessage) error {
"filesystem": "", "filesystem": "",
} }
m.skipFS = make(map[string]struct{}) m.skipFS = make(map[string]struct{})
for _, fs := range m.config.ExcludeFilesystem { for _, fs := range m.config.ExcludeFilesystems {
m.skipFS[fs] = struct{}{} m.skipFS[fs] = struct{}{}
} }
// Beegfs file system statistics can only be queried by user root // Beegfs file system statistics can only be queried by user root
user, err := user.Current() user, err := user.Current()
if err != nil { if err != nil {
return fmt.Errorf("BeegfsStorageCollector.Init(): Failed to get current user: %v", err) return fmt.Errorf("%s Init(): Failed to get current user: %w", m.name, err)
} }
if user.Uid != "0" { if user.Uid != "0" {
return fmt.Errorf("BeegfsStorageCollector.Init(): BeeGFS file system statistics can only be queried by user root") return fmt.Errorf("%s Init(): BeeGFS file system statistics can only be queried by user root", m.name)
} }
// Check if beegfs-ctl is in executable search path // Check if beegfs-ctl is in executable search path
_, err = exec.LookPath(m.config.Beegfs) _, err = exec.LookPath(m.config.Beegfs)
if err != nil { if err != nil {
return fmt.Errorf("BeegfsStorageCollector.Init(): Failed to find beegfs-ctl binary '%s': %v", m.config.Beegfs, err) return fmt.Errorf("%s Init(): Failed to find beegfs-ctl binary '%s': %w", m.name, m.config.Beegfs, err)
} }
m.init = true m.init = true
return nil return nil
@@ -114,11 +119,10 @@ func (m *BeegfsStorageCollector) Read(interval time.Duration, output chan lp.CCM
if !m.init { if !m.init {
return return
} }
//get mounpoint // Get mountpoint
buffer, _ := os.ReadFile(string("/proc/mounts")) buffer, _ := os.ReadFile("/proc/mounts")
mounts := strings.Split(string(buffer), "\n")
var mountpoints []string var mountpoints []string
for _, line := range mounts { for line := range strings.Lines(string(buffer)) {
if len(line) == 0 { if len(line) == 0 {
continue continue
} }
@@ -143,7 +147,6 @@ func (m *BeegfsStorageCollector) Read(interval time.Duration, output chan lp.CCM
// --nodetype=meta: The node type to query (meta, storage). // --nodetype=meta: The node type to query (meta, storage).
// --interval: // --interval:
// --mount=/mnt/beeond/: Which mount point // --mount=/mnt/beeond/: Which mount point
//cmd := exec.Command(m.config.Beegfs, "/root/mc/test.txt")
mountoption := "--mount=" + mountpoint mountoption := "--mount=" + mountpoint
cmd := exec.Command(m.config.Beegfs, "--clientstats", cmd := exec.Command(m.config.Beegfs, "--clientstats",
"--nodetype=storage", mountoption, "--allstats") "--nodetype=storage", mountoption, "--allstats")
@@ -154,26 +157,27 @@ func (m *BeegfsStorageCollector) Read(interval time.Duration, output chan lp.CCM
cmd.Stderr = cmdStderr cmd.Stderr = cmdStderr
err := cmd.Run() err := cmd.Run()
if err != nil { if err != nil {
fmt.Fprintf(os.Stderr, "BeegfsStorageCollector.Read(): Failed to execute command \"%s\": %s\n", cmd.String(), err.Error()) dataStdErr, _ := io.ReadAll(cmdStderr)
fmt.Fprintf(os.Stderr, "BeegfsStorageCollector.Read(): command exit code: \"%d\"\n", cmd.ProcessState.ExitCode()) dataStdOut, _ := io.ReadAll(cmdStdout)
data, _ := io.ReadAll(cmdStderr) cclog.ComponentError(
fmt.Fprintf(os.Stderr, "BeegfsStorageCollector.Read(): command stderr: \"%s\"\n", string(data)) m.name,
data, _ = io.ReadAll(cmdStdout) fmt.Sprintf("Read(): Failed to execute command \"%s\": %v\n", cmd.String(), err),
fmt.Fprintf(os.Stderr, "BeegfsStorageCollector.Read(): command stdout: \"%s\"\n", string(data)) fmt.Sprintf("Read(): command exit code: \"%d\"\n", cmd.ProcessState.ExitCode()),
fmt.Sprintf("Read(): command stderr: \"%s\"\n", string(dataStdErr)),
fmt.Sprintf("Read(): command stdout: \"%s\"\n", string(dataStdOut)),
)
return return
} }
// Read I/O statistics // Read I/O statistics
scanner := bufio.NewScanner(cmdStdout) scanner := bufio.NewScanner(cmdStdout)
sumLine := regexp.MustCompile(`^Sum:\s+\d+\s+\[[a-zA-Z]+\]+`) sumLine := regexp.MustCompile(`^Sum:\s+\d+\s+\[[a-zA-Z]+\]+`)
//Line := regexp.MustCompile(`^(.*)\s+(\d)+\s+\[([a-zA-Z]+)\]+`)
statsLine := regexp.MustCompile(`^(.*?)\s+?(\d.*?)$`) statsLine := regexp.MustCompile(`^(.*?)\s+?(\d.*?)$`)
singleSpacePattern := regexp.MustCompile(`\s+`) singleSpacePattern := regexp.MustCompile(`\s+`)
removePattern := regexp.MustCompile(`[\[|\]]`) removePattern := regexp.MustCompile(`[\[|\]]`)
for scanner.Scan() { for scanner.Scan() {
readLine := scanner.Text() readLine := scanner.Text()
//fmt.Println(readLine)
// Jump few lines, we only want the I/O stats from nodes // Jump few lines, we only want the I/O stats from nodes
if !sumLine.MatchString(readLine) { if !sumLine.MatchString(readLine) {
continue continue
@@ -182,7 +186,7 @@ func (m *BeegfsStorageCollector) Read(interval time.Duration, output chan lp.CCM
match := statsLine.FindStringSubmatch(readLine) match := statsLine.FindStringSubmatch(readLine)
// nodeName = "Sum:" or would be nodes // nodeName = "Sum:" or would be nodes
// nodeName := match[1] // nodeName := match[1]
//Remove multiple whitespaces // Remove multiple whitespaces
dummy := removePattern.ReplaceAllString(match[2], " ") dummy := removePattern.ReplaceAllString(match[2], " ")
metaStats := strings.TrimSpace(singleSpacePattern.ReplaceAllString(dummy, " ")) metaStats := strings.TrimSpace(singleSpacePattern.ReplaceAllString(dummy, " "))
split := strings.Split(metaStats, " ") split := strings.Split(metaStats, " ")
@@ -193,7 +197,6 @@ func (m *BeegfsStorageCollector) Read(interval time.Duration, output chan lp.CCM
for i := 0; i <= len(split)-1; i += 2 { for i := 0; i <= len(split)-1; i += 2 {
if _, ok := m.matches[split[i+1]]; ok { if _, ok := m.matches[split[i+1]]; ok {
m.matches["beegfs_cstorage_"+split[i+1]] = split[i] m.matches["beegfs_cstorage_"+split[i+1]] = split[i]
//m.matches[split[i+1]] = split[i]
} else { } else {
f1, err := strconv.ParseFloat(m.matches["other"], 32) f1, err := strconv.ParseFloat(m.matches["other"], 32)
if err != nil { if err != nil {
@@ -215,7 +218,7 @@ func (m *BeegfsStorageCollector) Read(interval time.Duration, output chan lp.CCM
for key, data := range m.matches { for key, data := range m.matches {
value, _ := strconv.ParseFloat(data, 32) value, _ := strconv.ParseFloat(data, 32)
y, err := lp.NewMessage(key, m.tags, m.meta, map[string]interface{}{"value": value}, time.Now()) y, err := lp.NewMessage(key, m.tags, m.meta, map[string]any{"value": value}, time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }


@@ -14,14 +14,14 @@ This Collector is to collect BeeGFS on Demand (BeeOND) storage stats.
```json ```json
"beegfs_storage": { "beegfs_storage": {
"beegfs_path": "/usr/bin/beegfs-ctl", "beegfs_path": "/usr/bin/beegfs-ctl",
"exclude_filesystem": [ "exclude_filesystem": [
"/mnt/ignore_me" "/mnt/ignore_me"
], ],
"exclude_metrics": [ "exclude_metrics": [
"ack", "ack",
"storInf", "storInf",
"unlnk" "unlnk"
] ]
} }
``` ```
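The strict decoding this change set introduces can be tried against a configuration like the one above. The following is a minimal sketch, not the collector's actual code: the type name `storageConfig` and the helper `decodeStrict` are hypothetical, and only the JSON keys and the `DisallowUnknownFields` call are taken from the diff.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// storageConfig mirrors the documented keys of the beegfs_storage section.
// The type name is hypothetical and used for illustration only.
type storageConfig struct {
	Beegfs             string   `json:"beegfs_path"`
	ExcludeMetrics     []string `json:"exclude_metrics,omitempty"`
	ExcludeFilesystems []string `json:"exclude_filesystem"`
}

// decodeStrict decodes raw into a storageConfig and returns an error
// if raw contains a key that has no matching struct field.
func decodeStrict(raw []byte) (storageConfig, error) {
	var cfg storageConfig
	d := json.NewDecoder(bytes.NewReader(raw))
	d.DisallowUnknownFields()
	if err := d.Decode(&cfg); err != nil {
		return cfg, fmt.Errorf("failed to decode JSON config: %w", err)
	}
	return cfg, nil
}

func main() {
	good := []byte(`{"beegfs_path": "/usr/bin/beegfs-ctl", "exclude_metrics": ["ack"]}`)
	// Misspelled key: "exclude_filesystems" instead of "exclude_filesystem".
	bad := []byte(`{"beegfs_path": "/usr/bin/beegfs-ctl", "exclude_filesystems": ["/mnt/ignore_me"]}`)

	if cfg, err := decodeStrict(good); err == nil {
		fmt.Println("accepted:", cfg.Beegfs)
	}
	if _, err := decodeStrict(bad); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

With plain `json.Unmarshal` the misspelled key would be silently ignored, so the file system would not be excluded. With strict decoding the typo surfaces as an initialization error.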


@@ -8,18 +8,19 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt"
"sync" "sync"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker" mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker"
) )
// Map of all available metric collectors // Map of all available metric collectors
var AvailableCollectors = map[string]MetricCollector{ var AvailableCollectors = map[string]MetricCollector{
"likwid": new(LikwidCollector), "likwid": new(LikwidCollector),
"loadavg": new(LoadavgCollector), "loadavg": new(LoadavgCollector),
"memstat": new(MemstatCollector), "memstat": new(MemstatCollector),
@@ -48,6 +49,7 @@ var AvailableCollectors = map[string]MetricCollector{
"schedstat": new(SchedstatCollector), "schedstat": new(SchedstatCollector),
"nfsiostat": new(NfsIOStatCollector), "nfsiostat": new(NfsIOStatCollector),
"slurm_cgroup": new(SlurmCgroupCollector), "slurm_cgroup": new(SlurmCgroupCollector),
"smartmon": new(SmartMonCollector),
} }
// Metric collector manager data structure // Metric collector manager data structure
@@ -88,10 +90,10 @@ func (cm *collectorManager) Init(ticker mct.MultiChanTicker, duration time.Durat
cm.ticker = ticker cm.ticker = ticker
cm.duration = duration cm.duration = duration
err := json.Unmarshal(collectConfig, &cm.config) d := json.NewDecoder(bytes.NewReader(collectConfig))
if err != nil { d.DisallowUnknownFields()
cclog.Error(err.Error()) if err := d.Decode(&cm.config); err != nil {
return err return fmt.Errorf("%s Init(): Error decoding collector manager config: %w", "CollectorManager", err)
} }
// Initialize configured collectors // Initialize configured collectors
@@ -102,9 +104,9 @@ func (cm *collectorManager) Init(ticker mct.MultiChanTicker, duration time.Durat
} }
collector := AvailableCollectors[collectorName] collector := AvailableCollectors[collectorName]
err = collector.Init(collectorCfg) err := collector.Init(collectorCfg)
if err != nil { if err != nil {
cclog.ComponentError("CollectorManager", "Collector", collectorName, "initialization failed:", err.Error()) cclog.ComponentError("CollectorManager", fmt.Sprintf("Collector %s initialization failed: %v", collectorName, err))
continue continue
} }
cclog.ComponentDebug("CollectorManager", "ADD COLLECTOR", collector.Name()) cclog.ComponentDebug("CollectorManager", "ADD COLLECTOR", collector.Name())
@@ -122,9 +124,7 @@ func (cm *collectorManager) Start() {
tick := make(chan time.Time) tick := make(chan time.Time)
cm.ticker.AddChannel(tick) cm.ticker.AddChannel(tick)
cm.wg.Add(1) cm.wg.Go(func() {
go func() {
defer cm.wg.Done()
// Collector manager is done // Collector manager is done
done := func() { done := func() {
// close all metric collectors // close all metric collectors
@@ -179,7 +179,7 @@ func (cm *collectorManager) Start() {
} }
} }
} }
}() })
// Collector manager is started // Collector manager is started
cclog.ComponentDebug("CollectorManager", "STARTED") cclog.ComponentDebug("CollectorManager", "STARTED")


@@ -10,15 +10,14 @@ package collectors
import ( import (
"bufio" "bufio"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
// CPUFreqCollector // CPUFreqCollector
@@ -32,18 +31,20 @@ type CPUFreqCpuInfoCollectorTopology struct {
type CPUFreqCpuInfoCollector struct { type CPUFreqCpuInfoCollector struct {
metricCollector metricCollector
topology []CPUFreqCpuInfoCollectorTopology topology []CPUFreqCpuInfoCollectorTopology
} }
func (m *CPUFreqCpuInfoCollector) Init(config json.RawMessage) error { func (m *CPUFreqCpuInfoCollector) Init(_ json.RawMessage) error {
// Check if already initialized // Check if already initialized
if m.init { if m.init {
return nil return nil
} }
m.setup()
m.name = "CPUFreqCpuInfoCollector" m.name = "CPUFreqCpuInfoCollector"
if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{ m.meta = map[string]string{
"source": m.name, "source": m.name,
@@ -54,9 +55,8 @@ func (m *CPUFreqCpuInfoCollector) Init(config json.RawMessage) error {
const cpuInfoFile = "/proc/cpuinfo" const cpuInfoFile = "/proc/cpuinfo"
file, err := os.Open(cpuInfoFile) file, err := os.Open(cpuInfoFile)
if err != nil { if err != nil {
return fmt.Errorf("failed to open file '%s': %v", cpuInfoFile, err) return fmt.Errorf("%s Init(): failed to open file '%s': %w", m.name, cpuInfoFile, err)
} }
defer file.Close()
// Collect topology information from file cpuinfo // Collect topology information from file cpuinfo
foundFreq := false foundFreq := false
@@ -117,9 +117,13 @@ func (m *CPUFreqCpuInfoCollector) Init(config json.RawMessage) error {
} }
} }
if err := file.Close(); err != nil {
return fmt.Errorf("%s Init(): Call to file.Close() failed: %w", m.name, err)
}
// Check if at least one CPU with frequency information was detected // Check if at least one CPU with frequency information was detected
if len(m.topology) == 0 { if len(m.topology) == 0 {
return fmt.Errorf("no CPU frequency info found in %s", cpuInfoFile) return fmt.Errorf("%s Init(): no CPU frequency info found in %s", m.name, cpuInfoFile)
} }
m.init = true m.init = true
@@ -140,7 +144,13 @@ func (m *CPUFreqCpuInfoCollector) Read(interval time.Duration, output chan lp.CC
fmt.Sprintf("Read(): Failed to open file '%s': %v", cpuInfoFile, err)) fmt.Sprintf("Read(): Failed to open file '%s': %v", cpuInfoFile, err))
return return
} }
defer file.Close() defer func() {
if err := file.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to close file '%s': %v", cpuInfoFile, err))
}
}()
processorCounter := 0 processorCounter := 0
now := time.Now() now := time.Now()
@@ -161,7 +171,7 @@ func (m *CPUFreqCpuInfoCollector) Read(interval time.Duration, output chan lp.CC
fmt.Sprintf("Read(): Failed to convert cpu MHz '%s' to float64: %v", lineSplit[1], err)) fmt.Sprintf("Read(): Failed to convert cpu MHz '%s' to float64: %v", lineSplit[1], err))
return return
} }
if y, err := lp.NewMessage("cpufreq", t.tagSet, m.meta, map[string]interface{}{"value": value}, now); err == nil { if y, err := lp.NewMessage("cpufreq", t.tagSet, m.meta, map[string]any{"value": value}, now); err == nil {
output <- y output <- y
} }
} }


@@ -12,7 +12,9 @@ hugo_path: docs/reference/cc-metric-collector/collectors/cpufreq_cpuinfo.md
## `cpufreq_cpuinfo` collector ## `cpufreq_cpuinfo` collector
```json ```json
"cpufreq_cpuinfo": {} "cpufreq_cpuinfo": {
"exclude_metrics": []
}
``` ```
The `cpufreq_cpuinfo` collector reads the clock frequency from `/proc/cpuinfo` and outputs a handful of **hwthread** metrics. The `cpufreq_cpuinfo` collector reads the clock frequency from `/proc/cpuinfo` and outputs a handful of **hwthread** metrics.
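How a clock frequency can be pulled out of `/proc/cpuinfo` text is sketched below. This is an illustration under assumptions, not the collector's implementation: the helper `parseCPUMHz` is hypothetical, and it only handles the `cpu MHz` key that the collector's error messages mention.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMHz extracts all "cpu MHz" values from text in /proc/cpuinfo format.
// Hypothetical helper for illustration; one value is returned per processor entry.
func parseCPUMHz(cpuinfo string) ([]float64, error) {
	var freqs []float64
	scanner := bufio.NewScanner(strings.NewReader(cpuinfo))
	for scanner.Scan() {
		// Lines have the form "key<tabs>: value".
		key, value, found := strings.Cut(scanner.Text(), ":")
		if !found || strings.TrimSpace(key) != "cpu MHz" {
			continue
		}
		f, err := strconv.ParseFloat(strings.TrimSpace(value), 64)
		if err != nil {
			return nil, fmt.Errorf("failed to convert cpu MHz %q to float64: %w", value, err)
		}
		freqs = append(freqs, f)
	}
	return freqs, scanner.Err()
}

func main() {
	sample := "processor\t: 0\ncpu MHz\t\t: 2400.000\nprocessor\t: 1\ncpu MHz\t\t: 1800.500\n"
	freqs, err := parseCPUMHz(sample)
	fmt.Println(freqs, err)
}
```

On a real system the input would come from reading `/proc/cpuinfo`. The sample string keeps the sketch runnable anywhere.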


@@ -8,6 +8,7 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
@@ -16,8 +17,8 @@ import (
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
"github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology" "github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology"
"golang.org/x/sys/unix" "golang.org/x/sys/unix"
) )
@@ -35,6 +36,7 @@ type CPUFreqCollectorTopology struct {
// See: https://www.kernel.org/doc/html/latest/admin-guide/pm/cpufreq.html // See: https://www.kernel.org/doc/html/latest/admin-guide/pm/cpufreq.html
type CPUFreqCollector struct { type CPUFreqCollector struct {
metricCollector metricCollector
topology []CPUFreqCollectorTopology topology []CPUFreqCollectorTopology
config struct { config struct {
ExcludeMetrics []string `json:"exclude_metrics,omitempty"` ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
@@ -48,12 +50,15 @@ func (m *CPUFreqCollector) Init(config json.RawMessage) error {
} }
m.name = "CPUFreqCollector" m.name = "CPUFreqCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
m.meta = map[string]string{ m.meta = map[string]string{
@@ -74,15 +79,15 @@ func (m *CPUFreqCollector) Init(config json.RawMessage) error {
scalingCurFreqFile := filepath.Join("/sys/devices/system/cpu", fmt.Sprintf("cpu%d", c.CpuID), "cpufreq/scaling_cur_freq") scalingCurFreqFile := filepath.Join("/sys/devices/system/cpu", fmt.Sprintf("cpu%d", c.CpuID), "cpufreq/scaling_cur_freq")
err := unix.Access(scalingCurFreqFile, unix.R_OK) err := unix.Access(scalingCurFreqFile, unix.R_OK)
if err != nil { if err != nil {
return fmt.Errorf("unable to access file '%s': %v", scalingCurFreqFile, err) return fmt.Errorf("%s Init(): unable to access file '%s': %w", m.name, scalingCurFreqFile, err)
} }
m.topology = append(m.topology, m.topology = append(m.topology,
CPUFreqCollectorTopology{ CPUFreqCollectorTopology{
tagSet: map[string]string{ tagSet: map[string]string{
"type": "hwthread", "type": "hwthread",
"type-id": fmt.Sprint(c.CpuID), "type-id": strconv.Itoa(c.CpuID),
"package_id": fmt.Sprint(c.Socket), "package_id": strconv.Itoa(c.Socket),
}, },
scalingCurFreqFile: scalingCurFreqFile, scalingCurFreqFile: scalingCurFreqFile,
}, },
@@ -124,7 +129,7 @@ func (m *CPUFreqCollector) Read(interval time.Duration, output chan lp.CCMessage
continue continue
} }
if y, err := lp.NewMessage("cpufreq", t.tagSet, m.meta, map[string]interface{}{"value": cpuFreq}, now); err == nil { if y, err := lp.NewMessage("cpufreq", t.tagSet, m.meta, map[string]any{"value": cpuFreq}, now); err == nil {
output <- y output <- y
} }
} }


@@ -9,15 +9,17 @@ package collectors
import ( import (
"bufio" "bufio"
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
sysconf "github.com/tklauser/go-sysconf" sysconf "github.com/tklauser/go-sysconf"
) )
@@ -29,6 +31,7 @@ type CpustatCollectorConfig struct {
type CpustatCollector struct { type CpustatCollector struct {
metricCollector metricCollector
config CpustatCollectorConfig config CpustatCollectorConfig
lastTimestamp time.Time // Store time stamp of last tick to derive values lastTimestamp time.Time // Store time stamp of last tick to derive values
matches map[string]int matches map[string]int
@@ -39,14 +42,22 @@ type CpustatCollector struct {
func (m *CpustatCollector) Init(config json.RawMessage) error { func (m *CpustatCollector) Init(config json.RawMessage) error {
m.name = "CpustatCollector" m.name = "CpustatCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{"source": m.name, "group": "CPU"} m.meta = map[string]string{
m.nodetags = map[string]string{"type": "node"} "source": m.name,
"group": "CPU",
}
m.nodetags = map[string]string{
"type": "node",
}
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
matches := map[string]int{ matches := map[string]int{
@@ -64,24 +75,16 @@ func (m *CpustatCollector) Init(config json.RawMessage) error {
m.matches = make(map[string]int) m.matches = make(map[string]int)
for match, index := range matches { for match, index := range matches {
doExclude := false if !slices.Contains(m.config.ExcludeMetrics, match) {
for _, exclude := range m.config.ExcludeMetrics {
if match == exclude {
doExclude = true
break
}
}
if !doExclude {
m.matches[match] = index m.matches[match] = index
} }
} }
// Check input file // Check input file
file, err := os.Open(string(CPUSTATFILE)) file, err := os.Open(CPUSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) return fmt.Errorf("%s Init(): Failed to open file '%s': %w", m.name, CPUSTATFILE, err)
} }
defer file.Close()
// Pre-generate tags for all CPUs // Pre-generate tags for all CPUs
num_cpus := 0 num_cpus := 0
@@ -99,7 +102,10 @@ func (m *CpustatCollector) Init(config json.RawMessage) error {
} else if strings.HasPrefix(linefields[0], "cpu") && strings.Compare(linefields[0], "cpu") != 0 { } else if strings.HasPrefix(linefields[0], "cpu") && strings.Compare(linefields[0], "cpu") != 0 {
cpustr := strings.TrimLeft(linefields[0], "cpu") cpustr := strings.TrimLeft(linefields[0], "cpu")
cpu, _ := strconv.Atoi(cpustr) cpu, _ := strconv.Atoi(cpustr)
m.cputags[linefields[0]] = map[string]string{"type": "hwthread", "type-id": fmt.Sprintf("%d", cpu)} m.cputags[linefields[0]] = map[string]string{
"type": "hwthread",
"type-id": strconv.Itoa(cpu),
}
m.olddata[linefields[0]] = make(map[string]int64) m.olddata[linefields[0]] = make(map[string]int64)
for k, v := range m.matches { for k, v := range m.matches {
m.olddata[linefields[0]][k], _ = strconv.ParseInt(linefields[v], 0, 64) m.olddata[linefields[0]][k], _ = strconv.ParseInt(linefields[v], 0, 64)
@@ -107,6 +113,12 @@ func (m *CpustatCollector) Init(config json.RawMessage) error {
num_cpus++ num_cpus++
} }
} }
// Close file
if err := file.Close(); err != nil {
return fmt.Errorf("%s Init(): Failed to close file '%s': %w", m.name, CPUSTATFILE, err)
}
m.lastTimestamp = time.Now() m.lastTimestamp = time.Now()
m.init = true m.init = true
return nil return nil
@@ -129,7 +141,7 @@ func (m *CpustatCollector) parseStatLine(linefields []string, tags map[string]st
sum := float64(0) sum := float64(0)
for name, value := range values { for name, value := range values {
sum += value sum += value
y, err := lp.NewMessage(name, tags, m.meta, map[string]interface{}{"value": value * 100}, now) y, err := lp.NewMessage(name, tags, m.meta, map[string]any{"value": value * 100}, now)
if err == nil { if err == nil {
y.AddTag("unit", "Percent") y.AddTag("unit", "Percent")
output <- y output <- y
@@ -137,7 +149,7 @@ func (m *CpustatCollector) parseStatLine(linefields []string, tags map[string]st
} }
if v, ok := values["cpu_idle"]; ok { if v, ok := values["cpu_idle"]; ok {
sum -= v sum -= v
y, err := lp.NewMessage("cpu_used", tags, m.meta, map[string]interface{}{"value": sum * 100}, now) y, err := lp.NewMessage("cpu_used", tags, m.meta, map[string]any{"value": sum * 100}, now)
if err == nil { if err == nil {
y.AddTag("unit", "Percent") y.AddTag("unit", "Percent")
output <- y output <- y
@@ -153,11 +165,19 @@ func (m *CpustatCollector) Read(interval time.Duration, output chan lp.CCMessage
now := time.Now() now := time.Now()
tsdelta := now.Sub(m.lastTimestamp) tsdelta := now.Sub(m.lastTimestamp)
file, err := os.Open(string(CPUSTATFILE)) file, err := os.Open(CPUSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to open file '%s': %v", CPUSTATFILE, err))
} }
defer file.Close() defer func() {
if err := file.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to close file '%s': %v", string(CPUSTATFILE), err))
}
}()
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
for scanner.Scan() { for scanner.Scan() {
@@ -174,7 +194,7 @@ func (m *CpustatCollector) Read(interval time.Duration, output chan lp.CCMessage
num_cpus_metric, err := lp.NewMessage("num_cpus", num_cpus_metric, err := lp.NewMessage("num_cpus",
m.nodetags, m.nodetags,
m.meta, m.meta,
map[string]interface{}{"value": int(num_cpus)}, map[string]any{"value": num_cpus},
now, now,
) )
if err == nil { if err == nil {


@@ -8,16 +8,17 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"errors" "fmt"
"log"
"os" "os"
"os/exec" "os/exec"
"slices"
"strings" "strings"
"time" "time"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
influx "github.com/influxdata/line-protocol" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const CUSTOMCMDPATH = `/home/unrz139/Work/cc-metric-collector/collectors/custom` const CUSTOMCMDPATH = `/home/unrz139/Work/cc-metric-collector/collectors/custom`
@@ -30,102 +31,124 @@ type CustomCmdCollectorConfig struct {
type CustomCmdCollector struct { type CustomCmdCollector struct {
metricCollector metricCollector
handler *influx.MetricHandler
parser *influx.Parser config CustomCmdCollectorConfig
config CustomCmdCollectorConfig cmdFieldsSlice [][]string
commands []string files []string
files []string
} }
func (m *CustomCmdCollector) Init(config json.RawMessage) error { func (m *CustomCmdCollector) Init(config json.RawMessage) error {
var err error
m.name = "CustomCmdCollector" m.name = "CustomCmdCollector"
m.parallel = true m.parallel = true
m.meta = map[string]string{"source": m.name, "group": "Custom"} m.meta = map[string]string{
"source": m.name,
"group": "Custom",
}
// Read configuration
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
log.Print(err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.setup()
// Setup
if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
// Check if command can be executed
for _, c := range m.config.Commands { for _, c := range m.config.Commands {
cmdfields := strings.Fields(c) cmdFields := strings.Fields(c)
command := exec.Command(cmdfields[0], strings.Join(cmdfields[1:], " ")) command := exec.Command(cmdFields[0], cmdFields[1:]...)
command.Wait() if _, err := command.Output(); err != nil {
_, err = command.Output() cclog.ComponentWarn(
if err == nil { m.name,
m.commands = append(m.commands, c) fmt.Sprintf("%s Init(): Execution of command \"%s\" failed: %v", m.name, command.String(), err))
}
}
for _, f := range m.config.Files {
_, err = os.ReadFile(f)
if err == nil {
m.files = append(m.files, f)
} else {
log.Print(err.Error())
continue continue
} }
m.cmdFieldsSlice = append(m.cmdFieldsSlice, cmdFields)
} }
if len(m.files) == 0 && len(m.commands) == 0 {
return errors.New("no metrics to collect") // Check if file can be read
for _, fileName := range m.config.Files {
if _, err := os.ReadFile(fileName); err != nil {
cclog.ComponentWarn(
m.name,
fmt.Sprintf("%s Init(): Reading of file \"%s\" failed: %v", m.name, fileName, err))
continue
}
m.files = append(m.files, fileName)
}
if len(m.files) == 0 && len(m.cmdFieldsSlice) == 0 {
return fmt.Errorf("%s Init(): no metrics to collect", m.name)
} }
m.handler = influx.NewMetricHandler()
m.parser = influx.NewParser(m.handler)
m.parser.SetTimeFunc(DefaultTime)
m.init = true m.init = true
return nil return nil
} }
var DefaultTime = func() time.Time {
return time.Unix(42, 0)
}
func (m *CustomCmdCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *CustomCmdCollector) Read(interval time.Duration, output chan lp.CCMessage) {
if !m.init { if !m.init {
return return
} }
for _, cmd := range m.commands {
cmdfields := strings.Fields(cmd) // Execute configured commands
command := exec.Command(cmdfields[0], strings.Join(cmdfields[1:], " ")) for _, cmdFields := range m.cmdFieldsSlice {
command.Wait() command := exec.Command(cmdFields[0], cmdFields[1:]...)
stdout, err := command.Output() stdout, err := command.Output()
if err != nil { if err != nil {
log.Print(err) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to read command output for command \"%s\": %v", command.String(), err),
)
continue continue
} }
cmdmetrics, err := m.parser.Parse(stdout)
// Read and decode influxDB line-protocol from command output
metrics, err := lp.FromBytes(stdout)
if err != nil { if err != nil {
log.Print(err) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to decode influx Message: %v", err),
)
continue continue
} }
for _, c := range cmdmetrics { for _, metric := range metrics {
_, skip := stringArrayContains(m.config.ExcludeMetrics, c.Name()) if slices.Contains(m.config.ExcludeMetrics, metric.Name()) {
if skip {
continue continue
} }
output <- metric
output <- lp.FromInfluxMetric(c)
} }
} }
for _, file := range m.files {
buffer, err := os.ReadFile(file) // Read configured files
for _, filename := range m.files {
input, err := os.ReadFile(filename)
if err != nil { if err != nil {
log.Print(err) cclog.ComponentError(
return m.name,
} fmt.Sprintf("Read(): Failed to read file \"%s\": %v\n", filename, err),
fmetrics, err := m.parser.Parse(buffer) )
if err != nil {
log.Print(err)
continue continue
} }
for _, f := range fmetrics {
_, skip := stringArrayContains(m.config.ExcludeMetrics, f.Name()) // Read and decode influxDB line-protocol from file
if skip { metrics, err := lp.FromBytes(input)
if err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to decode influx Message: %v", err),
)
continue
}
for _, metric := range metrics {
if slices.Contains(m.config.ExcludeMetrics, metric.Name()) {
continue continue
} }
output <- lp.FromInfluxMetric(f) output <- metric
} }
} }
} }
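
Review note on the `exec.Command` change in this file: the old call joined all remaining fields with `strings.Join`, so the program received one single argument containing spaces; the new call passes each field as its own argument. The sketch below illustrates the difference without executing anything, it only inspects `Cmd.Args`. The helper names `joinedArgs` and `splitArgs` and the command line `mytool --format influx --once` are assumptions made up for illustration, not part of the collector.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// joinedArgs mimics the old code path: all arguments collapse into one string.
// Hypothetical helper, for illustration only.
func joinedArgs(line string) []string {
	fields := strings.Fields(line)
	return exec.Command(fields[0], strings.Join(fields[1:], " ")).Args
}

// splitArgs mimics the new code path: every field becomes its own argument.
// Hypothetical helper, for illustration only.
func splitArgs(line string) []string {
	fields := strings.Fields(line)
	return exec.Command(fields[0], fields[1:]...).Args
}

func main() {
	line := "mytool --format influx --once" // made-up command line
	fmt.Printf("%q\n", joinedArgs(line))
	fmt.Printf("%q\n", splitArgs(line))
}
```

With the joined form the argument vector has two entries (program name plus one long string); with the variadic form it has four. Both helpers assume a non-empty command line, since `fields[0]` would panic on an empty string.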


@@ -9,14 +9,16 @@ package collectors
import ( import (
"bufio" "bufio"
"bytes"
"encoding/json" "encoding/json"
"fmt"
"os" "os"
"strings" "strings"
"syscall" "syscall"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const MOUNTFILE = `/proc/self/mounts` const MOUNTFILE = `/proc/self/mounts`
@@ -28,6 +30,7 @@ type DiskstatCollectorConfig struct {
type DiskstatCollector struct { type DiskstatCollector struct {
metricCollector metricCollector
config DiskstatCollectorConfig config DiskstatCollectorConfig
allowedMetrics map[string]bool allowedMetrics map[string]bool
} }
@@ -36,10 +39,14 @@ func (m *DiskstatCollector) Init(config json.RawMessage) error {
m.name = "DiskstatCollector" m.name = "DiskstatCollector"
m.parallel = true m.parallel = true
m.meta = map[string]string{"source": m.name, "group": "Disk"} m.meta = map[string]string{"source": m.name, "group": "Disk"}
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
if len(config) > 0 { if len(config) > 0 {
if err := json.Unmarshal(config, &m.config); err != nil { d := json.NewDecoder(bytes.NewReader(config))
return err d.DisallowUnknownFields()
if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.allowedMetrics = map[string]bool{ m.allowedMetrics = map[string]bool{
@@ -54,10 +61,11 @@ func (m *DiskstatCollector) Init(config json.RawMessage) error {
} }
file, err := os.Open(MOUNTFILE) file, err := os.Open(MOUNTFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) return fmt.Errorf("%s Init(): file open for file \"%s\" failed: %w", m.name, MOUNTFILE, err)
return err }
if err := file.Close(); err != nil {
return fmt.Errorf("%s Init(): file close for file \"%s\" failed: %w", m.name, MOUNTFILE, err)
} }
defer file.Close()
m.init = true m.init = true
return nil return nil
} }
@@ -69,10 +77,18 @@ func (m *DiskstatCollector) Read(interval time.Duration, output chan lp.CCMessag
file, err := os.Open(MOUNTFILE) file, err := os.Open(MOUNTFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to open file '%s': %v", MOUNTFILE, err))
return return
} }
defer file.Close() defer func() {
if err := file.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to close file '%s': %v", MOUNTFILE, err))
}
}()
part_max_used := uint64(0) part_max_used := uint64(0)
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
@@ -93,7 +109,7 @@ mountLoop:
continue continue
} }
mountPath := strings.Replace(linefields[1], `\040`, " ", -1) mountPath := strings.ReplaceAll(linefields[1], `\040`, " ")
for _, excl := range m.config.ExcludeMounts { for _, excl := range m.config.ExcludeMounts {
if strings.Contains(mountPath, excl) { if strings.Contains(mountPath, excl) {
@@ -110,17 +126,31 @@ mountLoop:
continue continue
} }
tags := map[string]string{"type": "node", "device": linefields[0]} tags := map[string]string{"type": "node", "device": linefields[0]}
total := (stat.Blocks * uint64(stat.Bsize)) / uint64(1000000000) total := (stat.Blocks * uint64(stat.Bsize)) / uint64(1000_000_000)
if m.allowedMetrics["disk_total"] { if m.allowedMetrics["disk_total"] {
y, err := lp.NewMessage("disk_total", tags, m.meta, map[string]interface{}{"value": total}, time.Now()) y, err := lp.NewMessage(
"disk_total",
tags,
m.meta,
map[string]any{
"value": total,
},
time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "GBytes") y.AddMeta("unit", "GBytes")
output <- y output <- y
} }
} }
free := (stat.Bfree * uint64(stat.Bsize)) / uint64(1000000000) free := (stat.Bfree * uint64(stat.Bsize)) / uint64(1000_000_000)
if m.allowedMetrics["disk_free"] { if m.allowedMetrics["disk_free"] {
y, err := lp.NewMessage("disk_free", tags, m.meta, map[string]interface{}{"value": free}, time.Now()) y, err := lp.NewMessage(
"disk_free",
tags,
m.meta,
map[string]any{
"value": free,
},
time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "GBytes") y.AddMeta("unit", "GBytes")
output <- y output <- y
@@ -134,7 +164,16 @@ mountLoop:
} }
} }
if m.allowedMetrics["part_max_used"] { if m.allowedMetrics["part_max_used"] {
y, err := lp.NewMessage("part_max_used", map[string]string{"type": "node"}, m.meta, map[string]interface{}{"value": int(part_max_used)}, time.Now()) y, err := lp.NewMessage(
"part_max_used",
map[string]string{
"type": "node",
},
m.meta,
map[string]any{
"value": int(part_max_used),
},
time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "percent") y.AddMeta("unit", "percent")
output <- y output <- y


@@ -14,16 +14,16 @@ import (
"errors" "errors"
"fmt" "fmt"
"io" "io"
"log"
"os/exec" "os/exec"
"os/user" "os/user"
"slices"
"strconv" "strconv"
"strings" "strings"
"syscall" "syscall"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const DEFAULT_GPFS_CMD = "mmpmon" const DEFAULT_GPFS_CMD = "mmpmon"
@@ -32,7 +32,7 @@ type GpfsCollectorState map[string]int64
type GpfsCollectorConfig struct { type GpfsCollectorConfig struct {
Mmpmon string `json:"mmpmon_path,omitempty"` Mmpmon string `json:"mmpmon_path,omitempty"`
ExcludeFilesystem []string `json:"exclude_filesystem,omitempty"` ExcludeFilesystems []string `json:"exclude_filesystem,omitempty"`
ExcludeMetrics []string `json:"exclude_metrics,omitempty"` ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
Sudo bool `json:"use_sudo,omitempty"` Sudo bool `json:"use_sudo,omitempty"`
SendAbsoluteValues bool `json:"send_abs_values,omitempty"` SendAbsoluteValues bool `json:"send_abs_values,omitempty"`
@@ -43,264 +43,265 @@ type GpfsCollectorConfig struct {
} }
type GpfsMetricDefinition struct { type GpfsMetricDefinition struct {
name string name string
desc string desc string
prefix string prefix string
unit string unit string
calc string calc string
} }
type GpfsCollector struct { type GpfsCollector struct {
metricCollector metricCollector
tags map[string]string tags map[string]string
config GpfsCollectorConfig config GpfsCollectorConfig
sudoCmd string sudoCmd string
skipFS map[string]struct{} skipFS map[string]struct{}
lastTimestamp map[string]time.Time // Store timestamp of lastState per filesystem to derive bandwidths lastTimestamp map[string]time.Time // Store timestamp of lastState per filesystem to derive bandwidths
definitions []GpfsMetricDefinition // all metrics to report definitions []GpfsMetricDefinition // all metrics to report
lastState map[string]GpfsCollectorState // one GpfsCollectorState per filesystem lastState map[string]GpfsCollectorState // one GpfsCollectorState per filesystem
} }
var GpfsAbsMetrics = []GpfsMetricDefinition{ var GpfsAbsMetrics = []GpfsMetricDefinition{
{ {
name: "gpfs_num_opens", name: "gpfs_num_opens",
desc: "number of opens", desc: "number of opens",
prefix: "_oc_", prefix: "_oc_",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_num_closes", name: "gpfs_num_closes",
desc: "number of closes", desc: "number of closes",
prefix: "_cc_", prefix: "_cc_",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_num_reads", name: "gpfs_num_reads",
desc: "number of reads", desc: "number of reads",
prefix: "_rdc_", prefix: "_rdc_",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_num_writes", name: "gpfs_num_writes",
desc: "number of writes", desc: "number of writes",
prefix: "_wc_", prefix: "_wc_",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_num_readdirs", name: "gpfs_num_readdirs",
desc: "number of readdirs", desc: "number of readdirs",
prefix: "_dir_", prefix: "_dir_",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_num_inode_updates", name: "gpfs_num_inode_updates",
desc: "number of Inode Updates", desc: "number of Inode Updates",
prefix: "_iu_", prefix: "_iu_",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_bytes_read", name: "gpfs_bytes_read",
desc: "bytes read", desc: "bytes read",
prefix: "_br_", prefix: "_br_",
unit: "bytes", unit: "bytes",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_bytes_written", name: "gpfs_bytes_written",
desc: "bytes written", desc: "bytes written",
prefix: "_bw_", prefix: "_bw_",
unit: "bytes", unit: "bytes",
calc: "none", calc: "none",
}, },
} }
var GpfsDiffMetrics = []GpfsMetricDefinition{ var GpfsDiffMetrics = []GpfsMetricDefinition{
{ {
name: "gpfs_num_opens_diff", name: "gpfs_num_opens_diff",
desc: "number of opens (diff)", desc: "number of opens (diff)",
prefix: "_oc_", prefix: "_oc_",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_num_closes_diff", name: "gpfs_num_closes_diff",
desc: "number of closes (diff)", desc: "number of closes (diff)",
prefix: "_cc_", prefix: "_cc_",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_num_reads_diff", name: "gpfs_num_reads_diff",
desc: "number of reads (diff)", desc: "number of reads (diff)",
prefix: "_rdc_", prefix: "_rdc_",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_num_writes_diff", name: "gpfs_num_writes_diff",
desc: "number of writes (diff)", desc: "number of writes (diff)",
prefix: "_wc_", prefix: "_wc_",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_num_readdirs_diff", name: "gpfs_num_readdirs_diff",
desc: "number of readdirs (diff)", desc: "number of readdirs (diff)",
prefix: "_dir_", prefix: "_dir_",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_num_inode_updates_diff", name: "gpfs_num_inode_updates_diff",
desc: "number of Inode Updates (diff)", desc: "number of Inode Updates (diff)",
prefix: "_iu_", prefix: "_iu_",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_bytes_read_diff", name: "gpfs_bytes_read_diff",
desc: "bytes read (diff)", desc: "bytes read (diff)",
prefix: "_br_", prefix: "_br_",
unit: "bytes", unit: "bytes",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_bytes_written_diff", name: "gpfs_bytes_written_diff",
desc: "bytes written (diff)", desc: "bytes written (diff)",
prefix: "_bw_", prefix: "_bw_",
unit: "bytes", unit: "bytes",
calc: "difference", calc: "difference",
}, },
} }
var GpfsDeriveMetrics = []GpfsMetricDefinition{ var GpfsDeriveMetrics = []GpfsMetricDefinition{
{ {
name: "gpfs_opens_rate", name: "gpfs_opens_rate",
desc: "number of opens (rate)", desc: "number of opens (rate)",
prefix: "_oc_", prefix: "_oc_",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_closes_rate", name: "gpfs_closes_rate",
desc: "number of closes (rate)", desc: "number of closes (rate)",
prefix: "_cc_", prefix: "_cc_",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_reads_rate", name: "gpfs_reads_rate",
desc: "number of reads (rate)", desc: "number of reads (rate)",
prefix: "_rdc_", prefix: "_rdc_",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_writes_rate", name: "gpfs_writes_rate",
desc: "number of writes (rate)", desc: "number of writes (rate)",
prefix: "_wc_", prefix: "_wc_",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_readdirs_rate", name: "gpfs_readdirs_rate",
desc: "number of readdirs (rate)", desc: "number of readdirs (rate)",
prefix: "_dir_", prefix: "_dir_",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_inode_updates_rate", name: "gpfs_inode_updates_rate",
desc: "number of Inode Updates (rate)", desc: "number of Inode Updates (rate)",
prefix: "_iu_", prefix: "_iu_",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_bw_read", name: "gpfs_bw_read",
desc: "bytes read (rate)", desc: "bytes read (rate)",
prefix: "_br_", prefix: "_br_",
unit: "bytes/sec", unit: "bytes/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_bw_write", name: "gpfs_bw_write",
desc: "bytes written (rate)", desc: "bytes written (rate)",
prefix: "_bw_", prefix: "_bw_",
unit: "bytes/sec", unit: "bytes/sec",
calc: "derivative", calc: "derivative",
}, },
} }
var GpfsTotalMetrics = []GpfsMetricDefinition{ var GpfsTotalMetrics = []GpfsMetricDefinition{
{ {
name: "gpfs_bytes_total", name: "gpfs_bytes_total",
desc: "bytes total", desc: "bytes total",
prefix: "bytesTotal", prefix: "bytesTotal",
unit: "bytes", unit: "bytes",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_bytes_total_diff", name: "gpfs_bytes_total_diff",
desc: "bytes total (diff)", desc: "bytes total (diff)",
prefix: "bytesTotal", prefix: "bytesTotal",
unit: "bytes", unit: "bytes",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_bw_total", name: "gpfs_bw_total",
desc: "bytes total (rate)", desc: "bytes total (rate)",
prefix: "bytesTotal", prefix: "bytesTotal",
unit: "bytes/sec", unit: "bytes/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_iops", name: "gpfs_iops",
desc: "iops", desc: "iops",
prefix: "iops", prefix: "iops",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_iops_diff", name: "gpfs_iops_diff",
desc: "iops (diff)", desc: "iops (diff)",
prefix: "iops", prefix: "iops",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_iops_rate", name: "gpfs_iops_rate",
desc: "iops (rate)", desc: "iops (rate)",
prefix: "iops", prefix: "iops",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
{ {
name: "gpfs_metaops", name: "gpfs_metaops",
desc: "metaops", desc: "metaops",
prefix: "metaops", prefix: "metaops",
unit: "requests", unit: "requests",
calc: "none", calc: "none",
}, },
{ {
name: "gpfs_metaops_diff", name: "gpfs_metaops_diff",
desc: "metaops (diff)", desc: "metaops (diff)",
prefix: "metaops", prefix: "metaops",
unit: "requests", unit: "requests",
calc: "difference", calc: "difference",
}, },
{ {
name: "gpfs_metaops_rate", name: "gpfs_metaops_rate",
desc: "metaops (rate)", desc: "metaops (rate)",
prefix: "metaops", prefix: "metaops",
unit: "requests/sec", unit: "requests/sec",
calc: "derivative", calc: "derivative",
}, },
} }
@@ -310,9 +311,10 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
return nil return nil
} }
var err error
m.name = "GpfsCollector" m.name = "GpfsCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
// Set default mmpmon binary // Set default mmpmon binary
@@ -320,10 +322,10 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
// Read JSON configuration // Read JSON configuration
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
log.Print(err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
m.meta = map[string]string{ m.meta = map[string]string{
@@ -335,7 +337,7 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
"filesystem": "", "filesystem": "",
} }
m.skipFS = make(map[string]struct{}) m.skipFS = make(map[string]struct{})
for _, fs := range m.config.ExcludeFilesystem { for _, fs := range m.config.ExcludeFilesystems {
m.skipFS[fs] = struct{}{} m.skipFS[fs] = struct{}{}
} }
m.lastState = make(map[string]GpfsCollectorState) m.lastState = make(map[string]GpfsCollectorState)
@@ -345,18 +347,15 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
if !m.config.Sudo { if !m.config.Sudo {
user, err := user.Current() user, err := user.Current()
if err != nil { if err != nil {
cclog.ComponentError(m.name, "Failed to get current user:", err.Error()) return fmt.Errorf("%s Init(): failed to get current user: %w", m.name, err)
return err
} }
if user.Uid != "0" { if user.Uid != "0" {
cclog.ComponentError(m.name, "GPFS file system statistics can only be queried by user root") return fmt.Errorf("%s Init(): GPFS file system statistics can only be queried by user root", m.name)
return err
} }
} else { } else {
p, err := exec.LookPath("sudo") p, err := exec.LookPath("sudo")
if err != nil { if err != nil {
cclog.ComponentError(m.name, "Cannot find 'sudo'") return fmt.Errorf("%s Init(): cannot find 'sudo': %w", m.name, err)
return err
} }
m.sudoCmd = p m.sudoCmd = p
} }
@@ -364,9 +363,9 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
// when using sudo, the full path of mmpmon must be specified because // when using sudo, the full path of mmpmon must be specified because
// exec.LookPath will not work as mmpmon is not executable as user // exec.LookPath will not work as mmpmon is not executable as user
if m.config.Sudo && !strings.HasPrefix(m.config.Mmpmon, "/") { if m.config.Sudo && !strings.HasPrefix(m.config.Mmpmon, "/") {
return fmt.Errorf("when using sudo, mmpmon_path must be provided and an absolute path: %s", m.config.Mmpmon) return fmt.Errorf("%s Init(): when using sudo, mmpmon_path must be provided and an absolute path: %s", m.name, m.config.Mmpmon)
} }
// Check if mmpmon is in executable search path // Check if mmpmon is in executable search path
p, err := exec.LookPath(m.config.Mmpmon) p, err := exec.LookPath(m.config.Mmpmon)
if err != nil { if err != nil {
@@ -376,8 +375,7 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
// the file was given in the config, use it // the file was given in the config, use it
p = m.config.Mmpmon p = m.config.Mmpmon
} else { } else {
cclog.ComponentError(m.name, fmt.Sprintf("failed to find mmpmon binary '%s': %v", m.config.Mmpmon, err)) return fmt.Errorf("%s Init(): failed to find mmpmon binary '%s': %w", m.name, m.config.Mmpmon, err)
return fmt.Errorf("failed to find mmpmon binary '%s': %v", m.config.Mmpmon, err)
} }
} }
m.config.Mmpmon = p m.config.Mmpmon = p
@@ -385,28 +383,28 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
m.definitions = []GpfsMetricDefinition{} m.definitions = []GpfsMetricDefinition{}
if m.config.SendAbsoluteValues { if m.config.SendAbsoluteValues {
for _, def := range GpfsAbsMetrics { for _, def := range GpfsAbsMetrics {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} }
if m.config.SendDiffValues { if m.config.SendDiffValues {
for _, def := range GpfsDiffMetrics { for _, def := range GpfsDiffMetrics {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} }
if m.config.SendDerivedValues { if m.config.SendDerivedValues {
for _, def := range GpfsDeriveMetrics { for _, def := range GpfsDeriveMetrics {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} else if m.config.SendBandwidths { } else if m.config.SendBandwidths {
for _, def := range GpfsDeriveMetrics { for _, def := range GpfsDeriveMetrics {
if def.unit == "bytes/sec" { if def.unit == "bytes/sec" {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
@@ -414,26 +412,26 @@ func (m *GpfsCollector) Init(config json.RawMessage) error {
} }
if m.config.SendTotalValues { if m.config.SendTotalValues {
for _, def := range GpfsTotalMetrics { for _, def := range GpfsTotalMetrics {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
// only send total metrics of the types requested // only send total metrics of the types requested
if ( def.calc == "none" && m.config.SendAbsoluteValues ) || if (def.calc == "none" && m.config.SendAbsoluteValues) ||
( def.calc == "difference" && m.config.SendDiffValues ) || (def.calc == "difference" && m.config.SendDiffValues) ||
( def.calc == "derivative" && m.config.SendDerivedValues ) { (def.calc == "derivative" && m.config.SendDerivedValues) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} }
} else if m.config.SendBandwidths { } else if m.config.SendBandwidths {
for _, def := range GpfsTotalMetrics { for _, def := range GpfsTotalMetrics {
if def.unit == "bytes/sec" { if def.unit == "bytes/sec" {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} }
} }
if len(m.definitions) == 0 { if len(m.definitions) == 0 {
return errors.New("no metrics to collect") return fmt.Errorf("%s Init(): no metrics to collect", m.name)
} }
m.init = true m.init = true
@@ -456,7 +454,7 @@ func (m *GpfsCollector) Read(interval time.Duration, output chan lp.CCMessage) {
} else { } else {
cmd = exec.Command(m.config.Mmpmon, "-p", "-s") cmd = exec.Command(m.config.Mmpmon, "-p", "-s")
} }
cmd.Stdin = strings.NewReader("once fs_io_s\n") cmd.Stdin = strings.NewReader("once fs_io_s\n")
cmdStdout := new(bytes.Buffer) cmdStdout := new(bytes.Buffer)
cmdStderr := new(bytes.Buffer) cmdStderr := new(bytes.Buffer)
@@ -562,36 +560,36 @@ func (m *GpfsCollector) Read(interval time.Duration, output chan lp.CCMessage) {
// compute total metrics (map[...] will return 0 if key not found) // compute total metrics (map[...] will return 0 if key not found)
// bytes read and written // bytes read and written
if br, br_ok := newstate["_br_"]; br_ok { if br, br_ok := newstate["_br_"]; br_ok {
newstate["bytesTotal"] = newstate["bytesTotal"] + br newstate["bytesTotal"] += br
} }
if bw, bw_ok := newstate["_bw_"]; bw_ok { if bw, bw_ok := newstate["_bw_"]; bw_ok {
newstate["bytesTotal"] = newstate["bytesTotal"] + bw newstate["bytesTotal"] += bw
} }
// read and write count // read and write count
if rdc, rdc_ok := newstate["_rdc_"]; rdc_ok { if rdc, rdc_ok := newstate["_rdc_"]; rdc_ok {
newstate["iops"] = newstate["iops"] + rdc newstate["iops"] += rdc
} }
if wc, wc_ok := newstate["_wc_"]; wc_ok { if wc, wc_ok := newstate["_wc_"]; wc_ok {
newstate["iops"] = newstate["iops"] + wc newstate["iops"] += wc
} }
// meta operations // meta operations
if oc, oc_ok := newstate["_oc_"]; oc_ok { if oc, oc_ok := newstate["_oc_"]; oc_ok {
newstate["metaops"] = newstate["metaops"] + oc newstate["metaops"] += oc
} }
if cc, cc_ok := newstate["_cc_"]; cc_ok { if cc, cc_ok := newstate["_cc_"]; cc_ok {
newstate["metaops"] = newstate["metaops"] + cc newstate["metaops"] += cc
} }
if dir, dir_ok := newstate["_dir_"]; dir_ok { if dir, dir_ok := newstate["_dir_"]; dir_ok {
newstate["metaops"] = newstate["metaops"] + dir newstate["metaops"] += dir
} }
if iu, iu_ok := newstate["_iu_"]; iu_ok { if iu, iu_ok := newstate["_iu_"]; iu_ok {
newstate["metaops"] = newstate["metaops"] + iu newstate["metaops"] += iu
} }
// send desired metrics for this filesystem // send desired metrics for this filesystem
for _, metric := range m.definitions { for _, metric := range m.definitions {
vold, vold_ok := m.lastState[filesystem][metric.prefix] vold, vold_ok := m.lastState[filesystem][metric.prefix]
vnew, vnew_ok := newstate[metric.prefix] vnew, vnew_ok := newstate[metric.prefix]
var value interface{} var value any
value_ok := false value_ok := false
switch metric.calc { switch metric.calc {
case "none": case "none":
@@ -617,14 +615,14 @@ func (m *GpfsCollector) Read(interval time.Duration, output chan lp.CCMessage) {
} }
case "derivative": case "derivative":
if vnew_ok && vold_ok && timeDiff > 0 { if vnew_ok && vold_ok && timeDiff > 0 {
value = float64(vnew - vold) / timeDiff value = float64(vnew-vold) / timeDiff
if value.(float64) < 0 { if value.(float64) < 0.0 {
value = 0 value = 0.0
} }
value_ok = true value_ok = true
} else if vold_ok { } else if vold_ok {
// if the difference is not computable, return 0 // if the difference is not computable, return 0
value = 0 value = 0.0
value_ok = true value_ok = true
} }
} }
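
Review note on the `value = 0` to `value = 0.0` change in the derivative branch: `value` is declared as `any`, so an untyped constant `0` is stored as `int`, while `0.0` is stored as `float64`. The earlier code could therefore emit an integer where all other derivative values are floats. The sketch below shows the dynamic types; `typeName` and `clampRate` are hypothetical helpers written for this note, not functions from the collector.

```go
package main

import "fmt"

// typeName reports the dynamic type stored in an interface value.
func typeName(v any) string {
	return fmt.Sprintf("%T", v)
}

// clampRate is a hypothetical helper mirroring the derivative branch:
// a negative rate is replaced by a float64 zero, so the type stays float64.
func clampRate(delta int64, seconds float64) any {
	var value any = float64(delta) / seconds
	if value.(float64) < 0.0 {
		value = 0.0
	}
	return value
}

func main() {
	var a any = 0   // untyped constant defaults to int
	var b any = 0.0 // untyped constant defaults to float64
	fmt.Println(typeName(a), typeName(b))
	fmt.Println(typeName(clampRate(-5, 2.0)))
}
```

Keeping the zero as `0.0` means a later type assertion such as `value.(float64)` stays safe and the metric's field type does not change between samples.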


@@ -14,7 +14,7 @@ hugo_path: docs/reference/cc-metric-collector/collectors/gpfs.md
```json ```json
"gpfs": { "gpfs": {
"mmpmon_path": "/path/to/mmpmon", "mmpmon_path": "/path/to/mmpmon",
"use_sudo": "true", "use_sudo": true,
"exclude_filesystem": [ "exclude_filesystem": [
"fs1" "fs1"
], ],


@@ -8,18 +8,19 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json"
"fmt" "fmt"
"os" "os"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage"
"golang.org/x/sys/unix"
"encoding/json"
"path/filepath" "path/filepath"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
"golang.org/x/sys/unix"
) )
const IB_BASEPATH = "/sys/class/infiniband/" const IB_BASEPATH = "/sys/class/infiniband/"
@@ -45,6 +46,7 @@ type InfinibandCollectorInfo struct {
type InfinibandCollector struct { type InfinibandCollector struct {
metricCollector metricCollector
config struct { config struct {
ExcludeDevices []string `json:"exclude_devices,omitempty"` // IB device to exclude e.g. mlx5_0 ExcludeDevices []string `json:"exclude_devices,omitempty"` // IB device to exclude e.g. mlx5_0
SendAbsoluteValues bool `json:"send_abs_values"` // Send absolute values as read from sys filesystem SendAbsoluteValues bool `json:"send_abs_values"` // Send absolute values as read from sys filesystem
@@ -57,7 +59,6 @@ type InfinibandCollector struct {
// Init initializes the Infiniband collector by walking through files below IB_BASEPATH // Init initializes the Infiniband collector by walking through files below IB_BASEPATH
func (m *InfinibandCollector) Init(config json.RawMessage) error { func (m *InfinibandCollector) Init(config json.RawMessage) error {
// Check if already initialized // Check if already initialized
if m.init { if m.init {
return nil return nil
@@ -65,7 +66,9 @@ func (m *InfinibandCollector) Init(config json.RawMessage) error {
var err error var err error
m.name = "InfinibandCollector" m.name = "InfinibandCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{ m.meta = map[string]string{
"source": m.name, "source": m.name,
@@ -77,9 +80,10 @@ func (m *InfinibandCollector) Init(config json.RawMessage) error {
m.config.SendDerivedValues = false m.config.SendDerivedValues = false
// Read configuration file, allow overwriting default config // Read configuration file, allow overwriting default config
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
@@ -87,10 +91,10 @@ func (m *InfinibandCollector) Init(config json.RawMessage) error {
globPattern := filepath.Join(IB_BASEPATH, "*", "ports", "*") globPattern := filepath.Join(IB_BASEPATH, "*", "ports", "*")
ibDirs, err := filepath.Glob(globPattern) ibDirs, err := filepath.Glob(globPattern)
if err != nil { if err != nil {
return fmt.Errorf("unable to glob files with pattern %s: %v", globPattern, err) return fmt.Errorf("%s Init(): unable to glob files with pattern %s: %w", m.name, globPattern, err)
} }
if ibDirs == nil { if ibDirs == nil {
return fmt.Errorf("unable to find any directories with pattern %s", globPattern) return fmt.Errorf("%s Init(): unable to find any directories with pattern %s", m.name, globPattern)
} }
for _, path := range ibDirs { for _, path := range ibDirs {
@@ -111,14 +115,7 @@ func (m *InfinibandCollector) Init(config json.RawMessage) error {
port := pathSplit[6] port := pathSplit[6]
// Skip excluded devices // Skip excluded devices
skip := false if slices.Contains(m.config.ExcludeDevices, device) {
for _, excludedDevice := range m.config.ExcludeDevices {
if excludedDevice == device {
skip = true
break
}
}
if skip {
continue continue
} }
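
Review note on this hunk: the hand-written skip loop is replaced by `slices.Contains` from the standard library (available since Go 1.21). A minimal sketch of the same filtering pattern, assuming a hypothetical helper `filterDevices` and made-up device names:

```go
package main

import (
	"fmt"
	"slices"
)

// filterDevices drops every device that is listed in exclude.
// Hypothetical helper, for illustration only.
func filterDevices(devices, exclude []string) []string {
	kept := make([]string, 0, len(devices))
	for _, device := range devices {
		if slices.Contains(exclude, device) {
			continue // same early skip as in the collector loop
		}
		kept = append(kept, device)
	}
	return kept
}

func main() {
	fmt.Println(filterDevices([]string{"mlx5_0", "mlx5_1"}, []string{"mlx5_0"}))
}
```

`slices.Contains` does a linear scan, which is adequate for short exclude lists like the ones configured here.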
@@ -161,7 +158,7 @@ func (m *InfinibandCollector) Init(config json.RawMessage) error {
for _, counter := range portCounterFiles { for _, counter := range portCounterFiles {
err := unix.Access(counter.path, unix.R_OK) err := unix.Access(counter.path, unix.R_OK)
if err != nil { if err != nil {
return fmt.Errorf("unable to access %s: %v", counter.path, err) return fmt.Errorf("%s Init(): unable to access %s: %w", m.name, counter.path, err)
} }
} }
@@ -181,7 +178,7 @@ func (m *InfinibandCollector) Init(config json.RawMessage) error {
} }
if len(m.info) == 0 { if len(m.info) == 0 {
return fmt.Errorf("found no IB devices") return fmt.Errorf("%s Init(): found no IB devices", m.name)
} }
m.init = true m.init = true
@@ -190,7 +187,6 @@ func (m *InfinibandCollector) Init(config json.RawMessage) error {
// Read reads Infiniband counter files below IB_BASEPATH // Read reads Infiniband counter files below IB_BASEPATH
func (m *InfinibandCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *InfinibandCollector) Read(interval time.Duration, output chan lp.CCMessage) {
// Check if already initialized // Check if already initialized
if !m.init { if !m.init {
return return
@@ -236,15 +232,14 @@ func (m *InfinibandCollector) Read(interval time.Duration, output chan lp.CCMess
// Send absolute values // Send absolute values
if m.config.SendAbsoluteValues { if m.config.SendAbsoluteValues {
if y, err := if y, err := lp.NewMessage(
lp.NewMessage( counterDef.name,
counterDef.name, info.tagSet,
info.tagSet, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": counterDef.currentState,
"value": counterDef.currentState, },
}, now); err == nil {
now); err == nil {
y.AddMeta("unit", counterDef.unit) y.AddMeta("unit", counterDef.unit)
output <- y output <- y
} }
@@ -254,15 +249,14 @@ func (m *InfinibandCollector) Read(interval time.Duration, output chan lp.CCMess
if m.config.SendDerivedValues { if m.config.SendDerivedValues {
if counterDef.lastState >= 0 { if counterDef.lastState >= 0 {
rate := float64((counterDef.currentState - counterDef.lastState)) / timeDiff rate := float64((counterDef.currentState - counterDef.lastState)) / timeDiff
if y, err := if y, err := lp.NewMessage(
lp.NewMessage( counterDef.name+"_bw",
counterDef.name+"_bw", info.tagSet,
info.tagSet, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": rate,
"value": rate, },
}, now); err == nil {
now); err == nil {
y.AddMeta("unit", counterDef.unit+"/sec") y.AddMeta("unit", counterDef.unit+"/sec")
output <- y output <- y
@@ -284,28 +278,26 @@ func (m *InfinibandCollector) Read(interval time.Duration, output chan lp.CCMess
// Send total values // Send total values
if m.config.SendTotalValues { if m.config.SendTotalValues {
if y, err := if y, err := lp.NewMessage(
lp.NewMessage( "ib_total",
"ib_total", info.tagSet,
info.tagSet, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": ib_total,
"value": ib_total, },
}, now); err == nil {
now); err == nil {
y.AddMeta("unit", "bytes") y.AddMeta("unit", "bytes")
output <- y output <- y
} }
if y, err := if y, err := lp.NewMessage(
lp.NewMessage( "ib_total_pkts",
"ib_total_pkts", info.tagSet,
info.tagSet, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": ib_total_pkts,
"value": ib_total_pkts, },
}, now); err == nil {
now); err == nil {
y.AddMeta("unit", "packets") y.AddMeta("unit", "packets")
output <- y output <- y
} }

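The hunks above replace a hand-written search loop with `slices.Contains` when skipping excluded devices. A minimal sketch of that pattern follows; the helper `filterDevices` and the device names are hypothetical and do not appear in the collector.

```go
package main

import (
	"fmt"
	"slices"
)

// filterDevices is a hypothetical helper (not part of the collector).
// It returns the devices that are not listed in exclude.
func filterDevices(devices, exclude []string) []string {
	kept := make([]string, 0, len(devices))
	for _, d := range devices {
		// slices.Contains replaces the manual "skip := false; for ...; break" loop.
		if slices.Contains(exclude, d) {
			continue
		}
		kept = append(kept, d)
	}
	return kept
}

func main() {
	// Device names are made up for illustration.
	fmt.Println(filterDevices([]string{"mlx5_0", "mlx5_1"}, []string{"mlx5_1"}))
}
```

`slices` is part of the standard library since Go 1.21, so this assumes the module targets at least that version.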
@@ -9,15 +9,17 @@ package collectors
import ( import (
"bufio" "bufio"
"bytes"
"encoding/json" "encoding/json"
"errors" "fmt"
"os" "os"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const IOSTATFILE = `/proc/diskstats` const IOSTATFILE = `/proc/diskstats`
@@ -35,21 +37,24 @@ type IOstatCollectorEntry struct {
type IOstatCollector struct { type IOstatCollector struct {
metricCollector metricCollector
matches map[string]int matches map[string]int
config IOstatCollectorConfig config IOstatCollectorConfig
devices map[string]IOstatCollectorEntry devices map[string]IOstatCollectorEntry
} }
func (m *IOstatCollector) Init(config json.RawMessage) error { func (m *IOstatCollector) Init(config json.RawMessage) error {
var err error
m.name = "IOstatCollector" m.name = "IOstatCollector"
m.parallel = true m.parallel = true
m.meta = map[string]string{"source": m.name, "group": "Disk"} m.meta = map[string]string{"source": m.name, "group": "Disk"}
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
// https://www.kernel.org/doc/html/latest/admin-guide/iostats.html // https://www.kernel.org/doc/html/latest/admin-guide/iostats.html
@@ -75,19 +80,17 @@ func (m *IOstatCollector) Init(config json.RawMessage) error {
m.devices = make(map[string]IOstatCollectorEntry) m.devices = make(map[string]IOstatCollectorEntry)
m.matches = make(map[string]int) m.matches = make(map[string]int)
for k, v := range matches { for k, v := range matches {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, k); !skip { if !slices.Contains(m.config.ExcludeMetrics, k) {
m.matches[k] = v m.matches[k] = v
} }
} }
if len(m.matches) == 0 { if len(m.matches) == 0 {
return errors.New("no metrics to collect") return fmt.Errorf("%s Init(): no metrics to collect", m.name)
} }
file, err := os.Open(IOSTATFILE) file, err := os.Open(IOSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) return fmt.Errorf("%s Init(): Failed to open file \"%s\": %w", m.name, IOSTATFILE, err)
return err
} }
defer file.Close()
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
for scanner.Scan() { for scanner.Scan() {
@@ -101,7 +104,7 @@ func (m *IOstatCollector) Init(config json.RawMessage) error {
if strings.Contains(device, "loop") { if strings.Contains(device, "loop") {
continue continue
} }
if _, skip := stringArrayContains(m.config.ExcludeDevices, device); skip { if slices.Contains(m.config.ExcludeDevices, device) {
continue continue
} }
currentValues := make(map[string]int64) currentValues := make(map[string]int64)
@@ -127,8 +130,12 @@ func (m *IOstatCollector) Init(config json.RawMessage) error {
lastValues: lastValues, lastValues: lastValues,
} }
} }
if err := file.Close(); err != nil {
return fmt.Errorf("%s Init(): Failed to close file \"%s\": %w", m.name, IOSTATFILE, err)
}
m.init = true m.init = true
return err return nil
} }
func (m *IOstatCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *IOstatCollector) Read(interval time.Duration, output chan lp.CCMessage) {
@@ -138,10 +145,18 @@ func (m *IOstatCollector) Read(interval time.Duration, output chan lp.CCMessage)
file, err := os.Open(IOSTATFILE) file, err := os.Open(IOSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to open file '%s': %v", IOSTATFILE, err))
return return
} }
defer file.Close() defer func() {
if err := file.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to close file '%s': %v", IOSTATFILE, err))
}
}()
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
for scanner.Scan() { for scanner.Scan() {
@@ -157,7 +172,7 @@ func (m *IOstatCollector) Read(interval time.Duration, output chan lp.CCMessage)
if strings.Contains(device, "loop") { if strings.Contains(device, "loop") {
continue continue
} }
if _, skip := stringArrayContains(m.config.ExcludeDevices, device); skip { if slices.Contains(m.config.ExcludeDevices, device) {
continue continue
} }
if _, ok := m.devices[device]; !ok { if _, ok := m.devices[device]; !ok {

@@ -11,23 +11,22 @@ import (
"bufio" "bufio"
"bytes" "bytes"
"encoding/json" "encoding/json"
"errors"
"fmt" "fmt"
"io" "io"
"log"
"os/exec" "os/exec"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const IPMISENSORS_PATH = `ipmi-sensors` const IPMISENSORS_PATH = `ipmi-sensors`
type IpmiCollector struct { type IpmiCollector struct {
metricCollector metricCollector
config struct { config struct {
ExcludeDevices []string `json:"exclude_devices"` ExcludeDevices []string `json:"exclude_devices"`
IpmitoolPath string `json:"ipmitool_path"` IpmitoolPath string `json:"ipmitool_path"`
@@ -44,7 +43,9 @@ func (m *IpmiCollector) Init(config json.RawMessage) error {
} }
m.name = "IpmiCollector" m.name = "IpmiCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{ m.meta = map[string]string{
"source": m.name, "source": m.name,
@@ -54,9 +55,10 @@ func (m *IpmiCollector) Init(config json.RawMessage) error {
m.config.IpmitoolPath = "ipmitool" m.config.IpmitoolPath = "ipmitool"
m.config.IpmisensorsPath = "ipmi-sensors" m.config.IpmisensorsPath = "ipmi-sensors"
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
// Check if executables ipmitool or ipmisensors are found // Check if executables ipmitool or ipmisensors are found
@@ -65,7 +67,7 @@ func (m *IpmiCollector) Init(config json.RawMessage) error {
command := exec.Command(p) command := exec.Command(p)
err := command.Run() err := command.Run()
if err != nil { if err != nil {
cclog.ComponentError(m.name, fmt.Sprintf("Failed to execute %s: %v", p, err.Error())) cclog.ComponentError(m.name, fmt.Sprintf("Failed to execute %s: %s", p, err.Error()))
m.ipmitool = "" m.ipmitool = ""
} else { } else {
m.ipmitool = p m.ipmitool = p
@@ -76,14 +78,14 @@ func (m *IpmiCollector) Init(config json.RawMessage) error {
command := exec.Command(p) command := exec.Command(p)
err := command.Run() err := command.Run()
if err != nil { if err != nil {
cclog.ComponentError(m.name, fmt.Sprintf("Failed to execute %s: %v", p, err.Error())) cclog.ComponentError(m.name, fmt.Sprintf("Failed to execute %s: %s", p, err.Error()))
m.ipmisensors = "" m.ipmisensors = ""
} else { } else {
m.ipmisensors = p m.ipmisensors = p
} }
} }
if len(m.ipmitool) == 0 && len(m.ipmisensors) == 0 { if len(m.ipmitool) == 0 && len(m.ipmisensors) == 0 {
return errors.New("no usable IPMI reader found") return fmt.Errorf("%s Init(): no usable IPMI reader found", m.name)
} }
m.init = true m.init = true
@@ -91,7 +93,6 @@ func (m *IpmiCollector) Init(config json.RawMessage) error {
} }
func (m *IpmiCollector) readIpmiTool(cmd string, output chan lp.CCMessage) { func (m *IpmiCollector) readIpmiTool(cmd string, output chan lp.CCMessage) {
// Setup ipmitool command // Setup ipmitool command
command := exec.Command(cmd, "sensor") command := exec.Command(cmd, "sensor")
stdout, _ := command.StdoutPipe() stdout, _ := command.StdoutPipe()
@@ -116,19 +117,20 @@ func (m *IpmiCollector) readIpmiTool(cmd string, output chan lp.CCMessage) {
} }
v, err := strconv.ParseFloat(strings.TrimSpace(lv[1]), 64) v, err := strconv.ParseFloat(strings.TrimSpace(lv[1]), 64)
if err == nil { if err == nil {
name := strings.ToLower(strings.Replace(strings.TrimSpace(lv[0]), " ", "_", -1)) name := strings.ToLower(strings.ReplaceAll(strings.TrimSpace(lv[0]), " ", "_"))
unit := strings.TrimSpace(lv[2]) unit := strings.TrimSpace(lv[2])
if unit == "Volts" { switch unit {
case "Volts":
unit = "Volts" unit = "Volts"
} else if unit == "degrees C" { case "degrees C":
unit = "degC" unit = "degC"
} else if unit == "degrees F" { case "degrees F":
unit = "degF" unit = "degF"
} else if unit == "Watts" { case "Watts":
unit = "Watts" unit = "Watts"
} }
y, err := lp.NewMessage(name, map[string]string{"type": "node"}, m.meta, map[string]interface{}{"value": v}, time.Now()) y, err := lp.NewMessage(name, map[string]string{"type": "node"}, m.meta, map[string]any{"value": v}, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", unit) y.AddMeta("unit", unit)
output <- y output <- y
@@ -149,24 +151,30 @@ func (m *IpmiCollector) readIpmiTool(cmd string, output chan lp.CCMessage) {
} }
func (m *IpmiCollector) readIpmiSensors(cmd string, output chan lp.CCMessage) { func (m *IpmiCollector) readIpmiSensors(cmd string, output chan lp.CCMessage) {
// Setup ipmisensors command
command := exec.Command(cmd, "--comma-separated-output", "--sdr-cache-recreate") command := exec.Command(cmd, "--comma-separated-output", "--sdr-cache-recreate")
command.Wait() stdout, _ := command.StdoutPipe()
stdout, err := command.Output() errBuf := new(bytes.Buffer)
if err != nil { command.Stderr = errBuf
log.Print(err)
// start command
if err := command.Start(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("readIpmiSensors(): Failed to start command \"%s\": %v", command.String(), err),
)
return return
} }
ll := strings.Split(string(stdout), "\n") // Read command output
scanner := bufio.NewScanner(stdout)
for _, line := range ll { for scanner.Scan() {
lv := strings.Split(line, ",") lv := strings.Split(scanner.Text(), ",")
if len(lv) > 3 { if len(lv) > 3 {
v, err := strconv.ParseFloat(lv[3], 64) v, err := strconv.ParseFloat(lv[3], 64)
if err == nil { if err == nil {
name := strings.ToLower(strings.Replace(lv[1], " ", "_", -1)) name := strings.ToLower(strings.ReplaceAll(lv[1], " ", "_"))
y, err := lp.NewMessage(name, map[string]string{"type": "node"}, m.meta, map[string]interface{}{"value": v}, time.Now()) y, err := lp.NewMessage(name, map[string]string{"type": "node"}, m.meta, map[string]any{"value": v}, time.Now())
if err == nil { if err == nil {
if len(lv) > 4 { if len(lv) > 4 {
y.AddMeta("unit", lv[4]) y.AddMeta("unit", lv[4])
@@ -176,10 +184,20 @@ func (m *IpmiCollector) readIpmiSensors(cmd string, output chan lp.CCMessage) {
} }
} }
} }
// Wait for command end
if err := command.Wait(); err != nil {
errMsg, _ := io.ReadAll(errBuf)
cclog.ComponentError(
m.name,
fmt.Sprintf("readIpmiSensors(): Failed to wait for the end of command \"%s\": %v\n", command.String(), err),
)
cclog.ComponentError(m.name, fmt.Sprintf("readIpmiSensors(): command stderr: \"%s\"\n", strings.TrimSpace(string(errMsg))))
return
}
} }
func (m *IpmiCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *IpmiCollector) Read(interval time.Duration, output chan lp.CCMessage) {
// Check if already initialized // Check if already initialized
if !m.init { if !m.init {
return return

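The reworked `readIpmiSensors()` above starts the command, scans stdout line by line, and only then calls `Wait()`. A minimal sketch of that sequence follows, under stated assumptions: `streamLines` and `parseSensorLine` are hypothetical helpers, the sample line is made up, and unlike the diff the sketch also checks the error returned by `StdoutPipe()`.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseSensorLine mirrors the field handling in the diff:
// column 1 is the name, column 3 the reading, column 4 (optional) the unit.
func parseSensorLine(line string) (name string, value float64, unit string, ok bool) {
	lv := strings.Split(line, ",")
	if len(lv) <= 3 {
		return "", 0, "", false
	}
	v, err := strconv.ParseFloat(lv[3], 64)
	if err != nil {
		return "", 0, "", false
	}
	name = strings.ToLower(strings.ReplaceAll(lv[1], " ", "_"))
	if len(lv) > 4 {
		unit = lv[4]
	}
	return name, v, unit, true
}

// streamLines starts cmd, hands each stdout line to handle, then waits.
// All reads from the pipe must finish before Wait is called.
func streamLines(cmd *exec.Cmd, handle func(string)) error {
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return fmt.Errorf("failed to create stdout pipe: %w", err)
	}
	errBuf := new(bytes.Buffer)
	cmd.Stderr = errBuf
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("failed to start command %q: %w", cmd.String(), err)
	}
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		handle(scanner.Text())
	}
	if err := cmd.Wait(); err != nil {
		return fmt.Errorf("command %q failed: %w (stderr: %q)",
			cmd.String(), err, strings.TrimSpace(errBuf.String()))
	}
	return nil
}

func main() {
	// printf stands in for ipmi-sensors; the sample line is made up.
	cmd := exec.Command("printf", `4,CPU Temp,Temperature,45.00,C,'OK'\n`)
	err := streamLines(cmd, func(line string) {
		if name, value, unit, ok := parseSensorLine(line); ok {
			fmt.Println(name, value, unit)
		}
	})
	if err != nil {
		fmt.Println(err)
	}
}
```

Calling `Wait()` before `Output()`, as the removed lines did, is an error on a command that has not been started, which is presumably why that call was dropped.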
@@ -14,7 +14,7 @@ hugo_path: docs/reference/cc-metric-collector/collectors/ipmi.md
```json ```json
"ipmistat": { "ipmistat": {
"ipmitool_path": "/path/to/ipmitool", "ipmitool_path": "/path/to/ipmitool",
"ipmisensors_path": "/path/to/ipmi-sensors", "ipmisensors_path": "/path/to/ipmi-sensors"
} }
``` ```

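The documentation fix above removes a trailing comma, which JSON does not permit. A quick way to check a config snippet is `json.Valid`; in this sketch the helper `isValidJSON` is hypothetical and the fragment is wrapped in braces so it forms a complete document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isValidJSON is a hypothetical helper that reports whether s is well-formed JSON.
func isValidJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	withComma := `{"ipmistat": {"ipmitool_path": "/path/to/ipmitool", "ipmisensors_path": "/path/to/ipmi-sensors",}}`
	withoutComma := `{"ipmistat": {"ipmitool_path": "/path/to/ipmitool", "ipmisensors_path": "/path/to/ipmi-sensors"}}`
	fmt.Println(isValidJSON(withComma), isValidJSON(withoutComma))
}
```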
@@ -16,14 +16,15 @@ package collectors
import "C" import "C"
import ( import (
"bytes"
"encoding/json" "encoding/json"
"errors"
"fmt" "fmt"
"maps"
"math" "math"
"os" "os"
"os/signal" "os/signal"
"os/user" "os/user"
"sort" "slices"
"strconv" "strconv"
"strings" "strings"
"sync" "sync"
@@ -31,8 +32,8 @@ import (
"time" "time"
"unsafe" "unsafe"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
agg "github.com/ClusterCockpit/cc-metric-collector/internal/metricAggregator" agg "github.com/ClusterCockpit/cc-metric-collector/internal/metricAggregator"
topo "github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology" topo "github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology"
"github.com/NVIDIA/go-nvml/pkg/dl" "github.com/NVIDIA/go-nvml/pkg/dl"
@@ -124,22 +125,14 @@ func checkMetricType(t string) bool {
return ok return ok
} }
func eventsToEventStr(events map[string]string) string {
elist := make([]string, 0)
for k, v := range events {
elist = append(elist, fmt.Sprintf("%s:%s", v, k))
}
return strings.Join(elist, ",")
}
func genLikwidEventSet(input LikwidCollectorEventsetConfig) LikwidEventsetConfig { func genLikwidEventSet(input LikwidCollectorEventsetConfig) LikwidEventsetConfig {
tmplist := make([]string, 0) clist := make([]string, 0, len(input.Events))
clist := make([]string, 0)
for k := range input.Events { for k := range input.Events {
clist = append(clist, k) clist = append(clist, k)
} }
sort.Strings(clist) slices.Sort(clist)
elist := make([]*C.char, 0) tmplist := make([]string, 0, len(clist))
elist := make([]*C.char, 0, len(clist))
for _, k := range clist { for _, k := range clist {
v := input.Events[k] v := input.Events[k]
tmplist = append(tmplist, fmt.Sprintf("%s:%s", v, k)) tmplist = append(tmplist, fmt.Sprintf("%s:%s", v, k))
@@ -149,7 +142,7 @@ func genLikwidEventSet(input LikwidCollectorEventsetConfig) LikwidEventsetConfig
estr := strings.Join(tmplist, ",") estr := strings.Join(tmplist, ",")
res := make(map[int]map[string]float64) res := make(map[int]map[string]float64)
met := make(map[int]map[string]float64) met := make(map[int]map[string]float64)
for _, i := range topo.CpuList() { for _, i := range topo.HwthreadList() {
res[i] = make(map[string]float64) res[i] = make(map[string]float64)
for k := range input.Events { for k := range input.Events {
res[i][k] = 0.0 res[i][k] = 0.0
@@ -187,7 +180,7 @@ func getBaseFreq() float64 {
for _, f := range files { for _, f := range files {
buffer, err := os.ReadFile(f) buffer, err := os.ReadFile(f)
if err == nil { if err == nil {
data := strings.Replace(string(buffer), "\n", "", -1) data := strings.ReplaceAll(string(buffer), "\n", "")
x, err := strconv.ParseInt(data, 0, 64) x, err := strconv.ParseInt(data, 0, 64)
if err == nil { if err == nil {
freq = float64(x) freq = float64(x)
@@ -214,25 +207,30 @@ func (m *LikwidCollector) Init(config json.RawMessage) error {
m.config.LibraryPath = LIKWID_LIB_NAME m.config.LibraryPath = LIKWID_LIB_NAME
m.config.LockfilePath = LIKWID_DEF_LOCKFILE m.config.LockfilePath = LIKWID_DEF_LOCKFILE
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
lib := dl.New(m.config.LibraryPath, LIKWID_LIB_DL_FLAGS) lib := dl.New(m.config.LibraryPath, LIKWID_LIB_DL_FLAGS)
if lib == nil { if lib == nil {
return fmt.Errorf("error instantiating DynamicLibrary for %s", m.config.LibraryPath) return fmt.Errorf("%s Init(): error instantiating DynamicLibrary for %s", m.name, m.config.LibraryPath)
} }
err := lib.Open() err := lib.Open()
if err != nil { if err != nil {
return fmt.Errorf("error opening %s: %v", m.config.LibraryPath, err) return fmt.Errorf("%s Init(): error opening %s: %w", m.name, m.config.LibraryPath, err)
} }
if m.config.ForceOverwrite { if m.config.ForceOverwrite {
cclog.ComponentDebug(m.name, "Set LIKWID_FORCE=1") cclog.ComponentDebug(m.name, "Set LIKWID_FORCE=1")
os.Setenv("LIKWID_FORCE", "1") if err := os.Setenv("LIKWID_FORCE", "1"); err != nil {
return fmt.Errorf("%s Init(): error setting environment variable LIKWID_FORCE=1: %w", m.name, err)
}
}
if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
} }
m.setup()
m.meta = map[string]string{"group": "PerfCounter"} m.meta = map[string]string{"group": "PerfCounter"}
cclog.ComponentDebug(m.name, "Get cpulist and init maps and lists") cclog.ComponentDebug(m.name, "Get cpulist and init maps and lists")
@@ -298,16 +296,12 @@ func (m *LikwidCollector) Init(config json.RawMessage) error {
// If no event set could be added, shut down LikwidCollector // If no event set could be added, shut down LikwidCollector
if totalMetrics == 0 { if totalMetrics == 0 {
err := errors.New("no LIKWID eventset or metric usable") return fmt.Errorf("%s Init(): no LIKWID eventset or metric usable", m.name)
cclog.ComponentError(m.name, err.Error())
return err
} }
ret := C.topology_init() ret := C.topology_init()
if ret != 0 { if ret != 0 {
err := errors.New("failed to initialize topology module") return fmt.Errorf("%s Init(): failed to initialize topology module", m.name)
cclog.ComponentError(m.name, err.Error())
return err
} }
m.measureThread = thread.New() m.measureThread = thread.New()
switch m.config.AccessMode { switch m.config.AccessMode {
@@ -316,7 +310,14 @@ func (m *LikwidCollector) Init(config json.RawMessage) error {
case "accessdaemon": case "accessdaemon":
if len(m.config.DaemonPath) > 0 { if len(m.config.DaemonPath) > 0 {
p := os.Getenv("PATH") p := os.Getenv("PATH")
os.Setenv("PATH", m.config.DaemonPath+":"+p) if len(p) > 0 {
p = m.config.DaemonPath + ":" + p
} else {
p = m.config.DaemonPath
}
if err := os.Setenv("PATH", p); err != nil {
return fmt.Errorf("%s Init(): error setting environment variable PATH=%s: %w", m.name, p, err)
}
} }
C.HPMmode(1) C.HPMmode(1)
retCode := C.HPMinit() retCode := C.HPMinit()
@@ -327,7 +328,7 @@ func (m *LikwidCollector) Init(config json.RawMessage) error {
for _, c := range m.cpulist { for _, c := range m.cpulist {
m.measureThread.Call( m.measureThread.Call(
func() { func() {
retCode := C.HPMaddThread(c) retCode := C.HPMaddThread(C.uint32_t(c))
if retCode != 0 { if retCode != 0 {
err := fmt.Errorf("C.HPMaddThread(%v) failed with return code %v", c, retCode) err := fmt.Errorf("C.HPMaddThread(%v) failed with return code %v", c, retCode)
cclog.ComponentError(m.name, err.Error()) cclog.ComponentError(m.name, err.Error())
@@ -369,16 +370,23 @@ func (m *LikwidCollector) Init(config json.RawMessage) error {
// take a measurement for 'interval' seconds of event set index 'group' // take a measurement for 'interval' seconds of event set index 'group'
func (m *LikwidCollector) takeMeasurement(evidx int, evset LikwidEventsetConfig, interval time.Duration) (bool, error) { func (m *LikwidCollector) takeMeasurement(evidx int, evset LikwidEventsetConfig, interval time.Duration) (bool, error) {
var ret C.int var ret C.int
var gid C.int = -1
sigchan := make(chan os.Signal, 1) sigchan := make(chan os.Signal, 1)
// Watch changes for the lock file () // Watch changes for the lock file ()
watcher, err := fsnotify.NewWatcher() watcher, err := fsnotify.NewWatcher()
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) cclog.ComponentError(
m.name,
fmt.Sprintf("takeMeasurement(): Failed to create a new fsnotify.Watcher: %v", err))
return true, err return true, err
} }
defer watcher.Close() defer func() {
if err := watcher.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("takeMeasurement(): Failed to close fsnotify.Watcher: %v", err))
}
}()
if len(m.config.LockfilePath) > 0 { if len(m.config.LockfilePath) > 0 {
// Check if the lock file exists // Check if the lock file exists
info, err := os.Stat(m.config.LockfilePath) info, err := os.Stat(m.config.LockfilePath)
@@ -386,9 +394,11 @@ func (m *LikwidCollector) takeMeasurement(evidx int, evset LikwidEventsetConfig,
// Create the lock file if it does not exist // Create the lock file if it does not exist
file, createErr := os.Create(m.config.LockfilePath) file, createErr := os.Create(m.config.LockfilePath)
if createErr != nil { if createErr != nil {
return true, fmt.Errorf("failed to create lock file: %v", createErr) return true, fmt.Errorf("failed to create lock file: %w", createErr)
}
if err := file.Close(); err != nil {
return true, fmt.Errorf("failed to close lock file: %w", err)
} }
file.Close()
info, err = os.Stat(m.config.LockfilePath) // Recheck the file after creation info, err = os.Stat(m.config.LockfilePath) // Recheck the file after creation
} }
if err != nil { if err != nil {
@@ -440,6 +450,7 @@ func (m *LikwidCollector) takeMeasurement(evidx int, evset LikwidEventsetConfig,
signal.Notify(sigchan, syscall.SIGCHLD) signal.Notify(sigchan, syscall.SIGCHLD)
// Add an event string to LIKWID // Add an event string to LIKWID
var gid C.int
select { select {
case <-sigchan: case <-sigchan:
gid = -1 gid = -1
@@ -595,20 +606,20 @@ func (m *LikwidCollector) calcEventsetMetrics(evset LikwidEventsetConfig, interv
evset.metrics[tid][metric.Name] = value evset.metrics[tid][metric.Name] = value
// Now we have the result, send it with the proper tags // Now we have the result, send it with the proper tags
if !math.IsNaN(value) && metric.Publish { if !math.IsNaN(value) && metric.Publish {
fields := map[string]interface{}{"value": value} y, err := lp.NewMessage(
y, err := metric.Name,
lp.NewMessage( map[string]string{
metric.Name, "type": metric.Type,
map[string]string{ },
"type": metric.Type, m.meta,
}, map[string]any{
m.meta, "value": value,
fields, },
now, now,
) )
if err == nil { if err == nil {
if metric.Type != "node" { if metric.Type != "node" {
y.AddTag("type-id", fmt.Sprintf("%d", domain)) y.AddTag("type-id", strconv.Itoa(domain))
} }
if len(metric.Unit) > 0 { if len(metric.Unit) > 0 {
y.AddMeta("unit", metric.Unit) y.AddMeta("unit", metric.Unit)
@@ -633,19 +644,18 @@ func (m *LikwidCollector) calcEventsetMetrics(evset LikwidEventsetConfig, interv
} }
for coreID, value := range totalCoreValues { for coreID, value := range totalCoreValues {
y, err := y, err := lp.NewMessage(
lp.NewMessage( metric.Name,
metric.Name, map[string]string{
map[string]string{ "type": "core",
"type": "core", "type-id": strconv.Itoa(coreID),
"type-id": fmt.Sprintf("%d", coreID), },
}, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": value,
"value": value, },
}, now,
now, )
)
if err != nil { if err != nil {
continue continue
} }
@@ -670,19 +680,18 @@ func (m *LikwidCollector) calcEventsetMetrics(evset LikwidEventsetConfig, interv
} }
for socketID, value := range totalSocketValues { for socketID, value := range totalSocketValues {
y, err := y, err := lp.NewMessage(
lp.NewMessage( metric.Name,
metric.Name, map[string]string{
map[string]string{ "type": "socket",
"type": "socket", "type-id": strconv.Itoa(socketID),
"type-id": fmt.Sprintf("%d", socketID), },
}, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": value,
"value": value, },
}, now,
now, )
)
if err != nil { if err != nil {
continue continue
} }
@@ -705,18 +714,17 @@ func (m *LikwidCollector) calcEventsetMetrics(evset LikwidEventsetConfig, interv
} }
} }
y, err := y, err := lp.NewMessage(
lp.NewMessage( metric.Name,
metric.Name, map[string]string{
map[string]string{ "type": "node",
"type": "node", },
}, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": totalNodeValue,
"value": totalNodeValue, },
}, now,
now, )
)
if err != nil { if err != nil {
continue continue
} }
@@ -748,9 +756,7 @@ func (m *LikwidCollector) calcGlobalMetrics(groups []LikwidEventsetConfig, inter
// Here we generate parameter list // Here we generate parameter list
params := make(map[string]float64) params := make(map[string]float64)
for _, evset := range groups { for _, evset := range groups {
for mname, mres := range evset.metrics[tid] { maps.Copy(params, evset.metrics[tid])
params[mname] = mres
}
} }
params["gotime"] = interval.Seconds() params["gotime"] = interval.Seconds()
// Evaluate the metric // Evaluate the metric
@@ -765,21 +771,20 @@ func (m *LikwidCollector) calcGlobalMetrics(groups []LikwidEventsetConfig, inter
// Now we have the result, send it with the proper tags // Now we have the result, send it with the proper tags
if !math.IsNaN(value) { if !math.IsNaN(value) {
if metric.Publish { if metric.Publish {
y, err := y, err := lp.NewMessage(
lp.NewMessage( metric.Name,
metric.Name, map[string]string{
map[string]string{ "type": metric.Type,
"type": metric.Type, },
}, m.meta,
m.meta, map[string]any{
map[string]interface{}{ "value": value,
"value": value, },
}, now,
now, )
)
if err == nil { if err == nil {
if metric.Type != "node" { if metric.Type != "node" {
y.AddTag("type-id", fmt.Sprintf("%d", domain)) y.AddTag("type-id", strconv.Itoa(domain))
} }
if len(metric.Unit) > 0 { if len(metric.Unit) > 0 {
y.AddMeta("unit", metric.Unit) y.AddMeta("unit", metric.Unit)
@@ -795,7 +800,7 @@ func (m *LikwidCollector) calcGlobalMetrics(groups []LikwidEventsetConfig, inter
} }
func (m *LikwidCollector) ReadThread(interval time.Duration, output chan lp.CCMessage) { func (m *LikwidCollector) ReadThread(interval time.Duration, output chan lp.CCMessage) {
var err error = nil var err error
groups := make([]LikwidEventsetConfig, 0) groups := make([]LikwidEventsetConfig, 0)
for evidx, evset := range m.config.Eventsets { for evidx, evset := range m.config.Eventsets {
@@ -813,13 +818,21 @@ func (m *LikwidCollector) ReadThread(interval time.Duration, output chan lp.CCMe
if !skip { if !skip {
// read measurements and derive event set metrics // read measurements and derive event set metrics
m.calcEventsetMetrics(e, interval, output) err = m.calcEventsetMetrics(e, interval, output)
if err != nil {
cclog.ComponentError(m.name, err.Error())
return
}
groups = append(groups, e) groups = append(groups, e)
} }
} }
if len(groups) > 0 { if len(groups) > 0 {
// calculate global metrics // calculate global metrics
m.calcGlobalMetrics(groups, interval, output) err = m.calcGlobalMetrics(groups, interval, output)
if err != nil {
cclog.ComponentError(m.name, err.Error())
return
}
} }
} }

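Two small patterns from the LIKWID hunks above can be sketched on their own: building the event string from sorted counter names so the result is deterministic, and prepending a directory to `PATH` without leaving a trailing colon when `PATH` is empty. The helpers `eventString` and `prependPath`, the event and counter names, and the directory are illustrative assumptions.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// eventString joins "EVENT:COUNTER" pairs in sorted counter order.
// Go map iteration order is random, so sorting keeps the string stable.
func eventString(events map[string]string) string {
	counters := make([]string, 0, len(events))
	for k := range events {
		counters = append(counters, k)
	}
	slices.Sort(counters)
	parts := make([]string, 0, len(counters))
	for _, k := range counters {
		parts = append(parts, fmt.Sprintf("%s:%s", events[k], k))
	}
	return strings.Join(parts, ",")
}

// prependPath puts dir in front of path and avoids "dir:" when path is empty.
func prependPath(dir, path string) string {
	if len(path) > 0 {
		return dir + ":" + path
	}
	return dir
}

func main() {
	// Counter and event names are examples only.
	fmt.Println(eventString(map[string]string{
		"FIXC1": "CPU_CLK_UNHALTED_CORE",
		"FIXC0": "INSTR_RETIRED_ANY",
	}))
	fmt.Println(prependPath("/opt/likwid/sbin", "/usr/bin"))
}
```

An empty element in `PATH` is commonly interpreted as the current directory, so avoiding the trailing colon is more than cosmetic.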
@@ -8,15 +8,17 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
// LoadavgCollector collects: // LoadavgCollector collects:
@@ -29,6 +31,7 @@ const LOADAVGFILE = "/proc/loadavg"
type LoadavgCollector struct { type LoadavgCollector struct {
metricCollector metricCollector
tags map[string]string tags map[string]string
load_matches []string load_matches []string
load_skips []bool load_skips []bool
@@ -42,32 +45,39 @@ type LoadavgCollector struct {
func (m *LoadavgCollector) Init(config json.RawMessage) error { func (m *LoadavgCollector) Init(config json.RawMessage) error {
m.name = "LoadavgCollector" m.name = "LoadavgCollector"
m.parallel = true m.parallel = true
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.meta = map[string]string{ m.meta = map[string]string{
"source": m.name, "source": m.name,
"group": "LOAD"} "group": "LOAD",
}
m.tags = map[string]string{"type": "node"} m.tags = map[string]string{"type": "node"}
m.load_matches = []string{ m.load_matches = []string{
"load_one", "load_one",
"load_five", "load_five",
"load_fifteen"} "load_fifteen",
m.load_skips = make([]bool, len(m.load_matches)) }
m.proc_matches = []string{ m.proc_matches = []string{
"proc_run", "proc_run",
"proc_total"} "proc_total",
m.proc_skips = make([]bool, len(m.proc_matches))
for i, name := range m.load_matches {
_, m.load_skips[i] = stringArrayContains(m.config.ExcludeMetrics, name)
} }
m.load_skips = make([]bool, len(m.load_matches))
for i, name := range m.load_matches {
m.load_skips[i] = slices.Contains(m.config.ExcludeMetrics, name)
}
m.proc_skips = make([]bool, len(m.proc_matches))
for i, name := range m.proc_matches { for i, name := range m.proc_matches {
_, m.proc_skips[i] = stringArrayContains(m.config.ExcludeMetrics, name) m.proc_skips[i] = slices.Contains(m.config.ExcludeMetrics, name)
} }
m.init = true m.init = true
return nil return nil
@@ -99,7 +109,7 @@ func (m *LoadavgCollector) Read(interval time.Duration, output chan lp.CCMessage
if m.load_skips[i] { if m.load_skips[i] {
continue continue
} }
y, err := lp.NewMessage(name, m.tags, m.meta, map[string]interface{}{"value": x}, now) y, err := lp.NewMessage(name, m.tags, m.meta, map[string]any{"value": x}, now)
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -118,7 +128,7 @@ func (m *LoadavgCollector) Read(interval time.Duration, output chan lp.CCMessage
if m.proc_skips[i] { if m.proc_skips[i] {
continue continue
} }
y, err := lp.NewMessage(name, m.tags, m.meta, map[string]interface{}{"value": x}, now) y, err := lp.NewMessage(name, m.tags, m.meta, map[string]any{"value": x}, now)
if err == nil { if err == nil {
output <- y output <- y
} }
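
Note on the config-decoding change above: the same replacement of `json.Unmarshal` with a decoder that rejects unknown keys recurs in most collectors in this diff. A minimal standalone sketch of that pattern follows. The `exampleConfig` type and the `decodeStrict` helper are hypothetical names used only for illustration and are not part of the repository.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// exampleConfig is a hypothetical stand-in for a collector config struct.
type exampleConfig struct {
	ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
}

// decodeStrict decodes raw into cfg and fails on keys that cfg does not declare.
// With plain json.Unmarshal such keys would be silently ignored.
func decodeStrict(raw json.RawMessage, cfg any) error {
	d := json.NewDecoder(bytes.NewReader(raw))
	d.DisallowUnknownFields()
	if err := d.Decode(cfg); err != nil {
		return fmt.Errorf("error decoding JSON config: %w", err)
	}
	return nil
}

func main() {
	var cfg exampleConfig
	// "exclude_metric" is a deliberate misspelling of "exclude_metrics".
	err := decodeStrict(json.RawMessage(`{"exclude_metric": ["load_one"]}`), &cfg)
	fmt.Println("rejected:", err != nil)
}
```

The practical effect is that a typo in a collector's JSON config surfaces as an `Init()` error instead of leaving the option at its default.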

@@ -8,22 +8,25 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"errors" "errors"
"fmt" "fmt"
"os/exec" "os/exec"
"os/user" "os/user"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
lp "github.com/ClusterCockpit/cc-lib/ccMessage"
) )
const LUSTRE_SYSFS = `/sys/fs/lustre` const (
const LCTL_CMD = `lctl` LUSTRE_SYSFS = `/sys/fs/lustre`
const LCTL_OPTION = `get_param` LCTL_CMD = `lctl`
LCTL_OPTION = `get_param`
)
type LustreCollectorConfig struct { type LustreCollectorConfig struct {
LCtlCommand string `json:"lctl_command,omitempty"` LCtlCommand string `json:"lctl_command,omitempty"`
@@ -44,6 +47,7 @@ type LustreMetricDefinition struct {
type LustreCollector struct { type LustreCollector struct {
metricCollector metricCollector
tags map[string]string tags map[string]string
config LustreCollectorConfig config LustreCollectorConfig
lctl string lctl string
@@ -61,7 +65,6 @@ func (m *LustreCollector) getDeviceDataCommand(device string) []string {
} else { } else {
command = exec.Command(m.lctl, LCTL_OPTION, statsfile) command = exec.Command(m.lctl, LCTL_OPTION, statsfile)
} }
command.Wait()
stdout, _ := command.Output() stdout, _ := command.Output()
return strings.Split(string(stdout), "\n") return strings.Split(string(stdout), "\n")
} }
@@ -297,12 +300,15 @@ func (m *LustreCollector) Init(config json.RawMessage) error {
m.name = "LustreCollector" m.name = "LustreCollector"
m.parallel = true m.parallel = true
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.tags = map[string]string{"type": "node"} m.tags = map[string]string{"type": "node"}
m.meta = map[string]string{"source": m.name, "group": "Lustre"} m.meta = map[string]string{"source": m.name, "group": "Lustre"}
@@ -311,18 +317,15 @@ func (m *LustreCollector) Init(config json.RawMessage) error {
if !m.config.Sudo { if !m.config.Sudo {
user, err := user.Current() user, err := user.Current()
if err != nil { if err != nil {
cclog.ComponentError(m.name, "Failed to get current user:", err.Error()) return fmt.Errorf("%s Init(): Failed to get current user: %w", m.name, err)
return err
} }
if user.Uid != "0" { if user.Uid != "0" {
cclog.ComponentError(m.name, "Lustre file system statistics can only be queried by user root") return fmt.Errorf("%s Init(): Lustre file system statistics can only be queried by user root", m.name)
return err
} }
} else { } else {
p, err := exec.LookPath("sudo") p, err := exec.LookPath("sudo")
if err != nil { if err != nil {
cclog.ComponentError(m.name, "Cannot find 'sudo'") return fmt.Errorf("%s Init(): Cannot find 'sudo': %w", m.name, err)
return err
} }
m.sudoCmd = p m.sudoCmd = p
} }
@@ -331,7 +334,7 @@ func (m *LustreCollector) Init(config json.RawMessage) error {
if err != nil { if err != nil {
p, err = exec.LookPath(LCTL_CMD) p, err = exec.LookPath(LCTL_CMD)
if err != nil { if err != nil {
return err return fmt.Errorf("%s Init(): Cannot find %s command: %w", m.name, LCTL_CMD, err)
} }
} }
m.lctl = p m.lctl = p
@@ -339,32 +342,32 @@ func (m *LustreCollector) Init(config json.RawMessage) error {
m.definitions = []LustreMetricDefinition{} m.definitions = []LustreMetricDefinition{}
if m.config.SendAbsoluteValues { if m.config.SendAbsoluteValues {
for _, def := range LustreAbsMetrics { for _, def := range LustreAbsMetrics {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} }
if m.config.SendDiffValues { if m.config.SendDiffValues {
for _, def := range LustreDiffMetrics { for _, def := range LustreDiffMetrics {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} }
if m.config.SendDerivedValues { if m.config.SendDerivedValues {
for _, def := range LustreDeriveMetrics { for _, def := range LustreDeriveMetrics {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, def.name); !skip { if !slices.Contains(m.config.ExcludeMetrics, def.name) {
m.definitions = append(m.definitions, def) m.definitions = append(m.definitions, def)
} }
} }
} }
if len(m.definitions) == 0 { if len(m.definitions) == 0 {
return errors.New("no metrics to collect") return fmt.Errorf("%s Init(): no metrics to collect", m.name)
} }
devices := m.getDevices() devices := m.getDevices()
if len(devices) == 0 { if len(devices) == 0 {
return errors.New("no Lustre devices found") return fmt.Errorf("%s Init(): no Lustre devices found", m.name)
} }
m.stats = make(map[string]map[string]int64) m.stats = make(map[string]map[string]int64)
for _, d := range devices { for _, d := range devices {
@@ -402,23 +405,23 @@ func (m *LustreCollector) Read(interval time.Duration, output chan lp.CCMessage)
} else { } else {
use_x = devData[def.name] use_x = devData[def.name]
} }
var value interface{} var value any
switch def.calc { switch def.calc {
case "none": case "none":
value = use_x value = use_x
y, err = lp.NewMessage(def.name, m.tags, m.meta, map[string]interface{}{"value": value}, time.Now()) y, err = lp.NewMessage(def.name, m.tags, m.meta, map[string]any{"value": value}, time.Now())
case "difference": case "difference":
value = use_x - devData[def.name] value = use_x - devData[def.name]
if value.(int64) < 0 { if value.(int64) < 0 {
value = 0 value = 0
} }
y, err = lp.NewMessage(def.name, m.tags, m.meta, map[string]interface{}{"value": value}, time.Now()) y, err = lp.NewMessage(def.name, m.tags, m.meta, map[string]any{"value": value}, time.Now())
case "derivative": case "derivative":
value = float64(use_x-devData[def.name]) / tdiff.Seconds() value = float64(use_x-devData[def.name]) / tdiff.Seconds()
if value.(float64) < 0 { if value.(float64) < 0 {
value = 0 value = 0
} }
y, err = lp.NewMessage(def.name, m.tags, m.meta, map[string]interface{}{"value": value}, time.Now()) y, err = lp.NewMessage(def.name, m.tags, m.meta, map[string]any{"value": value}, time.Now())
} }
if err == nil { if err == nil {
y.AddTag("device", device) y.AddTag("device", device)
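
Note on the error-handling change in `Init()` above: the log-then-`return err` pairs are replaced by a single wrapped error built with `fmt.Errorf` and `%w`. The sketch below shows why wrapping is useful to a caller. The helper names `findTool` and `isNotFound` are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// findTool looks up tool in the executable search path and wraps any failure
// with the collector name, mirroring the message style used in the diff.
func findTool(collector, tool string) (string, error) {
	p, err := exec.LookPath(tool)
	if err != nil {
		return "", fmt.Errorf("%s Init(): Cannot find %s command: %w", collector, tool, err)
	}
	return p, nil
}

// isNotFound reports whether err wraps exec.ErrNotFound somewhere in its chain.
func isNotFound(err error) bool {
	return errors.Is(err, exec.ErrNotFound)
}

func main() {
	// The tool name is made up so that the lookup is expected to fail.
	_, err := findTool("ExampleCollector", "no-such-tool-for-this-sketch")
	fmt.Println(err)
	fmt.Println("wraps ErrNotFound:", isNotFound(err))
}
```

Because `%w` keeps the original error in the chain, the caller can both print a message that names the collector and still test for the underlying cause.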

@@ -9,22 +9,25 @@ package collectors
import ( import (
"bufio" "bufio"
"bytes"
"encoding/json" "encoding/json"
"errors"
"fmt" "fmt"
"os" "os"
"path/filepath" "path/filepath"
"regexp" "regexp"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const MEMSTATFILE = "/proc/meminfo" const (
const NUMA_MEMSTAT_BASE = "/sys/devices/system/node" MEMSTATFILE = "/proc/meminfo"
NUMA_MEMSTAT_BASE = "/sys/devices/system/node"
)
type MemstatCollectorConfig struct { type MemstatCollectorConfig struct {
ExcludeMetrics []string `json:"exclude_metrics"` ExcludeMetrics []string `json:"exclude_metrics"`
@@ -39,6 +42,7 @@ type MemstatCollectorNode struct {
type MemstatCollector struct { type MemstatCollector struct {
metricCollector metricCollector
stats map[string]int64 stats map[string]int64
tags map[string]string tags map[string]string
matches map[string]string matches map[string]string
@@ -58,7 +62,11 @@ func getStats(filename string) map[string]MemstatStats {
if err != nil { if err != nil {
cclog.Error(err.Error()) cclog.Error(err.Error())
} }
defer file.Close() defer func() {
if err := file.Close(); err != nil {
cclog.Error(err.Error())
}
}()
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
for scanner.Scan() { for scanner.Scan() {
@@ -87,15 +95,15 @@ func getStats(filename string) map[string]MemstatStats {
} }
func (m *MemstatCollector) Init(config json.RawMessage) error { func (m *MemstatCollector) Init(config json.RawMessage) error {
var err error
m.name = "MemstatCollector" m.name = "MemstatCollector"
m.parallel = true m.parallel = true
m.config.NodeStats = true m.config.NodeStats = true
m.config.NumaStats = false m.config.NumaStats = false
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.meta = map[string]string{"source": m.name, "group": "Memory"} m.meta = map[string]string{"source": m.name, "group": "Memory"}
@@ -115,23 +123,24 @@ func (m *MemstatCollector) Init(config json.RawMessage) error {
"MemShared": "mem_shared", "MemShared": "mem_shared",
} }
for k, v := range matches { for k, v := range matches {
_, skip := stringArrayContains(m.config.ExcludeMetrics, k) if !slices.Contains(m.config.ExcludeMetrics, k) {
if !skip {
m.matches[k] = v m.matches[k] = v
} }
} }
m.sendMemUsed = false m.sendMemUsed = false
if _, skip := stringArrayContains(m.config.ExcludeMetrics, "mem_used"); !skip { if !slices.Contains(m.config.ExcludeMetrics, "mem_used") {
m.sendMemUsed = true m.sendMemUsed = true
} }
if len(m.matches) == 0 { if len(m.matches) == 0 {
return errors.New("no metrics to collect") return fmt.Errorf("%s Init(): no metrics to collect", m.name)
}
if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
} }
m.setup()
if m.config.NodeStats { if m.config.NodeStats {
if stats := getStats(MEMSTATFILE); len(stats) == 0 { if stats := getStats(MEMSTATFILE); len(stats) == 0 {
return fmt.Errorf("cannot read data from file %s", MEMSTATFILE) return fmt.Errorf("%s Init(): cannot read data from file %s", m.name, MEMSTATFILE)
} }
} }
@@ -143,7 +152,7 @@ func (m *MemstatCollector) Init(config json.RawMessage) error {
m.nodefiles = make(map[int]MemstatCollectorNode) m.nodefiles = make(map[int]MemstatCollectorNode)
for _, f := range files { for _, f := range files {
if stats := getStats(f); len(stats) == 0 { if stats := getStats(f); len(stats) == 0 {
return fmt.Errorf("cannot read data from file %s", f) return fmt.Errorf("%s Init(): cannot read data from file %s", m.name, f)
} }
rematch := regex.FindStringSubmatch(f) rematch := regex.FindStringSubmatch(f)
if len(rematch) == 2 { if len(rematch) == 2 {
@@ -153,7 +162,7 @@ func (m *MemstatCollector) Init(config json.RawMessage) error {
file: f, file: f,
tags: map[string]string{ tags: map[string]string{
"type": "memoryDomain", "type": "memoryDomain",
"type-id": fmt.Sprintf("%d", id), "type-id": strconv.Itoa(id),
}, },
} }
m.nodefiles[id] = f m.nodefiles[id] = f
@@ -163,7 +172,7 @@ func (m *MemstatCollector) Init(config json.RawMessage) error {
} }
} }
m.init = true m.init = true
return err return nil
} }
func (m *MemstatCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *MemstatCollector) Read(interval time.Duration, output chan lp.CCMessage) {
@@ -174,7 +183,7 @@ func (m *MemstatCollector) Read(interval time.Duration, output chan lp.CCMessage
sendStats := func(stats map[string]MemstatStats, tags map[string]string) { sendStats := func(stats map[string]MemstatStats, tags map[string]string) {
for match, name := range m.matches { for match, name := range m.matches {
var value float64 = 0 var value float64 = 0
var unit string = "" unit := ""
if v, ok := stats[match]; ok { if v, ok := stats[match]; ok {
value = v.value value = v.value
if len(v.unit) > 0 { if len(v.unit) > 0 {
@@ -182,7 +191,7 @@ func (m *MemstatCollector) Read(interval time.Duration, output chan lp.CCMessage
} }
} }
y, err := lp.NewMessage(name, tags, m.meta, map[string]interface{}{"value": value}, time.Now()) y, err := lp.NewMessage(name, tags, m.meta, map[string]any{"value": value}, time.Now())
if err == nil { if err == nil {
if len(unit) > 0 { if len(unit) > 0 {
y.AddMeta("unit", unit) y.AddMeta("unit", unit)
@@ -215,7 +224,7 @@ func (m *MemstatCollector) Read(interval time.Duration, output chan lp.CCMessage
} }
} }
} }
y, err := lp.NewMessage("mem_used", tags, m.meta, map[string]interface{}{"value": memUsed}, time.Now()) y, err := lp.NewMessage("mem_used", tags, m.meta, map[string]any{"value": memUsed}, time.Now())
if err == nil { if err == nil {
if len(unit) > 0 { if len(unit) > 0 {
y.AddMeta("unit", unit) y.AddMeta("unit", unit)

@@ -12,7 +12,7 @@ import (
"fmt" "fmt"
"time" "time"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
type MetricCollector interface { type MetricCollector interface {
@@ -51,30 +51,6 @@ func (c *metricCollector) Initialized() bool {
return c.init return c.init
} }
// intArrayContains scans an array of ints if the value str is present in the array
// If the specified value is found, the corresponding array index is returned.
// The bool value is used to signal success or failure
func intArrayContains(array []int, str int) (int, bool) {
for i, a := range array {
if a == str {
return i, true
}
}
return -1, false
}
// stringArrayContains scans an array of strings if the value str is present in the array
// If the specified value is found, the corresponding array index is returned.
// The bool value is used to signal success or failure
func stringArrayContains(array []string, str string) (int, bool) {
for i, a := range array {
if a == str {
return i, true
}
}
return -1, false
}
// RemoveFromStringList removes the string r from the array of strings s // RemoveFromStringList removes the string r from the array of strings s
// If r is not contained in the array an error is returned // If r is not contained in the array an error is returned
func RemoveFromStringList(s []string, r string) ([]string, error) { func RemoveFromStringList(s []string, r string) ([]string, error) {
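
Note on the removed helpers above: call sites that used `stringArrayContains` only for its boolean result now call `slices.Contains` from the standard library (available since Go 1.21). A minimal sketch of the skip-mask construction seen in the load average collector, assuming a hypothetical helper name `skipMask`:

```go
package main

import (
	"fmt"
	"slices"
)

// skipMask reports, for each metric name, whether it appears in the exclude list.
func skipMask(names, exclude []string) []bool {
	skips := make([]bool, len(names))
	for i, name := range names {
		skips[i] = slices.Contains(exclude, name)
	}
	return skips
}

func main() {
	names := []string{"load_one", "load_five", "load_fifteen"}
	fmt.Println(skipMask(names, []string{"load_five"}))
}
```

Since none of the rewritten call sites shown in this diff used the index returned by the old helper, the boolean-only standard function is a drop-in replacement there.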

@@ -9,15 +9,17 @@ package collectors
import ( import (
"bufio" "bufio"
"bytes"
"encoding/json" "encoding/json"
"errors" "fmt"
"os" "os"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const NETSTATFILE = "/proc/net/dev" const NETSTATFILE = "/proc/net/dev"
@@ -40,6 +42,7 @@ type NetstatCollectorMetric struct {
type NetstatCollector struct { type NetstatCollector struct {
metricCollector metricCollector
config NetstatCollectorConfig config NetstatCollectorConfig
aliasToCanonical map[string]string aliasToCanonical map[string]string
matches map[string][]NetstatCollectorMetric matches map[string][]NetstatCollectorMetric
@@ -65,7 +68,9 @@ func getCanonicalName(raw string, aliasToCanonical map[string]string) string {
func (m *NetstatCollector) Init(config json.RawMessage) error { func (m *NetstatCollector) Init(config json.RawMessage) error {
m.name = "NetstatCollector" m.name = "NetstatCollector"
m.parallel = true m.parallel = true
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.lastTimestamp = time.Now() m.lastTimestamp = time.Now()
const ( const (
@@ -95,10 +100,10 @@ func (m *NetstatCollector) Init(config json.RawMessage) error {
m.config.SendDerivedValues = false m.config.SendDerivedValues = false
// Read configuration file, allow overwriting default config // Read configuration file, allow overwriting default config
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
@@ -107,10 +112,8 @@ func (m *NetstatCollector) Init(config json.RawMessage) error {
// Check access to net statistic file // Check access to net statistic file
file, err := os.Open(NETSTATFILE) file, err := os.Open(NETSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) return fmt.Errorf("%s Init(): failed to open netstat file \"%s\": %w", m.name, NETSTATFILE, err)
return err
} }
defer file.Close()
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
for scanner.Scan() { for scanner.Scan() {
@@ -129,13 +132,33 @@ func (m *NetstatCollector) Init(config json.RawMessage) error {
canonical := getCanonicalName(raw, m.aliasToCanonical) canonical := getCanonicalName(raw, m.aliasToCanonical)
// Check if device is a included device // Check if device is a included device
if _, ok := stringArrayContains(m.config.IncludeDevices, canonical); ok { if slices.Contains(m.config.IncludeDevices, canonical) {
// Tag will contain original device name (raw). // Tag will contain original device name (raw).
tags := map[string]string{"stype": "network", "stype-id": raw, "type": "node"} tags := map[string]string{
meta_unit_byte := map[string]string{"source": m.name, "group": "Network", "unit": "bytes"} "stype": "network",
meta_unit_byte_per_sec := map[string]string{"source": m.name, "group": "Network", "unit": "bytes/sec"} "stype-id": raw,
meta_unit_pkts := map[string]string{"source": m.name, "group": "Network", "unit": "packets"} "type": "node",
meta_unit_pkts_per_sec := map[string]string{"source": m.name, "group": "Network", "unit": "packets/sec"} }
meta_unit_byte := map[string]string{
"source": m.name,
"group": "Network",
"unit": "bytes",
}
meta_unit_byte_per_sec := map[string]string{
"source": m.name,
"group": "Network",
"unit": "bytes/sec",
}
meta_unit_pkts := map[string]string{
"source": m.name,
"group": "Network",
"unit": "packets",
}
meta_unit_pkts_per_sec := map[string]string{
"source": m.name,
"group": "Network",
"unit": "packets/sec",
}
m.matches[canonical] = []NetstatCollectorMetric{ m.matches[canonical] = []NetstatCollectorMetric{
{ {
@@ -174,8 +197,13 @@ func (m *NetstatCollector) Init(config json.RawMessage) error {
} }
} }
// Close netstat file
if err := file.Close(); err != nil {
return fmt.Errorf("%s Init(): failed to close netstat file \"%s\": %w", m.name, NETSTATFILE, err)
}
if len(m.matches) == 0 { if len(m.matches) == 0 {
return errors.New("no devices to collector metrics found") return fmt.Errorf("%s Init(): no devices to collect metrics found", m.name)
} }
m.init = true m.init = true
return nil return nil
@@ -194,10 +222,18 @@ func (m *NetstatCollector) Read(interval time.Duration, output chan lp.CCMessage
file, err := os.Open(NETSTATFILE) file, err := os.Open(NETSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to open file '%s': %v", NETSTATFILE, err))
return return
} }
defer file.Close() defer func() {
if err := file.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to close file '%s': %v", NETSTATFILE, err))
}
}()
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
for scanner.Scan() { for scanner.Scan() {
@@ -226,14 +262,14 @@ func (m *NetstatCollector) Read(interval time.Duration, output chan lp.CCMessage
continue continue
} }
if m.config.SendAbsoluteValues { if m.config.SendAbsoluteValues {
if y, err := lp.NewMessage(metric.name, metric.tags, metric.meta, map[string]interface{}{"value": v}, now); err == nil { if y, err := lp.NewMessage(metric.name, metric.tags, metric.meta, map[string]any{"value": v}, now); err == nil {
output <- y output <- y
} }
} }
if m.config.SendDerivedValues { if m.config.SendDerivedValues {
if metric.lastValue >= 0 { if metric.lastValue >= 0 {
rate := float64(v-metric.lastValue) / timeDiff rate := float64(v-metric.lastValue) / timeDiff
if y, err := lp.NewMessage(metric.name+"_bw", metric.tags, metric.meta_rates, map[string]interface{}{"value": rate}, now); err == nil { if y, err := lp.NewMessage(metric.name+"_bw", metric.tags, metric.meta_rates, map[string]any{"value": rate}, now); err == nil {
output <- y output <- y
} }
} }

@@ -8,9 +8,10 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"log" "slices"
// "os" // "os"
"os/exec" "os/exec"
@@ -18,7 +19,8 @@ import (
"strings" "strings"
"time" "time"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
// First part contains the code for the general NfsCollector. // First part contains the code for the general NfsCollector.
@@ -33,6 +35,7 @@ type NfsCollectorData struct {
type nfsCollector struct { type nfsCollector struct {
metricCollector metricCollector
tags map[string]string tags map[string]string
version string version string
config struct { config struct {
@@ -42,68 +45,56 @@ type nfsCollector struct {
data map[string]NfsCollectorData data map[string]NfsCollectorData
} }
func (m *nfsCollector) initStats() error {
cmd := exec.Command(m.config.Nfsstats, `-l`, `--all`)
cmd.Wait()
buffer, err := cmd.Output()
if err == nil {
for _, line := range strings.Split(string(buffer), "\n") {
lf := strings.Fields(line)
if len(lf) != 5 {
continue
}
if lf[1] == m.version {
name := strings.Trim(lf[3], ":")
if _, exist := m.data[name]; !exist {
value, err := strconv.ParseInt(lf[4], 0, 64)
if err == nil {
x := m.data[name]
x.current = value
x.last = value
m.data[name] = x
}
}
}
}
}
return err
}
func (m *nfsCollector) updateStats() error { func (m *nfsCollector) updateStats() error {
cmd := exec.Command(m.config.Nfsstats, `-l`, `--all`) cmd := exec.Command(m.config.Nfsstats, "-l", "--all")
cmd.Wait()
buffer, err := cmd.Output() buffer, err := cmd.Output()
if err == nil { if err != nil {
for _, line := range strings.Split(string(buffer), "\n") { return err
lf := strings.Fields(line) }
if len(lf) != 5 {
continue for name, data := range m.data {
} m.data[name] = NfsCollectorData{
if lf[1] == m.version { last: data.current,
name := strings.Trim(lf[3], ":") current: -1,
if _, exist := m.data[name]; exist {
value, err := strconv.ParseInt(lf[4], 0, 64)
if err == nil {
x := m.data[name]
x.last = x.current
x.current = value
m.data[name] = x
}
}
}
} }
} }
return err
for line := range strings.Lines(string(buffer)) {
lf := strings.Fields(line)
if len(lf) != 5 {
continue
}
if lf[1] != m.version {
continue
}
name := strings.Trim(lf[3], ":")
value, err := strconv.ParseInt(lf[4], 0, 64)
if err != nil {
return err
}
collectorData, exist := m.data[name]
collectorData.current = value
if !exist {
collectorData.last = -1
}
m.data[name] = collectorData
}
return nil
} }
func (m *nfsCollector) MainInit(config json.RawMessage) error { func (m *nfsCollector) MainInit(config json.RawMessage) error {
m.config.Nfsstats = string(NFSSTAT_EXEC) m.config.Nfsstats = string(NFSSTAT_EXEC)
// Read JSON configuration // Read JSON configuration
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
log.Print(err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
m.meta = map[string]string{ m.meta = map[string]string{
@@ -116,10 +107,12 @@ func (m *nfsCollector) MainInit(config json.RawMessage) error {
// Check if nfsstat is in executable search path // Check if nfsstat is in executable search path
_, err := exec.LookPath(m.config.Nfsstats) _, err := exec.LookPath(m.config.Nfsstats)
if err != nil { if err != nil {
return fmt.Errorf("NfsCollector.Init(): Failed to find nfsstat binary '%s': %v", m.config.Nfsstats, err) return fmt.Errorf("%s Init(): Failed to find nfsstat binary '%s': %w", m.name, m.config.Nfsstats, err)
} }
m.data = make(map[string]NfsCollectorData) m.data = make(map[string]NfsCollectorData)
m.initStats() if err := m.updateStats(); err != nil {
return fmt.Errorf("%s Init(): %w", m.name, err)
}
m.init = true m.init = true
m.parallel = true m.parallel = true
return nil return nil
@@ -131,8 +124,14 @@ func (m *nfsCollector) Read(interval time.Duration, output chan lp.CCMessage) {
} }
timestamp := time.Now() timestamp := time.Now()
m.updateStats() if err := m.updateStats(); err != nil {
prefix := "" cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): updateStats() failed: %v", err),
)
return
}
var prefix string
switch m.version { switch m.version {
case "v3": case "v3":
prefix = "nfs3" prefix = "nfs3"
@@ -143,11 +142,15 @@ func (m *nfsCollector) Read(interval time.Duration, output chan lp.CCMessage) {
} }
for name, data := range m.data { for name, data := range m.data {
if _, skip := stringArrayContains(m.config.ExcludeMetrics, name); skip { if slices.Contains(m.config.ExcludeMetrics, name) {
continue continue
} }
value := data.current - data.last
y, err := lp.NewMessage(fmt.Sprintf("%s_%s", prefix, name), m.tags, m.meta, map[string]interface{}{"value": value}, timestamp) valueMap := make(map[string]any)
if data.current >= 0 && data.last >= 0 {
valueMap["value"] = data.current - data.last
}
y, err := lp.NewMessage(fmt.Sprintf("%s_%s", prefix, name), m.tags, m.meta, valueMap, timestamp)
if err == nil { if err == nil {
y.AddMeta("version", m.version) y.AddMeta("version", m.version)
output <- y output <- y
@@ -170,13 +173,17 @@ type Nfs4Collector struct {
func (m *Nfs3Collector) Init(config json.RawMessage) error { func (m *Nfs3Collector) Init(config json.RawMessage) error {
m.name = "Nfs3Collector" m.name = "Nfs3Collector"
m.version = `v3` m.version = `v3`
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
return m.MainInit(config) return m.MainInit(config)
} }
func (m *Nfs4Collector) Init(config json.RawMessage) error { func (m *Nfs4Collector) Init(config json.RawMessage) error {
m.name = "Nfs4Collector" m.name = "Nfs4Collector"
m.version = `v4` m.version = `v4`
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
return m.MainInit(config) return m.MainInit(config)
} }
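
Note on the reworked `updateStats()` above: it rotates `current` into `last`, marks `current` as unknown with `-1`, and then fills in freshly parsed values, so a difference is only reported when both samples are valid. The sketch below isolates that logic under some assumptions: the `counter` type and the `update` and `delta` helpers are hypothetical names, the sample input lines are made up to match the five-field shape the parser expects, and `strings.Split` is used here where the diff uses `strings.Lines`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// counter holds two consecutive samples; -1 marks a sample as unknown.
type counter struct {
	current int64
	last    int64
}

// update rotates current into last for all known counters, then stores
// the values parsed from output for the requested version.
func update(data map[string]counter, output, version string) error {
	for name, c := range data {
		data[name] = counter{last: c.current, current: -1}
	}
	for _, line := range strings.Split(output, "\n") {
		lf := strings.Fields(line)
		if len(lf) != 5 || lf[1] != version {
			continue
		}
		name := strings.Trim(lf[3], ":")
		value, err := strconv.ParseInt(lf[4], 0, 64)
		if err != nil {
			return err
		}
		c, exist := data[name]
		c.current = value
		if !exist {
			// First sighting: there is no previous sample yet.
			c.last = -1
		}
		data[name] = c
	}
	return nil
}

// delta returns the difference only when both samples are valid.
func delta(c counter) (int64, bool) {
	if c.current >= 0 && c.last >= 0 {
		return c.current - c.last, true
	}
	return 0, false
}

func main() {
	data := map[string]counter{}
	_ = update(data, "nfs v3 client read: 10", "v3") // made-up sample line
	_ = update(data, "nfs v3 client read: 25", "v3") // made-up sample line
	d, ok := delta(data["read"])
	fmt.Println(d, ok)
}
```

Compared with the removed `initStats()`/`updateStats()` pair, a single function covers both the first and later reads, and a counter that disappears from the command output no longer produces a stale difference.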

@@ -8,22 +8,23 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
"regexp" "regexp"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
lp "github.com/ClusterCockpit/cc-lib/ccMessage"
) )
// These are the fields we read from the JSON configuration // These are the fields we read from the JSON configuration
type NfsIOStatCollectorConfig struct { type NfsIOStatCollectorConfig struct {
ExcludeMetrics []string `json:"exclude_metrics,omitempty"` ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
ExcludeFilesystem []string `json:"exclude_filesystem,omitempty"` ExcludeFilesystems []string `json:"exclude_filesystem,omitempty"`
UseServerAddressAsSType bool `json:"use_server_as_stype,omitempty"` UseServerAddressAsSType bool `json:"use_server_as_stype,omitempty"`
SendAbsoluteValues bool `json:"send_abs_values"` SendAbsoluteValues bool `json:"send_abs_values"`
SendDerivedValues bool `json:"send_derived_values"` SendDerivedValues bool `json:"send_derived_values"`
@@ -33,6 +34,7 @@ type NfsIOStatCollectorConfig struct {
// defined by metricCollector (name, init, ...) // defined by metricCollector (name, init, ...)
type NfsIOStatCollector struct { type NfsIOStatCollector struct {
metricCollector metricCollector
config NfsIOStatCollectorConfig // the configuration structure config NfsIOStatCollectorConfig // the configuration structure
meta map[string]string // default meta information meta map[string]string // default meta information
tags map[string]string // default tags tags map[string]string // default tags
@@ -41,8 +43,10 @@ type NfsIOStatCollector struct {
lastTimestamp time.Time lastTimestamp time.Time
} }
var deviceRegex = regexp.MustCompile(`device (?P<server>[^ ]+) mounted on (?P<mntpoint>[^ ]+) with fstype nfs(?P<version>\d*) statvers=[\d\.]+`) var (
var bytesRegex = regexp.MustCompile(`\s+bytes:\s+(?P<nread>[^ ]+) (?P<nwrite>[^ ]+) (?P<dread>[^ ]+) (?P<dwrite>[^ ]+) (?P<nfsread>[^ ]+) (?P<nfswrite>[^ ]+) (?P<pageread>[^ ]+) (?P<pagewrite>[^ ]+)`) deviceRegex = regexp.MustCompile(`device (?P<server>[^ ]+) mounted on (?P<mntpoint>[^ ]+) with fstype nfs(?P<version>\d*) statvers=[\d\.]+`)
bytesRegex = regexp.MustCompile(`\s+bytes:\s+(?P<nread>[^ ]+) (?P<nwrite>[^ ]+) (?P<dread>[^ ]+) (?P<dwrite>[^ ]+) (?P<nfsread>[^ ]+) (?P<nfswrite>[^ ]+) (?P<pageread>[^ ]+) (?P<pagewrite>[^ ]+)`)
)
func resolve_regex_fields(s string, regex *regexp.Regexp) map[string]string { func resolve_regex_fields(s string, regex *regexp.Regexp) map[string]string {
fields := make(map[string]string) fields := make(map[string]string)
@@ -71,7 +75,7 @@ func (m *NfsIOStatCollector) readNfsiostats() map[string]map[string]int64 {
// Is this a device line with mount point, remote target and NFS version? // Is this a device line with mount point, remote target and NFS version?
dev := resolve_regex_fields(l, deviceRegex) dev := resolve_regex_fields(l, deviceRegex)
if len(dev) > 0 { if len(dev) > 0 {
if _, ok := stringArrayContains(m.config.ExcludeFilesystem, dev[m.key]); !ok { if !slices.Contains(m.config.ExcludeFilesystems, dev[m.key]) {
current = dev current = dev
if len(current["version"]) == 0 { if len(current["version"]) == 0 {
current["version"] = "3" current["version"] = "3"
@@ -85,7 +89,7 @@ func (m *NfsIOStatCollector) readNfsiostats() map[string]map[string]int64 {
if len(bytes) > 0 { if len(bytes) > 0 {
data[current[m.key]] = make(map[string]int64) data[current[m.key]] = make(map[string]int64)
for name, sval := range bytes { for name, sval := range bytes {
if _, ok := stringArrayContains(m.config.ExcludeMetrics, name); !ok { if !slices.Contains(m.config.ExcludeMetrics, name) {
val, err := strconv.ParseInt(sval, 10, 64) val, err := strconv.ParseInt(sval, 10, 64)
if err == nil { if err == nil {
data[current[m.key]][name] = val data[current[m.key]][name] = val
@@ -100,9 +104,10 @@ func (m *NfsIOStatCollector) readNfsiostats() map[string]map[string]int64 {
} }
func (m *NfsIOStatCollector) Init(config json.RawMessage) error { func (m *NfsIOStatCollector) Init(config json.RawMessage) error {
var err error = nil
m.name = "NfsIOStatCollector" m.name = "NfsIOStatCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{"source": m.name, "group": "NFS", "unit": "bytes"} m.meta = map[string]string{"source": m.name, "group": "NFS", "unit": "bytes"}
m.tags = map[string]string{"type": "node"} m.tags = map[string]string{"type": "node"}
@@ -111,10 +116,10 @@ func (m *NfsIOStatCollector) Init(config json.RawMessage) error {
m.config.SendAbsoluteValues = true m.config.SendAbsoluteValues = true
m.config.SendDerivedValues = false m.config.SendDerivedValues = false
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
m.key = "mntpoint" m.key = "mntpoint"
@@ -124,7 +129,7 @@ func (m *NfsIOStatCollector) Init(config json.RawMessage) error {
m.data = m.readNfsiostats() m.data = m.readNfsiostats()
m.lastTimestamp = time.Now() m.lastTimestamp = time.Now()
m.init = true m.init = true
return err return nil
} }
func (m *NfsIOStatCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *NfsIOStatCollector) Read(interval time.Duration, output chan lp.CCMessage) {
@@ -140,7 +145,14 @@ func (m *NfsIOStatCollector) Read(interval time.Duration, output chan lp.CCMessa
if old, ok := m.data[mntpoint]; ok { if old, ok := m.data[mntpoint]; ok {
for name, newVal := range values { for name, newVal := range values {
if m.config.SendAbsoluteValues { if m.config.SendAbsoluteValues {
msg, err := lp.NewMessage(fmt.Sprintf("nfsio_%s", name), m.tags, m.meta, map[string]interface{}{"value": newVal}, now) msg, err := lp.NewMessage(
"nfsio_"+name,
m.tags,
m.meta,
map[string]any{
"value": newVal,
},
now)
if err == nil { if err == nil {
msg.AddTag("stype", "filesystem") msg.AddTag("stype", "filesystem")
msg.AddTag("stype-id", mntpoint) msg.AddTag("stype-id", mntpoint)
@@ -149,7 +161,7 @@ func (m *NfsIOStatCollector) Read(interval time.Duration, output chan lp.CCMessa
} }
if m.config.SendDerivedValues { if m.config.SendDerivedValues {
rate := float64(newVal-old[name]) / timeDiff rate := float64(newVal-old[name]) / timeDiff
msg, err := lp.NewMessage(fmt.Sprintf("nfsio_%s_bw", name), m.tags, m.meta, map[string]interface{}{"value": rate}, now) msg, err := lp.NewMessage(fmt.Sprintf("nfsio_%s_bw", name), m.tags, m.meta, map[string]any{"value": rate}, now)
if err == nil { if err == nil {
if strings.HasPrefix(name, "page") { if strings.HasPrefix(name, "page") {
msg.AddMeta("unit", "4K_pages/s") msg.AddMeta("unit", "4K_pages/s")

@@ -16,7 +16,7 @@ hugo_path: docs/reference/cc-metric-collector/collectors/nfsio.md
"exclude_metrics": [ "exclude_metrics": [
"oread", "pageread" "oread", "pageread"
], ],
"exclude_filesystems": [ "exclude_filesystem": [
"/mnt" "/mnt"
], ],
"use_server_as_stype": false, "use_server_as_stype": false,

@@ -2,6 +2,7 @@ package collectors
import ( import (
"bufio" "bufio"
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
@@ -10,8 +11,8 @@ import (
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
type NUMAStatsCollectorConfig struct { type NUMAStatsCollectorConfig struct {
@@ -59,6 +60,7 @@ type NUMAStatsCollectorTopolgy struct {
type NUMAStatsCollector struct { type NUMAStatsCollector struct {
metricCollector metricCollector
topology []NUMAStatsCollectorTopolgy topology []NUMAStatsCollectorTopolgy
config NUMAStatsCollectorConfig config NUMAStatsCollectorConfig
lastTimestamp time.Time lastTimestamp time.Time
@@ -72,7 +74,9 @@ func (m *NUMAStatsCollector) Init(config json.RawMessage) error {
m.name = "NUMAStatsCollector" m.name = "NUMAStatsCollector"
m.parallel = true m.parallel = true
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.meta = map[string]string{ m.meta = map[string]string{
"source": m.name, "source": m.name,
"group": "NUMA", "group": "NUMA",
@@ -80,9 +84,10 @@ func (m *NUMAStatsCollector) Init(config json.RawMessage) error {
m.config.SendAbsoluteValues = true m.config.SendAbsoluteValues = true
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return fmt.Errorf("unable to unmarshal numastat configuration: %s", err.Error()) if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
@@ -91,10 +96,10 @@ func (m *NUMAStatsCollector) Init(config json.RawMessage) error {
globPattern := base + "[0-9]*" globPattern := base + "[0-9]*"
dirs, err := filepath.Glob(globPattern) dirs, err := filepath.Glob(globPattern)
if err != nil { if err != nil {
return fmt.Errorf("unable to glob files with pattern '%s'", globPattern) return fmt.Errorf("%s Init(): unable to glob files with pattern '%s'", m.name, globPattern)
} }
if dirs == nil { if dirs == nil {
return fmt.Errorf("unable to find any files with pattern '%s'", globPattern) return fmt.Errorf("%s Init(): unable to find any files with pattern '%s'", m.name, globPattern)
} }
m.topology = make([]NUMAStatsCollectorTopolgy, 0, len(dirs)) m.topology = make([]NUMAStatsCollectorTopolgy, 0, len(dirs))
for _, dir := range dirs { for _, dir := range dirs {
@@ -186,7 +191,11 @@ func (m *NUMAStatsCollector) Read(interval time.Duration, output chan lp.CCMessa
t.previousValues[key] = value t.previousValues[key] = value
} }
} }
file.Close() if err := file.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to close file '%s': %v", t.file, err))
}
} }
} }

@@ -15,7 +15,7 @@ hugo_path: docs/reference/cc-metric-collector/collectors/numastat.md
"numastats": { "numastats": {
"send_abs_values" : true, "send_abs_values" : true,
"send_derived_values" : true "send_derived_values" : true
} }
``` ```
The `numastat` collector reads data from `/sys/devices/system/node/node*/numastat` and outputs a handful of **memoryDomain** metrics. See: <https://www.kernel.org/doc/html/latest/admin-guide/numastat.html> The `numastat` collector reads data from `/sys/devices/system/node/node*/numastat` and outputs a handful of **memoryDomain** metrics. See: <https://www.kernel.org/doc/html/latest/admin-guide/numastat.html>

@@ -12,11 +12,14 @@ import (
"errors" "errors"
"fmt" "fmt"
"log" "log"
"maps"
"slices"
"strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
"github.com/NVIDIA/go-nvml/pkg/nvml" "github.com/NVIDIA/go-nvml/pkg/nvml"
) )
@@ -44,6 +47,7 @@ type NvidiaCollectorDevice struct {
type NvidiaCollector struct { type NvidiaCollector struct {
metricCollector metricCollector
config NvidiaCollectorConfig config NvidiaCollectorConfig
gpus []NvidiaCollectorDevice gpus []NvidiaCollectorDevice
num_gpus int num_gpus int
@@ -64,11 +68,14 @@ func (m *NvidiaCollector) Init(config json.RawMessage) error {
m.config.ProcessMigDevices = false m.config.ProcessMigDevices = false
m.config.UseUuidForMigDevices = false m.config.UseUuidForMigDevices = false
m.config.UseSliceForMigDevices = false m.config.UseSliceForMigDevices = false
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(strings.NewReader(string(config)))
if err != nil { d.DisallowUnknownFields()
return err if err = d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.meta = map[string]string{ m.meta = map[string]string{
@@ -84,32 +91,28 @@ func (m *NvidiaCollector) Init(config json.RawMessage) error {
// Error: NVML library not found // Error: NVML library not found
// (nvml.ErrorString can not be used in this case) // (nvml.ErrorString can not be used in this case)
if ret == nvml.ERROR_LIBRARY_NOT_FOUND { if ret == nvml.ERROR_LIBRARY_NOT_FOUND {
err = fmt.Errorf("NVML library not found") return fmt.Errorf("%s Init(): NVML library not found", m.name)
cclog.ComponentError(m.name, err.Error())
return err
} }
if ret != nvml.SUCCESS { if ret != nvml.SUCCESS {
err = errors.New(nvml.ErrorString(ret)) err = errors.New(nvml.ErrorString(ret))
cclog.ComponentError(m.name, "Unable to initialize NVML", err.Error()) return fmt.Errorf("%s Init(): Unable to initialize NVML: %w", m.name, err)
return err
} }
// Number of NVIDIA GPUs // Number of NVIDIA GPUs
num_gpus, ret := nvml.DeviceGetCount() num_gpus, ret := nvml.DeviceGetCount()
if ret != nvml.SUCCESS { if ret != nvml.SUCCESS {
err = errors.New(nvml.ErrorString(ret)) err = errors.New(nvml.ErrorString(ret))
cclog.ComponentError(m.name, "Unable to get device count", err.Error()) return fmt.Errorf("%s Init(): Unable to get device count: %w", m.name, err)
return err
} }
// For all GPUs // For all GPUs
idx := 0 idx := 0
m.gpus = make([]NvidiaCollectorDevice, num_gpus) m.gpus = make([]NvidiaCollectorDevice, num_gpus)
for i := 0; i < num_gpus; i++ { for i := range num_gpus {
// Skip excluded devices by ID // Skip excluded devices by ID
str_i := fmt.Sprintf("%d", i) str_i := strconv.Itoa(i)
if _, skip := stringArrayContains(m.config.ExcludeDevices, str_i); skip { if slices.Contains(m.config.ExcludeDevices, str_i) {
cclog.ComponentDebug(m.name, "Skipping excluded device", str_i) cclog.ComponentDebug(m.name, "Skipping excluded device", str_i)
continue continue
} }
@@ -137,7 +140,7 @@ func (m *NvidiaCollector) Init(config json.RawMessage) error {
pciInfo.Device) pciInfo.Device)
// Skip excluded devices specified by PCI ID // Skip excluded devices specified by PCI ID
if _, skip := stringArrayContains(m.config.ExcludeDevices, pci_id); skip { if slices.Contains(m.config.ExcludeDevices, pci_id) {
cclog.ComponentDebug(m.name, "Skipping excluded device", pci_id) cclog.ComponentDebug(m.name, "Skipping excluded device", pci_id)
continue continue
} }
@@ -222,18 +225,20 @@ func readMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
var total uint64 var total uint64
var used uint64 var used uint64
var reserved uint64 = 0 var reserved uint64 = 0
var v2 bool = false v2 := false
meminfo, ret := nvml.DeviceGetMemoryInfo(device.device) meminfo, ret := nvml.DeviceGetMemoryInfo(device.device)
if ret != nvml.SUCCESS { if ret != nvml.SUCCESS {
err := errors.New(nvml.ErrorString(ret)) err := errors.New(nvml.ErrorString(ret))
return err return err
} }
// Total physical device memory (in bytes)
total = meminfo.Total total = meminfo.Total
// Sum of Reserved and Allocated device memory (in bytes)
used = meminfo.Used used = meminfo.Used
if !device.excludeMetrics["nv_fb_mem_total"] { if !device.excludeMetrics["nv_fb_mem_total"] {
t := float64(total) / (1024 * 1024) t := float64(total) / (1024 * 1024)
y, err := lp.NewMessage("nv_fb_mem_total", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_fb_mem_total", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MByte") y.AddMeta("unit", "MByte")
output <- y output <- y
@@ -242,7 +247,7 @@ func readMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
if !device.excludeMetrics["nv_fb_mem_used"] { if !device.excludeMetrics["nv_fb_mem_used"] {
f := float64(used) / (1024 * 1024) f := float64(used) / (1024 * 1024)
y, err := lp.NewMessage("nv_fb_mem_used", device.tags, device.meta, map[string]interface{}{"value": f}, time.Now()) y, err := lp.NewMetric("nv_fb_mem_used", device.tags, device.meta, f, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MByte") y.AddMeta("unit", "MByte")
output <- y output <- y
@@ -251,7 +256,7 @@ func readMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
if v2 && !device.excludeMetrics["nv_fb_mem_reserved"] { if v2 && !device.excludeMetrics["nv_fb_mem_reserved"] {
r := float64(reserved) / (1024 * 1024) r := float64(reserved) / (1024 * 1024)
y, err := lp.NewMessage("nv_fb_mem_reserved", device.tags, device.meta, map[string]interface{}{"value": r}, time.Now()) y, err := lp.NewMetric("nv_fb_mem_reserved", device.tags, device.meta, r, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MByte") y.AddMeta("unit", "MByte")
output <- y output <- y
@@ -270,7 +275,7 @@ func readBarMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage)
} }
if !device.excludeMetrics["nv_bar1_mem_total"] { if !device.excludeMetrics["nv_bar1_mem_total"] {
t := float64(meminfo.Bar1Total) / (1024 * 1024) t := float64(meminfo.Bar1Total) / (1024 * 1024)
y, err := lp.NewMessage("nv_bar1_mem_total", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_bar1_mem_total", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MByte") y.AddMeta("unit", "MByte")
output <- y output <- y
@@ -278,7 +283,7 @@ func readBarMemoryInfo(device *NvidiaCollectorDevice, output chan lp.CCMessage)
} }
if !device.excludeMetrics["nv_bar1_mem_used"] { if !device.excludeMetrics["nv_bar1_mem_used"] {
t := float64(meminfo.Bar1Used) / (1024 * 1024) t := float64(meminfo.Bar1Used) / (1024 * 1024)
y, err := lp.NewMessage("nv_bar1_mem_used", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_bar1_mem_used", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MByte") y.AddMeta("unit", "MByte")
output <- y output <- y
@@ -312,14 +317,14 @@ func readUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
util, ret := nvml.DeviceGetUtilizationRates(device.device) util, ret := nvml.DeviceGetUtilizationRates(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
if !device.excludeMetrics["nv_util"] { if !device.excludeMetrics["nv_util"] {
y, err := lp.NewMessage("nv_util", device.tags, device.meta, map[string]interface{}{"value": float64(util.Gpu)}, time.Now()) y, err := lp.NewMetric("nv_util", device.tags, device.meta, float64(util.Gpu), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
} }
} }
if !device.excludeMetrics["nv_mem_util"] { if !device.excludeMetrics["nv_mem_util"] {
y, err := lp.NewMessage("nv_mem_util", device.tags, device.meta, map[string]interface{}{"value": float64(util.Memory)}, time.Now()) y, err := lp.NewMetric("nv_mem_util", device.tags, device.meta, float64(util.Memory), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
@@ -339,7 +344,7 @@ func readTemp(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
// * NVML_TEMPERATURE_COUNT // * NVML_TEMPERATURE_COUNT
temp, ret := nvml.DeviceGetTemperature(device.device, nvml.TEMPERATURE_GPU) temp, ret := nvml.DeviceGetTemperature(device.device, nvml.TEMPERATURE_GPU)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_temp", device.tags, device.meta, map[string]interface{}{"value": float64(temp)}, time.Now()) y, err := lp.NewMetric("nv_temp", device.tags, device.meta, float64(temp), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "degC") y.AddMeta("unit", "degC")
output <- y output <- y
@@ -362,7 +367,7 @@ func readFan(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
// This value may exceed 100% in certain cases. // This value may exceed 100% in certain cases.
fan, ret := nvml.DeviceGetFanSpeed(device.device) fan, ret := nvml.DeviceGetFanSpeed(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_fan", device.tags, device.meta, map[string]interface{}{"value": float64(fan)}, time.Now()) y, err := lp.NewMetric("nv_fan", device.tags, device.meta, float64(fan), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
@@ -372,27 +377,6 @@ func readFan(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
return nil return nil
} }
// func readFans(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
// if !device.excludeMetrics["nv_fan"] {
// numFans, ret := nvml.DeviceGetNumFans(device.device)
// if ret == nvml.SUCCESS {
// for i := 0; i < numFans; i++ {
// fan, ret := nvml.DeviceGetFanSpeed_v2(device.device, i)
// if ret == nvml.SUCCESS {
// y, err := lp.NewMessage("nv_fan", device.tags, device.meta, map[string]interface{}{"value": float64(fan)}, time.Now())
// if err == nil {
// y.AddMeta("unit", "%")
// y.AddTag("stype", "fan")
// y.AddTag("stype-id", fmt.Sprintf("%d", i))
// output <- y
// }
// }
// }
// }
// }
// return nil
// }
func readEccMode(device *NvidiaCollectorDevice, output chan lp.CCMessage) error { func readEccMode(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
if !device.excludeMetrics["nv_ecc_mode"] { if !device.excludeMetrics["nv_ecc_mode"] {
// Retrieves the current and pending ECC modes for the device. // Retrieves the current and pending ECC modes for the device.
@@ -403,22 +387,23 @@ func readEccMode(device *NvidiaCollectorDevice, output chan lp.CCMessage) error
// Changing ECC modes requires a reboot. // Changing ECC modes requires a reboot.
// The "pending" ECC mode refers to the target mode following the next reboot. // The "pending" ECC mode refers to the target mode following the next reboot.
_, ecc_pend, ret := nvml.DeviceGetEccMode(device.device) _, ecc_pend, ret := nvml.DeviceGetEccMode(device.device)
if ret == nvml.SUCCESS { switch ret {
case nvml.SUCCESS:
var y lp.CCMessage var y lp.CCMessage
var err error var err error
switch ecc_pend { switch ecc_pend {
case nvml.FEATURE_DISABLED: case nvml.FEATURE_DISABLED:
y, err = lp.NewMessage("nv_ecc_mode", device.tags, device.meta, map[string]interface{}{"value": "OFF"}, time.Now()) y, err = lp.NewMetric("nv_ecc_mode", device.tags, device.meta, "OFF", time.Now())
case nvml.FEATURE_ENABLED: case nvml.FEATURE_ENABLED:
y, err = lp.NewMessage("nv_ecc_mode", device.tags, device.meta, map[string]interface{}{"value": "ON"}, time.Now()) y, err = lp.NewMetric("nv_ecc_mode", device.tags, device.meta, "ON", time.Now())
default: default:
y, err = lp.NewMessage("nv_ecc_mode", device.tags, device.meta, map[string]interface{}{"value": "UNKNOWN"}, time.Now()) y, err = lp.NewMetric("nv_ecc_mode", device.tags, device.meta, "UNKNOWN", time.Now())
} }
if err == nil { if err == nil {
output <- y output <- y
} }
} else if ret == nvml.ERROR_NOT_SUPPORTED { case nvml.ERROR_NOT_SUPPORTED:
y, err := lp.NewMessage("nv_ecc_mode", device.tags, device.meta, map[string]interface{}{"value": "N/A"}, time.Now()) y, err := lp.NewMetric("nv_ecc_mode", device.tags, device.meta, "N/A", time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -438,7 +423,7 @@ func readPerfState(device *NvidiaCollectorDevice, output chan lp.CCMessage) erro
// 32: Unknown performance state. // 32: Unknown performance state.
pState, ret := nvml.DeviceGetPerformanceState(device.device) pState, ret := nvml.DeviceGetPerformanceState(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_perf_state", device.tags, device.meta, map[string]interface{}{"value": fmt.Sprintf("P%d", int(pState))}, time.Now()) y, err := lp.NewMetric("nv_perf_state", device.tags, device.meta, fmt.Sprintf("P%d", int(pState)), time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -464,7 +449,7 @@ func readPowerUsage(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
if mode == nvml.FEATURE_ENABLED { if mode == nvml.FEATURE_ENABLED {
power, ret := nvml.DeviceGetPowerUsage(device.device) power, ret := nvml.DeviceGetPowerUsage(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_power_usage", device.tags, device.meta, map[string]interface{}{"value": float64(power) / 1000}, time.Now()) y, err := lp.NewMetric("nv_power_usage", device.tags, device.meta, float64(power)/1000, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "watts") y.AddMeta("unit", "watts")
output <- y output <- y
@@ -490,7 +475,12 @@ func readEnergyConsumption(device *NvidiaCollectorDevice, output chan lp.CCMessa
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
if device.lastEnergyReading != 0 { if device.lastEnergyReading != 0 {
if !device.excludeMetrics["nv_energy"] { if !device.excludeMetrics["nv_energy"] {
y, err := lp.NewMetric("nv_energy", device.tags, device.meta, (energy-device.lastEnergyReading)/1000, now) y, err := lp.NewMetric(
"nv_energy",
device.tags,
device.meta,
(energy-device.lastEnergyReading)/1000,
now)
if err == nil { if err == nil {
y.AddMeta("unit", "Joules") y.AddMeta("unit", "Joules")
output <- y output <- y
@@ -532,7 +522,7 @@ func readClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
if !device.excludeMetrics["nv_graphics_clock"] { if !device.excludeMetrics["nv_graphics_clock"] {
graphicsClock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_GRAPHICS) graphicsClock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_GRAPHICS)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_graphics_clock", device.tags, device.meta, map[string]interface{}{"value": float64(graphicsClock)}, time.Now()) y, err := lp.NewMetric("nv_graphics_clock", device.tags, device.meta, float64(graphicsClock), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MHz") y.AddMeta("unit", "MHz")
output <- y output <- y
@@ -543,7 +533,7 @@ func readClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
if !device.excludeMetrics["nv_sm_clock"] { if !device.excludeMetrics["nv_sm_clock"] {
smCock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_SM) smCock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_SM)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_sm_clock", device.tags, device.meta, map[string]interface{}{"value": float64(smCock)}, time.Now()) y, err := lp.NewMetric("nv_sm_clock", device.tags, device.meta, float64(smCock), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MHz") y.AddMeta("unit", "MHz")
output <- y output <- y
@@ -554,7 +544,7 @@ func readClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
if !device.excludeMetrics["nv_mem_clock"] { if !device.excludeMetrics["nv_mem_clock"] {
memClock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_MEM) memClock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_MEM)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_mem_clock", device.tags, device.meta, map[string]interface{}{"value": float64(memClock)}, time.Now()) y, err := lp.NewMetric("nv_mem_clock", device.tags, device.meta, float64(memClock), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MHz") y.AddMeta("unit", "MHz")
output <- y output <- y
@@ -564,7 +554,7 @@ func readClocks(device *NvidiaCollectorDevice, output chan lp.CCMessage) error {
if !device.excludeMetrics["nv_video_clock"] { if !device.excludeMetrics["nv_video_clock"] {
memClock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_VIDEO) memClock, ret := nvml.DeviceGetClockInfo(device.device, nvml.CLOCK_VIDEO)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_video_clock", device.tags, device.meta, map[string]interface{}{"value": float64(memClock)}, time.Now()) y, err := lp.NewMetric("nv_video_clock", device.tags, device.meta, float64(memClock), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "MHz") y.AddMeta("unit", "MHz")
output <- y output <- y
@@ -645,7 +635,7 @@ func readEccErrors(device *NvidiaCollectorDevice, output chan lp.CCMessage) erro
// i.e. the total set of errors across the entire device. // i.e. the total set of errors across the entire device.
ecc_db, ret := nvml.DeviceGetTotalEccErrors(device.device, nvml.MEMORY_ERROR_TYPE_UNCORRECTED, nvml.AGGREGATE_ECC) ecc_db, ret := nvml.DeviceGetTotalEccErrors(device.device, nvml.MEMORY_ERROR_TYPE_UNCORRECTED, nvml.AGGREGATE_ECC)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_ecc_uncorrected_error", device.tags, device.meta, map[string]interface{}{"value": float64(ecc_db)}, time.Now()) y, err := lp.NewMetric("nv_ecc_uncorrected_error", device.tags, device.meta, float64(ecc_db), time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -654,7 +644,7 @@ func readEccErrors(device *NvidiaCollectorDevice, output chan lp.CCMessage) erro
if !device.excludeMetrics["nv_ecc_corrected_error"] { if !device.excludeMetrics["nv_ecc_corrected_error"] {
ecc_sb, ret := nvml.DeviceGetTotalEccErrors(device.device, nvml.MEMORY_ERROR_TYPE_CORRECTED, nvml.AGGREGATE_ECC) ecc_sb, ret := nvml.DeviceGetTotalEccErrors(device.device, nvml.MEMORY_ERROR_TYPE_CORRECTED, nvml.AGGREGATE_ECC)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_ecc_corrected_error", device.tags, device.meta, map[string]interface{}{"value": float64(ecc_sb)}, time.Now()) y, err := lp.NewMetric("nv_ecc_corrected_error", device.tags, device.meta, float64(ecc_sb), time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -673,7 +663,7 @@ func readPowerLimit(device *NvidiaCollectorDevice, output chan lp.CCMessage) err
// If the card's total power draw reaches this limit the power management algorithm kicks in. // If the card's total power draw reaches this limit the power management algorithm kicks in.
pwr_limit, ret := nvml.DeviceGetPowerManagementLimit(device.device) pwr_limit, ret := nvml.DeviceGetPowerManagementLimit(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_power_max_limit", device.tags, device.meta, map[string]interface{}{"value": float64(pwr_limit) / 1000}, time.Now()) y, err := lp.NewMetric("nv_power_max_limit", device.tags, device.meta, float64(pwr_limit)/1000, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "watts") y.AddMeta("unit", "watts")
output <- y output <- y
@@ -700,7 +690,7 @@ func readEncUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage)
// Note: On MIG-enabled GPUs, querying encoder utilization is not currently supported. // Note: On MIG-enabled GPUs, querying encoder utilization is not currently supported.
enc_util, _, ret := nvml.DeviceGetEncoderUtilization(device.device) enc_util, _, ret := nvml.DeviceGetEncoderUtilization(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_encoder_util", device.tags, device.meta, map[string]interface{}{"value": float64(enc_util)}, time.Now()) y, err := lp.NewMetric("nv_encoder_util", device.tags, device.meta, float64(enc_util), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
@@ -727,7 +717,7 @@ func readDecUtilization(device *NvidiaCollectorDevice, output chan lp.CCMessage)
// Note: On MIG-enabled GPUs, querying decoder utilization is not currently supported. // Note: On MIG-enabled GPUs, querying decoder utilization is not currently supported.
dec_util, _, ret := nvml.DeviceGetDecoderUtilization(device.device) dec_util, _, ret := nvml.DeviceGetDecoderUtilization(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_decoder_util", device.tags, device.meta, map[string]interface{}{"value": float64(dec_util)}, time.Now()) y, err := lp.NewMetric("nv_decoder_util", device.tags, device.meta, float64(dec_util), time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
@@ -754,33 +744,33 @@ func readRemappedRows(device *NvidiaCollectorDevice, output chan lp.CCMessage) e
corrected, uncorrected, pending, failure, ret := nvml.DeviceGetRemappedRows(device.device) corrected, uncorrected, pending, failure, ret := nvml.DeviceGetRemappedRows(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
if !device.excludeMetrics["nv_remapped_rows_corrected"] { if !device.excludeMetrics["nv_remapped_rows_corrected"] {
y, err := lp.NewMessage("nv_remapped_rows_corrected", device.tags, device.meta, map[string]interface{}{"value": float64(corrected)}, time.Now()) y, err := lp.NewMetric("nv_remapped_rows_corrected", device.tags, device.meta, float64(corrected), time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !device.excludeMetrics["nv_remapped_rows_uncorrected"] { if !device.excludeMetrics["nv_remapped_rows_uncorrected"] {
y, err := lp.NewMessage("nv_remapped_rows_uncorrected", device.tags, device.meta, map[string]interface{}{"value": float64(uncorrected)}, time.Now()) y, err := lp.NewMetric("nv_remapped_rows_uncorrected", device.tags, device.meta, float64(uncorrected), time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !device.excludeMetrics["nv_remapped_rows_pending"] { if !device.excludeMetrics["nv_remapped_rows_pending"] {
var p int = 0 p := 0
if pending { if pending {
p = 1 p = 1
} }
y, err := lp.NewMessage("nv_remapped_rows_pending", device.tags, device.meta, map[string]interface{}{"value": p}, time.Now()) y, err := lp.NewMetric("nv_remapped_rows_pending", device.tags, device.meta, p, time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !device.excludeMetrics["nv_remapped_rows_failure"] { if !device.excludeMetrics["nv_remapped_rows_failure"] {
var f int = 0 f := 0
if failure { if failure {
f = 1 f = 1
} }
y, err := lp.NewMessage("nv_remapped_rows_failure", device.tags, device.meta, map[string]interface{}{"value": f}, time.Now()) y, err := lp.NewMetric("nv_remapped_rows_failure", device.tags, device.meta, f, time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -814,7 +804,7 @@ func readProcessCounts(device *NvidiaCollectorDevice, output chan lp.CCMessage)
// Querying per-instance information using MIG device handles is not supported if the device is in vGPU Host virtualization mode. // Querying per-instance information using MIG device handles is not supported if the device is in vGPU Host virtualization mode.
procList, ret := nvml.DeviceGetComputeRunningProcesses(device.device) procList, ret := nvml.DeviceGetComputeRunningProcesses(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_compute_processes", device.tags, device.meta, map[string]interface{}{"value": len(procList)}, time.Now()) y, err := lp.NewMetric("nv_compute_processes", device.tags, device.meta, len(procList), time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -843,7 +833,7 @@ func readProcessCounts(device *NvidiaCollectorDevice, output chan lp.CCMessage)
// Querying per-instance information using MIG device handles is not supported if the device is in vGPU Host virtualization mode. // Querying per-instance information using MIG device handles is not supported if the device is in vGPU Host virtualization mode.
procList, ret := nvml.DeviceGetGraphicsRunningProcesses(device.device) procList, ret := nvml.DeviceGetGraphicsRunningProcesses(device.device)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_graphics_processes", device.tags, device.meta, map[string]interface{}{"value": len(procList)}, time.Now()) y, err := lp.NewMetric("nv_graphics_processes", device.tags, device.meta, len(procList), time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -873,7 +863,7 @@ func readProcessCounts(device *NvidiaCollectorDevice, output chan lp.CCMessage)
// // Querying per-instance information using MIG device handles is not supported if the device is in vGPU Host virtualization mode. // // Querying per-instance information using MIG device handles is not supported if the device is in vGPU Host virtualization mode.
// procList, ret := nvml.DeviceGetMPSComputeRunningProcesses(device.device) // procList, ret := nvml.DeviceGetMPSComputeRunningProcesses(device.device)
// if ret == nvml.SUCCESS { // if ret == nvml.SUCCESS {
// y, err := lp.NewMessage("nv_mps_compute_processes", device.tags, device.meta, map[string]interface{}{"value": len(procList)}, time.Now()) // y, err := lp.NewMetric("nv_mps_compute_processes", device.tags, device.meta, len(procList), time.Now())
// if err == nil { // if err == nil {
// output <- y // output <- y
// } // }
@@ -901,7 +891,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_POWER) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_POWER)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_power", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_power", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -913,7 +903,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_THERMAL) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_THERMAL)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_thermal", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_thermal", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -925,7 +915,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_SYNC_BOOST) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_SYNC_BOOST)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_sync_boost", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_sync_boost", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -937,7 +927,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_BOARD_LIMIT) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_BOARD_LIMIT)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_board_limit", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_board_limit", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -949,7 +939,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_LOW_UTILIZATION) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_LOW_UTILIZATION)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_low_util", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_low_util", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -961,7 +951,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_RELIABILITY) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_RELIABILITY)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_reliability", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_reliability", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -973,7 +963,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_TOTAL_APP_CLOCKS) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_TOTAL_APP_CLOCKS)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_below_app_clock", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_below_app_clock", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -985,7 +975,7 @@ func readViolationStats(device *NvidiaCollectorDevice, output chan lp.CCMessage)
violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_TOTAL_BASE_CLOCKS) violTime, ret = nvml.DeviceGetViolationStatus(device.device, nvml.PERF_POLICY_TOTAL_BASE_CLOCKS)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
t := float64(violTime.ViolationTime) * 1e-9 t := float64(violTime.ViolationTime) * 1e-9
y, err := lp.NewMessage("nv_violation_below_base_clock", device.tags, device.meta, map[string]interface{}{"value": t}, time.Now()) y, err := lp.NewMetric("nv_violation_below_base_clock", device.tags, device.meta, t, time.Now())
if err == nil { if err == nil {
y.AddMeta("unit", "sec") y.AddMeta("unit", "sec")
output <- y output <- y
@@ -1008,19 +998,19 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
var aggregate_recovery_errors uint64 = 0 var aggregate_recovery_errors uint64 = 0
var aggregate_crc_flit_errors uint64 = 0 var aggregate_crc_flit_errors uint64 = 0
for i := 0; i < nvml.NVLINK_MAX_LINKS; i++ { for i := range nvml.NVLINK_MAX_LINKS {
state, ret := nvml.DeviceGetNvLinkState(device.device, i) state, ret := nvml.DeviceGetNvLinkState(device.device, i)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
if state == nvml.FEATURE_ENABLED { if state == nvml.FEATURE_ENABLED {
if !device.excludeMetrics["nv_nvlink_crc_errors"] { if !device.excludeMetrics["nv_nvlink_crc_errors"] {
// Data link receive data CRC error counter // Data link receive data CRC error counter
count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_CRC_DATA) count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_CRC_DATA)
aggregate_crc_errors = aggregate_crc_errors + count aggregate_crc_errors += count
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_nvlink_crc_errors", device.tags, device.meta, map[string]interface{}{"value": count}, time.Now()) y, err := lp.NewMetric("nv_nvlink_crc_errors", device.tags, device.meta, count, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
y.AddTag("stype-id", fmt.Sprintf("%d", i)) y.AddTag("stype-id", strconv.Itoa(i))
output <- y output <- y
} }
} }
@@ -1028,12 +1018,12 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
if !device.excludeMetrics["nv_nvlink_ecc_errors"] { if !device.excludeMetrics["nv_nvlink_ecc_errors"] {
// Data link receive data ECC error counter // Data link receive data ECC error counter
count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_ECC_DATA) count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_ECC_DATA)
aggregate_ecc_errors = aggregate_ecc_errors + count aggregate_ecc_errors += count
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_nvlink_ecc_errors", device.tags, device.meta, map[string]interface{}{"value": count}, time.Now()) y, err := lp.NewMetric("nv_nvlink_ecc_errors", device.tags, device.meta, count, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
y.AddTag("stype-id", fmt.Sprintf("%d", i)) y.AddTag("stype-id", strconv.Itoa(i))
output <- y output <- y
} }
} }
@@ -1041,12 +1031,12 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
if !device.excludeMetrics["nv_nvlink_replay_errors"] { if !device.excludeMetrics["nv_nvlink_replay_errors"] {
// Data link transmit replay error counter // Data link transmit replay error counter
count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_REPLAY) count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_REPLAY)
aggregate_replay_errors = aggregate_replay_errors + count aggregate_replay_errors += count
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_nvlink_replay_errors", device.tags, device.meta, map[string]interface{}{"value": count}, time.Now()) y, err := lp.NewMetric("nv_nvlink_replay_errors", device.tags, device.meta, count, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
y.AddTag("stype-id", fmt.Sprintf("%d", i)) y.AddTag("stype-id", strconv.Itoa(i))
output <- y output <- y
} }
} }
@@ -1054,12 +1044,12 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
if !device.excludeMetrics["nv_nvlink_recovery_errors"] { if !device.excludeMetrics["nv_nvlink_recovery_errors"] {
// Data link transmit recovery error counter // Data link transmit recovery error counter
count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_RECOVERY) count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_RECOVERY)
aggregate_recovery_errors = aggregate_recovery_errors + count aggregate_recovery_errors += count
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_nvlink_recovery_errors", device.tags, device.meta, map[string]interface{}{"value": count}, time.Now()) y, err := lp.NewMetric("nv_nvlink_recovery_errors", device.tags, device.meta, count, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
y.AddTag("stype-id", fmt.Sprintf("%d", i)) y.AddTag("stype-id", strconv.Itoa(i))
output <- y output <- y
} }
} }
@@ -1067,12 +1057,12 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
if !device.excludeMetrics["nv_nvlink_crc_flit_errors"] { if !device.excludeMetrics["nv_nvlink_crc_flit_errors"] {
// Data link receive flow control digit CRC error counter // Data link receive flow control digit CRC error counter
count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_CRC_FLIT) count, ret := nvml.DeviceGetNvLinkErrorCounter(device.device, i, nvml.NVLINK_ERROR_DL_CRC_FLIT)
aggregate_crc_flit_errors = aggregate_crc_flit_errors + count aggregate_crc_flit_errors += count
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
y, err := lp.NewMessage("nv_nvlink_crc_flit_errors", device.tags, device.meta, map[string]interface{}{"value": count}, time.Now()) y, err := lp.NewMetric("nv_nvlink_crc_flit_errors", device.tags, device.meta, count, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
y.AddTag("stype-id", fmt.Sprintf("%d", i)) y.AddTag("stype-id", strconv.Itoa(i))
output <- y output <- y
} }
} }
@@ -1084,7 +1074,7 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
// Export aggregated values // Export aggregated values
if !device.excludeMetrics["nv_nvlink_crc_errors"] { if !device.excludeMetrics["nv_nvlink_crc_errors"] {
// Data link receive data CRC error counter // Data link receive data CRC error counter
y, err := lp.NewMessage("nv_nvlink_crc_errors_sum", device.tags, device.meta, map[string]interface{}{"value": aggregate_crc_errors}, time.Now()) y, err := lp.NewMetric("nv_nvlink_crc_errors_sum", device.tags, device.meta, aggregate_crc_errors, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
output <- y output <- y
@@ -1092,7 +1082,7 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
} }
if !device.excludeMetrics["nv_nvlink_ecc_errors"] { if !device.excludeMetrics["nv_nvlink_ecc_errors"] {
// Data link receive data ECC error counter // Data link receive data ECC error counter
y, err := lp.NewMessage("nv_nvlink_ecc_errors_sum", device.tags, device.meta, map[string]interface{}{"value": aggregate_ecc_errors}, time.Now()) y, err := lp.NewMetric("nv_nvlink_ecc_errors_sum", device.tags, device.meta, aggregate_ecc_errors, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
output <- y output <- y
@@ -1100,7 +1090,7 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
} }
if !device.excludeMetrics["nv_nvlink_replay_errors"] { if !device.excludeMetrics["nv_nvlink_replay_errors"] {
// Data link transmit replay error counter // Data link transmit replay error counter
y, err := lp.NewMessage("nv_nvlink_replay_errors_sum", device.tags, device.meta, map[string]interface{}{"value": aggregate_replay_errors}, time.Now()) y, err := lp.NewMetric("nv_nvlink_replay_errors_sum", device.tags, device.meta, aggregate_replay_errors, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
output <- y output <- y
@@ -1108,7 +1098,7 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
} }
if !device.excludeMetrics["nv_nvlink_recovery_errors"] { if !device.excludeMetrics["nv_nvlink_recovery_errors"] {
// Data link transmit recovery error counter // Data link transmit recovery error counter
y, err := lp.NewMessage("nv_nvlink_recovery_errors_sum", device.tags, device.meta, map[string]interface{}{"value": aggregate_recovery_errors}, time.Now()) y, err := lp.NewMetric("nv_nvlink_recovery_errors_sum", device.tags, device.meta, aggregate_recovery_errors, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
output <- y output <- y
@@ -1116,7 +1106,7 @@ func readNVLinkStats(device *NvidiaCollectorDevice, output chan lp.CCMessage) er
} }
if !device.excludeMetrics["nv_nvlink_crc_flit_errors"] { if !device.excludeMetrics["nv_nvlink_crc_flit_errors"] {
// Data link receive flow control digit CRC error counter // Data link receive flow control digit CRC error counter
y, err := lp.NewMessage("nv_nvlink_crc_flit_errors_sum", device.tags, device.meta, map[string]interface{}{"value": aggregate_crc_flit_errors}, time.Now()) y, err := lp.NewMetric("nv_nvlink_crc_flit_errors_sum", device.tags, device.meta, aggregate_crc_flit_errors, time.Now())
if err == nil { if err == nil {
y.AddTag("stype", "nvlink") y.AddTag("stype", "nvlink")
output <- y output <- y
@@ -1233,7 +1223,7 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
} }
// Actual read loop over all attached Nvidia GPUs // Actual read loop over all attached Nvidia GPUs
for i := 0; i < m.num_gpus; i++ { for i := range m.num_gpus {
readAll(&m.gpus[i], output) readAll(&m.gpus[i], output)
@@ -1256,7 +1246,7 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
} }
cclog.ComponentDebug(m.name, "Reading MIG devices for GPU", i) cclog.ComponentDebug(m.name, "Reading MIG devices for GPU", i)
for j := 0; j < maxMig; j++ { for j := range maxMig {
mdev, ret := nvml.DeviceGetMigDeviceHandleByIndex(m.gpus[i].device, j) mdev, ret := nvml.DeviceGetMigDeviceHandleByIndex(m.gpus[i].device, j)
if ret != nvml.SUCCESS { if ret != nvml.SUCCESS {
continue continue
@@ -1273,9 +1263,7 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
meta: map[string]string{}, meta: map[string]string{},
excludeMetrics: excludeMetrics, excludeMetrics: excludeMetrics,
} }
for k, v := range m.gpus[i].tags { maps.Copy(migDevice.tags, m.gpus[i].tags)
migDevice.tags[k] = v
}
migDevice.tags["stype"] = "mig" migDevice.tags["stype"] = "mig"
if m.config.UseUuidForMigDevices { if m.config.UseUuidForMigDevices {
uuid, ret := nvml.DeviceGetUUID(mdev) uuid, ret := nvml.DeviceGetUUID(mdev)
@@ -1289,19 +1277,17 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
mname, ret := nvml.DeviceGetName(mdev) mname, ret := nvml.DeviceGetName(mdev)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
x := strings.Replace(mname, name, "", -1) x := strings.ReplaceAll(mname, name, "")
x = strings.Replace(x, "MIG", "", -1) x = strings.ReplaceAll(x, "MIG", "")
x = strings.TrimSpace(x) x = strings.TrimSpace(x)
migDevice.tags["stype-id"] = x migDevice.tags["stype-id"] = x
} }
} }
} }
if _, ok := migDevice.tags["stype-id"]; !ok { if _, ok := migDevice.tags["stype-id"]; !ok {
migDevice.tags["stype-id"] = fmt.Sprintf("%d", j) migDevice.tags["stype-id"] = strconv.Itoa(j)
}
for k, v := range m.gpus[i].meta {
migDevice.meta[k] = v
} }
maps.Copy(migDevice.meta, m.gpus[i].meta)
if _, ok := migDevice.meta["uuid"]; ok && !m.config.UseUuidForMigDevices { if _, ok := migDevice.meta["uuid"]; ok && !m.config.UseUuidForMigDevices {
uuid, ret := nvml.DeviceGetUUID(mdev) uuid, ret := nvml.DeviceGetUUID(mdev)
if ret == nvml.SUCCESS { if ret == nvml.SUCCESS {
@@ -1317,7 +1303,9 @@ func (m *NvidiaCollector) Read(interval time.Duration, output chan lp.CCMessage)
func (m *NvidiaCollector) Close() { func (m *NvidiaCollector) Close() {
if m.init { if m.init {
nvml.Shutdown() if ret := nvml.Shutdown(); ret != nvml.SUCCESS {
cclog.ComponentError(m.name, "nvml.Shutdown() not successful")
}
m.init = false m.init = false
} }
} }

@@ -8,6 +8,7 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
@@ -16,8 +17,8 @@ import (
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
// running average power limit (RAPL) monitoring attributes for a zone // running average power limit (RAPL) monitoring attributes for a zone
@@ -34,6 +35,7 @@ type RAPLZoneInfo struct {
type RAPLCollector struct { type RAPLCollector struct {
metricCollector metricCollector
config struct { config struct {
// Exclude IDs for RAPL zones, e.g. // Exclude IDs for RAPL zones, e.g.
// * 0 for zone 0 // * 0 for zone 0
@@ -48,15 +50,15 @@ type RAPLCollector struct {
// Init initializes the running average power limit (RAPL) collector // Init initializes the running average power limit (RAPL) collector
func (m *RAPLCollector) Init(config json.RawMessage) error { func (m *RAPLCollector) Init(config json.RawMessage) error {
// Check if already initialized // Check if already initialized
if m.init { if m.init {
return nil return nil
} }
var err error = nil
m.name = "RAPLCollector" m.name = "RAPLCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{ m.meta = map[string]string{
"source": m.name, "source": m.name,
@@ -66,10 +68,10 @@ func (m *RAPLCollector) Init(config json.RawMessage) error {
// Read in the JSON configuration // Read in the JSON configuration
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
@@ -89,19 +91,20 @@ func (m *RAPLCollector) Init(config json.RawMessage) error {
// readZoneInfo reads RAPL monitoring attributes for a zone given by zonePath // readZoneInfo reads RAPL monitoring attributes for a zone given by zonePath
// See: https://www.kernel.org/doc/html/latest/power/powercap/powercap.html#monitoring-attributes // See: https://www.kernel.org/doc/html/latest/power/powercap/powercap.html#monitoring-attributes
readZoneInfo := func(zonePath string) (z struct { readZoneInfo := func(zonePath string) (
name string // zones name e.g. psys, dram, core, uncore, package-0 z struct {
energyFilepath string // path to a file containing the zones current energy counter in micro joules name string // zones name e.g. psys, dram, core, uncore, package-0
energy int64 // current reading of the energy counter in micro joules energyFilepath string // path to a file containing the zones current energy counter in micro joules
energyTimestamp time.Time // timestamp when energy counter was read energy int64 // current reading of the energy counter in micro joules
maxEnergyRange int64 // Range of the above energy counter in micro-joules energyTimestamp time.Time // timestamp when energy counter was read
ok bool // Are all information available? maxEnergyRange int64 // Range of the above energy counter in micro-joules
}) { ok bool // Are all information available?
},
) {
// zones name e.g. psys, dram, core, uncore, package-0 // zones name e.g. psys, dram, core, uncore, package-0
foundName := false foundName := false
if v, err := if v, err := os.ReadFile(
os.ReadFile( filepath.Join(zonePath, "name")); err == nil {
filepath.Join(zonePath, "name")); err == nil {
foundName = true foundName = true
z.name = strings.TrimSpace(string(v)) z.name = strings.TrimSpace(string(v))
} }
@@ -122,9 +125,8 @@ func (m *RAPLCollector) Init(config json.RawMessage) error {
// Range of the above energy counter in micro-joules // Range of the above energy counter in micro-joules
foundMaxEnergyRange := false foundMaxEnergyRange := false
if v, err := if v, err := os.ReadFile(
os.ReadFile( filepath.Join(zonePath, "max_energy_range_uj")); err == nil {
filepath.Join(zonePath, "max_energy_range_uj")); err == nil {
if i, err := strconv.ParseInt(strings.TrimSpace(string(v)), 10, 64); err == nil { if i, err := strconv.ParseInt(strings.TrimSpace(string(v)), 10, 64); err == nil {
foundMaxEnergyRange = true foundMaxEnergyRange = true
z.maxEnergyRange = i z.maxEnergyRange = i
@@ -156,19 +158,18 @@ func (m *RAPLCollector) Init(config json.RawMessage) error {
!isNameExcluded[z.name] { !isNameExcluded[z.name] {
// Add RAPL monitoring attributes for a zone // Add RAPL monitoring attributes for a zone
m.RAPLZoneInfo = m.RAPLZoneInfo = append(
append( m.RAPLZoneInfo,
m.RAPLZoneInfo, RAPLZoneInfo{
RAPLZoneInfo{ tags: map[string]string{
tags: map[string]string{ "id": zoneID,
"id": zoneID, "zone_name": z.name,
"zone_name": z.name, },
}, energyFilepath: z.energyFilepath,
energyFilepath: z.energyFilepath, energy: z.energy,
energy: z.energy, energyTimestamp: z.energyTimestamp,
energyTimestamp: z.energyTimestamp, maxEnergyRange: z.maxEnergyRange,
maxEnergyRange: z.maxEnergyRange, })
})
} }
// find all sub zones for the given zone // find all sub zones for the given zone
@@ -185,27 +186,25 @@ func (m *RAPLCollector) Init(config json.RawMessage) error {
sz.ok && sz.ok &&
!isIDExcluded[zoneID+":"+subZoneID] && !isIDExcluded[zoneID+":"+subZoneID] &&
!isNameExcluded[sz.name] { !isNameExcluded[sz.name] {
m.RAPLZoneInfo = m.RAPLZoneInfo = append(
append( m.RAPLZoneInfo,
m.RAPLZoneInfo, RAPLZoneInfo{
RAPLZoneInfo{ tags: map[string]string{
tags: map[string]string{ "id": zoneID + ":" + subZoneID,
"id": zoneID + ":" + subZoneID, "zone_name": z.name,
"zone_name": z.name, "sub_zone_name": sz.name,
"sub_zone_name": sz.name, },
}, energyFilepath: sz.energyFilepath,
energyFilepath: sz.energyFilepath, energy: sz.energy,
energy: sz.energy, energyTimestamp: sz.energyTimestamp,
energyTimestamp: sz.energyTimestamp, maxEnergyRange: sz.maxEnergyRange,
maxEnergyRange: sz.maxEnergyRange, })
})
} }
} }
} }
if m.RAPLZoneInfo == nil { if m.RAPLZoneInfo == nil {
return fmt.Errorf("no running average power limit (RAPL) device found in %s", controlTypePath) return fmt.Errorf("no running average power limit (RAPL) device found in %s", controlTypePath)
} }
// Initialized // Initialized
@@ -222,7 +221,6 @@ func (m *RAPLCollector) Init(config json.RawMessage) error {
// Read reads running average power limit (RAPL) monitoring attributes for all initialized zones // Read reads running average power limit (RAPL) monitoring attributes for all initialized zones
// See: https://www.kernel.org/doc/html/latest/power/powercap/powercap.html#monitoring-attributes // See: https://www.kernel.org/doc/html/latest/power/powercap/powercap.html#monitoring-attributes
func (m *RAPLCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *RAPLCollector) Read(interval time.Duration, output chan lp.CCMessage) {
for i := range m.RAPLZoneInfo { for i := range m.RAPLZoneInfo {
p := &m.RAPLZoneInfo[i] p := &m.RAPLZoneInfo[i]
@@ -248,7 +246,7 @@ func (m *RAPLCollector) Read(interval time.Duration, output chan lp.CCMessage) {
"rapl_average_power", "rapl_average_power",
p.tags, p.tags,
m.meta, m.meta,
map[string]interface{}{"value": averagePower}, map[string]any{"value": averagePower},
energyTimestamp) energyTimestamp)
if err == nil { if err == nil {
output <- y output <- y

View File

@@ -8,13 +8,15 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"errors"
"fmt" "fmt"
"slices"
"strconv"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
"github.com/ClusterCockpit/go-rocm-smi/pkg/rocm_smi" "github.com/ClusterCockpit/go-rocm-smi/pkg/rocm_smi"
) )
@@ -36,6 +38,7 @@ type RocmSmiCollectorDevice struct {
type RocmSmiCollector struct { type RocmSmiCollector struct {
metricCollector metricCollector
config RocmSmiCollectorConfig // the configuration structure config RocmSmiCollectorConfig // the configuration structure
devices []RocmSmiCollectorDevice devices []RocmSmiCollectorDevice
} }
@@ -48,73 +51,46 @@ type RocmSmiCollector struct {
// Called once by the collector manager // Called once by the collector manager
// All tags, meta data tags and metrics that do not change over the runtime should be set here // All tags, meta data tags and metrics that do not change over the runtime should be set here
func (m *RocmSmiCollector) Init(config json.RawMessage) error { func (m *RocmSmiCollector) Init(config json.RawMessage) error {
var err error = nil
// Always set the name early in Init() to use it in cclog.Component* functions // Always set the name early in Init() to use it in cclog.Component* functions
m.name = "RocmSmiCollector" m.name = "RocmSmiCollector"
// This is for later use, also call it early // This is for later use, also call it early
m.setup() if err := m.setup(); err != nil {
// Define meta information sent with each metric return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
// (Can also be dynamic or this is the basic set with extension through AddMeta()) }
//m.meta = map[string]string{"source": m.name, "group": "AMD"}
// Define tags sent with each metric
// The 'type' tag is always needed, it defines the granulatity of the metric
// node -> whole system
// socket -> CPU socket (requires socket ID as 'type-id' tag)
// cpu -> single CPU hardware thread (requires cpu ID as 'type-id' tag)
//m.tags = map[string]string{"type": "node"}
// Read in the JSON configuration // Read in the JSON configuration
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
ret := rocm_smi.Init() ret := rocm_smi.Init()
if ret != rocm_smi.STATUS_SUCCESS { if ret != rocm_smi.STATUS_SUCCESS {
err = errors.New("failed to initialize ROCm SMI library") return fmt.Errorf("%s Init(): failed to initialize ROCm SMI library", m.name)
cclog.ComponentError(m.name, err.Error())
return err
} }
numDevs, ret := rocm_smi.NumMonitorDevices() numDevs, ret := rocm_smi.NumMonitorDevices()
if ret != rocm_smi.STATUS_SUCCESS { if ret != rocm_smi.STATUS_SUCCESS {
err = errors.New("failed to get number of GPUs from ROCm SMI library") return fmt.Errorf("%s Init(): failed to get number of GPUs from ROCm SMI library", m.name)
cclog.ComponentError(m.name, err.Error())
return err
}
exclDev := func(s string) bool {
skip_device := false
for _, excl := range m.config.ExcludeDevices {
if excl == s {
skip_device = true
break
}
}
return skip_device
} }
m.devices = make([]RocmSmiCollectorDevice, 0) m.devices = make([]RocmSmiCollectorDevice, 0)
for i := 0; i < numDevs; i++ { for i := range numDevs {
str_i := fmt.Sprintf("%d", i) str_i := strconv.Itoa(i)
if exclDev(str_i) { if slices.Contains(m.config.ExcludeDevices, str_i) {
continue continue
} }
device, ret := rocm_smi.DeviceGetHandleByIndex(i) device, ret := rocm_smi.DeviceGetHandleByIndex(i)
if ret != rocm_smi.STATUS_SUCCESS { if ret != rocm_smi.STATUS_SUCCESS {
err = fmt.Errorf("failed to get handle for GPU %d", i) return fmt.Errorf("%s Init(): failed to get get handle for GPU %d", m.name, i)
cclog.ComponentError(m.name, err.Error())
return err
} }
pciInfo, ret := rocm_smi.DeviceGetPciInfo(device) pciInfo, ret := rocm_smi.DeviceGetPciInfo(device)
if ret != rocm_smi.STATUS_SUCCESS { if ret != rocm_smi.STATUS_SUCCESS {
err = fmt.Errorf("failed to get PCI information for GPU %d", i) return fmt.Errorf("%s Init(): failed to get PCI information for GPU %d", m.name, i)
cclog.ComponentError(m.name, err.Error())
return err
} }
pciId := fmt.Sprintf( pciId := fmt.Sprintf(
@@ -124,7 +100,7 @@ func (m *RocmSmiCollector) Init(config json.RawMessage) error {
pciInfo.Device, pciInfo.Device,
pciInfo.Function) pciInfo.Function)
if exclDev(pciId) { if slices.Contains(m.config.ExcludeDevices, pciId) {
continue continue
} }
@@ -164,7 +140,7 @@ func (m *RocmSmiCollector) Init(config json.RawMessage) error {
// Set this flag only if everything is initialized properly, all required files exist, ... // Set this flag only if everything is initialized properly, all required files exist, ...
m.init = true m.init = true
return err return nil
} }
// Read collects all metrics belonging to the sample collector // Read collects all metrics belonging to the sample collector
@@ -182,136 +158,135 @@ func (m *RocmSmiCollector) Read(interval time.Duration, output chan lp.CCMessage
if !dev.excludeMetrics["rocm_gfx_util"] { if !dev.excludeMetrics["rocm_gfx_util"] {
value := metrics.Average_gfx_activity value := metrics.Average_gfx_activity
y, err := lp.NewMessage("rocm_gfx_util", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_gfx_util", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_umc_util"] { if !dev.excludeMetrics["rocm_umc_util"] {
value := metrics.Average_umc_activity value := metrics.Average_umc_activity
y, err := lp.NewMessage("rocm_umc_util", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_umc_util", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_mm_util"] { if !dev.excludeMetrics["rocm_mm_util"] {
value := metrics.Average_mm_activity value := metrics.Average_mm_activity
y, err := lp.NewMessage("rocm_mm_util", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_mm_util", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_avg_power"] { if !dev.excludeMetrics["rocm_avg_power"] {
value := metrics.Average_socket_power value := metrics.Average_socket_power
y, err := lp.NewMessage("rocm_avg_power", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_avg_power", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_temp_mem"] { if !dev.excludeMetrics["rocm_temp_mem"] {
value := metrics.Temperature_mem value := metrics.Temperature_mem
y, err := lp.NewMessage("rocm_temp_mem", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_temp_mem", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_temp_hotspot"] { if !dev.excludeMetrics["rocm_temp_hotspot"] {
value := metrics.Temperature_hotspot value := metrics.Temperature_hotspot
y, err := lp.NewMessage("rocm_temp_hotspot", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_temp_hotspot", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_temp_edge"] { if !dev.excludeMetrics["rocm_temp_edge"] {
value := metrics.Temperature_edge value := metrics.Temperature_edge
y, err := lp.NewMessage("rocm_temp_edge", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_temp_edge", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_temp_vrgfx"] { if !dev.excludeMetrics["rocm_temp_vrgfx"] {
value := metrics.Temperature_vrgfx value := metrics.Temperature_vrgfx
y, err := lp.NewMessage("rocm_temp_vrgfx", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_temp_vrgfx", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_temp_vrsoc"] { if !dev.excludeMetrics["rocm_temp_vrsoc"] {
value := metrics.Temperature_vrsoc value := metrics.Temperature_vrsoc
y, err := lp.NewMessage("rocm_temp_vrsoc", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_temp_vrsoc", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_temp_vrmem"] { if !dev.excludeMetrics["rocm_temp_vrmem"] {
value := metrics.Temperature_vrmem value := metrics.Temperature_vrmem
y, err := lp.NewMessage("rocm_temp_vrmem", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_temp_vrmem", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_gfx_clock"] { if !dev.excludeMetrics["rocm_gfx_clock"] {
value := metrics.Average_gfxclk_frequency value := metrics.Average_gfxclk_frequency
y, err := lp.NewMessage("rocm_gfx_clock", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_gfx_clock", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_soc_clock"] { if !dev.excludeMetrics["rocm_soc_clock"] {
value := metrics.Average_socclk_frequency value := metrics.Average_socclk_frequency
y, err := lp.NewMessage("rocm_soc_clock", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_soc_clock", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_u_clock"] { if !dev.excludeMetrics["rocm_u_clock"] {
value := metrics.Average_uclk_frequency value := metrics.Average_uclk_frequency
y, err := lp.NewMessage("rocm_u_clock", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_u_clock", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_v0_clock"] { if !dev.excludeMetrics["rocm_v0_clock"] {
value := metrics.Average_vclk0_frequency value := metrics.Average_vclk0_frequency
y, err := lp.NewMessage("rocm_v0_clock", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_v0_clock", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_v1_clock"] { if !dev.excludeMetrics["rocm_v1_clock"] {
value := metrics.Average_vclk1_frequency value := metrics.Average_vclk1_frequency
y, err := lp.NewMessage("rocm_v1_clock", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_v1_clock", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_d0_clock"] { if !dev.excludeMetrics["rocm_d0_clock"] {
value := metrics.Average_dclk0_frequency value := metrics.Average_dclk0_frequency
y, err := lp.NewMessage("rocm_d0_clock", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_d0_clock", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_d1_clock"] { if !dev.excludeMetrics["rocm_d1_clock"] {
value := metrics.Average_dclk1_frequency value := metrics.Average_dclk1_frequency
y, err := lp.NewMessage("rocm_d1_clock", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_d1_clock", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if !dev.excludeMetrics["rocm_temp_hbm"] { if !dev.excludeMetrics["rocm_temp_hbm"] {
for i := 0; i < rocm_smi.NUM_HBM_INSTANCES; i++ { for i := range rocm_smi.NUM_HBM_INSTANCES {
value := metrics.Temperature_hbm[i] value := metrics.Temperature_hbm[i]
y, err := lp.NewMessage("rocm_temp_hbm", dev.tags, dev.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMessage("rocm_temp_hbm", dev.tags, dev.meta, map[string]any{"value": value}, timestamp)
if err == nil { if err == nil {
y.AddTag("stype", "device") y.AddTag("stype", "device")
y.AddTag("stype-id", fmt.Sprintf("%d", i)) y.AddTag("stype-id", strconv.Itoa(i))
output <- y output <- y
} }
} }
} }
} }
} }
// Close metric collector: close network connection, close files, close libraries, ... // Close metric collector: close network connection, close files, close libraries, ...

@@ -15,7 +15,9 @@ hugo_path: docs/reference/cc-metric-collector/collectors/rocmsmi.md
```json ```json
"rocm_smi": { "rocm_smi": {
"exclude_devices": [ "exclude_devices": [
"0","1", "0000000:ff:01.0" "0",
"1",
"0000000:ff:01.0"
], ],
"exclude_metrics": [ "exclude_metrics": [
"rocm_mm_util", "rocm_mm_util",
@@ -23,7 +25,7 @@ hugo_path: docs/reference/cc-metric-collector/collectors/rocmsmi.md
], ],
"use_pci_info_as_type_id": true, "use_pci_info_as_type_id": true,
"add_pci_info_tag": false, "add_pci_info_tag": false,
"add_serial_meta": false, "add_serial_meta": false
} }
``` ```
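
The trailing comma removed after `"add_serial_meta": false` is more than cosmetic: Go's `encoding/json` rejects trailing commas, so the old example could not be pasted into a config as-is. The sketch below illustrates this together with the `slices.Contains` check from `Init()`. The `rocmSmiConfig` struct is a hypothetical stand-in; only the `ExcludeDevices` field name is visible in the diff, the other field names are assumed from the JSON keys.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
)

// rocmSmiConfig is a hypothetical stand-in for the collector's config struct.
// Only ExcludeDevices appears in the diff; the other fields are assumptions.
type rocmSmiConfig struct {
	ExcludeDevices     []string `json:"exclude_devices"`
	ExcludeMetrics     []string `json:"exclude_metrics"`
	UsePciInfoAsTypeId bool     `json:"use_pci_info_as_type_id"`
	AddPciInfoTag      bool     `json:"add_pci_info_tag"`
	AddSerialMeta      bool     `json:"add_serial_meta"`
}

// parseConfig decodes a JSON config and reports any syntax error.
func parseConfig(raw string) (rocmSmiConfig, error) {
	var cfg rocmSmiConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

// isExcluded mirrors the slices.Contains check used in Init().
func isExcluded(cfg rocmSmiConfig, id string) bool {
	return slices.Contains(cfg.ExcludeDevices, id)
}

func main() {
	valid := `{"exclude_devices": ["0", "1", "0000000:ff:01.0"], "add_serial_meta": false}`
	invalid := `{"exclude_devices": ["0"], "add_serial_meta": false,}`

	cfg, err := parseConfig(valid)
	fmt.Println("valid parses:", err == nil)
	fmt.Println("device 1 excluded:", isExcluded(cfg, "1"))
	fmt.Println("device 2 excluded:", isExcluded(cfg, "2"))

	_, err = parseConfig(invalid)
	fmt.Println("trailing comma rejected:", err != nil)
}
```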

@@ -8,11 +8,12 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
lp "github.com/ClusterCockpit/cc-lib/ccMessage"
) )
// These are the fields we read from the JSON configuration // These are the fields we read from the JSON configuration
@@ -24,6 +25,7 @@ type SampleCollectorConfig struct {
// defined by metricCollector (name, init, ...) // defined by metricCollector (name, init, ...)
type SampleCollector struct { type SampleCollector struct {
metricCollector metricCollector
config SampleCollectorConfig // the configuration structure config SampleCollectorConfig // the configuration structure
meta map[string]string // default meta information meta map[string]string // default meta information
tags map[string]string // default tags tags map[string]string // default tags
@@ -41,14 +43,19 @@ func (m *SampleCollector) Init(config json.RawMessage) error {
// Always set the name early in Init() to use it in cclog.Component* functions // Always set the name early in Init() to use it in cclog.Component* functions
m.name = "SampleCollector" m.name = "SampleCollector"
// This is for later use, also call it early // This is for later use, also call it early
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
// Tell whether the collector should be run in parallel with others (reading files, ...) // Tell whether the collector should be run in parallel with others (reading files, ...)
// or it should be run serially, mostly for collectors actually doing measurements // or it should be run serially, mostly for collectors actually doing measurements
// because they should not measure the execution of the other collectors // because they should not measure the execution of the other collectors
m.parallel = true m.parallel = true
// Define meta information sent with each metric // Define meta information sent with each metric
// (Can also be dynamic or this is the basic set with extension through AddMeta()) // (Can also be dynamic or this is the basic set with extension through AddMeta())
m.meta = map[string]string{"source": m.name, "group": "SAMPLE"} m.meta = map[string]string{
"source": m.name,
"group": "SAMPLE",
}
// Define tags sent with each metric // Define tags sent with each metric
// The 'type' tag is always needed, it defines the granularity of the metric // The 'type' tag is always needed, it defines the granularity of the metric
// node -> whole system // node -> whole system
@@ -59,13 +66,15 @@ func (m *SampleCollector) Init(config json.RawMessage) error {
// core -> single CPU core that may consist of multiple hardware threads (SMT) (requires core ID as 'type-id' tag) // core -> single CPU core that may consist of multiple hardware threads (SMT) (requires core ID as 'type-id' tag)
// hwthread -> single CPU hardware thread (requires hardware thread ID as 'type-id' tag) // hwthread -> single CPU hardware thread (requires hardware thread ID as 'type-id' tag)
// accelerator -> An accelerator device like GPU or FPGA (requires an accelerator ID as 'type-id' tag) // accelerator -> An accelerator device like GPU or FPGA (requires an accelerator ID as 'type-id' tag)
m.tags = map[string]string{"type": "node"} m.tags = map[string]string{
"type": "node",
}
// Read in the JSON configuration // Read in the JSON configuration
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
@@ -92,12 +101,11 @@ func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMessage)
// stop := readState() // stop := readState()
// value = (stop - start) / interval.Seconds() // value = (stop - start) / interval.Seconds()
y, err := lp.NewMessage("sample_metric", m.tags, m.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMetric("sample_metric", m.tags, m.meta, value, timestamp)
if err == nil { if err == nil {
// Send it to output channel // Send it to output channel
output <- y output <- y
} }
} }
// Close metric collector: close network connection, close files, close libraries, ... // Close metric collector: close network connection, close files, close libraries, ...
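
The switch from `json.Unmarshal` to a `json.Decoder` with `DisallowUnknownFields()` changes how misspelled config keys are handled: they now cause an error instead of being silently ignored. A small sketch of the difference, using a hypothetical `sampleConfig` struct with a single `interval` key:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// sampleConfig is a hypothetical config struct with a single known key.
type sampleConfig struct {
	Interval string `json:"interval"`
}

// decodeLenient ignores keys that have no matching struct field.
func decodeLenient(raw []byte) (sampleConfig, error) {
	var cfg sampleConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

// decodeStrict returns an error for any key without a matching struct field.
func decodeStrict(raw []byte) (sampleConfig, error) {
	var cfg sampleConfig
	d := json.NewDecoder(bytes.NewReader(raw))
	d.DisallowUnknownFields()
	err := d.Decode(&cfg)
	return cfg, err
}

func main() {
	// The second key is a deliberate typo of "interval".
	raw := []byte(`{"interval": "5s", "intervall": "10s"}`)

	_, errLenient := decodeLenient(raw)
	_, errStrict := decodeStrict(raw)
	fmt.Println("lenient error:", errLenient)
	fmt.Println("strict error: ", errStrict)
}
```

With strict decoding a typo in a collector config surfaces at startup rather than as a setting that quietly has no effect.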

@@ -8,12 +8,14 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt"
"sync" "sync"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
// These are the fields we read from the JSON configuration // These are the fields we read from the JSON configuration
@@ -25,6 +27,7 @@ type SampleTimerCollectorConfig struct {
// defined by metricCollector (name, init, ...) // defined by metricCollector (name, init, ...)
type SampleTimerCollector struct { type SampleTimerCollector struct {
metricCollector metricCollector
wg sync.WaitGroup // sync group for management wg sync.WaitGroup // sync group for management
done chan bool // channel for management done chan bool // channel for management
meta map[string]string // default meta information meta map[string]string // default meta information
@@ -36,33 +39,39 @@ type SampleTimerCollector struct {
} }
func (m *SampleTimerCollector) Init(name string, config json.RawMessage) error { func (m *SampleTimerCollector) Init(name string, config json.RawMessage) error {
var err error = nil var err error
// Always set the name early in Init() to use it in cclog.Component* functions // Always set the name early in Init() to use it in cclog.Component* functions
m.name = "SampleTimerCollector" m.name = "SampleTimerCollector"
// This is for later use, also call it early // This is for later use, also call it early
m.setup() if err = m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
// Define meta information sent with each metric // Define meta information sent with each metric
// (Can also be dynamic or this is the basic set with extension through AddMeta()) // (Can also be dynamic or this is the basic set with extension through AddMeta())
m.meta = map[string]string{"source": m.name, "group": "SAMPLE"} m.meta = map[string]string{
"source": m.name,
"group": "SAMPLE",
}
// Define tags sent with each metric // Define tags sent with each metric
// The 'type' tag is always needed, it defines the granularity of the metric // The 'type' tag is always needed, it defines the granularity of the metric
// node -> whole system // node -> whole system
// socket -> CPU socket (requires socket ID as 'type-id' tag) // socket -> CPU socket (requires socket ID as 'type-id' tag)
// cpu -> single CPU hardware thread (requires cpu ID as 'type-id' tag) // cpu -> single CPU hardware thread (requires cpu ID as 'type-id' tag)
m.tags = map[string]string{"type": "node"} m.tags = map[string]string{
"type": "node",
}
// Read in the JSON configuration // Read in the JSON configuration
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): error decoding JSON config: %w", m.name, err)
} }
} }
// Parse the read interval duration // Parse the read interval duration
m.interval, err = time.ParseDuration(m.config.Interval) m.interval, err = time.ParseDuration(m.config.Interval)
if err != nil { if err != nil {
cclog.ComponentError(m.name, "Error parsing interval:", err.Error()) return fmt.Errorf("%s Init(): error parsing interval: %w", m.name, err)
return err
} }
// Storage for output channel // Storage for output channel
@@ -73,13 +82,11 @@ func (m *SampleTimerCollector) Init(name string, config json.RawMessage) error {
m.ticker = time.NewTicker(m.interval) m.ticker = time.NewTicker(m.interval)
// Start the timer loop with return functionality by sending 'true' to the done channel // Start the timer loop with return functionality by sending 'true' to the done channel
m.wg.Add(1) m.wg.Go(func() {
go func() {
select { select {
case <-m.done: case <-m.done:
// Exit the timer loop // Exit the timer loop
cclog.ComponentDebug(m.name, "Closing...") cclog.ComponentDebug(m.name, "Closing...")
m.wg.Done()
return return
case timestamp := <-m.ticker.C: case timestamp := <-m.ticker.C:
// This is executed every timer tick but we have to wait until the first // This is executed every timer tick but we have to wait until the first
@@ -88,7 +95,7 @@ func (m *SampleTimerCollector) Init(name string, config json.RawMessage) error {
m.ReadMetrics(timestamp) m.ReadMetrics(timestamp)
} }
} }
}() })
// Set this flag only if everything is initialized properly, all required files exist, ... // Set this flag only if everything is initialized properly, all required files exist, ...
m.init = true m.init = true
@@ -107,7 +114,7 @@ func (m *SampleTimerCollector) ReadMetrics(timestamp time.Time) {
// stop := readState() // stop := readState()
// value = (stop - start) / interval.Seconds() // value = (stop - start) / interval.Seconds()
y, err := lp.NewMessage("sample_metric", m.tags, m.meta, map[string]interface{}{"value": value}, timestamp) y, err := lp.NewMetric("sample_metric", m.tags, m.meta, value, timestamp)
if err == nil && m.output != nil { if err == nil && m.output != nil {
// Send it to output channel if we have a valid channel // Send it to output channel if we have a valid channel
m.output <- y m.output <- y

@@ -9,16 +9,16 @@ package collectors
import ( import (
"bufio" "bufio"
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"math"
"os" "os"
"strconv" "strconv"
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const SCHEDSTATFILE = `/proc/schedstat` const SCHEDSTATFILE = `/proc/schedstat`
@@ -32,6 +32,7 @@ type SchedstatCollectorConfig struct {
// defined by metricCollector (name, init, ...) // defined by metricCollector (name, init, ...)
type SchedstatCollector struct { type SchedstatCollector struct {
metricCollector metricCollector
config SchedstatCollectorConfig // the configuration structure config SchedstatCollectorConfig // the configuration structure
lastTimestamp time.Time // Store time stamp of last tick to derive values lastTimestamp time.Time // Store time stamp of last tick to derive values
meta map[string]string // default meta information meta map[string]string // default meta information
@@ -47,37 +48,39 @@ type SchedstatCollector struct {
// Called once by the collector manager // Called once by the collector manager
// All tags, meta data tags and metrics that do not change over the runtime should be set here // All tags, meta data tags and metrics that do not change over the runtime should be set here
func (m *SchedstatCollector) Init(config json.RawMessage) error { func (m *SchedstatCollector) Init(config json.RawMessage) error {
var err error = nil
// Always set the name early in Init() to use it in cclog.Component* functions // Always set the name early in Init() to use it in cclog.Component* functions
m.name = "SchedstatCollector" m.name = "SchedstatCollector"
// This is for later use, also call it early // This is for later use, also call it early
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
// Tell whether the collector should be run in parallel with others (reading files, ...) // Tell whether the collector should be run in parallel with others (reading files, ...)
// or it should be run serially, mostly for collectors acutally doing measurements // or it should be run serially, mostly for collectors actually doing measurements
// because they should not measure the execution of the other collectors // because they should not measure the execution of the other collectors
m.parallel = true m.parallel = true
// Define meta information sent with each metric // Define meta information sent with each metric
// (Can also be dynamic or this is the basic set with extension through AddMeta()) // (Can also be dynamic or this is the basic set with extension through AddMeta())
m.meta = map[string]string{"source": m.name, "group": "SCHEDSTAT"} m.meta = map[string]string{
"source": m.name,
"group": "SCHEDSTAT",
}
// Read in the JSON configuration // Read in the JSON configuration
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): failed to decode JSON config: %w", m.name, err)
} }
} }
// Check input file // Check input file
file, err := os.Open(string(SCHEDSTATFILE)) file, err := os.Open(SCHEDSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) return fmt.Errorf("%s Init(): Failed opening scheduler statistics file \"%s\": %w", m.name, SCHEDSTATFILE, err)
} }
defer file.Close()
// Pre-generate tags for all CPUs // Pre-generate tags for all CPUs
num_cpus := 0
m.cputags = make(map[string]map[string]string) m.cputags = make(map[string]map[string]string)
m.olddata = make(map[string]map[string]int64) m.olddata = make(map[string]map[string]int64)
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
@@ -89,11 +92,19 @@ func (m *SchedstatCollector) Init(config json.RawMessage) error {
cpu, _ := strconv.Atoi(cpustr) cpu, _ := strconv.Atoi(cpustr)
running, _ := strconv.ParseInt(linefields[7], 10, 64) running, _ := strconv.ParseInt(linefields[7], 10, 64)
waiting, _ := strconv.ParseInt(linefields[8], 10, 64) waiting, _ := strconv.ParseInt(linefields[8], 10, 64)
m.cputags[linefields[0]] = map[string]string{"type": "hwthread", "type-id": fmt.Sprintf("%d", cpu)} m.cputags[linefields[0]] = map[string]string{
m.olddata[linefields[0]] = map[string]int64{"running": running, "waiting": waiting} "type": "hwthread",
num_cpus++ "type-id": strconv.Itoa(cpu),
}
m.olddata[linefields[0]] = map[string]int64{
"running": running,
"waiting": waiting,
}
} }
} }
if err := file.Close(); err != nil {
return fmt.Errorf("%s Init(): Failed closing scheduler statistics file \"%s\": %w", m.name, SCHEDSTATFILE, err)
}
// Save current timestamp // Save current timestamp
m.lastTimestamp = time.Now() m.lastTimestamp = time.Now()
@@ -109,14 +120,14 @@ func (m *SchedstatCollector) ParseProcLine(linefields []string, tags map[string]
diff_running := running - m.olddata[linefields[0]]["running"] diff_running := running - m.olddata[linefields[0]]["running"]
diff_waiting := waiting - m.olddata[linefields[0]]["waiting"] diff_waiting := waiting - m.olddata[linefields[0]]["waiting"]
var l_running float64 = float64(diff_running) / tsdelta.Seconds() / (math.Pow(1000, 3)) l_running := float64(diff_running) / tsdelta.Seconds() / 1000_000_000
var l_waiting float64 = float64(diff_waiting) / tsdelta.Seconds() / (math.Pow(1000, 3)) l_waiting := float64(diff_waiting) / tsdelta.Seconds() / 1000_000_000
m.olddata[linefields[0]]["running"] = running m.olddata[linefields[0]]["running"] = running
m.olddata[linefields[0]]["waiting"] = waiting m.olddata[linefields[0]]["waiting"] = waiting
value := l_running + l_waiting value := l_running + l_waiting
y, err := lp.NewMessage("cpu_load_core", tags, m.meta, map[string]interface{}{"value": value}, now) y, err := lp.NewMetric("cpu_load_core", tags, m.meta, value, now)
if err == nil { if err == nil {
// Send it to output channel // Send it to output channel
output <- y output <- y
@@ -130,15 +141,23 @@ func (m *SchedstatCollector) Read(interval time.Duration, output chan lp.CCMessa
return return
} }
//timestamps // timestamps
now := time.Now() now := time.Now()
tsdelta := now.Sub(m.lastTimestamp) tsdelta := now.Sub(m.lastTimestamp)
file, err := os.Open(string(SCHEDSTATFILE)) file, err := os.Open(SCHEDSTATFILE)
if err != nil { if err != nil {
cclog.ComponentError(m.name, err.Error()) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to open file '%s': %v", SCHEDSTATFILE, err))
} }
defer file.Close() defer func() {
if err := file.Close(); err != nil {
cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to close file '%s': %v", SCHEDSTATFILE, err))
}
}()
scanner := bufio.NewScanner(file) scanner := bufio.NewScanner(file)
for scanner.Scan() { for scanner.Scan() {
@@ -150,7 +169,6 @@ func (m *SchedstatCollector) Read(interval time.Duration, output chan lp.CCMessa
} }
m.lastTimestamp = now m.lastTimestamp = now
} }
// Close metric collector: close network connection, close files, close libraries, ... // Close metric collector: close network connection, close files, close libraries, ...

@@ -8,13 +8,14 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt"
"runtime" "runtime"
"syscall" "syscall"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
lp "github.com/ClusterCockpit/cc-lib/ccMessage"
) )
type SelfCollectorConfig struct { type SelfCollectorConfig struct {
@@ -26,6 +27,7 @@ type SelfCollectorConfig struct {
type SelfCollector struct { type SelfCollector struct {
metricCollector metricCollector
config SelfCollectorConfig // the configuration structure config SelfCollectorConfig // the configuration structure
meta map[string]string // default meta information meta map[string]string // default meta information
tags map[string]string // default tags tags map[string]string // default tags
@@ -34,15 +36,22 @@ type SelfCollector struct {
func (m *SelfCollector) Init(config json.RawMessage) error { func (m *SelfCollector) Init(config json.RawMessage) error {
var err error = nil var err error = nil
m.name = "SelfCollector" m.name = "SelfCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{"source": m.name, "group": "Self"} m.meta = map[string]string{
m.tags = map[string]string{"type": "node"} "source": m.name,
"group": "Self",
}
m.tags = map[string]string{
"type": "node",
}
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err := d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
m.init = true m.init = true
@@ -56,49 +65,49 @@ func (m *SelfCollector) Read(interval time.Duration, output chan lp.CCMessage) {
var memstats runtime.MemStats var memstats runtime.MemStats
runtime.ReadMemStats(&memstats) runtime.ReadMemStats(&memstats)
y, err := lp.NewMessage("total_alloc", m.tags, m.meta, map[string]interface{}{"value": memstats.TotalAlloc}, timestamp) y, err := lp.NewMetric("total_alloc", m.tags, m.meta, memstats.TotalAlloc, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
y, err = lp.NewMessage("heap_alloc", m.tags, m.meta, map[string]interface{}{"value": memstats.HeapAlloc}, timestamp) y, err = lp.NewMetric("heap_alloc", m.tags, m.meta, memstats.HeapAlloc, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
y, err = lp.NewMessage("heap_sys", m.tags, m.meta, map[string]interface{}{"value": memstats.HeapSys}, timestamp) y, err = lp.NewMetric("heap_sys", m.tags, m.meta, memstats.HeapSys, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
y, err = lp.NewMessage("heap_idle", m.tags, m.meta, map[string]interface{}{"value": memstats.HeapIdle}, timestamp) y, err = lp.NewMetric("heap_idle", m.tags, m.meta, memstats.HeapIdle, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
y, err = lp.NewMessage("heap_inuse", m.tags, m.meta, map[string]interface{}{"value": memstats.HeapInuse}, timestamp) y, err = lp.NewMetric("heap_inuse", m.tags, m.meta, memstats.HeapInuse, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
y, err = lp.NewMessage("heap_released", m.tags, m.meta, map[string]interface{}{"value": memstats.HeapReleased}, timestamp) y, err = lp.NewMetric("heap_released", m.tags, m.meta, memstats.HeapReleased, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
y, err = lp.NewMessage("heap_objects", m.tags, m.meta, map[string]interface{}{"value": memstats.HeapObjects}, timestamp) y, err = lp.NewMetric("heap_objects", m.tags, m.meta, memstats.HeapObjects, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if m.config.GoRoutines { if m.config.GoRoutines {
y, err := lp.NewMessage("num_goroutines", m.tags, m.meta, map[string]interface{}{"value": runtime.NumGoroutine()}, timestamp) y, err := lp.NewMetric("num_goroutines", m.tags, m.meta, runtime.NumGoroutine(), timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
} }
if m.config.CgoCalls { if m.config.CgoCalls {
y, err := lp.NewMessage("num_cgo_calls", m.tags, m.meta, map[string]interface{}{"value": runtime.NumCgoCall()}, timestamp) y, err := lp.NewMetric("num_cgo_calls", m.tags, m.meta, runtime.NumCgoCall(), timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
@@ -109,35 +118,35 @@ func (m *SelfCollector) Read(interval time.Duration, output chan lp.CCMessage) {
if err == nil { if err == nil {
sec, nsec := rusage.Utime.Unix() sec, nsec := rusage.Utime.Unix()
t := float64(sec) + (float64(nsec) * 1e-9) t := float64(sec) + (float64(nsec) * 1e-9)
y, err := lp.NewMessage("rusage_user_time", m.tags, m.meta, map[string]interface{}{"value": t}, timestamp) y, err := lp.NewMetric("rusage_user_time", m.tags, m.meta, t, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "seconds") y.AddMeta("unit", "seconds")
output <- y output <- y
} }
sec, nsec = rusage.Stime.Unix() sec, nsec = rusage.Stime.Unix()
t = float64(sec) + (float64(nsec) * 1e-9) t = float64(sec) + (float64(nsec) * 1e-9)
y, err = lp.NewMessage("rusage_system_time", m.tags, m.meta, map[string]interface{}{"value": t}, timestamp) y, err = lp.NewMetric("rusage_system_time", m.tags, m.meta, t, timestamp)
if err == nil { if err == nil {
y.AddMeta("unit", "seconds") y.AddMeta("unit", "seconds")
output <- y output <- y
} }
y, err = lp.NewMessage("rusage_vol_ctx_switch", m.tags, m.meta, map[string]interface{}{"value": rusage.Nvcsw}, timestamp) y, err = lp.NewMetric("rusage_vol_ctx_switch", m.tags, m.meta, rusage.Nvcsw, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
y, err = lp.NewMessage("rusage_invol_ctx_switch", m.tags, m.meta, map[string]interface{}{"value": rusage.Nivcsw}, timestamp) y, err = lp.NewMetric("rusage_invol_ctx_switch", m.tags, m.meta, rusage.Nivcsw, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
y, err = lp.NewMessage("rusage_signals", m.tags, m.meta, map[string]interface{}{"value": rusage.Nsignals}, timestamp) y, err = lp.NewMetric("rusage_signals", m.tags, m.meta, rusage.Nsignals, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
y, err = lp.NewMessage("rusage_major_pgfaults", m.tags, m.meta, map[string]interface{}{"value": rusage.Majflt}, timestamp) y, err = lp.NewMetric("rusage_major_pgfaults", m.tags, m.meta, rusage.Majflt, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }
y, err = lp.NewMessage("rusage_minor_pgfaults", m.tags, m.meta, map[string]interface{}{"value": rusage.Minflt}, timestamp) y, err = lp.NewMetric("rusage_minor_pgfaults", m.tags, m.meta, rusage.Minflt, timestamp)
if err == nil { if err == nil {
output <- y output <- y
} }


@@ -11,8 +11,8 @@ import (
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
type SlurmJobData struct { type SlurmJobData struct {
@@ -32,6 +32,7 @@ type SlurmCgroupsConfig struct {
type SlurmCgroupCollector struct { type SlurmCgroupCollector struct {
metricCollector metricCollector
config SlurmCgroupsConfig config SlurmCgroupsConfig
meta map[string]string meta map[string]string
tags map[string]string tags map[string]string
@@ -50,8 +51,7 @@ func ParseCPUs(cpuset string) ([]int, error) {
return result, nil return result, nil
} }
ranges := strings.Split(cpuset, ",") for r := range strings.SplitSeq(cpuset, ",") {
for _, r := range ranges {
if strings.Contains(r, "-") { if strings.Contains(r, "-") {
parts := strings.Split(r, "-") parts := strings.Split(r, "-")
if len(parts) != 2 { if len(parts) != 2 {
@@ -80,9 +80,10 @@ func ParseCPUs(cpuset string) ([]int, error) {
} }
func GetAllCPUs() ([]int, error) { func GetAllCPUs() ([]int, error) {
data, err := os.ReadFile("/sys/devices/system/cpu/online") cpuOnline := "/sys/devices/system/cpu/online"
data, err := os.ReadFile(cpuOnline)
if err != nil { if err != nil {
return nil, fmt.Errorf("failed to read /sys/devices/system/cpu/online: %v", err) return nil, fmt.Errorf("failed to read file \"%s\": %w", cpuOnline, err)
} }
return ParseCPUs(strings.TrimSpace(string(data))) return ParseCPUs(strings.TrimSpace(string(data)))
} }
@@ -103,18 +104,25 @@ func (m *SlurmCgroupCollector) readFile(path string) ([]byte, error) {
func (m *SlurmCgroupCollector) Init(config json.RawMessage) error { func (m *SlurmCgroupCollector) Init(config json.RawMessage) error {
var err error var err error
m.name = "SlurmCgroupCollector" m.name = "SlurmCgroupCollector"
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true m.parallel = true
m.meta = map[string]string{"source": m.name, "group": "SLURM"} m.meta = map[string]string{
m.tags = map[string]string{"type": "hwthread"} "source": m.name,
"group": "SLURM",
}
m.tags = map[string]string{
"type": "hwthread",
}
m.cpuUsed = make(map[int]bool) m.cpuUsed = make(map[int]bool)
m.cgroupBase = defaultCgroupBase m.cgroupBase = defaultCgroupBase
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(strings.NewReader(string(config)))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError(m.name, "Error reading config:", err.Error()) if err = d.Decode(&m.config); err != nil {
return err return fmt.Errorf("%s Init(): Error reading JSON config: %w", m.name, err)
} }
m.excludeMetrics = make(map[string]struct{}) m.excludeMetrics = make(map[string]struct{})
for _, metric := range m.config.ExcludeMetrics { for _, metric := range m.config.ExcludeMetrics {
@@ -129,19 +137,16 @@ func (m *SlurmCgroupCollector) Init(config json.RawMessage) error {
if !m.useSudo { if !m.useSudo {
user, err := user.Current() user, err := user.Current()
if err != nil { if err != nil {
cclog.ComponentError(m.name, "Failed to get current user:", err.Error()) return fmt.Errorf("%s Init(): Failed to get current user: %w", m.name, err)
return err
} }
if user.Uid != "0" { if user.Uid != "0" {
cclog.ComponentError(m.name, "Reading cgroup files requires root privileges (or enable use_sudo in config)") return fmt.Errorf("%s Init(): Reading cgroup files requires root privileges (or enable use_sudo in config)", m.name)
return fmt.Errorf("not root")
} }
} }
m.allCPUs, err = GetAllCPUs() m.allCPUs, err = GetAllCPUs()
if err != nil { if err != nil {
cclog.ComponentError(m.name, "Error reading online CPUs:", err.Error()) return fmt.Errorf("%s Init(): Error reading online CPUs: %w", m.name, err)
return err
} }
m.init = true m.init = true
@@ -158,7 +163,9 @@ func (m *SlurmCgroupCollector) ReadJobData(jobdir string) (SlurmJobData, error)
CpuSet: []int{}, CpuSet: []int{},
} }
cg := func(f string) string { return filepath.Join(m.cgroupBase, jobdir, f) } cg := func(f string) string {
return filepath.Join(m.cgroupBase, jobdir, f)
}
memUsage, err := m.readFile(cg("memory.current")) memUsage, err := m.readFile(cg("memory.current"))
if err == nil { if err == nil {
@@ -207,8 +214,8 @@ func (m *SlurmCgroupCollector) ReadJobData(jobdir string) (SlurmJobData, error)
} }
} }
if usageUsec > 0 { if usageUsec > 0 {
jobdata.CpuUsageUser = (userUsec * 100 / usageUsec) jobdata.CpuUsageUser = (userUsec * 100.0 / usageUsec)
jobdata.CpuUsageSys = (systemUsec * 100 / usageUsec) jobdata.CpuUsageSys = (systemUsec * 100.0 / usageUsec)
} }
} }
@@ -251,12 +258,19 @@ func (m *SlurmCgroupCollector) Read(interval time.Duration, output chan lp.CCMes
for _, cpu := range jobdata.CpuSet { for _, cpu := range jobdata.CpuSet {
coreTags := map[string]string{ coreTags := map[string]string{
"type": "hwthread", "type": "hwthread",
"type-id": fmt.Sprintf("%d", cpu), "type-id": strconv.Itoa(cpu),
} }
if coreCount > 0 && !m.isExcluded("job_mem_used") { if coreCount > 0 && !m.isExcluded("job_mem_used") {
memPerCore := jobdata.MemoryUsage / coreCount memPerCore := jobdata.MemoryUsage / coreCount
if y, err := lp.NewMessage("job_mem_used", coreTags, m.meta, map[string]interface{}{"value": memPerCore}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_mem_used",
coreTags,
m.meta,
map[string]any{
"value": memPerCore,
},
timestamp); err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
@@ -264,7 +278,14 @@ func (m *SlurmCgroupCollector) Read(interval time.Duration, output chan lp.CCMes
if coreCount > 0 && !m.isExcluded("job_max_mem_used") { if coreCount > 0 && !m.isExcluded("job_max_mem_used") {
maxMemPerCore := jobdata.MaxMemoryUsage / coreCount maxMemPerCore := jobdata.MaxMemoryUsage / coreCount
if y, err := lp.NewMessage("job_max_mem_used", coreTags, m.meta, map[string]interface{}{"value": maxMemPerCore}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_max_mem_used",
coreTags,
m.meta,
map[string]any{
"value": maxMemPerCore,
},
timestamp); err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
@@ -272,7 +293,14 @@ func (m *SlurmCgroupCollector) Read(interval time.Duration, output chan lp.CCMes
if coreCount > 0 && !m.isExcluded("job_mem_limit") { if coreCount > 0 && !m.isExcluded("job_mem_limit") {
limitPerCore := jobdata.LimitMemoryUsage / coreCount limitPerCore := jobdata.LimitMemoryUsage / coreCount
if y, err := lp.NewMessage("job_mem_limit", coreTags, m.meta, map[string]interface{}{"value": limitPerCore}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_mem_limit",
coreTags,
m.meta,
map[string]any{
"value": limitPerCore,
},
timestamp); err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
@@ -280,7 +308,14 @@ func (m *SlurmCgroupCollector) Read(interval time.Duration, output chan lp.CCMes
if coreCount > 0 && !m.isExcluded("job_user_cpu") { if coreCount > 0 && !m.isExcluded("job_user_cpu") {
cpuUserPerCore := jobdata.CpuUsageUser / coreCount cpuUserPerCore := jobdata.CpuUsageUser / coreCount
if y, err := lp.NewMessage("job_user_cpu", coreTags, m.meta, map[string]interface{}{"value": cpuUserPerCore}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_user_cpu",
coreTags,
m.meta,
map[string]any{
"value": cpuUserPerCore,
},
timestamp); err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
} }
@@ -288,7 +323,14 @@ func (m *SlurmCgroupCollector) Read(interval time.Duration, output chan lp.CCMes
if coreCount > 0 && !m.isExcluded("job_sys_cpu") { if coreCount > 0 && !m.isExcluded("job_sys_cpu") {
cpuSysPerCore := jobdata.CpuUsageSys / coreCount cpuSysPerCore := jobdata.CpuUsageSys / coreCount
if y, err := lp.NewMessage("job_sys_cpu", coreTags, m.meta, map[string]interface{}{"value": cpuSysPerCore}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_sys_cpu",
coreTags,
m.meta,
map[string]any{
"value": cpuSysPerCore,
},
timestamp); err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
} }
@@ -303,39 +345,60 @@ func (m *SlurmCgroupCollector) Read(interval time.Duration, output chan lp.CCMes
if !m.cpuUsed[cpu] { if !m.cpuUsed[cpu] {
coreTags := map[string]string{ coreTags := map[string]string{
"type": "hwthread", "type": "hwthread",
"type-id": fmt.Sprintf("%d", cpu), "type-id": strconv.Itoa(cpu),
} }
if !m.isExcluded("job_mem_used") { if !m.isExcluded("job_mem_used") {
if y, err := lp.NewMessage("job_mem_used", coreTags, m.meta, map[string]interface{}{"value": 0}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_mem_used",
coreTags,
m.meta,
map[string]any{
"value": 0,
},
timestamp); err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
} }
if !m.isExcluded("job_max_mem_used") { if !m.isExcluded("job_max_mem_used") {
if y, err := lp.NewMessage("job_max_mem_used", coreTags, m.meta, map[string]interface{}{"value": 0}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_max_mem_used",
coreTags,
m.meta,
map[string]any{
"value": 0,
},
timestamp); err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
} }
if !m.isExcluded("job_mem_limit") { if !m.isExcluded("job_mem_limit") {
if y, err := lp.NewMessage("job_mem_limit", coreTags, m.meta, map[string]interface{}{"value": 0}, timestamp); err == nil { if y, err := lp.NewMessage(
"job_mem_limit",
coreTags,
m.meta,
map[string]any{
"value": 0,
},
timestamp); err == nil {
y.AddMeta("unit", "Bytes") y.AddMeta("unit", "Bytes")
output <- y output <- y
} }
} }
if !m.isExcluded("job_user_cpu") { if !m.isExcluded("job_user_cpu") {
if y, err := lp.NewMessage("job_user_cpu", coreTags, m.meta, map[string]interface{}{"value": 0}, timestamp); err == nil { if y, err := lp.NewMessage("job_user_cpu", coreTags, m.meta, map[string]any{"value": 0}, timestamp); err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
} }
} }
if !m.isExcluded("job_sys_cpu") { if !m.isExcluded("job_sys_cpu") {
if y, err := lp.NewMessage("job_sys_cpu", coreTags, m.meta, map[string]interface{}{"value": 0}, timestamp); err == nil { if y, err := lp.NewMessage("job_sys_cpu", coreTags, m.meta, map[string]any{"value": 0}, timestamp); err == nil {
y.AddMeta("unit", "%") y.AddMeta("unit", "%")
output <- y output <- y
} }


@@ -0,0 +1,360 @@
package collectors
import (
"bytes"
"encoding/json"
"fmt"
"os/exec"
"slices"
"time"
cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
)
type SmartMonCollectorConfig struct {
UseSudo bool `json:"use_sudo,omitempty"`
ExcludeDevices []string `json:"exclude_devices,omitempty"`
ExcludeMetrics []string `json:"excludeMetrics,omitempty"`
Devices []struct {
Name string `json:"name"`
Type string `json:"type"`
} `json:"devices,omitempty"`
}
type deviceT struct {
Name string `json:"name"`
Type string `json:"type"`
queryCommand []string
}
type SmartMonCollector struct {
metricCollector
config SmartMonCollectorConfig // the configuration structure
meta map[string]string // default meta information
tags map[string]string // default tags
devices []deviceT // smartmon devices
sudoCmd string // Full path to 'sudo' command
smartCtlCmd string // Full path to 'smartctl' command
excludeMetric struct {
temp,
percentUsed,
availSpare,
dataUnitsRead,
dataUnitsWrite,
hostReads,
hostWrites,
powerCycles,
powerOn,
UnsafeShutdowns,
mediaErrors,
errlogEntries,
warnTempTime,
critCompTime bool
}
}
func (m *SmartMonCollector) getSmartmonDevices() error {
// Use configured devices
if len(m.config.Devices) > 0 {
for _, configDevice := range m.config.Devices {
if !slices.Contains(m.config.ExcludeDevices, configDevice.Name) {
d := deviceT{
Name: configDevice.Name,
Type: configDevice.Type,
}
if m.config.UseSudo {
d.queryCommand = append(d.queryCommand, m.sudoCmd)
}
d.queryCommand = append(d.queryCommand, m.smartCtlCmd, "--json=c", "--device="+d.Type, "--all", d.Name)
m.devices = append(m.devices, d)
}
}
return nil
}
// Use scan command
var scanCmd []string
if m.config.UseSudo {
scanCmd = append(scanCmd, m.sudoCmd)
}
scanCmd = append(scanCmd, m.smartCtlCmd, "--scan", "--json=c")
command := exec.Command(scanCmd[0], scanCmd[1:]...)
stdout, err := command.Output()
if err != nil {
return fmt.Errorf(
"%s getSmartmonDevices(): Failed to execute device scan command %s: %w",
m.name, command.String(), err)
}
var scanOutput struct {
Devices []deviceT `json:"devices"`
}
err = json.Unmarshal(stdout, &scanOutput)
if err != nil {
return fmt.Errorf("%s getSmartmonDevices(): Failed to parse JSON output from device scan command: %w",
m.name, err)
}
m.devices = make([]deviceT, 0)
for _, d := range scanOutput.Devices {
if !slices.Contains(m.config.ExcludeDevices, d.Name) {
if m.config.UseSudo {
d.queryCommand = append(d.queryCommand, m.sudoCmd)
}
d.queryCommand = append(d.queryCommand, m.smartCtlCmd, "--json=c", "--device="+d.Type, "--all", d.Name)
m.devices = append(m.devices, d)
}
}
return nil
}
func (m *SmartMonCollector) Init(config json.RawMessage) error {
m.name = "SmartMonCollector"
if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
m.parallel = true
m.meta = map[string]string{
"source": m.name,
"group": "Disk",
}
m.tags = map[string]string{
"type": "node",
"stype": "disk",
}
// Read in the JSON configuration
if len(config) > 0 {
d := json.NewDecoder(bytes.NewReader(config))
d.DisallowUnknownFields()
if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error reading config: %w", m.name, err)
}
}
for _, excludeMetric := range m.config.ExcludeMetrics {
switch excludeMetric {
case "smartmon_temp":
m.excludeMetric.temp = true
case "smartmon_percent_used":
m.excludeMetric.percentUsed = true
case "smartmon_avail_spare":
m.excludeMetric.availSpare = true
case "smartmon_data_units_read":
m.excludeMetric.dataUnitsRead = true
case "smartmon_data_units_write":
m.excludeMetric.dataUnitsWrite = true
case "smartmon_host_reads":
m.excludeMetric.hostReads = true
case "smartmon_host_writes":
m.excludeMetric.hostWrites = true
case "smartmon_power_cycles":
m.excludeMetric.powerCycles = true
case "smartmon_power_on":
m.excludeMetric.powerOn = true
case "smartmon_unsafe_shutdowns":
m.excludeMetric.UnsafeShutdowns = true
case "smartmon_media_errors":
m.excludeMetric.mediaErrors = true
case "smartmon_errlog_entries":
m.excludeMetric.errlogEntries = true
case "smartmon_warn_temp_time":
m.excludeMetric.warnTempTime = true
case "smartmon_crit_comp_time":
m.excludeMetric.critCompTime = true
default:
return fmt.Errorf("%s Init(): Unknown excluded metric: %s", m.name, excludeMetric)
}
}
// Check if sudo and smartctl are in search path
if m.config.UseSudo {
p, err := exec.LookPath("sudo")
if err != nil {
return fmt.Errorf("%s Init(): No sudo command found in search path: %w", m.name, err)
}
m.sudoCmd = p
}
p, err := exec.LookPath("smartctl")
if err != nil {
return fmt.Errorf("%s Init(): No smartctl command found in search path: %w", m.name, err)
}
m.smartCtlCmd = p
if err = m.getSmartmonDevices(); err != nil {
return err
}
m.init = true
return err
}
type SmartMonData struct {
SerialNumber string `json:"serial_number"`
UserCapacity struct {
Blocks int `json:"blocks"`
Bytes int `json:"bytes"`
} `json:"user_capacity"`
HealthLog struct {
// Available SMART health information:
// sudo smartctl -a --json=c /dev/nvme0 | jq --color-output | less --RAW-CONTROL-CHARS
Temperature int `json:"temperature"`
PercentageUsed int `json:"percentage_used"`
AvailableSpare int `json:"available_spare"`
DataUnitsRead int `json:"data_units_read"`
DataUnitsWrite int `json:"data_units_written"`
HostReads int `json:"host_reads"`
HostWrites int `json:"host_writes"`
PowerCycles int `json:"power_cycles"`
PowerOnHours int `json:"power_on_hours"`
UnsafeShutdowns int `json:"unsafe_shutdowns"`
MediaErrors int `json:"media_errors"`
NumErrorLogEntries int `json:"num_err_log_entries"`
WarnTempTime int `json:"warning_temp_time"`
CriticalCompTime int `json:"critical_comp_time"`
} `json:"nvme_smart_health_information_log"`
}
func (m *SmartMonCollector) Read(interval time.Duration, output chan lp.CCMessage) {
timestamp := time.Now()
for _, d := range m.devices {
var data SmartMonData
command := exec.Command(d.queryCommand[0], d.queryCommand[1:]...)
stdout, err := command.Output()
if err != nil {
cclog.ComponentError(m.name, "cannot read data for device", d.Name, ":", err.Error())
continue
}
err = json.Unmarshal(stdout, &data)
if err != nil {
cclog.ComponentError(m.name, "cannot unmarshal data for device", d.Name, ":", err.Error())
continue
}
if !m.excludeMetric.temp {
y, err := lp.NewMetric(
"smartmon_temp", m.tags, m.meta, data.HealthLog.Temperature, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
y.AddMeta("unit", "degC")
output <- y
}
}
if !m.excludeMetric.percentUsed {
y, err := lp.NewMetric(
"smartmon_percent_used", m.tags, m.meta, data.HealthLog.PercentageUsed, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
y.AddMeta("unit", "percent")
output <- y
}
}
if !m.excludeMetric.availSpare {
y, err := lp.NewMetric(
"smartmon_avail_spare", m.tags, m.meta, data.HealthLog.AvailableSpare, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
y.AddMeta("unit", "percent")
output <- y
}
}
if !m.excludeMetric.dataUnitsRead {
y, err := lp.NewMetric(
"smartmon_data_units_read", m.tags, m.meta, data.HealthLog.DataUnitsRead, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.dataUnitsWrite {
y, err := lp.NewMetric(
"smartmon_data_units_write", m.tags, m.meta, data.HealthLog.DataUnitsWrite, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.hostReads {
y, err := lp.NewMetric(
"smartmon_host_reads", m.tags, m.meta, data.HealthLog.HostReads, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.hostWrites {
y, err := lp.NewMetric(
"smartmon_host_writes", m.tags, m.meta, data.HealthLog.HostWrites, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.powerCycles {
y, err := lp.NewMetric(
"smartmon_power_cycles", m.tags, m.meta, data.HealthLog.PowerCycles, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.powerOn {
y, err := lp.NewMetric(
"smartmon_power_on", m.tags, m.meta, int64(data.HealthLog.PowerOnHours)*3600, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
y.AddMeta("unit", "sec")
output <- y
}
}
if !m.excludeMetric.UnsafeShutdowns {
y, err := lp.NewMetric(
"smartmon_unsafe_shutdowns", m.tags, m.meta, data.HealthLog.UnsafeShutdowns, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.mediaErrors {
y, err := lp.NewMetric(
"smartmon_media_errors", m.tags, m.meta, data.HealthLog.MediaErrors, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.errlogEntries {
y, err := lp.NewMetric(
"smartmon_errlog_entries", m.tags, m.meta, data.HealthLog.NumErrorLogEntries, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.warnTempTime {
y, err := lp.NewMetric(
"smartmon_warn_temp_time", m.tags, m.meta, data.HealthLog.WarnTempTime, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
if !m.excludeMetric.critCompTime {
y, err := lp.NewMetric(
"smartmon_crit_comp_time", m.tags, m.meta, data.HealthLog.CriticalCompTime, timestamp)
if err == nil {
y.AddTag("stype-id", d.Name)
output <- y
}
}
}
}
func (m *SmartMonCollector) Close() {
m.init = false
}


@@ -0,0 +1,52 @@
<!--
---
title: smartmon metric collector
description: Collect S.M.A.R.T data from NVMEs
categories: [cc-metric-collector]
tags: ['Admin']
weight: 2
hugo_path: docs/reference/cc-metric-collector/collectors/smartmonMetric.md
---
-->
## `smartmon` collector
```json
"smartmon": {
"use_sudo": true,
"exclude_devices": [
"/dev/sda"
],
"excludeMetrics": [
"smartmon_warn_temp_time",
"smartmon_crit_comp_time"
],
"devices": [
{
"name": "/dev/nvme0",
"type": "nvme"
}
]
}
```
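The `smartmon` collector retrieves S.M.A.R.T data from NVMe devices via the `smartctl` command.
Devices are either detected automatically by a device scan (`smartctl --scan`) or listed manually with the `devices` config option. Devices named in `exclude_devices` are skipped, and metrics named in `excludeMetrics` are not reported. Set `use_sudo` to run `smartctl` through `sudo`.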
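The collector decodes its configuration strictly, so an unknown or misspelled key is reported as an error instead of being silently ignored. The following is a minimal sketch of that behaviour, not the collector's actual code: the type `smartMonConfig` and the helper `decodeStrict` are hypothetical names used only for illustration, and the struct simply mirrors the options from the JSON example above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// smartMonConfig is a hypothetical stand-in that mirrors the documented options.
type smartMonConfig struct {
	UseSudo        bool     `json:"use_sudo,omitempty"`
	ExcludeDevices []string `json:"exclude_devices,omitempty"`
	ExcludeMetrics []string `json:"excludeMetrics,omitempty"`
	Devices        []struct {
		Name string `json:"name"`
		Type string `json:"type"`
	} `json:"devices,omitempty"`
}

// decodeStrict (hypothetical helper) rejects keys the struct does not define.
func decodeStrict(raw []byte) (smartMonConfig, error) {
	var cfg smartMonConfig
	d := json.NewDecoder(bytes.NewReader(raw))
	d.DisallowUnknownFields()
	if err := d.Decode(&cfg); err != nil {
		return cfg, fmt.Errorf("error reading config: %w", err)
	}
	return cfg, nil
}

func main() {
	// A manual device list; with this set, no device scan would be needed.
	cfg, err := decodeStrict([]byte(
		`{"use_sudo": true, "devices": [{"name": "/dev/nvme0", "type": "nvme"}]}`))
	fmt.Println(len(cfg.Devices), err)

	// "exclude_metrics" is not a known key here (the option is "excludeMetrics"),
	// so strict decoding returns an error.
	_, err = decodeStrict([]byte(`{"exclude_metrics": ["smartmon_temp"]}`))
	fmt.Println(err != nil)
}
```

Note the differing key styles: `exclude_devices` uses snake case while `excludeMetrics` uses camel case, so a typo in either one is caught at startup rather than producing an unexpected metric set.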
Metrics:
* `smartmon_temp`: Temperature of the device (`unit=degC`)
* `smartmon_avail_spare`: Remaining spare capacity (`unit=percent`)
* `smartmon_percent_used`: Estimated percentage of the device's life used (`unit=percent`)
* `smartmon_data_units_read`: Read data units
* `smartmon_data_units_write`: Written data units
* `smartmon_host_reads`: Read operations
* `smartmon_host_writes`: Write operations
* `smartmon_power_cycles`: Number of power cycles
* `smartmon_power_on`: Time the device has been powered on, converted from hours to seconds (`unit=sec`)
* `smartmon_unsafe_shutdowns`: Count of unsafe shutdowns
* `smartmon_media_errors`: Media errors of the device
* `smartmon_errlog_entries`: Error log entries
* `smartmon_warn_temp_time`: Time above the warning temperature threshold
* `smartmon_crit_comp_time`: Time above the critical composite temperature threshold
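The metric values come from the `nvme_smart_health_information_log` object in the JSON output of `smartctl`. As a rough sketch of how one such value is derived, the example below parses a hand-written sample document and converts `power_on_hours` to seconds, as `smartmon_power_on` does. The sample JSON, the type names, and the helper `powerOnSeconds` are assumptions made for illustration; real `smartctl` output contains many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthLog holds a small subset of the NVMe health log fields.
type healthLog struct {
	Temperature  int `json:"temperature"`
	PowerOnHours int `json:"power_on_hours"`
	MediaErrors  int `json:"media_errors"`
}

// smartOutput wraps the health log the way the smartctl JSON nests it.
type smartOutput struct {
	HealthLog healthLog `json:"nvme_smart_health_information_log"`
}

// powerOnSeconds (hypothetical helper) parses the output and converts
// the reported power-on hours to seconds.
func powerOnSeconds(raw []byte) (int64, error) {
	var out smartOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return 0, fmt.Errorf("cannot parse smartctl output: %w", err)
	}
	return int64(out.HealthLog.PowerOnHours) * 3600, nil
}

func main() {
	// Hand-written sample, not captured from a real device.
	sample := []byte(`{"nvme_smart_health_information_log":
		{"temperature": 35, "power_on_hours": 2, "media_errors": 0}}`)
	sec, err := powerOnSeconds(sample)
	fmt.Println(sec, err)
}
```

Widening to `int64` before multiplying keeps the conversion safe on platforms where `int` is 32 bits wide.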


@@ -8,6 +8,7 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
@@ -16,8 +17,8 @@ import (
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
// See: https://www.kernel.org/doc/html/latest/hwmon/sysfs-interface.html // See: https://www.kernel.org/doc/html/latest/hwmon/sysfs-interface.html
@@ -41,6 +42,7 @@ type TempCollectorSensor struct {
type TempCollector struct { type TempCollector struct {
metricCollector metricCollector
config struct { config struct {
ExcludeMetrics []string `json:"exclude_metrics"` ExcludeMetrics []string `json:"exclude_metrics"`
TagOverride map[string]map[string]string `json:"tag_override"` TagOverride map[string]map[string]string `json:"tag_override"`
@@ -58,11 +60,14 @@ func (m *TempCollector) Init(config json.RawMessage) error {
m.name = "TempCollector" m.name = "TempCollector"
m.parallel = true m.parallel = true
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
if len(config) > 0 { if len(config) > 0 {
err := json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} }
@@ -78,10 +83,10 @@ func (m *TempCollector) Init(config json.RawMessage) error {
globPattern := filepath.Join("/sys/class/hwmon", "*", "temp*_input") globPattern := filepath.Join("/sys/class/hwmon", "*", "temp*_input")
inputFiles, err := filepath.Glob(globPattern) inputFiles, err := filepath.Glob(globPattern)
if err != nil { if err != nil {
return fmt.Errorf("unable to glob files with pattern '%s': %v", globPattern, err) return fmt.Errorf("%s Init(): unable to glob files with pattern '%s': %w", m.name, globPattern, err)
} }
if inputFiles == nil { if inputFiles == nil {
return fmt.Errorf("unable to find any files with pattern '%s'", globPattern) return fmt.Errorf("%s Init(): unable to find any files with pattern '%s'", m.name, globPattern)
} }
// Get sensor name for each temperature sensor file // Get sensor name for each temperature sensor file
@@ -117,7 +122,7 @@ func (m *TempCollector) Init(config json.RawMessage) error {
sensor.metricName = sensor.label sensor.metricName = sensor.label
} }
sensor.metricName = strings.ToLower(sensor.metricName) sensor.metricName = strings.ToLower(sensor.metricName)
sensor.metricName = strings.Replace(sensor.metricName, " ", "_", -1) sensor.metricName = strings.ReplaceAll(sensor.metricName, " ", "_")
// Add temperature prefix, if required // Add temperature prefix, if required
if !strings.Contains(sensor.metricName, "temp") { if !strings.Contains(sensor.metricName, "temp") {
sensor.metricName = "temp_" + sensor.metricName sensor.metricName = "temp_" + sensor.metricName
@@ -170,7 +175,7 @@ func (m *TempCollector) Init(config json.RawMessage) error {
// Empty sensors map // Empty sensors map
if len(m.sensors) == 0 { if len(m.sensors) == 0 {
return fmt.Errorf("no temperature sensors found") return fmt.Errorf("%s Init(): no temperature sensors found", m.name)
} }
// Finished initialization // Finished initialization
@@ -179,7 +184,6 @@ func (m *TempCollector) Init(config json.RawMessage) error {
} }
func (m *TempCollector) Read(interval time.Duration, output chan lp.CCMessage) { func (m *TempCollector) Read(interval time.Duration, output chan lp.CCMessage) {
for _, sensor := range m.sensors { for _, sensor := range m.sensors {
// Read sensor file // Read sensor file
buffer, err := os.ReadFile(sensor.file) buffer, err := os.ReadFile(sensor.file)
@@ -201,7 +205,7 @@ func (m *TempCollector) Read(interval time.Duration, output chan lp.CCMessage) {
sensor.metricName, sensor.metricName,
sensor.tags, sensor.tags,
m.meta, m.meta,
map[string]interface{}{"value": x}, map[string]any{"value": x},
time.Now(), time.Now(),
) )
if err == nil { if err == nil {
@@ -214,7 +218,7 @@ func (m *TempCollector) Read(interval time.Duration, output chan lp.CCMessage) {
sensor.maxTempName, sensor.maxTempName,
sensor.tags, sensor.tags,
m.meta, m.meta,
map[string]interface{}{"value": sensor.maxTemp}, map[string]any{"value": sensor.maxTemp},
time.Now(), time.Now(),
) )
if err == nil { if err == nil {
@@ -228,7 +232,7 @@ func (m *TempCollector) Read(interval time.Duration, output chan lp.CCMessage) {
sensor.critTempName, sensor.critTempName,
sensor.tags, sensor.tags,
m.meta, m.meta,
map[string]interface{}{"value": sensor.critTemp}, map[string]any{"value": sensor.critTemp},
time.Now(), time.Now(),
) )
if err == nil { if err == nil {
@@ -236,7 +240,6 @@ func (m *TempCollector) Read(interval time.Duration, output chan lp.CCMessage) {
} }
} }
} }
} }
func (m *TempCollector) Close() { func (m *TempCollector) Close() {


@@ -14,10 +14,10 @@ hugo_path: docs/reference/cc-metric-collector/collectors/temp.md
```json ```json
"tempstat": { "tempstat": {
"tag_override" : { "tag_override": {
"<device like hwmon1>" : { "<device like hwmon1>": {
"type" : "socket", "type": "socket",
"type-id" : "0" "type-id": "0"
} }
}, },
"exclude_metrics": [ "exclude_metrics": [


@@ -8,19 +8,21 @@
package collectors package collectors
import ( import (
"bytes"
"encoding/json" "encoding/json"
"errors"
"fmt" "fmt"
"log"
"os/exec" "os/exec"
"strings" "strings"
"time" "time"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
) )
const MAX_NUM_PROCS = 10 const (
const DEFAULT_NUM_PROCS = 2 MAX_NUM_PROCS = 10
DEFAULT_NUM_PROCS = 2
)
type TopProcsCollectorConfig struct { type TopProcsCollectorConfig struct {
Num_procs int `json:"num_procs"` Num_procs int `json:"num_procs"`
@@ -28,6 +30,7 @@ type TopProcsCollectorConfig struct {
type TopProcsCollector struct { type TopProcsCollector struct {
metricCollector metricCollector
tags map[string]string tags map[string]string
config TopProcsCollectorConfig config TopProcsCollectorConfig
} }
@@ -36,12 +39,18 @@ func (m *TopProcsCollector) Init(config json.RawMessage) error {
var err error var err error
m.name = "TopProcsCollector" m.name = "TopProcsCollector"
m.parallel = true m.parallel = true
m.tags = map[string]string{"type": "node"} m.tags = map[string]string{
m.meta = map[string]string{"source": m.name, "group": "TopProcs"} "type": "node",
}
m.meta = map[string]string{
"source": m.name,
"group": "TopProcs",
}
if len(config) > 0 { if len(config) > 0 {
err = json.Unmarshal(config, &m.config) d := json.NewDecoder(bytes.NewReader(config))
if err != nil { d.DisallowUnknownFields()
return err if err := d.Decode(&m.config); err != nil {
return fmt.Errorf("%s Init(): Error decoding JSON config: %w", m.name, err)
} }
} else { } else {
m.config.Num_procs = int(DEFAULT_NUM_PROCS) m.config.Num_procs = int(DEFAULT_NUM_PROCS)
@@ -49,12 +58,13 @@ func (m *TopProcsCollector) Init(config json.RawMessage) error {
if m.config.Num_procs <= 0 || m.config.Num_procs > MAX_NUM_PROCS { if m.config.Num_procs <= 0 || m.config.Num_procs > MAX_NUM_PROCS {
return fmt.Errorf("num_procs option must be set in 'topprocs' config (range: 1-%d)", MAX_NUM_PROCS) return fmt.Errorf("num_procs option must be set in 'topprocs' config (range: 1-%d)", MAX_NUM_PROCS)
} }
m.setup() if err := m.setup(); err != nil {
return fmt.Errorf("%s Init(): setup() call failed: %w", m.name, err)
}
command := exec.Command("ps", "-Ao", "comm", "--sort=-pcpu") command := exec.Command("ps", "-Ao", "comm", "--sort=-pcpu")
command.Wait()
_, err = command.Output() _, err = command.Output()
if err != nil { if err != nil {
return errors.New("failed to execute command") return fmt.Errorf("%s Init(): failed to get output from command: %w", m.name, err)
} }
m.init = true m.init = true
return nil return nil
@@ -65,17 +75,25 @@ func (m *TopProcsCollector) Read(interval time.Duration, output chan lp.CCMessag
return return
} }
command := exec.Command("ps", "-Ao", "comm", "--sort=-pcpu") command := exec.Command("ps", "-Ao", "comm", "--sort=-pcpu")
command.Wait()
stdout, err := command.Output() stdout, err := command.Output()
if err != nil { if err != nil {
log.Print(m.name, err) cclog.ComponentError(
m.name,
fmt.Sprintf("Read(): Failed to read output from command \"%s\": %v", command.String(), err))
return return
} }
lines := strings.Split(string(stdout), "\n") lines := strings.Split(string(stdout), "\n")
for i := 1; i < m.config.Num_procs+1; i++ { for i := 1; i < m.config.Num_procs+1; i++ {
name := fmt.Sprintf("topproc%d", i) name := fmt.Sprintf("topproc%d", i)
y, err := lp.NewMessage(name, m.tags, m.meta, map[string]interface{}{"value": string(lines[i])}, time.Now()) y, err := lp.NewMessage(
name,
m.tags,
m.meta,
map[string]any{
"value": lines[i],
},
time.Now())
if err == nil { if err == nil {
output <- y output <- y
} }


@@ -1,6 +1,19 @@
{ {
"cpufreq": {}, "cpufreq": {},
"cpufreq_cpuinfo": {}, "cpufreq_cpuinfo": {},
"cpustat": {
"exclude_metrics": [
"cpu_idle"
]
},
"diskstat": {
"exclude_metrics": [
"disk_total"
],
"exclude_mounts": [
"slurm-tmpfs"
]
},
"gpfs": { "gpfs": {
"exclude_filesystem": [ "exclude_filesystem": [
"test_fs" "test_fs"
@@ -21,6 +34,8 @@
}, },
"numastats": {}, "numastats": {},
"nvidia": {}, "nvidia": {},
"schedstat": {},
"smartmon": {},
"tempstat": { "tempstat": {
"report_max_temperature": true, "report_max_temperature": true,
"report_critical_temperature": true, "report_critical_temperature": true,
@@ -38,4 +53,4 @@
"topprocs": { "topprocs": {
"num_procs": 5 "num_procs": 5
} }
} }


@@ -1,6 +1,6 @@
{ {
"process_messages" : { "process_messages" : {
"add_tag_if": [ "add_tags_if": [
{ {
"key" : "cluster", "key" : "cluster",
"value" : "testcluster", "value" : "testcluster",
@@ -12,7 +12,7 @@
"if" : "name == 'temp_package_id_0'" "if" : "name == 'temp_package_id_0'"
} }
], ],
"delete_tag_if": [ "delete_meta_if": [
{ {
"key" : "unit", "key" : "unit",
"if" : "true" "if" : "true"

37
go.mod

@@ -1,45 +1,44 @@
module github.com/ClusterCockpit/cc-metric-collector module github.com/ClusterCockpit/cc-metric-collector
go 1.24.0 go 1.25.0
require ( require (
github.com/ClusterCockpit/cc-lib v0.11.0 github.com/ClusterCockpit/cc-lib/v2 v2.8.2
github.com/ClusterCockpit/go-rocm-smi v0.3.0 github.com/ClusterCockpit/go-rocm-smi v0.3.0
github.com/NVIDIA/go-nvml v0.13.0-1 github.com/NVIDIA/go-nvml v0.13.0-1
github.com/PaesslerAG/gval v1.2.4 github.com/PaesslerAG/gval v1.2.4
github.com/fsnotify/fsnotify v1.9.0 github.com/fsnotify/fsnotify v1.9.0
github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf
github.com/tklauser/go-sysconf v0.3.16 github.com/tklauser/go-sysconf v0.3.16
golang.design/x/thread v0.0.0-20210122121316-335e9adffdf1 golang.design/x/thread v0.0.0-20210122121316-335e9adffdf1
golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b golang.org/x/sys v0.42.0
golang.org/x/sys v0.38.0
) )
require ( require (
github.com/ClusterCockpit/cc-line-protocol/v2 v2.4.0 // indirect
github.com/apapsch/go-jsonmerge/v2 v2.0.0 // indirect github.com/apapsch/go-jsonmerge/v2 v2.0.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/expr-lang/expr v1.17.6 // indirect github.com/expr-lang/expr v1.17.8 // indirect
github.com/google/uuid v1.6.0 // indirect github.com/google/uuid v1.6.0 // indirect
github.com/gorilla/mux v1.8.1 // indirect github.com/gorilla/mux v1.8.1 // indirect
github.com/influxdata/influxdb-client-go/v2 v2.14.0 // indirect github.com/influxdata/influxdb-client-go/v2 v2.14.0 // indirect
github.com/influxdata/line-protocol/v2 v2.2.1 // indirect github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf // indirect
github.com/klauspost/compress v1.18.2 // indirect github.com/klauspost/compress v1.18.4 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/nats-io/nats.go v1.47.0 // indirect github.com/nats-io/nats.go v1.49.0 // indirect
github.com/nats-io/nkeys v0.4.11 // indirect github.com/nats-io/nkeys v0.4.15 // indirect
github.com/nats-io/nuid v1.0.1 // indirect github.com/nats-io/nuid v1.0.1 // indirect
github.com/oapi-codegen/runtime v1.1.1 // indirect github.com/oapi-codegen/runtime v1.2.0 // indirect
github.com/prometheus/client_golang v1.23.2 // indirect github.com/prometheus/client_golang v1.23.2 // indirect
github.com/prometheus/client_model v0.6.2 // indirect github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.66.1 // indirect github.com/prometheus/common v0.67.5 // indirect
github.com/prometheus/procfs v0.16.1 // indirect github.com/prometheus/procfs v0.20.0 // indirect
github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 // indirect github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 // indirect
github.com/shopspring/decimal v1.3.1 // indirect github.com/shopspring/decimal v1.4.0 // indirect
github.com/stmcginnis/gofish v0.20.0 // indirect github.com/stmcginnis/gofish v0.21.3 // indirect
github.com/tklauser/numcpus v0.11.0 // indirect github.com/tklauser/numcpus v0.11.0 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect go.yaml.in/yaml/v2 v2.4.3 // indirect
golang.org/x/crypto v0.43.0 // indirect golang.org/x/crypto v0.48.0 // indirect
golang.org/x/net v0.45.0 // indirect golang.org/x/net v0.51.0 // indirect
google.golang.org/protobuf v1.36.8 // indirect google.golang.org/protobuf v1.36.11 // indirect
) )

96
go.sum

@@ -1,5 +1,9 @@
github.com/ClusterCockpit/cc-lib v0.11.0 h1:66YkTOxWUak7nB3r7dJEm2q+B0uPRPGj0mwXZHXpOuA= github.com/ClusterCockpit/cc-lib/v2 v2.8.0 h1:ROduRzRuusi+6kLB991AAu3Pp2AHOasQJFJc7JU/n/E=
github.com/ClusterCockpit/cc-lib v0.11.0/go.mod h1:0LKjDJs813/NMmaSJXJc11A9rxiFDPV/QdWQbZUp0XY= github.com/ClusterCockpit/cc-lib/v2 v2.8.0/go.mod h1:FwD8vnTIbBM3ngeLNKmCvp9FoSjQZm7xnuaVxEKR23o=
github.com/ClusterCockpit/cc-lib/v2 v2.8.2 h1:rCLZk8wz8yq8xBnBEdVKigvA2ngR8dPmHbEFwxxb3jw=
github.com/ClusterCockpit/cc-lib/v2 v2.8.2/go.mod h1:FwD8vnTIbBM3ngeLNKmCvp9FoSjQZm7xnuaVxEKR23o=
github.com/ClusterCockpit/cc-line-protocol/v2 v2.4.0 h1:hIzxgTBWcmCIHtoDKDkSCsKCOCOwUC34sFsbD2wcW0Q=
github.com/ClusterCockpit/cc-line-protocol/v2 v2.4.0/go.mod h1:y42qUu+YFmu5fdNuUAS4VbbIKxVjxCvbVqFdpdh8ahY=
github.com/ClusterCockpit/go-rocm-smi v0.3.0 h1:1qZnSpG7/NyLtc7AjqnUL9Jb8xtqG1nMVgp69rJfaR8= github.com/ClusterCockpit/go-rocm-smi v0.3.0 h1:1qZnSpG7/NyLtc7AjqnUL9Jb8xtqG1nMVgp69rJfaR8=
github.com/ClusterCockpit/go-rocm-smi v0.3.0/go.mod h1:+I3UMeX3OlizXDf1WpGD43W4KGZZGVSGmny6rTeOnWA= github.com/ClusterCockpit/go-rocm-smi v0.3.0/go.mod h1:+I3UMeX3OlizXDf1WpGD43W4KGZZGVSGmny6rTeOnWA=
github.com/NVIDIA/go-nvml v0.11.6-0/go.mod h1:hy7HYeQy335x6nEss0Ne3PYqleRa6Ct+VKD9RQ4nyFs= github.com/NVIDIA/go-nvml v0.11.6-0/go.mod h1:hy7HYeQy335x6nEss0Ne3PYqleRa6Ct+VKD9RQ4nyFs=
@@ -10,8 +14,8 @@ github.com/PaesslerAG/gval v1.2.4/go.mod h1:XRFLwvmkTEdYziLdaCeCa5ImcGVrfQbeNUbV
github.com/PaesslerAG/jsonpath v0.1.0 h1:gADYeifvlqK3R3i2cR5B4DGgxLXIPb3TRTH1mGi0jPI= github.com/PaesslerAG/jsonpath v0.1.0 h1:gADYeifvlqK3R3i2cR5B4DGgxLXIPb3TRTH1mGi0jPI=
github.com/PaesslerAG/jsonpath v0.1.0/go.mod h1:4BzmtoM/PI8fPO4aQGIusjGxGir2BzcV0grWtFzq1Y8= github.com/PaesslerAG/jsonpath v0.1.0/go.mod h1:4BzmtoM/PI8fPO4aQGIusjGxGir2BzcV0grWtFzq1Y8=
github.com/RaveNoX/go-jsoncommentstrip v1.0.0/go.mod h1:78ihd09MekBnJnxpICcwzCMzGrKSKYe4AqU6PDYYpjk= github.com/RaveNoX/go-jsoncommentstrip v1.0.0/go.mod h1:78ihd09MekBnJnxpICcwzCMzGrKSKYe4AqU6PDYYpjk=
github.com/antithesishq/antithesis-sdk-go v0.4.3-default-no-op h1:+OSa/t11TFhqfrX0EOSqQBDJ0YlpmK0rDSiB19dg9M0= github.com/antithesishq/antithesis-sdk-go v0.5.0-default-no-op h1:Ucf+QxEKMbPogRO5guBNe5cgd9uZgfoJLOYs8WWhtjM=
github.com/antithesishq/antithesis-sdk-go v0.4.3-default-no-op/go.mod h1:IUpT2DPAKh6i/YhSbt6Gl3v2yvUZjmKncl7U91fup7E= github.com/antithesishq/antithesis-sdk-go v0.5.0-default-no-op/go.mod h1:IUpT2DPAKh6i/YhSbt6Gl3v2yvUZjmKncl7U91fup7E=
github.com/apapsch/go-jsonmerge/v2 v2.0.0 h1:axGnT1gRIfimI7gJifB699GoE/oq+F2MU7Dml6nw9rQ= github.com/apapsch/go-jsonmerge/v2 v2.0.0 h1:axGnT1gRIfimI7gJifB699GoE/oq+F2MU7Dml6nw9rQ=
github.com/apapsch/go-jsonmerge/v2 v2.0.0/go.mod h1:lvDnEdqiQrp0O42VQGgmlKpxL1AP2+08jFMw88y4klk= github.com/apapsch/go-jsonmerge/v2 v2.0.0/go.mod h1:lvDnEdqiQrp0O42VQGgmlKpxL1AP2+08jFMw88y4klk=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM= github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
@@ -19,24 +23,19 @@ github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6r
github.com/bmatcuk/doublestar v1.1.1/go.mod h1:UD6OnuiIn0yFxxA2le/rnRU1G4RaI4UvFv1sNto9p6w= github.com/bmatcuk/doublestar v1.1.1/go.mod h1:UD6OnuiIn0yFxxA2le/rnRU1G4RaI4UvFv1sNto9p6w=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/expr-lang/expr v1.17.6 h1:1h6i8ONk9cexhDmowO/A64VPxHScu7qfSl2k8OlINec= github.com/expr-lang/expr v1.17.8 h1:W1loDTT+0PQf5YteHSTpju2qfUfNoBt4yw9+wOEU9VM=
github.com/expr-lang/expr v1.17.6/go.mod h1:8/vRC7+7HBzESEqt5kKpYXxrxkr31SaO8r40VO/1IT4= github.com/expr-lang/expr v1.17.8/go.mod h1:8/vRC7+7HBzESEqt5kKpYXxrxkr31SaO8r40VO/1IT4=
github.com/frankban/quicktest v1.11.0/go.mod h1:K+q6oSqb0W0Ininfk863uOk1lMy69l/P6txr3mVT54s=
github.com/frankban/quicktest v1.11.2/go.mod h1:K+q6oSqb0W0Ininfk863uOk1lMy69l/P6txr3mVT54s=
github.com/frankban/quicktest v1.13.0 h1:yNZif1OkDfNoDfb9zZa9aXIpejNR4F23Wely0c+Qdqk= github.com/frankban/quicktest v1.13.0 h1:yNZif1OkDfNoDfb9zZa9aXIpejNR4F23Wely0c+Qdqk=
github.com/frankban/quicktest v1.13.0/go.mod h1:qLE0fzW0VuyUAJgPU19zByoIr0HtCHN/r/VLSOOIySU= github.com/frankban/quicktest v1.13.0/go.mod h1:qLE0fzW0VuyUAJgPU19zByoIr0HtCHN/r/VLSOOIySU=
github.com/fsnotify/fsnotify v1.9.0 h1:2Ml+OJNzbYCTzsxtv8vKSFD9PbJjmhYF14k/jKC7S9k= github.com/fsnotify/fsnotify v1.9.0 h1:2Ml+OJNzbYCTzsxtv8vKSFD9PbJjmhYF14k/jKC7S9k=
github.com/fsnotify/fsnotify v1.9.0/go.mod h1:8jBTzvmWwFyi3Pb8djgCCO5IBqzKJ/Jwo8TRcHyHii0= github.com/fsnotify/fsnotify v1.9.0/go.mod h1:8jBTzvmWwFyi3Pb8djgCCO5IBqzKJ/Jwo8TRcHyHii0=
github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8= github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU= github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/go-tpm v0.9.6 h1:Ku42PT4LmjDu1H5C5ISWLlpI1mj+Zq7sPGKoRw2XROA= github.com/google/go-tpm v0.9.7 h1:u89J4tUUeDTlH8xxC3CTW7OHZjbjKoHdQ9W7gCUhtxA=
github.com/google/go-tpm v0.9.6/go.mod h1:h9jEsEECg7gtLis0upRBQU+GhYVH6jMjrFxI8u6bVUY= github.com/google/go-tpm v0.9.7/go.mod h1:h9jEsEECg7gtLis0upRBQU+GhYVH6jMjrFxI8u6bVUY=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0= github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/gorilla/mux v1.8.1 h1:TuBL49tXwgrFYWhqrNgrUNEY92u81SPhu7sTdzQEiWY= github.com/gorilla/mux v1.8.1 h1:TuBL49tXwgrFYWhqrNgrUNEY92u81SPhu7sTdzQEiWY=
@@ -45,23 +44,13 @@ github.com/influxdata/influxdb-client-go/v2 v2.14.0 h1:AjbBfJuq+QoaXNcrova8smSjw
github.com/influxdata/influxdb-client-go/v2 v2.14.0/go.mod h1:Ahpm3QXKMJslpXl3IftVLVezreAUtBOTZssDrjZEFHI= github.com/influxdata/influxdb-client-go/v2 v2.14.0/go.mod h1:Ahpm3QXKMJslpXl3IftVLVezreAUtBOTZssDrjZEFHI=
github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf h1:7JTmneyiNEwVBOHSjoMxiWAqB992atOeepeFYegn5RU= github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf h1:7JTmneyiNEwVBOHSjoMxiWAqB992atOeepeFYegn5RU=
github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf/go.mod h1:xaLFMmpvUxqXtVkUJfg9QmT88cDaCJ3ZKgdZ78oO8Qo= github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf/go.mod h1:xaLFMmpvUxqXtVkUJfg9QmT88cDaCJ3ZKgdZ78oO8Qo=
github.com/influxdata/line-protocol-corpus v0.0.0-20210519164801-ca6fa5da0184/go.mod h1:03nmhxzZ7Xk2pdG+lmMd7mHDfeVOYFyhOgwO61qWU98=
github.com/influxdata/line-protocol-corpus v0.0.0-20210922080147-aa28ccfb8937 h1:MHJNQ+p99hFATQm6ORoLmpUCF7ovjwEFshs/NHzAbig= github.com/influxdata/line-protocol-corpus v0.0.0-20210922080147-aa28ccfb8937 h1:MHJNQ+p99hFATQm6ORoLmpUCF7ovjwEFshs/NHzAbig=
github.com/influxdata/line-protocol-corpus v0.0.0-20210922080147-aa28ccfb8937/go.mod h1:BKR9c0uHSmRgM/se9JhFHtTT7JTO67X23MtKMHtZcpo= github.com/influxdata/line-protocol-corpus v0.0.0-20210922080147-aa28ccfb8937/go.mod h1:BKR9c0uHSmRgM/se9JhFHtTT7JTO67X23MtKMHtZcpo=
github.com/influxdata/line-protocol/v2 v2.0.0-20210312151457-c52fdecb625a/go.mod h1:6+9Xt5Sq1rWx+glMgxhcg2c0DUaehK+5TDcPZ76GypY=
github.com/influxdata/line-protocol/v2 v2.1.0/go.mod h1:QKw43hdUBg3GTk2iC3iyCxksNj7PX9aUSeYOYE/ceHY=
github.com/influxdata/line-protocol/v2 v2.2.1 h1:EAPkqJ9Km4uAxtMRgUubJyqAr6zgWM0dznKMLRauQRE=
github.com/influxdata/line-protocol/v2 v2.2.1/go.mod h1:DmB3Cnh+3oxmG6LOBIxce4oaL4CPj3OmMPgvauXh+tM=
github.com/juju/gnuflag v0.0.0-20171113085948-2ce1bb71843d/go.mod h1:2PavIy+JPciBPrBUjwbNvtwB6RQlve+hkpll6QSNmOE= github.com/juju/gnuflag v0.0.0-20171113085948-2ce1bb71843d/go.mod h1:2PavIy+JPciBPrBUjwbNvtwB6RQlve+hkpll6QSNmOE=
github.com/klauspost/compress v1.18.1 h1:bcSGx7UbpBqMChDtsF28Lw6v/G94LPrrbMbdC3JH2co= github.com/klauspost/compress v1.18.4 h1:RPhnKRAQ4Fh8zU2FY/6ZFDwTVTxgJ/EMydqSTzE9a2c=
github.com/klauspost/compress v1.18.1/go.mod h1:ZQFFVG+MdnR0P+l6wpXgIL4NTtwiKIdBnrBd8Nrxr+0= github.com/klauspost/compress v1.18.4/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
github.com/klauspost/compress v1.18.2 h1:iiPHWW0YrcFgpBYhsA6D1+fqHssJscY/Tm/y2Uqnapk=
github.com/klauspost/compress v1.18.2/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc= github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
@@ -72,36 +61,36 @@ github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ= github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/nats-io/jwt/v2 v2.8.0 h1:K7uzyz50+yGZDO5o772eRE7atlcSEENpL7P+b74JV1g= github.com/nats-io/jwt/v2 v2.8.0 h1:K7uzyz50+yGZDO5o772eRE7atlcSEENpL7P+b74JV1g=
github.com/nats-io/jwt/v2 v2.8.0/go.mod h1:me11pOkwObtcBNR8AiMrUbtVOUGkqYjMQZ6jnSdVUIA= github.com/nats-io/jwt/v2 v2.8.0/go.mod h1:me11pOkwObtcBNR8AiMrUbtVOUGkqYjMQZ6jnSdVUIA=
github.com/nats-io/nats-server/v2 v2.12.2 h1:4TEQd0Y4zvcW0IsVxjlXnRso1hBkQl3TS0BI+SxgPhE= github.com/nats-io/nats-server/v2 v2.12.3 h1:KRv+1n7lddMVgkJPQer+pt36TcO0ENxjilBmeWdjcHs=
github.com/nats-io/nats-server/v2 v2.12.2/go.mod h1:j1AAttYeu7WnvD8HLJ+WWKNMSyxsqmZ160pNtCQRMyE= github.com/nats-io/nats-server/v2 v2.12.3/go.mod h1:MQXjG9WjyXKz9koWzUc3jYUMKD8x3CLmTNy91IQQz3Y=
github.com/nats-io/nats.go v1.47.0 h1:YQdADw6J/UfGUd2Oy6tn4Hq6YHxCaJrVKayxxFqYrgM= github.com/nats-io/nats.go v1.49.0 h1:yh/WvY59gXqYpgl33ZI+XoVPKyut/IcEaqtsiuTJpoE=
github.com/nats-io/nats.go v1.47.0/go.mod h1:iRWIPokVIFbVijxuMQq4y9ttaBTMe0SFdlZfMDd+33g= github.com/nats-io/nats.go v1.49.0/go.mod h1:fDCn3mN5cY8HooHwE2ukiLb4p4G4ImmzvXyJt+tGwdw=
github.com/nats-io/nkeys v0.4.11 h1:q44qGV008kYd9W1b1nEBkNzvnWxtRSQ7A8BoqRrcfa0= github.com/nats-io/nkeys v0.4.15 h1:JACV5jRVO9V856KOapQ7x+EY8Jo3qw1vJt/9Jpwzkk4=
github.com/nats-io/nkeys v0.4.11/go.mod h1:szDimtgmfOi9n25JpfIdGw12tZFYXqhGxjhVxsatHVE= github.com/nats-io/nkeys v0.4.15/go.mod h1:CpMchTXC9fxA5zrMo4KpySxNjiDVvr8ANOSZdiNfUrs=
github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw= github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw=
github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c= github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c=
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLANZcx1PVRCS0qkT7pwLkGfwJo4zjcN/Tysno= github.com/oapi-codegen/runtime v1.2.0 h1:RvKc1CVS1QeKSNzO97FBQbSMZyQ8s6rZd+LpmzwHMP4=
github.com/oapi-codegen/runtime v1.1.1 h1:EXLHh0DXIJnWhdRPN2w4MXAzFyE4CskzhNLUmtpMYro= github.com/oapi-codegen/runtime v1.2.0/go.mod h1:Y7ZhmmlE8ikZOmuHRRndiIm7nf3xcVv+YMweKgG1DT0=
github.com/oapi-codegen/runtime v1.1.1/go.mod h1:SK9X900oXmPWilYR5/WKPzt3Kqxn/uS/+lbpREv+eCg=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/prometheus/client_golang v1.23.2 h1:Je96obch5RDVy3FDMndoUsjAhG5Edi49h0RJWRi/o0o= github.com/prometheus/client_golang v1.23.2 h1:Je96obch5RDVy3FDMndoUsjAhG5Edi49h0RJWRi/o0o=
github.com/prometheus/client_golang v1.23.2/go.mod h1:Tb1a6LWHB3/SPIzCoaDXI4I8UHKeFTEQ1YCr+0Gyqmg= github.com/prometheus/client_golang v1.23.2/go.mod h1:Tb1a6LWHB3/SPIzCoaDXI4I8UHKeFTEQ1YCr+0Gyqmg=
github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk= github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE= github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE=
github.com/prometheus/common v0.66.1 h1:h5E0h5/Y8niHc5DlaLlWLArTQI7tMrsfQjHV+d9ZoGs= github.com/prometheus/common v0.67.5 h1:pIgK94WWlQt1WLwAC5j2ynLaBRDiinoAb86HZHTUGI4=
github.com/prometheus/common v0.66.1/go.mod h1:gcaUsgf3KfRSwHY4dIMXLPV0K/Wg1oZ8+SbZk/HH/dA= github.com/prometheus/common v0.67.5/go.mod h1:SjE/0MzDEEAyrdr5Gqc6G+sXI67maCxzaT3A2+HqjUw=
github.com/prometheus/procfs v0.16.1 h1:hZ15bTNuirocR6u0JZ6BAHHmwS1p8B4P6MRqxtzMyRg= github.com/prometheus/procfs v0.20.0 h1:AA7aCvjxwAquZAlonN7888f2u4IN8WVeFgBi4k82M4Q=
github.com/prometheus/procfs v0.16.1/go.mod h1:teAbpZRB1iIAJYREa1LsoWUXykVXA1KlTmWl8x/U+Is= github.com/prometheus/procfs v0.20.0/go.mod h1:o9EMBZGRyvDrSPH1RqdxhojkuXstoe4UlK79eF5TGGo=
github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ= github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ=
github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog= github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog=
github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 h1:lZUw3E0/J3roVtGQ+SCrUrg3ON6NgVqpn3+iol9aGu4= github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 h1:lZUw3E0/J3roVtGQ+SCrUrg3ON6NgVqpn3+iol9aGu4=
github.com/santhosh-tekuri/jsonschema/v5 v5.3.1/go.mod h1:uToXkOrWAZ6/Oc07xWQrPOhJotwFIyu2bBVN41fcDUY= github.com/santhosh-tekuri/jsonschema/v5 v5.3.1/go.mod h1:uToXkOrWAZ6/Oc07xWQrPOhJotwFIyu2bBVN41fcDUY=
github.com/shopspring/decimal v1.3.1 h1:2Usl1nmF/WZucqkFZhnfFYxxxu8LG21F6nPQBE5gKV8=
github.com/shopspring/decimal v1.3.1/go.mod h1:DKyhrW/HYNuLGql+MJL6WCR6knT2jwCFRcu2hWCYk4o= github.com/shopspring/decimal v1.3.1/go.mod h1:DKyhrW/HYNuLGql+MJL6WCR6knT2jwCFRcu2hWCYk4o=
github.com/shopspring/decimal v1.4.0 h1:bxl37RwXBklmTi0C79JfXCEBD1cqqHt0bbgBAGFp81k=
github.com/shopspring/decimal v1.4.0/go.mod h1:gawqmDU56v4yIKSwfBSFip1HdCCXN8/+DMd9qYNcwME=
github.com/spkg/bom v0.0.0-20160624110644-59b7046e48ad/go.mod h1:qLr4V1qq6nMqFKkMo8ZTx3f+BZEkzsRUY10Xsm2mwU0= github.com/spkg/bom v0.0.0-20160624110644-59b7046e48ad/go.mod h1:qLr4V1qq6nMqFKkMo8ZTx3f+BZEkzsRUY10Xsm2mwU0=
github.com/stmcginnis/gofish v0.20.0 h1:hH2V2Qe898F2wWT1loApnkDUrXXiLKqbSlMaH3Y1n08= github.com/stmcginnis/gofish v0.21.3 h1:EBLCHfORnbx7MPw7lplOOVe9QAD1T3XRVz6+a1Z4z5Q=
github.com/stmcginnis/gofish v0.20.0/go.mod h1:PzF5i8ecRG9A2ol8XT64npKUunyraJ+7t0kYMpQAtqU= github.com/stmcginnis/gofish v0.21.3/go.mod h1:PzF5i8ecRG9A2ol8XT64npKUunyraJ+7t0kYMpQAtqU=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U= github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
@@ -112,28 +101,23 @@ github.com/tklauser/numcpus v0.11.0 h1:nSTwhKH5e1dMNsCdVBukSZrURJRoHbSEQjdEbY+9R
github.com/tklauser/numcpus v0.11.0/go.mod h1:z+LwcLq54uWZTX0u/bGobaV34u6V7KNlTZejzM6/3MQ= github.com/tklauser/numcpus v0.11.0/go.mod h1:z+LwcLq54uWZTX0u/bGobaV34u6V7KNlTZejzM6/3MQ=
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto= go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE= go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI= go.yaml.in/yaml/v2 v2.4.3 h1:6gvOSjQoTB3vt1l+CU+tSyi/HOjfOjRLJ4YwYZGwRO0=
go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU= go.yaml.in/yaml/v2 v2.4.3/go.mod h1:zSxWcmIDjOzPXpjlTTbAsKokqkDNAVtZO0WOMiT90s8=
golang.design/x/thread v0.0.0-20210122121316-335e9adffdf1 h1:P7S/GeHBAFEZIYp0ePPs2kHXoazz8q2KsyxHyQVGCJg= golang.design/x/thread v0.0.0-20210122121316-335e9adffdf1 h1:P7S/GeHBAFEZIYp0ePPs2kHXoazz8q2KsyxHyQVGCJg=
golang.design/x/thread v0.0.0-20210122121316-335e9adffdf1/go.mod h1:9CWpnTUmlQkfdpdutA1nNf4iE5lAVt3QZOu0Z6hahBE= golang.design/x/thread v0.0.0-20210122121316-335e9adffdf1/go.mod h1:9CWpnTUmlQkfdpdutA1nNf4iE5lAVt3QZOu0Z6hahBE=
golang.org/x/crypto v0.43.0 h1:dduJYIi3A3KOfdGOHX8AVZ/jGiyPa3IbBozJ5kNuE04= golang.org/x/crypto v0.48.0 h1:/VRzVqiRSggnhY7gNRxPauEQ5Drw9haKdM0jqfcCFts=
golang.org/x/crypto v0.43.0/go.mod h1:BFbav4mRNlXJL4wNeejLpWxB7wMbc79PdRGhWKncxR0= golang.org/x/crypto v0.48.0/go.mod h1:r0kV5h3qnFPlQnBSrULhlsRfryS2pmewsg+XfMgkVos=
golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b h1:M2rDM6z3Fhozi9O7NWsxAkg/yqS/lQJ6PmkyIV3YP+o= golang.org/x/net v0.51.0 h1:94R/GTO7mt3/4wIKpcR5gkGmRLOuE/2hNGeWq/GBIFo=
golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b/go.mod h1:3//PLf8L/X+8b4vuAfHzxeRUl04Adcb341+IGKfnqS8= golang.org/x/net v0.51.0/go.mod h1:aamm+2QF5ogm02fjy5Bb7CQ0WMt1/WVM7FtyaTLlA9Y=
golang.org/x/net v0.45.0 h1:RLBg5JKixCy82FtLJpeNlVM0nrSqpCRYzVU1n8kj0tM=
golang.org/x/net v0.45.0/go.mod h1:ECOoLqd5U3Lhyeyo/QDCEVQ4sNgYsqvCZ722XogGieY=
golang.org/x/sys v0.0.0-20210122093101-04d7465088b8/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210122093101-04d7465088b8/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc= golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks= golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI= golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4= golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= google.golang.org/protobuf v1.36.11 h1:fV6ZwhNocDyBLK0dj+fg8ektcVegBBuEolpbTQyBNVE=
google.golang.org/protobuf v1.36.8 h1:xHScyCOEuuwZEc6UtSOvPbAT4zRh0xcNRYekJwfqyMc= google.golang.org/protobuf v1.36.11/go.mod h1:HTf+CrKn2C3g5S8VImy6tdcUvCska2kB7j23XfzDpco=
google.golang.org/protobuf v1.36.8/go.mod h1:fuxRtAxBytpl4zzqUh6/eyUujkJdNiuEkXntxiD/uRU=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk= gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
gopkg.in/yaml.v3 v3.0.0-20200615113413-eeeca48fe776/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=


@@ -10,15 +10,17 @@ package metricAggregator
import ( import (
"context" "context"
"fmt" "fmt"
"maps"
"math" "math"
"os" "os"
"slices"
"strings" "strings"
"sync" "sync"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
topo "github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology" topo "github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology"
"github.com/PaesslerAG/gval" "github.com/PaesslerAG/gval"
@@ -36,7 +38,7 @@ type MetricAggregatorIntervalConfig struct {
type metricAggregator struct { type metricAggregator struct {
functions []*MetricAggregatorIntervalConfig functions []*MetricAggregatorIntervalConfig
constants map[string]interface{} constants map[string]any
language gval.Language language gval.Language
output chan lp.CCMessage output chan lp.CCMessage
} }
@@ -70,10 +72,12 @@ var metricCacheLanguage = gval.NewLanguage(
gval.Function("getCpuList", getCpuListOfNode), gval.Function("getCpuList", getCpuListOfNode),
gval.Function("getCpuListOfType", getCpuListOfType), gval.Function("getCpuListOfType", getCpuListOfType),
) )
var language gval.Language = gval.NewLanguage( var language gval.Language = gval.NewLanguage(
gval.Full(), gval.Full(),
metricCacheLanguage, metricCacheLanguage,
) )
var evaluables = struct { var evaluables = struct {
mapping map[string]gval.Evaluable mapping map[string]gval.Evaluable
mutex sync.Mutex mutex sync.Mutex
@@ -84,14 +88,13 @@ var evaluables = struct {
func (c *metricAggregator) Init(output chan lp.CCMessage) error { func (c *metricAggregator) Init(output chan lp.CCMessage) error {
c.output = output c.output = output
c.functions = make([]*MetricAggregatorIntervalConfig, 0) c.functions = make([]*MetricAggregatorIntervalConfig, 0)
c.constants = make(map[string]interface{}) c.constants = make(map[string]any)
// add constants like hostname, numSockets, ... to constants list // add constants like hostname, numSockets, ... to constants list
// Set hostname // Set hostname
hostname, err := os.Hostname() hostname, err := os.Hostname()
if err != nil { if err != nil {
cclog.Error(err.Error()) return fmt.Errorf("metricAggregator: failed to get hostname: %w", err)
return err
} }
// Drop domain part of host name // Drop domain part of host name
c.constants["hostname"] = strings.SplitN(hostname, `.`, 2)[0] c.constants["hostname"] = strings.SplitN(hostname, `.`, 2)[0]
@@ -120,10 +123,8 @@ func (c *metricAggregator) Init(output chan lp.CCMessage) error {
} }
func (c *metricAggregator) Eval(starttime time.Time, endtime time.Time, metrics []lp.CCMessage) { func (c *metricAggregator) Eval(starttime time.Time, endtime time.Time, metrics []lp.CCMessage) {
vars := make(map[string]interface{}) vars := make(map[string]any)
for k, v := range c.constants { maps.Copy(vars, c.constants)
vars[k] = v
}
vars["starttime"] = starttime vars["starttime"] = starttime
vars["endtime"] = endtime vars["endtime"] = endtime
for _, f := range c.functions { for _, f := range c.functions {
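
Review note on the `Eval` hunk above: `maps.Copy` replaces the hand-written copy loop without changing behavior. It writes every key of the source map into the destination and leaves the source untouched. A small sketch, with `buildVars` as a hypothetical helper that mirrors how `vars` is seeded from the constants:

```go
package main

import (
	"fmt"
	"maps"
)

// buildVars is a hypothetical helper: it starts from shared constants and
// adds per-call values without mutating the constants map.
func buildVars(constants, extra map[string]any) map[string]any {
	vars := make(map[string]any, len(constants)+len(extra))
	maps.Copy(vars, constants) // same effect as: for k, v := range constants { vars[k] = v }
	maps.Copy(vars, extra)     // later copies overwrite earlier keys
	return vars
}

func main() {
	constants := map[string]any{"hostname": "node01"}
	vars := buildVars(constants, map[string]any{"starttime": 0})
	fmt.Println(len(vars), len(constants))
}
```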
@@ -137,7 +138,6 @@ func (c *metricAggregator) Eval(starttime time.Time, endtime time.Time, metrics
matches := make([]lp.CCMessage, 0) matches := make([]lp.CCMessage, 0)
for _, m := range metrics { for _, m := range metrics {
vars["metric"] = m vars["metric"] = m
//value, err := gval.Evaluate(f.Condition, vars, c.language)
value, err := f.gvalCond.EvalBool(context.Background(), vars) value, err := f.gvalCond.EvalBool(context.Background(), vars)
if err != nil { if err != nil {
cclog.ComponentError("MetricCache", "COLLECT", f.Name, "COND", f.Condition, ":", err.Error()) cclog.ComponentError("MetricCache", "COLLECT", f.Name, "COND", f.Condition, ":", err.Error())
@@ -171,22 +171,22 @@ func (c *metricAggregator) Eval(starttime time.Time, endtime time.Time, metrics
// Check, that only values of one type were collected // Check, that only values of one type were collected
countValueTypes := 0 countValueTypes := 0
if len(valuesFloat64) > 0 { if len(valuesFloat64) > 0 {
countValueTypes += 1 countValueTypes++
} }
if len(valuesFloat32) > 0 { if len(valuesFloat32) > 0 {
countValueTypes += 1 countValueTypes++
} }
if len(valuesInt) > 0 { if len(valuesInt) > 0 {
countValueTypes += 1 countValueTypes++
} }
if len(valuesInt32) > 0 { if len(valuesInt32) > 0 {
countValueTypes += 1 countValueTypes++
} }
if len(valuesInt64) > 0 { if len(valuesInt64) > 0 {
countValueTypes += 1 countValueTypes++
} }
if len(valuesBool) > 0 { if len(valuesBool) > 0 {
countValueTypes += 1 countValueTypes++
} }
if countValueTypes > 1 { if countValueTypes > 1 {
cclog.ComponentError("MetricCache", "Collected values of different types") cclog.ComponentError("MetricCache", "Collected values of different types")
@@ -263,15 +263,15 @@ func (c *metricAggregator) Eval(starttime time.Time, endtime time.Time, metrics
var m lp.CCMessage var m lp.CCMessage
switch t := value.(type) { switch t := value.(type) {
case float64: case float64:
m, err = lp.NewMessage(f.Name, tags, meta, map[string]interface{}{"value": t}, starttime) m, err = lp.NewMessage(f.Name, tags, meta, map[string]any{"value": t}, starttime)
case float32: case float32:
m, err = lp.NewMessage(f.Name, tags, meta, map[string]interface{}{"value": t}, starttime) m, err = lp.NewMessage(f.Name, tags, meta, map[string]any{"value": t}, starttime)
case int: case int:
m, err = lp.NewMessage(f.Name, tags, meta, map[string]interface{}{"value": t}, starttime) m, err = lp.NewMessage(f.Name, tags, meta, map[string]any{"value": t}, starttime)
case int64: case int64:
m, err = lp.NewMessage(f.Name, tags, meta, map[string]interface{}{"value": t}, starttime) m, err = lp.NewMessage(f.Name, tags, meta, map[string]any{"value": t}, starttime)
case string: case string:
m, err = lp.NewMessage(f.Name, tags, meta, map[string]interface{}{"value": t}, starttime) m, err = lp.NewMessage(f.Name, tags, meta, map[string]any{"value": t}, starttime)
default: default:
cclog.ComponentError("MetricCache", "Gval returned invalid type", t, "skipping metric", f.Name) cclog.ComponentError("MetricCache", "Gval returned invalid type", t, "skipping metric", f.Name)
} }
@@ -329,18 +329,21 @@ func (c *metricAggregator) AddAggregation(name, function, condition string, tags
} }
func (c *metricAggregator) DeleteAggregation(name string) error { func (c *metricAggregator) DeleteAggregation(name string) error {
for i, agg := range c.functions { i := slices.IndexFunc(
if agg.Name == name { c.functions,
copy(c.functions[i:], c.functions[i+1:]) func(agg *MetricAggregatorIntervalConfig) bool {
c.functions[len(c.functions)-1] = nil return agg.Name == name
c.functions = c.functions[:len(c.functions)-1] })
return nil if i == -1 {
} return fmt.Errorf("no aggregation for metric name %s", name)
} }
return fmt.Errorf("no aggregation for metric name %s", name) copy(c.functions[i:], c.functions[i+1:])
c.functions[len(c.functions)-1] = nil
c.functions = c.functions[:len(c.functions)-1]
return nil
} }
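
Review note: the `DeleteAggregation` rewrite above replaces the index loop with `slices.IndexFunc` and an early return. The pattern can be tried in isolation with the minimal sketch below. `aggConfig` and `deleteByName` are hypothetical stand-ins for `MetricAggregatorIntervalConfig` and the method itself; they are not part of the change.

```go
package main

import (
	"fmt"
	"slices"
)

// aggConfig is a hypothetical stand-in for MetricAggregatorIntervalConfig.
type aggConfig struct {
	Name string
}

// deleteByName removes the first entry with the given name, keeps the order
// of the remaining entries and clears the vacated tail slot so that the
// removed pointer is no longer referenced by the backing array.
func deleteByName(list []*aggConfig, name string) ([]*aggConfig, error) {
	i := slices.IndexFunc(list, func(a *aggConfig) bool { return a.Name == name })
	if i == -1 {
		return list, fmt.Errorf("no aggregation for metric name %s", name)
	}
	copy(list[i:], list[i+1:])
	list[len(list)-1] = nil
	return list[:len(list)-1], nil
}

func main() {
	list := []*aggConfig{{Name: "a"}, {Name: "b"}, {Name: "c"}}
	list, err := deleteByName(list, "b")
	fmt.Println(len(list), err)
}
```

Handling the "not found" case first keeps the removal logic at the top indentation level, which appears to be the point of the refactoring.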
func (c *metricAggregator) AddConstant(name string, value interface{}) { func (c *metricAggregator) AddConstant(name string, value any) {
c.constants[name] = value c.constants[name] = value
} }
@@ -348,19 +351,18 @@ func (c *metricAggregator) DelConstant(name string) {
delete(c.constants, name) delete(c.constants, name)
} }
func (c *metricAggregator) AddFunction(name string, function func(args ...interface{}) (interface{}, error)) { func (c *metricAggregator) AddFunction(name string, function func(args ...any) (any, error)) {
c.language = gval.NewLanguage(c.language, gval.Function(name, function)) c.language = gval.NewLanguage(c.language, gval.Function(name, function))
} }
func EvalBoolCondition(condition string, params map[string]interface{}) (bool, error) { func EvalBoolCondition(condition string, params map[string]any) (bool, error) {
evaluables.mutex.Lock() evaluables.mutex.Lock()
evaluable, ok := evaluables.mapping[condition] evaluable, ok := evaluables.mapping[condition]
evaluables.mutex.Unlock() evaluables.mutex.Unlock()
if !ok { if !ok {
newcond := newcond := strings.ReplaceAll(
strings.ReplaceAll( strings.ReplaceAll(
strings.ReplaceAll( condition, "'", "\""), "%", "\\")
condition, "'", "\""), "%", "\\")
var err error var err error
evaluable, err = language.NewEvaluable(newcond) evaluable, err = language.NewEvaluable(newcond)
if err != nil { if err != nil {
@@ -379,10 +381,9 @@ func EvalFloat64Condition(condition string, params map[string]float64) (float64,
evaluable, ok := evaluables.mapping[condition] evaluable, ok := evaluables.mapping[condition]
evaluables.mutex.Unlock() evaluables.mutex.Unlock()
if !ok { if !ok {
newcond := newcond := strings.ReplaceAll(
strings.ReplaceAll( strings.ReplaceAll(
strings.ReplaceAll( condition, "'", "\""), "%", "\\")
condition, "'", "\""), "%", "\\")
var err error var err error
evaluable, err = language.NewEvaluable(newcond) evaluable, err = language.NewEvaluable(newcond)
if err != nil { if err != nil {


@@ -11,10 +11,10 @@ import (
"errors" "errors"
"fmt" "fmt"
"regexp" "regexp"
"slices"
"strconv"
"strings" "strings"
"golang.org/x/exp/slices"
topo "github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology" topo "github.com/ClusterCockpit/cc-metric-collector/pkg/ccTopology"
) )
@@ -34,8 +34,7 @@ func sumAnyType[T float64 | float32 | int | int32 | int64](values []T) (T, error
} }
// Sum up values // Sum up values
func sumfunc(args interface{}) (interface{}, error) { func sumfunc(args any) (any, error) {
var err error var err error
switch values := args.(type) { switch values := args.(type) {
case []float64: case []float64:
@@ -63,7 +62,7 @@ func minAnyType[T float64 | float32 | int | int32 | int64](values []T) (T, error
} }
// Get the minimum value // Get the minimum value
func minfunc(args interface{}) (interface{}, error) { func minfunc(args any) (any, error) {
switch values := args.(type) { switch values := args.(type) {
case []float64: case []float64:
return minAnyType(values) return minAnyType(values)
@@ -84,12 +83,12 @@ func avgAnyType[T float64 | float32 | int | int32 | int64](values []T) (float64,
if len(values) == 0 { if len(values) == 0 {
return 0.0, errors.New("average function requires at least one argument") return 0.0, errors.New("average function requires at least one argument")
} }
sum, err := sumAnyType[T](values) sum, err := sumAnyType(values)
return float64(sum) / float64(len(values)), err return float64(sum) / float64(len(values)), err
} }
// Get the average or mean value // Get the average or mean value
func avgfunc(args interface{}) (interface{}, error) { func avgfunc(args any) (any, error) {
switch values := args.(type) { switch values := args.(type) {
case []float64: case []float64:
return avgAnyType(values) return avgAnyType(values)
@@ -114,7 +113,7 @@ func maxAnyType[T float64 | float32 | int | int32 | int64](values []T) (T, error
} }
// Get the maximum value // Get the maximum value
func maxfunc(args interface{}) (interface{}, error) { func maxfunc(args any) (any, error) {
switch values := args.(type) { switch values := args.(type) {
case []float64: case []float64:
return maxAnyType(values) return maxAnyType(values)
@@ -146,7 +145,7 @@ func medianAnyType[T float64 | float32 | int | int32 | int64](values []T) (T, er
} }
// Get the median value // Get the median value
func medianfunc(args interface{}) (interface{}, error) { func medianfunc(args any) (any, error) {
switch values := args.(type) { switch values := args.(type) {
case []float64: case []float64:
return medianAnyType(values) return medianAnyType(values)
@@ -167,9 +166,9 @@ func medianfunc(args interface{}) (interface{}, error) {
* Get number of values in list. Returns always an int * Get number of values in list. Returns always an int
*/ */
func lenfunc(args interface{}) (interface{}, error) { func lenfunc(args any) (any, error) {
var err error = nil var err error
var length int = 0 length := 0
switch values := args.(type) { switch values := args.(type) {
case []float64: case []float64:
length = len(values) length = len(values)
@@ -181,13 +180,7 @@ func lenfunc(args interface{}) (interface{}, error) {
length = len(values) length = len(values)
case []int32: case []int32:
length = len(values) length = len(values)
case float64: case float64, float32, int, int64:
err = errors.New("function 'len' can only be applied on arrays and strings")
case float32:
err = errors.New("function 'len' can only be applied on arrays and strings")
case int:
err = errors.New("function 'len' can only be applied on arrays and strings")
case int64:
err = errors.New("function 'len' can only be applied on arrays and strings") err = errors.New("function 'len' can only be applied on arrays and strings")
case string: case string:
length = len(values) length = len(values)
@@ -197,13 +190,13 @@ func lenfunc(args interface{}) (interface{}, error) {
/* /*
* Check if a value is in a list * Check if a value is in a list
* In constrast to most of the other functions, this one is an infix operator for * In contrast to most of the other functions, this one is an infix operator for
* - substring matching: `"abc" in "abcdef"` -> true * - substring matching: `"abc" in "abcdef"` -> true
* - substring matching with int casting: `3 in "abd3"` -> true * - substring matching with int casting: `3 in "abd3"` -> true
* - search for an int in an int list: `3 in getCpuList()` -> true (if you have more than 4 CPU hardware threads) * - search for an int in an int list: `3 in getCpuList()` -> true (if you have more than 4 CPU hardware threads)
*/ */
func infunc(a interface{}, b interface{}) (interface{}, error) { func infunc(a any, b any) (any, error) {
switch match := a.(type) { switch match := a.(type) {
case string: case string:
switch total := b.(type) { switch total := b.(type) {
@@ -213,13 +206,9 @@ func infunc(a interface{}, b interface{}) (interface{}, error) {
case int: case int:
switch total := b.(type) { switch total := b.(type) {
case []int: case []int:
for _, x := range total { return slices.Contains(total, match), nil
if x == match {
return true, nil
}
}
case string: case string:
smatch := fmt.Sprintf("%d", match) smatch := strconv.Itoa(match)
return strings.Contains(total, smatch), nil return strings.Contains(total, smatch), nil
} }
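
Review note: in the int branch of `infunc`, the hand-written search loop becomes `slices.Contains` and `fmt.Sprintf("%d", ...)` becomes `strconv.Itoa`. The sketch below reduces that branch to a standalone function. `intIn` is a hypothetical name, and the error returned for unsupported container types is an assumption, since the tail of `infunc` is not visible in this hunk.

```go
package main

import (
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// intIn is a hypothetical reduction of the int branch of infunc.
// It reports whether match occurs in an int list or, after conversion
// to its decimal representation, as a substring of a string.
func intIn(match int, total any) (bool, error) {
	switch t := total.(type) {
	case []int:
		return slices.Contains(t, match), nil
	case string:
		return strings.Contains(t, strconv.Itoa(match)), nil
	}
	// Assumption: the real function may treat other types differently.
	return false, fmt.Errorf("unsupported container type %T", total)
}

func main() {
	fmt.Println(intIn(3, []int{0, 1, 2, 3}))
	fmt.Println(intIn(3, "abd3"))
}
```

One point to check in the real function: with the old loop, a miss in the `[]int` case fell through to whatever follows the switch, whereas the new code returns the result of `slices.Contains` directly.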
@@ -233,12 +222,12 @@ func infunc(a interface{}, b interface{}) (interface{}, error) {
* format keys \d = %d, \w = %d, ... Not sure how to fix this * format keys \d = %d, \w = %d, ... Not sure how to fix this
*/ */
func matchfunc(args ...interface{}) (interface{}, error) { func matchfunc(args ...any) (any, error) {
switch match := args[0].(type) { switch match := args[0].(type) {
case string: case string:
switch total := args[1].(type) { switch total := args[1].(type) {
case string: case string:
smatch := strings.Replace(match, "%", "\\", -1) smatch := strings.ReplaceAll(match, "%", "\\")
regex, err := regexp.Compile(smatch) regex, err := regexp.Compile(smatch)
if err != nil { if err != nil {
return false, err return false, err
@@ -255,7 +244,7 @@ func matchfunc(args ...interface{}) (interface{}, error) {
*/ */
// for a given cpuid, it returns the core id // for a given cpuid, it returns the core id
func getCpuCoreFunc(args interface{}) (interface{}, error) { func getCpuCoreFunc(args any) (any, error) {
switch cpuid := args.(type) { switch cpuid := args.(type) {
case int: case int:
return topo.GetHwthreadCore(cpuid), nil return topo.GetHwthreadCore(cpuid), nil
@@ -264,7 +253,7 @@ func getCpuCoreFunc(args interface{}) (interface{}, error) {
} }
// for a given cpuid, it returns the socket id // for a given cpuid, it returns the socket id
func getCpuSocketFunc(args interface{}) (interface{}, error) { func getCpuSocketFunc(args any) (any, error) {
switch cpuid := args.(type) { switch cpuid := args.(type) {
case int: case int:
return topo.GetHwthreadSocket(cpuid), nil return topo.GetHwthreadSocket(cpuid), nil
@@ -273,7 +262,7 @@ func getCpuSocketFunc(args interface{}) (interface{}, error) {
} }
// for a given cpuid, it returns the id of the NUMA node // for a given cpuid, it returns the id of the NUMA node
func getCpuNumaDomainFunc(args interface{}) (interface{}, error) { func getCpuNumaDomainFunc(args any) (any, error) {
switch cpuid := args.(type) { switch cpuid := args.(type) {
case int: case int:
return topo.GetHwthreadNumaDomain(cpuid), nil return topo.GetHwthreadNumaDomain(cpuid), nil
@@ -282,7 +271,7 @@ func getCpuNumaDomainFunc(args interface{}) (interface{}, error) {
} }
// for a given cpuid, it returns the id of the CPU die // for a given cpuid, it returns the id of the CPU die
func getCpuDieFunc(args interface{}) (interface{}, error) { func getCpuDieFunc(args any) (any, error) {
switch cpuid := args.(type) { switch cpuid := args.(type) {
case int: case int:
return topo.GetHwthreadDie(cpuid), nil return topo.GetHwthreadDie(cpuid), nil
@@ -291,7 +280,7 @@ func getCpuDieFunc(args interface{}) (interface{}, error) {
} }
// for a given core id, it returns the list of cpuids // for a given core id, it returns the list of cpuids
func getCpuListOfCoreFunc(args interface{}) (interface{}, error) { func getCpuListOfCoreFunc(args any) (any, error) {
cpulist := make([]int, 0) cpulist := make([]int, 0)
switch in := args.(type) { switch in := args.(type) {
case int: case int:
@@ -305,7 +294,7 @@ func getCpuListOfCoreFunc(args interface{}) (interface{}, error) {
} }
// for a given socket id, it returns the list of cpuids // for a given socket id, it returns the list of cpuids
func getCpuListOfSocketFunc(args interface{}) (interface{}, error) { func getCpuListOfSocketFunc(args any) (any, error) {
cpulist := make([]int, 0) cpulist := make([]int, 0)
switch in := args.(type) { switch in := args.(type) {
case int: case int:
@@ -319,7 +308,7 @@ func getCpuListOfSocketFunc(args interface{}) (interface{}, error) {
} }
// for a given id of a NUMA domain, it returns the list of cpuids // for a given id of a NUMA domain, it returns the list of cpuids
func getCpuListOfNumaDomainFunc(args interface{}) (interface{}, error) { func getCpuListOfNumaDomainFunc(args any) (any, error) {
cpulist := make([]int, 0) cpulist := make([]int, 0)
switch in := args.(type) { switch in := args.(type) {
case int: case int:
@@ -333,7 +322,7 @@ func getCpuListOfNumaDomainFunc(args interface{}) (interface{}, error) {
} }
// for a given CPU die id, it returns the list of cpuids // for a given CPU die id, it returns the list of cpuids
func getCpuListOfDieFunc(args interface{}) (interface{}, error) { func getCpuListOfDieFunc(args any) (any, error) {
cpulist := make([]int, 0) cpulist := make([]int, 0)
switch in := args.(type) { switch in := args.(type) {
case int: case int:
@@ -347,14 +336,14 @@ func getCpuListOfDieFunc(args interface{}) (interface{}, error) {
} }
// wrapper function to get a list of all cpuids of the node // wrapper function to get a list of all cpuids of the node
func getCpuListOfNode() (interface{}, error) { func getCpuListOfNode() (any, error) {
return topo.HwthreadList(), nil return topo.HwthreadList(), nil
} }
// helper function to get the cpuid list for a CCMetric type tag set (type and type-id) // helper function to get the cpuid list for a CCMetric type tag set (type and type-id)
// since there is no access to the metric data in the function, it should be called like // since there is no access to the metric data in the function, it should be called like
// `getCpuListOfType()` // `getCpuListOfType()`
func getCpuListOfType(args ...interface{}) (interface{}, error) { func getCpuListOfType(args ...any) (any, error) {
cpulist := make([]int, 0) cpulist := make([]int, 0)
switch typ := args[0].(type) { switch typ := args[0].(type) {
case string: case string:


@@ -8,12 +8,13 @@
package metricRouter package metricRouter
import ( import (
"fmt"
"sync" "sync"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
agg "github.com/ClusterCockpit/cc-metric-collector/internal/metricAggregator" agg "github.com/ClusterCockpit/cc-metric-collector/internal/metricAggregator"
mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker" mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker"
) )
@@ -51,7 +52,7 @@ type MetricCache interface {
} }
func (c *metricCache) Init(output chan lp.CCMessage, ticker mct.MultiChanTicker, wg *sync.WaitGroup, numPeriods int) error { func (c *metricCache) Init(output chan lp.CCMessage, ticker mct.MultiChanTicker, wg *sync.WaitGroup, numPeriods int) error {
var err error = nil var err error
c.done = make(chan bool) c.done = make(chan bool)
c.wg = wg c.wg = wg
c.ticker = ticker c.ticker = ticker
@@ -70,8 +71,7 @@ func (c *metricCache) Init(output chan lp.CCMessage, ticker mct.MultiChanTicker,
// The code is executed by the MetricCache goroutine // The code is executed by the MetricCache goroutine
c.aggEngine, err = agg.NewAggregator(c.output) c.aggEngine, err = agg.NewAggregator(c.output)
if err != nil { if err != nil {
cclog.ComponentError("MetricCache", "Cannot create aggregator") return fmt.Errorf("MetricCache: failed to create aggregator: %w", err)
return err
} }
return nil return nil
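
Review note: `Init` no longer logs and then returns the bare error; it returns a single error wrapped with `%w`, which avoids printing the same failure twice and keeps the cause inspectable by the caller. A minimal sketch of that behaviour follows. `errNoOutput`, `newAggregator`, `initCache` and `causedByMissingOutput` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoOutput is a hypothetical sentinel error for this sketch.
var errNoOutput = errors.New("no output channel")

// newAggregator is a hypothetical stand-in for agg.NewAggregator.
func newAggregator(haveOutput bool) error {
	if !haveOutput {
		return errNoOutput
	}
	return nil
}

// initCache wraps the underlying error instead of logging it, so the
// caller decides where (and how often) the failure is reported.
func initCache(haveOutput bool) error {
	if err := newAggregator(haveOutput); err != nil {
		return fmt.Errorf("MetricCache: failed to create aggregator: %w", err)
	}
	return nil
}

// causedByMissingOutput shows that the wrapped cause stays reachable.
func causedByMissingOutput(err error) bool {
	return errors.Is(err, errNoOutput)
}

func main() {
	err := initCache(false)
	fmt.Println(err)
	fmt.Println(causedByMissingOutput(err))
}
```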
@@ -79,7 +79,6 @@ func (c *metricCache) Init(output chan lp.CCMessage, ticker mct.MultiChanTicker,
// Start starts the metric cache // Start starts the metric cache
func (c *metricCache) Start() { func (c *metricCache) Start() {
c.tickchan = make(chan time.Time) c.tickchan = make(chan time.Time)
c.ticker.AddChannel(c.tickchan) c.ticker.AddChannel(c.tickchan)
// Router cache is done // Router cache is done
@@ -102,9 +101,7 @@ func (c *metricCache) Start() {
return oldPeriod return oldPeriod
} }
c.wg.Add(1) c.wg.Go(func() {
go func() {
defer c.wg.Done()
for { for {
select { select {
case <-c.done: case <-c.done:
@@ -124,7 +121,7 @@ func (c *metricCache) Start() {
} }
} }
} }
}() })
cclog.ComponentDebug("MetricCache", "START") cclog.ComponentDebug("MetricCache", "START")
} }
@@ -137,12 +134,12 @@ func (c *metricCache) Add(metric lp.CCMessage) {
p := c.intervals[c.curPeriod] p := c.intervals[c.curPeriod]
if p.numMetrics < p.sizeMetrics { if p.numMetrics < p.sizeMetrics {
p.metrics[p.numMetrics] = metric p.metrics[p.numMetrics] = metric
p.numMetrics = p.numMetrics + 1 p.numMetrics++
p.stopstamp = metric.Time() p.stopstamp = metric.Time()
} else { } else {
p.metrics = append(p.metrics, metric) p.metrics = append(p.metrics, metric)
p.numMetrics = p.numMetrics + 1 p.numMetrics++
p.sizeMetrics = p.sizeMetrics + 1 p.sizeMetrics++
p.stopstamp = metric.Time() p.stopstamp = metric.Time()
} }
c.lock.Unlock() c.lock.Unlock()
@@ -161,8 +158,8 @@ func (c *metricCache) DeleteAggregation(name string) error {
// is the current one, index=1 the last interval and so on. Returns an empty array if a wrong index // is the current one, index=1 the last interval and so on. Returns an empty array if a wrong index
// is given (negative index, index larger than configured number of total intervals, ...) // is given (negative index, index larger than configured number of total intervals, ...)
func (c *metricCache) GetPeriod(index int) (time.Time, time.Time, []lp.CCMessage) { func (c *metricCache) GetPeriod(index int) (time.Time, time.Time, []lp.CCMessage) {
var start time.Time = time.Now() start := time.Now()
var stop time.Time = time.Now() stop := time.Now()
var metrics []lp.CCMessage var metrics []lp.CCMessage
if index >= 0 && index < c.numPeriods { if index >= 0 && index < c.numPeriods {
pindex := c.curPeriod - index pindex := c.curPeriod - index
@@ -173,7 +170,6 @@ func (c *metricCache) GetPeriod(index int) (time.Time, time.Time, []lp.CCMessage
start = c.intervals[pindex].startstamp start = c.intervals[pindex].startstamp
stop = c.intervals[pindex].stopstamp stop = c.intervals[pindex].stopstamp
metrics = c.intervals[pindex].metrics metrics = c.intervals[pindex].metrics
//return c.intervals[pindex].startstamp, c.intervals[pindex].stopstamp, c.intervals[pindex].metrics
} else { } else {
metrics = make([]lp.CCMessage, 0) metrics = make([]lp.CCMessage, 0)
} }


@@ -8,17 +8,18 @@
package metricRouter package metricRouter
import ( import (
"bytes"
"encoding/json" "encoding/json"
"fmt" "fmt"
"maps"
"os" "os"
"strings" "strings"
"sync" "sync"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
lp "github.com/ClusterCockpit/cc-lib/ccMessage" mp "github.com/ClusterCockpit/cc-lib/v2/messageProcessor"
mp "github.com/ClusterCockpit/cc-lib/messageProcessor"
agg "github.com/ClusterCockpit/cc-metric-collector/internal/metricAggregator" agg "github.com/ClusterCockpit/cc-metric-collector/internal/metricAggregator"
mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker" mct "github.com/ClusterCockpit/cc-metric-collector/pkg/multiChanTicker"
) )
@@ -46,8 +47,7 @@ type metricRouterConfig struct {
MaxForward int `json:"max_forward"` // Number of maximal forwarded metrics at one select MaxForward int `json:"max_forward"` // Number of maximal forwarded metrics at one select
NormalizeUnits bool `json:"normalize_units"` // Check unit meta flag and normalize it using cc-units NormalizeUnits bool `json:"normalize_units"` // Check unit meta flag and normalize it using cc-units
ChangeUnitPrefix map[string]string `json:"change_unit_prefix"` // Add prefix that should be applied to the metrics ChangeUnitPrefix map[string]string `json:"change_unit_prefix"` // Add prefix that should be applied to the metrics
// dropMetrics map[string]bool // Internal map for O(1) lookup MessageProcessor json.RawMessage `json:"process_messages,omitempty"`
MessageProcessor json.RawMessage `json:"process_messages,omitempty"`
} }
// Metric router data structure // Metric router data structure
@@ -102,76 +102,93 @@ func (r *metricRouter) Init(ticker mct.MultiChanTicker, wg *sync.WaitGroup, rout
// Drop domain part of host name // Drop domain part of host name
r.hostname = strings.SplitN(hostname, `.`, 2)[0] r.hostname = strings.SplitN(hostname, `.`, 2)[0]
err = json.Unmarshal(routerConfig, &r.config) d := json.NewDecoder(bytes.NewReader(routerConfig))
if err != nil { d.DisallowUnknownFields()
cclog.ComponentError("MetricRouter", err.Error()) if err := d.Decode(&r.config); err != nil {
return err return fmt.Errorf("failed to decode metric router config: %w", err)
}
r.maxForward = 1
if r.config.MaxForward > r.maxForward {
r.maxForward = r.config.MaxForward
} }
r.maxForward = max(1, r.config.MaxForward)
if r.config.NumCacheIntervals > 0 { if r.config.NumCacheIntervals > 0 {
r.cache, err = NewCache(r.cache_input, r.ticker, &r.cachewg, r.config.NumCacheIntervals) r.cache, err = NewCache(r.cache_input, r.ticker, &r.cachewg, r.config.NumCacheIntervals)
if err != nil { if err != nil {
cclog.ComponentError("MetricRouter", "MetricCache initialization failed:", err.Error()) return fmt.Errorf("MetricRouter: failed to initialize MetricCache: %w", err)
return err
} }
for _, agg := range r.config.IntervalAgg { for _, agg := range r.config.IntervalAgg {
r.cache.AddAggregation(agg.Name, agg.Function, agg.Condition, agg.Tags, agg.Meta) err = r.cache.AddAggregation(agg.Name, agg.Function, agg.Condition, agg.Tags, agg.Meta)
if err != nil {
return fmt.Errorf("MetricCache AddAggregation() failed: %w", err)
}
} }
} }
p, err := mp.NewMessageProcessor() p, err := mp.NewMessageProcessor()
if err != nil { if err != nil {
return fmt.Errorf("initialization of message processor failed: %v", err.Error()) return fmt.Errorf("MessageProcessor NewMessageProcessor() failed: %w", err)
} }
r.mp = p r.mp = p
if len(r.config.MessageProcessor) > 0 { if len(r.config.MessageProcessor) > 0 {
err = r.mp.FromConfigJSON(r.config.MessageProcessor) err = r.mp.FromConfigJSON(r.config.MessageProcessor)
if err != nil { if err != nil {
return fmt.Errorf("failed parsing JSON for message processor: %v", err.Error()) return fmt.Errorf("MessageProcessor FromConfigJSON() failed: %w", err)
} }
} }
for _, mname := range r.config.DropMetrics { for _, mname := range r.config.DropMetrics {
r.mp.AddDropMessagesByName(mname) err = r.mp.AddDropMessagesByName(mname)
if err != nil {
return fmt.Errorf("MessageProcessor AddDropMessagesByName() failed: %w", err)
}
} }
for _, cond := range r.config.DropMetricsIf { for _, cond := range r.config.DropMetricsIf {
r.mp.AddDropMessagesByCondition(cond) err = r.mp.AddDropMessagesByCondition(cond)
if err != nil {
return fmt.Errorf("MessageProcessor AddDropMessagesByCondition() failed: %w", err)
}
} }
for _, data := range r.config.AddTags { for _, data := range r.config.AddTags {
cond := data.Condition cond := data.Condition
if cond == "*" { if cond == "*" {
cond = "true" cond = "true"
} }
r.mp.AddAddTagsByCondition(cond, data.Key, data.Value) err = r.mp.AddAddTagsByCondition(cond, data.Key, data.Value)
if err != nil {
return fmt.Errorf("MessageProcessor AddAddTagsByCondition() failed: %w", err)
}
} }
for _, data := range r.config.DelTags { for _, data := range r.config.DelTags {
cond := data.Condition cond := data.Condition
if cond == "*" { if cond == "*" {
cond = "true" cond = "true"
} }
r.mp.AddDeleteTagsByCondition(cond, data.Key, data.Value) err = r.mp.AddDeleteTagsByCondition(cond, data.Key, data.Value)
if err != nil {
return fmt.Errorf("MessageProcessor AddDeleteTagsByCondition() failed: %w", err)
}
} }
for oldname, newname := range r.config.RenameMetrics { for oldname, newname := range r.config.RenameMetrics {
r.mp.AddRenameMetricByName(oldname, newname) err = r.mp.AddRenameMetricByName(oldname, newname)
if err != nil {
return fmt.Errorf("MessageProcessor AddRenameMetricByName() failed: %w", err)
}
} }
for metricName, prefix := range r.config.ChangeUnitPrefix { for metricName, prefix := range r.config.ChangeUnitPrefix {
r.mp.AddChangeUnitPrefix(fmt.Sprintf("name == '%s'", metricName), prefix) err = r.mp.AddChangeUnitPrefix(fmt.Sprintf("name == '%s'", metricName), prefix)
if err != nil {
return fmt.Errorf("MessageProcessor AddChangeUnitPrefix() failed: %w", err)
}
} }
r.mp.SetNormalizeUnits(r.config.NormalizeUnits) r.mp.SetNormalizeUnits(r.config.NormalizeUnits)
r.mp.AddAddTagsByCondition("true", r.config.HostnameTagName, r.hostname) err = r.mp.AddAddTagsByCondition("!msg.HasTag('"+r.config.HostnameTagName+"')", r.config.HostnameTagName, r.hostname)
if err != nil {
return fmt.Errorf("MessageProcessor AddAddTagsByCondition() failed: %w", err)
}
// r.config.dropMetrics = make(map[string]bool)
// for _, mname := range r.config.DropMetrics {
// r.config.dropMetrics[mname] = true
// }
return nil return nil
} }
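
Review note: the router `Init` above switches from `json.Unmarshal` to a `json.Decoder` with `DisallowUnknownFields()`, so a misspelled key in the router config now fails at startup instead of being ignored. The sketch below shows the effect on a hypothetical two-field subset of `metricRouterConfig`; `routerConfig`, `decodeStrict` and `forwardLimit` are illustrative names, and the misspelled key is made up for the demonstration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// routerConfig is a hypothetical subset of metricRouterConfig.
type routerConfig struct {
	MaxForward     int  `json:"max_forward"`
	NormalizeUnits bool `json:"normalize_units"`
}

// decodeStrict rejects JSON objects containing keys that do not map
// to a field of routerConfig.
func decodeStrict(raw []byte) (routerConfig, error) {
	var cfg routerConfig
	d := json.NewDecoder(bytes.NewReader(raw))
	d.DisallowUnknownFields()
	if err := d.Decode(&cfg); err != nil {
		return cfg, fmt.Errorf("failed to decode metric router config: %w", err)
	}
	return cfg, nil
}

// forwardLimit mirrors the max(1, ...) clamp used for r.maxForward.
func forwardLimit(cfg routerConfig) int {
	return max(1, cfg.MaxForward)
}

func main() {
	// "normalise_units" is a deliberately misspelled key.
	_, err := decodeStrict([]byte(`{"max_forward": 10, "normalise_units": true}`))
	fmt.Println(err)
}
```

The built-in `max` requires Go 1.21 or newer, which matches the use of the standard-library `slices` and `maps` packages elsewhere in this change.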
func getParamMap(point lp.CCMessage) map[string]interface{} { func getParamMap(point lp.CCMessage) map[string]any {
params := make(map[string]interface{}) params := make(map[string]any)
params["metric"] = point params["metric"] = point
params["name"] = point.Name() params["name"] = point.Name()
for key, value := range point.Tags() { for key, value := range point.Tags() {
@@ -180,14 +197,12 @@ func getParamMap(point lp.CCMessage) map[string]interface{} {
for key, value := range point.Meta() { for key, value := range point.Meta() {
params[key] = value params[key] = value
} }
for key, value := range point.Fields() { maps.Copy(params, point.Fields())
params[key] = value
}
params["timestamp"] = point.Time() params["timestamp"] = point.Time()
return params return params
} }
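
Review note: in `getParamMap` only the loop over `point.Fields()` is replaced by `maps.Copy`; the loops over tags and meta stay. A likely reason is that `maps.Copy` requires source and destination to share the same value type. The sketch below assumes tags are a `map[string]string` and fields a `map[string]any`, which is an assumption about `lp.CCMessage` and not confirmed by this diff; `buildParams` is a hypothetical name.

```go
package main

import (
	"fmt"
	"maps"
)

// buildParams is a hypothetical reduction of getParamMap.
func buildParams(name string, tags map[string]string, fields map[string]any) map[string]any {
	params := make(map[string]any)
	params["name"] = name
	// string values must be converted to any one by one; maps.Copy does
	// not compile for map[string]string -> map[string]any.
	for key, value := range tags {
		params[key] = value
	}
	// Same key and value types: maps.Copy overwrites existing keys in params.
	maps.Copy(params, fields)
	return params
}

func main() {
	p := buildParams("cpu_load",
		map[string]string{"type": "node"},
		map[string]any{"value": 1.5})
	fmt.Println(len(p), p["name"], p["type"], p["value"])
}
```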
// DoAddTags adds a tag when condition is fullfiled // DoAddTags adds a tag when condition is fulfilled
func (r *metricRouter) DoAddTags(point lp.CCMessage) { func (r *metricRouter) DoAddTags(point lp.CCMessage) {
var conditionMatches bool var conditionMatches bool
for _, m := range r.config.AddTags { for _, m := range r.config.AddTags {
@@ -209,83 +224,6 @@ func (r *metricRouter) DoAddTags(point lp.CCMessage) {
} }
} }
// DoDelTags removes a tag when condition is fullfiled
// func (r *metricRouter) DoDelTags(point lp.CCMessage) {
// var conditionMatches bool
// for _, m := range r.config.DelTags {
// if m.Condition == "*" {
// // Condition is always matched
// conditionMatches = true
// } else {
// // Evaluate condition
// var err error
// conditionMatches, err = agg.EvalBoolCondition(m.Condition, getParamMap(point))
// if err != nil {
// cclog.ComponentError("MetricRouter", err.Error())
// conditionMatches = false
// }
// }
// if conditionMatches {
// point.RemoveTag(m.Key)
// }
// }
// }
// Conditional test whether a metric should be dropped
// func (r *metricRouter) dropMetric(point lp.CCMessage) bool {
// // Simple drop check
// if conditionMatches, ok := r.config.dropMetrics[point.Name()]; ok {
// return conditionMatches
// }
// // Checking the dropping conditions
// for _, m := range r.config.DropMetricsIf {
// conditionMatches, err := agg.EvalBoolCondition(m, getParamMap(point))
// if err != nil {
// cclog.ComponentError("MetricRouter", err.Error())
// conditionMatches = false
// }
// if conditionMatches {
// return conditionMatches
// }
// }
// // No dropping condition met
// return false
// }
// func (r *metricRouter) prepareUnit(point lp.CCMessage) bool {
// if r.config.NormalizeUnits {
// if in_unit, ok := point.GetMeta("unit"); ok {
// u := units.NewUnit(in_unit)
// if u.Valid() {
// point.AddMeta("unit", u.Short())
// }
// }
// }
// if newP, ok := r.config.ChangeUnitPrefix[point.Name()]; ok {
// newPrefix := units.NewPrefix(newP)
// if in_unit, ok := point.GetMeta("unit"); ok && newPrefix != units.InvalidPrefix {
// u := units.NewUnit(in_unit)
// if u.Valid() {
// cclog.ComponentDebug("MetricRouter", "Change prefix to", newP, "for metric", point.Name())
// conv, out_unit := units.GetUnitPrefixFactor(u, newPrefix)
// if conv != nil && out_unit.Valid() {
// if val, ok := point.GetField("value"); ok {
// point.AddField("value", conv(val))
// point.AddMeta("unit", out_unit.Short())
// }
// }
// }
// }
// }
// return true
// }
// Start starts the metric router // Start starts the metric router
func (r *metricRouter) Start() { func (r *metricRouter) Start() {
// start timer if configured // start timer if configured
@@ -301,31 +239,9 @@ func (r *metricRouter) Start() {
cclog.ComponentDebug("MetricRouter", "DONE") cclog.ComponentDebug("MetricRouter", "DONE")
} }
// Forward takes a received metric, adds or deletes tags // Forward message received from collector channel
// and forwards it to the output channels
// forward := func(point lp.CCMessage) {
// cclog.ComponentDebug("MetricRouter", "FORWARD", point)
// r.DoAddTags(point)
// r.DoDelTags(point)
// name := point.Name()
// if new, ok := r.config.RenameMetrics[name]; ok {
// point.SetName(new)
// point.AddMeta("oldname", name)
// r.DoAddTags(point)
// r.DoDelTags(point)
// }
// r.prepareUnit(point)
// for _, o := range r.outputs {
// o <- point
// }
// }
// Foward message received from collector channel
coll_forward := func(p lp.CCMessage) { coll_forward := func(p lp.CCMessage) {
// receive from metric collector // receive from metric collector
//p.AddTag(r.config.HostnameTagName, r.hostname)
if r.config.IntervalStamp { if r.config.IntervalStamp {
p.SetTime(r.timestamp) p.SetTime(r.timestamp)
} }
@@ -335,11 +251,6 @@ func (r *metricRouter) Start() {
o <- m o <- m
} }
} }
// if !r.dropMetric(p) {
// for _, o := range r.outputs {
// o <- point
// }
// }
// even if the metric is dropped, it is stored in the cache for // even if the metric is dropped, it is stored in the cache for
// aggregations // aggregations
if r.config.NumCacheIntervals > 0 { if r.config.NumCacheIntervals > 0 {
@@ -359,9 +270,6 @@ func (r *metricRouter) Start() {
o <- m o <- m
} }
} }
// if !r.dropMetric(p) {
// forward(p)
// }
} }
// Forward message received from cache channel // Forward message received from cache channel
@@ -380,10 +288,7 @@ func (r *metricRouter) Start() {
r.cache.Start() r.cache.Start()
} }
r.wg.Add(1) r.wg.Go(func() {
go func() {
defer r.wg.Done()
for { for {
select { select {
case <-r.done: case <-r.done:
@@ -413,7 +318,7 @@ func (r *metricRouter) Start() {
} }
} }
} }
}() })
cclog.ComponentDebug("MetricRouter", "STARTED") cclog.ComponentDebug("MetricRouter", "STARTED")
} }


@@ -13,11 +13,11 @@ import (
"os" "os"
"path/filepath" "path/filepath"
"regexp" "regexp"
"slices"
"strconv" "strconv"
"strings" "strings"
cclogger "github.com/ClusterCockpit/cc-lib/ccLogger" cclogger "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"golang.org/x/exp/slices"
) )
const SYSFS_CPUBASE = `/sys/devices/system/cpu` const SYSFS_CPUBASE = `/sys/devices/system/cpu`
@@ -51,14 +51,13 @@ var cache struct {
func fileToInt(path string) int { func fileToInt(path string) int {
buffer, err := os.ReadFile(path) buffer, err := os.ReadFile(path)
if err != nil { if err != nil {
log.Print(err) cclogger.ComponentError("ccTopology", fmt.Sprintf("fileToInt(): Reading \"%s\": %v", path, err))
cclogger.ComponentError("ccTopology", "fileToInt", "Reading", path, ":", err.Error())
return -1 return -1
} }
stringBuffer := strings.TrimSpace(string(buffer)) stringBuffer := strings.TrimSpace(string(buffer))
id, err := strconv.Atoi(stringBuffer) id, err := strconv.Atoi(stringBuffer)
if err != nil { if err != nil {
cclogger.ComponentError("ccTopology", "fileToInt", "Parsing", path, ":", stringBuffer, err.Error()) cclogger.ComponentError("ccTopology", fmt.Sprintf("fileToInt(): Parsing \"%s\": %v", stringBuffer, err))
return -1 return -1
} }
return id return id
@@ -80,7 +79,7 @@ func fileToList(path string) []int {
// Create list // Create list
list := make([]int, 0) list := make([]int, 0)
stringBuffer := strings.TrimSpace(string(buffer)) stringBuffer := strings.TrimSpace(string(buffer))
for _, valueRangeString := range strings.Split(stringBuffer, ",") { for valueRangeString := range strings.SplitSeq(stringBuffer, ",") {
valueRange := strings.Split(valueRangeString, "-") valueRange := strings.Split(valueRangeString, "-")
switch len(valueRange) { switch len(valueRange) {
case 1: case 1:
@@ -112,79 +111,76 @@ func fileToList(path string) []int {
// init initializes the cache structure // init initializes the cache structure
func init() { func init() {
getHWThreads := func() []int {
globPath := filepath.Join(SYSFS_CPUBASE, "cpu[0-9]*")
regexPath := filepath.Join(SYSFS_CPUBASE, "cpu([[:digit:]]+)")
regex := regexp.MustCompile(regexPath)
getHWThreads := // File globbing for hardware threads
func() []int { files, err := filepath.Glob(globPath)
globPath := filepath.Join(SYSFS_CPUBASE, "cpu[0-9]*") if err != nil {
regexPath := filepath.Join(SYSFS_CPUBASE, "cpu([[:digit:]]+)") cclogger.ComponentError("CCTopology", "init:getHWThreads", err.Error())
regex := regexp.MustCompile(regexPath) return nil
}
// File globbing for hardware threads hwThreadIDs := make([]int, len(files))
files, err := filepath.Glob(globPath) for i, file := range files {
if err != nil { // Extract hardware thread ID
cclogger.ComponentError("CCTopology", "init:getHWThreads", err.Error()) matches := regex.FindStringSubmatch(file)
if len(matches) != 2 {
cclogger.ComponentError("CCTopology", "init:getHWThreads: Failed to extract hardware thread ID from ", file)
return nil return nil
} }
hwThreadIDs := make([]int, len(files)) // Convert hardware thread ID to int
for i, file := range files {
// Extract hardware thread ID
matches := regex.FindStringSubmatch(file)
if len(matches) != 2 {
cclogger.ComponentError("CCTopology", "init:getHWThreads: Failed to extract hardware thread ID from ", file)
return nil
}
// Convert hardware thread ID to int
id, err := strconv.Atoi(matches[1])
if err != nil {
cclogger.ComponentError("CCTopology", "init:getHWThreads: Failed to convert to int hardware thread ID ", matches[1])
return nil
}
hwThreadIDs[i] = id
}
// Sort hardware thread IDs
slices.Sort(hwThreadIDs)
return hwThreadIDs
}
getNumaDomain :=
func(basePath string) int {
globPath := filepath.Join(basePath, "node*")
regexPath := filepath.Join(basePath, "node([[:digit:]]+)")
regex := regexp.MustCompile(regexPath)
// File globbing for NUMA node
files, err := filepath.Glob(globPath)
if err != nil {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", err.Error())
return -1
}
// Check, that exactly one NUMA domain was found
if len(files) != 1 {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", "Number of NUMA domains != 1: ", len(files))
return -1
}
// Extract NUMA node ID
matches := regex.FindStringSubmatch(files[0])
if len(matches) != 2 {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", "Failed to extract NUMA node ID from: ", files[0])
return -1
}
id, err := strconv.Atoi(matches[1]) id, err := strconv.Atoi(matches[1])
if err != nil { if err != nil {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", "Failed to parse NUMA node ID from: ", matches[1]) cclogger.ComponentError("CCTopology", "init:getHWThreads: Failed to convert to int hardware thread ID ", matches[1])
return -1 return nil
} }
return id hwThreadIDs[i] = id
} }
// Sort hardware thread IDs
slices.Sort(hwThreadIDs)
return hwThreadIDs
}
getNumaDomain := func(basePath string) int {
globPath := filepath.Join(basePath, "node*")
regexPath := filepath.Join(basePath, "node([[:digit:]]+)")
regex := regexp.MustCompile(regexPath)
// File globbing for NUMA node
files, err := filepath.Glob(globPath)
if err != nil {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", err.Error())
return -1
}
// Check, that exactly one NUMA domain was found
if len(files) != 1 {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", "Number of NUMA domains != 1: ", len(files))
return -1
}
// Extract NUMA node ID
matches := regex.FindStringSubmatch(files[0])
if len(matches) != 2 {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", "Failed to extract NUMA node ID from: ", files[0])
return -1
}
id, err := strconv.Atoi(matches[1])
if err != nil {
cclogger.ComponentError("CCTopology", "init:getNumaDomain", "Failed to parse NUMA node ID from: ", matches[1])
return -1
}
return id
}
cache.HwthreadList = getHWThreads() cache.HwthreadList = getHWThreads()
cache.CoreList = make([]int, len(cache.HwthreadList)) cache.CoreList = make([]int, len(cache.HwthreadList))
cache.SocketList = make([]int, len(cache.HwthreadList)) cache.SocketList = make([]int, len(cache.HwthreadList))
@@ -219,16 +215,15 @@ func init() {
// Lookup NUMA domain id // Lookup NUMA domain id
cache.NumaDomainList[i] = getNumaDomain(cpuBase) cache.NumaDomainList[i] = getNumaDomain(cpuBase)
cache.CpuData[i] = cache.CpuData[i] = HwthreadEntry{
HwthreadEntry{ CpuID: cache.HwthreadList[i],
CpuID: cache.HwthreadList[i], SMT: cache.SMTList[i],
SMT: cache.SMTList[i], CoreCPUsList: coreCPUsList,
CoreCPUsList: coreCPUsList, Socket: cache.SocketList[i],
Socket: cache.SocketList[i], NumaDomain: cache.NumaDomainList[i],
NumaDomain: cache.NumaDomainList[i], Die: cache.DieList[i],
Die: cache.DieList[i], Core: cache.CoreList[i],
Core: cache.CoreList[i], }
}
} }
slices.Sort(cache.HwthreadList) slices.Sort(cache.HwthreadList)
@@ -260,12 +255,6 @@ func HwthreadList() []int {
return slices.Clone(cache.HwthreadList) return slices.Clone(cache.HwthreadList)
} }
// Get list of hardware thread IDs in the order of listing in /proc/cpuinfo
// Deprecated! Use HwthreadList()
func CpuList() []int {
return HwthreadList()
}
// CoreList gets the list of CPU core IDs in the order of listing in /proc/cpuinfo // CoreList gets the list of CPU core IDs in the order of listing in /proc/cpuinfo
func CoreList() []int { func CoreList() []int {
return slices.Clone(cache.CoreList) return slices.Clone(cache.CoreList)
@@ -304,20 +293,19 @@ func GetTypeList(topology_type string) []int {
} }
func GetTypeId(hwt HwthreadEntry, topology_type string) (int, error) { func GetTypeId(hwt HwthreadEntry, topology_type string) (int, error) {
var err error = nil
switch topology_type { switch topology_type {
case "node": case "node":
return 0, err return 0, nil
case "socket": case "socket":
return hwt.Socket, err return hwt.Socket, nil
case "die": case "die":
return hwt.Die, err return hwt.Die, nil
case "memoryDomain": case "memoryDomain":
return hwt.NumaDomain, err return hwt.NumaDomain, nil
case "core": case "core":
return hwt.Core, err return hwt.Core, nil
case "hwthread": case "hwthread":
return hwt.CpuID, err return hwt.CpuID, nil
} }
return -1, fmt.Errorf("unknown topology type '%s'", topology_type) return -1, fmt.Errorf("unknown topology type '%s'", topology_type)
} }
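
Review note on the `GetTypeId` hunk: each known topology type now returns a literal `nil` error, and only the fall-through path builds an error. The sketch below shows how a caller might handle both outcomes. `hwthreadEntry` and `typeID` are trimmed local stand-ins that mirror the fields and switch cases visible in the diff; they are assumptions for illustration, not the package's exported API.

```go
package main

import "fmt"

// hwthreadEntry is a reduced local stand-in for the HwthreadEntry struct
// shown in the diff (only the fields used by the lookup are kept).
type hwthreadEntry struct {
	CpuID, Core, Die, Socket, NumaDomain int
}

// typeID mirrors the switch in the diff: known types return (id, nil),
// anything else returns -1 and an error.
func typeID(hwt hwthreadEntry, topologyType string) (int, error) {
	switch topologyType {
	case "node":
		return 0, nil
	case "socket":
		return hwt.Socket, nil
	case "die":
		return hwt.Die, nil
	case "memoryDomain":
		return hwt.NumaDomain, nil
	case "core":
		return hwt.Core, nil
	case "hwthread":
		return hwt.CpuID, nil
	}
	return -1, fmt.Errorf("unknown topology type '%s'", topologyType)
}

func main() {
	hwt := hwthreadEntry{CpuID: 5, Core: 2, Socket: 1}
	// "rack" is deliberately not a supported type, to exercise the error path.
	for _, t := range []string{"socket", "hwthread", "rack"} {
		id, err := typeID(hwt, t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d\n", t, id)
	}
}
```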

@@ -10,7 +10,7 @@ package multiChanTicker
import ( import (
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
) )
type multiChanTicker struct { type multiChanTicker struct {
@@ -21,7 +21,7 @@ type multiChanTicker struct {
type MultiChanTicker interface { type MultiChanTicker interface {
Init(duration time.Duration) Init(duration time.Duration)
AddChannel(chan time.Time) AddChannel(channel chan time.Time)
Close() Close()
} }

@@ -30,11 +30,11 @@ make
%install %install
install -Dpm 0750 %{name} %{buildroot}%{_bindir}/%{name} install -Dpm 0750 %{name} %{buildroot}%{_bindir}/%{name}
install -Dpm 0600 config.json %{buildroot}%{_sysconfdir}/%{name}/%{name}.json install -Dpm 0600 example-configs/config.json %{buildroot}%{_sysconfdir}/%{name}/%{name}.json
install -Dpm 0600 collectors.json %{buildroot}%{_sysconfdir}/%{name}/collectors.json install -Dpm 0600 example-configs/collectors.json %{buildroot}%{_sysconfdir}/%{name}/collectors.json
install -Dpm 0600 sinks.json %{buildroot}%{_sysconfdir}/%{name}/sinks.json install -Dpm 0600 example-configs/sinks.json %{buildroot}%{_sysconfdir}/%{name}/sinks.json
install -Dpm 0600 receivers.json %{buildroot}%{_sysconfdir}/%{name}/receivers.json install -Dpm 0600 example-configs/receivers.json %{buildroot}%{_sysconfdir}/%{name}/receivers.json
install -Dpm 0600 router.json %{buildroot}%{_sysconfdir}/%{name}/router.json install -Dpm 0600 example-configs/router.json %{buildroot}%{_sysconfdir}/%{name}/router.json
install -Dpm 0644 scripts/%{name}.service %{buildroot}%{_unitdir}/%{name}.service install -Dpm 0644 scripts/%{name}.service %{buildroot}%{_unitdir}/%{name}.service
install -Dpm 0600 scripts/%{name}.config %{buildroot}%{_sysconfdir}/default/%{name} install -Dpm 0600 scripts/%{name}.config %{buildroot}%{_sysconfdir}/default/%{name}
install -Dpm 0644 scripts/%{name}.sysusers %{buildroot}%{_sysusersdir}/%{name}.conf install -Dpm 0644 scripts/%{name}.sysusers %{buildroot}%{_sysusersdir}/%{name}.conf