cc-metric-collector/collectors/infinibandMetric.md

ibstat collector

  "ibstat": {
    "exclude_devices": [
      "mlx4"
    ],
    "send_abs_values": true,
    "send_total_values": true,
    "send_derived_values": true
  }
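
As an illustration of how this configuration fragment maps onto typed options, here is a minimal Go sketch that decodes it with encoding/json. The type and function names (IbstatConfig, ParseIbstatConfig) are hypothetical and not taken from the collector's source. The send_total_values key comes from the metric list in this document, and wrapping the fragment in an enclosing JSON object is an assumption.

```go
package main

import "encoding/json"

// IbstatConfig mirrors the options shown in this document.
// The Go field names are illustrative, not the collector's own.
type IbstatConfig struct {
	ExcludeDevices    []string `json:"exclude_devices"`
	SendAbsValues     bool     `json:"send_abs_values"`
	SendTotalValues   bool     `json:"send_total_values"`
	SendDerivedValues bool     `json:"send_derived_values"`
}

// ParseIbstatConfig extracts the "ibstat" entry from an enclosing
// JSON object (assumed layout: {"ibstat": {...}}).
// Options missing from the input keep their zero value (false / nil).
func ParseIbstatConfig(data []byte) (IbstatConfig, error) {
	var wrapper struct {
		Ibstat IbstatConfig `json:"ibstat"`
	}
	err := json.Unmarshal(data, &wrapper)
	return wrapper.Ibstat, err
}
```

In this sketch an option that is absent from the JSON decodes to false. Whether the collector itself uses different defaults is not stated here.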

The ibstat collector includes all InfiniBand devices found below /sys/class/infiniband/ for which at least one port provides a LID file (/sys/class/infiniband/<device>/ports/<port>/lid).

The devices can be filtered with the exclude_devices option in the configuration.

For each LID found, the collector reads data from the sysfs files below /sys/class/infiniband/<device>. (See: https://www.kernel.org/doc/Documentation/ABI/stable/sysfs-class-infiniband)

Metrics:

  • ib_recv
  • ib_xmit
  • ib_recv_pkts
  • ib_xmit_pkts
  • ib_total = ib_recv + ib_xmit (if send_total_values == true)
  • ib_total_pkts = ib_recv_pkts + ib_xmit_pkts (if send_total_values == true)
  • ib_recv_bw (if send_derived_values == true)
  • ib_xmit_bw (if send_derived_values == true)
  • ib_recv_pkts_bw (if send_derived_values == true)
  • ib_xmit_pkts_bw (if send_derived_values == true)

The collector adds a device tag to all metrics.