cc-metric-collector/collectors
Thomas Gruber 6ab45dd3ec
Merge develop into main (#109)
* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update to line-protocol/v2

* Update runonce.yml with Golang 1.20

* Update fsnotify in LIKWID Collector

* Use not a pointer to line-protocol.Encoder

* Simplify Makefile

* Use only as many arguments as required

* Allow sum function to handle non float types

* Allow values to be a slice of type float64, float32, int, int64, int32, bool

* Use generic function to simplify code

* Add missing case for type []int32

* Use generic function to compute minimum

* Use generic function to compute maximum

* Use generic function to compute average

* Add error value to sumAnyType

* Use generic function to compute median

* For older versions of go slices is not part of the installation

* Remove old entries from go.sum

* Use simpler sort function

* Compute metrics ib_total and ib_total_pkts

* Add aggregated metrics.
Add missing units

* Update likwidMetric.go

Fixes a potential bug when `fsnotify.NewWatcher()` fails with an error

* Completely avoid memory allocations in infinibandMetric read()

* Fixed initialization: Initialization and measurements should run in the same thread

* Add safe.directory to Release action

* Fix path to /usr/bin after installation

* ioutil.ReadFile is deprecated: As of Go 1.16, this function simply calls os.ReadFile

* Switch to package slices from the golang 1.21 default library

* Read file line by line

* Read file line by line

* Read file line by line

* Use CamelCase

* Use CamelCase

* Fix function getNumaDomain, it always returned 0

* Avoid type conversion by using Atoi
Avoid copying structs by using pointer access
Increase readability with CamelCase variable names

* Add caching

* Cache CpuData

* Cleanup

* Use init function to initialize cache structure to avoid multi-threading problems

* Reuse information from /proc/cpuinfo

* Avoid slice cloning. Directly use the cache

* Add DieList

* Add NumaDomainList and SMTList

* Cleanup

* Add comment

* Lookup core ID from /sys/devices/system/cpu, /proc/cpuinfo is not portable

* Lookup all information from /sys/devices/system/cpu, /proc/cpuinfo is not portable

* Correctly handle lists from /sys

* Add Simultaneous Multithreading siblings

* Replace deprecated thread_siblings_list by core_cpus_list

* Reduce number of required slices

* Allow to send total values per core, socket and node

* Send all metrics with same time stamp
calcEventsetMetrics only performs computation, counter measurement is done before

* Input parameters should be float64 when evaluating to float64

* Send all metrics with same time stamp
calcGlobalMetrics only performs computation, counter measurement is done before

* Remove unused variable gmresults

* Add comments

* Updated go packages

* Add build with golang 1.21

* Switch to checkout action version 4

* Switch to setup-go action version 4

* Add workflow_dispatch to allow manual run of workflow

* Add workflow_dispatch to allow manual run of workflow

* Add release build jobs to runonce.yml

* Switch to golang 1.20 for RHEL based distributions

* Use dnf to download golang

* Remove golang versions before 1.20

* Upgrade Ubuntu focal -> jammy

* Pipe golang tar package directly to tar

* Update golang version

* Fix Ubuntu version number

* Add links to ipmi and redfish receivers

* Fix http server addr format

* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol

* Corrected spelling

* Add some comments

* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol

* Allow other fields not only field "value"

* Add some basic debugging documentation

* Add some basic debugging documentation

* Use a lock for the flush timer

* Add tags in lexical order as required by AddTag()

* Only access meta data, when it gets used as tag

* Use slice to store lexically ordered key value pairs

* Increase golang version requirement to 1.20.

* Avoid package cmp to allow builds with golang v1.20

* Fix: Error NVML library not found did crash
cc-metric-collector with "SIGSEGV: segmentation violation"

* Add config option idle_timeout

* Add basic authentication support

* Add basic authentication support

* Avoid unnecessary memory allocations

* Add documentation for send_*_total values

* Use generic package maps to clone maps

* Reuse flush timer

* Add Influx client options

* Reuse ccTopology functionality

* Do not store unused topology information

* Add batch_size config

* Cleanup

* Use stype and stype-id for the NIC in NetstatCollector

* Wait for concurrent flush operations to finish

* Be more verbose in error messages

* Reverted previous changes.
They made the code too complex without much advantage

* Use line protocol encoder

* Go pkg update

* Stop flush timer when flushing immediately

* Fix: Corrected unlock access to batch slice

* Add config option to specify whether to use GZip compression in influx write requests

* Add asynchronous send of encoder metrics

* Use DefaultServeMux instead of github.com/gorilla/mux

* Add config option for HTTP keep-alives

* Be more strict, when parsing json

* Add config option for HTTP request timeout and Retry interval

* Allow more than one background send operation

* Fix %sysusers_create_package args (#108)

%sysusers_create_package requires two arguments. See: https://github.com/systemd/systemd/blob/main/src/rpm/macros.systemd.in#L165

* Add nfsiostat to list of collectors

---------

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Holger Obermaier <holgerob@gmx.de>
Co-authored-by: Obihörnchen <obihoernchende@gmail.com>
2023-12-04 12:21:26 +01:00
..
beegfsmetaMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
beegfsmetaMetric.md
beegfsstorageMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
beegfsstorageMetric.md
collectorManager.go Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
cpufreqCpuinfoMetric.go Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
cpufreqCpuinfoMetric.md Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
cpufreqMetric.go Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
cpufreqMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
cpustatMetric.go Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
cpustatMetric.md Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
customCmdMetric.go Use customcmd commands if they did not error. (#101) 2023-02-28 12:02:01 +01:00
customCmdMetric.md
diskstatMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
diskstatMetric.md
gpfsMetric.go Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
gpfsMetric.md Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
infinibandMetric.go Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
infinibandMetric.md Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
iostatMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
iostatMetric.md
ipmiMetric.go Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
ipmiMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
likwidMetric.go Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
likwidMetric.md Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
loadavgMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
loadavgMetric.md
lustreMetric.go Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
lustreMetric.md
Makefile Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
memstatMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
memstatMetric.md
metricCollector.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
netstatMetric.go Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
netstatMetric.md Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
nfs3Metric.md
nfs4Metric.md
nfsiostatMetric.go Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
nfsiostatMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
nfsMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
numastatsMetric.go Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
numastatsMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
nvidiaMetric.go Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
nvidiaMetric.md
raplMetric.go Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
raplMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
README.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
rocmsmiMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
rocmsmiMetric.md
sampleMetric.go Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
sampleTimerMetric.go Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
schedstatMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
schedstatMetric.md cpustatMetric.go: Use derived values instead of absolute values (#83) 2022-09-07 14:13:06 +02:00
selfMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
selfMetric.md Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
tempMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
tempMetric.md
topprocsMetric.go Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
topprocsMetric.md

CCMetric collectors

This folder contains the collectors for the cc-metric-collector.

Configuration

{
    "collector_type" : {
        <collector specific configuration>
    }
}

In contrast to the configuration files for sinks and receivers, the collector configuration is not a list but a single JSON object that maps each collector type to its collector-specific configuration. This is required because we have not found a way to read the type first and load the remaining configuration afterwards. We intend to change this to the same format as sinks and receivers.
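
The layout described above can be decoded in two steps: first into a map from collector type to raw JSON, then each raw block into the collector's own config struct. The following is a minimal sketch of the first step only, not the code used by the collector manager. The function name `parseCollectorConfig` and the keys and options in the example input are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// parseCollectorConfig decodes the top-level JSON object into a map from
// collector type to the still undecoded collector-specific configuration.
// (Hypothetical helper, not part of cc-metric-collector.)
func parseCollectorConfig(data []byte) (map[string]json.RawMessage, error) {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// Illustrative input: two collector types, one with an empty config
	data := []byte(`{
		"cpustat": {},
		"memstat": {"exclude_metrics": ["mem_shared"]}
	}`)

	cfg, err := parseCollectorConfig(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Map iteration order is random in Go, so sort the keys for stable output
	names := make([]string, 0, len(cfg))
	for name := range cfg {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		// Each raw block would be passed to the matching collector's Init()
		fmt.Printf("%s -> %s\n", name, cfg[name])
	}
}
```

Because the keys of a JSON object are unique, this layout allows at most one configuration block per collector type.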

Available collectors

Todos

  • Aggregate metrics to a higher topology entity (sum hwthread metrics to a socket metric, ...). Needs to be configurable

Contributing own collectors

A collector reads data from any source, parses it into metrics and submits these metrics to the metric-collector. A collector provides the following functions:

  • Name() string: Return the name of the collector
  • Init(config json.RawMessage) error: Initializes the collector using the given collector-specific config in JSON. Check if needed files/commands exist, ...
  • Initialized() bool: Check if a collector is successfully initialized
  • Read(duration time.Duration, output chan ccMetric.CCMetric): Read, parse and submit data to the output channel as CCMetric. If the collector has to measure anything for some duration, use the provided function argument duration.
  • Close(): Closes down the collector.

It is recommended to call setup() in the Init() function.

Finally, the collector needs to be registered in collectorManager.go. It contains a map called AvailableCollectors (collector_type_string -> pointer to MetricCollector interface). Add a new entry with a descriptive name and the new collector.
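
Registration amounts to one additional map entry. The following sketch only illustrates the pattern with simplified stand-in types. The reduced `Collector` interface, the lowercase map `availableCollectors`, the key `"sample"` and the helper `lookup` are assumptions made for this example and do not reproduce collectorManager.go.

```go
package main

import "fmt"

// Collector is reduced to a single method to keep the sketch short.
type Collector interface {
	Name() string
}

type sampleCollector struct{}

func (s *sampleCollector) Name() string { return "SampleCollector" }

// availableCollectors maps the collector type string used in the
// configuration file to a collector instance.
var availableCollectors = map[string]Collector{
	"sample": new(sampleCollector), // entry added for the new collector
}

// lookup returns the collector registered for the given type string.
func lookup(collectorType string) (Collector, error) {
	c, ok := availableCollectors[collectorType]
	if !ok {
		return nil, fmt.Errorf("unknown collector type %q", collectorType)
	}
	return c, nil
}

func main() {
	if c, err := lookup("sample"); err == nil {
		fmt.Println("found collector:", c.Name())
	}
	if _, err := lookup("missing"); err != nil {
		fmt.Println(err)
	}
}
```

The map key is the string users write as collector type in the configuration file, so it should be short and descriptive.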

Sample collector

package collectors

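import (
    "encoding/json"
    "fmt"
    "time"

    // The import path of the logger is an assumption, adjust it to the repository layout
    cclog "github.com/ClusterCockpit/cc-metric-collector/pkg/ccLogger"
    lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
)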

// Struct for the collector-specific JSON config
type SampleCollectorConfig struct {
    ExcludeMetrics []string `json:"exclude_metrics"`
}

type SampleCollector struct {
    metricCollector
    config SampleCollectorConfig
}

func (m *SampleCollector) Init(config json.RawMessage) error {
    // Check if already initialized
    if m.init {
        return nil
    }

    m.name = "SampleCollector"
    m.setup()
    if len(config) > 0 {
        err := json.Unmarshal(config, &m.config)
        if err != nil {
            return err
        }
    }
    m.meta = map[string]string{"source": m.name, "group": "Sample"}

    m.init = true
    return nil
}

func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric) {
    if !m.init {
        return
    }
    // Tags for the metric. If type != node, use the proper type and type-id
    tags := map[string]string{"type": "node"}

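    // GetMetric() is a placeholder for the collector's own data acquisition
    x, err := GetMetric()
    if err != nil {
        cclog.ComponentError(m.name, fmt.Sprintf("Read(): %v", err))
        // Do not submit a metric if reading failed
        return
    }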

    // Each metric has exactly one field: value
    value := map[string]interface{}{"value": int64(x)}
    if y, err := lp.New("sample_metric", tags, m.meta, value, time.Now()); err == nil {
        output <- y
    }
}

func (m *SampleCollector) Close() {
    // Mark the collector as uninitialized so that Read() returns early
    m.init = false
}