cc-metric-collector/collectors
Thomas Gruber 7840de7b82
Merge develop branch into main (#123)
* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update cc-metric-collector.init

* Allow selection of timestamp precision in HttpSink

* Add comment about precision requirement for cc-metric-store

* Fix for API changes in gofish@v0.15.0

* Update requirements to latest version

* Read sensors through redfish

* Update golang toolchain to 1.21

* Remove stray error check

* Update main config in configuration.md

* Update Release action to use golang 1.22 stable release, no golang RPMs anymore

* Update runonce action to use golang 1.22 stable release, no golang RPMs anymore

* Update README.md

Use right JSON type in configuration

* Update sink's README

* Test whether ipmitool or ipmi-sensors can be executed without errors

* Little fixes to the prometheus sink (#115)

* Add uint64 to float64 cast option

* Add prometheus sink to the list of available sinks

* Add aggregated counters by gpu for nvlink errors

---------

Co-authored-by: Michael Schwarz <schwarz@uni-paderborn.de>

* Ccmessage migration (#119)

* Switch to CCMessage for all files.

---------

Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

* Switch to ccmessage also for latest additions in nvidiaMetric

* New Message processor (#118)

* New message processor to check whether a message should be dropped or manipulate it in flight

* Create a copy of message before manipulation

---------

Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

* Update collector's Makefile and go.mod/sum files

* Use message processor in router, all sinks and all receivers

* Add support for credential file (NKEY) to NATS sink and receiver

* Fix JSON keys in message processor configuration

* Update docs for message processor, router and the default router config file

* Add link to expr syntax and fix regex matching docs

* Update sample collectors

* Minor style change in collector manager

* Some helpers for ccTopology

* LIKWID collector: write log owner change only once

* Fix for metrics without units and reduce debugging messages for messageProcessor

* Use shorted hostname for hostname added by router

* Define default port for NATS

* CPUstat collector: only add unit for applicable metrics

* Add precision option to all sinks using Influx's encoder

* Add message processor to all sink documentation

* Add units to documentation of cpustat collector

---------

Co-authored-by: Holger Obermaier <Holger.Obermaier@kit.edu>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: oscarminus <me@oscarminus.de>
Co-authored-by: Michael Schwarz <schwarz@uni-paderborn.de>
2024-12-19 23:00:14 +01:00
..
beegfsmetaMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
beegfsmetaMetric.md Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
beegfsstorageMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
beegfsstorageMetric.md Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
collectorManager.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
cpufreqCpuinfoMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
cpufreqCpuinfoMetric.md Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
cpufreqMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
cpufreqMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
cpustatMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
cpustatMetric.md Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
customCmdMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
customCmdMetric.md
diskstatMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
diskstatMetric.md
gpfsMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
gpfsMetric.md Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
infinibandMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
infinibandMetric.md Merge develop branch into main (#106) 2023-08-29 14:12:49 +02:00
iostatMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
iostatMetric.md
ipmiMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
ipmiMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
likwidMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
likwidMetric.md Update likwidMetric.md 2024-10-08 13:36:46 +02:00
loadavgMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
loadavgMetric.md
lustreMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
lustreMetric.md Derived metrics (#65) 2022-03-15 16:09:47 +01:00
Makefile Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
memstatMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
memstatMetric.md
metricCollector.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
netstatMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
netstatMetric.md Merge develop into main (#109) 2023-12-04 12:21:26 +01:00
nfs3Metric.md
nfs4Metric.md
nfsiostatMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
nfsiostatMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
nfsMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
numastatsMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
numastatsMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
nvidiaMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
nvidiaMetric.md Option to use MIG slice name as subtype-id in NvidiaCollector 2022-05-13 15:26:47 +02:00
raplMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
raplMetric.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
README.md Merge develop branch into main (#96) 2022-12-14 17:02:39 +01:00
rocmsmiMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
rocmsmiMetric.md AMD ROCm SMI collector (#77) 2022-05-25 15:55:43 +02:00
sampleMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
sampleTimerMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
schedstatMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
schedstatMetric.md cpustatMetric.go: Use derived values instead of absolute values (#83) 2022-09-07 14:13:06 +02:00
selfMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
selfMetric.md Add latest development to main branch (#89) 2022-10-10 12:23:51 +02:00
tempMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
tempMetric.md
topprocsMetric.go Merge develop branch into main (#123) 2024-12-19 23:00:14 +01:00
topprocsMetric.md

CCMetric collectors

This folder contains the collectors for the cc-metric-collector.

Configuration

{
    "collector_type" : {
        <collector specific configuration>
    }
}

In contrast to the configuration files for sinks and receivers, the collectors' configuration is not a list but a single JSON object that maps each collector type to its own configuration object. This is required because we didn't manage to partially read the type before loading the remaining configuration. We are eager to change this to the same format as the sinks and receivers.
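A minimal sketch of how such a configuration could be split into one raw JSON blob per collector type, so that each blob can later be handed to the matching collector's Init(). The function name parseCollectorConfig and the example keys are assumptions for illustration, not code taken from the collector manager.

```go
package main

import (
    "encoding/json"
    "fmt"
    "sort"
)

// parseCollectorConfig (hypothetical name) splits the top-level JSON object
// into one raw, still undecoded JSON value per collector type.
func parseCollectorConfig(data []byte) (map[string]json.RawMessage, error) {
    var cfg map[string]json.RawMessage
    if err := json.Unmarshal(data, &cfg); err != nil {
        return nil, err
    }
    return cfg, nil
}

func main() {
    // Illustrative file content; the keys and options are assumptions
    data := []byte(`{
        "cpustat": {},
        "memstat": {"exclude_metrics": ["mem_used"]}
    }`)

    cfg, err := parseCollectorConfig(data)
    if err != nil {
        fmt.Println("error:", err)
        return
    }

    // Sort the collector types to get a stable print order
    names := make([]string, 0, len(cfg))
    for name := range cfg {
        names = append(names, name)
    }
    sort.Strings(names)
    for _, name := range names {
        fmt.Printf("%s -> %s\n", name, cfg[name])
    }
}
```

Keeping each value as json.RawMessage defers decoding, so only the collector itself needs to know the layout of its own options.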

Available collectors

Todos

  • Aggregate metrics to a higher topology entity (sum hwthread metrics to socket metric, ...). Needs to be configurable

Contributing own collectors

A collector reads data from any source, parses it into metrics and submits these metrics to the metric-collector. A collector provides five functions:

  • Name() string: Return the name of the collector
  • Init(config json.RawMessage) error: Initializes the collector using the given collector-specific config in JSON. Check if needed files/commands exist, ...
  • Initialized() bool: Check if a collector is successfully initialized
  • Read(duration time.Duration, output chan ccMetric.CCMetric): Read, parse and submit data to the output channel as CCMetric. If the collector has to measure anything for some duration, use the provided function argument duration.
  • Close(): Closes down the collector.

It is recommended to call setup() in the Init() function.

Finally, the collector needs to be registered in collectorManager.go. There, the map AvailableCollectors (collector_type_string -> pointer to a type implementing the MetricCollector interface) lists all known collectors. Add a new entry with a descriptive name and the new collector.
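The registration step could look roughly like the sketch below. It uses a reduced stand-in interface and a lowercase map name to make clear that this is not the actual content of collectorManager.go; myCollector, the key "mycollector" and lookup are hypothetical.

```go
package main

import "fmt"

// Collector is a reduced stand-in for the MetricCollector interface.
type Collector interface {
    Name() string
}

// myCollector is a hypothetical new collector.
type myCollector struct{}

func (m *myCollector) Name() string { return "MyCollector" }

// availableCollectors mimics the AvailableCollectors map: the key is the
// collector type string used in the configuration file, the value is a
// pointer to the collector implementation.
var availableCollectors = map[string]Collector{
    "mycollector": new(myCollector),
}

// lookup returns the collector registered for a collector type string.
func lookup(collectorType string) (Collector, error) {
    c, ok := availableCollectors[collectorType]
    if !ok {
        return nil, fmt.Errorf("unknown collector type %q", collectorType)
    }
    return c, nil
}

func main() {
    if c, err := lookup("mycollector"); err == nil {
        fmt.Println("found collector:", c.Name())
    }
    if _, err := lookup("doesnotexist"); err != nil {
        fmt.Println("error:", err)
    }
}
```

The key chosen in the map is the string users have to write as "collector_type" in the configuration file, so it should be short and descriptive.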

Sample collector

package collectors

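import (
    "encoding/json"
    "fmt"
    "time"

    // Logger import path is assumed to mirror the ccMetric path below
    cclog "github.com/ClusterCockpit/cc-metric-collector/internal/ccLogger"
    lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
)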

// Struct for the collector-specific JSON config
type SampleCollectorConfig struct {
    ExcludeMetrics []string `json:"exclude_metrics"`
}

type SampleCollector struct {
    metricCollector
    config SampleCollectorConfig
}

func (m *SampleCollector) Init(config json.RawMessage) error {
    // Check if already initialized
    if m.init {
        return nil
    }

    m.name = "SampleCollector"
    m.setup()
    if len(config) > 0 {
        err := json.Unmarshal(config, &m.config)
        if err != nil {
            return err
        }
    }
    m.meta = map[string]string{"source": m.name, "group": "Sample"}

    m.init = true
    return nil
}

func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric) {
    if !m.init {
        return
    }
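    // tags for the metric, if type != node use proper type and type-id
    tags := map[string]string{"type": "node"}

    // GetMetric() is a placeholder for the collector's actual data source
    x, err := GetMetric()
    if err != nil {
        cclog.ComponentError(m.name, fmt.Sprintf("Read(): %v", err))
        // Do not submit a metric if reading failed
        return
    }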

    // Each metric has exactly one field: value !
    value := map[string]interface{}{"value": int64(x)}
    if y, err := lp.New("sample_metric", tags, m.meta, value, time.Now()); err == nil {
        output <- y
    }
}

func (m *SampleCollector) Close() {
    m.init = false
}