cc-metric-collector/collectors

CCMetric collectors

This folder contains the collectors for the cc-metric-collector.

Configuration

{
    "collector_type" : {
        <collector specific configuration>
    }
}

In contrast to the configuration files for sinks and receivers, the collector configuration is not a list but a set of dicts keyed by the collector type. This is required because we did not manage to partially read the type before loading the remaining configuration. We are eager to change this to the same format as for sinks and receivers.
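As a minimal sketch of what this layout means for the reading side, the top-level object can be decoded into one raw JSON message per collector type, and each message can then be handed to the matching collector. The helper names `parseCollectorConfig` and `collectorTypes`, as well as the collector types and options in `exampleConfig`, are assumptions made for illustration and not taken from the collector manager.

```go
package main

import (
	"encoding/json"
	"sort"
)

// exampleConfig mimics a collectors configuration file. The collector type
// names and their options are illustrative assumptions.
const exampleConfig = `{
    "loadavg": {},
    "cpustat": {
        "exclude_metrics": ["cpu_idle"]
    }
}`

// parseCollectorConfig splits the top-level object into one raw JSON message
// per collector type. Each message could then be passed to Init().
func parseCollectorConfig(data []byte) (map[string]json.RawMessage, error) {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

// collectorTypes returns the configured collector types in sorted order.
func collectorTypes(cfg map[string]json.RawMessage) []string {
	types := make([]string, 0, len(cfg))
	for t := range cfg {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}
```

Because the collector type is the map key, each type can appear only once per configuration, which is one practical difference to a list-based format.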

Available collectors

Todos

  • Aggregate metrics to a higher topology entity (sum hwthread metrics to a socket metric, ...). Needs to be configurable

Contributing own collectors

A collector reads data from any source, parses it into metrics and submits these metrics to the metric-collector. A collector provides the following functions:

  • Name() string: Return the name of the collector
  • Init(config json.RawMessage) error: Initializes the collector using the given collector-specific config in JSON. Check if needed files/commands exist, ...
  • Initialized() bool: Check if a collector is successfully initialized
  • Read(duration time.Duration, output chan ccMetric.CCMetric): Read, parse and submit data to the output channel as CCMetric. If the collector has to measure anything for some duration, use the provided function argument duration.
  • Close(): Closes down the collector.
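The functions listed above can be written down as a Go interface. The following is a sketch only: `CCMetric` is a stand-in for `ccMetric.CCMetric` so that the block compiles on its own, and `nopCollector` is a hypothetical collector that exists purely to show that a type satisfies the interface.

```go
package main

import (
	"encoding/json"
	"time"
)

// CCMetric is a stand-in for ccMetric.CCMetric (assumption for this sketch).
type CCMetric interface {
	Name() string
}

// MetricCollector collects the five functions described in the list above.
type MetricCollector interface {
	Name() string
	Init(config json.RawMessage) error
	Initialized() bool
	Read(duration time.Duration, output chan CCMetric)
	Close()
}

// nopCollector is a hypothetical collector that submits no metrics.
type nopCollector struct {
	init bool
}

func (c *nopCollector) Name() string { return "NopCollector" }

func (c *nopCollector) Init(config json.RawMessage) error {
	c.init = true
	return nil
}

func (c *nopCollector) Initialized() bool { return c.init }

func (c *nopCollector) Read(duration time.Duration, output chan CCMetric) {}

func (c *nopCollector) Close() { c.init = false }

// Compile-time check that nopCollector provides all required functions.
var _ MetricCollector = (*nopCollector)(nil)
```

The compile-time assertion in the last line is a cheap way to notice a missing or mistyped method before the collector is registered.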

It is recommended to call setup() in the Init() function.

Finally, the collector needs to be registered in collectorManager.go. The available collectors are kept in a map called AvailableCollectors (collector_type_string -> pointer to MetricCollector interface). Add a new entry with a descriptive name and the new collector.
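A registration along these lines might look as follows. This is a simplified sketch: the `MetricCollector` interface is reduced to a single method, the type string `"sample"` is an assumed name, and `lookupCollector` is a hypothetical helper that is not part of collectorManager.go.

```go
package main

// MetricCollector is reduced to one method to keep the sketch short
// (assumption; the real interface has more functions).
type MetricCollector interface {
	Name() string
}

// SampleCollector is a placeholder for the new collector.
type SampleCollector struct{}

func (m *SampleCollector) Name() string { return "SampleCollector" }

// AvailableCollectors maps the collector type string used in the
// configuration file to a collector instance.
var AvailableCollectors = map[string]MetricCollector{
	"sample": new(SampleCollector), // "sample" is an assumed type string
}

// lookupCollector is a hypothetical helper that resolves a collector type
// string from the configuration to the registered collector.
func lookupCollector(collectorType string) (MetricCollector, bool) {
	c, ok := AvailableCollectors[collectorType]
	return c, ok
}
```

The key of the new entry is the string users put into the configuration file, so it should be short and descriptive.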

Sample collector

package collectors

import (
    "encoding/json"
    "fmt"
    "time"

    cclog "github.com/ClusterCockpit/cc-metric-collector/internal/ccLogger"
    lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
)

// Struct for the collector-specific JSON config
type SampleCollectorConfig struct {
    ExcludeMetrics []string `json:"exclude_metrics"`
}

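type SampleCollector struct {
    // The embedded metricCollector provides the name, init and meta
    // fields and the setup() function used below
    metricCollector
    config SampleCollectorConfig
}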

func (m *SampleCollector) Init(config json.RawMessage) error {
    // Check if already initialized
    if m.init {
        return nil
    }

    m.name = "SampleCollector"
    m.setup()
    if len(config) > 0 {
        err := json.Unmarshal(config, &m.config)
        if err != nil {
            return err
        }
    }
    m.meta = map[string]string{"source": m.name, "group": "Sample"}

    m.init = true
    return nil
}

func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric) {
    if !m.init {
        return
    }
    // tags for the metric, if type != node use proper type and type-id
    tags := map[string]string{"type" : "node"}

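    // GetMetric() is a placeholder for the collector-specific data acquisition
    x, err := GetMetric()
    if err != nil {
        cclog.ComponentError(m.name, fmt.Sprintf("Read(): %v", err))
        // Do not submit a metric if reading failed
        return
    }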

    // Each metric has exactly one field: value !
    value := map[string]interface{}{"value": int64(x)}
    if y, err := lp.New("sample_metric", tags, m.meta, value, time.Now()); err == nil {
        output <- y
    }
}

func (m *SampleCollector) Close() {
    m.init = false
}