cc-metric-collector/collectors
Thomas Gruber 3f76947f54
Merge latest developments into main (#67)
* Update configuration.md

Add an additional receiver to have better alignment of components

* Change default GpfsCollector command to `mmpmon` (#53)

* Set default cmd to 'mmpmon'

* Reuse looked up path

* Cast const to string

* Just download LIKWID to get the headers (#54)

* Just download LIKWID to get the headers

* Remove perl-Data-Dumper from BuildRequires, only required by LIKWID build

* Add HttpReceiver as counterpart to the HttpSink (#49)

* Use GBytes as unit for large memory numbers

* Make maxForward configurable, save old name in meta in rename metrics and make the hostname tag key configurable

* Single release action (#55)

Building all RPMs and releasing in a single workflow

* Makefile target to build binary-only Debian packages (#61)

* Add 'install' and 'DEB' make targets to build binary-only Debian packages

* Add control file for DEB builds

* Use a single line for bash loop in make clean

* Add config options for retry intervals of InfluxDB clients (#59)

* Refactoring of LikwidCollector and metric units (#62)

* Reduce complexity of LikwidCollector and allow metric units

* Add unit to LikwidCollector docu and fix some typos

* Make library path configurable

* Use old metric name in Ganglia if rename has happened in the router (#60)

* Use old metric name if rename has happened in the router

* Also check for Ganglia renames for the oldname

* Derived metrics (#57)

* Add time-based derivatived (e.g. bandwidth) to some collectors

* Add documentation

* Add comments

* Fix: Only compute rates with a valid previous state

* Only compute rates with a valid previous state

* Define const values for net/dev fields

* Set default config values

* Add comments

* Refactor: Consolidate data structures

* Refactor: Consolidate data structures

* Refactor: Avoid struct deep copy

* Refactor: Avoid redundant tag maps

* Refactor: Use int64 type for absolut values

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

* Simplified iota usage

* Move unit tag to meta data tags

* Derived metrics (#65)

* Add time-based derivatived (e.g. bandwidth) to some collectors

* Add documentation

* Add comments

* Fix: Only compute rates with a valid previous state

* Only compute rates with a valid previous state

* Define const values for net/dev fields

* Set default config values

* Add comments

* Refactor: Consolidate data structures

* Refactor: Consolidate data structures

* Refactor: Avoid struct deep copy

* Refactor: Avoid redundant tag maps

* Refactor: Use int64 type for absolut values

* Update LustreCollector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

* Meta to tags list and map for sinks (#63)

* Change ccMetric->Influx functions

* Use a meta_as_tags string list in config but create a lookup map afterwards

* Add meta as tag logic to sampleSink

* Fix staticcheck warnings (#66)

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2022-03-15 16:41:11 +01:00
beegfsmetaMetric.go Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
beegfsmetaMetric.md Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
beegfsstorageMetric.go Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
beegfsstorageMetric.md Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
collectorManager.go Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
cpufreqCpuinfoMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
cpufreqCpuinfoMetric.md Add collector documentation 2022-02-08 13:46:44 +01:00
cpufreqMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
cpufreqMetric.md Add collector documentation 2022-02-08 13:46:44 +01:00
cpustatMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
cpustatMetric.md Fix for documentation 2022-01-26 18:37:59 +01:00
customCmdMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
customCmdMetric.md Modularize the whole thing (#16) 2022-01-25 15:37:43 +01:00
diskstatMetric.go DiskstatCollector: cast part_max_used metric to int 2022-02-22 15:50:49 +01:00
diskstatMetric.md Split diskstat Collector (#38) 2022-02-21 12:44:26 +01:00
gpfsMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
gpfsMetric.md Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
infinibandMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
infinibandMetric.md Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
iostatMetric.go Split diskstat Collector (#38) 2022-02-21 12:44:26 +01:00
iostatMetric.md Split diskstat Collector (#38) 2022-02-21 12:44:26 +01:00
ipmiMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
ipmiMetric.md Modularize the whole thing (#16) 2022-01-25 15:37:43 +01:00
likwidMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
likwidMetric.md Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
loadavgMetric.go Moved check which metric to skip to Init() 2022-02-04 19:22:42 +01:00
loadavgMetric.md Modularize the whole thing (#16) 2022-01-25 15:37:43 +01:00
lustreMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
lustreMetric.md Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
Makefile Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
memstatMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
memstatMetric.md Modularize the whole thing (#16) 2022-01-25 15:37:43 +01:00
metricCollector.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
netstatMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
netstatMetric.md Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
nfs3Metric.md Split NfsCollector in Nfs3Collector and Nfs4Collector (#28) 2022-02-07 15:43:01 +01:00
nfs4Metric.md Split NfsCollector in Nfs3Collector and Nfs4Collector (#28) 2022-02-07 15:43:01 +01:00
nfsMetric.go Split NfsCollector in Nfs3Collector and Nfs4Collector (#28) 2022-02-07 15:43:01 +01:00
numastatsMetric.go Cleanup 2022-02-14 22:14:06 +01:00
numastatsMetric.md Add collector documentation 2022-02-08 13:46:44 +01:00
nvidiaMetric.go Add comments and units to all nvidia metrics 2022-02-15 10:57:32 +01:00
nvidiaMetric.md Prefix Nvidia metrics with 'nv_' 2022-01-26 18:45:23 +01:00
README.md Beegfs collector (#50) 2022-03-04 14:35:47 +01:00
sampleMetric.go Additional comments 2022-02-28 09:57:26 +01:00
sampleTimerMetric.go Add samples for collectors, sinks and receivers 2022-02-25 13:47:19 +01:00
tempMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
tempMetric.md Modularize the whole thing (#16) 2022-01-25 15:37:43 +01:00
topprocsMetric.go Merge latest developments into main (#67) 2022-03-15 16:41:11 +01:00
topprocsMetric.md Modularize the whole thing (#16) 2022-01-25 15:37:43 +01:00

CCMetric collectors

This folder contains the collectors for the cc-metric-collector.

Configuration

{
    "collector_type" : {
        <collector specific configuration>
    }
}

In contrast to the configuration files for sinks and receivers, the collectors configuration is not a list but a set of dicts, i.e. a single JSON object whose keys are the collector types. This is required because we did not manage to partially read the type before loading the remaining configuration. We are eager to change this to the same format.
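Because the collector type is simply the JSON key, the configuration can first be split into one raw, still unparsed message per collector, and each message can then be handed to the matching collector's Init(). The following minimal sketch shows that idea with the standard encoding/json package. The helper `splitCollectorConfig` and the example keys `cpustat` and `loadavg` (and the metric name in `exclude_metrics`) are illustrative assumptions, not the actual collectorManager.go code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// splitCollectorConfig (hypothetical helper) decodes the top-level object
// into a map: collector type -> raw, still unparsed, collector-specific config.
func splitCollectorConfig(data []byte) (map[string]json.RawMessage, error) {
	var configs map[string]json.RawMessage
	if err := json.Unmarshal(data, &configs); err != nil {
		return nil, err
	}
	return configs, nil
}

func main() {
	// Example configuration; the collector names are assumptions for illustration
	data := []byte(`{
		"cpustat": {},
		"loadavg": {"exclude_metrics": ["proc_total"]}
	}`)
	configs, err := splitCollectorConfig(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Sort the keys because Go map iteration order is random
	types := make([]string, 0, len(configs))
	for t := range configs {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		// Each raw message would be passed unchanged to the collector's Init()
		fmt.Printf("%s: %s\n", t, configs[t])
	}
}
```

A list-based format would instead require decoding each entry twice: once to find a type field and once into the collector-specific struct.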

Available collectors

Todos

  • Aggregate metrics to higher topology entities (e.g. sum hwthread metrics to a socket metric, ...). This needs to be configurable.

Contributing own collectors

A collector reads data from any source, parses it into metrics and submits these metrics to the metric-collector. A collector provides five functions:

  • Name() string: Return the name of the collector
  • Init(config json.RawMessage) error: Initializes the collector using the given collector-specific config in JSON. Checks whether required files/commands exist, ...
  • Initialized() bool: Check if a collector is successfully initialized
  • Read(duration time.Duration, output chan ccMetric.CCMetric): Read, parse and submit data to the output channel as CCMetric. If the collector has to measure anything for some duration, use the provided function argument duration.
  • Close(): Closes down the collector.

It is recommended to call setup() in the Init() function.

Finally, the collector needs to be registered in collectorManager.go. It contains a map called AvailableCollectors (collector_type_string -> pointer to a MetricCollector implementation). Add a new entry with a descriptive name and the new collector.

Sample collector

package collectors

import (
    "encoding/json"
    "fmt"
    "time"

    cclog "github.com/ClusterCockpit/cc-metric-collector/internal/ccLogger"
    lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
)

// Struct for the collector-specific JSON config
type SampleCollectorConfig struct {
    ExcludeMetrics []string `json:"exclude_metrics"`
}

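// SampleCollector embeds metricCollector, which provides common fields
// (name, init, meta) and helper methods like setup()
type SampleCollector struct {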
    metricCollector
    config SampleCollectorConfig
}

func (m *SampleCollector) Init(config json.RawMessage) error {
    // Check if already initialized
    if m.init {
        return nil
    }

    m.name = "SampleCollector"
    m.setup()
    if len(config) > 0 {
        err := json.Unmarshal(config, &m.config)
        if err != nil {
            return err
        }
    }
    m.meta = map[string]string{"source": m.name, "group": "Sample"}

    m.init = true
    return nil
}

func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric) {
    if !m.init {
        return
    }
    // tags for the metric, if type != node use proper type and type-id
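    tags := map[string]string{"type": "node"}

    // GetMetric() stands for the collector-specific data source
    x, err := GetMetric()
    if err != nil {
        cclog.ComponentError(m.name, fmt.Sprintf("Read(): %v", err))
        // Do not submit a metric if reading failed
        return
    }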

    // Each metric has exactly one field: value !
    value := map[string]interface{}{"value": int64(x)}
    if y, err := lp.New("sample_metric", tags, m.meta, value, time.Now()); err == nil {
        output <- y
    }
}

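func (m *SampleCollector) Close() {
    m.init = false
}

// GetMetric is a hypothetical placeholder for the actual data source
// (e.g. parsing a file below /proc). Replace it with your own logic.
func GetMetric() (float64, error) {
    return 42, nil
}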