Modularize the whole thing (#16)

* Use channels, add a metric router, split up configuration and use extended version of Influx line protocol internally
* Use central timer for collectors and router. Add expressions to router
* Add expression to router config
* Update entry points
* Start with README
* Update README for CCMetric
* Formatting
* Update README.md
* Add README for MultiChanTicker
* Add README for MultiChanTicker
* Update README.md
* Add README to metric router
* Update main README
* Remove SinkEntity type
* Update README for sinks
* Update go files
* Update README for receivers
* Update collectors README
* Update collectors README
* Use seperate page per collector
* Fix for tempstat page
* Add docs for customcmd collector
* Add docs for ipmistat collector
* Add docs for topprocs collector
* Update customCmdMetric.md
* Use seconds when calculating LIKWID metrics
* Add IB metrics ib_recv_pkts and ib_xmit_pkts
* Drop domain part of host name
* Updated to latest stable version of likwid
* Define source code dependencies in Makefile
* Add GPFS / IBM Spectrum Scale collector
* Add vet and staticcheck make targets
* Add vet and staticcheck make targets
* Avoid go vet warning: struct field tag `json:"..., omitempty"` not compatible with reflect.StructTag.Get: suspicious space in struct tag value struct field tag `json:"...", omitempty` not compatible with reflect.StructTag.Get: key:"value" pairs not separated by spaces
* Add sample collector to README.md
* Add CPU frequency collector
* Avoid staticcheck warning: redundant return statement
* Avoid staticcheck warning: unnecessary assignment to the blank identifier
* Simplified code
* Add CPUFreqCollectorCpuinfo a metric collector to measure the current frequency of the CPUs as obtained from /proc/cpuinfo Only measure on the first hyperthread
* Add collector for NFS clients
* Move publication of metrics into Flush() for NatsSink
* Update GitHub actions
* Refactoring
* Avoid vet warning: Println arg list ends with redundant newline
* Avoid vet warning struct field commands has json tag but is not exported
* Avoid vet warning: return copies lock value.
* Corrected typo
* Refactoring
* Add go sources in internal/...
* Bad separator in Makefile
* Fix Infiniband collector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
2026-03-03 23:27:29 +01:00 · 2022-01-25 15:37:43 +01:00
parent 222862af32
commit 200af84c54
60 changed files with 2596 additions and 1105 deletions
--- a/internal/ccMetric/README.md
+++ b/internal/ccMetric/README.md
@@ -0,0 +1,32 @@
+# ClusterCockpit metrics
+
+As described in the [ClusterCockpit specifications](https://github.com/ClusterCockpit/cc-specifications), the whole ClusterCockpit stack uses metrics in the InfluxDB line protocol format. This is also the input and output format of the ClusterCockpit Metric Collector, but internally it processes metrics in an extended format named CCMetric.
+
+It is basically a copy of the [InfluxDB line protocol](https://github.com/influxdata/line-protocol) `MutableMetric` interface with one extension. Besides the tags and fields, it contains a list of meta information (re-using the `Tag` structure of the original protocol):
+
+```golang
+type ccMetric struct {
+    name   string            // same as
+    tags   []*influx.Tag     // original
+    fields []*influx.Field   // Influx
+    tm     time.Time         // line-protocol
+    meta   []*influx.Tag
+}
+
+type CCMetric interface {
+    influx.MutableMetric        // the same functions as defined by influx.MutableMetric
+    RemoveTag(key string)       // this is not published by the original influx.MutableMetric
+    Meta() map[string]string
+    MetaList() []*influx.Tag
+    AddMeta(key, value string)
+    HasMeta(key string) bool
+    GetMeta(key string) (string, bool)
+    RemoveMeta(key string)
+}
+```
+
+Like `MutableMetric`, the `CCMetric` interface provides `AddTag` and `AddField`. The implementation adds `HasTag`, `GetTag` and `RemoveTag`, plus the analogous `AddMeta`, `HasMeta`, `GetMeta` and `RemoveMeta` for meta information.
+
+The InfluxDB protocol creates a new metric with `influx.New(name, tags, fields, time)` while CCMetric uses `ccMetric.New(name, tags, meta, fields, time)` where `tags` and `meta` are both of type `map[string]string`.
+
+You can copy a CCMetric with `FromMetric(other CCMetric) CCMetric`. If you get an `influx.Metric` from a function, like the line protocol parser, you can use `FromInfluxMetric(other influx.Metric) CCMetric` to get a CCMetric out of it (see `NatsReceiver` for an example).
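
Note on the tag and meta lists described in this README: both are kept as slices sorted by key, and adding an existing key overwrites its value. The following is a minimal standalone sketch of that insert-or-update step. The `tag` type and the `addSorted` helper are hypothetical names used only for illustration, and the shape of `tag` (a `Key` and a `Value` string) is an assumption modelled on the line-protocol `Tag` type.

```go
package main

import "fmt"

// tag is a hypothetical stand-in for the line-protocol Tag type.
type tag struct {
	Key, Value string
}

// addSorted inserts key/value into a list that is kept sorted by Key.
// If the key already exists, only its value is replaced.
func addSorted(list []*tag, key, value string) []*tag {
	for i, t := range list {
		if key > t.Key {
			continue // insertion point is further to the right
		}
		if key == t.Key {
			t.Value = value
			return list
		}
		// Grow by one, shift the tail right and place the new entry at i.
		list = append(list, nil)
		copy(list[i+1:], list[i:])
		list[i] = &tag{Key: key, Value: value}
		return list
	}
	// Key sorts after all existing entries (or the list is empty).
	return append(list, &tag{Key: key, Value: value})
}

func main() {
	var meta []*tag
	meta = addSorted(meta, "unit", "Watt")
	meta = addSorted(meta, "group", "Energy")
	meta = addSorted(meta, "unit", "kWatt") // overwrites the first value
	for _, t := range meta {
		fmt.Printf("%s=%s\n", t.Key, t.Value)
	}
}
```

Keeping the slices sorted means iteration order is deterministic, which a plain `map[string]string` would not guarantee.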
--- a/internal/ccMetric/ccMetric.go
+++ b/internal/ccMetric/ccMetric.go
@@ -0,0 +1,374 @@
+package ccmetric
+
+import (
+	"fmt"
+	lp "github.com/influxdata/line-protocol" // MIT license
+	"sort"
+	"time"
+)
+
+// Most functions are derived from github.com/influxdata/line-protocol/metric.go
+// The metric type is extended with an extra meta information list re-using the Tag
+// type.
+
+type ccMetric struct {
+	name   string
+	tags   []*lp.Tag
+	fields []*lp.Field
+	tm     time.Time
+	meta   []*lp.Tag
+}
+
+type CCMetric interface {
+	lp.MutableMetric
+	AddMeta(key, value string)
+	MetaList() []*lp.Tag
+	RemoveTag(key string)
+}
+
+func (m *ccMetric) Meta() map[string]string {
+	meta := make(map[string]string, len(m.meta))
+	for _, tag := range m.meta {
+		meta[tag.Key] = tag.Value
+	}
+	return meta
+}
+
+func (m *ccMetric) MetaList() []*lp.Tag {
+	return m.meta
+}
+
+func (m *ccMetric) String() string {
+	return fmt.Sprintf("%s %v %v %v %d", m.name, m.Tags(), m.Meta(), m.Fields(), m.tm.UnixNano())
+}
+
+func (m *ccMetric) Name() string {
+	return m.name
+}
+
+func (m *ccMetric) Tags() map[string]string {
+	tags := make(map[string]string, len(m.tags))
+	for _, tag := range m.tags {
+		tags[tag.Key] = tag.Value
+	}
+	return tags
+}
+
+func (m *ccMetric) TagList() []*lp.Tag {
+	return m.tags
+}
+
+func (m *ccMetric) Fields() map[string]interface{} {
+	fields := make(map[string]interface{}, len(m.fields))
+	for _, field := range m.fields {
+		fields[field.Key] = field.Value
+	}
+
+	return fields
+}
+
+func (m *ccMetric) FieldList() []*lp.Field {
+	return m.fields
+}
+
+func (m *ccMetric) Time() time.Time {
+	return m.tm
+}
+
+func (m *ccMetric) SetTime(t time.Time) {
+	m.tm = t
+}
+
+func (m *ccMetric) HasTag(key string) bool {
+	for _, tag := range m.tags {
+		if tag.Key == key {
+			return true
+		}
+	}
+	return false
+}
+
+func (m *ccMetric) GetTag(key string) (string, bool) {
+	for _, tag := range m.tags {
+		if tag.Key == key {
+			return tag.Value, true
+		}
+	}
+	return "", false
+}
+
+func (m *ccMetric) RemoveTag(key string) {
+	for i, tag := range m.tags {
+		if tag.Key == key {
+			copy(m.tags[i:], m.tags[i+1:])
+			m.tags[len(m.tags)-1] = nil
+			m.tags = m.tags[:len(m.tags)-1]
+			return
+		}
+	}
+}
+
+func (m *ccMetric) AddTag(key, value string) {
+	for i, tag := range m.tags {
+		if key > tag.Key {
+			continue
+		}
+
+		if key == tag.Key {
+			tag.Value = value
+			return
+		}
+
+		m.tags = append(m.tags, nil)
+		copy(m.tags[i+1:], m.tags[i:])
+		m.tags[i] = &lp.Tag{Key: key, Value: value}
+		return
+	}
+
+	m.tags = append(m.tags, &lp.Tag{Key: key, Value: value})
+}
+
+func (m *ccMetric) HasMeta(key string) bool {
+	for _, tag := range m.meta {
+		if tag.Key == key {
+			return true
+		}
+	}
+	return false
+}
+
+func (m *ccMetric) GetMeta(key string) (string, bool) {
+	for _, tag := range m.meta {
+		if tag.Key == key {
+			return tag.Value, true
+		}
+	}
+	return "", false
+}
+
+func (m *ccMetric) RemoveMeta(key string) {
+	for i, tag := range m.meta {
+		if tag.Key == key {
+			copy(m.meta[i:], m.meta[i+1:])
+			m.meta[len(m.meta)-1] = nil
+			m.meta = m.meta[:len(m.meta)-1]
+			return
+		}
+	}
+}
+
+func (m *ccMetric) AddMeta(key, value string) {
+	for i, tag := range m.meta {
+		if key > tag.Key {
+			continue
+		}
+
+		if key == tag.Key {
+			tag.Value = value
+			return
+		}
+
+		m.meta = append(m.meta, nil)
+		copy(m.meta[i+1:], m.meta[i:])
+		m.meta[i] = &lp.Tag{Key: key, Value: value}
+		return
+	}
+
+	m.meta = append(m.meta, &lp.Tag{Key: key, Value: value})
+}
+
+func (m *ccMetric) AddField(key string, value interface{}) {
+	for i, field := range m.fields {
+		if key == field.Key {
+			m.fields[i] = &lp.Field{Key: key, Value: convertField(value)}
+			return
+		}
+	}
+	m.fields = append(m.fields, &lp.Field{Key: key, Value: convertField(value)})
+}
+
+func New(
+	name string,
+	tags map[string]string,
+	meta map[string]string,
+	fields map[string]interface{},
+	tm time.Time,
+) (CCMetric, error) {
+	m := &ccMetric{
+		name:   name,
+		tags:   nil,
+		fields: nil,
+		tm:     tm,
+		meta:   nil,
+	}
+
+	if len(tags) > 0 {
+		m.tags = make([]*lp.Tag, 0, len(tags))
+		for k, v := range tags {
+			m.tags = append(m.tags,
+				&lp.Tag{Key: k, Value: v})
+		}
+		sort.Slice(m.tags, func(i, j int) bool { return m.tags[i].Key < m.tags[j].Key })
+	}
+
+	if len(meta) > 0 {
+		m.meta = make([]*lp.Tag, 0, len(meta))
+		for k, v := range meta {
+			m.meta = append(m.meta,
+				&lp.Tag{Key: k, Value: v})
+		}
+		sort.Slice(m.meta, func(i, j int) bool { return m.meta[i].Key < m.meta[j].Key })
+	}
+
+	if len(fields) > 0 {
+		m.fields = make([]*lp.Field, 0, len(fields))
+		for k, v := range fields {
+			v := convertField(v)
+			if v == nil {
+				continue
+			}
+			m.AddField(k, v)
+		}
+	}
+
+	return m, nil
+}
+
+func FromMetric(other CCMetric) CCMetric {
+	m := &ccMetric{
+		name:   other.Name(),
+		tags:   make([]*lp.Tag, len(other.TagList())),
+		fields: make([]*lp.Field, len(other.FieldList())),
+		meta:   make([]*lp.Tag, len(other.MetaList())),
+		tm:     other.Time(),
+	}
+
+	for i, tag := range other.TagList() {
+		m.tags[i] = &lp.Tag{Key: tag.Key, Value: tag.Value}
+	}
+	for i, s := range other.MetaList() {
+		m.meta[i] = &lp.Tag{Key: s.Key, Value: s.Value}
+	}
+
+	for i, field := range other.FieldList() {
+		m.fields[i] = &lp.Field{Key: field.Key, Value: field.Value}
+	}
+	return m
+}
+
+func FromInfluxMetric(other lp.Metric) CCMetric {
+	m := &ccMetric{
+		name:   other.Name(),
+		tags:   make([]*lp.Tag, len(other.TagList())),
+		fields: make([]*lp.Field, len(other.FieldList())),
+		meta:   make([]*lp.Tag, 0),
+		tm:     other.Time(),
+	}
+
+	for i, tag := range other.TagList() {
+		m.tags[i] = &lp.Tag{Key: tag.Key, Value: tag.Value}
+	}
+
+	for i, field := range other.FieldList() {
+		m.fields[i] = &lp.Field{Key: field.Key, Value: field.Value}
+	}
+	return m
+}
+
+func convertField(v interface{}) interface{} {
+	switch v := v.(type) {
+	case float64:
+		return v
+	case int64:
+		return v
+	case string:
+		return v
+	case bool:
+		return v
+	case int:
+		return int64(v)
+	case uint:
+		return uint64(v)
+	case uint64:
+		return uint64(v)
+	case []byte:
+		return string(v)
+	case int32:
+		return int64(v)
+	case int16:
+		return int64(v)
+	case int8:
+		return int64(v)
+	case uint32:
+		return uint64(v)
+	case uint16:
+		return uint64(v)
+	case uint8:
+		return uint64(v)
+	case float32:
+		return float64(v)
+	case *float64:
+		if v != nil {
+			return *v
+		}
+	case *int64:
+		if v != nil {
+			return *v
+		}
+	case *string:
+		if v != nil {
+			return *v
+		}
+	case *bool:
+		if v != nil {
+			return *v
+		}
+	case *int:
+		if v != nil {
+			return int64(*v)
+		}
+	case *uint:
+		if v != nil {
+			return uint64(*v)
+		}
+	case *uint64:
+		if v != nil {
+			return uint64(*v)
+		}
+	case *[]byte:
+		if v != nil {
+			return string(*v)
+		}
+	case *int32:
+		if v != nil {
+			return int64(*v)
+		}
+	case *int16:
+		if v != nil {
+			return int64(*v)
+		}
+	case *int8:
+		if v != nil {
+			return int64(*v)
+		}
+	case *uint32:
+		if v != nil {
+			return uint64(*v)
+		}
+	case *uint16:
+		if v != nil {
+			return uint64(*v)
+		}
+	case *uint8:
+		if v != nil {
+			return uint64(*v)
+		}
+	case *float32:
+		if v != nil {
+			return float64(*v)
+		}
+	default:
+		return nil
+	}
+	return nil
+}
--- a/internal/metricRouter/README.md
+++ b/internal/metricRouter/README.md
@@ -0,0 +1,50 @@
+# CC Metric Router
+
+The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMetrics](../ccMetric/README.md).
+
+# Configuration
+
+```json
+{
+    "add_tags" : [
+        {
+            "key" : "cluster",
+            "value" : "testcluster",
+            "if" : "*"
+        },
+        {
+            "key" : "test",
+            "value" : "testing",
+            "if" : "name == 'temp_package_id_0'"
+        }
+    ],
+    "delete_tags" : [
+        {
+            "key" : "unit",
+            "value" : "*",
+            "if" : "*"
+        }
+    ],
+    "interval_timestamp" : true
+}
+```
+
+There are three main options: `add_tags`, `delete_tags` and `interval_timestamp`. `add_tags` and `delete_tags` are lists of objects with the keys `key`, `value` and `if`. The `value` can be omitted in the `delete_tags` part because only the `key` is used for removal. With `interval_timestamp` enabled, all metrics traversing the router during an interval get the same timestamp, namely the time of the interval's tick.
+
+# Conditional manipulation of tags
+
+The `if` setting allows conditional testing of a single metric like in the example:
+
+```json
+{
+    "key" : "test",
+    "value" : "testing",
+    "if" : "name == 'temp_package_id_0'"
+}
+```
+
+If the CCMetric name is equal to 'temp_package_id_0', it adds an additional tag `test=testing` to the metric.
+
+To match all metrics, use `*`. This is useful for adding a tag by default, like the `cluster=testcluster` tag in the example.
+
+
--- a/internal/metricRouter/metricRouter.go
+++ b/internal/metricRouter/metricRouter.go
@@ -0,0 +1,208 @@
+package metricRouter
+
+import (
+	"encoding/json"
+	"log"
+	"os"
+	"sync"
+	"time"
+
+	lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
+	mct "github.com/ClusterCockpit/cc-metric-collector/internal/multiChanTicker"
+	"gopkg.in/Knetic/govaluate.v2"
+)
+
+type metricRouterTagConfig struct {
+	Key       string `json:"key"`
+	Value     string `json:"value"`
+	Condition string `json:"if"`
+}
+
+type metricRouterConfig struct {
+	AddTags       []metricRouterTagConfig `json:"add_tags"`
+	DelTags       []metricRouterTagConfig `json:"delete_tags"`
+	IntervalStamp bool                    `json:"interval_timestamp"`
+}
+
+type metricRouter struct {
+	inputs    []chan lp.CCMetric
+	outputs   []chan lp.CCMetric
+	done      chan bool
+	wg        *sync.WaitGroup
+	timestamp time.Time
+	ticker    mct.MultiChanTicker
+	config    metricRouterConfig
+}
+
+type MetricRouter interface {
+	Init(ticker mct.MultiChanTicker, wg *sync.WaitGroup, routerConfigFile string) error
+	AddInput(input chan lp.CCMetric)
+	AddOutput(output chan lp.CCMetric)
+	Start()
+	Close()
+}
+
+func (r *metricRouter) Init(ticker mct.MultiChanTicker, wg *sync.WaitGroup, routerConfigFile string) error {
+	r.inputs = make([]chan lp.CCMetric, 0)
+	r.outputs = make([]chan lp.CCMetric, 0)
+	r.done = make(chan bool)
+	r.wg = wg
+	r.ticker = ticker
+	configFile, err := os.Open(routerConfigFile)
+	if err != nil {
+		log.Print(err.Error())
+		return err
+	}
+	defer configFile.Close()
+	jsonParser := json.NewDecoder(configFile)
+	err = jsonParser.Decode(&r.config)
+	if err != nil {
+		log.Print(err.Error())
+		return err
+	}
+	return nil
+}
+
+func (r *metricRouter) StartTimer() {
+	m := make(chan time.Time)
+	r.ticker.AddChannel(m)
+	go func() {
+		for {
+			select {
+			case t := <-m:
+				r.timestamp = t
+			}
+		}
+	}()
+}
+
+func (r *metricRouter) EvalCondition(Cond string, point lp.CCMetric) (bool, error) {
+	expression, err := govaluate.NewEvaluableExpression(Cond)
+	if err != nil {
+		log.Print(Cond, " = ", err.Error())
+		return false, err
+	}
+	params := make(map[string]interface{})
+	params["name"] = point.Name()
+	for _, t := range point.TagList() {
+		params[t.Key] = t.Value
+	}
+	for _, m := range point.MetaList() {
+		params[m.Key] = m.Value
+	}
+	for _, f := range point.FieldList() {
+		params[f.Key] = f.Value
+	}
+	params["timestamp"] = point.Time()
+
+	result, err := expression.Evaluate(params)
+	if err != nil {
+		log.Print(Cond, " = ", err.Error())
+		return false, err
+	}
+	return bool(result.(bool)), err
+}
+
+func (r *metricRouter) DoAddTags(point lp.CCMetric) {
+	for _, m := range r.config.AddTags {
+		var conditionMatches bool
+
+		if m.Condition == "*" {
+			conditionMatches = true
+		} else {
+			var err error
+			conditionMatches, err = r.EvalCondition(m.Condition, point)
+			if err != nil {
+				log.Print(err.Error())
+				conditionMatches = false
+			}
+		}
+		if conditionMatches {
+			point.AddTag(m.Key, m.Value)
+		}
+	}
+}
+
+func (r *metricRouter) DoDelTags(point lp.CCMetric) {
+	for _, m := range r.config.DelTags {
+		var conditionMatches bool
+
+		if m.Condition == "*" {
+			conditionMatches = true
+		} else {
+			var err error
+			conditionMatches, err = r.EvalCondition(m.Condition, point)
+			if err != nil {
+				log.Print(err.Error())
+				conditionMatches = false
+			}
+		}
+		if conditionMatches {
+			point.RemoveTag(m.Key)
+		}
+	}
+}
+
+func (r *metricRouter) Start() {
+	r.wg.Add(1)
+	r.timestamp = time.Now()
+	if r.config.IntervalStamp {
+		r.StartTimer()
+	}
+	go func() {
+	RouterLoop:
+		for {
+			select {
+			case <-r.done:
+				log.Print("[MetricRouter] DONE\n")
+				r.wg.Done()
+				break RouterLoop
+			default:
+				for _, c := range r.inputs {
+					// Poll this input without blocking.
+					select {
+					case <-r.done:
+						log.Print("[MetricRouter] DONE\n")
+						r.wg.Done()
+						break RouterLoop
+					case p := <-c:
+						log.Print("[MetricRouter] FORWARD ", p)
+						r.DoAddTags(p)
+						r.DoDelTags(p)
+						if r.config.IntervalStamp {
+							p.SetTime(r.timestamp)
+						}
+						for _, o := range r.outputs {
+							o <- p
+						}
+					default:
+					}
+				}
+			}
+		}
+		log.Print("[MetricRouter] EXIT\n")
+	}()
+	log.Print("[MetricRouter] STARTED\n")
+}
+
+func (r *metricRouter) AddInput(input chan lp.CCMetric) {
+	r.inputs = append(r.inputs, input)
+}
+
+func (r *metricRouter) AddOutput(output chan lp.CCMetric) {
+	r.outputs = append(r.outputs, output)
+}
+
+func (r *metricRouter) Close() {
+	r.done <- true
+	log.Print("[MetricRouter] CLOSE\n")
+}
+
+func New(ticker mct.MultiChanTicker, wg *sync.WaitGroup, routerConfigFile string) (MetricRouter, error) {
+	r := new(metricRouter)
+	err := r.Init(ticker, wg, routerConfigFile)
+	if err != nil {
+		return nil, err
+	}
+	return r, err
+}
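
Note on the forwarding loop in `Start()`: in Go, a `break` inside a `select` leaves only the `select`, so a label that should end the goroutine has to be attached to the enclosing `for`. The following is a minimal sketch of a forwarding loop with shutdown handling, assuming a simplified setup with a single input and a single output channel of `int`. The function name `forward` and the channel types are hypothetical and chosen for brevity.

```go
package main

import "fmt"

// forward copies values from in to out until done is closed and
// returns the number of forwarded values.
func forward(in <-chan int, out chan<- int, done <-chan struct{}) int {
	n := 0
Loop:
	for {
		select {
		case <-done:
			// A bare break would only leave the select statement.
			break Loop
		case v := <-in:
			out <- v
			n++
		}
	}
	return n
}

func main() {
	in := make(chan int)
	out := make(chan int, 3) // buffered so the sketch needs no consumer
	done := make(chan struct{})
	result := make(chan int)

	go func() { result <- forward(in, out, done) }()

	for i := 0; i < 3; i++ {
		in <- i
	}
	close(done)
	fmt.Println("forwarded:", <-result)
}
```

Because the `select` blocks until either a metric or the shutdown signal arrives, this variant does not spin while the inputs are idle.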
--- a/internal/multiChanTicker/README.md
+++ b/internal/multiChanTicker/README.md
@@ -0,0 +1,37 @@
+# MultiChanTicker
+
+The idea of this ticker is to provide multiple output channels. The original Golang `time.Ticker` provides only a single output channel, so each tick can only be received by a single consumer. This ticker allows you to add multiple channels, all of which are notified about each time tick.
+
+```golang
+type MultiChanTicker interface {
+	Init(duration time.Duration)
+	AddChannel(chan time.Time)
+}
+```
+
+The MultiChanTicker is created similarly to the common `time.Ticker`:
+
+```golang
+NewTicker(duration time.Duration) MultiChanTicker
+```
+
+Afterwards, you can add channels:
+
+```golang
+t := NewTicker(duration)
+c1 := make(chan time.Time)
+c2 := make(chan time.Time)
+t.AddChannel(c1)
+t.AddChannel(c2)
+
+for {
+    select {
+    case t1 := <- c1:
+        log.Print(t1)
+    case t2 := <- c2:
+        log.Print(t2)
+    }
+}
+```
+
+Both channels receive the same `time.Time` value for each tick. The ticker sends to the channels one after another, so delivery is sequential rather than truly simultaneous.
--- a/internal/multiChanTicker/multiChanTicker.go
+++ b/internal/multiChanTicker/multiChanTicker.go
@@ -0,0 +1,39 @@
+package multiChanTicker
+
+import (
+	"time"
+)
+
+type multiChanTicker struct {
+	ticker   *time.Ticker
+	channels []chan time.Time
+}
+
+type MultiChanTicker interface {
+	Init(duration time.Duration)
+	AddChannel(chan time.Time)
+}
+
+func (t *multiChanTicker) Init(duration time.Duration) {
+	t.ticker = time.NewTicker(duration)
+	go func() {
+		for {
+			select {
+			case ts := <-t.ticker.C:
+				for _, c := range t.channels {
+					c <- ts
+				}
+			}
+		}
+	}()
+}
+
+func (t *multiChanTicker) AddChannel(channel chan time.Time) {
+	t.channels = append(t.channels, channel)
+}
+
+func NewTicker(duration time.Duration) MultiChanTicker {
+	t := &multiChanTicker{}
+	t.Init(duration)
+	return t
+}