Modularize the whole thing (#16)

* Use channels, add a metric router, split up configuration and use extended version of Influx line protocol internally

* Use central timer for collectors and router. Add expressions to router

* Add expression to router config

* Update entry points

* Start with README

* Update README for CCMetric

* Formatting

* Update README.md

* Add README for MultiChanTicker

* Add README for MultiChanTicker

* Update README.md

* Add README to metric router

* Update main README

* Remove SinkEntity type

* Update README for sinks

* Update go files

* Update README for receivers

* Update collectors README

* Update collectors README

* Use seperate page per collector

* Fix for tempstat page

* Add docs for customcmd collector

* Add docs for ipmistat collector

* Add docs for topprocs collector

* Update customCmdMetric.md

* Use seconds when calculating LIKWID metrics

* Add IB metrics ib_recv_pkts and ib_xmit_pkts

* Drop domain part of host name

* Updated to latest stable version of likwid

* Define source code dependencies in Makefile

* Add GPFS / IBM Spectrum Scale collector

* Add vet and staticcheck make targets

* Add vet and staticcheck make targets

* Avoid go vet warning:
struct field tag `json:"..., omitempty"` not compatible with reflect.StructTag.Get: suspicious space in struct tag value
struct field tag `json:"...", omitempty` not compatible with reflect.StructTag.Get: key:"value" pairs not separated by spaces

* Add sample collector to README.md

* Add CPU frequency collector

* Avoid staticcheck warning: redundant return statement

* Avoid staticcheck warning: unnecessary assignment to the blank identifier

* Simplified code

* Add CPUFreqCollectorCpuinfo
a metric collector to measure the current frequency of the CPUs
as obtained from /proc/cpuinfo
Only measure on the first hyperthread

* Add collector for NFS clients

* Move publication of metrics into Flush() for NatsSink

* Update GitHub actions

* Refactoring

* Avoid vet warning: Println arg list ends with redundant newline

* Avoid vet warning struct field commands has json tag but is not exported

* Avoid vet warning: return copies lock value.

* Corrected typo

* Refactoring

* Add go sources in internal/...

* Bad separator in Makefile

* Fix Infiniband collector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
This commit is contained in:
Thomas Gruber
2022-01-25 15:37:43 +01:00
committed by GitHub
parent 222862af32
commit 200af84c54
60 changed files with 2596 additions and 1105 deletions

View File

@@ -0,0 +1,32 @@
# ClusterCockpit metrics
As described in the [ClusterCockpit specifications](https://github.com/ClusterCockpit/cc-specifications), the whole ClusterCockpit stack uses metrics in the InfluxDB line protocol format. This is also the input and output format for the ClusterCockpit Metric Collector but internally it uses an extended format while processing, named CCMetric.
It is basically a copy of the [InfluxDB line protocol](https://github.com/influxdata/line-protocol) `MutableMetric` interface with one extension. Besides the tags and fields, it contains a list of meta information (re-using the `Tag` structure of the original protocol):
```golang
type ccMetric struct {
name string // same as
tags []*influx.Tag // original
fields []*influx.Field // Influx
tm time.Time // line-protocol
meta []*influx.Tag
}
type CCMetric interface {
influx.MutableMetric // the same functions as defined by influx.MutableMetric
RemoveTag(key string) // this is not published by the original influx.MutableMetric
Meta() map[string]string
MetaList() []*inlux.Tag
AddMeta(key, value string)
HasMeta(key string) bool
GetMeta(key string) (string, bool)
RemoveMeta(key string)
}
```
The `CCMetric` interface provides the same functions as the `MutableMetric` like `{Add, Remove, Has}{Tag, Field}` and additionally provides `{Add, Remove, Has}Meta`.
The InfluxDB protocol creates a new metric with `influx.New(name, tags, fields, time)` while CCMetric uses `ccMetric.New(name, tags, meta, fields, time)` where `tags` and `meta` are both of type `map[string]string`.
You can copy a CCMetric with `FromMetric(other CCMetric) CCMetric`. If you get an `influx.Metric` from a function, like the line protocol parser, you can use `FromInfluxMetric(other influx.Metric) CCMetric` to get a CCMetric out of it (see `NatsReceiver` for an example).

View File

@@ -0,0 +1,374 @@
package ccmetric
import (
"fmt"
lp "github.com/influxdata/line-protocol" // MIT license
"sort"
"time"
)
// Most functions are derived from github.com/influxdata/line-protocol/metric.go
// The metric type is extended with an extra meta information list re-using the Tag
// type.
type ccMetric struct {
name string
tags []*lp.Tag
fields []*lp.Field
tm time.Time
meta []*lp.Tag
}
type CCMetric interface {
lp.MutableMetric
AddMeta(key, value string)
MetaList() []*lp.Tag
RemoveTag(key string)
}
func (m *ccMetric) Meta() map[string]string {
meta := make(map[string]string, len(m.meta))
for _, m := range m.meta {
meta[m.Key] = m.Value
}
return meta
}
func (m *ccMetric) MetaList() []*lp.Tag {
return m.meta
}
func (m *ccMetric) String() string {
return fmt.Sprintf("%s %v %v %v %d", m.name, m.Tags(), m.Meta(), m.Fields(), m.tm.UnixNano())
}
func (m *ccMetric) Name() string {
return m.name
}
func (m *ccMetric) Tags() map[string]string {
tags := make(map[string]string, len(m.tags))
for _, tag := range m.tags {
tags[tag.Key] = tag.Value
}
return tags
}
func (m *ccMetric) TagList() []*lp.Tag {
return m.tags
}
func (m *ccMetric) Fields() map[string]interface{} {
fields := make(map[string]interface{}, len(m.fields))
for _, field := range m.fields {
fields[field.Key] = field.Value
}
return fields
}
func (m *ccMetric) FieldList() []*lp.Field {
return m.fields
}
func (m *ccMetric) Time() time.Time {
return m.tm
}
func (m *ccMetric) SetTime(t time.Time) {
m.tm = t
}
func (m *ccMetric) HasTag(key string) bool {
for _, tag := range m.tags {
if tag.Key == key {
return true
}
}
return false
}
func (m *ccMetric) GetTag(key string) (string, bool) {
for _, tag := range m.tags {
if tag.Key == key {
return tag.Value, true
}
}
return "", false
}
func (m *ccMetric) RemoveTag(key string) {
for i, tag := range m.tags {
if tag.Key == key {
copy(m.tags[i:], m.tags[i+1:])
m.tags[len(m.tags)-1] = nil
m.tags = m.tags[:len(m.tags)-1]
return
}
}
}
func (m *ccMetric) AddTag(key, value string) {
for i, tag := range m.tags {
if key > tag.Key {
continue
}
if key == tag.Key {
tag.Value = value
return
}
m.tags = append(m.tags, nil)
copy(m.tags[i+1:], m.tags[i:])
m.tags[i] = &lp.Tag{Key: key, Value: value}
return
}
m.tags = append(m.tags, &lp.Tag{Key: key, Value: value})
}
func (m *ccMetric) HasMeta(key string) bool {
for _, tag := range m.meta {
if tag.Key == key {
return true
}
}
return false
}
func (m *ccMetric) GetMeta(key string) (string, bool) {
for _, tag := range m.meta {
if tag.Key == key {
return tag.Value, true
}
}
return "", false
}
func (m *ccMetric) RemoveMeta(key string) {
for i, tag := range m.meta {
if tag.Key == key {
copy(m.meta[i:], m.meta[i+1:])
m.meta[len(m.meta)-1] = nil
m.meta = m.meta[:len(m.meta)-1]
return
}
}
}
func (m *ccMetric) AddMeta(key, value string) {
for i, tag := range m.meta {
if key > tag.Key {
continue
}
if key == tag.Key {
tag.Value = value
return
}
m.meta = append(m.meta, nil)
copy(m.meta[i+1:], m.meta[i:])
m.meta[i] = &lp.Tag{Key: key, Value: value}
return
}
m.meta = append(m.meta, &lp.Tag{Key: key, Value: value})
}
func (m *ccMetric) AddField(key string, value interface{}) {
for i, field := range m.fields {
if key == field.Key {
m.fields[i] = &lp.Field{Key: key, Value: convertField(value)}
return
}
}
m.fields = append(m.fields, &lp.Field{Key: key, Value: convertField(value)})
}
func New(
name string,
tags map[string]string,
meta map[string]string,
fields map[string]interface{},
tm time.Time,
) (CCMetric, error) {
m := &ccMetric{
name: name,
tags: nil,
fields: nil,
tm: tm,
meta: nil,
}
if len(tags) > 0 {
m.tags = make([]*lp.Tag, 0, len(tags))
for k, v := range tags {
m.tags = append(m.tags,
&lp.Tag{Key: k, Value: v})
}
sort.Slice(m.tags, func(i, j int) bool { return m.tags[i].Key < m.tags[j].Key })
}
if len(meta) > 0 {
m.meta = make([]*lp.Tag, 0, len(meta))
for k, v := range meta {
m.meta = append(m.meta,
&lp.Tag{Key: k, Value: v})
}
sort.Slice(m.meta, func(i, j int) bool { return m.meta[i].Key < m.meta[j].Key })
}
if len(fields) > 0 {
m.fields = make([]*lp.Field, 0, len(fields))
for k, v := range fields {
v := convertField(v)
if v == nil {
continue
}
m.AddField(k, v)
}
}
return m, nil
}
func FromMetric(other CCMetric) CCMetric {
m := &ccMetric{
name: other.Name(),
tags: make([]*lp.Tag, len(other.TagList())),
fields: make([]*lp.Field, len(other.FieldList())),
meta: make([]*lp.Tag, len(other.MetaList())),
tm: other.Time(),
}
for i, tag := range other.TagList() {
m.tags[i] = &lp.Tag{Key: tag.Key, Value: tag.Value}
}
for i, s := range other.MetaList() {
m.meta[i] = &lp.Tag{Key: s.Key, Value: s.Value}
}
for i, field := range other.FieldList() {
m.fields[i] = &lp.Field{Key: field.Key, Value: field.Value}
}
return m
}
func FromInfluxMetric(other lp.Metric) CCMetric {
m := &ccMetric{
name: other.Name(),
tags: make([]*lp.Tag, len(other.TagList())),
fields: make([]*lp.Field, len(other.FieldList())),
meta: make([]*lp.Tag, 0),
tm: other.Time(),
}
for i, tag := range other.TagList() {
m.tags[i] = &lp.Tag{Key: tag.Key, Value: tag.Value}
}
for i, field := range other.FieldList() {
m.fields[i] = &lp.Field{Key: field.Key, Value: field.Value}
}
return m
}
func convertField(v interface{}) interface{} {
switch v := v.(type) {
case float64:
return v
case int64:
return v
case string:
return v
case bool:
return v
case int:
return int64(v)
case uint:
return uint64(v)
case uint64:
return uint64(v)
case []byte:
return string(v)
case int32:
return int64(v)
case int16:
return int64(v)
case int8:
return int64(v)
case uint32:
return uint64(v)
case uint16:
return uint64(v)
case uint8:
return uint64(v)
case float32:
return float64(v)
case *float64:
if v != nil {
return *v
}
case *int64:
if v != nil {
return *v
}
case *string:
if v != nil {
return *v
}
case *bool:
if v != nil {
return *v
}
case *int:
if v != nil {
return int64(*v)
}
case *uint:
if v != nil {
return uint64(*v)
}
case *uint64:
if v != nil {
return uint64(*v)
}
case *[]byte:
if v != nil {
return string(*v)
}
case *int32:
if v != nil {
return int64(*v)
}
case *int16:
if v != nil {
return int64(*v)
}
case *int8:
if v != nil {
return int64(*v)
}
case *uint32:
if v != nil {
return uint64(*v)
}
case *uint16:
if v != nil {
return uint64(*v)
}
case *uint8:
if v != nil {
return uint64(*v)
}
case *float32:
if v != nil {
return float64(*v)
}
default:
return nil
}
return nil
}

View File

@@ -0,0 +1,50 @@
# CC Metric Router
The CCMetric router sits in between the collectors and the sinks and can be used to add and remove tags to/from traversing [CCMetrics](../ccMetric/README.md).
# Configuration
```json
{
"add_tags" : [
{
"key" : "cluster",
"value" : "testcluster",
"if" : "*"
},
{
"key" : "test",
"value" : "testing",
"if" : "name == 'temp_package_id_0'"
}
],
"delete_tags" : [
{
"key" : "unit",
"value" : "*",
"if" : "*"
}
],
"interval_timestamp" : true
}
```
There are three main options `add_tags`, `delete_tags` and `interval_timestamp`. `add_tags` and `delete_tags` are lists consisting of dicts with `key`, `value` and `if`. The `value` can be omitted in the `delete_tags` part as it only uses the `key` for removal. The `interval_timestamp` setting means that a unique timestamp is applied to all metrics traversing the router during an interval.
# Conditional manipulation of tags
The `if` setting allows conditional testing of a single metric like in the example:
```json
{
"key" : "test",
"value" : "testing",
"if" : "name == 'temp_package_id_0'"
}
```
If the CCMetric name is equal to 'temp_package_id_0', it adds an additional tag `test=testing` to the metric.
In order to match all metrics, you can use `*`, so in order to add a flag per default, like the `cluster=testcluster` tag in the example.

View File

@@ -0,0 +1,208 @@
package metricRouter
import (
"encoding/json"
"log"
"os"
"sync"
"time"
lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
mct "github.com/ClusterCockpit/cc-metric-collector/internal/multiChanTicker"
"gopkg.in/Knetic/govaluate.v2"
)
type metricRouterTagConfig struct {
Key string `json:"key"`
Value string `json:"value"`
Condition string `json:"if"`
}
type metricRouterConfig struct {
AddTags []metricRouterTagConfig `json:"add_tags"`
DelTags []metricRouterTagConfig `json:"delete_tags"`
IntervalStamp bool `json:"interval_timestamp"`
}
type metricRouter struct {
inputs []chan lp.CCMetric
outputs []chan lp.CCMetric
done chan bool
wg *sync.WaitGroup
timestamp time.Time
ticker mct.MultiChanTicker
config metricRouterConfig
}
type MetricRouter interface {
Init(ticker mct.MultiChanTicker, wg *sync.WaitGroup, routerConfigFile string) error
AddInput(input chan lp.CCMetric)
AddOutput(output chan lp.CCMetric)
Start()
Close()
}
func (r *metricRouter) Init(ticker mct.MultiChanTicker, wg *sync.WaitGroup, routerConfigFile string) error {
r.inputs = make([]chan lp.CCMetric, 0)
r.outputs = make([]chan lp.CCMetric, 0)
r.done = make(chan bool)
r.wg = wg
r.ticker = ticker
configFile, err := os.Open(routerConfigFile)
if err != nil {
log.Print(err.Error())
return err
}
defer configFile.Close()
jsonParser := json.NewDecoder(configFile)
err = jsonParser.Decode(&r.config)
if err != nil {
log.Print(err.Error())
return err
}
return nil
}
func (r *metricRouter) StartTimer() {
m := make(chan time.Time)
r.ticker.AddChannel(m)
go func() {
for {
select {
case t := <-m:
r.timestamp = t
}
}
}()
}
func (r *metricRouter) EvalCondition(Cond string, point lp.CCMetric) (bool, error) {
expression, err := govaluate.NewEvaluableExpression(Cond)
if err != nil {
log.Print(Cond, " = ", err.Error())
return false, err
}
params := make(map[string]interface{})
params["name"] = point.Name()
for _, t := range point.TagList() {
params[t.Key] = t.Value
}
for _, m := range point.MetaList() {
params[m.Key] = m.Value
}
for _, f := range point.FieldList() {
params[f.Key] = f.Value
}
params["timestamp"] = point.Time()
result, err := expression.Evaluate(params)
if err != nil {
log.Print(Cond, " = ", err.Error())
return false, err
}
return bool(result.(bool)), err
}
func (r *metricRouter) DoAddTags(point lp.CCMetric) {
for _, m := range r.config.AddTags {
var conditionMatches bool
if m.Condition == "*" {
conditionMatches = true
} else {
var err error
conditionMatches, err = r.EvalCondition(m.Condition, point)
if err != nil {
log.Print(err.Error())
conditionMatches = false
}
}
if conditionMatches {
point.AddTag(m.Key, m.Value)
}
}
}
func (r *metricRouter) DoDelTags(point lp.CCMetric) {
for _, m := range r.config.DelTags {
var conditionMatches bool
if m.Condition == "*" {
conditionMatches = true
} else {
var err error
conditionMatches, err = r.EvalCondition(m.Condition, point)
if err != nil {
log.Print(err.Error())
conditionMatches = false
}
}
if conditionMatches {
point.RemoveTag(m.Key)
}
}
}
func (r *metricRouter) Start() {
r.wg.Add(1)
r.timestamp = time.Now()
if r.config.IntervalStamp {
r.StartTimer()
}
go func() {
for {
RouterLoop:
select {
case <-r.done:
log.Print("[MetricRouter] DONE\n")
r.wg.Done()
break RouterLoop
default:
for _, c := range r.inputs {
RouterInputLoop:
select {
case <-r.done:
log.Print("[MetricRouter] DONE\n")
r.wg.Done()
break RouterInputLoop
case p := <-c:
log.Print("[MetricRouter] FORWARD ", p)
r.DoAddTags(p)
r.DoDelTags(p)
if r.config.IntervalStamp {
p.SetTime(r.timestamp)
}
for _, o := range r.outputs {
o <- p
}
default:
}
}
}
}
log.Print("[MetricRouter] EXIT\n")
}()
log.Print("[MetricRouter] STARTED\n")
}
func (r *metricRouter) AddInput(input chan lp.CCMetric) {
r.inputs = append(r.inputs, input)
}
func (r *metricRouter) AddOutput(output chan lp.CCMetric) {
r.outputs = append(r.outputs, output)
}
func (r *metricRouter) Close() {
r.done <- true
log.Print("[MetricRouter] CLOSE\n")
}
func New(ticker mct.MultiChanTicker, wg *sync.WaitGroup, routerConfigFile string) (MetricRouter, error) {
r := new(metricRouter)
err := r.Init(ticker, wg, routerConfigFile)
if err != nil {
return nil, err
}
return r, err
}

View File

@@ -0,0 +1,37 @@
# MultiChanTicker
The idea of this ticker is to multiply the output channels. The original Golang `time.Ticker` provides only a single output channel, so the signal can only be received by a single other class. This ticker allows to add multiple channels which get all notified about the time tick.
```golang
type MultiChanTicker interface {
Init(duration time.Duration)
AddChannel(chan time.Time)
}
```
The MultiChanTicker is created similarly to the common `time.Ticker`:
```golang
NewTicker(duration time.Duration) MultiChanTicker
```
Afterwards, you can add channels:
```golang
t := MultiChanTicker(duration)
c1 := make(chan time.Time)
c2 := make(chan time.Time)
t.AddChannel(c1)
t.AddChannel(c2)
for {
select {
case t1 := <- c1:
log.Print(t1)
case t2 := <- c2:
log.Print(t2)
}
}
```
The result should be the same `time.Time` output in both channels, notified "simultaneously".

View File

@@ -0,0 +1,39 @@
package multiChanTicker
import (
"time"
)
type multiChanTicker struct {
ticker *time.Ticker
channels []chan time.Time
}
type MultiChanTicker interface {
Init(duration time.Duration)
AddChannel(chan time.Time)
}
func (t *multiChanTicker) Init(duration time.Duration) {
t.ticker = time.NewTicker(duration)
go func() {
for {
select {
case ts := <-t.ticker.C:
for _, c := range t.channels {
c <- ts
}
}
}
}()
}
func (t *multiChanTicker) AddChannel(channel chan time.Time) {
t.channels = append(t.channels, channel)
}
func NewTicker(duration time.Duration) MultiChanTicker {
t := &multiChanTicker{}
t.Init(duration)
return t
}