Merge develop into main (#109)

* Add cpu_used (all-cpu_idle) to CpustatCollector

* Update to line-protocol/v2

* Update runonce.yml with Golang 1.20

* Update fsnotify in LIKWID Collector

* Do not use a pointer to line-protocol.Encoder

* Simplify Makefile

* Use only as many arguments as required

* Allow sum function to handle non-float types

* Allow values to be a slice of type float64, float32, int, int64, int32, bool

* Use generic function to simplify code

* Add missing case for type []int32

* Use generic function to compute minimum

* Use generic function to compute maximum

* Use generic function to compute average

* Add error value to sumAnyType

* Use generic function to compute median

* For older Go versions, package slices is not part of the standard library

* Remove old entries from go.sum

* Use simpler sort function

* Compute metrics ib_total and ib_total_pkts

* Add aggregated metrics.
Add missing units

* Update likwidMetric.go

Fixes a potential bug when `fsnotify.NewWatcher()` fails with an error

* Completely avoid memory allocations in infinibandMetric read()

* Fixed initialization: Initialization and measurements should run in the same thread

* Add safe.directory to Release action

* Fix path to /usr/bin after installation

* ioutil.ReadFile is deprecated: As of Go 1.16, this function simply calls os.ReadFile

* Switch to package slices from the Go 1.21 standard library

* Read file line by line

* Read file line by line

* Read file line by line

* Use CamelCase

* Use CamelCase

* Fix function getNumaDomain, it always returned 0

* Avoid type conversion by using Atoi
Avoid copying structs by using pointer access
Increase readability with CamelCase variable names

* Add caching

* Cache CpuData

* Cleanup

* Use init function to initialize cache structure to avoid multithreading problems

* Reuse information from /proc/cpuinfo

* Avoid slice cloning. Directly use the cache

* Add DieList

* Add NumaDomainList and SMTList

* Cleanup

* Add comment

* Lookup core ID from /sys/devices/system/cpu, /proc/cpuinfo is not portable

* Lookup all information from /sys/devices/system/cpu, /proc/cpuinfo is not portable

* Correctly handle lists from /sys

* Add Simultaneous Multithreading siblings

* Replace deprecated thread_siblings_list with core_cpus_list

* Reduce number of required slices

* Allow sending total values per core, socket and node

* Send all metrics with same time stamp
calcEventsetMetrics only does computation; counter measurement is done beforehand

* Input parameters should be float64 when evaluating to float64

* Send all metrics with same time stamp
calcGlobalMetrics only does computation; counter measurement is done beforehand

* Remove unused variable gmresults

* Add comments

* Updated go packages

* Add build with golang 1.21

* Switch to checkout action version 4

* Switch to setup-go action version 4

* Add workflow_dispatch to allow manual run of workflow

* Add workflow_dispatch to allow manual run of workflow

* Add release build jobs to runonce.yml

* Switch to golang 1.20 for RHEL based distributions

* Use dnf to download golang

* Remove golang versions before 1.20

* Upgrade Ubuntu focal -> jammy

* Pipe golang tar package directly to tar

* Update golang version

* Fix Ubuntu version number

* Add links to ipmi and redfish receivers

* Fix http server addr format

* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol

* Corrected spelling

* Add some comments

* github.com/influxdata/line-protocol -> github.com/influxdata/line-protocol/v2/lineprotocol

* Allow fields other than "value"

* Add some basic debugging documentation

* Add some basic debugging documentation

* Use a lock for the flush timer

* Add tags in lexical order as required by AddTag()

* Only access metadata when it is used as a tag

* Use slice to store lexically ordered key value pairs

* Increase golang version requirement to 1.20.

* Avoid package cmp to allow builds with golang v1.20

* Fix: The error "NVML library not found" crashed
cc-metric-collector with "SIGSEGV: segmentation violation"

* Add config option idle_timeout

* Add basic authentication support

* Add basic authentication support

* Avoid unnecessary memory allocations

* Add documentation for send_*_total values

* Use generic package maps to clone maps

* Reuse flush timer

* Add Influx client options

* Reuse ccTopology functionality

* Do not store unused topology information

* Add batch_size config

* Cleanup

* Use stype and stype-id for the NIC in NetstatCollector

* Wait for concurrent flush operations to finish

* Be more verbose in error messages

* Reverted previous changes.
Made the code too complex without much advantage

* Use line protocol encoder

* Go pkg update

* Stop flush timer when flushing immediately

* Fix: Corrected unlock access to batch slice

* Add config option to specify whether to use GZip compression in influx write requests

* Add asynchronous send of encoder metrics

* Use DefaultServeMux instead of github.com/gorilla/mux

* Add config option for HTTP keep-alives

* Be more strict when parsing JSON

* Add config option for HTTP request timeout and retry interval

* Allow more than one background send operation

* Fix %sysusers_create_package args (#108)

%sysusers_create_package requires two arguments. See: https://github.com/systemd/systemd/blob/main/src/rpm/macros.systemd.in#L165

* Add nfsiostat to list of collectors
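Several commits above replace type-specific code with generic helpers for sum, minimum, maximum, average and median, add an error return to `sumAnyType`, and avoid package `slices` so that Go 1.20 still builds. The helpers themselves are not part of this commit view, so the following is only a sketch of the technique under assumptions: the names `Number`, `sumOf`, `minOf`, `maxOf`, `avgOf`, `medianOf` and `sumAny` are hypothetical, `bool` slices are left out because their aggregation rule is not stated here, and sorting uses `sort.Slice` instead of `slices.Sort`.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Number is a hypothetical constraint for the numeric element types in the commit list.
type Number interface {
	~float64 | ~float32 | ~int | ~int64 | ~int32
}

var errEmpty = errors.New("empty slice")

// sumOf adds all values after converting them to float64.
func sumOf[T Number](values []T) float64 {
	var s float64
	for _, v := range values {
		s += float64(v)
	}
	return s
}

// extremeOf returns the element that wins every comparison under better.
func extremeOf[T Number](values []T, better func(a, b T) bool) (T, error) {
	var zero T
	if len(values) == 0 {
		return zero, errEmpty
	}
	m := values[0]
	for _, v := range values[1:] {
		if better(v, m) {
			m = v
		}
	}
	return m, nil
}

func minOf[T Number](values []T) (T, error) {
	return extremeOf(values, func(a, b T) bool { return a < b })
}

func maxOf[T Number](values []T) (T, error) {
	return extremeOf(values, func(a, b T) bool { return a > b })
}

func avgOf[T Number](values []T) (float64, error) {
	if len(values) == 0 {
		return 0, errEmpty
	}
	return sumOf(values) / float64(len(values)), nil
}

// medianOf sorts a copy with sort.Slice (available before Go 1.21)
// so that the caller's slice is left unchanged.
func medianOf[T Number](values []T) (float64, error) {
	n := len(values)
	if n == 0 {
		return 0, errEmpty
	}
	c := make([]T, n)
	copy(c, values)
	sort.Slice(c, func(i, j int) bool { return c[i] < c[j] })
	if n%2 == 1 {
		return float64(c[n/2]), nil
	}
	return (float64(c[n/2-1]) + float64(c[n/2])) / 2, nil
}

// sumAny dispatches an untyped value to sumOf and reports unsupported types.
func sumAny(values any) (float64, error) {
	switch v := values.(type) {
	case []float64:
		return sumOf(v), nil
	case []float32:
		return sumOf(v), nil
	case []int:
		return sumOf(v), nil
	case []int64:
		return sumOf(v), nil
	case []int32:
		return sumOf(v), nil
	}
	return 0, fmt.Errorf("unsupported type %T", values)
}

func main() {
	data := []int32{4, 1, 3, 2}
	lo, _ := minOf(data)
	hi, _ := maxOf(data)
	avg, _ := avgOf(data)
	med, _ := medianOf(data)
	fmt.Println(sumOf(data), lo, hi, avg, med) // 10 1 4 2.5 2.5
}
```

Returning an error for empty input, rather than a zero value, lets a caller distinguish "no samples" from a genuine reading of 0.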

---------

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Holger Obermaier <holgerob@gmx.de>
Co-authored-by: Obihörnchen <obihoernchende@gmail.com>
Thomas Gruber
2023-12-04 12:21:26 +01:00
committed by GitHub
parent 9df1054e32
commit 6ab45dd3ec
39 changed files with 1835 additions and 1030 deletions


@@ -2,7 +2,7 @@
This folder contains the ReceiveManager and receiver implementations for the cc-metric-collector.
# Configuration
## Configuration
The configuration file for the receivers is a list of configurations. The `type` field in each specifies which receiver to initialize.
@@ -22,8 +22,11 @@ This allows to specify
- [`nats`](./natsReceiver.md): Receive metrics from the NATS network
- [`prometheus`](./prometheusReceiver.md): Scrape data from a Prometheus client
- [`http`](./httpReceiver.md): Listen for HTTP Post requests transporting metrics in InfluxDB line protocol
- [`ipmi`](./ipmiReceiver.md): Read IPMI sensor readings
- [`redfish`](./redfishReceiver.md): Use the Redfish specification to query thermal and power metrics
## Contributing own receivers
# Contributing own receivers
A receiver contains a few functions and is derived from the type `Receiver` (in `metricReceiver.go`):
For an example, check the [sample receiver](./sampleReceiver.go)
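The README states that the `type` field of each configuration entry selects which receiver to initialize, and the `AvailableReceivers` map at the end of this commit maps type names to constructors. A minimal sketch of that dispatch follows. It assumes, for illustration only, that the entries are given as a JSON object keyed by receiver name; the reduced `Receiver` interface, `namedReceiver`, `constructors` and `initReceivers` are hypothetical stand-ins, not the collector's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Receiver is a hypothetical, reduced interface standing in for the real one.
type Receiver interface {
	Name() string
}

type namedReceiver struct{ name string }

func (r *namedReceiver) Name() string { return r.name }

// constructors mimics the shape of AvailableReceivers: type name -> constructor.
var constructors = map[string]func(name string, config json.RawMessage) (Receiver, error){
	"http": func(name string, _ json.RawMessage) (Receiver, error) {
		return &namedReceiver{name: fmt.Sprintf("HttpReceiver(%s)", name)}, nil
	},
	"nats": func(name string, _ json.RawMessage) (Receiver, error) {
		return &namedReceiver{name: fmt.Sprintf("NatsReceiver(%s)", name)}, nil
	},
}

// initReceivers reads only the "type" field of each entry and hands the
// complete raw entry to the matching constructor.
func initReceivers(raw []byte) ([]Receiver, error) {
	var entries map[string]json.RawMessage
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	// Sort names so that initialization order is deterministic.
	names := make([]string, 0, len(entries))
	for name := range entries {
		names = append(names, name)
	}
	sort.Strings(names)

	var receivers []Receiver
	for _, name := range names {
		var head struct {
			Type string `json:"type"`
		}
		if err := json.Unmarshal(entries[name], &head); err != nil {
			return nil, fmt.Errorf("receiver %s: %w", name, err)
		}
		newReceiver, ok := constructors[head.Type]
		if !ok {
			return nil, fmt.Errorf("receiver %s: unknown type %q", name, head.Type)
		}
		r, err := newReceiver(name, entries[name])
		if err != nil {
			return nil, err
		}
		receivers = append(receivers, r)
	}
	return receivers, nil
}

func main() {
	cfg := []byte(`{"a": {"type": "http", "port": "8080"}, "b": {"type": "nats"}}`)
	rs, err := initReceivers(cfg)
	if err != nil {
		panic(err)
	}
	for _, r := range rs {
		fmt.Println(r.Name())
	}
	// Output:
	// HttpReceiver(a)
	// NatsReceiver(b)
}
```

Passing the whole raw entry to the constructor lets each receiver decode its own fields, so the dispatcher only needs to know about `type`.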


@@ -5,15 +5,14 @@ import (
"encoding/json"
"errors"
"fmt"
"io"
"net/http"
"strings"
"sync"
"time"
cclog "github.com/ClusterCockpit/cc-metric-collector/pkg/ccLogger"
lp "github.com/ClusterCockpit/cc-metric-collector/pkg/ccMetric"
"github.com/gorilla/mux"
influx "github.com/influxdata/line-protocol"
influx "github.com/influxdata/line-protocol/v2/lineprotocol"
)
const HTTP_RECEIVER_PORT = "8080"
@@ -23,22 +22,39 @@ type HttpReceiverConfig struct {
Addr string `json:"address"`
Port string `json:"port"`
Path string `json:"path"`
// Maximum amount of time to wait for the next request when keep-alives are enabled
// should be larger than the measurement interval to keep the connection open
IdleTimeout string `json:"idle_timeout"`
idleTimeout time.Duration
// Controls whether HTTP keep-alives are enabled. By default, keep-alives are enabled
KeepAlivesEnabled bool `json:"keep_alives_enabled"`
// Basic authentication
Username string `json:"username"`
Password string `json:"password"`
useBasicAuth bool
}
type HttpReceiver struct {
receiver
handler *influx.MetricHandler
parser *influx.Parser
meta map[string]string
config HttpReceiverConfig
router *mux.Router
server *http.Server
wg sync.WaitGroup
meta map[string]string
config HttpReceiverConfig
server *http.Server
wg sync.WaitGroup
}
func (r *HttpReceiver) Init(name string, config json.RawMessage) error {
r.name = fmt.Sprintf("HttpReceiver(%s)", name)
// Set default values
r.config.Port = HTTP_RECEIVER_PORT
r.config.KeepAlivesEnabled = true
// should be larger than the measurement interval to keep the connection open
r.config.IdleTimeout = "120s"
// Read config
if len(config) > 0 {
err := json.Unmarshal(config, &r.config)
if err != nil {
@@ -49,20 +65,47 @@ func (r *HttpReceiver) Init(name string, config json.RawMessage) error {
if len(r.config.Port) == 0 {
return errors.New("not all configuration variables set required by HttpReceiver")
}
// Check idle timeout config
if len(r.config.IdleTimeout) > 0 {
t, err := time.ParseDuration(r.config.IdleTimeout)
if err == nil {
cclog.ComponentDebug(r.name, "idleTimeout", t)
r.config.idleTimeout = t
}
}
// Check basic authentication config
if len(r.config.Username) > 0 || len(r.config.Password) > 0 {
r.config.useBasicAuth = true
}
if r.config.useBasicAuth && len(r.config.Username) == 0 {
return errors.New("basic authentication requires username")
}
if r.config.useBasicAuth && len(r.config.Password) == 0 {
return errors.New("basic authentication requires password")
}
r.meta = map[string]string{"source": r.name}
p := r.config.Path
if !strings.HasPrefix(p, "/") {
p = "/" + p
}
uri := fmt.Sprintf("%s:%s%s", r.config.Addr, r.config.Port, p)
cclog.ComponentDebug(r.name, "INIT", uri)
r.handler = influx.NewMetricHandler()
r.parser = influx.NewParser(r.handler)
r.parser.SetTimeFunc(DefaultTime)
addr := fmt.Sprintf("%s:%s", r.config.Addr, r.config.Port)
uri := addr + p
cclog.ComponentDebug(r.name, "INIT", "listen on:", uri)
// Register handler function r.ServerHttp for path p in the DefaultServeMux
http.HandleFunc(p, r.ServerHttp)
// Create http server
r.server = &http.Server{
Addr: addr,
Handler: nil, // handler to invoke, http.DefaultServeMux if nil
IdleTimeout: r.config.idleTimeout,
}
r.server.SetKeepAlivesEnabled(r.config.KeepAlivesEnabled)
r.router = mux.NewRouter()
r.router.Path(p).HandlerFunc(r.ServerHttp)
r.server = &http.Server{Addr: uri, Handler: r.router}
return nil
}
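The configuration handling added to `Init` can be isolated into a pure function that is easy to test. This sketch mirrors the rules in the diff (port `8080` and idle timeout `120s` by default, keep-alives enabled by default, username and password required together, `http.Server` built with `IdleTimeout` and `SetKeepAlivesEnabled`). Unlike the diff, which silently ignores an unparsable `idle_timeout`, it returns an error; `httpConfig`, `parseHTTPConfig` and `newServer` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// httpConfig is a hypothetical copy of the JSON fields of HttpReceiverConfig.
type httpConfig struct {
	Addr              string `json:"address"`
	Port              string `json:"port"`
	Path              string `json:"path"`
	IdleTimeout       string `json:"idle_timeout"`
	KeepAlivesEnabled bool   `json:"keep_alives_enabled"`
	Username          string `json:"username"`
	Password          string `json:"password"`
}

// parseHTTPConfig applies defaults, decodes raw on top of them and validates.
func parseHTTPConfig(raw []byte) (httpConfig, time.Duration, error) {
	cfg := httpConfig{Port: "8080", IdleTimeout: "120s", KeepAlivesEnabled: true}
	if len(raw) > 0 {
		if err := json.Unmarshal(raw, &cfg); err != nil {
			return cfg, 0, err
		}
	}
	if cfg.Port == "" {
		return cfg, 0, errors.New("port must not be empty")
	}
	// An empty idle_timeout keeps 0, i.e. the net/http default behaviour.
	var idle time.Duration
	if cfg.IdleTimeout != "" {
		d, err := time.ParseDuration(cfg.IdleTimeout)
		if err != nil {
			return cfg, 0, fmt.Errorf("invalid idle_timeout %q: %w", cfg.IdleTimeout, err)
		}
		idle = d
	}
	// Basic authentication is all-or-nothing.
	if (cfg.Username == "") != (cfg.Password == "") {
		return cfg, 0, errors.New("basic authentication requires username and password")
	}
	return cfg, idle, nil
}

// newServer builds the http.Server the same way Init does in the diff.
func newServer(cfg httpConfig, idle time.Duration) *http.Server {
	s := &http.Server{
		Addr:        cfg.Addr + ":" + cfg.Port,
		Handler:     nil, // nil selects http.DefaultServeMux
		IdleTimeout: idle,
	}
	s.SetKeepAlivesEnabled(cfg.KeepAlivesEnabled)
	return s
}

func main() {
	raw := []byte(`{"path": "/write", "username": "myUser", "password": "myPW"}`)
	cfg, idle, err := parseHTTPConfig(raw)
	if err != nil {
		panic(err)
	}
	srv := newServer(cfg, idle)
	fmt.Println(srv.Addr, srv.IdleTimeout) // :8080 2m0s
}
```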
@@ -79,31 +122,97 @@ func (r *HttpReceiver) Start() {
}
func (r *HttpReceiver) ServerHttp(w http.ResponseWriter, req *http.Request) {
// Check request method, only post method is handled
if req.Method != http.MethodPost {
http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
return
}
body, err := io.ReadAll(req.Body)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
metrics, err := r.parser.Parse(body)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
// Check basic authentication
if r.config.useBasicAuth {
username, password, ok := req.BasicAuth()
if !ok || username != r.config.Username || password != r.config.Password {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
}
for _, m := range metrics {
y := lp.FromInfluxMetric(m)
for k, v := range r.meta {
y.AddMeta(k, v)
d := influx.NewDecoder(req.Body)
for d.Next() {
// Decode measurement name
measurement, err := d.Measurement()
if err != nil {
msg := "ServerHttp: Failed to decode measurement: " + err.Error()
cclog.ComponentError(r.name, msg)
http.Error(w, msg, http.StatusInternalServerError)
return
}
// Decode tags
tags := make(map[string]string)
for {
key, value, err := d.NextTag()
if err != nil {
msg := "ServerHttp: Failed to decode tag: " + err.Error()
cclog.ComponentError(r.name, msg)
http.Error(w, msg, http.StatusInternalServerError)
return
}
if key == nil {
break
}
tags[string(key)] = string(value)
}
// Decode fields
fields := make(map[string]interface{})
for {
key, value, err := d.NextField()
if err != nil {
msg := "ServerHttp: Failed to decode field: " + err.Error()
cclog.ComponentError(r.name, msg)
http.Error(w, msg, http.StatusInternalServerError)
return
}
if key == nil {
break
}
fields[string(key)] = value.Interface()
}
// Decode time stamp
t, err := d.Time(influx.Nanosecond, time.Time{})
if err != nil {
msg := "ServerHttp: Failed to decode time stamp: " + err.Error()
cclog.ComponentError(r.name, msg)
http.Error(w, msg, http.StatusInternalServerError)
return
}
y, _ := lp.New(
string(measurement),
tags,
r.meta,
fields,
t,
)
if r.sink != nil {
r.sink <- y
}
}
// Check for IO errors
err := d.Err()
if err != nil {
msg := "ServerHttp: Failed to decode: " + err.Error()
cclog.ComponentError(r.name, msg)
http.Error(w, msg, http.StatusInternalServerError)
return
}
w.WriteHeader(http.StatusOK)
}
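The request checks at the top of `ServerHttp` (POST only, then basic authentication) can be exercised without a running server by using `net/http/httptest`. In this sketch the credentials are compared with `crypto/subtle.ConstantTimeCompare` rather than the `!=` used in the diff; this is a swapped-in choice so that comparison time does not depend on how many leading bytes match. `authHandler` and `statusFor` are hypothetical names, and the line-protocol decoding is replaced by a placeholder that counts non-empty lines.

```go
package main

import (
	"bufio"
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// authHandler rejects non-POST requests and, if a username is configured,
// requests without matching basic-auth credentials.
func authHandler(user, pass string) http.HandlerFunc {
	return func(w http.ResponseWriter, req *http.Request) {
		if req.Method != http.MethodPost {
			http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
			return
		}
		if user != "" {
			u, p, ok := req.BasicAuth()
			userOK := subtle.ConstantTimeCompare([]byte(u), []byte(user)) == 1
			passOK := subtle.ConstantTimeCompare([]byte(p), []byte(pass)) == 1
			if !ok || !userOK || !passOK {
				http.Error(w, "Unauthorized", http.StatusUnauthorized)
				return
			}
		}
		// Placeholder for line-protocol decoding: count non-empty lines.
		n := 0
		sc := bufio.NewScanner(req.Body)
		for sc.Scan() {
			if strings.TrimSpace(sc.Text()) != "" {
				n++
			}
		}
		if err := sc.Err(); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "%d lines", n)
	}
}

// statusFor sends one request through the handler and returns the status code.
func statusFor(method, user, pass string) int {
	h := authHandler("myUser", "myPW")
	req := httptest.NewRequest(method, "/write", strings.NewReader("m value=1i 1\n"))
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet, "myUser", "myPW"))   // 405
	fmt.Println(statusFor(http.MethodPost, "", ""))            // 401
	fmt.Println(statusFor(http.MethodPost, "myUser", "wrong")) // 401
	fmt.Println(statusFor(http.MethodPost, "myUser", "myPW"))  // 200
}
```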


@@ -10,7 +10,10 @@ The `http` receiver can be used receive metrics through HTTP POST requests.
"type": "http",
"address" : "",
"port" : "8080",
"path" : "/write"
"path" : "/write",
"idle_timeout": "120s",
"username": "myUser",
"password": "myPW"
}
}
```
@@ -19,5 +22,22 @@ The `http` receiver can be used receive metrics through HTTP POST requests.
- `address`: Listen address
- `port`: Listen port
- `path`: URL path for the write endpoint
- `idle_timeout`: Maximum amount of time to wait for the next request when keep-alives are enabled. Should be larger than the measurement interval to keep the connection open.
- `keep_alives_enabled`: Controls whether HTTP keep-alives are enabled. By default, keep-alives are enabled.
- `username`: username for basic authentication
- `password`: password for basic authentication
The HTTP endpoint listens on `http://<address>:<port><path>`; a leading `/` is added to `path` if it is missing.
### Debugging
- Install [curl](https://curl.se/)
- Use curl to send a message to the `http` receiver
```bash
curl http://localhost:8080/write \
--user "myUser:myPW" \
--data \
"myMetric,hostname=myHost,type=hwthread,type-id=0,unit=Hz value=400000i 1694777161164284635
myMetric,hostname=myHost,type=hwthread,type-id=1,unit=Hz value=400001i 1694777161164284635"
```
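The curl call can also be reproduced from Go with `net/http`, for instance in an integration test. The sketch posts the two sample lines with basic authentication; `postLines` and `demo` are hypothetical helpers, and the `text/plain` content type is an assumption (the handler in the diff does not inspect it). To run standalone, it talks to a local `httptest` stand-in that only checks credentials and body, not to a real receiver.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// postLines sends InfluxDB line-protocol text to url using basic
// authentication and returns the HTTP status code.
func postLines(url, user, pass, body string) (int, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Content-Type", "text/plain; charset=utf-8")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	return resp.StatusCode, nil
}

const sample = "myMetric,hostname=myHost,type=hwthread,type-id=0,unit=Hz value=400000i 1694777161164284635\n" +
	"myMetric,hostname=myHost,type=hwthread,type-id=1,unit=Hz value=400001i 1694777161164284635"

// demo starts a stand-in receiver that only checks credentials and body.
func demo(user, pass string) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok || u != "myUser" || p != "myPW" {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		b, _ := io.ReadAll(r.Body)
		if string(b) != sample {
			http.Error(w, "unexpected body", http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	return postLines(srv.URL+"/write", user, pass, sample)
}

func main() {
	code, err := demo("myUser", "myPW")
	fmt.Println(code, err) // 200 <nil>
}
```

Against a receiver configured like the example above, the helper would be called as `postLines("http://localhost:8080/write", "myUser", "myPW", sample)`.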


@@ -8,7 +8,7 @@ import (
cclog "github.com/ClusterCockpit/cc-metric-collector/pkg/ccLogger"
lp "github.com/ClusterCockpit/cc-metric-collector/pkg/ccMetric"
influx "github.com/influxdata/line-protocol"
influx "github.com/influxdata/line-protocol/v2/lineprotocol"
nats "github.com/nats-io/nats.go"
)
@@ -21,37 +21,85 @@ type NatsReceiverConfig struct {
type NatsReceiver struct {
receiver
nc *nats.Conn
handler *influx.MetricHandler
parser *influx.Parser
meta map[string]string
config NatsReceiverConfig
}
var DefaultTime = func() time.Time {
return time.Unix(42, 0)
nc *nats.Conn
meta map[string]string
config NatsReceiverConfig
}
// Start subscribes to the configured NATS subject
// Messages will be handled by r._NatsReceive
func (r *NatsReceiver) Start() {
cclog.ComponentDebug(r.name, "START")
r.nc.Subscribe(r.config.Subject, r._NatsReceive)
}
// _NatsReceive receives subscribed messages from the NATS server
func (r *NatsReceiver) _NatsReceive(m *nats.Msg) {
metrics, err := r.parser.Parse(m.Data)
if err == nil {
for _, m := range metrics {
y := lp.FromInfluxMetric(m)
for k, v := range r.meta {
y.AddMeta(k, v)
d := influx.NewDecoderWithBytes(m.Data)
for d.Next() {
// Decode measurement name
measurement, err := d.Measurement()
if err != nil {
msg := "_NatsReceive: Failed to decode measurement: " + err.Error()
cclog.ComponentError(r.name, msg)
return
}
// Decode tags
tags := make(map[string]string)
for {
key, value, err := d.NextTag()
if err != nil {
msg := "_NatsReceive: Failed to decode tag: " + err.Error()
cclog.ComponentError(r.name, msg)
return
}
if r.sink != nil {
r.sink <- y
if key == nil {
break
}
tags[string(key)] = string(value)
}
// Decode fields
fields := make(map[string]interface{})
for {
key, value, err := d.NextField()
if err != nil {
msg := "_NatsReceive: Failed to decode field: " + err.Error()
cclog.ComponentError(r.name, msg)
return
}
if key == nil {
break
}
fields[string(key)] = value.Interface()
}
// Decode time stamp
t, err := d.Time(influx.Nanosecond, time.Time{})
if err != nil {
msg := "_NatsReceive: Failed to decode time: " + err.Error()
cclog.ComponentError(r.name, msg)
return
}
y, _ := lp.New(
string(measurement),
tags,
r.meta,
fields,
t,
)
if r.sink != nil {
r.sink <- y
}
}
}
// Close closes the connection to the NATS server
func (r *NatsReceiver) Close() {
if r.nc != nil {
cclog.ComponentDebug(r.name, "CLOSE")
@@ -59,10 +107,13 @@ func (r *NatsReceiver) Close() {
}
}
// NewNatsReceiver creates a new Receiver which subscribes to messages from a NATS server
func NewNatsReceiver(name string, config json.RawMessage) (Receiver, error) {
r := new(NatsReceiver)
r.name = fmt.Sprintf("NatsReceiver(%s)", name)
r.config.Addr = nats.DefaultURL
// Read configuration file, allow overwriting default config
r.config.Addr = "localhost"
r.config.Port = "4222"
if len(config) > 0 {
err := json.Unmarshal(config, &r.config)
@@ -76,17 +127,21 @@ func NewNatsReceiver(name string, config json.RawMessage) (Receiver, error) {
len(r.config.Subject) == 0 {
return nil, errors.New("not all configuration variables set required by NatsReceiver")
}
r.meta = map[string]string{"source": r.name}
uri := fmt.Sprintf("%s:%s", r.config.Addr, r.config.Port)
cclog.ComponentDebug(r.name, "NewNatsReceiver", uri, "Subject", r.config.Subject)
if nc, err := nats.Connect(uri); err == nil {
// Set metadata
r.meta = map[string]string{
"source": r.name,
}
// Connect to NATS server
url := fmt.Sprintf("nats://%s:%s", r.config.Addr, r.config.Port)
cclog.ComponentDebug(r.name, "NewNatsReceiver", url, "Subject", r.config.Subject)
if nc, err := nats.Connect(url); err == nil {
r.nc = nc
} else {
r.nc = nil
return nil, err
}
r.handler = influx.NewMetricHandler()
r.parser = influx.NewParser(r.handler)
r.parser.SetTimeFunc(DefaultTime)
return r, nil
}
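Both receivers now walk each message with the `lineprotocol` decoder: measurement first, then tags until `NextTag` returns a nil key, then fields until `NextField` returns a nil key, then the timestamp, with `time.Time{}` as the default. To show what those steps extract from the sample messages in the debugging sections, here is a deliberately simplified, stdlib-only parser. It is not the `lineprotocol` package: it ignores escaping, quoted string fields, boolean and unsigned fields, and spaces inside values, and it does not attach the `source` meta data. `point` and `parseLine` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// point holds what the receivers extract from one line.
type point struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]interface{}
	Time        time.Time
}

// parseLine splits "measurement,tags fields [timestamp]" on spaces, commas
// and '=' signs. It does not handle escapes or quoted string values.
func parseLine(line string) (point, error) {
	parts := strings.Fields(line)
	if len(parts) < 2 || len(parts) > 3 {
		return point{}, fmt.Errorf("expected 2 or 3 space-separated sections, got %d", len(parts))
	}
	head := strings.Split(parts[0], ",")
	p := point{Measurement: head[0], Tags: map[string]string{}, Fields: map[string]interface{}{}}
	for _, kv := range head[1:] {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			return point{}, fmt.Errorf("malformed tag %q", kv)
		}
		p.Tags[k] = v
	}
	for _, kv := range strings.Split(parts[1], ",") {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			return point{}, fmt.Errorf("malformed field %q", kv)
		}
		if strings.HasSuffix(v, "i") { // integer field, e.g. 400000i
			n, err := strconv.ParseInt(strings.TrimSuffix(v, "i"), 10, 64)
			if err != nil {
				return point{}, err
			}
			p.Fields[k] = n
		} else {
			f, err := strconv.ParseFloat(v, 64)
			if err != nil {
				return point{}, err
			}
			p.Fields[k] = f
		}
	}
	// A missing timestamp leaves the zero time, like the time.Time{} default in the diff.
	if len(parts) == 3 {
		ns, err := strconv.ParseInt(parts[2], 10, 64)
		if err != nil {
			return point{}, err
		}
		p.Time = time.Unix(0, ns)
	}
	return p, nil
}

func main() {
	msg := "myMetric,hostname=myHost,type=hwthread,type-id=0,unit=Hz value=400000i 1694777161164284635"
	p, err := parseLine(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Measurement, p.Tags["type-id"], p.Fields["value"], p.Time.UnixNano())
	// myMetric 0 400000 1694777161164284635
}
```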


@@ -19,3 +19,32 @@ The `nats` receiver can be used receive metrics from the NATS network. The `nats
- `address`: Address of the NATS control server
- `port`: Port of the NATS control server
- `subject`: Subscribes to this subject and receive metrics
### Debugging
- Install NATS server and command line client
- Start NATS server
```bash
nats-server --net nats-server-db.example.org --port 4222
```
- Check that the NATS server works as expected
```bash
nats --server=nats-server-db.example.org:4222 server check
```
- Use NATS command line client to subscribe to all messages
```bash
nats --server=nats-server-db.example.org:4222 sub ">"
```
- Use the NATS command line client to send a message to the NATS receiver
```bash
nats --server=nats-server-db.example.org:4222 pub subject \
"myMetric,hostname=myHost,type=hwthread,type-id=0,unit=Hz value=400000i 1694777161164284635
myMetric,hostname=myHost,type=hwthread,type-id=1,unit=Hz value=400001i 1694777161164284635"
```
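In the subscription command above, `>` is the NATS full wildcard. NATS subjects are dot-separated tokens: `*` matches exactly one token, and `>` (valid only as the last token) matches one or more remaining tokens, so `sub ">"` sees every message and can confirm that the publish step reaches the server. The function below sketches these matching rules to help choose a `subject` for the receiver configuration; `subjectMatches` is a hypothetical helper, not part of `nats.go` or the NATS server, and it does not validate malformed subjects.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a concrete subject is covered by pattern,
// following the NATS token rules: "*" matches one token, a trailing ">"
// matches one or more tokens.
func subjectMatches(pattern, subject string) bool {
	pt := strings.Split(pattern, ".")
	st := strings.Split(subject, ".")
	for i, tok := range pt {
		if tok == ">" {
			// The full wildcard must be last and needs at least one token left.
			return i == len(pt)-1 && len(st) > i
		}
		if i >= len(st) {
			return false
		}
		if tok != "*" && tok != st[i] {
			return false
		}
	}
	return len(pt) == len(st)
}

func main() {
	fmt.Println(subjectMatches(">", "subject"))               // true
	fmt.Println(subjectMatches("metrics.*", "metrics.node01")) // true
	fmt.Println(subjectMatches("metrics.*", "metrics.a.b"))    // false
	fmt.Println(subjectMatches("metrics.>", "metrics.a.b"))    // true
	fmt.Println(subjectMatches("metrics.>", "metrics"))        // false
}
```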


@@ -11,6 +11,7 @@ import (
)
var AvailableReceivers = map[string]func(name string, config json.RawMessage) (Receiver, error){
"http": NewHttpReceiver,
"ipmi": NewIPMIReceiver,
"nats": NewNatsReceiver,
"redfish": NewRedfishReceiver,