mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2024-11-10 04:27:25 +01:00
162cce0fda
# CCMetric collectors

This folder contains the collectors for the cc-metric-collector.

# Configuration

```json
{
    "collector_type" : {
        <collector specific configuration>
    }
}
```

In contrast to the configuration files for sinks and receivers, the collector configuration is not a list but a single JSON object that maps each collector type to its own configuration object. This is required because we did not manage to partially read the type before loading the remaining configuration. We intend to change this to the same format used by sinks and receivers.
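
With this layout, each key names a collector type and each value can be passed on unparsed. The following is a minimal sketch of that idea, not the collector manager's actual code: `splitCollectorConfig` is a hypothetical helper, and the collector types and options in the example input are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// splitCollectorConfig is a hypothetical helper for this sketch. It splits the
// top-level object into one raw JSON blob per collector type without
// interpreting the collector-specific part.
func splitCollectorConfig(data []byte) (map[string]json.RawMessage, error) {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// Example input; the types and option values are assumptions for illustration.
	data := []byte(`{
		"loadavg": {},
		"sample": {"exclude_metrics": ["sample_metric"]}
	}`)

	cfg, err := splitCollectorConfig(data)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}

	// Sort the keys because Go map iteration order is not stable.
	names := make([]string, 0, len(cfg))
	for name := range cfg {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		// cfg[name] is the kind of value an Init(config json.RawMessage) receives.
		fmt.Printf("%s -> %s\n", name, cfg[name])
	}
}
```

Because the type is the map key, the collector-specific part never has to be decoded before the matching collector is known.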

# Available collectors

* [`cpustat`](./cpustatMetric.md)
* [`memstat`](./memstatMetric.md)
* [`iostat`](./iostatMetric.md)
* [`diskstat`](./diskstatMetric.md)
* [`loadavg`](./loadavgMetric.md)
* [`netstat`](./netstatMetric.md)
* [`ibstat`](./infinibandMetric.md)
* [`ibstat_perfquery`](./infinibandPerfQueryMetric.md)
* [`tempstat`](./tempMetric.md)
* [`lustrestat`](./lustreMetric.md)
* [`likwid`](./likwidMetric.md)
* [`nvidia`](./nvidiaMetric.md)
* [`customcmd`](./customCmdMetric.md)
* [`ipmistat`](./ipmiMetric.md)
* [`topprocs`](./topprocsMetric.md)
* [`nfs3stat`](./nfs3Metric.md)
* [`nfs4stat`](./nfs4Metric.md)
* [`cpufreq`](./cpufreqMetric.md)
* [`cpufreq_cpuinfo`](./cpufreqCpuinfoMetric.md)
* [`numastats`](./numastatsMetric.md)
* [`gpfs`](./gpfsMetric.md)
* [`beegfs_meta`](./beegfsmetaMetric.md)
* [`beegfs_storage`](./beegfsstorageMetric.md)
* [`rocm_smi`](./rocmsmiMetric.md)

## Todos

* [ ] Aggregate metrics to a higher topology entity (sum hwthread metrics to a socket metric, ...). This needs to be configurable.

# Contributing own collectors

A collector reads data from any source, parses it into metrics and submits these metrics to the `metric-collector`. A collector provides the following functions:

* `Name() string`: Returns the name of the collector.
* `Init(config json.RawMessage) error`: Initializes the collector using the given collector-specific config in JSON. Check here whether the needed files and commands exist, ...
* `Initialized() bool`: Checks whether the collector was initialized successfully.
* `Read(duration time.Duration, output chan ccMetric.CCMetric)`: Reads, parses and submits data to the `output` channel as [`CCMetric`](../internal/ccMetric/README.md). If the collector has to measure anything for some duration, use the provided function argument `duration`.
* `Close()`: Closes down the collector.

It is recommended to call `setup()` in the `Init()` function.

Finally, the collector needs to be registered in `collectorManager.go`. There is a map called `AvailableCollectors` (`collector_type_string` -> `pointer to MetricCollector interface`). Add a new entry with a descriptive name and the new collector.

## Sample collector

```go
package collectors

import (
	"encoding/json"
	"fmt"
	"time"

	// Assumed import path: ccLogger is expected next to ccMetric, adjust if it lives elsewhere
	cclog "github.com/ClusterCockpit/cc-metric-collector/internal/ccLogger"
	lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
)

// Struct for the collector-specific JSON config
type SampleCollectorConfig struct {
	ExcludeMetrics []string `json:"exclude_metrics"`
}

// The embedded metricCollector provides name, init, meta and setup()
type SampleCollector struct {
	metricCollector
	config SampleCollectorConfig
}

func (m *SampleCollector) Init(config json.RawMessage) error {
	// Check if already initialized
	if m.init {
		return nil
	}

	m.name = "SampleCollector"
	m.setup()
	if len(config) > 0 {
		err := json.Unmarshal(config, &m.config)
		if err != nil {
			return err
		}
	}
	m.meta = map[string]string{"source": m.name, "group": "Sample"}

	m.init = true
	return nil
}

func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric) {
	if !m.init {
		return
	}
	// tags for the metric, if type != node use proper type and type-id
	tags := map[string]string{"type": "node"}

	// GetMetric() is a placeholder for the collector-specific measurement
	x, err := GetMetric()
	if err != nil {
		cclog.ComponentError(m.name, fmt.Sprintf("Read(): %v", err))
		// Do not submit a metric built from a failed read
		return
	}

	// Each metric has exactly one field: value !
	value := map[string]interface{}{"value": int64(x)}
	if y, err := lp.New("sample_metric", tags, m.meta, value, time.Now()); err == nil {
		output <- y
	}
}

func (m *SampleCollector) Close() {
	m.init = false
}
```