mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-01-26 05:49:06 +01:00
The MetricAggregator
In some cases, metrics or raw values need to be combined further. For that, strings like `foo + 1` with a runtime-dependent `foo` need to be evaluated. The MetricAggregator relies on the `gval` Golang package to perform all expression evaluation. The `gval` package provides the basic arithmetic operations, but the MetricAggregator defines additional ones.
Note: To get an impression of which expressions `gval` can handle, see its README.
Simple expression evaluation
For simple expression evaluation, the MetricAggregator provides two functions for different use cases:
- `EvalBoolCondition(expression string, params map[string]interface{})`: Used by the MetricRouter to match metrics like `metric.Name() == 'mymetric'`
- `EvalFloat64Condition(expression string, params map[string]interface{})`: Used by the MetricRouter and LikwidCollector to derive new values like `(PMC0+PMC1)/PMC3`
MetricAggregator extensions for gval
The MetricAggregator provides these functions in addition to the `Full` language in `gval`:
- `sum(array)`: Sum up values in an array like `sum(values)`
- `min(array)`: Get the minimum value in an array like `min(values)`
- `avg(array)`: Get the mean value in an array like `avg(values)`
- `mean(array)`: Get the mean value in an array like `mean(values)`
- `max(array)`: Get the maximum value in an array like `max(values)`
- `len(array)`: Get the length of an array like `len(values)`
- `median(array)`: Get the median value in an array like `median(values)`
- `in`: Check existence in an array like `0 in getCpuList()` to check whether there is an entry `0`. Substring matching also works, like `temp in metric.Name()`
- `match`: Regular-expression matching like `match('temp_cores_%d+', metric.Name())`. Note that every `\` in a regex has to be replaced with `%`
- `getCpuCore(cpuid)`: For a CPU id, get the corresponding CPU core id like `getCpuCore(0)`
- `getCpuSocket(cpuid)`: For a CPU id, get the corresponding CPU socket id
- `getCpuNuma(cpuid)`: For a CPU id, get the corresponding NUMA domain id
- `getCpuDie(cpuid)`: For a CPU id, get the corresponding CPU die id
- `getSockCpuList(sockid)`: For a given CPU socket id, the list of CPU ids is returned, like the CPUs on socket 1 with `getSockCpuList(1)`
- `getNumaCpuList(numaid)`: For a given NUMA node id, the list of CPU ids is returned
- `getDieCpuList(dieid)`: For a given CPU die id, the list of CPU ids is returned
- `getCoreCpuList(coreid)`: For a given CPU core id, the list of CPU ids is returned
- `getCpuList`: Get the list of all CPUs
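To make the array functions above more concrete, here is a minimal sketch in plain Go of what `sum`, `avg` and `median` compute. It is not the MetricAggregator's code: the function names, the `[]float64` input type, the zero result for empty input and the even-length median rule (mean of the two middle values) are assumptions made for illustration, and the actual implementation may handle these cases differently.

```go
package main

import (
	"fmt"
	"sort"
)

// sum adds up all values; an empty slice yields 0.
func sum(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

// avg returns the arithmetic mean. Returning 0 for an empty slice
// is an assumption of this sketch.
func avg(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	return sum(values) / float64(len(values))
}

// median sorts a copy of the input so the caller's slice keeps its order.
// For an even number of values it returns the mean of the two middle
// values (assumed convention).
func median(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

func main() {
	values := []float64{4, 1, 3, 2}
	fmt.Println(sum(values), avg(values), median(values)) // prints "10 2.5 2.5"
}
```

In an expression, the equivalent calls would be `sum(values)`, `avg(values)` and `median(values)`.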
Limitations
- Since the metrics are written in JSON files, which do not allow `""` without proper escaping inside JSON strings, you have to use `''` for strings.
- Since `\` is interpreted by JSON as an escape character, it cannot be used in metrics, but it is required to write regular expressions. So instead of `\`, use `%`; the MetricAggregator replaces them after reading the JSON file.
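The two limitations can be illustrated with a small sketch that uses only the Go standard library. The `ruleConfig` struct, its JSON keys and the helpers `toRegex` and `matches` are hypothetical names introduced for this example. The sketch also assumes that every `%` is replaced with `\` in a single pass, which may not match what the MetricAggregator does internally.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// ruleConfig is a hypothetical configuration entry used only in this sketch.
type ruleConfig struct {
	Pattern   string `json:"pattern"`
	Condition string `json:"condition"`
}

// toRegex turns the JSON-friendly '%' placeholder back into a backslash.
func toRegex(pattern string) string {
	return strings.ReplaceAll(pattern, "%", `\`)
}

// matches reports whether s matches the pattern after placeholder replacement.
func matches(pattern, s string) bool {
	return regexp.MustCompile(toRegex(pattern)).MatchString(s)
}

func main() {
	// Single quotes and '%' need no escaping inside a JSON string.
	raw := `{"pattern": "temp_cores_%d+", "condition": "metric.Name() == 'mymetric'"}`

	var cfg ruleConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}

	fmt.Println(cfg.Condition)                          // metric.Name() == 'mymetric'
	fmt.Println(toRegex(cfg.Pattern))                   // temp_cores_\d+
	fmt.Println(matches(cfg.Pattern, "temp_cores_12"))  // true
	fmt.Println(matches(cfg.Pattern, "temp_cores_avg")) // false
}
```

One consequence of a plain replacement like this is that a literal `%` can no longer be written in a pattern.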