Mirror of https://github.com/ClusterCockpit/cc-metric-collector.git (synced 2025-01-13 15:49:06 +01:00)
The MetricAggregator
In some cases, metrics or raw values need to be combined further. This requires evaluating expressions like `foo + 1`, where the value of `foo` is only known at runtime. The MetricAggregator relies on the `gval` Golang package for all expression evaluation. `gval` provides the basic arithmetic operations, and the MetricAggregator defines additional functions on top of them.
Note: To get an impression of which expressions `gval` can handle, see its README.
Simple expression evaluation
For simple expression evaluation, the MetricAggregator provides two functions for different use cases:
- `EvalBoolCondition(expression string, params map[string]interface{})`: Used by the MetricRouter to match metrics, like `metric.Name() == 'mymetric'`
- `EvalFloat64Condition(expression string, params map[string]interface{})`: Used by the MetricRouter and LikwidCollector to derive new values, like `(PMC0+PMC1)/PMC3`
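In the MetricAggregator, `gval` does the actual parsing and evaluation. To illustrate the contract of `EvalFloat64Condition` (an expression string and a `params` map of runtime values in, a numeric result out) without external dependencies, the sketch below uses Go's standard `go/parser` as a hypothetical stand-in for `gval`. Go's expression syntax happens to cover simple arithmetic like `(PMC0+PMC1)/PMC3` or `foo + 1`. The name `evalFloat64Sketch` and the accepted parameter types are assumptions; only numbers, parameter names, parentheses and `+ - * /` are supported, and none of the MetricAggregator extensions are.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// toFloat64 converts a parameter value to float64. Which numeric types a
// caller puts into params is an assumption of this sketch.
func toFloat64(v interface{}) (float64, error) {
	switch x := v.(type) {
	case float64:
		return x, nil
	case int:
		return float64(x), nil
	case int64:
		return float64(x), nil
	}
	return 0, fmt.Errorf("unsupported parameter type %T", v)
}

// eval recursively walks the syntax tree of an arithmetic expression and
// looks up every identifier in params.
func eval(node ast.Expr, params map[string]interface{}) (float64, error) {
	switch n := node.(type) {
	case *ast.BasicLit: // numeric literal such as 1 or 2.5
		if n.Kind != token.INT && n.Kind != token.FLOAT {
			return 0, fmt.Errorf("unsupported literal %s", n.Value)
		}
		return strconv.ParseFloat(n.Value, 64)
	case *ast.Ident: // runtime value such as foo or PMC0
		v, ok := params[n.Name]
		if !ok {
			return 0, fmt.Errorf("unknown parameter %q", n.Name)
		}
		return toFloat64(v)
	case *ast.ParenExpr:
		return eval(n.X, params)
	case *ast.BinaryExpr:
		x, err := eval(n.X, params)
		if err != nil {
			return 0, err
		}
		y, err := eval(n.Y, params)
		if err != nil {
			return 0, err
		}
		switch n.Op {
		case token.ADD:
			return x + y, nil
		case token.SUB:
			return x - y, nil
		case token.MUL:
			return x * y, nil
		case token.QUO: // division by zero yields ±Inf here, not an error
			return x / y, nil
		}
		return 0, fmt.Errorf("unsupported operator %s", n.Op)
	}
	return 0, fmt.Errorf("unsupported expression type %T", node)
}

// evalFloat64Sketch mirrors the shape of EvalFloat64Condition but is NOT
// the MetricAggregator implementation.
func evalFloat64Sketch(expression string, params map[string]interface{}) (float64, error) {
	node, err := parser.ParseExpr(expression)
	if err != nil {
		return 0, err
	}
	return eval(node, params)
}

func main() {
	params := map[string]interface{}{"PMC0": 100.0, "PMC1": 50, "PMC3": 3.0}
	v, err := evalFloat64Sketch("(PMC0+PMC1)/PMC3", params)
	fmt.Println(v, err) // 50 <nil>
}
```

Because `params` values are `interface{}`, the sketch converts each one explicitly; a value of an unsupported type, or a name missing from `params`, yields an error instead of a silent zero.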
MetricAggregator extensions for gval
In addition to the `Full` language of `gval`, the MetricAggregator provides these functions:
- `sum(array)`: Sum up the values in an array, like `sum(values)`
- `min(array)`: Get the minimum value in an array, like `min(values)`
- `avg(array)`: Get the mean value of an array, like `avg(values)`
- `mean(array)`: Get the mean value of an array, like `mean(values)`
- `max(array)`: Get the maximum value in an array, like `max(values)`
- `len(array)`: Get the length of an array, like `len(values)`
- `median(array)`: Get the median value of an array, like `median(values)`
- `in`: Check existence in an array, like `0 in getCpuList()` to check whether there is an entry `0`. Substring matching also works, like `'temp' in metric.Name()`
- `match`: Regular-expression matching, like `match('temp_cores_%d+', metric.Name())`. Note: every `\` in a regex has to be replaced with `%`
- `getCpuCore(cpuid)`: For a CPU id, get the corresponding CPU core id, like `getCpuCore(0)`
- `getCpuSocket(cpuid)`: For a CPU id, get the corresponding CPU socket id
- `getCpuNuma(cpuid)`: For a CPU id, get the corresponding NUMA domain id
- `getCpuDie(cpuid)`: For a CPU id, get the corresponding CPU die id
- `getSockCpuList(sockid)`: For a given CPU socket id, get the list of CPU ids, like `getSockCpuList(1)` for the CPUs on socket 1
- `getNumaCpuList(numaid)`: For a given NUMA node id, get the list of CPU ids
- `getDieCpuList(dieid)`: For a given CPU die id, get the list of CPU ids
- `getCoreCpuList(coreid)`: For a given CPU core id, get the list of CPU ids
- `getCpuList()`: Get the list of all CPUs
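The array helpers and the `in` operator listed above can be approximated in plain Go as follows. This is a hypothetical sketch of the documented semantics, not the MetricAggregator's own code: it works on `[]float64` only, while the real functions may accept other slice types, treat empty arrays differently, or compute the median of an even-length array in another way.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errEmpty is returned by helpers that are undefined for an empty array.
var errEmpty = errors.New("empty array")

// sum adds up all values; the sum of an empty array is 0.
func sum(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

// minOf returns the smallest value (named to avoid Go's builtin min).
func minOf(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errEmpty
	}
	m := values[0]
	for _, v := range values[1:] {
		if v < m {
			m = v
		}
	}
	return m, nil
}

// maxOf returns the largest value (named to avoid Go's builtin max).
func maxOf(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errEmpty
	}
	m := values[0]
	for _, v := range values[1:] {
		if v > m {
			m = v
		}
	}
	return m, nil
}

// avg returns the arithmetic mean; mean(array) is documented the same way.
func avg(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errEmpty
	}
	return sum(values) / float64(len(values)), nil
}

// median sorts a copy so the caller's slice is unchanged. Averaging the two
// middle elements for an even length is an assumption of this sketch.
func median(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errEmpty
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid], nil
	}
	return (sorted[mid-1] + sorted[mid]) / 2, nil
}

// in covers the two documented uses: an int id in a list of ids, and a
// substring in a string.
func in(needle, haystack interface{}) bool {
	switch n := needle.(type) {
	case int:
		if ids, ok := haystack.([]int); ok {
			for _, id := range ids {
				if id == n {
					return true
				}
			}
		}
	case string:
		if s, ok := haystack.(string); ok {
			return strings.Contains(s, n)
		}
	}
	return false
}

func main() {
	values := []float64{4, 1, 3, 2}
	m, _ := median(values)
	fmt.Println(sum(values), len(values), m)                       // 10 4 2.5
	fmt.Println(in(0, []int{0, 1, 2}), in("temp", "temp_cores_0")) // true true
}
```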
Limitations
- Since the metrics are written in JSON files, which do not allow `""` inside JSON strings without proper escaping, you have to use `''` for strings.
- Since `\` is interpreted by JSON as an escape character, it cannot be used in metrics. However, it is required for regular expressions. So instead of `\`, use `%`; the MetricAggregator replaces each `%` with `\` after reading the JSON file.
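The `%` workaround described above can be sketched in plain Go. The function name `matchSketch` is hypothetical, and whether the MetricAggregator uses Go's `regexp` package with an unanchored match like `regexp.MatchString` is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchSketch is a hypothetical stand-in for the documented match()
// function: it restores every '%' to '\' and then performs an unanchored
// regular-expression match with Go's regexp package.
func matchSketch(pattern, s string) (bool, error) {
	restored := strings.ReplaceAll(pattern, "%", `\`)
	return regexp.MatchString(restored, s)
}

func main() {
	// In the JSON configuration the pattern is written with '%' instead
	// of '\', e.g. match('temp_cores_%d+', metric.Name()).
	ok, err := matchSketch("temp_cores_%d+", "temp_cores_12")
	fmt.Println(ok, err) // true <nil>
}
```

In this sketch every `%` is substituted, so a literal `%` cannot be expressed in a pattern written this way.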