Overview
ClusterCockpit uses the InfluxData line-protocol to transfer metrics between its components. The line-protocol is a text-based representation of a metric consisting of a value, a timestamp, and descriptive tags. All metrics have the following format (when written to stdout):
<measurement>,<tag set> <field set> <timestamp>
where <tag set> and <field set> are comma-separated lists of key=value entries. As a mental model, think of the tags as database indices for faster lookup and of the <field set> as the metric values.
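As a minimal sketch of the format above, the following Python snippet assembles one line from its parts. The function name format_line, the host name node01, and the timestamp value are hypothetical and chosen only for illustration. The sketch assumes numeric field values and keys and values without spaces, commas, or equals signs, which would otherwise need escaping.

```python
def format_line(measurement, tags, fields, timestamp):
    """Build '<measurement>,<tag set> <field set> <timestamp>'.

    Illustrative only: no escaping is done and field values are
    assumed to be numeric.
    """
    # Tags are sorted by key so the output is deterministic.
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_set = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_set} {field_set} {timestamp}"


line = format_line(
    "cpu_load",
    {"hostname": "node01", "type": "node"},  # hypothetical host
    {"value": 1.25},
    1700000000,  # hypothetical timestamp
)
print(line)  # cpu_load,hostname=node01,type=node value=1.25 1700000000
```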
Line-protocol in the ClusterCockpit ecosystem
In ClusterCockpit we limit the flexibility of the InfluxData line-protocol slightly. The idea is to keep the format easy to evaluate for the different components.
Each metric is identifiable by the measurement (= metric name), the hostname, the type and, if required, a type-id.
Mandatory tags per measurement:
- hostname
- type in [node, socket, die, memoryDomain, llb, core, hwthread, (accelerator)]
- type-id for further specifying the type like CPU socket or HW Thread identifier
Mandatory fields per measurement:
The field key is always value. No other field keys are evaluated by the ClusterCockpit ecosystem.
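The rules above could be checked roughly as follows. This is a sketch, not the validation any ClusterCockpit component actually performs: the function name check_metric and the message strings are hypothetical, and the rule that only type=node may omit type-id is an assumption derived from the "if required" wording.

```python
# Type names copied from the list of mandatory tags above.
ALLOWED_TYPES = {
    "node", "socket", "die", "memoryDomain",
    "llb", "core", "hwthread", "accelerator",
}


def check_metric(tags, fields):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if "hostname" not in tags:
        problems.append("missing tag: hostname")
    if tags.get("type") not in ALLOWED_TYPES:
        problems.append("missing or unknown tag: type")
    # Assumption: node-level metrics need no type-id, all other types do.
    elif tags["type"] != "node" and "type-id" not in tags:
        problems.append("missing tag: type-id")
    if "value" not in fields:
        problems.append("missing field: value")
    return problems


print(check_metric({"hostname": "node01", "type": "socket"}, {"value": 42.0}))
```

With these assumptions, the socket-level metric in the last line is reported as lacking a type-id, while the same metric with type=node would pass.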
Optional tags depending on the measurement:
Some measurements need additional tags such as filesystem, device or version. You are free to add such tags, but the ClusterCockpit components higher up in the stack will recognize stype (= sub type) and stype-id in the future. So filesystem=/homes is better specified as stype=filesystem,stype-id=/homes.
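The suggested rewrite of a free-form tag into stype and stype-id can be sketched like this. The helper name to_subtype and the host name node01 are hypothetical, and the snippet only illustrates the tag mapping, not how any ClusterCockpit component processes it.

```python
def to_subtype(tags, key):
    """Return a copy of tags where tags[key] is expressed as stype/stype-id."""
    if key not in tags:
        return dict(tags)  # nothing to rewrite
    out = {k: v for k, v in tags.items() if k != key}
    out["stype"] = key
    out["stype-id"] = tags[key]
    return out


tags = {"hostname": "node01", "type": "node", "filesystem": "/homes"}
converted = to_subtype(tags, "filesystem")
print(converted)
# {'hostname': 'node01', 'type': 'node', 'stype': 'filesystem', 'stype-id': '/homes'}
```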
Supported measurements
While the measurements (metric names) can be chosen freely, there is a basic set of measurements that should be present if you work within the ClusterCockpit ecosystem:
- flops_sp: Single-precision floating point rate in Flops/s
- flops_dp: Double-precision floating point rate in Flops/s
- flops_any: Combined floating point rate in Flops/s (often (flops_dp * 2) + flops_sp)
- cpu_load: The 1m load of the system (see /proc/loadavg)
- mem_used: The amount of memory used by applications (see /proc/meminfo)
- ipc: instructions-per-cycle metric
- mem_bw: Main memory bandwidth (read and write) in MByte/s
- cpu_power: Power consumption of the whole CPU package
- mem_power: Power consumption of the memory subsystem
- clock: CPU clock in MHz
- ...
For the full list, see the job-data schema.
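The relation given for flops_any in the list above can be written out as a short calculation. The function name and the input rates are made up for illustration, and since the text says "often", other definitions of the combined rate are possible.

```python
def flops_any(flops_sp, flops_dp):
    """Combined rate using the common convention (flops_dp * 2) + flops_sp."""
    return flops_dp * 2 + flops_sp


# Hypothetical rates in Flops/s.
combined = flops_any(flops_sp=1000.0, flops_dp=500.0)
print(combined)  # 2000.0
```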