Overview
ClusterCockpit uses the InfluxData line-protocol for transferring metrics and events between its components. The line-protocol is a text-based representation of a metric/event with a value, a timestamp and descriptive tags. All metrics/events have the following format (if written to stdout):
<measurement>,<tag set> <field set> <timestamp>
where <tag set> and <field set> are comma-separated lists of key=value entries. As a mental model, think of the tags as database indices used for faster lookup and of the <field set> as the metric/event values. The <measurement> is used here as an "identifier" because it does not represent an actual measurement in all cases.
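To illustrate how the parts combine, the following Python sketch serializes one message. The function format_line and its helpers are hypothetical (not part of any ClusterCockpit or InfluxData library). The escaping follows the InfluxData line-protocol rules as we understand them: commas and spaces in the measurement; commas, equals signs and spaces in tag keys, tag values and field keys; backslashes and double quotes inside string field values. Tags are sorted by key only to make the output deterministic, and the timestamp is passed through unchanged, so its precision is whatever sender and receiver agree on.

```python
def _escape_measurement(name: str) -> str:
    """Escape commas and spaces in the measurement name."""
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field_value(value) -> str:
    """Render a field value using the line-protocol type syntax."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def format_line(measurement: str, tags: dict, fields: dict, timestamp: int) -> str:
    """Build '<measurement>,<tag set> <field set> <timestamp>'."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape_measurement(measurement)
    for key in sorted(tags):  # sorted only for deterministic output
        head += f",{_escape_key(key)}={_escape_key(str(tags[key]))}"
    field_set = ",".join(
        f"{_escape_key(k)}={_format_field_value(v)}" for k, v in fields.items()
    )
    return f"{head} {field_set} {timestamp}"


if __name__ == "__main__":
    print(format_line("cpu_load",
                      {"hostname": "node01", "type": "node", "type-id": "0"},
                      {"value": 0.5},
                      1700000000))
    # cpu_load,hostname=node01,type=node,type-id=0 value=0.5 1700000000
```

Keeping the tag set and field set as separate dictionaries mirrors the split above: tags describe where a value comes from, fields carry the value itself.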
Line-protocol in the ClusterCockpit ecosystem
In ClusterCockpit we restrict the flexibility of the InfluxData line-protocol slightly. The goal is to keep the format such that messages can be evaluated consistently by the different components.
Each metric is identifiable by the measurement (= metric name), the hostname, the type and, if required, a type-id.
Mandatory tags per message:
- hostname
- type, one of:
  - node
  - socket
  - die
  - memoryDomain
  - llc
  - core
  - hwthread
  - accelerator
- type-id for further specifying the type, e.g. the CPU socket or HW thread identifier
Although no type-id is required if type=node, it is recommended to send type=node,type-id=0.
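A minimal sketch of a sender-side check for these rules might look as follows. The function names are hypothetical, and treating a missing type-id as an error for every type other than node is our reading of "if required" above, not a documented rule.

```python
# Allowed values of the mandatory 'type' tag, as listed above.
VALID_TYPES = {"node", "socket", "die", "memoryDomain", "llc",
               "core", "hwthread", "accelerator"}


def normalize_node_tags(tags: dict) -> dict:
    """Add the recommended type-id=0 to node-level messages that lack it."""
    if tags.get("type") == "node" and "type-id" not in tags:
        return {**tags, "type-id": "0"}
    return dict(tags)


def check_mandatory_tags(tags: dict) -> list:
    """Return a list of problems; an empty list means the tags look valid."""
    problems = []
    if not tags.get("hostname"):
        problems.append("missing tag 'hostname'")
    kind = tags.get("type")
    if kind is None:
        problems.append("missing tag 'type'")
    elif kind not in VALID_TYPES:
        problems.append(f"unknown type {kind!r}")
    elif kind != "node" and "type-id" not in tags:
        # Assumption: every type below node needs an id to be unambiguous.
        problems.append(f"type={kind} requires a 'type-id' tag")
    return problems


if __name__ == "__main__":
    tags = normalize_node_tags({"hostname": "node01", "type": "node"})
    print(tags, check_mandatory_tags(tags))
```

Returning a list of problems rather than raising on the first one lets a collector report all misconfigured tags of a message at once.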
Optional tags depending on the message:
In some cases, additional tags such as filesystem, device or version are needed. While you are free to add them, the ClusterCockpit components higher up in the stack recognize stype (= "sub type") and stype-id. So filesystem=/homes is better specified as stype=filesystem,stype-id=/homes.
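The hypothetical helper below rewrites a free-form optional tag into the stype/stype-id form described above. Refusing to overwrite an existing stype tag is an assumption of this sketch, based on the idea that one message describes at most one sub type.

```python
def to_stype(tags: dict, key: str) -> dict:
    """Move a free-form optional tag such as filesystem=/homes into stype/stype-id."""
    if key not in tags:
        return dict(tags)  # nothing to convert
    if "stype" in tags:
        # Assumption: a message describes at most one sub type.
        raise ValueError(f"message already has stype={tags['stype']!r}")
    result = {k: v for k, v in tags.items() if k != key}
    result["stype"] = key
    result["stype-id"] = tags[key]
    return result


if __name__ == "__main__":
    print(to_stype({"hostname": "node01", "type": "node", "type-id": "0",
                    "filesystem": "/homes"}, "filesystem"))
```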
Mandatory fields per measurement:
Only the single field of the respective message type (value, event, control or log) is mandatory. Other field keys are not evaluated by the ClusterCockpit ecosystem but can be used for other purposes.
Message types
The ClusterCockpit ecosystem uses several message types, all specified using the InfluxData line-protocol.
Metrics
Identification: value=X field with X being a number
While the measurements (metric names) can be chosen freely, there is a basic set of measurements that should be present when working within the ClusterCockpit ecosystem:
- flops_sp: Single-precision floating point rate in Flops/s
- flops_dp: Double-precision floating point rate in Flops/s
- flops_any: Combined floating point rate in Flops/s (often (flops_dp * 2) + flops_sp)
- cpu_load: The 1m load of the system (see /proc/loadavg)
- mem_used: The amount of memory used by applications (see /proc/meminfo)
- ipc: Instructions-per-cycle metric
- mem_bw: Main memory bandwidth (read and write) in MByte/s
- cpu_power: Power consumption of the whole CPU package
- mem_power: Power consumption of the memory subsystem
- clock: CPU clock in MHz
- ...
For the complete list, see the job-data schema.
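As a small worked example, the sketch below derives flops_any from the two precision-specific rates using the common weighting quoted above and emits it as a socket-level metric message. The function names are hypothetical, the input rates are made-up numbers, and the line is assembled without escaping, so it assumes hostnames and tag values contain no commas, spaces or equals signs.

```python
def flops_any(flops_dp: float, flops_sp: float) -> float:
    """Combined rate in Flops/s using the common weighting (flops_dp * 2) + flops_sp."""
    return flops_dp * 2 + flops_sp


def metric_line(name: str, hostname: str, type_: str, type_id, value, timestamp: int) -> str:
    """Assemble a metric message; assumes no characters that need escaping."""
    return (f"{name},hostname={hostname},type={type_},type-id={type_id} "
            f"value={float(value)!r} {timestamp}")


if __name__ == "__main__":
    rate = flops_any(flops_dp=1.5e9, flops_sp=2.0e9)  # made-up socket rates
    print(metric_line("flops_any", "node01", "socket", 0, rate, 1700000000))
    # flops_any,hostname=node01,type=socket,type-id=0 value=5000000000.0 1700000000
```

Converting the value with float() keeps the value field numeric, as required for metric messages, and avoids emitting an integer-typed field when a caller passes an int.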
Events
Identification: event="X" field with "X" being a string
Controls
Identification:
- control="X" field with "X" being a string
- method tag is either GET or PUT
Logs
Identification: log="X" field with "X" being a string
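Because each message type is identified by its single mandatory field, a receiver can tell the types apart by inspecting the field set. The classifier below is a hypothetical sketch operating on already-parsed tag and field dictionaries. Rejecting messages that carry more than one identifying field is an assumption, while ignoring all other field keys follows the rule that they are not evaluated.

```python
FIELD_KEYS = ("value", "event", "control", "log")


def message_type(tags: dict, fields: dict) -> str:
    """Return 'metric', 'event', 'control' or 'log' for a parsed message."""
    present = [k for k in FIELD_KEYS if k in fields]  # other keys are ignored
    if len(present) != 1:
        # Assumption: exactly one identifying field per message.
        raise ValueError(f"expected exactly one of {FIELD_KEYS}, got {present}")
    key = present[0]
    payload = fields[key]
    if key == "value":
        # Metrics carry a number; bool is excluded because it subclasses int.
        if isinstance(payload, bool) or not isinstance(payload, (int, float)):
            raise ValueError("metric 'value' must be a number")
        return "metric"
    if not isinstance(payload, str):
        raise ValueError(f"{key!r} must be a string")
    if key == "control" and tags.get("method") not in ("GET", "PUT"):
        raise ValueError("control messages need method=GET or method=PUT")
    return key


if __name__ == "__main__":
    print(message_type({"hostname": "node01"}, {"value": 1.5}))
```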