# InfluxData line-protocol flavor

## Overview

ClusterCockpit uses the [InfluxData line-protocol](https://docs.influxdata.com/influxdb/v2.1/reference/syntax/line-protocol/) for transferring messages between its components. The line-protocol is a text-based representation of a metric/event with a value, a timestamp and descriptive tags. All metrics/events have the following format (if written to `stdout`):
```
<measurement>,<tag set> <field set> <timestamp>
```
where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
entries. As a mental model, think of the tags as `indices` in the database for
faster lookup and of the `<field set>` as the values.

**Remark**: In the first iteration, we only sent metrics (number values), but we
had to extend the specification to messages with different meanings. The text
below was changed accordingly. The update is backward-compatible, so nothing
changed for metrics (number values).
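
The general `<measurement>,<tag set> <field set> <timestamp>` layout can be illustrated with a minimal sketch. The helper `format_line`, the host name `node001` and the timestamp value are assumptions made for this example only and are not part of the specification. The sketch does not escape commas, spaces or equal signs in tag keys and values.

```python
def format_line(measurement, tags, fields, timestamp):
    """Render one message as `<measurement>,<tag set> <field set> <timestamp>`."""

    def render_field(value):
        # String field values are double-quoted; numbers are written as-is.
        if isinstance(value, str):
            return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        return repr(value)

    # Both sets are comma-separated lists of key=value entries.
    tag_set = ",".join(f"{key}={value}" for key, value in tags.items())
    field_set = ",".join(f"{key}={render_field(value)}" for key, value in fields.items())
    return f"{measurement},{tag_set} {field_set} {timestamp}"


line = format_line(
    "cpu_load",
    {"hostname": "node001", "type": "node", "type-id": "0"},  # hypothetical host
    {"value": 1.5},
    1734683386,  # arbitrary example timestamp
)
print(line)
# -> cpu_load,hostname=node001,type=node,type-id=0 value=1.5 1734683386
```
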

## Line-protocol in the ClusterCockpit ecosystem

In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
slightly, so that the format can be evaluated by different components.

Each message is identifiable by the `measurement` (= metric name), the
`hostname`, the `type` and, if required, a `type-id`.
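
One way to picture this identification is as a lookup key built from those four parts. The function `message_key` and the sample tag values are hypothetical and only serve as an illustration; they are not taken from any ClusterCockpit component.

```python
def message_key(measurement, tags):
    """Return the identifying tuple (measurement, hostname, type, type-id)."""
    # `type-id` is only sent "if required", so it may be absent (None here).
    return (measurement, tags["hostname"], tags["type"], tags.get("type-id"))


# Two hardware threads on the same host report the same metric name
# but are still distinct message sources.
key_a = message_key("flops_any", {"hostname": "node001", "type": "hwthread", "type-id": "0"})
key_b = message_key("flops_any", {"hostname": "node001", "type": "hwthread", "type-id": "1"})
print(key_a != key_b)  # -> True
```
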
### Mandatory tags per message
* `hostname`
* `type`
  * `node`
  * `socket`
  * `die`
  * `memoryDomain`
  * `llc`
  * `core`
  * `hwthread`
  * `accelerator`
* `type-id` for further specifying the type like CPU socket or HW Thread identifier

Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`.
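
The tag rules above can be checked mechanically. The following is a minimal sketch; the names `VALID_TYPES` and `check_tags` are assumptions for this example, and the error texts are invented.

```python
# The type values listed above.
VALID_TYPES = {
    "node", "socket", "die", "memoryDomain",
    "llc", "core", "hwthread", "accelerator",
}


def check_tags(tags):
    """Return a list of problems found in the tag set (empty if none)."""
    problems = []
    if "hostname" not in tags:
        problems.append("missing mandatory tag 'hostname'")
    msg_type = tags.get("type")
    if msg_type is None:
        problems.append("missing mandatory tag 'type'")
    elif msg_type not in VALID_TYPES:
        problems.append(f"unknown type '{msg_type}'")
    elif msg_type != "node" and "type-id" not in tags:
        # Only `type=node` may omit the `type-id`.
        problems.append(f"type '{msg_type}' needs a 'type-id'")
    return problems


print(check_tags({"hostname": "node001", "type": "node"}))    # -> []
print(check_tags({"hostname": "node001", "type": "socket"}))  # -> ["type 'socket' needs a 'type-id'"]
```
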
#### Optional tags depending on the message
In some cases, additional tags such as `filesystem`, `device` or `version` are
needed. While you are free to add them, the ClusterCockpit components in the
stack above will recognize `stype` (= "sub type") and `stype-id`. So
`filesystem=/homes` is better specified as
`stype=filesystem,stype-id=/homes`.
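
The rewrite from a free-form tag to `stype`/`stype-id` can be sketched as follows. The helper `to_subtype` is hypothetical and only illustrates the recommended tag layout.

```python
def to_subtype(tags, key):
    """Return a copy of `tags` with `key=<v>` rewritten to `stype=key,stype-id=<v>`."""
    if key not in tags:
        return dict(tags)
    # Keep all other tags and replace the free-form one.
    rewritten = {name: value for name, value in tags.items() if name != key}
    rewritten["stype"] = key
    rewritten["stype-id"] = tags[key]
    return rewritten


tags = {"hostname": "node001", "type": "node", "filesystem": "/homes"}
print(to_subtype(tags, "filesystem"))
# -> {'hostname': 'node001', 'type': 'node', 'stype': 'filesystem', 'stype-id': '/homes'}
```
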
### Mandatory fields per measurement
* Metric: The field key is always `value`
* Event: The field key is always `event`
* Log message: The field key is always `log`
* Control message: The field key is always `control`

No other field keys are evaluated by the ClusterCockpit ecosystem.
### Message types
There exist different message types in the ClusterCockpit ecosystem, all
specified using the InfluxData line-protocol.
#### Metrics
**Identification:** `value=X` field with `X` being a number

While the measurements (metric names) can be chosen freely, there is a basic set
of measurements that should be present if you work within the ClusterCockpit
ecosystem:

* `flops_sp`: Single-precision floating point rate in `Flops/s`
* `flops_dp`: Double-precision floating point rate in `Flops/s`
* `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
* `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
* `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
* `ipc`: Instructions-per-cycle metric
* `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
* `cpu_power`: Power consumption of the whole CPU package
* `mem_power`: Power consumption of the memory subsystem
* `clock`: CPU clock in `MHz`
* ...

For the whole list, see the [job-data schema](../../datastructures/job-data.schema.json).
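
As an illustration of a metric message, the sketch below turns the content of `/proc/loadavg` into a `cpu_load` line. The function `cpu_load_line`, the host name and the timestamp are assumptions for this example. The sample text only mimics the layout of `/proc/loadavg`, whose first field is the 1m load.

```python
def cpu_load_line(loadavg_text, hostname, timestamp):
    """Build a `cpu_load` metric line from the text of /proc/loadavg."""
    # The first whitespace-separated field is the 1-minute load average.
    load_1m = float(loadavg_text.split()[0])
    # Node-level metric, sent with the recommended `type=node,type-id=0`.
    return f"cpu_load,hostname={hostname},type=node,type-id=0 value={load_1m} {timestamp}"


sample = "0.42 0.36 0.30 1/512 12345"  # made-up content in /proc/loadavg layout
print(cpu_load_line(sample, "node001", 1734683386))
# -> cpu_load,hostname=node001,type=node,type-id=0 value=0.42 1734683386
```
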
#### Events
**Identification:** `event="X"` field with `"X"` being a string

#### Controls

**Identification:**

* `control="X"` field with `"X"` being a string
* `method` tag is either `GET` or `PUT`

#### Logs

**Identification:** `log="X"` field with `"X"` being a string
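
The identification rules of the four message types can be combined into a small classifier. This is a minimal sketch that works on already parsed tags and fields; the function `classify` and the order of the checks are assumptions of this example, not prescribed by the specification.

```python
def classify(tags, fields):
    """Return 'metric', 'event', 'control', 'log', or None if nothing matches."""
    value = fields.get("value")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "metric"
    if isinstance(fields.get("event"), str):
        return "event"
    # Controls additionally need a `method` tag that is GET or PUT.
    if isinstance(fields.get("control"), str) and tags.get("method") in ("GET", "PUT"):
        return "control"
    if isinstance(fields.get("log"), str):
        return "log"
    return None


base = {"hostname": "node001", "type": "node", "type-id": "0"}
print(classify(base, {"value": 1.5}))                        # -> metric
print(classify(base, {"event": "job started"}))              # -> event
print(classify({**base, "method": "PUT"}, {"control": "5"}))  # -> control
print(classify(base, {"log": "collector restarted"}))        # -> log
```
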