Merge pull request #2 from ClusterCockpit/draft

Update line protocol specification
Jan Eitzinger 2025-03-10 10:11:22 +01:00 committed by GitHub
commit d0ec2dadb1


@ -2,46 +2,86 @@
## Overview
ClusterCockpit uses the [InfluxData line-protocol](https://docs.influxdata.com/influxdb/v2.1/reference/syntax/line-protocol/) for transferring messages between its components. The line-protocol is a text-based representation of a metric/event with a value, time and describing tags. All metrics/events have the following format (if written to `stdout`):
ClusterCockpit uses the
[InfluxData line-protocol](https://docs.influxdata.com/influxdb/v2.1/reference/syntax/line-protocol/)
for transferring messages between its components. The line-protocol is a
text-based representation of a metric/event with a value, time and describing
tags. All metrics/events have the following format (if written to `stdout`):
```
```txt
<measurement>,<tag set> <field set> <timestamp>
```
where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
Where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
entries. As a mental model, think of tags as `indices` in the database for
faster lookup and of the `<field set>` as the values.
The timestamp is UNIX epoch time in seconds!
We are using the tag set to add metadata information and one field for the
payload.
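The format above can be sketched as a small helper. This is a minimal illustration, not part of the ClusterCockpit code base: the function name `format_metric` is hypothetical, and the sketch assumes that tag keys and values contain no spaces, commas or equals signs (it does no escaping).

```python
def format_metric(measurement, tags, value, timestamp):
    """Render one metric message as a line-protocol string (illustrative helper)."""
    # The tag set is a comma-separated list of key=value entries.
    tag_set = ",".join(f"{key}={val}" for key, val in tags.items())
    # A metric carries exactly one field named `value`;
    # the timestamp is UNIX epoch time in seconds.
    return f"{measurement},{tag_set} value={value} {int(timestamp)}"


line = format_metric(
    "flops_any",
    {"hostname": "e1208", "type": "core", "type-id": 23},
    1203.3,
    1740027951,
)
print(line)
```

All metadata goes into the tag set, and the single field holds the payload.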
**Remark**: In the first iteration, we only sent metrics (number values) but we
had to extend the specification to messages with different meanings. The below
text was changes accordingly. The update is downward-compatible, so for metrics
(number values), nothing changed.
extended the specification to messages with different purposes. The below
text was changed accordingly. The update is backward-compatible: for metrics
(number values), nothing has changed.
## Line-protocol in the ClusterCockpit ecosystem
## Message categories
In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
slightly. The idea is to keep the format evaluatable by different components.
There are four line-protocol message flavors:
Each message is identifiable by the `measurement` (= metric name), the
- **Metric**: The `field` key is `value`, the `measurement` is the metric name
- **Event**: The `field` key is `event`. Events carry actionable information. The
`measurement` is set to an event class (job, slurm, status, phases, ?? ). An additional tag
`function` indicates the purpose, similar to a REST endpoint (for the job
class this can be `start_job` and `stop_job`).
- **Log**: The `field` key is `log`. Log messages are purely informational.
The `measurement` is set to the component identifier [ccb, ccms, ccmc, ccem,
ccnc]. Additional tag `loglevel` to set the log level (debug, info, warn,
error).
- **Control**: The `field` key is `control`, the `measurement` is set to a
control class (rapl, freq, prefetcher, topology, config). An additional tag
`method` holds one of [GET, PUT].
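Since the four flavors differ by their field key, a receiver could classify an incoming line roughly as follows. This is a sketch under assumptions: the names `FIELD_KEY_TO_CATEGORY` and `message_category` are made up for illustration, and the parsing assumes that the measurement and tags contain no escaped spaces.

```python
# Mapping from the field key to the message category, as listed above.
FIELD_KEY_TO_CATEGORY = {
    "value": "metric",
    "event": "event",
    "log": "log",
    "control": "control",
}


def message_category(line):
    """Return the category of a line-protocol message, or None if unknown."""
    # The first space separates `<measurement>,<tag set>` from the field set.
    _head, _, rest = line.partition(" ")
    # The field key is everything before the first `=` of the field set.
    field_key, _, _ = rest.partition("=")
    return FIELD_KEY_TO_CATEGORY.get(field_key)


print(message_category("flops_any,hostname=e1208,type=core,type-id=23 value=1203.3 1740027951"))
```

Only the field key is inspected here; a stricter receiver would also check the category-specific tags (`function`, `loglevel`, `method`).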
## Messaging subjects
ClusterCockpit uses the NATS messaging network, with the option to support other
messaging frameworks in the future. To distinguish between different message
types and easily filter the types an application is interested in, the following
subject hierarchy tree is used:
```txt
<cluster name>. |
--- metrics
|
--- events.[job, slurm]
|
--- log.[ccb, ccms, ccmc, ccem, ccnc]
|
--- control.[get, put]
```
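As an illustration, the tree above can be turned into subject strings. The sketch assumes that the levels are joined with `.`, the token separator NATS uses for subjects; the names `SUBJECT_LEAVES` and `build_subject` are hypothetical.

```python
# Allowed last-level tokens per branch, copied from the tree above.
SUBJECT_LEAVES = {
    "metrics": (),
    "events": ("job", "slurm"),
    "log": ("ccb", "ccms", "ccmc", "ccem", "ccnc"),
    "control": ("get", "put"),
}


def build_subject(cluster, category, leaf=None):
    """Build a subject such as `alex.events.job` (illustrative helper)."""
    leaves = SUBJECT_LEAVES[category]  # raises KeyError for unknown branches
    if not leaves:
        # `metrics` has no further level below it.
        if leaf is not None:
            raise ValueError(f"{category} takes no sub-subject")
        return f"{cluster}.{category}"
    if leaf not in leaves:
        raise ValueError(f"unknown sub-subject {leaf!r} for {category}")
    return f"{cluster}.{category}.{leaf}"


print(build_subject("alex", "events", "job"))
```

With dot-separated subjects, a subscriber can use the NATS wildcards, for example `alex.log.*` to receive the log messages of all components of the cluster `alex`.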
## Rules valid for all message categories
Each message is identifiable by the `measurement` and the tags
`hostname`, `type` and, if required, `type-id`.
### Mandatory tags per message
* `hostname`
* `type`
* `node`
* `socket`
* `die`
* `memoryDomain`
* `llc`
* `core`
* `hwthread`
* `accelerator`
* `type-id` for further specifying the type like CPU socket or HW Thread identifier
- `hostname`
- `type`
- `node`
- `socket`
- `die`
- `memoryDomain`
- `llc`
- `core`
- `hwthread`
- `accelerator`
- `type-id` for further specifying the type like CPU socket or HW Thread identifier
Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`.
#### Optional tags depending on the message
#### Optional tags depending on the message type
In some cases, optional tags are required like `filesystem`, `device` or
`version`. While you are free to do that, the ClusterCockpit components in the
@ -49,15 +89,6 @@ stack above will recognize `stype` (= "sub type") and `stype-id`. So
`filesystem=/homes` should be better specified as
`stype=filesystem,stype-id=/homes`.
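The recommended rewrite can be sketched as a small tag normalizer. The function name `to_subtype_tags` and the default key list are assumptions made for illustration; the sketch converts only the first matching key.

```python
def to_subtype_tags(tags, subtype_keys=("filesystem", "device")):
    """Rewrite a free-form tag like filesystem=/homes into stype/stype-id."""
    result = dict(tags)  # work on a copy, leave the caller's dict untouched
    for key in subtype_keys:
        if key in result:
            # The former tag key becomes the sub type, its value the sub type id.
            result["stype-id"] = result.pop(key)
            result["stype"] = key
            break
    return result


print(to_subtype_tags({"hostname": "e1208", "type": "node", "filesystem": "/homes"}))
```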
### Mandatory fields per measurement
* Metric: The field key is always `value`
* Event: The field key is always `event`
* Log message: The field key is always `log`
* Control message: The field key is always `log`
No other field keys are evaluated by the ClusterCockpit ecosystem.
### Message types
There exist different message types in the ClusterCockpit ecosystem, all
@ -71,31 +102,59 @@ While the measurements (metric names) can be chosen freely, there is a basic set
of measurements which should be present as long as you navigate in the
ClusterCockpit ecosystem
* `flops_sp`: Single-precision floating point rate in `Flops/s`
* `flops_dp`: Double-precision floating point rate in `Flops/s`
* `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
* `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
* `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
* `ipc`: instructions-per-cycle metric
* `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
* `cpu_power`: Power consumption of the whole CPU package
* `mem_power`: Power consumption of the memory subsystem
* `clock`: CPU clock in `MHz`
* ...
- `flops_sp`: Single-precision floating point rate in `Flops/s`
- `flops_dp`: Double-precision floating point rate in `Flops/s`
- `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
- `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
- `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
- `ipc`: instructions-per-cycle metric
- `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
- `cpu_power`: Power consumption of the whole CPU package
- `mem_power`: Power consumption of the memory subsystem
- `clock`: CPU clock in `MHz`
- ...
FIXME: What about the unit??
For the whole list, see [job-data schema](../../datastructures/job-data.schema.json)
Example:
```txt
flops_any,hostname=e1208,type=core,type-id=23 value=1203.3 1740027951
```
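A receiver could decode such a metric line as sketched below. This is a simplified parser for illustration only: `parse_metric` is a hypothetical name, and the code assumes no escaped characters and exactly one `value` field.

```python
def parse_metric(line):
    """Split a metric line into measurement, tags, value and timestamp."""
    # A metric line has exactly three space-separated parts.
    head, field_set, timestamp = line.split(" ")
    # The first comma-separated token is the measurement, the rest are tags.
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    key, _, raw_value = field_set.partition("=")
    if key != "value":
        raise ValueError(f"not a metric message, field key is {key!r}")
    return {
        "measurement": measurement,
        "tags": tags,
        "value": float(raw_value),
        "timestamp": int(timestamp),  # UNIX epoch time in seconds
    }


print(parse_metric("flops_any,hostname=e1208,type=core,type-id=23 value=1203.3 1740027951"))
```

Tag values stay strings after parsing, so `type-id` is `"23"` rather than the integer 23.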
#### Events
**Identification:** `event="X"` field with `"X"` being a string
**Identification:** Field `event="X"` with `"X"` being the payload string.
The name (measurement) of the event message indicates the event
class. The function tag specifies the purpose (similar to REST endpoints), e.g.
`start_job`, and `stop_job` for events of class job.
Example:
```txt
job,hostname=mngmt02,type=node,type-id=0,function=stop_job event={"jobId": 69, "cluster": "ccfront", "stopTime": 1738842306, "jobState": "completed"} 1740027951
```
#### Controls
**Identification:**
**Identification:** Field `control="X"` with `"X"` being the control request. The `measurement` is
set to a control class; the tag `method` is either `GET` or `PUT`.
* `control="X"` field with `"X"` being a string
* `method` tag is either `GET` or `PUT`
Example:
```txt
rapl,hostname=e1208,type=socket,type-id=2,method=GET control=intel.pkg.energy_status 1740027951
```
#### Logs
**Identification:** `log="X"` field with `"X"` being a string
**Identification:** `log="X"` field with `"X"` being the log message. The `measurement` is
set to the source component ID; the tag `loglevel` is one of debug, info, warn,
error.
Example:
```txt
ccb,hostname=server01,type=node,type-id=1,loglevel=info log="component: archiver cluster: alex jobId: 232383 - archiving finished" 1740027951
```
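A log message like the one above could be produced as follows. The helper `format_log` is hypothetical; it fixes `type=node,type-id=0` as an assumption and escapes backslashes and double quotes inside the string field, as the InfluxData line-protocol requires for quoted string values.

```python
LOG_LEVELS = ("debug", "info", "warn", "error")


def format_log(component, hostname, loglevel, message, timestamp):
    """Render a log message as a line-protocol string (illustrative helper)."""
    if loglevel not in LOG_LEVELS:
        raise ValueError(f"unknown log level {loglevel!r}")
    # Inside a quoted string field, backslash and double quote must be escaped.
    escaped = message.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"{component},hostname={hostname},type=node,type-id=0,"
        f'loglevel={loglevel} log="{escaped}" {int(timestamp)}'
    )


print(format_log("ccb", "server01", "info", "archiving finished", 1740027951))
```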