Continue on lineprotocol ccMessage specs

Jan Eitzinger 2025-02-20 05:30:46 +01:00
parent 6205680fb8
commit 917e17d9bf
Signed by: moebiusband
GPG Key ID: 2574BA29B90D6DD5


@@ -16,6 +16,9 @@ Where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
entries. As a mental model, think of the tags as `indices` in the database for
faster lookup and of the `<field set>` as the values.
We use the tag set for metadata and the field set for the
payload.
**Remark**: In the first iteration, we only sent metrics (numeric values), but we
have since extended the specification to messages with different purposes. The
text below was changed accordingly. The update is backward-compatible, for metrics
@@ -25,10 +28,11 @@ text was changed accordingly. The update is backward-compatible, for metrics
The following line-protocol message flavors exist:
* Metric: The field key is `value`
* Event: The field key is `event`
* Log message: The field key is `log`
* Control message: The field key is `control`
- Metric: The field key is `value`, measurement = metric name
- Event: The field key is `event`. Events are actionable information, measurement = event subtype (job, phases, ?? ), additional tag `function=<string>`
- Log message: The field key is `log`. Log messages are purely informational,
  measurement = [ccb, ccms, ccmc, ccem, ccnc], additional tag `loglevel`
- Control message: The field key is `control`, measurement = knob name (rapl, freq, prefetcher, topology, config), additional tag `method=[get,put]`
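
To make the four flavors concrete, here is a minimal sketch that renders one line per flavor. The helper `format_line`, the host name `node01`, the tag values `start_job` and `info`, the payload strings, and the timestamp are all hypothetical illustrations and not part of the specification. The sketch assumes tag keys and values contain no spaces, commas, or equal signs, so no tag escaping is done.

```python
def format_line(measurement, tags, field_key, field_value, timestamp):
    """Render one message as `<measurement>,<tag set> <field set> <timestamp>`."""
    # Simplification: tags are emitted sorted by key and without escaping.
    tag_set = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    if isinstance(field_value, str):
        # String field values are double-quoted in the line protocol.
        escaped = field_value.replace("\\", "\\\\").replace('"', '\\"')
        field_set = f'{field_key}="{escaped}"'
    else:
        # Numeric payloads (metrics) are written as floats.
        field_set = f"{field_key}={float(field_value)}"
    return f"{measurement},{tag_set} {field_set} {timestamp}"


# Hypothetical tags and timestamp shared by all examples.
BASE_TAGS = {"hostname": "node01", "type": "node", "type-id": "0"}
TS = 1700000000

metric = format_line("flops_any", BASE_TAGS, "value", 42.0, TS)
event = format_line("job", {**BASE_TAGS, "function": "start_job"},
                    "event", "job 1234 started", TS)
log = format_line("ccmc", {**BASE_TAGS, "loglevel": "info"},
                  "log", "collector started", TS)
control = format_line("freq", {**BASE_TAGS, "method": "put"},
                      "control", "2400000", TS)

for line in (metric, event, log, control):
    print(line)
```

The field key alone (`value`, `event`, `log`, `control`) tells a receiver which flavor it is looking at, which is why the helper takes it as a separate argument.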
## Messaging
@@ -45,10 +49,10 @@ subject hierarchy tree is used:
|
--- log.[ccb, ccms, ccmc, ccem, ccnc]
|
--- control
--- control.[get,put]
```
## Metric messages
## Points generic for all message categories
In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
slightly. The idea is to keep the format usable by different components.
@@ -58,17 +62,17 @@ Each message is identifiable by the `measurement` (= metric name), the
### Mandatory tags per message
* `hostname`
* `type`
  * `node`
  * `socket`
  * `die`
  * `memoryDomain`
  * `llc`
  * `core`
  * `hwthread`
  * `accelerator`
* `type-id` for further specifying the type like CPU socket or HW Thread identifier
- `hostname`
- `type`
  - `node`
  - `socket`
  - `die`
  - `memoryDomain`
  - `llc`
  - `core`
  - `hwthread`
  - `accelerator`
- `type-id` for further specifying the type like CPU socket or HW Thread identifier
Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`.
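
The tag rules above can be checked before a message is sent. The following is a minimal sketch, assuming the tags are held in a plain dictionary; the function name `normalize_tags` and the choice to raise `ValueError` are illustrative and not prescribed by the specification. It fills in the recommended `type-id=0` when `type=node` and rejects any other type that lacks a `type-id`.

```python
# Values of the `type` tag as listed above.
ALLOWED_TYPES = {
    "node", "socket", "die", "memoryDomain",
    "llc", "core", "hwthread", "accelerator",
}


def normalize_tags(tags):
    """Return a copy of `tags` that satisfies the mandatory-tag rules."""
    result = dict(tags)
    if "hostname" not in result:
        raise ValueError("missing mandatory tag: hostname")
    if result.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"invalid or missing type: {result.get('type')!r}")
    if "type-id" not in result:
        if result["type"] == "node":
            # Not required for nodes, but recommended: type=node,type-id=0
            result["type-id"] = "0"
        else:
            raise ValueError(f"type={result['type']} requires a type-id tag")
    return result


print(normalize_tags({"hostname": "node01", "type": "node"}))
print(normalize_tags({"hostname": "node01", "type": "socket", "type-id": "1"}))
```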
@@ -93,17 +97,17 @@ While the measurements (metric names) can be chosen freely, there is a basic set
of measurements that should be present as long as you work within the
ClusterCockpit ecosystem:
* `flops_sp`: Single-precision floating point rate in `Flops/s`
* `flops_dp`: Double-precision floating point rate in `Flops/s`
* `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
* `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
* `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
* `ipc`: instructions-per-cycle metric
* `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
* `cpu_power`: Power consumption of the whole CPU package
* `mem_power`: Power consumption of the memory subsystem
* `clock`: CPU clock in `MHz`
* ...
- `flops_sp`: Single-precision floating point rate in `Flops/s`
- `flops_dp`: Double-precision floating point rate in `Flops/s`
- `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
- `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
- `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
- `ipc`: instructions-per-cycle metric
- `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
- `cpu_power`: Power consumption of the whole CPU package
- `mem_power`: Power consumption of the memory subsystem
- `clock`: CPU clock in `MHz`
- ...
For the whole list, see [job-data schema](../../datastructures/job-data.schema.json)
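
Two of the metrics above are derived rather than read directly. As a small sketch, the helper below computes `flops_any` with the commonly used weighting quoted in the list, and extracts the 1-minute load from a string in the `/proc/loadavg` format (whose first field is the 1-minute load average). The function names and the sample numbers are hypothetical.

```python
def flops_any(flops_sp, flops_dp):
    """Combined floating point rate: double precision counted twice."""
    return flops_dp * 2 + flops_sp


def one_minute_load(loadavg_text):
    """Return the first field of a /proc/loadavg line as a float."""
    return float(loadavg_text.split()[0])


# Hypothetical sample values, both rates in Flops/s.
print(flops_any(flops_sp=1.0e9, flops_dp=2.5e9))
# A sample line in the /proc/loadavg format.
print(one_minute_load("0.20 0.18 0.12 1/80 11206"))
```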
@@ -121,8 +125,8 @@ TBD
**Identification:**
* `control="X"` field with `"X"` being a string
* `method` tag is either `GET` or `PUT`
- `control="X"` field with `"X"` being a string
- `method` tag is either `GET` or `PUT`
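
A receiver could apply these two identification rules as follows. This is only a sketch: it assumes the message has already been parsed into a tag dictionary and a field dictionary, and the name `is_control_message` is hypothetical. Because the method appears in both lower and upper case in this document, the comparison is case-insensitive here, which is an assumption rather than a stated rule.

```python
def is_control_message(tags, fields):
    """Check the two identification rules for control messages."""
    value = fields.get("control")
    method = tags.get("method", "")
    # Rule 1: the `control` field exists and holds a string.
    # Rule 2: the `method` tag is GET or PUT (compared case-insensitively).
    return isinstance(value, str) and method.upper() in {"GET", "PUT"}


print(is_control_message({"hostname": "node01", "method": "PUT"},
                         {"control": "2400000"}))
print(is_control_message({"hostname": "node01"}, {"value": 42.0}))
```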
#### Logs