Continue on lineprotocol ccMessage specs

Jan Eitzinger 2025-02-20 05:30:46 +01:00
parent 6205680fb8
commit 917e17d9bf
Signed by: moebiusband
GPG Key ID: 2574BA29B90D6DD5


````diff
@@ -16,6 +16,9 @@ Where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
 entries. In a mind-model, think about tags as `indices` in the database for
 faster lookup and the `<field set>` as values.
+We are using the tag set to add metadata information and the field for the
+payload.
 **Remark**: In the first iteration, we only sent metrics (number values) but we
 extended the specification to messages with different purposes. The below
 text was changed accordingly. The update is backward-compatible, for metrics
````
````diff
@@ -25,10 +28,11 @@ text was changed accordingly. The update is backward-compatible, for metrics
 There exist the following line line-protocol message flavors:
-* Metric: The field key is `value`
-* Event: The field key is `event`
-* Log message: The field key is `log`
-* Control message: The field key is `control`
+- Metric: The field key is `value`, measurement = metric name
+- Event: The field key is `event`, Events are actionable informations, measurement = event subtype (job, phases, ?? ), Additional tag function=<string>
+- Log message: The field key is `log`. Log messages are purely informational,
+  measurement = [ccb, ccms, ccmc, ccem, ccnc], Additional tag loglevel
+- Control message: The field key is `control`, measurement = knob name (rapl, freq, prefetcher, topology, config), Additional tags: method=[get,put]
 ## Messaging
````
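The four flavors are told apart solely by the reserved field key. A minimal Go sketch of that dispatch follows. The `Flavor` type and `flavorOf` function are hypothetical, and rejecting messages that carry zero or several reserved keys is an assumption of this sketch that the spec does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// Flavor is a hypothetical enum for the four message flavors.
type Flavor int

const (
	Metric Flavor = iota
	Event
	Log
	Control
)

func (f Flavor) String() string {
	return [...]string{"metric", "event", "log", "control"}[f]
}

// flavorKeys maps each reserved field key to its flavor.
var flavorKeys = map[string]Flavor{
	"value":   Metric,
	"event":   Event,
	"log":     Log,
	"control": Control,
}

// flavorOf returns the flavor of a message given its field set.
// Assumption: exactly one reserved field key must be present.
func flavorOf(fields map[string]any) (Flavor, error) {
	var flavor Flavor
	n := 0
	for k := range fields {
		if f, ok := flavorKeys[k]; ok {
			flavor = f
			n++
		}
	}
	switch n {
	case 0:
		return 0, errors.New("no reserved field key (value, event, log, control)")
	case 1:
		return flavor, nil
	default:
		return 0, errors.New("more than one reserved field key")
	}
}

func main() {
	for _, fields := range []map[string]any{
		{"value": 42.0},
		{"event": "job started"},
		{"foo": 1},
	} {
		f, err := flavorOf(fields)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(f)
	}
}
```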
````diff
@@ -45,10 +49,10 @@ subject hierarchy tree is used:
 |
 --- log.[ccb, ccms, ccmc, ccem, ccnc]
 |
---- control
+--- control.[get,put]
 ```
-## Metric messages
+## Points generic for all message categories
 In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
 slightly. The idea is to keep the format usable by different components.
````
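The part of the subject tree visible in this hunk maps log messages to `log.<component>` and, with this commit, control messages to `control.<method>`. The Go sketch below builds such subjects. The `root` parameter is a hypothetical stand-in for the part of the hierarchy outside this hunk, and `subjectFor` is likewise a name invented here. The leaf is lower-cased because the control identification later in the file spells the methods `GET`/`PUT` while the tree uses `get,put`.

```go
package main

import (
	"fmt"
	"strings"
)

// Leaves allowed under each branch, taken from the subject tree.
var (
	logComponents  = map[string]bool{"ccb": true, "ccms": true, "ccmc": true, "ccem": true, "ccnc": true}
	controlMethods = map[string]bool{"get": true, "put": true}
)

// subjectFor returns "<root>.<branch>.<leaf>" (or "<branch>.<leaf>" when root
// is empty) for the log and control branches only.
func subjectFor(root, branch, leaf string) (string, error) {
	leaf = strings.ToLower(leaf)
	var allowed map[string]bool
	switch branch {
	case "log":
		allowed = logComponents
	case "control":
		allowed = controlMethods
	default:
		return "", fmt.Errorf("branch %q not covered by this sketch", branch)
	}
	if !allowed[leaf] {
		return "", fmt.Errorf("%q is not a valid %s subject leaf", leaf, branch)
	}
	parts := []string{branch, leaf}
	if root != "" {
		parts = append([]string{root}, parts...)
	}
	return strings.Join(parts, "."), nil
}

func main() {
	for _, leaf := range []string{"GET", "put", "post"} {
		s, err := subjectFor("", "control", leaf)
		fmt.Println(s, err)
	}
}
```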
````diff
@@ -58,17 +62,17 @@ Each message is identifiable by the `measurement` (= metric name), the
 ### Mandatory tags per message
-* `hostname`
-* `type`
-  * `node`
-  * `socket`
-  * `die`
-  * `memoryDomain`
-  * `llc`
-  * `core`
-  * `hwthread`
-  * `accelerator`
-* `type-id` for further specifying the type like CPU socket or HW Thread identifier
+- `hostname`
+- `type`
+  - `node`
+  - `socket`
+  - `die`
+  - `memoryDomain`
+  - `llc`
+  - `core`
+  - `hwthread`
+  - `accelerator`
+- `type-id` for further specifying the type like CPU socket or HW Thread identifier
 Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`.
````
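The mandatory-tag rules can be checked mechanically. A minimal Go sketch follows, assuming tags arrive as a `map[string]string`; `checkTags` is a hypothetical helper. It reads the sentence above as "`type-id` is required for every `type` except `node`" and fills in the recommended `type-id=0` for node-level messages.

```go
package main

import (
	"errors"
	"fmt"
)

// validTypes lists the allowed values of the mandatory `type` tag.
var validTypes = map[string]bool{
	"node": true, "socket": true, "die": true, "memoryDomain": true,
	"llc": true, "core": true, "hwthread": true, "accelerator": true,
}

// checkTags validates the mandatory tags of one message. For type=node
// without a type-id it inserts the recommended type-id=0, so it may
// modify tags in place.
func checkTags(tags map[string]string) error {
	if tags["hostname"] == "" {
		return errors.New("missing mandatory tag hostname")
	}
	t, ok := tags["type"]
	if !ok {
		return errors.New("missing mandatory tag type")
	}
	if !validTypes[t] {
		return fmt.Errorf("unknown type %q", t)
	}
	if _, ok := tags["type-id"]; !ok {
		if t != "node" {
			return fmt.Errorf("type=%s requires a type-id tag", t)
		}
		tags["type-id"] = "0" // recommended, not required, for type=node
	}
	return nil
}

func main() {
	tags := map[string]string{"hostname": "node01", "type": "node"}
	if err := checkTags(tags); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tags["type-id"]) // prints 0
}
```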
````diff
@@ -93,17 +97,17 @@ While the measurements (metric names) can be chosen freely, there is a basic set
 of measurements which should be present as long as you navigate in the
 ClusterCockpit ecosystem
-* `flops_sp`: Single-precision floating point rate in `Flops/s`
-* `flops_dp`: Double-precision floating point rate in `Flops/s`
-* `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
-* `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
-* `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
-* `ipc`: instructions-per-cycle metric
-* `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
-* `cpu_power`: Power consumption of the whole CPU package
-* `mem_power`: Power consumption of the memory subsystem
-* `clock`: CPU clock in `MHz`
-* ...
+- `flops_sp`: Single-precision floating point rate in `Flops/s`
+- `flops_dp`: Double-precision floating point rate in `Flops/s`
+- `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
+- `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
+- `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
+- `ipc`: instructions-per-cycle metric
+- `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
+- `cpu_power`: Power consumption of the whole CPU package
+- `mem_power`: Power consumption of the memory subsystem
+- `clock`: CPU clock in `MHz`
+- ...
 For the whole list, see [job-data schema](../../datastructures/job-data.schema.json)
````
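As a worked example of the `flops_any` weighting in the list above, here is a small Go sketch. The input rates are made up, `flopsAny` and `basicUnits` are hypothetical names, and the unit table only holds the units the list actually states. Since the list says "often", the factor 2 on `flops_dp` should be read as a common convention, not a fixed rule.

```go
package main

import "fmt"

// Units stated in the basic measurement list; measurements whose unit the
// list does not give (cpu_load, mem_used, ipc, cpu_power, mem_power) are omitted.
var basicUnits = map[string]string{
	"flops_sp":  "Flops/s",
	"flops_dp":  "Flops/s",
	"flops_any": "Flops/s",
	"mem_bw":    "MByte/s",
	"clock":     "MHz",
}

// flopsAny applies the commonly used weighting
// flops_any = (flops_dp * 2) + flops_sp.
func flopsAny(flopsDP, flopsSP float64) float64 {
	return flopsDP*2 + flopsSP
}

func main() {
	// Hypothetical rates: 1e9 Flops/s double precision, 5e8 Flops/s single precision.
	v := flopsAny(1e9, 5e8)
	fmt.Printf("flops_any = %g %s\n", v, basicUnits["flops_any"])
	// prints: flops_any = 2.5e+09 Flops/s
}
```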
````diff
@@ -121,8 +125,8 @@ TBD
 **Identification:**
-* `control="X"` field with `"X"` being a string
-* `method` tag is either `GET` or `PUT`
+- `control="X"` field with `"X"` being a string
+- `method` tag is either `GET` or `PUT`
 #### Logs
````
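The identification rule for control messages translates into a simple predicate. Below is a minimal Go sketch with hypothetical `controlLine` and `isControl` helpers. It uses the knob name as measurement (per the flavor list in this commit) and compares `method` case-insensitively, because the flavor list writes `get,put` while the identification section writes `GET` or `PUT`. Tag values are assumed to need no escaping, and the knob value `2400` is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// String field values are double-quoted; backslashes and quotes inside are escaped.
var fieldStringEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// controlLine builds a control message with sorted tags and a string "control"
// field. The timestamp is omitted, which line protocol permits (the receiver
// assigns one).
func controlLine(knob, method, hostname, typ, typeID, payload string) (string, error) {
	m := strings.ToUpper(method)
	if m != "GET" && m != "PUT" {
		return "", fmt.Errorf("method must be GET or PUT, got %q", method)
	}
	return fmt.Sprintf(`%s,hostname=%s,method=%s,type=%s,type-id=%s control="%s"`,
		knob, hostname, m, typ, typeID, fieldStringEscaper.Replace(payload)), nil
}

// isControl reports whether a decoded message matches the identification rules:
// a string-valued "control" field and a "method" tag of GET or PUT.
func isControl(tags map[string]string, fields map[string]any) bool {
	if _, ok := fields["control"].(string); !ok {
		return false
	}
	switch strings.ToUpper(tags["method"]) {
	case "GET", "PUT":
		return true
	}
	return false
}

func main() {
	// Hypothetical request: set the freq knob of hardware thread 3.
	line, err := controlLine("freq", "put", "node01", "hwthread", "3", "2400")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line)
}
```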