From 917e17d9bf12a12efdbdd7e8a3541049c23a7665 Mon Sep 17 00:00:00 2001
From: Jan Eitzinger
Date: Thu, 20 Feb 2025 05:30:46 +0100
Subject: [PATCH] Continue on lineprotocol ccMessage specs

---
 interfaces/lineprotocol/README.md | 64 ++++++++++++++++---------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/interfaces/lineprotocol/README.md b/interfaces/lineprotocol/README.md
index 8ff3dbc..0d34791 100644
--- a/interfaces/lineprotocol/README.md
+++ b/interfaces/lineprotocol/README.md
@@ -16,6 +16,9 @@ Where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
 entries. In a mind-model, think about tags as `indices` in the database for
 faster lookup and the `<field set>` as values.
 
+We use the tag set to carry metadata information and the field set to carry
+the payload.
+
 **Remark**: In the first iteration, we only sent metrics (number values) but
 we extended the specification to messages with different purposes. The below
 text was changed accordingly. The update is backward-compatible, for metrics
@@ -25,10 +28,11 @@ text was changed accordingly. The update is backward-compatible, for metrics
 
 There exist the following line line-protocol message flavors:
 
-* Metric: The field key is `value`
-* Event: The field key is `event`
-* Log message: The field key is `log`
-* Control message: The field key is `control`
+- Metric: The field key is `value`, measurement = metric name
+- Event: The field key is `event`. Events carry actionable information, measurement = event subtype (job, phases, ?? ), Additional tag function=
+- Log message: The field key is `log`. Log messages are purely informational,
+  measurement = [ccb, ccms, ccmc, ccem, ccnc], Additional tag loglevel
+- Control message: The field key is `control`, measurement = knob name (rapl, freq, prefetcher, topology, config), Additional tag: method=[get,put]
 
 ## Messaging
 
@@ -45,10 +49,10 @@ subject hierarchy tree is used:
    |
    --- log.[ccb, ccms, ccmc, ccem, ccnc]
    |
-   --- control
+   --- control.[get,put]
 ```
 
-## Metric messages
+## Points common to all message categories
 
 In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
 slightly. The idea is to keep the format usable by different components.
@@ -58,17 +62,17 @@ Each message is identifiable by the `measurement` (= metric name), the
 
 ### Mandatory tags per message
 
-* `hostname`
-* `type`
-  * `node`
-  * `socket`
-  * `die`
-  * `memoryDomain`
-  * `llc`
-  * `core`
-  * `hwthread`
-  * `accelerator`
-* `type-id` for further specifying the type like CPU socket or HW Thread identifier
+- `hostname`
+- `type`
+  - `node`
+  - `socket`
+  - `die`
+  - `memoryDomain`
+  - `llc`
+  - `core`
+  - `hwthread`
+  - `accelerator`
+- `type-id` for further specifying the type like CPU socket or HW Thread identifier
 
 Although no `type-id` is required if `type=node`, it is recommended to send
 `type=node,type-id=0`.
@@ -93,17 +97,17 @@ While the measurements (metric names) can be chosen freely, there is a basic
 set of measurements which should be present as long as you navigate in the
 ClusterCockpit ecosystem
 
-* `flops_sp`: Single-precision floating point rate in `Flops/s`
-* `flops_dp`: Double-precision floating point rate in `Flops/s`
-* `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
-* `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
-* `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
-* `ipc`: instructions-per-cycle metric
-* `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
-* `cpu_power`: Power consumption of the whole CPU package
-* `mem_power`: Power consumption of the memory subsystem
-* `clock`: CPU clock in `MHz`
-* ...
+- `flops_sp`: Single-precision floating point rate in `Flops/s`
+- `flops_dp`: Double-precision floating point rate in `Flops/s`
+- `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
+- `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
+- `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
+- `ipc`: instructions-per-cycle metric
+- `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
+- `cpu_power`: Power consumption of the whole CPU package
+- `mem_power`: Power consumption of the memory subsystem
+- `clock`: CPU clock in `MHz`
+- ...
 
 For the whole list, see [job-data schema](../../datastructures/job-data.schema.json)
 
@@ -121,8 +125,8 @@ TBD
 
 **Identification:**
 
-* `control="X"` field with `"X"` being a string
-* `method` tag is either `GET` or `PUT`
+- `control="X"` field with `"X"` being a string
+- `method` tag is either `GET` or `PUT`
 
 #### Logs
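The rules in this patch can be checked mechanically. The field key selects the flavor (metric, event, log or control). Every message needs `hostname` and `type`, and it also needs `type-id` unless `type=node`. Below is a minimal Go sketch of such a check under stated assumptions. It is not part of ClusterCockpit, and the names `Message`, `Parse`, `Flavor` and `Problems` are hypothetical.

The parser is deliberately simplified:

- It ignores line-protocol backslash escaping.
- It drops the optional timestamp.
- It keeps field values as raw strings.

The patch writes the control `method` values both as `get`/`put` and as `GET`/`PUT`, so the sketch compares them case-insensitively. The `loglevel=info` value in the example line is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical in-memory form of one line-protocol line.
type Message struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]string // raw values; string values keep their quotes
}

// flavorOf maps the flavor-selecting field key to the flavor named in the README.
var flavorOf = map[string]string{"value": "metric", "event": "event", "log": "log", "control": "control"}

// validTypes holds the allowed values of the mandatory `type` tag.
var validTypes = map[string]bool{"node": true, "socket": true, "die": true,
	"memoryDomain": true, "llc": true, "core": true, "hwthread": true, "accelerator": true}

// splitUnquoted splits s at ASCII sep outside double quotes; backslash escapes are ignored.
func splitUnquoted(s string, sep rune) []string {
	var parts []string
	start, inQuotes := 0, false
	for i, r := range s {
		switch {
		case r == '"':
			inQuotes = !inQuotes
		case r == sep && !inQuotes:
			parts = append(parts, s[start:i])
			start = i + 1
		}
	}
	return append(parts, s[start:])
}

// Parse splits "<measurement>,<tag set> <field set> [<timestamp>]", dropping the timestamp.
func Parse(line string) (Message, error) {
	sections := splitUnquoted(strings.TrimSpace(line), ' ')
	if len(sections) < 2 || len(sections) > 3 {
		return Message{}, fmt.Errorf("expected 2 or 3 sections, got %d", len(sections))
	}
	head := strings.Split(sections[0], ",")
	m := Message{Measurement: head[0], Tags: map[string]string{}, Fields: map[string]string{}}
	// Tag set and field set are both comma-separated key=value lists.
	for _, part := range []struct {
		pieces []string
		into   map[string]string
	}{{head[1:], m.Tags}, {splitUnquoted(sections[1], ','), m.Fields}} {
		for _, kv := range part.pieces {
			k, v, ok := strings.Cut(kv, "=")
			if !ok {
				return Message{}, fmt.Errorf("malformed key=value pair %q", kv)
			}
			part.into[k] = v
		}
	}
	return m, nil
}

// Flavor picks the flavor from the field key, checked in README order.
// If several flavor keys are present, the first match wins (an assumption).
func Flavor(m Message) (string, error) {
	for _, key := range []string{"value", "event", "log", "control"} {
		if _, ok := m.Fields[key]; ok {
			return flavorOf[key], nil
		}
	}
	return "", fmt.Errorf("no value, event, log or control field")
}

// Problems lists violations of the README's tag rules.
func Problems(m Message) []string {
	var p []string
	if _, ok := m.Tags["hostname"]; !ok {
		p = append(p, "missing hostname")
	}
	if typ := m.Tags["type"]; !validTypes[typ] {
		p = append(p, fmt.Sprintf("missing or unknown type %q", typ))
	}
	// type-id may be omitted only for type=node (type-id=0 is still recommended).
	if _, ok := m.Tags["type-id"]; !ok && m.Tags["type"] != "node" {
		p = append(p, "missing type-id")
	}
	// The README spells the control method both get/put and GET/PUT.
	if _, isCtl := m.Fields["control"]; isCtl {
		if meth := m.Tags["method"]; !strings.EqualFold(meth, "get") && !strings.EqualFold(meth, "put") {
			p = append(p, "control message needs method=get or method=put")
		}
	}
	return p
}

func main() {
	m, err := Parse(`ccms,hostname=node001,type=node,loglevel=info log="receiver started" 1740024646`)
	flavor, ferr := Flavor(m)
	fmt.Println(m.Measurement, flavor, err, ferr, Problems(m))
}
```

Keying the flavor solely on the field name keeps metrics backward-compatible. An existing `value=` line from the first iteration of the specification still classifies as a metric without change.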