mirror of
https://github.com/ClusterCockpit/cc-specifications.git
synced 2025-03-14 19:05:55 +01:00
Continue on lineprotocol ccMessage specs
parent
6205680fb8
commit
917e17d9bf
@@ -16,6 +16,9 @@ Where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
entries. As a mental model, think of tags as `indices` in the database for
faster lookup and the `<field set>` as values.

We use the tag set to add metadata information and the field set for the
payload.
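
To make the split between tag set and field set concrete, here is a minimal Python sketch that assembles one line in the InfluxData line-protocol layout `<measurement>,<tag set> <field set> <timestamp>`. The helper names `format_field_value` and `format_line`, the host name `node01`, and the timestamp are assumptions made for illustration. Escaping of special characters in keys and tag values is left out, and at least one tag is assumed to be present.

```python
def format_field_value(value):
    """Render a field value in InfluxData line-protocol syntax."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; backslashes and quotes are escaped.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_line(measurement, tags, fields, timestamp):
    """Hypothetical helper: metadata goes into tags, the payload into fields."""
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_set = ",".join(f"{k}={format_field_value(v)}" for k, v in fields.items())
    return f"{measurement},{tag_set} {field_set} {timestamp}"


line = format_line(
    "flops_any",
    {"hostname": "node01", "type": "node", "type-id": "0"},  # metadata
    {"value": 42.0},  # payload
    1700000000,
)
print(line)
```

Sorting the tags by key is a choice of this sketch; it keeps the output stable regardless of dictionary order.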

**Remark**: In the first iteration, we only sent metrics (number values) but we
extended the specification to messages with different purposes. The text below
was changed accordingly. The update is backward-compatible, for metrics
@@ -25,10 +28,11 @@ text was changed accordingly. The update is backward-compatible, for metrics

The following line-protocol message flavors exist:

* Metric: The field key is `value`
* Event: The field key is `event`
* Log message: The field key is `log`
* Control message: The field key is `control`
- Metric: The field key is `value`, measurement = metric name
- Event: The field key is `event`. Events are actionable information, measurement = event subtype (job, phases, ?? ), additional tag `function=<string>`
- Log message: The field key is `log`. Log messages are purely informational,
  measurement = [ccb, ccms, ccmc, ccem, ccnc], additional tag `loglevel`
- Control message: The field key is `control`, measurement = knob name (rapl, freq, prefetcher, topology, config), additional tags: `method=[get,put]`
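
Since the flavor is determined by the field key, a receiver could classify an already parsed message roughly as follows. This is a sketch only: the function name `message_flavor` is hypothetical, and it assumes that exactly one of the four field keys is present in a message, which the specification text above does not state explicitly.

```python
# Field keys that identify each message flavor, as listed above.
FLAVOR_BY_FIELD_KEY = {
    "value": "metric",
    "event": "event",
    "log": "log",
    "control": "control",
}


def message_flavor(fields):
    """Hypothetical helper: return the flavor name for a parsed field set."""
    matches = [name for key, name in FLAVOR_BY_FIELD_KEY.items() if key in fields]
    if len(matches) != 1:
        # Assumption of this sketch: exactly one flavor key per message.
        raise ValueError(f"expected exactly one flavor field key, got {sorted(fields)}")
    return matches[0]


print(message_flavor({"value": 3.5}))
print(message_flavor({"log": "collector started"}))
```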

## Messaging

@@ -45,10 +49,10 @@ subject hierarchy tree is used:
|
--- log.[ccb, ccms, ccmc, ccem, ccnc]
|
--- control
--- control.[get,put]
```

## Metric messages
## Points generic for all message categories

In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
slightly. The idea is to keep the format usable by different components.
@@ -58,17 +62,17 @@ Each message is identifiable by the `measurement` (= metric name), the

### Mandatory tags per message

* `hostname`
* `type`
  * `node`
  * `socket`
  * `die`
  * `memoryDomain`
  * `llc`
  * `core`
  * `hwthread`
  * `accelerator`
* `type-id` for further specifying the type like CPU socket or HW Thread identifier
- `hostname`
- `type`
  - `node`
  - `socket`
  - `die`
  - `memoryDomain`
  - `llc`
  - `core`
  - `hwthread`
  - `accelerator`
- `type-id` for further specifying the type like CPU socket or HW Thread identifier

Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`.
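
The tag rules above could be checked on the receiving side along these lines. The sketch reads the entries from `node` to `accelerator` as the allowed values of the `type` tag, which is an interpretation of the list; the function name `check_tags` and the wording of the messages are hypothetical.

```python
# Assumed reading of the list above: these are the allowed `type` values.
KNOWN_TYPES = {
    "node", "socket", "die", "memoryDomain",
    "llc", "core", "hwthread", "accelerator",
}


def check_tags(tags):
    """Hypothetical helper: return a list of problems found in a tag set."""
    problems = []
    if "hostname" not in tags:
        problems.append("missing tag: hostname")
    type_ = tags.get("type")
    if type_ is None:
        problems.append("missing tag: type")
    elif type_ not in KNOWN_TYPES:
        problems.append(f"unknown type: {type_}")
    elif type_ != "node" and "type-id" not in tags:
        # type-id may be omitted only for type=node
        problems.append(f"missing tag: type-id (required for type={type_})")
    return problems


print(check_tags({"hostname": "node01", "type": "node"}))
print(check_tags({"hostname": "node01", "type": "socket"}))
```

An empty list means the tag set passed all checks of this sketch.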
@@ -93,17 +97,17 @@ While the measurements (metric names) can be chosen freely, there is a basic set
of measurements which should be present when working within the
ClusterCockpit ecosystem:

* `flops_sp`: Single-precision floating point rate in `Flops/s`
* `flops_dp`: Double-precision floating point rate in `Flops/s`
* `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
* `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
* `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
* `ipc`: instructions-per-cycle metric
* `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
* `cpu_power`: Power consumption of the whole CPU package
* `mem_power`: Power consumption of the memory subsystem
* `clock`: CPU clock in `MHz`
* ...
- `flops_sp`: Single-precision floating point rate in `Flops/s`
- `flops_dp`: Double-precision floating point rate in `Flops/s`
- `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`)
- `cpu_load`: The 1m load of the system (see `/proc/loadavg`)
- `mem_used`: The amount of memory used by applications (see `/proc/meminfo`)
- `ipc`: instructions-per-cycle metric
- `mem_bw`: Main memory bandwidth (read and write) in `MByte/s`
- `cpu_power`: Power consumption of the whole CPU package
- `mem_power`: Power consumption of the memory subsystem
- `clock`: CPU clock in `MHz`
- ...

For the full list, see [job-data schema](../../datastructures/job-data.schema.json).
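
Two of the metrics in the list can be derived with very little code. The sketch below computes `flops_any` with the formula quoted in the list and extracts the 1-minute load from a string in the `/proc/loadavg` format. Both function names and the sample input are assumptions for illustration; a real collector may obtain these values differently.

```python
def flops_any(flops_sp, flops_dp):
    """Combined rate using the common convention (flops_dp * 2) + flops_sp."""
    return (flops_dp * 2) + flops_sp


def cpu_load_from_loadavg(content):
    """Return the 1-minute load, the first field of /proc/loadavg content."""
    return float(content.split()[0])


print(flops_any(100.0, 50.0))
# Sample content in the /proc/loadavg layout (values are made up).
print(cpu_load_from_loadavg("0.42 0.36 0.30 1/512 12345\n"))
```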
@@ -121,8 +125,8 @@ TBD

**Identification:**

* `control="X"` field with `"X"` being a string
* `method` tag is either `GET` or `PUT`
- `control="X"` field with `"X"` being a string
- `method` tag is either `GET` or `PUT`
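
The two identification rules can be expressed as a small predicate. This is a sketch with a hypothetical function name; because the text spells the method both as `GET`/`PUT` and as `get`/`put`, the comparison here is case-insensitive, which is an assumption rather than something the specification states.

```python
def is_control_message(tags, fields):
    """Hypothetical helper: apply the two identification rules above."""
    control = fields.get("control")
    method = str(tags.get("method", ""))
    # Assumption: accept the method tag in either upper or lower case.
    return isinstance(control, str) and method.upper() in {"GET", "PUT"}


print(is_control_message({"method": "PUT"}, {"control": "1200"}))
print(is_control_message({"method": "DELETE"}, {"control": "1200"}))
```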

#### Logs
