mirror of
https://github.com/ClusterCockpit/cc-specifications.git
synced 2025-03-14 19:05:55 +01:00
Initial complete draft of updated line protocol specs
parent 917e17d9bf
commit 4594c99395
@@ -15,8 +15,9 @@ tags. All metrics/events have the following format (if written to `stdout`):
 Where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
 entries. In a mind-model, think about tags as `indices` in the database for
 faster lookup and the `<field set>` as values.
+The timestamp is UNIX epoch time in seconds!
 
-We are using the tag set to add metadata information and the field for the
+We are using the tag set to add metadata information and one field for the
 payload.
 
 **Remark**: In the first iteration, we only sent metrics (number values) but we
@@ -26,15 +27,22 @@ text was changed accordingly. The update is backward-compatible, for metrics
 
 ## Message categories
 
-There exist the following line line-protocol message flavors:
+There are four line-protocol message flavors:
 
-- Metric: The field key is `value`, measurement = metric name
-- Event: The field key is `event`, Events are actionable informations, measurement = event subtype (job, phases, ?? ), Additional tag function=<string>
-- Log message: The field key is `log`. Log messages are purely informational,
-measurement = [ccb, ccms, ccmc, ccem, ccnc], Additional tag loglevel
-- Control message: The field key is `control`, measurement = knob name (rapl, freq, prefetcher, topology, config), Additional tags: method=[get,put]
+- **Metric**: The `field` key is `value`, the `measurement` is the metric name
+- **Event**: The `field` key is `event`. Events are actionable information. The
+`measurement` is set to an event class (job, slurm, status, phases, ?? ). Additional tag
+`function` to indicate the purpose, similar to a REST endpoint (for the job
+class this can be start_job and stop_job).
+- **Log**: The `field` key is `log`. Log messages are purely informational.
+The `measurement` is set to the component identifier [ccb, ccms, ccmc, ccem,
+ccnc]. Additional tag `loglevel` to set the log level (debug, info, warn,
+error).
+- **Control**: The `field` key is `control`, the `measurement` is set to a
+control class (rapl, freq, prefetcher, topology, config). Additional tag
+`method` with one of [GET,PUT].
 
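The four categories listed above are told apart by the field key alone. A minimal Python sketch of that lookup follows; the names `CATEGORY_BY_FIELD_KEY` and `classify` are hypothetical and not part of the specification.

```python
# Field key -> message category, as listed in "Message categories".
# The dictionary and function names are illustrative assumptions.
CATEGORY_BY_FIELD_KEY = {
    "value": "metric",
    "event": "event",
    "log": "log",
    "control": "control",
}


def classify(field_key: str) -> str:
    """Return the message category for a field key, or raise ValueError."""
    try:
        return CATEGORY_BY_FIELD_KEY[field_key]
    except KeyError:
        raise ValueError(f"unknown field key: {field_key!r}") from None


if __name__ == "__main__":
    for key in ("value", "event", "log", "control"):
        print(key, "->", classify(key))
```

A receiver could call `classify` on the single field key of each incoming line before dispatching it to a category-specific handler.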
-## Messaging
+## Messaging subjects
 
 ClusterCockpit uses the NATS messaging network, with the option to support other
 messaging frameworks in the future. To distinguish between different message
@@ -45,19 +53,16 @@ subject hierarchy tree is used:
 <cluster name>. |
 --- metrics
 |
---- events.[job]
+--- events.[job, slurm]
 |
 --- log.[ccb, ccms, ccmc, ccem, ccnc]
 |
---- control.[get,put]
+--- control.[get, put]
 ```
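The hierarchy above can be read as dot-separated subject tokens. The Python sketch below builds subject strings under that assumption. The helper name `subject`, the `LEAVES` table and the cluster name `alex` are illustrative only, and the exact subject layout should be checked against the full specification.

```python
from typing import Optional

# Allowed second-level tokens per category, copied from the tree above.
# None means the category has no second level.
LEAVES = {
    "metrics": None,
    "events": {"job", "slurm"},
    "log": {"ccb", "ccms", "ccmc", "ccem", "ccnc"},
    "control": {"get", "put"},
}


def subject(cluster: str, category: str, leaf: Optional[str] = None) -> str:
    """Build '<cluster>.<category>[.<leaf>]' and validate the tokens."""
    if category not in LEAVES:
        raise ValueError(f"unknown category: {category!r}")
    allowed = LEAVES[category]
    if allowed is None:
        if leaf is not None:
            raise ValueError(f"{category!r} takes no second-level token")
        return f"{cluster}.{category}"
    if leaf not in allowed:
        raise ValueError(f"invalid token {leaf!r} for {category!r}")
    return f"{cluster}.{category}.{leaf}"


if __name__ == "__main__":
    print(subject("alex", "metrics"))
    print(subject("alex", "events", "job"))
    print(subject("alex", "log", "ccb"))
    print(subject("alex", "control", "get"))
```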
 
-## Points generic for all message categories
+## Rules valid for all message categories
 
-In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
-slightly. The idea is to keep the format usable by different components.
-
-Each message is identifiable by the `measurement` (= metric name), the
+Each message is identifiable by the `measurement`, and the tags
 `hostname`, the `type` and, if required, a `type-id`.
 
 ### Mandatory tags per message
@@ -76,7 +81,7 @@ Each message is identifiable by the `measurement` (= metric name), the
 
 Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`.
 
-#### Optional tags depending on the message
+#### Optional tags depending on the message type
 
 In some cases, optional tags are required like `filesystem`, `device` or
 `version`. While you are free to do that, the ClusterCockpit components in the
@@ -109,25 +114,47 @@ ClusterCockpit ecosystem
 - `clock`: CPU clock in `MHz`
 - ...
 
+FIXME: What about the unit??
 For the whole list, see [job-data schema](../../datastructures/job-data.schema.json)
 
+Example:
+
+```txt
+flops_any,hostname=e1208,type=core,type-id=23 value=1203.3 1740027951
+```
+
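The metric example above can be produced by a few lines of Python. This is a minimal sketch, assuming that names and tag values contain no spaces, commas or equals signs, so no escaping is performed. The function name `format_metric` is hypothetical.

```python
def format_metric(name, hostname, type_, type_id, value, timestamp):
    """Format one metric line: measurement, mandatory tags, value, timestamp.

    Assumption: no character in the inputs needs line-protocol escaping.
    The timestamp is UNIX epoch time in seconds, as stated in the spec.
    """
    tags = f"hostname={hostname},type={type_},type-id={type_id}"
    return f"{name},{tags} value={value} {timestamp}"


if __name__ == "__main__":
    print(format_metric("flops_any", "e1208", "core", 23, 1203.3, 1740027951))
```

In practice the timestamp would come from something like `int(time.time())`. It is passed in explicitly here so the output is reproducible.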
 #### Events
 
-**Identification:** `event="X"` field with `"X"` being a string
-The name (measurement) of the event message can further specialize the purpose
-(similar to REST endpoints), e.g. `start_job`, and `stop_job` for events of type
-job.
+**Identification:** Field `event="X"` with `"X"` being the payload string.
+The name (measurement) of the event message indicates the event
+class. The function tag specifies the purpose (similar to REST endpoints), e.g.
+`start_job`, and `stop_job` for events of class job.
 
-Example start job event:
-TBD
+Example:
+
+```txt
+job,hostname=mngmt02,type=node,type-id=0,function=stop_job event={"jobId": 69, "cluster": "ccfront", "stopTime": 1738842306, "jobState": "completed"} 1740027951
+```
 
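The event example above prints the JSON payload unquoted, while the identification rule writes the field as `event="X"`. Assuming the payload is meant to travel as a line-protocol string field, a sender would wrap it in double quotes and backslash-escape embedded backslashes and double quotes, which is how InfluxData line protocol treats string field values. The following Python sketch shows this under that assumption. `format_event` is a hypothetical helper, not a ClusterCockpit API.

```python
import json


def format_event(event_class, hostname, function, payload, timestamp):
    """Format an event line with a JSON payload as a quoted string field.

    Assumptions: node-level event (type=node,type-id=0) and tag values
    that need no escaping.
    """
    text = json.dumps(payload)
    # Escape backslashes first, then double quotes, for a string field.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    tags = f"hostname={hostname},type=node,type-id=0,function={function}"
    return f'{event_class},{tags} event="{escaped}" {timestamp}'


if __name__ == "__main__":
    stop = {"jobId": 69, "cluster": "ccfront",
            "stopTime": 1738842306, "jobState": "completed"}
    print(format_event("job", "mngmt02", "stop_job", stop, 1740027951))
```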
 #### Controls
 
-**Identification:**
+**Identification:** Field `control="X"` with `"X"` being the control request. `measurement` is
+set to a control class, the tag `method` is either `GET` or `PUT`.
 
-- `control="X"` field with `"X"` being a string
-- `method` tag is either `GET` or `PUT`
+Example:
+
+```txt
+rapl,hostname=e1208,type=socket,type-id=2,method=GET control=intel.pkg.energy_status 1740027951
+```
 
 #### Logs
 
-**Identification:** `log="X"` field with `"X"` being a string
+**Identification:** `log="X"` field with `"X"` being the log message. The `measurement` is
+set to source component id, the tag `loglevel` is one of debug, info, warn,
+error.
+
+Example:
+
+```txt
+ccb,hostname=server01,type=node,type-id=1,loglevel=info log="component: archiver cluster: alex jobId: 232383 - archiving finished" 1740027951
+```
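On the receiving side, a line such as the log example above can be split back into its parts. The sketch below is deliberately simplified. It assumes a single field per message (the spec uses one field for the payload), no escaped spaces or commas in the measurement and tags, and a trailing integer timestamp. `parse_line` is a hypothetical name, and a production parser would need full line-protocol escaping rules.

```python
def parse_line(line):
    """Split one message into measurement, tags, field key/value, timestamp."""
    # The measurement and tag set end at the first space (assumed unescaped).
    head, rest = line.split(" ", 1)
    # The timestamp follows the last space; the field set is in between.
    field_set, timestamp = rest.rsplit(" ", 1)
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    key, raw = field_set.split("=", 1)
    # Strip the surrounding double quotes of a string field, if present.
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        raw = raw[1:-1]
    return {
        "measurement": measurement,
        "tags": tags,
        "field_key": key,
        "field_value": raw,
        "timestamp": int(timestamp),
    }


if __name__ == "__main__":
    msg = parse_line(
        'ccb,hostname=server01,type=node,type-id=1,loglevel=info '
        'log="component: archiver cluster: alex jobId: 232383 - archiving finished" '
        '1740027951'
    )
    print(msg["measurement"], msg["tags"]["loglevel"], msg["field_value"])
```

Field values are returned as strings here. Converting a metric's `value` to a number is left to the caller.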