From 4594c99395b0a2884b26976659f57b53a98e70be Mon Sep 17 00:00:00 2001
From: Jan Eitzinger <jan@moebiusband.org>
Date: Thu, 20 Feb 2025 06:30:00 +0100
Subject: [PATCH] Initial complete draft of updated line protocol specs

---
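Note for reviewers: the sketch below is not part of the patch. It shows how the
four message flavors described in the updated README could be serialized. All
names in it (`format_message`, `escape`, `CATEGORY_BY_FIELD`) are hypothetical.
It assumes that `hostname` and `type` are the mandatory tags and that string
payloads are double-quoted and escaped as in the InfluxData line protocol.

```python
import json

# Field key that identifies each message category in the draft spec.
CATEGORY_BY_FIELD = {"value": "metric", "event": "event", "log": "log", "control": "control"}


def escape(text, special):
    # Prefix every character listed in `special` with a backslash.
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value):
    # Numbers are written bare (as floats); everything else becomes a
    # double-quoted string with backslashes and double quotes escaped.
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(float(value))
    return '"' + escape(str(value), '\\"') + '"'


def format_message(measurement, tags, field_key, value, timestamp):
    # Build one line: <measurement>,<tag set> <field key>=<value> <timestamp>
    if field_key not in CATEGORY_BY_FIELD:
        raise ValueError("unknown field key: " + field_key)
    for required in ("hostname", "type"):  # assumed mandatory tags
        if required not in tags:
            raise ValueError("missing mandatory tag: " + required)
    tag_set = ",".join(escape(k, ", =") + "=" + escape(v, ", =") for k, v in tags.items())
    head = escape(measurement, ", ") + "," + tag_set
    # The timestamp is UNIX epoch time in seconds.
    return f"{head} {field_key}={format_field_value(value)} {int(timestamp)}"


metric = format_message(
    "flops_any",
    {"hostname": "e1208", "type": "core", "type-id": "23"},
    "value", 1203.3, 1740027951,
)
event = format_message(
    "job",
    {"hostname": "mngmt02", "type": "node", "type-id": "0", "function": "stop_job"},
    "event", json.dumps({"jobId": 69, "jobState": "completed"}), 1740027951,
)
print(metric)
print(event)
```

The metric call reproduces the `flops_any` example line from the README. The
event call embeds a JSON payload as an escaped, quoted string field.
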
 interfaces/lineprotocol/README.md | 79 +++++++++++++++++++++----------
 1 file changed, 53 insertions(+), 26 deletions(-)

diff --git a/interfaces/lineprotocol/README.md b/interfaces/lineprotocol/README.md
index 0d34791..77b6cd8 100644
--- a/interfaces/lineprotocol/README.md
+++ b/interfaces/lineprotocol/README.md
@@ -15,8 +15,9 @@ tags. All metrics/events have the following format (if written to `stdout`):
 Where `<tag set>` and `<field set>` are comma-separated lists of `key=value`
 entries. In a mind-model, think about tags as `indices` in the database for
 faster lookup and the `<field set>` as values.
+The timestamp is the UNIX epoch time in seconds.
 
-We are using the tag set to add metadata information and the field for the
+We are using the tag set to add metadata information and one field for the
 payload.
 
 **Remark**: In the first iteration, we only sent metrics (number values) but we
@@ -26,15 +27,22 @@ text was changed accordingly. The update is backward-compatible, for metrics
 
 ## Message categories
 
-There exist the following line line-protocol message flavors:
+There are four line-protocol message flavors:
 
-- Metric: The field key is `value`, measurement = metric name
-- Event: The field key is `event`, Events are actionable informations, measurement = event subtype (job, phases, ?? ), Additional tag function=<string>
-- Log message: The field key is `log`. Log messages are purely informational,
-  measurement = [ccb, ccms, ccmc, ccem, ccnc], Additional tag loglevel
-- Control message: The field key is `control`, measurement = knob name (rapl, freq, prefetcher, topology, config), Additional tags: method=[get,put]
+- **Metric**: The `field` key is `value`, the `measurement` is the metric name.
+- **Event**: The `field` key is `event`. Events carry actionable information.
+  The `measurement` is set to an event class (job, slurm, status, phases, ?? ).
+  The additional tag `function` indicates the purpose, similar to a REST
+  endpoint (for the job class this can be `start_job` and `stop_job`).
+- **Log**: The `field` key is `log`. Log messages are purely informational.
+  The `measurement` is set to the component identifier [ccb, ccms, ccmc, ccem,
+  ccnc]. The additional tag `loglevel` sets the log level (debug, info, warn,
+  error).
+- **Control**: The `field` key is `control`, the `measurement` is set to a
+  control class (rapl, freq, prefetcher, topology, config). The additional tag
+  `method` is one of [GET, PUT].
 
-## Messaging
+## Messaging subjects
 
 ClusterCockpit uses the NATS messaging network, with the option to support other
 messaging frameworks in the future. To distinguish between different message
@@ -45,19 +53,16 @@ subject hierarchy tree is used:
 <cluster name>. |
                 --- metrics
                 |
-                --- events.[job]
+                --- events.[job, slurm]
                 |
                 --- log.[ccb, ccms, ccmc, ccem, ccnc]
                 |
-                --- control.[get,put]
+                --- control.[get, put]
 ```
 
-## Points generic for all message categories
+## Rules valid for all message categories
 
-In ClusterCockpit we limit the flexibility of the InfluxData line-protocol
-slightly. The idea is to keep the format usable by different components.
-
-Each message is identifiable by the `measurement` (= metric name), the
+Each message is identifiable by the `measurement` and the tags
 `hostname`, the `type` and, if required, a `type-id`.
 
 ### Mandatory tags per message
@@ -76,7 +81,7 @@ Each message is identifiable by the `measurement` (= metric name), the
 
 Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`.
 
-#### Optional tags depending on the message
+#### Optional tags depending on the message type
 
 In some cases, optional tags are required like `filesystem`, `device` or
 `version`. While you are free to do that, the ClusterCockpit components in the
@@ -109,25 +114,47 @@ ClusterCockpit ecosystem
 - `clock`: CPU clock in `MHz`
 - ...
 
+FIXME: What about the unit??
 For the whole list, see [job-data schema](../../datastructures/job-data.schema.json)
 
+Example:
+
+```txt
+flops_any,hostname=e1208,type=core,type-id=23 value=1203.3 1740027951
+```
+
 #### Events
 
-**Identification:** `event="X"` field with `"X"` being a string
-The name (measurement) of the event message can further specialize the purpose
-(similar to REST endpoints), e.g. `start_job`, and `stop_job` for events of type
-job.
+**Identification:** Field `event="X"` with `"X"` being the payload string.
+The name (`measurement`) of the event message indicates the event class. The
+`function` tag specifies the purpose (similar to REST endpoints), e.g.
+`start_job` and `stop_job` for events of class `job`.
 
-Example start job event:
-TBD
+Example:
+
+```txt
+job,hostname=mngmt02,type=node,type-id=0,function=stop_job event="{\"jobId\": 69, \"cluster\": \"ccfront\", \"stopTime\": 1738842306, \"jobState\": \"completed\"}" 1740027951
+```
 
 #### Controls
 
-**Identification:**
+**Identification:** Field `control="X"` with `"X"` being the control request. The
+`measurement` is set to a control class; the tag `method` is either `GET` or `PUT`.
 
-- `control="X"` field with `"X"` being a string
-- `method` tag is either `GET` or `PUT`
+Example:
+
+```txt
+rapl,hostname=e1208,type=socket,type-id=2,method=GET control="intel.pkg.energy_status" 1740027951
+```
 
 #### Logs
 
-**Identification:** `log="X"` field with `"X"` being a string
+**Identification:** Field `log="X"` with `"X"` being the log message. The
+`measurement` is set to the source component identifier; the tag `loglevel` is
+one of `debug`, `info`, `warn`, `error`.
+
+Example:
+
+```txt
+ccb,hostname=server01,type=node,type-id=1,loglevel=info log="component: archiver cluster: alex jobId: 232383 - archiving finished" 1740027951
+```