	Continue on lineprotocol ccMessage specs
		| @@ -16,6 +16,9 @@ Where `<tag set>` and `<field set>` are comma-separated lists of `key=value` | ||||
| entries. As a mental model, think about tags as `indices` in the database for | ||||
| faster lookup and the `<field set>` as values. | ||||
|  | ||||
| +We are using the tag set to add metadata information and the field for the | ||||
| +payload. | ||||
| + | ||||
| **Remark**: In the first iteration, we only sent metrics (number values) but we | ||||
| extended the specification to messages with different purposes. The below | ||||
| text was changed accordingly. The update is backward-compatible, for metrics | ||||
| @@ -25,10 +28,11 @@ text was changed accordingly. The update is backward-compatible, for metrics | ||||
|  | ||||
| There exist the following line-protocol message flavors: | ||||
|  | ||||
| -* Metric: The field key is `value` | ||||
| -* Event: The field key is `event` | ||||
| -* Log message: The field key is `log` | ||||
| -* Control message: The field key is `control` | ||||
| +- Metric: The field key is `value`, measurement = metric name | ||||
| +- Event: The field key is `event`, Events are actionable information, measurement = event subtype (job, phases, ?? ), Additional tag function=<string> | ||||
| +- Log message: The field key is `log`. Log messages are purely informational, | ||||
| +  measurement = [ccb, ccms, ccmc, ccem, ccnc], Additional tag loglevel | ||||
| +- Control message: The field key is `control`, measurement = knob name (rapl, freq, prefetcher, topology, config), Additional tags: method=[get,put] | ||||
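The flavors added in this hunk differ only in the measurement, the extra tags, and the field key. A minimal Python sketch of how such lines could be assembled, assuming the usual InfluxData layout `<measurement>,<tag set> <field set> <timestamp>`. The helper names, the host `node01`, the `loglevel` value `info`, and the timestamp are illustrative assumptions and not part of the specification; measurement names are assumed to contain no commas or spaces.

```python
def escape_tag(text):
    """Backslash-escape the characters that are special in tag keys and values."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def format_field_value(value):
    """Render numbers as floats and everything else as a quoted string."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(float(value))
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_message(measurement, tags, field_key, field_value, timestamp_ns):
    """Assemble `<measurement>,<tag set> <field set> <timestamp>`."""
    tag_set = ",".join(
        f"{escape_tag(key)}={escape_tag(value)}" for key, value in sorted(tags.items())
    )
    field_set = f"{field_key}={format_field_value(field_value)}"
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"


# Hypothetical example values, chosen only for illustration.
base_tags = {"hostname": "node01", "type": "node", "type-id": "0"}
timestamp = 1700000000000000000

# Metric: measurement is the metric name, field key is `value`.
metric = build_message("flops_any", base_tags, "value", 42.0, timestamp)

# Log message: measurement is the component, field key is `log`.
log = build_message(
    "ccmc", {**base_tags, "loglevel": "info"}, "log", "collector started", timestamp
)

print(metric)
print(log)
```

Sorting the tags by key is a choice made here for reproducible output; the text above does not require a particular tag order.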
|  | ||||
| ## Messaging | ||||
|  | ||||
| @@ -45,10 +49,10 @@ subject hierarchy tree is used: | ||||
|                 | | ||||
|                 --- log.[ccb, ccms, ccmc, ccem, ccnc] | ||||
|                 | | ||||
| -                --- control | ||||
| +                --- control.[get,put] | ||||
| ``` | ||||
|  | ||||
| -## Metric messages | ||||
| +## Points generic for all message categories | ||||
|  | ||||
| In ClusterCockpit we limit the flexibility of the InfluxData line-protocol | ||||
| slightly. The idea is to keep the format usable by different components. | ||||
| @@ -58,17 +62,17 @@ Each message is identifiable by the `measurement` (= metric name), the | ||||
|  | ||||
| ### Mandatory tags per message | ||||
|  | ||||
| -* `hostname` | ||||
| -* `type` | ||||
| -  * `node` | ||||
| -  * `socket` | ||||
| -  * `die` | ||||
| -  * `memoryDomain` | ||||
| -  * `llc` | ||||
| -  * `core` | ||||
| -  * `hwthread` | ||||
| -  * `accelerator` | ||||
| -* `type-id` for further specifying the type like CPU socket  or HW Thread identifier | ||||
| +- `hostname` | ||||
| +- `type` | ||||
| +  - `node` | ||||
| +  - `socket` | ||||
| +  - `die` | ||||
| +  - `memoryDomain` | ||||
| +  - `llc` | ||||
| +  - `core` | ||||
| +  - `hwthread` | ||||
| +  - `accelerator` | ||||
| +- `type-id` for further specifying the type like CPU socket or HW Thread identifier | ||||
|  | ||||
| Although no `type-id` is required if `type=node`, it is recommended to send `type=node,type-id=0`. | ||||
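The mandatory-tag rules in this hunk can be checked mechanically. The following Python sketch is one possible reading of them, assuming tags arrive as a plain dictionary; the function name `missing_mandatory_tags` and the example hosts are hypothetical and do not come from the specification.

```python
# Values listed for the `type` tag in the specification.
ALLOWED_TYPES = {
    "node", "socket", "die", "memoryDomain",
    "llc", "core", "hwthread", "accelerator",
}


def missing_mandatory_tags(tags):
    """Return the names of mandatory tags that are absent or invalid."""
    problems = []
    if "hostname" not in tags:
        problems.append("hostname")
    type_value = tags.get("type")
    if type_value not in ALLOWED_TYPES:
        problems.append("type")
    # `type-id` may be omitted only for `type=node`.
    if "type-id" not in tags and type_value != "node":
        problems.append("type-id")
    return problems


print(missing_mandatory_tags({"hostname": "node01", "type": "node"}))
print(missing_mandatory_tags({"hostname": "node01", "type": "socket"}))
print(missing_mandatory_tags({"type": "gpu", "type-id": "0"}))
```

The first call reports no problems because `type-id` is optional for `type=node`, even though sending `type-id=0` is recommended.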
|  | ||||
| @@ -93,17 +97,17 @@ While the measurements (metric names) can be chosen freely, there is a basic set | ||||
| of measurements which should be present as long as you navigate in the | ||||
| ClusterCockpit ecosystem | ||||
|  | ||||
| -* `flops_sp`: Single-precision floating point rate in `Flops/s` | ||||
| -* `flops_dp`: Double-precision floating point rate in `Flops/s` | ||||
| -* `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`) | ||||
| -* `cpu_load`: The 1m load of the system (see `/proc/loadavg`) | ||||
| -* `mem_used`: The amount of memory used by applications (see `/proc/meminfo`) | ||||
| -* `ipc`: instructions-per-cycle metric | ||||
| -* `mem_bw`: Main memory bandwidth (read and write) in `MByte/s` | ||||
| -* `cpu_power`: Power consumption of the whole CPU package | ||||
| -* `mem_power`: Power consumption of the memory subsystem | ||||
| -* `clock`: CPU clock in `MHz` | ||||
| -* ... | ||||
| +- `flops_sp`: Single-precision floating point rate in `Flops/s` | ||||
| +- `flops_dp`: Double-precision floating point rate in `Flops/s` | ||||
| +- `flops_any`: Combined floating point rate in `Flops/s` (often `(flops_dp * 2) + flops_sp`) | ||||
| +- `cpu_load`: The 1m load of the system (see `/proc/loadavg`) | ||||
| +- `mem_used`: The amount of memory used by applications (see `/proc/meminfo`) | ||||
| +- `ipc`: instructions-per-cycle metric | ||||
| +- `mem_bw`: Main memory bandwidth (read and write) in `MByte/s` | ||||
| +- `cpu_power`: Power consumption of the whole CPU package | ||||
| +- `mem_power`: Power consumption of the memory subsystem | ||||
| +- `clock`: CPU clock in `MHz` | ||||
| +- ... | ||||
|  | ||||
| For the whole list, see [job-data schema](../../datastructures/job-data.schema.json) | ||||
|  | ||||
| @@ -121,8 +125,8 @@ TBD | ||||
|  | ||||
| **Identification:** | ||||
|  | ||||
| -* `control="X"` field with `"X"` being a string | ||||
| -* `method` tag is either `GET` or `PUT` | ||||
| +- `control="X"` field with `"X"` being a string | ||||
| +- `method` tag is either `GET` or `PUT` | ||||
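The two identification rules for control messages translate into a short predicate. This Python sketch assumes an already parsed message with `tags` and `fields` dictionaries; the function name is hypothetical. Because the specification writes the method both as `get,put` and as `GET`/`PUT`, the comparison here is case-insensitive, which is an assumption of this sketch rather than a stated rule.

```python
def is_control_message(tags, fields):
    """Check the `control` field and `method` tag of a parsed message."""
    control = fields.get("control")
    method = tags.get("method", "")
    # Assumption: accept the method in either case, since both spellings appear.
    return isinstance(control, str) and method.upper() in {"GET", "PUT"}


print(is_control_message({"method": "PUT"}, {"control": "2200000"}))
print(is_control_message({"method": "get"}, {"control": ""}))
print(is_control_message({"method": "PUT"}, {"value": 1.0}))
```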
|  | ||||
| #### Logs | ||||
|  | ||||
|   | ||||