Mirror of https://github.com/ClusterCockpit/cc-metric-store.git (synced 2024-11-10 05:07:25 +01:00)

parent 2046415f9c
commit 3b2ec98ba0

46 README.md
@@ -2,11 +2,35 @@

[![Build & Test](https://github.com/ClusterCockpit/cc-metric-store/actions/workflows/test.yml/badge.svg)](https://github.com/ClusterCockpit/cc-metric-store/actions/workflows/test.yml)

Go look at the `TODO.md` file and the [GitHub Issues](https://github.com/ClusterCockpit/cc-metric-store/issues) for a progress overview. Things work, but are not properly tested.

The [NATS.io](https://nats.io/) based writing endpoint consumes messages in [this format of the InfluxDB line protocol](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol.md), but will change to another format in the future.

### REST API Endpoints

_TODO... (For now, look at the examples below)_

If `jwt-public-key` is a non-empty string in the `config.json` file, the API is protected by JWT-based authentication. The signing algorithm has to be `Ed25519`; no fields are required in the JWT payload, but an expiration will be checked if specified. The JWT has to be provided using the HTTP `Authorization` header.
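
As a rough illustration, the following sketch mints such a token in Go. The `github.com/golang-jwt/jwt/v4` library and the in-process key generation are assumptions made for the example, not something this project prescribes; in practice the public half of the key pair would be the value configured as `jwt-public-key`.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
	"time"

	"github.com/golang-jwt/jwt/v4" // assumed library choice for this sketch
)

func main() {
	// Assumption: generate a throwaway Ed25519 key pair just for the example.
	// The server would be configured with the matching public key (`jwt-public-key`).
	_, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		panic(err)
	}

	// No payload fields are required; an expiration is optional but checked if present.
	token := jwt.NewWithClaims(jwt.SigningMethodEdDSA, jwt.MapClaims{
		"exp": time.Now().Add(time.Hour).Unix(),
	})

	signed, err := token.SignedString(priv)
	if err != nil {
		panic(err)
	}

	// Send the token with every request, typically as `Authorization: Bearer <token>`.
	fmt.Println(signed)
}
```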

All endpoints but one use *selectors* to access the data. A selector must be an array whose elements are either strings or arrays of strings. Examples are provided below.

In the requests, `from` and `to` have to be UNIX timestamps in seconds. The response might also contain `from`/`to` timestamps; they can differ from those in the request if there was no data for part of the requested time range. (A client sketch for the `timeseries` endpoint follows the list below.)

1. `POST /api/<from>/<to>/timeseries`
   - Request-Body: `{ "selectors": [<sel1>, <sel2>, <sel3>, ...], "metrics": ["flops_any", "load_one", ...] }`
   - The response will be a JSON array, each entry in the array corresponding to the selector found at that index in the request's `selectors` array
   - Each array entry will be a map from every requested metric to this: `{ "from": Timestamp, "to": Timestamp, "data": Array of Floats }`
   - Some values in `data` might be `null` if there is no data available for that time slot
2. `POST /api/<from>/<to>/stats`
   - The Request-Body shall be the same as for a `timeseries` query
   - The response will be a JSON array, each entry in the array corresponding to the selector found at that index in the request's `selectors` array
   - Each array entry will be a map from every requested metric to this: `{ "from": Timestamp, "to": Timestamp, "samples": Int, "avg": Float, "min": Float, "max": Float }`
   - If the `samples` value is 0, the statistics should be ignored
3. `POST /api/<to>/free`
   - Request-Body: Array of selectors
   - This request will free up and release all data older than `to` for all nodes specified by the selectors
4. `GET /api/{cluster}/peek`
   - Return a map from every node in the specified cluster to a map from every metric to the newest value available for that metric
   - All cpu/socket level metrics are aggregated to the node level

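For endpoint 1 (`timeseries`), here is a rough Go client sketch based on the request and response shapes described above. The base URL, port, selector, metric names and token are placeholder assumptions; adjust them to your deployment.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// MetricData mirrors the documented per-metric response object.
type MetricData struct {
	From int64      `json:"from"`
	To   int64      `json:"to"`
	Data []*float64 `json:"data"` // pointers so JSON `null` samples decode as nil
}

func main() {
	to := time.Now().Unix()
	from := to - 3600 // last hour

	// One selector (`["cluster1", "host1"]`) and two metrics; placeholder values.
	body, _ := json.Marshal(map[string]interface{}{
		"selectors": []interface{}{[]string{"cluster1", "host1"}},
		"metrics":   []string{"flops_any", "load_one"},
	})

	url := fmt.Sprintf("http://localhost:8080/api/%d/%d/timeseries", from, to)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	// Only needed if `jwt-public-key` is set in config.json.
	req.Header.Set("Authorization", "Bearer <your-jwt>")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// One map per requested selector, keyed by metric name.
	var data []map[string]MetricData
	if err := json.NewDecoder(res.Body).Decode(&data); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", data)
}
```
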
### Run tests

@@ -33,17 +57,10 @@ This project also works as a time-series database and uses the InfluxDB line protocol

Unlike InfluxDB, the data is indexed by one single, strictly hierarchical tree structure. A selector is built out of the tags in the InfluxDB line protocol, and can be used to select a node (not in the sense of a compute node; it can also be a socket, cpu, ...) in that tree.

The implementation calls those nodes `level` to avoid confusion.
It is impossible to access data only by knowing the *socket* or *cpu* tag; all higher-up levels have to be specified as well.

Metrics have to be specified in advance! Those are taken from the *fields* of a line-protocol message. New levels will be created on the fly at any depth, meaning that the clusters, hosts, sockets, number of cpus, and so on do *not* have to be known at startup. Every level can hold all kinds of metrics. If a level is asked for metrics it does not have itself, *all* child-levels will be asked for their values for that metric and the data will be aggregated on a per-timestep basis.

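To make the last point concrete, here is a small sketch of per-timestep aggregation over child levels. The nil/`null` handling mirrors the API description above; using a plain sum as the aggregation function is an assumption for the example, not necessarily what the store does for every metric.

```go
package main

import "fmt"

// aggregate combines the same metric from several child levels, timestep by timestep.
// A nil entry stands for a missing sample (`null` in the API responses).
func aggregate(children [][]*float64) []*float64 {
	if len(children) == 0 {
		return nil
	}
	out := make([]*float64, len(children[0]))
	for t := range out {
		sum, have := 0.0, false
		for _, series := range children {
			if t < len(series) && series[t] != nil {
				sum += *series[t]
				have = true
			}
		}
		if have {
			v := sum
			out[t] = &v
		}
	}
	return out
}

func f(x float64) *float64 { return &x }

func main() {
	cpu0 := []*float64{f(1.0), f(2.0), nil}
	cpu1 := []*float64{f(3.0), nil, f(5.0)}
	host := aggregate([][]*float64{cpu0, cpu1})
	fmt.Println(*host[0], *host[1], *host[2]) // 4 2 5
}
```
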
At the moment, there is no way to specify which CPU belongs to which socket, so the hierarchy within a node is flat. That will probably change.
This is what the hierarchy currently looks like:

- cluster1
  - host1
@@ -60,6 +77,11 @@ will probably change.
- cluster2
- ...

Example selectors:

1. `["cluster1", "host1", "cpu0"]`: Select only cpu0 of host1 in cluster1
2. `["cluster1", "host1", ["cpu4", "cpu5", "cpu6", "cpu7"]]`: Select only CPUs 4-7 of host1 in cluster1
3. `["cluster1", "host1"]`: Select the complete node. If querying for a CPU-specific metric such as flops, all CPUs are implied

### Config file

- `metrics`: Map of metric-name to objects with the following properties

6 TODO.md

@@ -6,13 +6,7 @@

- Check for corner cases that should fail gracefully
- Write more realistic `ToArchive`/`FromArchive` tests
- Test edge cases for horizontal aggregations
- Release Data
- Implement API endpoint for releasing old data
- Make sure data is written to disk before it is released
- Automatically free up old buffers periodically?
- Optimization: Once a buffer is full, calculate min, max and avg
- Calculate averages buffer-wise, weighted by the length of each buffer (see the sketch below)
- Only the head-buffer needs to be fully traversed
- Implement basic support for querying the most recent value for every metric on every host
- All metrics are known in advance, including the level: use this to replace the `level.metrics` hashmap by a slice?
- ...

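For the buffer-wise average item above, a minimal sketch of the length-weighted combination it describes; the `bufStats` shape is hypothetical and not the store's actual buffer type.

```go
package main

import "fmt"

// bufStats is a hypothetical per-buffer summary: the average over n samples.
type bufStats struct {
	avg float64
	n   int
}

// weightedAvg combines per-buffer averages, weighting each by its sample count,
// so full buffers never need to be re-traversed.
func weightedAvg(bufs []bufStats) float64 {
	var sum float64
	var total int
	for _, b := range bufs {
		sum += b.avg * float64(b.n)
		total += b.n
	}
	if total == 0 {
		return 0
	}
	return sum / float64(total)
}

func main() {
	// Two buffers: avg 2.0 over 10 samples and avg 4.0 over 30 samples -> 3.5 overall.
	fmt.Println(weightedAvg([]bufStats{{avg: 2.0, n: 10}, {avg: 4.0, n: 30}}))
}
```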