Update README.md, document API endpoints

Fixes #2
Lou Knauer 2021-09-20 11:25:25 +02:00
parent 2046415f9c
commit 3b2ec98ba0
2 changed files with 34 additions and 18 deletions

README.md

@@ -2,11 +2,35 @@
[![Build & Test](https://github.com/ClusterCockpit/cc-metric-store/actions/workflows/test.yml/badge.svg)](https://github.com/ClusterCockpit/cc-metric-store/actions/workflows/test.yml)
Go look at the `TODO.md` file and the [GitHub Issues](https://github.com/ClusterCockpit/cc-metric-store/issues) for a progress overview. Things work, but are not properly tested.
The [NATS.io](https://nats.io/)-based write endpoint consumes messages in [this format of the InfluxDB line protocol](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol.md), but will change to another format in the future.
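For a rough idea of the write path (the linked specification is authoritative; the subject name and the exact tag set below are guesses, not taken from this repository), publishing a measurement via NATS from Go might look like this:

```go
package main

import "github.com/nats-io/nats.go"

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Hypothetical line-protocol message; see the linked specification
	// for the actual tag and field names. The subject name is also a guess.
	msg := "flops_any,cluster=cluster1,host=host1,cpu=cpu0 value=42.0 1600000000"
	if err := nc.Publish("updates", []byte(msg)); err != nil {
		panic(err)
	}
}
```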
### REST API Endpoints
In case `jwt-public-key` is a non-empty string in the `config.json` file, the API is protected by JWT-based authentication. The signing algorithm has to be `Ed25519`, but no fields are required in the JWT payload. Expiration is checked if specified. The JWT has to be provided using the HTTP `Authorization` header.
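For illustration only, here is a minimal Go sketch of attaching such a token to a request against a hypothetical local instance (the host/port and the common `Bearer` prefix are assumptions, not taken from this repository; the `peek` endpoint used here is documented below):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Placeholder for an Ed25519-signed JWT obtained out of band.
	token := "<ed25519-signed-jwt>"

	// Hypothetical address; adjust to your deployment.
	req, err := http.NewRequest("GET", "http://localhost:8080/api/cluster1/peek", nil)
	if err != nil {
		panic(err)
	}
	// The token goes into the HTTP Authorization header;
	// the "Bearer" scheme is assumed here (common JWT practice).
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```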
All endpoints except one use *selectors* to access the data. A selector is an array whose elements are either strings or arrays of strings. Examples are provided below.
In the requests, `from` and `to` have to be UNIX timestamps in seconds. The response might also contain `from`/`to` timestamps; they can differ from those in the request if there was no data for part of the requested range.
1. `POST /api/<from>/<to>/timeseries`
- Request-Body: `{ "selectors": [<sel1>, <sel2>, <sel3>, ...], "metrics": ["flops_any", "load_one", ...] }`
- The response will be a JSON array, each entry corresponding to the selector at the same index in the request's `selectors` array (a request/response sketch in Go follows this list)
- Each array entry will be a map from every requested metric to this: `{ "from": Timestamp, "to": Timestamp, "data": Array of Floats }`
- Some values in `data` might be `null` if there is no data available for that time slot
2. `POST /api/<from>/<to>/stats`
- The Request-Body shall be the same as for a `timeseries` query
- The response will be a JSON array, each entry corresponding to the selector at the same index in the request's `selectors` array
- Each array entry will be a map from every requested metric to this: `{ "from": Timestamp, "to": Timestamp, "samples": Int, "avg": Float, "min": Float, "max": Float }`
- If the `samples` value is 0, the statistics should be ignored.
3. `POST /api/<to>/free`
- Request-Body: Array of selectors
- This request will free and release all data older than `to` for all nodes specified by the selectors
4. `GET /api/{cluster}/peek`
- Return a map from every node in the specified cluster to a map from every metric to the newest value available for that metric
- All cpu/socket-level metrics are aggregated to the node level
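To make the request and response shapes above concrete, here is a hedged Go sketch of a `timeseries` query. The base URL is an assumption, authentication is omitted for brevity, and the struct simply mirrors the fields documented above:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Mirrors the documented response entry:
// { "from": Timestamp, "to": Timestamp, "data": Array of Floats }.
// Pointers are used so that JSON null becomes nil.
type TimeseriesResult struct {
	From int64      `json:"from"`
	To   int64      `json:"to"`
	Data []*float64 `json:"data"`
}

func main() {
	// One plain selector and one containing an inner array, as documented.
	reqBody := map[string]interface{}{
		"selectors": []interface{}{
			[]interface{}{"cluster1", "host1"},
			[]interface{}{"cluster1", "host1", []string{"cpu4", "cpu5"}},
		},
		"metrics": []string{"flops_any", "load_one"},
	}
	payload, _ := json.Marshal(reqBody)

	// <from> and <to> are UNIX timestamps in seconds; the base URL is assumed.
	url := "http://localhost:8080/api/1600000000/1600000600/timeseries"
	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// One map per selector, keyed by metric name.
	var results []map[string]TimeseriesResult
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		panic(err)
	}
	for i, byMetric := range results {
		for metric, ts := range byMetric {
			fmt.Printf("selector %d, %s: %d points (%d to %d)\n",
				i, metric, len(ts.Data), ts.From, ts.To)
		}
	}
}
```

The same request body works for the `stats` endpoint; only the per-metric entry shape of the response differs.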
### Run tests
@@ -33,17 +57,10 @@ This project also works as a time-series database and uses the InfluxDB line protocol
Unlike InfluxDB, the data is indexed by one single strictly hierarchical tree structure.
A selector is built out of the tags in the InfluxDB line protocol and can be used to select
a node (not in the sense of a compute node; it can also be a socket, cpu, ...) in that tree.
The implementation calls those nodes `level` to avoid confusion.
It is impossible to access data only by knowing the *socket* or *cpu* tag; all higher-up levels have to be specified as well.
This is what the hierarchy currently looks like:
- cluster1
  - host1
@@ -60,6 +77,11 @@
- cluster2
- ...
Example selectors:
1. `["cluster1", "host1", "cpu0"]`: Select only the cpu0 of host1 in cluster1
2. `["cluster1", "host1", ["cpu4", "cpu5", "cpu6", "cpu7"]]`: Select only CPUs 4-7 of host1 in cluster1
3. `["cluster1", "host1"]`: Select the complete node. If querying for a CPU-specific metric such as floats, all CPUs are implied
### Config file
- `metrics`: Map of metric-name to objects with the following properties

TODO.md

@@ -6,13 +6,7 @@
- Check for corner cases that should fail gracefully
- Write more realistic `ToArchive`/`FromArchive` tests
- Test edge cases for horizontal aggregations
- Optimization: Once a buffer is full, calculate min, max and avg
- Calculate averages buffer-wise, average weighted by length of buffer
- Only the head-buffer needs to be fully traversed
- ...