diff --git a/README.md b/README.md
index e6aeef3..1b7f062 100644
--- a/README.md
+++ b/README.md
@@ -9,32 +9,7 @@ The [NATS.io](https://nats.io/) based writing endpoint consumes messages in [thi
 
 ### REST API Endpoints
 
-In case `jwt-public-key` is a non-empty string in the `config.json` file, the API is protected by JWT based authentication. The signing algorithm has to be `Ed25519`, but no
-fields are required in the JWT payload. Expiration will be checked if specified. The JWT has to be provided using the HTTP `Authorization` header.
-
-All but one endpoints use *selectors* to access the data. A selector must be an array of strings or another array of strings. Examples are provided below.
-
-In the requests, `to` and `from` have to be UNIX timestamps in seconds. The response might also contain `from`/`to` timestamps. They can differ from those in the request,
-if there was not data for a section of the requested data.
-
-1. `POST /api///timeseries`
-    - Request-Body: `{ "selectors": [, , , ...], "metrics": ["flops_any", "load_one", ...] }`
-    - The response will be a JSON array, each entry in the array corresponding to the selector found at that index in the request's `selectors` array
-    - Each array entry will be a map from every requested metric to this: `{ "from": Timestamp, "to": Timestamp, "data": Array of Floats }`
-    - Some values in `data` might be `null` if there is no data available for that time slot
-2. `POST /api///stats`
-    - The Request-Body shall be the same as for a `timeseries` query
-    - The response will be a JSON array, each entry in the array corresponding to the selector found at that index in the request's `selectors` array
-    - Each array entry will be a map from every requested metric to this: `{ "from": Timestamp, "to": Timestamp, "samples": Int, "avg": Float, "min": Float, "max": Float }`
-    - If the `samples` value is 0, the statistics should be ignored.
-3. `POST /api//free`
-    - Request-Body: Array of selectors
-    - This request will free up and release all data older than `to` for all nodes specified by the selectors
-4. `GET /api/{cluster}/peek`
-    - Return a map from every node in the specified cluster to a map from every metric to the newest value available for that metric
-    - All cpu/socket level metrics are aggregated to the node level
-5. `POST /api/write`
-    - You can send lines of the InfluxDB line protocol to this endpoint and they will be written to the store (Basically an alternative to NATS)
+The REST API is documented in [openapi.yaml](./openapi.yaml) in the OpenAPI 3.0 format.
 
 ### Run tests
 
@@ -51,7 +26,7 @@ go test -v ./...
 go test -bench=. -race -v ./...
 ```
 
-### What are these selectors mentioned in the code and API?
+### What are these selectors mentioned in the code?
 
 Tags in InfluxDB are used to build indexes over the stored data. InfluxDB-Tags have no relation to each other, they do not depend on each other and have no hierarchy.
@@ -149,23 +124,10 @@ And finally, use the API to fetch some data. The API is protected by JWT based a
 JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"
 
 # If the collector and store and nats-server have been running for at least 60 seconds on the same host, you may run:
-curl -H "Authorization: Bearer $JWT" -D - "http://localhost:8080/api/$(expr $(date +%s) - 60)/$(date +%s)/timeseries" -d "{ \"selectors\": [[\"testcluster\", \"$(hostname)\"]], \"metrics\": [\"load_one\"] }"
-
-# Get flops_any for all CPUs:
-curl -H "Authorization: Bearer $JWT" -D - "http://localhost:8080/api/$(expr $(date +%s) - 60)/$(date +%s)/timeseries" -d "{ \"selectors\": [[\"testcluster\", \"$(hostname)\"]], \"metrics\": [\"flops_any\"] }"
-
-# Get flops_any for CPU 0:
-curl -H "Authorization: Bearer $JWT" -D - "http://localhost:8080/api/$(expr $(date +%s) - 60)/$(date +%s)/timeseries" -d "{ \"selectors\": [[\"testcluster\", \"$(hostname)\", \"cpu0\"]], \"metrics\": [\"flops_any\"] }"
-
-# Get flops_any for CPU 0, 1, 2 and 3:
-curl -H "Authorization: Bearer $JWT" -D - "http://localhost:8080/api/$(expr $(date +%s) - 60)/$(date +%s)/timeseries" -d "{ \"selectors\": [[\"testcluster\", \"$(hostname)\", [\"cpu0\", \"cpu1\", \"cpu2\", \"cpu3\"]]], \"metrics\": [\"flops_any\"] }"
-
-# Stats for load_one and proc_run:
-curl -H "Authorization: Bearer $JWT" -D - "http://localhost:8080/api/$(expr $(date +%s) - 60)/$(date +%s)/stats" -d "{ \"selectors\": [[\"testcluster\", \"$(hostname)\"]], \"metrics\": [\"load_one\", \"proc_run\"] }"
-
-# Stats for *all* CPUs aggregated both from CPU to node and over time:
-curl -H "Authorization: Bearer $JWT" -D - "http://localhost:8080/api/$(expr $(date +%s) - 60)/$(date +%s)/stats" -d "{ \"selectors\": [[\"testcluster\", \"$(hostname)\"]], \"metrics\": [\"flops_sp\", \"flops_dp\"] }"
-
+curl -H "Authorization: Bearer $JWT" -D - "http://localhost:8080/api/query" -d "{ \"cluster\": \"testcluster\", \"from\": $(expr $(date +%s) - 60), \"to\": $(date +%s), \"queries\": [{
+  \"metric\": \"load_one\",
+  \"hostname\": \"$(hostname)\"
+}] }"
 
 # ...
 ```
 
diff --git a/TODO.md b/TODO.md
index a8f37a4..91b2eab 100644
--- a/TODO.md
+++ b/TODO.md
@@ -10,4 +10,5 @@
 - Optimization: Once a buffer is full, calculate min, max and avg
     - Calculate averages buffer-wise, average weighted by length of buffer
     - Only the head-buffer needs to be fully traversed
+- Optimization: If aggregating over hwthreads/cores/sockets, cache those results and reuse some of that for new queries aggregating only over the newer data
 - ...
diff --git a/openapi.yaml b/openapi.yaml
new file mode 100644
index 0000000..f33e01d
--- /dev/null
+++ b/openapi.yaml
@@ -0,0 +1,138 @@
+# OpenAPI spec describing a subset of the HTTP REST API for the cc-metric-store.
+
+openapi: 3.0.3
+info:
+  title: 'cc-metric-store REST API'
+  description: 'In-memory time series database for hpc metrics to be used with the [ClusterCockpit](https://github.com/ClusterCockpit) toolsuite'
+  version: 0.1.0
+paths:
+  '/api/write':
+    post:
+      operationId: 'writeMetrics'
+      description: 'Receives metrics in the influx line-protocol using [this format](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol_alternative.md)'
+      requestBody:
+        required: true
+        content:
+          'text/plain':
+            example:
+              'flops_any,cluster=emmy,hostname=e1001,type=cpu,type-id=0 value=42.0'
+      responses:
+        200:
+          description: 'Everything went fine'
+        400:
+          description: 'Bad Request'
+  '/api/query':
+    post:
+      operationId: 'queryMetrics'
+      description: 'Query metrics'
+      requestBody:
+        required: true
+        content:
+          'application/json':
+            schema:
+              type: object
+              required: [cluster, from, to]
+              properties:
+                cluster:
+                  type: string
+                from:
+                  type: integer
+                to:
+                  type: integer
+                with-stats:
+                  type: boolean
+                  default: true
+                with-data:
+                  type: boolean
+                  default: true
+                queries:
+                  type: array
+                  items:
+                    $ref: '#/components/schemas/ApiQuery'
+                for-all-nodes:
+                  description: 'If not null, add a new query for every known host on that cluster and every metric (at node-scope) specified in this array to the request. This can be used to get a metric for every host in a cluster without knowing the name of every host.'
+                  type: array
+                  items:
+                    type: string
+      responses:
+        200:
+          description: 'Requested data and stats as JSON'
+          content:
+            'application/json':
+              schema:
+                description: 'Array where each element is a response to the query at that same index in the request'
+                type: array
+                items:
+                  description: 'If `aggreg` is true, only ever has one element.'
+                  type: array
+                  items:
+                    type: object
+                    properties:
+                      error:
+                        description: 'If not null or undefined, an error happened processing that query'
+                        type: string
+                        nullable: true
+                      data:
+                        type: array
+                        items:
+                          type: number
+                          nullable: true
+                      avg: { type: number }
+                      min: { type: number }
+                      max: { type: number }
+        400:
+          description: 'Bad Request'
+  '/api/free':
+    post:
+      operationId: 'freeBuffers'
+      description: 'Allow all buffers containing only data older than `to` to be freed'
+      parameters:
+        - name: to
+          in: query
+          description: 'Unix Timestamp'
+          required: true
+          schema:
+            type: integer
+      requestBody:
+        required: true
+        content:
+          'application/json':
+            schema:
+              type: array
+              items:
+                type: array
+                items:
+                  type: string
+      responses:
+        200:
+          description: 'Everything went fine'
+        400:
+          description: 'Bad Request'
+components:
+  schemas:
+    ApiQuery:
+      description: 'A single query for a specific metric resulting in one series'
+      type: object
+      required: [metric, hostname, aggreg]
+      properties:
+        metric:
+          type: string
+        hostname:
+          type: string
+        type:
+          description: 'Not required for node-level requests. Usually something like socket, cpu or hwthread.'
+          type: string
+        type-ids:
+          type: array
+          items:
+            type: integer
+        aggreg:
+          type: boolean
+          description: 'If true, every query result will have exactly one element. Otherwise, the data for every requested type-id/sub-type-id is provided separately'
+  securitySchemes:
+    bearerAuth:
+      type: http
+      scheme: bearer
+      bearerFormat: JWT
+security:
+  - bearerAuth: [] # Applies `bearerAuth` globally
\ No newline at end of file
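
Review note: the request body for `POST /api/query` can be hard to read from the schema alone, so here is a minimal sketch of how a client might assemble it. The field names (`cluster`, `from`, `to`, `with-stats`, `with-data`, `queries`, `metric`, `hostname`, `type`, `type-ids`, `aggreg`) are taken from the `openapi.yaml` added in this diff; the helper names `build_query` and `build_request` are hypothetical and not part of the project. The sketch only builds the JSON and sends nothing over the network.

```python
import json
import time


def build_query(metric, hostname, aggreg=True, type_=None, type_ids=None):
    """Build one ApiQuery object (hypothetical helper).

    The spec lists `metric`, `hostname` and `aggreg` as required;
    `type` and `type-ids` are only added when given.
    """
    query = {"metric": metric, "hostname": hostname, "aggreg": aggreg}
    if type_ is not None:
        query["type"] = type_
    if type_ids is not None:
        query["type-ids"] = list(type_ids)
    return query


def build_request(cluster, from_ts, to_ts, queries, with_stats=True, with_data=True):
    """Build the body for POST /api/query (hypothetical helper).

    `from` and `to` are integers in the spec; the defaults for
    `with-stats` and `with-data` mirror the spec's `default: true`.
    """
    return {
        "cluster": cluster,
        "from": int(from_ts),
        "to": int(to_ts),
        "with-stats": with_stats,
        "with-data": with_data,
        "queries": list(queries),
    }


if __name__ == "__main__":
    now = int(time.time())
    # "node001" is a placeholder hostname, not a value from the diff.
    body = build_request("testcluster", now - 60, now,
                         [build_query("load_one", "node001")])
    print(json.dumps(body, indent=2))
```

One thing this makes visible: the `ApiQuery` schema marks `aggreg` as required, while the updated `curl` example in the README omits it, so either the example or the `required` list may need another look.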