Update README

Commit 542f8371be (parent 0fa5d8bb8d), 2026-03-04 11:24:59 +01:00
Entire-Checkpoint: dd6b5959d7c6

README.md
The cc-metric-store provides a simple in-memory time series database for storing
metrics of cluster nodes at preconfigured intervals. It is meant to be used as
part of the [ClusterCockpit suite](https://github.com/ClusterCockpit). As all
data is kept in-memory, accessing it is very fast. It also provides topology aware
aggregations over time _and_ nodes/sockets/cpus.

There are major limitations: Data only gets written to disk at periodic
checkpoints (or via WAL on every write), not immediately as it is received.
Only the configured retention window is kept in memory. Still, metric data is
kept as long as a running job is using it.

The storage engine is provided by the
[cc-backend](https://github.com/ClusterCockpit/cc-backend) package
(`cc-backend/pkg/metricstore`). This repository provides the HTTP API wrapper.

The [NATS.io](https://nats.io/) based writing endpoint and the HTTP write
endpoint both consume messages in [this format of the InfluxDB line
protocol](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol_alternative.md).
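For illustration, a single measurement in that format might look like the line
below. This is a hypothetical sample: tag names and scope values follow the
linked specification, and in practice the `cluster` tag may instead be attached
via a NATS subscription's `cluster-tag`.

```
flops_any,cluster=cluster1,hostname=host1,type=cpu,type-id=0 value=42.0 1725000000
```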
## Building

`cc-metric-store` can be built using the provided `Makefile`.
It supports the following targets:

- `make`: Build the application, copy an example configuration file and generate
  checkpoint folders if required.
- `make clean`: Clean the golang build cache and application binary
- `make distclean`: In addition to the clean target also remove the `./var`
  folder and `config.json`
- `make swagger`: Regenerate the Swagger files from the source comments.
- `make test`: Run tests and basic checks (`go build`, `go vet`, `go test`).
## Running
```sh
./cc-metric-store # Uses ./config.json
./cc-metric-store -config /path/to/config.json
./cc-metric-store -dev # Enable Swagger UI at /swagger/
./cc-metric-store -loglevel debug # debug|info|warn (default)|err|crit
./cc-metric-store -logdate # Add date and time to log messages
./cc-metric-store -version # Show version information and exit
./cc-metric-store -gops # Enable gops agent for debugging
```
## REST API Endpoints

The REST API is documented in [swagger.json](./api/swagger.json). You can
explore and try the REST API using the integrated [SwaggerUI web
interface](http://localhost:8082/swagger/) (requires the `-dev` flag).

For more information on the `cc-metric-store` REST API have a look at the
ClusterCockpit documentation [website](https://clustercockpit.org/docs/reference/cc-metric-store/ccms-rest-api/).
All endpoints support both trailing-slash and non-trailing-slash variants:

| Method | Path | Description |
| ------ | ------------------- | -------------------------------------- |
| `GET` | `/api/query/` | Query metrics with selectors |
| `POST` | `/api/write/` | Write metrics (InfluxDB line protocol) |
| `POST` | `/api/free/` | Free buffers up to a timestamp |
| `GET` | `/api/debug/` | Dump internal state |
| `GET` | `/api/healthcheck/` | Check node health status |

If `jwt-public-key` is set in `config.json`, all endpoints require JWT
authentication using an Ed25519 key (`Authorization: Bearer <token>`).
## Run tests

```sh
go test -bench=. -race -v ./...
```
The cc-metric-store works as a time-series database and uses the InfluxDB line
protocol as input format. Unlike InfluxDB, the data is indexed by one single
strictly hierarchical tree structure. A selector is built out of the tags in the
InfluxDB line protocol, and can be used to select a node (not in the sense of a
compute node, can also be a socket, cpu, ...) in that tree. The implementation
calls those nodes `level` to avoid confusion. It is impossible to access data
only by knowing the _socket_ or _cpu_ tag; all higher up levels have to be
specified as well.

This is what the hierarchy currently looks like:
Example selectors:

1. `["cluster1", "host1", "cpu0"]`: Select only the cpu0 of host1 in cluster1
2. `["cluster1", "host1", ["cpu4", "cpu5", "cpu6", "cpu7"]]`: Select only CPUs 4-7 of host1 in cluster1
3. `["cluster1", "host1"]`: Select the complete node. If querying for a CPU-specific metric such as flops, all CPUs are implied
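Selector resolution can be pictured as a walk down that tree. The following Go
sketch uses hypothetical types (not the actual cc-metric-store implementation)
to show why every higher-up level has to be named:

```go
package main

import "fmt"

// level is one node in the hierarchical tree (cluster -> host -> socket/cpu).
// Illustrative only; the real cc-metric-store types differ.
type level struct {
	children map[string]*level
}

// find walks the tree along a selector. There is no way to jump straight
// to a cpu: each path component must match a child at its depth.
func (l *level) find(selector []string) *level {
	if len(selector) == 0 {
		return l
	}
	child, ok := l.children[selector[0]]
	if !ok {
		return nil
	}
	return child.find(selector[1:])
}

// buildTree constructs a tiny example hierarchy.
func buildTree() *level {
	return &level{children: map[string]*level{
		"cluster1": {children: map[string]*level{
			"host1": {children: map[string]*level{
				"cpu0": {},
			}},
		}},
	}}
}

func main() {
	root := buildTree()
	fmt.Println(root.find([]string{"cluster1", "host1", "cpu0"}) != nil) // true
	fmt.Println(root.find([]string{"cpu0"}) != nil)                      // false: higher levels missing
}
```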
## Config file

The config file is a JSON document with four top-level sections.
### `main`
```json
"main": {
"addr": "0.0.0.0:8082",
"https-cert-file": "",
"https-key-file": "",
"jwt-public-key": "<base64-encoded Ed25519 public key>",
"user": "",
"group": "",
"backend-url": ""
}
```
- `addr`: Address and port to listen on (default: `0.0.0.0:8082`)
- `https-cert-file` / `https-key-file`: Paths to TLS certificate/key for HTTPS
- `jwt-public-key`: Base64-encoded Ed25519 public key for JWT authentication. If empty, no auth is required.
- `user` / `group`: Drop privileges to this user/group after startup
- `backend-url`: Optional URL of a cc-backend instance used as node provider
### `metrics`
Per-metric configuration. Each key is the metric name:
```json
"metrics": {
"cpu_load": { "frequency": 60, "aggregation": null },
"flops_any": { "frequency": 60, "aggregation": "sum" },
"cpu_user": { "frequency": 60, "aggregation": "avg" }
}
```
- `frequency`: Sampling interval in seconds
- `aggregation`: How to aggregate sub-level data: `"sum"`, `"avg"`, or `null` (no aggregation)
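The aggregation modes can be illustrated with a small Go sketch (a simplified
model, not the actual implementation): with `"sum"`, a node-level query over
per-cpu samples adds the sub-level values; with `"avg"` it averages them; with
`null` no higher-level value is derived.

```go
package main

import "fmt"

// aggregate derives a higher-level value from sub-level samples, modeling
// the "sum" / "avg" / null modes (simplified sketch, not the actual
// cc-metric-store implementation). The bool is false when no aggregate
// is defined for the given mode.
func aggregate(mode string, values []float64) (float64, bool) {
	if mode == "" || len(values) == 0 {
		return 0, false // null: no node-level value is derived
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	switch mode {
	case "sum":
		return sum, true
	case "avg":
		return sum / float64(len(values)), true
	}
	return 0, false
}

func main() {
	perCPU := []float64{1, 2, 3, 4} // e.g. per-cpu flops_any samples
	if v, ok := aggregate("sum", perCPU); ok {
		fmt.Println("sum:", v) // 10
	}
	if v, ok := aggregate("avg", perCPU); ok {
		fmt.Println("avg:", v) // 2.5
	}
}
```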
### `metric-store`
```json
"metric-store": {
"checkpoints": {
"file-format": "wal",
"directory": "./var/checkpoints"
},
"memory-cap": 100,
"retention-in-memory": "24h",
"num-workers": 0,
"cleanup": {
"mode": "archive",
"directory": "./var/archive"
},
"nats-subscriptions": [
{ "subscribe-to": "hpc-nats", "cluster-tag": "fritz" }
]
}
```
- `checkpoints.file-format`: Checkpoint format: `"json"` (human-readable) or `"wal"` (binary WAL, crash-safe). See [Checkpoint formats](#checkpoint-formats) below.
- `checkpoints.directory`: Root directory for checkpoint files (organized as `<dir>/<cluster>/<host>/`)
- `memory-cap`: Approximate memory cap in MB for metric buffers
- `retention-in-memory`: How long to keep data in memory (e.g. `"48h"`)
- `num-workers`: Number of parallel workers for checkpoint/archive I/O (0 = auto, capped at 10)
- `cleanup.mode`: What to do with data older than `retention-in-memory`: `"archive"` (write Parquet) or `"delete"`
- `cleanup.directory`: Root directory for Parquet archive files (required when `mode` is `"archive"`)
- `nats-subscriptions`: List of NATS subjects to subscribe to, with associated cluster tag
### Checkpoint formats
The `checkpoints.file-format` field controls how in-memory data is persisted to disk.
**`"json"`** — human-readable JSON snapshots written periodically. Each
snapshot is stored as `<dir>/<cluster>/<host>/<timestamp>.json` and contains the
full metric hierarchy. Easy to inspect and recover manually, but larger on disk
and slower to write.
**`"wal"`** — binary Write-Ahead Log format designed for crash safety. Two file
types are used per host:
- `current.wal` — append-only binary log. Every incoming data point is appended
immediately (magic `0xCC1DA7A1`, 4-byte CRC32 per record). Truncated trailing
records from unclean shutdowns are silently skipped on restart.
- `<timestamp>.bin` — binary snapshot written at each checkpoint interval
(magic `0xCC5B0001`). Contains the complete hierarchical metric state
column-by-column. Written atomically via a `.tmp` rename.
On startup the most recent `.bin` snapshot is loaded, then any remaining WAL
entries are replayed on top. The WAL is rotated (old file deleted, new one
started) after each successful snapshot.
The `"wal"` option is the default and will be the only supported option in the
future. The `"json"` checkpoint format is still provided to ease migration from
previous cc-metric-store versions.
### Parquet archive
When `cleanup.mode` is `"archive"`, data that ages out of the in-memory
retention window is written to [Apache Parquet](https://parquet.apache.org/)
files before being freed. Files are organized as:
```
<cleanup.directory>/
<cluster>/
<timestamp>.parquet
```
One Parquet file is produced per cluster per cleanup run, consolidating all
hosts. Rows use a long (tidy) schema:

| Column | Type | Description |
| ----------- | ------- | ----------------------------------------------------------------------- |
| `cluster` | string | Cluster name |
| `hostname` | string | Host name |
| `metric` | string | Metric name |
| `scope` | string | Hardware scope (`node`, `socket`, `core`, `hwthread`, `accelerator`, …) |
| `scope_id` | string | Numeric ID within the scope (e.g. `"0"`) |
| `timestamp` | int64 | Unix timestamp (seconds) |
| `frequency` | int64 | Sampling interval in seconds |
| `value` | float32 | Metric value |
Files are compressed with Zstandard and sorted by `(cluster, hostname, metric,
timestamp)` for efficient columnar reads. The `cpu` prefix in the tree is
treated as an alias for `hwthread` scope.
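The row layout and sort order can be sketched in Go (an illustrative struct
mirroring the table above, not the actual archive writer):

```go
package main

import (
	"fmt"
	"sort"
)

// row mirrors the tidy Parquet schema described above (illustrative only).
type row struct {
	Cluster, Hostname, Metric, Scope, ScopeID string
	Timestamp, Frequency                      int64
	Value                                     float32
}

// sortRows orders rows by (cluster, hostname, metric, timestamp),
// the file order used for efficient columnar reads.
func sortRows(rows []row) {
	sort.Slice(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		switch {
		case a.Cluster != b.Cluster:
			return a.Cluster < b.Cluster
		case a.Hostname != b.Hostname:
			return a.Hostname < b.Hostname
		case a.Metric != b.Metric:
			return a.Metric < b.Metric
		}
		return a.Timestamp < b.Timestamp
	})
}

func main() {
	rows := []row{
		{"cluster1", "host2", "cpu_load", "node", "0", 120, 60, 1.5},
		{"cluster1", "host1", "flops_any", "hwthread", "3", 60, 60, 42},
		{"cluster1", "host1", "cpu_load", "node", "0", 60, 60, 0.9},
	}
	sortRows(rows)
	for _, r := range rows {
		fmt.Println(r.Hostname, r.Metric, r.Timestamp)
	}
}
```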
### `nats`
```json
"nats": {
"address": "nats://0.0.0.0:4222",
"username": "root",
"password": "root"
}
```
NATS connection is optional. If not configured, only the HTTP write endpoint is available.
For more information see the ClusterCockpit documentation [website](https://clustercockpit.org/docs/reference/cc-metric-store/ccms-configuration/).
## Test the complete setup (excluding cc-backend itself)

There are two ways for sending data to the cc-metric-store, both of which are
supported by the
[cc-metric-collector](https://github.com/ClusterCockpit/cc-metric-collector).
This example uses NATS; the alternative is to use HTTP.
```sh
# Only needed once, downloads the docker image
# ...
```

The following JWT can be used for testing:
```sh
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

# If the collector and store and nats-server have been running for at least 60 seconds on the same host:
curl -H "Authorization: Bearer $JWT" \
  "http://localhost:8082/api/query/" \
  -d '{
    "cluster": "testcluster",
    "from": '"$(expr $(date +%s) - 60)"',
    "to": '"$(date +%s)"',
    "queries": [{ "metric": "cpu_load", "host": "'"$(hostname)"'" }]
  }'
```
For debugging, the debug endpoint dumps the current content to stdout:

```sh
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

# Dump everything
curl -H "Authorization: Bearer $JWT" "http://localhost:8082/api/debug/"

# Dump a specific selector (colon-separated path)
curl -H "Authorization: Bearer $JWT" "http://localhost:8082/api/debug/?selector=testcluster:host1"
```