diff --git a/Makefile b/Makefile
index 6e8c538..c351c7e 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 TARGET = ./cc-metric-store
 VAR = ./var/checkpoints/
-VERSION = 1.5.0
+VERSION = 1.5.3
 GIT_HASH := $(shell git rev-parse --short HEAD || echo 'development')
 CURRENT_TIME = $(shell date +"%Y-%m-%d:T%H:%M:%S")
 LD_FLAGS = '-s -X main.date=${CURRENT_TIME} -X main.version=${VERSION} -X main.commit=${GIT_HASH}'
diff --git a/ReleaseNotes.md b/ReleaseNotes.md
index acb2144..1a8822a 100644
--- a/ReleaseNotes.md
+++ b/ReleaseNotes.md
@@ -1,11 +1,49 @@
-# `cc-metric-store` version 1.5.0
+# `cc-metric-store` version 1.5.3
 
-This is a major release of `cc-metric-store`, the metric timeseries cache
+This is a bugfix release of `cc-metric-store`, the metric timeseries cache
 implementation of ClusterCockpit. Since the storage engine is now
 part of `cc-backend` we will follow the version number of `cc-backend`.
 For release specific notes visit the [ClusterCockpit Documentation](https://clusterockpit.org/docs/release/).
 
-## Breaking changes
+## Notable changes
+
+- **`-cleanup-checkpoints` CLI flag**: New flag triggers checkpoint cleanup
+  (delete or archive to Parquet) based on the configured retention and cleanup
+  settings, then exits. Useful for one-off maintenance without starting the full
+  server.
+- **GC initialised before checkpoint load**: `GOGC=15` is now set before
+  `metricstore.Init` so the garbage-collector baseline is established prior to
+  the largest allocation event (loading checkpoints from disk), reducing
+  unnecessary heap growth at startup.
+- **Dependency upgrades**: `cc-backend` updated from v1.5.0 to v1.5.3;
+  `cc-lib` updated from v2.8.0 to v2.11.0; `nats.go` bumped from v1.49.0 to
+  v1.50.0; `parquet-go` bumped from v0.28.0 to v0.29.0; various other module
+  upgrades.
+
+## Metricstore package fixes (cc-backend v1.5.0 → v1.5.3)
+
+The following fixes landed in the upstream `cc-backend/pkg/metricstore` package
+and are included via the dependency upgrade:
+
+- **WAL correctness**: Fixed WAL rotation being skipped for all nodes due to a
+  non-blocking send on a too-small channel; fixed unbound growth of WAL files
+  when a checkpointing error occurs; fixed bugs in the WAL journal pipeline.
+- **WAL throughput**: Sharded the WAL consumer for higher write throughput; added
+  buffered I/O to WAL writes.
+- **Checkpoint stability**: Paused WAL writes during binary checkpoint creation
+  to prevent message drops; restructured cleanup archiving to stay within the
+  32 k row limit of `parquet-go`.
+- **Memory**: Fixed a memory explosion caused by broken emergency-free and batch
+  aborts; reduced memory usage in the Parquet checkpoint archiver; fixed
+  preventing memory spikes in the Parquet writer during the move/archive policy.
+- **NATS**: Fixed blocking `ReceiveNats` call; fixed NATS contention under load.
+- **Shutdown**: Increased shutdown timeouts; added WAL flush interval tuning;
+  added shutdown timing logs.
+- **Observability**: Added verbose logs for `DataDoesNotAlign` errors; reduced
+  noise by demoting a missing-metric warning to debug level.
+- **Configuration**: Restored `checkpointInterval` as an optional config key.
+
+## Breaking changes (from v1.4.x)
 
 - The internal `memorystore`, `avro`, `resampler`, and `util` packages have
   been removed. The storage engine is now provided by the
@@ -14,22 +52,3 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
   only.
 - The configuration schema has changed. Refer to `configs/config.json` for the
   updated structure.
-
-## Notable changes
-
-- **Storage engine extracted to `cc-backend` library**: The entire in-memory
-  time-series storage engine was moved to `cc-backend/pkg/metricstore`. This
-  reduces duplication in the ClusterCockpit suite and enables shared maintenance
-  of the storage layer.
-- **HealthCheck API endpoint**: New `GET /api/healthcheck/` endpoint reports the
-  health status of cluster nodes.
-- **Dynamic memory management**: Memory limits can now be adjusted at runtime via
-  a callback from the `cc-backend` library.
-- **Configuration schema validation**: The config and metric config JSON schemas
-  have been updated and are now validated against the structs they describe.
-- **Startup refactored**: Application startup has been split into `cli.go` and
-  `server.go` for clearer separation of concerns.
-- **`go fix` applied**: Codebase updated to current Go idioms.
-- **Dependency upgrades**: `nats.go` bumped from 1.36.0 to 1.47.0;
-  `cc-lib` updated to v2.8.0; `cc-backend` updated to v1.5.0; various other
-  module upgrades.