From defa8fa994869592467fdf9e6e8b62b12220ea7c Mon Sep 17 00:00:00 2001 From: Jan Eitzinger Date: Mon, 23 Feb 2026 08:45:49 +0100 Subject: [PATCH] Update ReleaseNotes for v1.5.0 --- ReleaseNotes.md | 105 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 97 insertions(+), 8 deletions(-) diff --git a/ReleaseNotes.md b/ReleaseNotes.md index 7ea43620..1178ca8e 100644 --- a/ReleaseNotes.md +++ b/ReleaseNotes.md @@ -16,19 +16,25 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - **Removed `disable-archive` option**: This obsolete configuration option has been removed. - **Removed `clusters` config section**: The separate clusters configuration section has been removed. Cluster information is now derived from the job archive. -- **`apiAllowedIPs` is now optional**: If not specified, defaults to secure settings. +- **`apiAllowedIPs` is now optional**: If not specified, defaults to not + restricted. ### Architecture changes +- **Web framework replaced**: Migrated from `gorilla/mux` to `chi` as the HTTP + router. This should be transparent to users but affects how middleware and + routes are composed. A proper 404 handler is now in place. - **MetricStore moved**: The `metricstore` package has been moved from `internal/` to `pkg/` as it is now part of the public API. - **MySQL/MariaDB support removed**: Only SQLite is now supported as the database backend. - **Archive to Cleanup renaming**: Archive-related functions have been refactored and renamed to "Cleanup" for clarity. +- **`minRunningFor` filter removed**: This undocumented filter has been removed + from the API and frontend. ### Dependency changes -- **cc-lib v2**: Switched to cc-lib version 2 with updated APIs +- **cc-lib v2.5.1**: Switched to cc-lib version 2 with updated APIs (currently at v2.5.1) - **cclib NATS client**: Now using the cclib NATS client implementation - Removed obsolete `util.Float` usage from cclib @@ -51,13 +57,30 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - **Node state tracking**: New node table in database with timestamp tracking - **Node state filtering**: Filter jobs by node state in systems view -- **Node metrics improvements**: Better handling of node-level metrics and data - **Node list enhancements**: Improved paging, filtering, and continuous scroll support +- **Nodestate retention and archiving**: Node state data is now subject to configurable + retention policies and can be archived to Parquet format for long-term storage +- **Faulty node metric tracking**: Faulty node state metric lists are persisted to the database + +### Health Monitoring + +- **Health status dashboard**: New dedicated "Health" tab in the status details view + showing per-node metric health across the cluster +- **CCMS health check**: Support for querying health status of external + cc-metric-store (CCMS) instances via the API +- **GraphQL health endpoints**: New GraphQL queries and resolvers for health data +- **Cluster/subcluster filter**: Filter health status view by cluster or subcluster + +### Log Viewer + +- **Web-based log viewer**: New log viewer page in the admin interface for inspecting + backend log output directly from the browser without shell access +- **Accessible from header**: Quick access link from the navigation header ### MetricStore Improvements - **Memory tracking worker**: New worker for CCMS memory usage tracking -- **Dynamic retention**: Support for cluster/subcluster-specific retention times +- **Dynamic retention**: Support for job specific dynamic retention times - **Improved compression**: Transparent compression for job archive imports - **Parallel processing**: Parallelized Iter function in all archive backends @@ -65,15 +88,32 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - **Job tagger option**: Enable automatic job tagging via configuration flag - **Application detection**: Automatic detection of applications (MATLAB, GROMACS, etc.) -- **Job classifaction**: Automatic detection of pathological jobs +- **Job classification**: Automatic detection of pathological jobs - **omitTagged flag**: Option to exclude tagged jobs from retention/cleanup operations +- **Admin UI trigger**: Taggers can be run on-demand from the admin web interface + without restarting the backend ### Archive Backends +- **Parquet archive format**: New Parquet file format for job archiving, providing + columnar storage with efficient compression for analytical workloads - **S3 backend**: Full support for S3-compatible object storage - **SQLite backend**: Full support for SQLite backend using blobs - **Performance improvements**: Fixed performance bugs in archive backends - **Better error handling**: Improved error messages and fallback handling +- **Zstd compression**: Parquet writers use zstd compression for better + compression ratios compared to the previous snappy default +- **Optimized sort order**: Job and nodestate Parquet files are sorted by + cluster, subcluster, and start time for efficient range queries + +### Unified Archive Retention and Format Conversion + +- **Uniform retention policy**: Job archive retention now supports both JSON and + Parquet as target formats under a single, consistent policy configuration +- **Archive manager tool**: The `tools/archive-manager` utility now supports + format conversion between JSON and Parquet job archives +- **Parquet reader**: Full Parquet archive reader implementation for reading back + archived job data ## New features and improvements @@ -85,6 +125,14 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - **Filter presets**: Move list filter preset to URL for easy sharing - **Job comparison**: Improved job comparison views and plots - **Subcluster reactivity**: Job list now reacts to subcluster filter changes +- **Short jobs quick selection**: New "Short jobs" quick-filter button in job lists + replaces the removed undocumented `minRunningFor` filter +- **Row plot cursor sync**: Cursor position is now synchronized across all metric + plots in a job list row for easier cross-metric comparison +- **Disabled metrics handling**: Improved handling and display of disabled metrics + across job view, node view, and list rows +- **"Not configured" info cards**: Informational cards shown when optional features + are not yet configured - **Frontend dependencies**: Bumped frontend dependencies to latest versions - **Svelte 5 compatibility**: Fixed Svelte state warnings and compatibility issues @@ -95,6 +143,15 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - **Graceful shutdown**: Fixed shutdown timeout bugs and hanging issues - **Configuration defaults**: Sensible defaults for most configuration options - **Documentation**: Extensive documentation improvements across packages +- **Server flag in systemd unit**: Example systemd unit now includes the `-server` flag + +### Security + +- **LDAP security hardening**: Improved input validation, connection handling, and + error reporting in the LDAP authenticator +- **OIDC security hardening**: Stricter token validation and improved error handling + in the OIDC authenticator +- **Auth schema extensions**: Additional schema fields for improved auth configuration ### API improvements @@ -102,6 +159,8 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - **Job exclusivity filter**: New filter for exclusive vs. shared jobs - **Improved error messages**: Better error messages and documentation in REST API - **GraphQL enhancements**: Improved GraphQL queries and resolvers +- **Stop job lookup order**: Reversed lookup order in stop job requests for + more reliable job matching (cluster+jobId first, then jobId alone) ### Performance @@ -109,13 +168,17 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - **Job cache**: Introduced caching table for faster job inserts - **Parallel imports**: Archive imports now run in parallel where possible - **External tool integration**: Optimized use of external tools (fd) for better performance +- **Node repository queries**: Reviewed and optimized node repository SQL queries +- **Buffer pool**: Resized and pooled internal buffers for better memory reuse ### Developer experience - **AI agent guidelines**: Added documentation for AI coding agents (AGENTS.md, CLAUDE.md) - **Example API payloads**: Added example JSON API payloads for testing -- **Unit tests**: Added more unit tests for NATS API and other components -- **Test improvements**: Better test coverage and test data +- **Unit tests**: Added more unit tests for NATS API, node repository, and other components +- **Test improvements**: Better test coverage; test DB is now copied before unit tests + to avoid state pollution between test runs +- **Parquet writer tests**: Comprehensive tests for Parquet archive writing and conversion ## Bug fixes @@ -132,6 +195,16 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - Fixed polar plot data query decoupling - Fixed missing resolution parameter handling - Fixed node table initialization fallback +- Fixed reactivity key placement in nodeList +- Fixed nodeList resolver data handling and increased nodestate filter cutoff +- Fixed job always being transferred to main job table before archiving +- Fixed AppTagger error handling and logging +- Fixed log endpoint formatting and correctness +- Fixed automatic refresh in metric status tab +- Fixed NULL value handling in `health_state` and `health_metrics` columns +- Fixed bugs related to `job_cache` IDs being used in the main job table +- Fixed SyncJobs bug causing start job hooks to be called with wrong (cache) IDs +- Fixed 404 handler route for sub-routers ## Configuration changes @@ -167,6 +240,20 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus "interval": "48h", "directory": "./var/archive" } + }, + "archive": { + "retention": { + "policy": "delete", + "age": "6months", + "target-format": "parquet" + } + }, + "nodestate": { + "retention": { + "policy": "archive", + "age": "30d", + "archive-path": "./var/nodestate-archive" + } } } ``` @@ -178,11 +265,13 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus - If using S3 archive backend, configure the new `archive` section options - Test the new public dashboard at `/public` route - Review cron worker configuration if you need different frequencies +- If using the archive retention feature, configure the `target-format` option + to choose between `json` (default) and `parquet` output formats +- Consider enabling nodestate retention if you track node states over time ## Known issues - Currently energy footprint metrics of type energy are ignored for calculating total energy. -- Resampling for running jobs only works with cc-metric-store - With energy footprint metrics of type power the unit is ignored and it is assumed the metric has the unit Watt.