mirror of
https://github.com/ClusterCockpit/cc-backend
synced 2026-02-24 11:27:30 +01:00
Update ReleaseNotes for v1.5.0
This commit is contained in:
105
ReleaseNotes.md
105
ReleaseNotes.md
@@ -16,19 +16,25 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
- **Removed `disable-archive` option**: This obsolete configuration option has been removed.
|
- **Removed `disable-archive` option**: This obsolete configuration option has been removed.
|
||||||
- **Removed `clusters` config section**: The separate clusters configuration section
|
- **Removed `clusters` config section**: The separate clusters configuration section
|
||||||
has been removed. Cluster information is now derived from the job archive.
|
has been removed. Cluster information is now derived from the job archive.
|
||||||
- **`apiAllowedIPs` is now optional**: If not specified, defaults to secure settings.
|
- **`apiAllowedIPs` is now optional**: If not specified, defaults to not
|
||||||
|
restricted.
|
||||||
|
|
||||||
### Architecture changes
|
### Architecture changes
|
||||||
|
|
||||||
|
- **Web framework replaced**: Migrated from `gorilla/mux` to `chi` as the HTTP
|
||||||
|
router. This should be transparent to users but affects how middleware and
|
||||||
|
routes are composed. A proper 404 handler is now in place.
|
||||||
- **MetricStore moved**: The `metricstore` package has been moved from `internal/`
|
- **MetricStore moved**: The `metricstore` package has been moved from `internal/`
|
||||||
to `pkg/` as it is now part of the public API.
|
to `pkg/` as it is now part of the public API.
|
||||||
- **MySQL/MariaDB support removed**: Only SQLite is now supported as the database backend.
|
- **MySQL/MariaDB support removed**: Only SQLite is now supported as the database backend.
|
||||||
- **Archive to Cleanup renaming**: Archive-related functions have been refactored
|
- **Archive to Cleanup renaming**: Archive-related functions have been refactored
|
||||||
and renamed to "Cleanup" for clarity.
|
and renamed to "Cleanup" for clarity.
|
||||||
|
- **`minRunningFor` filter removed**: This undocumented filter has been removed
|
||||||
|
from the API and frontend.
|
||||||
|
|
||||||
### Dependency changes
|
### Dependency changes
|
||||||
|
|
||||||
- **cc-lib v2**: Switched to cc-lib version 2 with updated APIs
|
- **cc-lib v2.5.1**: Switched to cc-lib version 2 with updated APIs (currently at v2.5.1)
|
||||||
- **cclib NATS client**: Now using the cclib NATS client implementation
|
- **cclib NATS client**: Now using the cclib NATS client implementation
|
||||||
- Removed obsolete `util.Float` usage from cclib
|
- Removed obsolete `util.Float` usage from cclib
|
||||||
|
|
||||||
@@ -51,13 +57,30 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
|
|
||||||
- **Node state tracking**: New node table in database with timestamp tracking
|
- **Node state tracking**: New node table in database with timestamp tracking
|
||||||
- **Node state filtering**: Filter jobs by node state in systems view
|
- **Node state filtering**: Filter jobs by node state in systems view
|
||||||
- **Node metrics improvements**: Better handling of node-level metrics and data
|
|
||||||
- **Node list enhancements**: Improved paging, filtering, and continuous scroll support
|
- **Node list enhancements**: Improved paging, filtering, and continuous scroll support
|
||||||
|
- **Nodestate retention and archiving**: Node state data is now subject to configurable
|
||||||
|
retention policies and can be archived to Parquet format for long-term storage
|
||||||
|
- **Faulty node metric tracking**: Faulty node state metric lists are persisted to the database
|
||||||
|
|
||||||
|
### Health Monitoring
|
||||||
|
|
||||||
|
- **Health status dashboard**: New dedicated "Health" tab in the status details view
|
||||||
|
showing per-node metric health across the cluster
|
||||||
|
- **CCMS health check**: Support for querying health status of external
|
||||||
|
cc-metric-store (CCMS) instances via the API
|
||||||
|
- **GraphQL health endpoints**: New GraphQL queries and resolvers for health data
|
||||||
|
- **Cluster/subcluster filter**: Filter health status view by cluster or subcluster
|
||||||
|
|
||||||
|
### Log Viewer
|
||||||
|
|
||||||
|
- **Web-based log viewer**: New log viewer page in the admin interface for inspecting
|
||||||
|
backend log output directly from the browser without shell access
|
||||||
|
- **Accessible from header**: Quick access link from the navigation header
|
||||||
|
|
||||||
### MetricStore Improvements
|
### MetricStore Improvements
|
||||||
|
|
||||||
- **Memory tracking worker**: New worker for CCMS memory usage tracking
|
- **Memory tracking worker**: New worker for CCMS memory usage tracking
|
||||||
- **Dynamic retention**: Support for cluster/subcluster-specific retention times
|
- **Dynamic retention**: Support for job specific dynamic retention times
|
||||||
- **Improved compression**: Transparent compression for job archive imports
|
- **Improved compression**: Transparent compression for job archive imports
|
||||||
- **Parallel processing**: Parallelized Iter function in all archive backends
|
- **Parallel processing**: Parallelized Iter function in all archive backends
|
||||||
|
|
||||||
@@ -65,15 +88,32 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
|
|
||||||
- **Job tagger option**: Enable automatic job tagging via configuration flag
|
- **Job tagger option**: Enable automatic job tagging via configuration flag
|
||||||
- **Application detection**: Automatic detection of applications (MATLAB, GROMACS, etc.)
|
- **Application detection**: Automatic detection of applications (MATLAB, GROMACS, etc.)
|
||||||
- **Job classifaction**: Automatic detection of pathological jobs
|
- **Job classification**: Automatic detection of pathological jobs
|
||||||
- **omitTagged flag**: Option to exclude tagged jobs from retention/cleanup operations
|
- **omitTagged flag**: Option to exclude tagged jobs from retention/cleanup operations
|
||||||
|
- **Admin UI trigger**: Taggers can be run on-demand from the admin web interface
|
||||||
|
without restarting the backend
|
||||||
|
|
||||||
### Archive Backends
|
### Archive Backends
|
||||||
|
|
||||||
|
- **Parquet archive format**: New Parquet file format for job archiving, providing
|
||||||
|
columnar storage with efficient compression for analytical workloads
|
||||||
- **S3 backend**: Full support for S3-compatible object storage
|
- **S3 backend**: Full support for S3-compatible object storage
|
||||||
- **SQLite backend**: Full support for SQLite backend using blobs
|
- **SQLite backend**: Full support for SQLite backend using blobs
|
||||||
- **Performance improvements**: Fixed performance bugs in archive backends
|
- **Performance improvements**: Fixed performance bugs in archive backends
|
||||||
- **Better error handling**: Improved error messages and fallback handling
|
- **Better error handling**: Improved error messages and fallback handling
|
||||||
|
- **Zstd compression**: Parquet writers use zstd compression for better
|
||||||
|
compression ratios compared to the previous snappy default
|
||||||
|
- **Optimized sort order**: Job and nodestate Parquet files are sorted by
|
||||||
|
cluster, subcluster, and start time for efficient range queries
|
||||||
|
|
||||||
|
### Unified Archive Retention and Format Conversion
|
||||||
|
|
||||||
|
- **Uniform retention policy**: Job archive retention now supports both JSON and
|
||||||
|
Parquet as target formats under a single, consistent policy configuration
|
||||||
|
- **Archive manager tool**: The `tools/archive-manager` utility now supports
|
||||||
|
format conversion between JSON and Parquet job archives
|
||||||
|
- **Parquet reader**: Full Parquet archive reader implementation for reading back
|
||||||
|
archived job data
|
||||||
|
|
||||||
## New features and improvements
|
## New features and improvements
|
||||||
|
|
||||||
@@ -85,6 +125,14 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
- **Filter presets**: Move list filter preset to URL for easy sharing
|
- **Filter presets**: Move list filter preset to URL for easy sharing
|
||||||
- **Job comparison**: Improved job comparison views and plots
|
- **Job comparison**: Improved job comparison views and plots
|
||||||
- **Subcluster reactivity**: Job list now reacts to subcluster filter changes
|
- **Subcluster reactivity**: Job list now reacts to subcluster filter changes
|
||||||
|
- **Short jobs quick selection**: New "Short jobs" quick-filter button in job lists
|
||||||
|
replaces the removed undocumented `minRunningFor` filter
|
||||||
|
- **Row plot cursor sync**: Cursor position is now synchronized across all metric
|
||||||
|
plots in a job list row for easier cross-metric comparison
|
||||||
|
- **Disabled metrics handling**: Improved handling and display of disabled metrics
|
||||||
|
across job view, node view, and list rows
|
||||||
|
- **"Not configured" info cards**: Informational cards shown when optional features
|
||||||
|
are not yet configured
|
||||||
- **Frontend dependencies**: Bumped frontend dependencies to latest versions
|
- **Frontend dependencies**: Bumped frontend dependencies to latest versions
|
||||||
- **Svelte 5 compatibility**: Fixed Svelte state warnings and compatibility issues
|
- **Svelte 5 compatibility**: Fixed Svelte state warnings and compatibility issues
|
||||||
|
|
||||||
@@ -95,6 +143,15 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
- **Graceful shutdown**: Fixed shutdown timeout bugs and hanging issues
|
- **Graceful shutdown**: Fixed shutdown timeout bugs and hanging issues
|
||||||
- **Configuration defaults**: Sensible defaults for most configuration options
|
- **Configuration defaults**: Sensible defaults for most configuration options
|
||||||
- **Documentation**: Extensive documentation improvements across packages
|
- **Documentation**: Extensive documentation improvements across packages
|
||||||
|
- **Server flag in systemd unit**: Example systemd unit now includes the `-server` flag
|
||||||
|
|
||||||
|
### Security
|
||||||
|
|
||||||
|
- **LDAP security hardening**: Improved input validation, connection handling, and
|
||||||
|
error reporting in the LDAP authenticator
|
||||||
|
- **OIDC security hardening**: Stricter token validation and improved error handling
|
||||||
|
in the OIDC authenticator
|
||||||
|
- **Auth schema extensions**: Additional schema fields for improved auth configuration
|
||||||
|
|
||||||
### API improvements
|
### API improvements
|
||||||
|
|
||||||
@@ -102,6 +159,8 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
- **Job exclusivity filter**: New filter for exclusive vs. shared jobs
|
- **Job exclusivity filter**: New filter for exclusive vs. shared jobs
|
||||||
- **Improved error messages**: Better error messages and documentation in REST API
|
- **Improved error messages**: Better error messages and documentation in REST API
|
||||||
- **GraphQL enhancements**: Improved GraphQL queries and resolvers
|
- **GraphQL enhancements**: Improved GraphQL queries and resolvers
|
||||||
|
- **Stop job lookup order**: Reversed lookup order in stop job requests for
|
||||||
|
more reliable job matching (cluster+jobId first, then jobId alone)
|
||||||
|
|
||||||
### Performance
|
### Performance
|
||||||
|
|
||||||
@@ -109,13 +168,17 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
- **Job cache**: Introduced caching table for faster job inserts
|
- **Job cache**: Introduced caching table for faster job inserts
|
||||||
- **Parallel imports**: Archive imports now run in parallel where possible
|
- **Parallel imports**: Archive imports now run in parallel where possible
|
||||||
- **External tool integration**: Optimized use of external tools (fd) for better performance
|
- **External tool integration**: Optimized use of external tools (fd) for better performance
|
||||||
|
- **Node repository queries**: Reviewed and optimized node repository SQL queries
|
||||||
|
- **Buffer pool**: Resized and pooled internal buffers for better memory reuse
|
||||||
|
|
||||||
### Developer experience
|
### Developer experience
|
||||||
|
|
||||||
- **AI agent guidelines**: Added documentation for AI coding agents (AGENTS.md, CLAUDE.md)
|
- **AI agent guidelines**: Added documentation for AI coding agents (AGENTS.md, CLAUDE.md)
|
||||||
- **Example API payloads**: Added example JSON API payloads for testing
|
- **Example API payloads**: Added example JSON API payloads for testing
|
||||||
- **Unit tests**: Added more unit tests for NATS API and other components
|
- **Unit tests**: Added more unit tests for NATS API, node repository, and other components
|
||||||
- **Test improvements**: Better test coverage and test data
|
- **Test improvements**: Better test coverage; test DB is now copied before unit tests
|
||||||
|
to avoid state pollution between test runs
|
||||||
|
- **Parquet writer tests**: Comprehensive tests for Parquet archive writing and conversion
|
||||||
|
|
||||||
## Bug fixes
|
## Bug fixes
|
||||||
|
|
||||||
@@ -132,6 +195,16 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
- Fixed polar plot data query decoupling
|
- Fixed polar plot data query decoupling
|
||||||
- Fixed missing resolution parameter handling
|
- Fixed missing resolution parameter handling
|
||||||
- Fixed node table initialization fallback
|
- Fixed node table initialization fallback
|
||||||
|
- Fixed reactivity key placement in nodeList
|
||||||
|
- Fixed nodeList resolver data handling and increased nodestate filter cutoff
|
||||||
|
- Fixed job always being transferred to main job table before archiving
|
||||||
|
- Fixed AppTagger error handling and logging
|
||||||
|
- Fixed log endpoint formatting and correctness
|
||||||
|
- Fixed automatic refresh in metric status tab
|
||||||
|
- Fixed NULL value handling in `health_state` and `health_metrics` columns
|
||||||
|
- Fixed bugs related to `job_cache` IDs being used in the main job table
|
||||||
|
- Fixed SyncJobs bug causing start job hooks to be called with wrong (cache) IDs
|
||||||
|
- Fixed 404 handler route for sub-routers
|
||||||
|
|
||||||
## Configuration changes
|
## Configuration changes
|
||||||
|
|
||||||
@@ -167,6 +240,20 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
"interval": "48h",
|
"interval": "48h",
|
||||||
"directory": "./var/archive"
|
"directory": "./var/archive"
|
||||||
}
|
}
|
||||||
|
},
|
||||||
|
"archive": {
|
||||||
|
"retention": {
|
||||||
|
"policy": "delete",
|
||||||
|
"age": "6months",
|
||||||
|
"target-format": "parquet"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nodestate": {
|
||||||
|
"retention": {
|
||||||
|
"policy": "archive",
|
||||||
|
"age": "30d",
|
||||||
|
"archive-path": "./var/nodestate-archive"
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
@@ -178,11 +265,13 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
|
|||||||
- If using S3 archive backend, configure the new `archive` section options
|
- If using S3 archive backend, configure the new `archive` section options
|
||||||
- Test the new public dashboard at `/public` route
|
- Test the new public dashboard at `/public` route
|
||||||
- Review cron worker configuration if you need different frequencies
|
- Review cron worker configuration if you need different frequencies
|
||||||
|
- If using the archive retention feature, configure the `target-format` option
|
||||||
|
to choose between `json` (default) and `parquet` output formats
|
||||||
|
- Consider enabling nodestate retention if you track node states over time
|
||||||
|
|
||||||
## Known issues
|
## Known issues
|
||||||
|
|
||||||
- Currently energy footprint metrics of type energy are ignored for calculating
|
- Currently energy footprint metrics of type energy are ignored for calculating
|
||||||
total energy.
|
total energy.
|
||||||
- Resampling for running jobs only works with cc-metric-store
|
|
||||||
- With energy footprint metrics of type power the unit is ignored and it is
|
- With energy footprint metrics of type power the unit is ignored and it is
|
||||||
assumed the metric has the unit Watt.
|
assumed the metric has the unit Watt.
|
||||||
|
|||||||
Reference in New Issue
Block a user