Merge pull request #471 from ClusterCockpit/dev

Dev
Jan Eitzinger
2026-01-15 16:00:30 +01:00
committed by GitHub
195 changed files with 10417 additions and 5385 deletions

CLAUDE.md Normal file

@@ -0,0 +1,305 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with
code in this repository.
## Project Overview
ClusterCockpit is a job-specific performance monitoring framework for HPC
clusters. This is a Golang backend that provides REST and GraphQL APIs, serves a
Svelte-based frontend, and manages job archives and metric data from various
time-series databases.
## Build and Development Commands
### Building
```bash
# Build everything (frontend + backend)
make
# Build only the frontend
make frontend
# Build only the backend (requires frontend to be built first)
go build -ldflags="-s -X main.date=$(date +'%Y-%m-%d:T%H:%M:%S') -X main.version=1.4.4 -X main.commit=$(git rev-parse --short HEAD)" ./cmd/cc-backend
```
### Testing
```bash
# Run all tests
make test
# Run tests with verbose output
go test -v ./...
# Run tests for a specific package
go test ./internal/repository
```
### Code Generation
```bash
# Regenerate GraphQL schema and resolvers (after modifying api/*.graphqls)
make graphql
# Regenerate Swagger/OpenAPI docs (after modifying API comments)
make swagger
```
### Frontend Development
```bash
cd web/frontend
# Install dependencies
npm install
# Build for production
npm run build
# Development mode with watch
npm run dev
```
### Running
```bash
# Initialize database and create admin user
./cc-backend -init-db -add-user demo:admin:demo
# Start server in development mode (enables GraphQL Playground and Swagger UI)
./cc-backend -server -dev -loglevel info
# Start demo with sample data
./startDemo.sh
```
## Architecture
### Backend Structure
The backend follows a layered architecture with clear separation of concerns:
- **cmd/cc-backend**: Entry point, orchestrates initialization of all subsystems
- **internal/repository**: Data access layer using repository pattern
  - Abstracts database operations (SQLite3 only)
  - Implements LRU caching for performance
  - Provides repositories for Job, User, Node, and Tag entities
  - Transaction support for batch operations
- **internal/api**: REST API endpoints (Swagger/OpenAPI documented)
- **internal/graph**: GraphQL API (uses gqlgen)
  - Schema in `api/*.graphqls`
  - Generated code in `internal/graph/generated/`
  - Resolvers in `internal/graph/schema.resolvers.go`
- **internal/auth**: Authentication layer
  - Supports local accounts, LDAP, OIDC, and JWT tokens
  - Implements rate limiting for login attempts
- **internal/metricstore**: Metric store with data loading API
  - In-memory metric storage with checkpointing
  - Query API for loading job metric data
- **internal/archiver**: Job archiving to file-based archive
- **internal/api/nats.go**: NATS-based API for job and node operations
  - Subscribes to NATS subjects for job events (start/stop)
  - Handles node state updates via NATS
  - Uses InfluxDB line protocol message format
- **pkg/archive**: Job archive backend implementations
  - File system backend (default)
  - S3 backend
  - SQLite backend (experimental)
- **pkg/nats**: NATS client and message decoding utilities
### Frontend Structure
- **web/frontend**: Svelte 5 application
  - Uses Rollup for building
  - Components organized by feature (analysis, job, user, etc.)
  - GraphQL client using @urql/svelte
  - Bootstrap 5 + SvelteStrap for UI
  - uPlot for time-series visualization
- **web/templates**: Server-side Go templates
### Key Concepts
**Job Archive**: Completed jobs are stored in a file-based archive following the
[ClusterCockpit job-archive
specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive).
Each job has a `meta.json` file with metadata and metric data files.
**Metric Data Repositories**: Time-series metric data is stored separately from
job metadata. The system supports multiple backends (cc-metric-store is
recommended). Configuration is per-cluster in `config.json`.
**Authentication Flow**:
1. Multiple authenticators can be configured (local, LDAP, OIDC, JWT)
2. Each authenticator's `CanLogin` method is called to determine if it should handle the request
3. The first authenticator that returns true performs the actual `Login`
4. JWT tokens are used for API authentication
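The selection logic in steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `internal/auth`: the `Authenticator` interface here is simplified, and `staticAuth` and `login` are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Authenticator is a simplified, hypothetical interface for this sketch.
// The real interface in internal/auth may use different signatures.
type Authenticator interface {
	CanLogin(username string) bool
	Login(username, password string) (string, error)
}

// staticAuth is a toy authenticator backed by an in-memory user table.
type staticAuth struct {
	name  string
	users map[string]string // username -> password
}

func (a staticAuth) CanLogin(username string) bool {
	_, ok := a.users[username]
	return ok
}

func (a staticAuth) Login(username, password string) (string, error) {
	if a.users[username] != password {
		return "", errors.New("invalid credentials")
	}
	return a.name, nil
}

// login asks each authenticator in order whether it can handle the user.
// The first one that answers true performs the login; later ones are not tried.
func login(auths []Authenticator, username, password string) (string, error) {
	for _, a := range auths {
		if a.CanLogin(username) {
			return a.Login(username, password)
		}
	}
	return "", errors.New("no authenticator accepted the login request")
}

func main() {
	auths := []Authenticator{
		staticAuth{name: "ldap", users: map[string]string{"alice": "secret"}},
		staticAuth{name: "local", users: map[string]string{"demo": "demo"}},
	}
	who, err := login(auths, "demo", "demo")
	fmt.Println(who, err)
}
```

Because the first match wins, the order in which authenticators are configured matters in this sketch: a user known to two authenticators is only ever checked against the first.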
**Database Migrations**: SQL migrations in `internal/repository/migrations/` are
applied automatically on startup. Version tracking in `version` table.
**Scopes**: Metrics can be collected at different scopes:
- Node scope (always available)
- Core scope (for jobs with ≤8 nodes)
- Accelerator scope (for GPU/accelerator metrics)
## Configuration
- **config.json**: Main configuration (clusters, metric repositories, archive settings)
  - `main.apiSubjects`: NATS subject configuration (optional)
    - `subjectJobEvent`: Subject for job start/stop events (e.g., "cc.job.event")
    - `subjectNodeState`: Subject for node state updates (e.g., "cc.node.state")
  - `nats`: NATS client connection configuration (optional)
    - `address`: NATS server address (e.g., "nats://localhost:4222")
    - `username`: Authentication username (optional)
    - `password`: Authentication password (optional)
    - `creds-file-path`: Path to NATS credentials file (optional)
- **.env**: Environment variables (secrets like JWT keys)
  - Copy from `configs/env-template.txt`
  - NEVER commit this file
- **cluster.json**: Cluster topology and metric definitions (loaded from archive or config)
## Database
- Default: SQLite 3 (`./var/job.db`)
- Connection managed by `internal/repository`
- Schema version in `internal/repository/migration.go`
## Code Generation
**GraphQL** (gqlgen):
- Schema: `api/*.graphqls`
- Config: `gqlgen.yml`
- Generated code: `internal/graph/generated/`
- Custom resolvers: `internal/graph/schema.resolvers.go`
- Run `make graphql` after schema changes
**Swagger/OpenAPI**:
- Annotations in `internal/api/*.go`
- Generated docs: `api/docs.go`, `api/swagger.yaml`
- Run `make swagger` after API changes
## Testing Conventions
- Test files use `_test.go` suffix
- Test data in `testdata/` subdirectories
- Repository tests use in-memory SQLite
- API tests use httptest
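As a rough illustration of the `httptest` convention, the sketch below drives a handler with an in-memory request and recorder. `pingHandler` and `callHandler` are hypothetical names made up for this example; in a real `_test.go` file the same calls would sit inside a `func TestXxx(t *testing.T)`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// pingHandler is a hypothetical handler used only for this sketch.
func pingHandler(rw http.ResponseWriter, r *http.Request) {
	rw.Header().Set("Content-Type", "application/json")
	rw.WriteHeader(http.StatusOK)
	fmt.Fprint(rw, `{"status":"ok"}`)
}

// callHandler runs a handler against an in-memory request, without opening
// a network socket, and returns the recorded status code and body.
func callHandler(h http.HandlerFunc, method, target string) (int, string) {
	req := httptest.NewRequest(method, target, nil)
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := callHandler(pingHandler, http.MethodGet, "/api/ping")
	fmt.Println(code, body)
}
```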
## Common Workflows
### Adding a new GraphQL field
1. Edit schema in `api/*.graphqls`
2. Run `make graphql`
3. Implement resolver in `internal/graph/schema.resolvers.go`
### Adding a new REST endpoint
1. Add handler in `internal/api/*.go`
2. Add route in `internal/api/rest.go`
3. Add Swagger annotations
4. Run `make swagger`
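A handler with Swagger annotations might look roughly like the sketch below. It uses only the standard library so that it runs on its own; the repository itself routes with `gorilla/mux`, and the endpoint path, `versionResponse`, `getVersion`, `newRouter`, and `serve` are all hypothetical names chosen for illustration. The version string is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// versionResponse is a hypothetical response type for this sketch.
type versionResponse struct {
	Version string `json:"version"`
}

// getVersion godoc
// @Summary     Report the backend version
// @Tags        Info
// @Produce     json
// @Success     200 {object} versionResponse
// @Failure     405 {string} string "Method not allowed"
// @Router      /api/version/ [get]
func getVersion(rw http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(rw, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	rw.Header().Set("Content-Type", "application/json")
	json.NewEncoder(rw).Encode(versionResponse{Version: "1.4.4"}) // placeholder value
}

// newRouter registers the handler; this corresponds to the "add route" step.
func newRouter() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/version/", getVersion)
	return mux
}

// serve sends one in-memory request through the router.
func serve(method, path string) (int, string) {
	rec := httptest.NewRecorder()
	newRouter().ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := serve(http.MethodGet, "/api/version/")
	fmt.Println(code, body)
}
```

The comment block above the handler is what the Swagger generator reads, so it has to be updated together with the handler before running `make swagger`.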
### Adding a new metric data backend
1. Implement metric loading functions in `internal/metricstore/query.go`
2. Add cluster configuration to metric store initialization
3. Update config.json schema documentation
### Modifying database schema
1. Create new migration in `internal/repository/migrations/`
2. Increment `repository.Version`
3. Test with fresh database and existing database
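Step 3 matters because a fresh database and an existing one start from different versions. The sketch below only illustrates that idea; `pendingMigrations` is a hypothetical helper, the version numbers are made up, and the actual migration logic in `internal/repository` may work differently.

```go
package main

import "fmt"

// pendingMigrations returns the migration versions that still have to be
// applied to bring a database from version current up to version target.
// Hypothetical helper: it assumes versions are consecutive integers.
func pendingMigrations(current, target uint) []uint {
	var pending []uint
	for v := current + 1; v <= target; v++ {
		pending = append(pending, v)
	}
	return pending
}

func main() {
	const target = 3 // stands in for repository.Version; the value is made up

	fmt.Println("fresh database:   ", pendingMigrations(0, target))
	fmt.Println("existing database:", pendingMigrations(2, target))
	fmt.Println("up to date:       ", pendingMigrations(3, target))
}
```

A fresh database runs every migration in sequence, while an existing one runs only the newest, so a bug in the new migration can stay hidden if only one of the two cases is tested.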
## NATS API
The backend supports a NATS-based API as an alternative to the REST API for job and node operations.
### Setup
1. Configure NATS client connection in `config.json`:
```json
{
  "nats": {
    "address": "nats://localhost:4222",
    "username": "user",
    "password": "pass"
  }
}
```
2. Configure API subjects in `config.json` under `main`:
```json
{
  "main": {
    "apiSubjects": {
      "subjectJobEvent": "cc.job.event",
      "subjectNodeState": "cc.node.state"
    }
  }
}
```
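The two configuration fragments above could be decoded along these lines. The struct and function names are hypothetical and the real configuration types in the backend may be organized differently; only the JSON keys are taken from this document. The check mirrors the idea that subscriptions need both a client connection and configured subjects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// natsConfig and apiSubjects mirror the JSON keys documented above.
// The Go type names are assumptions made for this sketch.
type natsConfig struct {
	Address       string `json:"address"`
	Username      string `json:"username"`
	Password      string `json:"password"`
	CredsFilePath string `json:"creds-file-path"`
}

type apiSubjects struct {
	SubjectJobEvent  string `json:"subjectJobEvent"`
	SubjectNodeState string `json:"subjectNodeState"`
}

type fileConfig struct {
	Main struct {
		APISubjects *apiSubjects `json:"apiSubjects"`
	} `json:"main"`
	Nats *natsConfig `json:"nats"`
}

// natsAPIEnabled reports whether both optional sections are present.
// Pointers are used so that a missing section decodes to nil.
func natsAPIEnabled(raw []byte) (bool, error) {
	var cfg fileConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	return cfg.Nats != nil && cfg.Main.APISubjects != nil, nil
}

func main() {
	full := []byte(`{
	  "main": {"apiSubjects": {"subjectJobEvent": "cc.job.event", "subjectNodeState": "cc.node.state"}},
	  "nats": {"address": "nats://localhost:4222", "username": "user", "password": "pass"}
	}`)
	restOnly := []byte(`{"main": {}}`)

	fmt.Println(natsAPIEnabled(full))
	fmt.Println(natsAPIEnabled(restOnly))
}
```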
### Message Format
Messages use **InfluxDB line protocol** format with the following structure:
#### Job Events
**Start Job:**
```
job,function=start_job event="{\"jobId\":123,\"user\":\"alice\",\"cluster\":\"test\", ...}" 1234567890000000000
```
**Stop Job:**
```
job,function=stop_job event="{\"jobId\":123,\"cluster\":\"test\",\"startTime\":1234567890,\"stopTime\":1234571490,\"jobState\":\"completed\"}" 1234571490000000000
```
**Tags:**
- `function`: Either `start_job` or `stop_job`
**Fields:**
- `event`: JSON payload containing job data (see REST API documentation for schema)
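Once the line protocol has been parsed, which is best left to a dedicated library and is not shown here, a consumer is left with the `function` tag and the unescaped `event` string. The sketch below shows one way to dispatch on the tag and decode the payload. `jobEvent` and `describeEvent` are hypothetical names, and the struct covers only the fields that appear in the examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobEvent holds only the fields visible in the examples above.
// The full schema is defined by the REST API documentation.
type jobEvent struct {
	JobID     int64  `json:"jobId"`
	User      string `json:"user"`
	Cluster   string `json:"cluster"`
	StartTime int64  `json:"startTime"`
	StopTime  int64  `json:"stopTime"`
	JobState  string `json:"jobState"`
}

// describeEvent takes the value of the function tag and the unescaped
// event field, and returns a short human-readable summary.
func describeEvent(function, payload string) (string, error) {
	if function != "start_job" && function != "stop_job" {
		return "", fmt.Errorf("unknown function tag %q", function)
	}
	var ev jobEvent
	if err := json.Unmarshal([]byte(payload), &ev); err != nil {
		return "", fmt.Errorf("decoding event payload: %w", err)
	}
	if function == "stop_job" {
		return fmt.Sprintf("job %d on %s stopped after %ds (%s)",
			ev.JobID, ev.Cluster, ev.StopTime-ev.StartTime, ev.JobState), nil
	}
	return fmt.Sprintf("job %d on %s started by %s", ev.JobID, ev.Cluster, ev.User), nil
}

func main() {
	stop := `{"jobId":123,"cluster":"test","startTime":1234567890,"stopTime":1234571490,"jobState":"completed"}`
	fmt.Println(describeEvent("stop_job", stop))
}
```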
#### Node State Updates
```json
{
  "cluster": "testcluster",
  "nodes": [
    {
      "hostname": "node001",
      "states": ["allocated"],
      "cpusAllocated": 8,
      "memoryAllocated": 16384,
      "gpusAllocated": 0,
      "jobsRunning": 1
    }
  ]
}
```
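The node state payload above maps naturally onto Go structs. The following is a minimal sketch with hypothetical type names (`nodeState`, `nodeStateUpdate`); the JSON keys are taken from the example, but the types actually used by the backend may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeState and nodeStateUpdate mirror the JSON keys of the example payload.
// The Go names are assumptions made for this sketch.
type nodeState struct {
	Hostname        string   `json:"hostname"`
	States          []string `json:"states"`
	CpusAllocated   int      `json:"cpusAllocated"`
	MemoryAllocated int      `json:"memoryAllocated"`
	GpusAllocated   int      `json:"gpusAllocated"`
	JobsRunning     int      `json:"jobsRunning"`
}

type nodeStateUpdate struct {
	Cluster string      `json:"cluster"`
	Nodes   []nodeState `json:"nodes"`
}

// parseNodeStates decodes one node state update message.
func parseNodeStates(raw []byte) (nodeStateUpdate, error) {
	var upd nodeStateUpdate
	if err := json.Unmarshal(raw, &upd); err != nil {
		return upd, fmt.Errorf("decoding node state update: %w", err)
	}
	return upd, nil
}

func main() {
	raw := []byte(`{
	  "cluster": "testcluster",
	  "nodes": [
	    {"hostname": "node001", "states": ["allocated"], "cpusAllocated": 8,
	     "memoryAllocated": 16384, "gpusAllocated": 0, "jobsRunning": 1}
	  ]
	}`)
	upd, err := parseNodeStates(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range upd.Nodes {
		fmt.Printf("%s/%s: %v, %d CPUs allocated, %d job(s)\n",
			upd.Cluster, n.Hostname, n.States, n.CpusAllocated, n.JobsRunning)
	}
}
```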
### Implementation Notes
- NATS API mirrors REST API functionality but uses messaging
- Job start/stop events are processed asynchronously
- Duplicate job detection is handled (same as REST API)
- All validation rules from REST API apply
- Messages are logged; no responses are sent back to publishers
- If NATS client is unavailable, API subscriptions are skipped (logged as warning)
## Dependencies
- Go 1.24.0+ (check go.mod for exact version)
- Node.js (for frontend builds)
- SQLite 3 (only supported database)
- Optional: NATS server for NATS API integration


@@ -22,19 +22,23 @@ switching from PHP Symfony to a Golang based solution are explained
## Overview ## Overview
This is a Golang web backend for the ClusterCockpit job-specific performance This is a Golang web backend for the ClusterCockpit job-specific performance
monitoring framework. It provides a REST API for integrating ClusterCockpit with monitoring framework. It provides a REST API and an optional NATS-based messaging
an HPC cluster batch system and external analysis scripts. Data exchange between API for integrating ClusterCockpit with an HPC cluster batch system and external
the web front-end and the back-end is based on a GraphQL API. The web frontend analysis scripts. Data exchange between the web front-end and the back-end is
is also served by the backend using [Svelte](https://svelte.dev/) components. based on a GraphQL API. The web frontend is also served by the backend using
Layout and styling are based on [Bootstrap 5](https://getbootstrap.com/) using [Svelte](https://svelte.dev/) components. Layout and styling are based on
[Bootstrap 5](https://getbootstrap.com/) using
[Bootstrap Icons](https://icons.getbootstrap.com/). [Bootstrap Icons](https://icons.getbootstrap.com/).
The backend uses [SQLite 3](https://sqlite.org/) as a relational SQL database by The backend uses [SQLite 3](https://sqlite.org/) as the relational SQL database.
default. Optionally it can use a MySQL/MariaDB database server. While there are While there are metric data backends for the InfluxDB and Prometheus time series
metric data backends for the InfluxDB and Prometheus time series databases, the databases, the only tested and supported setup is to use cc-metric-store as the
only tested and supported setup is to use cc-metric-store as the metric data metric data backend. Documentation on how to integrate ClusterCockpit with other
backend. Documentation on how to integrate ClusterCockpit with other time series time series databases will be added in the future.
databases will be added in the future.
For real-time integration with HPC systems, the backend can subscribe to
[NATS](https://nats.io/) subjects to receive job start/stop events and node
state updates, providing an alternative to REST API polling.
Completed batch jobs are stored in a file-based job archive according to Completed batch jobs are stored in a file-based job archive according to
[this specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive). [this specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive).
@@ -131,27 +135,58 @@ ln -s <your-existing-job-archive> ./var/job-archive
## Project file structure ## Project file structure
- [`.github/`](https://github.com/ClusterCockpit/cc-backend/tree/master/.github)
GitHub Actions workflows and dependabot configuration for CI/CD.
- [`api/`](https://github.com/ClusterCockpit/cc-backend/tree/master/api) - [`api/`](https://github.com/ClusterCockpit/cc-backend/tree/master/api)
contains the API schema files for the REST and GraphQL APIs. The REST API is contains the API schema files for the REST and GraphQL APIs. The REST API is
documented in the OpenAPI 3.0 format in documented in the OpenAPI 3.0 format in
[./api/openapi.yaml](./api/openapi.yaml). [./api/swagger.yaml](./api/swagger.yaml). The GraphQL schema is in
[./api/schema.graphqls](./api/schema.graphqls).
- [`cmd/cc-backend`](https://github.com/ClusterCockpit/cc-backend/tree/master/cmd/cc-backend) - [`cmd/cc-backend`](https://github.com/ClusterCockpit/cc-backend/tree/master/cmd/cc-backend)
contains `main.go` for the main application. contains the main application entry point and CLI implementation.
- [`configs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/configs) - [`configs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/configs)
contains documentation about configuration and command line options and required contains documentation about configuration and command line options and required
environment variables. A sample configuration file is provided. environment variables. Sample configuration files are provided.
- [`docs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/docs)
contains more in-depth documentation.
- [`init/`](https://github.com/ClusterCockpit/cc-backend/tree/master/init) - [`init/`](https://github.com/ClusterCockpit/cc-backend/tree/master/init)
contains an example of setting up systemd for production use. contains an example of setting up systemd for production use.
- [`internal/`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal) - [`internal/`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal)
contains library source code that is not intended for use by others. contains library source code that is not intended for use by others.
- [`api`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/api)
REST API handlers and NATS integration
- [`archiver`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/archiver)
Job archiving functionality
- [`auth`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/auth)
Authentication (local, LDAP, OIDC) and JWT token handling
- [`config`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/config)
Configuration management and validation
- [`graph`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/graph)
GraphQL schema and resolvers
- [`importer`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/importer)
Job data import and database initialization
- [`metricstore`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricstore)
In-memory metric data store with checkpointing and metric loading
- [`metricdispatch`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricdispatch)
Dispatches metric data loading to appropriate backends
- [`repository`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/repository)
Database repository layer for jobs and metadata
- [`routerConfig`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/routerConfig)
HTTP router configuration and middleware
- [`tagger`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/tagger)
Job classification and application detection
- [`taskmanager`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/taskmanager)
Background task management and scheduled jobs
- [`pkg/`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg) - [`pkg/`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg)
contains Go packages that can be used by other projects. contains Go packages that can be used by other projects.
- [`archive`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/archive)
Job archive backend implementations (filesystem, S3)
- [`nats`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/nats)
NATS client and message handling
- [`tools/`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools) - [`tools/`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools)
Additional command line helper tools. Additional command line helper tools.
- [`archive-manager`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-manager) - [`archive-manager`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-manager)
Commands for getting infos about and existing job archive. Commands for getting infos about an existing job archive.
- [`archive-migration`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-migration)
Tool for migrating job archives between formats.
- [`convert-pem-pubkey`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/convert-pem-pubkey) - [`convert-pem-pubkey`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/convert-pem-pubkey)
Tool to convert external pubkey for use in `cc-backend`. Tool to convert external pubkey for use in `cc-backend`.
- [`gen-keypair`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/gen-keypair) - [`gen-keypair`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/gen-keypair)
@@ -163,7 +198,7 @@ ln -s <your-existing-job-archive> ./var/job-archive
- [`frontend`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/frontend) - [`frontend`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/frontend)
Svelte components and static assets for the frontend UI Svelte components and static assets for the frontend UI
- [`templates`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/templates) - [`templates`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/templates)
Server-side Go templates Server-side Go templates, including monitoring views
- [`gqlgen.yml`](https://github.com/ClusterCockpit/cc-backend/blob/master/gqlgen.yml) - [`gqlgen.yml`](https://github.com/ClusterCockpit/cc-backend/blob/master/gqlgen.yml)
Configures the behaviour and generation of Configures the behaviour and generation of
[gqlgen](https://github.com/99designs/gqlgen). [gqlgen](https://github.com/99designs/gqlgen).


@@ -458,6 +458,7 @@ input JobFilter {
state: [JobState!] state: [JobState!]
metricStats: [MetricStatItem!] metricStats: [MetricStatItem!]
shared: String shared: String
schedule: String
node: StringInput node: StringInput
} }

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -15,8 +15,8 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
) )
const envString = ` const envString = `
@@ -36,7 +36,7 @@ const configString = `
"short-running-jobs-duration": 300, "short-running-jobs-duration": 300,
"resampling": { "resampling": {
"minimumPoints": 600, "minimumPoints": 600,
"trigger": 180, "trigger": 300,
"resolutions": [ "resolutions": [
240, 240,
60 60
@@ -48,7 +48,7 @@ const configString = `
"emission-constant": 317 "emission-constant": 317
}, },
"cron": { "cron": {
"commit-job-worker": "2m", "commit-job-worker": "1m",
"duration-worker": "5m", "duration-worker": "5m",
"footprint-worker": "10m" "footprint-worker": "10m"
}, },
@@ -60,31 +60,7 @@ const configString = `
"jwts": { "jwts": {
"max-age": "2000h" "max-age": "2000h"
} }
}, }
"clusters": [
{
"name": "name",
"metricDataRepository": {
"kind": "cc-metric-store",
"url": "http://localhost:8082",
"token": ""
},
"filterRanges": {
"numNodes": {
"from": 1,
"to": 64
},
"duration": {
"from": 0,
"to": 86400
},
"startTime": {
"from": "2023-01-01T00:00:00Z",
"to": null
}
}
}
]
} }
` `
@@ -105,9 +81,9 @@ func initEnv() {
cclog.Abortf("Could not create default ./var folder with permissions '0o777'. Application initialization failed, exited.\nError: %s\n", err.Error()) cclog.Abortf("Could not create default ./var folder with permissions '0o777'. Application initialization failed, exited.\nError: %s\n", err.Error())
} }
err := repository.MigrateDB("sqlite3", "./var/job.db") err := repository.MigrateDB("./var/job.db")
if err != nil { if err != nil {
cclog.Abortf("Could not initialize default sqlite3 database as './var/job.db'. Application initialization failed, exited.\nError: %s\n", err.Error()) cclog.Abortf("Could not initialize default SQLite database as './var/job.db'. Application initialization failed, exited.\nError: %s\n", err.Error())
} }
if err := os.Mkdir("var/job-archive", 0o777); err != nil { if err := os.Mkdir("var/job-archive", 0o777); err != nil {
cclog.Abortf("Could not create default ./var/job-archive folder with permissions '0o777'. Application initialization failed, exited.\nError: %s\n", err.Error()) cclog.Abortf("Could not create default ./var/job-archive folder with permissions '0o777'. Application initialization failed, exited.\nError: %s\n", err.Error())


@@ -24,23 +24,21 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/auth" "github.com/ClusterCockpit/cc-backend/internal/auth"
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/importer" "github.com/ClusterCockpit/cc-backend/internal/importer"
"github.com/ClusterCockpit/cc-backend/internal/memorystore" "github.com/ClusterCockpit/cc-backend/internal/metricstore"
"github.com/ClusterCockpit/cc-backend/internal/metricdata"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/internal/tagger" "github.com/ClusterCockpit/cc-backend/internal/tagger"
"github.com/ClusterCockpit/cc-backend/internal/taskmanager" "github.com/ClusterCockpit/cc-backend/internal/taskmanager"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
"github.com/ClusterCockpit/cc-backend/pkg/nats" "github.com/ClusterCockpit/cc-backend/pkg/nats"
"github.com/ClusterCockpit/cc-backend/web" "github.com/ClusterCockpit/cc-backend/web"
ccconf "github.com/ClusterCockpit/cc-lib/ccConfig" ccconf "github.com/ClusterCockpit/cc-lib/v2/ccConfig"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/runtimeEnv" "github.com/ClusterCockpit/cc-lib/v2/runtime"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
"github.com/google/gops/agent" "github.com/google/gops/agent"
"github.com/joho/godotenv" "github.com/joho/godotenv"
_ "github.com/go-sql-driver/mysql"
_ "github.com/mattn/go-sqlite3" _ "github.com/mattn/go-sqlite3"
) )
@@ -104,12 +102,7 @@ func initConfiguration() error {
return fmt.Errorf("main configuration must be present") return fmt.Errorf("main configuration must be present")
} }
clustercfg := ccconf.GetPackageConfig("clusters") config.Init(cfg)
if clustercfg == nil {
return fmt.Errorf("cluster configuration must be present")
}
config.Init(cfg, clustercfg)
return nil return nil
} }
@@ -120,30 +113,30 @@ func initDatabase() error {
func handleDatabaseCommands() error { func handleDatabaseCommands() error {
if flagMigrateDB { if flagMigrateDB {
err := repository.MigrateDB(config.Keys.DBDriver, config.Keys.DB) err := repository.MigrateDB(config.Keys.DB)
if err != nil { if err != nil {
return fmt.Errorf("migrating database to version %d: %w", repository.Version, err) return fmt.Errorf("migrating database to version %d: %w", repository.Version, err)
} }
cclog.Exitf("MigrateDB Success: Migrated '%s' database at location '%s' to version %d.\n", cclog.Exitf("MigrateDB Success: Migrated SQLite database at '%s' to version %d.\n",
config.Keys.DBDriver, config.Keys.DB, repository.Version) config.Keys.DB, repository.Version)
} }
if flagRevertDB { if flagRevertDB {
err := repository.RevertDB(config.Keys.DBDriver, config.Keys.DB) err := repository.RevertDB(config.Keys.DB)
if err != nil { if err != nil {
return fmt.Errorf("reverting database to version %d: %w", repository.Version-1, err) return fmt.Errorf("reverting database to version %d: %w", repository.Version-1, err)
} }
cclog.Exitf("RevertDB Success: Reverted '%s' database at location '%s' to version %d.\n", cclog.Exitf("RevertDB Success: Reverted SQLite database at '%s' to version %d.\n",
config.Keys.DBDriver, config.Keys.DB, repository.Version-1) config.Keys.DB, repository.Version-1)
} }
if flagForceDB { if flagForceDB {
err := repository.ForceDB(config.Keys.DBDriver, config.Keys.DB) err := repository.ForceDB(config.Keys.DB)
if err != nil { if err != nil {
return fmt.Errorf("forcing database to version %d: %w", repository.Version, err) return fmt.Errorf("forcing database to version %d: %w", repository.Version, err)
} }
cclog.Exitf("ForceDB Success: Forced '%s' database at location '%s' to version %d.\n", cclog.Exitf("ForceDB Success: Forced SQLite database at '%s' to version %d.\n",
config.Keys.DBDriver, config.Keys.DB, repository.Version) config.Keys.DB, repository.Version)
} }
return nil return nil
@@ -278,16 +271,14 @@ func initSubsystems() error {
// Initialize job archive // Initialize job archive
archiveCfg := ccconf.GetPackageConfig("archive") archiveCfg := ccconf.GetPackageConfig("archive")
if archiveCfg == nil { if archiveCfg == nil {
cclog.Debug("Archive configuration not found, using default archive configuration")
archiveCfg = json.RawMessage(defaultArchiveConfig) archiveCfg = json.RawMessage(defaultArchiveConfig)
} }
if err := archive.Init(archiveCfg, config.Keys.DisableArchive); err != nil { if err := archive.Init(archiveCfg, config.Keys.DisableArchive); err != nil {
return fmt.Errorf("initializing archive: %w", err) return fmt.Errorf("initializing archive: %w", err)
} }
// Initialize metricdata // Note: metricstore.Init() is called later in runServer() with proper configuration
if err := metricdata.Init(); err != nil {
return fmt.Errorf("initializing metricdata repository: %w", err)
}
// Handle database re-initialization // Handle database re-initialization
if flagReinitDB { if flagReinitDB {
@@ -312,6 +303,8 @@ func initSubsystems() error {
// Apply tags if requested // Apply tags if requested
if flagApplyTags { if flagApplyTags {
tagger.Init()
if err := tagger.RunTaggers(); err != nil { if err := tagger.RunTaggers(); err != nil {
return fmt.Errorf("running job taggers: %w", err) return fmt.Errorf("running job taggers: %w", err)
} }
@@ -323,13 +316,17 @@ func initSubsystems() error {
func runServer(ctx context.Context) error { func runServer(ctx context.Context) error {
var wg sync.WaitGroup var wg sync.WaitGroup
// Start metric store if enabled // Initialize metric store if configuration is provided
if memorystore.InternalCCMSFlag { mscfg := ccconf.GetPackageConfig("metric-store")
mscfg := ccconf.GetPackageConfig("metric-store") if mscfg != nil {
if mscfg == nil { metricstore.Init(mscfg, &wg)
return fmt.Errorf("metric store configuration must be present")
} // Inject repository as NodeProvider to break import cycle
memorystore.Init(mscfg, &wg) ms := metricstore.GetMemoryStore()
jobRepo := repository.GetJobRepository()
ms.SetNodeProvider(jobRepo)
} else {
return fmt.Errorf("missing metricstore configuration")
} }
// Start archiver and task manager // Start archiver and task manager
@@ -372,7 +369,7 @@ func runServer(ctx context.Context) error {
case <-ctx.Done(): case <-ctx.Done():
} }
runtimeEnv.SystemdNotifiy(false, "Shutting down ...") runtime.SystemdNotify(false, "Shutting down ...")
srv.Shutdown(ctx) srv.Shutdown(ctx)
util.FsWatcherShutdown() util.FsWatcherShutdown()
taskmanager.Shutdown() taskmanager.Shutdown()
@@ -382,24 +379,39 @@ func runServer(ctx context.Context) error {
if os.Getenv(envGOGC) == "" { if os.Getenv(envGOGC) == "" {
debug.SetGCPercent(25) debug.SetGCPercent(25)
} }
runtimeEnv.SystemdNotifiy(true, "running") runtime.SystemdNotify(true, "running")
// Wait for completion or error waitDone := make(chan struct{})
go func() { go func() {
wg.Wait() wg.Wait()
close(waitDone)
}()
go func() {
<-waitDone
close(errChan) close(errChan)
}() }()
// Check for server startup errors // Wait for either:
// 1. An error from server startup
// 2. Completion of all goroutines (normal shutdown or crash)
select { select {
case err := <-errChan: case err := <-errChan:
// errChan will be closed when waitDone is closed, which happens
// when all goroutines complete (either from normal shutdown or error)
if err != nil { if err != nil {
return err return err
} }
case <-time.After(100 * time.Millisecond): case <-time.After(100 * time.Millisecond):
// Server started successfully, wait for completion // Give the server 100ms to start and report any immediate startup errors
if err := <-errChan; err != nil { // After that, just wait for normal shutdown completion
return err select {
case err := <-errChan:
if err != nil {
return err
}
case <-waitDone:
// Normal shutdown completed
} }
} }


@@ -29,12 +29,12 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/graph" "github.com/ClusterCockpit/cc-backend/internal/graph"
"github.com/ClusterCockpit/cc-backend/internal/graph/generated" "github.com/ClusterCockpit/cc-backend/internal/graph/generated"
"github.com/ClusterCockpit/cc-backend/internal/memorystore" "github.com/ClusterCockpit/cc-backend/internal/metricstore"
"github.com/ClusterCockpit/cc-backend/internal/routerConfig" "github.com/ClusterCockpit/cc-backend/internal/routerConfig"
"github.com/ClusterCockpit/cc-backend/pkg/nats" "github.com/ClusterCockpit/cc-backend/pkg/nats"
"github.com/ClusterCockpit/cc-backend/web" "github.com/ClusterCockpit/cc-backend/web"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/runtimeEnv" "github.com/ClusterCockpit/cc-lib/v2/runtime"
"github.com/gorilla/handlers" "github.com/gorilla/handlers"
"github.com/gorilla/mux" "github.com/gorilla/mux"
httpSwagger "github.com/swaggo/http-swagger" httpSwagger "github.com/swaggo/http-swagger"
@@ -49,9 +49,10 @@ const (
// Server encapsulates the HTTP server state and dependencies // Server encapsulates the HTTP server state and dependencies
type Server struct { type Server struct {
router *mux.Router router *mux.Router
server *http.Server server *http.Server
apiHandle *api.RestAPI restAPIHandle *api.RestAPI
natsAPIHandle *api.NatsAPI
} }
func onFailureResponse(rw http.ResponseWriter, r *http.Request, err error) { func onFailureResponse(rw http.ResponseWriter, r *http.Request, err error) {
@@ -104,7 +105,7 @@ func (s *Server) init() error {
authHandle := auth.GetAuthInstance() authHandle := auth.GetAuthInstance()
s.apiHandle = api.New() s.restAPIHandle = api.New()
info := map[string]any{} info := map[string]any{}
info["hasOpenIDConnect"] = false info["hasOpenIDConnect"] = false
@@ -240,15 +241,20 @@ func (s *Server) init() error {
// Mount all /monitoring/... and /api/... routes. // Mount all /monitoring/... and /api/... routes.
routerConfig.SetupRoutes(secured, buildInfo) routerConfig.SetupRoutes(secured, buildInfo)
s.apiHandle.MountAPIRoutes(securedapi) s.restAPIHandle.MountAPIRoutes(securedapi)
s.apiHandle.MountUserAPIRoutes(userapi) s.restAPIHandle.MountUserAPIRoutes(userapi)
s.apiHandle.MountConfigAPIRoutes(configapi) s.restAPIHandle.MountConfigAPIRoutes(configapi)
s.apiHandle.MountFrontendAPIRoutes(frontendapi) s.restAPIHandle.MountFrontendAPIRoutes(frontendapi)
if memorystore.InternalCCMSFlag { if config.Keys.APISubjects != nil {
s.apiHandle.MountMetricStoreAPIRoutes(metricstoreapi) s.natsAPIHandle = api.NewNatsAPI()
if err := s.natsAPIHandle.StartSubscriptions(); err != nil {
return fmt.Errorf("starting NATS subscriptions: %w", err)
}
} }
s.restAPIHandle.MountMetricStoreAPIRoutes(metricstoreapi)
if config.Keys.EmbedStaticFiles { if config.Keys.EmbedStaticFiles {
if i, err := os.Stat("./var/img"); err == nil { if i, err := os.Stat("./var/img"); err == nil {
if i.IsDir() { if i.IsDir() {
@@ -339,7 +345,7 @@ func (s *Server) Start(ctx context.Context) error {
// Because this program will want to bind to a privileged port (like 80), the listener must // Because this program will want to bind to a privileged port (like 80), the listener must
// be established first, then the user can be changed, and after that, // be established first, then the user can be changed, and after that,
// the actual http server can be started. // the actual http server can be started.
if err := runtimeEnv.DropPrivileges(config.Keys.Group, config.Keys.User); err != nil { if err := runtime.DropPrivileges(config.Keys.Group, config.Keys.User); err != nil {
return fmt.Errorf("dropping privileges: %w", err) return fmt.Errorf("dropping privileges: %w", err)
} }
@@ -375,9 +381,7 @@ func (s *Server) Shutdown(ctx context.Context) {
} }
// Archive all the metric store data // Archive all the metric store data
if memorystore.InternalCCMSFlag { metricstore.Shutdown()
memorystore.Shutdown()
}
// Shutdown archiver with 10 second timeout for fast shutdown // Shutdown archiver with 10 second timeout for fast shutdown
if err := archiver.Shutdown(10 * time.Second); err != nil { if err := archiver.Shutdown(10 * time.Second); err != nil {


@@ -1,106 +1,22 @@
{ {
"main": { "main": {
"addr": "127.0.0.1:8080", "addr": "127.0.0.1:8080",
"short-running-jobs-duration": 300, "apiAllowedIPs": ["*"]
"resampling": {
"minimumPoints": 600,
"trigger": 180,
"resolutions": [
240,
60
]
},
"apiAllowedIPs": [
"*"
],
"emission-constant": 317
}, },
"cron": { "cron": {
"commit-job-worker": "2m", "commit-job-worker": "1m",
"duration-worker": "5m", "duration-worker": "3m",
"footprint-worker": "10m" "footprint-worker": "5m"
},
"archive": {
"kind": "file",
"path": "./var/job-archive"
}, },
"auth": { "auth": {
"jwts": { "jwts": {
"max-age": "2000h" "max-age": "2000h"
} }
}, },
"nats": {
"address": "nats://0.0.0.0:4222",
"username": "root",
"password": "root"
},
"clusters": [
{
"name": "fritz",
"metricDataRepository": {
"kind": "cc-metric-store-internal",
"url": "http://localhost:8082",
"token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"
},
"filterRanges": {
"numNodes": {
"from": 1,
"to": 64
},
"duration": {
"from": 0,
"to": 86400
},
"startTime": {
"from": "2022-01-01T00:00:00Z",
"to": null
}
}
},
{
"name": "alex",
"metricDataRepository": {
"kind": "cc-metric-store-internal",
"url": "http://localhost:8082",
"token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"
},
"filterRanges": {
"numNodes": {
"from": 1,
"to": 64
},
"duration": {
"from": 0,
"to": 86400
},
"startTime": {
"from": "2022-01-01T00:00:00Z",
"to": null
}
}
}
],
"metric-store": { "metric-store": {
"checkpoints": { "checkpoints": {
"file-format": "avro", "interval": "1h"
"interval": "1h",
"directory": "./var/checkpoints",
"restore": "48h"
}, },
"archive": { "retention-in-memory": "12h"
"interval": "1h",
"directory": "./var/archive"
},
"retention-in-memory": "48h",
"subscriptions": [
{
"subscribe-to": "hpc-nats",
"cluster-tag": "fritz"
},
{
"subscribe-to": "hpc-nats",
"cluster-tag": "alex"
}
]
} }
} }


@@ -1,64 +0,0 @@
{
"addr": "127.0.0.1:8080",
"short-running-jobs-duration": 300,
"archive": {
"kind": "file",
"path": "./var/job-archive"
},
"jwts": {
"max-age": "2000h"
},
"db-driver": "mysql",
"db": "clustercockpit:demo@tcp(127.0.0.1:3306)/clustercockpit",
"enable-resampling": {
"trigger": 30,
"resolutions": [600, 300, 120, 60]
},
"emission-constant": 317,
"clusters": [
{
"name": "fritz",
"metricDataRepository": {
"kind": "cc-metric-store",
"url": "http://localhost:8082",
"token": ""
},
"filterRanges": {
"numNodes": {
"from": 1,
"to": 64
},
"duration": {
"from": 0,
"to": 86400
},
"startTime": {
"from": "2022-01-01T00:00:00Z",
"to": null
}
}
},
{
"name": "alex",
"metricDataRepository": {
"kind": "cc-metric-store",
"url": "http://localhost:8082",
"token": ""
},
"filterRanges": {
"numNodes": {
"from": 1,
"to": 64
},
"duration": {
"from": 0,
"to": 86400
},
"startTime": {
"from": "2022-01-01T00:00:00Z",
"to": null
}
}
}
]
}


@@ -5,50 +5,61 @@
"https-key-file": "/etc/letsencrypt/live/url/privkey.pem", "https-key-file": "/etc/letsencrypt/live/url/privkey.pem",
"user": "clustercockpit", "user": "clustercockpit",
"group": "clustercockpit", "group": "clustercockpit",
"validate": false,
"apiAllowedIPs": ["*"], "apiAllowedIPs": ["*"],
"short-running-jobs-duration": 300, "short-running-jobs-duration": 300,
"enable-job-taggers": true,
"resampling": { "resampling": {
"minimumPoints": 600, "minimumPoints": 600,
"trigger": 180, "trigger": 180,
"resolutions": [ "resolutions": [240, 60]
240, },
60 "apiSubjects": {
] "subjectJobEvent": "cc.job.event",
"subjectNodeState": "cc.node.state"
}
},
"nats": {
"address": "nats://0.0.0.0:4222",
"username": "root",
"password": "root"
},
"auth": {
"jwts": {
"max-age": "2000h"
} }
}, },
"cron": { "cron": {
"commit-job-worker": "2m", "commit-job-worker": "1m",
"duration-worker": "5m", "duration-worker": "5m",
"footprint-worker": "10m" "footprint-worker": "10m"
}, },
"archive": { "archive": {
"kind": "file", "kind": "s3",
"path": "./var/job-archive" "endpoint": "http://x.x.x.x",
}, "bucket": "jobarchive",
"clusters": [ "accessKey": "xx",
{ "secretKey": "xx",
"name": "test", "retention": {
"metricDataRepository": { "policy": "move",
"kind": "cc-metric-store", "age": 365,
"url": "http://localhost:8082", "location": "./var/archive"
"token": "eyJhbGciOiJF-E-pQBQ"
},
"filterRanges": {
"numNodes": {
"from": 1,
"to": 64
},
"duration": {
"from": 0,
"to": 86400
},
"startTime": {
"from": "2022-01-01T00:00:00Z",
"to": null
}
}
} }
] },
"metric-store": {
"checkpoints": {
"interval": "1h"
},
"retention-in-memory": "12h",
"subscriptions": [
{
"subscribe-to": "hpc-nats",
"cluster-tag": "fritz"
},
{
"subscribe-to": "hpc-nats",
"cluster-tag": "alex"
}
]
},
"ui-file": "ui-config.json"
} }


@@ -0,0 +1,22 @@
{
"cluster": "fritz",
"jobId": 123000,
"jobState": "running",
"numAcc": 0,
"numHwthreads": 72,
"numNodes": 1,
"partition": "main",
"requestedMemory": 128000,
"resources": [{ "hostname": "f0726" }],
"startTime": 1649723812,
"subCluster": "main",
"submitTime": 1649723812,
"user": "k106eb10",
"project": "k106eb",
"walltime": 86400,
"metaData": {
"slurmInfo": "JobId=398759\nJobName=myJob\nUserId=dummyUser\nGroupId=dummyGroup\nAccount=dummyAccount\nQOS=normal Requeue=False Restarts=0 BatchFlag=True\nTimeLimit=1439'\nSubmitTime=2023-02-09T14:10:18\nPartition=singlenode\nNodeList=xx\nNumNodes=xx NumCPUs=72 NumTasks=72 CPUs/Task=1\nNTasksPerNode:Socket:Core=0:None:None\nTRES_req=cpu=72,mem=250000M,node=1,billing=72\nTRES_alloc=cpu=72,node=1,billing=72\nCommand=myCmd\nWorkDir=myDir\nStdErr=\nStdOut=\n",
"jobScript": "#!/bin/bash -l\n#SBATCH --job-name=dummy_job\n#SBATCH --time=23:59:00\n#SBATCH --partition=singlenode\n#SBATCH --ntasks=72\n#SBATCH --hint=multithread\n#SBATCH --chdir=/home/atuin/k106eb/dummy/\n#SBATCH --export=NONE\nunset SLURM_EXPORT_ENV\n\n#This is a dummy job script\n./mybinary\n",
"jobName": "ams_pipeline"
}
}


@@ -0,0 +1,7 @@
{
"cluster": "fritz",
"jobId": 123000,
"jobState": "completed",
"startTime": 1649723812,
"stopTime": 1649763839
}

419
configs/tagger/README.md Normal file

@@ -0,0 +1,419 @@
# Job Tagging Configuration
ClusterCockpit provides automatic job tagging functionality to classify and
categorize jobs based on configurable rules. The tagging system consists of two
main components:
1. **Application Detection** - Identifies which application a job is running
2. **Job Classification** - Analyzes job performance characteristics and applies classification tags
## Directory Structure
```
configs/tagger/
├── apps/              # Application detection patterns
│   ├── vasp.txt
│   ├── gromacs.txt
│   └── ...
└── jobclasses/        # Job classification rules
    ├── parameters.json
    ├── lowUtilization.json
    ├── highload.json
    └── ...
```
## Activating Tagger Rules
### Step 1: Copy Configuration Files
To activate tagging, review, adapt, and copy the configuration files from
`configs/tagger/` to `var/tagger/`:
```bash
# From the cc-backend root directory
mkdir -p var/tagger
cp -r configs/tagger/apps var/tagger/
cp -r configs/tagger/jobclasses var/tagger/
```
### Step 2: Enable Tagging in Configuration
Add or set the following configuration key in the `main` section of your `config.json`:
```json
{
  "main": {
    "enable-job-taggers": true
  }
}
```
**Important**: Automatic tagging is disabled by default. You must explicitly
enable it by setting `enable-job-taggers: true` in the main configuration file.
### Step 3: Restart cc-backend
The tagger system automatically loads configuration from `./var/tagger/` at
startup. After copying the files and enabling the feature, restart cc-backend:
```bash
./cc-backend -server
```
### Step 4: Verify Configuration Loaded
Check the logs for messages indicating successful configuration loading:
```
[INFO] Setup file watch for ./var/tagger/apps
[INFO] Setup file watch for ./var/tagger/jobclasses
```
## How Tagging Works
### Automatic Tagging
When `enable-job-taggers` is set to `true` in the configuration, tags are
automatically applied when:
- **Job Start**: Application detection runs immediately when a job starts
- **Job Stop**: Job classification runs when a job completes
The system analyzes job metadata and metrics to determine appropriate tags.
**Note**: Automatic tagging only works for jobs that start or stop after the
feature is enabled. Existing jobs are not automatically retagged.
### Manual Tagging (Retroactive)
To apply tags to existing jobs in the database, use the `-apply-tags` command
line option:
```bash
./cc-backend -apply-tags
```
This processes all jobs in the database and applies current tagging rules. This
is useful when:
- You have existing jobs that were created before tagging was enabled
- You've added new tagging rules and want to apply them to historical data
- You've modified existing rules and want to re-evaluate all jobs
### Hot Reload
The tagger system watches the configuration directories for changes. You can
modify or add rules without restarting `cc-backend`:
- Changes to `var/tagger/apps/*` are detected automatically
- Changes to `var/tagger/jobclasses/*` are detected automatically
## Application Detection
Application detection identifies which software a job is running by matching
patterns in the job script.
### Configuration Format
Application patterns are stored in text files under `var/tagger/apps/`. Each
file contains one or more regular expression patterns (one per line) that match
against the job script.
**Example: `apps/vasp.txt`**
```
vasp
VASP
```
### How It Works
1. When a job starts, the system retrieves the job script from metadata
2. Each line in the app files is treated as a regex pattern
3. Patterns are matched case-insensitively against the lowercased job script
4. If a match is found, a tag of type `app` with the filename (without extension) is applied
5. Only the first matching application is tagged
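The steps above can be sketched in Go roughly as follows. This is a minimal illustration, not the actual cc-backend tagger code: the names `appPattern`, `compileApp`, and `detectApp` are hypothetical, and the order in which application files are checked is assumed to be the order of the slice.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// appPattern pairs a tag name (file name without extension) with the
// compiled patterns read from that file. Hypothetical type for illustration.
type appPattern struct {
	tag      string
	patterns []*regexp.Regexp
}

// compileApp builds an appPattern from a file name and the file content,
// treating every non-empty line as one regular expression.
func compileApp(filename, content string) (appPattern, error) {
	app := appPattern{tag: strings.TrimSuffix(filename, ".txt")}
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// (?i) makes the pattern itself case-insensitive.
		re, err := regexp.Compile("(?i)" + line)
		if err != nil {
			return app, fmt.Errorf("pattern %q in %s: %w", line, filename, err)
		}
		app.patterns = append(app.patterns, re)
	}
	return app, nil
}

// detectApp returns the tag of the first application with a matching
// pattern. The script is lowercased before matching, as described above.
func detectApp(apps []appPattern, jobScript string) (string, bool) {
	script := strings.ToLower(jobScript)
	for _, app := range apps {
		for _, re := range app.patterns {
			if re.MatchString(script) {
				return app.tag, true
			}
		}
	}
	return "", false
}

func main() {
	vasp, err := compileApp("vasp.txt", "vasp\nVASP\n")
	if err != nil {
		panic(err)
	}
	python, err := compileApp("python.txt", "python\npython3\n\\.py\n")
	if err != nil {
		panic(err)
	}
	apps := []appPattern{vasp, python}

	tag, ok := detectApp(apps, "#!/bin/bash -l\nmodule load VASP\nsrun vasp_std\n")
	fmt.Println(tag, ok)
}
```

Because only the first match is tagged, a script that mentions several applications receives the tag of whichever application is checked first.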
### Adding New Applications
1. Create a new file in `var/tagger/apps/` (e.g., `tensorflow.txt`)
2. Add regex patterns, one per line:
```
tensorflow
tf\.keras
import tensorflow
```
3. The file is automatically detected and loaded
**Note**: The tag name will be the filename without the `.txt` extension (e.g., `tensorflow`).
## Job Classification
Job classification analyzes completed jobs based on their metrics and properties
to identify performance issues or characteristics.
### Configuration Format
Job classification rules are defined in JSON files under
`var/tagger/jobclasses/`. Each rule file defines:
- **Metrics required**: Which job metrics to analyze
- **Requirements**: Pre-conditions that must be met
- **Variables**: Computed values used in the rule
- **Rule expression**: Boolean expression that determines if the rule matches
- **Hint template**: Message displayed when the rule matches
### Parameters File
`jobclasses/parameters.json` defines shared threshold values used across multiple rules:
```json
{
"lowcpuload_threshold_factor": 0.9,
"highmemoryusage_threshold_factor": 0.9,
"job_min_duration_seconds": 600.0,
"sampling_interval_seconds": 30.0
}
```
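As a rough sketch of how a rule's `parameters` list could be resolved against this file, the following Go snippet loads the JSON and picks out the requested names. It assumes that all parameter values are numeric; `loadParameters` and `selectParameters` are hypothetical helper names and not part of cc-backend.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sample content mirroring the parameters.json shown above.
const parametersJSON = `{
  "lowcpuload_threshold_factor": 0.9,
  "highmemoryusage_threshold_factor": 0.9,
  "job_min_duration_seconds": 600.0,
  "sampling_interval_seconds": 30.0
}`

// loadParameters decodes the shared parameter file into a name -> value map.
// Assumption: every parameter is a JSON number.
func loadParameters(data []byte) (map[string]float64, error) {
	params := map[string]float64{}
	if err := json.Unmarshal(data, &params); err != nil {
		return nil, fmt.Errorf("parsing parameters: %w", err)
	}
	return params, nil
}

// selectParameters returns only the parameters a rule lists in its
// "parameters" field and reports names that are not defined.
func selectParameters(all map[string]float64, names []string) (map[string]float64, error) {
	env := make(map[string]float64, len(names))
	for _, name := range names {
		value, ok := all[name]
		if !ok {
			return nil, fmt.Errorf("parameter %q is not defined", name)
		}
		env[name] = value
	}
	return env, nil
}

func main() {
	all, err := loadParameters([]byte(parametersJSON))
	if err != nil {
		panic(err)
	}
	env, err := selectParameters(all, []string{"job_min_duration_seconds"})
	if err != nil {
		panic(err)
	}
	fmt.Println(env["job_min_duration_seconds"])
}
```

Reporting undefined names early is a design choice of this sketch; it surfaces typos in a rule's `parameters` list before any expression is evaluated.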
### Rule File Structure
**Example: `jobclasses/lowUtilization.json`**
```json
{
  "name": "Low resource utilization",
  "tag": "lowutilization",
  "parameters": ["job_min_duration_seconds"],
  "metrics": ["flops_any", "mem_bw"],
  "requirements": [
    "job.shared == \"none\"",
    "job.duration > job_min_duration_seconds"
  ],
  "variables": [
    {
      "name": "mem_bw_perc",
      "expr": "1.0 - (mem_bw.avg / mem_bw.limits.peak)"
    }
  ],
  "rule": "flops_any.avg < flops_any.limits.alert",
  "hint": "Average flop rate {{.flops_any.avg}} falls below threshold {{.flops_any.limits.alert}}"
}
```
#### Field Descriptions
| Field | Description |
| -------------- | ----------------------------------------------------------------------------- |
| `name` | Human-readable description of the rule |
| `tag` | Tag identifier applied when the rule matches |
| `parameters` | List of parameter names from `parameters.json` to include in rule environment |
| `metrics` | List of metrics required for evaluation (must be present in job data) |
| `requirements` | Boolean expressions that must all be true for the rule to be evaluated |
| `variables` | Named expressions computed before evaluating the main rule |
| `rule` | Boolean expression that determines if the job matches this classification |
| `hint` | Go template string for generating a user-visible message |
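The fields in the table map naturally onto a Go struct. The sketch below decodes a rule file with `encoding/json`; the type names `ruleFile` and `ruleVariable` and the check for a non-empty `tag` and `rule` are assumptions made for illustration and may differ from what cc-backend does internally.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ruleVariable is one entry of the "variables" array.
type ruleVariable struct {
	Name string `json:"name"`
	Expr string `json:"expr"`
}

// ruleFile mirrors the fields described in the table above.
type ruleFile struct {
	Name         string         `json:"name"`
	Tag          string         `json:"tag"`
	Parameters   []string       `json:"parameters"`
	Metrics      []string       `json:"metrics"`
	Requirements []string       `json:"requirements"`
	Variables    []ruleVariable `json:"variables"`
	Rule         string         `json:"rule"`
	Hint         string         `json:"hint"`
}

// parseRule decodes a rule file and applies a minimal sanity check.
// The check itself is an assumption of this sketch.
func parseRule(data []byte) (ruleFile, error) {
	var r ruleFile
	if err := json.Unmarshal(data, &r); err != nil {
		return r, fmt.Errorf("parsing rule file: %w", err)
	}
	if r.Tag == "" || r.Rule == "" {
		return r, errors.New("rule file needs a non-empty \"tag\" and \"rule\"")
	}
	return r, nil
}

// Sample content mirroring jobclasses/lowUtilization.json.
const lowUtilizationJSON = `{
  "name": "Low resource utilization",
  "tag": "lowutilization",
  "parameters": ["job_min_duration_seconds"],
  "metrics": ["flops_any", "mem_bw"],
  "requirements": [
    "job.shared == \"none\"",
    "job.duration > job_min_duration_seconds"
  ],
  "variables": [
    {"name": "mem_bw_perc", "expr": "1.0 - (mem_bw.avg / mem_bw.limits.peak)"}
  ],
  "rule": "flops_any.avg < flops_any.limits.alert",
  "hint": "Average flop rate {{.flops_any.avg}} falls below threshold {{.flops_any.limits.alert}}"
}`

func main() {
	r, err := parseRule([]byte(lowUtilizationJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Tag, len(r.Requirements), r.Variables[0].Name)
}
```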
### Expression Environment
Expressions in `requirements`, `variables`, and `rule` have access to:
**Job Properties:**
- `job.shared` - Shared node allocation type
- `job.duration` - Job runtime in seconds
- `job.numCores` - Number of CPU cores
- `job.numNodes` - Number of nodes
- `job.jobState` - Job completion state
- `job.numAcc` - Number of accelerators
- `job.smt` - SMT setting
**Metric Statistics (for each metric in `metrics`):**
- `<metric>.min` - Minimum value
- `<metric>.max` - Maximum value
- `<metric>.avg` - Average value
- `<metric>.limits.peak` - Peak limit from cluster config
- `<metric>.limits.normal` - Normal threshold
- `<metric>.limits.caution` - Caution threshold
- `<metric>.limits.alert` - Alert threshold
**Parameters:**
- All parameters listed in the `parameters` field
**Variables:**
- All variables defined in the `variables` array
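To make the shape of this environment concrete, the following Go sketch assembles it as a nested map. The helper `buildEnv`, the types `metricStats` and `limits`, and all numeric values are hypothetical; the real tagger may build its environment differently. The `mem_bw_perc` variable from the example rule is computed in plain Go here, only to show which values the expression reads.

```go
package main

import "fmt"

// limits holds the per-metric thresholds from the cluster configuration.
type limits struct {
	Peak, Normal, Caution, Alert float64
}

// metricStats holds the job-level statistics of one metric.
type metricStats struct {
	Min, Max, Avg float64
	Limits        limits
}

// buildEnv assembles job properties, metric statistics, and parameters
// into one map, so that expressions can refer to names such as
// job.duration, mem_bw.avg, or mem_bw.limits.peak.
func buildEnv(job map[string]any, metrics map[string]metricStats, params map[string]float64) map[string]any {
	env := map[string]any{"job": job}
	for name, m := range metrics {
		env[name] = map[string]any{
			"min": m.Min,
			"max": m.Max,
			"avg": m.Avg,
			"limits": map[string]any{
				"peak":    m.Limits.Peak,
				"normal":  m.Limits.Normal,
				"caution": m.Limits.Caution,
				"alert":   m.Limits.Alert,
			},
		}
	}
	for name, value := range params {
		env[name] = value
	}
	return env
}

func main() {
	// All values below are made up for illustration.
	job := map[string]any{"shared": "none", "duration": 7200, "numNodes": 1}
	metrics := map[string]metricStats{
		"mem_bw": {Min: 1, Max: 9, Avg: 5, Limits: limits{Peak: 10, Normal: 8, Caution: 4, Alert: 2}},
	}
	params := map[string]float64{"job_min_duration_seconds": 600}
	env := buildEnv(job, metrics, params)

	// Plain Go equivalent of "1.0 - (mem_bw.avg / mem_bw.limits.peak)".
	memBw := env["mem_bw"].(map[string]any)
	peak := memBw["limits"].(map[string]any)["peak"].(float64)
	env["mem_bw_perc"] = 1.0 - memBw["avg"].(float64)/peak

	fmt.Println(env["mem_bw_perc"])
}
```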
### Expression Language
Rules use the [expr](https://github.com/expr-lang/expr) language for expressions. Supported operations:
- **Arithmetic**: `+`, `-`, `*`, `/`, `%`, `^`
- **Comparison**: `==`, `!=`, `<`, `<=`, `>`, `>=`
- **Logical**: `&&`, `||`, `!`
- **Functions**: Standard math functions (see expr documentation)
### Hint Templates
Hints use Go's `text/template` syntax. Variables from the evaluation environment are accessible:
```
{{.flops_any.avg}} # Access metric average
{{.job.duration}} # Access job property
{{.my_variable}} # Access computed variable
```
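A hint can be tried out in isolation with Go's standard `text/template` package. The sketch below renders the hint from `lowUtilization.json` against a hand-built environment; `renderHint` is a hypothetical helper and the metric values are invented. Note that `text/template` has no arithmetic operators, so any computation belongs in a `variables` entry, while formatting can be done with the built-in `printf`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderHint parses the hint as a text/template and executes it with the
// given environment. Nested maps are reachable via dotted names.
func renderHint(hint string, env map[string]any) (string, error) {
	tmpl, err := template.New("hint").Parse(hint)
	if err != nil {
		return "", fmt.Errorf("parsing hint template: %w", err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, env); err != nil {
		return "", fmt.Errorf("rendering hint template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	// Invented values, only to exercise the template.
	env := map[string]any{
		"flops_any": map[string]any{
			"avg":    1.5,
			"limits": map[string]any{"alert": 10.0},
		},
		"mem_growth": 0.125,
	}

	msg, err := renderHint(
		"Average flop rate {{.flops_any.avg}} falls below threshold {{.flops_any.limits.alert}}",
		env,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg) // Average flop rate 1.5 falls below threshold 10

	// printf controls the number format of a computed variable.
	msg, err = renderHint(`Memory usage grew by {{printf "%.2f" .mem_growth}} per second`, env)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg) // Memory usage grew by 0.12 per second
}
```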
### Adding New Classification Rules
1. Create a new JSON file in `var/tagger/jobclasses/` (e.g., `memoryLeak.json`)
2. Define the rule structure:
```json
{
"name": "Memory Leak Detection",
"tag": "memory_leak",
"parameters": ["memory_leak_slope_threshold"],
"metrics": ["mem_used"],
"requirements": ["job.duration > 3600"],
"variables": [
{
"name": "mem_growth",
"expr": "(mem_used.max - mem_used.min) / job.duration"
}
],
"rule": "mem_growth > memory_leak_slope_threshold",
"hint": "Memory usage grew by {{.mem_growth}} per second"
}
```
3. Add any new parameters to `parameters.json`
4. The file is automatically detected and loaded
## Configuration Paths
The tagger system reads from these paths (relative to the cc-backend working directory):
- **Application patterns**: `./var/tagger/apps/`
- **Job classification rules**: `./var/tagger/jobclasses/`
These paths are defined as constants in the source code and cannot be changed without recompiling.
## Troubleshooting
### Tags Not Applied
1. **Check tagging is enabled**: Verify `enable-job-taggers: true` is set in `config.json`
2. **Check configuration exists**:
```bash
ls -la var/tagger/apps
ls -la var/tagger/jobclasses
```
3. **Check logs for errors**:
```bash
./cc-backend -server -loglevel debug
```
4. **Verify file permissions**: Ensure cc-backend can read the configuration files
5. **For existing jobs**: Use `./cc-backend -apply-tags` to retroactively tag jobs
### Rules Not Matching
1. **Enable debug logging**: Start cc-backend with `-loglevel debug` to see detailed rule evaluation
2. **Check requirements**: Ensure all requirements in the rule are satisfied
3. **Verify metrics exist**: Classification rules require job metrics to be available
4. **Check metric names**: Ensure metric names match those in your cluster configuration
### File Watch Not Working
If changes to configuration files aren't detected:
1. Restart cc-backend to reload all configuration
2. Check that the filesystem supports file watching (network filesystems may not)
3. Check logs for file watch setup messages
## Best Practices
1. **Start Simple**: Begin with basic rules and refine based on results
2. **Use Requirements**: Filter out irrelevant jobs early with requirements
3. **Test Incrementally**: Add one rule at a time and verify behavior
4. **Document Rules**: Use descriptive names and clear hint messages
5. **Share Parameters**: Define common thresholds in `parameters.json` for consistency
6. **Version Control**: Keep your `var/tagger/` configuration in version control
7. **Backup Before Changes**: Test new rules on a copy before deploying to production
## Examples
### Simple Application Detection
**File: `var/tagger/apps/python.txt`**
```
python
python3
\.py
```
This detects jobs running Python scripts.
### Complex Classification Rule
**File: `var/tagger/jobclasses/cpuImbalance.json`**
```json
{
  "name": "CPU Load Imbalance",
  "tag": "cpu_imbalance",
  "parameters": ["core_load_imbalance_threshold_factor"],
  "metrics": ["cpu_load"],
  "requirements": ["job.numCores > 1", "job.duration > 600"],
  "variables": [
    {
      "name": "load_variance",
      "expr": "(cpu_load.max - cpu_load.min) / cpu_load.avg"
    }
  ],
  "rule": "load_variance > core_load_imbalance_threshold_factor",
  "hint": "Relative CPU load spread across cores is {{printf \"%.2f\" .load_variance}}"
}
```
This detects jobs where CPU load is unevenly distributed across cores.
## Reference
### Configuration Options
**Main Configuration (`config.json`)**:
- `enable-job-taggers` (boolean, default: `false`) - Enables the automatic job tagging system
  - Must be set to `true` to activate automatic tagging on job start/stop events
  - Does not affect the `-apply-tags` command line option
**Command Line Options**:
- `-apply-tags` - Apply all tagging rules to existing jobs in the database
  - Works independently of the `enable-job-taggers` configuration
  - Useful for retroactively tagging jobs or re-evaluating with updated rules
### Default Configuration Location
The example configurations are provided in:
- `configs/tagger/apps/` - Example application patterns (16 applications)
- `configs/tagger/jobclasses/` - Example classification rules (3 rules)
Copy these to `var/tagger/` and customize for your environment.
### Tag Types
- `app` - Application tags (e.g., "vasp", "gromacs")
- `jobClass` - Classification tags (e.g., "lowutilization", "highload")
Tags can be queried and filtered in the ClusterCockpit UI and API.

30
go.mod

@@ -11,7 +11,7 @@ tool (
require ( require (
github.com/99designs/gqlgen v0.17.85 github.com/99designs/gqlgen v0.17.85
github.com/ClusterCockpit/cc-lib v1.0.2 github.com/ClusterCockpit/cc-lib/v2 v2.1.0
github.com/Masterminds/squirrel v1.5.4 github.com/Masterminds/squirrel v1.5.4
github.com/aws/aws-sdk-go-v2 v1.41.1 github.com/aws/aws-sdk-go-v2 v1.41.1
github.com/aws/aws-sdk-go-v2/config v1.32.6 github.com/aws/aws-sdk-go-v2/config v1.32.6
@@ -21,7 +21,6 @@ require (
github.com/expr-lang/expr v1.17.7 github.com/expr-lang/expr v1.17.7
github.com/go-co-op/gocron/v2 v2.19.0 github.com/go-co-op/gocron/v2 v2.19.0
github.com/go-ldap/ldap/v3 v3.4.12 github.com/go-ldap/ldap/v3 v3.4.12
github.com/go-sql-driver/mysql v1.9.3
github.com/golang-jwt/jwt/v5 v5.3.0 github.com/golang-jwt/jwt/v5 v5.3.0
github.com/golang-migrate/migrate/v4 v4.19.1 github.com/golang-migrate/migrate/v4 v4.19.1
github.com/google/gops v0.3.28 github.com/google/gops v0.3.28
@@ -34,8 +33,6 @@ require (
github.com/linkedin/goavro/v2 v2.14.1 github.com/linkedin/goavro/v2 v2.14.1
github.com/mattn/go-sqlite3 v1.14.33 github.com/mattn/go-sqlite3 v1.14.33
github.com/nats-io/nats.go v1.47.0 github.com/nats-io/nats.go v1.47.0
github.com/prometheus/client_golang v1.23.2
github.com/prometheus/common v0.67.4
github.com/qustavo/sqlhooks/v2 v2.1.0 github.com/qustavo/sqlhooks/v2 v2.1.0
github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 github.com/santhosh-tekuri/jsonschema/v5 v5.3.1
github.com/stretchr/testify v1.11.1 github.com/stretchr/testify v1.11.1
@@ -48,10 +45,10 @@ require (
) )
require ( require (
filippo.io/edwards25519 v1.1.0 // indirect
github.com/Azure/go-ntlmssp v0.0.0-20221128193559-754e69321358 // indirect github.com/Azure/go-ntlmssp v0.0.0-20221128193559-754e69321358 // indirect
github.com/KyleBanks/depth v1.2.1 // indirect github.com/KyleBanks/depth v1.2.1 // indirect
github.com/agnivade/levenshtein v1.2.1 // indirect github.com/agnivade/levenshtein v1.2.1 // indirect
github.com/apapsch/go-jsonmerge/v2 v2.0.0 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.4 // indirect github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.4 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.17 // indirect github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.17 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.17 // indirect github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.17 // indirect
@@ -67,8 +64,6 @@ require (
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.13 // indirect github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.13 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.41.6 // indirect github.com/aws/aws-sdk-go-v2/service/sts v1.41.6 // indirect
github.com/aws/smithy-go v1.24.0 // indirect github.com/aws/smithy-go v1.24.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.7 // indirect github.com/cpuguy83/go-md2man/v2 v2.0.7 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect github.com/felixge/httpsnoop v1.0.4 // indirect
@@ -88,28 +83,27 @@ require (
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
github.com/goccy/go-yaml v1.19.0 // indirect github.com/goccy/go-yaml v1.19.0 // indirect
github.com/golang/snappy v0.0.4 // indirect github.com/golang/snappy v0.0.4 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/uuid v1.6.0 // indirect github.com/google/uuid v1.6.0 // indirect
github.com/gorilla/securecookie v1.1.2 // indirect github.com/gorilla/securecookie v1.1.2 // indirect
github.com/gorilla/websocket v1.5.3 // indirect github.com/gorilla/websocket v1.5.3 // indirect
github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect
github.com/influxdata/influxdb-client-go/v2 v2.14.0 // indirect
github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf // indirect
github.com/jonboulle/clockwork v0.5.0 // indirect github.com/jonboulle/clockwork v0.5.0 // indirect
github.com/jpillora/backoff v1.0.0 // indirect github.com/klauspost/compress v1.18.2 // indirect
github.com/json-iterator/go v1.1.12 // indirect github.com/kr/pretty v0.3.1 // indirect
github.com/klauspost/compress v1.18.1 // indirect
github.com/lann/builder v0.0.0-20180802200727-47ae307949d0 // indirect github.com/lann/builder v0.0.0-20180802200727-47ae307949d0 // indirect
github.com/lann/ps v0.0.0-20150810152359-62de8c46ede0 // indirect github.com/lann/ps v0.0.0-20150810152359-62de8c46ede0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect github.com/nats-io/nkeys v0.4.12 // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f // indirect
github.com/nats-io/nkeys v0.4.11 // indirect
github.com/nats-io/nuid v1.0.1 // indirect github.com/nats-io/nuid v1.0.1 // indirect
github.com/oapi-codegen/runtime v1.1.1 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/prometheus/client_model v0.6.2 // indirect github.com/prometheus/common v0.67.4 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
github.com/robfig/cron/v3 v3.0.1 // indirect github.com/robfig/cron/v3 v3.0.1 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect github.com/russross/blackfriday/v2 v2.1.0 // indirect
github.com/sosodev/duration v1.3.1 // indirect github.com/sosodev/duration v1.3.1 // indirect
github.com/stmcginnis/gofish v0.20.0 // indirect
github.com/stretchr/objx v0.5.2 // indirect github.com/stretchr/objx v0.5.2 // indirect
github.com/swaggo/files v1.0.1 // indirect github.com/swaggo/files v1.0.1 // indirect
github.com/urfave/cli/v2 v2.27.7 // indirect github.com/urfave/cli/v2 v2.27.7 // indirect
@@ -117,13 +111,13 @@ require (
github.com/xrash/smetrics v0.0.0-20250705151800-55b8f293f342 // indirect github.com/xrash/smetrics v0.0.0-20250705151800-55b8f293f342 // indirect
go.yaml.in/yaml/v2 v2.4.3 // indirect go.yaml.in/yaml/v2 v2.4.3 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b // indirect
golang.org/x/mod v0.31.0 // indirect golang.org/x/mod v0.31.0 // indirect
golang.org/x/net v0.48.0 // indirect golang.org/x/net v0.48.0 // indirect
golang.org/x/sync v0.19.0 // indirect golang.org/x/sync v0.19.0 // indirect
golang.org/x/sys v0.39.0 // indirect golang.org/x/sys v0.39.0 // indirect
golang.org/x/text v0.32.0 // indirect golang.org/x/text v0.32.0 // indirect
golang.org/x/tools v0.40.0 // indirect golang.org/x/tools v0.40.0 // indirect
google.golang.org/protobuf v1.36.11 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect gopkg.in/yaml.v3 v3.0.1 // indirect
sigs.k8s.io/yaml v1.6.0 // indirect sigs.k8s.io/yaml v1.6.0 // indirect
) )

91
go.sum

@@ -2,22 +2,19 @@ filippo.io/edwards25519 v1.1.0 h1:FNf4tywRC1HmFuKW5xopWpigGjJKiJSV0Cqo0cJWDaA=
filippo.io/edwards25519 v1.1.0/go.mod h1:BxyFTGdWcka3PhytdK4V28tE5sGfRvvvRV7EaN4VDT4= filippo.io/edwards25519 v1.1.0/go.mod h1:BxyFTGdWcka3PhytdK4V28tE5sGfRvvvRV7EaN4VDT4=
github.com/99designs/gqlgen v0.17.85 h1:EkGx3U2FDcxQm8YDLQSpXIAVmpDyZ3IcBMOJi2nH1S0= github.com/99designs/gqlgen v0.17.85 h1:EkGx3U2FDcxQm8YDLQSpXIAVmpDyZ3IcBMOJi2nH1S0=
github.com/99designs/gqlgen v0.17.85/go.mod h1:yvs8s0bkQlRfqg03YXr3eR4OQUowVhODT/tHzCXnbOU= github.com/99designs/gqlgen v0.17.85/go.mod h1:yvs8s0bkQlRfqg03YXr3eR4OQUowVhODT/tHzCXnbOU=
github.com/Azure/go-ansiterm v0.0.0-20230124172434-306776ec8161 h1:L/gRVlceqvL25UVaW/CKtUDjefjrs0SPonmDGUVOYP0=
github.com/Azure/go-ansiterm v0.0.0-20230124172434-306776ec8161/go.mod h1:xomTg63KZ2rFqZQzSB4Vz2SUXa1BpHTVz9L5PTmPC4E=
github.com/Azure/go-ntlmssp v0.0.0-20221128193559-754e69321358 h1:mFRzDkZVAjdal+s7s0MwaRv9igoPqLRdzOLzw/8Xvq8= github.com/Azure/go-ntlmssp v0.0.0-20221128193559-754e69321358 h1:mFRzDkZVAjdal+s7s0MwaRv9igoPqLRdzOLzw/8Xvq8=
github.com/Azure/go-ntlmssp v0.0.0-20221128193559-754e69321358/go.mod h1:chxPXzSsl7ZWRAuOIE23GDNzjWuZquvFlgA8xmpunjU= github.com/Azure/go-ntlmssp v0.0.0-20221128193559-754e69321358/go.mod h1:chxPXzSsl7ZWRAuOIE23GDNzjWuZquvFlgA8xmpunjU=
github.com/ClusterCockpit/cc-lib v1.0.2 h1:ZWn3oZkXgxrr3zSigBdlOOfayZ4Om4xL20DhmritPPg= github.com/ClusterCockpit/cc-lib/v2 v2.1.0 h1:B6l6h0IjfEuY9DU6aVM3fSsj24lQ1eudXK9QTKmJjqg=
github.com/ClusterCockpit/cc-lib v1.0.2/go.mod h1:UGdOvXEnjFqlnPSxtvtFwO6BtXYW6NnXFoud9FtN93k= github.com/ClusterCockpit/cc-lib/v2 v2.1.0/go.mod h1:JuxMAuEOaLLNEnnL9U3ejha8kMvsSatLdKPZEgJw6iw=
github.com/KyleBanks/depth v1.2.1 h1:5h8fQADFrWtarTdtDudMmGsC7GPbOAu6RVB3ffsVFHc= github.com/KyleBanks/depth v1.2.1 h1:5h8fQADFrWtarTdtDudMmGsC7GPbOAu6RVB3ffsVFHc=
github.com/KyleBanks/depth v1.2.1/go.mod h1:jzSb9d0L43HxTQfT+oSA1EEp2q+ne2uh6XgeJcm8brE= github.com/KyleBanks/depth v1.2.1/go.mod h1:jzSb9d0L43HxTQfT+oSA1EEp2q+ne2uh6XgeJcm8brE=
github.com/Masterminds/squirrel v1.5.4 h1:uUcX/aBc8O7Fg9kaISIUsHXdKuqehiXAMQTYX8afzqM= github.com/Masterminds/squirrel v1.5.4 h1:uUcX/aBc8O7Fg9kaISIUsHXdKuqehiXAMQTYX8afzqM=
github.com/Masterminds/squirrel v1.5.4/go.mod h1:NNaOrjSoIDfDA40n7sr2tPNZRfjzjA400rg+riTZj10= github.com/Masterminds/squirrel v1.5.4/go.mod h1:NNaOrjSoIDfDA40n7sr2tPNZRfjzjA400rg+riTZj10=
github.com/Microsoft/go-winio v0.6.2 h1:F2VQgta7ecxGYO8k3ZZz3RS8fVIXVxONVUPlNERoyfY=
github.com/Microsoft/go-winio v0.6.2/go.mod h1:yd8OoFMLzJbo9gZq8j5qaps8bJ9aShtEA8Ipt1oGCvU=
github.com/NVIDIA/go-nvml v0.13.0-1 h1:OLX8Jq3dONuPOQPC7rndB6+iDmDakw0XTYgzMxObkEw= github.com/NVIDIA/go-nvml v0.13.0-1 h1:OLX8Jq3dONuPOQPC7rndB6+iDmDakw0XTYgzMxObkEw=
github.com/NVIDIA/go-nvml v0.13.0-1/go.mod h1:+KNA7c7gIBH7SKSJ1ntlwkfN80zdx8ovl4hrK3LmPt4= github.com/NVIDIA/go-nvml v0.13.0-1/go.mod h1:+KNA7c7gIBH7SKSJ1ntlwkfN80zdx8ovl4hrK3LmPt4=
github.com/PuerkitoBio/goquery v1.11.0 h1:jZ7pwMQXIITcUXNH83LLk+txlaEy6NVOfTuP43xxfqw= github.com/PuerkitoBio/goquery v1.11.0 h1:jZ7pwMQXIITcUXNH83LLk+txlaEy6NVOfTuP43xxfqw=
github.com/PuerkitoBio/goquery v1.11.0/go.mod h1:wQHgxUOU3JGuj3oD/QFfxUdlzW6xPHfqyHre6VMY4DQ= github.com/PuerkitoBio/goquery v1.11.0/go.mod h1:wQHgxUOU3JGuj3oD/QFfxUdlzW6xPHfqyHre6VMY4DQ=
github.com/RaveNoX/go-jsoncommentstrip v1.0.0/go.mod h1:78ihd09MekBnJnxpICcwzCMzGrKSKYe4AqU6PDYYpjk=
github.com/agnivade/levenshtein v1.2.1 h1:EHBY3UOn1gwdy/VbFwgo4cxecRznFk7fKWN1KOX7eoM= github.com/agnivade/levenshtein v1.2.1 h1:EHBY3UOn1gwdy/VbFwgo4cxecRznFk7fKWN1KOX7eoM=
github.com/agnivade/levenshtein v1.2.1/go.mod h1:QVVI16kDrtSuwcpd0p1+xMC6Z/VfhtCyDIjcwga4/DU= github.com/agnivade/levenshtein v1.2.1/go.mod h1:QVVI16kDrtSuwcpd0p1+xMC6Z/VfhtCyDIjcwga4/DU=
github.com/alexbrainman/sspi v0.0.0-20250919150558-7d374ff0d59e h1:4dAU9FXIyQktpoUAgOJK3OTFc/xug0PCXYCqU0FgDKI= github.com/alexbrainman/sspi v0.0.0-20250919150558-7d374ff0d59e h1:4dAU9FXIyQktpoUAgOJK3OTFc/xug0PCXYCqU0FgDKI=
@@ -26,6 +23,8 @@ github.com/andreyvit/diff v0.0.0-20170406064948-c7f18ee00883 h1:bvNMNQO63//z+xNg
github.com/andreyvit/diff v0.0.0-20170406064948-c7f18ee00883/go.mod h1:rCTlJbsFo29Kk6CurOXKm700vrz8f0KW0JNfpkRJY/8= github.com/andreyvit/diff v0.0.0-20170406064948-c7f18ee00883/go.mod h1:rCTlJbsFo29Kk6CurOXKm700vrz8f0KW0JNfpkRJY/8=
github.com/andybalholm/cascadia v1.3.3 h1:AG2YHrzJIm4BZ19iwJ/DAua6Btl3IwJX+VI4kktS1LM= github.com/andybalholm/cascadia v1.3.3 h1:AG2YHrzJIm4BZ19iwJ/DAua6Btl3IwJX+VI4kktS1LM=
github.com/andybalholm/cascadia v1.3.3/go.mod h1:xNd9bqTn98Ln4DwST8/nG+H0yuB8Hmgu1YHNnWw0GeA= github.com/andybalholm/cascadia v1.3.3/go.mod h1:xNd9bqTn98Ln4DwST8/nG+H0yuB8Hmgu1YHNnWw0GeA=
github.com/antithesishq/antithesis-sdk-go v0.5.0-default-no-op h1:Ucf+QxEKMbPogRO5guBNe5cgd9uZgfoJLOYs8WWhtjM=
github.com/antithesishq/antithesis-sdk-go v0.5.0-default-no-op/go.mod h1:IUpT2DPAKh6i/YhSbt6Gl3v2yvUZjmKncl7U91fup7E=
github.com/apapsch/go-jsonmerge/v2 v2.0.0 h1:axGnT1gRIfimI7gJifB699GoE/oq+F2MU7Dml6nw9rQ= github.com/apapsch/go-jsonmerge/v2 v2.0.0 h1:axGnT1gRIfimI7gJifB699GoE/oq+F2MU7Dml6nw9rQ=
github.com/apapsch/go-jsonmerge/v2 v2.0.0/go.mod h1:lvDnEdqiQrp0O42VQGgmlKpxL1AP2+08jFMw88y4klk= github.com/apapsch/go-jsonmerge/v2 v2.0.0/go.mod h1:lvDnEdqiQrp0O42VQGgmlKpxL1AP2+08jFMw88y4klk=
github.com/arbovm/levenshtein v0.0.0-20160628152529-48b4e1c0c4d0 h1:jfIu9sQUG6Ig+0+Ap1h4unLjW6YQJpKZVmUzxsD4E/Q= github.com/arbovm/levenshtein v0.0.0-20160628152529-48b4e1c0c4d0 h1:jfIu9sQUG6Ig+0+Ap1h4unLjW6YQJpKZVmUzxsD4E/Q=
@@ -70,12 +69,9 @@ github.com/aws/smithy-go v1.24.0 h1:LpilSUItNPFr1eY85RYgTIg5eIEPtvFbskaFcmmIUnk=
github.com/aws/smithy-go v1.24.0/go.mod h1:LEj2LM3rBRQJxPZTB4KuzZkaZYnZPnvgIhb4pu07mx0= github.com/aws/smithy-go v1.24.0/go.mod h1:LEj2LM3rBRQJxPZTB4KuzZkaZYnZPnvgIhb4pu07mx0=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM= github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw= github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/bmatcuk/doublestar v1.1.1/go.mod h1:UD6OnuiIn0yFxxA2le/rnRU1G4RaI4UvFv1sNto9p6w=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
github.com/containerd/errdefs v1.0.0/go.mod h1:+YBYIdtsnF4Iw6nWZhJcqGSg/dwvV7tyJ/kCkyJ2k+M=
github.com/containerd/errdefs/pkg v0.3.0 h1:9IKJ06FvyNlexW690DXuQNx2KA2cUJXx151Xdx3ZPPE=
github.com/containerd/errdefs/pkg v0.3.0/go.mod h1:NJw6s9HwNuRhnjJhM7pylWwMyAkmCQvQ4GpJHEqRLVk=
github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc= github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc=
github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8= github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8=
github.com/cpuguy83/go-md2man/v2 v2.0.7 h1:zbFlGlXEAKlwXpmvle3d8Oe3YnkKIK4xSRTd3sHPnBo= github.com/cpuguy83/go-md2man/v2 v2.0.7 h1:zbFlGlXEAKlwXpmvle3d8Oe3YnkKIK4xSRTd3sHPnBo=
@@ -87,16 +83,6 @@ github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dgryski/trifles v0.0.0-20230903005119-f50d829f2e54 h1:SG7nF6SRlWhcT7cNTs5R6Hk4V2lcmLz2NsG2VnInyNo= github.com/dgryski/trifles v0.0.0-20230903005119-f50d829f2e54 h1:SG7nF6SRlWhcT7cNTs5R6Hk4V2lcmLz2NsG2VnInyNo=
github.com/dgryski/trifles v0.0.0-20230903005119-f50d829f2e54/go.mod h1:if7Fbed8SFyPtHLHbg49SI7NAdJiC5WIA09pe59rfAA= github.com/dgryski/trifles v0.0.0-20230903005119-f50d829f2e54/go.mod h1:if7Fbed8SFyPtHLHbg49SI7NAdJiC5WIA09pe59rfAA=
github.com/dhui/dktest v0.4.6 h1:+DPKyScKSEp3VLtbMDHcUq6V5Lm5zfZZVb0Sk7Ahom4=
github.com/dhui/dktest v0.4.6/go.mod h1:JHTSYDtKkvFNFHJKqCzVzqXecyv+tKt8EzceOmQOgbU=
github.com/distribution/reference v0.6.0 h1:0IXCQ5g4/QMHHkarYzh5l+u8T3t73zM5QvfrDyIgxBk=
github.com/distribution/reference v0.6.0/go.mod h1:BbU0aIcezP1/5jX/8MP0YiH4SdvB5Y4f/wlDRiLyi3E=
github.com/docker/docker v28.3.3+incompatible h1:Dypm25kh4rmk49v1eiVbsAtpAsYURjYkaKubwuBdxEI=
github.com/docker/docker v28.3.3+incompatible/go.mod h1:eEKB0N0r5NX/I1kEveEz05bcu8tLC/8azJZsviup8Sk=
github.com/docker/go-connections v0.5.0 h1:USnMq7hx7gwdVZq1L49hLXaFtUdTADjXGp+uj1Br63c=
github.com/docker/go-connections v0.5.0/go.mod h1:ov60Kzw0kKElRwhNs9UlUHAE/F9Fe6GLaXnqyDdmEXc=
github.com/docker/go-units v0.5.0 h1:69rxXcBk27SvSaaxTtLh/8llcHD8vYHT7WSdRZ/jvr4=
github.com/docker/go-units v0.5.0/go.mod h1:fgPhTUdO+D/Jk86RDLlptpiXQzgHJF7gydDDbaIK4Dk=
github.com/expr-lang/expr v1.17.7 h1:Q0xY/e/2aCIp8g9s/LGvMDCC5PxYlvHgDZRQ4y16JX8= github.com/expr-lang/expr v1.17.7 h1:Q0xY/e/2aCIp8g9s/LGvMDCC5PxYlvHgDZRQ4y16JX8=
github.com/expr-lang/expr v1.17.7/go.mod h1:8/vRC7+7HBzESEqt5kKpYXxrxkr31SaO8r40VO/1IT4= github.com/expr-lang/expr v1.17.7/go.mod h1:8/vRC7+7HBzESEqt5kKpYXxrxkr31SaO8r40VO/1IT4=
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg= github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
@@ -115,10 +101,6 @@ github.com/go-jose/go-jose/v4 v4.1.3 h1:CVLmWDhDVRa6Mi/IgCgaopNosCaHz7zrMeF9MlZR
github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08= github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/go-ldap/ldap/v3 v3.4.12 h1:1b81mv7MagXZ7+1r7cLTWmyuTqVqdwbtJSjC0DAp9s4= github.com/go-ldap/ldap/v3 v3.4.12 h1:1b81mv7MagXZ7+1r7cLTWmyuTqVqdwbtJSjC0DAp9s4=
github.com/go-ldap/ldap/v3 v3.4.12/go.mod h1:+SPAGcTtOfmGsCb3h1RFiq4xpp4N636G75OEace8lNo= github.com/go-ldap/ldap/v3 v3.4.12/go.mod h1:+SPAGcTtOfmGsCb3h1RFiq4xpp4N636G75OEace8lNo=
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
github.com/go-openapi/jsonpointer v0.22.3 h1:dKMwfV4fmt6Ah90zloTbUKWMD+0he+12XYAsPotrkn8= github.com/go-openapi/jsonpointer v0.22.3 h1:dKMwfV4fmt6Ah90zloTbUKWMD+0he+12XYAsPotrkn8=
github.com/go-openapi/jsonpointer v0.22.3/go.mod h1:0lBbqeRsQ5lIanv3LHZBrmRGHLHcQoOXQnf88fHlGWo= github.com/go-openapi/jsonpointer v0.22.3/go.mod h1:0lBbqeRsQ5lIanv3LHZBrmRGHLHcQoOXQnf88fHlGWo=
github.com/go-openapi/jsonreference v0.21.3 h1:96Dn+MRPa0nYAR8DR1E03SblB5FJvh7W6krPI0Z7qMc= github.com/go-openapi/jsonreference v0.21.3 h1:96Dn+MRPa0nYAR8DR1E03SblB5FJvh7W6krPI0Z7qMc=
@@ -147,15 +129,12 @@ github.com/go-openapi/testify/enable/yaml/v2 v2.0.2/go.mod h1:kme83333GCtJQHXQ8U
github.com/go-openapi/testify/v2 v2.0.2 h1:X999g3jeLcoY8qctY/c/Z8iBHTbwLz7R2WXd6Ub6wls= github.com/go-openapi/testify/v2 v2.0.2 h1:X999g3jeLcoY8qctY/c/Z8iBHTbwLz7R2WXd6Ub6wls=
github.com/go-openapi/testify/v2 v2.0.2/go.mod h1:HCPmvFFnheKK2BuwSA0TbbdxJ3I16pjwMkYkP4Ywn54= github.com/go-openapi/testify/v2 v2.0.2/go.mod h1:HCPmvFFnheKK2BuwSA0TbbdxJ3I16pjwMkYkP4Ywn54=
github.com/go-sql-driver/mysql v1.4.1/go.mod h1:zAC/RDZ24gD3HViQzih4MyKcchzm+sOG5ZlKdlhCg5w= github.com/go-sql-driver/mysql v1.4.1/go.mod h1:zAC/RDZ24gD3HViQzih4MyKcchzm+sOG5ZlKdlhCg5w=
github.com/go-sql-driver/mysql v1.8.1 h1:LedoTUt/eveggdHS9qUFC1EFSa8bU2+1pZjSRpvNJ1Y=
github.com/go-sql-driver/mysql v1.8.1/go.mod h1:wEBSXgmK//2ZFJyE+qWnIsVGmvmEKlqwuVSjsCm7DZg= github.com/go-sql-driver/mysql v1.8.1/go.mod h1:wEBSXgmK//2ZFJyE+qWnIsVGmvmEKlqwuVSjsCm7DZg=
github.com/go-sql-driver/mysql v1.9.3 h1:U/N249h2WzJ3Ukj8SowVFjdtZKfu9vlLZxjPXV1aweo=
github.com/go-sql-driver/mysql v1.9.3/go.mod h1:qn46aNg1333BRMNU69Lq93t8du/dwxI64Gl8i5p1WMU=
github.com/go-viper/mapstructure/v2 v2.4.0 h1:EBsztssimR/CONLSZZ04E8qAkxNYq4Qp9LvH92wZUgs= github.com/go-viper/mapstructure/v2 v2.4.0 h1:EBsztssimR/CONLSZZ04E8qAkxNYq4Qp9LvH92wZUgs=
github.com/go-viper/mapstructure/v2 v2.4.0/go.mod h1:oJDH3BJKyqBA2TXFhDsKDGDTlndYOZ6rGS0BRZIxGhM= github.com/go-viper/mapstructure/v2 v2.4.0/go.mod h1:oJDH3BJKyqBA2TXFhDsKDGDTlndYOZ6rGS0BRZIxGhM=
github.com/goccy/go-yaml v1.19.0 h1:EmkZ9RIsX+Uq4DYFowegAuJo8+xdX3T/2dwNPXbxEYE= github.com/goccy/go-yaml v1.19.0 h1:EmkZ9RIsX+Uq4DYFowegAuJo8+xdX3T/2dwNPXbxEYE=
github.com/goccy/go-yaml v1.19.0/go.mod h1:XBurs7gK8ATbW4ZPGKgcbrY1Br56PdM69F7LkFRi1kA= github.com/goccy/go-yaml v1.19.0/go.mod h1:XBurs7gK8ATbW4ZPGKgcbrY1Br56PdM69F7LkFRi1kA=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo= github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE= github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang-migrate/migrate/v4 v4.19.1 h1:OCyb44lFuQfYXYLx1SCxPZQGU7mcaZ7gH9yH4jSFbBA= github.com/golang-migrate/migrate/v4 v4.19.1 h1:OCyb44lFuQfYXYLx1SCxPZQGU7mcaZ7gH9yH4jSFbBA=
@@ -167,7 +146,8 @@ github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8= github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU= github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= github.com/google/go-tpm v0.9.7 h1:u89J4tUUeDTlH8xxC3CTW7OHZjbjKoHdQ9W7gCUhtxA=
github.com/google/go-tpm v0.9.7/go.mod h1:h9jEsEECg7gtLis0upRBQU+GhYVH6jMjrFxI8u6bVUY=
github.com/google/gofuzz v1.2.0 h1:xRy4A+RhZaiKjJ1bPfwQ8sedCA+YS2YcCHW6ec7JMi0= github.com/google/gofuzz v1.2.0 h1:xRy4A+RhZaiKjJ1bPfwQ8sedCA+YS2YcCHW6ec7JMi0=
github.com/google/gofuzz v1.2.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= github.com/google/gofuzz v1.2.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/gops v0.3.28 h1:2Xr57tqKAmQYRAfG12E+yLcoa2Y42UJo2lOrUFL9ark= github.com/google/gops v0.3.28 h1:2Xr57tqKAmQYRAfG12E+yLcoa2Y42UJo2lOrUFL9ark=
@@ -217,12 +197,9 @@ github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4= github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
github.com/jonboulle/clockwork v0.5.0 h1:Hyh9A8u51kptdkR+cqRpT1EebBwTn1oK9YfGYbdFz6I= github.com/jonboulle/clockwork v0.5.0 h1:Hyh9A8u51kptdkR+cqRpT1EebBwTn1oK9YfGYbdFz6I=
github.com/jonboulle/clockwork v0.5.0/go.mod h1:3mZlmanh0g2NDKO5TWZVJAfofYk64M7XN3SzBPjZF60= github.com/jonboulle/clockwork v0.5.0/go.mod h1:3mZlmanh0g2NDKO5TWZVJAfofYk64M7XN3SzBPjZF60=
github.com/jpillora/backoff v1.0.0 h1:uvFg412JmmHBHw7iwprIxkPMI+sGQ4kzOWsMeHnm2EA= github.com/juju/gnuflag v0.0.0-20171113085948-2ce1bb71843d/go.mod h1:2PavIy+JPciBPrBUjwbNvtwB6RQlve+hkpll6QSNmOE=
github.com/jpillora/backoff v1.0.0/go.mod h1:J/6gKK9jxlEcS3zixgDgUAsiuZ7yrSoa/FX5e0EB2j4= github.com/klauspost/compress v1.18.2 h1:iiPHWW0YrcFgpBYhsA6D1+fqHssJscY/Tm/y2Uqnapk=
github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM= github.com/klauspost/compress v1.18.2/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
github.com/klauspost/compress v1.18.1 h1:bcSGx7UbpBqMChDtsF28Lw6v/G94LPrrbMbdC3JH2co=
github.com/klauspost/compress v1.18.1/go.mod h1:ZQFFVG+MdnR0P+l6wpXgIL4NTtwiKIdBnrBd8Nrxr+0=
github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI= github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
@@ -243,37 +220,25 @@ github.com/mattn/go-sqlite3 v1.10.0/go.mod h1:FPy6KqzDD04eiIsT53CuJW3U88zkxoIYsO
github.com/mattn/go-sqlite3 v1.14.22/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y= github.com/mattn/go-sqlite3 v1.14.22/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
github.com/mattn/go-sqlite3 v1.14.33 h1:A5blZ5ulQo2AtayQ9/limgHEkFreKj1Dv226a1K73s0= github.com/mattn/go-sqlite3 v1.14.33 h1:A5blZ5ulQo2AtayQ9/limgHEkFreKj1Dv226a1K73s0=
github.com/mattn/go-sqlite3 v1.14.33/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y= github.com/mattn/go-sqlite3 v1.14.33/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
github.com/moby/docker-image-spec v1.3.1 h1:jMKff3w6PgbfSa69GfNg+zN/XLhfXJGnEx3Nl2EsFP0= github.com/minio/highwayhash v1.0.4-0.20251030100505-070ab1a87a76 h1:KGuD/pM2JpL9FAYvBrnBBeENKZNh6eNtjqytV6TYjnk=
github.com/moby/docker-image-spec v1.3.1/go.mod h1:eKmb5VW8vQEh/BAr2yvVNvuiJuY6UIocYsFu/DxxRpo= github.com/minio/highwayhash v1.0.4-0.20251030100505-070ab1a87a76/go.mod h1:GGYsuwP/fPD6Y9hMiXuapVvlIUEhFhMTh0rxU3ik1LQ=
github.com/moby/term v0.5.0 h1:xt8Q1nalod/v7BqbG21f8mQPqH+xAaC9C3N3wfWbVP0=
github.com/moby/term v0.5.0/go.mod h1:8FzsFHVUBGZdbDsJw/ot+X+d5HLUbvklYLJ9uGfcI3Y=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
github.com/morikuni/aec v1.0.0 h1:nP9CBfwrvYnBRgY6qfDQkygYDmYwOilePFkwzv4dU8A=
github.com/morikuni/aec v1.0.0/go.mod h1:BbKIizmSmc5MMPqRYbxO4ZU0S0+P200+tUnFx7PXmsc=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA= github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ= github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f h1:KUppIJq7/+SVif2QVs3tOP0zanoHgBEVAwHxUSIzRqU= github.com/nats-io/jwt/v2 v2.8.0 h1:K7uzyz50+yGZDO5o772eRE7atlcSEENpL7P+b74JV1g=
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U= github.com/nats-io/jwt/v2 v2.8.0/go.mod h1:me11pOkwObtcBNR8AiMrUbtVOUGkqYjMQZ6jnSdVUIA=
github.com/nats-io/nats-server/v2 v2.12.3 h1:KRv+1n7lddMVgkJPQer+pt36TcO0ENxjilBmeWdjcHs=
github.com/nats-io/nats-server/v2 v2.12.3/go.mod h1:MQXjG9WjyXKz9koWzUc3jYUMKD8x3CLmTNy91IQQz3Y=
github.com/nats-io/nats.go v1.47.0 h1:YQdADw6J/UfGUd2Oy6tn4Hq6YHxCaJrVKayxxFqYrgM= github.com/nats-io/nats.go v1.47.0 h1:YQdADw6J/UfGUd2Oy6tn4Hq6YHxCaJrVKayxxFqYrgM=
github.com/nats-io/nats.go v1.47.0/go.mod h1:iRWIPokVIFbVijxuMQq4y9ttaBTMe0SFdlZfMDd+33g= github.com/nats-io/nats.go v1.47.0/go.mod h1:iRWIPokVIFbVijxuMQq4y9ttaBTMe0SFdlZfMDd+33g=
github.com/nats-io/nkeys v0.4.11 h1:q44qGV008kYd9W1b1nEBkNzvnWxtRSQ7A8BoqRrcfa0= github.com/nats-io/nkeys v0.4.12 h1:nssm7JKOG9/x4J8II47VWCL1Ds29avyiQDRn0ckMvDc=
github.com/nats-io/nkeys v0.4.11/go.mod h1:szDimtgmfOi9n25JpfIdGw12tZFYXqhGxjhVxsatHVE= github.com/nats-io/nkeys v0.4.12/go.mod h1:MT59A1HYcjIcyQDJStTfaOY6vhy9XTUjOFo+SVsvpBg=
github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw= github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw=
github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c= github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c=
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLANZcx1PVRCS0qkT7pwLkGfwJo4zjcN/Tysno= github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLANZcx1PVRCS0qkT7pwLkGfwJo4zjcN/Tysno=
github.com/oapi-codegen/runtime v1.1.1 h1:EXLHh0DXIJnWhdRPN2w4MXAzFyE4CskzhNLUmtpMYro= github.com/oapi-codegen/runtime v1.1.1 h1:EXLHh0DXIJnWhdRPN2w4MXAzFyE4CskzhNLUmtpMYro=
github.com/oapi-codegen/runtime v1.1.1/go.mod h1:SK9X900oXmPWilYR5/WKPzt3Kqxn/uS/+lbpREv+eCg= github.com/oapi-codegen/runtime v1.1.1/go.mod h1:SK9X900oXmPWilYR5/WKPzt3Kqxn/uS/+lbpREv+eCg=
github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
github.com/opencontainers/image-spec v1.1.0 h1:8SG7/vwALn54lVB/0yZ/MMwhFrPYtpEHQb2IpWsCzug=
github.com/opencontainers/image-spec v1.1.0/go.mod h1:W4s4sFTMaBeK1BQLXbG4AdM2szdn85PY75RI83NrTrM=
github.com/opentracing/opentracing-go v1.1.0/go.mod h1:UkNAQd3GIcIGf0SeVgPpRdFStlNbqXla1AfSYxPUl2o= github.com/opentracing/opentracing-go v1.1.0/go.mod h1:UkNAQd3GIcIGf0SeVgPpRdFStlNbqXla1AfSYxPUl2o=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U=
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
@@ -289,6 +254,7 @@ github.com/qustavo/sqlhooks/v2 v2.1.0 h1:54yBemHnGHp/7xgT+pxwmIlMSDNYKx5JW5dfRAi
github.com/qustavo/sqlhooks/v2 v2.1.0/go.mod h1:aMREyKo7fOKTwiLuWPsaHRXEmtqG4yREztO0idF83AU= github.com/qustavo/sqlhooks/v2 v2.1.0/go.mod h1:aMREyKo7fOKTwiLuWPsaHRXEmtqG4yREztO0idF83AU=
github.com/robfig/cron/v3 v3.0.1 h1:WdRxkvbJztn8LMz/QEvLN5sBU+xKpSqwwUO1Pjr4qDs= github.com/robfig/cron/v3 v3.0.1 h1:WdRxkvbJztn8LMz/QEvLN5sBU+xKpSqwwUO1Pjr4qDs=
github.com/robfig/cron/v3 v3.0.1/go.mod h1:eQICP3HwyT7UooqI/z+Ov+PtYAWygg1TEWWzGIFLtro= github.com/robfig/cron/v3 v3.0.1/go.mod h1:eQICP3HwyT7UooqI/z+Ov+PtYAWygg1TEWWzGIFLtro=
github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ= github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc= github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc=
github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk= github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=
@@ -299,6 +265,9 @@ github.com/sergi/go-diff v1.3.1 h1:xkr+Oxo4BOQKmkn/B9eMK0g5Kg/983T9DqqPHwYqD+8=
github.com/sergi/go-diff v1.3.1/go.mod h1:aMJSSKb2lpPvRNec0+w3fl7LP9IOFzdc9Pa4NFbPK1I= github.com/sergi/go-diff v1.3.1/go.mod h1:aMJSSKb2lpPvRNec0+w3fl7LP9IOFzdc9Pa4NFbPK1I=
github.com/sosodev/duration v1.3.1 h1:qtHBDMQ6lvMQsL15g4aopM4HEfOaYuhWBw3NPTtlqq4= github.com/sosodev/duration v1.3.1 h1:qtHBDMQ6lvMQsL15g4aopM4HEfOaYuhWBw3NPTtlqq4=
github.com/sosodev/duration v1.3.1/go.mod h1:RQIBBX0+fMLc/D9+Jb/fwvVmo0eZvDDEERAikUR6SDg= github.com/sosodev/duration v1.3.1/go.mod h1:RQIBBX0+fMLc/D9+Jb/fwvVmo0eZvDDEERAikUR6SDg=
github.com/spkg/bom v0.0.0-20160624110644-59b7046e48ad/go.mod h1:qLr4V1qq6nMqFKkMo8ZTx3f+BZEkzsRUY10Xsm2mwU0=
github.com/stmcginnis/gofish v0.20.0 h1:hH2V2Qe898F2wWT1loApnkDUrXXiLKqbSlMaH3Y1n08=
github.com/stmcginnis/gofish v0.20.0/go.mod h1:PzF5i8ecRG9A2ol8XT64npKUunyraJ+7t0kYMpQAtqU=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw= github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
github.com/stretchr/objx v0.5.2 h1:xuMeJ0Sdp5ZMRXx/aWO6RZxdr3beISkG5/G/aIRr3pY= github.com/stretchr/objx v0.5.2 h1:xuMeJ0Sdp5ZMRXx/aWO6RZxdr3beISkG5/G/aIRr3pY=
@@ -325,16 +294,6 @@ github.com/vektah/gqlparser/v2 v2.5.31/go.mod h1:c1I28gSOVNzlfc4WuDlqU7voQnsqI6O
github.com/xrash/smetrics v0.0.0-20250705151800-55b8f293f342 h1:FnBeRrxr7OU4VvAzt5X7s6266i6cSVkkFPS0TuXWbIg= github.com/xrash/smetrics v0.0.0-20250705151800-55b8f293f342 h1:FnBeRrxr7OU4VvAzt5X7s6266i6cSVkkFPS0TuXWbIg=
github.com/xrash/smetrics v0.0.0-20250705151800-55b8f293f342/go.mod h1:Ohn+xnUBiLI6FVj/9LpzZWtj1/D6lUovWYBkxHVV3aM= github.com/xrash/smetrics v0.0.0-20250705151800-55b8f293f342/go.mod h1:Ohn+xnUBiLI6FVj/9LpzZWtj1/D6lUovWYBkxHVV3aM=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY= github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
go.opentelemetry.io/auto/sdk v1.1.0 h1:cH53jehLUN6UFLY71z+NDOiNJqDdPRaXzTel0sJySYA=
go.opentelemetry.io/auto/sdk v1.1.0/go.mod h1:3wSPjt5PWp2RhlCcmmOial7AvC4DQqZb7a7wCow3W8A=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0 h1:F7Jx+6hwnZ41NSFTO5q4LYDtJRXBf2PD0rNBkeB/lus=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0/go.mod h1:UHB22Z8QsdRDrnAtX4PntOl36ajSxcdUMt1sF7Y6E7Q=
go.opentelemetry.io/otel v1.37.0 h1:9zhNfelUvx0KBfu/gb+ZgeAfAgtWrfHJZcAqFC228wQ=
go.opentelemetry.io/otel v1.37.0/go.mod h1:ehE/umFRLnuLa/vSccNq9oS1ErUlkkK71gMcN34UG8I=
go.opentelemetry.io/otel/metric v1.37.0 h1:mvwbQS5m0tbmqML4NqK+e3aDiO02vsf/WgbsdpcPoZE=
go.opentelemetry.io/otel/metric v1.37.0/go.mod h1:04wGrZurHYKOc+RKeye86GwKiTb9FKm1WHtO+4EVr2E=
go.opentelemetry.io/otel/trace v1.37.0 h1:HLdcFNbRQBE2imdSEgm/kwqmQj1Or1l/7bW6mxVK7z4=
go.opentelemetry.io/otel/trace v1.37.0/go.mod h1:TlgrlQ+PtQO5XFerSPUYG0JSgGyryXewPGyayAWSBS0=
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto= go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE= go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
go.yaml.in/yaml/v2 v2.4.3 h1:6gvOSjQoTB3vt1l+CU+tSyi/HOjfOjRLJ4YwYZGwRO0= go.yaml.in/yaml/v2 v2.4.3 h1:6gvOSjQoTB3vt1l+CU+tSyi/HOjfOjRLJ4YwYZGwRO0=

@@ -52,51 +52,51 @@ models:
- github.com/99designs/gqlgen/graphql.Int64 - github.com/99designs/gqlgen/graphql.Int64
- github.com/99designs/gqlgen/graphql.Int32 - github.com/99designs/gqlgen/graphql.Int32
Job: Job:
model: "github.com/ClusterCockpit/cc-lib/schema.Job" model: "github.com/ClusterCockpit/cc-lib/v2/schema.Job"
fields: fields:
tags: tags:
resolver: true resolver: true
metaData: metaData:
resolver: true resolver: true
Cluster: Cluster:
model: "github.com/ClusterCockpit/cc-lib/schema.Cluster" model: "github.com/ClusterCockpit/cc-lib/v2/schema.Cluster"
fields: fields:
partitions: partitions:
resolver: true resolver: true
# Node: # Node:
# model: "github.com/ClusterCockpit/cc-lib/schema.Node" # model: "github.com/ClusterCockpit/cc-lib/v2/schema.Node"
# fields: # fields:
# metaData: # metaData:
# resolver: true # resolver: true
NullableFloat: { model: "github.com/ClusterCockpit/cc-lib/schema.Float" } NullableFloat: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Float" }
MetricScope: { model: "github.com/ClusterCockpit/cc-lib/schema.MetricScope" } MetricScope: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.MetricScope" }
MetricValue: { model: "github.com/ClusterCockpit/cc-lib/schema.MetricValue" } MetricValue: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.MetricValue" }
JobStatistics: JobStatistics:
{ model: "github.com/ClusterCockpit/cc-lib/schema.JobStatistics" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.JobStatistics" }
GlobalMetricListItem: GlobalMetricListItem:
{ model: "github.com/ClusterCockpit/cc-lib/schema.GlobalMetricListItem" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.GlobalMetricListItem" }
ClusterSupport: ClusterSupport:
{ model: "github.com/ClusterCockpit/cc-lib/schema.ClusterSupport" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.ClusterSupport" }
Tag: { model: "github.com/ClusterCockpit/cc-lib/schema.Tag" } Tag: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Tag" }
Resource: { model: "github.com/ClusterCockpit/cc-lib/schema.Resource" } Resource: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Resource" }
JobState: { model: "github.com/ClusterCockpit/cc-lib/schema.JobState" } JobState: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.JobState" }
Node: { model: "github.com/ClusterCockpit/cc-lib/schema.Node" } Node: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Node" }
SchedulerState: SchedulerState:
{ model: "github.com/ClusterCockpit/cc-lib/schema.SchedulerState" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.SchedulerState" }
HealthState: HealthState:
{ model: "github.com/ClusterCockpit/cc-lib/schema.MonitoringState" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.MonitoringState" }
JobMetric: { model: "github.com/ClusterCockpit/cc-lib/schema.JobMetric" } JobMetric: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.JobMetric" }
Series: { model: "github.com/ClusterCockpit/cc-lib/schema.Series" } Series: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Series" }
MetricStatistics: MetricStatistics:
{ model: "github.com/ClusterCockpit/cc-lib/schema.MetricStatistics" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.MetricStatistics" }
MetricConfig: MetricConfig:
{ model: "github.com/ClusterCockpit/cc-lib/schema.MetricConfig" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.MetricConfig" }
SubClusterConfig: SubClusterConfig:
{ model: "github.com/ClusterCockpit/cc-lib/schema.SubClusterConfig" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.SubClusterConfig" }
Accelerator: { model: "github.com/ClusterCockpit/cc-lib/schema.Accelerator" } Accelerator: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Accelerator" }
Topology: { model: "github.com/ClusterCockpit/cc-lib/schema.Topology" } Topology: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Topology" }
FilterRanges: FilterRanges:
{ model: "github.com/ClusterCockpit/cc-lib/schema.FilterRanges" } { model: "github.com/ClusterCockpit/cc-lib/v2/schema.FilterRanges" }
SubCluster: { model: "github.com/ClusterCockpit/cc-lib/schema.SubCluster" } SubCluster: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.SubCluster" }
StatsSeries: { model: "github.com/ClusterCockpit/cc-lib/schema.StatsSeries" } StatsSeries: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.StatsSeries" }
Unit: { model: "github.com/ClusterCockpit/cc-lib/schema.Unit" } Unit: { model: "github.com/ClusterCockpit/cc-lib/v2/schema.Unit" }

@@ -3,7 +3,7 @@ Description=ClusterCockpit Web Server
Documentation=https://github.com/ClusterCockpit/cc-backend Documentation=https://github.com/ClusterCockpit/cc-backend
Wants=network-online.target Wants=network-online.target
After=network-online.target After=network-online.target
After=mariadb.service mysql.service # Database is file-based SQLite - no service dependency required
[Service] [Service]
WorkingDirectory=/opt/monitoring/cc-backend WorkingDirectory=/opt/monitoring/cc-backend

@@ -23,47 +23,38 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/auth" "github.com/ClusterCockpit/cc-backend/internal/auth"
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/graph" "github.com/ClusterCockpit/cc-backend/internal/graph"
"github.com/ClusterCockpit/cc-backend/internal/metricDataDispatcher" "github.com/ClusterCockpit/cc-backend/internal/metricdispatch"
"github.com/ClusterCockpit/cc-backend/internal/metricdata" "github.com/ClusterCockpit/cc-backend/internal/metricstore"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
ccconf "github.com/ClusterCockpit/cc-lib/ccConfig" ccconf "github.com/ClusterCockpit/cc-lib/v2/ccConfig"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/gorilla/mux" "github.com/gorilla/mux"
_ "github.com/mattn/go-sqlite3" _ "github.com/mattn/go-sqlite3"
) )
func setup(t *testing.T) *api.RestAPI { func setup(t *testing.T) *api.RestAPI {
repository.ResetConnection()
const testconfig = `{ const testconfig = `{
"main": { "main": {
"addr": "0.0.0.0:8080", "addr": "0.0.0.0:8080",
"validate": false, "validate": false,
"apiAllowedIPs": [ "apiAllowedIPs": [
"*" "*"
] ]
}, },
"archive": { "archive": {
"kind": "file", "kind": "file",
"path": "./var/job-archive" "path": "./var/job-archive"
}, },
"auth": { "auth": {
"jwts": { "jwts": {
"max-age": "2m" "max-age": "2m"
}
} }
},
"clusters": [
{
"name": "testcluster",
"metricDataRepository": {"kind": "test", "url": "bla:8081"},
"filterRanges": {
"numNodes": { "from": 1, "to": 64 },
"duration": { "from": 0, "to": 86400 },
"startTime": { "from": "2022-01-01T00:00:00Z", "to": null }
}
}
]
}` }`
const testclusterJSON = `{ const testclusterJSON = `{
"name": "testcluster", "name": "testcluster",
@@ -141,7 +132,7 @@ func setup(t *testing.T) *api.RestAPI {
} }
dbfilepath := filepath.Join(tmpdir, "test.db") dbfilepath := filepath.Join(tmpdir, "test.db")
err := repository.MigrateDB("sqlite3", dbfilepath) err := repository.MigrateDB(dbfilepath)
if err != nil { if err != nil {
t.Fatal(err) t.Fatal(err)
} }
@@ -155,11 +146,7 @@ func setup(t *testing.T) *api.RestAPI {
// Load and check main configuration // Load and check main configuration
if cfg := ccconf.GetPackageConfig("main"); cfg != nil { if cfg := ccconf.GetPackageConfig("main"); cfg != nil {
if clustercfg := ccconf.GetPackageConfig("clusters"); clustercfg != nil { config.Init(cfg)
config.Init(cfg, clustercfg)
} else {
cclog.Abort("Cluster configuration must be present")
}
} else { } else {
cclog.Abort("Main configuration must be present") cclog.Abort("Main configuration must be present")
} }
@@ -171,9 +158,7 @@ func setup(t *testing.T) *api.RestAPI {
t.Fatal(err) t.Fatal(err)
} }
if err := metricdata.Init(); err != nil { // metricstore initialization removed - it's initialized via callback in tests
t.Fatal(err)
}
archiver.Start(repository.GetJobRepository(), context.Background()) archiver.Start(repository.GetJobRepository(), context.Background())
@@ -190,11 +175,9 @@ func setup(t *testing.T) *api.RestAPI {
} }
func cleanup() { func cleanup() {
// Gracefully shutdown archiver with timeout
if err := archiver.Shutdown(5 * time.Second); err != nil { if err := archiver.Shutdown(5 * time.Second); err != nil {
cclog.Warnf("Archiver shutdown timeout in tests: %v", err) cclog.Warnf("Archiver shutdown timeout in tests: %v", err)
} }
// TODO: Clear all caches, reset all modules, etc...
} }
/* /*
@@ -221,7 +204,7 @@ func TestRestApi(t *testing.T) {
}, },
} }
metricdata.TestLoadDataCallback = func(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context, resolution int) (schema.JobData, error) { metricstore.TestLoadDataCallback = func(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context, resolution int) (schema.JobData, error) {
return testData, nil return testData, nil
} }
@@ -230,7 +213,7 @@ func TestRestApi(t *testing.T) {
r.StrictSlash(true) r.StrictSlash(true)
restapi.MountAPIRoutes(r) restapi.MountAPIRoutes(r)
var TestJobId int64 = 123 var TestJobID int64 = 123
TestClusterName := "testcluster" TestClusterName := "testcluster"
var TestStartTime int64 = 123456789 var TestStartTime int64 = 123456789
@@ -280,7 +263,7 @@ func TestRestApi(t *testing.T) {
} }
// resolver := graph.GetResolverInstance() // resolver := graph.GetResolverInstance()
restapi.JobRepository.SyncJobs() restapi.JobRepository.SyncJobs()
job, err := restapi.JobRepository.Find(&TestJobId, &TestClusterName, &TestStartTime) job, err := restapi.JobRepository.Find(&TestJobID, &TestClusterName, &TestStartTime)
if err != nil { if err != nil {
t.Fatal(err) t.Fatal(err)
} }
@@ -338,7 +321,7 @@ func TestRestApi(t *testing.T) {
} }
// Archiving happens asynchronously, will be completed in cleanup // Archiving happens asynchronously, will be completed in cleanup
job, err := restapi.JobRepository.Find(&TestJobId, &TestClusterName, &TestStartTime) job, err := restapi.JobRepository.Find(&TestJobID, &TestClusterName, &TestStartTime)
if err != nil { if err != nil {
t.Fatal(err) t.Fatal(err)
} }
@@ -366,7 +349,7 @@ func TestRestApi(t *testing.T) {
} }
t.Run("CheckArchive", func(t *testing.T) { t.Run("CheckArchive", func(t *testing.T) {
data, err := metricDataDispatcher.LoadData(stoppedJob, []string{"load_one"}, []schema.MetricScope{schema.MetricScopeNode}, context.Background(), 60) data, err := metricdispatch.LoadData(stoppedJob, []string{"load_one"}, []schema.MetricScope{schema.MetricScopeNode}, context.Background(), 60)
if err != nil { if err != nil {
t.Fatal(err) t.Fatal(err)
} }

@@ -13,7 +13,7 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
// GetClustersAPIResponse model // GetClustersAPIResponse model
@@ -27,7 +27,7 @@ type GetClustersAPIResponse struct {
// @description Get a list of all cluster configs. Specific cluster can be requested using query parameter. // @description Get a list of all cluster configs. Specific cluster can be requested using query parameter.
// @produce json // @produce json
// @param cluster query string false "Job Cluster" // @param cluster query string false "Job Cluster"
// @success 200 {object} api.GetClustersApiResponse "Array of clusters" // @success 200 {object} api.GetClustersAPIResponse "Array of clusters"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"

File diff suppressed because it is too large

@@ -22,11 +22,11 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/graph" "github.com/ClusterCockpit/cc-backend/internal/graph"
"github.com/ClusterCockpit/cc-backend/internal/graph/model" "github.com/ClusterCockpit/cc-backend/internal/graph/model"
"github.com/ClusterCockpit/cc-backend/internal/importer" "github.com/ClusterCockpit/cc-backend/internal/importer"
"github.com/ClusterCockpit/cc-backend/internal/metricDataDispatcher" "github.com/ClusterCockpit/cc-backend/internal/metricdispatch"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/gorilla/mux" "github.com/gorilla/mux"
) )
@@ -104,7 +104,7 @@ type JobMetricWithName struct {
// @param items-per-page query int false "Items per page (Default: 25)" // @param items-per-page query int false "Items per page (Default: 25)"
// @param page query int false "Page Number (Default: 1)" // @param page query int false "Page Number (Default: 1)"
// @param with-metadata query bool false "Include metadata (e.g. jobScript) in response" // @param with-metadata query bool false "Include metadata (e.g. jobScript) in response"
// @success 200 {object} api.GetJobsApiResponse "Job array and page info" // @success 200 {object} api.GetJobsAPIResponse "Job array and page info"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"
@@ -232,7 +232,7 @@ func (api *RestAPI) getJobs(rw http.ResponseWriter, r *http.Request) {
// @produce json // @produce json
// @param id path int true "Database ID of Job" // @param id path int true "Database ID of Job"
// @param all-metrics query bool false "Include all available metrics" // @param all-metrics query bool false "Include all available metrics"
// @success 200 {object} api.GetJobApiResponse "Job resource" // @success 200 {object} api.GetJobAPIResponse "Job resource"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"
@@ -293,7 +293,7 @@ func (api *RestAPI) getCompleteJobByID(rw http.ResponseWriter, r *http.Request)
} }
if r.URL.Query().Get("all-metrics") == "true" { if r.URL.Query().Get("all-metrics") == "true" {
data, err = metricDataDispatcher.LoadData(job, nil, scopes, r.Context(), resolution) data, err = metricdispatch.LoadData(job, nil, scopes, r.Context(), resolution)
if err != nil { if err != nil {
cclog.Warnf("REST: error while loading all-metrics job data for JobID %d on %s", job.JobID, job.Cluster) cclog.Warnf("REST: error while loading all-metrics job data for JobID %d on %s", job.JobID, job.Cluster)
return return
@@ -324,8 +324,8 @@ func (api *RestAPI) getCompleteJobByID(rw http.ResponseWriter, r *http.Request)
// @accept json // @accept json
// @produce json // @produce json
// @param id path int true "Database ID of Job" // @param id path int true "Database ID of Job"
// @param request body api.GetJobApiRequest true "Array of metric names" // @param request body api.GetJobAPIRequest true "Array of metric names"
// @success 200 {object} api.GetJobApiResponse "Job resource" // @success 200 {object} api.GetJobAPIResponse "Job resource"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"
@@ -389,7 +389,7 @@ func (api *RestAPI) getJobByID(rw http.ResponseWriter, r *http.Request) {
resolution = max(resolution, mc.Timestep) resolution = max(resolution, mc.Timestep)
} }
data, err := metricDataDispatcher.LoadData(job, metrics, scopes, r.Context(), resolution) data, err := metricdispatch.LoadData(job, metrics, scopes, r.Context(), resolution)
if err != nil { if err != nil {
cclog.Warnf("REST: error while loading job data for JobID %d on %s", job.JobID, job.Cluster) cclog.Warnf("REST: error while loading job data for JobID %d on %s", job.JobID, job.Cluster)
return return
@@ -478,7 +478,7 @@ func (api *RestAPI) editMeta(rw http.ResponseWriter, r *http.Request) {
// @accept json // @accept json
// @produce json // @produce json
// @param id path int true "Job Database ID" // @param id path int true "Job Database ID"
// @param request body api.TagJobApiRequest true "Array of tag-objects to add" // @param request body api.TagJobAPIRequest true "Array of tag-objects to add"
// @success 200 {object} schema.Job "Updated job resource" // @success 200 {object} schema.Job "Updated job resource"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
@@ -542,7 +542,7 @@ func (api *RestAPI) tagJob(rw http.ResponseWriter, r *http.Request) {
// @accept json // @accept json
// @produce json // @produce json
// @param id path int true "Job Database ID" // @param id path int true "Job Database ID"
// @param request body api.TagJobApiRequest true "Array of tag-objects to remove" // @param request body api.TagJobAPIRequest true "Array of tag-objects to remove"
// @success 200 {object} schema.Job "Updated job resource" // @success 200 {object} schema.Job "Updated job resource"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
@@ -606,7 +606,7 @@ func (api *RestAPI) removeTagJob(rw http.ResponseWriter, r *http.Request) {
// @description Tags will be removed from the respective archive files. // @description Tags will be removed from the respective archive files.
// @accept json // @accept json
// @produce plain // @produce plain
// @param request body api.TagJobApiRequest true "Array of tag-objects to remove" // @param request body api.TagJobAPIRequest true "Array of tag-objects to remove"
// @success 200 {string} string "Success Response" // @success 200 {string} string "Success Response"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
@@ -650,7 +650,7 @@ func (api *RestAPI) removeTags(rw http.ResponseWriter, r *http.Request) {
// @accept json // @accept json
// @produce json // @produce json
// @param request body schema.Job true "Job to add" // @param request body schema.Job true "Job to add"
// @success 201 {object} api.DefaultApiResponse "Job added successfully" // @success 201 {object} api.DefaultAPIResponse "Job added successfully"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"
@@ -728,7 +728,7 @@ func (api *RestAPI) startJob(rw http.ResponseWriter, r *http.Request) {
// @description Job to stop is specified by request body. All fields are required in this case. // @description Job to stop is specified by request body. All fields are required in this case.
// @description Returns full job resource information according to 'Job' scheme. // @description Returns full job resource information according to 'Job' scheme.
// @produce json // @produce json
// @param request body api.StopJobApiRequest true "All fields required" // @param request body api.StopJobAPIRequest true "All fields required"
// @success 200 {object} schema.Job "Success message" // @success 200 {object} schema.Job "Success message"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
@@ -754,7 +754,6 @@ func (api *RestAPI) stopJobByRequest(rw http.ResponseWriter, r *http.Request) {
return return
} }
// cclog.Printf("loading db job for stopJobByRequest... : stopJobApiRequest=%v", req)
job, err = api.JobRepository.Find(req.JobID, req.Cluster, req.StartTime) job, err = api.JobRepository.Find(req.JobID, req.Cluster, req.StartTime)
if err != nil { if err != nil {
// Try cached jobs if not found in main repository // Try cached jobs if not found in main repository
@@ -776,7 +775,7 @@ func (api *RestAPI) stopJobByRequest(rw http.ResponseWriter, r *http.Request) {
// @description Job to remove is specified by database ID. This will not remove the job from the job archive. // @description Job to remove is specified by database ID. This will not remove the job from the job archive.
// @produce json // @produce json
// @param id path int true "Database ID of Job" // @param id path int true "Database ID of Job"
// @success 200 {object} api.DefaultApiResponse "Success message" // @success 200 {object} api.DefaultAPIResponse "Success message"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"
@@ -820,8 +819,8 @@ func (api *RestAPI) deleteJobByID(rw http.ResponseWriter, r *http.Request) {
// @description Job to delete is specified by request body. All fields are required in this case. // @description Job to delete is specified by request body. All fields are required in this case.
// @accept json // @accept json
// @produce json // @produce json
// @param request body api.DeleteJobApiRequest true "All fields required" // @param request body api.DeleteJobAPIRequest true "All fields required"
// @success 200 {object} api.DefaultApiResponse "Success message" // @success 200 {object} api.DefaultAPIResponse "Success message"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"
@@ -873,7 +872,7 @@ func (api *RestAPI) deleteJobByRequest(rw http.ResponseWriter, r *http.Request)
// @description Remove all jobs with start time before timestamp. The jobs will not be removed from the job archive. // @description Remove all jobs with start time before timestamp. The jobs will not be removed from the job archive.
// @produce json // @produce json
// @param ts path int true "Unix epoch timestamp" // @param ts path int true "Unix epoch timestamp"
// @success 200 {object} api.DefaultApiResponse "Success message" // @success 200 {object} api.DefaultAPIResponse "Success message"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"

@@ -15,8 +15,8 @@ import (
"strconv" "strconv"
"strings" "strings"
"github.com/ClusterCockpit/cc-backend/internal/memorystore" "github.com/ClusterCockpit/cc-backend/internal/metricstore"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/influxdata/line-protocol/v2/lineprotocol" "github.com/influxdata/line-protocol/v2/lineprotocol"
) )
@@ -58,7 +58,7 @@ func freeMetrics(rw http.ResponseWriter, r *http.Request) {
return return
} }
ms := memorystore.GetMemoryStore() ms := metricstore.GetMemoryStore()
n := 0 n := 0
for _, sel := range selectors { for _, sel := range selectors {
bn, err := ms.Free(sel, to) bn, err := ms.Free(sel, to)
@@ -97,9 +97,9 @@ func writeMetrics(rw http.ResponseWriter, r *http.Request) {
return return
} }
ms := memorystore.GetMemoryStore() ms := metricstore.GetMemoryStore()
dec := lineprotocol.NewDecoderWithBytes(bytes) dec := lineprotocol.NewDecoderWithBytes(bytes)
if err := memorystore.DecodeLine(dec, ms, r.URL.Query().Get("cluster")); err != nil { if err := metricstore.DecodeLine(dec, ms, r.URL.Query().Get("cluster")); err != nil {
cclog.Errorf("/api/write error: %s", err.Error()) cclog.Errorf("/api/write error: %s", err.Error())
handleError(err, http.StatusBadRequest, rw) handleError(err, http.StatusBadRequest, rw)
return return
@@ -129,7 +129,7 @@ func debugMetrics(rw http.ResponseWriter, r *http.Request) {
selector = strings.Split(raw, ":") selector = strings.Split(raw, ":")
} }
ms := memorystore.GetMemoryStore() ms := metricstore.GetMemoryStore()
if err := ms.DebugDump(bufio.NewWriter(rw), selector); err != nil { if err := ms.DebugDump(bufio.NewWriter(rw), selector); err != nil {
handleError(err, http.StatusBadRequest, rw) handleError(err, http.StatusBadRequest, rw)
return return
@@ -162,7 +162,7 @@ func metricsHealth(rw http.ResponseWriter, r *http.Request) {
selector := []string{rawCluster, rawNode} selector := []string{rawCluster, rawNode}
ms := memorystore.GetMemoryStore() ms := metricstore.GetMemoryStore()
if err := ms.HealthCheck(bufio.NewWriter(rw), selector); err != nil { if err := ms.HealthCheck(bufio.NewWriter(rw), selector); err != nil {
handleError(err, http.StatusBadRequest, rw) handleError(err, http.StatusBadRequest, rw)
return return

@@ -6,9 +6,9 @@
package api package api
import ( import (
"bytes"
"database/sql" "database/sql"
"encoding/json" "encoding/json"
"strings"
"sync" "sync"
"time" "time"
@@ -17,12 +17,48 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/importer" "github.com/ClusterCockpit/cc-backend/internal/importer"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/nats" "github.com/ClusterCockpit/cc-backend/pkg/nats"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
"github.com/ClusterCockpit/cc-lib/v2/receivers"
"github.com/ClusterCockpit/cc-lib/v2/schema"
influx "github.com/influxdata/line-protocol/v2/lineprotocol"
) )
// NatsAPI provides NATS subscription-based handlers for Job and Node operations. // NatsAPI provides NATS subscription-based handlers for Job and Node operations.
// It mirrors the functionality of the REST API but uses NATS messaging. // It mirrors the functionality of the REST API but uses NATS messaging with
// InfluxDB line protocol as the message format.
//
// # Message Format
//
// All NATS messages use InfluxDB line protocol format (https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/)
// with the following structure:
//
// measurement,tag1=value1,tag2=value2 field1=value1,field2=value2 timestamp
//
// # Job Events
//
// Job start/stop events use the "job" measurement with a "function" tag to distinguish operations:
//
// job,function=start_job event="{...JSON payload...}" <timestamp>
// job,function=stop_job event="{...JSON payload...}" <timestamp>
//
// The JSON payload in the "event" field follows the schema.Job or StopJobAPIRequest structure.
//
// Example job start message:
//
// job,function=start_job event="{\"jobId\":1001,\"user\":\"testuser\",\"cluster\":\"testcluster\",...}" 1234567890000000000
//
// # Node State Events
//
// Node state updates use the "nodestate" measurement with cluster information:
//
// nodestate event="{...JSON payload...}" <timestamp>
//
// The JSON payload follows the UpdateNodeStatesRequest structure.
//
// Example node state message:
//
// nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[{\"hostname\":\"node01\",\"states\":[\"idle\"]}]}" 1234567890000000000
type NatsAPI struct { type NatsAPI struct {
JobRepository *repository.JobRepository JobRepository *repository.JobRepository
// RepositoryMutex protects job creation operations from race conditions // RepositoryMutex protects job creation operations from race conditions
@@ -50,11 +86,7 @@ func (api *NatsAPI) StartSubscriptions() error {
s := config.Keys.APISubjects s := config.Keys.APISubjects
if err := client.Subscribe(s.SubjectJobStart, api.handleStartJob); err != nil { if err := client.Subscribe(s.SubjectJobEvent, api.handleJobEvent); err != nil {
return err
}
if err := client.Subscribe(s.SubjectJobStop, api.handleStopJob); err != nil {
return err return err
} }
@@ -67,26 +99,96 @@ func (api *NatsAPI) StartSubscriptions() error {
return nil return nil
} }
// processJobEvent routes job event messages to the appropriate handler based on the "function" tag.
// Validates that required tags and fields are present before processing.
func (api *NatsAPI) processJobEvent(msg lp.CCMessage) {
function, ok := msg.GetTag("function")
if !ok {
cclog.Errorf("Job event is missing required tag 'function': measurement=%s", msg.Name())
return
}
switch function {
case "start_job":
v, ok := msg.GetEventValue()
if !ok {
cclog.Errorf("Job start event is missing event field with JSON payload")
return
}
api.handleStartJob(v)
case "stop_job":
v, ok := msg.GetEventValue()
if !ok {
cclog.Errorf("Job stop event is missing event field with JSON payload")
return
}
api.handleStopJob(v)
default:
cclog.Warnf("Unknown job event function '%s', expected 'start_job' or 'stop_job'", function)
}
}
// handleJobEvent processes job-related messages received via NATS using InfluxDB line protocol.
// The message must be in line protocol format with measurement="job" and include:
// - tag "function" with value "start_job" or "stop_job"
// - field "event" containing JSON payload (schema.Job or StopJobAPIRequest)
//
// Example: job,function=start_job event="{\"jobId\":1001,...}" 1234567890000000000
func (api *NatsAPI) handleJobEvent(subject string, data []byte) {
if len(data) == 0 {
cclog.Warnf("NATS %s: received empty message", subject)
return
}
d := influx.NewDecoderWithBytes(data)
for d.Next() {
m, err := receivers.DecodeInfluxMessage(d)
if err != nil {
cclog.Errorf("NATS %s: failed to decode InfluxDB line protocol message: %v", subject, err)
return
}
if !m.IsEvent() {
cclog.Debugf("NATS %s: received non-event message, skipping", subject)
continue
}
if m.Name() == "job" {
api.processJobEvent(m)
} else {
cclog.Debugf("NATS %s: unexpected measurement name '%s', expected 'job'", subject, m.Name())
}
}
}
// handleStartJob processes job start messages received via NATS. // handleStartJob processes job start messages received via NATS.
// Expected JSON payload follows the schema.Job structure. // The payload parameter contains JSON following the schema.Job structure.
func (api *NatsAPI) handleStartJob(subject string, data []byte) { // Jobs are validated, checked for duplicates, and inserted into the database.
func (api *NatsAPI) handleStartJob(payload string) {
if payload == "" {
cclog.Error("NATS start job: payload is empty")
return
}
req := schema.Job{ req := schema.Job{
Shared: "none", Shared: "none",
MonitoringStatus: schema.MonitoringStatusRunningOrArchiving, MonitoringStatus: schema.MonitoringStatusRunningOrArchiving,
} }
dec := json.NewDecoder(bytes.NewReader(data)) dec := json.NewDecoder(strings.NewReader(payload))
dec.DisallowUnknownFields() dec.DisallowUnknownFields()
if err := dec.Decode(&req); err != nil { if err := dec.Decode(&req); err != nil {
cclog.Errorf("NATS %s: parsing request failed: %v", subject, err) cclog.Errorf("NATS start job: parsing request failed: %v", err)
return return
} }
cclog.Debugf("NATS %s: %s", subject, req.GoString()) cclog.Debugf("NATS start job: %s", req.GoString())
req.State = schema.JobStateRunning req.State = schema.JobStateRunning
if err := importer.SanityChecks(&req); err != nil { if err := importer.SanityChecks(&req); err != nil {
cclog.Errorf("NATS %s: sanity check failed: %v", subject, err) cclog.Errorf("NATS start job: sanity check failed: %v", err)
return return
} }
@@ -96,14 +198,14 @@ func (api *NatsAPI) handleStartJob(subject string, data []byte) {
jobs, err := api.JobRepository.FindAll(&req.JobID, &req.Cluster, nil) jobs, err := api.JobRepository.FindAll(&req.JobID, &req.Cluster, nil)
if err != nil && err != sql.ErrNoRows { if err != nil && err != sql.ErrNoRows {
cclog.Errorf("NATS %s: checking for duplicate failed: %v", subject, err) cclog.Errorf("NATS start job: checking for duplicate failed: %v", err)
return return
} }
if err == nil { if err == nil {
for _, job := range jobs { for _, job := range jobs {
if (req.StartTime - job.StartTime) < secondsPerDay { if (req.StartTime - job.StartTime) < secondsPerDay {
cclog.Errorf("NATS %s: job with jobId %d, cluster %s already exists (dbid: %d)", cclog.Errorf("NATS start job: job with jobId %d, cluster %s already exists (dbid: %d)",
subject, req.JobID, req.Cluster, job.ID) req.JobID, req.Cluster, job.ID)
return return
} }
} }
@@ -111,14 +213,14 @@ func (api *NatsAPI) handleStartJob(subject string, data []byte) {
id, err := api.JobRepository.Start(&req) id, err := api.JobRepository.Start(&req)
if err != nil { if err != nil {
cclog.Errorf("NATS %s: insert into database failed: %v", subject, err) cclog.Errorf("NATS start job: insert into database failed: %v", err)
return return
} }
unlockOnce.Do(api.RepositoryMutex.Unlock) unlockOnce.Do(api.RepositoryMutex.Unlock)
for _, tag := range req.Tags { for _, tag := range req.Tags {
if _, err := api.JobRepository.AddTagOrCreate(nil, id, tag.Type, tag.Name, tag.Scope); err != nil { if _, err := api.JobRepository.AddTagOrCreate(nil, id, tag.Type, tag.Name, tag.Scope); err != nil {
cclog.Errorf("NATS %s: adding tag to new job %d failed: %v", subject, id, err) cclog.Errorf("NATS start job: adding tag to new job %d failed: %v", id, err)
return return
} }
} }
@@ -128,19 +230,24 @@ func (api *NatsAPI) handleStartJob(subject string, data []byte) {
} }
// handleStopJob processes job stop messages received via NATS. // handleStopJob processes job stop messages received via NATS.
// Expected JSON payload follows the StopJobAPIRequest structure. // The payload parameter contains JSON following the StopJobAPIRequest structure.
func (api *NatsAPI) handleStopJob(subject string, data []byte) { // The job is marked as stopped in the database and archiving is triggered if monitoring is enabled.
func (api *NatsAPI) handleStopJob(payload string) {
if payload == "" {
cclog.Error("NATS stop job: payload is empty")
return
}
var req StopJobAPIRequest var req StopJobAPIRequest
dec := json.NewDecoder(bytes.NewReader(data)) dec := json.NewDecoder(strings.NewReader(payload))
dec.DisallowUnknownFields() dec.DisallowUnknownFields()
if err := dec.Decode(&req); err != nil { if err := dec.Decode(&req); err != nil {
cclog.Errorf("NATS %s: parsing request failed: %v", subject, err) cclog.Errorf("NATS job stop: parsing request failed: %v", err)
return return
} }
if req.JobID == nil { if req.JobID == nil {
cclog.Errorf("NATS %s: the field 'jobId' is required", subject) cclog.Errorf("NATS job stop: the field 'jobId' is required")
return return
} }
@@ -148,28 +255,28 @@ func (api *NatsAPI) handleStopJob(subject string, data []byte) {
if err != nil { if err != nil {
cachedJob, cachedErr := api.JobRepository.FindCached(req.JobID, req.Cluster, req.StartTime) cachedJob, cachedErr := api.JobRepository.FindCached(req.JobID, req.Cluster, req.StartTime)
if cachedErr != nil { if cachedErr != nil {
cclog.Errorf("NATS %s: finding job failed: %v (cached lookup also failed: %v)", cclog.Errorf("NATS job stop: finding job failed: %v (cached lookup also failed: %v)",
subject, err, cachedErr) err, cachedErr)
return return
} }
job = cachedJob job = cachedJob
} }
if job.State != schema.JobStateRunning { if job.State != schema.JobStateRunning {
cclog.Errorf("NATS %s: jobId %d (id %d) on %s: job has already been stopped (state is: %s)", cclog.Errorf("NATS job stop: jobId %d (id %d) on %s: job has already been stopped (state is: %s)",
subject, job.JobID, job.ID, job.Cluster, job.State) job.JobID, job.ID, job.Cluster, job.State)
return return
} }
if job.StartTime > req.StopTime { if job.StartTime > req.StopTime {
cclog.Errorf("NATS %s: jobId %d (id %d) on %s: stopTime %d must be >= startTime %d", cclog.Errorf("NATS job stop: jobId %d (id %d) on %s: stopTime %d must be >= startTime %d",
subject, job.JobID, job.ID, job.Cluster, req.StopTime, job.StartTime) job.JobID, job.ID, job.Cluster, req.StopTime, job.StartTime)
return return
} }
if req.State != "" && !req.State.Valid() { if req.State != "" && !req.State.Valid() {
cclog.Errorf("NATS %s: jobId %d (id %d) on %s: invalid job state: %#v", cclog.Errorf("NATS job stop: jobId %d (id %d) on %s: invalid job state: %#v",
subject, job.JobID, job.ID, job.Cluster, req.State) job.JobID, job.ID, job.Cluster, req.State)
return return
} else if req.State == "" { } else if req.State == "" {
req.State = schema.JobStateCompleted req.State = schema.JobStateCompleted
@@ -182,8 +289,8 @@ func (api *NatsAPI) handleStopJob(subject string, data []byte) {
if err := api.JobRepository.Stop(*job.ID, job.Duration, job.State, job.MonitoringStatus); err != nil { if err := api.JobRepository.Stop(*job.ID, job.Duration, job.State, job.MonitoringStatus); err != nil {
if err := api.JobRepository.StopCached(*job.ID, job.Duration, job.State, job.MonitoringStatus); err != nil { if err := api.JobRepository.StopCached(*job.ID, job.Duration, job.State, job.MonitoringStatus); err != nil {
cclog.Errorf("NATS %s: jobId %d (id %d) on %s: marking job as '%s' failed: %v", cclog.Errorf("NATS job stop: jobId %d (id %d) on %s: marking job as '%s' failed: %v",
subject, job.JobID, job.ID, job.Cluster, job.State, err) job.JobID, job.ID, job.Cluster, job.State, err)
return return
} }
} }
@@ -198,15 +305,21 @@ func (api *NatsAPI) handleStopJob(subject string, data []byte) {
archiver.TriggerArchiving(job) archiver.TriggerArchiving(job)
} }
// handleNodeState processes node state update messages received via NATS. // processNodestateEvent extracts and processes node state data from the InfluxDB message.
// Expected JSON payload follows the UpdateNodeStatesRequest structure. // Updates node states in the repository for all nodes in the payload.
func (api *NatsAPI) handleNodeState(subject string, data []byte) { func (api *NatsAPI) processNodestateEvent(msg lp.CCMessage) {
v, ok := msg.GetEventValue()
if !ok {
cclog.Errorf("Nodestate event is missing event field with JSON payload")
return
}
var req UpdateNodeStatesRequest var req UpdateNodeStatesRequest
dec := json.NewDecoder(bytes.NewReader(data)) dec := json.NewDecoder(strings.NewReader(v))
dec.DisallowUnknownFields() dec.DisallowUnknownFields()
if err := dec.Decode(&req); err != nil { if err := dec.Decode(&req); err != nil {
cclog.Errorf("NATS %s: parsing request failed: %v", subject, err) cclog.Errorf("NATS nodestate: parsing request failed: %v", err)
return return
} }
@@ -224,8 +337,44 @@ func (api *NatsAPI) handleNodeState(subject string, data []byte) {
JobsRunning: node.JobsRunning, JobsRunning: node.JobsRunning,
} }
repo.UpdateNodeState(node.Hostname, req.Cluster, &nodeState) if err := repo.UpdateNodeState(node.Hostname, req.Cluster, &nodeState); err != nil {
cclog.Errorf("NATS nodestate: updating node state for %s on %s failed: %v",
node.Hostname, req.Cluster, err)
}
} }
cclog.Debugf("NATS %s: updated %d node states for cluster %s", subject, len(req.Nodes), req.Cluster) cclog.Debugf("NATS nodestate: updated %d node states for cluster %s", len(req.Nodes), req.Cluster)
}
// handleNodeState processes node state update messages received via NATS using InfluxDB line protocol.
// The message must be in line protocol format with measurement="nodestate" and include:
// - field "event" containing JSON payload (UpdateNodeStatesRequest)
//
// Example: nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[...]}" 1234567890000000000
func (api *NatsAPI) handleNodeState(subject string, data []byte) {
if len(data) == 0 {
cclog.Warnf("NATS %s: received empty message", subject)
return
}
d := influx.NewDecoderWithBytes(data)
for d.Next() {
m, err := receivers.DecodeInfluxMessage(d)
if err != nil {
cclog.Errorf("NATS %s: failed to decode InfluxDB line protocol message: %v", subject, err)
return
}
if !m.IsEvent() {
cclog.Warnf("NATS %s: received non-event message, skipping", subject)
continue
}
if m.Name() == "nodestate" {
api.processNodestateEvent(m)
} else {
cclog.Warnf("NATS %s: unexpected measurement name '%s', expected 'nodestate'", subject, m.Name())
}
}
} }
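
The handler above expects the JSON request to travel inside the `event` string field of a `nodestate` line-protocol message. A minimal sketch of how a sender might build such a line, using only the standard library, is shown below. The struct types are hypothetical stand-ins whose JSON keys are taken from the test payloads in this commit; their Go field types are assumptions, and the real `UpdateNodeStatesRequest` may differ. The escaping follows the line-protocol rule that double quotes and backslashes inside a string field value are backslash-escaped.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nodeState and nodeStatesRequest are hypothetical stand-ins for the
// backend's request types; only the JSON keys are taken from the tests.
type nodeState struct {
	Hostname        string   `json:"hostname"`
	States          []string `json:"states"`
	CpusAllocated   int      `json:"cpusAllocated"`
	MemoryAllocated int      `json:"memoryAllocated"`
	GpusAllocated   int      `json:"gpusAllocated"`
	JobsRunning     int      `json:"jobsRunning"`
}

type nodeStatesRequest struct {
	Cluster string      `json:"cluster"`
	Nodes   []nodeState `json:"nodes"`
}

// fieldEscaper escapes backslashes and double quotes so that the JSON
// document can be embedded in a line-protocol string field value.
var fieldEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// buildNodestateLine renders one line-protocol message with measurement
// "nodestate", a single "event" field and a nanosecond timestamp.
func buildNodestateLine(req nodeStatesRequest, tsNanos int64) (string, error) {
	payload, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(`nodestate event="%s" %d`, fieldEscaper.Replace(string(payload)), tsNanos), nil
}

func main() {
	req := nodeStatesRequest{
		Cluster: "testcluster",
		Nodes: []nodeState{{
			Hostname:        "host123",
			States:          []string{"allocated"},
			CpusAllocated:   8,
			MemoryAllocated: 16384,
			JobsRunning:     1,
		}},
	}
	line, err := buildNodestateLine(req, 1234567890000000000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line)
}
```

Publishing the resulting bytes on the NATS subject is left out here, since the subject name and client setup depend on the deployment.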

947
internal/api/nats_test.go Normal file

@@ -0,0 +1,947 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package api
import (
"context"
"database/sql"
"encoding/json"
"fmt"
"os"
"path/filepath"
"testing"
"time"
"github.com/ClusterCockpit/cc-backend/internal/archiver"
"github.com/ClusterCockpit/cc-backend/internal/auth"
"github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/graph"
"github.com/ClusterCockpit/cc-backend/internal/metricstore"
"github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive"
ccconf "github.com/ClusterCockpit/cc-lib/v2/ccConfig"
cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
lp "github.com/ClusterCockpit/cc-lib/v2/ccMessage"
"github.com/ClusterCockpit/cc-lib/v2/schema"
_ "github.com/mattn/go-sqlite3"
)
func setupNatsTest(t *testing.T) *NatsAPI {
repository.ResetConnection()
const testconfig = `{
"main": {
"addr": "0.0.0.0:8080",
"validate": false,
"apiAllowedIPs": [
"*"
]
},
"archive": {
"kind": "file",
"path": "./var/job-archive"
},
"auth": {
"jwts": {
"max-age": "2m"
}
}
}`
const testclusterJSON = `{
"name": "testcluster",
"subClusters": [
{
"name": "sc1",
"nodes": "host123,host124,host125",
"processorType": "Intel Core i7-4770",
"socketsPerNode": 1,
"coresPerSocket": 4,
"threadsPerCore": 2,
"flopRateScalar": {
"unit": {
"prefix": "G",
"base": "F/s"
},
"value": 14
},
"flopRateSimd": {
"unit": {
"prefix": "G",
"base": "F/s"
},
"value": 112
},
"memoryBandwidth": {
"unit": {
"prefix": "G",
"base": "B/s"
},
"value": 24
},
"numberOfNodes": 70,
"topology": {
"node": [0, 1, 2, 3, 4, 5, 6, 7],
"socket": [[0, 1, 2, 3, 4, 5, 6, 7]],
"memoryDomain": [[0, 1, 2, 3, 4, 5, 6, 7]],
"die": [[0, 1, 2, 3, 4, 5, 6, 7]],
"core": [[0], [1], [2], [3], [4], [5], [6], [7]]
}
}
],
"metricConfig": [
{
"name": "load_one",
"unit": { "base": ""},
"scope": "node",
"timestep": 60,
"aggregation": "avg",
"peak": 8,
"normal": 0,
"caution": 0,
"alert": 0
}
]
}`
cclog.Init("info", true)
tmpdir := t.TempDir()
jobarchive := filepath.Join(tmpdir, "job-archive")
if err := os.Mkdir(jobarchive, 0o777); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(jobarchive, "version.txt"), fmt.Appendf(nil, "%d", 3), 0o666); err != nil {
t.Fatal(err)
}
if err := os.Mkdir(filepath.Join(jobarchive, "testcluster"), 0o777); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(jobarchive, "testcluster", "cluster.json"), []byte(testclusterJSON), 0o666); err != nil {
t.Fatal(err)
}
dbfilepath := filepath.Join(tmpdir, "test.db")
err := repository.MigrateDB(dbfilepath)
if err != nil {
t.Fatal(err)
}
cfgFilePath := filepath.Join(tmpdir, "config.json")
if err := os.WriteFile(cfgFilePath, []byte(testconfig), 0o666); err != nil {
t.Fatal(err)
}
ccconf.Init(cfgFilePath)
// Load and check main configuration
if cfg := ccconf.GetPackageConfig("main"); cfg != nil {
config.Init(cfg)
} else {
cclog.Abort("Main configuration must be present")
}
archiveCfg := fmt.Sprintf("{\"kind\": \"file\",\"path\": \"%s\"}", jobarchive)
repository.Connect("sqlite3", dbfilepath)
if err := archive.Init(json.RawMessage(archiveCfg), config.Keys.DisableArchive); err != nil {
t.Fatal(err)
}
// metricstore initialization removed - it's initialized via callback in tests
archiver.Start(repository.GetJobRepository(), context.Background())
if cfg := ccconf.GetPackageConfig("auth"); cfg != nil {
auth.Init(&cfg)
} else {
cclog.Warn("Authentication disabled due to missing configuration")
auth.Init(nil)
}
graph.Init()
return NewNatsAPI()
}
func cleanupNatsTest() {
if err := archiver.Shutdown(5 * time.Second); err != nil {
cclog.Warnf("Archiver shutdown timeout in tests: %v", err)
}
}
func TestNatsHandleStartJob(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
tests := []struct {
name string
payload string
expectError bool
validateJob func(t *testing.T, job *schema.Job)
shouldFindJob bool
}{
{
name: "valid job start",
payload: `{
"jobId": 1001,
"user": "testuser1",
"project": "testproj1",
"cluster": "testcluster",
"partition": "main",
"walltime": 7200,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1, 2, 3, 4, 5, 6, 7]
}
],
"startTime": 1234567890
}`,
expectError: false,
shouldFindJob: true,
validateJob: func(t *testing.T, job *schema.Job) {
if job.JobID != 1001 {
t.Errorf("expected JobID 1001, got %d", job.JobID)
}
if job.User != "testuser1" {
t.Errorf("expected user testuser1, got %s", job.User)
}
if job.State != schema.JobStateRunning {
t.Errorf("expected state running, got %s", job.State)
}
},
},
{
name: "invalid JSON",
payload: `{
"jobId": "not a number",
"user": "testuser2"
}`,
expectError: true,
shouldFindJob: false,
},
{
name: "missing required fields",
payload: `{
"jobId": 1002
}`,
expectError: true,
shouldFindJob: false,
},
{
name: "job with unknown fields (should fail due to DisallowUnknownFields)",
payload: `{
"jobId": 1003,
"user": "testuser3",
"project": "testproj3",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"unknownField": "should cause error",
"startTime": 1234567900
}`,
expectError: true,
shouldFindJob: false,
},
{
name: "job with tags",
payload: `{
"jobId": 1004,
"user": "testuser4",
"project": "testproj4",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1, 2, 3]
}
],
"tags": [
{
"type": "test",
"name": "testtag",
"scope": "testuser4"
}
],
"startTime": 1234567910
}`,
expectError: false,
shouldFindJob: true,
validateJob: func(t *testing.T, job *schema.Job) {
if job.JobID != 1004 {
t.Errorf("expected JobID 1004, got %d", job.JobID)
}
},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
natsAPI.handleStartJob(tt.payload)
natsAPI.JobRepository.SyncJobs()
// Allow some time for async operations
time.Sleep(100 * time.Millisecond)
if tt.shouldFindJob {
// Extract jobId from payload
var payloadMap map[string]any
if err := json.Unmarshal([]byte(tt.payload), &payloadMap); err != nil {
t.Fatalf("failed to parse test payload: %v", err)
}
jobID := int64(payloadMap["jobId"].(float64))
cluster := payloadMap["cluster"].(string)
startTime := int64(payloadMap["startTime"].(float64))
job, err := natsAPI.JobRepository.Find(&jobID, &cluster, &startTime)
if err != nil {
if !tt.expectError {
t.Fatalf("expected to find job, but got error: %v", err)
}
return
}
if tt.validateJob != nil {
tt.validateJob(t, job)
}
}
})
}
}
func TestNatsHandleStopJob(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
// First, create a running job
startPayload := `{
"jobId": 2001,
"user": "testuser",
"project": "testproj",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1, 2, 3, 4, 5, 6, 7]
}
],
"startTime": 1234567890
}`
natsAPI.handleStartJob(startPayload)
natsAPI.JobRepository.SyncJobs()
time.Sleep(100 * time.Millisecond)
tests := []struct {
name string
payload string
expectError bool
validateJob func(t *testing.T, job *schema.Job)
setupJobFunc func() // Optional: create specific test job
}{
{
name: "valid job stop - completed",
payload: `{
"jobId": 2001,
"cluster": "testcluster",
"startTime": 1234567890,
"jobState": "completed",
"stopTime": 1234571490
}`,
expectError: false,
validateJob: func(t *testing.T, job *schema.Job) {
if job.State != schema.JobStateCompleted {
t.Errorf("expected state completed, got %s", job.State)
}
expectedDuration := int32(1234571490 - 1234567890)
if job.Duration != expectedDuration {
t.Errorf("expected duration %d, got %d", expectedDuration, job.Duration)
}
},
},
{
name: "valid job stop - failed",
setupJobFunc: func() {
startPayloadFailed := `{
"jobId": 2002,
"user": "testuser",
"project": "testproj",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1, 2, 3]
}
],
"startTime": 1234567900
}`
natsAPI.handleStartJob(startPayloadFailed)
natsAPI.JobRepository.SyncJobs()
time.Sleep(100 * time.Millisecond)
},
payload: `{
"jobId": 2002,
"cluster": "testcluster",
"startTime": 1234567900,
"jobState": "failed",
"stopTime": 1234569900
}`,
expectError: false,
validateJob: func(t *testing.T, job *schema.Job) {
if job.State != schema.JobStateFailed {
t.Errorf("expected state failed, got %s", job.State)
}
},
},
{
name: "invalid JSON",
payload: `{
"jobId": "not a number"
}`,
expectError: true,
},
{
name: "missing jobId",
payload: `{
"cluster": "testcluster",
"jobState": "completed",
"stopTime": 1234571490
}`,
expectError: true,
},
{
name: "invalid job state",
setupJobFunc: func() {
startPayloadInvalid := `{
"jobId": 2003,
"user": "testuser",
"project": "testproj",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1]
}
],
"startTime": 1234567910
}`
natsAPI.handleStartJob(startPayloadInvalid)
natsAPI.JobRepository.SyncJobs()
time.Sleep(100 * time.Millisecond)
},
payload: `{
"jobId": 2003,
"cluster": "testcluster",
"startTime": 1234567910,
"jobState": "invalid_state",
"stopTime": 1234571510
}`,
expectError: true,
},
{
name: "stopTime before startTime",
setupJobFunc: func() {
startPayloadTime := `{
"jobId": 2004,
"user": "testuser",
"project": "testproj",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0]
}
],
"startTime": 1234567920
}`
natsAPI.handleStartJob(startPayloadTime)
natsAPI.JobRepository.SyncJobs()
time.Sleep(100 * time.Millisecond)
},
payload: `{
"jobId": 2004,
"cluster": "testcluster",
"startTime": 1234567920,
"jobState": "completed",
"stopTime": 1234567900
}`,
expectError: true,
},
{
name: "job not found",
payload: `{
"jobId": 99999,
"cluster": "testcluster",
"startTime": 1234567890,
"jobState": "completed",
"stopTime": 1234571490
}`,
expectError: true,
},
}
testData := schema.JobData{
"load_one": map[schema.MetricScope]*schema.JobMetric{
schema.MetricScopeNode: {
Unit: schema.Unit{Base: "load"},
Timestep: 60,
Series: []schema.Series{
{
Hostname: "host123",
Statistics: schema.MetricStatistics{Min: 0.1, Avg: 0.2, Max: 0.3},
Data: []schema.Float{0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3},
},
},
},
},
}
metricstore.TestLoadDataCallback = func(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context, resolution int) (schema.JobData, error) {
return testData, nil
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if tt.setupJobFunc != nil {
tt.setupJobFunc()
}
natsAPI.handleStopJob(tt.payload)
// Allow some time for async operations
time.Sleep(100 * time.Millisecond)
if !tt.expectError && tt.validateJob != nil {
// Extract job details from payload
var payloadMap map[string]any
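if err := json.Unmarshal([]byte(tt.payload), &payloadMap); err != nil {
t.Fatalf("failed to parse test payload: %v", err)
}
jobID := int64(payloadMap["jobId"].(float64))
cluster := payloadMap["cluster"].(string)
var startTime *int64
if st, ok := payloadMap["startTime"]; ok {
ts := int64(st.(float64)) // named ts to avoid shadowing t *testing.T
startTime = &ts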
}
job, err := natsAPI.JobRepository.Find(&jobID, &cluster, startTime)
if err != nil {
t.Fatalf("expected to find job, but got error: %v", err)
}
tt.validateJob(t, job)
}
})
}
}
func TestNatsHandleNodeState(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
tests := []struct {
name string
data []byte
expectError bool
validateFn func(t *testing.T)
}{
{
name: "valid node state update",
data: []byte(`nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[{\"hostname\":\"host123\",\"states\":[\"allocated\"],\"cpusAllocated\":8,\"memoryAllocated\":16384,\"gpusAllocated\":0,\"jobsRunning\":1}]}" 1234567890000000000`),
expectError: false,
validateFn: func(t *testing.T) {
// A full test would verify that the node state was updated in the database.
// For now, this only checks that the handler does not panic.
},
},
{
name: "multiple nodes",
data: []byte(`nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[{\"hostname\":\"host123\",\"states\":[\"idle\"],\"cpusAllocated\":0,\"memoryAllocated\":0,\"gpusAllocated\":0,\"jobsRunning\":0},{\"hostname\":\"host124\",\"states\":[\"allocated\"],\"cpusAllocated\":4,\"memoryAllocated\":8192,\"gpusAllocated\":1,\"jobsRunning\":1}]}" 1234567890000000000`),
expectError: false,
},
{
name: "invalid JSON in event field",
data: []byte(`nodestate event="{\"cluster\":\"testcluster\",\"nodes\":\"not an array\"}" 1234567890000000000`),
expectError: true,
},
{
name: "empty nodes array",
data: []byte(`nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[]}" 1234567890000000000`),
expectError: false, // Empty array should not cause error
},
{
name: "invalid line protocol format",
data: []byte(`invalid line protocol format`),
expectError: true,
},
{
name: "empty data",
data: []byte(``),
expectError: false, // Should be handled gracefully with warning
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
natsAPI.handleNodeState("test.subject", tt.data)
// Allow some time for async operations
time.Sleep(50 * time.Millisecond)
if tt.validateFn != nil {
tt.validateFn(t)
}
})
}
}
func TestNatsProcessJobEvent(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
msgStartJob, err := lp.NewMessage(
"job",
map[string]string{"function": "start_job"},
nil,
map[string]any{
"event": `{
"jobId": 3001,
"user": "testuser",
"project": "testproj",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1, 2, 3]
}
],
"startTime": 1234567890
}`,
},
time.Now(),
)
if err != nil {
t.Fatalf("failed to create test message: %v", err)
}
msgMissingTag, err := lp.NewMessage(
"job",
map[string]string{},
nil,
map[string]any{
"event": `{}`,
},
time.Now(),
)
if err != nil {
t.Fatalf("failed to create test message: %v", err)
}
msgUnknownFunc, err := lp.NewMessage(
"job",
map[string]string{"function": "unknown_function"},
nil,
map[string]any{
"event": `{}`,
},
time.Now(),
)
if err != nil {
t.Fatalf("failed to create test message: %v", err)
}
tests := []struct {
name string
message lp.CCMessage
expectError bool
}{
{
name: "start_job function",
message: msgStartJob,
expectError: false,
},
{
name: "missing function tag",
message: msgMissingTag,
expectError: true,
},
{
name: "unknown function",
message: msgUnknownFunc,
expectError: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
natsAPI.processJobEvent(tt.message)
time.Sleep(50 * time.Millisecond)
})
}
}
func TestNatsHandleJobEvent(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
tests := []struct {
name string
data []byte
expectError bool
}{
{
name: "valid influx line protocol",
data: []byte(`job,function=start_job event="{\"jobId\":4001,\"user\":\"testuser\",\"project\":\"testproj\",\"cluster\":\"testcluster\",\"partition\":\"main\",\"walltime\":3600,\"numNodes\":1,\"numHwthreads\":8,\"numAcc\":0,\"shared\":\"none\",\"monitoringStatus\":1,\"smt\":1,\"resources\":[{\"hostname\":\"host123\",\"hwthreads\":[0,1,2,3]}],\"startTime\":1234567890}" 1234567890000000000`),
expectError: false,
},
{
name: "invalid influx line protocol",
data: []byte(`invalid line protocol format`),
expectError: true,
},
{
name: "empty data",
data: []byte(``),
expectError: false, // Decoder should handle empty input gracefully
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// handleJobEvent does not return errors; it only logs them.
// This test only ensures that it does not panic.
natsAPI.handleJobEvent("test.subject", tt.data)
time.Sleep(50 * time.Millisecond)
})
}
}
func TestNatsHandleJobEventEdgeCases(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
tests := []struct {
name string
data []byte
expectError bool
description string
}{
{
name: "non-event message (metric data)",
data: []byte(`job,function=start_job value=123.45 1234567890000000000`),
expectError: false,
description: "Should skip non-event messages gracefully",
},
{
name: "wrong measurement name",
data: []byte(`wrongmeasurement,function=start_job event="{}" 1234567890000000000`),
expectError: false,
description: "Should warn about unexpected measurement but not fail",
},
{
name: "missing event field",
data: []byte(`job,function=start_job other_field="value" 1234567890000000000`),
expectError: true,
description: "Should error when event field is missing",
},
{
name: "multiple measurements in one message",
data: []byte("job,function=start_job event=\"{}\" 1234567890000000000\njob,function=stop_job event=\"{}\" 1234567890000000000"),
expectError: false,
description: "Should process multiple lines",
},
{
name: "escaped quotes in JSON payload",
data: []byte(`job,function=start_job event="{\"jobId\":6001,\"user\":\"test\\\"user\",\"cluster\":\"test\"}" 1234567890000000000`),
expectError: true,
description: "Should handle escaped quotes (though JSON parsing may fail)",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
natsAPI.handleJobEvent("test.subject", tt.data)
time.Sleep(50 * time.Millisecond)
})
}
}
func TestNatsHandleNodeStateEdgeCases(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
tests := []struct {
name string
data []byte
expectError bool
description string
}{
{
name: "missing cluster field in JSON",
data: []byte(`nodestate event="{\"nodes\":[]}" 1234567890000000000`),
expectError: true,
description: "Should fail when cluster is missing",
},
{
name: "malformed JSON with unescaped quotes",
data: []byte(`nodestate event="{\"cluster\":\"test"cluster\",\"nodes\":[]}" 1234567890000000000`),
expectError: true,
description: "Should fail on malformed JSON",
},
{
name: "unicode characters in hostname",
data: []byte(`nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[{\"hostname\":\"host-ñ123\",\"states\":[\"idle\"],\"cpusAllocated\":0,\"memoryAllocated\":0,\"gpusAllocated\":0,\"jobsRunning\":0}]}" 1234567890000000000`),
expectError: false,
description: "Should handle unicode characters",
},
{
name: "several nodes in one message",
data: []byte(`nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[{\"hostname\":\"node1\",\"states\":[\"idle\"],\"cpusAllocated\":0,\"memoryAllocated\":0,\"gpusAllocated\":0,\"jobsRunning\":0},{\"hostname\":\"node2\",\"states\":[\"idle\"],\"cpusAllocated\":0,\"memoryAllocated\":0,\"gpusAllocated\":0,\"jobsRunning\":0},{\"hostname\":\"node3\",\"states\":[\"idle\"],\"cpusAllocated\":0,\"memoryAllocated\":0,\"gpusAllocated\":0,\"jobsRunning\":0}]}" 1234567890000000000`),
expectError: false,
description: "Should handle multiple nodes in a single message",
},
{
name: "timestamp in past",
data: []byte(`nodestate event="{\"cluster\":\"testcluster\",\"nodes\":[]}" 1000000000000000000`),
expectError: false,
description: "Should accept any valid timestamp",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
natsAPI.handleNodeState("test.subject", tt.data)
time.Sleep(50 * time.Millisecond)
})
}
}
func TestNatsHandleStartJobDuplicatePrevention(t *testing.T) {
natsAPI := setupNatsTest(t)
t.Cleanup(cleanupNatsTest)
// Start a job
payload := `{
"jobId": 5001,
"user": "testuser",
"project": "testproj",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1, 2, 3]
}
],
"startTime": 1234567890
}`
natsAPI.handleStartJob(payload)
natsAPI.JobRepository.SyncJobs()
time.Sleep(100 * time.Millisecond)
// Try to start the same job again (within 24 hours)
duplicatePayload := `{
"jobId": 5001,
"user": "testuser",
"project": "testproj",
"cluster": "testcluster",
"partition": "main",
"walltime": 3600,
"numNodes": 1,
"numHwthreads": 8,
"numAcc": 0,
"shared": "none",
"monitoringStatus": 1,
"smt": 1,
"resources": [
{
"hostname": "host123",
"hwthreads": [0, 1, 2, 3]
}
],
"startTime": 1234567900
}`
natsAPI.handleStartJob(duplicatePayload)
natsAPI.JobRepository.SyncJobs()
time.Sleep(100 * time.Millisecond)
// Verify only one job exists
jobID := int64(5001)
cluster := "testcluster"
jobs, err := natsAPI.JobRepository.FindAll(&jobID, &cluster, nil)
if err != nil && err != sql.ErrNoRows {
t.Fatalf("unexpected error: %v", err)
}
if len(jobs) != 1 {
t.Errorf("expected 1 job, got %d", len(jobs))
}
}
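
The "job with unknown fields" and "invalid JSON" cases in these tests rely on strict decoding with `DisallowUnknownFields` from `encoding/json`. A minimal, standalone sketch of that behaviour follows. The `startRequest` type and the `decodeStrict` helper are hypothetical and deliberately trimmed down; they are not the backend's actual request type or handler.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// startRequest is a hypothetical, reduced payload type for illustration only.
type startRequest struct {
	JobID   int64  `json:"jobId"`
	Cluster string `json:"cluster"`
}

// decodeStrict decodes payload into val and rejects any JSON key that has
// no matching struct field.
func decodeStrict(payload string, val any) error {
	dec := json.NewDecoder(strings.NewReader(payload))
	dec.DisallowUnknownFields()
	return dec.Decode(val)
}

func main() {
	payloads := []string{
		`{"jobId": 1003, "cluster": "testcluster"}`,             // accepted
		`{"jobId": 1003, "unknownField": "should cause error"}`, // unknown key
		`{"jobId": "not a number"}`,                             // wrong type
	}
	for _, p := range payloads {
		var req startRequest
		if err := decodeStrict(p, &req); err != nil {
			fmt.Printf("rejected: %v\n", err)
			continue
		}
		fmt.Printf("accepted: %+v\n", req)
	}
}
```

Because the handlers log errors instead of returning them, a helper like this is one way to check payload validity directly in a unit test.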


@@ -12,7 +12,7 @@ import (
"time" "time"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
type UpdateNodeStatesRequest struct { type UpdateNodeStatesRequest struct {
@@ -47,7 +47,7 @@ func determineState(states []string) schema.SchedulerState {
// @description Required query-parameter defines if all users or only users with additional special roles are returned. // @description Required query-parameter defines if all users or only users with additional special roles are returned.
// @produce json // @produce json
// @param request body UpdateNodeStatesRequest true "Request body containing nodes and their states" // @param request body UpdateNodeStatesRequest true "Request body containing nodes and their states"
// @success 200 {object} api.DefaultApiResponse "Success message" // @success 200 {object} api.DefaultAPIResponse "Success message"
// @failure 400 {object} api.ErrorResponse "Bad Request" // @failure 400 {object} api.ErrorResponse "Bad Request"
// @failure 401 {object} api.ErrorResponse "Unauthorized" // @failure 401 {object} api.ErrorResponse "Unauthorized"
// @failure 403 {object} api.ErrorResponse "Forbidden" // @failure 403 {object} api.ErrorResponse "Forbidden"


@@ -22,9 +22,9 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/auth" "github.com/ClusterCockpit/cc-backend/internal/auth"
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
"github.com/gorilla/mux" "github.com/gorilla/mux"
) )
@@ -48,6 +48,7 @@ import (
const ( const (
noticeFilePath = "./var/notice.txt" noticeFilePath = "./var/notice.txt"
noticeFilePerms = 0o644 noticeFilePerms = 0o644
maxNoticeLength = 10000 // Maximum allowed notice content length in characters
) )
type RestAPI struct { type RestAPI struct {
@@ -61,6 +62,7 @@ type RestAPI struct {
RepositoryMutex sync.Mutex RepositoryMutex sync.Mutex
} }
// New creates and initializes a new RestAPI instance with configured dependencies.
func New() *RestAPI { func New() *RestAPI {
return &RestAPI{ return &RestAPI{
JobRepository: repository.GetJobRepository(), JobRepository: repository.GetJobRepository(),
@@ -69,6 +71,8 @@ func New() *RestAPI {
} }
} }
// MountAPIRoutes registers REST API endpoints for job and cluster management.
// These routes use JWT token authentication via the X-Auth-Token header.
func (api *RestAPI) MountAPIRoutes(r *mux.Router) { func (api *RestAPI) MountAPIRoutes(r *mux.Router) {
r.StrictSlash(true) r.StrictSlash(true)
// REST API Uses TokenAuth // REST API Uses TokenAuth
@@ -79,8 +83,11 @@ func (api *RestAPI) MountAPIRoutes(r *mux.Router) {
// Slurm node state // Slurm node state
r.HandleFunc("/nodestate/", api.updateNodeStates).Methods(http.MethodPost, http.MethodPut) r.HandleFunc("/nodestate/", api.updateNodeStates).Methods(http.MethodPost, http.MethodPut)
// Job Handler // Job Handler
r.HandleFunc("/jobs/start_job/", api.startJob).Methods(http.MethodPost, http.MethodPut) if config.Keys.APISubjects == nil {
r.HandleFunc("/jobs/stop_job/", api.stopJobByRequest).Methods(http.MethodPost, http.MethodPut) cclog.Info("Enabling REST start/stop job API")
r.HandleFunc("/jobs/start_job/", api.startJob).Methods(http.MethodPost, http.MethodPut)
r.HandleFunc("/jobs/stop_job/", api.stopJobByRequest).Methods(http.MethodPost, http.MethodPut)
}
r.HandleFunc("/jobs/", api.getJobs).Methods(http.MethodGet) r.HandleFunc("/jobs/", api.getJobs).Methods(http.MethodGet)
r.HandleFunc("/jobs/{id}", api.getJobByID).Methods(http.MethodPost) r.HandleFunc("/jobs/{id}", api.getJobByID).Methods(http.MethodPost)
r.HandleFunc("/jobs/{id}", api.getCompleteJobByID).Methods(http.MethodGet) r.HandleFunc("/jobs/{id}", api.getCompleteJobByID).Methods(http.MethodGet)
@@ -100,6 +107,8 @@ func (api *RestAPI) MountAPIRoutes(r *mux.Router) {
} }
} }
// MountUserAPIRoutes registers user-accessible REST API endpoints.
// These are limited endpoints for regular users with JWT token authentication.
func (api *RestAPI) MountUserAPIRoutes(r *mux.Router) { func (api *RestAPI) MountUserAPIRoutes(r *mux.Router) {
r.StrictSlash(true) r.StrictSlash(true)
// REST API Uses TokenAuth // REST API Uses TokenAuth
@@ -109,6 +118,8 @@ func (api *RestAPI) MountUserAPIRoutes(r *mux.Router) {
r.HandleFunc("/jobs/metrics/{id}", api.getJobMetrics).Methods(http.MethodGet) r.HandleFunc("/jobs/metrics/{id}", api.getJobMetrics).Methods(http.MethodGet)
} }
// MountMetricStoreAPIRoutes registers metric storage API endpoints.
// These endpoints handle metric data ingestion and health checks with JWT token authentication.
func (api *RestAPI) MountMetricStoreAPIRoutes(r *mux.Router) { func (api *RestAPI) MountMetricStoreAPIRoutes(r *mux.Router) {
// REST API Uses TokenAuth // REST API Uses TokenAuth
// Note: StrictSlash handles trailing slash variations automatically // Note: StrictSlash handles trailing slash variations automatically
@@ -123,6 +134,8 @@ func (api *RestAPI) MountMetricStoreAPIRoutes(r *mux.Router) {
r.HandleFunc("/api/healthcheck/", metricsHealth).Methods(http.MethodGet) r.HandleFunc("/api/healthcheck/", metricsHealth).Methods(http.MethodGet)
} }
// MountConfigAPIRoutes registers configuration and user management endpoints.
// These routes use session-based authentication and require admin privileges.
func (api *RestAPI) MountConfigAPIRoutes(r *mux.Router) { func (api *RestAPI) MountConfigAPIRoutes(r *mux.Router) {
r.StrictSlash(true) r.StrictSlash(true)
// Settings Frontend Uses SessionAuth // Settings Frontend Uses SessionAuth
@@ -136,6 +149,8 @@ func (api *RestAPI) MountConfigAPIRoutes(r *mux.Router) {
} }
} }
// MountFrontendAPIRoutes registers frontend-specific API endpoints.
// These routes support JWT generation and user configuration updates with session authentication.
func (api *RestAPI) MountFrontendAPIRoutes(r *mux.Router) { func (api *RestAPI) MountFrontendAPIRoutes(r *mux.Router) {
r.StrictSlash(true) r.StrictSlash(true)
// Settings Frontend Uses SessionAuth // Settings Frontend Uses SessionAuth
@@ -157,6 +172,8 @@ type DefaultAPIResponse struct {
Message string `json:"msg"` Message string `json:"msg"`
} }
// handleError writes a standardized JSON error response with the given status code.
// It logs the error at WARN level and ensures proper Content-Type headers are set.
func handleError(err error, statusCode int, rw http.ResponseWriter) { func handleError(err error, statusCode int, rw http.ResponseWriter) {
cclog.Warnf("REST ERROR : %s", err.Error()) cclog.Warnf("REST ERROR : %s", err.Error())
rw.Header().Add("Content-Type", "application/json") rw.Header().Add("Content-Type", "application/json")
@@ -169,15 +186,38 @@ func handleError(err error, statusCode int, rw http.ResponseWriter) {
} }
} }
// decode reads JSON from r into val with strict validation that rejects unknown fields.
func decode(r io.Reader, val any) error { func decode(r io.Reader, val any) error {
dec := json.NewDecoder(r) dec := json.NewDecoder(r)
dec.DisallowUnknownFields() dec.DisallowUnknownFields()
return dec.Decode(val) return dec.Decode(val)
} }
func (api *RestAPI) editNotice(rw http.ResponseWriter, r *http.Request) { // validatePathComponent checks if a path component contains potentially malicious patterns
// SecuredCheck() only worked with TokenAuth: Removed // that could be used for path traversal attacks. Returns an error if validation fails.
func validatePathComponent(component, componentName string) error {
if strings.Contains(component, "..") ||
strings.Contains(component, "/") ||
strings.Contains(component, "\\") {
return fmt.Errorf("invalid %s", componentName)
}
return nil
}
// editNotice godoc
// @summary Update system notice
// @tags Config
// @description Updates the notice.txt file content. Only admins are allowed. Content is limited to 10000 characters.
// @accept mpfd
// @produce plain
// @param new-content formData string true "New notice content (max 10000 characters)"
// @success 200 {string} string "Update Notice Content Success"
// @failure 400 {object} ErrorResponse "Bad Request"
// @failure 403 {object} ErrorResponse "Forbidden"
// @failure 500 {object} ErrorResponse "Internal Server Error"
// @security ApiKeyAuth
// @router /notice/ [post]
func (api *RestAPI) editNotice(rw http.ResponseWriter, r *http.Request) {
if user := repository.GetUserFromContext(r.Context()); !user.HasRole(schema.RoleAdmin) { if user := repository.GetUserFromContext(r.Context()); !user.HasRole(schema.RoleAdmin) {
handleError(fmt.Errorf("only admins are allowed to update the notice.txt file"), http.StatusForbidden, rw) handleError(fmt.Errorf("only admins are allowed to update the notice.txt file"), http.StatusForbidden, rw)
return return
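
The `validatePathComponent` helper added in this hunk rejects any component containing `..`, `/` or `\`. The sketch below copies that function into a standalone program to show which inputs it accepts; the sample inputs and the `"cluster name"` label are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// validatePathComponent is copied from the hunk above so the sketch runs
// on its own.
func validatePathComponent(component, componentName string) error {
	if strings.Contains(component, "..") ||
		strings.Contains(component, "/") ||
		strings.Contains(component, "\\") {
		return fmt.Errorf("invalid %s", componentName)
	}
	return nil
}

func main() {
	// Hypothetical inputs: one plain name and several suspicious ones.
	inputs := []string{"testcluster", "../etc", "a/b", `a\b`, "v1..2"}
	for _, in := range inputs {
		if err := validatePathComponent(in, "cluster name"); err != nil {
			fmt.Printf("%q rejected: %v\n", in, err)
			continue
		}
		fmt.Printf("%q accepted\n", in)
	}
}
```

Note that the substring check is deliberately conservative: a harmless name such as `v1..2` is rejected as well, because it contains two consecutive dots.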
@@ -186,9 +226,8 @@ func (api *RestAPI) editNotice(rw http.ResponseWriter, r *http.Request) {
// Get Value // Get Value
newContent := r.FormValue("new-content") newContent := r.FormValue("new-content")
// Validate content length to prevent DoS if len(newContent) > maxNoticeLength {
if len(newContent) > 10000 { handleError(fmt.Errorf("notice content exceeds maximum length of %d characters", maxNoticeLength), http.StatusBadRequest, rw)
handleError(fmt.Errorf("notice content exceeds maximum length of 10000 characters"), http.StatusBadRequest, rw)
return return
} }
@@ -200,7 +239,9 @@ func (api *RestAPI) editNotice(rw http.ResponseWriter, r *http.Request) {
handleError(fmt.Errorf("creating notice file failed: %w", err), http.StatusInternalServerError, rw) handleError(fmt.Errorf("creating notice file failed: %w", err), http.StatusInternalServerError, rw)
return return
} }
ntxt.Close() if err := ntxt.Close(); err != nil {
cclog.Warnf("Failed to close notice file: %v", err)
}
} }
if err := os.WriteFile(noticeFilePath, []byte(newContent), noticeFilePerms); err != nil { if err := os.WriteFile(noticeFilePath, []byte(newContent), noticeFilePerms); err != nil {
@@ -210,13 +251,30 @@ func (api *RestAPI) editNotice(rw http.ResponseWriter, r *http.Request) {
rw.Header().Set("Content-Type", "text/plain") rw.Header().Set("Content-Type", "text/plain")
rw.WriteHeader(http.StatusOK) rw.WriteHeader(http.StatusOK)
var msg []byte
if newContent != "" { if newContent != "" {
rw.Write([]byte("Update Notice Content Success")) msg = []byte("Update Notice Content Success")
} else { } else {
rw.Write([]byte("Empty Notice Content Success")) msg = []byte("Empty Notice Content Success")
}
if _, err := rw.Write(msg); err != nil {
cclog.Errorf("Failed to write response: %v", err)
} }
} }
// getJWT godoc
// @summary Generate JWT token
// @tags Frontend
// @description Generates a JWT token for a user. Admins can generate tokens for any user, regular users only for themselves.
// @accept mpfd
// @produce plain
// @param username formData string true "Username to generate JWT for"
// @success 200 {string} string "JWT token"
// @failure 403 {object} ErrorResponse "Forbidden"
// @failure 404 {object} ErrorResponse "User Not Found"
// @failure 500 {object} ErrorResponse "Internal Server Error"
// @security ApiKeyAuth
// @router /jwt/ [get]
func (api *RestAPI) getJWT(rw http.ResponseWriter, r *http.Request) { func (api *RestAPI) getJWT(rw http.ResponseWriter, r *http.Request) {
rw.Header().Set("Content-Type", "text/plain") rw.Header().Set("Content-Type", "text/plain")
username := r.FormValue("username") username := r.FormValue("username")
@@ -241,12 +299,22 @@ func (api *RestAPI) getJWT(rw http.ResponseWriter, r *http.Request) {
} }
rw.WriteHeader(http.StatusOK) rw.WriteHeader(http.StatusOK)
rw.Write([]byte(jwt)) if _, err := rw.Write([]byte(jwt)); err != nil {
cclog.Errorf("Failed to write JWT response: %v", err)
}
} }
// getRoles godoc
// @summary Get available roles
// @tags Config
// @description Returns a list of valid user roles. Only admins are allowed.
// @produce json
// @success 200 {array} string "List of role names"
// @failure 403 {object} ErrorResponse "Forbidden"
// @failure 500 {object} ErrorResponse "Internal Server Error"
// @security ApiKeyAuth
// @router /roles/ [get]
func (api *RestAPI) getRoles(rw http.ResponseWriter, r *http.Request) { func (api *RestAPI) getRoles(rw http.ResponseWriter, r *http.Request) {
// SecuredCheck() only worked with TokenAuth: Removed
user := repository.GetUserFromContext(r.Context()) user := repository.GetUserFromContext(r.Context())
if !user.HasRole(schema.RoleAdmin) { if !user.HasRole(schema.RoleAdmin) {
handleError(fmt.Errorf("only admins are allowed to fetch a list of roles"), http.StatusForbidden, rw) handleError(fmt.Errorf("only admins are allowed to fetch a list of roles"), http.StatusForbidden, rw)
@@ -265,6 +333,18 @@ func (api *RestAPI) getRoles(rw http.ResponseWriter, r *http.Request) {
} }
} }
// updateConfiguration godoc
// @summary Update user configuration
// @tags Frontend
// @description Updates a user's configuration key-value pair.
// @accept mpfd
// @produce plain
// @param key formData string true "Configuration key"
// @param value formData string true "Configuration value"
// @success 200 {string} string "success"
// @failure 500 {object} ErrorResponse "Internal Server Error"
// @security ApiKeyAuth
// @router /configuration/ [post]
func (api *RestAPI) updateConfiguration(rw http.ResponseWriter, r *http.Request) { func (api *RestAPI) updateConfiguration(rw http.ResponseWriter, r *http.Request) {
rw.Header().Set("Content-Type", "text/plain") rw.Header().Set("Content-Type", "text/plain")
key, value := r.FormValue("key"), r.FormValue("value") key, value := r.FormValue("key"), r.FormValue("value")
@@ -275,9 +355,25 @@ func (api *RestAPI) updateConfiguration(rw http.ResponseWriter, r *http.Request)
} }
rw.WriteHeader(http.StatusOK) rw.WriteHeader(http.StatusOK)
rw.Write([]byte("success")) if _, err := rw.Write([]byte("success")); err != nil {
cclog.Errorf("Failed to write response: %v", err)
}
} }
// putMachineState godoc
// @summary Store machine state
// @tags Machine State
// @description Stores machine state data for a specific cluster node. Validates cluster and host names to prevent path traversal.
// @accept json
// @produce plain
// @param cluster path string true "Cluster name"
// @param host path string true "Host name"
// @success 201 "Created"
// @failure 400 {object} ErrorResponse "Bad Request"
// @failure 404 {object} ErrorResponse "Machine state not enabled"
// @failure 500 {object} ErrorResponse "Internal Server Error"
// @security ApiKeyAuth
// @router /machine_state/{cluster}/{host} [put]
func (api *RestAPI) putMachineState(rw http.ResponseWriter, r *http.Request) { func (api *RestAPI) putMachineState(rw http.ResponseWriter, r *http.Request) {
if api.MachineStateDir == "" { if api.MachineStateDir == "" {
handleError(fmt.Errorf("machine state not enabled"), http.StatusNotFound, rw) handleError(fmt.Errorf("machine state not enabled"), http.StatusNotFound, rw)
@@ -288,13 +384,12 @@ func (api *RestAPI) putMachineState(rw http.ResponseWriter, r *http.Request) {
cluster := vars["cluster"] cluster := vars["cluster"]
host := vars["host"] host := vars["host"]
// Validate cluster and host to prevent path traversal attacks if err := validatePathComponent(cluster, "cluster name"); err != nil {
if strings.Contains(cluster, "..") || strings.Contains(cluster, "/") || strings.Contains(cluster, "\\") { handleError(err, http.StatusBadRequest, rw)
handleError(fmt.Errorf("invalid cluster name"), http.StatusBadRequest, rw)
return return
} }
if strings.Contains(host, "..") || strings.Contains(host, "/") || strings.Contains(host, "\\") { if err := validatePathComponent(host, "host name"); err != nil {
handleError(fmt.Errorf("invalid host name"), http.StatusBadRequest, rw) handleError(err, http.StatusBadRequest, rw)
return return
} }
@@ -320,6 +415,18 @@ func (api *RestAPI) putMachineState(rw http.ResponseWriter, r *http.Request) {
rw.WriteHeader(http.StatusCreated) rw.WriteHeader(http.StatusCreated)
} }
// getMachineState godoc
// @summary Retrieve machine state
// @tags Machine State
// @description Retrieves stored machine state data for a specific cluster node. Validates cluster and host names to prevent path traversal.
// @produce json
// @param cluster path string true "Cluster name"
// @param host path string true "Host name"
// @success 200 {object} object "Machine state JSON data"
// @failure 400 {object} ErrorResponse "Bad Request"
// @failure 404 {object} ErrorResponse "Machine state not enabled or file not found"
// @security ApiKeyAuth
// @router /machine_state/{cluster}/{host} [get]
func (api *RestAPI) getMachineState(rw http.ResponseWriter, r *http.Request) { func (api *RestAPI) getMachineState(rw http.ResponseWriter, r *http.Request) {
if api.MachineStateDir == "" { if api.MachineStateDir == "" {
handleError(fmt.Errorf("machine state not enabled"), http.StatusNotFound, rw) handleError(fmt.Errorf("machine state not enabled"), http.StatusNotFound, rw)
@@ -330,13 +437,12 @@ func (api *RestAPI) getMachineState(rw http.ResponseWriter, r *http.Request) {
cluster := vars["cluster"] cluster := vars["cluster"]
host := vars["host"] host := vars["host"]
// Validate cluster and host to prevent path traversal attacks if err := validatePathComponent(cluster, "cluster name"); err != nil {
if strings.Contains(cluster, "..") || strings.Contains(cluster, "/") || strings.Contains(cluster, "\\") { handleError(err, http.StatusBadRequest, rw)
handleError(fmt.Errorf("invalid cluster name"), http.StatusBadRequest, rw)
return return
} }
if strings.Contains(host, "..") || strings.Contains(host, "/") || strings.Contains(host, "\\") { if err := validatePathComponent(host, "host name"); err != nil {
handleError(fmt.Errorf("invalid host name"), http.StatusBadRequest, rw) handleError(err, http.StatusBadRequest, rw)
return return
} }
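The two machine-state hunks above replace the inline `strings.Contains` checks with calls to `validatePathComponent`, whose body does not appear in this excerpt. A minimal sketch of such a helper, assuming it keeps exactly the checks the removed lines performed (the parameter names and the error wording are assumptions, not taken from the commit):

```go
package main

import (
	"fmt"
	"strings"
)

// validatePathComponent is a hypothetical reconstruction: it rejects values
// that could escape a base directory once joined into a file path.
// componentName is only used to build the error message.
func validatePathComponent(component, componentName string) error {
	if strings.Contains(component, "..") ||
		strings.Contains(component, "/") ||
		strings.Contains(component, "\\") {
		return fmt.Errorf("invalid %s", componentName)
	}
	return nil
}

func main() {
	for _, name := range []string{"fritz", "../etc", "a/b", `a\b`} {
		fmt.Printf("%q -> %v\n", name, validatePathComponent(name, "cluster name"))
	}
}
```

Centralizing the check this way lets `putMachineState` and `getMachineState` share one rule set, so a later tightening (for example rejecting empty names) would only need to happen in one place.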

@@ -11,8 +11,8 @@ import (
"net/http" "net/http"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/gorilla/mux" "github.com/gorilla/mux"
) )
@@ -31,7 +31,7 @@ type APIReturnedUser struct {
// @description Required query-parameter defines if all users or only users with additional special roles are returned. // @description Required query-parameter defines if all users or only users with additional special roles are returned.
// @produce json // @produce json
// @param not-just-user query bool true "If returned list should contain all users or only users with additional special roles" // @param not-just-user query bool true "If returned list should contain all users or only users with additional special roles"
// @success 200 {array} api.ApiReturnedUser "List of users returned successfully" // @success 200 {array} api.APIReturnedUser "List of users returned successfully"
// @failure 400 {string} string "Bad Request" // @failure 400 {string} string "Bad Request"
// @failure 401 {string} string "Unauthorized" // @failure 401 {string} string "Unauthorized"
// @failure 403 {string} string "Forbidden" // @failure 403 {string} string "Forbidden"

@@ -106,7 +106,7 @@ Data is archived at the highest available resolution (typically 60s intervals).
```go ```go
// In archiver.go ArchiveJob() function // In archiver.go ArchiveJob() function
jobData, err := metricDataDispatcher.LoadData(job, allMetrics, scopes, ctx, 300) jobData, err := metricdispatch.LoadData(job, allMetrics, scopes, ctx, 300)
// 0 = highest resolution // 0 = highest resolution
// 300 = 5-minute resolution // 300 = 5-minute resolution
``` ```
@@ -185,6 +185,6 @@ Internal state is protected by:
## Dependencies ## Dependencies
- `internal/repository`: Database operations for job metadata - `internal/repository`: Database operations for job metadata
- `internal/metricDataDispatcher`: Loading metric data from various backends - `internal/metricdispatch`: Loading metric data from various backends
- `pkg/archive`: Archive backend abstraction (filesystem, S3, SQLite) - `pkg/archive`: Archive backend abstraction (filesystem, S3, SQLite)
- `cc-lib/schema`: Job and metric data structures - `cc-lib/schema`: Job and metric data structures

@@ -54,8 +54,8 @@ import (
"time" "time"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
sq "github.com/Masterminds/squirrel" sq "github.com/Masterminds/squirrel"
) )

@@ -10,10 +10,10 @@ import (
"math" "math"
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/metricDataDispatcher" "github.com/ClusterCockpit/cc-backend/internal/metricdispatch"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
// ArchiveJob archives a completed job's metric data to the configured archive backend. // ArchiveJob archives a completed job's metric data to the configured archive backend.
@@ -60,7 +60,7 @@ func ArchiveJob(job *schema.Job, ctx context.Context) (*schema.Job, error) {
scopes = append(scopes, schema.MetricScopeAccelerator) scopes = append(scopes, schema.MetricScopeAccelerator)
} }
jobData, err := metricDataDispatcher.LoadData(job, allMetrics, scopes, ctx, 0) // 0 Resolution-Value retrieves highest res (60s) jobData, err := metricdispatch.LoadData(job, allMetrics, scopes, ctx, 0) // 0 Resolution-Value retrieves highest res (60s)
if err != nil { if err != nil {
cclog.Error("Error while loading job data for archiving") cclog.Error("Error while loading job data for archiving")
return nil, err return nil, err

@@ -25,9 +25,9 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
"github.com/gorilla/sessions" "github.com/gorilla/sessions"
) )
@@ -40,7 +40,7 @@ type Authenticator interface {
// authenticator should attempt the login. This method should not perform // authenticator should attempt the login. This method should not perform
// expensive operations or actual authentication. // expensive operations or actual authentication.
CanLogin(user *schema.User, username string, rw http.ResponseWriter, r *http.Request) (*schema.User, bool) CanLogin(user *schema.User, username string, rw http.ResponseWriter, r *http.Request) (*schema.User, bool)
// Login performs the actual authentication for the user. // Login performs the actual authentication for the user.
// It returns the authenticated user or an error if authentication fails. // It returns the authenticated user or an error if authentication fails.
// The user parameter may be nil if the user doesn't exist in the database yet. // The user parameter may be nil if the user doesn't exist in the database yet.
@@ -65,13 +65,13 @@ var ipUserLimiters sync.Map
func getIPUserLimiter(ip, username string) *rate.Limiter { func getIPUserLimiter(ip, username string) *rate.Limiter {
key := ip + ":" + username key := ip + ":" + username
now := time.Now() now := time.Now()
if entry, ok := ipUserLimiters.Load(key); ok { if entry, ok := ipUserLimiters.Load(key); ok {
rle := entry.(*rateLimiterEntry) rle := entry.(*rateLimiterEntry)
rle.lastUsed = now rle.lastUsed = now
return rle.limiter return rle.limiter
} }
// More aggressive rate limiting: 5 attempts per 15 minutes // More aggressive rate limiting: 5 attempts per 15 minutes
newLimiter := rate.NewLimiter(rate.Every(15*time.Minute/5), 5) newLimiter := rate.NewLimiter(rate.Every(15*time.Minute/5), 5)
ipUserLimiters.Store(key, &rateLimiterEntry{ ipUserLimiters.Store(key, &rateLimiterEntry{
@@ -176,7 +176,7 @@ func (auth *Authentication) AuthViaSession(
func Init(authCfg *json.RawMessage) { func Init(authCfg *json.RawMessage) {
initOnce.Do(func() { initOnce.Do(func() {
authInstance = &Authentication{} authInstance = &Authentication{}
// Start background cleanup of rate limiters // Start background cleanup of rate limiters
startRateLimiterCleanup() startRateLimiterCleanup()
@@ -272,7 +272,7 @@ func handleUserSync(user *schema.User, syncUserOnLogin, updateUserOnLogin bool)
cclog.Errorf("Error while loading user '%s': %v", user.Username, err) cclog.Errorf("Error while loading user '%s': %v", user.Username, err)
return return
} }
if err == sql.ErrNoRows && syncUserOnLogin { // Add new user if err == sql.ErrNoRows && syncUserOnLogin { // Add new user
if err := r.AddUser(user); err != nil { if err := r.AddUser(user); err != nil {
cclog.Errorf("Error while adding user '%s' to DB: %v", user.Username, err) cclog.Errorf("Error while adding user '%s' to DB: %v", user.Username, err)
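The auth hunks above keep one limiter per `ip:username` key in a `sync.Map`, refresh `lastUsed` on every lookup, and start a background cleanup. The sketch below illustrates that bookkeeping with the standard library only. It is not the project's code: `entry`, `touch`, `cleanupBefore`, and `at` are hypothetical names, the real entry also holds a `*rate.Limiter`, and the mutex and `LoadOrStore` are choices made here to keep the sketch safe under concurrent use.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a stand-in for rateLimiterEntry. The real entry also holds a
// *rate.Limiter, omitted here to keep the sketch stdlib-only.
type entry struct {
	mu       sync.Mutex
	lastUsed time.Time
}

// limiters maps "ip:username" keys to *entry values.
var limiters sync.Map

// at returns a fixed timestamp n seconds after the Unix epoch (example helper).
func at(n int) time.Time {
	return time.Unix(int64(n), 0)
}

// touch returns the entry for key, creating it on first use, and records now
// as its last-used time.
func touch(key string, now time.Time) *entry {
	v, _ := limiters.LoadOrStore(key, &entry{})
	e := v.(*entry)
	e.mu.Lock()
	e.lastUsed = now
	e.mu.Unlock()
	return e
}

// cleanupBefore deletes every entry last used before cutoff and reports how
// many entries it removed.
func cleanupBefore(cutoff time.Time) int {
	removed := 0
	limiters.Range(func(k, v any) bool {
		e := v.(*entry)
		e.mu.Lock()
		stale := e.lastUsed.Before(cutoff)
		e.mu.Unlock()
		if stale {
			limiters.Delete(k)
			removed++
		}
		return true
	})
	return removed
}

func main() {
	touch("1.1.1.1:user1", at(100))
	touch("2.2.2.2:user2", at(100))
	fmt.Println(cleanupBefore(at(99)))  // cutoff before last use: nothing removed
	fmt.Println(cleanupBefore(at(200))) // cutoff after last use: both removed
}
```

For the limiter parameters themselves, `rate.Every(15*time.Minute/5)` works out to one token every 3 minutes, and the burst of 5 is what allows the first five attempts in quick succession.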

@@ -15,25 +15,25 @@ import (
func TestGetIPUserLimiter(t *testing.T) { func TestGetIPUserLimiter(t *testing.T) {
ip := "192.168.1.1" ip := "192.168.1.1"
username := "testuser" username := "testuser"
// Get limiter for the first time // Get limiter for the first time
limiter1 := getIPUserLimiter(ip, username) limiter1 := getIPUserLimiter(ip, username)
if limiter1 == nil { if limiter1 == nil {
t.Fatal("Expected limiter to be created") t.Fatal("Expected limiter to be created")
} }
// Get the same limiter again // Get the same limiter again
limiter2 := getIPUserLimiter(ip, username) limiter2 := getIPUserLimiter(ip, username)
if limiter1 != limiter2 { if limiter1 != limiter2 {
t.Error("Expected to get the same limiter instance") t.Error("Expected to get the same limiter instance")
} }
// Get a different limiter for different user // Get a different limiter for different user
limiter3 := getIPUserLimiter(ip, "otheruser") limiter3 := getIPUserLimiter(ip, "otheruser")
if limiter1 == limiter3 { if limiter1 == limiter3 {
t.Error("Expected different limiter for different user") t.Error("Expected different limiter for different user")
} }
// Get a different limiter for different IP // Get a different limiter for different IP
limiter4 := getIPUserLimiter("192.168.1.2", username) limiter4 := getIPUserLimiter("192.168.1.2", username)
if limiter1 == limiter4 { if limiter1 == limiter4 {
@@ -45,16 +45,16 @@ func TestGetIPUserLimiter(t *testing.T) {
func TestRateLimiterBehavior(t *testing.T) { func TestRateLimiterBehavior(t *testing.T) {
ip := "10.0.0.1" ip := "10.0.0.1"
username := "ratelimituser" username := "ratelimituser"
limiter := getIPUserLimiter(ip, username) limiter := getIPUserLimiter(ip, username)
// Should allow first 5 attempts // Should allow first 5 attempts
for i := 0; i < 5; i++ { for i := 0; i < 5; i++ {
if !limiter.Allow() { if !limiter.Allow() {
t.Errorf("Request %d should be allowed within rate limit", i+1) t.Errorf("Request %d should be allowed within rate limit", i+1)
} }
} }
// 6th attempt should be blocked // 6th attempt should be blocked
if limiter.Allow() { if limiter.Allow() {
t.Error("Request 6 should be blocked by rate limiter") t.Error("Request 6 should be blocked by rate limiter")
@@ -65,19 +65,19 @@ func TestRateLimiterBehavior(t *testing.T) {
func TestCleanupOldRateLimiters(t *testing.T) { func TestCleanupOldRateLimiters(t *testing.T) {
// Clear all existing limiters first to avoid interference from other tests // Clear all existing limiters first to avoid interference from other tests
cleanupOldRateLimiters(time.Now().Add(24 * time.Hour)) cleanupOldRateLimiters(time.Now().Add(24 * time.Hour))
// Create some new rate limiters // Create some new rate limiters
limiter1 := getIPUserLimiter("1.1.1.1", "user1") limiter1 := getIPUserLimiter("1.1.1.1", "user1")
limiter2 := getIPUserLimiter("2.2.2.2", "user2") limiter2 := getIPUserLimiter("2.2.2.2", "user2")
if limiter1 == nil || limiter2 == nil { if limiter1 == nil || limiter2 == nil {
t.Fatal("Failed to create test limiters") t.Fatal("Failed to create test limiters")
} }
// Cleanup limiters older than 1 second from now (should keep both) // Cleanup limiters older than 1 second from now (should keep both)
time.Sleep(10 * time.Millisecond) // Small delay to ensure timestamp difference time.Sleep(10 * time.Millisecond) // Small delay to ensure timestamp difference
cleanupOldRateLimiters(time.Now().Add(-1 * time.Second)) cleanupOldRateLimiters(time.Now().Add(-1 * time.Second))
// Verify they still exist (should get same instance) // Verify they still exist (should get same instance)
if getIPUserLimiter("1.1.1.1", "user1") != limiter1 { if getIPUserLimiter("1.1.1.1", "user1") != limiter1 {
t.Error("Limiter 1 was incorrectly cleaned up") t.Error("Limiter 1 was incorrectly cleaned up")
@@ -85,10 +85,10 @@ func TestCleanupOldRateLimiters(t *testing.T) {
if getIPUserLimiter("2.2.2.2", "user2") != limiter2 { if getIPUserLimiter("2.2.2.2", "user2") != limiter2 {
t.Error("Limiter 2 was incorrectly cleaned up") t.Error("Limiter 2 was incorrectly cleaned up")
} }
// Cleanup with a cutoff 2 hours in the future (should remove both) // Cleanup with a cutoff 2 hours in the future (should remove both)
cleanupOldRateLimiters(time.Now().Add(2 * time.Hour)) cleanupOldRateLimiters(time.Now().Add(2 * time.Hour))
// Getting them again should create new instances // Getting them again should create new instances
newLimiter1 := getIPUserLimiter("1.1.1.1", "user1") newLimiter1 := getIPUserLimiter("1.1.1.1", "user1")
if newLimiter1 == limiter1 { if newLimiter1 == limiter1 {
@@ -107,14 +107,14 @@ func TestIPv4Extraction(t *testing.T) {
{"IPv4 without port", "192.168.1.1", "192.168.1.1"}, {"IPv4 without port", "192.168.1.1", "192.168.1.1"},
{"Localhost with port", "127.0.0.1:3000", "127.0.0.1"}, {"Localhost with port", "127.0.0.1:3000", "127.0.0.1"},
} }
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
result := tt.input result := tt.input
if host, _, err := net.SplitHostPort(result); err == nil { if host, _, err := net.SplitHostPort(result); err == nil {
result = host result = host
} }
if result != tt.expected { if result != tt.expected {
t.Errorf("Expected %s, got %s", tt.expected, result) t.Errorf("Expected %s, got %s", tt.expected, result)
} }
@@ -122,7 +122,7 @@ func TestIPv4Extraction(t *testing.T) {
} }
} }
// TestIPv6Extraction tests extracting IPv6 addresses // TestIPv6Extraction tests extracting IPv6 addresses
func TestIPv6Extraction(t *testing.T) { func TestIPv6Extraction(t *testing.T) {
tests := []struct { tests := []struct {
name string name string
@@ -134,14 +134,14 @@ func TestIPv6Extraction(t *testing.T) {
{"IPv6 without port", "2001:db8::1", "2001:db8::1"}, {"IPv6 without port", "2001:db8::1", "2001:db8::1"},
{"IPv6 localhost", "::1", "::1"}, {"IPv6 localhost", "::1", "::1"},
} }
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
result := tt.input result := tt.input
if host, _, err := net.SplitHostPort(result); err == nil { if host, _, err := net.SplitHostPort(result); err == nil {
result = host result = host
} }
if result != tt.expected { if result != tt.expected {
t.Errorf("Expected %s, got %s", tt.expected, result) t.Errorf("Expected %s, got %s", tt.expected, result)
} }
@@ -160,14 +160,14 @@ func TestIPExtractionEdgeCases(t *testing.T) {
{"Empty string", "", ""}, {"Empty string", "", ""},
{"Just port", ":8080", ""}, {"Just port", ":8080", ""},
} }
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
result := tt.input result := tt.input
if host, _, err := net.SplitHostPort(result); err == nil { if host, _, err := net.SplitHostPort(result); err == nil {
result = host result = host
} }
if result != tt.expected { if result != tt.expected {
t.Errorf("Expected %s, got %s", tt.expected, result) t.Errorf("Expected %s, got %s", tt.expected, result)
} }
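The three extraction tests above repeat the same `net.SplitHostPort` fallback inline. As a sketch, the pattern can be pulled into a small helper. `hostOnly` is a hypothetical name used only for this illustration and does not appear in the diff:

```go
package main

import (
	"fmt"
	"net"
)

// hostOnly strips an optional port from addr. If addr cannot be split into
// host and port (no port, or a bare IPv6 address), it is returned unchanged.
func hostOnly(addr string) string {
	if host, _, err := net.SplitHostPort(addr); err == nil {
		return host
	}
	return addr
}

func main() {
	addrs := []string{"192.168.1.1:8080", "[2001:db8::1]:8080", "2001:db8::1", ":8080"}
	for _, addr := range addrs {
		fmt.Printf("%q -> %q\n", addr, hostOnly(addr))
	}
}
```

Note that a bare IPv6 address such as `2001:db8::1` makes `net.SplitHostPort` return an error, so the fallback branch is what keeps it intact, while `:8080` splits successfully into an empty host.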

@@ -14,8 +14,8 @@ import (
"strings" "strings"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/golang-jwt/jwt/v5" "github.com/golang-jwt/jwt/v5"
) )
@@ -101,20 +101,20 @@ func (ja *JWTAuthenticator) AuthViaJWT(
// Token is valid, extract payload // Token is valid, extract payload
claims := token.Claims.(jwt.MapClaims) claims := token.Claims.(jwt.MapClaims)
// Use shared helper to get user from JWT claims // Use shared helper to get user from JWT claims
var user *schema.User var user *schema.User
user, err = getUserFromJWT(claims, Keys.JwtConfig.ValidateUser, schema.AuthToken, -1) user, err = getUserFromJWT(claims, Keys.JwtConfig.ValidateUser, schema.AuthToken, -1)
if err != nil { if err != nil {
return nil, err return nil, err
} }
// If not validating user, we only get roles from JWT (no projects for this auth method) // If not validating user, we only get roles from JWT (no projects for this auth method)
if !Keys.JwtConfig.ValidateUser { if !Keys.JwtConfig.ValidateUser {
user.Roles = extractRolesFromClaims(claims, false) user.Roles = extractRolesFromClaims(claims, false)
user.Projects = nil // Standard JWT auth doesn't include projects user.Projects = nil // Standard JWT auth doesn't include projects
} }
return user, nil return user, nil
} }

@@ -12,8 +12,8 @@ import (
"net/http" "net/http"
"os" "os"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/golang-jwt/jwt/v5" "github.com/golang-jwt/jwt/v5"
) )
@@ -146,13 +146,13 @@ func (ja *JWTCookieSessionAuthenticator) Login(
} }
claims := token.Claims.(jwt.MapClaims) claims := token.Claims.(jwt.MapClaims)
// Use shared helper to get user from JWT claims // Use shared helper to get user from JWT claims
user, err = getUserFromJWT(claims, jc.ValidateUser, schema.AuthSession, schema.AuthViaToken) user, err = getUserFromJWT(claims, jc.ValidateUser, schema.AuthSession, schema.AuthViaToken)
if err != nil { if err != nil {
return nil, err return nil, err
} }
// Sync or update user if configured // Sync or update user if configured
if !jc.ValidateUser && (jc.SyncUserOnLogin || jc.UpdateUserOnLogin) { if !jc.ValidateUser && (jc.SyncUserOnLogin || jc.UpdateUserOnLogin) {
handleTokenUser(user) handleTokenUser(user)

@@ -11,8 +11,8 @@ import (
"fmt" "fmt"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/golang-jwt/jwt/v5" "github.com/golang-jwt/jwt/v5"
) )
@@ -28,7 +28,7 @@ func extractStringFromClaims(claims jwt.MapClaims, key string) string {
// If validateRoles is true, only valid roles are returned // If validateRoles is true, only valid roles are returned
func extractRolesFromClaims(claims jwt.MapClaims, validateRoles bool) []string { func extractRolesFromClaims(claims jwt.MapClaims, validateRoles bool) []string {
var roles []string var roles []string
if rawroles, ok := claims["roles"].([]any); ok { if rawroles, ok := claims["roles"].([]any); ok {
for _, rr := range rawroles { for _, rr := range rawroles {
if r, ok := rr.(string); ok { if r, ok := rr.(string); ok {
@@ -42,14 +42,14 @@ func extractRolesFromClaims(claims jwt.MapClaims, validateRoles bool) []string {
} }
} }
} }
return roles return roles
} }
// extractProjectsFromClaims extracts projects from JWT claims // extractProjectsFromClaims extracts projects from JWT claims
func extractProjectsFromClaims(claims jwt.MapClaims) []string { func extractProjectsFromClaims(claims jwt.MapClaims) []string {
projects := make([]string, 0) projects := make([]string, 0)
if rawprojs, ok := claims["projects"].([]any); ok { if rawprojs, ok := claims["projects"].([]any); ok {
for _, pp := range rawprojs { for _, pp := range rawprojs {
if p, ok := pp.(string); ok { if p, ok := pp.(string); ok {
@@ -61,7 +61,7 @@ func extractProjectsFromClaims(claims jwt.MapClaims) []string {
projects = append(projects, projSlice...) projects = append(projects, projSlice...)
} }
} }
return projects return projects
} }
@@ -72,14 +72,14 @@ func extractNameFromClaims(claims jwt.MapClaims) string {
if name, ok := claims["name"].(string); ok { if name, ok := claims["name"].(string); ok {
return name return name
} }
// Try nested structure: {name: {values: [...]}} // Try nested structure: {name: {values: [...]}}
if wrap, ok := claims["name"].(map[string]any); ok { if wrap, ok := claims["name"].(map[string]any); ok {
if vals, ok := wrap["values"].([]any); ok { if vals, ok := wrap["values"].([]any); ok {
if len(vals) == 0 { if len(vals) == 0 {
return "" return ""
} }
name := fmt.Sprintf("%v", vals[0]) name := fmt.Sprintf("%v", vals[0])
for i := 1; i < len(vals); i++ { for i := 1; i < len(vals); i++ {
name += fmt.Sprintf(" %v", vals[i]) name += fmt.Sprintf(" %v", vals[i])
@@ -87,7 +87,7 @@ func extractNameFromClaims(claims jwt.MapClaims) string {
return name return name
} }
} }
return "" return ""
} }
@@ -100,7 +100,7 @@ func getUserFromJWT(claims jwt.MapClaims, validateUser bool, authType schema.Aut
if sub == "" { if sub == "" {
return nil, errors.New("missing 'sub' claim in JWT") return nil, errors.New("missing 'sub' claim in JWT")
} }
if validateUser { if validateUser {
// Validate user against database // Validate user against database
ur := repository.GetUserRepository() ur := repository.GetUserRepository()
@@ -109,22 +109,22 @@ func getUserFromJWT(claims jwt.MapClaims, validateUser bool, authType schema.Aut
cclog.Errorf("Error while loading user '%v': %v", sub, err) cclog.Errorf("Error while loading user '%v': %v", sub, err)
return nil, fmt.Errorf("database error: %w", err) return nil, fmt.Errorf("database error: %w", err)
} }
// Deny any logins for unknown usernames // Deny any logins for unknown usernames
if user == nil || err == sql.ErrNoRows { if user == nil || err == sql.ErrNoRows {
cclog.Warn("Could not find user from JWT in internal database.") cclog.Warn("Could not find user from JWT in internal database.")
return nil, errors.New("unknown user") return nil, errors.New("unknown user")
} }
// Return database user (with database roles) // Return database user (with database roles)
return user, nil return user, nil
} }
// Create user from JWT claims // Create user from JWT claims
name := extractNameFromClaims(claims) name := extractNameFromClaims(claims)
roles := extractRolesFromClaims(claims, true) // Validate roles roles := extractRolesFromClaims(claims, true) // Validate roles
projects := extractProjectsFromClaims(claims) projects := extractProjectsFromClaims(claims)
return &schema.User{ return &schema.User{
Username: sub, Username: sub,
Name: name, Name: name,
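`extractNameFromClaims` in the hunks above accepts two claim shapes: a plain string under `name`, or a nested object of the form `{name: {values: [...]}}` whose values are joined with spaces. The following stdlib-only sketch mirrors that logic for illustration. It assumes a plain `map[string]any` in place of `jwt.MapClaims`, and `nameFromClaims` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// nameFromClaims returns the display name from a claims map. It first tries a
// flat string claim, then the nested {name: {values: [...]}} structure, and
// falls back to the empty string.
func nameFromClaims(claims map[string]any) string {
	if name, ok := claims["name"].(string); ok {
		return name
	}
	if wrap, ok := claims["name"].(map[string]any); ok {
		if vals, ok := wrap["values"].([]any); ok {
			parts := make([]string, 0, len(vals))
			for _, v := range vals {
				// %v converts non-string values (e.g. numbers) to text.
				parts = append(parts, fmt.Sprintf("%v", v))
			}
			return strings.Join(parts, " ")
		}
	}
	return ""
}

func main() {
	flat := map[string]any{"name": "Test User"}
	nested := map[string]any{"name": map[string]any{"values": []any{"John", "Smith"}}}
	fmt.Println(nameFromClaims(flat))
	fmt.Println(nameFromClaims(nested))
}
```

Using `strings.Join` instead of repeated string concatenation is a choice made for this sketch. It also covers the empty `values` case without a separate length check.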

@@ -8,7 +8,7 @@ package auth
import ( import (
"testing" "testing"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/golang-jwt/jwt/v5" "github.com/golang-jwt/jwt/v5"
) )
@@ -19,7 +19,7 @@ func TestExtractStringFromClaims(t *testing.T) {
"email": "test@example.com", "email": "test@example.com",
"age": 25, // not a string "age": 25, // not a string
} }
tests := []struct { tests := []struct {
name string name string
key string key string
@@ -30,7 +30,7 @@ func TestExtractStringFromClaims(t *testing.T) {
{"Non-existent key", "missing", ""}, {"Non-existent key", "missing", ""},
{"Non-string value", "age", ""}, {"Non-string value", "age", ""},
} }
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
result := extractStringFromClaims(claims, tt.key) result := extractStringFromClaims(claims, tt.key)
@@ -88,16 +88,16 @@ func TestExtractRolesFromClaims(t *testing.T) {
expected: []string{}, expected: []string{},
}, },
} }
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
result := extractRolesFromClaims(tt.claims, tt.validateRoles) result := extractRolesFromClaims(tt.claims, tt.validateRoles)
if len(result) != len(tt.expected) { if len(result) != len(tt.expected) {
t.Errorf("Expected %d roles, got %d", len(tt.expected), len(result)) t.Errorf("Expected %d roles, got %d", len(tt.expected), len(result))
return return
} }
for i, role := range result { for i, role := range result {
if i >= len(tt.expected) || role != tt.expected[i] { if i >= len(tt.expected) || role != tt.expected[i] {
t.Errorf("Expected role %s at position %d, got %s", tt.expected[i], i, role) t.Errorf("Expected role %s at position %d, got %s", tt.expected[i], i, role)
@@ -141,16 +141,16 @@ func TestExtractProjectsFromClaims(t *testing.T) {
expected: []string{"project1", "project2"}, // Should skip non-strings expected: []string{"project1", "project2"}, // Should skip non-strings
}, },
} }
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
result := extractProjectsFromClaims(tt.claims) result := extractProjectsFromClaims(tt.claims)
if len(result) != len(tt.expected) { if len(result) != len(tt.expected) {
t.Errorf("Expected %d projects, got %d", len(tt.expected), len(result)) t.Errorf("Expected %d projects, got %d", len(tt.expected), len(result))
return return
} }
for i, project := range result { for i, project := range result {
if i >= len(tt.expected) || project != tt.expected[i] { if i >= len(tt.expected) || project != tt.expected[i] {
t.Errorf("Expected project %s at position %d, got %s", tt.expected[i], i, project) t.Errorf("Expected project %s at position %d, got %s", tt.expected[i], i, project)
@@ -216,7 +216,7 @@ func TestExtractNameFromClaims(t *testing.T) {
expected: "123 Smith", // Should convert to string expected: "123 Smith", // Should convert to string
}, },
} }
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
result := extractNameFromClaims(tt.claims) result := extractNameFromClaims(tt.claims)
@@ -235,29 +235,28 @@ func TestGetUserFromJWT_NoValidation(t *testing.T) {
"roles": []any{"user", "admin"}, "roles": []any{"user", "admin"},
"projects": []any{"project1", "project2"}, "projects": []any{"project1", "project2"},
} }
user, err := getUserFromJWT(claims, false, schema.AuthToken, -1) user, err := getUserFromJWT(claims, false, schema.AuthToken, -1)
if err != nil { if err != nil {
t.Fatalf("Unexpected error: %v", err) t.Fatalf("Unexpected error: %v", err)
} }
if user.Username != "testuser" { if user.Username != "testuser" {
t.Errorf("Expected username 'testuser', got '%s'", user.Username) t.Errorf("Expected username 'testuser', got '%s'", user.Username)
} }
if user.Name != "Test User" { if user.Name != "Test User" {
t.Errorf("Expected name 'Test User', got '%s'", user.Name) t.Errorf("Expected name 'Test User', got '%s'", user.Name)
} }
if len(user.Roles) != 2 { if len(user.Roles) != 2 {
t.Errorf("Expected 2 roles, got %d", len(user.Roles)) t.Errorf("Expected 2 roles, got %d", len(user.Roles))
} }
if len(user.Projects) != 2 { if len(user.Projects) != 2 {
t.Errorf("Expected 2 projects, got %d", len(user.Projects)) t.Errorf("Expected 2 projects, got %d", len(user.Projects))
} }
if user.AuthType != schema.AuthToken { if user.AuthType != schema.AuthToken {
t.Errorf("Expected AuthType %v, got %v", schema.AuthToken, user.AuthType) t.Errorf("Expected AuthType %v, got %v", schema.AuthToken, user.AuthType)
} }
@@ -268,13 +267,13 @@ func TestGetUserFromJWT_MissingSub(t *testing.T) {
claims := jwt.MapClaims{ claims := jwt.MapClaims{
"name": "Test User", "name": "Test User",
} }
_, err := getUserFromJWT(claims, false, schema.AuthToken, -1) _, err := getUserFromJWT(claims, false, schema.AuthToken, -1)
if err == nil { if err == nil {
t.Error("Expected error for missing sub claim") t.Error("Expected error for missing sub claim")
} }
if err.Error() != "missing 'sub' claim in JWT" { if err.Error() != "missing 'sub' claim in JWT" {
t.Errorf("Expected specific error message, got: %v", err) t.Errorf("Expected specific error message, got: %v", err)
} }

View File

@@ -13,8 +13,8 @@ import (
"os" "os"
"strings" "strings"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/golang-jwt/jwt/v5" "github.com/golang-jwt/jwt/v5"
) )
@@ -75,13 +75,13 @@ func (ja *JWTSessionAuthenticator) Login(
} }
claims := token.Claims.(jwt.MapClaims) claims := token.Claims.(jwt.MapClaims)
// Use shared helper to get user from JWT claims // Use shared helper to get user from JWT claims
user, err = getUserFromJWT(claims, Keys.JwtConfig.ValidateUser, schema.AuthSession, schema.AuthViaToken) user, err = getUserFromJWT(claims, Keys.JwtConfig.ValidateUser, schema.AuthSession, schema.AuthViaToken)
if err != nil { if err != nil {
return nil, err return nil, err
} }
// Sync or update user if configured // Sync or update user if configured
if !Keys.JwtConfig.ValidateUser && (Keys.JwtConfig.SyncUserOnLogin || Keys.JwtConfig.UpdateUserOnLogin) { if !Keys.JwtConfig.ValidateUser && (Keys.JwtConfig.SyncUserOnLogin || Keys.JwtConfig.UpdateUserOnLogin) {
handleTokenUser(user) handleTokenUser(user)

View File

@@ -13,8 +13,8 @@ import (
"strings" "strings"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/go-ldap/ldap/v3" "github.com/go-ldap/ldap/v3"
) )

View File

@@ -9,8 +9,8 @@ import (
"fmt" "fmt"
"net/http" "net/http"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"golang.org/x/crypto/bcrypt" "golang.org/x/crypto/bcrypt"
) )

View File

@@ -15,8 +15,8 @@ import (
"time" "time"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/coreos/go-oidc/v3/oidc" "github.com/coreos/go-oidc/v3/oidc"
"github.com/gorilla/mux" "github.com/gorilla/mux"
"golang.org/x/oauth2" "golang.org/x/oauth2"
@@ -59,7 +59,7 @@ func NewOIDC(a *Authentication) *OIDC {
// Use context with timeout for provider initialization // Use context with timeout for provider initialization
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel() defer cancel()
provider, err := oidc.NewProvider(ctx, Keys.OpenIDConfig.Provider) provider, err := oidc.NewProvider(ctx, Keys.OpenIDConfig.Provider)
if err != nil { if err != nil {
cclog.Fatal(err) cclog.Fatal(err)
@@ -119,7 +119,7 @@ func (oa *OIDC) OAuth2Callback(rw http.ResponseWriter, r *http.Request) {
// Exchange authorization code for token with timeout // Exchange authorization code for token with timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel() defer cancel()
token, err := oa.client.Exchange(ctx, code, oauth2.VerifierOption(codeVerifier)) token, err := oa.client.Exchange(ctx, code, oauth2.VerifierOption(codeVerifier))
if err != nil { if err != nil {
http.Error(rw, "Failed to exchange token: "+err.Error(), http.StatusInternalServerError) http.Error(rw, "Failed to exchange token: "+err.Error(), http.StatusInternalServerError)
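
Both OIDC hunks above bound a remote call with a 30-second context timeout. The pattern can be sketched in isolation as follows. The names `callWithTimeout`, `slowOp` and `exceedsDeadline` are hypothetical helpers introduced only for this illustration; they are not part of cc-backend, and `slowOp` merely simulates a remote call.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// callWithTimeout runs op with a context that is cancelled after timeout.
// The deferred cancel releases the timer even when op returns early.
func callWithTimeout(timeout time.Duration, op func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return op(ctx)
}

// slowOp is a hypothetical stand-in for a remote call (for example a token
// exchange). It finishes after d unless the context is cancelled first.
func slowOp(d time.Duration) func(ctx context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// exceedsDeadline reports whether work of the given duration is cut off
// when it runs under the given timeout.
func exceedsDeadline(timeout, work time.Duration) bool {
	err := callWithTimeout(timeout, slowOp(work))
	return errors.Is(err, context.DeadlineExceeded)
}
```

A call such as `exceedsDeadline(10*time.Millisecond, time.Second)` would report that the work was cut off, whereas a generous timeout lets the operation complete normally.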

View File

@@ -11,8 +11,8 @@ import (
"encoding/json" "encoding/json"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/resampler" "github.com/ClusterCockpit/cc-lib/v2/resampler"
) )
type ProgramConfig struct { type ProgramConfig struct {
@@ -37,10 +37,10 @@ type ProgramConfig struct {
EmbedStaticFiles bool `json:"embed-static-files"` EmbedStaticFiles bool `json:"embed-static-files"`
StaticFiles string `json:"static-files"` StaticFiles string `json:"static-files"`
// 'sqlite3' or 'mysql' (mysql will work for mariadb as well) // Database driver - only 'sqlite3' is supported
DBDriver string `json:"db-driver"` DBDriver string `json:"db-driver"`
// For sqlite3 a filename, for mysql a DSN in this format: https://github.com/go-sql-driver/mysql#dsn-data-source-name (Without query parameters!). // Path to SQLite database file
DB string `json:"db"` DB string `json:"db"`
// Keep all metric data in the metric data repositories, // Keep all metric data in the metric data repositories,
@@ -90,8 +90,7 @@ type ResampleConfig struct {
} }
type NATSConfig struct { type NATSConfig struct {
SubjectJobStart string `json:"subjectJobStart"` SubjectJobEvent string `json:"subjectJobEvent"`
SubjectJobStop string `json:"subjectJobStop"`
SubjectNodeState string `json:"subjectNodeState"` SubjectNodeState string `json:"subjectNodeState"`
} }
@@ -112,14 +111,6 @@ type FilterRanges struct {
StartTime *TimeRange `json:"startTime"` StartTime *TimeRange `json:"startTime"`
} }
type ClusterConfig struct {
Name string `json:"name"`
FilterRanges *FilterRanges `json:"filterRanges"`
MetricDataRepository json.RawMessage `json:"metricDataRepository"`
}
var Clusters []*ClusterConfig
var Keys ProgramConfig = ProgramConfig{ var Keys ProgramConfig = ProgramConfig{
Addr: "localhost:8080", Addr: "localhost:8080",
DisableAuthentication: false, DisableAuthentication: false,
@@ -133,7 +124,7 @@ var Keys ProgramConfig = ProgramConfig{
ShortRunningJobsDuration: 5 * 60, ShortRunningJobsDuration: 5 * 60,
} }
func Init(mainConfig json.RawMessage, clusterConfig json.RawMessage) { func Init(mainConfig json.RawMessage) {
Validate(configSchema, mainConfig) Validate(configSchema, mainConfig)
dec := json.NewDecoder(bytes.NewReader(mainConfig)) dec := json.NewDecoder(bytes.NewReader(mainConfig))
dec.DisallowUnknownFields() dec.DisallowUnknownFields()
@@ -141,17 +132,6 @@ func Init(mainConfig json.RawMessage, clusterConfig json.RawMessage) {
cclog.Abortf("Config Init: Could not decode config file '%s'.\nError: %s\n", mainConfig, err.Error()) cclog.Abortf("Config Init: Could not decode config file '%s'.\nError: %s\n", mainConfig, err.Error())
} }
Validate(clustersSchema, clusterConfig)
dec = json.NewDecoder(bytes.NewReader(clusterConfig))
dec.DisallowUnknownFields()
if err := dec.Decode(&Clusters); err != nil {
cclog.Abortf("Config Init: Could not decode config file '%s'.\nError: %s\n", mainConfig, err.Error())
}
if len(Clusters) < 1 {
cclog.Abort("Config Init: At least one cluster required in config. Exited with error.")
}
if Keys.EnableResampling != nil && Keys.EnableResampling.MinimumPoints > 0 { if Keys.EnableResampling != nil && Keys.EnableResampling.MinimumPoints > 0 {
resampler.SetMinimumRequiredPoints(Keys.EnableResampling.MinimumPoints) resampler.SetMinimumRequiredPoints(Keys.EnableResampling.MinimumPoints)
} }
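
After this change `Init` decodes only the main configuration, still with `DisallowUnknownFields` enabled. A minimal sketch of that strict-decoding step, assuming a trimmed-down struct: `programConfig` and `decodeStrict` are hypothetical names for this illustration, and only the `db-driver` and `db` tags and the `localhost:8080` default are taken from the diff (the `addr` tag is an assumption).

```go
package main

import (
	"bytes"
	"encoding/json"
)

// programConfig is a hypothetical, trimmed-down stand-in for ProgramConfig.
type programConfig struct {
	Addr     string `json:"addr"` // tag name assumed for illustration
	DBDriver string `json:"db-driver"`
	DB       string `json:"db"`
}

// decodeStrict decodes raw on top of a config seeded with defaults and
// rejects any key that the struct does not declare.
func decodeStrict(raw json.RawMessage) (programConfig, error) {
	cfg := programConfig{Addr: "localhost:8080"}
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}
```

With strict decoding, a key that no longer belongs to this section (for instance a leftover `clusters` entry) produces a decode error instead of being silently ignored, while keys that are absent keep their defaults.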

View File

@@ -8,19 +8,15 @@ package config
import ( import (
"testing" "testing"
ccconf "github.com/ClusterCockpit/cc-lib/ccConfig" ccconf "github.com/ClusterCockpit/cc-lib/v2/ccConfig"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
) )
func TestInit(t *testing.T) { func TestInit(t *testing.T) {
fp := "../../configs/config.json" fp := "../../configs/config.json"
ccconf.Init(fp) ccconf.Init(fp)
if cfg := ccconf.GetPackageConfig("main"); cfg != nil { if cfg := ccconf.GetPackageConfig("main"); cfg != nil {
if clustercfg := ccconf.GetPackageConfig("clusters"); clustercfg != nil { Init(cfg)
Init(cfg, clustercfg)
} else {
cclog.Abort("Cluster configuration must be present")
}
} else { } else {
cclog.Abort("Main configuration must be present") cclog.Abort("Main configuration must be present")
} }
@@ -34,11 +30,7 @@ func TestInitMinimal(t *testing.T) {
fp := "../../configs/config-demo.json" fp := "../../configs/config-demo.json"
ccconf.Init(fp) ccconf.Init(fp)
if cfg := ccconf.GetPackageConfig("main"); cfg != nil { if cfg := ccconf.GetPackageConfig("main"); cfg != nil {
if clustercfg := ccconf.GetPackageConfig("clusters"); clustercfg != nil { Init(cfg)
Init(cfg, clustercfg)
} else {
cclog.Abort("Cluster configuration must be present")
}
} else { } else {
cclog.Abort("Main configuration must be present") cclog.Abort("Main configuration must be present")
} }

View File

@@ -41,7 +41,7 @@ var configSchema = `
"type": "string" "type": "string"
}, },
"db": { "db": {
"description": "For sqlite3 a filename, for mysql a DSN in this format: https://github.com/go-sql-driver/mysql#dsn-data-source-name (Without query parameters!).", "description": "Path to SQLite database file (e.g., './var/job.db')",
"type": "string" "type": "string"
}, },
"disable-archive": { "disable-archive": {
@@ -119,87 +119,22 @@ var configSchema = `
} }
}, },
"required": ["trigger", "resolutions"] "required": ["trigger", "resolutions"]
},
"apiSubjects": {
"description": "NATS subjects configuration for subscribing to job and node events.",
"type": "object",
"properties": {
"subjectJobEvent": {
"description": "NATS subject for job events (start_job, stop_job)",
"type": "string"
},
"subjectNodeState": {
"description": "NATS subject for node state updates",
"type": "string"
}
},
"required": ["subjectJobEvent", "subjectNodeState"]
} }
}, },
"required": ["apiAllowedIPs"] "required": ["apiAllowedIPs"]
}` }`
var clustersSchema = `
{
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"description": "The name of the cluster.",
"type": "string"
},
"metricDataRepository": {
"description": "Type of the metric data repository for this cluster",
"type": "object",
"properties": {
"kind": {
"type": "string",
"enum": ["influxdb", "prometheus", "cc-metric-store", "cc-metric-store-internal", "test"]
},
"url": {
"type": "string"
},
"token": {
"type": "string"
}
},
"required": ["kind"]
},
"filterRanges": {
"description": "This option controls the slider ranges for the UI controls of numNodes, duration, and startTime.",
"type": "object",
"properties": {
"numNodes": {
"description": "UI slider range for number of nodes",
"type": "object",
"properties": {
"from": {
"type": "integer"
},
"to": {
"type": "integer"
}
},
"required": ["from", "to"]
},
"duration": {
"description": "UI slider range for duration",
"type": "object",
"properties": {
"from": {
"type": "integer"
},
"to": {
"type": "integer"
}
},
"required": ["from", "to"]
},
"startTime": {
"description": "UI slider range for start time",
"type": "object",
"properties": {
"from": {
"type": "string",
"format": "date-time"
},
"to": {
"type": "null"
}
},
"required": ["from", "to"]
}
},
"required": ["numNodes", "duration", "startTime"]
}
},
"required": ["name", "metricDataRepository", "filterRanges"],
"minItems": 1
}
}`

View File

@@ -8,7 +8,7 @@ package config
import ( import (
"encoding/json" "encoding/json"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/santhosh-tekuri/jsonschema/v5" "github.com/santhosh-tekuri/jsonschema/v5"
) )

File diff suppressed because it is too large

View File

@@ -10,7 +10,7 @@ import (
"time" "time"
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
type ClusterMetricWithName struct { type ClusterMetricWithName struct {
@@ -82,6 +82,7 @@ type JobFilter struct {
State []schema.JobState `json:"state,omitempty"` State []schema.JobState `json:"state,omitempty"`
MetricStats []*MetricStatItem `json:"metricStats,omitempty"` MetricStats []*MetricStatItem `json:"metricStats,omitempty"`
Shared *string `json:"shared,omitempty"` Shared *string `json:"shared,omitempty"`
Schedule *string `json:"schedule,omitempty"`
Node *StringInput `json:"node,omitempty"` Node *StringInput `json:"node,omitempty"`
} }

View File

@@ -4,7 +4,7 @@ import (
"sync" "sync"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/jmoiron/sqlx" "github.com/jmoiron/sqlx"
) )

View File

@@ -3,7 +3,7 @@ package graph
// This file will be automatically regenerated based on the schema, any resolver // This file will be automatically regenerated based on the schema, any resolver
// implementations // implementations
// will be copied through when generating and any unknown code will be moved to the end. // will be copied through when generating and any unknown code will be moved to the end.
// Code generated by github.com/99designs/gqlgen version v0.17.84 // Code generated by github.com/99designs/gqlgen version v0.17.85
import ( import (
"context" "context"
@@ -19,11 +19,11 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/graph/generated" "github.com/ClusterCockpit/cc-backend/internal/graph/generated"
"github.com/ClusterCockpit/cc-backend/internal/graph/model" "github.com/ClusterCockpit/cc-backend/internal/graph/model"
"github.com/ClusterCockpit/cc-backend/internal/metricDataDispatcher" "github.com/ClusterCockpit/cc-backend/internal/metricdispatch"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
// Partitions is the resolver for the partitions field. // Partitions is the resolver for the partitions field.
@@ -283,7 +283,7 @@ func (r *mutationResolver) RemoveTagFromList(ctx context.Context, tagIds []strin
// Test Access: Admins && Admin Tag OR Everyone && Private Tag // Test Access: Admins && Admin Tag OR Everyone && Private Tag
if user.HasRole(schema.RoleAdmin) && (tscope == "global" || tscope == "admin") || user.Username == tscope { if user.HasRole(schema.RoleAdmin) && (tscope == "global" || tscope == "admin") || user.Username == tscope {
// Remove from DB // Remove from DB
if err = r.Repo.RemoveTagById(tid); err != nil { if err = r.Repo.RemoveTagByID(tid); err != nil {
cclog.Warn("Error while removing tag") cclog.Warn("Error while removing tag")
return nil, err return nil, err
} else { } else {
@@ -484,7 +484,7 @@ func (r *queryResolver) JobMetrics(ctx context.Context, id string, metrics []str
return nil, err return nil, err
} }
data, err := metricDataDispatcher.LoadData(job, metrics, scopes, ctx, *resolution) data, err := metricdispatch.LoadData(job, metrics, scopes, ctx, *resolution)
if err != nil { if err != nil {
cclog.Warn("Error while loading job data") cclog.Warn("Error while loading job data")
return nil, err return nil, err
@@ -512,7 +512,7 @@ func (r *queryResolver) JobStats(ctx context.Context, id string, metrics []strin
return nil, err return nil, err
} }
data, err := metricDataDispatcher.LoadJobStats(job, metrics, ctx) data, err := metricdispatch.LoadJobStats(job, metrics, ctx)
if err != nil { if err != nil {
cclog.Warnf("Error while loading jobStats data for job id %s", id) cclog.Warnf("Error while loading jobStats data for job id %s", id)
return nil, err return nil, err
@@ -537,7 +537,7 @@ func (r *queryResolver) ScopedJobStats(ctx context.Context, id string, metrics [
return nil, err return nil, err
} }
data, err := metricDataDispatcher.LoadScopedJobStats(job, metrics, scopes, ctx) data, err := metricdispatch.LoadScopedJobStats(job, metrics, scopes, ctx)
if err != nil { if err != nil {
cclog.Warnf("Error while loading scopedJobStats data for job id %s", id) cclog.Warnf("Error while loading scopedJobStats data for job id %s", id)
return nil, err return nil, err
@@ -590,21 +590,24 @@ func (r *queryResolver) Jobs(ctx context.Context, filter []*model.JobFilter, pag
// Note: Even if App-Default 'config.Keys.UiDefaults["job_list_usePaging"]' is set, always return hasNextPage boolean. // Note: Even if App-Default 'config.Keys.UiDefaults["job_list_usePaging"]' is set, always return hasNextPage boolean.
// Users can decide in frontend to use continuous scroll, even if app-default is paging! // Users can decide in frontend to use continuous scroll, even if app-default is paging!
// Skip if page.ItemsPerPage == -1 ("Load All" -> No Next Page required, Status Dashboards)
/* /*
Example Page 4 @ 10 IpP : Does item 41 exist? Example Page 4 @ 10 IpP : Does item 41 exist?
Minimal Page 41 @ 1 IpP : If len(result) is 1, Page 5 @ 10 IpP exists. Minimal Page 41 @ 1 IpP : If len(result) is 1, Page 5 @ 10 IpP exists.
*/ */
nextPage := &model.PageRequest{ hasNextPage := false
ItemsPerPage: 1, if page.ItemsPerPage != -1 {
Page: ((page.Page * page.ItemsPerPage) + 1), nextPage := &model.PageRequest{
ItemsPerPage: 1,
Page: ((page.Page * page.ItemsPerPage) + 1),
}
nextJobs, err := r.Repo.QueryJobs(ctx, filter, nextPage, order)
if err != nil {
cclog.Warn("Error while querying next jobs")
return nil, err
}
hasNextPage = len(nextJobs) == 1
} }
nextJobs, err := r.Repo.QueryJobs(ctx, filter, nextPage, order)
if err != nil {
cclog.Warn("Error while querying next jobs")
return nil, err
}
hasNextPage := len(nextJobs) == 1
return &model.JobResultList{Items: jobs, Count: &count, HasNextPage: &hasNextPage}, nil return &model.JobResultList{Items: jobs, Count: &count, HasNextPage: &hasNextPage}, nil
} }
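
The hunk above probes for a next page by requesting a single item at position `page * itemsPerPage + 1`, and skips the probe when `ItemsPerPage` is `-1` ("load all"). A self-contained sketch of that logic follows. `pageRequest`, `queryFunc`, `fakeQuery` and `hasNextPage` are hypothetical names for this illustration; `fakeQuery` replaces `r.Repo.QueryJobs` with an in-memory slice and assumes 1-indexed pages, which matches the "Page 41 @ 1 IpP" comment in the diff.

```go
package main

// pageRequest mirrors the two fields of model.PageRequest used in the diff.
type pageRequest struct {
	ItemsPerPage int
	Page         int
}

// queryFunc is a hypothetical stand-in for r.Repo.QueryJobs.
type queryFunc func(page pageRequest) ([]int, error)

// fakeQuery serves 1-indexed pages from an in-memory slice (assumption).
func fakeQuery(items []int) queryFunc {
	return func(p pageRequest) ([]int, error) {
		if p.ItemsPerPage == -1 {
			return items, nil // "load all"
		}
		start := (p.Page - 1) * p.ItemsPerPage
		if start < 0 || start >= len(items) {
			return nil, nil
		}
		end := start + p.ItemsPerPage
		if end > len(items) {
			end = len(items)
		}
		return items[start:end], nil
	}
}

// hasNextPage asks for exactly one item just past the current page.
// With ItemsPerPage == -1 everything is already loaded, so no probe is made.
func hasNextPage(query queryFunc, page pageRequest) (bool, error) {
	if page.ItemsPerPage == -1 {
		return false, nil
	}
	probe := pageRequest{ItemsPerPage: 1, Page: page.Page*page.ItemsPerPage + 1}
	next, err := query(probe)
	if err != nil {
		return false, err
	}
	return len(next) == 1, nil
}
```

For 45 items, page 4 at 10 items per page probes item 41 and finds it, so a fifth page exists; page 5 probes item 51 and finds nothing.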
@@ -702,7 +705,7 @@ func (r *queryResolver) JobsMetricStats(ctx context.Context, filter []*model.Job
res := []*model.JobStats{} res := []*model.JobStats{}
for _, job := range jobs { for _, job := range jobs {
data, err := metricDataDispatcher.LoadJobStats(job, metrics, ctx) data, err := metricdispatch.LoadJobStats(job, metrics, ctx)
if err != nil { if err != nil {
cclog.Warnf("Error while loading comparison jobStats data for job id %d", job.JobID) cclog.Warnf("Error while loading comparison jobStats data for job id %d", job.JobID)
continue continue
@@ -753,13 +756,19 @@ func (r *queryResolver) NodeMetrics(ctx context.Context, cluster string, nodes [
return nil, errors.New("you need to be administrator or support staff for this query") return nil, errors.New("you need to be administrator or support staff for this query")
} }
defaultMetrics := make([]string, 0)
for _, mc := range archive.GetCluster(cluster).MetricConfig {
defaultMetrics = append(defaultMetrics, mc.Name)
}
if metrics == nil { if metrics == nil {
for _, mc := range archive.GetCluster(cluster).MetricConfig { metrics = defaultMetrics
metrics = append(metrics, mc.Name) } else {
} metrics = slices.DeleteFunc(metrics, func(metric string) bool {
return !slices.Contains(defaultMetrics, metric) // Remove undefined metrics.
})
} }
data, err := metricDataDispatcher.LoadNodeData(cluster, metrics, nodes, scopes, from, to, ctx) data, err := metricdispatch.LoadNodeData(cluster, metrics, nodes, scopes, from, to, ctx)
if err != nil { if err != nil {
cclog.Warn("error while loading node data") cclog.Warn("error while loading node data")
return nil, err return nil, err
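
The `NodeMetrics` hunk above now falls back to all configured metrics when none are requested, and otherwise drops requested metrics that the cluster does not define. A minimal sketch of that selection step, assuming Go 1.21 or later for the `slices` package: `selectMetrics` is a hypothetical helper, and unlike the resolver it clones the request first so the caller's slice is left untouched.

```go
package main

import "slices"

// selectMetrics returns the metrics to load.
// A nil request selects every defined metric; otherwise requested names that
// are not defined are removed while the requested order is preserved.
func selectMetrics(requested, defined []string) []string {
	if requested == nil {
		return slices.Clone(defined)
	}
	// DeleteFunc works in place, so operate on a copy of the request.
	return slices.DeleteFunc(slices.Clone(requested), func(metric string) bool {
		return !slices.Contains(defined, metric)
	})
}
```

Note that, as in the diff, only a nil request triggers the defaults; an empty but non-nil request stays empty.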
@@ -825,7 +834,7 @@ func (r *queryResolver) NodeMetricsList(ctx context.Context, cluster string, sub
} }
} }
data, err := metricDataDispatcher.LoadNodeListData(cluster, subCluster, nodes, metrics, scopes, *resolution, from, to, ctx) data, err := metricdispatch.LoadNodeListData(cluster, subCluster, nodes, metrics, scopes, *resolution, from, to, ctx)
if err != nil { if err != nil {
cclog.Warn("error while loading node data (Resolver.NodeMetricsList") cclog.Warn("error while loading node data (Resolver.NodeMetricsList")
return nil, err return nil, err
@@ -880,7 +889,7 @@ func (r *queryResolver) ClusterMetrics(ctx context.Context, cluster string, metr
// 'nodes' == nil -> Defaults to all nodes of cluster for existing query workflow // 'nodes' == nil -> Defaults to all nodes of cluster for existing query workflow
scopes := []schema.MetricScope{"node"} scopes := []schema.MetricScope{"node"}
data, err := metricDataDispatcher.LoadNodeData(cluster, metrics, nil, scopes, from, to, ctx) data, err := metricdispatch.LoadNodeData(cluster, metrics, nil, scopes, from, to, ctx)
if err != nil { if err != nil {
cclog.Warn("error while loading node data") cclog.Warn("error while loading node data")
return nil, err return nil, err
@@ -972,12 +981,10 @@ func (r *Resolver) Query() generated.QueryResolver { return &queryResolver{r} }
// SubCluster returns generated.SubClusterResolver implementation. // SubCluster returns generated.SubClusterResolver implementation.
func (r *Resolver) SubCluster() generated.SubClusterResolver { return &subClusterResolver{r} } func (r *Resolver) SubCluster() generated.SubClusterResolver { return &subClusterResolver{r} }
type ( type clusterResolver struct{ *Resolver }
clusterResolver struct{ *Resolver } type jobResolver struct{ *Resolver }
jobResolver struct{ *Resolver } type metricValueResolver struct{ *Resolver }
metricValueResolver struct{ *Resolver } type mutationResolver struct{ *Resolver }
mutationResolver struct{ *Resolver } type nodeResolver struct{ *Resolver }
nodeResolver struct{ *Resolver } type queryResolver struct{ *Resolver }
queryResolver struct{ *Resolver } type subClusterResolver struct{ *Resolver }
subClusterResolver struct{ *Resolver }
)

View File

@@ -13,9 +13,9 @@ import (
"github.com/99designs/gqlgen/graphql" "github.com/99designs/gqlgen/graphql"
"github.com/ClusterCockpit/cc-backend/internal/graph/model" "github.com/ClusterCockpit/cc-backend/internal/graph/model"
"github.com/ClusterCockpit/cc-backend/internal/metricDataDispatcher" "github.com/ClusterCockpit/cc-backend/internal/metricdispatch"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
const MAX_JOBS_FOR_ANALYSIS = 500 const MAX_JOBS_FOR_ANALYSIS = 500
@@ -55,7 +55,7 @@ func (r *queryResolver) rooflineHeatmap(
// resolution = max(resolution, mc.Timestep) // resolution = max(resolution, mc.Timestep)
// } // }
jobdata, err := metricDataDispatcher.LoadData(job, []string{"flops_any", "mem_bw"}, []schema.MetricScope{schema.MetricScopeNode}, ctx, 0) jobdata, err := metricdispatch.LoadData(job, []string{"flops_any", "mem_bw"}, []schema.MetricScope{schema.MetricScopeNode}, ctx, 0)
if err != nil { if err != nil {
cclog.Errorf("Error while loading roofline metrics for job %d", job.ID) cclog.Errorf("Error while loading roofline metrics for job %d", job.ID)
return nil, err return nil, err
@@ -128,7 +128,7 @@ func (r *queryResolver) jobsFootprints(ctx context.Context, filter []*model.JobF
continue continue
} }
if err := metricDataDispatcher.LoadAverages(job, metrics, avgs, ctx); err != nil { if err := metricdispatch.LoadAverages(job, metrics, avgs, ctx); err != nil {
cclog.Error("Error while loading averages for footprint") cclog.Error("Error while loading averages for footprint")
return nil, err return nil, err
} }

View File

@@ -2,6 +2,7 @@
// All rights reserved. This file is part of cc-backend. // All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package importer package importer
import ( import (
@@ -14,8 +15,8 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
// HandleImportFlag imports jobs from file pairs specified in a comma-separated flag string. // HandleImportFlag imports jobs from file pairs specified in a comma-separated flag string.

View File

@@ -16,8 +16,8 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/importer" "github.com/ClusterCockpit/cc-backend/internal/importer"
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
ccconf "github.com/ClusterCockpit/cc-lib/ccConfig" ccconf "github.com/ClusterCockpit/cc-lib/v2/ccConfig"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
) )
// copyFile copies a file from source path to destination path. // copyFile copies a file from source path to destination path.
@@ -56,36 +56,8 @@ func setup(t *testing.T) *repository.JobRepository {
"archive": { "archive": {
"kind": "file", "kind": "file",
"path": "./var/job-archive" "path": "./var/job-archive"
}, }
"clusters": [ }`
{
"name": "testcluster",
"metricDataRepository": {"kind": "test", "url": "bla:8081"},
"filterRanges": {
"numNodes": { "from": 1, "to": 64 },
"duration": { "from": 0, "to": 86400 },
"startTime": { "from": "2022-01-01T00:00:00Z", "to": null }
}
},
{
"name": "fritz",
"metricDataRepository": {"kind": "test", "url": "bla:8081"},
"filterRanges": {
"numNodes": { "from": 1, "to": 944 },
"duration": { "from": 0, "to": 86400 },
"startTime": { "from": "2022-01-01T00:00:00Z", "to": null }
}
},
{
"name": "taurus",
"metricDataRepository": {"kind": "test", "url": "bla:8081"},
"filterRanges": {
"numNodes": { "from": 1, "to": 4000 },
"duration": { "from": 0, "to": 604800 },
"startTime": { "from": "2010-01-01T00:00:00Z", "to": null }
}
}
]}`
cclog.Init("info", true) cclog.Init("info", true)
tmpdir := t.TempDir() tmpdir := t.TempDir()
@@ -107,7 +79,7 @@ func setup(t *testing.T) *repository.JobRepository {
} }
dbfilepath := filepath.Join(tmpdir, "test.db") dbfilepath := filepath.Join(tmpdir, "test.db")
err := repository.MigrateDB("sqlite3", dbfilepath) err := repository.MigrateDB(dbfilepath)
if err != nil { if err != nil {
t.Fatal(err) t.Fatal(err)
} }
@@ -121,11 +93,7 @@ func setup(t *testing.T) *repository.JobRepository {
// Load and check main configuration // Load and check main configuration
if cfg := ccconf.GetPackageConfig("main"); cfg != nil { if cfg := ccconf.GetPackageConfig("main"); cfg != nil {
if clustercfg := ccconf.GetPackageConfig("clusters"); clustercfg != nil { config.Init(cfg)
config.Init(cfg, clustercfg)
} else {
t.Fatal("Cluster configuration must be present")
}
} else { } else {
t.Fatal("Main configuration must be present") t.Fatal("Main configuration must be present")
} }

View File

@@ -22,8 +22,8 @@ import (
"github.com/ClusterCockpit/cc-backend/internal/repository" "github.com/ClusterCockpit/cc-backend/internal/repository"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
const ( const (

View File

@@ -2,12 +2,13 @@
// All rights reserved. This file is part of cc-backend. // All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package importer package importer
import ( import (
"math" "math"
ccunits "github.com/ClusterCockpit/cc-lib/ccUnits" ccunits "github.com/ClusterCockpit/cc-lib/v2/ccUnits"
) )
// getNormalizationFactor calculates the scaling factor needed to normalize a value // getNormalizationFactor calculates the scaling factor needed to normalize a value

View File

@@ -8,7 +8,7 @@ import (
"fmt" "fmt"
"testing" "testing"
ccunits "github.com/ClusterCockpit/cc-lib/ccUnits" ccunits "github.com/ClusterCockpit/cc-lib/v2/ccUnits"
) )
// TestNormalizeFactor tests the normalization of large byte values to gigabyte prefix. // TestNormalizeFactor tests the normalization of large byte values to gigabyte prefix.

View File

@@ -1,95 +0,0 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package memorystore
const configSchema = `{
"type": "object",
"description": "Configuration specific to built-in metric-store.",
"properties": {
"checkpoints": {
"description": "Configuration for checkpointing the metrics within metric-store",
"type": "object",
"properties": {
"file-format": {
"description": "Specify the type of checkpoint file. There are 2 variants: 'avro' and 'json'. If nothing is specified, 'avro' is default.",
"type": "string"
},
"interval": {
"description": "Interval at which the metrics should be checkpointed.",
"type": "string"
},
"directory": {
"description": "Specify the parent directory in which the checkpointed files should be placed.",
"type": "string"
},
"restore": {
"description": "When cc-backend starts up, look for checkpointed files that are less than X hours old and load metrics from these selected checkpoint files.",
"type": "string"
}
}
},
"archive": {
"description": "Configuration for archiving the already checkpointed files.",
"type": "object",
"properties": {
"interval": {
"description": "Interval at which the checkpointed files should be archived.",
"type": "string"
},
"directory": {
"description": "Specify the parent directory in which the archived files should be placed.",
"type": "string"
}
}
},
"retention-in-memory": {
"description": "Keep the metrics within memory for given time interval. Retention for X hours, then the metrics would be freed.",
"type": "string"
},
"nats": {
"description": "Configuration for accepting published data through NATS.",
"type": "array",
"items": {
"type": "object",
"properties": {
"address": {
"description": "Address of the NATS server.",
"type": "string"
},
"username": {
"description": "Optional: If configured with username/password method.",
"type": "string"
},
"password": {
"description": "Optional: If configured with username/password method.",
"type": "string"
},
"creds-file-path": {
"description": "Optional: If configured with Credential File method. Path to your NATS cred file.",
"type": "string"
},
"subscriptions": {
"description": "Array of subscriptions. Allows subscribing to different subjects and publishers.",
"type": "array",
"items": {
"type": "object",
"properties": {
"subscribe-to": {
"description": "Channel name",
"type": "string"
},
"cluster-tag": {
"description": "Optional: Allow lines without a cluster tag, use this as default",
"type": "string"
}
}
}
}
}
}
}
}
}`
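
For orientation, the removed schema above describes a configuration object that could be decoded roughly as follows. This is a minimal sketch, not the code from the repository: the struct names, the `parseMetricStoreConfig` helper, and every value in `sampleConfig` are assumptions made for illustration. Only the JSON keys and the documented `avro` default are taken from the schema text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical struct names; only the JSON keys come from the schema.
type checkpointsConfig struct {
	FileFormat string `json:"file-format"`
	Interval   string `json:"interval"`
	Directory  string `json:"directory"`
	Restore    string `json:"restore"`
}

type archiveConfig struct {
	Interval  string `json:"interval"`
	Directory string `json:"directory"`
}

type subscription struct {
	SubscribeTo string `json:"subscribe-to"`
	ClusterTag  string `json:"cluster-tag"`
}

type natsConfig struct {
	Address       string         `json:"address"`
	Username      string         `json:"username"`
	Password      string         `json:"password"`
	CredsFilePath string         `json:"creds-file-path"`
	Subscriptions []subscription `json:"subscriptions"`
}

type metricStoreConfig struct {
	Checkpoints       checkpointsConfig `json:"checkpoints"`
	Archive           archiveConfig     `json:"archive"`
	RetentionInMemory string            `json:"retention-in-memory"`
	Nats              []natsConfig      `json:"nats"`
}

// parseMetricStoreConfig (hypothetical helper) decodes the raw JSON and
// applies the default file format that the schema description mentions.
func parseMetricStoreConfig(raw []byte) (metricStoreConfig, error) {
	var cfg metricStoreConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, err
	}
	if cfg.Checkpoints.FileFormat == "" {
		cfg.Checkpoints.FileFormat = "avro"
	}
	return cfg, nil
}

// All values below are invented for illustration only.
const sampleConfig = `{
  "checkpoints": {"interval": "12h", "directory": "./var/checkpoints", "restore": "48h"},
  "archive": {"interval": "48h", "directory": "./var/archive"},
  "retention-in-memory": "48h",
  "nats": [{
    "address": "nats://localhost:4222",
    "subscriptions": [{"subscribe-to": "hpc-nats", "cluster-tag": "fritz"}]
  }]
}`

func main() {
	cfg, err := parseMetricStoreConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

The schema types every duration-like field as a plain string, so the sketch leaves parsing of values such as the interval to the caller.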


@@ -1,381 +0,0 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package metricDataDispatcher
import (
"context"
"fmt"
"math"
"time"
"github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/metricdata"
"github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger"
"github.com/ClusterCockpit/cc-lib/lrucache"
"github.com/ClusterCockpit/cc-lib/resampler"
"github.com/ClusterCockpit/cc-lib/schema"
)
var cache *lrucache.Cache = lrucache.New(128 * 1024 * 1024)
func cacheKey(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
resolution int,
) string {
// Duration and StartTime do not need to be in the cache key as StartTime is less unique than
// job.ID and the TTL of the cache entry makes sure it does not stay there forever.
return fmt.Sprintf("%d(%s):[%v],[%v]-%d",
job.ID, job.State, metrics, scopes, resolution)
}
// Fetches the metric data for a job.
func LoadData(job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
resolution int,
) (schema.JobData, error) {
data := cache.Get(cacheKey(job, metrics, scopes, resolution), func() (_ any, ttl time.Duration, size int) {
var jd schema.JobData
var err error
if job.State == schema.JobStateRunning ||
job.MonitoringStatus == schema.MonitoringStatusRunningOrArchiving ||
config.Keys.DisableArchive {
repo, err := metricdata.GetMetricDataRepo(job.Cluster)
if err != nil {
return fmt.Errorf("METRICDATA/METRICDATA > no metric data repository configured for '%s'", job.Cluster), 0, 0
}
if scopes == nil {
scopes = append(scopes, schema.MetricScopeNode)
}
if metrics == nil {
cluster := archive.GetCluster(job.Cluster)
for _, mc := range cluster.MetricConfig {
metrics = append(metrics, mc.Name)
}
}
jd, err = repo.LoadData(job, metrics, scopes, ctx, resolution)
if err != nil {
if len(jd) != 0 {
cclog.Warnf("partial error: %s", err.Error())
// return err, 0, 0 // Reactivating will block archiving on one partial error
} else {
cclog.Error("Error while loading job data from metric repository")
return err, 0, 0
}
}
size = jd.Size()
} else {
var jd_temp schema.JobData
jd_temp, err = archive.GetHandle().LoadJobData(job)
if err != nil {
cclog.Error("Error while loading job data from archive")
return err, 0, 0
}
// Deep copy the cached archive hashmap
jd = metricdata.DeepCopy(jd_temp)
// Resampling for archived data.
// Pass the resolution from frontend here.
for _, v := range jd {
for _, v_ := range v {
timestep := int64(0)
for i := 0; i < len(v_.Series); i += 1 {
v_.Series[i].Data, timestep, err = resampler.LargestTriangleThreeBucket(v_.Series[i].Data, int64(v_.Timestep), int64(resolution))
if err != nil {
return err, 0, 0
}
}
v_.Timestep = int(timestep)
}
}
// Avoid sending unrequested data to the client:
if metrics != nil || scopes != nil {
if metrics == nil {
metrics = make([]string, 0, len(jd))
for k := range jd {
metrics = append(metrics, k)
}
}
res := schema.JobData{}
for _, metric := range metrics {
if perscope, ok := jd[metric]; ok {
if len(perscope) > 1 {
subset := make(map[schema.MetricScope]*schema.JobMetric)
for _, scope := range scopes {
if jm, ok := perscope[scope]; ok {
subset[scope] = jm
}
}
if len(subset) > 0 {
perscope = subset
}
}
res[metric] = perscope
}
}
jd = res
}
size = jd.Size()
}
ttl = 5 * time.Hour
if job.State == schema.JobStateRunning {
ttl = 2 * time.Minute
}
// FIXME: Review: Is this really necessary or correct.
// Note: Lines 147-170 formerly known as prepareJobData(jobData, scopes)
// For /monitoring/job/<job> and some other places, flops_any and mem_bw need
// to be available at the scope 'node'. If a job has a lot of nodes,
// statisticsSeries should be available so that a min/median/max Graph can be
// used instead of a lot of single lines.
// NOTE: New StatsSeries will always be calculated as 'min/median/max'
// Existing (archived) StatsSeries can be 'min/mean/max'!
const maxSeriesSize int = 15
for _, scopes := range jd {
for _, jm := range scopes {
if jm.StatisticsSeries != nil || len(jm.Series) <= maxSeriesSize {
continue
}
jm.AddStatisticsSeries()
}
}
nodeScopeRequested := false
for _, scope := range scopes {
if scope == schema.MetricScopeNode {
nodeScopeRequested = true
}
}
if nodeScopeRequested {
jd.AddNodeScope("flops_any")
jd.AddNodeScope("mem_bw")
}
// Round Resulting Stat Values
jd.RoundMetricStats()
return jd, ttl, size
})
if err, ok := data.(error); ok {
cclog.Error("Error in returned dataset")
return nil, err
}
return data.(schema.JobData), nil
}
// Used for the jobsFootprint GraphQL-Query. TODO: Rename/Generalize.
func LoadAverages(
job *schema.Job,
metrics []string,
data [][]schema.Float,
ctx context.Context,
) error {
if job.State != schema.JobStateRunning && !config.Keys.DisableArchive {
return archive.LoadAveragesFromArchive(job, metrics, data) // #166 change also here?
}
repo, err := metricdata.GetMetricDataRepo(job.Cluster)
if err != nil {
return fmt.Errorf("METRICDATA/METRICDATA > no metric data repository configured for '%s'", job.Cluster)
}
stats, err := repo.LoadStats(job, metrics, ctx) // #166 how to handle stats for acc normalization?
if err != nil {
cclog.Errorf("Error while loading statistics for job %v (User %v, Project %v)", job.JobID, job.User, job.Project)
return err
}
for i, m := range metrics {
nodes, ok := stats[m]
if !ok {
data[i] = append(data[i], schema.NaN)
continue
}
sum := 0.0
for _, node := range nodes {
sum += node.Avg
}
data[i] = append(data[i], schema.Float(sum))
}
return nil
}
// Used for statsTable in frontend: Return scoped statistics by metric.
func LoadScopedJobStats(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
) (schema.ScopedJobStats, error) {
if job.State != schema.JobStateRunning && !config.Keys.DisableArchive {
return archive.LoadScopedStatsFromArchive(job, metrics, scopes)
}
repo, err := metricdata.GetMetricDataRepo(job.Cluster)
if err != nil {
return nil, fmt.Errorf("job %d: no metric data repository configured for '%s'", job.JobID, job.Cluster)
}
scopedStats, err := repo.LoadScopedStats(job, metrics, scopes, ctx)
if err != nil {
cclog.Errorf("error while loading scoped statistics for job %d (User %s, Project %s)", job.JobID, job.User, job.Project)
return nil, err
}
return scopedStats, nil
}
// Used for polar plots in frontend: Aggregates statistics for all nodes to single values for job per metric.
func LoadJobStats(
job *schema.Job,
metrics []string,
ctx context.Context,
) (map[string]schema.MetricStatistics, error) {
if job.State != schema.JobStateRunning && !config.Keys.DisableArchive {
return archive.LoadStatsFromArchive(job, metrics)
}
data := make(map[string]schema.MetricStatistics, len(metrics))
repo, err := metricdata.GetMetricDataRepo(job.Cluster)
if err != nil {
return data, fmt.Errorf("job %d: no metric data repository configured for '%s'", job.JobID, job.Cluster)
}
stats, err := repo.LoadStats(job, metrics, ctx)
if err != nil {
cclog.Errorf("error while loading statistics for job %d (User %s, Project %s)", job.JobID, job.User, job.Project)
return data, err
}
for _, m := range metrics {
sum, avg, min, max := 0.0, 0.0, 0.0, 0.0
nodes, ok := stats[m]
if !ok {
data[m] = schema.MetricStatistics{Min: min, Avg: avg, Max: max}
continue
}
for _, node := range nodes {
sum += node.Avg
min = math.Min(min, node.Min)
max = math.Max(max, node.Max)
}
data[m] = schema.MetricStatistics{
Avg: (math.Round((sum/float64(job.NumNodes))*100) / 100),
Min: (math.Round(min*100) / 100),
Max: (math.Round(max*100) / 100),
}
}
return data, nil
}
// Used for the classic node/system view. Returns a map of nodes to a map of metrics.
func LoadNodeData(
cluster string,
metrics, nodes []string,
scopes []schema.MetricScope,
from, to time.Time,
ctx context.Context,
) (map[string]map[string][]*schema.JobMetric, error) {
repo, err := metricdata.GetMetricDataRepo(cluster)
if err != nil {
return nil, fmt.Errorf("METRICDATA/METRICDATA > no metric data repository configured for '%s'", cluster)
}
if metrics == nil {
for _, m := range archive.GetCluster(cluster).MetricConfig {
metrics = append(metrics, m.Name)
}
}
data, err := repo.LoadNodeData(cluster, metrics, nodes, scopes, from, to, ctx)
if err != nil {
if len(data) != 0 {
cclog.Warnf("partial error: %s", err.Error())
} else {
cclog.Error("Error while loading node data from metric repository")
return nil, err
}
}
if data == nil {
return nil, fmt.Errorf("METRICDATA/METRICDATA > the metric data repository for '%s' does not support this query", cluster)
}
return data, nil
}
func LoadNodeListData(
cluster, subCluster string,
nodes []string,
metrics []string,
scopes []schema.MetricScope,
resolution int,
from, to time.Time,
ctx context.Context,
) (map[string]schema.JobData, error) {
repo, err := metricdata.GetMetricDataRepo(cluster)
if err != nil {
return nil, fmt.Errorf("METRICDATA/METRICDATA > no metric data repository configured for '%s'", cluster)
}
if metrics == nil {
for _, m := range archive.GetCluster(cluster).MetricConfig {
metrics = append(metrics, m.Name)
}
}
data, err := repo.LoadNodeListData(cluster, subCluster, nodes, metrics, scopes, resolution, from, to, ctx)
if err != nil {
if len(data) != 0 {
cclog.Warnf("partial error: %s", err.Error())
} else {
cclog.Error("Error while loading node data from metric repository")
return nil, err
}
}
// NOTE: New StatsSeries will always be calculated as 'min/median/max'
const maxSeriesSize int = 8
for _, jd := range data {
for _, scopes := range jd {
for _, jm := range scopes {
if jm.StatisticsSeries != nil || len(jm.Series) < maxSeriesSize {
continue
}
jm.AddStatisticsSeries()
}
}
}
if data == nil {
return nil, fmt.Errorf("METRICDATA/METRICDATA > the metric data repository for '%s' does not support this query", cluster)
}
return data, nil
}
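
The per-node aggregation performed by the removed `LoadJobStats` can be sketched in isolation as shown below. This is a hedged illustration with hypothetical names (`stats`, `aggregate`, `round2`), not the repository code. One deliberate difference, stated as an assumption about the intended behavior: the listing starts `min` and `max` at `0.0`, so `math.Min` can never report a positive minimum, whereas the sketch seeds both bounds with infinities so the first node always sets them.

```go
package main

import (
	"fmt"
	"math"
)

// stats is a hypothetical stand-in for schema.MetricStatistics.
type stats struct {
	Min, Avg, Max float64
}

// round2 rounds to two decimals, as the removed code does inline.
func round2(v float64) float64 {
	return math.Round(v*100) / 100
}

// aggregate folds per-node statistics into one job-level value.
// The average is the sum of node averages divided by numNodes.
func aggregate(nodes map[string]stats, numNodes int) stats {
	if len(nodes) == 0 || numNodes <= 0 {
		return stats{}
	}
	sum := 0.0
	lo, hi := math.Inf(1), math.Inf(-1)
	for _, n := range nodes {
		sum += n.Avg
		lo = math.Min(lo, n.Min)
		hi = math.Max(hi, n.Max)
	}
	return stats{
		Min: round2(lo),
		Avg: round2(sum / float64(numNodes)),
		Max: round2(hi),
	}
}

func main() {
	// Node names and values are invented for illustration.
	nodes := map[string]stats{
		"node01": {Min: 0.5, Avg: 1, Max: 2},
		"node02": {Min: 1, Avg: 3, Max: 4},
	}
	fmt.Printf("%+v\n", aggregate(nodes, 2))
}
```

Dividing by the job's node count rather than by the number of nodes that returned data mirrors the listing; whether that is the desired behavior for partially missing data is left open here.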

File diff suppressed because it is too large


@@ -1,88 +0,0 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package metricdata
import (
"context"
"encoding/json"
"fmt"
"time"
"github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/memorystore"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema"
)
type MetricDataRepository interface {
// Initialize this MetricDataRepository. One instance of
// this interface will only ever be responsible for one cluster.
Init(rawConfig json.RawMessage) error
// Return the JobData for the given job, only with the requested metrics.
LoadData(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context, resolution int) (schema.JobData, error)
// Return a map of metrics to a map of nodes to the metric statistics of the job. node scope only.
LoadStats(job *schema.Job, metrics []string, ctx context.Context) (map[string]map[string]schema.MetricStatistics, error)
// Return a map of metrics to a map of scopes to the scoped metric statistics of the job.
LoadScopedStats(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context) (schema.ScopedJobStats, error)
// Return a map of hosts to a map of metrics at the requested scopes (currently only node) for that node.
LoadNodeData(cluster string, metrics, nodes []string, scopes []schema.MetricScope, from, to time.Time, ctx context.Context) (map[string]map[string][]*schema.JobMetric, error)
// Return a map of hosts to a map of metrics to a map of scopes for multiple nodes.
LoadNodeListData(cluster, subCluster string, nodes, metrics []string, scopes []schema.MetricScope, resolution int, from, to time.Time, ctx context.Context) (map[string]schema.JobData, error)
}
var metricDataRepos map[string]MetricDataRepository = map[string]MetricDataRepository{}
func Init() error {
for _, cluster := range config.Clusters {
if cluster.MetricDataRepository != nil {
var kind struct {
Kind string `json:"kind"`
}
if err := json.Unmarshal(cluster.MetricDataRepository, &kind); err != nil {
cclog.Warn("Error while unmarshaling raw json MetricDataRepository")
return err
}
var mdr MetricDataRepository
switch kind.Kind {
case "cc-metric-store":
mdr = &CCMetricStore{}
case "cc-metric-store-internal":
mdr = &CCMetricStoreInternal{}
memorystore.InternalCCMSFlag = true
case "prometheus":
mdr = &PrometheusDataRepository{}
case "test":
mdr = &TestMetricDataRepository{}
default:
return fmt.Errorf("METRICDATA/METRICDATA > Unknown MetricDataRepository %v for cluster %v", kind.Kind, cluster.Name)
}
if err := mdr.Init(cluster.MetricDataRepository); err != nil {
cclog.Errorf("Error initializing MetricDataRepository %v for cluster %v", kind.Kind, cluster.Name)
return err
}
metricDataRepos[cluster.Name] = mdr
}
}
return nil
}
func GetMetricDataRepo(cluster string) (MetricDataRepository, error) {
var err error
repo, ok := metricDataRepos[cluster]
if !ok {
err = fmt.Errorf("METRICDATA/METRICDATA > no metric data repository configured for '%s'", cluster)
}
return repo, err
}
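
The removed `Init` function picks a backend by peeking at a `kind` field in each cluster's raw JSON and then hands the same raw message to the chosen implementation. A minimal, self-contained sketch of that dispatch pattern follows. The `repository` interface, the `storeRepo` and `testRepo` types, and the kind names `store` and `test` used here are simplified assumptions, not the real `MetricDataRepository` implementations.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repository is a reduced stand-in for the much larger interface above.
type repository interface {
	Init(raw json.RawMessage) error
	Name() string
}

// storeRepo is a hypothetical backend that reads a URL from its config.
type storeRepo struct {
	URL string `json:"url"`
}

func (r *storeRepo) Init(raw json.RawMessage) error { return json.Unmarshal(raw, r) }
func (r *storeRepo) Name() string                   { return "store(" + r.URL + ")" }

// testRepo is a hypothetical backend without configuration.
type testRepo struct{}

func (r *testRepo) Init(_ json.RawMessage) error { return nil }
func (r *testRepo) Name() string                 { return "test" }

type clusterConfig struct {
	Name string
	Repo json.RawMessage
}

// initRepos builds one repository per cluster, keyed by cluster name.
// Clusters without a repository section are skipped.
func initRepos(clusters []clusterConfig) (map[string]repository, error) {
	repos := make(map[string]repository)
	for _, c := range clusters {
		if c.Repo == nil {
			continue
		}
		var kind struct {
			Kind string `json:"kind"`
		}
		if err := json.Unmarshal(c.Repo, &kind); err != nil {
			return nil, err
		}
		var r repository
		switch kind.Kind {
		case "store":
			r = &storeRepo{}
		case "test":
			r = &testRepo{}
		default:
			return nil, fmt.Errorf("unknown repository kind %q for cluster %q", kind.Kind, c.Name)
		}
		if err := r.Init(c.Repo); err != nil {
			return nil, err
		}
		repos[c.Name] = r
	}
	return repos, nil
}

// sampleClusters returns invented example data.
func sampleClusters() []clusterConfig {
	return []clusterConfig{
		{Name: "alpha", Repo: []byte(`{"kind":"store","url":"http://localhost:8080"}`)},
		{Name: "beta"},
		{Name: "gamma", Repo: []byte(`{"kind":"test"}`)},
	}
}

func main() {
	repos, err := initRepos(sampleClusters())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, r := range repos {
		fmt.Println(name, "->", r.Name())
	}
}
```

Decoding the raw message twice, once for `kind` and once for the backend-specific fields, keeps the registry unaware of each backend's configuration layout.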


@@ -1,587 +0,0 @@
// Copyright (C) 2022 DKRZ
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package metricdata
import (
"bytes"
"context"
"encoding/json"
"errors"
"fmt"
"math"
"net/http"
"os"
"regexp"
"sort"
"strings"
"sync"
"text/template"
"time"
"github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema"
promapi "github.com/prometheus/client_golang/api"
promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
promcfg "github.com/prometheus/common/config"
promm "github.com/prometheus/common/model"
)
type PrometheusDataRepositoryConfig struct {
Url string `json:"url"`
Username string `json:"username,omitempty"`
Suffix string `json:"suffix,omitempty"`
Templates map[string]string `json:"query-templates"`
}
type PrometheusDataRepository struct {
client promapi.Client
queryClient promv1.API
suffix string
templates map[string]*template.Template
}
type PromQLArgs struct {
Nodes string
}
type Trie map[rune]Trie
var logOnce sync.Once
func contains(s []schema.MetricScope, str schema.MetricScope) bool {
for _, v := range s {
if v == str {
return true
}
}
return false
}
func MinMaxMean(data []schema.Float) (float64, float64, float64) {
if len(data) == 0 {
return 0.0, 0.0, 0.0
}
min := math.MaxFloat64
max := -math.MaxFloat64
var sum float64
var n float64
for _, val := range data {
if val.IsNaN() {
continue
}
sum += float64(val)
n += 1
if float64(val) > max {
max = float64(val)
}
if float64(val) < min {
min = float64(val)
}
}
return min, max, sum / n
}
// Rewritten from
// https://github.com/ermanh/trieregex/blob/master/trieregex/trieregex.py
func nodeRegex(nodes []string) string {
root := Trie{}
// add runes of each compute node to trie
for _, node := range nodes {
_trie := root
for _, c := range node {
if _, ok := _trie[c]; !ok {
_trie[c] = Trie{}
}
_trie = _trie[c]
}
_trie['*'] = Trie{}
}
// recursively build regex from rune trie
var trieRegex func(trie Trie, reset bool) string
trieRegex = func(trie Trie, reset bool) string {
if reset {
trie = root
}
if len(trie) == 0 {
return ""
}
if len(trie) == 1 {
for key, _trie := range trie {
if key == '*' {
return ""
}
return regexp.QuoteMeta(string(key)) + trieRegex(_trie, false)
}
} else {
sequences := []string{}
for key, _trie := range trie {
if key != '*' {
sequences = append(sequences, regexp.QuoteMeta(string(key))+trieRegex(_trie, false))
}
}
sort.Slice(sequences, func(i, j int) bool {
return (-len(sequences[i]) < -len(sequences[j])) || (sequences[i] < sequences[j])
})
var result string
// single edge from this tree node
if len(sequences) == 1 {
result = sequences[0]
if len(result) > 1 {
result = "(?:" + result + ")"
}
// multiple edges, each length 1
} else if s := strings.Join(sequences, ""); len(s) == len(sequences) {
// char or numeric range
if len(s)-1 == int(s[len(s)-1])-int(s[0]) {
result = fmt.Sprintf("[%c-%c]", s[0], s[len(s)-1])
// char or numeric set
} else {
result = "[" + s + "]"
}
// multiple edges of different lengths
} else {
result = "(?:" + strings.Join(sequences, "|") + ")"
}
if _, ok := trie['*']; ok {
result += "?"
}
return result
}
return ""
}
return trieRegex(root, true)
}
func (pdb *PrometheusDataRepository) Init(rawConfig json.RawMessage) error {
var config PrometheusDataRepositoryConfig
// parse config
if err := json.Unmarshal(rawConfig, &config); err != nil {
cclog.Warn("Error while unmarshaling raw json config")
return err
}
// support basic authentication
var rt http.RoundTripper = nil
if prom_pw := os.Getenv("PROMETHEUS_PASSWORD"); prom_pw != "" && config.Username != "" {
prom_pw := promcfg.Secret(prom_pw)
rt = promcfg.NewBasicAuthRoundTripper(promcfg.NewInlineSecret(config.Username), promcfg.NewInlineSecret(string(prom_pw)), promapi.DefaultRoundTripper)
} else {
if config.Username != "" {
return errors.New("METRICDATA/PROMETHEUS > Prometheus username provided, but PROMETHEUS_PASSWORD not set")
}
}
// init client
client, err := promapi.NewClient(promapi.Config{
Address: config.Url,
RoundTripper: rt,
})
if err != nil {
cclog.Error("Error while initializing new prometheus client")
return err
}
// init query client
pdb.client = client
pdb.queryClient = promv1.NewAPI(pdb.client)
// site config
pdb.suffix = config.Suffix
// init query templates
pdb.templates = make(map[string]*template.Template)
for metric, templ := range config.Templates {
pdb.templates[metric], err = template.New(metric).Parse(templ)
if err == nil {
cclog.Debugf("Added PromQL template for %s: %s", metric, templ)
} else {
cclog.Warnf("Failed to parse PromQL template %s for metric %s", templ, metric)
}
}
return nil
}
// TODO: respect scope argument
func (pdb *PrometheusDataRepository) FormatQuery(
metric string,
scope schema.MetricScope,
nodes []string,
cluster string,
) (string, error) {
args := PromQLArgs{}
if len(nodes) > 0 {
args.Nodes = fmt.Sprintf("(%s)%s", nodeRegex(nodes), pdb.suffix)
} else {
args.Nodes = fmt.Sprintf(".*%s", pdb.suffix)
}
buf := &bytes.Buffer{}
if templ, ok := pdb.templates[metric]; ok {
err := templ.Execute(buf, args)
if err != nil {
return "", fmt.Errorf("METRICDATA/PROMETHEUS > Error compiling template %v", templ)
} else {
query := buf.String()
cclog.Debugf("PromQL: %s", query)
return query, nil
}
} else {
return "", fmt.Errorf("METRICDATA/PROMETHEUS > No PromQL for metric %s configured.", metric)
}
}
// Convert PromAPI row to CC schema.Series
func (pdb *PrometheusDataRepository) RowToSeries(
from time.Time,
step int64,
steps int64,
row *promm.SampleStream,
) schema.Series {
ts := from.Unix()
hostname := strings.TrimSuffix(string(row.Metric["exported_instance"]), pdb.suffix)
// init array of expected length with NaN
values := make([]schema.Float, steps+1)
for i := range values {
values[i] = schema.NaN
}
// copy recorded values from prom sample pair
for _, v := range row.Values {
idx := (v.Timestamp.Unix() - ts) / step
values[idx] = schema.Float(v.Value)
}
min, max, mean := MinMaxMean(values)
// output struct
return schema.Series{
Hostname: hostname,
Data: values,
Statistics: schema.MetricStatistics{
Avg: mean,
Min: min,
Max: max,
},
}
}
func (pdb *PrometheusDataRepository) LoadData(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
resolution int,
) (schema.JobData, error) {
// TODO respect requested scope
if len(scopes) == 0 || !contains(scopes, schema.MetricScopeNode) {
scopes = append(scopes, schema.MetricScopeNode)
}
jobData := make(schema.JobData)
// parse job specs
nodes := make([]string, len(job.Resources))
for i, resource := range job.Resources {
nodes[i] = resource.Hostname
}
from := time.Unix(job.StartTime, 0)
to := time.Unix(job.StartTime+int64(job.Duration), 0)
for _, scope := range scopes {
if scope != schema.MetricScopeNode {
logOnce.Do(func() {
cclog.Infof("Scope '%s' requested, but not yet supported: Will return 'node' scope only.", scope)
})
continue
}
for _, metric := range metrics {
metricConfig := archive.GetMetricConfig(job.Cluster, metric)
if metricConfig == nil {
cclog.Warnf("Error in LoadData: Metric %s for cluster %s not configured", metric, job.Cluster)
return nil, errors.New("Prometheus config error")
}
query, err := pdb.FormatQuery(metric, scope, nodes, job.Cluster)
if err != nil {
cclog.Warn("Error while formatting prometheus query")
return nil, err
}
// ranged query over all job nodes
r := promv1.Range{
Start: from,
End: to,
Step: time.Duration(metricConfig.Timestep * 1e9),
}
result, warnings, err := pdb.queryClient.QueryRange(ctx, query, r)
if err != nil {
cclog.Errorf("Prometheus query error in LoadData: %v\nQuery: %s", err, query)
return nil, errors.New("Prometheus query error")
}
if len(warnings) > 0 {
cclog.Warnf("Warnings: %v\n", warnings)
}
// init data structures
if _, ok := jobData[metric]; !ok {
jobData[metric] = make(map[schema.MetricScope]*schema.JobMetric)
}
jobMetric, ok := jobData[metric][scope]
if !ok {
jobMetric = &schema.JobMetric{
Unit: metricConfig.Unit,
Timestep: metricConfig.Timestep,
Series: make([]schema.Series, 0),
}
}
step := int64(metricConfig.Timestep)
steps := int64(to.Sub(from).Seconds()) / step
// iter rows of host, metric, values
for _, row := range result.(promm.Matrix) {
jobMetric.Series = append(jobMetric.Series,
pdb.RowToSeries(from, step, steps, row))
}
// only add metric if at least one host returned data
if !ok && len(jobMetric.Series) > 0 {
jobData[metric][scope] = jobMetric
}
// sort by hostname to get uniform coloring
sort.Slice(jobMetric.Series, func(i, j int) bool {
return (jobMetric.Series[i].Hostname < jobMetric.Series[j].Hostname)
})
}
}
return jobData, nil
}
// TODO change implementation to precomputed/cached stats
func (pdb *PrometheusDataRepository) LoadStats(
job *schema.Job,
metrics []string,
ctx context.Context,
) (map[string]map[string]schema.MetricStatistics, error) {
// map of metrics of nodes of stats
stats := map[string]map[string]schema.MetricStatistics{}
data, err := pdb.LoadData(job, metrics, []schema.MetricScope{schema.MetricScopeNode}, ctx, 0 /*resolution here*/)
if err != nil {
cclog.Warn("Error while loading job for stats")
return nil, err
}
for metric, metricData := range data {
stats[metric] = make(map[string]schema.MetricStatistics)
for _, series := range metricData[schema.MetricScopeNode].Series {
stats[metric][series.Hostname] = series.Statistics
}
}
return stats, nil
}
func (pdb *PrometheusDataRepository) LoadNodeData(
cluster string,
metrics, nodes []string,
scopes []schema.MetricScope,
from, to time.Time,
ctx context.Context,
) (map[string]map[string][]*schema.JobMetric, error) {
t0 := time.Now()
// Map of hosts of metrics of value slices
data := make(map[string]map[string][]*schema.JobMetric)
// query db for each metric
// TODO: scopes seems to be always empty
if len(scopes) == 0 || !contains(scopes, schema.MetricScopeNode) {
scopes = append(scopes, schema.MetricScopeNode)
}
for _, scope := range scopes {
if scope != schema.MetricScopeNode {
logOnce.Do(func() {
cclog.Infof("Note: Scope '%s' requested, but not yet supported: Will return 'node' scope only.", scope)
})
continue
}
for _, metric := range metrics {
metricConfig := archive.GetMetricConfig(cluster, metric)
if metricConfig == nil {
cclog.Warnf("Error in LoadNodeData: Metric %s for cluster %s not configured", metric, cluster)
return nil, errors.New("Prometheus config error")
}
query, err := pdb.FormatQuery(metric, scope, nodes, cluster)
if err != nil {
cclog.Warn("Error while formatting prometheus query")
return nil, err
}
// ranged query over all nodes
r := promv1.Range{
Start: from,
End: to,
Step: time.Duration(metricConfig.Timestep * 1e9),
}
result, warnings, err := pdb.queryClient.QueryRange(ctx, query, r)
if err != nil {
cclog.Errorf("Prometheus query error in LoadNodeData: %v\n", err)
return nil, errors.New("Prometheus query error")
}
if len(warnings) > 0 {
cclog.Warnf("Warnings: %v\n", warnings)
}
step := int64(metricConfig.Timestep)
steps := int64(to.Sub(from).Seconds()) / step
// iter rows of host, metric, values
for _, row := range result.(promm.Matrix) {
hostname := strings.TrimSuffix(string(row.Metric["exported_instance"]), pdb.suffix)
hostdata, ok := data[hostname]
if !ok {
hostdata = make(map[string][]*schema.JobMetric)
data[hostname] = hostdata
}
// output per host and metric
hostdata[metric] = append(hostdata[metric], &schema.JobMetric{
Unit: metricConfig.Unit,
Timestep: metricConfig.Timestep,
Series: []schema.Series{pdb.RowToSeries(from, step, steps, row)},
},
)
}
}
}
t1 := time.Since(t0)
cclog.Debugf("LoadNodeData of %v nodes took %s", len(data), t1)
return data, nil
}
// Implemented by NHR@FAU; Used in Job-View StatsTable
func (pdb *PrometheusDataRepository) LoadScopedStats(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
) (schema.ScopedJobStats, error) {
// Assumption: pdb.LoadData() only returns node-scope series - use node scope for statsTable
scopedJobStats := make(schema.ScopedJobStats)
data, err := pdb.LoadData(job, metrics, []schema.MetricScope{schema.MetricScopeNode}, ctx, 0 /*resolution here*/)
if err != nil {
cclog.Warn("Error while loading job for scopedJobStats")
return nil, err
}
for metric, metricData := range data {
for _, scope := range scopes {
if scope != schema.MetricScopeNode {
logOnce.Do(func() {
cclog.Infof("Note: Scope '%s' requested, but not yet supported: Will return 'node' scope only.", scope)
})
continue
}
if _, ok := scopedJobStats[metric]; !ok {
scopedJobStats[metric] = make(map[schema.MetricScope][]*schema.ScopedStats)
}
if _, ok := scopedJobStats[metric][scope]; !ok {
scopedJobStats[metric][scope] = make([]*schema.ScopedStats, 0)
}
for _, series := range metricData[scope].Series {
scopedJobStats[metric][scope] = append(scopedJobStats[metric][scope], &schema.ScopedStats{
Hostname: series.Hostname,
Data: &series.Statistics,
})
}
}
}
return scopedJobStats, nil
}
// Implemented by NHR@FAU; Used in NodeList-View
func (pdb *PrometheusDataRepository) LoadNodeListData(
cluster, subCluster string,
nodes []string,
metrics []string,
scopes []schema.MetricScope,
resolution int,
from, to time.Time,
ctx context.Context,
) (map[string]schema.JobData, error) {
// Assumption: pdb.LoadData() only returns node-scope series - use node scope for NodeList
// Fetch Data, based on pdb.LoadNodeData()
t0 := time.Now()
// Map of hosts of jobData
data := make(map[string]schema.JobData)
// query db for each metric
// TODO: scopes seems to be always empty
if len(scopes) == 0 || !contains(scopes, schema.MetricScopeNode) {
scopes = append(scopes, schema.MetricScopeNode)
}
for _, scope := range scopes {
if scope != schema.MetricScopeNode {
logOnce.Do(func() {
cclog.Infof("Note: Scope '%s' requested, but not yet supported: Will return 'node' scope only.", scope)
})
continue
}
for _, metric := range metrics {
metricConfig := archive.GetMetricConfig(cluster, metric)
if metricConfig == nil {
cclog.Warnf("Error in LoadNodeListData: Metric %s for cluster %s not configured", metric, cluster)
return nil, errors.New("Prometheus config error")
}
query, err := pdb.FormatQuery(metric, scope, nodes, cluster)
if err != nil {
cclog.Warn("Error while formatting prometheus query")
return nil, err
}
// ranged query over all nodes
r := promv1.Range{
Start: from,
End: to,
Step: time.Duration(metricConfig.Timestep * 1e9),
}
result, warnings, err := pdb.queryClient.QueryRange(ctx, query, r)
if err != nil {
cclog.Errorf("Prometheus query error in LoadNodeListData: %v\n", err)
return nil, errors.New("Prometheus query error")
}
if len(warnings) > 0 {
cclog.Warnf("Warnings: %v\n", warnings)
}
step := int64(metricConfig.Timestep)
steps := int64(to.Sub(from).Seconds()) / step
// iter rows of host, metric, values
for _, row := range result.(promm.Matrix) {
hostname := strings.TrimSuffix(string(row.Metric["exported_instance"]), pdb.suffix)
hostdata, ok := data[hostname]
if !ok {
hostdata = make(schema.JobData)
data[hostname] = hostdata
}
metricdata, ok := hostdata[metric]
if !ok {
metricdata = make(map[schema.MetricScope]*schema.JobMetric)
data[hostname][metric] = metricdata
}
// output per host, metric and scope
scopeData, ok := metricdata[scope]
if !ok {
scopeData = &schema.JobMetric{
Unit: metricConfig.Unit,
Timestep: metricConfig.Timestep,
Series: []schema.Series{pdb.RowToSeries(from, step, steps, row)},
}
data[hostname][metric][scope] = scopeData
}
}
}
}
t1 := time.Since(t0)
cclog.Debugf("LoadNodeListData of %v nodes took %s", len(data), t1)
return data, nil
}
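
The removed `RowToSeries` and `MinMaxMean` place sparse samples on a fixed time grid pre-filled with NaN and then compute statistics over the values that are present. The sketch below isolates that idea using plain `float64` values. The names `sample`, `fillGrid`, `minMaxMean`, and `isMissing` are hypothetical, and the bounds checks plus the explicit `ok` result for an all-NaN series are additions assumed here for safety; the listing above does not contain them.

```go
package main

import (
	"fmt"
	"math"
)

// sample is a hypothetical stand-in for one (timestamp, value) pair.
type sample struct {
	Unix  int64
	Value float64
}

// isMissing reports whether a grid slot was never filled.
func isMissing(v float64) bool { return math.IsNaN(v) }

// fillGrid returns steps+1 slots starting at from, spaced step seconds
// apart. Slots without a sample stay NaN; samples outside the grid are
// dropped instead of indexing out of range.
func fillGrid(from, step, steps int64, samples []sample) []float64 {
	if steps < 0 || step <= 0 {
		return nil
	}
	values := make([]float64, steps+1)
	for i := range values {
		values[i] = math.NaN()
	}
	for _, s := range samples {
		if s.Unix < from {
			continue
		}
		idx := (s.Unix - from) / step
		if idx >= int64(len(values)) {
			continue
		}
		values[idx] = s.Value
	}
	return values
}

// minMaxMean skips NaN entries; ok is false when no value is present.
func minMaxMean(data []float64) (lo, hi, mean float64, ok bool) {
	lo, hi = math.Inf(1), math.Inf(-1)
	sum, n := 0.0, 0
	for _, v := range data {
		if math.IsNaN(v) {
			continue
		}
		sum += v
		n++
		lo = math.Min(lo, v)
		hi = math.Max(hi, v)
	}
	if n == 0 {
		return 0, 0, 0, false
	}
	return lo, hi, sum / float64(n), true
}

func main() {
	// Timestamps and values are invented for illustration.
	vals := fillGrid(100, 10, 3, []sample{{100, 1}, {120, 3}, {500, 9}})
	fmt.Println(vals)
	fmt.Println(minMaxMean(vals))
}
```

Returning an explicit `ok` flag avoids the division by zero that an all-NaN series would otherwise cause in the mean.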


@@ -1,118 +0,0 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package metricdata
import (
"context"
"encoding/json"
"time"
"github.com/ClusterCockpit/cc-lib/schema"
)
var TestLoadDataCallback func(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context, resolution int) (schema.JobData, error) = func(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context, resolution int) (schema.JobData, error) {
panic("TODO")
}
// TestMetricDataRepository is only a mock for unit-testing.
type TestMetricDataRepository struct{}
func (tmdr *TestMetricDataRepository) Init(_ json.RawMessage) error {
return nil
}
func (tmdr *TestMetricDataRepository) LoadData(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
resolution int,
) (schema.JobData, error) {
return TestLoadDataCallback(job, metrics, scopes, ctx, resolution)
}
func (tmdr *TestMetricDataRepository) LoadStats(
job *schema.Job,
metrics []string,
ctx context.Context,
) (map[string]map[string]schema.MetricStatistics, error) {
panic("TODO")
}
func (tmdr *TestMetricDataRepository) LoadScopedStats(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
) (schema.ScopedJobStats, error) {
panic("TODO")
}
func (tmdr *TestMetricDataRepository) LoadNodeData(
cluster string,
metrics, nodes []string,
scopes []schema.MetricScope,
from, to time.Time,
ctx context.Context,
) (map[string]map[string][]*schema.JobMetric, error) {
panic("TODO")
}
func (tmdr *TestMetricDataRepository) LoadNodeListData(
cluster, subCluster string,
nodes []string,
metrics []string,
scopes []schema.MetricScope,
resolution int,
from, to time.Time,
ctx context.Context,
) (map[string]schema.JobData, error) {
panic("TODO")
}
func DeepCopy(jdTemp schema.JobData) schema.JobData {
jd := make(schema.JobData, len(jdTemp))
for k, v := range jdTemp {
jd[k] = make(map[schema.MetricScope]*schema.JobMetric, len(jdTemp[k]))
for k_, v_ := range v {
jd[k][k_] = new(schema.JobMetric)
jd[k][k_].Series = make([]schema.Series, len(v_.Series))
for i := 0; i < len(v_.Series); i += 1 {
jd[k][k_].Series[i].Data = make([]schema.Float, len(v_.Series[i].Data))
copy(jd[k][k_].Series[i].Data, v_.Series[i].Data)
jd[k][k_].Series[i].Hostname = v_.Series[i].Hostname
jd[k][k_].Series[i].Id = v_.Series[i].Id
jd[k][k_].Series[i].Statistics.Avg = v_.Series[i].Statistics.Avg
jd[k][k_].Series[i].Statistics.Min = v_.Series[i].Statistics.Min
jd[k][k_].Series[i].Statistics.Max = v_.Series[i].Statistics.Max
}
jd[k][k_].Timestep = v_.Timestep
jd[k][k_].Unit.Base = v_.Unit.Base
jd[k][k_].Unit.Prefix = v_.Unit.Prefix
if v_.StatisticsSeries != nil {
// Init Slices
jd[k][k_].StatisticsSeries = new(schema.StatsSeries)
jd[k][k_].StatisticsSeries.Max = make([]schema.Float, len(v_.StatisticsSeries.Max))
jd[k][k_].StatisticsSeries.Min = make([]schema.Float, len(v_.StatisticsSeries.Min))
jd[k][k_].StatisticsSeries.Median = make([]schema.Float, len(v_.StatisticsSeries.Median))
jd[k][k_].StatisticsSeries.Mean = make([]schema.Float, len(v_.StatisticsSeries.Mean))
// Copy Data
copy(jd[k][k_].StatisticsSeries.Max, v_.StatisticsSeries.Max)
copy(jd[k][k_].StatisticsSeries.Min, v_.StatisticsSeries.Min)
copy(jd[k][k_].StatisticsSeries.Median, v_.StatisticsSeries.Median)
copy(jd[k][k_].StatisticsSeries.Mean, v_.StatisticsSeries.Mean)
// Handle Percentiles
for k__, v__ := range v_.StatisticsSeries.Percentiles {
jd[k][k_].StatisticsSeries.Percentiles[k__] = make([]schema.Float, len(v__))
copy(jd[k][k_].StatisticsSeries.Percentiles[k__], v__)
}
} else {
jd[k][k_].StatisticsSeries = v_.StatisticsSeries
}
}
}
return jd
}


@@ -0,0 +1,490 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
// Package metricdispatch provides a unified interface for loading and caching job metric data.
//
// This package serves as a central dispatcher that routes metric data requests to the appropriate
// backend based on job state. For running jobs, data is fetched from the metric store (e.g., cc-metric-store).
// For completed jobs, data is retrieved from the file-based job archive.
//
// # Key Features
//
// - Automatic backend selection based on job state (running vs. archived)
// - LRU cache for performance optimization (128 MB default cache size)
// - Data resampling using Largest Triangle Three Bucket algorithm for archived data
// - Automatic statistics series generation for jobs with many nodes
// - Support for scoped metrics (node, socket, accelerator, core)
//
// # Cache Behavior
//
// Cached data has different TTL (time-to-live) values depending on job state:
// - Running jobs: 2 minutes (data changes frequently)
// - Completed jobs: 5 hours (data is static)
//
// The cache key is based on job ID, state, requested metrics, scopes, and resolution.
//
// # Usage
//
// The primary entry point is LoadData, which automatically handles both running and archived jobs:
//
// jobData, err := metricdispatch.LoadData(job, metrics, scopes, ctx, resolution)
// if err != nil {
// // Handle error
// }
//
// For statistics only, use LoadJobStats, LoadScopedJobStats, or LoadAverages depending on the required format.
package metricdispatch
import (
"context"
"fmt"
"math"
"time"
"github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/internal/metricstore"
"github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/v2/lrucache"
"github.com/ClusterCockpit/cc-lib/v2/resampler"
"github.com/ClusterCockpit/cc-lib/v2/schema"
)
// cache is an LRU cache with 128 MB capacity for storing loaded job metric data.
// The cache reduces load on both the metric store and archive backends.
var cache *lrucache.Cache = lrucache.New(128 * 1024 * 1024)
// cacheKey generates a unique cache key for a job's metric data based on job ID, state,
// requested metrics, scopes, and resolution. Duration and StartTime are intentionally excluded
// because job.ID already identifies the job and the cache TTL ensures entries don't persist indefinitely.
func cacheKey(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
resolution int,
) string {
return fmt.Sprintf("%d(%s):[%v],[%v]-%d",
job.ID, job.State, metrics, scopes, resolution)
}
// LoadData retrieves metric data for a job from the appropriate backend (memory store for running jobs,
// archive for completed jobs) and applies caching, resampling, and statistics generation as needed.
//
// For running jobs or when archive is disabled, data is fetched from the metric store.
// For completed archived jobs, data is loaded from the job archive and resampled if needed.
//
// Parameters:
// - job: The job for which to load metric data
// - metrics: List of metric names to load (nil loads all metrics for the cluster)
// - scopes: Metric scopes to include (nil defaults to node scope)
// - ctx: Context for cancellation and timeouts
// - resolution: Target number of data points for resampling (only applies to archived data)
//
// Returns the loaded job data and any error encountered. For partial errors (some metrics failed),
// the function returns the successfully loaded data with a warning logged.
func LoadData(job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
resolution int,
) (schema.JobData, error) {
data := cache.Get(cacheKey(job, metrics, scopes, resolution), func() (_ any, ttl time.Duration, size int) {
var jd schema.JobData
var err error
if job.State == schema.JobStateRunning ||
job.MonitoringStatus == schema.MonitoringStatusRunningOrArchiving ||
config.Keys.DisableArchive {
if scopes == nil {
scopes = append(scopes, schema.MetricScopeNode)
}
if metrics == nil {
cluster := archive.GetCluster(job.Cluster)
for _, mc := range cluster.MetricConfig {
metrics = append(metrics, mc.Name)
}
}
jd, err = metricstore.LoadData(job, metrics, scopes, ctx, resolution)
if err != nil {
if len(jd) != 0 {
cclog.Warnf("partial error loading metrics from store for job %d (user: %s, project: %s): %s",
job.JobID, job.User, job.Project, err.Error())
} else {
cclog.Errorf("failed to load job data from metric store for job %d (user: %s, project: %s): %s",
job.JobID, job.User, job.Project, err.Error())
return err, 0, 0
}
}
size = jd.Size()
} else {
var jdTemp schema.JobData
jdTemp, err = archive.GetHandle().LoadJobData(job)
if err != nil {
cclog.Errorf("failed to load job data from archive for job %d (user: %s, project: %s): %s",
job.JobID, job.User, job.Project, err.Error())
return err, 0, 0
}
jd = deepCopy(jdTemp)
// Resample archived data using Largest Triangle Three Bucket algorithm to reduce data points
// to the requested resolution, improving transfer performance and client-side rendering.
for _, v := range jd {
for _, v_ := range v {
timestep := int64(0)
for i := 0; i < len(v_.Series); i += 1 {
v_.Series[i].Data, timestep, err = resampler.LargestTriangleThreeBucket(v_.Series[i].Data, int64(v_.Timestep), int64(resolution))
if err != nil {
return err, 0, 0
}
}
v_.Timestep = int(timestep)
}
}
// Filter job data to only include requested metrics and scopes, avoiding unnecessary data transfer.
if metrics != nil || scopes != nil {
if metrics == nil {
metrics = make([]string, 0, len(jd))
for k := range jd {
metrics = append(metrics, k)
}
}
res := schema.JobData{}
for _, metric := range metrics {
if perscope, ok := jd[metric]; ok {
if len(perscope) > 1 {
subset := make(map[schema.MetricScope]*schema.JobMetric)
for _, scope := range scopes {
if jm, ok := perscope[scope]; ok {
subset[scope] = jm
}
}
if len(subset) > 0 {
perscope = subset
}
}
res[metric] = perscope
}
}
jd = res
}
size = jd.Size()
}
ttl = 5 * time.Hour
if job.State == schema.JobStateRunning {
ttl = 2 * time.Minute
}
// Generate statistics series for jobs with many nodes to enable min/median/max graphs
// instead of overwhelming the UI with individual node lines. Note that newly calculated
// statistics use min/median/max, while archived statistics may use min/mean/max.
const maxSeriesSize int = 15
for _, scopes := range jd {
for _, jm := range scopes {
if jm.StatisticsSeries != nil || len(jm.Series) <= maxSeriesSize {
continue
}
jm.AddStatisticsSeries()
}
}
nodeScopeRequested := false
for _, scope := range scopes {
if scope == schema.MetricScopeNode {
nodeScopeRequested = true
}
}
if nodeScopeRequested {
jd.AddNodeScope("flops_any")
jd.AddNodeScope("mem_bw")
}
// Round Resulting Stat Values
jd.RoundMetricStats()
return jd, ttl, size
})
if err, ok := data.(error); ok {
cclog.Errorf("error in cached dataset for job %d: %s", job.JobID, err.Error())
return nil, err
}
return data.(schema.JobData), nil
}
// LoadAverages appends one value per requested metric to the corresponding entry of the data slice.
// For running jobs, it loads per-node statistics from the metric store and appends the sum of the
// per-node averages (NaN if the metric is missing). For completed jobs, it uses the pre-calculated
// averages from the job archive.
func LoadAverages(
job *schema.Job,
metrics []string,
data [][]schema.Float,
ctx context.Context,
) error {
if job.State != schema.JobStateRunning && !config.Keys.DisableArchive {
return archive.LoadAveragesFromArchive(job, metrics, data) // #166 change also here?
}
stats, err := metricstore.LoadStats(job, metrics, ctx)
if err != nil {
cclog.Errorf("failed to load statistics from metric store for job %d (user: %s, project: %s): %s",
job.JobID, job.User, job.Project, err.Error())
return err
}
for i, m := range metrics {
nodes, ok := stats[m]
if !ok {
data[i] = append(data[i], schema.NaN)
continue
}
sum := 0.0
for _, node := range nodes {
sum += node.Avg
}
data[i] = append(data[i], schema.Float(sum))
}
return nil
}
// LoadScopedJobStats retrieves job statistics organized by metric scope (node, socket, core, accelerator).
// For running jobs, statistics are computed from the metric store. For completed jobs, pre-calculated
// statistics are loaded from the job archive.
func LoadScopedJobStats(
job *schema.Job,
metrics []string,
scopes []schema.MetricScope,
ctx context.Context,
) (schema.ScopedJobStats, error) {
if job.State != schema.JobStateRunning && !config.Keys.DisableArchive {
return archive.LoadScopedStatsFromArchive(job, metrics, scopes)
}
scopedStats, err := metricstore.LoadScopedStats(job, metrics, scopes, ctx)
if err != nil {
cclog.Errorf("failed to load scoped statistics from metric store for job %d (user: %s, project: %s): %s",
job.JobID, job.User, job.Project, err.Error())
return nil, err
}
return scopedStats, nil
}
// LoadJobStats retrieves aggregated statistics (min/avg/max) for each requested metric across all job nodes.
// For running jobs, statistics are computed from the metric store. For completed jobs, pre-calculated
// statistics are loaded from the job archive.
func LoadJobStats(
job *schema.Job,
metrics []string,
ctx context.Context,
) (map[string]schema.MetricStatistics, error) {
if job.State != schema.JobStateRunning && !config.Keys.DisableArchive {
return archive.LoadStatsFromArchive(job, metrics)
}
data := make(map[string]schema.MetricStatistics, len(metrics))
stats, err := metricstore.LoadStats(job, metrics, ctx)
if err != nil {
cclog.Errorf("failed to load statistics from metric store for job %d (user: %s, project: %s): %s",
job.JobID, job.User, job.Project, err.Error())
return data, err
}
for _, m := range metrics {
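nodes, ok := stats[m]
if !ok || len(nodes) == 0 {
data[m] = schema.MetricStatistics{}
continue
}
// Start min/max at +Inf/-Inf so the first node value always replaces them;
// starting at 0.0 would pin Min to 0 for metrics with only positive values.
sum, min, max := 0.0, math.Inf(1), math.Inf(-1)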
for _, node := range nodes {
sum += node.Avg
min = math.Min(min, node.Min)
max = math.Max(max, node.Max)
}
data[m] = schema.MetricStatistics{
Avg: (math.Round((sum/float64(job.NumNodes))*100) / 100),
Min: (math.Round(min*100) / 100),
Max: (math.Round(max*100) / 100),
}
}
return data, nil
}
// LoadNodeData retrieves metric data for specific nodes in a cluster within a time range.
// This is used for node monitoring views and system status pages. Data is always fetched from
// the metric store (not the archive) since it's for current/recent node status monitoring.
//
// Returns a nested map structure: node -> metric -> scoped data.
func LoadNodeData(
cluster string,
metrics, nodes []string,
scopes []schema.MetricScope,
from, to time.Time,
ctx context.Context,
) (map[string]map[string][]*schema.JobMetric, error) {
if metrics == nil {
for _, m := range archive.GetCluster(cluster).MetricConfig {
metrics = append(metrics, m.Name)
}
}
data, err := metricstore.LoadNodeData(cluster, metrics, nodes, scopes, from, to, ctx)
if err != nil {
if len(data) != 0 {
cclog.Warnf("partial error loading node data from metric store for cluster %s: %s", cluster, err.Error())
} else {
cclog.Errorf("failed to load node data from metric store for cluster %s: %s", cluster, err.Error())
return nil, err
}
}
if data == nil {
return nil, fmt.Errorf("metric store for cluster '%s' does not support node data queries", cluster)
}
return data, nil
}
// LoadNodeListData retrieves time-series metric data for multiple nodes within a time range,
// with optional resampling and automatic statistics generation for large datasets.
// This is used for comparing multiple nodes or displaying node status over time.
//
// Returns a map of node names to their job-like metric data structures.
func LoadNodeListData(
cluster, subCluster string,
nodes []string,
metrics []string,
scopes []schema.MetricScope,
resolution int,
from, to time.Time,
ctx context.Context,
) (map[string]schema.JobData, error) {
if metrics == nil {
for _, m := range archive.GetCluster(cluster).MetricConfig {
metrics = append(metrics, m.Name)
}
}
data, err := metricstore.LoadNodeListData(cluster, subCluster, nodes, metrics, scopes, resolution, from, to, ctx)
if err != nil {
if len(data) != 0 {
cclog.Warnf("partial error loading node list data from metric store for cluster %s, subcluster %s: %s",
cluster, subCluster, err.Error())
} else {
cclog.Errorf("failed to load node list data from metric store for cluster %s, subcluster %s: %s",
cluster, subCluster, err.Error())
return nil, err
}
}
// Generate statistics series for datasets with many series to improve visualization performance.
// Statistics are calculated as min/median/max.
const maxSeriesSize int = 8
for _, jd := range data {
for _, scopes := range jd {
for _, jm := range scopes {
if jm.StatisticsSeries != nil || len(jm.Series) < maxSeriesSize {
continue
}
jm.AddStatisticsSeries()
}
}
}
if data == nil {
return nil, fmt.Errorf("metric store for cluster '%s' does not support node list queries", cluster)
}
return data, nil
}
// deepCopy creates a deep copy of JobData to prevent cache corruption when modifying
// archived data (e.g., during resampling). This ensures the cached archive data remains
// immutable while allowing per-request transformations.
func deepCopy(source schema.JobData) schema.JobData {
result := make(schema.JobData, len(source))
for metricName, scopeMap := range source {
result[metricName] = make(map[schema.MetricScope]*schema.JobMetric, len(scopeMap))
for scope, jobMetric := range scopeMap {
result[metricName][scope] = copyJobMetric(jobMetric)
}
}
return result
}
func copyJobMetric(src *schema.JobMetric) *schema.JobMetric {
dst := &schema.JobMetric{
Timestep: src.Timestep,
Unit: src.Unit,
Series: make([]schema.Series, len(src.Series)),
}
for i := range src.Series {
dst.Series[i] = copySeries(&src.Series[i])
}
if src.StatisticsSeries != nil {
dst.StatisticsSeries = copyStatisticsSeries(src.StatisticsSeries)
}
return dst
}
func copySeries(src *schema.Series) schema.Series {
dst := schema.Series{
Hostname: src.Hostname,
Id: src.Id,
Statistics: src.Statistics,
Data: make([]schema.Float, len(src.Data)),
}
copy(dst.Data, src.Data)
return dst
}
func copyStatisticsSeries(src *schema.StatsSeries) *schema.StatsSeries {
dst := &schema.StatsSeries{
Min: make([]schema.Float, len(src.Min)),
Mean: make([]schema.Float, len(src.Mean)),
Median: make([]schema.Float, len(src.Median)),
Max: make([]schema.Float, len(src.Max)),
}
copy(dst.Min, src.Min)
copy(dst.Mean, src.Mean)
copy(dst.Median, src.Median)
copy(dst.Max, src.Max)
if len(src.Percentiles) > 0 {
dst.Percentiles = make(map[int][]schema.Float, len(src.Percentiles))
for percentile, values := range src.Percentiles {
dst.Percentiles[percentile] = make([]schema.Float, len(values))
copy(dst.Percentiles[percentile], values)
}
}
return dst
}
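
The cache behavior described in the package comment (key built from job ID, state, metrics, scopes, and resolution; a 2 minute TTL for running jobs and 5 hours otherwise) can be illustrated in isolation. The following is a minimal sketch, not the package's code: `jobInfo` and `ttlFor` are hypothetical names introduced here, and the state strings `"running"` and `"completed"` are assumed values standing in for the `schema.JobState` constants.

```go
package main

import (
	"fmt"
	"time"
)

// jobInfo is a hypothetical stand-in for the two schema.Job fields used in the key.
type jobInfo struct {
	ID    int64
	State string // assumed values: "running", "completed"
}

// cacheKey uses the same format string as the cacheKey function in the diff.
func cacheKey(job jobInfo, metrics, scopes []string, resolution int) string {
	return fmt.Sprintf("%d(%s):[%v],[%v]-%d",
		job.ID, job.State, metrics, scopes, resolution)
}

// ttlFor (hypothetical helper) mirrors the TTL choice made inside LoadData.
func ttlFor(job jobInfo) time.Duration {
	if job.State == "running" {
		return 2 * time.Minute
	}
	return 5 * time.Hour
}

func main() {
	job := jobInfo{ID: 42, State: "running"}
	metrics := []string{"flops_any", "mem_bw"}
	scopes := []string{"node"}

	fmt.Println(cacheKey(job, metrics, scopes, 600))
	fmt.Println(ttlFor(job))

	// Because the state is part of the key, a finished job gets a new entry.
	job.State = "completed"
	fmt.Println(cacheKey(job, metrics, scopes, 600))
	fmt.Println(ttlFor(job))
}
```

Since the job state is part of the key, an entry cached while a job was running is not reused once the job has completed; the stale entry simply expires after its short TTL.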


@@ -0,0 +1,125 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package metricdispatch
import (
"testing"
"github.com/ClusterCockpit/cc-lib/v2/schema"
)
func TestDeepCopy(t *testing.T) {
nodeId := "0"
original := schema.JobData{
"cpu_load": {
schema.MetricScopeNode: &schema.JobMetric{
Timestep: 60,
Unit: schema.Unit{Base: "load", Prefix: ""},
Series: []schema.Series{
{
Hostname: "node001",
Id: &nodeId,
Data: []schema.Float{1.0, 2.0, 3.0},
Statistics: schema.MetricStatistics{
Min: 1.0,
Avg: 2.0,
Max: 3.0,
},
},
},
StatisticsSeries: &schema.StatsSeries{
Min: []schema.Float{1.0, 1.5, 2.0},
Mean: []schema.Float{2.0, 2.5, 3.0},
Median: []schema.Float{2.0, 2.5, 3.0},
Max: []schema.Float{3.0, 3.5, 4.0},
Percentiles: map[int][]schema.Float{
25: {1.5, 2.0, 2.5},
75: {2.5, 3.0, 3.5},
},
},
},
},
}
copied := deepCopy(original)
original["cpu_load"][schema.MetricScopeNode].Series[0].Data[0] = 999.0
original["cpu_load"][schema.MetricScopeNode].StatisticsSeries.Min[0] = 888.0
original["cpu_load"][schema.MetricScopeNode].StatisticsSeries.Percentiles[25][0] = 777.0
if copied["cpu_load"][schema.MetricScopeNode].Series[0].Data[0] != 1.0 {
t.Errorf("Series data was not deeply copied: got %v, want 1.0",
copied["cpu_load"][schema.MetricScopeNode].Series[0].Data[0])
}
if copied["cpu_load"][schema.MetricScopeNode].StatisticsSeries.Min[0] != 1.0 {
t.Errorf("StatisticsSeries was not deeply copied: got %v, want 1.0",
copied["cpu_load"][schema.MetricScopeNode].StatisticsSeries.Min[0])
}
if copied["cpu_load"][schema.MetricScopeNode].StatisticsSeries.Percentiles[25][0] != 1.5 {
t.Errorf("Percentiles was not deeply copied: got %v, want 1.5",
copied["cpu_load"][schema.MetricScopeNode].StatisticsSeries.Percentiles[25][0])
}
if copied["cpu_load"][schema.MetricScopeNode].Timestep != 60 {
t.Errorf("Timestep not copied correctly: got %v, want 60",
copied["cpu_load"][schema.MetricScopeNode].Timestep)
}
if copied["cpu_load"][schema.MetricScopeNode].Series[0].Hostname != "node001" {
t.Errorf("Hostname not copied correctly: got %v, want node001",
copied["cpu_load"][schema.MetricScopeNode].Series[0].Hostname)
}
}
func TestDeepCopyNilStatisticsSeries(t *testing.T) {
original := schema.JobData{
"mem_used": {
schema.MetricScopeNode: &schema.JobMetric{
Timestep: 60,
Series: []schema.Series{
{
Hostname: "node001",
Data: []schema.Float{1.0, 2.0},
},
},
StatisticsSeries: nil,
},
},
}
copied := deepCopy(original)
if copied["mem_used"][schema.MetricScopeNode].StatisticsSeries != nil {
t.Errorf("StatisticsSeries should be nil, got %v",
copied["mem_used"][schema.MetricScopeNode].StatisticsSeries)
}
}
func TestDeepCopyEmptyPercentiles(t *testing.T) {
original := schema.JobData{
"cpu_load": {
schema.MetricScopeNode: &schema.JobMetric{
Timestep: 60,
Series: []schema.Series{},
StatisticsSeries: &schema.StatsSeries{
Min: []schema.Float{1.0},
Mean: []schema.Float{2.0},
Median: []schema.Float{2.0},
Max: []schema.Float{3.0},
Percentiles: nil,
},
},
},
}
copied := deepCopy(original)
if copied["cpu_load"][schema.MetricScopeNode].StatisticsSeries.Percentiles != nil {
t.Errorf("Percentiles should be nil when source is nil/empty")
}
}
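
The property these tests check (mutating the source after copying must not affect the copy) can be reproduced with plain maps and slices. The sketch below uses a hypothetical `metric` type rather than `schema.JobMetric`, and contrasts a shallow copy, which shares the underlying slice, with a deep copy.

```go
package main

import "fmt"

// metric is a hypothetical, reduced stand-in for schema.JobMetric.
type metric struct {
	Timestep int
	Data     []float64
}

// shallowCopy copies only the map; both maps point at the same *metric values.
func shallowCopy(src map[string]*metric) map[string]*metric {
	dst := make(map[string]*metric, len(src))
	for name, m := range src {
		dst[name] = m
	}
	return dst
}

// deepCopyMetrics allocates a new metric and a new backing array per entry.
func deepCopyMetrics(src map[string]*metric) map[string]*metric {
	dst := make(map[string]*metric, len(src))
	for name, m := range src {
		c := &metric{Timestep: m.Timestep, Data: make([]float64, len(m.Data))}
		copy(c.Data, m.Data)
		dst[name] = c
	}
	return dst
}

func main() {
	orig := map[string]*metric{
		"cpu_load": &metric{Timestep: 60, Data: []float64{1, 2, 3}},
	}
	shallow := shallowCopy(orig)
	deep := deepCopyMetrics(orig)

	// In-place modification, similar in effect to resampling a series.
	orig["cpu_load"].Data[0] = 999

	fmt.Println("shallow:", shallow["cpu_load"].Data[0])
	fmt.Println("deep:   ", deep["cpu_load"].Data[0])
}
```

The shallow copy observes the modification and the deep copy does not, which is why archived data is deep-copied before it is resampled in place.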


@@ -3,14 +3,15 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"errors" "errors"
"fmt"
"math" "math"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
) )
var ( var (
@@ -124,6 +125,9 @@ func FetchData(req APIQueryRequest) (*APIQueryResponse, error) {
req.WithData = true req.WithData = true
ms := GetMemoryStore() ms := GetMemoryStore()
if ms == nil {
return nil, fmt.Errorf("memorystore not initialized")
}
response := APIQueryResponse{ response := APIQueryResponse{
Results: make([][]APIMetricData, 0, len(req.Queries)), Results: make([][]APIMetricData, 0, len(req.Queries)),


@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"archive/zip" "archive/zip"
@@ -18,13 +18,13 @@ import (
"sync/atomic" "sync/atomic"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
) )
func Archiving(wg *sync.WaitGroup, ctx context.Context) { func Archiving(wg *sync.WaitGroup, ctx context.Context) {
go func() { go func() {
defer wg.Done() defer wg.Done()
d, err := time.ParseDuration(Keys.Archive.Interval) d, err := time.ParseDuration(Keys.Archive.ArchiveInterval)
if err != nil { if err != nil {
cclog.Fatalf("[METRICSTORE]> error parsing archive interval duration: %v\n", err) cclog.Fatalf("[METRICSTORE]> error parsing archive interval duration: %v\n", err)
} }


@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"bufio" "bufio"
@@ -19,13 +19,15 @@ import (
"sync/atomic" "sync/atomic"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/linkedin/goavro/v2" "github.com/linkedin/goavro/v2"
) )
var NumAvroWorkers int = DefaultAvroWorkers var (
var startUp bool = true NumAvroWorkers int = DefaultAvroWorkers
startUp bool = true
)
func (as *AvroStore) ToCheckpoint(dir string, dumpAll bool) (int, error) { func (as *AvroStore) ToCheckpoint(dir string, dumpAll bool) (int, error) {
levels := make([]*AvroLevel, 0) levels := make([]*AvroLevel, 0)


@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"context" "context"
@@ -11,7 +11,7 @@ import (
"strconv" "strconv"
"sync" "sync"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
) )
func DataStaging(wg *sync.WaitGroup, ctx context.Context) { func DataStaging(wg *sync.WaitGroup, ctx context.Context) {
@@ -30,8 +30,51 @@ func DataStaging(wg *sync.WaitGroup, ctx context.Context) {
for { for {
select { select {
case <-ctx.Done(): case <-ctx.Done():
return // Drain any remaining messages in channel before exiting
case val := <-LineProtocolMessages: for {
select {
case val, ok := <-LineProtocolMessages:
if !ok {
// Channel closed
return
}
// Process remaining message
freq, err := GetMetricFrequency(val.MetricName)
if err != nil {
continue
}
metricName := ""
for _, selectorName := range val.Selector {
metricName += selectorName + SelectorDelimiter
}
metricName += val.MetricName
var selector []string
selector = append(selector, val.Cluster, val.Node, strconv.FormatInt(freq, 10))
if !stringSlicesEqual(oldSelector, selector) {
avroLevel = avroStore.root.findAvroLevelOrCreate(selector)
if avroLevel == nil {
cclog.Errorf("Error creating or finding the level with cluster : %s, node : %s, metric : %s\n", val.Cluster, val.Node, val.MetricName)
}
oldSelector = slices.Clone(selector)
}
if avroLevel != nil {
avroLevel.addMetric(metricName, val.Value, val.Timestamp, int(freq))
}
default:
// No more messages, exit
return
}
}
case val, ok := <-LineProtocolMessages:
if !ok {
// Channel closed, exit gracefully
return
}
// Fetch the frequency of the metric from the global configuration // Fetch the frequency of the metric from the global configuration
freq, err := GetMetricFrequency(val.MetricName) freq, err := GetMetricFrequency(val.MetricName)
if err != nil { if err != nil {
@@ -65,7 +108,9 @@ func DataStaging(wg *sync.WaitGroup, ctx context.Context) {
oldSelector = slices.Clone(selector) oldSelector = slices.Clone(selector)
} }
avroLevel.addMetric(metricName, val.Value, val.Timestamp, int(freq)) if avroLevel != nil {
avroLevel.addMetric(metricName, val.Value, val.Timestamp, int(freq))
}
} }
} }
}() }()
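
The change to `DataStaging` above replaces an immediate `return` on cancellation with a drain loop. A minimal sketch of that pattern, reduced to an `int` channel, might look as follows. The names `consume` and `demo` are hypothetical, and the real function processes line-protocol messages rather than collecting integers.

```go
package main

import (
	"context"
	"fmt"
)

// consume processes messages until ctx is cancelled or the channel is closed.
// On cancellation it first drains whatever is still buffered, using a
// non-blocking receive (select with default), and only then returns.
func consume(ctx context.Context, msgs <-chan int) []int {
	var seen []int
	for {
		select {
		case <-ctx.Done():
			for {
				select {
				case v, ok := <-msgs:
					if !ok {
						return seen // channel closed
					}
					seen = append(seen, v)
				default:
					return seen // nothing left to drain
				}
			}
		case v, ok := <-msgs:
			if !ok {
				return seen // channel closed
			}
			seen = append(seen, v)
		}
	}
}

// demo buffers three messages, cancels the context, and then consumes.
func demo() []int {
	msgs := make(chan int, 4)
	for i := 1; i <= 3; i++ {
		msgs <- i
	}
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancelled before the consumer starts
	return consume(ctx, msgs)
}

func main() {
	fmt.Println(demo())
}
```

Even though the context is already cancelled when `consume` starts, all three buffered values are processed. With a plain `return` in the `ctx.Done()` branch, some or all of them could be dropped, depending on which ready case the `select` picks.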


@@ -3,12 +3,12 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"sync" "sync"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
var ( var (


@@ -3,13 +3,13 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"errors" "errors"
"sync" "sync"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
// BufferCap is the default buffer capacity. // BufferCap is the default buffer capacity.


@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"bufio" "bufio"
@@ -23,8 +23,8 @@ import (
"sync/atomic" "sync/atomic"
"time" "time"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/linkedin/goavro/v2" "github.com/linkedin/goavro/v2"
) )
@@ -408,7 +408,6 @@ func (m *MemoryStore) FromCheckpointFiles(dir string, from int64) (int, error) {
return m.FromCheckpoint(dir, from, altFormat) return m.FromCheckpoint(dir, from, altFormat)
} }
cclog.Print("[METRICSTORE]> No valid checkpoint files found in the directory")
return 0, nil return 0, nil
} }


@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"fmt" "fmt"
@@ -19,38 +19,49 @@ const (
DefaultAvroCheckpointInterval = time.Minute DefaultAvroCheckpointInterval = time.Minute
) )
var InternalCCMSFlag bool = false type Checkpoints struct {
FileFormat string `json:"file-format"`
Interval string `json:"interval"`
RootDir string `json:"directory"`
}
type Debug struct {
DumpToFile string `json:"dump-to-file"`
EnableGops bool `json:"gops"`
}
type Archive struct {
ArchiveInterval string `json:"interval"`
RootDir string `json:"directory"`
DeleteInstead bool `json:"delete-instead"`
}
type Subscriptions []struct {
// Channel name
SubscribeTo string `json:"subscribe-to"`
// Allow lines without a cluster tag, use this as default, optional
ClusterTag string `json:"cluster-tag"`
}
type MetricStoreConfig struct { type MetricStoreConfig struct {
// Number of concurrent workers for checkpoint and archive operations. // Number of concurrent workers for checkpoint and archive operations.
// If not set or 0, defaults to min(runtime.NumCPU()/2+1, 10) // If not set or 0, defaults to min(runtime.NumCPU()/2+1, 10)
NumWorkers int `json:"num-workers"` NumWorkers int `json:"num-workers"`
Checkpoints struct { RetentionInMemory string `json:"retention-in-memory"`
FileFormat string `json:"file-format"` MemoryCap int `json:"memory-cap"`
Interval string `json:"interval"` Checkpoints Checkpoints `json:"checkpoints"`
RootDir string `json:"directory"` Debug *Debug `json:"debug"`
Restore string `json:"restore"` Archive *Archive `json:"archive"`
} `json:"checkpoints"` Subscriptions *Subscriptions `json:"nats-subscriptions"`
Debug struct {
DumpToFile string `json:"dump-to-file"`
EnableGops bool `json:"gops"`
} `json:"debug"`
RetentionInMemory string `json:"retention-in-memory"`
Archive struct {
Interval string `json:"interval"`
RootDir string `json:"directory"`
DeleteInstead bool `json:"delete-instead"`
} `json:"archive"`
Subscriptions []struct {
// Channel name
SubscribeTo string `json:"subscribe-to"`
// Allow lines without a cluster tag, use this as default, optional
ClusterTag string `json:"cluster-tag"`
} `json:"subscriptions"`
} }
var Keys MetricStoreConfig var Keys MetricStoreConfig = MetricStoreConfig{
Checkpoints: Checkpoints{
FileFormat: "avro",
RootDir: "./var/checkpoints",
},
}
// AggregationStrategy for aggregation over multiple values at different cpus/sockets/..., not time! // AggregationStrategy for aggregation over multiple values at different cpus/sockets/..., not time!
type AggregationStrategy int type AggregationStrategy int


@@ -0,0 +1,77 @@
// Copyright (C) NHR@FAU, University Erlangen-Nuremberg.
// All rights reserved. This file is part of cc-backend.
// Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file.
package metricstore
const configSchema = `{
"type": "object",
"description": "Configuration specific to built-in metric-store.",
"properties": {
"num-workers": {
"description": "Number of concurrent workers for checkpoint and archive operations",
"type": "integer"
},
"checkpoints": {
"description": "Configuration for checkpointing the metrics within metric-store",
"type": "object",
"properties": {
"file-format": {
"description": "Specify the type of checkpoint file. There are 2 variants: 'avro' and 'json'. If nothing is specified, 'avro' is default.",
"type": "string"
},
"interval": {
"description": "Interval at which the metrics should be checkpointed.",
"type": "string"
},
"directory": {
"description": "Specify the parent directory in which the checkpointed files should be placed.",
"type": "string"
}
},
"required": ["interval"]
},
"archive": {
"description": "Configuration for archiving the already checkpointed files.",
"type": "object",
"properties": {
"interval": {
"description": "Interval at which the checkpointed files should be archived.",
"type": "string"
},
"directory": {
"description": "Specify the directory in which the archived files should be placed.",
"type": "string"
}
},
"required": ["interval", "directory"]
},
"retention-in-memory": {
"description": "Keep the metrics in memory for the given time interval. Once the retention period (e.g. X hours) has passed, the metrics are freed.",
"type": "string"
},
"memory-cap": {
"description": "Upper memory capacity limit used by metricstore in GB",
"type": "integer"
},
"nats-subscriptions": {
"description": "Array of subscriptions. Allows subscribing to different subjects and publishers.",
"type": "array",
"items": {
"type": "object",
"properties": {
"subscribe-to": {
"description": "Channel name",
"type": "string"
},
"cluster-tag": {
"description": "Optional: Allow lines without a cluster tag, use this as default",
"type": "string"
}
}
}
}
},
"required": ["checkpoints", "retention-in-memory"]
}`
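The schema above, combined with the strict decoding that `Init` enables in this PR (`dec.DisallowUnknownFields()`), means a misspelled key is rejected instead of being silently ignored. The following is a minimal sketch of that behaviour, not the project's code: `storeConfig` and `decodeStrict` are hypothetical names, the struct mirrors only a few of the keys from the schema, and the interval values (`"12h"`, `"48h"`) are illustrative assumptions. The defaults are the ones shown for `Keys` in this diff.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// storeConfig is a hypothetical, trimmed-down mirror of the metric-store
// configuration; only keys that appear in the schema are used.
type storeConfig struct {
	NumWorkers  int `json:"num-workers"`
	Checkpoints struct {
		FileFormat string `json:"file-format"`
		Interval   string `json:"interval"`
		RootDir    string `json:"directory"`
	} `json:"checkpoints"`
	RetentionInMemory string `json:"retention-in-memory"`
}

// decodeStrict pre-populates the defaults and then decodes raw on top of
// them. Unknown keys cause an error because of DisallowUnknownFields.
func decodeStrict(raw string) (storeConfig, error) {
	var cfg storeConfig
	cfg.Checkpoints.FileFormat = "avro"
	cfg.Checkpoints.RootDir = "./var/checkpoints"

	dec := json.NewDecoder(bytes.NewReader([]byte(raw)))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	// Valid: only the required keys are given, defaults fill in the rest.
	cfg, err := decodeStrict(`{"checkpoints": {"interval": "12h"}, "retention-in-memory": "48h"}`)
	fmt.Println(cfg.Checkpoints.FileFormat, cfg.Checkpoints.RootDir, cfg.RetentionInMemory, err)

	// Invalid: "retention" is not a known key, so decoding fails.
	_, err = decodeStrict(`{"checkpoints": {"interval": "12h"}, "retention": "48h"}`)
	fmt.Println(err != nil)
}
```

Because the defaults are set before decoding, keys that are absent from the JSON keep their default values, which is why `file-format` can stay optional in the schema.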

View File

@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"bufio" "bufio"

View File

@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"bufio" "bufio"

View File

@@ -3,13 +3,13 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"sync" "sync"
"unsafe" "unsafe"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
) )
// Could also be called "node" as this forms a node in a tree structure. // Could also be called "node" as this forms a node in a tree structure.
@@ -72,6 +72,29 @@ func (l *Level) findLevelOrCreate(selector []string, nMetrics int) *Level {
return child.findLevelOrCreate(selector[1:], nMetrics) return child.findLevelOrCreate(selector[1:], nMetrics)
} }
func (l *Level) collectPaths(currentDepth, targetDepth int, currentPath []string, results *[][]string) {
l.lock.RLock()
defer l.lock.RUnlock()
for key, child := range l.children {
if child == nil {
continue
}
// We explicitly make a new slice and copy data to avoid sharing underlying arrays between siblings
newPath := make([]string, len(currentPath))
copy(newPath, currentPath)
newPath = append(newPath, key)
// If the target depth is reached, record the path; otherwise recurse into the child
if currentDepth+1 == targetDepth {
*results = append(*results, newPath)
} else {
child.collectPaths(currentDepth+1, targetDepth, newPath, results)
}
}
}
func (l *Level) free(t int64) (int, error) { func (l *Level) free(t int64) (int, error) {
l.lock.Lock() l.lock.Lock()
defer l.lock.Unlock() defer l.lock.Unlock()
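The `collectPaths` helper added in this hunk walks the level tree and records every path of a given depth. The sketch below reproduces the idea on a simplified tree, under stated assumptions: the `level` type here is hypothetical and has no lock or buffers, `pathsAt` is an illustrative wrapper, and the host names are sample values. The results are sorted only because Go map iteration order is unspecified.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// level is a hypothetical, lock-free stand-in for the tree node type.
type level struct {
	children map[string]*level
}

// collectPaths appends every path of length target below l to out.
func (l *level) collectPaths(depth, target int, prefix []string, out *[][]string) {
	for key, child := range l.children {
		if child == nil {
			continue
		}
		// Copy the prefix so sibling paths never share a backing array.
		path := make([]string, len(prefix), len(prefix)+1)
		copy(path, prefix)
		path = append(path, key)

		if depth+1 == target {
			*out = append(*out, path)
		} else {
			child.collectPaths(depth+1, target, path, out)
		}
	}
}

// pathsAt returns the paths at the target depth, joined and sorted.
func pathsAt(root *level, target int) []string {
	var out [][]string
	root.collectPaths(0, target, nil, &out)
	joined := make([]string, 0, len(out))
	for _, p := range out {
		joined = append(joined, strings.Join(p, "/"))
	}
	sort.Strings(joined)
	return joined
}

func main() {
	root := &level{children: map[string]*level{
		"alex":  {children: map[string]*level{"a0122": {}, "a0123": {}}},
		"fritz": {children: map[string]*level{"f0201": {}}},
	}}
	fmt.Println(pathsAt(root, 2)) // [alex/a0122 alex/a0123 fritz/f0201]
}
```

With depth 2 the paths correspond to cluster/host pairs, which is the depth `GetSelectors` requests via `GetPaths(2)`.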

View File

@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"context" "context"
@@ -12,8 +12,8 @@ import (
"time" "time"
"github.com/ClusterCockpit/cc-backend/pkg/nats" "github.com/ClusterCockpit/cc-backend/pkg/nats"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/influxdata/line-protocol/v2/lineprotocol" "github.com/influxdata/line-protocol/v2/lineprotocol"
) )
@@ -29,29 +29,30 @@ func ReceiveNats(ms *MemoryStore,
} }
var wg sync.WaitGroup var wg sync.WaitGroup
msgs := make(chan []byte, workers*2) msgs := make(chan []byte, workers*2)
for _, sc := range Keys.Subscriptions { for _, sc := range *Keys.Subscriptions {
clusterTag := sc.ClusterTag clusterTag := sc.ClusterTag
if workers > 1 { if workers > 1 {
wg.Add(workers) wg.Add(workers)
for range workers { for range workers {
go func() { go func() {
defer wg.Done()
for m := range msgs { for m := range msgs {
dec := lineprotocol.NewDecoderWithBytes(m) dec := lineprotocol.NewDecoderWithBytes(m)
if err := DecodeLine(dec, ms, clusterTag); err != nil { if err := DecodeLine(dec, ms, clusterTag); err != nil {
cclog.Errorf("error: %s", err.Error()) cclog.Errorf("error: %s", err.Error())
} }
} }
wg.Done()
}() }()
} }
nc.Subscribe(sc.SubscribeTo, func(subject string, data []byte) { nc.Subscribe(sc.SubscribeTo, func(subject string, data []byte) {
msgs <- data select {
case msgs <- data:
case <-ctx.Done():
}
}) })
} else { } else {
nc.Subscribe(sc.SubscribeTo, func(subject string, data []byte) { nc.Subscribe(sc.SubscribeTo, func(subject string, data []byte) {
@@ -64,7 +65,11 @@ func ReceiveNats(ms *MemoryStore,
cclog.Infof("NATS subscription to '%s' established", sc.SubscribeTo) cclog.Infof("NATS subscription to '%s' established", sc.SubscribeTo)
} }
close(msgs) go func() {
<-ctx.Done()
close(msgs)
}()
wg.Wait() wg.Wait()
return nil return nil
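The changes to `ReceiveNats` make the workers call `wg.Done()` via `defer` and let the subscription callback give up on a send once the context is cancelled. The following is a simplified sketch of that worker pattern, not the actual implementation: there is no NATS connection, `consume`, `publish` and `processAll` are hypothetical names, and, unlike the diff, the channel is closed by the single producer after it has finished sending, which keeps the example free of sends on a closed channel.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// consume starts `workers` goroutines that drain msgs until it is closed.
func consume(msgs <-chan []byte, workers int, handle func([]byte)) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done() // runs even if handle panics
			for m := range msgs {
				handle(m)
			}
		}()
	}
	return &wg
}

// publish tries to enqueue data and gives up once ctx is cancelled,
// so a full queue cannot block the caller forever during shutdown.
func publish(ctx context.Context, msgs chan<- []byte, data []byte) bool {
	select {
	case msgs <- data:
		return true
	case <-ctx.Done():
		return false
	}
}

// processAll sends n messages through the queue and reports how many
// were handled by the workers.
func processAll(n, workers int) int64 {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	msgs := make(chan []byte, workers*2)
	var handled atomic.Int64
	wg := consume(msgs, workers, func([]byte) { handled.Add(1) })

	for i := 0; i < n; i++ {
		publish(ctx, msgs, []byte("m"))
	}
	close(msgs) // safe here: the only producer has finished sending
	wg.Wait()
	return handled.Load()
}

func main() {
	fmt.Println(processAll(100, 4)) // 100
}
```

In a setup with several independent producers, as with multiple subscriptions, closing the channel needs extra coordination so that no producer can still be sending when it is closed.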

View File

@@ -3,7 +3,7 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
// Package memorystore provides an efficient in-memory time-series metric storage system // Package metricstore provides an efficient in-memory time-series metric storage system
// with support for hierarchical data organization, checkpointing, and archiving. // with support for hierarchical data organization, checkpointing, and archiving.
// //
// The package organizes metrics in a tree structure (cluster → host → component) and // The package organizes metrics in a tree structure (cluster → host → component) and
@@ -17,7 +17,7 @@
// - Concurrent checkpoint/archive workers // - Concurrent checkpoint/archive workers
// - Support for sum and average aggregation // - Support for sum and average aggregation
// - NATS integration for metric ingestion // - NATS integration for metric ingestion
package memorystore package metricstore
import ( import (
"bytes" "bytes"
@@ -25,15 +25,16 @@ import (
"encoding/json" "encoding/json"
"errors" "errors"
"runtime" "runtime"
"slices"
"sync" "sync"
"time" "time"
"github.com/ClusterCockpit/cc-backend/internal/config" "github.com/ClusterCockpit/cc-backend/internal/config"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/resampler" "github.com/ClusterCockpit/cc-lib/v2/resampler"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
) )
var ( var (
@@ -44,6 +45,15 @@ var (
shutdownFunc context.CancelFunc shutdownFunc context.CancelFunc
) )
// NodeProvider provides information about nodes currently in use by running jobs.
// This interface allows metricstore to query job information without directly
// depending on the repository package, breaking the import cycle.
type NodeProvider interface {
// GetUsedNodes returns a map of cluster names to sorted lists of unique hostnames
// that are currently in use by jobs that started before the given timestamp.
GetUsedNodes(ts int64) (map[string][]string, error)
}
type Metric struct { type Metric struct {
Name string Name string
Value schema.Float Value schema.Float
@@ -51,8 +61,9 @@ type Metric struct {
} }
type MemoryStore struct { type MemoryStore struct {
Metrics map[string]MetricConfig Metrics map[string]MetricConfig
root Level root Level
nodeProvider NodeProvider // Injected dependency for querying running jobs
} }
func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) { func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) {
@@ -61,7 +72,7 @@ func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) {
if rawConfig != nil { if rawConfig != nil {
config.Validate(configSchema, rawConfig) config.Validate(configSchema, rawConfig)
dec := json.NewDecoder(bytes.NewReader(rawConfig)) dec := json.NewDecoder(bytes.NewReader(rawConfig))
// dec.DisallowUnknownFields() dec.DisallowUnknownFields()
if err := dec.Decode(&Keys); err != nil { if err := dec.Decode(&Keys); err != nil {
cclog.Abortf("[METRICSTORE]> Metric Store Config Init: Could not decode config file '%s'.\nError: %s\n", rawConfig, err.Error()) cclog.Abortf("[METRICSTORE]> Metric Store Config Init: Could not decode config file '%s'.\nError: %s\n", rawConfig, err.Error())
} }
@@ -74,7 +85,7 @@ func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) {
cclog.Debugf("[METRICSTORE]> Using %d workers for checkpoint/archive operations\n", Keys.NumWorkers) cclog.Debugf("[METRICSTORE]> Using %d workers for checkpoint/archive operations\n", Keys.NumWorkers)
// Helper function to add metric configuration // Helper function to add metric configuration
addMetricConfig := func(mc schema.MetricConfig) { addMetricConfig := func(mc *schema.MetricConfig) {
agg, err := AssignAggregationStrategy(mc.Aggregation) agg, err := AssignAggregationStrategy(mc.Aggregation)
if err != nil { if err != nil {
cclog.Warnf("Could not find aggregation strategy for metric config '%s': %s", mc.Name, err.Error()) cclog.Warnf("Could not find aggregation strategy for metric config '%s': %s", mc.Name, err.Error())
@@ -88,7 +99,7 @@ func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) {
for _, c := range archive.Clusters { for _, c := range archive.Clusters {
for _, mc := range c.MetricConfig { for _, mc := range c.MetricConfig {
addMetricConfig(*mc) addMetricConfig(mc)
} }
for _, sc := range c.SubClusters { for _, sc := range c.SubClusters {
@@ -103,7 +114,7 @@ func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) {
ms := GetMemoryStore() ms := GetMemoryStore()
d, err := time.ParseDuration(Keys.Checkpoints.Restore) d, err := time.ParseDuration(Keys.RetentionInMemory)
if err != nil { if err != nil {
cclog.Fatal(err) cclog.Fatal(err)
} }
@@ -128,11 +139,21 @@ func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) {
ctx, shutdown := context.WithCancel(context.Background()) ctx, shutdown := context.WithCancel(context.Background())
wg.Add(4) retentionGoroutines := 1
checkpointingGoroutines := 1
dataStagingGoroutines := 1
archivingGoroutines := 0
if Keys.Archive != nil {
archivingGoroutines = 1
}
totalGoroutines := retentionGoroutines + checkpointingGoroutines + dataStagingGoroutines + archivingGoroutines
wg.Add(totalGoroutines)
Retention(wg, ctx) Retention(wg, ctx)
Checkpointing(wg, ctx) Checkpointing(wg, ctx)
Archiving(wg, ctx) if Keys.Archive != nil {
Archiving(wg, ctx)
}
DataStaging(wg, ctx) DataStaging(wg, ctx)
// Note: Signal handling has been removed from this function. // Note: Signal handling has been removed from this function.
@@ -141,9 +162,11 @@ func Init(rawConfig json.RawMessage, wg *sync.WaitGroup) {
// Store the shutdown function for later use by Shutdown() // Store the shutdown function for later use by Shutdown()
shutdownFunc = shutdown shutdownFunc = shutdown
err = ReceiveNats(ms, 1, ctx) if Keys.Subscriptions != nil {
if err != nil { err = ReceiveNats(ms, 1, ctx)
cclog.Fatal(err) if err != nil {
cclog.Fatal(err)
}
} }
} }
@@ -183,12 +206,23 @@ func GetMemoryStore() *MemoryStore {
return msInstance return msInstance
} }
// SetNodeProvider sets the NodeProvider implementation for the MemoryStore.
// This must be called during initialization to provide job state information
// for selective buffer retention during Free operations.
// If not set, the Free function will fall back to freeing all buffers.
func (ms *MemoryStore) SetNodeProvider(provider NodeProvider) {
ms.nodeProvider = provider
}
func Shutdown() { func Shutdown() {
// Cancel the context to signal all background goroutines to stop
if shutdownFunc != nil { if shutdownFunc != nil {
shutdownFunc() shutdownFunc()
} }
if Keys.Checkpoints.FileFormat != "json" {
close(LineProtocolMessages)
}
cclog.Infof("[METRICSTORE]> Writing to '%s'...\n", Keys.Checkpoints.RootDir) cclog.Infof("[METRICSTORE]> Writing to '%s'...\n", Keys.Checkpoints.RootDir)
var files int var files int
var err error var err error
@@ -199,7 +233,6 @@ func Shutdown() {
files, err = ms.ToCheckpoint(Keys.Checkpoints.RootDir, lastCheckpoint.Unix(), time.Now().Unix()) files, err = ms.ToCheckpoint(Keys.Checkpoints.RootDir, lastCheckpoint.Unix(), time.Now().Unix())
} else { } else {
files, err = GetAvroStore().ToCheckpoint(Keys.Checkpoints.RootDir, true) files, err = GetAvroStore().ToCheckpoint(Keys.Checkpoints.RootDir, true)
close(LineProtocolMessages)
} }
if err != nil { if err != nil {
@@ -208,15 +241,6 @@ func Shutdown() {
cclog.Infof("[METRICSTORE]> Done! (%d files written)\n", files) cclog.Infof("[METRICSTORE]> Done! (%d files written)\n", files)
} }
func getName(m *MemoryStore, i int) string {
for key, val := range m.Metrics {
if val.offset == i {
return key
}
}
return ""
}
func Retention(wg *sync.WaitGroup, ctx context.Context) { func Retention(wg *sync.WaitGroup, ctx context.Context) {
ms := GetMemoryStore() ms := GetMemoryStore()
@@ -244,7 +268,8 @@ func Retention(wg *sync.WaitGroup, ctx context.Context) {
case <-ticker.C: case <-ticker.C:
t := time.Now().Add(-d) t := time.Now().Add(-d)
cclog.Infof("[METRICSTORE]> start freeing buffers (older than %s)...\n", t.Format(time.RFC3339)) cclog.Infof("[METRICSTORE]> start freeing buffers (older than %s)...\n", t.Format(time.RFC3339))
freed, err := ms.Free(nil, t.Unix())
freed, err := Free(ms, t)
if err != nil { if err != nil {
cclog.Errorf("[METRICSTORE]> freeing up buffers failed: %s\n", err.Error()) cclog.Errorf("[METRICSTORE]> freeing up buffers failed: %s\n", err.Error())
} else { } else {
@@ -255,6 +280,104 @@ func Retention(wg *sync.WaitGroup, ctx context.Context) {
}() }()
} }
func Free(ms *MemoryStore, t time.Time) (int, error) {
// If no NodeProvider is configured, free all buffers older than t
if ms.nodeProvider == nil {
return ms.Free(nil, t.Unix())
}
excludeSelectors, err := ms.nodeProvider.GetUsedNodes(t.Unix())
if err != nil {
return 0, err
}
// excludeSelectors := make(map[string][]string, 0)
// excludeSelectors := map[string][]string{
// "alex": {"a0122", "a0123", "a0225"},
// "fritz": {"f0201", "f0202"},
// }
switch lenMap := len(excludeSelectors); lenMap {
// If the length of the map returned by GetUsedNodes() is 0,
// then use default Free method with nil selector
case 0:
return ms.Free(nil, t.Unix())
// Else formulate selectors, exclude those from the map
// and free the rest of the selectors
default:
selectors := GetSelectors(ms, excludeSelectors)
return FreeSelected(ms, selectors, t)
}
}
// FreeSelected frees the buffers of the given selectors. It is used when some
// specific nodes should be retained beyond the retention time.
func FreeSelected(ms *MemoryStore, selectors [][]string, t time.Time) (int, error) {
freed := 0
for _, selector := range selectors {
freedBuffers, err := ms.Free(selector, t.Unix())
if err != nil {
cclog.Errorf("error while freeing selected buffers: %#v", err)
}
freed += freedBuffers
}
return freed, nil
}
// GetSelectors collects the selectors of all second-to-last levels, i.e. the nodes,
// and filters out the specific selectors/nodes we want to retain.
func GetSelectors(ms *MemoryStore, excludeSelectors map[string][]string) [][]string {
allSelectors := ms.GetPaths(2)
filteredSelectors := make([][]string, 0, len(allSelectors))
for _, path := range allSelectors {
if len(path) < 2 {
continue
}
key := path[0] // The "Key" (Level 1)
value := path[1] // The "Value" (Level 2)
exclude := false
// Check if the key exists in our exclusion map
if excludedValues, exists := excludeSelectors[key]; exists {
// The key exists, now check if the specific value is in the exclusion list
if slices.Contains(excludedValues, value) {
exclude = true
}
}
if !exclude {
filteredSelectors = append(filteredSelectors, path)
}
}
// fmt.Printf("All selectors: %#v\n\n", allSelectors)
// fmt.Printf("filteredSelectors: %#v\n\n", filteredSelectors)
return filteredSelectors
}
// GetPaths returns a list of lists (paths) to the specified depth.
func (ms *MemoryStore) GetPaths(targetDepth int) [][]string {
var results [][]string
// Start recursion. Initial path is empty.
// We treat Root as depth 0.
ms.root.collectPaths(0, targetDepth, []string{}, &results)
return results
}
// Write all values in `metrics` to the level specified by `selector` for time `ts`. // Write all values in `metrics` to the level specified by `selector` for time `ts`.
// Look at `findLevelOrCreate` for how selectors work. // Look at `findLevelOrCreate` for how selectors work.
func (m *MemoryStore) Write(selector []string, ts int64, metrics []Metric) error { func (m *MemoryStore) Write(selector []string, ts int64, metrics []Metric) error {
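The retention path added in this file frees only the nodes that are not in use by running jobs: `GetSelectors` takes all cluster/host paths and drops those listed in the map returned by `GetUsedNodes`. The filtering step can be sketched in isolation as follows. This is an illustration under assumptions, not the project's code: `selectorsToFree` is a hypothetical name, and the host names `a0999` and `f0300` are made up, while the in-use map reuses the sample values from the commented-out example in `Free`.

```go
package main

import (
	"fmt"
	"slices"
)

// selectorsToFree returns the [cluster, host] paths that are NOT listed
// in inUse, i.e. the selectors whose old buffers may be freed.
func selectorsToFree(all [][]string, inUse map[string][]string) [][]string {
	kept := make([][]string, 0, len(all))
	for _, path := range all {
		if len(path) < 2 {
			continue // not a cluster/host pair
		}
		cluster, host := path[0], path[1]
		// Indexing a missing cluster yields a nil slice, for which
		// slices.Contains simply returns false.
		if slices.Contains(inUse[cluster], host) {
			continue // node is used by a running job: retain its data
		}
		kept = append(kept, path)
	}
	return kept
}

func main() {
	all := [][]string{
		{"alex", "a0122"}, {"alex", "a0999"},
		{"fritz", "f0201"}, {"fritz", "f0300"},
	}
	inUse := map[string][]string{
		"alex":  {"a0122", "a0123", "a0225"},
		"fritz": {"f0201", "f0202"},
	}
	fmt.Println(selectorsToFree(all, inUse)) // [[alex a0999] [fritz f0300]]
}
```

Each remaining selector can then be passed to a per-selector free call, which is what `FreeSelected` does in the diff.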

View File

@@ -3,12 +3,12 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"testing" "testing"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
func TestAssignAggregationStrategy(t *testing.T) { func TestAssignAggregationStrategy(t *testing.T) {
@@ -131,7 +131,7 @@ func TestBufferWrite(t *testing.T) {
func TestBufferRead(t *testing.T) { func TestBufferRead(t *testing.T) {
b := newBuffer(100, 10) b := newBuffer(100, 10)
// Write some test data // Write some test data
b.write(100, schema.Float(1.0)) b.write(100, schema.Float(1.0))
b.write(110, schema.Float(2.0)) b.write(110, schema.Float(2.0))

View File

@@ -3,56 +3,41 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package metricdata package metricstore
import ( import (
"context" "context"
"encoding/json"
"fmt" "fmt"
"strconv" "strconv"
"strings" "strings"
"time" "time"
"github.com/ClusterCockpit/cc-backend/internal/memorystore"
"github.com/ClusterCockpit/cc-backend/pkg/archive" "github.com/ClusterCockpit/cc-backend/pkg/archive"
cclog "github.com/ClusterCockpit/cc-lib/ccLogger" cclog "github.com/ClusterCockpit/cc-lib/v2/ccLogger"
"github.com/ClusterCockpit/cc-lib/schema" "github.com/ClusterCockpit/cc-lib/v2/schema"
) )
// Bloat Code // TestLoadDataCallback allows tests to override LoadData behavior
type CCMetricStoreConfigInternal struct { var TestLoadDataCallback func(job *schema.Job, metrics []string, scopes []schema.MetricScope, ctx context.Context, resolution int) (schema.JobData, error)
Kind string `json:"kind"`
Url string `json:"url"`
Token string `json:"token"`
// If metrics are known to this MetricDataRepository under a different func LoadData(
// name than in the `metricConfig` section of the 'cluster.json',
// provide this optional mapping of local to remote name for this metric.
Renamings map[string]string `json:"metricRenamings"`
}
// Bloat Code
type CCMetricStoreInternal struct{}
// Bloat Code
func (ccms *CCMetricStoreInternal) Init(rawConfig json.RawMessage) error {
return nil
}
func (ccms *CCMetricStoreInternal) LoadData(
job *schema.Job, job *schema.Job,
metrics []string, metrics []string,
scopes []schema.MetricScope, scopes []schema.MetricScope,
ctx context.Context, ctx context.Context,
resolution int, resolution int,
) (schema.JobData, error) { ) (schema.JobData, error) {
queries, assignedScope, err := ccms.buildQueries(job, metrics, scopes, int64(resolution)) if TestLoadDataCallback != nil {
return TestLoadDataCallback(job, metrics, scopes, ctx, resolution)
}
queries, assignedScope, err := buildQueries(job, metrics, scopes, int64(resolution))
if err != nil { if err != nil {
cclog.Errorf("Error while building queries for jobId %d, Metrics %v, Scopes %v: %s", job.JobID, metrics, scopes, err.Error()) cclog.Errorf("Error while building queries for jobId %d, Metrics %v, Scopes %v: %s", job.JobID, metrics, scopes, err.Error())
return nil, err return nil, err
} }
req := memorystore.APIQueryRequest{ req := APIQueryRequest{
Cluster: job.Cluster, Cluster: job.Cluster,
From: job.StartTime, From: job.StartTime,
To: job.StartTime + int64(job.Duration), To: job.StartTime + int64(job.Duration),
@@ -61,7 +46,7 @@ func (ccms *CCMetricStoreInternal) LoadData(
WithData: true, WithData: true,
} }
resBody, err := memorystore.FetchData(req) resBody, err := FetchData(req)
if err != nil { if err != nil {
cclog.Errorf("Error while fetching data : %s", err.Error()) cclog.Errorf("Error while fetching data : %s", err.Error())
return nil, err return nil, err
@@ -149,13 +134,13 @@ var (
acceleratorString = string(schema.MetricScopeAccelerator) acceleratorString = string(schema.MetricScopeAccelerator)
) )
func (ccms *CCMetricStoreInternal) buildQueries( func buildQueries(
job *schema.Job, job *schema.Job,
metrics []string, metrics []string,
scopes []schema.MetricScope, scopes []schema.MetricScope,
resolution int64, resolution int64,
) ([]memorystore.APIQuery, []schema.MetricScope, error) { ) ([]APIQuery, []schema.MetricScope, error) {
queries := make([]memorystore.APIQuery, 0, len(metrics)*len(scopes)*len(job.Resources)) queries := make([]APIQuery, 0, len(metrics)*len(scopes)*len(job.Resources))
assignedScope := []schema.MetricScope{} assignedScope := []schema.MetricScope{}
subcluster, scerr := archive.GetSubCluster(job.Cluster, job.SubCluster) subcluster, scerr := archive.GetSubCluster(job.Cluster, job.SubCluster)
@@ -217,7 +202,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
continue continue
} }
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: false, Aggregate: false,
@@ -235,7 +220,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
continue continue
} }
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -249,7 +234,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// HWThread -> HWThread // HWThread -> HWThread
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeHWThread { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeHWThread {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: false, Aggregate: false,
@@ -265,7 +250,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeCore { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeCore {
cores, _ := topology.GetCoresFromHWThreads(hwthreads) cores, _ := topology.GetCoresFromHWThreads(hwthreads)
for _, core := range cores { for _, core := range cores {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -282,7 +267,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeSocket { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeSocket {
sockets, _ := topology.GetSocketsFromHWThreads(hwthreads) sockets, _ := topology.GetSocketsFromHWThreads(hwthreads)
for _, socket := range sockets { for _, socket := range sockets {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -297,7 +282,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// HWThread -> Node // HWThread -> Node
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeNode {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -312,7 +297,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// Core -> Core // Core -> Core
if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeCore { if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeCore {
cores, _ := topology.GetCoresFromHWThreads(hwthreads) cores, _ := topology.GetCoresFromHWThreads(hwthreads)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: false, Aggregate: false,
@@ -328,7 +313,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeSocket { if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeSocket {
sockets, _ := topology.GetSocketsFromCores(hwthreads) sockets, _ := topology.GetSocketsFromCores(hwthreads)
for _, socket := range sockets { for _, socket := range sockets {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -344,7 +329,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// Core -> Node // Core -> Node
if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeNode {
cores, _ := topology.GetCoresFromHWThreads(hwthreads) cores, _ := topology.GetCoresFromHWThreads(hwthreads)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -359,7 +344,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// MemoryDomain -> MemoryDomain // MemoryDomain -> MemoryDomain
if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeMemoryDomain { if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeMemoryDomain {
sockets, _ := topology.GetMemoryDomainsFromHWThreads(hwthreads) sockets, _ := topology.GetMemoryDomainsFromHWThreads(hwthreads)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: false, Aggregate: false,
@@ -374,7 +359,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// MemoryDomain -> Node // MemoryDomain -> Node
if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeNode {
sockets, _ := topology.GetMemoryDomainsFromHWThreads(hwthreads) sockets, _ := topology.GetMemoryDomainsFromHWThreads(hwthreads)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -389,7 +374,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// Socket -> Socket // Socket -> Socket
if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeSocket { if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeSocket {
sockets, _ := topology.GetSocketsFromHWThreads(hwthreads) sockets, _ := topology.GetSocketsFromHWThreads(hwthreads)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: false, Aggregate: false,
@@ -404,7 +389,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// Socket -> Node // Socket -> Node
if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeNode {
sockets, _ := topology.GetSocketsFromHWThreads(hwthreads) sockets, _ := topology.GetSocketsFromHWThreads(hwthreads)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Aggregate: true, Aggregate: true,
@@ -418,7 +403,7 @@ func (ccms *CCMetricStoreInternal) buildQueries(
// Node -> Node // Node -> Node
if nativeScope == schema.MetricScopeNode && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeNode && scope == schema.MetricScopeNode {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: host.Hostname, Hostname: host.Hostname,
Resolution: resolution, Resolution: resolution,
@@ -435,18 +420,18 @@ func (ccms *CCMetricStoreInternal) buildQueries(
return queries, assignedScope, nil return queries, assignedScope, nil
} }
func (ccms *CCMetricStoreInternal) LoadStats( func LoadStats(
job *schema.Job, job *schema.Job,
metrics []string, metrics []string,
ctx context.Context, ctx context.Context,
) (map[string]map[string]schema.MetricStatistics, error) { ) (map[string]map[string]schema.MetricStatistics, error) {
queries, _, err := ccms.buildQueries(job, metrics, []schema.MetricScope{schema.MetricScopeNode}, 0) // #166 Add scopes here for analysis view accelerator normalization? queries, _, err := buildQueries(job, metrics, []schema.MetricScope{schema.MetricScopeNode}, 0) // #166 Add scopes here for analysis view accelerator normalization?
if err != nil { if err != nil {
cclog.Errorf("Error while building queries for jobId %d, Metrics %v: %s", job.JobID, metrics, err.Error()) cclog.Errorf("Error while building queries for jobId %d, Metrics %v: %s", job.JobID, metrics, err.Error())
return nil, err return nil, err
} }
req := memorystore.APIQueryRequest{ req := APIQueryRequest{
Cluster: job.Cluster, Cluster: job.Cluster,
From: job.StartTime, From: job.StartTime,
To: job.StartTime + int64(job.Duration), To: job.StartTime + int64(job.Duration),
@@ -455,7 +440,7 @@ func (ccms *CCMetricStoreInternal) LoadStats(
WithData: false, WithData: false,
} }
resBody, err := memorystore.FetchData(req) resBody, err := FetchData(req)
if err != nil { if err != nil {
cclog.Errorf("Error while fetching data : %s", err.Error()) cclog.Errorf("Error while fetching data : %s", err.Error())
return nil, err return nil, err
@@ -492,20 +477,19 @@ func (ccms *CCMetricStoreInternal) LoadStats(
return stats, nil return stats, nil
} }
// Used for Job-View Statistics Table func LoadScopedStats(
func (ccms *CCMetricStoreInternal) LoadScopedStats(
job *schema.Job, job *schema.Job,
metrics []string, metrics []string,
scopes []schema.MetricScope, scopes []schema.MetricScope,
ctx context.Context, ctx context.Context,
) (schema.ScopedJobStats, error) { ) (schema.ScopedJobStats, error) {
queries, assignedScope, err := ccms.buildQueries(job, metrics, scopes, 0) queries, assignedScope, err := buildQueries(job, metrics, scopes, 0)
if err != nil { if err != nil {
cclog.Errorf("Error while building queries for jobId %d, Metrics %v, Scopes %v: %s", job.JobID, metrics, scopes, err.Error()) cclog.Errorf("Error while building queries for jobId %d, Metrics %v, Scopes %v: %s", job.JobID, metrics, scopes, err.Error())
return nil, err return nil, err
} }
req := memorystore.APIQueryRequest{ req := APIQueryRequest{
Cluster: job.Cluster, Cluster: job.Cluster,
From: job.StartTime, From: job.StartTime,
To: job.StartTime + int64(job.Duration), To: job.StartTime + int64(job.Duration),
@@ -514,7 +498,7 @@ func (ccms *CCMetricStoreInternal) LoadScopedStats(
WithData: false, WithData: false,
} }
resBody, err := memorystore.FetchData(req) resBody, err := FetchData(req)
if err != nil { if err != nil {
cclog.Errorf("Error while fetching data : %s", err.Error()) cclog.Errorf("Error while fetching data : %s", err.Error())
return nil, err return nil, err
@@ -583,15 +567,14 @@ func (ccms *CCMetricStoreInternal) LoadScopedStats(
return scopedJobStats, nil return scopedJobStats, nil
} }
// Used for Systems-View Node-Overview func LoadNodeData(
func (ccms *CCMetricStoreInternal) LoadNodeData(
cluster string, cluster string,
metrics, nodes []string, metrics, nodes []string,
scopes []schema.MetricScope, scopes []schema.MetricScope,
from, to time.Time, from, to time.Time,
ctx context.Context, ctx context.Context,
) (map[string]map[string][]*schema.JobMetric, error) { ) (map[string]map[string][]*schema.JobMetric, error) {
req := memorystore.APIQueryRequest{ req := APIQueryRequest{
Cluster: cluster, Cluster: cluster,
From: from.Unix(), From: from.Unix(),
To: to.Unix(), To: to.Unix(),
@@ -604,7 +587,7 @@ func (ccms *CCMetricStoreInternal) LoadNodeData(
} else { } else {
for _, node := range nodes { for _, node := range nodes {
for _, metric := range metrics { for _, metric := range metrics {
req.Queries = append(req.Queries, memorystore.APIQuery{ req.Queries = append(req.Queries, APIQuery{
Hostname: node, Hostname: node,
Metric: metric, Metric: metric,
Resolution: 0, // Default for Node Queries: Will return metric $Timestep Resolution Resolution: 0, // Default for Node Queries: Will return metric $Timestep Resolution
@@ -613,7 +596,7 @@ func (ccms *CCMetricStoreInternal) LoadNodeData(
} }
} }
resBody, err := memorystore.FetchData(req) resBody, err := FetchData(req)
if err != nil { if err != nil {
cclog.Errorf("Error while fetching data : %s", err.Error()) cclog.Errorf("Error while fetching data : %s", err.Error())
return nil, err return nil, err
@@ -622,7 +605,7 @@ func (ccms *CCMetricStoreInternal) LoadNodeData(
var errors []string var errors []string
data := make(map[string]map[string][]*schema.JobMetric) data := make(map[string]map[string][]*schema.JobMetric)
for i, res := range resBody.Results { for i, res := range resBody.Results {
var query memorystore.APIQuery var query APIQuery
if resBody.Queries != nil { if resBody.Queries != nil {
query = resBody.Queries[i] query = resBody.Queries[i]
} else { } else {
@@ -673,8 +656,7 @@ func (ccms *CCMetricStoreInternal) LoadNodeData(
return data, nil return data, nil
} }
// Used for Systems-View Node-List func LoadNodeListData(
func (ccms *CCMetricStoreInternal) LoadNodeListData(
cluster, subCluster string, cluster, subCluster string,
nodes []string, nodes []string,
metrics []string, metrics []string,
@@ -683,15 +665,14 @@ func (ccms *CCMetricStoreInternal) LoadNodeListData(
from, to time.Time, from, to time.Time,
ctx context.Context, ctx context.Context,
) (map[string]schema.JobData, error) { ) (map[string]schema.JobData, error) {
// Note: Order of node data is not guaranteed after this point // Note: Order of node data is not guaranteed after this point
queries, assignedScope, err := ccms.buildNodeQueries(cluster, subCluster, nodes, metrics, scopes, int64(resolution)) queries, assignedScope, err := buildNodeQueries(cluster, subCluster, nodes, metrics, scopes, int64(resolution))
if err != nil { if err != nil {
cclog.Errorf("Error while building node queries for Cluster %s, SubCluster %s, Metrics %v, Scopes %v: %s", cluster, subCluster, metrics, scopes, err.Error()) cclog.Errorf("Error while building node queries for Cluster %s, SubCluster %s, Metrics %v, Scopes %v: %s", cluster, subCluster, metrics, scopes, err.Error())
return nil, err return nil, err
} }
req := memorystore.APIQueryRequest{ req := APIQueryRequest{
Cluster: cluster, Cluster: cluster,
Queries: queries, Queries: queries,
From: from.Unix(), From: from.Unix(),
@@ -700,7 +681,7 @@ func (ccms *CCMetricStoreInternal) LoadNodeListData(
WithData: true, WithData: true,
} }
resBody, err := memorystore.FetchData(req) resBody, err := FetchData(req)
if err != nil { if err != nil {
cclog.Errorf("Error while fetching data : %s", err.Error()) cclog.Errorf("Error while fetching data : %s", err.Error())
return nil, err return nil, err
@@ -709,7 +690,7 @@ func (ccms *CCMetricStoreInternal) LoadNodeListData(
var errors []string var errors []string
data := make(map[string]schema.JobData) data := make(map[string]schema.JobData)
for i, row := range resBody.Results { for i, row := range resBody.Results {
var query memorystore.APIQuery var query APIQuery
if resBody.Queries != nil { if resBody.Queries != nil {
query = resBody.Queries[i] query = resBody.Queries[i]
} else { } else {
@@ -789,15 +770,15 @@ func (ccms *CCMetricStoreInternal) LoadNodeListData(
return data, nil return data, nil
} }
func (ccms *CCMetricStoreInternal) buildNodeQueries( func buildNodeQueries(
cluster string, cluster string,
subCluster string, subCluster string,
nodes []string, nodes []string,
metrics []string, metrics []string,
scopes []schema.MetricScope, scopes []schema.MetricScope,
resolution int64, resolution int64,
) ([]memorystore.APIQuery, []schema.MetricScope, error) { ) ([]APIQuery, []schema.MetricScope, error) {
queries := make([]memorystore.APIQuery, 0, len(metrics)*len(scopes)*len(nodes)) queries := make([]APIQuery, 0, len(metrics)*len(scopes)*len(nodes))
assignedScope := []schema.MetricScope{} assignedScope := []schema.MetricScope{}
// Get topology before loop if subCluster given // Get topology before loop if subCluster given
@@ -812,7 +793,6 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
} }
for _, metric := range metrics { for _, metric := range metrics {
metric := metric
mc := archive.GetMetricConfig(cluster, metric) mc := archive.GetMetricConfig(cluster, metric)
if mc == nil { if mc == nil {
// return nil, fmt.Errorf("METRICDATA/CCMS > metric '%s' is not specified for cluster '%s'", metric, cluster) // return nil, fmt.Errorf("METRICDATA/CCMS > metric '%s' is not specified for cluster '%s'", metric, cluster)
@@ -880,7 +860,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
continue continue
} }
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: false, Aggregate: false,
@@ -898,7 +878,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
continue continue
} }
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -912,7 +892,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// HWThread -> HWThread // HWThread -> HWThread
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeHWThread { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeHWThread {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: false, Aggregate: false,
@@ -928,7 +908,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeCore { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeCore {
cores, _ := topology.GetCoresFromHWThreads(topology.Node) cores, _ := topology.GetCoresFromHWThreads(topology.Node)
for _, core := range cores { for _, core := range cores {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -945,7 +925,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeSocket { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeSocket {
sockets, _ := topology.GetSocketsFromHWThreads(topology.Node) sockets, _ := topology.GetSocketsFromHWThreads(topology.Node)
for _, socket := range sockets { for _, socket := range sockets {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -960,7 +940,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// HWThread -> Node // HWThread -> Node
if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeHWThread && scope == schema.MetricScopeNode {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -975,7 +955,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// Core -> Core // Core -> Core
if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeCore { if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeCore {
cores, _ := topology.GetCoresFromHWThreads(topology.Node) cores, _ := topology.GetCoresFromHWThreads(topology.Node)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: false, Aggregate: false,
@@ -991,7 +971,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeSocket { if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeSocket {
sockets, _ := topology.GetSocketsFromCores(topology.Node) sockets, _ := topology.GetSocketsFromCores(topology.Node)
for _, socket := range sockets { for _, socket := range sockets {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -1007,7 +987,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// Core -> Node // Core -> Node
if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeCore && scope == schema.MetricScopeNode {
cores, _ := topology.GetCoresFromHWThreads(topology.Node) cores, _ := topology.GetCoresFromHWThreads(topology.Node)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -1022,7 +1002,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// MemoryDomain -> MemoryDomain // MemoryDomain -> MemoryDomain
if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeMemoryDomain { if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeMemoryDomain {
sockets, _ := topology.GetMemoryDomainsFromHWThreads(topology.Node) sockets, _ := topology.GetMemoryDomainsFromHWThreads(topology.Node)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: false, Aggregate: false,
@@ -1037,7 +1017,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// MemoryDomain -> Node // MemoryDomain -> Node
if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeMemoryDomain && scope == schema.MetricScopeNode {
sockets, _ := topology.GetMemoryDomainsFromHWThreads(topology.Node) sockets, _ := topology.GetMemoryDomainsFromHWThreads(topology.Node)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -1052,7 +1032,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// Socket -> Socket // Socket -> Socket
if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeSocket { if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeSocket {
sockets, _ := topology.GetSocketsFromHWThreads(topology.Node) sockets, _ := topology.GetSocketsFromHWThreads(topology.Node)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: false, Aggregate: false,
@@ -1067,7 +1047,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// Socket -> Node // Socket -> Node
if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeSocket && scope == schema.MetricScopeNode {
sockets, _ := topology.GetSocketsFromHWThreads(topology.Node) sockets, _ := topology.GetSocketsFromHWThreads(topology.Node)
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Aggregate: true, Aggregate: true,
@@ -1081,7 +1061,7 @@ func (ccms *CCMetricStoreInternal) buildNodeQueries(
// Node -> Node // Node -> Node
if nativeScope == schema.MetricScopeNode && scope == schema.MetricScopeNode { if nativeScope == schema.MetricScopeNode && scope == schema.MetricScopeNode {
queries = append(queries, memorystore.APIQuery{ queries = append(queries, APIQuery{
Metric: metric, Metric: metric,
Hostname: hostname, Hostname: hostname,
Resolution: resolution, Resolution: resolution,
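
Note on the hunks above: `buildNodeQueries` pairs each metric's native scope with the requested scope. In the visible lines, `Aggregate` is `false` when both scopes match and `true` when the requested scope is coarser, and the HWThread -> Core and HWThread -> Socket cases emit one query per core or socket. The sketch below illustrates that pattern only. `Scope`, `SketchQuery`, `sketchQueries`, and the `TypeIDs` field are hypothetical stand-ins (the diff does not show how IDs are attached to a query), so this should not be read as the actual cc-backend implementation.

```go
package main

import "fmt"

// Scope is a hypothetical, reduced stand-in for schema.MetricScope.
type Scope int

const (
	ScopeHWThread Scope = iota // finest granularity
	ScopeCore
	ScopeNode // coarsest granularity
)

// SketchQuery is a hypothetical, reduced stand-in for APIQuery.
// TypeIDs is an assumption: the diff does not show this field.
type SketchQuery struct {
	Metric    string
	Hostname  string
	Aggregate bool
	TypeIDs   []int
}

// sketchQueries builds queries for a metric whose native scope is HWThread.
// cores lists the hardware-thread IDs that belong to each core.
func sketchQueries(metric, host string, native, requested Scope, cores [][]int) []SketchQuery {
	if native != ScopeHWThread {
		return nil // other native scopes are out of scope for this sketch
	}

	// Flatten the topology into the list of all hardware threads.
	var all []int
	for _, c := range cores {
		all = append(all, c...)
	}

	switch requested {
	case ScopeHWThread:
		// Same scope: one query, no aggregation.
		return []SketchQuery{{Metric: metric, Hostname: host, Aggregate: false, TypeIDs: all}}
	case ScopeCore:
		// Coarser scope: one aggregating query per core.
		out := make([]SketchQuery, 0, len(cores))
		for _, c := range cores {
			out = append(out, SketchQuery{Metric: metric, Hostname: host, Aggregate: true, TypeIDs: c})
		}
		return out
	case ScopeNode:
		// Coarsest scope: one query aggregating every hardware thread.
		return []SketchQuery{{Metric: metric, Hostname: host, Aggregate: true, TypeIDs: all}}
	}
	return nil
}

func main() {
	cores := [][]int{{0, 1}, {2, 3}}
	for _, q := range sketchQueries("flops_any", "node01", ScopeHWThread, ScopeCore, cores) {
		fmt.Printf("%+v\n", q)
	}
}
```

Writing the cases as a `switch` over the requested scope is a choice made for this sketch. The code in the diff instead uses a sequence of `if nativeScope == ... && scope == ...` checks, which covers more native scopes (Core, Socket, MemoryDomain, Node) than are modelled here.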

@@ -3,13 +3,13 @@
// Use of this source code is governed by a MIT-style // Use of this source code is governed by a MIT-style
// license that can be found in the LICENSE file. // license that can be found in the LICENSE file.
package memorystore package metricstore
import ( import (
"errors" "errors"
"math" "math"
"github.com/ClusterCockpit/cc-lib/util" "github.com/ClusterCockpit/cc-lib/v2/util"
) )
type Stats struct { type Stats struct {

Some files were not shown because too many files have changed in this diff