Update README and config schema

2025-12-23 09:34:09 +01:00
parent 64fef9774c
commit 9bf5c5dc1a
5 changed files with 165 additions and 22 deletions

View File

@@ -100,11 +100,15 @@ The backend follows a layered architecture with clear separation of concerns:
- Pluggable backends: cc-metric-store, Prometheus, InfluxDB
- Each cluster can have a different metric data backend
- **internal/archiver**: Job archiving to file-based archive
- **internal/api/nats.go**: NATS-based API for job and node operations
- Subscribes to NATS subjects for job events (start/stop)
- Handles node state updates via NATS
- Uses InfluxDB line protocol message format
- **pkg/archive**: Job archive backend implementations
- File system backend (default)
- S3 backend
- SQLite backend (experimental)
- **pkg/nats**: NATS client and message decoding utilities
### Frontend Structure
@@ -146,6 +150,14 @@ applied automatically on startup. Version tracking in `version` table.
## Configuration
- **config.json**: Main configuration (clusters, metric repositories, archive settings)
- `main.apiSubjects`: NATS subject configuration (optional)
- `subjectJobEvent`: Subject for job start/stop events (e.g., "cc.job.event")
- `subjectNodeState`: Subject for node state updates (e.g., "cc.node.state")
- `nats`: NATS client connection configuration (optional)
- `address`: NATS server address (e.g., "nats://localhost:4222")
- `username`: Authentication username (optional)
- `password`: Authentication password (optional)
- `creds-file-path`: Path to NATS credentials file (optional)
- **.env**: Environment variables (secrets like JWT keys)
- Copy from `configs/env-template.txt`
- NEVER commit this file
@@ -207,9 +219,87 @@ applied automatically on startup. Version tracking in `version` table.
2. Increment `repository.Version`
3. Test with fresh database and existing database
## NATS API
The backend supports a NATS-based API as an alternative to the REST API for job and node operations.
### Setup
1. Configure NATS client connection in `config.json`:
```json
{
"nats": {
"address": "nats://localhost:4222",
"username": "user",
"password": "pass"
}
}
```
2. Configure API subjects in `config.json` under `main`:
```json
{
"main": {
"apiSubjects": {
"subjectJobEvent": "cc.job.event",
"subjectNodeState": "cc.node.state"
}
}
}
```
### Message Format
Messages use **InfluxDB line protocol** format with the following structure:
#### Job Events
**Start Job:**
```
job,function=start_job event="{\"jobId\":123,\"user\":\"alice\",\"cluster\":\"test\", ...}" 1234567890000000000
```
**Stop Job:**
```
job,function=stop_job event="{\"jobId\":123,\"cluster\":\"test\",\"startTime\":1234567890,\"stopTime\":1234571490,\"jobState\":\"completed\"}" 1234571490000000000
```
**Tags:**
- `function`: Either `start_job` or `stop_job`
**Fields:**
- `event`: JSON payload containing job data (see REST API documentation for schema)
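For illustration, here is a minimal, hypothetical Go sketch of how an external publisher (e.g., a batch-system adapter) could build and send a start-job message in this format. It assumes the `github.com/nats-io/nats.go` client and the example subject and credentials from the Setup section; the payload is a reduced subset of the REST API start-job schema.
```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect using the address/credentials from the `nats` block in config.json.
	nc, err := nats.Connect("nats://localhost:4222", nats.UserInfo("user", "pass"))
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Reduced example payload; the full schema matches the REST API start-job body.
	payload := `{"jobId":123,"user":"alice","cluster":"test","startTime":1234567890}`

	// Build the InfluxDB line protocol message:
	// measurement "job", tag function=start_job, string field "event", nanosecond timestamp.
	// Double quotes inside a string field value are escaped with a backslash.
	escaped := strings.ReplaceAll(payload, `"`, `\"`)
	line := fmt.Sprintf(`job,function=start_job event="%s" %s`,
		escaped, strconv.FormatInt(time.Now().UnixNano(), 10))

	// Publish to the subject configured as subjectJobEvent.
	if err := nc.Publish("cc.job.event", []byte(line)); err != nil {
		panic(err)
	}
	nc.Flush() // ensure the message is sent before exiting
}
```
The same pattern applies to `stop_job`, with the `function` tag and the JSON payload fields changed accordingly.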
#### Node State Updates
```json
{
"cluster": "testcluster",
"nodes": [
{
"hostname": "node001",
"states": ["allocated"],
"cpusAllocated": 8,
"memoryAllocated": 16384,
"gpusAllocated": 0,
"jobsRunning": 1
}
]
}
```
### Implementation Notes
- NATS API mirrors REST API functionality but uses messaging
- Job start/stop events are processed asynchronously
- Duplicate job detection is handled (same as REST API)
- All validation rules from REST API apply
- Messages are logged; no responses are sent back to publishers
- If the NATS client is unavailable, API subscriptions are skipped (a warning is logged)
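The notes above boil down to a plain subscribe-and-process pattern without request/reply. The following is a minimal sketch of that pattern using the `nats.go` client, not the actual cc-backend implementation; subject names and the server address are taken from the example configuration.
```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// If the connection fails, a real service would log a warning and
	// continue without the NATS API, as described above.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Printf("NATS unavailable, skipping API subscriptions: %v", err)
		return
	}

	// Asynchronous handler: decode the line protocol message, validate it,
	// and hand it to the job processing layer. No reply is published.
	if _, err := nc.Subscribe("cc.job.event", func(m *nats.Msg) {
		log.Printf("job event on %s: %d bytes", m.Subject, len(m.Data))
		// decoding and validation would happen here
	}); err != nil {
		log.Fatal(err)
	}

	// Node state updates are handled the same way on their own subject.
	if _, err := nc.Subscribe("cc.node.state", func(m *nats.Msg) {
		log.Printf("node state update on %s: %d bytes", m.Subject, len(m.Data))
	}); err != nil {
		log.Fatal(err)
	}

	select {} // block; handlers run on the client's callback goroutines
}
```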
## Dependencies
- Go 1.24.0+ (check go.mod for exact version)
- Node.js (for frontend builds)
- SQLite 3 (only supported database)
- Optional: NATS server for NATS API integration

View File

@@ -22,11 +22,12 @@ switching from PHP Symfony to a Golang based solution are explained
## Overview
This is a Golang web backend for the ClusterCockpit job-specific performance
monitoring framework. It provides a REST API and an optional NATS-based messaging
API for integrating ClusterCockpit with an HPC cluster batch system and external
analysis scripts. Data exchange between the web front-end and the back-end is
based on a GraphQL API. The web frontend is also served by the backend using
[Svelte](https://svelte.dev/) components. Layout and styling are based on
[Bootstrap 5](https://getbootstrap.com/) using
[Bootstrap Icons](https://icons.getbootstrap.com/).
The backend uses [SQLite 3](https://sqlite.org/) as the relational SQL database.
@@ -35,6 +36,10 @@ databases, the only tested and supported setup is to use cc-metric-store as the
metric data backend. Documentation on how to integrate ClusterCockpit with other
time series databases will be added in the future.
For real-time integration with HPC systems, the backend can subscribe to
[NATS](https://nats.io/) subjects to receive job start/stop events and node
state updates, providing an alternative to REST API polling.
Completed batch jobs are stored in a file-based job archive according to
[this specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive).
The backend supports authentication via local accounts, an external LDAP
@@ -130,27 +135,60 @@ ln -s <your-existing-job-archive> ./var/job-archive
## Project file structure
- [`.github/`](https://github.com/ClusterCockpit/cc-backend/tree/master/.github)
GitHub Actions workflows and dependabot configuration for CI/CD.
- [`api/`](https://github.com/ClusterCockpit/cc-backend/tree/master/api)
contains the API schema files for the REST and GraphQL APIs. The REST API is
documented in the OpenAPI 3.0 format in
[./api/swagger.yaml](./api/swagger.yaml). The GraphQL schema is in
[./api/schema.graphqls](./api/schema.graphqls).
- [`cmd/cc-backend`](https://github.com/ClusterCockpit/cc-backend/tree/master/cmd/cc-backend)
contains the main application entry point and CLI implementation.
- [`configs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/configs)
contains documentation about configuration and command line options and required
environment variables. Sample configuration files are provided.
- [`docs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/docs)
contains more in-depth documentation.
- [`init/`](https://github.com/ClusterCockpit/cc-backend/tree/master/init)
contains an example of setting up systemd for production use.
- [`internal/`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal)
contains library source code that is not intended for use by others.
- [`api`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/api)
REST API handlers and NATS integration
- [`archiver`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/archiver)
Job archiving functionality
- [`auth`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/auth)
Authentication (local, LDAP, OIDC) and JWT token handling
- [`config`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/config)
Configuration management and validation
- [`graph`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/graph)
GraphQL schema and resolvers
- [`importer`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/importer)
Job data import and database initialization
- [`memorystore`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/memorystore)
In-memory metric data store with checkpointing
- [`metricdata`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricdata)
Metric data repository implementations (cc-metric-store, Prometheus)
- [`metricDataDispatcher`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricDataDispatcher)
Dispatches metric data loading to appropriate backends
- [`repository`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/repository)
Database repository layer for jobs and metadata
- [`routerConfig`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/routerConfig)
HTTP router configuration and middleware
- [`tagger`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/tagger)
Job classification and application detection
- [`taskmanager`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/taskmanager)
Background task management and scheduled jobs
- [`pkg/`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg)
contains Go packages that can be used by other projects.
- [`archive`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/archive)
Job archive backend implementations (filesystem, S3)
- [`nats`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/nats)
NATS client and message handling
- [`tools/`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools)
Additional command line helper tools.
- [`archive-manager`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-manager)
Commands for getting information about an existing job archive.
- [`archive-migration`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-migration)
Tool for migrating job archives between formats.
- [`convert-pem-pubkey`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/convert-pem-pubkey)
Tool to convert external pubkey for use in `cc-backend`.
- [`gen-keypair`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/gen-keypair)
@@ -162,7 +200,7 @@ ln -s <your-existing-job-archive> ./var/job-archive
- [`frontend`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/frontend)
Svelte components and static assets for the frontend UI
- [`templates`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/templates)
Server-side Go templates, including monitoring views
- [`gqlgen.yml`](https://github.com/ClusterCockpit/cc-backend/blob/master/gqlgen.yml)
Configures the behaviour and generation of
[gqlgen](https://github.com/99designs/gqlgen).

View File

@@ -5,14 +5,9 @@
"resampling": { "resampling": {
"minimumPoints": 600, "minimumPoints": 600,
"trigger": 180, "trigger": 180,
"resolutions": [ "resolutions": [240, 60]
240,
60
]
}, },
"apiAllowedIPs": [ "apiAllowedIPs": ["*"],
"*"
],
"emission-constant": 317 "emission-constant": 317
}, },
"cron": { "cron": {
@@ -103,4 +98,5 @@
}
]
}
}

View File

@@ -15,6 +15,10 @@
240,
60
]
},
"apiSubjects": {
"subjectJobEvent": "cc.job.event",
"subjectNodeState": "cc.node.state"
}
},
"cron": {

View File

@@ -119,6 +119,21 @@ var configSchema = `
}
},
"required": ["trigger", "resolutions"]
},
"apiSubjects": {
"description": "NATS subjects configuration for subscribing to job and node events.",
"type": "object",
"properties": {
"subjectJobEvent": {
"description": "NATS subject for job events (start_job, stop_job)",
"type": "string"
},
"subjectNodeState": {
"description": "NATS subject for node state updates",
"type": "string"
}
},
"required": ["subjectJobEvent", "subjectNodeState"]
}
},
"required": ["apiAllowedIPs"]