mirror of
https://github.com/ClusterCockpit/cc-backend
synced 2025-12-31 02:46:16 +01:00
Update README and config schema
This commit is contained in:
94
CLAUDE.md
94
CLAUDE.md
@@ -100,11 +100,15 @@ The backend follows a layered architecture with clear separation of concerns:
|
||||
- Pluggable backends: cc-metric-store, Prometheus, InfluxDB
|
||||
- Each cluster can have a different metric data backend
|
||||
- **internal/archiver**: Job archiving to file-based archive
|
||||
- **internal/api/nats.go**: NATS-based API for job and node operations
|
||||
- Subscribes to NATS subjects for job events (start/stop)
|
||||
- Handles node state updates via NATS
|
||||
- Uses InfluxDB line protocol message format
|
||||
- **pkg/archive**: Job archive backend implementations
|
||||
- File system backend (default)
|
||||
- S3 backend
|
||||
- SQLite backend (experimental)
|
||||
- **pkg/nats**: NATS integration for metric ingestion
|
||||
- **pkg/nats**: NATS client and message decoding utilities
|
||||
|
||||
### Frontend Structure
|
||||
|
||||
@@ -146,6 +150,14 @@ applied automatically on startup. Version tracking in `version` table.
|
||||
## Configuration
|
||||
|
||||
- **config.json**: Main configuration (clusters, metric repositories, archive settings)
|
||||
- `main.apiSubjects`: NATS subject configuration (optional)
|
||||
- `subjectJobEvent`: Subject for job start/stop events (e.g., "cc.job.event")
|
||||
- `subjectNodeState`: Subject for node state updates (e.g., "cc.node.state")
|
||||
- `nats`: NATS client connection configuration (optional)
|
||||
- `address`: NATS server address (e.g., "nats://localhost:4222")
|
||||
- `username`: Authentication username (optional)
|
||||
- `password`: Authentication password (optional)
|
||||
- `creds-file-path`: Path to NATS credentials file (optional)
|
||||
- **.env**: Environment variables (secrets like JWT keys)
|
||||
- Copy from `configs/env-template.txt`
|
||||
- NEVER commit this file
|
||||
@@ -207,9 +219,87 @@ applied automatically on startup. Version tracking in `version` table.
|
||||
2. Increment `repository.Version`
|
||||
3. Test with fresh database and existing database
|
||||
|
||||
## NATS API
|
||||
|
||||
The backend supports a NATS-based API as an alternative to the REST API for job and node operations.
|
||||
|
||||
### Setup
|
||||
|
||||
1. Configure NATS client connection in `config.json`:
|
||||
```json
|
||||
{
|
||||
"nats": {
|
||||
"address": "nats://localhost:4222",
|
||||
"username": "user",
|
||||
"password": "pass"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
2. Configure API subjects in `config.json` under `main`:
|
||||
```json
|
||||
{
|
||||
"main": {
|
||||
"apiSubjects": {
|
||||
"subjectJobEvent": "cc.job.event",
|
||||
"subjectNodeState": "cc.node.state"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Message Format
|
||||
|
||||
Messages use **InfluxDB line protocol** format with the following structure:
|
||||
|
||||
#### Job Events
|
||||
|
||||
**Start Job:**
|
||||
```
|
||||
job,function=start_job event="{\"jobId\":123,\"user\":\"alice\",\"cluster\":\"test\", ...}" 1234567890000000000
|
||||
```
|
||||
|
||||
**Stop Job:**
|
||||
```
|
||||
job,function=stop_job event="{\"jobId\":123,\"cluster\":\"test\",\"startTime\":1234567890,\"stopTime\":1234571490,\"jobState\":\"completed\"}" 1234571490000000000
|
||||
```
|
||||
|
||||
**Tags:**
|
||||
- `function`: Either `start_job` or `stop_job`
|
||||
|
||||
**Fields:**
|
||||
- `event`: JSON payload containing job data (see REST API documentation for schema)
|
||||
|
||||
#### Node State Updates
|
||||
|
||||
```json
|
||||
{
|
||||
"cluster": "testcluster",
|
||||
"nodes": [
|
||||
{
|
||||
"hostname": "node001",
|
||||
"states": ["allocated"],
|
||||
"cpusAllocated": 8,
|
||||
"memoryAllocated": 16384,
|
||||
"gpusAllocated": 0,
|
||||
"jobsRunning": 1
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Implementation Notes
|
||||
|
||||
- NATS API mirrors REST API functionality but uses messaging
|
||||
- Job start/stop events are processed asynchronously
|
||||
- Duplicate job detection is handled (same as REST API)
|
||||
- All validation rules from REST API apply
|
||||
- Messages are logged; no responses are sent back to publishers
|
||||
- If NATS client is unavailable, API subscriptions are skipped (logged as warning)
|
||||
|
||||
## Dependencies
|
||||
|
||||
- Go 1.24.0+ (check go.mod for exact version)
|
||||
- Node.js (for frontend builds)
|
||||
- SQLite 3 (only supported database)
|
||||
- Optional: NATS server for metric ingestion
|
||||
- Optional: NATS server for NATS API integration
|
||||
|
||||
62
README.md
62
README.md
@@ -22,11 +22,12 @@ switching from PHP Symfony to a Golang based solution are explained
|
||||
## Overview
|
||||
|
||||
This is a Golang web backend for the ClusterCockpit job-specific performance
|
||||
monitoring framework. It provides a REST API for integrating ClusterCockpit with
|
||||
an HPC cluster batch system and external analysis scripts. Data exchange between
|
||||
the web front-end and the back-end is based on a GraphQL API. The web frontend
|
||||
is also served by the backend using [Svelte](https://svelte.dev/) components.
|
||||
Layout and styling are based on [Bootstrap 5](https://getbootstrap.com/) using
|
||||
monitoring framework. It provides a REST API and an optional NATS-based messaging
|
||||
API for integrating ClusterCockpit with an HPC cluster batch system and external
|
||||
analysis scripts. Data exchange between the web front-end and the back-end is
|
||||
based on a GraphQL API. The web frontend is also served by the backend using
|
||||
[Svelte](https://svelte.dev/) components. Layout and styling are based on
|
||||
[Bootstrap 5](https://getbootstrap.com/) using
|
||||
[Bootstrap Icons](https://icons.getbootstrap.com/).
|
||||
|
||||
The backend uses [SQLite 3](https://sqlite.org/) as the relational SQL database.
|
||||
@@ -35,6 +36,10 @@ databases, the only tested and supported setup is to use cc-metric-store as the
|
||||
metric data backend. Documentation on how to integrate ClusterCockpit with other
|
||||
time series databases will be added in the future.
|
||||
|
||||
For real-time integration with HPC systems, the backend can subscribe to
|
||||
[NATS](https://nats.io/) subjects to receive job start/stop events and node
|
||||
state updates, providing an alternative to REST API polling.
|
||||
|
||||
Completed batch jobs are stored in a file-based job archive according to
|
||||
[this specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive).
|
||||
The backend supports authentication via local accounts, an external LDAP
|
||||
@@ -130,27 +135,60 @@ ln -s <your-existing-job-archive> ./var/job-archive
|
||||
|
||||
## Project file structure
|
||||
|
||||
- [`.github/`](https://github.com/ClusterCockpit/cc-backend/tree/master/.github)
|
||||
GitHub Actions workflows and dependabot configuration for CI/CD.
|
||||
- [`api/`](https://github.com/ClusterCockpit/cc-backend/tree/master/api)
|
||||
contains the API schema files for the REST and GraphQL APIs. The REST API is
|
||||
documented in the OpenAPI 3.0 format in
|
||||
[./api/openapi.yaml](./api/openapi.yaml).
|
||||
[./api/swagger.yaml](./api/swagger.yaml). The GraphQL schema is in
|
||||
[./api/schema.graphqls](./api/schema.graphqls).
|
||||
- [`cmd/cc-backend`](https://github.com/ClusterCockpit/cc-backend/tree/master/cmd/cc-backend)
|
||||
contains `main.go` for the main application.
|
||||
contains the main application entry point and CLI implementation.
|
||||
- [`configs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/configs)
|
||||
contains documentation about configuration and command line options and required
|
||||
environment variables. A sample configuration file is provided.
|
||||
- [`docs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/docs)
|
||||
contains more in-depth documentation.
|
||||
environment variables. Sample configuration files are provided.
|
||||
- [`init/`](https://github.com/ClusterCockpit/cc-backend/tree/master/init)
|
||||
contains an example of setting up systemd for production use.
|
||||
- [`internal/`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal)
|
||||
contains library source code that is not intended for use by others.
|
||||
- [`api`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/api)
|
||||
REST API handlers and NATS integration
|
||||
- [`archiver`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/archiver)
|
||||
Job archiving functionality
|
||||
- [`auth`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/auth)
|
||||
Authentication (local, LDAP, OIDC) and JWT token handling
|
||||
- [`config`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/config)
|
||||
Configuration management and validation
|
||||
- [`graph`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/graph)
|
||||
GraphQL schema and resolvers
|
||||
- [`importer`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/importer)
|
||||
Job data import and database initialization
|
||||
- [`memorystore`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/memorystore)
|
||||
In-memory metric data store with checkpointing
|
||||
- [`metricdata`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricdata)
|
||||
Metric data repository implementations (cc-metric-store, Prometheus)
|
||||
- [`metricDataDispatcher`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricDataDispatcher)
|
||||
Dispatches metric data loading to appropriate backends
|
||||
- [`repository`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/repository)
|
||||
Database repository layer for jobs and metadata
|
||||
- [`routerConfig`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/routerConfig)
|
||||
HTTP router configuration and middleware
|
||||
- [`tagger`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/tagger)
|
||||
Job classification and application detection
|
||||
- [`taskmanager`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/taskmanager)
|
||||
Background task management and scheduled jobs
|
||||
- [`pkg/`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg)
|
||||
contains Go packages that can be used by other projects.
|
||||
- [`archive`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/archive)
|
||||
Job archive backend implementations (filesystem, S3)
|
||||
- [`nats`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/nats)
|
||||
NATS client and message handling
|
||||
- [`tools/`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools)
|
||||
Additional command line helper tools.
|
||||
- [`archive-manager`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-manager)
|
||||
Commands for getting infos about and existing job archive.
|
||||
Commands for getting infos about an existing job archive.
|
||||
- [`archive-migration`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-migration)
|
||||
Tool for migrating job archives between formats.
|
||||
- [`convert-pem-pubkey`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/convert-pem-pubkey)
|
||||
Tool to convert external pubkey for use in `cc-backend`.
|
||||
- [`gen-keypair`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/gen-keypair)
|
||||
@@ -162,7 +200,7 @@ ln -s <your-existing-job-archive> ./var/job-archive
|
||||
- [`frontend`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/frontend)
|
||||
Svelte components and static assets for the frontend UI
|
||||
- [`templates`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/templates)
|
||||
Server-side Go templates
|
||||
Server-side Go templates, including monitoring views
|
||||
- [`gqlgen.yml`](https://github.com/ClusterCockpit/cc-backend/blob/master/gqlgen.yml)
|
||||
Configures the behaviour and generation of
|
||||
[gqlgen](https://github.com/99designs/gqlgen).
|
||||
|
||||
@@ -5,14 +5,9 @@
|
||||
"resampling": {
|
||||
"minimumPoints": 600,
|
||||
"trigger": 180,
|
||||
"resolutions": [
|
||||
240,
|
||||
60
|
||||
]
|
||||
"resolutions": [240, 60]
|
||||
},
|
||||
"apiAllowedIPs": [
|
||||
"*"
|
||||
],
|
||||
"apiAllowedIPs": ["*"],
|
||||
"emission-constant": 317
|
||||
},
|
||||
"cron": {
|
||||
@@ -104,3 +99,4 @@
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -15,6 +15,10 @@
|
||||
240,
|
||||
60
|
||||
]
|
||||
},
|
||||
"apiSubjects": {
|
||||
"subjectJobEvent": "cc.job.event",
|
||||
"subjectNodeState": "cc.node.state"
|
||||
}
|
||||
},
|
||||
"cron": {
|
||||
|
||||
@@ -119,6 +119,21 @@ var configSchema = `
|
||||
}
|
||||
},
|
||||
"required": ["trigger", "resolutions"]
|
||||
},
|
||||
"apiSubjects": {
|
||||
"description": "NATS subjects configuration for subscribing to job and node events.",
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"subjectJobEvent": {
|
||||
"description": "NATS subject for job events (start_job, stop_job)",
|
||||
"type": "string"
|
||||
},
|
||||
"subjectNodeState": {
|
||||
"description": "NATS subject for node state updates",
|
||||
"type": "string"
|
||||
}
|
||||
},
|
||||
"required": ["subjectJobEvent", "subjectNodeState"]
|
||||
}
|
||||
},
|
||||
"required": ["apiAllowedIPs"]
|
||||
|
||||
Reference in New Issue
Block a user