Update README and config schema
CLAUDE.md — 94 changed lines
@@ -100,11 +100,15 @@ The backend follows a layered architecture with clear separation of concerns:
   - Pluggable backends: cc-metric-store, Prometheus, InfluxDB
   - Each cluster can have a different metric data backend
 - **internal/archiver**: Job archiving to file-based archive
+- **internal/api/nats.go**: NATS-based API for job and node operations
+  - Subscribes to NATS subjects for job events (start/stop)
+  - Handles node state updates via NATS
+  - Uses InfluxDB line protocol message format
 - **pkg/archive**: Job archive backend implementations
   - File system backend (default)
   - S3 backend
   - SQLite backend (experimental)
-- **pkg/nats**: NATS integration for metric ingestion
+- **pkg/nats**: NATS client and message decoding utilities
 
 ### Frontend Structure
 
@@ -146,6 +150,14 @@ applied automatically on startup. Version tracking in `version` table.
 ## Configuration
 
 - **config.json**: Main configuration (clusters, metric repositories, archive settings)
+  - `main.apiSubjects`: NATS subject configuration (optional)
+    - `subjectJobEvent`: Subject for job start/stop events (e.g., "cc.job.event")
+    - `subjectNodeState`: Subject for node state updates (e.g., "cc.node.state")
+  - `nats`: NATS client connection configuration (optional)
+    - `address`: NATS server address (e.g., "nats://localhost:4222")
+    - `username`: Authentication username (optional)
+    - `password`: Authentication password (optional)
+    - `creds-file-path`: Path to NATS credentials file (optional)
 - **.env**: Environment variables (secrets like JWT keys)
   - Copy from `configs/env-template.txt`
   - NEVER commit this file
@@ -207,9 +219,87 @@ applied automatically on startup. Version tracking in `version` table.
 2. Increment `repository.Version`
 3. Test with fresh database and existing database
 
+## NATS API
+
+The backend supports a NATS-based API as an alternative to the REST API for job and node operations.
+
+### Setup
+
+1. Configure NATS client connection in `config.json`:
+```json
+{
+  "nats": {
+    "address": "nats://localhost:4222",
+    "username": "user",
+    "password": "pass"
+  }
+}
+```
+
+2. Configure API subjects in `config.json` under `main`:
+```json
+{
+  "main": {
+    "apiSubjects": {
+      "subjectJobEvent": "cc.job.event",
+      "subjectNodeState": "cc.node.state"
+    }
+  }
+}
+```
+
+### Message Format
+
+Messages use **InfluxDB line protocol** format with the following structure:
+
+#### Job Events
+
+**Start Job:**
+```
+job,function=start_job event="{\"jobId\":123,\"user\":\"alice\",\"cluster\":\"test\", ...}" 1234567890000000000
+```
+
+**Stop Job:**
+```
+job,function=stop_job event="{\"jobId\":123,\"cluster\":\"test\",\"startTime\":1234567890,\"stopTime\":1234571490,\"jobState\":\"completed\"}" 1234571490000000000
+```
+
+**Tags:**
+- `function`: Either `start_job` or `stop_job`
+
+**Fields:**
+- `event`: JSON payload containing job data (see REST API documentation for schema)
+
+#### Node State Updates
+
+```json
+{
+  "cluster": "testcluster",
+  "nodes": [
+    {
+      "hostname": "node001",
+      "states": ["allocated"],
+      "cpusAllocated": 8,
+      "memoryAllocated": 16384,
+      "gpusAllocated": 0,
+      "jobsRunning": 1
+    }
+  ]
+}
+```
+
+### Implementation Notes
+
+- NATS API mirrors REST API functionality but uses messaging
+- Job start/stop events are processed asynchronously
+- Duplicate job detection is handled (same as REST API)
+- All validation rules from REST API apply
+- Messages are logged; no responses are sent back to publishers
+- If NATS client is unavailable, API subscriptions are skipped (logged as warning)
+
 ## Dependencies
 
 - Go 1.24.0+ (check go.mod for exact version)
 - Node.js (for frontend builds)
 - SQLite 3 (only supported database)
-- Optional: NATS server for metric ingestion
+- Optional: NATS server for NATS API integration
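For illustration, a minimal publisher sketch (not part of this commit) of how a batch-system adapter might emit the start-job message documented above, assuming the `github.com/nats-io/nats.go` client and the example subject `cc.job.event`; the payload fields and the `\"`-escaping are taken from the examples in this hunk, and the full job schema is documented with the REST API.

```go
// Hypothetical publisher sketch: wraps a start-job JSON payload in the
// InfluxDB line protocol record described in the NATS API section above.
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connection settings mirror the "nats" block from config.json.
	nc, err := nats.Connect("nats://localhost:4222", nats.UserInfo("user", "pass"))
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	// Minimal JSON payload; only fields shown in the example above.
	payload, _ := json.Marshal(map[string]any{
		"jobId":   123,
		"user":    "alice",
		"cluster": "test",
	})

	// Build: job,function=start_job event="<escaped JSON>" <ns timestamp>
	escaped := strings.ReplaceAll(string(payload), `"`, `\"`)
	msg := fmt.Sprintf("job,function=start_job event=\"%s\" %d", escaped, time.Now().UnixNano())

	if err := nc.Publish("cc.job.event", []byte(msg)); err != nil {
		panic(err)
	}
}
```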
README.md — 62 changed lines
@@ -22,11 +22,12 @@ switching from PHP Symfony to a Golang based solution are explained
 ## Overview
 
 This is a Golang web backend for the ClusterCockpit job-specific performance
-monitoring framework. It provides a REST API for integrating ClusterCockpit with
-an HPC cluster batch system and external analysis scripts. Data exchange between
-the web front-end and the back-end is based on a GraphQL API. The web frontend
-is also served by the backend using [Svelte](https://svelte.dev/) components.
-Layout and styling are based on [Bootstrap 5](https://getbootstrap.com/) using
+monitoring framework. It provides a REST API and an optional NATS-based messaging
+API for integrating ClusterCockpit with an HPC cluster batch system and external
+analysis scripts. Data exchange between the web front-end and the back-end is
+based on a GraphQL API. The web frontend is also served by the backend using
+[Svelte](https://svelte.dev/) components. Layout and styling are based on
+[Bootstrap 5](https://getbootstrap.com/) using
 [Bootstrap Icons](https://icons.getbootstrap.com/).
 
 The backend uses [SQLite 3](https://sqlite.org/) as the relational SQL database.
@@ -35,6 +36,10 @@ databases, the only tested and supported setup is to use cc-metric-store as the
 metric data backend. Documentation on how to integrate ClusterCockpit with other
 time series databases will be added in the future.
 
+For real-time integration with HPC systems, the backend can subscribe to
+[NATS](https://nats.io/) subjects to receive job start/stop events and node
+state updates, providing an alternative to REST API polling.
+
 Completed batch jobs are stored in a file-based job archive according to
 [this specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive).
 The backend supports authentication via local accounts, an external LDAP
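The paragraph added above describes subscribing to NATS subjects instead of polling the REST API. As a quick way to watch those messages during testing, a small subscriber sketch, assuming the `github.com/nats-io/nats.go` client and the example subject name from this commit; this is not cc-backend code.

```go
// Debugging sketch: print every job event published on the example subject.
package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL) // nats://127.0.0.1:4222
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	// Subject name taken from the apiSubjects example; adjust to your config.
	_, err = nc.Subscribe("cc.job.event", func(m *nats.Msg) {
		fmt.Printf("%s: %s\n", m.Subject, string(m.Data))
	})
	if err != nil {
		panic(err)
	}

	select {} // keep the process alive
}
```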
@@ -130,27 +135,60 @@ ln -s <your-existing-job-archive> ./var/job-archive
 
 ## Project file structure
 
+- [`.github/`](https://github.com/ClusterCockpit/cc-backend/tree/master/.github)
+  GitHub Actions workflows and dependabot configuration for CI/CD.
 - [`api/`](https://github.com/ClusterCockpit/cc-backend/tree/master/api)
   contains the API schema files for the REST and GraphQL APIs. The REST API is
   documented in the OpenAPI 3.0 format in
-  [./api/openapi.yaml](./api/openapi.yaml).
+  [./api/swagger.yaml](./api/swagger.yaml). The GraphQL schema is in
+  [./api/schema.graphqls](./api/schema.graphqls).
 - [`cmd/cc-backend`](https://github.com/ClusterCockpit/cc-backend/tree/master/cmd/cc-backend)
-  contains `main.go` for the main application.
+  contains the main application entry point and CLI implementation.
 - [`configs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/configs)
   contains documentation about configuration and command line options and required
-  environment variables. A sample configuration file is provided.
-- [`docs/`](https://github.com/ClusterCockpit/cc-backend/tree/master/docs)
-  contains more in-depth documentation.
+  environment variables. Sample configuration files are provided.
 - [`init/`](https://github.com/ClusterCockpit/cc-backend/tree/master/init)
   contains an example of setting up systemd for production use.
 - [`internal/`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal)
   contains library source code that is not intended for use by others.
+  - [`api`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/api)
+    REST API handlers and NATS integration
+  - [`archiver`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/archiver)
+    Job archiving functionality
+  - [`auth`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/auth)
+    Authentication (local, LDAP, OIDC) and JWT token handling
+  - [`config`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/config)
+    Configuration management and validation
+  - [`graph`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/graph)
+    GraphQL schema and resolvers
+  - [`importer`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/importer)
+    Job data import and database initialization
+  - [`memorystore`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/memorystore)
+    In-memory metric data store with checkpointing
+  - [`metricdata`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricdata)
+    Metric data repository implementations (cc-metric-store, Prometheus)
+  - [`metricDataDispatcher`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/metricDataDispatcher)
+    Dispatches metric data loading to appropriate backends
+  - [`repository`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/repository)
+    Database repository layer for jobs and metadata
+  - [`routerConfig`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/routerConfig)
+    HTTP router configuration and middleware
+  - [`tagger`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/tagger)
+    Job classification and application detection
+  - [`taskmanager`](https://github.com/ClusterCockpit/cc-backend/tree/master/internal/taskmanager)
+    Background task management and scheduled jobs
 - [`pkg/`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg)
   contains Go packages that can be used by other projects.
+  - [`archive`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/archive)
+    Job archive backend implementations (filesystem, S3)
+  - [`nats`](https://github.com/ClusterCockpit/cc-backend/tree/master/pkg/nats)
+    NATS client and message handling
 - [`tools/`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools)
   Additional command line helper tools.
   - [`archive-manager`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-manager)
-    Commands for getting infos about and existing job archive.
+    Commands for getting infos about an existing job archive.
+  - [`archive-migration`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/archive-migration)
+    Tool for migrating job archives between formats.
   - [`convert-pem-pubkey`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/convert-pem-pubkey)
     Tool to convert external pubkey for use in `cc-backend`.
   - [`gen-keypair`](https://github.com/ClusterCockpit/cc-backend/tree/master/tools/gen-keypair)
@@ -162,7 +200,7 @@ ln -s <your-existing-job-archive> ./var/job-archive
 - [`frontend`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/frontend)
   Svelte components and static assets for the frontend UI
 - [`templates`](https://github.com/ClusterCockpit/cc-backend/tree/master/web/templates)
-  Server-side Go templates
+  Server-side Go templates, including monitoring views
 - [`gqlgen.yml`](https://github.com/ClusterCockpit/cc-backend/blob/master/gqlgen.yml)
   Configures the behaviour and generation of
   [gqlgen](https://github.com/99designs/gqlgen).
@@ -5,14 +5,9 @@
   "resampling": {
     "minimumPoints": 600,
     "trigger": 180,
-    "resolutions": [
-      240,
-      60
-    ]
+    "resolutions": [240, 60]
   },
-  "apiAllowedIPs": [
-    "*"
-  ],
+  "apiAllowedIPs": ["*"],
   "emission-constant": 317
 },
 "cron": {
@@ -104,3 +99,4 @@
     ]
   }
 }
+
@@ -15,6 +15,10 @@
       240,
       60
     ]
+  },
+  "apiSubjects": {
+    "subjectJobEvent": "cc.job.event",
+    "subjectNodeState": "cc.node.state"
   }
 },
 "cron": {
@@ -119,6 +119,21 @@ var configSchema = `
       }
     },
     "required": ["trigger", "resolutions"]
+  },
+  "apiSubjects": {
+    "description": "NATS subjects configuration for subscribing to job and node events.",
+    "type": "object",
+    "properties": {
+      "subjectJobEvent": {
+        "description": "NATS subject for job events (start_job, stop_job)",
+        "type": "string"
+      },
+      "subjectNodeState": {
+        "description": "NATS subject for node state updates",
+        "type": "string"
+      }
+    },
+    "required": ["subjectJobEvent", "subjectNodeState"]
   }
 },
 "required": ["apiAllowedIPs"]
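For reference, a sketch of Go types that mirror the `apiSubjects` schema added above and the `nats` block documented in CLAUDE.md; the actual struct and field names in `internal/config` are not part of this diff and may differ.

```go
// Sketch only: decode the new apiSubjects and nats sections from config.json.
package main

import (
	"encoding/json"
	"fmt"
)

type APISubjects struct {
	SubjectJobEvent  string `json:"subjectJobEvent"`
	SubjectNodeState string `json:"subjectNodeState"`
}

type NatsConfig struct {
	Address       string `json:"address"`
	Username      string `json:"username,omitempty"`
	Password      string `json:"password,omitempty"`
	CredsFilePath string `json:"creds-file-path,omitempty"`
}

type Config struct {
	Main struct {
		APISubjects *APISubjects `json:"apiSubjects,omitempty"`
	} `json:"main"`
	Nats *NatsConfig `json:"nats,omitempty"`
}

func main() {
	raw := []byte(`{
	  "main": {
	    "apiSubjects": {
	      "subjectJobEvent": "cc.job.event",
	      "subjectNodeState": "cc.node.state"
	    }
	  },
	  "nats": { "address": "nats://localhost:4222" }
	}`)

	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.Main.APISubjects.SubjectJobEvent, cfg.Nats.Address)
}
```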