8.6 KiB
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
ClusterCockpit is a job-specific performance monitoring framework for HPC clusters. This is a Golang backend that provides REST and GraphQL APIs, serves a Svelte-based frontend, and manages job archives and metric data from various time-series databases.
Build and Development Commands
Building
# Build everything (frontend + backend)
make
# Build only the frontend
make frontend
# Build only the backend (requires frontend to be built first)
go build -ldflags='-s -X main.date=$(date +"%Y-%m-%d:T%H:%M:%S") -X main.version=1.4.4 -X main.commit=$(git rev-parse --short HEAD)' ./cmd/cc-backend
Testing
# Run all tests
make test
# Run tests with verbose output
go test -v ./...
# Run tests for a specific package
go test ./internal/repository
Code Generation
# Regenerate GraphQL schema and resolvers (after modifying api/*.graphqls)
make graphql
# Regenerate Swagger/OpenAPI docs (after modifying API comments)
make swagger
Frontend Development
cd web/frontend
# Install dependencies
npm install
# Build for production
npm run build
# Development mode with watch
npm run dev
Running
# Initialize database and create admin user
./cc-backend -init-db -add-user demo:admin:demo
# Start server in development mode (enables GraphQL Playground and Swagger UI)
./cc-backend -server -dev -loglevel info
# Start demo with sample data
./startDemo.sh
Architecture
Backend Structure
The backend follows a layered architecture with clear separation of concerns:
- cmd/cc-backend: Entry point, orchestrates initialization of all subsystems
- internal/repository: Data access layer using repository pattern
- Abstracts database operations (SQLite3 only)
- Implements LRU caching for performance
- Provides repositories for Job, User, Node, and Tag entities
- Transaction support for batch operations
- internal/api: REST API endpoints (Swagger/OpenAPI documented)
- internal/graph: GraphQL API (uses gqlgen)
- Schema in
api/*.graphqls - Generated code in
internal/graph/generated/ - Resolvers in
internal/graph/schema.resolvers.go
- Schema in
- internal/auth: Authentication layer
- Supports local accounts, LDAP, OIDC, and JWT tokens
- Implements rate limiting for login attempts
- internal/metricdata: Metric data repository abstraction
- Pluggable backends: cc-metric-store, Prometheus, InfluxDB
- Each cluster can have a different metric data backend
- internal/archiver: Job archiving to file-based archive
- internal/api/nats.go: NATS-based API for job and node operations
- Subscribes to NATS subjects for job events (start/stop)
- Handles node state updates via NATS
- Uses InfluxDB line protocol message format
- pkg/archive: Job archive backend implementations
- File system backend (default)
- S3 backend
- SQLite backend (experimental)
- pkg/nats: NATS client and message decoding utilities
Frontend Structure
- web/frontend: Svelte 5 application
- Uses Rollup for building
- Components organized by feature (analysis, job, user, etc.)
- GraphQL client using @urql/svelte
- Bootstrap 5 + SvelteStrap for UI
- uPlot for time-series visualization
- web/templates: Server-side Go templates
Key Concepts
Job Archive: Completed jobs are stored in a file-based archive following the
ClusterCockpit job-archive
specification.
Each job has a meta.json file with metadata and metric data files.
Metric Data Repositories: Time-series metric data is stored separately from
job metadata. The system supports multiple backends (cc-metric-store is
recommended). Configuration is per-cluster in config.json.
Authentication Flow:
- Multiple authenticators can be configured (local, LDAP, OIDC, JWT)
- Each authenticator's
CanLoginmethod is called to determine if it should handle the request - The first authenticator that returns true performs the actual
Login - JWT tokens are used for API authentication
Database Migrations: SQL migrations in internal/repository/migrations/ are
applied automatically on startup. Version tracking in version table.
Scopes: Metrics can be collected at different scopes:
- Node scope (always available)
- Core scope (for jobs with ≤8 nodes)
- Accelerator scope (for GPU/accelerator metrics)
Configuration
- config.json: Main configuration (clusters, metric repositories, archive settings)
main.apiSubjects: NATS subject configuration (optional)subjectJobEvent: Subject for job start/stop events (e.g., "cc.job.event")subjectNodeState: Subject for node state updates (e.g., "cc.node.state")
nats: NATS client connection configuration (optional)address: NATS server address (e.g., "nats://localhost:4222")username: Authentication username (optional)password: Authentication password (optional)creds-file-path: Path to NATS credentials file (optional)
- .env: Environment variables (secrets like JWT keys)
- Copy from
configs/env-template.txt - NEVER commit this file
- Copy from
- cluster.json: Cluster topology and metric definitions (loaded from archive or config)
Database
- Default: SQLite 3 (
./var/job.db) - Connection managed by
internal/repository - Schema version in
internal/repository/migration.go
Code Generation
GraphQL (gqlgen):
- Schema:
api/*.graphqls - Config:
gqlgen.yml - Generated code:
internal/graph/generated/ - Custom resolvers:
internal/graph/schema.resolvers.go - Run
make graphqlafter schema changes
Swagger/OpenAPI:
- Annotations in
internal/api/*.go - Generated docs:
api/docs.go,api/swagger.yaml - Run
make swaggerafter API changes
Testing Conventions
- Test files use
_test.gosuffix - Test data in
testdata/subdirectories - Repository tests use in-memory SQLite
- API tests use httptest
Common Workflows
Adding a new GraphQL field
- Edit schema in
api/*.graphqls - Run
make graphql - Implement resolver in
internal/graph/schema.resolvers.go
Adding a new REST endpoint
- Add handler in
internal/api/*.go - Add route in
internal/api/rest.go - Add Swagger annotations
- Run
make swagger
Adding a new metric data backend
- Implement
MetricDataRepositoryinterface ininternal/metricdata/ - Register in
metricdata.Init()switch statement - Update config.json schema documentation
Modifying database schema
- Create new migration in
internal/repository/migrations/ - Increment
repository.Version - Test with fresh database and existing database
NATS API
The backend supports a NATS-based API as an alternative to the REST API for job and node operations.
Setup
-
Configure NATS client connection in
config.json:{ "nats": { "address": "nats://localhost:4222", "username": "user", "password": "pass" } } -
Configure API subjects in
config.jsonundermain:{ "main": { "apiSubjects": { "subjectJobEvent": "cc.job.event", "subjectNodeState": "cc.node.state" } } }
Message Format
Messages use InfluxDB line protocol format with the following structure:
Job Events
Start Job:
job,function=start_job event="{\"jobId\":123,\"user\":\"alice\",\"cluster\":\"test\", ...}" 1234567890000000000
Stop Job:
job,function=stop_job event="{\"jobId\":123,\"cluster\":\"test\",\"startTime\":1234567890,\"stopTime\":1234571490,\"jobState\":\"completed\"}" 1234571490000000000
Tags:
function: Eitherstart_joborstop_job
Fields:
event: JSON payload containing job data (see REST API documentation for schema)
Node State Updates
{
"cluster": "testcluster",
"nodes": [
{
"hostname": "node001",
"states": ["allocated"],
"cpusAllocated": 8,
"memoryAllocated": 16384,
"gpusAllocated": 0,
"jobsRunning": 1
}
]
}
Implementation Notes
- NATS API mirrors REST API functionality but uses messaging
- Job start/stop events are processed asynchronously
- Duplicate job detection is handled (same as REST API)
- All validation rules from REST API apply
- Messages are logged; no responses are sent back to publishers
- If NATS client is unavailable, API subscriptions are skipped (logged as warning)
Dependencies
- Go 1.24.0+ (check go.mod for exact version)
- Node.js (for frontend builds)
- SQLite 3 (only supported database)
- Optional: NATS server for NATS API integration