# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with
code in this repository.

## Project Overview

ClusterCockpit is a job-specific performance monitoring framework for HPC
clusters. This is a Golang backend that provides REST and GraphQL APIs, serves a
Svelte-based frontend, and manages job archives and metric data from various
time-series databases.

## Build and Development Commands

### Building

```bash
# Build everything (frontend + backend)
make

# Build only the frontend
make frontend

# Build only the backend (requires frontend to be built first)
go build -ldflags='-s -X main.date=$(date +"%Y-%m-%d:T%H:%M:%S") -X main.version=1.4.4 -X main.commit=$(git rev-parse --short HEAD)' ./cmd/cc-backend
```

### Testing

```bash
# Run all tests
make test

# Run tests with verbose output
go test -v ./...

# Run tests for a specific package
go test ./internal/repository
```

### Code Generation

```bash
# Regenerate GraphQL schema and resolvers (after modifying api/*.graphqls)
make graphql

# Regenerate Swagger/OpenAPI docs (after modifying API comments)
make swagger
```

### Frontend Development

```bash
cd web/frontend

# Install dependencies
npm install

# Build for production
npm run build

# Development mode with watch
npm run dev
```

### Running

```bash
# Initialize database and create admin user
./cc-backend -init-db -add-user demo:admin:demo

# Start server in development mode (enables GraphQL Playground and Swagger UI)
./cc-backend -server -dev -loglevel info

# Start demo with sample data
./startDemo.sh
```

## Architecture

### Backend Structure

The backend follows a layered architecture with clear separation of concerns:

- **cmd/cc-backend**: Entry point, orchestrates initialization of all subsystems
- **internal/repository**: Data access layer using repository pattern
  - Abstracts database operations (SQLite3 only)
  - Implements LRU caching for performance
  - Provides repositories for Job, User, Node, and Tag entities
  - Transaction support for batch operations
- **internal/api**: REST API endpoints (Swagger/OpenAPI documented)
- **internal/graph**: GraphQL API (uses gqlgen)
  - Schema in `api/*.graphqls`
  - Generated code in `internal/graph/generated/`
  - Resolvers in `internal/graph/schema.resolvers.go`
- **internal/auth**: Authentication layer
  - Supports local accounts, LDAP, OIDC, and JWT tokens
  - Implements rate limiting for login attempts
- **internal/metricdata**: Metric data repository abstraction
  - Pluggable backends: cc-metric-store, Prometheus, InfluxDB
  - Each cluster can have a different metric data backend
- **internal/archiver**: Job archiving to file-based archive
- **pkg/archive**: Job archive backend implementations
  - File system backend (default)
  - S3 backend
  - SQLite backend (experimental)
- **pkg/nats**: NATS integration for metric ingestion

### Frontend Structure

- **web/frontend**: Svelte 5 application
  - Uses Rollup for building
  - Components organized by feature (analysis, job, user, etc.)
  - GraphQL client using @urql/svelte
  - Bootstrap 5 + SvelteStrap for UI
  - uPlot for time-series visualization
- **web/templates**: Server-side Go templates

### Key Concepts

**Job Archive**: Completed jobs are stored in a file-based archive following the
[ClusterCockpit job-archive
specification](https://github.com/ClusterCockpit/cc-specifications/tree/master/job-archive).
Each job has a `meta.json` file with metadata and metric data files.

**Metric Data Repositories**: Time-series metric data is stored separately from
job metadata. The system supports multiple backends (cc-metric-store is
recommended). Configuration is per-cluster in `config.json`.

**Authentication Flow**:

1. Multiple authenticators can be configured (local, LDAP, OIDC, JWT)
2. Each authenticator's `CanLogin` method is called to determine if it should handle the request
3. The first authenticator that returns true performs the actual `Login`
4. JWT tokens are used for API authentication

**Database Migrations**: SQL migrations in `internal/repository/migrations/` are
applied automatically on startup. Version tracking in `version` table.

**Scopes**: Metrics can be collected at different scopes:

- Node scope (always available)
- Core scope (for jobs with ≤8 nodes)
- Accelerator scope (for GPU/accelerator metrics)

## Configuration

- **config.json**: Main configuration (clusters, metric repositories, archive settings)
- **.env**: Environment variables (secrets like JWT keys)
  - Copy from `configs/env-template.txt`
  - NEVER commit this file
- **cluster.json**: Cluster topology and metric definitions (loaded from archive or config)

## Database

- Default: SQLite 3 (`./var/job.db`)
- Connection managed by `internal/repository`
- Schema version in `internal/repository/migration.go`

## Code Generation

**GraphQL** (gqlgen):

- Schema: `api/*.graphqls`
- Config: `gqlgen.yml`
- Generated code: `internal/graph/generated/`
- Custom resolvers: `internal/graph/schema.resolvers.go`
- Run `make graphql` after schema changes

**Swagger/OpenAPI**:

- Annotations in `internal/api/*.go`
- Generated docs: `api/docs.go`, `api/swagger.yaml`
- Run `make swagger` after API changes

## Testing Conventions

- Test files use `_test.go` suffix
- Test data in `testdata/` subdirectories
- Repository tests use in-memory SQLite
- API tests use httptest

## Common Workflows

### Adding a new GraphQL field

1. Edit schema in `api/*.graphqls`
2. Run `make graphql`
3. Implement resolver in `internal/graph/schema.resolvers.go`

### Adding a new REST endpoint

1. Add handler in `internal/api/*.go`
2. Add route in `internal/api/rest.go`
3. Add Swagger annotations
4. Run `make swagger`

### Adding a new metric data backend

1. Implement `MetricDataRepository` interface in `internal/metricdata/`
2. Register in `metricdata.Init()` switch statement
3. Update config.json schema documentation

### Modifying database schema

1. Create new migration in `internal/repository/migrations/`
2. Increment `repository.Version`
3. Test with fresh database and existing database

## Dependencies

- Go 1.24.0+ (check go.mod for exact version)
- Node.js (for frontend builds)
- SQLite 3 (only supported database)
- Optional: NATS server for metric ingestion