Commit Graph

1063 Commits

Author SHA1 Message Date
83d04dff17 feat(auth): replace .env/godotenv secret handling with config-based secrets
Secrets (JWT keys, LDAP sync password, OIDC client id/secret, cross-login
keys) are now configured directly in config.json under the auth section
where they are used. Each secret can still be supplied via its existing
environment variable, which takes precedence over the config value.

The godotenv dependency, the .env file, configs/env-template.txt and the
loadEnvironment() bootstrap step are removed. -init now writes the demo
JWT keys into config.json instead of a .env file.

Closes #283

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 3a7cb814c53f
2026-06-17 12:28:17 +02:00
2b01b57495 feat: replace gorilla/sessions with alexedwards/scs/v2
Browser sessions are now server-side, stored in the SQLite database via
scs/sqlite3store (new `sessions` table, DB migration to version 12) instead
of gorilla/sessions client-side cookie storage. Only an opaque random token
is kept in the cookie; session data lives server-side and survives restarts.

Session middleware is wired as a hybrid to avoid buffering large responses:
scs.LoadAndSave on the login/logout write paths, and a non-buffering
read-only LoadSession middleware on the secured/config/frontend read paths
so the large GraphQL /query responses stream unbuffered. JWT-only APIs
(/api, /userapi, /api/metricstore) and static files are left unwrapped.

The session cookie Secure flag is now derived from the server config (set
when cc-backend terminates TLS itself); previously it was effectively never
set. The SESSION_KEY env var is removed as server-side tokens need no
signing secret. The dormant Bearer-JWT branch in the frontend urql client
is removed; the web UI authenticates GraphQL via the session cookie.

Closes #558

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b51075f43cc7
2026-06-17 07:54:26 +02:00
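A stdlib-only sketch of the cookie side of this change: the cookie carries nothing but an opaque random token, and Secure is derived from whether the server terminates TLS itself. serverConfig, its fields and the cookie name "session" are hypothetical; "terminates TLS" is assumed to mean both a certificate and a key file are configured. This does not use the scs API; in scs the corresponding knob would be the session manager's Cookie.Secure field.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"net/http"
)

// serverConfig is a hypothetical stand-in for the relevant server settings.
type serverConfig struct {
	HTTPSCertFile string
	HTTPSKeyFile  string
}

// terminatesTLS reports whether the server itself serves HTTPS.
func (c serverConfig) terminatesTLS() bool {
	return c.HTTPSCertFile != "" && c.HTTPSKeyFile != ""
}

// newSessionToken returns an opaque, URL-safe token with 32 bytes of entropy.
func newSessionToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// sessionCookie builds the cookie holding only the token; the session data
// itself stays server-side, keyed by that token.
func sessionCookie(token string, cfg serverConfig) *http.Cookie {
	return &http.Cookie{
		Name:     "session",
		Value:    token,
		Path:     "/",
		HttpOnly: true,
		SameSite: http.SameSiteLaxMode,
		Secure:   cfg.terminatesTLS(),
	}
}

func main() {
	tok, err := newSessionToken()
	if err != nil {
		panic(err)
	}
	c := sessionCookie(tok, serverConfig{HTTPSCertFile: "cert.pem", HTTPSKeyFile: "key.pem"})
	fmt.Println(c.Secure, len(tok)) // true 43
}
```

Behind a TLS-terminating reverse proxy this sketch would leave Secure unset, matching the "set when cc-backend terminates TLS itself" rule.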
Jan Eitzinger
c94f5918f3 Merge pull request #556 from ClusterCockpit/release/v1.5
Fix critical/severe issues in init, startup and shutdown
2026-06-07 07:31:02 +02:00
1b72b0b5ad Fix critical/severe issues in init, startup and shutdown
- auth: do not abort the server when authentication is disabled. auth.Init
  is now always called; with disable-authentication it sets up an ephemeral
  session store (SESSION_KEY not required) and registers no authenticators,
  so the unconditional auth.GetAuthInstance() callers (server init,
  api.New()) always get a valid instance.
- main: run the graceful-shutdown sequence on the startup-error path. runServer
  derives a cancelable context and, on a server-start failure, cancels it and
  waits so the metricstore final checkpoint / WAL rotation, archiver flush and
  taskmanager shutdown actually run before exit.
- server: log the :80 HTTP->HTTPS redirect listener error instead of dropping it.
- archiver: guard Shutdown against being called when Start never ran
  (avoids close(nil) panic / blocking on a nil workerDone).
- nats API: stop worker goroutines on shutdown via a stop channel + idempotent
  Shutdown(); workers and subscription callbacks select on stop and the
  channels are never closed, so no send-on-closed-channel can occur. Wired
  into Server.Shutdown after the NATS client is closed.
- metricstore: make Shutdown idempotent (nil shutdownFunc, early return) and
  release shutdownFuncMu before the checkpoint write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 3c179f9caa8f
2026-06-05 10:16:28 +02:00
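The stop-channel pattern from the nats API item can be sketched in Go as follows. msgWorkers, Submit and the worker count are hypothetical names, not cc-backend's types. Because the data channel is never closed, a late callback can at worst see the closed stop channel and drop its message; it can never panic with a send on a closed channel. sync.Once makes Shutdown safe to call more than once.

```go
package main

import (
	"fmt"
	"sync"
)

// msgWorkers: workers read from msgs (never closed) and exit once stop closes.
type msgWorkers struct {
	msgs     chan string
	stop     chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
	mu       sync.Mutex
	handled  int
}

func newMsgWorkers(n int) *msgWorkers {
	w := &msgWorkers{msgs: make(chan string), stop: make(chan struct{})}
	for i := 0; i < n; i++ {
		w.wg.Add(1)
		go w.run()
	}
	return w
}

func (w *msgWorkers) run() {
	defer w.wg.Done()
	for {
		select {
		case <-w.stop:
			return
		case <-w.msgs:
			// Handle the message; here we only count it.
			w.mu.Lock()
			w.handled++
			w.mu.Unlock()
		}
	}
}

// Submit is what a subscription callback would call. It also selects on
// stop, so after shutdown it returns false instead of blocking forever.
func (w *msgWorkers) Submit(m string) bool {
	select {
	case <-w.stop:
		return false
	case w.msgs <- m:
		return true
	}
}

// Shutdown is idempotent: only the first call closes stop; every call
// waits until all workers have returned.
func (w *msgWorkers) Shutdown() {
	w.stopOnce.Do(func() { close(w.stop) })
	w.wg.Wait()
}

func main() {
	w := newMsgWorkers(2)
	for i := 0; i < 3; i++ {
		w.Submit(fmt.Sprintf("msg-%d", i))
	}
	w.Shutdown()
	w.Shutdown() // second call is a no-op
	fmt.Println("accepted after shutdown:", w.Submit("late")) // false
}
```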
Jan Eitzinger
d74ae77c8e Merge pull request #553 from ClusterCockpit/release/v1.5
Release/v1.5
2026-06-04 20:33:22 +02:00
3bef199cbe Regenerate GraphQL 2026-06-04 20:12:27 +02:00
16942f55a0 Fix medium-severity issues from follow-up security audit
Addresses the remaining medium findings from the second-pass audit:

- DoS hardening: bound GraphQL query cost with FixedComplexityLimit, and
  reject non-positive items-per-page / page values so uint64 conversion
  cannot underflow into an unbounded LIMIT/OFFSET. The -1 "load all"
  sentinel stays valid for dashboards; REST now returns 400 for bad input.

- Security headers: add X-Content-Type-Options, X-Frame-Options,
  Referrer-Policy and a conservative CSP (frame-ancestors/object-src/
  base-uri) that hardens against clickjacking and base-tag injection
  without restricting the self-hosted SPA's inline scripts.

- Stored XSS: render job.metaData.message as escaped text instead of
  {@html ...} in Job.root and JobFootprint, preserving line breaks via
  white-space: pre-wrap.

- SQL injection hardening: parameterize the tag-scope IN list and the
  manager project subquery in CountTags instead of interpolating
  user.Username / user.Projects (externally sourced via OIDC/LDAP).

- CSRF defense-in-depth: reject cross-site state-changing requests via
  Sec-Fetch-Site, failing open for non-browser clients, on top of the
  existing SameSite=Lax session cookie.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: de7d47a85c7c
2026-06-04 20:08:41 +02:00
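The pagination guard can be illustrated with a small Go sketch. It assumes, hypothetically, that -1 is the items-per-page value meaning "load all" and that pages are 1-based; limitOffset is an illustrative helper, not the repository's function. Without the check, a page value of 0 would make uint64(page-1) wrap around to 18446744073709551615, i.e. an effectively unbounded OFFSET.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadPage would be mapped to HTTP 400 by a REST handler.
var errBadPage = errors.New("page and items-per-page must be positive (items-per-page -1 loads all)")

// limitOffset validates page parameters before converting them to unsigned
// LIMIT/OFFSET values. loadAll is true for the -1 sentinel (no LIMIT at all).
func limitOffset(itemsPerPage, page int) (limit, offset uint64, loadAll bool, err error) {
	if itemsPerPage == -1 {
		return 0, 0, true, nil
	}
	if itemsPerPage <= 0 || page <= 0 {
		return 0, 0, false, errBadPage
	}
	limit = uint64(itemsPerPage)
	offset = uint64(page-1) * limit // safe: page >= 1 here
	return limit, offset, false, nil
}

func main() {
	limit, offset, all, err := limitOffset(50, 3)
	fmt.Println(limit, offset, all, err) // 50 100 false <nil>

	_, _, _, err = limitOffset(50, 0)
	fmt.Println(err != nil) // true

	page := 0
	fmt.Println(uint64(page - 1)) // what an unchecked page=0 would turn into
}
```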
6d86690c76 Fix critical issues from follow-up security audit
A second-pass audit surfaced three severe issues missed by the previous
review, each a sibling code path of a bug class that was only partially
fixed before:

- auth: JWT session login (jwtSession.go) registered its authenticator
  even when CROSS_LOGIN_JWT_HS512_KEY was unset, leaving an empty HMAC
  key. golang-jwt verifies any HS256/HS512 signature against an empty
  key, allowing unauthenticated admin token forgery. Init() now refuses
  to register without a key, with a defense-in-depth empty-key guard in
  the keyfunc.

- repository: metric names from GraphQL ([String!]) were interpolated
  raw into json_extract(footprint, "$.<name>") SQL. SQLite parses
  double-quoted strings as literals, enabling SQL injection by any
  authenticated user. Validate metric names against ^[a-zA-Z0-9_]+$ in
  jobsMetricStatisticsHistogram and buildFloatJSONCondition.

- metricstore: cluster/host line-protocol tags flowed unvalidated into
  path.Join(RootDir, cluster, host) for checkpoint/WAL files, allowing
  arbitrary file write outside the checkpoint root via NATS
  (unauthenticated) or POST /api/write. Reject path-traversal sequences
  in DecodeLine before the tags become path components.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b57246993ec1
2026-06-04 19:07:20 +02:00
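A Go sketch of the two input checks, assuming the allow-list regex quoted above for metric names and, for path components, a deliberately strict rule (no path separators, no "..", no NUL bytes) that may be stricter than what DecodeLine actually enforces. footprintExpr and validPathComponent are hypothetical helpers. The sketch emits a single-quoted JSON path, the standard SQL string-literal form, and only after validation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// metricNameRe is the allow-list from the commit message.
var metricNameRe = regexp.MustCompile(`^[a-zA-Z0-9_]+$`)

// footprintExpr interpolates a metric name into SQL only after it has
// passed the allow-list, so no quote or operator can reach the query.
func footprintExpr(metric string) (string, error) {
	if !metricNameRe.MatchString(metric) {
		return "", fmt.Errorf("invalid metric name %q", metric)
	}
	return fmt.Sprintf("json_extract(footprint, '$.%s')", metric), nil
}

// validPathComponent rejects cluster/host values that could escape the
// checkpoint root once passed to path.Join(RootDir, cluster, host).
func validPathComponent(s string) bool {
	if s == "" || s == "." {
		return false
	}
	return !strings.ContainsAny(s, "/\\\x00") && !strings.Contains(s, "..")
}

func main() {
	expr, err := footprintExpr("flops_any")
	fmt.Println(expr, err)

	_, err = footprintExpr(`x') OR 1=1 --`)
	fmt.Println(err != nil) // true

	fmt.Println(validPathComponent("node001"), validPathComponent("../../etc")) // true false
}
```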
6f7e262f3f Fix issues after security audit
Entire-Checkpoint: bc18358a9343
2026-06-04 18:33:30 +02:00
58ead40112 Merge branch 'main' into release/v1.5 2026-06-04 17:56:41 +02:00
0020f63582 Rebuild Swagger 2026-06-04 17:56:32 +02:00
Christoph Kluge
1ebde74774 Adapt swagger definitions of user update endpoint 2026-06-02 17:58:15 +02:00
Christoph Kluge
40722d72f5 fix name in doc comment 2026-06-02 16:48:23 +02:00
Christoph Kluge
f4384668e5 fix name comment 2026-06-02 16:47:29 +02:00
Christoph Kluge
e06982db00 reintroduce user update api path 2026-06-02 16:34:46 +02:00
Christoph Kluge
4c59aee304 feat: add subCluster filter to filter component 2026-04-24 11:43:42 +02:00
43807ae12a feat: Also submit projects array via oidc token
Entire-Checkpoint: 2064482d97e1
2026-04-01 13:46:21 +02:00
31a8a11f1b fix: Always request oidc roles from token
Entire-Checkpoint: bfdbffd7aae0
2026-04-01 12:36:37 +02:00
84fe61b3e0 fix: allow all role changes on SyncUser and UpdateUser callback
Entire-Checkpoint: 496bace0120e
2026-04-01 11:09:50 +02:00
1f04e0a1ce fix: oidc role extraction
Entire-Checkpoint: bbe9ad3cf817
2026-04-01 11:03:19 +02:00
641dc0e3b8 Run gofumpt 2026-03-30 16:49:27 +02:00
82c514b11a Ease samesite cookie settings
Entire-Checkpoint: 2fe286e23a4a
2026-03-30 16:10:15 +02:00
c267501a1b Reduce noise in tagger logging 2026-03-25 06:53:01 +01:00
Christoph Kluge
bd7125a52e review doubleranged filters, fix and improve value selection 2026-03-24 15:00:41 +01:00
bf48389aeb Optimize sortby in stats queries
Entire-Checkpoint: 9b5b833472e1
2026-03-20 05:39:22 +01:00
Christoph Kluge
10b4fa5a06 change: remove heuristic metricHealth, replace with DB metricHealth
- add metricHealth to single Node view
2026-03-19 15:55:58 +01:00
Christoph Kluge
886791cf8a remove deprecated minRunningFor filter remnants 2026-03-19 14:09:10 +01:00
Christoph Kluge
16ec1e69d9 streamline and unify statsSeries calc and render 2026-03-19 13:30:38 +01:00
22057ff281 Pass reqKey as CacheKey
Entire-Checkpoint: b95ef43221bb
2026-03-19 11:04:32 +01:00
8b0881fb17 Exclude down nodes from HealthCheck
Entire-Checkpoint: 0c3347168c79
2026-03-18 11:20:12 +01:00
Christoph Kluge
33beb3c806 fix: simplify stats query condition
- caused expensive subquery without need in frontend
2026-03-18 11:07:57 +01:00
c1d51959d5 Change determineState to enforce priority order
Make an exception if a node is idle and down: the final state is then idle

Entire-Checkpoint: 92c797737df8
2026-03-18 10:57:06 +01:00
c449996559 Add context to log message
Entire-Checkpoint: 55d95cdef0d4
2026-03-18 09:43:41 +01:00
6ebc9e88fa Add more context information to auth failed log
Entire-Checkpoint: 2187cd89cb78
2026-03-18 06:56:01 +01:00
3314b8e284 Ignore ErrNoRows error. Include calling function in log.
Entire-Checkpoint: 20746187d135
2026-03-16 20:09:44 +01:00
6855d62bf2 Make log in scanRow more descriptive. No log for common no rows error
Entire-Checkpoint: 858b34ef56b8
2026-03-16 20:03:27 +01:00
7f3eb443d9 Include calling function in error message
Entire-Checkpoint: a4948d0fe7a3
2026-03-16 15:42:38 +01:00
09d0ba71d2 Provide identical nodestate functionality in NATS API
Entire-Checkpoint: 3a40b75edd68
2026-03-16 12:13:14 +01:00
df93dbed63 Add busyTimeout config setting
Entire-Checkpoint: 81097a6c52a2
2026-03-16 11:30:21 +01:00
e4f3fa9ba0 Wrap SyncJobs in transaction
Entire-Checkpoint: d4f6c79a8dc1
2026-03-16 11:25:49 +01:00
51517f8031 Reduce insert pressure in db. Increase sqlite timeout value
Entire-Checkpoint: a1e2931d4deb
2026-03-16 11:17:47 +01:00
e38396a081 Upgrade dependencies. Rebuild GraphQL.
Entire-Checkpoint: f770853c9fa0
2026-03-13 17:22:34 +01:00
e83bd2babd Consolidate migrations
Entire-Checkpoint: a3dba4105838
2026-03-13 17:14:13 +01:00
Christoph Kluge
ba366d0d72 use inline literals in simple queries, add downgrade optimize 2026-03-13 15:16:19 +01:00
f15f1452cc Inline jobstate literal in query
Entire-Checkpoint: 35f06df74b51
2026-03-13 15:16:07 +01:00
d586fe4b43 Optimize usage dashboard: partial indexes, request cache, parallel histograms
- Add migration 14: partial covering indexes WHERE job_state='running'
  for user/project/subcluster groupings (tiny B-tree vs full table)
- Inline literal state value in BuildWhereClause so SQLite matches
  partial indexes instead of parameterized placeholders
- Add per-request statsGroupCache (sync.Once per filter+groupBy key)
  so identical grouped stats queries execute only once per GQL operation
- Parallelize 4 histogram queries in AddHistograms using errgroup
- Consolidate frontend from 6 GQL aliases to 2, sort+slice top-10
  client-side via $derived

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 5b26a6e5ff10
2026-03-13 14:31:37 +01:00
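The per-request statsGroupCache idea (one sync.Once per filter+groupBy key) might look like the Go sketch below. The key format, the statsResult type and the method names are assumptions for illustration. Errors are memoized too, so a failed query is not retried within the same request, which seems acceptable for a cache that lives only as long as one GraphQL operation.

```go
package main

import (
	"fmt"
	"sync"
)

// statsResult is a hypothetical placeholder for grouped job statistics.
type statsResult struct{ Count int }

// onceEntry memoizes a single query execution.
type onceEntry struct {
	once sync.Once
	res  statsResult
	err  error
}

// statsGroupCache is created per request; concurrent resolvers asking for
// the same key share one execution of the underlying query.
type statsGroupCache struct {
	mu      sync.Mutex
	entries map[string]*onceEntry
}

func newStatsGroupCache() *statsGroupCache {
	return &statsGroupCache{entries: make(map[string]*onceEntry)}
}

// Get runs query at most once per key; later callers wait for and reuse it.
func (c *statsGroupCache) Get(key string, query func() (statsResult, error)) (statsResult, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &onceEntry{}
		c.entries[key] = e
	}
	c.mu.Unlock() // do not hold the map lock while the query runs
	e.once.Do(func() { e.res, e.err = query() })
	return e.res, e.err
}

func main() {
	cache := newStatsGroupCache()
	var mu sync.Mutex
	executions := 0
	query := func() (statsResult, error) {
		mu.Lock()
		executions++
		mu.Unlock()
		return statsResult{Count: 42}, nil
	}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.Get("state=running|groupBy=user", query)
		}()
	}
	wg.Wait()
	fmt.Println("executions:", executions) // executions: 1
}
```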
cbe46c3524 Merge branch 'hotfix' of github.com:ClusterCockpit/cc-backend into hotfix 2026-03-13 13:17:34 +01:00
dd3e5427f4 Add covering indexes for status/dashboard queries (migration 13)
Adds composite covering indexes on (cluster, job_state, <group_col>, ...)
for user, project, and subcluster groupings to enable index-only scans
for status views. Drops subsumed 3-column indexes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 3d8def28e96e
2026-03-13 13:12:54 +01:00
Christoph Kluge
e666980184 fix typo 2026-03-13 12:07:43 +01:00
Christoph Kluge
c238f68af6 reduce unnecessary complexity 2026-03-13 12:05:16 +01:00