Commit Graph

20 Commits

Author SHA1 Message Date
1b72b0b5ad Fix critical/severe issues in init, startup and shutdown
- auth: do not abort the server when authentication is disabled. auth.Init
  is now always called; with disable-authentication it sets up an ephemeral
  session store (SESSION_KEY not required) and registers no authenticators,
  so the unconditional auth.GetAuthInstance() callers (server init,
  api.New()) always get a valid instance.
- main: run the graceful-shutdown sequence on the startup-error path. runServer
  derives a cancelable context and, on a server-start failure, cancels it and
  waits so the metricstore final checkpoint / WAL rotation, archiver flush and
  taskmanager shutdown actually run before exit.
- server: log the :80 HTTP->HTTPS redirect listener error instead of dropping it.
- archiver: guard Shutdown against being called when Start never ran
  (avoids close(nil) panic / blocking on a nil workerDone).
- nats API: stop worker goroutines on shutdown via a stop channel + idempotent
  Shutdown(); workers and subscription callbacks select on stop and the
  channels are never closed, so no send-on-closed-channel can occur. Wired
  into Server.Shutdown after the NATS client is closed.
- metricstore: make Shutdown idempotent (nil shutdownFunc, early return) and
  release shutdownFuncMu before the checkpoint write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 3c179f9caa8f
2026-06-05 10:16:28 +02:00
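
The "nats API" item in commit 1b72b0b5ad describes workers and subscription callbacks that select on a stop channel, data channels that are never closed, and an idempotent Shutdown(). A minimal sketch of that pattern follows. It is not the cc-backend code: `workerPool`, `Submit`, `jobs` and the other names are hypothetical and chosen only for illustration, and `sync.Once` is one assumed way to make the shutdown idempotent.

```go
package main

import "sync"

// workerPool is a hypothetical stand-in for the NATS API worker set.
type workerPool struct {
	jobs     chan string   // never closed; senders select on stop instead
	stop     chan struct{} // closed exactly once by Shutdown
	stopOnce sync.Once     // makes Shutdown safe to call repeatedly
	wg       sync.WaitGroup

	mu        sync.Mutex
	processed []string // records handled messages for the example
}

func newWorkerPool(workers int) *workerPool {
	p := &workerPool{
		jobs: make(chan string),
		stop: make(chan struct{}),
	}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go p.run()
	}
	return p
}

// run handles messages until the stop channel is closed.
func (p *workerPool) run() {
	defer p.wg.Done()
	for {
		select {
		case <-p.stop:
			return
		case msg := <-p.jobs:
			p.mu.Lock()
			p.processed = append(p.processed, msg)
			p.mu.Unlock()
		}
	}
}

// Submit plays the role of a subscription callback. Because it selects on
// stop and jobs is never closed, it cannot send on a closed channel. It
// reports false when the message was dropped because shutdown has begun.
func (p *workerPool) Submit(msg string) bool {
	select {
	case <-p.stop:
		return false
	case p.jobs <- msg:
		return true
	}
}

// Shutdown signals all workers to stop and waits for them to exit.
// Calling it more than once is harmless.
func (p *workerPool) Shutdown() {
	p.stopOnce.Do(func() { close(p.stop) })
	p.wg.Wait()
}
```

In this sketch a caller creates the pool with `newWorkerPool(n)`, feeds it through `Submit`, and calls `Shutdown` from the server's shutdown path. A `Submit` that arrives after `Shutdown` has returned gets `false` back instead of panicking or blocking.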
6f7e262f3f Fix issues after security audit
Entire-Checkpoint: bc18358a9343
2026-06-04 18:33:30 +02:00
8b0881fb17 Exclude down nodes from HealthCheck
Entire-Checkpoint: 0c3347168c79
2026-03-18 11:20:12 +01:00
09d0ba71d2 Provide identical nodestate functionality in NATS API
Entire-Checkpoint: 3a40b75edd68
2026-03-16 12:13:14 +01:00
51517f8031 Reduce insert pressure in db. Increase sqlite timeout value
Entire-Checkpoint: a1e2931d4deb
2026-03-16 11:17:47 +01:00
6ecb934967 Switch to CC line-protocol package. Update cc-lib. 2026-02-27 08:55:33 +01:00
fc1ba1f5b3 Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2026-02-21 13:52:14 +01:00
82e79b074a Reverse Lookup order in stop job request 2026-02-21 13:51:31 +01:00
e1c1148160 Fix more bugs related to job_cache ids used in job table 2026-02-20 09:20:18 +01:00
ac7eb93141 fix: Transfer always to main job table before archiving 2026-02-09 19:57:46 +01:00
Christoph Kluge
e9cd6b4225 set updateNodeStates timeStamp once per request
- prevents per-host timestamp mismatches caused by the time the handler spends iterating
2026-02-02 17:51:41 +01:00
499b4287f9 Switch to cclib nats client 2026-01-23 10:04:41 +01:00
Michael Panzlaff
94b86ef11a Mismatched event types are not something to be concerned about
If a different CCMessage type was sent over the same subject as
requested, that shouldn't raise a warning. This may happen in production
instances, but in order to ease debugging, lower it to 'debug' level.
2026-01-14 16:08:33 +01:00
b2f870e3c0 Convert nodestate nats API to influx line protocol payload. Review and add doc comments.
Improve and extend tests
2026-01-14 10:08:06 +01:00
8576ae458d Switch to cc-lib v2 2025-12-24 09:24:18 +01:00
999667ec0c Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2025-12-23 07:56:16 +01:00
c1135531ba Port NATS api to ccMessages 2025-12-23 07:56:13 +01:00
0bc26aa194 Add error check 2025-12-23 05:56:46 +01:00
d30c6ef3bf Make NATS API subjects configurable 2025-12-17 06:08:09 +01:00
43e5fd1131 Add NATS API backend 2025-12-17 05:44:49 +01:00