Commit Graph

194 Commits

Author SHA1 Message Date
83d04dff17 feat(auth): replace .env/godotenv secret handling with config-based secrets
Secrets (JWT keys, LDAP sync password, OIDC client id/secret, cross-login
keys) are now configured directly in config.json under the auth section
where they are used. Each secret can still be supplied via its existing
environment variable, which takes precedence over the config value.

The godotenv dependency, the .env file, configs/env-template.txt and the
loadEnvironment() bootstrap step are removed. -init now writes the demo
JWT keys into config.json instead of a .env file.

Closes #283

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 3a7cb814c53f
2026-06-17 12:28:17 +02:00
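
The precedence rule described in the commit above (an environment variable, if supplied, overrides the value in config.json) can be sketched as follows. This is a minimal illustration, not the project's implementation: `resolveSecret`, `lookupFunc`, the variable name `EXAMPLE_JWT_PRIVATE_KEY`, and the choice to treat an empty variable as "not supplied" are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
)

// lookupFunc matches the signature of os.LookupEnv so that the
// environment can be replaced in tests. (Hypothetical helper type.)
type lookupFunc func(key string) (string, bool)

// resolveSecret returns the environment value when envName is set to a
// non-empty string and falls back to configValue otherwise.
// Assumption: an empty environment variable counts as "not supplied".
func resolveSecret(lookup lookupFunc, envName, configValue string) string {
	if v, ok := lookup(envName); ok && v != "" {
		return v
	}
	return configValue
}

func main() {
	// EXAMPLE_JWT_PRIVATE_KEY is a made-up variable name for this demo.
	key := resolveSecret(os.LookupEnv, "EXAMPLE_JWT_PRIVATE_KEY", "key-from-config-json")
	fmt.Println("secret comes from:", key)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the precedence rule testable without touching the process environment.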
2b01b57495 feat: replace gorilla/sessions with alexedwards/scs/v2
Browser sessions are now server-side, stored in the SQLite database via
scs/sqlite3store (new `sessions` table, DB migration to version 12) instead
of gorilla/sessions client-side cookie storage. Only an opaque random token
is kept in the cookie; session data lives server-side and survives restarts.

Session middleware is wired as a hybrid to avoid buffering large responses:
scs.LoadAndSave on the login/logout write paths, and a non-buffering
read-only LoadSession middleware on the secured/config/frontend read paths
so the large GraphQL /query responses stream unbuffered. JWT-only APIs
(/api, /userapi, /api/metricstore) and static files are left unwrapped.

The session cookie Secure flag is now derived from the server config (set
when cc-backend terminates TLS itself); previously it was effectively never
set. The SESSION_KEY env var is removed as server-side tokens need no
signing secret. The dormant Bearer-JWT branch in the frontend urql client
is removed; the web UI authenticates GraphQL via the session cookie.

Closes #558

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b51075f43cc7
2026-06-17 07:54:26 +02:00
1b72b0b5ad Fix critical/severe issues in init, startup and shutdown
- auth: do not abort the server when authentication is disabled. auth.Init
  is now always called; with disable-authentication it sets up an ephemeral
  session store (SESSION_KEY not required) and registers no authenticators,
  so the unconditional auth.GetAuthInstance() callers (server init,
  api.New()) always get a valid instance.
- main: run the graceful-shutdown sequence on the startup-error path. runServer
  derives a cancelable context and, on a server-start failure, cancels it and
  waits so the metricstore final checkpoint / WAL rotation, archiver flush and
  taskmanager shutdown actually run before exit.
- server: log the :80 HTTP->HTTPS redirect listener error instead of dropping it.
- archiver: guard Shutdown against being called when Start never ran
  (avoids close(nil) panic / blocking on a nil workerDone).
- nats API: stop worker goroutines on shutdown via a stop channel + idempotent
  Shutdown(); workers and subscription callbacks select on stop and the
  channels are never closed, so no send-on-closed-channel can occur. Wired
  into Server.Shutdown after the NATS client is closed.
- metricstore: make Shutdown idempotent (nil shutdownFunc, early return) and
  release shutdownFuncMu before the checkpoint write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 3c179f9caa8f
2026-06-05 10:16:28 +02:00
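
The shutdown pattern named in the NATS bullet above (a stop channel, an idempotent Shutdown(), and data channels that are never closed) can be sketched like this. It is a generic illustration under assumed names (`workerPool`, `Submit`, `handled`), not the code from the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// workerPool is a hypothetical type illustrating the pattern.
type workerPool struct {
	jobs     chan string   // never closed, so a send can never panic
	stop     chan struct{} // closed exactly once to signal shutdown
	stopOnce sync.Once
	wg       sync.WaitGroup
	mu       sync.Mutex
	handled  int
}

func newWorkerPool(n int) *workerPool {
	p := &workerPool{jobs: make(chan string), stop: make(chan struct{})}
	for i := 0; i < n; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for {
				select {
				case <-p.stop:
					return
				case <-p.jobs:
					p.mu.Lock()
					p.handled++
					p.mu.Unlock()
				}
			}
		}()
	}
	return p
}

// Submit hands a job to a worker. It returns false instead of blocking
// forever (or panicking) once the pool has been shut down.
func (p *workerPool) Submit(job string) bool {
	select {
	case <-p.stop:
		return false
	case p.jobs <- job:
		return true
	}
}

// Shutdown may be called any number of times; only the first call closes
// the stop channel, and every call waits for the workers to exit.
func (p *workerPool) Shutdown() {
	p.stopOnce.Do(func() { close(p.stop) })
	p.wg.Wait()
}

func main() {
	p := newWorkerPool(2)
	fmt.Println("accepted before shutdown:", p.Submit("job-1"))
	p.Shutdown()
	p.Shutdown() // second call is a no-op
	fmt.Println("accepted after shutdown:", p.Submit("job-2"))
}
```

Because senders select on the stop channel instead of relying on a closed data channel, a late callback cannot trigger a send-on-closed-channel panic.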
16942f55a0 Fix medium-severity issues from follow-up security audit
Addresses the remaining medium findings from the second-pass audit:

- DoS hardening: bound GraphQL query cost with FixedComplexityLimit, and
  reject non-positive items-per-page / page values so uint64 conversion
  cannot underflow into an unbounded LIMIT/OFFSET. The -1 "load all"
  sentinel stays valid for dashboards; REST now returns 400 for bad input.

- Security headers: add X-Content-Type-Options, X-Frame-Options,
  Referrer-Policy and a conservative CSP (frame-ancestors/object-src/
  base-uri) that hardens against clickjacking and base-tag injection
  without restricting the self-hosted SPA's inline scripts.

- Stored XSS: render job.metaData.message as escaped text instead of
  {@html ...} in Job.root and JobFootprint, preserving line breaks via
  white-space: pre-wrap.

- SQL injection hardening: parameterize the tag-scope IN list and the
  manager project subquery in CountTags instead of interpolating
  user.Username / user.Projects (externally sourced via OIDC/LDAP).

- CSRF defense-in-depth: reject cross-site state-changing requests via
  Sec-Fetch-Site, failing open for non-browser clients, on top of the
  existing SameSite=Lax session cookie.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: de7d47a85c7c
2026-06-04 20:08:41 +02:00
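
The CSRF defense-in-depth bullet above can be illustrated with a small net/http middleware. This is a sketch under stated assumptions, not the project's code: the names `rejectCrossSite` and `crossSiteGuard` are hypothetical, every method other than GET, HEAD and OPTIONS is treated as state-changing, and a missing Sec-Fetch-Site header is allowed through ("failing open") because non-browser clients do not send it.

```go
package main

import (
	"fmt"
	"net/http"
)

// isStateChanging treats every method except the safe ones as state-changing.
// (Assumption for this sketch.)
func isStateChanging(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return false
	}
	return true
}

// rejectCrossSite reports whether a request should be refused. Only an
// explicit "cross-site" value is rejected; an absent header passes.
func rejectCrossSite(method, secFetchSite string) bool {
	return isStateChanging(method) && secFetchSite == "cross-site"
}

// crossSiteGuard wraps a handler and answers 403 for cross-site
// state-changing requests.
func crossSiteGuard(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if rejectCrossSite(r.Method, r.Header.Get("Sec-Fetch-Site")) {
			http.Error(w, "cross-site request rejected", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(rejectCrossSite(http.MethodPost, "cross-site"))  // browser form on another site
	fmt.Println(rejectCrossSite(http.MethodPost, "same-origin")) // the web UI itself
	fmt.Println(rejectCrossSite(http.MethodPost, ""))            // non-browser client, no header
	fmt.Println(rejectCrossSite(http.MethodGet, "cross-site"))   // safe method
}
```

Keeping the decision in a pure function separate from the middleware makes the policy easy to test without constructing HTTP requests.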
6f7e262f3f Fix issues after security audit
Entire-Checkpoint: bc18358a9343
2026-06-04 18:33:30 +02:00
ac0a4cc39a Increase shutdown timeouts and WAL flush interval
Entire-Checkpoint: 94ee2fb97830
2026-03-27 09:56:34 +01:00
e759810051 Add shutdown timings. Do not drain WAL buffers on shutdown
Entire-Checkpoint: d4b497002f54
2026-03-26 07:02:37 +01:00
a550344f13 Increase server shutdown timeout
Entire-Checkpoint: cf3b472471bd
2026-03-25 06:15:55 +01:00
93a9d732a4 fix: Improve shutdown time
Entire-Checkpoint: a4d012e1edcf
2026-03-24 07:17:34 +01:00
45f329e5fb feat: Add command line switch to trigger manual metricstore checkpoint cleanup
Entire-Checkpoint: 29b9d52db89c
2026-03-23 07:58:35 +01:00
192c94a78d fix: Prevent interruption of body lineprotocol parsing on locks
Entire-Checkpoint: ccda3b2ff4cb
2026-03-23 07:12:13 +01:00
df93dbed63 Add busyTimeout config setting
Entire-Checkpoint: 81097a6c52a2
2026-03-16 11:30:21 +01:00
d586fe4b43 Optimize usage dashboard: partial indexes, request cache, parallel histograms
- Add migration 14: partial covering indexes WHERE job_state='running'
  for user/project/subcluster groupings (tiny B-tree vs full table)
- Inline literal state value in BuildWhereClause so SQLite matches
  partial indexes instead of parameterized placeholders
- Add per-request statsGroupCache (sync.Once per filter+groupBy key)
  so identical grouped stats queries execute only once per GQL operation
- Parallelize 4 histogram queries in AddHistograms using errgroup
- Consolidate frontend from 6 GQL aliases to 2, sort+slice top-10
  client-side via $derived

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 5b26a6e5ff10
2026-03-13 14:31:37 +01:00
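
The per-request cache bullet above (one sync.Once per filter+groupBy key, so identical grouped queries run once) can be sketched as follows. The type and function names (`groupCache`, `cacheEntry`, `get`) and the string key are assumptions for illustration; the commit's actual statsGroupCache may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheEntry holds the result of one load, guarded by its own sync.Once.
type cacheEntry struct {
	once   sync.Once
	result []string
	err    error
}

// groupCache is a hypothetical per-request cache keyed by a string that
// would encode the filter and groupBy values.
type groupCache struct {
	mu      sync.Mutex
	entries map[string]*cacheEntry
}

func newGroupCache() *groupCache {
	return &groupCache{entries: make(map[string]*cacheEntry)}
}

// get runs load at most once per key. Concurrent callers with the same key
// block inside once.Do until the first load finishes, then share its result.
func (c *groupCache) get(key string, load func() ([]string, error)) ([]string, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &cacheEntry{}
		c.entries[key] = e
	}
	c.mu.Unlock() // release the map lock before the (possibly slow) load

	e.once.Do(func() { e.result, e.err = load() })
	return e.result, e.err
}

func main() {
	queries := 0
	load := func() ([]string, error) {
		queries++ // stands in for an expensive SQL query
		return []string{"alice", "bob"}, nil
	}

	c := newGroupCache()
	c.get("state=running|groupBy=user", load)
	c.get("state=running|groupBy=user", load)
	fmt.Println("queries executed:", queries)
}
```

Creating a fresh cache for each incoming operation keeps results from leaking between requests, which is why no eviction logic is needed in this sketch.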
0e27624d73 Add flag to optimize db. Remove ANALYZE on startup.
Entire-Checkpoint: d49917ff4b10
2026-03-12 20:12:49 +01:00
00d2f97c4c fix: Large heap allocations in sqlite driver. Sanitize sqlite config and make it configurable. Allow canceling queries. 2026-03-11 11:14:37 +01:00
70fea39d03 Add note on dynamic memory management for restarts 2026-03-06 10:56:23 +01:00
ddda341e10 Safeguard metricstore shutdown if internal metricstore is not initialized 2026-03-05 10:37:33 +01:00
Christoph Kluge
718ff60221 clarify ccms logs 2026-03-02 16:24:38 +01:00
03c65e06f6 Allow finer control for omit tagged jobs in retention policies 2026-02-23 08:46:47 +01:00
90b52f997d Cleanup and handle error in AppTagger 2026-02-19 08:24:39 +01:00
8d6c6b819b Update and port to cc-lib 2026-02-11 07:06:06 +01:00
abdd7ea6f1 Merge branch 'master' into dev 2026-02-09 07:46:44 +01:00
c7b366f35f Put notFoundHandler earlier to also catch subrouters 2026-02-09 07:46:37 +01:00
624746f34b Fix 404 handler route 2026-02-07 18:29:27 +01:00
2b395a94e6 Fix setup issue with chi router 2026-02-07 18:02:48 +01:00
f6aa40d927 Migrate from gorilla to chi web framework. add 404 handler 2026-02-07 17:48:12 +01:00
98661aad15 Increase default GC frequency 2026-01-28 10:41:44 +01:00
752e19c276 Pull out metric List build from metricstore Init 2026-01-27 17:06:52 +01:00
b307e885ce feat: Add support for multiple external metric stores 2026-01-27 10:02:07 +01:00
0ea836c69c Revert metricstore api paths 2026-01-26 13:17:36 +01:00
499b4287f9 Switch to cclib nats client 2026-01-23 10:04:41 +01:00
f41301036b Move metricstore from internal to pkg 2026-01-23 07:49:47 +01:00
1d4c79c821 Unify JSON attribute naming to use kebab-case. Clean up configuration. 2026-01-20 09:47:13 +01:00
5281f3bb60 Remove obsolete config option disable-archive 2026-01-19 16:42:30 +01:00
155e05495e Fix shutdown timeout bug 2026-01-15 13:29:19 +01:00
9c92a7796b Introduce nodeprovider interface to break import cycle 2026-01-15 12:20:11 +01:00
cb219b3c74 Fix configuration issues. Fix shutdown hangs
Always turn on compression
2026-01-15 11:34:06 +01:00
Aditya Ujeniya
3276ed7785 Half-baked commit for new dynamic retention logic 2026-01-14 14:56:36 +01:00
c8627a13f4 Remove obsolete clusters config section 2026-01-14 11:17:49 +01:00
a9366d14c6 Add README for tagging. Enable tagging by flag without configuration option 2026-01-13 08:32:32 +01:00
4cec933349 Remove obsolete cluster config section 2026-01-13 06:28:33 +01:00
11ec2267da Major refactor of metric data handling
- Make the internal memory store required and default
- Rename memorystore to metricstore
- Rename metricDataDispatcher to metricdispatch
- Remove metricdata package
- Introduce metricsync package for upstream metric data pull
2025-12-25 08:42:54 +01:00
8576ae458d Switch to cc-lib v2 2025-12-24 09:24:18 +01:00
1cd4a57bd3 Remove support for mysql/mariadb 2025-12-20 11:13:41 +01:00
fdee4f8938 Integrate NATS API.
Only start either REST start/stop API or NATS start/stop API
2025-12-20 09:21:58 +01:00
Aditya Ujeniya
32e5353847 Fix to NATS deadlock and revert demo script 2025-12-17 18:14:36 +01:00
Aditya Ujeniya
d2f2d78954 Change JWT output to stdout and update help text 2025-12-17 15:58:42 +01:00
4ecc050c4c Fix deadlock if NATS is not configured 2025-12-17 07:03:01 +01:00
14f1192ccb Introduce central nats client 2025-12-16 09:35:33 +01:00
97a322354f Refactor 2025-12-15 14:06:33 +01:00