Commit Graph

294 Commits

Author SHA1 Message Date
a8194de492 Add diagnostic output for healthcheck 2026-02-07 06:17:34 +01:00
a8d385a1ee Update HealthCheck again Still WIP 2026-02-06 16:35:02 +01:00
5579b6f40c Adopt unit test to new API 2026-02-06 16:11:10 +01:00
7123a8c1cc Updated HealthCheck implementation WIP 2026-02-06 16:04:01 +01:00
f671d8df90 Add counts in healthcheck for logging output 2026-02-06 09:25:09 +01:00
Aditya Ujeniya
fcb37b0367 Update to count healthy metrics 2026-02-06 08:45:36 +01:00
0984c1d431 Add debug log with degrade and missing metrics for healthcheck 2026-02-06 07:21:04 +01:00
5d7dd62b72 Update unit test for new HealthCheck update 2026-02-04 12:53:24 +01:00
46fb52d67e Adopt documentation 2026-02-04 12:30:33 +01:00
Aditya Ujeniya
39b8356683 Optimized CCMS healthcheck 2026-02-04 10:24:45 +01:00
42ce598865 Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2026-02-03 18:35:35 +01:00
0d62a300e7 Intermediate state of node Healthcheck
TODOS:
* Remove error handling from routine and simplify API call
* Use map for hardware level metrics
2026-02-03 18:35:17 +01:00
Aditya Ujeniya
3cf88f757c Update to checkpoint loader in CCMS 2026-02-03 16:25:48 +01:00
248f11f4f8 Change API of Node HealthState 2026-02-03 14:55:12 +01:00
00a41373e8 Add monitoring healthstate support in nodestate API. 2026-02-03 12:23:24 +01:00
Aditya Ujeniya
a71341064e Update to MetricStore HealthCheck API 2026-01-30 23:24:16 +01:00
Christoph Kluge
32f0664012 add indicator to nodeView state, cap bubble size in roofline 2026-01-30 14:32:41 +01:00
Christoph Kluge
4deec9a170 no append if ErrNoHostOrMetric fired 2026-01-29 15:18:50 +01:00
Aditya Ujeniya
7101d2bb3b Handle the metric/host not found case differently 2026-01-28 17:47:38 +01:00
0d857b49a2 Disable explicit GC calls 2026-01-28 11:21:27 +01:00
eb5aa9ad02 Disable explicit GC calls 2026-01-28 11:21:02 +01:00
9d15a87c88 Take into account the real allocated heap memory in MemoryUsageTracker 2026-01-27 18:23:09 +01:00
bbde91a1f9 Move wg increment inside goroutines. Make GC calls less aggressive 2026-01-27 17:25:29 +01:00
55cb2cb6d6 Prevent file not closed on error in avro checkpoint 2026-01-27 17:10:26 +01:00
752e19c276 Pull out metric List build from metricstore Init 2026-01-27 17:06:52 +01:00
b307e885ce feat: Add support for multiple external metric stores 2026-01-27 10:02:07 +01:00
7ecfc8468e Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2026-01-26 08:38:56 +01:00
c782043c64 Upgrade cclib and remove usage of obsolete util.Float 2026-01-26 08:38:53 +01:00
Christoph Kluge
49938bcef8 remove blocking backend check
- threw errors on expected and correctly handled behavior for nodeList queries
2026-01-23 17:41:21 +01:00
525d99140f Fix missing comma in metricstore schema 2026-01-23 10:15:29 +01:00
499b4287f9 Switch to cclib nats client 2026-01-23 10:04:41 +01:00
f41301036b Move metricstore from internal to pkg 2026-01-23 07:49:47 +01:00
1d4c79c821 Unify JSON attribute naming to use kebab-case style. Cleanup configuration. 2026-01-20 09:47:13 +01:00
5281f3bb60 Remove obsolete config option disable-archive 2026-01-19 16:42:30 +01:00
cb219b3c74 Fix configuration issues. Fix shutdown hangs
Always turn on compression
2026-01-15 11:34:06 +01:00
e1efc68476 Update dependencies. Rebuild graphql and swagger 2026-01-15 08:32:06 +01:00
c6465ad9e5 Add s3 configuration options 2026-01-14 15:28:34 +01:00
211d4fae54 Refactor s3 backend and suppress checksum warning 2026-01-14 15:10:20 +01:00
754f7e16f6 Reformat with gofumpt 2026-01-13 09:52:31 +01:00
8576ae458d Switch to cc-lib v2 2025-12-24 09:24:18 +01:00
c1135531ba Port NATS api to ccMessages 2025-12-23 07:56:13 +01:00
6e74fa294a Add role-based visibility for metrics
Fixes #387
2025-12-18 15:47:30 +01:00
b8fdfc30c0 Fix performance bugs in sqlite archive backend 2025-12-17 10:12:49 +01:00
14f1192ccb Introduce central nats client 2025-12-16 09:35:33 +01:00
7fce6fa401 Parallelize the Iter function in all archive backends 2025-12-16 09:11:09 +01:00
0306723307 Introduce transparent compression for importJob function in all archive backends 2025-12-16 08:55:31 +01:00
10aa2bfbd3 Add support for ClusterConfig 2025-12-15 11:24:12 +01:00
6cfed989ff Fix bugs in sqlite backend 2025-12-15 11:23:53 +01:00
48b68d3410 Fix aws endpoint deprecation 2025-12-04 06:20:19 +01:00
78530029ef Reformat 2025-12-03 14:54:48 +01:00