Commit Graph

32 Commits

Author SHA1 Message Date
a8194de492 Add diagnostic output for healthcheck 2026-02-07 06:17:34 +01:00
a8d385a1ee Update HealthCheck again Still WIP 2026-02-06 16:35:02 +01:00
5579b6f40c Adopt unit test to new API 2026-02-06 16:11:10 +01:00
7123a8c1cc Updated HealthCheck implementation WIP 2026-02-06 16:04:01 +01:00
f671d8df90 Add counts in healthcheck for logging output 2026-02-06 09:25:09 +01:00
Aditya Ujeniya fcb37b0367 Update to count healthy metrics 2026-02-06 08:45:36 +01:00
0984c1d431 Add debug log with degrade and missing metrics for healthcheck 2026-02-06 07:21:04 +01:00
5d7dd62b72 Update unit test for new HealthCheck update 2026-02-04 12:53:24 +01:00
46fb52d67e Adopt documentation 2026-02-04 12:30:33 +01:00
Aditya Ujeniya 39b8356683 Optimized CCMS healthcheck 2026-02-04 10:24:45 +01:00
42ce598865 Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2026-02-03 18:35:35 +01:00
0d62a300e7 Intermediate state of node Healthcheck 2026-02-03 18:35:17 +01:00
    TODOS:
    * Remove error handling from routine and simplify API call
    * Use map for hardware level metrics
Aditya Ujeniya 3cf88f757c Update to checkpoint loader in CCMS 2026-02-03 16:25:48 +01:00
248f11f4f8 Change API of Node HealthState 2026-02-03 14:55:12 +01:00
00a41373e8 Add monitoring healthstate support in nodestate API. 2026-02-03 12:23:24 +01:00
Aditya Ujeniya a71341064e Update to MetricStore HealthCheck API 2026-01-30 23:24:16 +01:00
Christoph Kluge 32f0664012 add indicator to nodeView state, cap bubble size in roofline 2026-01-30 14:32:41 +01:00
Christoph Kluge 4deec9a170 no append if ErrNoHostOrMetric fired 2026-01-29 15:18:50 +01:00
Aditya Ujeniya 7101d2bb3b Handle the metric/host not found case differently 2026-01-28 17:47:38 +01:00
0d857b49a2 Disable explicit GC calls 2026-01-28 11:21:27 +01:00
eb5aa9ad02 Disable explicit GC calls 2026-01-28 11:21:02 +01:00
9d15a87c88 Take into account the real allocated heap memory in MemoryUsageTracker 2026-01-27 18:23:09 +01:00
bbde91a1f9 Move wg increment inside goroutines. Make GC calls less aggressive 2026-01-27 17:25:29 +01:00
55cb2cb6d6 Prevent file not closed on error in avro checkpoint 2026-01-27 17:10:26 +01:00
752e19c276 Pull out metric List build from metricstore Init 2026-01-27 17:06:52 +01:00
b307e885ce feat: Add support for multiple external metric stores 2026-01-27 10:02:07 +01:00
7ecfc8468e Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2026-01-26 08:38:56 +01:00
c782043c64 Upgrade cclib and remove usage of obsolete util.Float 2026-01-26 08:38:53 +01:00
Christoph Kluge 49938bcef8 remove blocking backend check 2026-01-23 17:41:21 +01:00
    - threw errors on expected and correctly handled behavior for nodeList queries
525d99140f Fix missing comma in metristore schema 2026-01-23 10:15:29 +01:00
499b4287f9 Switch to cclib nats client 2026-01-23 10:04:41 +01:00
f41301036b Move metricstore from internal to pkg 2026-01-23 07:49:47 +01:00