Commit Graph

306 Commits

Author SHA1 Message Date
moebiusband 2e24fde430 Optimize sort order in nodestate parquet files 2026-02-18 08:06:00 +01:00
moebiusband 757be60b22 Switch to zstd compression for parquet writers 2026-02-18 07:55:28 +01:00
moebiusband 6035b62734 Run go fix 2026-02-17 21:04:17 +01:00
Aditya Ujeniya 1cf2c41bd7 Resize the buffers and put them into the pool 2026-02-16 18:21:45 +01:00
Aditya Ujeniya 2eeefc2720 Add healthCheck support for external CCMS 2026-02-16 16:57:17 +01:00
moebiusband 7d8b305cd9 Add format conversion to archive manager 2026-02-13 13:54:10 +01:00
moebiusband 2c8608f5a4 Update job archive retention to uniform policy with json and parquet target format 2026-02-13 12:19:31 +01:00
moebiusband 54ea5d7900 Add nodestate retention and archiving 2026-02-12 09:21:44 +01:00
moebiusband 865cd3db54 Persist faulty nodestate metric lists to db 2026-02-12 08:48:15 +01:00
moebiusband 8d6c6b819b Update and port to cc-lib 2026-02-11 07:06:06 +01:00
moebiusband 035ac2384e Refactor GlobalMetricLists 2026-02-09 21:56:41 +01:00
moebiusband c920c57f5d Add parquet file job archiving target 2026-02-07 10:51:56 +01:00
moebiusband a8194de492 Add diagnostic output for healthcheck 2026-02-07 06:17:34 +01:00
moebiusband a8d385a1ee Update HealthCheck again Still WIP 2026-02-06 16:35:02 +01:00
moebiusband 5579b6f40c Adopt unit test to new API 2026-02-06 16:11:10 +01:00
moebiusband 7123a8c1cc Updated HealthCheck implementation WIP 2026-02-06 16:04:01 +01:00
moebiusband f671d8df90 Add counts in healthcheck for logging output 2026-02-06 09:25:09 +01:00
Aditya Ujeniya fcb37b0367 Update to count healthy metrics 2026-02-06 08:45:36 +01:00
moebiusband 0984c1d431 Add debug log with degrade and missing metrics for healthcheck 2026-02-06 07:21:04 +01:00
moebiusband 5d7dd62b72 Update unit test for new HealthCheck update 2026-02-04 12:53:24 +01:00
moebiusband 46fb52d67e Adopt documentation 2026-02-04 12:30:33 +01:00
Aditya Ujeniya 39b8356683 Optimized CCMS healthcheck 2026-02-04 10:24:45 +01:00
moebiusband 42ce598865 Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2026-02-03 18:35:35 +01:00
moebiusband 0d62a300e7 Intermediate state of node Healthcheck
TODOS:
* Remove error handling from routine and simplify API call
* Use map for hardware level metrics
2026-02-03 18:35:17 +01:00
Aditya Ujeniya 3cf88f757c Update to checkpoint loader in CCMS 2026-02-03 16:25:48 +01:00
moebiusband 248f11f4f8 Change API of Node HealthState 2026-02-03 14:55:12 +01:00
moebiusband 00a41373e8 Add monitoring healthstate support in nodestate API. 2026-02-03 12:23:24 +01:00
Aditya Ujeniya a71341064e Update to MetricStore HealthCheck API 2026-01-30 23:24:16 +01:00
Christoph Kluge 32f0664012 add indicator to nodeView state, cap bubble size in roofline 2026-01-30 14:32:41 +01:00
Christoph Kluge 4deec9a170 no append if ErrNoHostOrMetric fired 2026-01-29 15:18:50 +01:00
Aditya Ujeniya 7101d2bb3b Handle the metric/host not found case differently 2026-01-28 17:47:38 +01:00
moebiusband 0d857b49a2 Disable explicit GC calls 2026-01-28 11:21:27 +01:00
moebiusband eb5aa9ad02 Disable explicit GC calls 2026-01-28 11:21:02 +01:00
moebiusband 9d15a87c88 Take into account the real allocated heap memory in MemoryUsageTracker 2026-01-27 18:23:09 +01:00
moebiusband bbde91a1f9 Move wg increment inside goroutines. Make GC calls less aggressive 2026-01-27 17:25:29 +01:00
moebiusband 55cb2cb6d6 Prevent file not closed on error in avro checkpoint 2026-01-27 17:10:26 +01:00
moebiusband 752e19c276 Pull out metric List build from metricstore Init 2026-01-27 17:06:52 +01:00
moebiusband b307e885ce feat: Add support for multiple external metric stores 2026-01-27 10:02:07 +01:00
moebiusband 7ecfc8468e Merge branch 'dev' of github.com:ClusterCockpit/cc-backend into dev 2026-01-26 08:38:56 +01:00
moebiusband c782043c64 Upgrade cclib and remove usage of obsolete util.Float 2026-01-26 08:38:53 +01:00
Christoph Kluge 49938bcef8 remove blocking backend check
- threw errors on expected and correctly handled behavior for nodeList queries
2026-01-23 17:41:21 +01:00
moebiusband 525d99140f Fix missing comma in metricstore schema 2026-01-23 10:15:29 +01:00
moebiusband 499b4287f9 Switch to cclib nats client 2026-01-23 10:04:41 +01:00
moebiusband f41301036b Move metricstore from internal to pkg 2026-01-23 07:49:47 +01:00
moebiusband 1d4c79c821 Unify JSON attribute naming to use kebab style case. Cleanup configuration. 2026-01-20 09:47:13 +01:00
moebiusband 5281f3bb60 Remove obsolete config option disable-archive 2026-01-19 16:42:30 +01:00
moebiusband cb219b3c74 Fix configuration issues. Fix shutdown hangs
Always turn on compression
2026-01-15 11:34:06 +01:00
moebiusband e1efc68476 Update dependencies. Rebuild graphql and swagger 2026-01-15 08:32:06 +01:00
moebiusband c6465ad9e5 Add s3 configuration options 2026-01-14 15:28:34 +01:00
moebiusband 211d4fae54 Refactor s3 backend and suppress checksum warning 2026-01-14 15:10:20 +01:00