Commit Graph

2720 Commits

Author SHA1 Message Date
60554896d5 Update ReleaseNote for upcoming release
Entire-Checkpoint: 30099a746fc7
2026-03-20 08:21:16 +01:00
bf48389aeb Optimize sortby in stats queries
Entire-Checkpoint: 9b5b833472e1
2026-03-20 05:39:22 +01:00
Christoph Kluge
10b4fa5a06 change: remove heuristic metricHealth, replace with DB metricHealth
- add metricHealth to single Node view
2026-03-19 15:55:58 +01:00
Christoph Kluge
886791cf8a remove deprecated minRunningFor filter remnants 2026-03-19 14:09:10 +01:00
Christoph Kluge
6cad2ee1f0 bump frontend dependencies, increase version to match release 2026-03-19 13:56:46 +01:00
Christoph Kluge
16ec1e69d9 streamline and unify statsSeries calc and render 2026-03-19 13:30:38 +01:00
Christoph Kluge
c42898bd99 fix: add top list query fixes to analysis and dashboard 2026-03-19 11:31:40 +01:00
22057ff281 Pass reqKey as CacheKey
Entire-Checkpoint: b95ef43221bb
2026-03-19 11:04:32 +01:00
Christoph Kluge
30b8ca4a1a further clarify plot titles 2026-03-19 10:45:55 +01:00
09501df3c2 fix: reduce memory usage in parquet checkpoint archiver
Stream CheckpointFile trees directly to parquet rows instead of
materializing all rows in a giant intermediate slice. This eliminates
~1.9GB per host of redundant allocations (repeated string headers)
and removes the expensive sort on millions of 104-byte structs.

Key changes:
- Replace flattenCheckpointFile + sortParquetRows + WriteHostRows with
  streaming WriteCheckpointFile that walks the tree with sorted keys
- Reduce results channel buffer from len(hostEntries) to 2 for
  back-pressure (at most NumWorkers+2 results in flight)
- Workers send CheckpointFile trees instead of []ParquetMetricRow
- Write rows in small 1024-element batches via reusable buffer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: f31dc1847539
2026-03-18 17:32:16 +01:00
Christoph Kluge
bb6915771d fix: clarify title 2026-03-18 13:23:33 +01:00
8b0881fb17 Exclude down nodes from HealthCheck
Entire-Checkpoint: 0c3347168c79
2026-03-18 11:20:12 +01:00
Christoph Kluge
33beb3c806 fix: simplify stats query condition
- caused expensive subquery without need in frontend
2026-03-18 11:07:57 +01:00
c1d51959d5 Change determineState to enforce priority order
Exception: if a node is both idle and down, the final state is idle

Entire-Checkpoint: 92c797737df8
2026-03-18 10:57:06 +01:00
3328d2ca11 Update go version in CLAUDE.md 2026-03-18 10:37:32 +01:00
8f10eba771 Extend CLAUDE.md
Entire-Checkpoint: 17cdf997acff
2026-03-18 10:05:09 +01:00
c449996559 Add context to log message
Entire-Checkpoint: 55d95cdef0d4
2026-03-18 09:43:41 +01:00
51ae2a5d10 Remove tracked .entire/metadata/ files from git
These conversation transcript files were committed before the gitignore
rule existed. They are now properly ignored via .entire/.gitignore.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 07:10:01 +01:00
6ebc9e88fa Add more context information to auth failed log
Entire-Checkpoint: 2187cd89cb78
2026-03-18 06:56:01 +01:00
8b132ed7f8 fix: Blocking ReceiveNats call
Entire-Checkpoint: 38a235c86ceb
2026-03-18 06:47:45 +01:00
bf1a8a174e fix: Shard WAL consumer for higher throughput
Entire-Checkpoint: e583b7b11439
2026-03-18 06:32:14 +01:00
50aed595cf fix: metricstore NATS contention
Entire-Checkpoint: 7e68050cab59
2026-03-18 06:14:15 +01:00
33bc19c732 Upgrade cc-lib 2026-03-18 05:52:58 +01:00
045f81f985 Prepare release v1.5.2
Entire-Checkpoint: 9286f4c43ab5
2026-03-18 05:31:49 +01:00
d46e6371fc Add log about checkpoint archiving
Entire-Checkpoint: bf29af79b268
2026-03-18 05:22:39 +01:00
02f82c2c0b fix: Prevent memory spikes in parquet writer for metricstore move policy
Entire-Checkpoint: 4a675b8352a2
2026-03-18 05:08:37 +01:00
3314b8e284 Ignore ErrNoRows error. Include calling function in log.
Entire-Checkpoint: 20746187d135
2026-03-16 20:09:44 +01:00
6855d62bf2 Make log in scanRow more descriptive. No log for common no rows error
Entire-Checkpoint: 858b34ef56b8
2026-03-16 20:03:27 +01:00
7f3eb443d9 Include calling function in error message
Entire-Checkpoint: a4948d0fe7a3
2026-03-16 15:42:38 +01:00
bab6eb4c3a Convert Warn message on missing metrics to Debug level 2026-03-16 15:35:24 +01:00
09d0ba71d2 Provide identical nodestate functionality in NATS API
Entire-Checkpoint: 3a40b75edd68
2026-03-16 12:13:14 +01:00
df93dbed63 Add busyTimeout config setting
Entire-Checkpoint: 81097a6c52a2
2026-03-16 11:30:21 +01:00
e4f3fa9ba0 Wrap SyncJobs in transaction
Entire-Checkpoint: d4f6c79a8dc1
2026-03-16 11:25:49 +01:00
51517f8031 Reduce insert pressure in db. Increase sqlite timeout value
Entire-Checkpoint: a1e2931d4deb
2026-03-16 11:17:47 +01:00
0aad8f01c8 Upgrade cc-lib
Fixes panic in AddNodeScope

Entire-Checkpoint: afef27e07ec9
2026-03-16 08:55:56 +01:00
973ca87bd1 Extend known issues in ReleaseNotes 2026-03-15 07:02:54 +01:00
045311eec0 Prepare release 1.5.1
Entire-Checkpoint: baed7fbee099
2026-03-13 17:30:03 +01:00
e38396a081 Upgrade dependencies. Rebuild GraphQL.
Entire-Checkpoint: f770853c9fa0
2026-03-13 17:22:34 +01:00
e83bd2babd Consolidate migrations
Entire-Checkpoint: a3dba4105838
2026-03-13 17:14:13 +01:00
Christoph Kluge
ba366d0d72 use inline literals in simple queries, add downgrade optimize 2026-03-13 15:16:19 +01:00
f15f1452cc Inline jobstate literal in query
Entire-Checkpoint: 35f06df74b51
2026-03-13 15:16:07 +01:00
df2a13def2 Merge branch 'hotfix' of github.com:ClusterCockpit/cc-backend into hotfix 2026-03-13 14:34:11 +01:00
d586fe4b43 Optimize usage dashboard: partial indexes, request cache, parallel histograms
- Add migration 14: partial covering indexes WHERE job_state='running'
  for user/project/subcluster groupings (tiny B-tree vs full table)
- Inline literal state value in BuildWhereClause so SQLite matches
  partial indexes instead of parameterized placeholders
- Add per-request statsGroupCache (sync.Once per filter+groupBy key)
  so identical grouped stats queries execute only once per GQL operation
- Parallelize 4 histogram queries in AddHistograms using errgroup
- Consolidate frontend from 6 GQL aliases to 2, sort+slice top-10
  client-side via $derived

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 5b26a6e5ff10
2026-03-13 14:31:37 +01:00
Christoph Kluge
bc214f6cea add nullsafes to frontend 2026-03-13 14:20:45 +01:00
cbe46c3524 Merge branch 'hotfix' of github.com:ClusterCockpit/cc-backend into hotfix 2026-03-13 13:17:34 +01:00
0037d969b2 Consolidate UsageDash into single GraphQL query
Merge three separate queries (topJobsQuery, topNodesQuery, topAccsQuery)
into one topStatsQuery with 6 aliased jobsStatistics fields, reducing
3 HTTP round trips to 1 on the status dashboard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 40d806a3240c
2026-03-13 13:14:29 +01:00
dd3e5427f4 Add covering indexes for status/dashboard queries (migration 13)
Adds composite covering indexes on (cluster, job_state, <group_col>, ...)
for user, project, and subcluster groupings to enable index-only scans
for status views. Drops subsumed 3-column indexes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 3d8def28e96e
2026-03-13 13:12:54 +01:00
Christoph Kluge
e666980184 fix typo 2026-03-13 12:07:43 +01:00
Christoph Kluge
c238f68af6 reduce unnecessary complexity 2026-03-13 12:05:16 +01:00
Christoph Kluge
58c0c79f72 handle single job state queries as simple stringquery
- this will improve index usage for single state queries
2026-03-13 12:03:06 +01:00
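
Note on commit d586fe4b43: its message describes a per-request statsGroupCache that uses one sync.Once per filter+groupBy key, so identical grouped stats queries run only once per GraphQL operation. The Go sketch below shows one way such a cache could look. It is not taken from the repository: the type names, the string key format, and the loader signature are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// groupStats is a hypothetical stand-in for one row of a grouped stats result.
type groupStats struct {
	Group string
	Jobs  int
}

// cacheEntry stores the result of one query together with the sync.Once
// that guarantees the query runs at most once.
type cacheEntry struct {
	once  sync.Once
	stats []groupStats
	err   error
}

// statsGroupCache deduplicates identical grouped stats queries.
// It is meant to live for a single request (assumed design, not the
// repository's actual type).
type statsGroupCache struct {
	mu      sync.Mutex
	entries map[string]*cacheEntry
}

func newStatsGroupCache() *statsGroupCache {
	return &statsGroupCache{entries: make(map[string]*cacheEntry)}
}

// get runs load at most once per key. Concurrent callers that use the same
// key block until the first call has finished, then share its result.
func (c *statsGroupCache) get(key string, load func() ([]groupStats, error)) ([]groupStats, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &cacheEntry{}
		c.entries[key] = e
	}
	c.mu.Unlock()

	e.once.Do(func() { e.stats, e.err = load() })
	return e.stats, e.err
}

func main() {
	cache := newStatsGroupCache()

	calls := 0
	load := func() ([]groupStats, error) {
		calls++ // only ever executed inside once.Do, so no extra locking is needed
		return []groupStats{{Group: "alice", Jobs: 3}}, nil
	}

	// Six resolvers ask for the same filter+groupBy combination.
	var wg sync.WaitGroup
	for i := 0; i < 6; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = cache.get("running|user", load)
		}()
	}
	wg.Wait()

	fmt.Println("query executions:", calls) // prints "query executions: 1"
}
```

The mutex only protects the map lookup. The potentially slow query runs outside the lock, so requests for different keys do not block each other.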