13 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| | dc8e8b9095 | Add note about cache size choice (Entire-Checkpoint: 52552d244fc5) | 2026-03-11 10:45:47 +01:00 |
| | 2080e268ce | Remove accidentally added entire logfile | 2026-03-11 06:33:41 +01:00 |
| | e8204ff1a2 | Add db configuration to README (Entire-Checkpoint: 1526810bf9c1) | 2026-03-11 06:21:49 +01:00 |
| | 09fa239e8b | fix: Prevent memory explosion in sqlite. And make db options configurable (Entire-Checkpoint: e368e6d8abf3) | 2026-03-11 06:14:05 +01:00 |
| | 301e590580 | Add migration with alternative reduced index set (Entire-Checkpoint: af7afc9a29ff) | 2026-03-11 05:46:03 +01:00 |
| | eba3995610 | Add Analyse on db startup (Entire-Checkpoint: ea70a955214d) | 2026-03-11 05:28:52 +01:00 |
| | f8831e7040 | Fixed merge errors (Entire-Checkpoint: ddd4fa4a7bbb) | 2026-03-11 05:09:38 +01:00 |
| | 1cf99206a9 | Merge branch 'hotfix' of github.com:ClusterCockpit/cc-backend into hotfix | 2026-03-11 05:06:26 +01:00 |
| | 5d3d77620e | fix: Add defer.close for all queries | 2026-03-11 05:04:20 +01:00 |
| Christoph Kluge | 5c72664162 | bump frontend patch versions | 2026-03-10 18:15:24 +01:00 |
| Christoph Kluge | f3e796f3f5 | add nullsafes to node view | 2026-03-10 17:05:50 +01:00 |
| Christoph Kluge | cc38b17472 | fix wrong field checked for json validity | 2026-03-10 17:02:09 +01:00 |
| | 282197ebef | fix: Round floats in tagger message (Entire-Checkpoint: b68850c6fcff) | 2026-03-10 06:01:31 +01:00 |
29 changed files with 1456 additions and 106 deletions


@@ -0,0 +1,29 @@
# Session Context
Session ID: 28147033-ddc8-4056-b064-e0558fbc614e
Commit Message: There are memory leaks in production on queries that trigger a full scan
## Prompts
### Prompt 1
There are memory leaks in production on queries that trigger a full scan. Review all queries for missing close and other bugs that could cause memory leaks in the sqlite-3 driver. Example for a query that triggers a massive allocation is a specific jobstate list with sorting on start_time on all jobs.
## Summary
All 31 `.Query()` calls now have proper `rows.Close()`. Build passes, tests pass, vet clean.
### Summary of changes
**`internal/repository/stats.go`** (7 fixes + 1 rewrite):
- Added `defer rows.Close()` + `rows.Err()` to `JobsStatsGrouped`, `JobCountGrouped`, `AddJobCountGrouped`, `jobsStatisticsHistogram`, `jobsDurationStatisticsHistogram`, `jobsMetricStatisticsHistogram`
- Rewrote `AddJobCount` to use `QueryRow()` instead of `Query()` (single-row result)
**`internal/repository/tags.go`** (5 fixes):
- Added `defer xrows.Close()` + `xrows.Err()` for `Queryx` in `CountTags`
- Added `defer rows.Close()` for second query in `CountTags`
- Added `defer rows.Close()` + `rows.Err()` to `GetTags`, `GetTagsDirect`, `getArchiveTags`
**`internal/repository/node.go`** (4 fixes + 1 rewrite):
- Added `defer rows.Close()` + `rows.Err()`, removed manual `rows.Close()` in `QueryNodes`, `QueryNodesWithMeta`, `CountStatesTimed`
- Rewrote `CountStates` to use `COUNT(*) ... GROUP BY` in SQL instead of fetching every row and counting in Go
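The leak-safe pattern applied in these fixes can be sketched as a standalone program. This is an illustrative sketch, not the actual repository code: the `rowIter` interface and `fakeRows` type are invented here so the close/err pattern can be shown without a live database, but the three steps (deferred `Close()`, scan loop, trailing `Err()` check) are exactly what `*sql.Rows` requires.

```go
package main

import "fmt"

// rowIter mirrors the subset of *sql.Rows used below, so the
// pattern can be demonstrated without a real database driver.
type rowIter interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
	Close() error
}

// collectStates drains a result set leak-safely: Close is deferred
// immediately, so it runs on every return path (including error
// returns), and Err() is checked after the loop so a mid-iteration
// failure is not silently treated as end-of-rows.
func collectStates(rows rowIter) ([]string, error) {
	defer rows.Close() // runs even on early return

	var states []string
	for rows.Next() {
		var s string
		if err := rows.Scan(&s); err != nil {
			return nil, err
		}
		states = append(states, s)
	}
	// Next() returning false can mean either EOF or an error;
	// Err() distinguishes the two.
	if err := rows.Err(); err != nil {
		return nil, err
	}
	return states, nil
}

// fakeRows is a slice-backed stand-in for *sql.Rows.
type fakeRows struct {
	data   []string
	i      int
	closed bool
}

func (f *fakeRows) Next() bool { return f.i < len(f.data) }
func (f *fakeRows) Scan(dest ...any) error {
	*dest[0].(*string) = f.data[f.i]
	f.i++
	return nil
}
func (f *fakeRows) Err() error   { return nil }
func (f *fakeRows) Close() error { f.closed = true; return nil }

func main() {
	rows := &fakeRows{data: []string{"completed", "running", "failed"}}
	states, err := collectStates(rows)
	fmt.Println(states, err, rows.closed)
}
```

Forgetting either the `Close()` or the `Err()` check is what leaked here: without `Close()`, the driver keeps the connection and its result buffers alive until GC, which under load looks like a memory leak.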

File diff suppressed because one or more lines are too long


@@ -0,0 +1 @@
There are memory leaks in production on queries that trigger a full scan. Review all queries for missing close and other bugs that could cause memory leaks in the sqlite-3 driver. Example for a query that triggers a massive allocation is a specific jobstate list with sorting on start_time on all jobs.


@@ -0,0 +1,16 @@
All 31 `.Query()` calls now have proper `rows.Close()`. Build passes, tests pass, vet clean.
### Summary of changes
**`internal/repository/stats.go`** (7 fixes + 1 rewrite):
- Added `defer rows.Close()` + `rows.Err()` to `JobsStatsGrouped`, `JobCountGrouped`, `AddJobCountGrouped`, `jobsStatisticsHistogram`, `jobsDurationStatisticsHistogram`, `jobsMetricStatisticsHistogram`
- Rewrote `AddJobCount` to use `QueryRow()` instead of `Query()` (single-row result)
**`internal/repository/tags.go`** (5 fixes):
- Added `defer xrows.Close()` + `xrows.Err()` for `Queryx` in `CountTags`
- Added `defer rows.Close()` for second query in `CountTags`
- Added `defer rows.Close()` + `rows.Err()` to `GetTags`, `GetTagsDirect`, `getArchiveTags`
**`internal/repository/node.go`** (4 fixes + 1 rewrite):
- Added `defer rows.Close()` + `rows.Err()`, removed manual `rows.Close()` in `QueryNodes`, `QueryNodesWithMeta`, `CountStatesTimed`
- Rewrote `CountStates` to use `COUNT(*) ... GROUP BY` in SQL instead of fetching every row and counting in Go


@@ -0,0 +1,40 @@
# Session Context
Session ID: 42401d2e-7d1c-4c0e-abe6-356cb2d48747
Commit Message: There also have to be bugs in jobQuery.go . Especially the following que
## Prompts
### Prompt 1
There also have to be bugs in jobQuery.go . Especially the following query triggers the memory leak: SELECT * FROM job WHERE job.job_state IN ("completed", "running", "failed") ORDER BY job.start_time DESC LIMIT 1 OFFSET 10; Dig deeper to find the cause. Also investigate why no existing index is used for this query.
## Summary
All checks pass. Here's a summary of findings and fixes:
---
## Findings
### Bug 1: Missing `defer rows.Close()` in `user.go:GetLdapUsernames()` (line 100)
**Fixed.** Every call leaked a database connection and its buffers.
### Bug 2: SQLite never runs `ANALYZE` — wrong index selection
**Fixed.** Added `ANALYZE` to `setupSqlite()` in `dbConnection.go`.
Without statistics in `sqlite_stat1`, SQLite's query planner **guessed** and chose `jobs_jobstate_energy` (a terrible index for this query) instead of scanning `jobs_starttime` in reverse. The result:
| | Before ANALYZE | After ANALYZE |
|---|---|---|
| Index used | `jobs_jobstate_energy` | `jobs_starttime` |
| Sort | **TEMP B-TREE** (materializes ALL rows) | None (index order) |
| Memory | Proportional to total matching rows | Constant |
| I/O | Full scan of all matching rows | Stops at OFFSET+LIMIT |
### Bug 3: `IN` clause + `ORDER BY` is fundamentally incompatible with composite indexes
Even with the "correct" index `(job_state, start_time)`, SQLite **cannot** merge-sort across 3 separate index range scans for `IN ('completed','running','failed')`. It always falls back to a temp B-tree sort. The only efficient plan is to use the standalone `jobs_starttime` index — which SQLite does automatically **after ANALYZE** because it realizes the 3 states cover virtually all rows, making the WHERE clause nearly a no-op.
### Observation: 79 indexes on the `job` table
This is excessive and actively harmful — it confuses the query planner (especially without ANALYZE) and slows writes. The `jobs_jobstate_starttime` index from migration 08 is also missing from the actual DB (only the 3-column `jobs_jobstate_starttime_duration` exists). This is worth investigating separately but is a schema/migration concern, not a code bug.

File diff suppressed because one or more lines are too long


@@ -0,0 +1 @@
What optimal index selection do you suggest? The job table contains almost 20 million jobs in production, causing timeouts on any filter query that does not use an index.


@@ -0,0 +1,9 @@
The plan reduces 79 indexes to 20 by:
- **Eliminating entire categories**: all `shared`-prefixed (16), all user/project sort variants (11), all standalone single-filter sorts (11)
- **Consolidating**: cluster+jobstate sort variants replaced by `(cluster, sort_col)` which works for any state combo with LIMIT
- **Keeping what matters**: the indexes SQLite actually picks with ANALYZE — `jobs_starttime`, `jobs_cluster_starttime_duration`, `jobs_user_starttime_duration`, `jobs_cluster_jobstate_duration_starttime`, and a few secondary sort indexes
Key trade-off: ~20% of queries that sort by rare columns (num_hwthreads, num_acc, energy) with a state filter will now do a cheap per-row state check instead of using a 3-column composite. With LIMIT this is negligible.


@@ -133,6 +133,92 @@ ln -s <your-existing-job-archive> ./var/job-archive
./cc-backend -help
```
## Database Configuration
cc-backend uses SQLite as its database. For large installations, SQLite memory
usage can be tuned via the optional `db-config` section in config.json under
`main`:
```json
{
"main": {
"db": "./var/job.db",
"db-config": {
"cache-size-mb": 2048,
"soft-heap-limit-mb": 16384,
"max-open-connections": 4,
"max-idle-connections": 4,
"max-idle-time-minutes": 10
}
}
}
```
All fields are optional. If `db-config` is omitted entirely, built-in defaults
are used.
### Options
| Option | Default | Description |
| ----------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cache-size-mb` | 2048 | SQLite page cache size per connection in MB. Maps to `PRAGMA cache_size`. Total cache memory is up to `cache-size-mb × max-open-connections`. |
| `soft-heap-limit-mb` | 16384 | Process-wide SQLite soft heap limit in MB. SQLite will try to release cache pages to stay under this limit. Queries won't fail if exceeded, but cache eviction becomes more aggressive. |
| `max-open-connections` | 4 | Maximum number of open database connections. |
| `max-idle-connections` | 4 | Maximum number of idle database connections kept in the pool. |
| `max-idle-time-minutes` | 10 | Maximum time in minutes a connection can sit idle before being closed. |
### Sizing Guidelines
SQLite's `cache_size` is a **per-connection** setting — each connection
maintains its own independent page cache. With multiple connections, the total
memory available for caching is the sum across all connections.
In practice, different connections tend to cache **different pages** (e.g., one
handles a job listing query while another runs a statistics aggregation), so
their caches naturally spread across the database. The formula
`DB_size / max-open-connections` gives enough per-connection cache that the
combined caches can cover the entire database.
However, this is a best-case estimate. Connections running similar queries will
cache the same pages redundantly. In the worst case (all connections caching
identical pages), only `cache-size-mb` worth of unique data is cached rather
than `cache-size-mb × max-open-connections`. For workloads with diverse
concurrent queries, cache overlap is typically low.
**Rules of thumb:**
- **cache-size-mb**: Set to `DB_size_in_MB / max-open-connections` to allow the
entire database to be cached in memory. For example, an 80GB database with 8
connections needs at least 10240 MB (10GB) per connection. If your workload
has many similar concurrent queries, consider setting it higher to account for
cache overlap between connections.
- **soft-heap-limit-mb**: Should be >= `cache-size-mb × max-open-connections` to
avoid cache thrashing. This is the total SQLite memory budget for the process.
- On small installations the defaults work well. On servers with large databases
(tens of GB) and plenty of RAM, increasing these values significantly improves
query performance by reducing disk I/O.
### Example: Large Server (512GB RAM, 80GB database)
```json
{
"main": {
"db-config": {
"cache-size-mb": 16384,
"soft-heap-limit-mb": 131072,
"max-open-connections": 8,
"max-idle-time-minutes": 30
}
}
}
```
This allows the entire 80GB database to be cached (8 × 16GB = 128GB page cache)
with a 128GB soft heap limit, using about 25% of available RAM.
The effective configuration is logged at startup for verification.
## Project file structure
- [`.github/`](https://github.com/ClusterCockpit/cc-backend/tree/master/.github)


@@ -108,6 +108,26 @@ func initConfiguration() error {
}
func initDatabase() error {
if config.Keys.DbConfig != nil {
cfg := repository.DefaultConfig()
dc := config.Keys.DbConfig
if dc.CacheSizeMB > 0 {
cfg.DbCacheSizeMB = dc.CacheSizeMB
}
if dc.SoftHeapLimitMB > 0 {
cfg.DbSoftHeapLimitMB = dc.SoftHeapLimitMB
}
if dc.MaxOpenConnections > 0 {
cfg.MaxOpenConnections = dc.MaxOpenConnections
}
if dc.MaxIdleConnections > 0 {
cfg.MaxIdleConnections = dc.MaxIdleConnections
}
if dc.ConnectionMaxIdleTimeMins > 0 {
cfg.ConnectionMaxIdleTime = time.Duration(dc.ConnectionMaxIdleTimeMins) * time.Minute
}
repository.SetConfig(cfg)
}
repository.Connect(config.Keys.DB)
return nil
}


@@ -77,6 +77,17 @@ type ProgramConfig struct {
// Node state retention configuration
NodeStateRetention *NodeStateRetention `json:"nodestate-retention"`
// Database tuning configuration
DbConfig *DbConfig `json:"db-config"`
}
type DbConfig struct {
CacheSizeMB int `json:"cache-size-mb"`
SoftHeapLimitMB int `json:"soft-heap-limit-mb"`
MaxOpenConnections int `json:"max-open-connections"`
MaxIdleConnections int `json:"max-idle-connections"`
ConnectionMaxIdleTimeMins int `json:"max-idle-time-minutes"`
}
type NodeStateRetention struct {


@@ -177,6 +177,32 @@ var configSchema = `
}
},
"required": ["policy"]
},
"db-config": {
"description": "SQLite database tuning configuration.",
"type": "object",
"properties": {
"cache-size-mb": {
"description": "SQLite page cache size per connection in MB (default: 2048).",
"type": "integer"
},
"soft-heap-limit-mb": {
"description": "Process-wide SQLite soft heap limit in MB (default: 16384).",
"type": "integer"
},
"max-open-connections": {
"description": "Maximum number of open database connections (default: 4).",
"type": "integer"
},
"max-idle-connections": {
"description": "Maximum number of idle database connections (default: 4).",
"type": "integer"
},
"max-idle-time-minutes": {
"description": "Maximum idle time for a connection in minutes (default: 10).",
"type": "integer"
}
}
}
}
}`


@@ -27,13 +27,25 @@ type RepositoryConfig struct {
ConnectionMaxLifetime time.Duration
// ConnectionMaxIdleTime is the maximum amount of time a connection may be idle.
// Default: 1 hour
// Default: 10 minutes
ConnectionMaxIdleTime time.Duration
// MinRunningJobDuration is the minimum duration in seconds for a job to be
// considered in "running jobs" queries. This filters out very short jobs.
// Default: 600 seconds (10 minutes)
MinRunningJobDuration int
// DbCacheSizeMB is the SQLite page cache size per connection in MB.
// Uses negative PRAGMA cache_size notation (KiB). With MaxOpenConnections=4
// and DbCacheSizeMB=2048, total page cache is up to 8GB.
// Default: 2048 (2GB)
DbCacheSizeMB int
// DbSoftHeapLimitMB is the process-wide SQLite soft heap limit in MB.
// SQLite will try to release cache pages to stay under this limit.
// It's a soft limit — queries won't fail, but cache eviction becomes more aggressive.
// Default: 16384 (16GB)
DbSoftHeapLimitMB int
}
// DefaultConfig returns the default repository configuration.
@@ -44,8 +56,10 @@ func DefaultConfig() *RepositoryConfig {
MaxOpenConnections: 4,
MaxIdleConnections: 4,
ConnectionMaxLifetime: time.Hour,
ConnectionMaxIdleTime: time.Hour,
ConnectionMaxIdleTime: 10 * time.Minute,
MinRunningJobDuration: 600, // 10 minutes
DbCacheSizeMB: 2048, // 2GB per connection
DbSoftHeapLimitMB: 16384, // 16GB process-wide
}
}


@@ -36,9 +36,10 @@ type DatabaseOptions struct {
ConnectionMaxIdleTime time.Duration
}
func setupSqlite(db *sql.DB) error {
func setupSqlite(db *sql.DB, cfg *RepositoryConfig) error {
pragmas := []string{
"temp_store = memory",
fmt.Sprintf("soft_heap_limit = %d", int64(cfg.DbSoftHeapLimitMB)*1024*1024),
}
for _, pragma := range pragmas {
@@ -48,6 +49,14 @@ func setupSqlite(db *sql.DB) error {
}
}
// Update query planner statistics so SQLite picks optimal indexes.
// Without this, SQLite guesses row distributions and often chooses wrong
// indexes for queries with IN clauses + ORDER BY, causing full table sorts
// in temp B-trees instead of using covering indexes.
if _, err := db.Exec("ANALYZE"); err != nil {
cclog.Warnf("Failed to run ANALYZE: %v", err)
}
return nil
}
@@ -71,7 +80,8 @@ func Connect(db string) {
connectionURLParams.Add("_journal_mode", "WAL")
connectionURLParams.Add("_busy_timeout", "5000")
connectionURLParams.Add("_synchronous", "NORMAL")
connectionURLParams.Add("_cache_size", "1000000000")
cacheSizeKiB := repoConfig.DbCacheSizeMB * 1024 // Convert MB to KiB
connectionURLParams.Add("_cache_size", fmt.Sprintf("-%d", cacheSizeKiB))
connectionURLParams.Add("_foreign_keys", "true")
opts.URL = fmt.Sprintf("file:%s?%s", opts.URL, connectionURLParams.Encode())
@@ -86,11 +96,14 @@ func Connect(db string) {
cclog.Abortf("DB Connection: Could not connect to SQLite database with sqlx.Open().\nError: %s\n", err.Error())
}
err = setupSqlite(dbHandle.DB)
err = setupSqlite(dbHandle.DB, repoConfig)
if err != nil {
cclog.Abortf("Failed sqlite db setup.\nError: %s\n", err.Error())
}
cclog.Infof("SQLite config: cache_size=%dMB/conn, soft_heap_limit=%dMB, max_conns=%d",
repoConfig.DbCacheSizeMB, repoConfig.DbSoftHeapLimitMB, repoConfig.MaxOpenConnections)
dbHandle.SetMaxOpenConns(opts.MaxOpenConnections)
dbHandle.SetMaxIdleConns(opts.MaxIdleConnections)
dbHandle.SetConnMaxLifetime(opts.ConnectionMaxLifetime)


@@ -171,7 +171,7 @@ func (r *JobRepository) FindByID(ctx context.Context, jobID int64) (*schema.Job,
return nil, qerr
}
return scanJob(q.RunWith(r.stmtCache).QueryRow())
return scanJob(q.RunWith(r.stmtCache).QueryRowContext(ctx))
}
// FindByIDWithUser executes a SQL query to find a specific batch job.
@@ -217,7 +217,7 @@ func (r *JobRepository) FindByJobID(ctx context.Context, jobID int64, startTime
return nil, qerr
}
return scanJob(q.RunWith(r.stmtCache).QueryRow())
return scanJob(q.RunWith(r.stmtCache).QueryRowContext(ctx))
}
// IsJobOwner checks if the specified user owns the batch job identified by jobID,


@@ -63,11 +63,12 @@ func (r *JobRepository) QueryJobs(
}
} else {
// Order by footprint JSON field values
query = query.Where("JSON_VALID(footprint)")
switch order.Order {
case model.SortDirectionEnumAsc:
query = query.OrderBy(fmt.Sprintf("json_extract(footprint, '$.%s') ASC", field))
query = query.OrderBy(fmt.Sprintf("JSON_EXTRACT(footprint, \"$.%s\") ASC", field))
case model.SortDirectionEnumDesc:
query = query.OrderBy(fmt.Sprintf("json_extract(footprint, '$.%s') DESC", field))
query = query.OrderBy(fmt.Sprintf("JSON_EXTRACT(footprint, \"$.%s\") DESC", field))
default:
return nil, errors.New("invalid sorting order for footprint")
}
@@ -83,7 +84,7 @@ func (r *JobRepository) QueryJobs(
query = BuildWhereClause(f, query)
}
rows, err := query.RunWith(r.stmtCache).Query()
rows, err := query.RunWith(r.stmtCache).QueryContext(ctx)
if err != nil {
queryString, queryVars, _ := query.ToSql()
return nil, fmt.Errorf("query failed [%s] %v: %w", queryString, queryVars, err)
@@ -125,7 +126,7 @@ func (r *JobRepository) CountJobs(
}
var count int
if err := query.RunWith(r.DB).Scan(&count); err != nil {
if err := query.RunWith(r.DB).QueryRowContext(ctx).Scan(&count); err != nil {
return 0, fmt.Errorf("failed to count jobs: %w", err)
}
@@ -334,13 +335,14 @@ func buildTimeCondition(field string, cond *config.TimeRange, query sq.SelectBui
}
// buildFloatJSONCondition creates a filter on a numeric field within the footprint JSON column, using BETWEEN only if required.
func buildFloatJSONCondition(field string, cond *model.FloatRange, query sq.SelectBuilder) sq.SelectBuilder {
func buildFloatJSONCondition(jsonField string, cond *model.FloatRange, query sq.SelectBuilder) sq.SelectBuilder {
query = query.Where("JSON_VALID(footprint)")
if cond.From != 1.0 && cond.To != 0.0 {
return query.Where("json_extract(footprint, '$."+field+"') BETWEEN ? AND ?", cond.From, cond.To)
return query.Where("JSON_EXTRACT(footprint, \"$."+jsonField+"\") BETWEEN ? AND ?", cond.From, cond.To)
} else if cond.From != 1.0 && cond.To == 0.0 {
return query.Where("json_extract(footprint, '$."+field+"') >= ?", cond.From)
return query.Where("JSON_EXTRACT(footprint, \"$."+jsonField+"\") >= ?", cond.From)
} else if cond.From == 1.0 && cond.To != 0.0 {
return query.Where("json_extract(footprint, '$."+field+"') <= ?", cond.To)
return query.Where("JSON_EXTRACT(footprint, \"$."+jsonField+"\") <= ?", cond.To)
} else {
return query
}


@@ -21,8 +21,8 @@ import (
// is added to internal/repository/migrations/sqlite3/.
//
// Version history:
// - Version 11: Add expression indexes on footprint JSON fields
// - Version 10: Previous version
// - Version 11: Optimize job table indexes (reduce from ~78 to 20)
// - Version 10: Node table
//
// Migration files are embedded at build time from the migrations directory.
const Version uint = 11


@@ -1,15 +0,0 @@
-- Drop standalone expression indexes
DROP INDEX IF EXISTS jobs_fp_flops_any_avg;
DROP INDEX IF EXISTS jobs_fp_mem_bw_avg;
DROP INDEX IF EXISTS jobs_fp_mem_used_max;
DROP INDEX IF EXISTS jobs_fp_cpu_load_avg;
DROP INDEX IF EXISTS jobs_fp_net_bw_avg;
DROP INDEX IF EXISTS jobs_fp_net_data_vol_total;
DROP INDEX IF EXISTS jobs_fp_file_bw_avg;
DROP INDEX IF EXISTS jobs_fp_file_data_vol_total;
-- Drop composite indexes
DROP INDEX IF EXISTS jobs_cluster_fp_cpu_load_avg;
DROP INDEX IF EXISTS jobs_cluster_fp_flops_any_avg;
DROP INDEX IF EXISTS jobs_cluster_fp_mem_bw_avg;
DROP INDEX IF EXISTS jobs_cluster_fp_mem_used_max;


@@ -1,19 +0,0 @@
-- Expression indexes on footprint JSON fields for WHERE and ORDER BY optimization.
-- SQLite matches expressions textually, so queries must use exactly:
-- json_extract(footprint, '$.field')
-- Standalone expression indexes (for filtering and sorting)
CREATE INDEX IF NOT EXISTS jobs_fp_flops_any_avg ON job (json_extract(footprint, '$.flops_any_avg'));
CREATE INDEX IF NOT EXISTS jobs_fp_mem_bw_avg ON job (json_extract(footprint, '$.mem_bw_avg'));
CREATE INDEX IF NOT EXISTS jobs_fp_mem_used_max ON job (json_extract(footprint, '$.mem_used_max'));
CREATE INDEX IF NOT EXISTS jobs_fp_cpu_load_avg ON job (json_extract(footprint, '$.cpu_load_avg'));
CREATE INDEX IF NOT EXISTS jobs_fp_net_bw_avg ON job (json_extract(footprint, '$.net_bw_avg'));
CREATE INDEX IF NOT EXISTS jobs_fp_net_data_vol_total ON job (json_extract(footprint, '$.net_data_vol_total'));
CREATE INDEX IF NOT EXISTS jobs_fp_file_bw_avg ON job (json_extract(footprint, '$.file_bw_avg'));
CREATE INDEX IF NOT EXISTS jobs_fp_file_data_vol_total ON job (json_extract(footprint, '$.file_data_vol_total'));
-- Composite indexes with cluster (for common filter+sort combinations)
CREATE INDEX IF NOT EXISTS jobs_cluster_fp_cpu_load_avg ON job (cluster, json_extract(footprint, '$.cpu_load_avg'));
CREATE INDEX IF NOT EXISTS jobs_cluster_fp_flops_any_avg ON job (cluster, json_extract(footprint, '$.flops_any_avg'));
CREATE INDEX IF NOT EXISTS jobs_cluster_fp_mem_bw_avg ON job (cluster, json_extract(footprint, '$.mem_bw_avg'));
CREATE INDEX IF NOT EXISTS jobs_cluster_fp_mem_used_max ON job (cluster, json_extract(footprint, '$.mem_used_max'));


@@ -0,0 +1,161 @@
-- Migration 11 DOWN: Restore all indexes from migration 09
-- Reverts the index optimization by dropping the 20 optimized indexes
-- and recreating the original full set.
-- ============================================================
-- Drop optimized indexes
-- ============================================================
DROP INDEX IF EXISTS jobs_starttime;
DROP INDEX IF EXISTS jobs_cluster_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_jobstate_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_jobstate_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_user;
DROP INDEX IF EXISTS jobs_cluster_project;
DROP INDEX IF EXISTS jobs_cluster_subcluster;
DROP INDEX IF EXISTS jobs_cluster_numnodes;
DROP INDEX IF EXISTS jobs_user_starttime_duration;
DROP INDEX IF EXISTS jobs_project_starttime_duration;
DROP INDEX IF EXISTS jobs_jobstate_project;
DROP INDEX IF EXISTS jobs_jobstate_user;
DROP INDEX IF EXISTS jobs_jobstate_duration_starttime;
DROP INDEX IF EXISTS jobs_arrayjobid;
DROP INDEX IF EXISTS jobs_cluster_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_numacc;
DROP INDEX IF EXISTS jobs_cluster_energy;
DROP INDEX IF EXISTS jobs_cluster_partition_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate;
-- ============================================================
-- Recreate all indexes from migration 09
-- ============================================================
-- Cluster Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_user ON job (cluster, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_project ON job (cluster, project);
CREATE INDEX IF NOT EXISTS jobs_cluster_subcluster ON job (cluster, subcluster);
-- Cluster Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_numnodes ON job (cluster, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_numhwthreads ON job (cluster, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_numacc ON job (cluster, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_energy ON job (cluster, energy);
-- Cluster Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_duration_starttime ON job (cluster, duration, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_starttime_duration ON job (cluster, start_time, duration);
-- Cluster+Partition Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_user ON job (cluster, cluster_partition, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_project ON job (cluster, cluster_partition, project);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_jobstate ON job (cluster, cluster_partition, job_state);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_shared ON job (cluster, cluster_partition, shared);
-- Cluster+Partition Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numnodes ON job (cluster, cluster_partition, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numhwthreads ON job (cluster, cluster_partition, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numacc ON job (cluster, cluster_partition, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_energy ON job (cluster, cluster_partition, energy);
-- Cluster+Partition Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_duration_starttime ON job (cluster, cluster_partition, duration, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_starttime_duration ON job (cluster, cluster_partition, start_time, duration);
-- Cluster+JobState Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_user ON job (cluster, job_state, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_project ON job (cluster, job_state, project);
-- Cluster+JobState Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numnodes ON job (cluster, job_state, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numhwthreads ON job (cluster, job_state, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numacc ON job (cluster, job_state, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_energy ON job (cluster, job_state, energy);
-- Cluster+JobState Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_starttime_duration ON job (cluster, job_state, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_duration_starttime ON job (cluster, job_state, duration, start_time);
-- Cluster+Shared Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_user ON job (cluster, shared, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_project ON job (cluster, shared, project);
-- Cluster+Shared Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numnodes ON job (cluster, shared, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numhwthreads ON job (cluster, shared, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numacc ON job (cluster, shared, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_energy ON job (cluster, shared, energy);
-- Cluster+Shared Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_starttime_duration ON job (cluster, shared, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_duration_starttime ON job (cluster, shared, duration, start_time);
-- User Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_user_numnodes ON job (hpc_user, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_user_numhwthreads ON job (hpc_user, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_user_numacc ON job (hpc_user, num_acc);
CREATE INDEX IF NOT EXISTS jobs_user_energy ON job (hpc_user, energy);
-- User Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_user_starttime_duration ON job (hpc_user, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_user_duration_starttime ON job (hpc_user, duration, start_time);
-- Project Filter
CREATE INDEX IF NOT EXISTS jobs_project_user ON job (project, hpc_user);
-- Project Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_project_numnodes ON job (project, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_project_numhwthreads ON job (project, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_project_numacc ON job (project, num_acc);
CREATE INDEX IF NOT EXISTS jobs_project_energy ON job (project, energy);
-- Project Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_project_starttime_duration ON job (project, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_project_duration_starttime ON job (project, duration, start_time);
-- JobState Filter
CREATE INDEX IF NOT EXISTS jobs_jobstate_user ON job (job_state, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_jobstate_project ON job (job_state, project);
-- JobState Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_jobstate_numnodes ON job (job_state, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_jobstate_numhwthreads ON job (job_state, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_jobstate_numacc ON job (job_state, num_acc);
CREATE INDEX IF NOT EXISTS jobs_jobstate_energy ON job (job_state, energy);
-- JobState Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_jobstate_starttime_duration ON job (job_state, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_jobstate_duration_starttime ON job (job_state, duration, start_time);
-- Shared Filter
CREATE INDEX IF NOT EXISTS jobs_shared_user ON job (shared, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_shared_project ON job (shared, project);
-- Shared Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_shared_numnodes ON job (shared, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_shared_numhwthreads ON job (shared, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_shared_numacc ON job (shared, num_acc);
CREATE INDEX IF NOT EXISTS jobs_shared_energy ON job (shared, energy);
-- Shared Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_shared_starttime_duration ON job (shared, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_shared_duration_starttime ON job (shared, duration, start_time);
-- ArrayJob Filter
CREATE INDEX IF NOT EXISTS jobs_arrayjobid_starttime ON job (array_job_id, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_arrayjobid_starttime ON job (cluster, array_job_id, start_time);
-- Single filters with default starttime sorting
CREATE INDEX IF NOT EXISTS jobs_duration_starttime ON job (duration, start_time);
CREATE INDEX IF NOT EXISTS jobs_numnodes_starttime ON job (num_nodes, start_time);
CREATE INDEX IF NOT EXISTS jobs_numhwthreads_starttime ON job (num_hwthreads, start_time);
CREATE INDEX IF NOT EXISTS jobs_numacc_starttime ON job (num_acc, start_time);
CREATE INDEX IF NOT EXISTS jobs_energy_starttime ON job (energy, start_time);
-- Single filters with duration sorting
CREATE INDEX IF NOT EXISTS jobs_starttime_duration ON job (start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_numnodes_duration ON job (num_nodes, duration);
CREATE INDEX IF NOT EXISTS jobs_numhwthreads_duration ON job (num_hwthreads, duration);
CREATE INDEX IF NOT EXISTS jobs_numacc_duration ON job (num_acc, duration);
CREATE INDEX IF NOT EXISTS jobs_energy_duration ON job (energy, duration);
-- Backup Indices For High Variety Columns
CREATE INDEX IF NOT EXISTS jobs_starttime ON job (start_time);
CREATE INDEX IF NOT EXISTS jobs_duration ON job (duration);
-- Optimize DB index usage
PRAGMA optimize;


@@ -0,0 +1,221 @@
-- Migration 11: Optimize job table indexes
-- Reduces the job table from ~78 indexes to 20 for better write performance,
-- lower disk usage, and more reliable query planner decisions.
-- Requires ANALYZE to be run after migration (done automatically on startup).
-- ============================================================
-- Drop ALL existing job indexes (from migrations 08/09)
-- sqlite_autoindex_job_1 (UNIQUE constraint) is kept automatically
-- ============================================================
-- Cluster Filter
DROP INDEX IF EXISTS jobs_cluster_user;
DROP INDEX IF EXISTS jobs_cluster_project;
DROP INDEX IF EXISTS jobs_cluster_subcluster;
-- Cluster Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_numnodes;
DROP INDEX IF EXISTS jobs_cluster_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_numacc;
DROP INDEX IF EXISTS jobs_cluster_energy;
-- Cluster Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_starttime_duration;
-- Cluster+Partition Filter
DROP INDEX IF EXISTS jobs_cluster_partition_user;
DROP INDEX IF EXISTS jobs_cluster_partition_project;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate;
DROP INDEX IF EXISTS jobs_cluster_partition_shared;
-- Cluster+Partition Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_partition_numnodes;
DROP INDEX IF EXISTS jobs_cluster_partition_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_partition_numacc;
DROP INDEX IF EXISTS jobs_cluster_partition_energy;
-- Cluster+Partition Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_partition_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_starttime_duration;
-- Cluster+JobState Filter
DROP INDEX IF EXISTS jobs_cluster_jobstate_user;
DROP INDEX IF EXISTS jobs_cluster_jobstate_project;
-- Cluster+JobState Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_jobstate_numnodes;
DROP INDEX IF EXISTS jobs_cluster_jobstate_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_jobstate_numacc;
DROP INDEX IF EXISTS jobs_cluster_jobstate_energy;
-- Cluster+JobState Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_jobstate_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_jobstate_duration_starttime;
-- Cluster+Shared Filter
DROP INDEX IF EXISTS jobs_cluster_shared_user;
DROP INDEX IF EXISTS jobs_cluster_shared_project;
-- Cluster+Shared Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_shared_numnodes;
DROP INDEX IF EXISTS jobs_cluster_shared_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_shared_numacc;
DROP INDEX IF EXISTS jobs_cluster_shared_energy;
-- Cluster+Shared Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_shared_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_shared_duration_starttime;
-- User Filter Sorting
DROP INDEX IF EXISTS jobs_user_numnodes;
DROP INDEX IF EXISTS jobs_user_numhwthreads;
DROP INDEX IF EXISTS jobs_user_numacc;
DROP INDEX IF EXISTS jobs_user_energy;
-- User Time Filter Sorting
DROP INDEX IF EXISTS jobs_user_starttime_duration;
DROP INDEX IF EXISTS jobs_user_duration_starttime;
-- Project Filter
DROP INDEX IF EXISTS jobs_project_user;
-- Project Filter Sorting
DROP INDEX IF EXISTS jobs_project_numnodes;
DROP INDEX IF EXISTS jobs_project_numhwthreads;
DROP INDEX IF EXISTS jobs_project_numacc;
DROP INDEX IF EXISTS jobs_project_energy;
-- Project Time Filter Sorting
DROP INDEX IF EXISTS jobs_project_starttime_duration;
DROP INDEX IF EXISTS jobs_project_duration_starttime;
-- JobState Filter
DROP INDEX IF EXISTS jobs_jobstate_user;
DROP INDEX IF EXISTS jobs_jobstate_project;
-- JobState Filter Sorting
DROP INDEX IF EXISTS jobs_jobstate_numnodes;
DROP INDEX IF EXISTS jobs_jobstate_numhwthreads;
DROP INDEX IF EXISTS jobs_jobstate_numacc;
DROP INDEX IF EXISTS jobs_jobstate_energy;
-- JobState Time Filter Sorting
DROP INDEX IF EXISTS jobs_jobstate_starttime_duration;
DROP INDEX IF EXISTS jobs_jobstate_duration_starttime;
-- Shared Filter
DROP INDEX IF EXISTS jobs_shared_user;
DROP INDEX IF EXISTS jobs_shared_project;
-- Shared Filter Sorting
DROP INDEX IF EXISTS jobs_shared_numnodes;
DROP INDEX IF EXISTS jobs_shared_numhwthreads;
DROP INDEX IF EXISTS jobs_shared_numacc;
DROP INDEX IF EXISTS jobs_shared_energy;
-- Shared Time Filter Sorting
DROP INDEX IF EXISTS jobs_shared_starttime_duration;
DROP INDEX IF EXISTS jobs_shared_duration_starttime;
-- ArrayJob Filter
DROP INDEX IF EXISTS jobs_arrayjobid_starttime;
DROP INDEX IF EXISTS jobs_cluster_arrayjobid_starttime;
-- Single filters with default starttime sorting
DROP INDEX IF EXISTS jobs_duration_starttime;
DROP INDEX IF EXISTS jobs_numnodes_starttime;
DROP INDEX IF EXISTS jobs_numhwthreads_starttime;
DROP INDEX IF EXISTS jobs_numacc_starttime;
DROP INDEX IF EXISTS jobs_energy_starttime;
-- Single filters with duration sorting
DROP INDEX IF EXISTS jobs_starttime_duration;
DROP INDEX IF EXISTS jobs_numnodes_duration;
DROP INDEX IF EXISTS jobs_numhwthreads_duration;
DROP INDEX IF EXISTS jobs_numacc_duration;
DROP INDEX IF EXISTS jobs_energy_duration;
-- Backup Indices
DROP INDEX IF EXISTS jobs_starttime;
DROP INDEX IF EXISTS jobs_duration;
-- Legacy indexes from migration 08 (may exist on older DBs)
DROP INDEX IF EXISTS jobs_cluster;
DROP INDEX IF EXISTS jobs_cluster_starttime;
DROP INDEX IF EXISTS jobs_cluster_duration;
DROP INDEX IF EXISTS jobs_cluster_partition;
DROP INDEX IF EXISTS jobs_cluster_partition_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_duration;
DROP INDEX IF EXISTS jobs_cluster_partition_numnodes;
DROP INDEX IF EXISTS jobs_cluster_partition_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_partition_numacc;
DROP INDEX IF EXISTS jobs_cluster_partition_energy;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_user;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_project;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_duration;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_numnodes;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_numacc;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_energy;
DROP INDEX IF EXISTS jobs_cluster_jobstate;
DROP INDEX IF EXISTS jobs_cluster_jobstate_starttime;
DROP INDEX IF EXISTS jobs_cluster_jobstate_duration;
DROP INDEX IF EXISTS jobs_user;
DROP INDEX IF EXISTS jobs_user_starttime;
DROP INDEX IF EXISTS jobs_user_duration;
DROP INDEX IF EXISTS jobs_project;
DROP INDEX IF EXISTS jobs_project_starttime;
DROP INDEX IF EXISTS jobs_project_duration;
DROP INDEX IF EXISTS jobs_jobstate;
DROP INDEX IF EXISTS jobs_jobstate_cluster;
DROP INDEX IF EXISTS jobs_jobstate_starttime;
DROP INDEX IF EXISTS jobs_jobstate_duration;
DROP INDEX IF EXISTS jobs_numnodes;
DROP INDEX IF EXISTS jobs_numhwthreads;
DROP INDEX IF EXISTS jobs_numacc;
DROP INDEX IF EXISTS jobs_energy;
-- ============================================================
-- Create optimized set of 20 indexes
-- ============================================================
-- GROUP 1: Global sort (1 index)
-- Default sort for unfiltered/multi-state IN queries, time range, delete-before
CREATE INDEX jobs_starttime ON job (start_time);
-- GROUP 2: Cluster-prefixed (8 indexes)
-- Cluster + default sort, concurrent jobs, time range within cluster
CREATE INDEX jobs_cluster_starttime_duration ON job (cluster, start_time, duration);
-- Cluster + sort by duration
CREATE INDEX jobs_cluster_duration_starttime ON job (cluster, duration, start_time);
-- COVERING for cluster+state aggregation; running jobs (cluster, state, duration>?)
CREATE INDEX jobs_cluster_jobstate_duration_starttime ON job (cluster, job_state, duration, start_time);
-- Cluster+state+sort start_time (single state equality)
CREATE INDEX jobs_cluster_jobstate_starttime_duration ON job (cluster, job_state, start_time, duration);
-- COVERING for GROUP BY user with cluster filter
CREATE INDEX jobs_cluster_user ON job (cluster, hpc_user);
-- GROUP BY project with cluster filter
CREATE INDEX jobs_cluster_project ON job (cluster, project);
-- GROUP BY subcluster with cluster filter
CREATE INDEX jobs_cluster_subcluster ON job (cluster, subcluster);
-- Cluster + sort by num_nodes (state filtered per-row, fast with LIMIT)
CREATE INDEX jobs_cluster_numnodes ON job (cluster, num_nodes);
-- GROUP 3: User-prefixed (1 index)
-- Security filter (user role) + default sort
CREATE INDEX jobs_user_starttime_duration ON job (hpc_user, start_time, duration);
-- GROUP 4: Project-prefixed (1 index)
-- Security filter (manager role) + default sort
CREATE INDEX jobs_project_starttime_duration ON job (project, start_time, duration);
-- GROUP 5: JobState-prefixed (3 indexes)
-- State + project filter (for manager security within state query)
CREATE INDEX jobs_jobstate_project ON job (job_state, project);
-- State + user filter/aggregation
CREATE INDEX jobs_jobstate_user ON job (job_state, hpc_user);
-- COVERING for non-running jobs scan, state + sort duration
CREATE INDEX jobs_jobstate_duration_starttime ON job (job_state, duration, start_time);
-- GROUP 6: Rare filters (1 index)
-- Array job lookup
CREATE INDEX jobs_arrayjobid ON job (array_job_id);
-- GROUP 7: Secondary sort columns (5 indexes)
CREATE INDEX jobs_cluster_numhwthreads ON job (cluster, num_hwthreads);
CREATE INDEX jobs_cluster_numacc ON job (cluster, num_acc);
CREATE INDEX jobs_cluster_energy ON job (cluster, energy);
-- Cluster+partition + sort start_time
CREATE INDEX jobs_cluster_partition_starttime ON job (cluster, cluster_partition, start_time);
-- Cluster+partition+state filter
CREATE INDEX jobs_cluster_partition_jobstate ON job (cluster, cluster_partition, job_state);
-- Optimize DB index usage
PRAGMA optimize;

@@ -366,19 +366,23 @@ func (r *NodeRepository) QueryNodes(
cclog.Errorf("Error while running query '%s' %v: %v", queryString, queryVars, err)
return nil, err
}
defer rows.Close()
nodes := make([]*schema.Node, 0)
for rows.Next() {
node := schema.Node{}
if err := rows.Scan(&node.Hostname, &node.Cluster, &node.SubCluster,
&node.NodeState, &node.HealthState); err != nil {
rows.Close()
cclog.Warn("Error while scanning rows (QueryNodes)")
return nil, err
}
nodes = append(nodes, &node)
}
if err := rows.Err(); err != nil {
return nil, err
}
return nodes, nil
}
@@ -415,6 +419,7 @@ func (r *NodeRepository) QueryNodesWithMeta(
cclog.Errorf("Error while running query '%s' %v: %v", queryString, queryVars, err)
return nil, err
}
defer rows.Close()
nodes := make([]*schema.Node, 0)
for rows.Next() {
@@ -424,7 +429,6 @@ func (r *NodeRepository) QueryNodesWithMeta(
if err := rows.Scan(&node.Hostname, &node.Cluster, &node.SubCluster,
&node.NodeState, &node.HealthState, &RawMetaData, &RawMetricHealth); err != nil {
rows.Close()
cclog.Warn("Error while scanning rows (QueryNodes)")
return nil, err
}
@@ -454,6 +458,10 @@ func (r *NodeRepository) QueryNodesWithMeta(
nodes = append(nodes, &node)
}
if err := rows.Err(); err != nil {
return nil, err
}
return nodes, nil
}
@@ -545,10 +553,11 @@ func (r *NodeRepository) MapNodes(cluster string) (map[string]string, error) {
func (r *NodeRepository) CountStates(ctx context.Context, filters []*model.NodeFilter, column string) ([]*model.NodeStates, error) {
query, qerr := AccessCheck(ctx,
sq.Select(column).
sq.Select(column, "COUNT(*) as count").
From("node").
Join("node_state ON node_state.node_id = node.id").
Where(latestStateCondition()))
Where(latestStateCondition()).
GroupBy(column))
if qerr != nil {
return nil, qerr
}
@@ -561,23 +570,21 @@ func (r *NodeRepository) CountStates(ctx context.Context, filters []*model.NodeF
cclog.Errorf("Error while running query '%s' %v: %v", queryString, queryVars, err)
return nil, err
}
defer rows.Close()
stateMap := map[string]int{}
nodes := make([]*model.NodeStates, 0)
for rows.Next() {
var state string
if err := rows.Scan(&state); err != nil {
rows.Close()
var count int
if err := rows.Scan(&state, &count); err != nil {
cclog.Warn("Error while scanning rows (CountStates)")
return nil, err
}
stateMap[state] += 1
nodes = append(nodes, &model.NodeStates{State: state, Count: count})
}
nodes := make([]*model.NodeStates, 0)
for state, counts := range stateMap {
node := model.NodeStates{State: state, Count: counts}
nodes = append(nodes, &node)
if err := rows.Err(); err != nil {
return nil, err
}
return nodes, nil
@@ -623,6 +630,7 @@ func (r *NodeRepository) CountStatesTimed(ctx context.Context, filters []*model.
cclog.Errorf("Error while running query '%s' %v: %v", queryString, queryVars, err)
return nil, err
}
defer rows.Close()
rawData := make(map[string][][]int)
for rows.Next() {
@@ -630,7 +638,6 @@ func (r *NodeRepository) CountStatesTimed(ctx context.Context, filters []*model.
var timestamp, count int
if err := rows.Scan(&state, &timestamp, &count); err != nil {
rows.Close()
cclog.Warnf("Error while scanning rows (CountStatesTimed) at time '%d'", timestamp)
return nil, err
}
@@ -643,6 +650,10 @@ func (r *NodeRepository) CountStatesTimed(ctx context.Context, filters []*model.
rawData[state][1] = append(rawData[state][1], count)
}
if err := rows.Err(); err != nil {
return nil, err
}
timedStates := make([]*model.NodeStatesTimed, 0)
for state, data := range rawData {
entry := model.NodeStatesTimed{State: state, Times: data[0], Counts: data[1]}

@@ -230,11 +230,12 @@ func (r *JobRepository) JobsStatsGrouped(
query = query.Offset((uint64(page.Page) - 1) * limit).Limit(limit)
}
rows, err := query.RunWith(r.DB).Query()
rows, err := query.RunWith(r.DB).QueryContext(ctx)
if err != nil {
cclog.Warn("Error while querying DB for job statistics")
return nil, err
}
defer rows.Close()
stats := make([]*model.JobsStatistics, 0, 100)
@@ -320,6 +321,10 @@ func (r *JobRepository) JobsStatsGrouped(
}
}
if err := rows.Err(); err != nil {
return nil, err
}
cclog.Debugf("Timer JobsStatsGrouped %s", time.Since(start))
return stats, nil
}
@@ -350,7 +355,7 @@ func (r *JobRepository) JobsStats(
return nil, err
}
row := query.RunWith(r.DB).QueryRow()
row := query.RunWith(r.DB).QueryRowContext(ctx)
stats := make([]*model.JobsStatistics, 0, 1)
var jobs, users, walltime, nodes, nodeHours, cores, coreHours, accs, accHours sql.NullInt64
@@ -435,11 +440,12 @@ func (r *JobRepository) JobCountGrouped(
if err != nil {
return nil, err
}
rows, err := query.RunWith(r.DB).Query()
rows, err := query.RunWith(r.DB).QueryContext(ctx)
if err != nil {
cclog.Warn("Error while querying DB for job statistics")
return nil, err
}
defer rows.Close()
stats := make([]*model.JobsStatistics, 0, 100)
@@ -459,6 +465,10 @@ func (r *JobRepository) JobCountGrouped(
}
}
if err := rows.Err(); err != nil {
return nil, err
}
cclog.Debugf("Timer JobCountGrouped %s", time.Since(start))
return stats, nil
}
@@ -491,11 +501,12 @@ func (r *JobRepository) AddJobCountGrouped(
if err != nil {
return nil, err
}
rows, err := query.RunWith(r.DB).Query()
rows, err := query.RunWith(r.DB).QueryContext(ctx)
if err != nil {
cclog.Warn("Error while querying DB for job statistics")
return nil, err
}
defer rows.Close()
counts := make(map[string]int)
@@ -511,6 +522,10 @@ func (r *JobRepository) AddJobCountGrouped(
}
}
if err := rows.Err(); err != nil {
return nil, err
}
switch kind {
case "running":
for _, s := range stats {
@@ -550,23 +565,13 @@ func (r *JobRepository) AddJobCount(
if err != nil {
return nil, err
}
rows, err := query.RunWith(r.DB).Query()
if err != nil {
cclog.Warn("Error while querying DB for job statistics")
return nil, err
}
var count int
for rows.Next() {
var cnt sql.NullInt64
if err := rows.Scan(&cnt); err != nil {
cclog.Warn("Error while scanning rows")
if err := query.RunWith(r.DB).QueryRowContext(ctx).Scan(&cnt); err != nil {
cclog.Warn("Error while querying DB for job count")
return nil, err
}
count = int(cnt.Int64)
}
count := int(cnt.Int64)
switch kind {
case "running":
@@ -750,11 +755,12 @@ func (r *JobRepository) jobsStatisticsHistogram(
query = BuildWhereClause(f, query)
}
rows, err := query.GroupBy("value").RunWith(r.DB).Query()
rows, err := query.GroupBy("value").RunWith(r.DB).QueryContext(ctx)
if err != nil {
cclog.Error("Error while running query")
return nil, err
}
defer rows.Close()
points := make([]*model.HistoPoint, 0)
// is it possible to introduce zero values here? requires info about bincount
@@ -767,6 +773,11 @@ func (r *JobRepository) jobsStatisticsHistogram(
points = append(points, &point)
}
if err := rows.Err(); err != nil {
return nil, err
}
cclog.Debugf("Timer jobsStatisticsHistogram %s", time.Since(start))
return points, nil
}
@@ -818,11 +829,12 @@ func (r *JobRepository) jobsDurationStatisticsHistogram(
query = BuildWhereClause(f, query)
}
rows, err := query.GroupBy("value").RunWith(r.DB).Query()
rows, err := query.GroupBy("value").RunWith(r.DB).QueryContext(ctx)
if err != nil {
cclog.Error("Error while running query")
return nil, err
}
defer rows.Close()
// Match query results to pre-initialized bins.
// point.Value from query is the bin number; multiply by binSizeSeconds to match bin.Value.
@@ -841,6 +853,10 @@ func (r *JobRepository) jobsDurationStatisticsHistogram(
}
}
if err := rows.Err(); err != nil {
return nil, err
}
cclog.Debugf("Timer jobsStatisticsHistogram %s", time.Since(start))
return points, nil
}
@@ -921,14 +937,16 @@ func (r *JobRepository) jobsMetricStatisticsHistogram(
// Special case: value == peak would create bin N+1, so we test for equality
// and multiply peak by 0.999999999 to force it into bin N.
binQuery := fmt.Sprintf(`CAST(
((case when json_extract(footprint, '$.%s') = %f then %f*0.999999999 else json_extract(footprint, '$.%s') end) / %f)
((case when json_extract(footprint, "$.%s") = %f then %f*0.999999999 else json_extract(footprint, "$.%s") end) / %f)
* %v as INTEGER )`,
(metric + "_" + footprintStat), peak, peak, (metric + "_" + footprintStat), peak, *bins)
mainQuery := sq.Select(
fmt.Sprintf(`%s + 1 as bin`, binQuery),
`count(*) as count`,
).From("job").Where(fmt.Sprintf(`json_extract(footprint, '$.%s') is not null and json_extract(footprint, '$.%s') <= %f`, (metric + "_" + footprintStat), (metric + "_" + footprintStat), peak))
).From("job").Where(
"JSON_VALID(footprint)",
).Where(fmt.Sprintf(`json_extract(footprint, "$.%s") is not null and json_extract(footprint, "$.%s") <= %f`, (metric + "_" + footprintStat), (metric + "_" + footprintStat), peak))
mainQuery, qerr := SecurityCheck(ctx, mainQuery)
if qerr != nil {
@@ -941,11 +959,12 @@ func (r *JobRepository) jobsMetricStatisticsHistogram(
mainQuery = mainQuery.GroupBy("bin").OrderBy("bin")
rows, err := mainQuery.RunWith(r.DB).Query()
rows, err := mainQuery.RunWith(r.DB).QueryContext(ctx)
if err != nil {
cclog.Errorf("Error while running mainQuery: %s", err)
return nil, err
}
defer rows.Close()
// Pre-initialize bins with calculated min/max ranges.
// Example: peak=1000, bins=10 -> bin 1=[0,100), bin 2=[100,200), ..., bin 10=[900,1000]
@@ -974,6 +993,10 @@ func (r *JobRepository) jobsMetricStatisticsHistogram(
}
}
if err := rows.Err(); err != nil {
return nil, err
}
result := model.MetricHistoPoints{Metric: metric, Unit: unit, Stat: &footprintStat, Data: points}
cclog.Debugf("Timer jobsStatisticsHistogram %s", time.Since(start))

@@ -283,6 +283,7 @@ func (r *JobRepository) CountTags(user *schema.User) (tags []schema.Tag, counts
if err != nil {
return nil, nil, err
}
defer xrows.Close()
for xrows.Next() {
var t schema.Tag
@@ -300,6 +301,10 @@ func (r *JobRepository) CountTags(user *schema.User) (tags []schema.Tag, counts
}
}
if err := xrows.Err(); err != nil {
return nil, nil, err
}
// Query and Count Jobs with attached Tags
q := sq.Select("t.tag_type, t.tag_name, t.id, count(jt.tag_id)").
From("tag t").
@@ -334,6 +339,7 @@ func (r *JobRepository) CountTags(user *schema.User) (tags []schema.Tag, counts
if err != nil {
return nil, nil, err
}
defer rows.Close()
counts = make(map[string]int)
for rows.Next() {
@@ -511,6 +517,7 @@ func (r *JobRepository) GetTags(user *schema.User, job *int64) ([]*schema.Tag, e
cclog.Errorf("Error get tags with %s: %v", s, err)
return nil, err
}
defer rows.Close()
tags := make([]*schema.Tag, 0)
for rows.Next() {
@@ -529,6 +536,10 @@ func (r *JobRepository) GetTags(user *schema.User, job *int64) ([]*schema.Tag, e
}
}
if err := rows.Err(); err != nil {
return nil, err
}
return tags, nil
}
@@ -544,6 +555,7 @@ func (r *JobRepository) GetTagsDirect(job *int64) ([]*schema.Tag, error) {
cclog.Errorf("Error get tags with %s: %v", s, err)
return nil, err
}
defer rows.Close()
tags := make([]*schema.Tag, 0)
for rows.Next() {
@@ -555,6 +567,10 @@ func (r *JobRepository) GetTagsDirect(job *int64) ([]*schema.Tag, error) {
tags = append(tags, tag)
}
if err := rows.Err(); err != nil {
return nil, err
}
return tags, nil
}
@@ -582,6 +598,7 @@ func (r *JobRepository) getArchiveTags(job *int64) ([]*schema.Tag, error) {
cclog.Errorf("Error get tags with %s: %v", s, err)
return nil, err
}
defer rows.Close()
tags := make([]*schema.Tag, 0)
for rows.Next() {
@@ -593,6 +610,10 @@ func (r *JobRepository) getArchiveTags(job *int64) ([]*schema.Tag, error) {
tags = append(tags, tag)
}
if err := rows.Err(); err != nil {
return nil, err
}
return tags, nil
}

@@ -102,6 +102,7 @@ func (r *UserRepository) GetLdapUsernames() ([]string, error) {
cclog.Warn("Error while querying usernames")
return nil, err
}
defer rows.Close()
for rows.Next() {
var username string

@@ -10,6 +10,7 @@ import (
"encoding/json"
"fmt"
"maps"
"math"
"os"
"path/filepath"
"strings"
@@ -111,6 +112,29 @@ type JobClassTagger struct {
getMetricConfig func(cluster, subCluster string) map[string]*schema.Metric
}
// roundEnv returns a copy of env with all float64 values rounded to 2 decimal places.
// Nested map[string]any and map[string]float64 values are recursed into.
func roundEnv(env map[string]any) map[string]any {
rounded := make(map[string]any, len(env))
for k, v := range env {
switch val := v.(type) {
case float64:
rounded[k] = math.Round(val*100) / 100
case map[string]any:
rounded[k] = roundEnv(val)
case map[string]float64:
rm := make(map[string]float64, len(val))
for mk, mv := range val {
rm[mk] = math.Round(mv*100) / 100
}
rounded[k] = rm
default:
rounded[k] = v
}
}
return rounded
}
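The helper above can be exercised in isolation. A small sketch (the metric names and values are made up) showing what the hint template now receives instead of full-precision floats:

```go
package main

import (
	"fmt"
	"math"
)

// roundEnv, copied from the tagger change above: returns a copy of env with
// all float64 values (including those in nested maps) rounded to 2 decimals.
func roundEnv(env map[string]any) map[string]any {
	rounded := make(map[string]any, len(env))
	for k, v := range env {
		switch val := v.(type) {
		case float64:
			rounded[k] = math.Round(val*100) / 100
		case map[string]any:
			rounded[k] = roundEnv(val)
		case map[string]float64:
			rm := make(map[string]float64, len(val))
			for mk, mv := range val {
				rm[mk] = math.Round(mv*100) / 100
			}
			rounded[k] = rm
		default:
			rounded[k] = v
		}
	}
	return rounded
}

func main() {
	env := map[string]any{
		"flops_any": 123.456789,                          // hypothetical metric value
		"stats":     map[string]float64{"avg": 0.987654}, // nested map is recursed into
		"jobId":     42,                                  // non-float values pass through untouched
	}
	r := roundEnv(env)
	fmt.Println(r["flops_any"], r["stats"].(map[string]float64)["avg"], r["jobId"])
	// 123.46 0.99 42
}
```

Note that the original `env` is left unmodified, so rounding only affects the rendered hint message, not any later rule evaluation.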
func (t *JobClassTagger) prepareRule(b []byte, fns string) {
var rule RuleFormat
if err := json.NewDecoder(bytes.NewReader(b)).Decode(&rule); err != nil {
@@ -408,7 +432,7 @@ func (t *JobClassTagger) Match(job *schema.Job) {
// process hint template
var msg bytes.Buffer
if err := ri.hint.Execute(&msg, env); err != nil {
if err := ri.hint.Execute(&msg, roundEnv(env)); err != nil {
cclog.Errorf("Template error: %s", err.Error())
continue
}

@@ -20,14 +20,14 @@
"wonka": "^6.3.5"
},
"devDependencies": {
"@rollup/plugin-commonjs": "^29.0.1",
"@rollup/plugin-commonjs": "^29.0.2",
"@rollup/plugin-node-resolve": "^16.0.3",
"@rollup/plugin-terser": "^1.0.0",
"@timohausmann/quadtree-js": "^1.2.6",
"rollup": "^4.59.0",
"rollup-plugin-css-only": "^4.5.5",
"rollup-plugin-svelte": "^7.2.3",
"svelte": "^5.53.7"
"svelte": "^5.53.9"
}
},
"node_modules/@0no-co/graphql.web": {
@@ -126,9 +126,9 @@
}
},
"node_modules/@rollup/plugin-commonjs": {
"version": "29.0.1",
"resolved": "https://registry.npmjs.org/@rollup/plugin-commonjs/-/plugin-commonjs-29.0.1.tgz",
"integrity": "sha512-VUEHINN2rQEWPfNUR3mzidRObM1XZKXMQsaG6qBlDqd6M1qyw91nDZvcSozgyjt3x/QKrgKBc5MdxfdxAy6tdg==",
"version": "29.0.2",
"resolved": "https://registry.npmjs.org/@rollup/plugin-commonjs/-/plugin-commonjs-29.0.2.tgz",
"integrity": "sha512-S/ggWH1LU7jTyi9DxZOKyxpVd4hF/OZ0JrEbeLjXk/DFXwRny0tjD2c992zOUYQobLrVkRVMDdmHP16HKP7GRg==",
"dev": true,
"license": "MIT",
"dependencies": {
@@ -1193,9 +1193,9 @@
}
},
"node_modules/svelte": {
"version": "5.53.7",
"resolved": "https://registry.npmjs.org/svelte/-/svelte-5.53.7.tgz",
"integrity": "sha512-uxck1KI7JWtlfP3H6HOWi/94soAl23jsGJkBzN2BAWcQng0+lTrRNhxActFqORgnO9BHVd1hKJhG+ljRuIUWfQ==",
"version": "5.53.9",
"resolved": "https://registry.npmjs.org/svelte/-/svelte-5.53.9.tgz",
"integrity": "sha512-MwDfWsN8qZzeP0jlQsWF4k/4B3csb3IbzCRggF+L/QqY7T8bbKvnChEo1cPZztF51HJQhilDbevWYl2LvXbquA==",
"license": "MIT",
"dependencies": {
"@jridgewell/remapping": "^2.3.4",

@@ -7,14 +7,14 @@
"dev": "rollup -c -w"
},
"devDependencies": {
"@rollup/plugin-commonjs": "^29.0.1",
"@rollup/plugin-commonjs": "^29.0.2",
"@rollup/plugin-node-resolve": "^16.0.3",
"@rollup/plugin-terser": "^1.0.0",
"@timohausmann/quadtree-js": "^1.2.6",
"rollup": "^4.59.0",
"rollup-plugin-css-only": "^4.5.5",
"rollup-plugin-svelte": "^7.2.3",
"svelte": "^5.53.7"
"svelte": "^5.53.9"
},
"dependencies": {
"@rollup/plugin-replace": "^6.0.3",

@@ -167,7 +167,7 @@
<InputGroup>
<InputGroupText><Icon name="hdd" /></InputGroupText>
<InputGroupText>Selected Node</InputGroupText>
<Input style="background-color: white;" type="text" value="{hostname} [{cluster} {$nodeMetricsData?.data ? `(${$nodeMetricsData.data.nodeMetrics[0].subCluster})` : ''}]" disabled/>
<Input style="background-color: white;" type="text" value="{hostname} [{cluster} {$nodeMetricsData?.data?.nodeMetrics[0] ? `(${$nodeMetricsData.data.nodeMetrics[0].subCluster})` : ''}]" disabled/>
</InputGroup>
</Col>
<!-- State Col -->
@@ -259,7 +259,7 @@
</CardHeader>
<CardBody>
<p>No dataset(s) returned for <b>{item.name}</b></p>
<p class="mb-1">Metric has been disabled for subcluster <b>{$nodeMetricsData.data.nodeMetrics[0].subCluster}</b>.</p>
<p class="mb-1">Metric has been disabled for subcluster <b>{$nodeMetricsData?.data?.nodeMetrics[0]?.subCluster}</b>.</p>
</CardBody>
</Card>
{:else if item?.metric}
@@ -267,7 +267,7 @@
metric={item.name}
timestep={item.metric.timestep}
cluster={clusterInfos.find((c) => c.name == cluster)}
subCluster={$nodeMetricsData.data.nodeMetrics[0].subCluster}
subCluster={$nodeMetricsData?.data?.nodeMetrics[0]?.subCluster}
series={item.metric.series}
enableFlip
forNode
@@ -286,17 +286,17 @@
{/snippet}
<PlotGrid
items={$nodeMetricsData.data.nodeMetrics[0].metrics
items={$nodeMetricsData?.data?.nodeMetrics[0]?.metrics
.map((m) => ({
...m,
availability: checkMetricAvailability(
globalMetrics,
m.name,
cluster,
$nodeMetricsData.data.nodeMetrics[0].subCluster,
$nodeMetricsData?.data?.nodeMetrics[0]?.subCluster,
),
}))
.sort((a, b) => a.name.localeCompare(b.name))}
.sort((a, b) => a.name.localeCompare(b.name)) || []}
itemsPerRow={ccconfig.plotConfiguration_plotsPerRow}
{gridContent}
/>