15 Commits

Author SHA1 Message Date
96fc44a649 fix: Optimize project stat query 2026-03-13 06:06:38 +01:00
8e86e8720d Make stats query selective. Add stats index. Add paging to user list. (Entire-Checkpoint: d42431eee30d) 2026-03-12 20:16:55 +01:00
4555fb8a86 Merge branch 'hotfix' of github.com:ClusterCockpit/cc-backend into hotfix 2026-03-12 20:15:54 +01:00
0e27624d73 Add flag to optimize db. Remove ANALYZE on startup. (Entire-Checkpoint: d49917ff4b10) 2026-03-12 20:12:49 +01:00
Christoph Kluge 8563ed5e08 fix: remove indices from migration 9 (optimization migration 11 drops these indices, so rather not create them in the first place) 2026-03-12 14:45:45 +01:00
Christoph Kluge 2d07bdf6b5 fix: add missing nullsafe in publicDash 2026-03-12 14:13:45 +01:00
7f069f1ec1 Prepare bugfix release 1.5.1 (Entire-Checkpoint: 15cc90a0347a) 2026-03-12 06:40:36 +01:00
2506a92cdf Remove entire log 2026-03-12 06:14:11 +01:00
Christoph Kluge 972b14033a add db migration 11, optimizing index count 2026-03-11 16:07:29 +01:00
af78f06ced fix: Reduce complexity for groupBy stats queries (Entire-Checkpoint: fc899a70a751) 2026-03-11 15:14:59 +01:00
6e0fe62566 Add new db config options to README 2026-03-11 14:30:41 +01:00
e70310dcbc fix: Segfault when taggers are enabled but rule directories missing 2026-03-11 11:15:08 +01:00
00d2f97c4c fix: Large heap allocations in sqlite driver. Sanitize sqlite config and make it configurable. Allow queries to be cancelled. 2026-03-11 11:14:37 +01:00
c8d8f7084a Merge branch 'hotfix' of github.com:ClusterCockpit/cc-backend into hotfix 2026-03-11 07:50:55 +01:00
dc7407d0f0 fix: prevent segfault if enable-job-taggers option is true but tagger config directories are missing (Entire-Checkpoint: 9ec86e3669e1) 2026-03-11 07:50:53 +01:00
20 changed files with 338 additions and 654 deletions

File diff suppressed because one or more lines are too long

View File

@@ -1 +1 @@
What optimal index selection do you suggest? The job table contains almost 20 million jobs in production, causing timeouts on any filter query that does not use an index.
There must also be bugs in jobQuery.go. In particular, the following query triggers the memory leak: SELECT * FROM job WHERE job.job_state IN ("completed", "running", "failed") ORDER BY job.start_time DESC LIMIT 1 OFFSET 10; Dig deeper to find the cause. Also investigate why no existing index is used for this query.
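As a first step, one can ask SQLite directly which plan it picks for that exact query, before and after ANALYZE. A minimal sketch, assuming the mattn/go-sqlite3 driver and a local database file at ./var/job.db (both the driver and the path are assumptions for illustration):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // driver choice is an assumption
)

// explain prints SQLite's plan for the query that triggers the memory leak.
func explain(db *sql.DB, label string) {
	rows, err := db.Query(`EXPLAIN QUERY PLAN
		SELECT * FROM job
		WHERE job.job_state IN ('completed', 'running', 'failed')
		ORDER BY job.start_time DESC LIMIT 1 OFFSET 10`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	fmt.Println(label)
	for rows.Next() {
		var id, parent, notUsed int
		var detail string
		if err := rows.Scan(&id, &parent, &notUsed, &detail); err != nil {
			log.Fatal(err)
		}
		// Look for "USE TEMP B-TREE FOR ORDER BY" versus a reverse index scan.
		fmt.Println("  ", detail)
	}
}

func main() {
	db, err := sql.Open("sqlite3", "./var/job.db") // path is an assumption
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	explain(db, "plan without statistics:")
	if _, err := db.Exec("ANALYZE"); err != nil {
		log.Fatal(err)
	}
	explain(db, "plan after ANALYZE:")
}
```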

View File

@@ -1,9 +1,27 @@
The plan reduces 79 indexes to 20 by:
- **Eliminating entire categories**: all `shared`-prefixed (16), all user/project sort variants (11), all standalone single-filter sorts (11)
- **Consolidating**: cluster+jobstate sort variants replaced by `(cluster, sort_col)`, which works for any state combination with LIMIT
- **Keeping what matters**: the indexes SQLite actually picks with ANALYZE — `jobs_starttime`, `jobs_cluster_starttime_duration`, `jobs_user_starttime_duration`, `jobs_cluster_jobstate_duration_starttime`, and a few secondary sort indexes

Key trade-off: ~20% of queries that sort by rare columns (num_hwthreads, num_acc, energy) with a state filter will now do a cheap per-row state check instead of using a 3-column composite. With LIMIT this is negligible.

All tests pass. Here's a summary of findings and fixes:

---

## Findings

### Bug 1: Missing `defer rows.Close()` in `user.go:GetLdapUsernames()` (line 100)
**Fixed.** Every call leaked a database connection and its buffers.

### Bug 2: SQLite never runs `ANALYZE` — wrong index selection
**Fixed.** Added `ANALYZE` to `setupSqlite()` in `dbConnection.go`.
Without statistics in `sqlite_stat1`, SQLite's query planner **guessed** and chose `jobs_jobstate_energy` (a terrible index for this query) instead of scanning `jobs_starttime` in reverse. The result:

| | Before ANALYZE | After ANALYZE |
|---|---|---|
| Index used | `jobs_jobstate_energy` | `jobs_starttime` |
| Sort | **TEMP B-TREE** (materializes ALL rows) | None (index order) |
| Memory | Proportional to total matching rows | Constant |
| I/O | Full scan of all matching rows | Stops at OFFSET+LIMIT |

### Bug 3: `IN` clause + `ORDER BY` is fundamentally incompatible with composite indexes
Even with the "correct" index `(job_state, start_time)`, SQLite **cannot** merge-sort across 3 separate index range scans for `IN ('completed','running','failed')`. It always falls back to a temp B-tree sort. The only efficient plan is to use the standalone `jobs_starttime` index — which SQLite does automatically **after ANALYZE** because it realizes the 3 states cover virtually all rows, making the WHERE clause nearly a no-op.

### Observation: 79 indexes on the `job` table
This is excessive and actively harmful — it confuses the query planner (especially without ANALYZE) and slows writes. The `jobs_jobstate_starttime` index from migration 08 is also missing from the actual DB (only the 3-column `jobs_jobstate_starttime_duration` exists). This is worth investigating separately, but it is a schema/migration concern, not a code bug.
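For Bug 1 above, the leak-free pattern is worth spelling out. A minimal sketch, assuming a plain database/sql handle and an illustrative query (the actual signature and SQL in `user.go` differ):

```go
package repository

import (
	"context"
	"database/sql"
)

// getLdapUsernames sketches the fixed pattern: the result set is always
// closed, so the connection returns to the pool even on early return.
// The query and the signature are assumptions for illustration.
func getLdapUsernames(ctx context.Context, db *sql.DB) ([]string, error) {
	rows, err := db.QueryContext(ctx, `SELECT username FROM hpc_user WHERE ldap = 1`)
	if err != nil {
		return nil, err
	}
	defer rows.Close() // the missing line: without it, every call leaked the result set

	var usernames []string
	for rows.Next() {
		var u string
		if err := rows.Scan(&u); err != nil {
			return nil, err
		}
		usernames = append(usernames, u)
	}
	return usernames, rows.Err()
}
```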

View File

@@ -1,6 +1,6 @@
TARGET = ./cc-backend
FRONTEND = ./web/frontend
VERSION = 1.5.0
VERSION = 1.5.1
GIT_HASH := $(shell git rev-parse --short HEAD || echo 'development')
CURRENT_TIME = $(shell date +"%Y-%m-%d:T%H:%M:%S")
LD_FLAGS = '-s -X main.date=${CURRENT_TIME} -X main.version=${VERSION} -X main.commit=${GIT_HASH}'

View File

@@ -1,11 +1,43 @@
# `cc-backend` version 1.5.0
# `cc-backend` version 1.5.1
Supports job archive version 3 and database version 10.
Supports job archive version 3 and database version 12.
This is a feature release of `cc-backend`, the API backend and frontend
This is a bugfix release of `cc-backend`, the API backend and frontend
implementation of ClusterCockpit.
For release-specific notes visit the [ClusterCockpit Documentation](https://clustercockpit.org/docs/release/).
## Changes in 1.5.1
### Database
- **New migration (version 11)**: Optimized database index count for better performance
- **Database optimization flag**: New `-optimize-db` flag runs `VACUUM` and `ANALYZE` to refresh query planner statistics (automatic `ANALYZE` on startup was dropped in favor of this flag)
- **SQLite configuration hardening**: Sanitized SQLite configuration with new configurable options; fixes large heap allocations in the SQLite driver
- **Query cancellation**: Long-running database queries can now be cancelled
- **Resource leak fix**: Added missing `defer Close()` calls for all query result sets
### Bug fixes
- **Segfault when taggers misconfigured**: Fixed crash when `enable-job-taggers` is set but tagger rule directories are missing
- **GroupBy stats query complexity**: Reduced complexity for `groupBy` statistics queries
- **Ranged filter conditions**: Fixed GT and LT conditions in ranged filters
- **Energy filter preset**: Reduced energy filter preset to a more practical default
- **JSON validity check**: Fixed wrong field being checked for JSON validity
- **Tagger float rounding**: Fixed rounding of floats in tagger messages
- **Node view null safety**: Added null-safe checks in node view to prevent runtime errors
### Frontend
- **Bumped patch versions**: Updated frontend dependencies to latest patch versions
### Documentation
- **New DB config options**: Added new database configuration options to README
---
*The sections below document all features and changes introduced in the 1.5.0 major release, which 1.5.1 is based on.*
## Breaking changes
### Configuration changes
@@ -34,7 +66,7 @@ For release specific notes visit the [ClusterCockpit Documentation](https://clus
### Dependency changes
- **cc-lib v2.5.1**: Switched to cc-lib version 2 with updated APIs (currently at v2.5.1)
- **cc-lib v2.8.0**: Switched to cc-lib version 2 with updated APIs
- **cclib NATS client**: Now using the cclib NATS client implementation
- Removed obsolete `util.Float` usage from cclib

View File

@@ -11,7 +11,7 @@ import "flag"
var (
flagReinitDB, flagInit, flagServer, flagSyncLDAP, flagGops, flagMigrateDB, flagRevertDB,
flagForceDB, flagDev, flagVersion, flagLogDateTime, flagApplyTags bool
flagForceDB, flagDev, flagVersion, flagLogDateTime, flagApplyTags, flagOptimizeDB bool
flagNewUser, flagDelUser, flagGenJWT, flagConfigFile, flagImportJob, flagLogLevel string
)
@@ -27,6 +27,7 @@ func cliInit() {
flag.BoolVar(&flagRevertDB, "revert-db", false, "Migrate database to previous version and exit")
flag.BoolVar(&flagApplyTags, "apply-tags", false, "Run taggers on all completed jobs and exit")
flag.BoolVar(&flagForceDB, "force-db", false, "Force database version, clear dirty flag and exit")
flag.BoolVar(&flagOptimizeDB, "optimize-db", false, "Optimize database: run VACUUM to reclaim space, then ANALYZE to update query planner statistics")
flag.BoolVar(&flagLogDateTime, "logdate", false, "Set this flag to add date and time to log messages")
flag.StringVar(&flagConfigFile, "config", "./config.json", "Specify alternative path to `config.json`")
flag.StringVar(&flagNewUser, "add-user", "", "Add a new user. Argument format: <username>:[admin,support,manager,api,user]:<password>")

View File

@@ -509,6 +509,20 @@ func run() error {
return err
}
// Optimize database if requested
if flagOptimizeDB {
db := repository.GetConnection()
cclog.Print("Running VACUUM to reclaim space and defragment database...")
if _, err := db.DB.Exec("VACUUM"); err != nil {
return fmt.Errorf("VACUUM failed: %w", err)
}
cclog.Print("Running ANALYZE to update query planner statistics...")
if _, err := db.DB.Exec("ANALYZE"); err != nil {
return fmt.Errorf("ANALYZE failed: %w", err)
}
cclog.Exitf("OptimizeDB Success: Database '%s' optimized (VACUUM + ANALYZE).\n", config.Keys.DB)
}
// Handle user commands (add, delete, sync, JWT)
if err := handleUserCommands(); err != nil {
return err

View File

@@ -645,6 +645,7 @@ func (r *queryResolver) Jobs(ctx context.Context, filter []*model.JobFilter, pag
// JobsStatistics is the resolver for the jobsStatistics field.
func (r *queryResolver) JobsStatistics(ctx context.Context, filter []*model.JobFilter, metrics []string, page *model.PageRequest, sortBy *model.SortByAggregate, groupBy *model.Aggregate, numDurationBins *string, numMetricBins *int) ([]*model.JobsStatistics, error) {
startOverall := time.Now()
var err error
var stats []*model.JobsStatistics
@@ -652,31 +653,50 @@ func (r *queryResolver) JobsStatistics(ctx context.Context, filter []*model.JobF
defaultDurationBins := "1h"
defaultMetricBins := 10
if requireField(ctx, "totalJobs") || requireField(ctx, "totalUsers") || requireField(ctx, "totalWalltime") || requireField(ctx, "totalNodes") || requireField(ctx, "totalCores") ||
requireField(ctx, "totalAccs") || requireField(ctx, "totalNodeHours") || requireField(ctx, "totalCoreHours") || requireField(ctx, "totalAccHours") {
// Build requested fields map for selective column computation
statsFields := []string{"totalJobs", "totalUsers", "totalWalltime", "totalNodes", "totalCores",
"totalAccs", "totalNodeHours", "totalCoreHours", "totalAccHours", "runningJobs", "shortJobs"}
reqFields := make(map[string]bool, len(statsFields))
fetchedMainStats := false
for _, f := range statsFields {
if requireField(ctx, f) {
reqFields[f] = true
if f != "runningJobs" && f != "shortJobs" {
fetchedMainStats = true
}
}
}
if fetchedMainStats {
if groupBy == nil {
stats, err = r.Repo.JobsStats(ctx, filter)
stats, err = r.Repo.JobsStats(ctx, filter, reqFields)
} else {
stats, err = r.Repo.JobsStatsGrouped(ctx, filter, page, sortBy, groupBy)
startGrouped := time.Now()
stats, err = r.Repo.JobsStatsGrouped(ctx, filter, page, sortBy, groupBy, reqFields)
cclog.Infof("Timer JobsStatsGrouped call: %s", time.Since(startGrouped))
}
} else {
stats = make([]*model.JobsStatistics, 0, 1)
stats = append(stats, &model.JobsStatistics{})
}
if groupBy != nil {
if requireField(ctx, "shortJobs") {
stats, err = r.Repo.AddJobCountGrouped(ctx, filter, groupBy, stats, "short")
}
if requireField(ctx, "runningJobs") {
stats, err = r.Repo.AddJobCountGrouped(ctx, filter, groupBy, stats, "running")
}
} else {
if requireField(ctx, "shortJobs") {
stats, err = r.Repo.AddJobCount(ctx, filter, stats, "short")
}
if requireField(ctx, "runningJobs") {
stats, err = r.Repo.AddJobCount(ctx, filter, stats, "running")
// runningJobs and shortJobs are already inlined in JobsStats/JobsStatsGrouped.
// Only run separate count queries if main stats were not fetched.
if !fetchedMainStats {
if groupBy != nil {
if requireField(ctx, "shortJobs") {
stats, err = r.Repo.AddJobCountGrouped(ctx, filter, groupBy, stats, "short")
}
if requireField(ctx, "runningJobs") {
stats, err = r.Repo.AddJobCountGrouped(ctx, filter, groupBy, stats, "running")
}
} else {
if requireField(ctx, "shortJobs") {
stats, err = r.Repo.AddJobCount(ctx, filter, stats, "short")
}
if requireField(ctx, "runningJobs") {
stats, err = r.Repo.AddJobCount(ctx, filter, stats, "running")
}
}
}
@@ -716,6 +736,7 @@ func (r *queryResolver) JobsStatistics(ctx context.Context, filter []*model.JobF
}
}
cclog.Infof("Timer JobsStatistics overall: %s", time.Since(startOverall))
return stats, nil
}

View File

@@ -57,7 +57,7 @@ func DefaultConfig() *RepositoryConfig {
MaxIdleConnections: 4,
ConnectionMaxLifetime: time.Hour,
ConnectionMaxIdleTime: 10 * time.Minute,
MinRunningJobDuration: 600, // 10 minutes
MinRunningJobDuration: 600, // 10 minutes
DbCacheSizeMB: 2048, // 2GB per connection
DbSoftHeapLimitMB: 16384, // 16GB process-wide
}

View File

@@ -49,14 +49,6 @@ func setupSqlite(db *sql.DB, cfg *RepositoryConfig) error {
}
}
// Update query planner statistics so SQLite picks optimal indexes.
// Without this, SQLite guesses row distributions and often chooses wrong
// indexes for queries with IN clauses + ORDER BY, causing full table sorts
// in temp B-trees instead of using covering indexes.
if _, err := db.Exec("ANALYZE"); err != nil {
cclog.Warnf("Failed to run ANALYZE: %v", err)
}
return nil
}

View File

@@ -21,11 +21,12 @@ import (
// is added to internal/repository/migrations/sqlite3/.
//
// Version history:
// - Version 11: Optimize job table indexes (reduce from ~78 to 20)
// - Version 12: Add covering indexes for stats queries (cluster, hpc_user, start_time, ...)
// - Version 11: Optimize job table indexes (reduce from ~78 to 48)
// - Version 10: Node table
//
// Migration files are embedded at build time from the migrations directory.
const Version uint = 11
const Version uint = 12
//go:embed migrations/*
var migrationFiles embed.FS

View File

@@ -139,12 +139,6 @@ CREATE INDEX IF NOT EXISTS jobs_cluster_partition_project ON job (cluster, clust
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_jobstate ON job (cluster, cluster_partition, job_state);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_shared ON job (cluster, cluster_partition, shared);
-- Cluster+Partition Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numnodes ON job (cluster, cluster_partition, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numhwthreads ON job (cluster, cluster_partition, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numacc ON job (cluster, cluster_partition, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_energy ON job (cluster, cluster_partition, energy);
-- Cluster+Partition Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_duration_starttime ON job (cluster, cluster_partition, duration, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_starttime_duration ON job (cluster, cluster_partition, start_time, duration);
@@ -152,11 +146,6 @@ CREATE INDEX IF NOT EXISTS jobs_cluster_partition_starttime_duration ON job (clu
-- Cluster+JobState Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_user ON job (cluster, job_state, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_project ON job (cluster, job_state, project);
-- Cluster+JobState Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numnodes ON job (cluster, job_state, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numhwthreads ON job (cluster, job_state, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numacc ON job (cluster, job_state, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_energy ON job (cluster, job_state, energy);
-- Cluster+JobState Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_starttime_duration ON job (cluster, job_state, start_time, duration);
@@ -165,34 +154,18 @@ CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_duration_starttime ON job (clus
-- Cluster+Shared Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_user ON job (cluster, shared, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_project ON job (cluster, shared, project);
-- Cluster+Shared Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numnodes ON job (cluster, shared, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numhwthreads ON job (cluster, shared, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numacc ON job (cluster, shared, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_energy ON job (cluster, shared, energy);
-- Cluster+Shared Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_starttime_duration ON job (cluster, shared, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_duration_starttime ON job (cluster, shared, duration, start_time);
-- User Filter
-- User Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_user_numnodes ON job (hpc_user, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_user_numhwthreads ON job (hpc_user, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_user_numacc ON job (hpc_user, num_acc);
CREATE INDEX IF NOT EXISTS jobs_user_energy ON job (hpc_user, energy);
-- User Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_user_starttime_duration ON job (hpc_user, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_user_duration_starttime ON job (hpc_user, duration, start_time);
-- Project Filter
CREATE INDEX IF NOT EXISTS jobs_project_user ON job (project, hpc_user);
-- Project Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_project_numnodes ON job (project, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_project_numhwthreads ON job (project, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_project_numacc ON job (project, num_acc);
CREATE INDEX IF NOT EXISTS jobs_project_energy ON job (project, energy);
-- Project Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_project_starttime_duration ON job (project, start_time, duration);
@@ -201,11 +174,6 @@ CREATE INDEX IF NOT EXISTS jobs_project_duration_starttime ON job (project, dura
-- JobState Filter
CREATE INDEX IF NOT EXISTS jobs_jobstate_user ON job (job_state, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_jobstate_project ON job (job_state, project);
-- JobState Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_jobstate_numnodes ON job (job_state, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_jobstate_numhwthreads ON job (job_state, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_jobstate_numacc ON job (job_state, num_acc);
CREATE INDEX IF NOT EXISTS jobs_jobstate_energy ON job (job_state, energy);
-- JobState Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_jobstate_starttime_duration ON job (job_state, start_time, duration);
@@ -214,11 +182,6 @@ CREATE INDEX IF NOT EXISTS jobs_jobstate_duration_starttime ON job (job_state, d
-- Shared Filter
CREATE INDEX IF NOT EXISTS jobs_shared_user ON job (shared, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_shared_project ON job (shared, project);
-- Shared Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_shared_numnodes ON job (shared, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_shared_numhwthreads ON job (shared, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_shared_numacc ON job (shared, num_acc);
CREATE INDEX IF NOT EXISTS jobs_shared_energy ON job (shared, energy);
-- Shared Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_shared_starttime_duration ON job (shared, start_time, duration);
@@ -226,7 +189,6 @@ CREATE INDEX IF NOT EXISTS jobs_shared_duration_starttime ON job (shared, durati
-- ArrayJob Filter
CREATE INDEX IF NOT EXISTS jobs_arrayjobid_starttime ON job (array_job_id, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_arrayjobid_starttime ON job (cluster, array_job_id, start_time);
-- Single filters with default starttime sorting
CREATE INDEX IF NOT EXISTS jobs_duration_starttime ON job (duration, start_time);
@@ -244,7 +206,6 @@ CREATE INDEX IF NOT EXISTS jobs_energy_duration ON job (energy, duration);
-- Backup Indices For High Variety Columns
CREATE INDEX IF NOT EXISTS jobs_starttime ON job (start_time);
CREATE INDEX IF NOT EXISTS jobs_duration ON job (duration);
-- Notes:
-- Cluster+Partition+Jobstate Filter: Tested -> Full Array Of Combinations non-required

View File

@@ -1,160 +1,55 @@
-- Migration 11 DOWN: Restore all indexes from migration 09
-- Reverts the index optimization by dropping the 20 optimized indexes
-- and recreating the original full set.
-- Migration 11 DOWN: Restore indexes from migration 09
-- ============================================================
-- Drop optimized indexes
-- Recreate all removed indexes from migration 09
-- ============================================================
DROP INDEX IF EXISTS jobs_starttime;
DROP INDEX IF EXISTS jobs_cluster_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_jobstate_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_jobstate_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_user;
DROP INDEX IF EXISTS jobs_cluster_project;
DROP INDEX IF EXISTS jobs_cluster_subcluster;
DROP INDEX IF EXISTS jobs_cluster_numnodes;
DROP INDEX IF EXISTS jobs_user_starttime_duration;
DROP INDEX IF EXISTS jobs_project_starttime_duration;
DROP INDEX IF EXISTS jobs_jobstate_project;
DROP INDEX IF EXISTS jobs_jobstate_user;
DROP INDEX IF EXISTS jobs_jobstate_duration_starttime;
DROP INDEX IF EXISTS jobs_arrayjobid;
DROP INDEX IF EXISTS jobs_cluster_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_numacc;
DROP INDEX IF EXISTS jobs_cluster_energy;
DROP INDEX IF EXISTS jobs_cluster_partition_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate;
-- ============================================================
-- Recreate all indexes from migration 09
-- ============================================================
-- Cluster Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_user ON job (cluster, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_project ON job (cluster, project);
CREATE INDEX IF NOT EXISTS jobs_cluster_subcluster ON job (cluster, subcluster);
-- Cluster Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_numnodes ON job (cluster, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_numhwthreads ON job (cluster, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_numacc ON job (cluster, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_energy ON job (cluster, energy);
-- Cluster Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_duration_starttime ON job (cluster, duration, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_starttime_duration ON job (cluster, start_time, duration);
-- Cluster+Partition Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_user ON job (cluster, cluster_partition, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_project ON job (cluster, cluster_partition, project);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_jobstate ON job (cluster, cluster_partition, job_state);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_shared ON job (cluster, cluster_partition, shared);
-- Cluster+Partition Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numnodes ON job (cluster, cluster_partition, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numhwthreads ON job (cluster, cluster_partition, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_numacc ON job (cluster, cluster_partition, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_energy ON job (cluster, cluster_partition, energy);
-- Cluster+Partition Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_duration_starttime ON job (cluster, cluster_partition, duration, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_partition_starttime_duration ON job (cluster, cluster_partition, start_time, duration);
-- Cluster+JobState Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_user ON job (cluster, job_state, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_project ON job (cluster, job_state, project);
-- Cluster+JobState Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numnodes ON job (cluster, job_state, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numhwthreads ON job (cluster, job_state, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_numacc ON job (cluster, job_state, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_energy ON job (cluster, job_state, energy);
-- Cluster+JobState Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_starttime_duration ON job (cluster, job_state, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_cluster_jobstate_duration_starttime ON job (cluster, job_state, duration, start_time);
-- Cluster+Shared Filter
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_user ON job (cluster, shared, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_project ON job (cluster, shared, project);
-- Cluster+Shared Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numnodes ON job (cluster, shared, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numhwthreads ON job (cluster, shared, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_numacc ON job (cluster, shared, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_energy ON job (cluster, shared, energy);
-- Cluster+Shared Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_starttime_duration ON job (cluster, shared, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_cluster_shared_duration_starttime ON job (cluster, shared, duration, start_time);
-- User Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_user_numnodes ON job (hpc_user, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_user_numhwthreads ON job (hpc_user, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_user_numacc ON job (hpc_user, num_acc);
CREATE INDEX IF NOT EXISTS jobs_user_energy ON job (hpc_user, energy);
-- User Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_user_starttime_duration ON job (hpc_user, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_user_duration_starttime ON job (hpc_user, duration, start_time);
-- Project Filter
CREATE INDEX IF NOT EXISTS jobs_project_user ON job (project, hpc_user);
-- Project Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_project_numnodes ON job (project, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_project_numhwthreads ON job (project, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_project_numacc ON job (project, num_acc);
CREATE INDEX IF NOT EXISTS jobs_project_energy ON job (project, energy);
-- Project Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_project_starttime_duration ON job (project, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_project_duration_starttime ON job (project, duration, start_time);
-- JobState Filter
CREATE INDEX IF NOT EXISTS jobs_jobstate_user ON job (job_state, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_jobstate_project ON job (job_state, project);
-- JobState Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_jobstate_numnodes ON job (job_state, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_jobstate_numhwthreads ON job (job_state, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_jobstate_numacc ON job (job_state, num_acc);
CREATE INDEX IF NOT EXISTS jobs_jobstate_energy ON job (job_state, energy);
-- JobState Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_jobstate_starttime_duration ON job (job_state, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_jobstate_duration_starttime ON job (job_state, duration, start_time);
-- Shared Filter
CREATE INDEX IF NOT EXISTS jobs_shared_user ON job (shared, hpc_user);
CREATE INDEX IF NOT EXISTS jobs_shared_project ON job (shared, project);
-- Shared Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_shared_numnodes ON job (shared, num_nodes);
CREATE INDEX IF NOT EXISTS jobs_shared_numhwthreads ON job (shared, num_hwthreads);
CREATE INDEX IF NOT EXISTS jobs_shared_numacc ON job (shared, num_acc);
CREATE INDEX IF NOT EXISTS jobs_shared_energy ON job (shared, energy);
-- Shared Time Filter Sorting
CREATE INDEX IF NOT EXISTS jobs_shared_starttime_duration ON job (shared, start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_shared_duration_starttime ON job (shared, duration, start_time);
-- ArrayJob Filter
CREATE INDEX IF NOT EXISTS jobs_arrayjobid_starttime ON job (array_job_id, start_time);
CREATE INDEX IF NOT EXISTS jobs_cluster_arrayjobid_starttime ON job (cluster, array_job_id, start_time);
-- Single filters with default starttime sorting
CREATE INDEX IF NOT EXISTS jobs_duration_starttime ON job (duration, start_time);
CREATE INDEX IF NOT EXISTS jobs_numnodes_starttime ON job (num_nodes, start_time);
CREATE INDEX IF NOT EXISTS jobs_numhwthreads_starttime ON job (num_hwthreads, start_time);
CREATE INDEX IF NOT EXISTS jobs_numacc_starttime ON job (num_acc, start_time);
CREATE INDEX IF NOT EXISTS jobs_energy_starttime ON job (energy, start_time);
-- Single filters with duration sorting
CREATE INDEX IF NOT EXISTS jobs_starttime_duration ON job (start_time, duration);
CREATE INDEX IF NOT EXISTS jobs_numnodes_duration ON job (num_nodes, duration);
CREATE INDEX IF NOT EXISTS jobs_numhwthreads_duration ON job (num_hwthreads, duration);
CREATE INDEX IF NOT EXISTS jobs_numacc_duration ON job (num_acc, duration);
CREATE INDEX IF NOT EXISTS jobs_energy_duration ON job (energy, duration);
-- Backup Indices For High Variety Columns
CREATE INDEX IF NOT EXISTS jobs_starttime ON job (start_time);
CREATE INDEX IF NOT EXISTS jobs_duration ON job (duration);
-- Optimize DB index usage

View File

@@ -1,221 +1,61 @@
-- Migration 11: Optimize job table indexes
-- Reduces from ~78 indexes to 20 for better write performance,
-- Migration 11: Remove overly specific table indexes formerly used for sorting
-- When only one or two indexed columns are used, sorting is usually fast
-- Reduces from ~78 indexes to 48 for better write performance,
-- reduced disk usage, and more reliable query planner decisions.
-- Requires ANALYZE to be run after migration (done automatically on startup).
-- ============================================================
-- Drop ALL existing job indexes (from migrations 08/09)
-- Drop SELECTED existing job indexes (from migrations 08/09)
-- sqlite_autoindex_job_1 (UNIQUE constraint) is kept automatically
-- ============================================================
-- Cluster Filter
DROP INDEX IF EXISTS jobs_cluster_user;
DROP INDEX IF EXISTS jobs_cluster_project;
DROP INDEX IF EXISTS jobs_cluster_subcluster;
-- Cluster Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_numnodes;
DROP INDEX IF EXISTS jobs_cluster_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_numacc;
DROP INDEX IF EXISTS jobs_cluster_energy;
-- Cluster Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_starttime_duration;
-- Cluster+Partition Filter
DROP INDEX IF EXISTS jobs_cluster_partition_user;
DROP INDEX IF EXISTS jobs_cluster_partition_project;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate;
DROP INDEX IF EXISTS jobs_cluster_partition_shared;
-- Cluster+Partition Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_partition_numnodes;
DROP INDEX IF EXISTS jobs_cluster_partition_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_partition_numacc;
DROP INDEX IF EXISTS jobs_cluster_partition_energy;
-- Cluster+Partition Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_partition_duration_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_starttime_duration;
-- Cluster+JobState Filter
DROP INDEX IF EXISTS jobs_cluster_jobstate_user;
DROP INDEX IF EXISTS jobs_cluster_jobstate_project;
-- Cluster+JobState Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_jobstate_numnodes;
DROP INDEX IF EXISTS jobs_cluster_jobstate_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_jobstate_numacc;
DROP INDEX IF EXISTS jobs_cluster_jobstate_energy;
-- Cluster+JobState Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_jobstate_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_jobstate_duration_starttime;
-- Cluster+Shared Filter
DROP INDEX IF EXISTS jobs_cluster_shared_user;
DROP INDEX IF EXISTS jobs_cluster_shared_project;
-- Cluster+Shared Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_shared_numnodes;
DROP INDEX IF EXISTS jobs_cluster_shared_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_shared_numacc;
DROP INDEX IF EXISTS jobs_cluster_shared_energy;
-- Cluster+Shared Time Filter Sorting
DROP INDEX IF EXISTS jobs_cluster_shared_starttime_duration;
DROP INDEX IF EXISTS jobs_cluster_shared_duration_starttime;
-- User Filter Sorting
DROP INDEX IF EXISTS jobs_user_numnodes;
DROP INDEX IF EXISTS jobs_user_numhwthreads;
DROP INDEX IF EXISTS jobs_user_numacc;
DROP INDEX IF EXISTS jobs_user_energy;
-- User Time Filter Sorting
DROP INDEX IF EXISTS jobs_user_starttime_duration;
DROP INDEX IF EXISTS jobs_user_duration_starttime;
-- Project Filter
DROP INDEX IF EXISTS jobs_project_user;
-- Project Filter Sorting
DROP INDEX IF EXISTS jobs_project_numnodes;
DROP INDEX IF EXISTS jobs_project_numhwthreads;
DROP INDEX IF EXISTS jobs_project_numacc;
DROP INDEX IF EXISTS jobs_project_energy;
-- Project Time Filter Sorting
DROP INDEX IF EXISTS jobs_project_starttime_duration;
DROP INDEX IF EXISTS jobs_project_duration_starttime;
-- JobState Filter
DROP INDEX IF EXISTS jobs_jobstate_user;
DROP INDEX IF EXISTS jobs_jobstate_project;
-- JobState Filter Sorting
DROP INDEX IF EXISTS jobs_jobstate_numnodes;
DROP INDEX IF EXISTS jobs_jobstate_numhwthreads;
DROP INDEX IF EXISTS jobs_jobstate_numacc;
DROP INDEX IF EXISTS jobs_jobstate_energy;
-- JobState Time Filter Sorting
DROP INDEX IF EXISTS jobs_jobstate_starttime_duration;
DROP INDEX IF EXISTS jobs_jobstate_duration_starttime;
-- Shared Filter
DROP INDEX IF EXISTS jobs_shared_user;
DROP INDEX IF EXISTS jobs_shared_project;
-- Shared Filter Sorting
DROP INDEX IF EXISTS jobs_shared_numnodes;
DROP INDEX IF EXISTS jobs_shared_numhwthreads;
DROP INDEX IF EXISTS jobs_shared_numacc;
DROP INDEX IF EXISTS jobs_shared_energy;
-- Shared Time Filter Sorting
DROP INDEX IF EXISTS jobs_shared_starttime_duration;
DROP INDEX IF EXISTS jobs_shared_duration_starttime;
-- ArrayJob Filter
DROP INDEX IF EXISTS jobs_arrayjobid_starttime;
DROP INDEX IF EXISTS jobs_cluster_arrayjobid_starttime;
-- Single filters with default starttime sorting
DROP INDEX IF EXISTS jobs_duration_starttime;
DROP INDEX IF EXISTS jobs_numnodes_starttime;
DROP INDEX IF EXISTS jobs_numhwthreads_starttime;
DROP INDEX IF EXISTS jobs_numacc_starttime;
DROP INDEX IF EXISTS jobs_energy_starttime;
-- Single filters with duration sorting
DROP INDEX IF EXISTS jobs_starttime_duration;
DROP INDEX IF EXISTS jobs_numnodes_duration;
DROP INDEX IF EXISTS jobs_numhwthreads_duration;
DROP INDEX IF EXISTS jobs_numacc_duration;
DROP INDEX IF EXISTS jobs_energy_duration;
-- Backup Indices
DROP INDEX IF EXISTS jobs_starttime;
-- Backup Indices For High Variety Columns
DROP INDEX IF EXISTS jobs_duration;
-- Legacy indexes from migration 08 (may exist on older DBs)
DROP INDEX IF EXISTS jobs_cluster;
DROP INDEX IF EXISTS jobs_cluster_starttime;
DROP INDEX IF EXISTS jobs_cluster_duration;
DROP INDEX IF EXISTS jobs_cluster_partition;
DROP INDEX IF EXISTS jobs_cluster_partition_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_duration;
DROP INDEX IF EXISTS jobs_cluster_partition_numnodes;
DROP INDEX IF EXISTS jobs_cluster_partition_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_partition_numacc;
DROP INDEX IF EXISTS jobs_cluster_partition_energy;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_user;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_project;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_starttime;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_duration;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_numnodes;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_numhwthreads;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_numacc;
DROP INDEX IF EXISTS jobs_cluster_partition_jobstate_energy;
DROP INDEX IF EXISTS jobs_cluster_jobstate;
DROP INDEX IF EXISTS jobs_cluster_jobstate_starttime;
DROP INDEX IF EXISTS jobs_cluster_jobstate_duration;
DROP INDEX IF EXISTS jobs_user;
DROP INDEX IF EXISTS jobs_user_starttime;
DROP INDEX IF EXISTS jobs_user_duration;
DROP INDEX IF EXISTS jobs_project;
DROP INDEX IF EXISTS jobs_project_starttime;
DROP INDEX IF EXISTS jobs_project_duration;
DROP INDEX IF EXISTS jobs_jobstate;
DROP INDEX IF EXISTS jobs_jobstate_cluster;
DROP INDEX IF EXISTS jobs_jobstate_starttime;
DROP INDEX IF EXISTS jobs_jobstate_duration;
DROP INDEX IF EXISTS jobs_numnodes;
DROP INDEX IF EXISTS jobs_numhwthreads;
DROP INDEX IF EXISTS jobs_numacc;
DROP INDEX IF EXISTS jobs_energy;
-- ============================================================
-- Create optimized set of 20 indexes
-- ============================================================
-- GROUP 1: Global sort (1 index)
-- Default sort for unfiltered/multi-state IN queries, time range, delete-before
CREATE INDEX jobs_starttime ON job (start_time);
-- GROUP 2: Cluster-prefixed (8 indexes)
-- Cluster + default sort, concurrent jobs, time range within cluster
CREATE INDEX jobs_cluster_starttime_duration ON job (cluster, start_time, duration);
-- Cluster + sort by duration
CREATE INDEX jobs_cluster_duration_starttime ON job (cluster, duration, start_time);
-- COVERING for cluster+state aggregation; running jobs (cluster, state, duration>?)
CREATE INDEX jobs_cluster_jobstate_duration_starttime ON job (cluster, job_state, duration, start_time);
-- Cluster+state+sort start_time (single state equality)
CREATE INDEX jobs_cluster_jobstate_starttime_duration ON job (cluster, job_state, start_time, duration);
-- COVERING for GROUP BY user with cluster filter
CREATE INDEX jobs_cluster_user ON job (cluster, hpc_user);
-- GROUP BY project with cluster filter
CREATE INDEX jobs_cluster_project ON job (cluster, project);
-- GROUP BY subcluster with cluster filter
CREATE INDEX jobs_cluster_subcluster ON job (cluster, subcluster);
-- Cluster + sort by num_nodes (state filtered per-row, fast with LIMIT)
CREATE INDEX jobs_cluster_numnodes ON job (cluster, num_nodes);
-- GROUP 3: User-prefixed (1 index)
-- Security filter (user role) + default sort
CREATE INDEX jobs_user_starttime_duration ON job (hpc_user, start_time, duration);
-- GROUP 4: Project-prefixed (1 index)
-- Security filter (manager role) + default sort
CREATE INDEX jobs_project_starttime_duration ON job (project, start_time, duration);
-- GROUP 5: JobState-prefixed (3 indexes)
-- State + project filter (for manager security within state query)
CREATE INDEX jobs_jobstate_project ON job (job_state, project);
-- State + user filter/aggregation
CREATE INDEX jobs_jobstate_user ON job (job_state, hpc_user);
-- COVERING for non-running jobs scan, state + sort duration
CREATE INDEX jobs_jobstate_duration_starttime ON job (job_state, duration, start_time);
-- GROUP 6: Rare filters (1 index)
-- Array job lookup
CREATE INDEX jobs_arrayjobid ON job (array_job_id);
-- GROUP 7: Secondary sort columns (5 indexes)
CREATE INDEX jobs_cluster_numhwthreads ON job (cluster, num_hwthreads);
CREATE INDEX jobs_cluster_numacc ON job (cluster, num_acc);
CREATE INDEX jobs_cluster_energy ON job (cluster, energy);
-- Cluster+partition + sort start_time
CREATE INDEX jobs_cluster_partition_starttime ON job (cluster, cluster_partition, start_time);
-- Cluster+partition+state filter
CREATE INDEX jobs_cluster_partition_jobstate ON job (cluster, cluster_partition, job_state);
-- Optimize DB index usage
PRAGMA optimize;

View File

@@ -0,0 +1,2 @@
DROP INDEX IF EXISTS jobs_cluster_user_starttime_stats;
DROP INDEX IF EXISTS jobs_cluster_project_starttime_stats;

View File

@@ -0,0 +1,11 @@
-- Migration 12: Add covering index for grouped stats queries
-- Column order: cluster (equality), hpc_user (GROUP BY), start_time (range scan)
-- Includes aggregated columns to avoid main table lookups entirely.
CREATE INDEX IF NOT EXISTS jobs_cluster_user_starttime_stats
ON job (cluster, hpc_user, start_time, duration, job_state, num_nodes, num_hwthreads, num_acc);
CREATE INDEX IF NOT EXISTS jobs_cluster_project_starttime_stats
ON job (cluster, project, start_time, duration, job_state, num_nodes, num_hwthreads, num_acc);
PRAGMA optimize;
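To illustrate the intended access pattern, here is a sketch of the query shape this index serves, with an EXPLAIN QUERY PLAN check. The driver, the database path, and the placeholder cluster value are assumptions; the real query is assembled by buildStatsQuery:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // driver choice is an assumption
)

// Shape served by jobs_cluster_user_starttime_stats: equality on cluster,
// GROUP BY on hpc_user, range on start_time; the aggregated columns are all
// present in the index, so the aggregation can avoid main table lookups.
const groupedStatsSQL = `
SELECT job.hpc_user, COUNT(*), CAST(SUM(job.num_nodes) AS int)
FROM job
WHERE job.cluster = ? AND job.start_time BETWEEN ? AND ?
GROUP BY job.hpc_user`

func main() {
	db, err := sql.Open("sqlite3", "./var/job.db") // path is an assumption
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A plan line mentioning "COVERING INDEX jobs_cluster_user_starttime_stats"
	// confirms the lookup-free aggregation.
	rows, err := db.Query("EXPLAIN QUERY PLAN "+groupedStatsSQL, "cluster0", 0, 1<<40)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var id, parent, notUsed int
		var detail string
		if err := rows.Scan(&id, &parent, &notUsed, &detail); err != nil {
			log.Fatal(err)
		}
		fmt.Println(detail)
	}
}
```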

View File

@@ -105,9 +105,9 @@ func (r *JobRepository) buildCountQuery(
var query sq.SelectBuilder
if col != "" {
query = sq.Select(col, "COUNT(job.id)").From("job").GroupBy(col)
query = sq.Select(col, "COUNT(*)").From("job").GroupBy(col)
} else {
query = sq.Select("COUNT(job.id)").From("job")
query = sq.Select("COUNT(*)").From("job")
}
switch kind {
@@ -124,59 +124,100 @@ func (r *JobRepository) buildCountQuery(
return query
}
// buildStatsQuery constructs a SQL query to compute comprehensive job statistics with optional grouping.
// buildStatsQuery constructs a SQL query to compute job statistics with optional grouping.
// Only requested columns are computed; unrequested columns select 0 as a placeholder.
//
// Parameters:
// - filter: Job filters to apply (cluster, user, time range, etc.)
// - col: Column name to GROUP BY; empty string for overall statistics without grouping
//
// Returns a SelectBuilder that produces comprehensive statistics:
// - totalJobs: Count of jobs
// - totalUsers: Count of distinct users (always 0 when grouping by user)
// - totalWalltime: Sum of job durations in hours
// - totalNodes: Sum of nodes used across all jobs
// - totalNodeHours: Sum of (duration × num_nodes) in hours
// - totalCores: Sum of hardware threads used across all jobs
// - totalCoreHours: Sum of (duration × num_hwthreads) in hours
// - totalAccs: Sum of accelerators used across all jobs
// - totalAccHours: Sum of (duration × num_acc) in hours
//
// Special handling:
// - Running jobs: Duration calculated as (now - start_time) instead of stored duration
// - Grouped queries: Also select grouping column and user's display name from hpc_user table
// - All time values converted from seconds to hours (÷ 3600) and rounded
// - shortThreshold: Duration threshold in seconds for counting short-running jobs
// - reqFields: Set of requested field names; nil means compute all fields
func (r *JobRepository) buildStatsQuery(
filter []*model.JobFilter,
col string,
shortThreshold int,
reqFields map[string]bool,
) sq.SelectBuilder {
var query sq.SelectBuilder
now := time.Now().Unix()
// Helper: report whether a field needs to be computed; nil reqFields means all fields.
// Callers fall back to a "0 as alias" placeholder for fields that are not requested.
need := func(field string) bool {
return reqFields == nil || reqFields[field]
}
durationExpr := fmt.Sprintf(`(CASE WHEN job.job_state = 'running' THEN %d - job.start_time ELSE job.duration END)`, now)
// Build column list
columns := make([]string, 0, 14)
if col != "" {
query = sq.Select(
col,
"name",
"COUNT(job.id) as totalJobs",
"COUNT(DISTINCT job.hpc_user) AS totalUsers",
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END)) / 3600) as int) as totalWalltime`, time.Now().Unix()),
`CAST(SUM(job.num_nodes) as int) as totalNodes`,
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END) * job.num_nodes) / 3600) as int) as totalNodeHours`, time.Now().Unix()),
`CAST(SUM(job.num_hwthreads) as int) as totalCores`,
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END) * job.num_hwthreads) / 3600) as int) as totalCoreHours`, time.Now().Unix()),
`CAST(SUM(job.num_acc) as int) as totalAccs`,
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END) * job.num_acc) / 3600) as int) as totalAccHours`, time.Now().Unix()),
).From("job").LeftJoin("hpc_user ON hpc_user.username = job.hpc_user").GroupBy(col)
columns = append(columns, col)
}
columns = append(columns, "COUNT(*) as totalJobs")
if need("totalUsers") && col != "job.hpc_user" {
columns = append(columns, "COUNT(DISTINCT job.hpc_user) AS totalUsers")
} else {
query = sq.Select(
"COUNT(job.id) as totalJobs",
"COUNT(DISTINCT job.hpc_user) AS totalUsers",
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END)) / 3600) as int)`, time.Now().Unix()),
`CAST(SUM(job.num_nodes) as int)`,
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END) * job.num_nodes) / 3600) as int)`, time.Now().Unix()),
`CAST(SUM(job.num_hwthreads) as int)`,
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END) * job.num_hwthreads) / 3600) as int)`, time.Now().Unix()),
`CAST(SUM(job.num_acc) as int)`,
fmt.Sprintf(`CAST(ROUND(SUM((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END) * job.num_acc) / 3600) as int)`, time.Now().Unix()),
).From("job")
columns = append(columns, "0 AS totalUsers")
}
if need("totalWalltime") {
columns = append(columns, fmt.Sprintf(`CAST(ROUND(SUM(%s) / 3600) as int) as totalWalltime`, durationExpr))
} else {
columns = append(columns, "0 as totalWalltime")
}
if need("totalNodes") {
columns = append(columns, `CAST(SUM(job.num_nodes) as int) as totalNodes`)
} else {
columns = append(columns, "0 as totalNodes")
}
if need("totalNodeHours") {
columns = append(columns, fmt.Sprintf(`CAST(ROUND(SUM(%s * job.num_nodes) / 3600) as int) as totalNodeHours`, durationExpr))
} else {
columns = append(columns, "0 as totalNodeHours")
}
if need("totalCores") {
columns = append(columns, `CAST(SUM(job.num_hwthreads) as int) as totalCores`)
} else {
columns = append(columns, "0 as totalCores")
}
if need("totalCoreHours") {
columns = append(columns, fmt.Sprintf(`CAST(ROUND(SUM(%s * job.num_hwthreads) / 3600) as int) as totalCoreHours`, durationExpr))
} else {
columns = append(columns, "0 as totalCoreHours")
}
if need("totalAccs") {
columns = append(columns, `CAST(SUM(job.num_acc) as int) as totalAccs`)
} else {
columns = append(columns, "0 as totalAccs")
}
if need("totalAccHours") {
columns = append(columns, fmt.Sprintf(`CAST(ROUND(SUM(%s * job.num_acc) / 3600) as int) as totalAccHours`, durationExpr))
} else {
columns = append(columns, "0 as totalAccHours")
}
if need("runningJobs") {
columns = append(columns, `COUNT(CASE WHEN job.job_state = 'running' THEN 1 END) as runningJobs`)
} else {
columns = append(columns, "0 as runningJobs")
}
if need("shortJobs") {
columns = append(columns, fmt.Sprintf(`COUNT(CASE WHEN job.duration < %d THEN 1 END) as shortJobs`, shortThreshold))
} else {
columns = append(columns, "0 as shortJobs")
}
query := sq.Select(columns...).From("job")
if col != "" {
query = query.GroupBy(col)
}
for _, f := range filter {
@@ -186,35 +227,19 @@ func (r *JobRepository) buildStatsQuery(
return query
}
// JobsStatsGrouped computes comprehensive job statistics grouped by a dimension (user, project, cluster, or subcluster).
//
// This is the primary method for generating aggregated statistics views in the UI, providing
// metrics like total jobs, walltime, and resource usage broken down by the specified grouping.
//
// Parameters:
// - ctx: Context for security checks and cancellation
// - filter: Filters to apply (time range, cluster, job state, etc.)
// - page: Optional pagination (ItemsPerPage: -1 disables pagination)
// - sortBy: Optional sort column (totalJobs, totalWalltime, totalCoreHours, etc.)
// - groupBy: Required grouping dimension (User, Project, Cluster, or SubCluster)
//
// Returns a slice of JobsStatistics, one per group, with:
// - ID: The group identifier (username, project name, cluster name, etc.)
// - Name: Display name (for users, from hpc_user.name; empty for other groups)
// - Statistics: totalJobs, totalUsers, totalWalltime, resource usage metrics
//
// Security: Respects user roles via SecurityCheck - users see only their own data unless admin/support.
// Performance: Results are sorted in SQL and pagination applied before scanning rows.
// JobsStatsGrouped computes job statistics grouped by a dimension (user, project, cluster, or subcluster).
// Only columns listed in reqFields are computed; others return 0. User display names are looked up
// in a separate lightweight query to avoid JOIN overhead on the main aggregation.
func (r *JobRepository) JobsStatsGrouped(
ctx context.Context,
filter []*model.JobFilter,
page *model.PageRequest,
sortBy *model.SortByAggregate,
groupBy *model.Aggregate,
reqFields map[string]bool,
) ([]*model.JobsStatistics, error) {
start := time.Now()
col := groupBy2column[*groupBy]
query := r.buildStatsQuery(filter, col)
query := r.buildStatsQuery(filter, col, config.Keys.ShortRunningJobsDuration, reqFields)
query, err := SecurityCheck(ctx, query)
if err != nil {
@@ -241,83 +266,28 @@ func (r *JobRepository) JobsStatsGrouped(
for rows.Next() {
var id sql.NullString
var name sql.NullString
var jobs, users, walltime, nodes, nodeHours, cores, coreHours, accs, accHours sql.NullInt64
if err := rows.Scan(&id, &name, &jobs, &users, &walltime, &nodes, &nodeHours, &cores, &coreHours, &accs, &accHours); err != nil {
var jobs, users, walltime, nodes, nodeHours, cores, coreHours, accs, accHours, runningJobs, shortJobs sql.NullInt64
if err := rows.Scan(&id, &jobs, &users, &walltime, &nodes, &nodeHours, &cores, &coreHours, &accs, &accHours, &runningJobs, &shortJobs); err != nil {
cclog.Warnf("Error while scanning rows: %s", err.Error())
return nil, err
}
if id.Valid {
var totalJobs, totalUsers, totalWalltime, totalNodes, totalNodeHours, totalCores, totalCoreHours, totalAccs, totalAccHours int
var personName string
if name.Valid {
personName = name.String
}
if jobs.Valid {
totalJobs = int(jobs.Int64)
}
if users.Valid {
totalUsers = int(users.Int64)
}
if walltime.Valid {
totalWalltime = int(walltime.Int64)
}
if nodes.Valid {
totalNodes = int(nodes.Int64)
}
if cores.Valid {
totalCores = int(cores.Int64)
}
if accs.Valid {
totalAccs = int(accs.Int64)
}
if nodeHours.Valid {
totalNodeHours = int(nodeHours.Int64)
}
if coreHours.Valid {
totalCoreHours = int(coreHours.Int64)
}
if accHours.Valid {
totalAccHours = int(accHours.Int64)
}
if col == "job.hpc_user" {
// name := r.getUserName(ctx, id.String)
stats = append(stats,
&model.JobsStatistics{
ID: id.String,
Name: personName,
TotalJobs: totalJobs,
TotalWalltime: totalWalltime,
TotalNodes: totalNodes,
TotalNodeHours: totalNodeHours,
TotalCores: totalCores,
TotalCoreHours: totalCoreHours,
TotalAccs: totalAccs,
TotalAccHours: totalAccHours,
})
} else {
stats = append(stats,
&model.JobsStatistics{
ID: id.String,
TotalJobs: totalJobs,
TotalUsers: totalUsers,
TotalWalltime: totalWalltime,
TotalNodes: totalNodes,
TotalNodeHours: totalNodeHours,
TotalCores: totalCores,
TotalCoreHours: totalCoreHours,
TotalAccs: totalAccs,
TotalAccHours: totalAccHours,
})
}
stats = append(stats,
&model.JobsStatistics{
ID: id.String,
TotalJobs: int(jobs.Int64),
TotalUsers: int(users.Int64),
TotalWalltime: int(walltime.Int64),
TotalNodes: int(nodes.Int64),
TotalNodeHours: int(nodeHours.Int64),
TotalCores: int(cores.Int64),
TotalCoreHours: int(coreHours.Int64),
TotalAccs: int(accs.Int64),
TotalAccHours: int(accHours.Int64),
RunningJobs: int(runningJobs.Int64),
ShortJobs: int(shortJobs.Int64),
})
}
}
@@ -325,7 +295,35 @@ func (r *JobRepository) JobsStatsGrouped(
return nil, err
}
cclog.Debugf("Timer JobsStatsGrouped %s", time.Since(start))
// Post-query name lookup for user grouping (avoids LEFT JOIN on aggregation query)
if col == "job.hpc_user" && len(stats) > 0 {
usernames := make([]any, len(stats))
for i, s := range stats {
usernames[i] = s.ID
}
nameQuery := sq.Select("username", "name").From("hpc_user").Where(sq.Eq{"username": usernames})
nameRows, err := nameQuery.RunWith(r.DB).QueryContext(ctx)
if err != nil {
cclog.Warnf("Error looking up user names: %s", err.Error())
// Non-fatal: stats are still valid without display names
} else {
defer nameRows.Close()
nameMap := make(map[string]string, len(stats))
for nameRows.Next() {
var username, name string
if err := nameRows.Scan(&username, &name); err == nil {
nameMap[username] = name
}
}
for _, s := range stats {
if name, ok := nameMap[s.ID]; ok {
s.Name = name
}
}
}
}
return stats, nil
}
@@ -347,9 +345,10 @@ func (r *JobRepository) JobsStatsGrouped(
func (r *JobRepository) JobsStats(
ctx context.Context,
filter []*model.JobFilter,
reqFields map[string]bool,
) ([]*model.JobsStatistics, error) {
start := time.Now()
query := r.buildStatsQuery(filter, "")
query := r.buildStatsQuery(filter, "", config.Keys.ShortRunningJobsDuration, reqFields)
query, err := SecurityCheck(ctx, query)
if err != nil {
return nil, err
@@ -358,8 +357,8 @@ func (r *JobRepository) JobsStats(
row := query.RunWith(r.DB).QueryRowContext(ctx)
stats := make([]*model.JobsStatistics, 0, 1)
var jobs, users, walltime, nodes, nodeHours, cores, coreHours, accs, accHours sql.NullInt64
if err := row.Scan(&jobs, &users, &walltime, &nodes, &nodeHours, &cores, &coreHours, &accs, &accHours); err != nil {
var jobs, users, walltime, nodes, nodeHours, cores, coreHours, accs, accHours, runningJobs, shortJobs sql.NullInt64
if err := row.Scan(&jobs, &users, &walltime, &nodes, &nodeHours, &cores, &coreHours, &accs, &accHours, &runningJobs, &shortJobs); err != nil {
cclog.Warn("Error while scanning rows")
return nil, err
}
@@ -384,6 +383,8 @@ func (r *JobRepository) JobsStats(
TotalNodeHours: totalNodeHours,
TotalCoreHours: totalCoreHours,
TotalAccHours: totalAccHours,
RunningJobs: int(runningJobs.Int64),
ShortJobs: int(shortJobs.Int64),
})
}
@@ -641,7 +642,7 @@ func (r *JobRepository) AddHistograms(
var err error
// Return X-Values always as seconds, will be formatted into minutes and hours in frontend
value := fmt.Sprintf(`CAST(ROUND(((CASE WHEN job.job_state = "running" THEN %d - job.start_time ELSE job.duration END) / %d) + 1) as int) as value`, time.Now().Unix(), targetBinSize)
value := fmt.Sprintf(`CAST(ROUND(((CASE WHEN job.job_state = 'running' THEN %d - job.start_time ELSE job.duration END) / %d) + 1) as int) as value`, time.Now().Unix(), targetBinSize)
stat.HistDuration, err = r.jobsDurationStatisticsHistogram(ctx, value, filter, targetBinSize, &targetBinCount)
if err != nil {
cclog.Warn("Error while loading job statistics histogram: job duration")
@@ -745,7 +746,7 @@ func (r *JobRepository) jobsStatisticsHistogram(
) ([]*model.HistoPoint, error) {
start := time.Now()
query, qerr := SecurityCheck(ctx,
sq.Select(value, "COUNT(job.id) AS count").From("job"))
sq.Select(value, "COUNT(*) AS count").From("job"))
if qerr != nil {
return nil, qerr
@@ -811,7 +812,7 @@ func (r *JobRepository) jobsDurationStatisticsHistogram(
) ([]*model.HistoPoint, error) {
start := time.Now()
query, qerr := SecurityCheck(ctx,
sq.Select(value, "COUNT(job.id) AS count").From("job"))
sq.Select(value, "COUNT(*) AS count").From("job"))
if qerr != nil {
return nil, qerr
@@ -819,12 +820,18 @@ func (r *JobRepository) jobsDurationStatisticsHistogram(
// Each bin represents a duration range: bin N = [N*binSizeSeconds, (N+1)*binSizeSeconds)
// Example: binSizeSeconds=3600 (1 hour), bin 1 = 0-1h, bin 2 = 1-2h, etc.
points := make([]*model.HistoPoint, 0)
points := make([]*model.HistoPoint, 0, *targetBinCount)
for i := 1; i <= *targetBinCount; i++ {
point := model.HistoPoint{Value: i * binSizeSeconds, Count: 0}
points = append(points, &point)
}
// Build a map from bin value (seconds) to slice index for O(1) lookup.
binMap := make(map[int]int, len(points))
for i, p := range points {
binMap[p.Value] = i
}
for _, f := range filters {
query = BuildWhereClause(f, query)
}
@@ -836,8 +843,8 @@ func (r *JobRepository) jobsDurationStatisticsHistogram(
}
defer rows.Close()
// Match query results to pre-initialized bins.
// point.Value from query is the bin number; multiply by binSizeSeconds to match bin.Value.
// Match query results to pre-initialized bins using map lookup.
// point.Value from query is the bin number; multiply by binSizeSeconds to get bin key.
for rows.Next() {
point := model.HistoPoint{}
if err := rows.Scan(&point.Value, &point.Count); err != nil {
@@ -845,11 +852,8 @@ func (r *JobRepository) jobsDurationStatisticsHistogram(
return nil, err
}
for _, e := range points {
if e.Value == (point.Value * binSizeSeconds) {
e.Count = point.Count
break
}
if idx, ok := binMap[point.Value*binSizeSeconds]; ok {
points[idx].Count = point.Count
}
}
@@ -968,16 +972,25 @@ func (r *JobRepository) jobsMetricStatisticsHistogram(
// Pre-initialize bins with calculated min/max ranges.
// Example: peak=1000, bins=10 -> bin 1=[0,100), bin 2=[100,200), ..., bin 10=[900,1000]
points := make([]*model.MetricHistoPoint, 0)
points := make([]*model.MetricHistoPoint, 0, *bins)
binStep := int(peak) / *bins
for i := 1; i <= *bins; i++ {
binMin := (binStep * (i - 1))
binMax := (binStep * i)
epoint := model.MetricHistoPoint{Bin: &i, Count: 0, Min: &binMin, Max: &binMax}
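// Copy the loop variable: each bin needs a distinct address, and the loop
// variable is reused across iterations before Go 1.22.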
idx := i
epoint := model.MetricHistoPoint{Bin: &idx, Count: 0, Min: &binMin, Max: &binMax}
points = append(points, &epoint)
}
// Match query results to pre-initialized bins.
// Build a map from bin number to slice index for O(1) lookup.
binMap := make(map[int]int, len(points))
for i, p := range points {
if p.Bin != nil {
binMap[*p.Bin] = i
}
}
// Match query results to pre-initialized bins using map lookup.
for rows.Next() {
rpoint := model.MetricHistoPoint{}
if err := rows.Scan(&rpoint.Bin, &rpoint.Count); err != nil {
@@ -985,10 +998,9 @@ func (r *JobRepository) jobsMetricStatisticsHistogram(
return nil, err
}
for _, e := range points {
if e.Bin != nil && rpoint.Bin != nil && *e.Bin == *rpoint.Bin {
e.Count = rpoint.Count
break
if rpoint.Bin != nil {
if idx, ok := binMap[*rpoint.Bin]; ok {
points[idx].Count = rpoint.Count
}
}
}

View File

@@ -14,7 +14,7 @@ import (
func TestBuildJobStatsQuery(t *testing.T) {
r := setup(t)
q := r.buildStatsQuery(nil, "USER")
q := r.buildStatsQuery(nil, "USER", 300, nil)
sql, _, err := q.ToSql()
noErr(t, err)
@@ -29,7 +29,7 @@ func TestJobStats(t *testing.T) {
err := r.DB.QueryRow(`SELECT COUNT(*) FROM job`).Scan(&expectedCount)
noErr(t, err)
stats, err := r.JobsStats(getContext(t), []*model.JobFilter{})
stats, err := r.JobsStats(getContext(t), []*model.JobFilter{}, nil)
noErr(t, err)
if stats[0].TotalJobs != expectedCount {

View File

@@ -302,11 +302,11 @@
if (subclusterData) {
for (let i = 0; i < subclusterData.length; i++) {
const flopsData = subclusterData[i].metrics.find((s) => s.name == "flops_any")
const memBwData = subclusterData[i].metrics.find((s) => s.name == "mem_bw")
const flopsData = subclusterData[i]?.metrics?.find((s) => s.name == "flops_any")
const memBwData = subclusterData[i]?.metrics?.find((s) => s.name == "mem_bw")
const f = flopsData.metric.series[0].statistics.avg
const m = memBwData.metric.series[0].statistics.avg
const f = flopsData?.metric?.series[0]?.statistics?.avg || 0;
const m = memBwData?.metric?.series[0]?.statistics?.avg || 0;
let intensity = f / m
if (Number.isNaN(intensity) || !Number.isFinite(intensity)) {

View File

@@ -34,6 +34,7 @@
formatDurationTime
} from "./generic/units.js";
import Filters from "./generic/Filters.svelte";
import Pagination from "./generic/joblist/Pagination.svelte";
/* Svelte 5 Props */
let {
@@ -51,6 +52,8 @@
let jobFilters = $state([]);
let nameFilter = $state("");
let sorting = $state({ field: "totalJobs", direction: "desc" });
let page = $state(1);
let itemsPerPage = $state(25);
/* Derived Vars */
const fetchRunning = $derived(jobFilters.some(jf => jf?.state?.length == 1 && jf?.state?.includes("running")));
@@ -64,6 +67,12 @@
const sortedRows = $derived(
$stats.data ? sort($stats.data.rows, sorting, nameFilter) : []
);
const paginatedRows = $derived(
sortedRows.slice((page - 1) * itemsPerPage, page * itemsPerPage)
);
/* Reset page when sorting or filter changes */
$effect(() => { sorting; nameFilter; page = 1; });
let stats = $derived(
queryStore({
@@ -360,7 +369,7 @@
>
</tr>
{:else if $stats.data}
{#each sort($stats.data.rows, sorting, nameFilter) as row (row.id)}
{#each paginatedRows as row (row.id)}
<tr>
<td>
{#if type == "USER"}
@@ -402,3 +411,16 @@
{/if}
</tbody>
</Table>
{#if sortedRows.length > 0}
<Pagination
{page}
{itemsPerPage}
totalItems={sortedRows.length}
itemText={type === 'USER' ? 'Users' : 'Projects'}
pageSizes={[25, 50, 100]}
updatePaging={(detail) => {
itemsPerPage = detail.itemsPerPage;
page = detail.page;
}}
/>
{/if}