Add Analyse on db startup

Entire-Checkpoint: ea70a955214d
This commit is contained in:
2026-03-11 05:28:52 +01:00
parent f8831e7040
commit eba3995610
6 changed files with 265 additions and 11 deletions

View File

@@ -1,18 +1,40 @@
# Session Context
Session ID: 42401d2e-7d1c-4c0e-abe6-356cb2d48747
Commit Message: Check if the fixes are correctly merged in nodes.go
Commit Message: There also have to be bugs in jobQuery.go . Especially the following que
## Prompts
### Prompt 1
Check if the fixes are correctly merged in nodes.go
There also have to be bugs in jobQuery.go . Especially the following query triggers the memory leak: SELECT * FROM job WHERE job.job_state IN ("completed", "running", "failed") ORDER BY job.start_time DESC LIMIT 1 OFFSET 10; Dig deeper to find the cause. Also investigate why no existing index is used for this query.
## Summary
All clean. Here's a summary of what was wrong and fixed in `node.go`:
All passes. Here's a summary of findings and fixes:
1. **Duplicate `QueryNodesWithMeta`** (lines 468-541): The old unfixed version was left in alongside the fixed version at line 391. Removed the duplicate.
2. **`CountStates`**: Had both old code (manual `stateMap` counting without SQL `COUNT(*)`) and new code (SQL `COUNT(*) ... GROUP BY`) concatenated together. The second `rows.Next()` loop was dead code since the first loop already consumed all rows. Replaced with the correct version using `SELECT column, COUNT(*) ... GROUP BY column`.
3. **`QueryNodes` line 376**: Leftover `rows.Close()` inside the scan error path (redundant with `defer`). Removed.
---
## Findings
### Bug 1: Missing `defer rows.Close()` in `user.go:GetLdapUsernames()` (line 100)
**Fixed.** Every call leaked a database connection and its buffers.
### Bug 2: SQLite never runs `ANALYZE` — wrong index selection
**Fixed.** Added `ANALYZE` to `setupSqlite()` in `dbConnection.go`.
Without statistics in `sqlite_stat1`, SQLite's query planner **guessed** and chose `jobs_jobstate_energy` (a terrible index for this query) instead of scanning `jobs_starttime` in reverse. The result:
| | Before ANALYZE | After ANALYZE |
|---|---|---|
| Index used | `jobs_jobstate_energy` | `jobs_starttime` |
| Sort | **TEMP B-TREE** (materializes ALL rows) | None (index order) |
| Memory | Proportional to total matching rows | Constant |
| I/O | Full scan of all matching rows | Stops at OFFSET+LIMIT |
### Bug 3: `IN` clause + `ORDER BY` is fundamentally incompatible with composite indexes
Even with the "correct" index `(job_state, start_time)`, SQLite **cannot** merge-sort across 3 separate index range scans for `IN ('completed','running','failed')`. It always falls back to a temp B-tree sort. The only efficient plan is to use the standalone `jobs_starttime` index — which SQLite does automatically **after ANALYZE** because it realizes the 3 states cover virtually all rows, making the WHERE clause nearly a no-op.
### Observation: 79 indexes on the `job` table
This is excessive and actively harmful — it confuses the query planner (especially without ANALYZE) and slows writes. The `jobs_jobstate_starttime` index from migration 08 is also missing from the actual DB (only the 3-column `jobs_jobstate_starttime_duration` exists). This is worth investigating separately but is a schema/migration concern, not a code bug.

File diff suppressed because one or more lines are too long

View File

@@ -1 +1 @@
Check if the fixes are correctly merged in nodes.go
There also have to be bugs in jobQuery.go . Especially the following query triggers the memory leak: SELECT * FROM job WHERE job.job_state IN ("completed", "running", "failed") ORDER BY job.start_time DESC LIMIT 1 OFFSET 10; Dig deeper to find the cause. Also investigate why no existing index is used for this query.

View File

@@ -1,5 +1,27 @@
All clean. Here's a summary of what was wrong and fixed in `node.go`:
All passes. Here's a summary of findings and fixes:
1. **Duplicate `QueryNodesWithMeta`** (lines 468-541): The old unfixed version was left in alongside the fixed version at line 391. Removed the duplicate.
2. **`CountStates`**: Had both old code (manual `stateMap` counting without SQL `COUNT(*)`) and new code (SQL `COUNT(*) ... GROUP BY`) concatenated together. The second `rows.Next()` loop was dead code since the first loop already consumed all rows. Replaced with the correct version using `SELECT column, COUNT(*) ... GROUP BY column`.
3. **`QueryNodes` line 376**: Leftover `rows.Close()` inside the scan error path (redundant with `defer`). Removed.
---
## Findings
### Bug 1: Missing `defer rows.Close()` in `user.go:GetLdapUsernames()` (line 100)
**Fixed.** Every call leaked a database connection and its buffers.
### Bug 2: SQLite never runs `ANALYZE` — wrong index selection
**Fixed.** Added `ANALYZE` to `setupSqlite()` in `dbConnection.go`.
Without statistics in `sqlite_stat1`, SQLite's query planner **guessed** and chose `jobs_jobstate_energy` (a terrible index for this query) instead of scanning `jobs_starttime` in reverse. The result:
| | Before ANALYZE | After ANALYZE |
|---|---|---|
| Index used | `jobs_jobstate_energy` | `jobs_starttime` |
| Sort | **TEMP B-TREE** (materializes ALL rows) | None (index order) |
| Memory | Proportional to total matching rows | Constant |
| I/O | Full scan of all matching rows | Stops at OFFSET+LIMIT |
### Bug 3: `IN` clause + `ORDER BY` is fundamentally incompatible with composite indexes
Even with the "correct" index `(job_state, start_time)`, SQLite **cannot** merge-sort across 3 separate index range scans for `IN ('completed','running','failed')`. It always falls back to a temp B-tree sort. The only efficient plan is to use the standalone `jobs_starttime` index — which SQLite does automatically **after ANALYZE** because it realizes the 3 states cover virtually all rows, making the WHERE clause nearly a no-op.
### Observation: 79 indexes on the `job` table
This is excessive and actively harmful — it confuses the query planner (especially without ANALYZE) and slows writes. The `jobs_jobstate_starttime` index from migration 08 is also missing from the actual DB (only the 3-column `jobs_jobstate_starttime_duration` exists). This is worth investigating separately but is a schema/migration concern, not a code bug.

View File

@@ -48,6 +48,14 @@ func setupSqlite(db *sql.DB) error {
}
}
// Update query planner statistics so SQLite picks optimal indexes.
// Without this, SQLite guesses row distributions and often chooses wrong
// indexes for queries with IN clauses + ORDER BY, causing full table sorts
// in temp B-trees instead of using covering indexes.
if _, err := db.Exec("ANALYZE"); err != nil {
cclog.Warnf("Failed to run ANALYZE: %v", err)
}
return nil
}

View File

@@ -102,6 +102,7 @@ func (r *UserRepository) GetLdapUsernames() ([]string, error) {
cclog.Warn("Error while querying usernames")
return nil, err
}
defer rows.Close()
for rows.Next() {
var username string