mirror of
https://github.com/ClusterCockpit/cc-backend
synced 2026-03-15 04:17:30 +01:00
Add Analyse on db startup
Entire-Checkpoint: ea70a955214d
This commit is contained in:
@@ -1,18 +1,40 @@
|
|||||||
# Session Context
|
# Session Context
|
||||||
|
|
||||||
Session ID: 42401d2e-7d1c-4c0e-abe6-356cb2d48747
|
Session ID: 42401d2e-7d1c-4c0e-abe6-356cb2d48747
|
||||||
Commit Message: Check if the fixes are correctly merged in nodes.go
|
Commit Message: There also have to be bugs in jobQuery.go . Especially the following que
|
||||||
|
|
||||||
## Prompts
|
## Prompts
|
||||||
|
|
||||||
### Prompt 1
|
### Prompt 1
|
||||||
|
|
||||||
Check if the fixes are correctly merged in nodes.go
|
There also have to be bugs in jobQuery.go . Especially the following query triggers the memory leak: SELECT * FROM job WHERE job.job_state IN ("completed", "running", "failed") ORDER BY job.start_time DESC LIMIT 1 OFFSET 10; Dig deeper to find the cause. Also investigate why no existing index is used for this query.
|
||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
All clean. Here's a summary of what was wrong and fixed in `node.go`:
|
All passes. Here's a summary of findings and fixes:
|
||||||
|
|
||||||
1. **Duplicate `QueryNodesWithMeta`** (lines 468-541): The old unfixed version was left in alongside the fixed version at line 391. Removed the duplicate.
|
---
|
||||||
2. **`CountStates`**: Had both old code (manual `stateMap` counting without SQL `COUNT(*)`) and new code (SQL `COUNT(*) ... GROUP BY`) concatenated together. The second `rows.Next()` loop was dead code since the first loop already consumed all rows. Replaced with the correct version using `SELECT column, COUNT(*) ... GROUP BY column`.
|
|
||||||
3. **`QueryNodes` line 376**: Leftover `rows.Close()` inside the scan error path (redundant with `defer`). Removed.
|
## Findings
|
||||||
|
|
||||||
|
### Bug 1: Missing `defer rows.Close()` in `user.go:GetLdapUsernames()` (line 100)
|
||||||
|
**Fixed.** Every call leaked a database connection and its buffers.
|
||||||
|
|
||||||
|
### Bug 2: SQLite never runs `ANALYZE` — wrong index selection
|
||||||
|
**Fixed.** Added `ANALYZE` to `setupSqlite()` in `dbConnection.go`.
|
||||||
|
|
||||||
|
Without statistics in `sqlite_stat1`, SQLite's query planner **guessed** and chose `jobs_jobstate_energy` (a terrible index for this query) instead of scanning `jobs_starttime` in reverse. The result:
|
||||||
|
|
||||||
|
| | Before ANALYZE | After ANALYZE |
|
||||||
|
|---|---|---|
|
||||||
|
| Index used | `jobs_jobstate_energy` | `jobs_starttime` |
|
||||||
|
| Sort | **TEMP B-TREE** (materializes ALL rows) | None (index order) |
|
||||||
|
| Memory | Proportional to total matching rows | Constant |
|
||||||
|
| I/O | Full scan of all matching rows | Stops at OFFSET+LIMIT |
|
||||||
|
|
||||||
|
### Bug 3: `IN` clause + `ORDER BY` is fundamentally incompatible with composite indexes
|
||||||
|
|
||||||
|
Even with the "correct" index `(job_state, start_time)`, SQLite **cannot** merge-sort across 3 separate index range scans for `IN ('completed','running','failed')`. It always falls back to a temp B-tree sort. The only efficient plan is to use the standalone `jobs_starttime` index — which SQLite does automatically **after ANALYZE** because it realizes the 3 states cover virtually all rows, making the WHERE clause nearly a no-op.
|
||||||
|
|
||||||
|
### Observation: 79 indexes on the `job` table
|
||||||
|
This is excessive and actively harmful — it confuses the query planner (especially without ANALYZE) and slows writes. The `jobs_jobstate_starttime` index from migration 08 is also missing from the actual DB (only the 3-column `jobs_jobstate_starttime_duration` exists). This is worth investigating separately but is a schema/migration concern, not a code bug.
|
||||||
|
|||||||
File diff suppressed because one or more lines are too long
@@ -1 +1 @@
|
|||||||
Check if the fixes are correctly merged in nodes.go
|
There also have to be bugs in jobQuery.go . Especially the following query triggers the memory leak: SELECT * FROM job WHERE job.job_state IN ("completed", "running", "failed") ORDER BY job.start_time DESC LIMIT 1 OFFSET 10; Dig deeper to find the cause. Also investigate why no existing index is used for this query.
|
||||||
@@ -1,5 +1,27 @@
|
|||||||
All clean. Here's a summary of what was wrong and fixed in `node.go`:
|
All passes. Here's a summary of findings and fixes:
|
||||||
|
|
||||||
1. **Duplicate `QueryNodesWithMeta`** (lines 468-541): The old unfixed version was left in alongside the fixed version at line 391. Removed the duplicate.
|
---
|
||||||
2. **`CountStates`**: Had both old code (manual `stateMap` counting without SQL `COUNT(*)`) and new code (SQL `COUNT(*) ... GROUP BY`) concatenated together. The second `rows.Next()` loop was dead code since the first loop already consumed all rows. Replaced with the correct version using `SELECT column, COUNT(*) ... GROUP BY column`.
|
|
||||||
3. **`QueryNodes` line 376**: Leftover `rows.Close()` inside the scan error path (redundant with `defer`). Removed.
|
## Findings
|
||||||
|
|
||||||
|
### Bug 1: Missing `defer rows.Close()` in `user.go:GetLdapUsernames()` (line 100)
|
||||||
|
**Fixed.** Every call leaked a database connection and its buffers.
|
||||||
|
|
||||||
|
### Bug 2: SQLite never runs `ANALYZE` — wrong index selection
|
||||||
|
**Fixed.** Added `ANALYZE` to `setupSqlite()` in `dbConnection.go`.
|
||||||
|
|
||||||
|
Without statistics in `sqlite_stat1`, SQLite's query planner **guessed** and chose `jobs_jobstate_energy` (a terrible index for this query) instead of scanning `jobs_starttime` in reverse. The result:
|
||||||
|
|
||||||
|
| | Before ANALYZE | After ANALYZE |
|
||||||
|
|---|---|---|
|
||||||
|
| Index used | `jobs_jobstate_energy` | `jobs_starttime` |
|
||||||
|
| Sort | **TEMP B-TREE** (materializes ALL rows) | None (index order) |
|
||||||
|
| Memory | Proportional to total matching rows | Constant |
|
||||||
|
| I/O | Full scan of all matching rows | Stops at OFFSET+LIMIT |
|
||||||
|
|
||||||
|
### Bug 3: `IN` clause + `ORDER BY` is fundamentally incompatible with composite indexes
|
||||||
|
|
||||||
|
Even with the "correct" index `(job_state, start_time)`, SQLite **cannot** merge-sort across 3 separate index range scans for `IN ('completed','running','failed')`. It always falls back to a temp B-tree sort. The only efficient plan is to use the standalone `jobs_starttime` index — which SQLite does automatically **after ANALYZE** because it realizes the 3 states cover virtually all rows, making the WHERE clause nearly a no-op.
|
||||||
|
|
||||||
|
### Observation: 79 indexes on the `job` table
|
||||||
|
This is excessive and actively harmful — it confuses the query planner (especially without ANALYZE) and slows writes. The `jobs_jobstate_starttime` index from migration 08 is also missing from the actual DB (only the 3-column `jobs_jobstate_starttime_duration` exists). This is worth investigating separately but is a schema/migration concern, not a code bug.
|
||||||
@@ -48,6 +48,14 @@ func setupSqlite(db *sql.DB) error {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Update query planner statistics so SQLite picks optimal indexes.
|
||||||
|
// Without this, SQLite guesses row distributions and often chooses wrong
|
||||||
|
// indexes for queries with IN clauses + ORDER BY, causing full table sorts
|
||||||
|
// in temp B-trees instead of using covering indexes.
|
||||||
|
if _, err := db.Exec("ANALYZE"); err != nil {
|
||||||
|
cclog.Warnf("Failed to run ANALYZE: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -102,6 +102,7 @@ func (r *UserRepository) GetLdapUsernames() ([]string, error) {
|
|||||||
cclog.Warn("Error while querying usernames")
|
cclog.Warn("Error while querying usernames")
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
|
defer rows.Close()
|
||||||
|
|
||||||
for rows.Next() {
|
for rows.Next() {
|
||||||
var username string
|
var username string
|
||||||
|
|||||||
Reference in New Issue
Block a user