Importer Package
The importer package provides functionality for importing job data into the ClusterCockpit database, either from archived job files or from explicitly specified metadata/data file pairs.
Overview
This package supports two primary import workflows:
- Bulk Database Initialization - Reinitialize the entire job database from archived jobs
- Individual Job Import - Import specific jobs from metadata/data file pairs
Both workflows enrich job metadata by calculating performance footprints and energy consumption metrics before persisting to the database.
Main Entry Points
InitDB()
Reinitializes the job database from all archived jobs.
if err := importer.InitDB(); err != nil {
    log.Fatal(err)
}
This function:
- Flushes existing job, tag, and jobtag tables
- Iterates through all jobs in the configured archive
- Enriches each job with calculated metrics
- Inserts jobs into the database in batched transactions (100 jobs per batch; see the sketch below)
- Continues on individual job failures, logging errors
Use Case: Initial database setup or complete database rebuild from archive.
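The batching and continue-on-error behaviour described above can be pictured roughly as follows. This is a minimal sketch using database/sql transactions; the insertJob callback and the importAll function are hypothetical stand-ins, not the package's actual internals.

import (
    "database/sql"
    "log"

    "github.com/ClusterCockpit/cc-lib/schema"
)

const batchSize = 100 // jobs per transaction, as described above

// importAll sketches the batch/continue-on-error pattern of a bulk import.
// The insertJob callback stands in for the real per-job database insert.
func importAll(db *sql.DB, jobs []*schema.Job,
    insertJob func(*sql.Tx, *schema.Job) error) error {

    tx, err := db.Begin()
    if err != nil {
        return err
    }
    inserted := 0
    for _, job := range jobs {
        if err := insertJob(tx, job); err != nil {
            log.Printf("skipping job: %v", err) // log the failure and keep going
            continue
        }
        inserted++
        if inserted%batchSize == 0 {
            if err := tx.Commit(); err != nil { // flush the full batch
                return err
            }
            if tx, err = db.Begin(); err != nil { // start the next batch
                return err
            }
        }
    }
    return tx.Commit() // commit the final, partially filled batch
}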
HandleImportFlag(flag string)
Imports jobs from specified file pairs.
// Format: "<meta.json>:<data.json>[,<meta2.json>:<data2.json>,...]"
flag := "/path/to/meta.json:/path/to/data.json"
if err := importer.HandleImportFlag(flag); err != nil {
    log.Fatal(err)
}
This function:
- Parses the comma-separated file pairs (see the parsing sketch below)
- Validates metadata and job data against the schemas (if validation is enabled)
- Enriches each job with footprints and energy metrics
- Imports jobs into both the archive and database
- Fails fast on the first error
Use Case: Importing specific jobs from external sources or manual job additions.
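The flag format itself is simple to split into pairs. The sketch below shows the shape of that parsing with a hypothetical helper name (parseImportFlag); it is not the package's internal code.

import (
    "fmt"
    "strings"
)

// parseImportFlag splits "<meta.json>:<data.json>[,...]" into file pairs.
func parseImportFlag(flag string) ([][2]string, error) {
    var pairs [][2]string
    for _, raw := range strings.Split(flag, ",") {
        parts := strings.Split(raw, ":")
        if len(parts) != 2 {
            return nil, fmt.Errorf("invalid pair %q, expected <meta.json>:<data.json>", raw)
        }
        pairs = append(pairs, [2]string{parts[0], parts[1]})
    }
    return pairs, nil
}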
Job Enrichment
Both import workflows use enrichJobMetadata() to calculate:
Performance Footprints
Performance footprints are calculated from metric averages based on the subcluster configuration:
job.Footprint["mem_used_avg"] = 45.2 // GB
job.Footprint["cpu_load_avg"] = 0.87 // percentage
Energy Metrics
Energy consumption is calculated from power metrics using the formula:
Energy (kWh) = (Power (W) × Duration (s) / 3600) / 1000
For each energy metric:
job.EnergyFootprint["acc_power"] = 12.5 // kWh
job.Energy = 150.2 // Total energy in kWh
Note: Energy calculations for metrics with unit "energy" (Joules) are not yet implemented.
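Expressed as code, the formula above looks like this (function and parameter names are assumptions for illustration, not the package's internals):

// energyKWh converts average power (W) over a duration (s) into kWh:
// W * s = J, /3600 -> Wh, /1000 -> kWh.
func energyKWh(avgPowerW float64, durationS int64) float64 {
    return (avgPowerW * float64(durationS) / 3600.0) / 1000.0
}

// Example: a 12500 s job at 3600 W average power yields 12.5 kWh,
// matching the acc_power value shown above.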
Data Validation
SanityChecks(job *schema.Job)
Validates job metadata before database insertion:
- Cluster exists in configuration
- Subcluster is valid (assigned automatically if missing)
- Job state is valid
- Resources and user fields are populated
- Node counts and hardware thread counts are positive
- Resource count matches declared node count
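The checks listed above translate roughly into code like the following sketch (field names as exposed by cc-lib/schema; the real SanityChecks may differ in detail and covers more cases):

import (
    "errors"
    "fmt"

    "github.com/ClusterCockpit/cc-lib/schema"
)

// checkJob is an illustrative subset of the validations listed above.
func checkJob(job *schema.Job) error {
    if job.User == "" {
        return errors.New("job has no user")
    }
    if len(job.Resources) == 0 {
        return errors.New("job has no resources")
    }
    if job.NumNodes <= 0 || job.NumHWThreads <= 0 {
        return errors.New("node and hardware thread counts must be positive")
    }
    if len(job.Resources) != int(job.NumNodes) {
        return fmt.Errorf("resource count %d does not match node count %d",
            len(job.Resources), job.NumNodes)
    }
    return nil
}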
Normalization Utilities
The package includes utilities for normalizing metric values to appropriate SI prefixes:
Normalize(avg float64, prefix string)
Adjusts values and SI prefixes for readability:
factor, newPrefix := importer.Normalize(2048.0, "M")
// Converts 2048 MB → ~2.0 GB
// Returns: the conversion factor and the new prefix "G"
This is useful for automatically scaling metrics (e.g., memory, storage) to human-readable units.
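Assuming the returned factor is meant to be multiplied onto the original value (an illustration of intended usage, not a statement about the API contract):

avg := 2048.0
factor, prefix := importer.Normalize(avg, "M")
fmt.Printf("%.1f %sB\n", avg*factor, prefix) // prints roughly "2.0 GB"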
Dependencies
- github.com/ClusterCockpit/cc-backend/internal/repository - Database operations
- github.com/ClusterCockpit/cc-backend/pkg/archive - Job archive access
- github.com/ClusterCockpit/cc-lib/schema - Job schema definitions
- github.com/ClusterCockpit/cc-lib/ccLogger - Logging
- github.com/ClusterCockpit/cc-lib/ccUnits - SI unit handling
Error Handling
- InitDB: Continues processing on individual job failures, logs errors, returns summary
- HandleImportFlag: Fails fast on first error, returns immediately
- Both functions log detailed error context for debugging
Performance
- Transaction Batching: InitDB processes jobs in batches of 100 for optimal database performance
- Tag Caching: Tag IDs are cached during import to minimize database queries (see the sketch below)
- Progress Reporting: InitDB prints progress updates during bulk operations
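The tag cache can be pictured as a map keyed by tag type and name, populated on first lookup. The sketch below uses a hypothetical resolve callback in place of the real repository call; it is not the package's actual code.

// newTagCache wraps a database lookup so each distinct tag is resolved only once.
func newTagCache(resolve func(tagType, tagName string) (int64, error)) func(string, string) (int64, error) {
    cache := make(map[string]int64) // key: "<type>:<name>"
    return func(tagType, tagName string) (int64, error) {
        key := tagType + ":" + tagName
        if id, ok := cache[key]; ok {
            return id, nil // cache hit, no database query
        }
        id, err := resolve(tagType, tagName)
        if err != nil {
            return 0, err
        }
        cache[key] = id
        return id, nil
    }
}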