# Archive Migration Tool

## Overview
The `archive-migration` tool migrates job archives from old schema versions to the current schema version. It handles schema changes such as the `exclusive` → `shared` field transformation and adds or removes fields as needed.
## Features
- **Parallel Processing**: Uses a worker pool for fast migration
- **Dry-Run Mode**: Preview changes without modifying files
- **Safe Transformations**: Applies well-defined schema transformations
- **Progress Reporting**: Shows real-time migration progress
- **Error Handling**: Continues on individual failures, reports errors at the end
## Schema Transformations

### Exclusive → Shared
Converts the old `exclusive` integer field to the new `shared` string field:

- `0` → `"multi_user"`
- `1` → `"none"`
- `2` → `"single_user"`
### Missing Fields
Adds fields required by the current schema:

- `submitTime`: Defaults to `startTime` if missing
- `energy`: Defaults to `0.0`
- `requestedMemory`: Defaults to `0`
- `shared`: Defaults to `"none"` if still missing after transformation
### Deprecated Fields
Removes fields no longer in the schema:

- `mem_used_max`, `flops_any_avg`, `mem_bw_avg`
- `load_avg`, `net_bw_avg`, `net_data_vol_total`
- `file_bw_avg`, `file_data_vol_total`
## Usage

### Build
```sh
cd /Users/jan/prg/cc-backend/tools/archive-migration
go build
```
### Dry Run (Preview Changes)

```sh
./archive-migration --archive /path/to/archive --dry-run
```
### Migrate Archive

```sh
# IMPORTANT: Back up your archive first!
cp -r /path/to/archive /path/to/archive-backup

# Run migration
./archive-migration --archive /path/to/archive
```
### Command-Line Options

- `--archive <path>`: Path to job archive (required)
- `--dry-run`: Preview changes without modifying files
- `--workers <n>`: Number of parallel workers (default: 4)
- `--loglevel <level>`: Logging level: debug, info, warn, err, fatal, crit (default: info)
- `--logdate`: Add timestamps to log messages
### Examples

```sh
# Preview what would change
./archive-migration --archive ./var/job-archive --dry-run

# Migrate with verbose logging
./archive-migration --archive ./var/job-archive --loglevel debug

# Migrate with 8 workers for faster processing
./archive-migration --archive ./var/job-archive --workers 8
```
## Safety

> **Caution:** Always back up your archive before running a migration!

The tool modifies `meta.json` files in place. While the transformations are designed to be safe, unexpected issues can occur. Follow these safety practices:
- Always run with `--dry-run` first to preview changes
- Back up your archive before migration
- Test on a copy of your archive first
- Verify results after migration
### Verification
After migration, verify the archive:
```sh
# Use archive-manager to check the archive
cd ../archive-manager
./archive-manager -s /path/to/migrated-archive

# Or validate specific jobs
./archive-manager -s /path/to/migrated-archive --validate
```
## Troubleshooting

### Migration Failures
If individual jobs fail to migrate:
- Check the error messages for specific files
- Examine the failing `meta.json` files manually
- Fix invalid JSON or unexpected field types
- Re-run the migration (already-migrated jobs will be processed again)
### Performance
For large archives:
- Increase `--workers` for more parallelism
- Use `--loglevel warn` to reduce log output
- Monitor disk I/O if migration is slow
## Technical Details
The migration process:
1. Walks the archive directory recursively
2. Finds all `meta.json` files
3. Distributes jobs to a worker pool
4. For each job:
   - Reads the JSON file
   - Applies transformations in order
   - Writes back the migrated data (if not a dry run)
5. Reports statistics and errors
Transformations are idempotent, so running the migration multiple times is safe (though not recommended, for performance reasons).