Added Job Script Tagger and WebDataset Creator

This script processes image files from tar archives or directories, extracts job IDs, fetches job scripts from a database, and identifies application tags based on keywords. Valid samples are saved to sharded WebDatasets, while problematic ones are logged.
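
The keyword-based tagging step could look roughly like the sketch below. The diff for job_script_tagger.py is not shown here, so the keyword table, the tag names, and the function name `tag_job_script` are all hypothetical and only illustrate the idea of matching keywords in a job script.

```python
import re

# Hypothetical keyword table: the real tags and keywords used by
# job_script_tagger.py are not visible in this commit view.
APP_KEYWORDS = {
    "gromacs": ["gmx", "gromacs"],
    "lammps": ["lmp", "lammps"],
    "python": ["python"],
}


def tag_job_script(script: str) -> list[str]:
    """Return the sorted tags whose keywords occur as whole words in the script."""
    text = script.lower()
    tags = []
    for tag, keywords in APP_KEYWORDS.items():
        # \b anchors avoid matching a keyword inside a longer word.
        if any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords):
            tags.append(tag)
    return sorted(tags)
```

A script with no matching keyword yields an empty list, which a caller could treat as one of the problematic samples to log rather than write to a shard.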
2025-10-14 15:05:15 +02:00
parent 4adda57539
commit 2bd43009c3

1174
job_script_tagger.py Normal file

File diff suppressed because it is too large.