Added Job Script Tagger and WebDataset Creator
This script processes image files from tar archives or directories, extracts their job IDs, fetches the corresponding job scripts from a database, and identifies application tags based on keywords. Valid samples are saved to sharded WebDatasets, while problematic ones are logged.
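The pipeline described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in job_script_tagger.py. Several pieces are assumptions made only for the sketch: the keyword table `APP_KEYWORDS`, the rule that the job ID is the first run of digits in the file name, a SQLite table `job_scripts(job_id, script)` standing in for the real database, the sample key format, and the hand-rolled `SimpleShardWriter`. The writer uses only the standard `tarfile` module. It follows the WebDataset convention that each sample is a group of tar members sharing one key, such as `1001_000000.png` and `1001_000000.json`.

```python
from __future__ import annotations

import io
import json
import re
import sqlite3
import tarfile
from pathlib import Path

# Hypothetical keyword table: application tag -> substrings searched in a job script.
APP_KEYWORDS = {"gromacs": ["gmx", "gromacs"], "lammps": ["lmp", "lammps"], "pytorch": ["torch"]}
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
JOB_ID_RE = re.compile(r"(\d+)")  # assumption: job ID = first digit run in the file stem


def iter_images(source: Path):
    """Yield (file name, bytes) for image files in a directory tree or a tar archive."""
    if source.is_dir():
        for path in sorted(source.rglob("*")):
            if path.suffix.lower() in IMAGE_EXTS:
                yield path.name, path.read_bytes()
    else:
        with tarfile.open(source) as tar:
            for member in tar.getmembers():
                if member.isfile() and Path(member.name).suffix.lower() in IMAGE_EXTS:
                    yield Path(member.name).name, tar.extractfile(member).read()


def extract_job_id(filename: str) -> str | None:
    match = JOB_ID_RE.search(Path(filename).stem)
    return match.group(1) if match else None


def fetch_job_script(conn: sqlite3.Connection, job_id: str) -> str | None:
    # Assumed schema: job_scripts(job_id TEXT PRIMARY KEY, script TEXT).
    row = conn.execute("SELECT script FROM job_scripts WHERE job_id = ?", (job_id,)).fetchone()
    return row[0] if row else None


def tag_script(script: str) -> list[str]:
    """Return every application tag with at least one keyword present (case-insensitive)."""
    text = script.lower()
    return sorted(tag for tag, words in APP_KEYWORDS.items() if any(w in text for w in words))


class SimpleShardWriter:
    """Hypothetical writer: starts a new tar shard after max_per_shard samples."""

    def __init__(self, pattern: str, max_per_shard: int):
        self.pattern, self.max_per_shard = pattern, max_per_shard
        self.shard_index, self.count, self.tar, self.paths = -1, 0, None, []

    def _next_shard(self) -> None:
        if self.tar is not None:
            self.tar.close()
        self.shard_index += 1
        path = self.pattern % self.shard_index
        self.paths.append(path)
        self.tar, self.count = tarfile.open(path, "w"), 0

    def write(self, key: str, files: dict[str, bytes]) -> None:
        if self.tar is None or self.count >= self.max_per_shard:
            self._next_shard()
        for ext, data in files.items():  # one tar member per field, all sharing `key`
            info = tarfile.TarInfo(name=f"{key}.{ext}")
            info.size = len(data)
            self.tar.addfile(info, io.BytesIO(data))
        self.count += 1

    def close(self) -> None:
        if self.tar is not None:
            self.tar.close()
            self.tar = None


def process(samples, conn, writer, log: list[str]) -> int:
    """Write valid samples to shards; append a reason to `log` for each rejected one."""
    written = 0
    for name, data in samples:
        job_id = extract_job_id(name)
        if job_id is None:
            log.append(f"{name}: no job ID in file name")
            continue
        script = fetch_job_script(conn, job_id)
        if script is None:
            log.append(f"{name}: no job script for job {job_id}")
            continue
        tags = tag_script(script)
        if not tags:
            log.append(f"{name}: no application keywords matched")
            continue
        ext = Path(name).suffix.lstrip(".").lower()
        meta = json.dumps({"job_id": job_id, "tags": tags}).encode("utf-8")
        writer.write(f"{job_id}_{written:06d}", {ext: data, "json": meta})
        written += 1
    return written
```

Keys avoid extra dots because WebDataset readers split a member name at its first dot into key and extension. In a real deployment, the `webdataset` package's own `ShardWriter` (with its `maxcount` limit) could replace the hand-rolled writer, and rejected samples would more likely go to a log file than to an in-memory list.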
job_script_tagger.py (normal file, 1174 lines changed)