HTCondor to ClusterCockpit Sync
Joachim Meyer a3ca962d84 Add Schedd plugin to synch with CC.
This should be much more reliable, albeit being more prone to crash a HTCondor component (the schedd) if there's a bug...
2022-12-15 16:13:45 +01:00

HTCondor ClassAdLog Plugin

Building

Requirements: a build environment reasonably similar to the submit nodes. The HTCondor NMI build Docker containers are a good candidate.

Use CMake to configure the project, then build it:

mkdir build ; cd build
cmake .. -DCONDOR_SRC=<path/to/htcondor> -DCONDOR_BUILD=<path/to/htcondor/build> -DCMAKE_BUILD_TYPE=Release
cmake --build .

Configuration

The target system needs the curl package that corresponds to the one used in the build environment.

Adapt the following settings and add them to condor_config.local or any other HTCondor config file:

SCHEDD.PLUGINS = $(SCHEDD.PLUGINS) /path/to/libhtcondor_cc_sync_plugin.so

CCSYNC_URL=<ClusterCockpit-URL>
CCSYNC_APIKEY=<API-Key>
CCSYNC_CLUSTER_NAME=<ClusterCockpit's cluster name this submit node works for>
CCSYNC_GPU_MAP=/path/to/gpu_map.json
CCSYNC_SUBMIT_ID=<Unique submission node id, expected to be in 0..3 (see #globalJobIdToInt)>
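Before restarting the schedd, it can help to sanity-check these values. The following is a minimal sketch and not part of this repository: the function names are hypothetical, it assumes all five CCSYNC_* settings are needed, and it only understands plain KEY=VALUE lines (no macros or includes). condor_config_val remains the authoritative way to see what HTCondor actually resolves.

```python
# Hypothetical helper, not shipped with the plugin.
# Assumption: every CCSYNC_* setting listed above must be present and non-empty.
REQUIRED_KEYS = (
    "CCSYNC_URL",
    "CCSYNC_APIKEY",
    "CCSYNC_CLUSTER_NAME",
    "CCSYNC_GPU_MAP",
    "CCSYNC_SUBMIT_ID",
)


def parse_ccsync_settings(text):
    """Collect CCSYNC_* assignments from simple KEY=VALUE lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("CCSYNC_"):
            settings[key] = value.strip()
    return settings


def check_ccsync_settings(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append(f"{key} is missing or empty")
    submit_id = settings.get("CCSYNC_SUBMIT_ID", "")
    if submit_id:
        try:
            in_range = 0 <= int(submit_id) <= 3
        except ValueError:
            in_range = False
        if not in_range:
            problems.append("CCSYNC_SUBMIT_ID must be an integer in 0..3")
    return problems
```

Calling check_ccsync_settings(parse_ccsync_settings(open(path).read())) on the edited config file returns an empty list when nothing obvious is wrong.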

gpu_map.json is expected in the format shown below. It can be generated with condor_status_to_gpu_map.py <path/to/condor_status.json>, where condor_status.json is produced by running condor_status -json > condor_status.json on the cluster:

{
    "hostname1": {
        "GPU-acb66c44": "0000:07:00.0",
        ...
    },
    "hostname2": {
        "GPU-31f57da0": "0000:0A:00.0",
        ...
    }
}
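A generated gpu_map.json can be checked against this layout with a short script. The sketch below is hypothetical and not part of the repository. It assumes, based only on the example above, that GPU ids start with "GPU-" and that values are PCI addresses of the form domain:bus:device.function. Relax those checks if your cluster reports something different.

```python
import json
import re

# Assumption taken from the example above: four hex digits for the domain,
# two each for bus and device, and a single function digit.
PCI_ADDRESS = re.compile(r"[0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7]")


def check_gpu_map(gpu_map):
    """Return a list of problems found in a parsed gpu_map.json structure."""
    if not isinstance(gpu_map, dict):
        return ["top level must be an object keyed by hostname"]
    problems = []
    for host, gpus in gpu_map.items():
        if not isinstance(gpus, dict):
            problems.append(f"{host}: expected an object mapping GPU id to PCI address")
            continue
        for gpu_id, address in gpus.items():
            if not gpu_id.startswith("GPU-"):
                problems.append(f"{host}: unexpected GPU id {gpu_id!r}")
            if not isinstance(address, str) or not PCI_ADDRESS.fullmatch(address):
                problems.append(f"{host}/{gpu_id}: unexpected PCI address {address!r}")
    return problems


def load_and_check(path):
    """Parse the JSON file at `path` and validate its structure."""
    with open(path, encoding="utf-8") as handle:
        return check_gpu_map(json.load(handle))
```

Running load_and_check on the file referenced by CCSYNC_GPU_MAP and getting an empty list back indicates that the file at least has the expected shape.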