HTCondor to ClusterCockpit Sync
Joachim Meyer 019bfa5ee9 JobCurrentStartDate & EnteredCurrentStatus differ.
When a job starts, the two values differ by about 1 s. As a result, the JobCurrentStartDate used when the job ends does not match the EnteredCurrentStatus sent when it started, and CC fails to stop the right job.
2022-12-20 17:15:01 +01:00


HTCondor ClassAdLog Plugin

Building

Requirements: a build environment reasonably similar to the submission nodes. The HTCondor NMI build Docker containers are one option.

Use CMake to configure the project.

mkdir build ; cd build
cmake .. -DCONDOR_SRC=<path/to/htcondor> -DCONDOR_BUILD=<path/to/htcondor/build> -DCMAKE_BUILD_TYPE=Release

Configuration

The target system needs the curl library package installed, in a version matching the one the plugin was built against.

Adapt the following settings and add them to condor_config.local or any other HTCondor config file:

SCHEDD.PLUGINS = $(SCHEDD.PLUGINS) /path/to/libhtcondor_cc_sync_plugin.so

CCSYNC_URL=<ClusterCockpit-URL>
CCSYNC_APIKEY=<API-Key>
CCSYNC_CLUSTER_NAME=<ClusterCockpit's cluster name this submit node works for>
CCSYNC_GPU_MAP=/path/to/gpu_map.json
CCSYNC_SUBMIT_ID=<Unique submission node id, expected to be in 0..3 (see #globalJobIdToInt)>
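A small pre-flight check can catch a missing or out-of-range setting before the schedd is restarted. The following Python sketch is not part of this repository. It assumes the settings are available as plain KEY=VALUE lines (real HTCondor configuration also supports macros, includes and conditionals, which are ignored here), and the helper names parse_simple_config and check_ccsync_settings are hypothetical.

```python
# Sketch only: a simplified reader for KEY=VALUE lines, not a full
# HTCondor configuration parser. Helper names are hypothetical.
REQUIRED_KEYS = (
    "CCSYNC_URL",
    "CCSYNC_APIKEY",
    "CCSYNC_CLUSTER_NAME",
    "CCSYNC_GPU_MAP",
    "CCSYNC_SUBMIT_ID",
)


def parse_simple_config(text):
    """Parse plain KEY=VALUE lines; skip blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_ccsync_settings(settings):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not settings.get(key)]
    submit_id = settings.get("CCSYNC_SUBMIT_ID", "")
    if submit_id:
        try:
            in_range = 0 <= int(submit_id) <= 3
        except ValueError:
            in_range = False
        if not in_range:
            problems.append("CCSYNC_SUBMIT_ID must be an integer in 0..3")
    return problems
```

Calling check_ccsync_settings(parse_simple_config(text)) on the contents of the config file returns an empty list when all five CCSYNC_* settings are present and the submit id lies in 0..3.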

gpu_map.json is expected in the format shown below. It can be generated with condor_status_to_gpu_map.py <path/to/condor_status.json>, where condor_status.json is produced by running condor_status -json > condor_status.json on the cluster:

{
    "hostname1": {
        "GPU-acb66c44": "0000:07:00.0",
        ...
    },
    "hostname2": {
        "GPU-31f57da0": "0000:0A:00.0",
        ...
    }
}
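Before pointing CCSYNC_GPU_MAP at a generated file, it can help to check that the file has the shape shown above. The following Python sketch is an illustration, not code from this repository. It assumes that each value is a PCI address in domain:bus:device.function form, which is inferred from the example entries, and the function names validate_gpu_map and load_gpu_map are hypothetical.

```python
import json
import re

# Assumption: PCI addresses look like "0000:07:00.0"
# (domain:bus:device.function), as in the example entries.
PCI_ADDRESS = re.compile(r"^[0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7]$")


def validate_gpu_map(gpu_map):
    """Return a list of problems found in a parsed gpu_map (empty if none)."""
    if not isinstance(gpu_map, dict):
        return ["top level must be an object keyed by hostname"]
    problems = []
    for host, gpus in gpu_map.items():
        if not isinstance(gpus, dict):
            problems.append(f"{host}: expected an object of GPU id to PCI address")
            continue
        for gpu_id, pci in gpus.items():
            if not isinstance(pci, str) or not PCI_ADDRESS.match(pci):
                problems.append(f"{host}/{gpu_id}: unexpected PCI address {pci!r}")
    return problems


def load_gpu_map(path):
    """Load gpu_map.json and raise ValueError if its shape looks wrong."""
    with open(path, encoding="utf-8") as handle:
        gpu_map = json.load(handle)
    problems = validate_gpu_map(gpu_map)
    if problems:
        raise ValueError("; ".join(problems))
    return gpu_map
```

The check deliberately does not enforce a particular GPU id prefix, since only the hostname to GPU id to PCI address nesting is documented here.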

To get a debug dump of the ClassAds at the end of endTransaction, build with -DVERBOSE (set automatically for Debug and RelWithDebInfo builds) and set SCHEDD_DEBUG=D_FULLDEBUG in the HTCondor config.