Joachim Meyer 019bfa5ee9 JobCurrentStartDate & EnteredCurrentStatus differ.
When a job starts, the two values differ by about 1 s, so the JobCurrentStartDate seen at job end does not match the EnteredCurrentStatus recorded at job start, and CC fails to stop the right job.
2022-12-20 17:15:01 +01:00

HTCondor to ClusterCockpit Sync

HTCondor ClassAdLog Plugin

Building

Requirements: A build environment reasonably similar to the submission nodes (might want to use the HTCondor nmi build docker containers).

Use CMake to configure and build the project.

mkdir build && cd build
cmake .. -DCONDOR_SRC=<path/to/htcondor> -DCONDOR_BUILD=<path/to/htcondor/build> -DCMAKE_BUILD_TYPE=Release
cmake --build .

Configuration

The target system needs the curl package installed that corresponds to the one used in the build environment.

Adapt and add to condor_config.local or any other HTCondor config file:

SCHEDD.PLUGINS = $(SCHEDD.PLUGINS) /path/to/libhtcondor_cc_sync_plugin.so

CCSYNC_URL=<ClusterCockpit-URL>
CCSYNC_APIKEY=<API-Key>
CCSYNC_CLUSTER_NAME=<ClusterCockpit's cluster name this submit node works for>
CCSYNC_GPU_MAP=/path/to/gpu_map.json
CCSYNC_SUBMIT_ID=<Unique submission node id, expected to be in 0..3 (see #globalJobIdToInt)>
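Before restarting the schedd, it can help to sanity-check the settings above. The following is a minimal sketch, not part of the plugin: `parse_simple_config` and `check_ccsync_settings` are hypothetical helper names, and the parser only understands plain `KEY=value` lines, not the full HTCondor configuration syntax (macros, includes, conditionals).

```python
# Hypothetical helper, not shipped with the plugin: checks that the CCSYNC_*
# settings listed above are present and that CCSYNC_SUBMIT_ID is in 0..3.
REQUIRED = (
    "CCSYNC_URL",
    "CCSYNC_APIKEY",
    "CCSYNC_CLUSTER_NAME",
    "CCSYNC_GPU_MAP",
    "CCSYNC_SUBMIT_ID",
)


def parse_simple_config(text):
    """Parse plain KEY=value lines; this is NOT a full HTCondor config parser."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_ccsync_settings(settings):
    """Return a list of problems; an empty list means the settings look complete."""
    problems = [f"missing {key}" for key in REQUIRED if not settings.get(key)]
    submit_id = settings.get("CCSYNC_SUBMIT_ID", "")
    if submit_id and not (submit_id.isdigit() and 0 <= int(submit_id) <= 3):
        problems.append("CCSYNC_SUBMIT_ID must be an integer in 0..3")
    return problems


example = """
CCSYNC_URL=https://cc.example.org
CCSYNC_APIKEY=secret
CCSYNC_CLUSTER_NAME=mycluster
CCSYNC_GPU_MAP=/path/to/gpu_map.json
CCSYNC_SUBMIT_ID=4
"""
print(check_ccsync_settings(parse_simple_config(example)))
```

The URL, key, and cluster name in `example` are placeholders. With `CCSYNC_SUBMIT_ID=4` the check reports the out-of-range submit node id.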

gpu_map.json is expected in the following format. It can be generated with condor_status_to_gpu_map.py <path/to/condor_status.json>, where condor_status.json is produced by running condor_status -json > condor_status.json on the cluster:

{
    "hostname1": {
        "GPU-acb66c44": "0000:07:00.0",
        ...
    },
    "hostname2": {
        "GPU-31f57da0": "0000:0A:00.0",
        ...
    }
}
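A generated map can be checked against the format above before pointing CCSYNC_GPU_MAP at it. This is a rough sketch under assumptions: `validate_gpu_map` is a hypothetical helper, and the PCI address pattern (domain:bus:device.function) is inferred from the two example entries rather than taken from the plugin source.

```python
import json
import re

# Assumed PCI address shape, inferred from the example: dddd:bb:dd.f (hex digits).
PCI_ADDR = re.compile(r"^[0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7]$")


def validate_gpu_map(gpu_map):
    """Return a list of problems; an empty list means the map matches the format."""
    if not isinstance(gpu_map, dict):
        return ["top level must be an object mapping hostnames to GPU maps"]
    problems = []
    for host, gpus in gpu_map.items():
        if not isinstance(gpus, dict):
            problems.append(f"{host}: expected an object of GPU id -> PCI address")
            continue
        for gpu_id, pci in gpus.items():
            if not isinstance(pci, str) or not PCI_ADDR.match(pci):
                problems.append(f"{host}/{gpu_id}: unexpected PCI address {pci!r}")
    return problems


# Same entries as in the example above, without the elided ones.
sample = json.loads(
    '{"hostname1": {"GPU-acb66c44": "0000:07:00.0"},'
    ' "hostname2": {"GPU-31f57da0": "0000:0A:00.0"}}'
)
print(validate_gpu_map(sample))
```

For a real file, load it with `json.load(open(path))` and pass the result to `validate_gpu_map`.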

To get a debug dump of the ClassAds at the end of endTransaction, build with -DVERBOSE (set automatically for Debug or RelWithDebInfo builds) and set SCHEDD_DEBUG=D_FULLDEBUG in the HTCondor config.
