mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-11-03 10:15:06 +01:00
slurm_cgroup collector
The slurm_cgroup collector reads job-specific resource metrics from the cgroup v2 filesystem and provides hwthread metrics for memory and CPU usage of running SLURM jobs.
Example configuration
  "slurm_cgroup": {
    "cgroup_base": "/sys/fs/cgroup/system.slice/slurmstepd.scope",
    "exclude_metrics": [
      "job_sys_cpu",
      "job_mem_limit"
    ],
    "use_sudo": false
  }
- The cgroup_base parameter (optional) sets the root path to the SLURM job cgroups. The default is /sys/fs/cgroup/system.slice/slurmstepd.scope.
- The exclude_metrics array can be used to suppress individual metrics from being sent to the sink.
- The cgroup metrics are only readable by root. If password-less sudo is configured, you can enable it by setting use_sudo to true in the configuration.
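
The options above can be illustrated with a small sketch. It is not the collector's actual code: the helper names load_config and filter_metrics are hypothetical, and treating use_sudo as false when it is omitted is an assumption. Only the cgroup_base default and the meaning of exclude_metrics come from this README.

```python
import json

# Default taken from this README; everything else below is illustrative.
DEFAULT_CGROUP_BASE = "/sys/fs/cgroup/system.slice/slurmstepd.scope"


def load_config(raw: str) -> dict:
    """Parse the body of the "slurm_cgroup" config object and fill in defaults.

    Hypothetical helper; the use_sudo default of False is an assumption.
    """
    cfg = json.loads(raw)
    return {
        "cgroup_base": cfg.get("cgroup_base", DEFAULT_CGROUP_BASE),
        "exclude_metrics": set(cfg.get("exclude_metrics", [])),
        "use_sudo": bool(cfg.get("use_sudo", False)),
    }


def filter_metrics(metrics: dict, cfg: dict) -> dict:
    """Drop every metric whose name is listed in exclude_metrics."""
    return {
        name: value
        for name, value in metrics.items()
        if name not in cfg["exclude_metrics"]
    }
```

With the example configuration, job_sys_cpu and job_mem_limit would be dropped before anything is sent to the sink, while job_mem_used, job_max_mem_used and job_user_cpu would pass through.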
Reported metrics
All metrics are available per hardware thread:
- job_mem_used (unit=Bytes): Current memory usage of the job
- job_max_mem_used (unit=Bytes): Peak memory usage
- job_mem_limit (unit=Bytes): Cgroup memory limit
- job_user_cpu (unit=%): User CPU utilization percentage
- job_sys_cpu (unit=%): System CPU utilization percentage
Each metric has tags:
- type=hwthread
- type-id=<core_id>
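
For orientation, the sketch below shows how values like those listed above could be derived from standard cgroup v2 interface files. The mapping is an assumption, not taken from the collector's source: memory.current, memory.peak and memory.max for the memory metrics, and the user_usec and system_usec counters in cpu.stat for the CPU percentages. The function names are hypothetical, and the percentage is normalised so that 100 means one fully busy CPU, which may differ from what the collector does.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the "key value" lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def parse_memory_value(text: str):
    """Parse memory.current, memory.peak or memory.max.

    memory.max contains the literal string "max" when no limit is set;
    that case is returned as None.
    """
    value = text.strip()
    return None if value == "max" else int(value)


def cpu_percent(prev_usec: int, curr_usec: int, interval_sec: float) -> float:
    """Utilisation between two counter samples (assumed normalisation:
    100.0 corresponds to one CPU busy for the whole interval)."""
    if interval_sec <= 0:
        return 0.0
    return 100.0 * (curr_usec - prev_usec) / (interval_sec * 1_000_000)
```

Because cpu.stat exposes cumulative counters, a percentage needs two samples and the time between them; the first reading after startup cannot yield a utilisation value on its own.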
Limitations
- cgroups v2 required: This collector only supports systems running with cgroups v2 (unified hierarchy).
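
A quick way to check this requirement on a node is to look for the cgroup.controllers file, which exists at the root of a cgroup v2 (unified) mount. The following is a minimal sketch; the function name is hypothetical and /sys/fs/cgroup is assumed to be the mount point.

```python
from pathlib import Path


def is_cgroup_v2(root: str = "/sys/fs/cgroup") -> bool:
    """Return True if `root` looks like a cgroup v2 unified hierarchy.

    The check relies on cgroup.controllers, which is present at the root
    of a cgroup v2 mount.
    """
    return (Path(root) / "cgroup.controllers").is_file()
```

On a host where this returns False (cgroups v1 or a hybrid setup with the v1 hierarchy at that path), the collector is not expected to find the files it needs.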