2025-10-07 13:10:17 +02:00

slurm_cgroup collector

The slurm_cgroup collector reads job-specific resource metrics from the cgroup v2 filesystem and reports memory and CPU usage of running SLURM jobs as per-hardware-thread (hwthread) metrics.

Example configuration

```json
"slurm_cgroup": {
  "cgroup_base": "/sys/fs/cgroup/system.slice/slurmstepd.scope",
  "exclude_metrics": [
    "job_sys_cpu",
    "job_mem_limit"
  ]
}
```
  • The optional cgroup_base parameter sets the root path of the SLURM job cgroups. The default is /sys/fs/cgroup/system.slice/slurmstepd.scope.
  • The optional exclude_metrics array lists metrics that should not be sent to the sink.
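
The Go sketch below shows one way the two options could be decoded, defaulted and turned into a lookup set. The struct slurmCgroupConfig and the function parseConfig are hypothetical; only the JSON keys and the default path come from this document. The input is the object that appears as the value of the "slurm_cgroup" key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// slurmCgroupConfig is a hypothetical struct mirroring the documented JSON keys.
type slurmCgroupConfig struct {
	CgroupBase     string   `json:"cgroup_base"`
	ExcludeMetrics []string `json:"exclude_metrics"`
}

const defaultCgroupBase = "/sys/fs/cgroup/system.slice/slurmstepd.scope"

// parseConfig decodes the collector object, falls back to the documented
// default for cgroup_base and converts exclude_metrics into a set.
func parseConfig(raw []byte) (slurmCgroupConfig, map[string]bool, error) {
	var cfg slurmCgroupConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, nil, err
	}
	if cfg.CgroupBase == "" {
		cfg.CgroupBase = defaultCgroupBase
	}
	excluded := make(map[string]bool, len(cfg.ExcludeMetrics))
	for _, m := range cfg.ExcludeMetrics {
		excluded[m] = true
	}
	return cfg, excluded, nil
}

func main() {
	raw := []byte(`{"exclude_metrics": ["job_sys_cpu", "job_mem_limit"]}`)
	cfg, excluded, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.CgroupBase)
	// A metric is only sent to the sink if it is not in the excluded set.
	fmt.Println(excluded["job_sys_cpu"], excluded["job_mem_used"])
}
```

Using a set keeps the per-metric exclusion check constant-time, no matter how many names are listed.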

Reported metrics

All metrics are reported per hardware thread:

  • job_mem_used (unit=Bytes): Current memory usage of the job
  • job_max_mem_used (unit=Bytes): Peak memory usage
  • job_mem_limit (unit=Bytes): Cgroup memory limit
  • job_user_cpu (unit=%): User CPU utilization percentage
  • job_sys_cpu (unit=%): System CPU utilization percentage

Each metric carries the following tags:

  • type=hwthread
  • type-id=<core_id>
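
To emit one series per hardware thread, a collector needs the IDs of the threads assigned to the job. The Go sketch below assumes these IDs come from the job cgroup's cpuset.cpus.effective file, which uses a list format such as "0-3,8", and builds the documented tag set for each ID. The helpers parseCPUList and hwthreadTags are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a cpuset list such as "0-3,8" into individual IDs.
func parseCPUList(s string) ([]int, error) {
	var ids []int
	s = strings.TrimSpace(s)
	if s == "" {
		return ids, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, err
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, err
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for id := start; id <= end; id++ {
			ids = append(ids, id)
		}
	}
	return ids, nil
}

// hwthreadTags builds the tag set listed above for one hardware thread.
func hwthreadTags(id int) map[string]string {
	return map[string]string{"type": "hwthread", "type-id": strconv.Itoa(id)}
}

func main() {
	ids, err := parseCPUList("0-3,8\n")
	if err != nil {
		fmt.Println("bad cpuset list:", err)
		return
	}
	for _, id := range ids {
		fmt.Println(hwthreadTags(id))
	}
}
```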

Limitations

  • cgroups v2 required: This collector only supports systems running with cgroups v2 (unified hierarchy).