cc-metric-collector/collectors/slurmCgroupMetric.md
2025-10-07 13:10:17 +02:00

<!--
---
title: Slurm cgroup metric collector
description: Collect per-core memory and CPU usage for SLURM jobs from cgroup v2
categories: [cc-metric-collector]
tags: ['Admin']
weight: 3
hugo_path: docs/reference/cc-metric-collector/collectors/slurm_cgroup.md
---
-->
## `slurm_cgroup` collector
The `slurm_cgroup` collector reads job-specific resource metrics from the cgroup v2 filesystem and provides **hwthread** metrics for memory and CPU usage of running SLURM jobs.
### Example configuration
```json
  "slurm_cgroup": {
    "cgroup_base": "/sys/fs/cgroup/system.slice/slurmstepd.scope",
    "exclude_metrics": [
      "job_sys_cpu",
      "job_mem_limit"
    ]
  }
```
* The optional `cgroup_base` parameter sets the root path of the SLURM job cgroups. The default is `/sys/fs/cgroup/system.slice/slurmstepd.scope`.
* The `exclude_metrics` array lists metrics that should not be sent to the sink.
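
The two options above can be illustrated with a minimal Python sketch. It is not the collector's actual implementation (the collector is part of `cc-metric-collector`); the helper names `load_slurm_cgroup_config` and `filter_metrics` are hypothetical, and only the default path and the metric names are taken from this page.

```python
import json

# Default documented above for the `cgroup_base` option.
DEFAULT_CGROUP_BASE = "/sys/fs/cgroup/system.slice/slurmstepd.scope"


def load_slurm_cgroup_config(raw: str) -> dict:
    """Parse the collector's config object and fill in the documented default.

    Hypothetical helper for illustration only.
    """
    cfg = json.loads(raw)
    return {
        "cgroup_base": cfg.get("cgroup_base", DEFAULT_CGROUP_BASE),
        "exclude_metrics": set(cfg.get("exclude_metrics", [])),
    }


def filter_metrics(metrics: dict, cfg: dict) -> dict:
    """Drop every metric whose name is listed in `exclude_metrics`."""
    return {
        name: value
        for name, value in metrics.items()
        if name not in cfg["exclude_metrics"]
    }


# `cgroup_base` is omitted here, so the default path is used.
cfg = load_slurm_cgroup_config('{"exclude_metrics": ["job_sys_cpu", "job_mem_limit"]}')

# Made-up sample values, only to show the filtering step.
sample = {"job_mem_used": 1024, "job_mem_limit": 4096,
          "job_user_cpu": 12.5, "job_sys_cpu": 1.5}
kept = filter_metrics(sample, cfg)
```

With this configuration, only `job_mem_used` and `job_user_cpu` remain in `kept`.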
### Reported metrics
All metrics are available **per hardware thread**:
* `job_mem_used` (`unit=Bytes`): Current memory usage of the job
* `job_max_mem_used` (`unit=Bytes`): Peak memory usage
* `job_mem_limit` (`unit=Bytes`): Cgroup memory limit
* `job_user_cpu` (`unit=%`): User CPU utilization percentage
* `job_sys_cpu` (`unit=%`): System CPU utilization percentage

Each metric carries the following tags:
* `type=hwthread`
* `type-id=<core_id>`
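
To make the metrics above more concrete, the following Python sketch reads the standard cgroup v2 interface files `memory.current`, `memory.peak`, `memory.max` and `cpu.stat` from a job's cgroup directory. Which files the collector actually reads, and how it normalizes the CPU percentages, is not stated on this page, so the mapping from files to metric names and the `cpu_percent` formula (share of one CPU's time over the sampling interval) are assumptions. All function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the flat "key value" lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def read_limit(path: Path) -> Optional[int]:
    """memory.max holds a byte count or the literal "max" (no limit)."""
    text = path.read_text().strip()
    return None if text == "max" else int(text)


def cpu_percent(prev_usec: int, curr_usec: int, interval_s: float) -> float:
    """Assumed normalization: CPU time consumed as a percentage of one CPU."""
    return 100.0 * (curr_usec - prev_usec) / (interval_s * 1_000_000)


def read_job_sample(job_dir: Path) -> dict:
    """Read one raw sample for a job cgroup (assumed file-to-metric mapping)."""
    cpu = parse_cpu_stat((job_dir / "cpu.stat").read_text())
    return {
        "job_mem_used": int((job_dir / "memory.current").read_text()),
        "job_max_mem_used": int((job_dir / "memory.peak").read_text()),
        "job_mem_limit": read_limit(job_dir / "memory.max"),
        # Cumulative counters; two samples are needed to derive a percentage.
        "user_usec": cpu["user_usec"],
        "system_usec": cpu["system_usec"],
    }
```

Because `user_usec` and `system_usec` are cumulative counters, a percentage such as `job_user_cpu` can only be derived from the difference between two consecutive samples, for example `cpu_percent(prev["user_usec"], curr["user_usec"], interval_s)`.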
### Limitations
* **cgroups v2 required:** This collector only supports systems running with cgroups v2 (unified hierarchy).
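
A quick way to check this requirement on a node is to look for the `cgroup.controllers` file, which exists at the root of a cgroup v2 (unified) mount. The sketch below assumes the conventional mount point `/sys/fs/cgroup`; the function name `is_cgroup_v2` is hypothetical and not part of the collector.

```python
from pathlib import Path


def is_cgroup_v2(mount_point: str = "/sys/fs/cgroup") -> bool:
    """Return True if `mount_point` looks like a cgroup v2 unified hierarchy.

    The root of a cgroup v2 mount contains a `cgroup.controllers` file;
    a cgroup v1 or hybrid layout does not have it at this location.
    """
    return Path(mount_point, "cgroup.controllers").is_file()


if __name__ == "__main__":
    print("cgroup v2 detected" if is_cgroup_v2() else "cgroup v2 not detected")
```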