mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-10-07 23:04:32 +02:00
49 lines
1.5 KiB
Markdown
<!--
---
title: Slurm cgroup metric collector
description: Collect per-core memory and CPU usage for SLURM jobs from cgroup v2
categories: [cc-metric-collector]
tags: ['Admin']
weight: 3
hugo_path: docs/reference/cc-metric-collector/collectors/slurm_cgroup.md
---
-->

## `slurm_cgroup` collector

The `slurm_cgroup` collector reads job-specific resource metrics from the cgroup v2 filesystem and provides **hwthread** metrics for memory and CPU usage of running SLURM jobs.

### Example configuration

```json
"slurm_cgroup": {
  "cgroup_base": "/sys/fs/cgroup/system.slice/slurmstepd.scope",
  "exclude_metrics": [
    "job_sys_cpu",
    "job_mem_limit"
  ]
}
```

* The optional `cgroup_base` parameter sets the root path of the SLURM job cgroups. The default is `/sys/fs/cgroup/system.slice/slurmstepd.scope`.
* The `exclude_metrics` array lists metric names that are not sent to the sink.

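A minimal Go sketch of how these two options could be decoded, assuming the collector's section is a plain JSON object with exactly these keys. The struct `slurmCgroupConfig` and the helpers `parseConfig` and `enabledMetrics` are hypothetical; only the JSON keys, the default path, and the metric names come from this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// slurmCgroupConfig is a hypothetical mirror of the documented JSON options.
type slurmCgroupConfig struct {
	CgroupBase     string   `json:"cgroup_base"`
	ExcludeMetrics []string `json:"exclude_metrics"`
}

const defaultCgroupBase = "/sys/fs/cgroup/system.slice/slurmstepd.scope"

// parseConfig decodes the collector section and applies the documented default.
func parseConfig(raw []byte) (slurmCgroupConfig, error) {
	cfg := slurmCgroupConfig{CgroupBase: defaultCgroupBase}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return slurmCgroupConfig{}, err
	}
	if cfg.CgroupBase == "" { // an explicit empty string also falls back
		cfg.CgroupBase = defaultCgroupBase
	}
	return cfg, nil
}

// enabledMetrics returns the documented metric names minus the excluded ones.
func enabledMetrics(cfg slurmCgroupConfig) []string {
	all := []string{"job_mem_used", "job_max_mem_used", "job_mem_limit", "job_user_cpu", "job_sys_cpu"}
	skip := make(map[string]bool, len(cfg.ExcludeMetrics))
	for _, m := range cfg.ExcludeMetrics {
		skip[m] = true
	}
	var out []string
	for _, m := range all {
		if !skip[m] {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	raw := []byte(`{"exclude_metrics": ["job_sys_cpu", "job_mem_limit"]}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.CgroupBase)
	fmt.Println(enabledMetrics(cfg))
}
```

With the exclusions from the example configuration, the remaining metrics are `job_mem_used`, `job_max_mem_used` and `job_user_cpu`, and the base path falls back to the default because `cgroup_base` is omitted.
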
### Reported metrics

All metrics are reported **per hardware thread**:

* `job_mem_used` (`unit=Bytes`): Current memory usage of the job
* `job_max_mem_used` (`unit=Bytes`): Peak memory usage of the job
* `job_mem_limit` (`unit=Bytes`): Memory limit of the job's cgroup
* `job_user_cpu` (`unit=%`): CPU utilization in user mode, in percent
* `job_sys_cpu` (`unit=%`): CPU utilization in system (kernel) mode, in percent

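To illustrate how a percentage can be derived from cgroup data, the sketch below parses the flat-keyed cgroup v2 `cpu.stat` file (cumulative `user_usec` and `system_usec` counters in microseconds) and converts the growth between two samples into a utilization percentage. The normalization by interval length and number of hardware threads is an assumption for illustration; this page does not document how the collector computes `job_user_cpu` and `job_sys_cpu`, and `parseCPUStat` and `cpuPercent` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCPUStat parses the cgroup v2 cpu.stat format
// ("usage_usec 123\nuser_usec 100\nsystem_usec 23\n...") into a map.
func parseCPUStat(content string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", fields[0], err)
		}
		stats[fields[0]] = v
	}
	return stats, sc.Err()
}

// cpuPercent turns the growth of a cumulative CPU-time counter (µs) over a
// wall-clock interval (µs) into a percentage averaged over nThreads hardware
// threads. Hypothetical normalization; returns 0 on invalid input or reset.
func cpuPercent(prevUsec, curUsec, intervalUsec uint64, nThreads int) float64 {
	if intervalUsec == 0 || nThreads <= 0 || curUsec < prevUsec {
		return 0
	}
	return 100 * float64(curUsec-prevUsec) / (float64(intervalUsec) * float64(nThreads))
}

func main() {
	prev, _ := parseCPUStat("usage_usec 1000000\nuser_usec 800000\nsystem_usec 200000\n")
	cur, _ := parseCPUStat("usage_usec 4000000\nuser_usec 3200000\nsystem_usec 800000\n")
	interval := uint64(10_000_000) // 10 s between the two samples
	fmt.Printf("user: %.1f%%\n", cpuPercent(prev["user_usec"], cur["user_usec"], interval, 4))
	fmt.Printf("sys:  %.1f%%\n", cpuPercent(prev["system_usec"], cur["system_usec"], interval, 4))
}
```

For a job on 4 hardware threads sampled 10 s apart, the user counter grows by 2.4 s of CPU time, which this normalization reports as 6.0%, and the system counter grows by 0.6 s, reported as 1.5%.
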
Each metric carries the following tags:

* `type=hwthread`
* `type-id=<hwthread_id>` (the ID of the hardware thread, i.e. the logical CPU number)

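Assuming the hardware threads reported for a job are the ones listed in its cgroup's `cpuset.cpus.effective` file, which uses the Linux CPU list format (for example `0-3,8`), the following Go sketch expands such a list and builds the tags listed above for each hardware thread. The names `parseCPUList` and `hwthreadTags` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux CPU list such as "0-3,8" (the format of
// cpuset.cpus.effective) into individual hardware-thread IDs.
func parseCPUList(list string) ([]int, error) {
	var ids []int
	list = strings.TrimSpace(list)
	if list == "" {
		return ids, nil
	}
	for _, part := range strings.Split(list, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid CPU list entry %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid CPU list entry %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid CPU range %q", part)
		}
		for id := start; id <= end; id++ {
			ids = append(ids, id)
		}
	}
	return ids, nil
}

// hwthreadTags builds the documented tag set for one hardware thread.
func hwthreadTags(id int) map[string]string {
	return map[string]string{"type": "hwthread", "type-id": strconv.Itoa(id)}
}

func main() {
	ids, err := parseCPUList("0-2,8\n") // e.g. contents of cpuset.cpus.effective
	if err != nil {
		panic(err)
	}
	for _, id := range ids {
		fmt.Println(hwthreadTags(id))
	}
}
```

For the input `0-2,8`, this yields one tag set each for hardware threads 0, 1, 2 and 8.
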
### Limitations

* **cgroup v2 required:** This collector only supports systems that use cgroup v2 (the unified hierarchy).
