mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2025-01-15 00:29:09 +01:00
# `slurm` collector

```json
"slurm": {
    "interval": "1s",
    "send_job_events": true,
    "send_job_metrics": true,
    "send_step_events": false,
    "send_step_metrics": false
}
```

The `slurm` collector reads `/sys/fs/cgroup/` to detect the creation and deletion of SLURM jobs on the node. When it detects an event, it collects event-related information and sends the event. Event detection runs every `interval`.

Additionally, for all running jobs, it can collect metrics and send them out. This collection runs at the global collector interval.

Options:

* `interval`: Time interval at which the folders are checked for new or vanished SLURM jobs
* `send_job_events`: Send events when a job starts or ends
* `send_job_metrics`: Send metrics of each running job at the global collector interval
* `send_step_events`: Send events when a job step starts
* `send_step_metrics`: Send metrics of each job step at the global collector interval

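The event detection described above amounts to comparing two snapshots of the job directories taken one `interval` apart. The sketch below illustrates that idea in shell. It is not the collector's implementation: the `uid_<UID>/job_<JOBID>` directory layout and the function names `list_jobs` and `diff_jobs` are assumptions made for this example.

```sh
#!/bin/sh
# Illustrative sketch, not the collector's code.
# Assumption: jobs appear as <base>/uid_<UID>/job_<JOBID> directories.

# Print one "uid_<UID>/job_<JOBID>" line per job directory below $1, sorted.
list_jobs() {
    for d in "$1"/uid_*/job_*; do
        [ -d "$d" ] || continue      # skip the unexpanded pattern when nothing matches
        echo "${d#"$1"/}"
    done | sort
}

# Compare two sorted snapshots ($1 = previous, $2 = current).
diff_jobs() {
    comm -13 "$1" "$2" | sed 's/^/start /'   # only in current: job appeared
    comm -23 "$1" "$2" | sed 's/^/end /'     # only in previous: job vanished
}

# Demo on a temporary directory instead of /sys/fs/cgroup/.
base=$(mktemp -d)
snap=$(mktemp -d)
mkdir -p "$base/uid_1000/job_42"
list_jobs "$base" > "$snap/prev"
mkdir "$base/uid_1000/job_43"        # a job starts
rmdir "$base/uid_1000/job_42"        # another job ends
list_jobs "$base" > "$snap/cur"
events=$(diff_jobs "$snap/prev" "$snap/cur")
echo "$events"
```

The demo prints `start uid_1000/job_43` followed by `end uid_1000/job_42`. In a polling loop, the current snapshot would become the previous one after each comparison, with a `sleep` of `interval` in between.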
## Testing
For testing the collector, you can specify a different base directory to check for new events. The default is `/sys/fs/cgroup/`; set `sysfs_base` in the configuration to change it. The `slurmJobDetector_dummy.sh` script creates and deletes "jobs" for testing. Pass it the same directory with `--basedir`.

```sh
$ slurmJobDetector_dummy.sh -h

Usage: slurmJobDetector_dummy.sh <opts>
    [ -h | --help ]
    [ -v | --verbosity ]
    [ -u | --uid <UID> (default: XXXX) ]
    [ -j | --jobid <JOBID> (default: random) ]
    [ -b | --basedir <JOBID> (default: ./slurm-test) ]
    [ -d | --delete ]
    [ -l | --list ]
```

With no options, it creates a job with the executing user's UID and a random JOBID. To delete a job, use `-d -j JOBID`; deletion requires a JOBID. To list all UIDs and JOBIDs that currently exist, use `-l`.
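A test session might look like the following sketch. It uses only the flags from the help output above. The script location (`./slurmJobDetector_dummy.sh`), the job ID `12345`, the configuration file name, and the surrounding `{ ... }` object are assumptions for illustration; `sysfs_base` is assumed to take a plain directory path.

```sh
#!/bin/sh
# Sketch of a test session; paths, file name and job ID are assumptions.
WORKDIR=$(mktemp -d)
BASEDIR="$WORKDIR/slurm-test"            # directory shared by collector and dummy script
CONFIG="$WORKDIR/collectors-test.json"   # hypothetical configuration file name
DUMMY="./slurmJobDetector_dummy.sh"      # assumed location of the dummy script

mkdir -p "$BASEDIR"

# Collector configuration pointing sysfs_base at the test directory.
cat > "$CONFIG" <<EOF
{
    "slurm": {
        "interval": "1s",
        "send_job_events": true,
        "send_job_metrics": true,
        "send_step_events": false,
        "send_step_metrics": false,
        "sysfs_base": "$BASEDIR"
    }
}
EOF

if [ -x "$DUMMY" ]; then
    "$DUMMY" -b "$BASEDIR" -j 12345      # create job 12345 for the executing user
    "$DUMMY" -b "$BASEDIR" -l            # list existing UIDs and JOBIDs
    sleep 2                              # leave at least one interval for detection
    "$DUMMY" -b "$BASEDIR" -d -j 12345   # delete the job again
else
    echo "dummy script not found at $DUMMY" >&2
fi
```

Passing an explicit `-j` on creation keeps the JOBID known, so the later `-d -j` call does not depend on looking up a random one with `-l`.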