mirror of
https://github.com/ClusterCockpit/cc-metric-collector.git
synced 2024-11-10 04:27:25 +01:00
Add SLURM collector to README
parent
48335dd872
commit
113ccb3ac5
@ -40,6 +40,7 @@ In contrast to the configuration files for sinks and receivers, the collectors c
* [`beegfs_meta`](./beegfsmetaMetric.md)
* [`beegfs_storage`](./beegfsstorageMetric.md)
* [`rocm_smi`](./rocmsmiMetric.md)
* [`slurm`](./slurmJobDetector.md)

## Todos

25
collectors/slurmJobDetector.md
Normal file
@ -0,0 +1,25 @@
# `slurm` collector

```json
  "slurm": {
    "interval" : "1s",
    "send_job_events" : true,
    "send_job_metrics" : true,
    "send_step_events": false,
    "send_step_metrics" : false
  }
```

The `slurm` collector reads data from `/sys/fs/cgroup/` to detect the creation and deletion of SLURM jobs on the node. When it detects an event, it collects some event-related information and sends the event. The event detection runs every `interval`.

Additionally, it can collect metrics for all running jobs and send them out. This collection happens at the global collector interval.

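The detection step described above can be pictured as comparing two directory scans taken one `interval` apart. The following is a minimal Python sketch of that idea, not the collector's actual Go implementation. It assumes that each running job appears as a directory named `job_<id>` somewhere below the base directory; that naming, and the helper names `scan_jobs` and `diff_jobs`, are assumptions made for illustration.

```python
import os


def scan_jobs(base):
    """Return the set of job directory names found anywhere below `base`.

    Assumption: each running job shows up as a directory named `job_<id>`.
    """
    jobs = set()
    for _root, dirs, _files in os.walk(base):
        jobs.update(d for d in dirs if d.startswith("job_"))
    return jobs


def diff_jobs(previous, current):
    """Compare two scans and return (started, ended) job names, sorted."""
    started = sorted(current - previous)
    ended = sorted(previous - current)
    return started, ended
```

A caller would run `scan_jobs` once per `interval`, keep the previous result, and emit a start event for every name in `started` and an end event for every name in `ended`.
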
Options:
* `interval`: Time interval at which the folders are checked for new or vanished SLURM jobs
* `send_job_events`: Send events when a job starts or ends
* `send_job_metrics`: Send metrics of each running job at the global collector interval
* `send_step_events`: Send events when a job step starts
* `send_step_metrics`: Send metrics of each job step at the global collector interval

Testing options:

For testing the collector, you can specify a different base directory to be checked for new events. The default is `/sys/fs/cgroup/`; setting `sysfs_base` in the configuration changes it. Moreover, the `slurmJobDetector_dummy.sh` script lets you create and delete "jobs" for testing.
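
To illustrate what creating and deleting test "jobs" below a custom `sysfs_base` could look like, here is a small Python sketch. It does not reproduce what `slurmJobDetector_dummy.sh` does; the directory layout `slurm/uid_<uid>/job_<id>` and the function names are assumptions chosen for the example.

```python
import os
import shutil


def dummy_job_path(base, jobid, uid=1000):
    """Build the (assumed) directory path of a dummy job below `base`."""
    return os.path.join(base, "slurm", f"uid_{uid}", f"job_{jobid}")


def create_dummy_job(base, jobid, uid=1000):
    """Create an empty directory that stands in for a running job."""
    path = dummy_job_path(base, jobid, uid)
    os.makedirs(path, exist_ok=True)
    return path


def delete_dummy_job(base, jobid, uid=1000):
    """Remove the dummy job directory, simulating the end of the job."""
    shutil.rmtree(dummy_job_path(base, jobid, uid), ignore_errors=True)
```

Pointing `sysfs_base` at a scratch directory and calling these helpers while the collector runs should produce job start and end events without a real SLURM installation, provided the assumed layout matches what the collector expects.
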