lustre_jobstat collector
Note: This collector is meant to run on the Lustre servers, not the clients.
The Lustre filesystem provides a feature (job_stats) that groups processes on the client side under an identifier string (for example, a compute job with its jobid) and makes the file system operation counts for each identifier available on the server side. Check the section How to configure job_stats for more information.
Configuration
  "lustre_jobstat": {
    "lctl_command": "/path/to/lctl",
    "use_sudo": false,
    "exclude_metrics": [
      "setattr",
      "getattr"
    ],
    "send_abs_values": true,
    "jobid_regex": "^(?P<jobid>[%d%w%.]+)$"
  }
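To make the lctl_command and use_sudo options concrete, here is a minimal Python sketch of how such a configuration could be turned into command lines. The helper name build_lctl_commands, the choice of one lctl call per parameter pattern, and the use of sudo -n are assumptions for this example; the collector's actual invocation may differ.

```python
import json

# The two parameter patterns named in this README.
JOB_STATS_PATTERNS = ["mdt.*.job_stats", "obdfilter.*.job_stats"]


def build_lctl_commands(config):
    """Hypothetical helper: return one argument list per job_stats pattern."""
    base = [config.get("lctl_command", "lctl")]
    if config.get("use_sudo", False):
        # Assumption: password-less sudo is configured, so -n (never prompt)
        # makes the call fail fast instead of waiting for a password.
        base = ["sudo", "-n"] + base
    return [base + ["get_param", pattern] for pattern in JOB_STATS_PATTERNS]


# Inner part of the configuration shown above.
config = json.loads(r'''
{
  "lctl_command": "/path/to/lctl",
  "use_sudo": false,
  "exclude_metrics": ["setattr", "getattr"],
  "send_abs_values": true,
  "jobid_regex": "^(?P<jobid>[%d%w%.]+)$"
}
''')

for command in build_lctl_commands(config):
    print(" ".join(command))
```

The argument lists can be passed to subprocess.run on a Lustre server; on a client they would return nothing useful, since job_stats is only exposed on the servers.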
The lustre_jobstat collector runs the lctl application with the get_param option to read all mdt.*.job_stats and obdfilter.*.job_stats metrics. These metrics are readable only by root; if password-less sudo is configured, set use_sudo to true so that the collector calls lctl through sudo. Metrics listed in exclude_metrics are dropped, which reduces network traffic and storage. With the send_abs_values flag, the collector sends absolute values for the configured metrics. The jobid_regex option can be used to split the Lustre job_stats identifier into multiple parts. Because a sequence such as \d is not a valid escape inside a JSON string, write % in place of \ (for example, %d for \d).
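The % notation for jobid_regex can be illustrated with a short Python sketch. It assumes the collector simply substitutes a backslash for every %, which is an interpretation of the description above and not confirmed behaviour; the helper name translate_jobid_regex and the sample identifier are hypothetical. The (?P<name>...) named-group syntax is accepted by both Python's re module and Go's regexp package.

```python
import re


def translate_jobid_regex(pattern):
    """Hypothetical helper: map the JSON-friendly '%' back to a backslash."""
    # Assumption: every '%' stands for '\'; a literal '%' cannot be expressed.
    return pattern.replace("%", "\\")


configured = "^(?P<jobid>[%d%w%.]+)$"          # value from the configuration
translated = translate_jobid_regex(configured)  # ^(?P<jobid>[\d\w\.]+)$
jobid_pattern = re.compile(translated)

# Hypothetical job_stats identifier as it might appear in the lctl output.
match = jobid_pattern.match("123456.batch")
if match:
    print(match.groupdict())  # {'jobid': '123456.batch'}
```

An identifier that contains characters outside the class, such as a space, does not match this pattern.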
Metrics:
lustre_job_read_samples (unit: requests)
lustre_job_read_min_bytes (unit: bytes)
lustre_job_read_max_bytes (unit: bytes)
The collector adds the tags: type=jobid,typeid=<jobid_from_regex>,stype=device,stypeid=<device_name_from_output>.
The collector adds the meta information: unit=<unit>,scope=job.
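As a rough picture of what a resulting metric could carry, the following Python sketch assembles the tags and meta information for one value. The function name make_job_metric, the dictionary layout, the tag name stypeid for the device name, and the sample job and device names are all assumptions for illustration; the collector's internal message type is not shown in this README.

```python
def make_job_metric(name, value, jobid, device, unit):
    """Hypothetical helper: bundle one job metric with its tags and meta data."""
    return {
        "name": name,
        "value": value,
        "tags": {
            "type": "jobid",
            "typeid": jobid,      # job identifier extracted by jobid_regex
            "stype": "device",
            "stypeid": device,    # assumed tag name for the device from the lctl output
        },
        "meta": {
            "unit": unit,
            "scope": "job",
        },
    }


# Hypothetical values; real names depend on the file system and scheduler.
metric = make_job_metric(
    name="lustre_job_read_samples",
    value=42,
    jobid="123456",
    device="example-OST0000",
    unit="requests",
)
print(metric["tags"])
print(metric["meta"])
```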