lustre_jobstat collector

Note: This collector is meant to run on the Lustre servers, not on the clients.

The Lustre filesystem provides a feature (job_stats) that tags processes on the client side with an identifier string (for example, a compute job's jobid) and makes the file system operation counts for each identifier available on the server side. See the section How to configure job_stats for more information.

Configuration

  "lustre_jobstat_": {
    "lctl_command": "/path/to/lctl",
    "use_sudo": false,
    "exclude_metrics": [
      "setattr",
      "getattr"
    ],
    "send_abs_values" : true,
    
    "jobid_regex" : "^(?P<jobid>[%d%w%.]+)$"
  }

The lustre_jobstat collector runs the lctl application with the get_param option to read all mdt.*.job_stats and obdfilter.*.job_stats metrics. These metrics are readable only by root. If password-less sudo is configured, set use_sudo to true to run lctl through sudo. Metrics listed in exclude_metrics are not sent, which reduces network traffic and storage. With the send_abs_values flag, the collector sends absolute values for the configured metrics. The jobid_regex can be used to split the Lustre job_stats identifier into multiple parts. Because a backslash sequence such as \d is not a valid JSON escape, write % in place of \ (for example, %d for \d).
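The % notation in jobid_regex can be illustrated with a short sketch. It is written in Python purely for illustration and is not the collector's own code. It assumes that every % in the configured string stands for a backslash, and the helper names expand_jobid_regex and extract_jobid as well as the sample identifiers are hypothetical.

```python
import re


def expand_jobid_regex(configured):
    """Turn the JSON-safe form (written with %) into a compiled regex."""
    # Assumption: every % stands for a backslash, as described in the text.
    return re.compile(configured.replace("%", "\\"))


def extract_jobid(pattern, identifier):
    """Return the named group 'jobid', or None if the identifier does not match."""
    match = pattern.match(identifier)
    return match.group("jobid") if match else None


# The value used in the configuration example.
pattern = expand_jobid_regex("^(?P<jobid>[%d%w%.]+)$")

print(pattern.pattern)                    # ^(?P<jobid>[\d\w\.]+)$
print(extract_jobid(pattern, "123456"))   # 123456
print(extract_jobid(pattern, "job 42"))   # None, the space is not in the class
```

An identifier that does not match the expression yields no jobid in this sketch. How the collector itself treats such entries is not stated here.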

Metrics:

  • lustre_job_read_samples (unit: requests)
  • lustre_job_read_min_bytes (unit: bytes)
  • lustre_job_read_max_bytes (unit: bytes)

The collector adds the tags: type=jobid,typeid=<jobid_from_regex>,stype=device,stypeid=<device_name_from_output>.

The collector adds the meta information: unit=<unit>,scope=job.
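To make the tags and meta information concrete, the following Python sketch assembles them for a single metric. It is an illustration only, not the collector's code. The helper name build_metric, the dictionary layout, and the sample values (job 123456, device testfs-OST0000) are hypothetical, and the sketch assumes the device name is carried in a stypeid tag.

```python
def build_metric(name, value, unit, jobid, device):
    """Bundle one job_stats value with its tags and meta information."""
    return {
        "name": name,
        "value": value,
        "tags": {
            "type": "jobid",
            "typeid": jobid,      # the part captured by jobid_regex
            "stype": "device",
            "stypeid": device,    # assumption: device name goes into stypeid
        },
        "meta": {"unit": unit, "scope": "job"},
    }


# Hypothetical sample values for one of the listed metrics.
metric = build_metric(
    "lustre_job_read_samples", 17, "requests", "123456", "testfs-OST0000"
)
print(metric["tags"])
print(metric["meta"])
```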

How to configure job_stats