rocm_smi collector
"rocm_smi": {
  "exclude_devices": [
    "0", "1", "00000000:FF:01.0"
  ],
  "exclude_metrics": [
    "rocm_mm_util",
    "rocm_temp_vrsoc"
  ],
  "use_pci_info_as_type_id": true,
  "add_pci_info_tag": false,
  "add_serial_meta": false
}
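The options above can be read into typed settings as shown in the sketch below. It loads a collectors configuration object and extracts the rocm_smi section. The RocmSmiConfig class, the load_rocm_smi_config helper and the defaults it assumes for omitted keys (empty lists, false flags) are hypothetical. They illustrate the option names only and are not the collector's actual Go types.

```python
import json
from dataclasses import dataclass, field


# Hypothetical mirror of the rocm_smi options; field names match the JSON keys,
# default values are assumptions for keys left out of the config.
@dataclass
class RocmSmiConfig:
    exclude_devices: list[str] = field(default_factory=list)
    exclude_metrics: list[str] = field(default_factory=list)
    use_pci_info_as_type_id: bool = False
    add_pci_info_tag: bool = False
    add_serial_meta: bool = False


def load_rocm_smi_config(text: str) -> RocmSmiConfig:
    """Parse a collectors config object and return its rocm_smi options."""
    section = json.loads(text).get("rocm_smi", {})
    # Unknown keys raise TypeError here, which surfaces misspelled options early.
    return RocmSmiConfig(**section)


EXAMPLE = """
{
  "rocm_smi": {
    "exclude_devices": ["0", "1", "00000000:FF:01.0"],
    "exclude_metrics": ["rocm_mm_util", "rocm_temp_vrsoc"],
    "use_pci_info_as_type_id": true,
    "add_pci_info_tag": false,
    "add_serial_meta": false
  }
}
"""

cfg = load_rocm_smi_config(EXAMPLE)
print(cfg.use_pci_info_as_type_id)  # True
```

Python's json module rejects trailing commas. Therefore the last entry of the object must not be followed by a comma.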
The rocm_smi collector can be configured to leave out specific devices with the exclude_devices option. Each entry is either a logical ID (the device's index in the list of available devices) or a PCI address in the NVML-like format %08X:%02X:%02X.0, where domain, bus and device are zero-padded uppercase hexadecimal. Metrics (listed below) that should not be sent to the MetricRouter can be excluded with the exclude_metrics option.
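The device and metric exclusion described above can be sketched as follows. The helper names pci_address, is_excluded and filter_metrics are hypothetical. The sketch also assumes that entries are compared as exact, case-sensitive strings against either the logical index or the formatted PCI address. The real collector may match differently.

```python
# Hypothetical helpers illustrating exclude_devices / exclude_metrics.

def pci_address(domain: int, bus: int, device: int) -> str:
    """Format a PCI address in the NVML-like form %08X:%02X:%02X.0."""
    return f"{domain:08X}:{bus:02X}:{device:02X}.0"


def is_excluded(index: int, domain: int, bus: int, device: int,
                exclude_devices: list[str]) -> bool:
    """Skip a device if its logical index or its PCI address is listed."""
    return (str(index) in exclude_devices
            or pci_address(domain, bus, device) in exclude_devices)


def filter_metrics(metrics: dict[str, float],
                   exclude_metrics: list[str]) -> dict[str, float]:
    """Drop metrics whose names appear in exclude_metrics."""
    return {name: value for name, value in metrics.items()
            if name not in exclude_metrics}


exclude_devices = ["0", "1", "00000000:FF:01.0"]
print(pci_address(0, 0xFF, 0x01))                      # 00000000:FF:01.0
print(is_excluded(2, 0, 0xFF, 0x01, exclude_devices))  # True (PCI address match)
print(is_excluded(3, 0, 0xC1, 0x00, exclude_devices))  # False
```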
The metrics sent by the rocm_smi collector use accelerator as the type tag. By default, the type-id tag is the device handle index. With the use_pci_info_as_type_id option, the PCI ID is used as type-id instead. To get both values as tags, enable the add_pci_info_tag option: the device handle index is then used as type-id and the PCI ID is added as a separate pci_identifier tag.
Optionally, the add_serial_meta option adds the device serial number to the meta information. Meta information is not sent to the sinks unless they are configured to do so.
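The following sketch shows how the tag options and the serial meta option could combine for one device. The function build_tags_and_meta and the meta key name "serial" are hypothetical. When both flags are set, the sketch gives add_pci_info_tag precedence, following the wording above. That precedence is an assumption, not confirmed collector behavior.

```python
# Hypothetical sketch; only the tag names type, type-id and pci_identifier
# come from the text above.

def build_tags_and_meta(index: int, pci_id: str, serial: str,
                        use_pci_info_as_type_id: bool = False,
                        add_pci_info_tag: bool = False,
                        add_serial_meta: bool = False
                        ) -> tuple[dict[str, str], dict[str, str]]:
    # Default: the device handle index is the type-id.
    tags = {"type": "accelerator", "type-id": str(index)}
    if add_pci_info_tag:
        # Assumption: this wins if both flags are set; the index stays as
        # type-id and the PCI ID is attached as a separate tag.
        tags["pci_identifier"] = pci_id
    elif use_pci_info_as_type_id:
        # The PCI ID replaces the index as type-id.
        tags["type-id"] = pci_id
    meta: dict[str, str] = {}
    if add_serial_meta:
        # Hypothetical meta key; meta is not forwarded to sinks by default.
        meta["serial"] = serial
    return tags, meta


tags, meta = build_tags_and_meta(0, "00000000:C1:00.0", "SN123", add_pci_info_tag=True)
print(tags)  # {'type': 'accelerator', 'type-id': '0', 'pci_identifier': '00000000:C1:00.0'}
print(meta)  # {}
```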
Metrics:
- rocm_gfx_util
- rocm_umc_util
- rocm_mm_util
- rocm_avg_power
- rocm_temp_mem
- rocm_temp_hotspot
- rocm_temp_edge
- rocm_temp_vrgfx
- rocm_temp_vrsoc
- rocm_temp_vrmem
- rocm_gfx_clock
- rocm_soc_clock
- rocm_u_clock
- rocm_v0_clock
- rocm_v1_clock
- rocm_d0_clock
- rocm_d1_clock
- rocm_temp_hbm
Some metrics add a sub-type tag (stype). For example, the rocm_temp_hbm metrics set stype=device and stype-id=<HBM_slice_number>.
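To illustrate the sub-type tagging, the sketch below emits one rocm_temp_hbm point per HBM slice. The function hbm_temperature_points and the (name, tags, value) tuple layout are hypothetical. The assumption that slices are numbered from 0 is also hypothetical.

```python
# Hypothetical sketch of per-slice HBM temperature points with stype tags.

def hbm_temperature_points(base_tags: dict[str, str], hbm_temps: list[float]):
    """Return one ("rocm_temp_hbm", tags, value) tuple per HBM slice."""
    points = []
    for slice_no, temp in enumerate(hbm_temps):  # assumption: slices start at 0
        tags = dict(base_tags)  # copy so slices do not share or mutate the base tags
        tags["stype"] = "device"
        tags["stype-id"] = str(slice_no)
        points.append(("rocm_temp_hbm", tags, temp))
    return points


base = {"type": "accelerator", "type-id": "0"}
points = hbm_temperature_points(base, [45.0, 47.0])
for name, tags, value in points:
    print(name, tags["stype-id"], value)
# rocm_temp_hbm 0 45.0
# rocm_temp_hbm 1 47.0
```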