2.2 KiB
nvidiaGPM collector
"nvidia_gpm": {
"metrics": [
"nv_fb_mem_used",
"nv_fan"
],
"exclude_devices": [
"0","1", "0000000:ff:01.0"
],
"process_mig_devices": false,
"use_pci_info_as_type_id": true,
"add_pci_info_tag": false,
"add_uuid_meta": false,
"add_board_number_meta": false,
"add_serial_meta": false,
"use_uuid_for_mig_device": false,
"use_slice_for_mig_device": false
}
The nvidia_gpm collector can be configured to leave out specific devices with the exclude_devices option. It takes IDs as supplied to the NVML with nvmlDeviceGetHandleByIndex() or the PCI address in NVML format (%08X:%02X:%02X.0). Commonly only the physical GPUs are monitored. If MIG devices should be analyzed as well, set process_mig_devices (adds stype=mig,stype-id=<mig_index>). With the options use_uuid_for_mig_device and use_slice_for_mig_device, the <mig_index> can be replaced with the UUID (e.g. MIG-6a9f7cc8-6d5b-5ce0-92de-750edc4d8849) or the MIG slice name (e.g. 1g.5gb).
The metrics sent by the nvidia_gpm collector use accelerator as type tag. For the type-id, it uses the device handle index by default. With the use_pci_info_as_type_id option, the PCI ID is used instead. If both values should be added as tags, activate the add_pci_info_tag option. It uses the device handle index as type-id and adds the PCI ID as separate pci_identifier tag.
Optionally, it is possible to add the UUID, the board part number and the serial to the meta informations. They are not sent to the sinks (if not configured otherwise).
Available Metrics:
nv_gpm_graphics_utilnv_gpm_sm_utilnv_gpm_sm_occupancynv_gpm_integer_utilnv_gpm_any_tensor_utilnv_gpm_dfma_tensor_utilnv_gpm_hmma_tensor_utilnv_gpm_imma_tensor_utilnv_gpm_dram_bw_utilnv_gpm_fp64_utilnv_gpm_fp32_utilnv_gpm_fp16_util