smartmon collector

  "smartmon": {
    "use_sudo": true,
    "exclude_devices": [
      "/dev/sda"
    ],
    "excludeMetrics": [
      "smartmon_warn_temp_time",
      "smartmon_crit_comp_time"
    ],
    "devices": [
      {
        "name": "/dev/nvme0",
        "type": "nvme"
      }
    ]
  }

The smartmon collector retrieves S.M.A.R.T. data from NVMe devices using the smartctl command.

NVMe devices can either be detected automatically by a device scan or added manually with the "devices" config option.
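The device selection described above could look roughly like the following sketch. It is not the collector's actual code (the collector is written in Go). It assumes that scanned and configured devices are merged, that "exclude_devices" applies to both, and that the scan output has the shape produced by `smartctl --scan --json=c`. The function name `select_devices` and the sample values are hypothetical.

```python
import json


def select_devices(scan_output: str, configured: list[dict], excluded: list[str]) -> list[dict]:
    """Combine scanned and manually configured devices, dropping excluded ones."""
    scanned = json.loads(scan_output).get("devices", [])
    selected = {}
    for dev in scanned + configured:
        name = dev["name"]
        if name in excluded:
            continue  # assumption: exclude_devices wins over both sources
        # A configured entry replaces a scanned entry with the same name.
        selected[name] = {"name": name, "type": dev.get("type", "")}
    return list(selected.values())


# Sample shaped like `smartctl --scan --json=c` output (illustrative values).
scan = '{"devices":[{"name":"/dev/sda","type":"scsi"},{"name":"/dev/nvme0","type":"nvme"}]}'
devices = select_devices(scan, [{"name": "/dev/nvme1", "type": "nvme"}], ["/dev/sda"])
print(devices)
```

With this sample, /dev/sda is dropped by the exclude list and the manually configured /dev/nvme1 is appended after the scanned /dev/nvme0.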

Metrics:

  • smartmon_temp: Temperature of the device (unit=degC)
  • smartmon_avail_spare: Amount of spare left (unit=percent)
  • smartmon_percent_used: Percentage of the device's lifetime used (unit=percent)
  • smartmon_data_units_read: Read data units
  • smartmon_data_units_write: Written data units
  • smartmon_host_reads: Read operations
  • smartmon_host_writes: Write operations
  • smartmon_power_cycles: Number of power cycles
  • smartmon_power_on: Seconds the device is powered on (unit=seconds)
  • smartmon_unsafe_shutdowns: Count of unsafe shutdowns
  • smartmon_media_errors: Media errors of the device
  • smartmon_errlog_entries: Error log entries
  • smartmon_warn_temp_time: Time above the warning temperature threshold
  • smartmon_crit_comp_time: Time above the critical composite temperature threshold
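As an illustration of where these values may come from, the sketch below maps the metric names above to fields of the `nvme_smart_health_information_log` object in smartctl's JSON output and applies an "excludeMetrics" list. The mapping and the hours-to-seconds conversion for smartmon_power_on are assumptions based on smartctl's field names, not taken from the collector's source. `FIELD_MAP` and `extract_metrics` are hypothetical names.

```python
import json

# Assumed mapping: metric name -> field in "nvme_smart_health_information_log".
FIELD_MAP = {
    "smartmon_temp": "temperature",
    "smartmon_avail_spare": "available_spare",
    "smartmon_percent_used": "percentage_used",
    "smartmon_data_units_read": "data_units_read",
    "smartmon_data_units_write": "data_units_written",
    "smartmon_host_reads": "host_reads",
    "smartmon_host_writes": "host_writes",
    "smartmon_power_cycles": "power_cycles",
    "smartmon_unsafe_shutdowns": "unsafe_shutdowns",
    "smartmon_media_errors": "media_errors",
    "smartmon_errlog_entries": "num_err_log_entries",
    "smartmon_warn_temp_time": "warning_temp_time",
    "smartmon_crit_comp_time": "critical_comp_time",
}


def extract_metrics(smartctl_json: str, exclude: set[str]) -> dict[str, int]:
    """Pick the health-log fields that are present and drop excluded metrics."""
    log = json.loads(smartctl_json).get("nvme_smart_health_information_log", {})
    metrics = {name: log[field] for name, field in FIELD_MAP.items() if field in log}
    # Assumption: smartctl reports hours, the metric is documented in seconds.
    if "power_on_hours" in log:
        metrics["smartmon_power_on"] = log["power_on_hours"] * 3600
    return {name: value for name, value in metrics.items() if name not in exclude}


# Illustrative input, not real device data.
sample = ('{"nvme_smart_health_information_log":'
          '{"temperature":35,"available_spare":100,"power_on_hours":2,"warning_temp_time":0}}')
metrics = extract_metrics(sample, {"smartmon_warn_temp_time"})
print(metrics)
```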

smartctl typically requires root to run. To run cc-metric-collector without root privileges, you can enable use_sudo. Add a file like this in /etc/sudoers.d/ to allow cc-metric-collector to run the required command:

# Do not log the following sudo commands from monitoring, since this causes a lot of log spam.
# However keep log_denied enabled, to detect failures
Defaults: monitoring !log_allowed, !pam_session

# Allow to use smartctl
monitoring ALL = (root) NOPASSWD:/absolute/path/to/smartctl --json=c --device=* "--all" *
# Or add individual rules for each device
# monitoring ALL = (root) NOPASSWD:/absolute/path/to/smartctl --json=c --device=<device_type> "--all" <device>
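The sudoers rule only matches if the command is invoked with the same arguments in the same order. A minimal sketch of building such a command line is shown below. The helper `build_command` and the path /usr/sbin/smartctl are hypothetical; the sketch only constructs the argument list and does not run anything.

```python
def build_command(smartctl: str, device: dict, use_sudo: bool) -> list[str]:
    """Build an argument list in the order expected by the sudoers rule above."""
    argv = [
        smartctl,                       # absolute path, as written in the sudoers rule
        "--json=c",                     # compact JSON output
        f"--device={device['type']}",   # e.g. --device=nvme
        "--all",
        device["name"],                 # e.g. /dev/nvme0
    ]
    return ["sudo", *argv] if use_sudo else argv


# Hypothetical smartctl location; adjust to the path used in the sudoers file.
cmd = build_command("/usr/sbin/smartctl", {"name": "/dev/nvme0", "type": "nvme"}, use_sudo=True)
print(" ".join(cmd))
```

If the per-device rules are used instead of the wildcard rule, each configured device needs its own sudoers line with the matching device type and device name.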