## Redfish receiver
The Redfish receiver uses the [Redfish specification](https://www.dmtf.org/standards/redfish) to query thermal and power metrics. Thermal metrics may include fan speeds and temperatures. Power metrics may include the current power consumption of individual hardware components, as well as the minimum, maximum and average power consumption of these components over a given time interval. The receiver polls each configured Redfish device once per interval. Multiple devices can be accessed in parallel to increase throughput.
### Configuration structure
```json
{
    "<redfish receiver name>": {
        "type": "redfish",
        "username": "<Username>",
        "password": "<Password>",
        "endpoint": "https://%h-bmc",
        "exclude_metrics": [ "min_consumed_watts" ],
        "client_config": [
            {
                "host_list": "n[1,2-4]"
            },
            {
                "host_list": "n5",
                "disable_power_metrics": true
            },
            {
                "host_list": "n6",
                "username": "<Username 2>",
                "password": "<Password 2>",
                "endpoint": "https://%h-BMC",
                "disable_thermal_metrics": true
            }
        ]
    }
}
```
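To illustrate how this structure nests, the following sketch reads a reduced version of it into Go types. The type names `receiverConfig` and `clientConfig`, the helper `parse`, and the receiver name `my_redfish` are hypothetical and chosen for this example only; they are not taken from the collector's source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clientConfig and receiverConfig are hypothetical types that mirror a few
// of the JSON keys of the configuration; they are not the collector's types.
type clientConfig struct {
	HostList            string `json:"host_list"`
	DisablePowerMetrics bool   `json:"disable_power_metrics"`
}

type receiverConfig struct {
	Type           string         `json:"type"`
	Endpoint       string         `json:"endpoint"`
	ExcludeMetrics []string       `json:"exclude_metrics"`
	ClientConfig   []clientConfig `json:"client_config"`
}

// example is a reduced configuration; "my_redfish" is a made-up receiver name.
const example = `{
  "my_redfish": {
    "type": "redfish",
    "endpoint": "https://%h-bmc",
    "exclude_metrics": ["min_consumed_watts"],
    "client_config": [
      {"host_list": "n[1,2-4]"},
      {"host_list": "n5", "disable_power_metrics": true}
    ]
  }
}`

// parse decodes a configuration that maps receiver names to their settings.
func parse(data []byte) (map[string]receiverConfig, error) {
	var receivers map[string]receiverConfig
	err := json.Unmarshal(data, &receivers)
	return receivers, err
}

func main() {
	receivers, err := parse([]byte(example))
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	for name, r := range receivers {
		fmt.Printf("%s (%s): %d client groups\n", name, r.Type, len(r.ClientConfig))
	}
}
```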
Global settings:
- `fanout`: Maximum number of simultaneous Redfish connections (default: 64)
- `interval`: How often the Redfish power metrics are read and sent to the sink (default: 30 s)
- `http_insecure`: Controls whether the client skips verification of the server's certificate (default: true, i.e. the server's certificate is not verified)
- `http_timeout`: Time limit for requests made by the HTTP client (default: 10 s)
Global and per-device settings (per-device settings override the global settings):
- `disable_power_metrics`: Disable collection of power metrics
- `disable_processor_metrics`: Disable collection of processor metrics
- `disable_thermal_metrics`: Disable collection of thermal metrics
- `exclude_metrics`: List of excluded metrics
- `username`: User name to authenticate with
- `password`: Password to use for authentication
- `endpoint`: URL of the Redfish service (placeholder `%h` gets replaced by the hostname)
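The override rule and the `%h` placeholder can be sketched as follows. This is an illustration under stated assumptions rather than the receiver's implementation: the `settings` struct and the helpers `merge` and `endpointFor` are hypothetical, and only string options are shown.

```go
package main

import (
	"fmt"
	"strings"
)

// settings holds a subset of the options that may be set globally or per
// device. The struct and its field names are hypothetical.
type settings struct {
	Username string
	Password string
	Endpoint string
}

// merge returns the global settings with every non-empty per-device value
// taking precedence.
func merge(global, device settings) settings {
	out := global
	if device.Username != "" {
		out.Username = device.Username
	}
	if device.Password != "" {
		out.Password = device.Password
	}
	if device.Endpoint != "" {
		out.Endpoint = device.Endpoint
	}
	return out
}

// endpointFor replaces every %h placeholder with the host name.
func endpointFor(template, host string) string {
	return strings.ReplaceAll(template, "%h", host)
}

func main() {
	global := settings{Username: "user", Password: "secret", Endpoint: "https://%h-bmc"}
	n6 := settings{Username: "user2", Password: "secret2", Endpoint: "https://%h-BMC"}

	// A device without its own settings inherits the global endpoint.
	fmt.Println(endpointFor(merge(global, settings{}).Endpoint, "n1")) // https://n1-bmc
	// A device with its own endpoint overrides the global one.
	fmt.Println(endpointFor(merge(global, n6).Endpoint, "n6")) // https://n6-BMC
}
```

Boolean options such as `disable_power_metrics` would need a way to tell "not set" apart from `false`, for example a pointer field, which this sketch leaves out.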
Per-device settings:
- `host_list`: List of hosts that share the same client configuration, written in host list notation (e.g. `n[1,2-4]`)