mirror of https://github.com/ClusterCockpit/cc-metric-collector.git (synced 2025-11-04 02:35:07 +01:00)

Modularize the whole thing (#16)

* Use channels, add a metric router, split up configuration and use extended version of Influx line protocol internally
* Use central timer for collectors and router. Add expressions to router
* Add expression to router config
* Update entry points
* Start with README
* Update README for CCMetric
* Formatting
* Update README.md
* Add README for MultiChanTicker
* Add README for MultiChanTicker
* Update README.md
* Add README to metric router
* Update main README
* Remove SinkEntity type
* Update README for sinks
* Update go files
* Update README for receivers
* Update collectors README
* Update collectors README
* Use seperate page per collector
* Fix for tempstat page
* Add docs for customcmd collector
* Add docs for ipmistat collector
* Add docs for topprocs collector
* Update customCmdMetric.md
* Use seconds when calculating LIKWID metrics
* Add IB metrics ib_recv_pkts and ib_xmit_pkts
* Drop domain part of host name
* Updated to latest stable version of likwid
* Define source code dependencies in Makefile
* Add GPFS / IBM Spectrum Scale collector
* Add vet and staticcheck make targets
* Add vet and staticcheck make targets
* Avoid go vet warning:
  struct field tag `json:"..., omitempty"` not compatible with reflect.StructTag.Get: suspicious space in struct tag value
  struct field tag `json:"...", omitempty` not compatible with reflect.StructTag.Get: key:"value" pairs not separated by spaces
* Add sample collector to README.md
* Add CPU frequency collector
* Avoid staticcheck warning: redundant return statement
* Avoid staticcheck warning: unnecessary assignment to the blank identifier
* Simplified code
* Add CPUFreqCollectorCpuinfo
  a metric collector to measure the current frequency of the CPUs as obtained from /proc/cpuinfo
  Only measure on the first hyperthread
* Add collector for NFS clients
* Move publication of metrics into Flush() for NatsSink
* Update GitHub actions
* Refactoring
* Avoid vet warning: Println arg list ends with redundant newline
* Avoid vet warning struct field commands has json tag but is not exported
* Avoid vet warning: return copies lock value.
* Corrected typo
* Refactoring
* Add go sources in internal/...
* Bad separator in Makefile
* Fix Infiniband collector

Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>

collectors/nvidiaMetric.md (Normal file)
@@ -0,0 +1,40 @@

## `nvidia` collector

```json
  "nvidia": {
    "exclude_devices": [
      "0","1"
    ],
    "exclude_metrics": [
      "fb_memory",
      "fan"
    ]
  }
```

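As a rough illustration of how such a configuration section could be consumed, the following Go sketch decodes the two keys and checks whether a device or metric is excluded. It assumes the section is keyed by the collector name `nvidia`; the type `nvidiaConfig` and the helpers `parseNvidiaConfig` and `contains` are hypothetical names chosen for this example and are not taken from the collector's source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nvidiaConfig mirrors the two keys of the configuration section.
// The type and field names are hypothetical, chosen for this sketch.
type nvidiaConfig struct {
	ExcludeDevices []string `json:"exclude_devices,omitempty"`
	ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
}

// contains reports whether item occurs in list.
func contains(list []string, item string) bool {
	for _, s := range list {
		if s == item {
			return true
		}
	}
	return false
}

// parseNvidiaConfig decodes the object stored under the "nvidia" key
// (assumed key name) of a JSON document.
func parseNvidiaConfig(raw []byte) (nvidiaConfig, error) {
	var sections map[string]nvidiaConfig
	if err := json.Unmarshal(raw, &sections); err != nil {
		return nvidiaConfig{}, err
	}
	cfg, ok := sections["nvidia"]
	if !ok {
		return nvidiaConfig{}, fmt.Errorf("no \"nvidia\" section found")
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
	  "nvidia": {
	    "exclude_devices": ["0", "1"],
	    "exclude_metrics": ["fb_memory", "fan"]
	  }
	}`)
	cfg, err := parseNvidiaConfig(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("skip device 1:", contains(cfg.ExcludeDevices, "1"))
	fmt.Println("skip metric temp:", contains(cfg.ExcludeMetrics, "temp"))
}
```

Device IDs are kept as strings here because the configuration lists them as quoted values.
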
Metrics:
* `util`
* `mem_util`
* `mem_total`
* `fb_memory`
* `temp`
* `fan`
* `ecc_mode`
* `perf_state`
* `power_usage_report`
* `graphics_clock_report`
* `sm_clock_report`
* `mem_clock_report`
* `max_graphics_clock`
* `max_sm_clock`
* `max_mem_clock`
* `ecc_db_error`
* `ecc_sb_error`
* `power_man_limit`
* `encoder_util`
* `decoder_util`

The collector tags each metric with its own `type` (`accelerator`) and the GPU's ID as `type-id`. The output metric looks like this:

`<name>,type=accelerator,type-id=<nvidia-gpu-id> value=<metric value> <timestamp>`
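
To make the layout concrete, here is a minimal Go sketch that renders one sample in the format shown above. The function name `formatAcceleratorMetric` is hypothetical, and the nanosecond timestamp is an assumption borrowed from the default precision of the Influx line protocol; the collector itself may format values differently.

```go
package main

import "fmt"

// formatAcceleratorMetric renders a single sample as
// <name>,type=accelerator,type-id=<gpu-id> value=<value> <timestamp>.
// The timestamp is assumed to be in nanoseconds since the Unix epoch.
func formatAcceleratorMetric(name, gpuID string, value float64, tsNano int64) string {
	return fmt.Sprintf("%s,type=accelerator,type-id=%s value=%v %d",
		name, gpuID, value, tsNano)
}

func main() {
	// Example: temperature of GPU 0 at a made-up point in time.
	fmt.Println(formatAcceleratorMetric("temp", "0", 65, 1636000000000000000))
	// prints: temp,type=accelerator,type-id=0 value=65 1636000000000000000
}
```
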