* cpustatMetric.go: Use derived values instead of absolute values
The values in /proc/stat are absolute counters that accumulate from the
boot time of the system. To obtain the CPU utilization, the changes in
the counters must be derived with respect to time. Using only the
absolute values hides changes in the utilization, especially once the
counters have grown large.
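A minimal sketch of the idea, not the code from cpustatMetric.go: two samples of a `cpu` line are parsed and the utilization is computed from the counter differences. The names `cpuSample`, `parseCPULine` and `utilization` are hypothetical, and counting iowait as idle time is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds summed counters of one "cpu" line from /proc/stat.
type cpuSample struct {
	idle  uint64 // idle + iowait (assumption: iowait counts as idle)
	total uint64 // user..steal; guest fields are skipped to avoid double counting
}

// parseCPULine parses a line such as "cpu 100 0 50 800 50 0 0 0 0 0".
func parseCPULine(line string) (cpuSample, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || !strings.HasPrefix(fields[0], "cpu") {
		return cpuSample{}, fmt.Errorf("not a cpu line: %q", line)
	}
	values := fields[1:]
	if len(values) > 8 {
		values = values[:8] // user nice system idle iowait irq softirq steal
	}
	var s cpuSample
	for i, f := range values {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuSample{}, err
		}
		s.total += v
		if i == 3 || i == 4 { // idle and iowait columns
			s.idle += v
		}
	}
	return s, nil
}

// utilization returns the busy fraction between two samples (0.0 to 1.0).
// It returns 0 if there is no valid difference, e.g. after a counter reset.
func utilization(prev, cur cpuSample) float64 {
	if cur.total <= prev.total || cur.idle < prev.idle {
		return 0
	}
	dTotal := float64(cur.total - prev.total)
	dIdle := float64(cur.idle - prev.idle)
	return 1 - dIdle/dTotal
}

func main() {
	prev, _ := parseCPULine("cpu 100 0 50 800 50 0 0 0 0 0")
	cur, _ := parseCPULine("cpu 170 0 80 890 60 0 0 0 0 0")
	fmt.Printf("utilization: %.2f\n", utilization(prev, cur))
}
```

With absolute values both samples look almost the same (1000 vs. 1200 jiffies in total), while the difference shows that half of the interval was spent busy.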
* Add new collector for /proc/schedstat
The `schedstat` collector reads data from /proc/schedstat and calculates
a load value per hwthread. This can be useful to detect bad CPU pinning
on shared nodes, for example.
Co-authored-by: Michael Schwarz <post@michael-schwarz.name>
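A rough sketch of how such a per-hwthread load value could be derived, not the collector's actual code. It assumes the `cpuN` line layout of schedstat version 15, where the seventh number after the CPU name is the cumulative run time in nanoseconds. The function names `runningNs` and `load` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// runningNs extracts the cumulative run time in ns per hardware thread
// from the content of /proc/schedstat. Lines not starting with "cpu"
// (version, timestamp, domain lines) are skipped.
func runningNs(content string) map[string]uint64 {
	out := make(map[string]uint64)
	for _, line := range strings.Split(content, "\n") {
		f := strings.Fields(line)
		if len(f) < 10 || !strings.HasPrefix(f[0], "cpu") {
			continue
		}
		// Assumption: f[7] is the time spent running on this hwthread.
		v, err := strconv.ParseUint(f[7], 10, 64)
		if err != nil {
			continue
		}
		out[f[0]] = v
	}
	return out
}

// load is the fraction of the elapsed wall time the hwthread was running.
func load(prev, cur uint64, elapsed time.Duration) float64 {
	if elapsed <= 0 || cur < prev {
		return 0
	}
	return float64(cur-prev) / float64(elapsed.Nanoseconds())
}

func main() {
	first := "version 15\ncpu0 0 0 0 0 0 0 1000000000 0 10\n"
	second := "version 15\ncpu0 0 0 0 0 0 0 1500000000 0 20\n"
	prev, cur := runningNs(first), runningNs(second)
	for name, v := range cur {
		fmt.Printf("%s load=%.2f\n", name, load(prev[name], v, time.Second))
	}
}
```

A load close to 1.0 on a few hwthreads while others stay near 0 is the kind of pattern that hints at poor pinning.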
* Cleanup: Remove unused code
* Use Golang duration parser for 'interval' and 'duration'
in main config
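As an illustration only: strings such as `10s` or `1m30s` can be parsed with `time.ParseDuration` from the Go standard library. The struct `mainConfig` and the helper `parseTimes` are hypothetical and only mirror the two option names mentioned above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// mainConfig is a hypothetical excerpt of the main config file.
type mainConfig struct {
	Interval string `json:"interval"`
	Duration string `json:"duration"`
}

// parseTimes reads both options and converts them to time.Duration.
func parseTimes(raw []byte) (interval, duration time.Duration, err error) {
	var cfg mainConfig
	if err = json.Unmarshal(raw, &cfg); err != nil {
		return
	}
	if interval, err = time.ParseDuration(cfg.Interval); err != nil {
		return
	}
	duration, err = time.ParseDuration(cfg.Duration)
	return
}

func main() {
	raw := []byte(`{"interval": "10s", "duration": "1s"}`)
	interval, duration, err := parseTimes(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(interval, duration)
}
```

Note that `time.ParseDuration` requires a unit suffix, so a bare number like `"10"` is rejected.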
* Update handling of LIKWID headers. Download only if not already present in the system. Fixes #73
* Units with cc-units (#64)
* Add option to normalize units with cc-unit
* Add unit conversion to router
* Add option to change unit prefix in the router
* Add to MetricRouter README
* Add order of operations in router to README
* Use second add_tags/del_tags only if metric gets renamed
* Skip disks in DiskstatCollector that have size=0
* Check readability of sensor files in TempCollector
* Fix for --once option
* Rename `cpu` type to `hwthread` (#69)
* Rename 'cpu' type to 'hwthread' to avoid naming clashes with MetricStore and CC-Webfrontend
* Collectors in parallel (#74)
* Provide info to CollectorManager whether the collector can be executed in parallel with others
* Split serial and parallel collectors. Read in parallel first
* Update NvidiaCollector with new metrics, MIG and NvLink support (#75)
* CC topology module update (#76)
* Rename CPU to hardware thread, write some comments
* Do renaming in other parts
* Remove CpuList and SocketList functions from metricCollector. Available in ccTopology
* Option to use MIG UUID as subtype-id in NvidiaCollector
* Option to use MIG slice name as subtype-id in NvidiaCollector
* MetricRouter: Fix JSON in README
* Fix for Github Action to really use the selected version
* Remove Ganglia installation in runonce Action and add Go 1.18
* Fix daemon options in init script
* Add separate go.mod files for use with the deprecated Go 1.16
* Minor updates for Makefiles
* fix string comparison
* AMD ROCm SMI collector (#77)
* Add collector for AMD ROCm SMI metrics
* Fix import path
* Fix imports
* Remove Board Number
* store GPU index explicitly
* Remove board number from description
* Use http instead of ftp to download likwid
* Fix serial number in rocmCollector
* Improved http sink (#78)
* automatic flush in NatsSink
* tweak default options of HttpSink
* shorter critical section and retries for HttpSink
* fix error handling
* Remove file added by mistake.
Co-authored-by: Thomas Roehl <thomas.roehl@fau.de>
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
Co-authored-by: Lou <lou.knauer@gmx.de>
* Update configuration.md
Add an additional receiver to have better alignment of components
* Change default GpfsCollector command to `mmpmon` (#53)
* Set default cmd to 'mmpmon'
* Reuse looked up path
* Cast const to string
* Just download LIKWID to get the headers (#54)
* Just download LIKWID to get the headers
* Remove perl-Data-Dumper from BuildRequires, only required by LIKWID build
* Add HttpReceiver as counterpart to the HttpSink (#49)
* Use GBytes as unit for large memory numbers
* Make maxForward configurable, save the old name in meta when renaming metrics and make the hostname tag key configurable
* Single release action (#55)
Building all RPMs and releasing in a single workflow
* Makefile target to build binary-only Debian packages (#61)
* Add 'install' and 'DEB' make targets to build binary-only Debian packages
* Add control file for DEB builds
* Use a single line for bash loop in make clean
* Add config options for retry intervals of InfluxDB clients (#59)
* Refactoring of LikwidCollector and metric units (#62)
* Reduce complexity of LikwidCollector and allow metric units
* Add unit to LikwidCollector documentation and fix some typos
* Make library path configurable
* Use old metric name in Ganglia if rename has happened in the router (#60)
* Use old metric name if rename has happened in the router
* Also check for Ganglia renames for the oldname
* Derived metrics (#57)
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Simplified iota usage
* Move unit tag to meta data tags
* Derived metrics (#65)
* Add time-based derivatives (e.g. bandwidth) to some collectors
* Add documentation
* Add comments
* Fix: Only compute rates with a valid previous state
* Only compute rates with a valid previous state
* Define const values for net/dev fields
* Set default config values
* Add comments
* Refactor: Consolidate data structures
* Refactor: Consolidate data structures
* Refactor: Avoid struct deep copy
* Refactor: Avoid redundant tag maps
* Refactor: Use int64 type for absolute values
* Update LustreCollector
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Meta to tags list and map for sinks (#63)
* Change ccMetric->Influx functions
* Use a meta_as_tags string list in config but create a lookup map afterwards
* Add meta as tag logic to sampleSink
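The list-to-map step mentioned above could look roughly like this. It is a sketch under assumptions, not the sink code: `buildLookup` and `splitMeta` are hypothetical helpers, and the meta keys in the example are made up.

```go
package main

import "fmt"

// buildLookup turns the configured meta_as_tags list into a map so that
// membership checks do not require scanning the list for every metric.
func buildLookup(keys []string) map[string]bool {
	lookup := make(map[string]bool, len(keys))
	for _, k := range keys {
		lookup[k] = true
	}
	return lookup
}

// splitMeta separates meta entries that should be sent as tags from
// the remaining meta entries.
func splitMeta(meta map[string]string, asTags map[string]bool) (tags, rest map[string]string) {
	tags = make(map[string]string)
	rest = make(map[string]string)
	for k, v := range meta {
		if asTags[k] {
			tags[k] = v
		} else {
			rest[k] = v
		}
	}
	return tags, rest
}

func main() {
	lookup := buildLookup([]string{"unit"})
	meta := map[string]string{"unit": "GBytes", "source": "memstat"}
	tags, rest := splitMeta(meta, lookup)
	fmt.Println("tags:", tags, "meta:", rest)
}
```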
* Fix staticcheck warnings (#66)
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>
* Use channels, add a metric router, split up configuration and use extended version of Influx line protocol internally
* Use central timer for collectors and router. Add expressions to router
* Add expression to router config
* Update entry points
* Start with README
* Update README for CCMetric
* Formatting
* Update README.md
* Add README for MultiChanTicker
* Update README.md
* Add README to metric router
* Update main README
* Remove SinkEntity type
* Update README for sinks
* Update go files
* Update README for receivers
* Update collectors README
* Use separate page per collector
* Fix for tempstat page
* Add docs for customcmd collector
* Add docs for ipmistat collector
* Add docs for topprocs collector
* Update customCmdMetric.md
* Use seconds when calculating LIKWID metrics
* Add IB metrics ib_recv_pkts and ib_xmit_pkts
* Drop domain part of host name
* Updated to latest stable version of likwid
* Define source code dependencies in Makefile
* Add GPFS / IBM Spectrum Scale collector
* Add vet and staticcheck make targets
* Avoid go vet warning:
struct field tag `json:"..., omitempty"` not compatible with reflect.StructTag.Get: suspicious space in struct tag value
struct field tag `json:"...", omitempty` not compatible with reflect.StructTag.Get: key:"value" pairs not separated by spaces
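For context, both warnings come from options placed outside the expected `key:"value"` form. A small sketch of a well-formed tag follows; `sinkConfig` and its fields are hypothetical and only serve as an example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sinkConfig is a hypothetical config struct used only for illustration.
type sinkConfig struct {
	// Well-formed: the option sits inside the quotes, no space after the comma.
	Host string `json:"host,omitempty"`
	Port int    `json:"port"`
}

// encode marshals the config; an empty Host is omitted from the output.
func encode(c sinkConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

// jsonTag returns the json tag of the named field as reflect sees it.
func jsonTag(field string) string {
	f, ok := reflect.TypeOf(sinkConfig{}).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("json")
}

func main() {
	fmt.Println(encode(sinkConfig{Port: 8086}))
	fmt.Println(jsonTag("Host"))
}
```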
* Add sample collector to README.md
* Add CPU frequency collector
* Avoid staticcheck warning: redundant return statement
* Avoid staticcheck warning: unnecessary assignment to the blank identifier
* Simplified code
* Add CPUFreqCollectorCpuinfo
A metric collector that measures the current frequency of the CPUs
as reported by /proc/cpuinfo.
It only measures on the first hyperthread.
* Add collector for NFS clients
* Move publication of metrics into Flush() for NatsSink
* Update GitHub actions
* Refactoring
* Avoid vet warning: Println arg list ends with redundant newline
* Avoid vet warning struct field commands has json tag but is not exported
* Avoid vet warning: return copies lock value.
* Corrected typo
* Refactoring
* Add go sources in internal/...
* Bad separator in Makefile
* Fix Infiniband collector
Co-authored-by: Holger Obermaier <40787752+ho-ob@users.noreply.github.com>