mirror of https://github.com/ClusterCockpit/cc-metric-store.git, synced 2025-10-31 09:05:06 +01:00
			
		
		
		
	Update README and Remove TODO
		
							
								
								
									
README.md
							| @@ -6,16 +6,16 @@ The cc-metric-store provides a simple in-memory time series database for storing | |||||||
| metrics of cluster nodes at preconfigured intervals. It is meant to be used as | metrics of cluster nodes at preconfigured intervals. It is meant to be used as | ||||||
| part of the [ClusterCockpit suite](https://github.com/ClusterCockpit). As all | part of the [ClusterCockpit suite](https://github.com/ClusterCockpit). As all | ||||||
| data is kept in-memory (but written to disk as compressed JSON for long term | data is kept in-memory (but written to disk as compressed JSON for long term | ||||||
| storage), accessing it is very fast. It also provides aggregations over time | storage), accessing it is very fast. It also provides topology aware | ||||||
| _and_ nodes/sockets/cpus. | aggregations over time _and_ nodes/sockets/cpus. | ||||||
|  |  | ||||||
| There are major limitations: Data only gets written to disk at periodic | There are major limitations: Data only gets written to disk at periodic | ||||||
| checkpoints, not as soon as it is received. | checkpoints, not as soon as it is received. Also only the fixed configured | ||||||
|  | duration is stored and available. | ||||||
|  |  | ||||||
| Go look at the `TODO.md` file and the [GitHub | Go look at the [GitHub | ||||||
| Issues](https://github.com/ClusterCockpit/cc-metric-store/issues) for a progress | Issues](https://github.com/ClusterCockpit/cc-metric-store/issues) for a progress | ||||||
| overview. Things work, but are not properly tested. The | overview. The [NATS.io](https://nats.io/) based writing endpoint consumes messages in [this | ||||||
| [NATS.io](https://nats.io/) based writing endpoint consumes messages in [this |  | ||||||
| format of the InfluxDB line | format of the InfluxDB line | ||||||
| protocol](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol_alternative.md). | protocol](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol_alternative.md). | ||||||
|  |  | ||||||
| @@ -42,19 +42,14 @@ go test -bench=. -race -v ./... | |||||||
|  |  | ||||||
| ## What are these selectors mentioned in the code? | ## What are these selectors mentioned in the code? | ||||||
|  |  | ||||||
| Tags in InfluxDB are used to build indexes over the stored data. InfluxDB-Tags | The cc-metric-store works as a time-series database and uses the InfluxDB line | ||||||
| have no relation to each other, they do not depend on each other and have no | protocol as input format. Unlike InfluxDB, the data is indexed by one single | ||||||
| hierarchy. Different tags build up different indexes (I am no expert at all, but | strictly hierarchical tree structure. A selector is build out of the tags in the | ||||||
| this is how i think they work). | InfluxDB line protocol, and can be used to select a node (not in the sense of a | ||||||
|  | compute node, can also be a socket, cpu, ...) in that tree. The implementation | ||||||
| This project also works as a time-series database and uses the InfluxDB line | calls those nodes `level` to avoid confusion. It is impossible to access data | ||||||
| protocol. Unlike InfluxDB, the data is indexed by one single strictly | only by knowing the _socket_ or _cpu_ tag, all higher up levels have to be | ||||||
| hierarchical tree structure. A selector is build out of the tags in the InfluxDB | specified as well. | ||||||
| line protocol, and can be used to select a node (not in the sense of a compute |  | ||||||
| node, can also be a socket, cpu, ...) in that tree. The implementation calls |  | ||||||
| those nodes `level` to avoid confusion. It is impossible to access data only by |  | ||||||
| knowing the _socket_ or _cpu_ tag, all higher up levels have to be specified as |  | ||||||
| well. |  | ||||||
|  |  | ||||||
| This is what the hierarchy currently looks like: | This is what the hierarchy currently looks like: | ||||||
|  |  | ||||||
| @@ -68,6 +63,8 @@ This is what the hierarchy currently looks like: | |||||||
|     - cpu3 |     - cpu3 | ||||||
|     - cpu4 |     - cpu4 | ||||||
|     - ... |     - ... | ||||||
|  |     - gpu1 | ||||||
|  |     - gpu2 | ||||||
|   - host2 |   - host2 | ||||||
|   - ... |   - ... | ||||||
| - cluster2 | - cluster2 | ||||||
| @@ -116,7 +113,7 @@ this](https://pkg.go.dev/time#ParseDuration) (Allowed suffixes: `s`, `m`, `h`, | |||||||
| There are two ways for sending data to the cc-metric-store, both of which are | There are two ways for sending data to the cc-metric-store, both of which are | ||||||
| supported by the | supported by the | ||||||
| [cc-metric-collector](https://github.com/ClusterCockpit/cc-metric-collector). | [cc-metric-collector](https://github.com/ClusterCockpit/cc-metric-collector). | ||||||
| This example uses Nats, the alternative is to use HTTP. | This example uses NATS, the alternative is to use HTTP. | ||||||
|  |  | ||||||
| ```sh | ```sh | ||||||
| # Only needed once, downloads the docker image | # Only needed once, downloads the docker image | ||||||
|   | |||||||
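The selector paragraph in the README diff above describes a strictly hierarchical tree in which a node can only be reached by naming every level above it. A minimal sketch of that idea follows. It is not taken from the cc-metric-store source: only the name `level` comes from the README, while the fields and the `findOrCreate` and `find` helpers are hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// level is one node of the hierarchy (cluster, host, socket, cpu, ...).
// The type name follows the README; the fields are assumptions.
type level struct {
	children map[string]*level
	values   []float64
}

func newLevel() *level {
	return &level{children: make(map[string]*level)}
}

// findOrCreate walks the selector from the root and creates missing levels.
func (l *level) findOrCreate(selector []string) *level {
	cur := l
	for _, name := range selector {
		next, ok := cur.children[name]
		if !ok {
			next = newLevel()
			cur.children[name] = next
		}
		cur = next
	}
	return cur
}

// find resolves a selector and returns nil if any level on the path is missing.
func (l *level) find(selector []string) *level {
	cur := l
	for _, name := range selector {
		next, ok := cur.children[name]
		if !ok {
			return nil
		}
		cur = next
	}
	return cur
}

func main() {
	root := newLevel()
	sel := []string{"cluster1", "host1", "cpu1"}
	leaf := root.findOrCreate(sel)
	leaf.values = append(leaf.values, 42.0)

	// The full path from the root resolves.
	fmt.Println(strings.Join(sel, "/"), "found:", root.find(sel) != nil)
	// A selector that skips the higher levels cannot be resolved.
	fmt.Println("cpu1 alone found:", root.find([]string{"cpu1"}) != nil)
}
```

In this sketch the lookup for `cpu1` alone fails because the tree is only indexed from the root downwards, which matches the README's statement that all higher levels have to be specified.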
							
								
								
									
TODO.md
							| @@ -1,51 +0,0 @@ | |||||||
| # Possible Tasks and Improvements |  | ||||||
|  |  | ||||||
| Importance: |  | ||||||
|  |  | ||||||
| - **I** Important |  | ||||||
| - **N** Nice to have |  | ||||||
| - **W** Won't do. Probably not necessary. |  | ||||||
|  |  | ||||||
| - Benchmarking |  | ||||||
|   - Benchmark and compare common timeseries DBs with our data and our queries (N) |  | ||||||
| - Web interface |  | ||||||
|   - Provide simple http endpoint with a status and debug view (Start with Basic |  | ||||||
|     Authentication) |  | ||||||
| - Configuration |  | ||||||
|   - Consolidate configuration with cc-backend, remove redundant information |  | ||||||
|   - Support to receive configuration via NATS channel |  | ||||||
| - Memory management |  | ||||||
|   - To overcome garbage collection overhead: Reimplement in Rust (N) |  | ||||||
|   - Request memory directly batchwise via mmap (started in branch) (W) |  | ||||||
| - Archive |  | ||||||
|   - S3 backend for archive (I) |  | ||||||
|   - Store information in each buffer if already archived (N) |  | ||||||
|   - Do not create new checkpoint if all buffers already archived (N) |  | ||||||
| - Checkpoints |  | ||||||
|   - S3 backend for checkpoints (I) |  | ||||||
|   - Combine checkpoints into larger files (I) |  | ||||||
|   - Binary checkpoints (started in branch) (W) |  | ||||||
| - API |  | ||||||
|   - Redesign query interface (N) |  | ||||||
|   - Provide an endpoint for node health based on received metric data (I) |  | ||||||
|   - Introduce JWT authentication for REST and NATS (I) |  | ||||||
| - Testing |  | ||||||
|   - General tests (I) |  | ||||||
|   - Test data generator for regression tests (I) |  | ||||||
|   - Check for corner cases that should fail gracefully (N) |  | ||||||
|   - Write a more realistic `ToArchive`/`FromArchive` Tests (N) |  | ||||||
| - Aggregation |  | ||||||
|   - Calculate averages buffer-wise as soon as full, average weighted by length of buffer (N) |  | ||||||
|   - Only the head-buffer needs to be fully traversed (N) |  | ||||||
|   - If aggregating over hwthreads/cores/sockets cache those results and reuse |  | ||||||
|     some of that for new queries aggregating only over the newer data (W) |  | ||||||
| - Core functionality |  | ||||||
|   - Implement a health checker component that provides information to the web |  | ||||||
|     interface and REST API (I) |  | ||||||
|   - Support units for metrics including to request unit conversions (I) |  | ||||||
| - Compression |  | ||||||
|   - Enable compression for http API requests (N) |  | ||||||
|   - Enable compression for checkpoints/archive (I) |  | ||||||
| - Sampling |  | ||||||
|   - Support data re sampling to reduce data points (I) |  | ||||||
|   - Use re sampling algorithms that preserve min/max as far as possible (I) |  | ||||||