Mirror of https://github.com/ClusterCockpit/cc-examples.git (synced 2026-03-17 22:17:30 +01:00)
Update config for v1.5.0
@@ -9,25 +9,15 @@ You can find an overview about all clusters
 
 Some systems run with job exclusive nodes, others have node sharing enabled.
 There are CPU systems (Fritz, Meggie, Woody, TinyFat) as well as GPU accelerated
-clusters (Alex, TinyGPU).
+clusters (Alex, Helma, TinyGPU).
 
 NHR@FAU uses the following stack:
 
-* `cc-metric-collector` as node agent
-* `cc-metric-store` as temporal metric time series cache. We use one instance
-for all clusters.
+* `cc-metric-collector`
 * `cc-backend`
-* A homegrown python script running on the management nodes for providing job
-meta data from Slurm
-* Builtin sqlite database for job meta and user data (currently 50GB large)
-* Job Archive without retention using compressed data.json files (around 700GB)
+* `cc-slurm-adapter`
 
-Currently all API use regular HTTP protocol, but we plan to switch to NATS for
-all communication.
-We also push the metric data to an InfluxDB instance for debugging purposes.
-
-The backend and metric store run on the same dedicated Dell server running
-Ubuntu Linux:
+We use the following server with Ubuntu Linux:
 
 * Two Intel Xeon(R) Platinum 8352Y with 32 cores each
 * 512 GB Main memory capacity