mirror of
https://github.com/ClusterCockpit/cc-examples.git
synced 2026-03-17 22:17:30 +01:00
Rename folder and update config
# ClusterCockpit at NHR@FAU

NHR@FAU provides a production instance of ClusterCockpit for support personnel
and users. Authentication is via an LDAP directory as well as via our HPC Portal
(a homegrown account management platform) using JWT tokens.
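As a sketch of what the JWT side of this can look like from a client's perspective (the host and endpoint path below are placeholders, not the actual NHR@FAU URLs; check the cc-backend documentation for the real routes):

```python
# Minimal sketch: attaching a JWT as a Bearer token to a request against a
# cc-backend REST endpoint. URL and path are hypothetical placeholders.
import urllib.request


def build_request(url: str, jwt_token: str) -> urllib.request.Request:
    """Build a request carrying the JWT in the Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {jwt_token}"},
    )


req = build_request("https://monitoring.example.org/api/jobs/", "<JWT>")
print(req.get_header("Authorization"))  # Bearer <JWT>
```

The token itself is issued by the HPC Portal; cc-backend only needs to see it as a Bearer credential.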

You can find an overview of all clusters
[here](https://doc.nhr.fau.de/clusters/overview/).

Some systems run with job-exclusive nodes; others have node sharing enabled.
There are CPU systems (Fritz, Meggie, Woody, TinyFat) as well as GPU-accelerated
clusters (Alex, TinyGPU).

NHR@FAU uses the following stack:

* `cc-metric-collector` as node agent
* `cc-metric-store` as a temporal metric time-series cache. We use one instance
  for all clusters.
* `cc-backend`
* A homegrown Python script running on the management nodes that provides job
  metadata from Slurm
* The built-in SQLite database for job metadata and user data (currently 50 GB)
* A job archive without retention, using compressed data.json files (around 700 GB)

Currently all APIs use plain HTTP, but we plan to switch to NATS for
all communication.
We also push the metric data to an InfluxDB instance for debugging purposes.
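For illustration, each metric sample written to InfluxDB is a single line in InfluxDB line protocol; the measurement and tag names below are invented, not our actual schema:

```python
# Illustrative only: format one node metric in InfluxDB line protocol
# (measurement,tag=value,... field=value timestamp). Names are made up.
def to_line_protocol(measurement: str, tags: dict, value: float,
                     timestamp_ns: int) -> str:
    """Serialize a single sample; tags are sorted for a stable output."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_str} value={value} {timestamp_ns}"


line = to_line_protocol("cpu_load", {"hostname": "f0001", "cluster": "fritz"},
                        42.5, 1700000000000000000)
print(line)
# cpu_load,cluster=fritz,hostname=f0001 value=42.5 1700000000000000000
```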

The backend and metric store run on the same dedicated Dell server running
Ubuntu Linux:

* Two Intel Xeon Platinum 8352Y CPUs with 32 cores each
* 512 GB of main memory
* An NVMe RAID with two 7 TB disks

This configuration is probably complete overkill, but we wanted to be on the
safe side.