Added new Docker Compose setup

This commit is contained in:
Bole Ma
2023-12-08 15:50:11 +01:00
parent cf7ddde758
commit 237c0ecd43
12 changed files with 1522 additions and 174 deletions

158
README.md
View File

@@ -1,74 +1,116 @@
# cc-docker
# Slurm Docker Cluster
This is a `docker-compose` setup which provides a quickly started environment for ClusterCockpit development and testing, using `cc-backend`.
A number of services is readily available as docker container (nats, cc-metric-store, InfluxDB, LDAP), or easily added by manual configuration (MySQL).
This is a multi-container Slurm cluster using docker-compose. The compose file
creates named volumes for persistent storage of MySQL data files as well as
Slurm state and log directories.
It includes the following containers:
* nats (Default)
* cc-metric-store (Default)
* influxdb (Default)
* openldap (Default)
* mysql (Optional)
* mariadb (Optional)
* phpmyadmin (Optional)
## Containers and Volumes
The setup comes with fixture data for a Job archive, cc-metric-store checkpoints, InfluxDB, MySQL, and a LDAP user directory.
The compose file will run the following containers:
## Known Issues
* mysql
* slurmdbd
* slurmctld
* c1 (slurmd)
* c2 (slurmd)
* `docker-compose` installed on Ubuntu (18.04, 20.04) via `apt-get` can not correctly parse `docker-compose.yml` due to version differences. Install latest version of `docker-compose` from https://docs.docker.com/compose/install/ instead.
* You need to ensure that no other web server is running on ports 8080 (cc-backend), 8081 (phpmyadmin), 8084 (cc-metric-store), 8086 (nfluxDB), 4222 and 8222 (Nats), or 3306 (MySQL). If one or more ports are already in use, you habe to adapt the related config accordingly.
* Existing VPN connections sometimes cause problems with docker. If `docker-compose` does not start up correctly, try disabling any active VPN connection. Refer to https://stackoverflow.com/questions/45692255/how-make-openvpn-work-with-docker for further information.
The compose file will create the following named volumes:
## Configuration Templates
* etc_munge ( -> /etc/munge )
* etc_slurm ( -> /etc/slurm )
* slurm_jobdir ( -> /data )
* var_lib_mysql ( -> /var/lib/mysql )
* var_log_slurm ( -> /var/log/slurm )
Located in `./templates`
* `docker-compose.yml.default`: Docker-Compose file to setup cc-metric-store, InfluxDB, MariaDB, PhpMyadmin, and LDAP containers (Default). Used in `setupDev.sh`.
* `docker-compose.yml.mysql`: Docker-Compose configuration template if MySQL is desired instead of MariaDB.
* `env.default`: Environment variables for setup with cc-metric-store, InfluxDB, MariaDB, PhpMyadmin, and LDAP containers (Default). Used in `setupDev.sh`.
* `env.mysql`: Additional environment variables required if MySQL is desired instead of MariaDB.
## Building the Docker Image
## Setup
Build the image locally:
1. Clone `cc-backend` repository in chosen base folder: `$> git clone https://github.com/ClusterCockpit/cc-backend.git`
2. Run `$ ./setupDev.sh`: **NOTICE** The script will download files of a total size of 338MB (mostly for the InfluxDB data).
3. The setup-script launches the supporting container stack in the background automatically if everything went well. Run `$> ./cc-backend/cc-backend` to start `cc-backend.`
4. By default, you can access `cc-backend` in your browser at `http://localhost:8080`. You can shut down the cc-backend server by pressing `CTRL-C`, remember to also shut down all containers via `$> docker-compose down` afterwards.
5. You can restart the containers with: `$> docker-compose up -d`.
## Post-Setup Adjustment for using `influxdb`
When using `influxdb` as a metric database, one must adjust the following files:
* `cc-backend/var/job-archive/emmy/cluster.json`
* `cc-backend/var/job-archive/woody/cluster.json`
In the JSON, exchange the content of the `metricDataRepository`-Entry (By default configured for `cc-metric-store`) with:
```console
docker build -t slurm-docker-cluster:21.08.6 .
```
"metricDataRepository": {
"kind": "influxdb",
"url": "http://localhost:8086",
"token": "egLfcf7fx0FESqFYU3RpAAbj",
"bucket": "ClusterCockpit",
"org": "ClusterCockpit",
"skiptls": false
}
Build a different version of Slurm using Docker build args and the Slurm Git
tag:
```console
docker build --build-arg SLURM_TAG="slurm-19-05-2-1" -t slurm-docker-cluster:19.05.2 .
```
Or equivalently using `docker-compose`:
```console
SLURM_TAG=slurm-19-05-2-1 IMAGE_TAG=19.05.2 docker-compose build
```
## Usage
## Starting the Cluster
Credentials for the preconfigured demo user are:
* User: `demo`
* Password: `AdminDev`
Run `docker-compose` to instantiate the cluster:
You can also login as regular user using any credential in the LDAP user directory at `./data/ldap/users.ldif`.
```console
IMAGE_TAG=19.05.2 docker-compose up -d
```
TODO: Update job archive and all other metric data.
The job archive with 1867 jobs originates from the second half of 2020.
Roughly 2700 jobs from the first week of 2021 are loaded with data from InfluxDB.
Some views of ClusterCockpit (e.g. the Users view) show the last week or month.
To show some data there you have to set the filter to time periods with jobs (August 2020 to January 2021).
## Register the Cluster with SlurmDBD
To register the cluster to the slurmdbd daemon, run the `register_cluster.sh`
script:
```console
./register_cluster.sh
```
> Note: You may have to wait a few seconds for the cluster daemons to become
> ready before registering the cluster. Otherwise, you may get an error such
> as **sacctmgr: error: Problem talking to the database: Connection refused**.
>
> You can check the status of the cluster by viewing the logs: `docker-compose
> logs -f`
## Accessing the Cluster
Use `docker exec` to run a bash shell on the controller container:
```console
docker exec -it slurmctld bash
```
From the shell, execute slurm commands, for example:
```console
[root@slurmctld /]# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 5-00:00:00 2 idle c[1-2]
```
## Submitting Jobs
The `slurm_jobdir` named volume is mounted on each Slurm container as `/data`.
Therefore, in order to see job output files while on the controller, change to
the `/data` directory when on the **slurmctld** container and then submit a job:
```console
[root@slurmctld /]# cd /data/
[root@slurmctld data]# sbatch --wrap="uptime"
Submitted batch job 2
[root@slurmctld data]# ls
slurm-2.out
```
## Stopping and Restarting the Cluster
```console
docker-compose stop
docker-compose start
```
## Deleting the Cluster
To remove all containers and volumes, run:
```console
docker-compose stop
docker-compose rm -f
docker volume rm slurm-docker-cluster_etc_munge slurm-docker-cluster_etc_slurm slurm-docker-cluster_slurm_jobdir slurm-docker-cluster_var_lib_mysql slurm-docker-cluster_var_log_slurm
```