mirror of
https://github.com/ClusterCockpit/cc-docker.git
synced 2025-08-14 23:22:59 +02:00
Added new Docker Compose setup
This commit is contained in:
158
README.md
158
README.md
@@ -1,74 +1,116 @@
|
||||
# cc-docker
|
||||
# Slurm Docker Cluster
|
||||
|
||||
This is a `docker-compose` setup which provides a quickly started environment for ClusterCockpit development and testing, using `cc-backend`.
|
||||
A number of services is readily available as docker container (nats, cc-metric-store, InfluxDB, LDAP), or easily added by manual configuration (MySQL).
|
||||
This is a multi-container Slurm cluster using docker-compose. The compose file
|
||||
creates named volumes for persistent storage of MySQL data files as well as
|
||||
Slurm state and log directories.
|
||||
|
||||
It includes the following containers:
|
||||
* nats (Default)
|
||||
* cc-metric-store (Default)
|
||||
* influxdb (Default)
|
||||
* openldap (Default)
|
||||
* mysql (Optional)
|
||||
* mariadb (Optional)
|
||||
* phpmyadmin (Optional)
|
||||
## Containers and Volumes
|
||||
|
||||
The setup comes with fixture data for a Job archive, cc-metric-store checkpoints, InfluxDB, MySQL, and a LDAP user directory.
|
||||
The compose file will run the following containers:
|
||||
|
||||
## Known Issues
|
||||
* mysql
|
||||
* slurmdbd
|
||||
* slurmctld
|
||||
* c1 (slurmd)
|
||||
* c2 (slurmd)
|
||||
|
||||
* `docker-compose` installed on Ubuntu (18.04, 20.04) via `apt-get` can not correctly parse `docker-compose.yml` due to version differences. Install latest version of `docker-compose` from https://docs.docker.com/compose/install/ instead.
|
||||
* You need to ensure that no other web server is running on ports 8080 (cc-backend), 8081 (phpmyadmin), 8084 (cc-metric-store), 8086 (nfluxDB), 4222 and 8222 (Nats), or 3306 (MySQL). If one or more ports are already in use, you habe to adapt the related config accordingly.
|
||||
* Existing VPN connections sometimes cause problems with docker. If `docker-compose` does not start up correctly, try disabling any active VPN connection. Refer to https://stackoverflow.com/questions/45692255/how-make-openvpn-work-with-docker for further information.
|
||||
The compose file will create the following named volumes:
|
||||
|
||||
## Configuration Templates
|
||||
* etc_munge ( -> /etc/munge )
|
||||
* etc_slurm ( -> /etc/slurm )
|
||||
* slurm_jobdir ( -> /data )
|
||||
* var_lib_mysql ( -> /var/lib/mysql )
|
||||
* var_log_slurm ( -> /var/log/slurm )
|
||||
|
||||
Located in `./templates`
|
||||
* `docker-compose.yml.default`: Docker-Compose file to setup cc-metric-store, InfluxDB, MariaDB, PhpMyadmin, and LDAP containers (Default). Used in `setupDev.sh`.
|
||||
* `docker-compose.yml.mysql`: Docker-Compose configuration template if MySQL is desired instead of MariaDB.
|
||||
* `env.default`: Environment variables for setup with cc-metric-store, InfluxDB, MariaDB, PhpMyadmin, and LDAP containers (Default). Used in `setupDev.sh`.
|
||||
* `env.mysql`: Additional environment variables required if MySQL is desired instead of MariaDB.
|
||||
## Building the Docker Image
|
||||
|
||||
## Setup
|
||||
Build the image locally:
|
||||
|
||||
1. Clone `cc-backend` repository in chosen base folder: `$> git clone https://github.com/ClusterCockpit/cc-backend.git`
|
||||
|
||||
2. Run `$ ./setupDev.sh`: **NOTICE** The script will download files of a total size of 338MB (mostly for the InfluxDB data).
|
||||
|
||||
3. The setup-script launches the supporting container stack in the background automatically if everything went well. Run `$> ./cc-backend/cc-backend` to start `cc-backend.`
|
||||
|
||||
4. By default, you can access `cc-backend` in your browser at `http://localhost:8080`. You can shut down the cc-backend server by pressing `CTRL-C`, remember to also shut down all containers via `$> docker-compose down` afterwards.
|
||||
|
||||
5. You can restart the containers with: `$> docker-compose up -d`.
|
||||
|
||||
## Post-Setup Adjustment for using `influxdb`
|
||||
|
||||
When using `influxdb` as a metric database, one must adjust the following files:
|
||||
* `cc-backend/var/job-archive/emmy/cluster.json`
|
||||
* `cc-backend/var/job-archive/woody/cluster.json`
|
||||
|
||||
In the JSON, exchange the content of the `metricDataRepository`-Entry (By default configured for `cc-metric-store`) with:
|
||||
```console
|
||||
docker build -t slurm-docker-cluster:21.08.6 .
|
||||
```
|
||||
"metricDataRepository": {
|
||||
"kind": "influxdb",
|
||||
"url": "http://localhost:8086",
|
||||
"token": "egLfcf7fx0FESqFYU3RpAAbj",
|
||||
"bucket": "ClusterCockpit",
|
||||
"org": "ClusterCockpit",
|
||||
"skiptls": false
|
||||
}
|
||||
|
||||
Build a different version of Slurm using Docker build args and the Slurm Git
|
||||
tag:
|
||||
|
||||
```console
|
||||
docker build --build-arg SLURM_TAG="slurm-19-05-2-1" -t slurm-docker-cluster:19.05.2 .
|
||||
```
|
||||
|
||||
Or equivalently using `docker-compose`:
|
||||
|
||||
```console
|
||||
SLURM_TAG=slurm-19-05-2-1 IMAGE_TAG=19.05.2 docker-compose build
|
||||
```
|
||||
|
||||
|
||||
## Usage
|
||||
## Starting the Cluster
|
||||
|
||||
Credentials for the preconfigured demo user are:
|
||||
* User: `demo`
|
||||
* Password: `AdminDev`
|
||||
Run `docker-compose` to instantiate the cluster:
|
||||
|
||||
You can also login as regular user using any credential in the LDAP user directory at `./data/ldap/users.ldif`.
|
||||
```console
|
||||
IMAGE_TAG=19.05.2 docker-compose up -d
|
||||
```
|
||||
|
||||
TODO: Update job archive and all other metric data.
|
||||
The job archive with 1867 jobs originates from the second half of 2020.
|
||||
Roughly 2700 jobs from the first week of 2021 are loaded with data from InfluxDB.
|
||||
Some views of ClusterCockpit (e.g. the Users view) show the last week or month.
|
||||
To show some data there you have to set the filter to time periods with jobs (August 2020 to January 2021).
|
||||
## Register the Cluster with SlurmDBD
|
||||
|
||||
To register the cluster to the slurmdbd daemon, run the `register_cluster.sh`
|
||||
script:
|
||||
|
||||
```console
|
||||
./register_cluster.sh
|
||||
```
|
||||
|
||||
> Note: You may have to wait a few seconds for the cluster daemons to become
|
||||
> ready before registering the cluster. Otherwise, you may get an error such
|
||||
> as **sacctmgr: error: Problem talking to the database: Connection refused**.
|
||||
>
|
||||
> You can check the status of the cluster by viewing the logs: `docker-compose
|
||||
> logs -f`
|
||||
|
||||
## Accessing the Cluster
|
||||
|
||||
Use `docker exec` to run a bash shell on the controller container:
|
||||
|
||||
```console
|
||||
docker exec -it slurmctld bash
|
||||
```
|
||||
|
||||
From the shell, execute slurm commands, for example:
|
||||
|
||||
```console
|
||||
[root@slurmctld /]# sinfo
|
||||
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
|
||||
normal* up 5-00:00:00 2 idle c[1-2]
|
||||
```
|
||||
|
||||
## Submitting Jobs
|
||||
|
||||
The `slurm_jobdir` named volume is mounted on each Slurm container as `/data`.
|
||||
Therefore, in order to see job output files while on the controller, change to
|
||||
the `/data` directory when on the **slurmctld** container and then submit a job:
|
||||
|
||||
```console
|
||||
[root@slurmctld /]# cd /data/
|
||||
[root@slurmctld data]# sbatch --wrap="uptime"
|
||||
Submitted batch job 2
|
||||
[root@slurmctld data]# ls
|
||||
slurm-2.out
|
||||
```
|
||||
|
||||
## Stopping and Restarting the Cluster
|
||||
|
||||
```console
|
||||
docker-compose stop
|
||||
docker-compose start
|
||||
```
|
||||
|
||||
## Deleting the Cluster
|
||||
|
||||
To remove all containers and volumes, run:
|
||||
|
||||
```console
|
||||
docker-compose stop
|
||||
docker-compose rm -f
|
||||
docker volume rm slurm-docker-cluster_etc_munge slurm-docker-cluster_etc_slurm slurm-docker-cluster_slurm_jobdir slurm-docker-cluster_var_lib_mysql slurm-docker-cluster_var_log_slurm
|
||||
```
|
||||
|
Reference in New Issue
Block a user