# Slurm Docker Cluster
This is a multi-container Slurm cluster using `docker-compose`. The compose file
creates named volumes for persistent storage of MySQL data files as well as
Slurm state and log directories.

## Containers and Volumes

The compose file will run the following containers:

* mysql
* slurmdbd
* slurmctld
* c1 (slurmd)
* c2 (slurmd)

The compose file will create the following named volumes:

* etc_munge ( -> /etc/munge )
* etc_slurm ( -> /etc/slurm )
* slurm_jobdir ( -> /data )
* var_lib_mysql ( -> /var/lib/mysql )
* var_log_slurm ( -> /var/log/slurm )
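
After starting the cluster (see below), these named volumes can be listed and inspected with the Docker CLI. A quick check, assuming the default compose project name `slurm-docker-cluster` (derived from the checkout directory):

```console
docker volume ls --filter name=slurm-docker-cluster
docker volume inspect slurm-docker-cluster_slurm_jobdir
```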
## Building the Docker Image
Build the image locally:

```console
docker build -t slurm-docker-cluster:21.08.6 .
```

Build a different version of Slurm using Docker build args and the Slurm Git
tag:

```console
docker build --build-arg SLURM_TAG="slurm-19-05-2-1" -t slurm-docker-cluster:19.05.2 .
```

Or equivalently using `docker-compose`:

```console
SLURM_TAG=slurm-19-05-2-1 IMAGE_TAG=19.05.2 docker-compose build
```
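
Whichever route you take, the image should then be available locally under the tag you chose; you can verify with, e.g.:

```console
docker images slurm-docker-cluster
```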
## Starting the Cluster
Run `docker-compose` to instantiate the cluster:

```console
IMAGE_TAG=19.05.2 docker-compose up -d
```
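
All five containers (mysql, slurmdbd, slurmctld, c1, c2) should come up; you can check their state with:

```console
docker-compose ps
```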
## Register the Cluster with SlurmDBD
To register the cluster with the slurmdbd daemon, run the `register_cluster.sh`
script:

```console
./register_cluster.sh
```
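
Registration itself boils down to an `sacctmgr` call on the controller. A rough manual equivalent, assuming the cluster name `linux` (use whatever `ClusterName` is set in your `slurm.conf`):

```console
# cluster name "linux" is an assumption; match ClusterName in slurm.conf
docker exec slurmctld bash -c "sacctmgr --immediate add cluster name=linux"
```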
> Note: You may have to wait a few seconds for the cluster daemons to become
> ready before registering the cluster. Otherwise, you may get an error such
> as **sacctmgr: error: Problem talking to the database: Connection refused**.
>
> You can check the status of the cluster by viewing the logs:
> `docker-compose logs -f`
## Accessing the Cluster
Use `docker exec` to run a bash shell on the controller container:

```console
docker exec -it slurmctld bash
```

From the shell, execute Slurm commands, for example:

```console
[root@slurmctld /]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 5-00:00:00      2   idle c[1-2]
```
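
As a quick sanity check that both compute containers accept work (illustrative output; order may vary):

```console
[root@slurmctld /]# srun -N 2 hostname
c1
c2
```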
## Submitting Jobs
The `slurm_jobdir` named volume is mounted on each Slurm container as `/data`.
Therefore, in order to see job output files while on the controller, change to
the `/data` directory when on the **slurmctld** container and then submit a job:

```console
[root@slurmctld /]# cd /data/
[root@slurmctld data]# sbatch --wrap="uptime"
Submitted batch job 2
[root@slurmctld data]# ls
slurm-2.out
```
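
The same works with a regular batch script. A minimal sketch (the file name `job.sh` and its contents are purely illustrative):

```console
[root@slurmctld data]# cat <<'EOF' > job.sh
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out
srun uptime
EOF
[root@slurmctld data]# sbatch job.sh
```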
## Stopping and Restarting the Cluster
```console
docker-compose stop
docker-compose start
```
## Deleting the Cluster
To remove all containers and volumes, run:

```console
docker-compose stop
docker-compose rm -f
docker volume rm slurm-docker-cluster_etc_munge slurm-docker-cluster_etc_slurm slurm-docker-cluster_slurm_jobdir slurm-docker-cluster_var_lib_mysql slurm-docker-cluster_var_log_slurm
```
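
Alternatively, `docker-compose down` with the `--volumes` flag removes the containers, the default network, and the named volumes in one step:

```console
docker-compose down --volumes
```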