SLURM GPU
SLURM partitions and nodes#
The ABiMS HPC cluster provides GPU (Graphics Processing Unit) nodes.
GPUs have the advantage of offering a large number of computational units compared to CPUs and are particularly suited to highly parallel computations such as Deep Learning, data mining, image processing and pattern recognition. For example, the use of tools that can take advantage of GPU processors has recently enabled the democratisation of Nanopore technology for sequencing or the rise of epigenetics.
(Last update: 04/05/2022)
GPU nodes#
Nbr | GPU | CPU | RAM (GB) | Type | Disk /tmp |
---|---|---|---|---|---|
1 | 2 | 40 | 128 | NVIDIA k80 | 32 GB |
GPU Instance Profiles#
⚠️ The values below can change. To check the current situation:
sinfo -Ne -p gpu --format "%.15N %.4c %.7m %G"
Profile Name | GPU Memory | Number of Instances Available |
---|---|---|
k80 | 24GB | 2 |
Usage#
Pre-requisites#
To access the GPU nodes, you need to be granted access to the gpu partition.
Request this access from the support team: support.abims@sb-roscoff.fr
Parameters to control the job#
#SBATCH --partition=gpu
#SBATCH --gres=gpu:k80:1
--partition=gpu
: the partition that allows access to the GPU nodes
--gres=gpu:k80:1
: the GPU resources to reserve, where k80 is a card profile (see above) and :1 is the number of cards in the reservation (see above)
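Put together, the two directives above could sit in a complete job script such as the minimal sketch below. The job name, the output file pattern and the fallback label are assumptions chosen for illustration; only --partition and --gres come from this page.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-check          # hypothetical job name
#SBATCH --partition=gpu               # partition giving access to the GPU nodes
#SBATCH --gres=gpu:k80:1              # one card with the k80 profile
#SBATCH --output=gpu-check-%j.out     # %j is replaced by the job ID

# SLURM_JOB_ID is set by SLURM inside a job; fall back to a label elsewhere.
job_id="${SLURM_JOB_ID:-not-in-a-job}"
echo "Job ${job_id} running on $(uname -n)"

# nvidia-smi is only expected on the GPU node, so guard the call.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || echo "nvidia-smi could not query a GPU" >&2
else
    echo "nvidia-smi not found on this host" >&2
fi
```

Saved under a name of your choice (for example gpu-check.sh, a hypothetical file name), the script would be submitted with sbatch gpu-check.sh.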
$CUDA_VISIBLE_DEVICES#
Note that you can use the $CUDA_VISIBLE_DEVICES variable on the command line to pass the device number(s) to your software (if it requires them).
# Here are the values of CUDA_VISIBLE_DEVICES with "interactive" srun jobs.
$ srun --pty -p gpu --gres=gpu:k80:1 env | grep CUDA
CUDA_VISIBLE_DEVICES=0
$ srun --pty -p gpu --gres=gpu:k80:2 env | grep CUDA
CUDA_VISIBLE_DEVICES=0,1
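Since the variable holds a comma-separated list of device indices, a job script can derive the number of allocated cards from it. The sketch below is one possible way to do so; count_gpus is a hypothetical helper name, not something provided by SLURM or CUDA.

```shell
# count_gpus: print how many device indices a comma-separated list contains.
# (Hypothetical helper, written for illustration only.)
count_gpus() {
    list="$1"
    if [ -z "$list" ]; then
        # Empty or unset variable: no GPU was allocated.
        echo 0
        return
    fi
    # Keep only the commas; the number of devices is the comma count plus one.
    commas=$(printf '%s' "$list" | tr -cd ',')
    echo $(( ${#commas} + 1 ))
}

# Inside a job, report the number of cards in the reservation.
echo "Allocated cards: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

The result could then be used, for instance, to decide whether a tool needs a single-device or a multi-device argument.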
Examples#
Hello world#
The NVIDIA System Management Interface (nvidia-smi) is a command-line utility intended to aid in the management and monitoring of NVIDIA GPU devices.
$ srun -p gpu --gres=gpu:k80:2 nvidia-smi
srun: job 35429913 queued and waiting for resources
srun: job 35429913 has been allocated resources
Wed Feb 1 16:36:54 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:84:00.0 Off | 0 |
| N/A 40C P0 55W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:85:00.0 Off | 0 |
| N/A 31C P0 70W / 149W | 0MiB / 11441MiB | 99% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Guppy basecaller#
(This example needs ❤️)
Here is a minimal sbatch script for guppy_basecaller:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:k80:1
#SBATCH --cpus-per-task=XX
#SBATCH --mem=XXGB
# Load the GPU build of Guppy ("module avail" would only list it)
module load guppy/6.1.1-gpu
guppy_basecaller [...] --device "cuda:$CUDA_VISIBLE_DEVICES"
💡 Tips#
For one card, --device "cuda:$CUDA_VISIBLE_DEVICES" is fine, since it expands to --device "cuda:0".
But for 2 cards, guppy_basecaller expects something like --device "cuda:0 cuda:1". You could try something like this:
[...]
# Turn "0,1" into "cuda:0 cuda:1"
DEVICES=$(echo "$CUDA_VISIBLE_DEVICES" | awk -F "," '{for(i=1; i<=NF; i++) printf "%scuda:%s", (i>1 ? " " : ""), $i}')
[...]
guppy_basecaller [...] --device "$DEVICES"
Optimization#
Note that you can optimize the job by setting the following options (see the Guppy manual for more information):
--gpu_runners_per_device
: Number of runners per GPU device.
--cpu_threads_per_caller
: Number of CPU worker threads per basecaller.
--num_callers
: Number of parallel basecallers to create.
--num_alignment_threads
: Number of worker threads to use for alignment.
Alphafold2#
Please have a look at the Software environment > Alphafold2 page.