SLURM user guide
💡 Please have a look at our tutorial for beginners: tutorials/snakemake
SLURM is the queue manager used on the ABiMS HPC Cluster. You must use SLURM to submit jobs to the cluster.
All the commands presented in this guide are to be used from the host ssh.sb-roscoff.fr
SLURM partitions and nodes
The ABiMS HPC Cluster is organized into several SLURM partitions. Each partition gathers a set of compute nodes that have similar usage.
Partitions | Time out | Max resources / user | Purpose |
---|---|---|---|
fast | <= 24 hours | cpu=300, mem=1500GB | Default - Regular jobs |
long | <= 30 days | cpu=300, mem=1500GB | Long jobs |
bigmem | <= 60 days | mem=2500GB | On demand - For jobs requiring a lot of RAM |
clc | <= 30 days | cpu=300, mem=1500GB | On demand - Access CLC Assembly Cell |
gpu | <= 10 days | cpu=300, mem=1500GB | On demand - Access GPU cards |
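As a rough illustration of how the time limits in the table drive the choice of partition, the sketch below maps an expected run time in whole hours to fast or long. The function name pick_partition and the script name my_job.sh are hypothetical and not part of SLURM; the on-demand partitions (bigmem, clc, gpu) are left out on purpose. The final line only prints a possible submission command, it does not submit anything.

```shell
#!/bin/bash
# Hypothetical helper (not a SLURM command): choose a partition from the
# expected run time in whole hours, using the limits listed in the table.
pick_partition() {
    local hours=$1
    if [ "$hours" -le 24 ]; then
        echo fast                         # fast: <= 24 hours
    elif [ "$hours" -le $((30 * 24)) ]; then
        echo long                         # long: <= 30 days
    else
        echo "no regular partition allows ${hours}h" >&2
        return 1
    fi
}

# Example: a job expected to run for about 36 hours.
partition=$(pick_partition 36)
echo "sbatch --partition=$partition --time=36:00:00 my_job.sh"
```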
The default values
Param | Default value |
---|---|
--mem | 2GB |
--cpus | 1 |
Submitting a job to the cluster
There are two commands to submit a job to the cluster:
srun: to run jobs interactively
sbatch: to submit a batch job
Submit a job using srun
To learn more about the srun command, see the official documentation.
Usage: stand alone
The job will start immediately after you execute the srun command. The outputs are returned to the terminal. You have to wait until the job has terminated before starting a new job. This works with ANY command.
srun hostname
💡 This example shows that the job runs on a compute node and not on the login node.
Usage: interactive
If an interaction is needed:
module load r
srun --mem 20GB --pty R
--pty: keeps the session interactive
--mem 20GB: allocates 20GB of memory to your job instead of the 2GB default
Submit a job using sbatch
To learn more about the sbatch command, see the official documentation.
Usage
The job starts when resources are available. The command only returns the job id. The outputs are sent to file(s). This works ONLY with shell scripts. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input.
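The standard-input form mentioned above can be sketched as follows. The script text is held in a variable and piped to sbatch without a file name. The guard around sbatch is an addition for this example, so that it can also be tried on a machine where SLURM is not installed; in that case the script is only printed.

```shell
#!/bin/bash
# Build a small batch script as text. The job itself only runs `hostname`.
job_script=$(cat <<'EOF'
#!/bin/bash
#SBATCH --mem 2GB
srun hostname
EOF
)

# Submit through standard input when sbatch is available; otherwise just
# print the script so the example can be tried outside the cluster.
if command -v sbatch >/dev/null 2>&1; then
    printf '%s\n' "$job_script" | sbatch
else
    printf '%s\n' "$job_script"
fi
```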
Batch scripts rules
The script can contain srun commands. Each srun is a job step. The script must start with a shebang (#!) followed by the path of the interpreter:
#!/bin/bash
or
#!/usr/bin/env python
The execution parameters can be set within the script itself, here named bowtie2.sbatch:
#!/bin/bash
#
#SBATCH --mem 40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
Then submit the script:
sbatch bowtie2.sbatch
SLURM options can be placed in the script just after the shebang and before the first command, on lines starting with #SBATCH.
Note that the #SBATCH syntax matters: unlike the shebang, it does not contain a !.
Advice: We recommend setting as many parameters as you can in the script, to keep track of your execution parameters for future submissions.
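The layout rules above can be checked before submitting. The sketch below is a hypothetical helper, not a SLURM command: it verifies that a file starts with a shebang and that no #SBATCH line appears after the first command, since sbatch stops reading #SBATCH directives once it reaches the first command. It does not validate the options themselves.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): returns 0 when the file starts
# with a shebang and every #SBATCH line comes before the first command.
check_batch_script() {
    local file=$1 line first=1 seen_command=0
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$first" -eq 1 ]; then
            first=0
            if [[ $line == '#!'* ]]; then
                continue
            fi
            echo "missing shebang on line 1" >&2
            return 1
        fi
        if [[ $line == '#SBATCH'* ]]; then
            if [ "$seen_command" -eq 1 ]; then
                echo "#SBATCH found after a command: $line" >&2
                return 1
            fi
        elif [[ $line == '#'* || $line =~ ^[[:space:]]*$ ]]; then
            :                       # plain comment or blank line
        else
            seen_command=1          # first real command reached
        fi
    done < "$file"
    [ "$first" -eq 0 ]              # an empty file has no shebang
}

# Usage, with the bowtie2.sbatch script from this guide:
# check_batch_script bowtie2.sbatch && sbatch bowtie2.sbatch
```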
Execution parameters
These parameters are common to the commands srun and sbatch.
Parameters for log
#!/bin/bash
#
#SBATCH -o slurm.%N.%j.out # STDOUT file with the Node name and the Job ID
#SBATCH -e slurm.%N.%j.err # STDERR file with the Node name and the Job ID
Parameters to control the job
--partition=<partition_names>, -p
Request a specific partition for the resource allocation. Each partition (queue in SGE) has its own limits: time, memory, nodes ...
--mem=<size[units]>
Specify the real memory required per node. The default unit is MB (default value: 2GB).
The job is killed if it exceeds the limit.
Note that you can use the variable $SLURM_MEM_PER_NODE in the command line to keep the software settings in sync with the allocated resources.
--time=<time>, -t
Set a limit on the total run time of the job allocation.
Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
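To make the accepted --time formats concrete, the sketch below converts each of them into seconds. The function slurm_time_to_seconds is a hypothetical helper written for this guide, not a SLURM command, and it does no validation beyond counting the fields.

```shell
#!/bin/bash
# Hypothetical helper (not a SLURM command): convert a --time value
# written in one of the formats listed above into seconds.
slurm_time_to_seconds() {
    local spec=$1 days=0 hours=0 minutes=0 seconds=0 rest
    local -a parts
    if [[ $spec == *-* ]]; then
        # days-hours[:minutes[:seconds]]
        days=${spec%%-*}
        rest=${spec#*-}
        IFS=: read -r -a parts <<< "$rest"
        hours=${parts[0]:-0}
        minutes=${parts[1]:-0}
        seconds=${parts[2]:-0}
    else
        IFS=: read -r -a parts <<< "$spec"
        case ${#parts[@]} in
            1) minutes=${parts[0]} ;;                       # minutes
            2) minutes=${parts[0]}; seconds=${parts[1]} ;;  # minutes:seconds
            3) hours=${parts[0]}; minutes=${parts[1]}; seconds=${parts[2]} ;;
            *) echo "unsupported time format: $spec" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}

slurm_time_to_seconds 90          # 90 minutes: prints 5400
slurm_time_to_seconds 1-12        # 1 day and 12 hours: prints 129600
slurm_time_to_seconds 08:30:00    # 8 hours 30 minutes: prints 30600
```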
Parameters for multithreading
--cpus-per-task=<ncpus>, --cpus, -c
Request a number of CPUs (default: 1).
Note that you can use the variable $SLURM_CPUS_PER_TASK in the command line to avoid a mismatch between the allocated resources and the job settings.
#!/bin/bash
#
#SBATCH --cpus-per-task=8
srun bowtie2 --threads $SLURM_CPUS_PER_TASK -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
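The same idea extends to memory. The sketch below reads both $SLURM_CPUS_PER_TASK and $SLURM_MEM_PER_NODE, with fallbacks to the cluster defaults so the script can also be tried outside a job. The helper names job_threads and job_tool_mem_mb, the 10% memory headroom and the file name big_file.txt are assumptions made for this example; GNU sort is used only as a familiar tool that accepts a thread count and a buffer size.

```shell
#!/bin/bash
#
#SBATCH --cpus-per-task=8
#SBATCH --mem 40GB

# Hypothetical helpers (not SLURM commands). Outside a job the variables
# are unset, so fall back to the cluster defaults: 1 CPU and 2GB (2048 MB).
job_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# SLURM_MEM_PER_NODE is expressed in MB. Keeping about 10% headroom is an
# assumption made for this sketch; tune it for your tool.
job_tool_mem_mb() {
    local mem_mb=${SLURM_MEM_PER_NODE:-2048}
    echo $(( mem_mb * 90 / 100 ))
}

echo "threads=$(job_threads) tool_memory=$(job_tool_mem_mb)M"

# Example with GNU sort (big_file.txt is a placeholder):
# srun sort --parallel="$(job_threads)" -S "$(job_tool_mem_mb)M" big_file.txt
```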
Job information on running jobs
List a user's current jobs:
squeue -u <username>
List a user's running jobs:
squeue -u <username> -t RUNNING
List a user's pending jobs:
squeue -u <username> -t PENDING
View accounting information for all of a user's jobs for the current day:
sacct --format=JobID,JobName,User,Submit,ReqCPUS,ReqMem,Start,NodeList,State,CPUTime,MaxVMSize%15 -u <username>
View accounting information for all of a user's jobs for the last 2 days (worth creating an alias for):
sacct -a -S $(date --date='2 days ago' +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u <username>
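Since this command is long, it can be wrapped in a small shell function rather than retyped. The sketch below is one possible wrapper: the names days_ago and sacct_recent are hypothetical, the -a option is left out because the query targets a single user, and the date call requires GNU date. The format fields are the ones used in this guide.

```shell
#!/bin/bash
# Hypothetical helper: timestamp N days ago, in the format passed to the
# sacct -S option. Requires GNU date.
days_ago() {
    date --date="$1 days ago" +%Y-%m-%dT%H:%M
}

# Hypothetical wrapper: accounting information for the last N days
# (default 2) for a given user (default: the current user).
sacct_recent() {
    local days=${1:-2} user=${2:-$USER}
    sacct -S "$(days_ago "$days")" \
        --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize \
        -u "$user"
}

# Usage on the cluster:
# sacct_recent 2
```

Placing both functions in ~/.bashrc makes sacct_recent available in every new session.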
List detailed job information:
scontrol show -dd jobid=<jobid>
Manage jobs
To cancel/stop a job:
scancel <jobid>
To cancel all jobs for a user:
scancel -u <username>
To cancel all pending jobs for a user:
scancel -t PENDING -u <username>
Job information on finished jobs
Jobs that exceed their resource reservation can be killed automatically by the cluster. Conversely, reserving far more resources than a job needs is wasteful.
You can obtain post-mortem information for your jobs.
sacct
SLURM incorporates a mechanism to track the resource consumption of each job.
The sacct command is used to query the SLURM database to track resource consumption:
To consult the basic information of a job:
sacct -j $job_id
Display detailed information about a job:
sacct --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,MaxVMSize%15,State,Start,End,CPUTime,NodeList -j $job_id
💡 Advice: you can create an alias in your ~/.bashrc
alias sacctReq='sacct --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,MaxVMSize%15,State,Start,End,CPUTime,NodeList'
seff
seff takes a job ID and reports on the efficiency of that job's CPU and memory utilization:
seff $job_id