SLURM user guide

💡 Please have a look at our tutorial for beginners: tutorials/analysis_slurm

SLURM is the queue manager used on the ABiMS HPC Cluster. You must use SLURM to submit jobs to the cluster.

All the commands presented in this guide are to be used from the host ssh.sb-roscoff.fr

Prerequisites#

  • An account on the ABiMS platform -> https://my.sb-roscoff.fr
  • An active project -> request one at https://my.sb-roscoff.fr or join one of your collaborators' projects
  • ABiMS provides a demo SLURM account that allows a few hours of computation for testing purposes. To really use the cluster, you need to request a proper project (which includes a SLURM account and a project working space).

SLURM partitions and nodes#

The ABiMS HPC Cluster is organized into several SLURM partitions. Each partition gathers a set of compute nodes that have similar usage.

| Partition | Time limit | Max resources / user | Purpose |
|-----------|------------|----------------------|---------|
| fast | <= 24 hours | cpu=300, mem=1500GB | Default - Regular jobs |
| long | <= 30 days | cpu=300, mem=1500GB | Long jobs |
| bigmem | <= 60 days | mem=2500GB | On demand - For jobs requiring a lot of RAM |
| clc | <= 30 days | cpu=300, mem=1500GB | On demand - Access CLC Assembly Cell |
| gpu | <= 10 days | cpu=300, mem=1500GB | On demand - Access GPU cards |
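
As a quick illustration of the time limits in the table, the following sketch maps a requested run time in hours to the fast or long partition. The function name pick_partition is hypothetical and not part of SLURM. The on-demand partitions (bigmem, clc, gpu) are deliberately left out, because they are chosen for the kind of resource they offer rather than for the job duration.

```shell
# Hypothetical helper: pick a partition from a requested run time in hours,
# using the limits listed in the table (fast <= 24 hours, long <= 30 days).
pick_partition() {
    hours=$1
    if [ "$hours" -le 24 ]; then
        echo fast
    elif [ "$hours" -le 720 ]; then      # 30 days * 24 hours
        echo long
    else
        echo "${hours}h exceeds the 30-day limit of the long partition" >&2
        return 1
    fi
}

# Possible usage for a 36-hour job (script.sbatch is a placeholder):
# sbatch -p "$(pick_partition 36)" --time=36:00:00 script.sbatch
```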

The default values#

| Param | Default value |
|-------|---------------|
| --mem | 2GB |
| --cpus | 1 |

Submitting a job to the cluster#

There are two commands to submit a job to the cluster:

  • srun to run jobs interactively
  • sbatch to submit a batch job

Submit a job using srun#

To learn more about the srun command, see the official documentation

Usage: stand alone#

The job will start immediately after you execute the srun command. The outputs are returned to the terminal. You have to wait until the job has terminated before starting a new job. This works with ANY command.

srun hostname

💡 This example shows that the job runs on a compute node and not on the login node.

Usage: interactive#

If an interaction is needed:

module load r
srun --mem 20GB --pty R
  • --pty: keeps the session interactive
  • --mem 20GB: allocates 20GB of memory to your job instead of the default 2GB

Submit a job using sbatch#

To learn more about the sbatch command, see the official documentation

Usage#

The job starts when resources are available. The command only returns the job id. The outputs are sent to file(s). This works ONLY with shell scripts. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input.
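
When the job id is needed later in a script (for example to query or cancel the job), it can be captured at submission time. The sketch below relies on the sbatch option --parsable, which prints the job id alone, optionally followed by ";" and the cluster name. The function name submit_job is hypothetical.

```shell
# Hypothetical wrapper: submit a batch script and print only the job id.
submit_job() {
    # --parsable makes sbatch print "jobid" or "jobid;cluster" and nothing else
    submit_out=$(sbatch --parsable "$@") || return 1
    # Drop the optional ";cluster" suffix
    echo "${submit_out%%;*}"
}

# Possible usage:
# jobid=$(submit_job bowtie2.sbatch)
# scontrol show -dd jobid="$jobid"
```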

Batch scripts rules#

The script can contain srun commands. Each srun is a job step.

The script must start with a shebang (#!) followed by the path of the interpreter:

#!/bin/bash

or

#!/usr/bin/env python

The execution parameters can be set within the shell script itself (here bowtie2.sbatch):

#!/bin/bash
#
#SBATCH --mem 40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam

Then submit the script:

sbatch bowtie2.sbatch

The script can contain SLURM options just after the shebang and before the script commands, each on a line starting with #SBATCH.

Note that the #SBATCH syntax matters: unlike the shebang, it does not contain a !.

Advice: We recommend setting as many parameters as you can in the script, to keep track of your execution parameters for future submissions.

Execution parameters#

These parameters are common to the commands srun and sbatch.

Account#

Accounts are the entities to which job allowances are credited. So far, there are no real limits on the ABiMS HPC Cluster, but the SLURM account is mandatory and enables features such as concurrent job limits or reservations.

A new user has a demo account by default. But you need to request or join a proper project to use the cluster.

As soon as it's done:

  • To change your default account: sacctmgr update user $USER set defaultaccount=<project-name> - that way you won't need to specify the account at each job.
  • To submit a job on a specific account at runtime: sbatch -A <project-name> script.sbatch, or in your sbatch script:
#!/bin/bash
#
#SBATCH -A world_peace

Parameters for log#

#!/bin/bash
#
#SBATCH -o slurm.%N.%j.out  # STDOUT file with the Node name and the Job ID
#SBATCH -e slurm.%N.%j.err  # STDERR file with the Node name and the Job ID

Parameters to control the job#

--partition=<partition_names>, -p

Request a specific partition for the resource allocation. Each partition (queue in SGE) has its own limits: time, memory, nodes ...

--mem=<size[units]>

Specify the real memory required per node. The default unit is MB (Default: 2GB).

The job is killed if it exceeds this limit.

Note that you can use the variable $SLURM_MEM_PER_NODE in the command line to synchronize the software settings and the resource allocated.
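
As a minimal sketch of this synchronization, the helper below derives a memory budget for a tool from $SLURM_MEM_PER_NODE, which SLURM expresses in MB. The function name usable_mem_mb, the 90% safety margin and the 2048 MB fallback (used when the script runs outside SLURM) are assumptions made for illustration. GNU sort and its -S buffer-size option serve only as an example of a memory-aware tool, and the file names are placeholders.

```shell
# Hypothetical helper: print 90% of the allocated memory, in MB.
# Takes an explicit value as first argument, else $SLURM_MEM_PER_NODE,
# else 2048 MB (the default allocation) when run outside SLURM.
usable_mem_mb() {
    total_mb=${1:-${SLURM_MEM_PER_NODE:-2048}}
    echo $(( total_mb * 90 / 100 ))
}

# Possible usage inside a batch script submitted with --mem 20GB:
# srun sort -S "$(usable_mem_mb)M" -o sorted.txt big_input.txt
```

Keeping a margin below the allocation leaves room for the memory the tool uses outside its main buffer, which reduces the risk of the job being killed for exceeding its limit.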

--time=<time>, -t

Set a limit on the total run time of the job allocation.

Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
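
To make these formats concrete, here is a small sketch that converts a --time value into seconds, following the six formats listed above. The function name slurm_time_to_seconds is hypothetical; SLURM does its own parsing, so this is only a reading aid, for instance to check a value against a partition limit before submitting.

```shell
# Hypothetical helper: convert a SLURM time specification to seconds.
# Without a "-", one field means minutes, two mean minutes:seconds and
# three mean hours:minutes:seconds. With "days-", the remaining fields
# are hours[:minutes[:seconds]].
slurm_time_to_seconds() {
    echo "$1" | awk '
    {
        days = 0; rest = $0
        dash = index($0, "-")
        if (dash) { days = substr($0, 1, dash - 1); rest = substr($0, dash + 1) }
        n = split(rest, f, ":")
        if (dash)        { h = f[1]; m = f[2]; s = f[3] }
        else if (n == 1) { h = 0;    m = f[1]; s = 0 }
        else if (n == 2) { h = 0;    m = f[1]; s = f[2] }
        else             { h = f[1]; m = f[2]; s = f[3] }
        print days * 86400 + h * 3600 + m * 60 + s
    }'
}

# Possible usage: does 1-00:30:00 fit within the 24-hour limit of "fast"?
# [ "$(slurm_time_to_seconds 1-00:30:00)" -le 86400 ] || echo "too long for fast"
```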

Parameters for multithreading#

--cpus-per-task=<ncpus>, --cpus, -c

Request a number of CPUs (default 1)

Note that you can use the variable $SLURM_CPUS_PER_TASK in the command line to avoid a mismatch between the resources allocated and the job settings.

#!/bin/bash
#
#SBATCH --cpus-per-task=8

srun bowtie2 --threads $SLURM_CPUS_PER_TASK -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
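
Putting the parameters of this section together, a complete batch script could look like the sketch below. It is written to a file with a here-document so that it can be pasted into a terminal as is. The account name my_project and the chosen values (fast partition, 12 hours, 40GB, 8 CPUs) are placeholders to adapt to your own project and job.

```shell
# Write an example batch script combining the options described above.
cat > bowtie2.sbatch <<'EOF'
#!/bin/bash
#
#SBATCH -A my_project            # placeholder: your own project name
#SBATCH -p fast                  # partition
#SBATCH --time=0-12:00:00        # days-hours:minutes:seconds
#SBATCH --mem 40GB               # memory per node
#SBATCH --cpus-per-task=8        # CPUs for multithreading
#SBATCH -o slurm.%N.%j.out       # STDOUT file with the Node name and the Job ID
#SBATCH -e slurm.%N.%j.err       # STDERR file with the Node name and the Job ID

srun bowtie2 --threads "$SLURM_CPUS_PER_TASK" -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
EOF

# Then submit it:
# sbatch bowtie2.sbatch
```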

Job information on running jobs#

List a user's current jobs:

squeue -u <username>

List a user's running jobs:

squeue -u <username> -t RUNNING

List a user's pending jobs:

squeue -u <username> -t PENDING
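
For a quick overview rather than a full listing, the squeue output can be summarized per state. The sketch below uses the squeue options -h (no header) and -o %T (print only the job state); the function name count_jobs_by_state is hypothetical.

```shell
# Hypothetical helper: count a user's jobs per state (RUNNING, PENDING, ...).
count_jobs_by_state() {
    # -h drops the header line, -o %T prints only the job state
    squeue -u "$1" -h -o %T | sort | uniq -c
}

# Possible usage:
# count_jobs_by_state "$USER"
```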

View accounting information for all of a user's jobs for the current day:

sacct --format=JobID,JobName,User,Submit,ReqCPUS,ReqMem,Start,NodeList,State,CPUTime,MaxVMSize%15 -u <username>

View accounting information for all of a user's jobs for the last 2 days (this one is worth an alias):

sacct -a -S $(date --date='2 days ago' +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u <username>

List detailed job information:

scontrol show -dd jobid=<jobid>

Manage jobs#

To cancel/stop a job:

scancel <jobid>

To cancel all jobs for a user:

scancel -u <username>

To cancel all pending jobs for a user:

scancel -t PENDING -u <username>

Job information on ended jobs#

Jobs that do not respect their resource reservation can be killed automatically by the cluster. Conversely, reserving far more resources than a job needs is wasteful.

You can obtain post-mortem information for your jobs.

sacct#

SLURM incorporates a mechanism to track the resource consumption of each job.

The sacct command queries the SLURM database to track resource consumption.

To consult the basic information of a job:

sacct -j $job_id

Display detailed information about a job:

sacct --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,MaxVMSize%15,State,Start,End,CPUTime,NodeList -j $job_id

💡 Advice: you can create an alias in your ~/.bashrc

alias sacctReq='sacct --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,MaxVMSize%15,State,Start,End,CPUTime,NodeList'
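
To spot jobs that were killed or failed, the sacct output can be filtered on the State column. The sketch below assumes the sacct options -X (one line per job, without job steps), -n (no header) and -P (fields separated by "|"). The function name list_unsuccessful_jobs is hypothetical. As with the commands above, sacct only reports the current day unless a start time is given with -S.

```shell
# Hypothetical helper: list a user's jobs that ended in a state other
# than COMPLETED (for example FAILED, TIMEOUT or OUT_OF_MEMORY).
list_unsuccessful_jobs() {
    sacct -u "$1" -X -n -P --format=JobID,JobName,State |
        awk -F'|' '$3 != "COMPLETED" && $3 != "RUNNING" && $3 != "PENDING"'
}

# Possible usage:
# list_unsuccessful_jobs "$USER"
```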

Job efficiency#

Efficiency is important here because calculations consume a lot of energy and use shared resources.

Have a look at the dedicated documentation: SLURM job efficiency