Disclaimer: This tutorial has been designed to be run on the IFB Core Cluster or the ABiMS Cluster, both part of the IFB NNCR Cluster. However, apart from the "Software environment" part, it should work on any SLURM cluster.

Aims

This tutorial presents the basic workflow for analysing data on a remote SLURM HPC cluster infrastructure:

  1. Connection to the cluster login node
  2. Pushing the input data
  3. Loading the software environment for the analysis
  4. Launching the analysis job
  5. Getting the results back on your personal computer

Documentation

Note: at some point, you will have to complement your knowledge with other documentation.

Practical information

Nomenclature

During this tutorial, you will have to launch some commands in a terminal.

This is a terminal:

$ # This is a comment; it will not be executed
$ program "This is a command. Don't type the $"
This is the result of the command

The $ is your terminal prompt

Throughout this tutorial, you will have to replace your_login with your own login and your_project with the name of your project.

We will establish a connection between your computer and the login node using the protocol SSH (Secure Shell) and the program ssh.

Prerequisites

In practice

Windows

Open a terminal or an alternative (e.g. MobaXterm)

MobaXterm_1

MacOSX or Linux

Use the ssh program in a terminal to establish a secure connection with the targeted login node:

$ # For the IFB Core Cluster:
$ ssh -Y your_login@core.cluster.france-bioinformatique.fr

$ # For the ABiMS Cluster:
$ ssh -Y your_login@slurm0.sb-roscoff.fr

your_login@slurm0.sb-roscoff.fr's password:

Tips: You will then be prompted to enter your password (beware: at the password prompt, the characters you type are not printed on the screen, for obvious security reasons).

0- Paths

There are two ways to navigate within a directory tree:

Navigation Trees

–> For this tutorial, we will mainly use absolute paths.
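For example, assuming your project directory is /shared/projects/your_project and that the tuto_slurm sub-directory already exists (we will create it a bit later in this tutorial), the two following ways lead to the same place:

$ # Absolute path: starts with / and works from anywhere
$ cd /shared/projects/your_project/tuto_slurm
$ # Relative path: interpreted from the current directory, here /shared/projects/your_project
$ cd /shared/projects/your_project
$ cd tuto_slurm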

1- print the current working directory with pwd and change directory with cd

$ # Display your current directory
$ pwd
/shared/home/your_login
$ # Move to another directory
$ cd /shared/bank
$ # Display your new current directory
$ pwd
/shared/bank
$ # Move to your project directory
$ cd /shared/projects/your_project
$ # Display your current directory
$ pwd
/shared/projects/your_project

2- list the contents of a directory with ls

$ cd /shared/bank
$ # List the current directory
$ ls
accession2taxid          lachancea_kluyveri     rosa_chinensis
arabidopsis_thaliana     mus_musculus           saccharomyces_cerevisiae
bos_taurus               nicotiana_tabacum      uniprot
canis_lupus_familiaris   nr                     uniprot_swissprot
danio_rerio              nt                     uniref50
homo_sapiens             refseq                 uniref90

$ # Or list directories somewhere on the filesystem
$ ls /shared/bank/uniprot_swissprot/current/
blast  diamond  fasta  flat  mapping  mmseqs

$ ls /shared/bank/uniprot_swissprot/current/fasta/
uniprot_swissprot.fsa

3- make directories with mkdir

We will create this directory tree in your project directory:

.
└── tuto_slurm
    ├── 01_fastq
    └── 02_qualitycontrol

(Computer scientists suck at botany: for them, the root is at the top :/)

$ cd /shared/projects/your_project
$ pwd
/shared/projects/your_project
$ ls # So far there is nothing in your project directory
$ mkdir tuto_slurm
$ ls # We can check that the directory has been created
tuto_slurm
$ cd tuto_slurm # Oh! a relative path
$ pwd
/shared/projects/your_project/tuto_slurm
$ mkdir 01_fastq 02_qualitycontrol
$ ls
01_fastq           02_qualitycontrol
$ cd /shared/projects/your_project
$ tree  # tree will help you to display a nice tree
.
└── tuto_slurm
    ├── 01_fastq
    └── 02_qualitycontrol

3 directories, 0 files
# you can also use: tree -d
# to display only directory structure

You can either fetch data directly from the Internet (e.g. with wget) or push it from your personal computer (e.g. with an SFTP client).

In this part, we will choose the first solution and fetch a file with wget from the Zenodo website, a public repository.

The usage of an SFTP client is explained in the section "Transfer: get back your results on your personal computer". It's just the reverse!

In practice

For this tutorial, we will borrow a FASTQ file provided by the excellent Galaxy Training Network: 10.5281/zenodo.61771.

$ cd /shared/projects/your_project/tuto_slurm/01_fastq
$ wget https://zenodo.org/record/61771/files/GSM461178_untreat_paired_subset_1.fastq
$ ls
GSM461178_untreat_paired_subset_1.fastq
$ ls -lh    # Two options of ls that will, among other things, give us the size of our file: 20MB. "l" for long format and "h" for human readable
total 20M
-rw-r--r-- 1 your_login root 20M Nov  6 07:33 GSM461178_untreat_paired_subset_1.fastq

Why do we need to "load" tools?

At the IFB, the cluster administrators install all the tools required by the users. To access a tool, you need to load it into your environment using a special application called module.

In practice

Let's load the software environment for FastQC, a quality control tool.

$ # List all the software packages and versions available
$ module avail
abyss/2.2.1               emboss/6.6.0            mirdeep2/2.0.1.2  rseqc/2.6.4
adxv/1.9.14               enabrowsertools/1.5.4   mixcr/2.1.10      rstudio-server/1.2.5042
alientrimmer/0.4.1        ensembl-vep/98.2        mmseqs2/8-fac81   salmon/0.11.3
anvio/6.1                 epa-ng/0.3.6            mmseqs2/8.fac81   salmon/0.14.1
anvio/6.2                 epic2/0.0.41            mmseqs2/10-6d92c  salmon/0.14.2
$ # List the different versions of one software
$ module avail fastqc
fastqc/0.11.5  fastqc/0.11.7  fastqc/0.11.8  fastqc/0.11.9

$ # We can check that the fastqc application isn't available by default
$ fastqc --version
-bash: fastqc: command not found
$ # Load the module for fastqc version 0.11.9
$ module load fastqc/0.11.9
$ # Check the availability and the version
$ fastqc --version
FastQC v0.11.9
$ # List loaded modules
$ module list
Currently Loaded Modulefiles:
 1) fastqc/0.11.9

Note that the module load command only applies to your current terminal session. You have to load the module in each session and at the beginning of your sbatch scripts (cf. below).

[For curious] Under the hood

At the IFB, our scientific software stack is composed of Conda environments and Singularity containers.

To provide the same interface for both the Conda and Singularity technologies, the IFB NNCR Cluster provides an abstraction layer with Environment Modules.
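If you are curious about what a given module actually sets up (paths, environment variables, conda or singularity wrappers), you can inspect it with the standard module show command; the exact content displayed depends on the cluster configuration:

$ # Display the modulefile behind fastqc 0.11.9 (paths and environment it sets up)
$ module show fastqc/0.11.9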

How does a computer work?

Computer components:

Type of "Architecture":

HPC infra

SLURM components

An HPC/SLURM infrastructure is composed of:

SLURM components

The sequence:

  1. You are logged in on a login node
  2. You submit a job using either srun or sbatch
  3. The master receives your job request and puts it in a queue - You wait patiently
  4. When the resources you requested are available on one of the nodes, the master sends the job to it
  5. The compute node processes your job - Again, you wait patiently
  6. You enjoy your results once the job has ended

The resources tracked by SLURM

The resources you need for your job can be set using options (see the example below):

There are 2 main partitions:
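As an illustration, here is a sketch combining the most common options on the command line (my_command is a placeholder; fast is the partition that appears later in this tutorial, check your cluster's documentation for the partitions actually available):

$ # Request 4 CPUs, 8GB of RAM and a 2-hour time limit on the fast partition
$ srun --partition fast --cpus-per-task 4 --mem 8GB --time 02:00:00 my_command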

srun vs sbatch

"Interactive" mode: srun

⚠️ The job is killed if the terminal is closed or the network is cut off.

SLURM srun
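A common use of this interactive mode is to open a shell directly on a compute node, for example to test commands by hand (a minimal sketch using the standard --pty option):

$ # Open an interactive bash session on a compute node
$ srun --pty bash
$ # ... work interactively, then give the resources back
$ exit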

Batch mode: sbatch

Better for reproducibility because it's self-documented.

SLURM srun

Overview

srun suits short jobs because, as said above, the job is killed if the terminal is closed or the network is cut off. Classic examples are file [de]compression (e.g. tar, gzip ...), file parsing (e.g. sort, grep, awk, sed ...), etc.

SLURM srun
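For instance, a minimal sketch compressing a file through srun instead of on the login node (my_big_file.fastq is a hypothetical file name):

$ # The compression runs on a compute node, not on the login node
$ srun gzip my_big_file.fastq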

In practice

$ cd /shared/projects/your_project/tuto_slurm/02_qualitycontrol/
$ # Creation of a dedicated folder for srun
$ mkdir srun
$ cd srun

$ # Load the module for fastqc if it wasn't done yet
$ module load fastqc/0.11.9
$ srun fastqc /shared/projects/your_project/tuto_slurm/01_fastq/GSM461178_untreat_paired_subset_1.fastq -o .
Started analysis of GSM461178_untreat_paired_subset_1.fastq
Approx 5% complete for GSM461178_untreat_paired_subset_1.fastq
Approx 10% complete for GSM461178_untreat_paired_subset_1.fastq
[...]
Approx 95% complete for GSM461178_untreat_paired_subset_1.fastq
Approx 100% complete for GSM461178_untreat_paired_subset_1.fastq
Analysis complete for GSM461178_untreat_paired_subset_1.fastq

$ # We can check the files produced
$ ls
GSM461178_untreat_paired_subset_1_fastqc.html  GSM461178_untreat_paired_subset_1_fastqc.zip

⚠️ Note that if you omit the srun command, the job will run on the login node. It's bad!

Explanations

srun fastqc /shared/projects/your_project/tuto_slurm/01_fastq/GSM461178_untreat_paired_subset_1.fastq -o .

With an absolute path for the output directory, the command would be written as follows:

srun fastqc /shared/projects/your_project/tuto_slurm/01_fastq/GSM461178_untreat_paired_subset_1.fastq -o /shared/projects/your_project/tuto_slurm/02_qualitycontrol/srun/

Note that, implicitly, 2GB of RAM and 1 CPU are reserved. You can modify these parameters to request additional resources:

srun --cpus-per-task 1 --mem 2GB fastqc /shared/projects/your_project/tuto_slurm/01_fastq/GSM461178_untreat_paired_subset_1.fastq -o .

Overview

sbatch launches jobs in the background. In addition to your results, SLURM will create 2 files containing the standard output and standard error streams. The advantage of using sbatch is that you can close your terminal during the job execution.

The counterpart is that you need to write a script file that contains your command lines and the sbatch parameters.
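Depending on the cluster configuration, these files are typically named after the job ID (e.g. slurm-<jobid>.out). If you prefer explicit names, you can add the standard --output and --error options to the script presented below (a sketch; %j is replaced by the job ID):

#SBATCH --output=fastqc-%j.out    # file receiving the standard output
#SBATCH --error=fastqc-%j.err     # file receiving the standard error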

SLURM srun

1- A script file

There are different ways to create a script file on a remote server:

We will use the gedit solution for this part of the tutorial. But the usage of an SFTP client is explained in the part "Transfer: get back your results on your personal computer".

In practice

1. Open another terminal, because gedit displays a lot of annoying warnings. Don't forget the -Y option for graphical forwarding.

ssh -Y your_login@slurm0.sb-roscoff.fr

2. Open gedit

mkdir /shared/projects/your_project/tuto_slurm/scripts/
gedit /shared/projects/your_project/tuto_slurm/scripts/fastqc.sbatch &

gedit should open a file named fastqc.sbatch in a detached window. The & in bash will put gedit in the background and thus release the terminal so you can type other commands.

3. Write your script within gedit

#!/bin/bash
module load fastqc/0.11.9
srun fastqc /shared/projects/your_project/tuto_slurm/01_fastq/GSM461178_untreat_paired_subset_1.fastq -o .

Note that, implicitly, 2GB of RAM and 1 CPU are reserved: you can modify these parameters and request additional resources

#!/bin/bash
#SBATCH --cpus-per-task 1
#SBATCH --mem 4GB
module load fastqc/0.11.9
srun fastqc /shared/projects/your_project/tuto_slurm/01_fastq/GSM461178_untreat_paired_subset_1.fastq -o .

Explanations

Save your fastqc.sbatch file by clicking the SAVE button in gedit.

2- Launch

Now, back in our first terminal, we will launch the job using sbatch:

$ cd /shared/projects/your_project/tuto_slurm/02_qualitycontrol/
$ mkdir sbatch
$ cd sbatch

$ sbatch /shared/projects/your_project/tuto_slurm/scripts/fastqc.sbatch
  Submitted batch job 203739

3- Monitoring during the run

Pending Status: PD

Maybe your job will have to wait for available resources.

$ squeue -u your_login
   JOBID PARTITION     NAME          USER ST       TIME  NODES NODELIST(REASON)
  203739      fast fastqc.+    your_login PD       0:00      1 (Resources)

Running Status: R

At some point, the job will run on one of the computer nodes.

$ squeue -u your_login
   JOBID PARTITION     NAME          USER ST       TIME  NODES NODELIST(REASON)
  203739      fast fastqc.+    your_login  R       5:00      1 cpu-node-23

4- Monitoring at the end of the job

Possibly, some jobs will end up FAILED; one of the reasons is that the job consumed more memory than reserved.

It can be checked by comparing the memory requested (ReqMem) with the memory used (MaxVMSize).

Check the memory usage

sacct --format=JobID,JobName,User,Submit,ReqCPUS,ReqMem,Start,NodeList,State,CPUTime,MaxVMSize%15 -j 203739
      JobID    JobName     User          Submit         ReqCPUS ReqMem               Start NodeList  State     CPUTime   MaxVMSize
------------ ---------- ----------- ------------------- ------- ------ ------------------- -------- ------ ----------- -----------
203739       fastqc.sb+  your_login 2020-09-02T22:06:31       1    2Gn 2020-11-03T23:32:38      n97 FAILED 26-12:25:00
203739.batch      batch             2020-09-03T23:32:38       2    2Gn 2020-11-03T23:32:38      n97 FAILED 26-12:25:00    2279915K

In this case, the job consumed at some point about 2.2GB (MaxVMSize=2279915K). You should increase the reservation, for example with --mem 4GB!

5- Cancel a submitted job

Simply use the scancel command with the jobID(s) to kill:

scancel 218672
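You can also cancel all your own jobs at once by targeting your login (use with care):

$ # Cancel every job submitted by your_login
$ scancel -u your_login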

To transfer files back and forth between a remote server and your local personal computer, we need an FTP/SFTP client:

For this tutorial, we will use FileZilla because it's the only cross-platform one. It's also the most complex, so the other ones will be easy to handle.

1- Connection

2- The interface

The interface is rather busy, with logs, a lot of panels... But don't be afraid:

FileZilla Interface

3- Browse the 2 directory trees

4- Transfer

You just need to Drag and Drop the file between the "Local panel" and the "Remote panel".

It's the same mechanism to get and to push data, depending on whether you drag a file from or to the "Remote panel".

FileZilla Interface
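If you prefer the command line, the same transfer can also be done with scp, run from a terminal on your personal computer (a sketch reusing the ABiMS login node and the FastQC report produced earlier; adapt the host and paths to your case):

$ # Run this on YOUR computer, not on the cluster: copy the FastQC HTML report locally
$ scp your_login@slurm0.sb-roscoff.fr:/shared/projects/your_project/tuto_slurm/02_qualitycontrol/srun/GSM461178_untreat_paired_subset_1_fastqc.html .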

Congrats, you have launched your first job on an HPC cluster and got the results back on your own computer!

FileZilla Interface