Nextflow

This doc needs ❤️

Using a "regular" Nextflow workflow#

wget https://raw.githubusercontent.com/nf-core/configs/master/conf/abims.config
module load nextflow slurm-drmaa graphviz

# Or let the nf-core client download the workflow
srun nextflow run ... -c abims.config ...

# To launch in the background
sbatch --wrap "nextflow run ... -c abims.config ..."
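
GitHub serves an HTML page, not the raw file, for URLs that contain /blob/, so it can be worth checking what wget actually saved before handing the file to Nextflow. The sketch below is only a heuristic, and looks_like_config is a hypothetical helper name, not part of Nextflow or nf-core:

```shell
#!/bin/bash
# Return 0 if the file looks like a plain-text config, 1 if it is missing,
# empty, or contains HTML markers (heuristic check only).
looks_like_config() {
    local file="$1"
    [ -s "$file" ] || return 1                         # missing or empty file
    if grep -qiE '<!DOCTYPE html|<html' "$file"; then
        return 1                                       # an HTML page was saved
    fi
    return 0
}

# Example usage with the file fetched by wget:
# looks_like_config abims.config || echo "abims.config looks wrong" >&2
```

If the check fails, download the file again from its raw.githubusercontent.com address.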

Functional annotation with Orson pipeline#

Orson is a functional annotation pipeline developed in Nextflow by the SeBiMER team and available at: https://gitlab.ifremer.fr/bioinfo/workflows/orson/-/tree/master

# Clone repository 
git clone https://gitlab.ifremer.fr/bioinfo/workflows/orson.git

You may first want to run the pipeline with the test dataset:

module load nextflow graphviz

# Get the config file for your cluster
wget https://raw.githubusercontent.com/nf-core/configs/master/conf/abims.config -O orson/conf/abims.config

# Run Orson with test dataset
cd orson/
srun nextflow run main.nf -profile test,singularity --downloadDB_enable false \
--hit_tool=diamond --blast_db "/path/to/indexed/db" \
-c conf/abims.config -resume

The test dataset will be imported, and then several analyses will be run:

  • EggNOG-Mapper
  • InterProScan
  • BeeDeem
  • Busco
  • Diamond

Once you have confirmed that the test run works, you can analyse your own dataset.

Here is an example sbatch file:

#!/bin/bash
#SBATCH -p long

module load nextflow graphviz

nextflow run main.nf --fasta "query.fa" --query_type p -profile custom,singularity --downloadDB_enable false \
--blast_db "/path/to/indexed/db" -c conf/abims.config -resume

By default, all the tools listed above are launched, but some of them can be disabled.

Some useful arguments:

--query_type [n,p]

Set to "n" for nucleic acid input sequences or to "p" for protein sequences.

--hit_tool [PLAST, BLAST, diamond]

Selects the tool used to compare your sequences against the reference database.

--outdir

The output directory where the results will be published.

-w/-work-dir

The temporary directory where intermediate data will be written. It can be a /scratch/ directory. Note that this is a Nextflow option, so it takes a single dash.
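
As a minimal sketch of how these arguments fit together, the helper below checks the two enumerated options and prints the resulting command instead of running it. build_orson_cmd is a hypothetical name, the paths are placeholders, and other options from the sbatch example (such as --blast_db) are left out for brevity:

```shell
#!/bin/bash
# Print (do not run) an Orson command line after validating the options
# that only accept a fixed set of values. Hypothetical helper, not part of Orson.
build_orson_cmd() {
    local fasta="$1" query_type="$2" hit_tool="$3" outdir="$4" workdir="$5"

    case "$query_type" in
        n|p) ;;
        *) echo "query_type must be 'n' or 'p'" >&2; return 1 ;;
    esac
    case "$hit_tool" in
        PLAST|BLAST|diamond) ;;
        *) echo "hit_tool must be PLAST, BLAST or diamond" >&2; return 1 ;;
    esac

    # -w is a Nextflow option (single dash); the others are pipeline parameters.
    echo "nextflow run main.nf --fasta \"$fasta\" --query_type $query_type" \
         "--hit_tool $hit_tool --outdir \"$outdir\" -w \"$workdir\"" \
         "-profile custom,singularity -c conf/abims.config -resume"
}

# Placeholder paths: adjust them to your own project and scratch space.
build_orson_cmd query.fa p diamond results /scratch/my_project/work
```

Printing the command first makes it easy to review before pasting it into an sbatch file.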

Please refer to Orson's documentation for more details: https://gitlab.ifremer.fr/bioinfo/workflows/orson/-/tree/master/docs

Note that Orson checks for its Singularity containers in orson/containers/ and downloads any that are missing.

On the ABiMS cluster, containers are already provided in /shared/software/singularity/images/nextflow/. Give this path to Nextflow so that it does not download them again.

# Give container's path
cd orson/
ln -s /shared/software/singularity/images/nextflow/*.sif containers/
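
Running ln -s a second time fails for links that already exist. A minimal sketch of a re-runnable variant is shown below; link_containers is a hypothetical helper, not something shipped with Orson:

```shell
#!/bin/bash
# Symlink every .sif image from a shared directory into a local one,
# skipping names that already exist so the step can be repeated safely.
link_containers() {
    local src="$1" dst="$2" image name
    mkdir -p "$dst"
    for image in "$src"/*.sif; do
        [ -e "$image" ] || continue            # the glob matched nothing
        name=$(basename "$image")
        if [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
            echo "skip   $name (already present)"
        else
            ln -s "$image" "$dst/$name"
            echo "linked $name"
        fi
    done
}

# Usage on the cluster, from inside orson/:
# link_containers /shared/software/singularity/images/nextflow containers
```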

It is also possible to run the pipeline with the Hectar annotation tool (for heterokonts).

To do so, clone the branch of the git repository that contains Orson's code adapted for Hectar.
As this code is not published, Orson will not fetch the Hectar Singularity container, so you must provide it yourself. (It is already available in the container path above.)

# Clone repository 
git clone --depth 1 --branch hectar_rebased https://gitlab.ifremer.fr/abims-sbr/orson.git

# Get container
cd orson/
ln -s /shared/software/singularity/images/nextflow/*.sif containers/

Then run the same command as before, adding --hectar_enable true to launch the Hectar analysis (available only for protein sequences).

Using nf-core#

All nf-core pipelines have been successfully configured for use on the ABiMS cluster.

Check this page: github.com/nf-core/configs/blob/master/docs/abims.md