Conda / Singularity / Module
To provide a software environment, we rely on two main technologies, Conda and Singularity. Which one we use depends on each tool's specifics, such as its license.
In order to offer a unified user interface, we implement Environment Modules on top.
Environment Modules#
The Environment Modules package is a tool that simplifies shell initialization and lets users easily modify their environment during a session using modulefiles. Each modulefile contains the information needed to configure the shell for an application. Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. They are useful for managing different versions of applications, and can also be bundled into metamodules that load an entire suite of applications.
In our case, within the ABiMS Cluster and more generally within the IFB NNCR Cluster, the Conda environment or the Singularity image is loaded through a module: one Conda environment, one modulefile.
But why use module load fastqc/0.11.7 instead of conda activate fastqc-0.11.7?
- Module provides autocompletion to help you find a tool and its versions: module load snp<TAB><TAB>, module load fastqc<TAB><TAB>.
- Module can load either Conda environments or Singularity wrappers. That way, you have one loader for different underlying technologies.
How to?#
List the available software#
module avail
Load a software#
module load fastqc/0.11.7
Unload a software#
module unload fastqc/0.11.7
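Two other standard Environment Modules subcommands are handy here: module list shows what is currently loaded, and module purge unloads everything. As a minimal sketch, the helper below combines them to start from a clean state before loading a tool. The name fresh_env is hypothetical (nothing by that name is provided on the cluster), and the sketch assumes the module command is already defined in your shell, as it is in a cluster session.

```shell
# fresh_env MODULE [MODULE...]
# Hypothetical helper (not provided on the cluster): start from a clean
# state, load only the requested modules, then show the result.
fresh_env() {
    module purge       # unload every module currently loaded
    module load "$@"   # load the requested module(s)
    module list        # display what is loaded now
}
```

Called as fresh_env fastqc/0.11.7, it replaces whatever you had loaded with that single request.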
Software stacking#
Modules (and the Conda environments behind them) can be stacked if you need several tools at once.
module load trinity/2.8.4 fastqc/0.11.7
module load snakemake/5.3.0
But in case of incompatibilities, for example two tools that require Python 2 and Python 3 respectively, it is recommended to load each tool just before using it and to unload it afterwards.
$ module load trinity/2.8.4
$ Trinity --seqType fq --max_memory 50G --left reads_1.fq.gz --right reads_2.fq.gz --CPU 6
$ module unload trinity/2.8.4
$ module load fastqc/0.11.7
$ fastqc Trinity.fas
$ module unload fastqc/0.11.7
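The load, run, unload pattern above can also be written with a subshell, so that the module is dropped automatically when the command finishes. This is only a sketch: with_module is a hypothetical helper name, and it assumes the module command is defined as a shell function in your session so that the subshell inherits it.

```shell
# with_module MODULE COMMAND [ARGS...]
# Hypothetical helper. The function body is wrapped in parentheses, so it
# runs in a subshell: the environment changes made by "module load" vanish
# when the command ends, and no explicit "module unload" is needed.
with_module() (
    mod="$1"
    shift
    module load "$mod" || exit 1   # stop if the module cannot be loaded
    "$@"                           # run the command; its exit status is returned
)

# Possible usage, with the tools from the example above:
#   with_module trinity/2.8.4 Trinity --seqType fq --max_memory 50G \
#       --left reads_1.fq.gz --right reads_2.fq.gz --CPU 6
#   with_module fastqc/0.11.7 fastqc Trinity.fas
```

The trade-off is that nothing stays loaded in your interactive shell afterwards, which is the point when two tools have conflicting requirements.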
Conda#
Most tools have dependencies, and some have many: Python or R libraries in specific versions, the very latest compiler or simply a newer one, and so on. Often, those dependencies are incompatible with what is already installed on the system, or with each other.
Conda is an open source package, dependency and environment manager for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, Fortran. Miniconda is a small “bootstrap” version that includes only conda, Python, and the packages they depend on. Over 720 scientific packages and their dependencies can be installed individually from the Anaconda (formerly Continuum) repository with the conda install command.
At IFB, it allows us to install tools in dedicated, isolated environments. Note that not all software is available through Conda. IFB platforms contribute new packages to Conda through the Bioconda GitHub repository.
Shared Conda environments#
We install software and software environments with Miniconda3.
Why ask for a shared environment#
There are different use cases:
- I don't know how to use Conda
- I'm preparing a training session and I want all attendees to have the same software environment
- Conda packages can be heavy in terms of disk usage
To request a tool or a Conda environment, there are two options:
- Propose one via our dedicated git repository cluster/tools
- Request a tool on our IFB community forum: https://community.france-bioinformatique.fr/
To check whether a package, and a specific version of it, is available in the bioconda, conda-forge and defaults channels:
conda search -c conda-forge -c bioconda my_tool
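To look for one specific version, conda search also accepts a name=version specification. The sketch below wraps both forms in a small function. The name find_tool is hypothetical, and the sketch assumes conda is on your PATH (for example after module load conda).

```shell
# find_tool NAME [VERSION]
# Hypothetical helper: search conda-forge and bioconda for a package,
# optionally restricted to one version.
find_tool() {
    if [ -n "${2:-}" ]; then
        # name=version narrows the search to that version
        conda search -c conda-forge -c bioconda "$1=$2"
    else
        conda search -c conda-forge -c bioconda "$1"
    fi
}
```

For instance, find_tool fastqc lists every available build, while find_tool fastqc 0.11.7 only lists builds of that version.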
"Private" Conda environments#
We don't recommend installing tools on your own when the required tool is available as a Conda package in the Bioconda or conda-forge channels, because your home directory (~) isn't designed to store lots of files. If you really want to install Conda packages, please install them in your project directory.
To do that, edit the configuration file ~/.condarc:
envs_dirs:
  - /shared/projects/<project_name>/conda/envs
pkgs_dirs:
  - /shared/projects/<project_name>/conda/pkgs
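Conda may create these two directories itself on first use, but creating them up front is a simple way to confirm that you can write to the project space. A minimal sketch, where make_conda_dirs is a hypothetical helper name and the argument is your own project directory:

```shell
# make_conda_dirs PROJECT_DIR
# Hypothetical helper: create the envs/ and pkgs/ directories that the
# ~/.condarc file above points to. PROJECT_DIR is your project directory,
# e.g. /shared/projects/<project_name> with your own project name.
make_conda_dirs() {
    mkdir -p "$1/conda/envs" "$1/conda/pkgs"
}
```

After editing ~/.condarc, conda config --show envs_dirs pkgs_dirs should print the two paths, which is a quick check that the file is being read.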
Then, use the following commands to create your private Conda environment, to activate it and to install the Conda packages:
module load conda
conda create -n MYENV
source activate MYENV
conda install PKGNAME1 [PKGNAME2...] # install packages
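The create and install steps can also be combined, since conda create accepts package specifications directly and resolves them together. The function below is a sketch under assumptions: create_env is a hypothetical name, the channel list simply mirrors the search command used earlier, -y answers the confirmation prompt automatically, and conda must already be available (module load conda).

```shell
# create_env ENV_NAME PACKAGE [PACKAGE...]
# Hypothetical helper: create an environment and install its packages in
# one step, taking packages from conda-forge and bioconda.
create_env() {
    env_name="$1"
    shift
    conda create -y -n "$env_name" -c conda-forge -c bioconda "$@"
}
```

For example, create_env MYENV fastqc=0.11.7 would create MYENV with that version pinned. You still need source activate MYENV before using it.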
Cons of Conda#
Because these environments isolate the software, other Python, R or Perl libraries installed on the system or in other Conda environments are not available inside a given Conda environment. If this is an issue for you, let us know.
Singularity#
Singularity is a free, cross-platform and open-source program that performs operating-system-level virtualization, also known as containerization.
One of the main uses of Singularity is to bring containers and reproducibility to scientific computing and the high-performance computing (HPC) world.
For more information, please visit this page: Singularity advanced guide
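On the cluster, Singularity images are normally reached through module load, so you rarely call Singularity yourself. If you do need to run a command directly inside an image, singularity exec is the usual entry point. The sketch below is illustrative only: run_in_image is a hypothetical helper name, and any image path you pass is your own, not one provided by this documentation.

```shell
# run_in_image IMAGE COMMAND [ARGS...]
# Hypothetical helper: run a single command inside a Singularity image.
run_in_image() {
    image="$1"
    shift
    singularity exec "$image" "$@"   # run the command inside the container
}
```

A call would look like run_in_image my_image.sif fastqc --version, where my_image.sif stands for an image file you have access to.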