# OTU clustering with SWARM on RAPIDRUN data encapsulated in SNAKEMAKE
OTU clustering based on [TARA Fred's metabarcoding pipeline](https://github.com/frederic-mahe/swarm/wiki/Fred%27s-metabarcoding-pipeline), applied to RAPIDRUN data and managed with [SNAKEMAKE](https://snakemake.readthedocs.io/en/stable/).
# Installation

## Prerequisites

* linux system
* [python3](https://www.python.org/)
* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) or [pip3](https://pip.pypa.io/en/stable/)
* snakemake
* singularity
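A quick way to check which of these tools are already available on your machine (assuming the standard executable names) is:

```
# Report which prerequisite executables are available on the PATH
for tool in python3 conda snakemake singularity; do
    command -v "$tool" >/dev/null && echo "$tool: OK" || echo "$tool: missing"
done
```

This prints one line per tool; install anything reported missing before continuing.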
python3 dependencies to run `snakemake`:

```
pip3 install pandas
pip3 install biopython
```

## Installation via Conda

The default conda solver is a bit slow and sometimes has issues with selecting the latest package releases. Therefore, we recommend installing Mamba as a drop-in replacement via

```
conda install -c conda-forge mamba
```

Then you can install Snakemake, pandas, biopython and their dependencies from the conda-forge and bioconda channels. This will install all required software into an isolated software environment.
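The environment-creation command itself does not appear in the text; a plausible invocation, with the environment name `snakemake_rapidrun` taken from the activation step below and the package list assumed from context, would be:

```
# Assumed command (not from the original text): create the
# snakemake_rapidrun environment with the packages named above,
# pulling from the conda-forge and bioconda channels
mamba create -n snakemake_rapidrun -c conda-forge -c bioconda snakemake pandas biopython
```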
Activate the environment with:

```
conda activate snakemake_rapidrun
```

You can then run the whole workflow on the test configuration:

```
CORES=32
CONFIGFILE="01_infos/config_test.yaml"
bash main.sh $CORES $CONFIGFILE
```
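The `main.sh` script itself is not shown here; judging from its two arguments, it presumably wraps a `snakemake` call along these lines (a sketch with assumed flags, not the actual script):

```
# Hypothetical equivalent of main.sh: run the workflow inside
# Singularity containers with the given core count and config file
# (flags assumed)
snakemake --use-singularity --cores "$1" --configfile "$2"
```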
# Get started
## Clone repositories

* open a shell
* clone the project and switch to the main folder; it is your working directory
* the workflow will cluster sequences by Molecular Operational Taxonomic Unit (MOTU)
The complete data set can be downloaded and stored into the [resources/tutorial](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/tree/master/resources/tutorial) folder with the following command: