# snakemake_rapidrun_swarm
## eDNA-seq Metabarcoding OTU-clustering pipeline
OTU clustering based on [TARA Fred's metabarcoding pipeline](https://github.com/frederic-mahe/swarm/wiki/Fred%27s-metabarcoding-pipeline), applied to RAPIDRUN data and managed with [SNAKEMAKE](https://snakemake.readthedocs.io/en/stable/).

**eDNA-seq Metabarcoding OTU-clustering** is a bioinformatics pipeline built with Snakemake, a workflow tool that runs tasks across multiple compute infrastructures in a highly portable manner. It comes with docker containers, which make installation easy and results reproducible.
## Introduction
**eDNA-seq Metabarcoding OTU-clustering** is specifically designed for the analysis of environmental DNA (eDNA) metabarcoding NGS data: it demultiplexes, filters and clusters sequences into Operational Taxonomic Units (OTUs).

This pipeline was initially tested on marine environment samples, using molecular markers such as Teleo1. The workflow should work with any organism and environment, and it has been proven on large-scale data analyses.

OTU-clustering steps are based on [TARA Fred's metabarcoding pipeline](https://github.com/frederic-mahe/swarm/wiki/Fred%27s-metabarcoding-pipeline).

## Method
The workflow performs quality control of the raw FASTQ inputs (FastQC), merges paired-end reads (vsearch), applies complex demultiplexing based on the notice provided by the sequencing platform, clips primers (cutadapt), dereplicates sequences per sample (vsearch), extracts sequencing quality, clusters sequences into OTUs (swarm), detects chimeras (vsearch) and assigns taxonomy to each OTU (NCBI taxonomy; ecotag; obitools). Ultimately, OTU tables with and without taxonomy assignments are generated. See the output documentation for more details.

# Installation
## Prerequisites
* linux system
* [python3](https://www.python.org/)
* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/)

## Installation via Conda
The default conda solver is a bit slow and sometimes has issues selecting the latest package releases. Therefore, we recommend installing Mamba into the base environment as a drop-in replacement via
```
conda install -n base -c conda-forge mamba
```
Then, you can install Snakemake, pandas, biopython and their dependencies from the conda-forge and bioconda channels with
```
mamba create -n snakemake_rapidrun -c conda-forge -c bioconda snakemake biopython pandas
```
This installs all required software into an isolated environment, which has to be activated with
```
conda activate snakemake_rapidrun
```
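Before going further, it can be worth confirming that the activated environment actually exposes the software installed above. The following is a minimal sketch, not part of the pipeline: the helper names `missing_commands` and `missing_python_modules` are hypothetical, and the check only covers the `snakemake` executable and the `Bio` (biopython) and `pandas` modules from the `mamba create` command, not the other tools listed in the Method section.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the pipeline) to check that the active
# conda environment provides what the workflow needs.

# Print every command name passed as argument that is not found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Print every Python module name passed as argument that python3 cannot import.
missing_python_modules() {
    local mod
    for mod in "$@"; do
        if ! python3 -c "import ${mod}" >/dev/null 2>&1; then
            printf '%s\n' "$mod"
        fi
    done
}

# Example, after `conda activate snakemake_rapidrun`
# (biopython is imported under the name "Bio"):
#   missing_commands snakemake
#   missing_python_modules Bio pandas
# No output means everything was found.
```

Printing only the missing names keeps the output empty on success, so the helpers can also be used in a script condition such as `[ -z "$(missing_commands snakemake)" ]`.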
# Workflow
1. [Installation]()
2. Pipeline configuration
   * [Local installation]()
   * [Adding your own system config]()
   * [Parameters]()
3. [Running the pipeline]()
   * [Quick start]()
   * [Basic run]()
   * [Reproducibility]()
   * [Input files]()
   * [Config file]()
   * [step 1...]()
   * [step2...]()
4. [Output results]()
5. [How-to guide]()
6. [Reference]()
7. [Metabarcoding context - discussion to go further]()

# Credits
**eDNA-seq Metabarcoding OTU-clustering** was coded and written by Virginie Marques and Pierre-Edouard Guerin.

We thank the following people for their help in the development of this pipeline:
* Agnes Duhamet
* Alice Valentini
* Apolline Gorry
* Bastien Mace
* David Mouillot
* Emilie Boulanger
* Laetitia Mathon
* Laura Benestan
* Stephanie Manel
* Tony Dejean

# Contributions and Support
:bug: If you are sure you have found a bug, please submit a bug report on GitLab [here](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/issues).

For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

[![Get help on Slack](https://img.shields.io/badge/slack-cefebev%23metabarcoding_otu-4A154B?logo=slack)](https://cefebev.slack.com/archives/C01NYK8B9K7)

# Citations
You can cite the **eDNA-seq Metabarcoding OTU-clustering** publication as follows:
<div style="background: #f1f1f1; ">
**Blind assessment of vertebrate taxonomic diversity across spatial scales by clustering environmental DNA metabarcoding sequences**
*Virginie Marques, Pierre‐Édouard Guerin, Mathieu Rocle, Alice Valentini, Stephanie Manel, David Mouillot, Tony Dejean*
Ecography. 2020 Aug 04. doi: https://doi.org/10.1111/ecog.05049.
</div>

# Get started
* Open a shell
* Clone the project and switch to its main folder, which is your working directory
```
git clone https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm
cd snakemake_rapidrun_swarm
```
* Activate the conda environment to access the required dependencies
```
conda activate snakemake_rapidrun
```
You are ready to run the analysis!

## Download data
The complete data set can be downloaded and stored in the [resources/tutorial](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/tree/master/resources/tutorial) folder with the following command:
```
wget -c https://gitlab.mbb.univ-montp2.fr/edna/tutorial_metabarcoding_data/-/raw/master/tutorial_rapidrun_data.tar.gz -O - | tar -xz -C ./resources/tutorial/
```
* Data are downloaded to [resources/tutorial](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/tree/master/resources/tutorial)
* This is a tiny subset of a real metabarcoding analysis in RAPIDRUN format

# Run the workflow
Simply type the following command to process the data (estimated time: 25 minutes):
```
bash main.sh config/config_tutorial.yaml 8
```
* This generates OTU occurrence tables in [results/06_assignment/04_table](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/tree/master/results/06_assignment/04_table)
* The first argument, [config/config_tutorial.yaml](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/blob/master/config/config_tutorial.yaml), contains the mandatory parameters
* The second argument, **8**, is the number of CPU cores the workflow is allowed to use
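The second argument of `main.sh` is simply a core count, so on a machine with fewer than 8 cores one option is to cap the request at what the machine reports. This is an illustration under assumptions: `pick_cores` is a hypothetical helper, it relies on `nproc` from GNU coreutils (available on the Linux systems listed in the prerequisites), and it says nothing about how `main.sh` itself handles a core count larger than the machine provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print the smaller of the
# requested core count and the number of processing units that `nproc`
# (GNU coreutils) reports as available to the current process.
pick_cores() {
    local requested="$1"
    local available

    # Reject anything that is not a positive integer.
    case "$requested" in
        ''|*[!0-9]*)
            printf 'pick_cores: expected a positive integer, got "%s"\n' "$requested" >&2
            return 1
            ;;
    esac
    if [ "$requested" -lt 1 ]; then
        printf 'pick_cores: expected a positive integer, got "%s"\n' "$requested" >&2
        return 1
    fi

    available="$(nproc)"
    if [ "$requested" -gt "$available" ]; then
        printf '%s\n' "$available"
    else
        printf '%s\n' "$requested"
    fi
}

# Example, from the repository root:
#   bash main.sh config/config_tutorial.yaml "$(pick_cores 8)"
```

Capping rather than failing keeps the tutorial command usable unchanged on small machines, while invalid input is reported on stderr with a non-zero exit status.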
# To go further
Please check the [wiki](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home).