## eDNA-seq Metabarcoding OTU-clustering pipeline

**eDNA-seq Metabarcoding OTU-clustering** is a bioinformatics pipeline built with Snakemake, a workflow tool that runs and manages tasks in any execution environment. It ships with Docker containers, which make installation straightforward and results reproducible.

## Introduction

**eDNA-seq Metabarcoding OTU-clustering** is designed for the analysis of environmental DNA (eDNA) metabarcoding NGS data: it demultiplexes, filters and clusters sequences into Operational Taxonomic Units (OTUs).

This pipeline was initially tested with marine environmental DNA samples, using molecular markers such as Vert01 ([Riaz et al. 2011](https://doi.org/10.1093/nar/gkr732)), Teleo01 ([Valentini et al. 2016](https://doi.org/10.1111/mec.13428)), Chond01 or Mamm01 ([Taberlet et al. 2018](https://doi.org/10.1093/oso/9780198767220.001.0001)). The workflow should work with any organism and environment. It is proven for large-scale data analysis.

## Method

The workflow processes raw data from FASTQ inputs (FastQC), merges paired-end reads (vsearch), applies complex demultiplexing based on the notice provided by the sequencing platform, trims primers (cutadapt), dereplicates sequences (vsearch), extracts sequencing quality values, clusters sequences into OTUs (swarm), detects and removes chimeras (vsearch) and assigns taxonomy to each OTU (NCBI taxonomy; ecotag; obitools). Ultimately, OTU tables with and without taxonomy assignments are generated. See the [output documentation]() for more details.

OTU-clustering steps are based on [TARA Fred's metabarcoding pipeline](https://github.com/frederic-mahe/swarm/wiki/Fred%27s-metabarcoding-pipeline).
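
To make the dereplication step more concrete, here is a minimal Python sketch of the idea: identical sequences are collapsed into one record that carries its abundance. This is only an illustration, not code from the pipeline, which relies on vsearch for this step. The `seqN` identifiers and the helper names are hypothetical; the `;size=N` header annotation follows the convention vsearch and swarm use for abundances.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences and count how often each one occurs.

    Returns (sequence, abundance) pairs, most abundant first; ties are
    broken alphabetically so the output is deterministic.
    """
    # Normalise case so "acgt" and "ACGT" count as the same amplicon.
    counts = Counter(seq.strip().upper() for seq in sequences if seq.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_fasta(dereplicated):
    """Format records as FASTA with a ';size=N' abundance annotation."""
    lines = []
    for index, (seq, abundance) in enumerate(dereplicated, start=1):
        # "seqN" is a placeholder identifier chosen for this example.
        lines.append(f">seq{index};size={abundance}")
        lines.append(seq)
    return "\n".join(lines)


reads = ["ACGTAC", "acgtac", "TTGACA", "ACGTAC", "GGCCTA", "TTGACA"]
unique = dereplicate(reads)
print(to_fasta(unique))
```

Clustering tools such as swarm use these abundances to decide which sequences seed an OTU, which is why dereplication comes before clustering.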


## Workflow

![ednaotucluster](docs/edna_otucluster_workflow.png)

1. [Installation](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#installation)
2. Pipeline configuration
    * [Local installation]()
    * [Adding your own system config]()
    * [Parameters]()
3. [Running the pipeline]()
    * [Quick start](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#get-started)
    * [Basic run](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#run-the-workflow)
    * [Reproducibility]()
    * [Prepare Spygen RAPIDRUN input files](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/prepare_spygen_data)
    * [Config file]()
    * [step 1...]()
    * [step2...]()
4. [Output results]()
5. [How-to guide](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/How-to-guide)
6. [References](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/References)
7. [Metabarcoding context - discussion to go further]()

## Quick Start

See the [Installation section](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#installation) for installation instructions.

Download example data:

```
# Download the example archive
curl -JLO http://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_data_test/-/raw/master/test_rapidrun_data.tar.gz
# Extract it into resources/test/
tar -xzf test_rapidrun_data.tar.gz -C resources/test/
```

`./resources/test/test_rapidrun_data/`: this folder contains a reference database for 4 markers (Teleo01; Mamm01; Vert01; Chond01), raw NGS metabarcoding data, and the metadata required to handle demultiplexing in RAPIDRUN format.

If you have installed **eDNA-seq Metabarcoding OTU-clustering**, you can run the example data with:

```
snakemake --configfile config/config_test_rapidrun.yaml --cores 4 --use-conda
```
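
Before launching a full run, it can help to preview what Snakemake plans to do. The sketch below builds the command shown above and optionally appends Snakemake's `-n` (dry-run) flag, which lists the planned jobs without executing them. The helper name `snakemake_command` is hypothetical and only serves this example.

```python
import shlex


def snakemake_command(configfile, cores=4, dry_run=False):
    """Build the argument list for the example run shown above."""
    command = [
        "snakemake",
        "--configfile", configfile,
        "--cores", str(cores),
        "--use-conda",
    ]
    if dry_run:
        # -n asks Snakemake to list the planned jobs without running them.
        command.append("-n")
    return command


preview = snakemake_command("config/config_test_rapidrun.yaml", dry_run=True)
# Print the command as it would be typed in a shell.
print(shlex.join(preview))
```

The returned list can be passed to `subprocess.run` from the repository root, or the printed line can be pasted into a terminal.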

This writes the outputs to the `./results` folder.

OTU tables are written as `.csv` files, while intermediate files are stored in the `./results/intermediates` folder.

```
intermediates                       projet1_mamm_ecotag_ncbi_motu.csv   projet1_teleo_table_motu.csv
projet1_chond_ecotag_ncbi_motu.csv  projet1_mamm_table_motu.csv         projet1_vert_ecotag_ncbi_motu.csv
projet1_chond_table_motu.csv        projet1_teleo_ecotag_ncbi_motu.csv  projet1_vert_table_motu.csv
```
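
The listing above suggests a naming scheme of one `*_table_motu.csv` and one `*_ecotag_ncbi_motu.csv` file per marker. As a rough sketch, the following Python snippet groups such filenames by marker. The pattern is inferred from the listing only and is an assumption, as is the reading that the `ecotag_ncbi` files are the tables carrying taxonomy assignments.

```python
import re
from collections import defaultdict

# Pattern inferred from the example listing (an assumption, not a spec):
#   <project>_<marker>_table_motu.csv
#   <project>_<marker>_ecotag_ncbi_motu.csv
PATTERN = re.compile(
    r"^(?P<project>.+)_(?P<marker>[^_]+)_(?P<kind>table|ecotag_ncbi)_motu\.csv$"
)


def group_by_marker(filenames):
    """Map each marker to its OTU table files, keyed by table kind."""
    tables = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # skips entries such as the "intermediates" folder
        tables[match["marker"]][match["kind"]] = name
    return dict(tables)


listing = [
    "intermediates",
    "projet1_chond_ecotag_ncbi_motu.csv",
    "projet1_chond_table_motu.csv",
    "projet1_mamm_ecotag_ncbi_motu.csv",
    "projet1_mamm_table_motu.csv",
    "projet1_teleo_ecotag_ncbi_motu.csv",
    "projet1_teleo_table_motu.csv",
    "projet1_vert_ecotag_ncbi_motu.csv",
    "projet1_vert_table_motu.csv",
]

for marker, files in sorted(group_by_marker(listing).items()):
    print(marker, files)
```

On a real run, `listing` could be replaced with `os.listdir("results")`.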

## Next steps

Now that you've gotten the example to work, see the [wiki](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home) to navigate to the more detailed descriptions and instructions for exploring your own data.



## Credits

**eDNA-seq Metabarcoding OTU-clustering** was coded and written by Virginie Marques and Pierre-Edouard Guerin.

We thank the following people for their help in the development of this pipeline: Agnes Duhamet, Alice Valentini, Apolline Gorry, Bastien Mace, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.

## Contributions and Support

:bug: If you are sure you have found a bug, please submit a bug report on GitLab [here](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/issues).

For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

[![Get help on Slack](https://img.shields.io/badge/slack-cefebev%23metabarcoding_otu-4A154B?logo=slack)](https://cefebev.slack.com/archives/C01NYK8B9K7)


## Citations

You can cite the **eDNA-seq Metabarcoding OTU-clustering** publication as follows:


> **Blind assessment of vertebrate taxonomic diversity across spatial scales by clustering environmental DNA metabarcoding sequences**
>
> *Virginie Marques, Pierre‐Edouard Guerin, Mathieu Rocle, Alice Valentini, Stephanie Manel, David Mouillot, Tony Dejean*
>
> Ecography. 2020 Aug 04. doi: https://doi.org/10.1111/ecog.05049.