## eDNA-seq Metabarcoding OTU-clustering pipeline

**eDNA-seq Metabarcoding OTU-clustering** is a bioinformatics pipeline built with Snakemake, a workflow tool that runs and manages tasks in any execution environment. It ships with Docker containers, which makes installation trivial and results reproducible.

## Introduction

**eDNA-seq Metabarcoding OTU-clustering** is designed for the analysis of environmental DNA (eDNA) metabarcoding NGS data: it demultiplexes, filters and clusters sequences into Operational Taxonomic Units (OTUs).

This pipeline was initially tested on marine environmental DNA samples, using molecular markers such as Vert01 ([Riaz et al. 2011](https://doi.org/10.1093/nar/gkr732)), Teleo01 ([Valentini et al. 2016](https://doi.org/10.1111/mec.13428)), Chond01 or Mamm01 ([Taberlet et al. 2018](https://doi.org/10.1093/oso/9780198767220.001.0001)). The workflow should work with any organism and environment. It has proven suitable for large-scale data analysis.

## Method

The workflow processes raw data from FASTQ inputs (FastQC), merges paired-end reads (vsearch), applies complex demultiplexing based on the information provided by the sequencing platform, trims primers (cutadapt), dereplicates sequences (vsearch), extracts sequencing quality values, clusters sequences into OTUs (swarm), detects and removes chimeras (vsearch) and assigns a taxonomy to each OTU (NCBI taxonomy; ecotag; obitools). Ultimately, OTU tables with and without taxonomy assignments are generated. See the [output documentation]() for more details.

OTU-clustering steps are based on [TARA Fred's metabarcoding pipeline](https://github.com/frederic-mahe/swarm/wiki/Fred%27s-metabarcoding-pipeline).

## Workflow

![ednaotucluster](docs/edna_otucluster_workflow.png)

1. [Installation](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#installation)
2. Pipeline configuration
    * [Local installation]()
    * [Adding your own system config]()
    * [Parameters]()
3. [Running the pipeline]()
    * [Quick start](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#get-started)
    * [Basic run](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#run-the-workflow)
    * [Reproducibility]()
    * [Input files]()
    * [Config file]()
    * [step 1...]()
    * [step2...]()
4. [Output results]()
5. [How-to guide](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/How-to-guide)
6. [Reference]()
7. [Metabarcoding context - discussion to go further]()

## Quick Start

See the [Installation section](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home#installation) for installation instructions.

Download the example data:

```
curl -JLO http://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_data_test/-/raw/master/test_rapidrun_data.tar.gz
tar xzf test_rapidrun_data.tar.gz -C resources/test/
```

`./resources/test/test_rapidrun_data/`: this folder contains a reference database for 4 markers (Teleo01; Mamm01; Vert01; Chond01), NGS metabarcoding raw data, and the metadata required to handle demultiplexing in RAPIDRUN format.

If you have installed **eDNA-seq Metabarcoding OTU-clustering**, you can run the example data with:

```
snakemake --configfile config/config_test_rapidrun.yaml --cores 4 --use-conda
```

This will write the outputs to the `./results` folder. OTU tables are available as `.csv` files, while intermediate files are stored in the `./results/intermediates` folder.

```
intermediates                       projet1_mamm_ecotag_ncbi_motu.csv   projet1_teleo_table_motu.csv
projet1_chond_ecotag_ncbi_motu.csv  projet1_mamm_table_motu.csv         projet1_vert_ecotag_ncbi_motu.csv
projet1_chond_table_motu.csv        projet1_teleo_ecotag_ncbi_motu.csv  projet1_vert_table_motu.csv
```

## Next steps

Now that you've gotten the example to work, see the [wiki](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/wikis/home) for more detailed descriptions and instructions for exploring your own data.

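
As a starting point for exploring the outputs of the example run, the following Python sketch groups the OTU tables found in `./results` by project and marker. It is not part of the pipeline: the function names `index_otu_tables` and `list_results` are hypothetical, and the file-name pattern is inferred from the example listing in the Quick Start (with `ecotag_ncbi` files presumed to be the tables carrying taxonomy assignments), so adjust it if your outputs are named differently.

```python
import re
from pathlib import Path

# Pattern inferred from the Quick Start listing (an assumption, not a documented contract):
#   <project>_<marker>_table_motu.csv        presumably the OTU table without taxonomy
#   <project>_<marker>_ecotag_ncbi_motu.csv  presumably the OTU table with taxonomy
# Project or marker names containing underscores will not match this sketch.
TABLE_RE = re.compile(
    r"^(?P<project>[^_]+)_(?P<marker>[^_]+)_(?P<kind>table|ecotag_ncbi)_motu\.csv$"
)


def index_otu_tables(filenames):
    """Group OTU table file names by (project, marker)."""
    index = {}
    for name in filenames:
        match = TABLE_RE.match(name)
        if match is None:
            continue  # skip the `intermediates` folder and unrelated files
        key = (match["project"], match["marker"])
        index.setdefault(key, {})[match["kind"]] = name
    return index


def list_results(results_dir="results"):
    """Index the CSV files located directly under the results folder."""
    folder = Path(results_dir)
    return index_otu_tables(path.name for path in folder.glob("*.csv"))


if __name__ == "__main__":
    # Print one line per (project, marker) with the kinds of table found.
    for (project, marker), tables in sorted(list_results().items()):
        print(project, marker, sorted(tables))
```

Run from the repository root after the example run, this prints one line per project and marker, which is a quick way to check that both kinds of table were produced for each marker.
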
## Credits

**eDNA-seq Metabarcoding OTU-clustering** was coded and written by Virginie Marques and Pierre-Edouard Guerin.

We thank the following people for their help in the development of this pipeline: Agnes Duhamet, Alice Valentini, Apolline Gorry, Bastien Mace, David Mouillot, Emilie Boulanger, Laetitia Mathon, Laura Benestan, Stephanie Manel, Tony Dejean.

## Contributions and Support

:bug: If you are sure you have found a bug, please submit a bug report. You can submit your bug reports on Gitlab [here](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_swarm/-/issues).

For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

[![Get help on Slack](https://img.shields.io/badge/slack-cefebev%23metabarcoding_otu-4A154B?logo=slack)](https://cefebev.slack.com/archives/C01NYK8B9K7)

## Citations

You can cite the **eDNA-seq Metabarcoding OTU-clustering** publication as follows:

> **Blind assessment of vertebrate taxonomic diversity across spatial scales by clustering environmental DNA metabarcoding sequences**
>
> *Virginie Marques, Pierre‐Edouard Guerin, Mathieu Rocle, Alice Valentini, Stephanie Manel, David Mouillot, Tony Dejean*
>
> Ecography. 2020 Aug 04. doi: https://doi.org/10.1111/ecog.05049.