@@ -6,7 +6,7 @@ Only_obitools pipeline using NEXTFLOW
1.[Introduction](#1-introduction)
2.[Installation](#2-installation)
1.[Requirements](#21-requirements)
2.[Downloading](#22-downloading)
2.[Initialisation](#22-initialisation)
3.[Reporting bugs](#3-reporting-bugs)
4.[Running the pipeline](#4-running-the-pipeline)
...
...
@@ -16,6 +16,9 @@ Only_obitools pipeline using NEXTFLOW
Here, we reproduce the bioinformatics pipeline used by [SPYGEN](http://www.spygen.com/) to infer species presence in the environment from raw eDNA data. This pipeline is based on [OBItools](https://git.metabarcoding.org/obitools/obitools/wikis/home), a set of Python programs designed to analyse Next Generation Sequencer outputs (Illumina) in the context of DNA metabarcoding.
This pipeline uses the workflow management system [nextflow](https://www.nextflow.io/), so you will need to install it. If you don't want to use a workflow management system, an "only bash" version is available [here](http://gitlab.mbb.univ-montp2.fr/edna/only_obitools).
# 2. Installation
## 2.1. Requirements
...
...
@@ -26,8 +29,12 @@ programs and libraries you absolutely need are:
In addition, you will need a reference database for taxonomic assignment. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
## 2.2. Downloading
## 2.2. Initialisation
* open a shell
* make a working folder and name it as you like (it is named `workdir` in this example)
- a folder which contains the reference database files. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
- a folder which contains the paired-end raw reads `.fastq.gz` files and the sample description `.dat` files. Raw reads files from the same pair must be named `*_R1.fastq.gz` and `*_R2.fastq.gz`, where the wildcard `*` is the name of the sequencing run. The alphanumeric order of the names of the sample description `.dat` files must be the same as that of the paired-end raw reads `.fastq.gz` files. The sample description file is a text file where each line describes one sample. Columns are separated by space or tab characters. The sample description file format is described [here](https://pythonhosted.org/OBITools/scripts/ngsfilter.html).
5. make sure that the programs listed in the [Requirements](#21-requirements) section are installed on your machine. Once nextflow is downloaded, replace all the "YOUR_***" parts in the following command with your own paths
6. run your command
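
The naming rules for the data folder described above can be checked before launching anything. The function below is only a sketch and is not part of the pipeline: `check_datafolder` is a hypothetical helper name, and it assumes that all `.fastq.gz` and `.dat` files sit directly in one folder. It verifies that each `*_R1.fastq.gz` has its `*_R2.fastq.gz` mate and that there are as many `.dat` files as runs, then prints how runs and `.dat` files line up in alphanumeric (glob) order so the pairing can be reviewed by eye.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the pipeline): sanity-check a data folder.
check_datafolder() {
    local folder="$1" status=0 f i
    local r1_files=() dat_files=()

    # Globs expand in alphanumeric order; skip the literal pattern if nothing matches.
    for f in "$folder"/*_R1.fastq.gz; do
        if [ -e "$f" ]; then r1_files+=("$f"); fi
    done
    for f in "$folder"/*.dat; do
        if [ -e "$f" ]; then dat_files+=("$f"); fi
    done

    if [ "${#r1_files[@]}" -eq 0 ]; then
        echo "no *_R1.fastq.gz files found in $folder" >&2
        return 1
    fi

    # Every R1 file needs an R2 file with the same run name.
    for ((i = 0; i < ${#r1_files[@]}; i++)); do
        f="${r1_files[$i]}"
        if [ ! -e "${f%_R1.fastq.gz}_R2.fastq.gz" ]; then
            echo "missing R2 mate for $(basename "$f")" >&2
            status=1
        fi
    done

    # One sample description file is expected per run.
    if [ "${#r1_files[@]}" -ne "${#dat_files[@]}" ]; then
        echo "${#r1_files[@]} run(s) but ${#dat_files[@]} .dat file(s)" >&2
        return 1
    fi

    # Show which .dat file lines up with which run in sorted order.
    for ((i = 0; i < ${#r1_files[@]}; i++)); do
        echo "$(basename "${r1_files[$i]}") <-> $(basename "${dat_files[$i]}")"
    done
    return "$status"
}

# Usage (path is a placeholder): check_datafolder 'path/to/fastq/and/dat/files'
```

The check only compares counts and sort order; it cannot tell whether a given `.dat` file really describes the run it is paired with, so the printed pairs should still be reviewed.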
Demultiplexing and filtering of the eDNA metabarcoding raw data
```
./nextflow run scripts/step1.nf --datafolder 'path/to/fastq/and/dat/files'
```
Outputs are stored in the newly created `work/` folder.
Concatenating samples by run ID
```
bash scripts/step2.sh
```
Cleaned sequences for each run are stored in the newly created `runs/` folder.
Taxonomic assignment and generation of a species/sample matrix for each run
```
./nextflow run scripts/step3.nf --db_ref /path/to/reference/database/and/prefix --db_fasta /path/to/reference/database/fasta/file
```
To build your own reference database see the details [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
Alternatively, you can run the whole pipeline in a single command by typing:
That's it! The pipeline is now running and crunching your data. Once it has finished, look for `overview.txt` or `overview_new.txt` in your output folder.