Only_obitools pipeline using NEXTFLOW
=====================================

# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
	1. [Requirements](#21-requirements)
	2. [Initialisation](#22-initialisation)
3. [Reporting bugs](#3-reporting-bugs)
4. [Running the pipeline](#4-running-the-pipeline)

-----------------

# 1. Introduction

Here, we reproduce the bioinformatics pipeline used by [SPYGEN](http://www.spygen.com/) to infer species presence in the environment from raw eDNA data. This pipeline is based on [OBItools](https://git.metabarcoding.org/obitools/obitools/wikis/home), a set of Python programs designed to analyse Next Generation Sequencer outputs (Illumina) in the context of DNA metabarcoding.

This pipeline uses the workflow management system [nextflow](https://www.nextflow.io/), so you will need to install it. If you don't want to use a workflow management system, a bash-only version is available [here](http://gitlab.mbb.univ-montp2.fr/edna/only_obitools).

# 2. Installation

## 2.1. Requirements

In order to run "only_obitools", you need a couple of programs. Most of
them should be available pre-compiled for your distribution. The
programs and libraries you absolutely need are:

- [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools)

- [Java 8 (or later)](https://www.nextflow.io/docs/latest/getstarted.html)


In addition, you will need a reference database for taxonomic assignment. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
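
Before going further, it can be useful to check that the required programs are reachable from your shell. The snippet below is a minimal sketch: `missing_tools` is a hypothetical helper (it is not part of this repository), and the list of OBItools commands in the usage example is an assumption about which ones the pipeline calls.

```shell
# missing_tools is a hypothetical helper, not part of this repository.
# It prints the name of each given command that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}
```

For example, `missing_tools java illuminapairedend ngsfilter obigrep ecotag` prints nothing when all of these commands are installed, and otherwise lists the ones you still need to install.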

## 2.2. Initialisation

* open a shell
* create a folder with a name of your choice (here, `workdir`) and move into it
```
mkdir workdir
cd workdir
```
* clone the project and move into its main folder, which is your working directory
```
git clone http://gitlab.mbb.univ-montp2.fr/edna/nextflow_obitools.git
cd nextflow_obitools
```
* define 2 external folders:
    - a folder containing the reference database files. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
    - a folder containing the paired-end raw reads (`.fastq.gz` files) and the sample description files (`.dat` files). The two raw reads files of a pair must be named `*_R1.fastq.gz` and `*_R2.fastq.gz`, where the wildcard `*` is the name of the sequencing run. The sample description `.dat` files must sort in the same alphanumeric order as the paired-end raw reads `.fastq.gz` files. A sample description file is a text file in which each line describes one sample, with columns separated by space or tab characters. Its format is described [here](https://pythonhosted.org/OBITools/scripts/ngsfilter.html).
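
The naming rules for the data folder can be checked before launching the pipeline. The function below is a minimal sketch, not part of this repository: `check_datafolder` is a hypothetical name, and the rule "one `.dat` file per sequencing run" is an assumption derived from the ordering requirement above.

```shell
# check_datafolder is a hypothetical helper, not part of this repository.
# It returns 0 if the folder looks consistent, 1 otherwise.
check_datafolder() {
    local folder="$1"
    local status=0 n_runs=0 n_dat=0
    local r1 r2 dat

    # Every *_R1.fastq.gz file must have a matching *_R2.fastq.gz file.
    for r1 in "$folder"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue      # the glob matched nothing
        n_runs=$((n_runs + 1))
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing R2 file for $r1" >&2
            status=1
        fi
    done

    # Count the sample description files.
    for dat in "$folder"/*.dat; do
        [ -e "$dat" ] || continue
        n_dat=$((n_dat + 1))
    done

    if [ "$n_runs" -eq 0 ]; then
        echo "no *_R1.fastq.gz file found in $folder" >&2
        status=1
    fi
    # Assumption: exactly one .dat file is expected per sequencing run.
    if [ "$n_runs" -ne "$n_dat" ]; then
        echo "found $n_runs run(s) but $n_dat .dat file(s)" >&2
        status=1
    fi
    return "$status"
}
```

A typical call would be `check_datafolder path/to/fastq/and/dat/files && echo "data folder looks fine"`. The check only looks at file names and does not validate the content of the `.dat` files.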

# 3. Reporting bugs

If you're sure you've found a bug, e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.mbb.univ-montp2.fr/edna/nextflow_obitools/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line you ran and the error message you got.


# 4. Running the pipeline

Quickstart

1. create a new folder for nextflow to work in
2. switch to this new folder
3. open a shell
4. type this command to download nextflow into this folder
```
curl -fsSL get.nextflow.io | bash
```
5. make sure that the programs listed in the Requirements section above are installed on your machine. Once nextflow is downloaded, replace the placeholder paths in the following commands with your own paths

6. run the commands below, one step at a time

Demultiplexing and filtering of the eDNA metabarcoding raw data
```
./nextflow run scripts/step1.nf  --datafolder 'path/to/fastq/and/dat/files'
```
Outputs are stored in the newly created `work/` folder.

Concatenating samples by run ID
```
bash scripts/step2.sh
```
Cleaned sequences for each run are stored in the newly created `runs/` folder.

Taxonomic assignment and generation of the species/sample matrix for each run
```
./nextflow run scripts/step3.nf  --db_ref /path/to/reference/database/and/prefix --db_fasta /path/to/reference/database/fasta/file
```
To build your own reference database see the details [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).

Alternatively, you can run the whole pipeline with a single command:
```
bash main.sh path/to/fastq/and/dat/files /path/to/reference/database/and/prefix /path/to/reference/database/fasta/file
```

That's it! The pipeline is now running and crunching your data. Once it has finished, look for `overview.txt` or `overview_new.txt` in your output folder.