# eDNA_intra_pipeline_comparison

**Bastien Macé, 2021**

_________________________________


# Table of contents

  * [I - Introduction](#intro)
  * [II - Installation](#install)
    + [Preliminary steps for OBITools](#preliminary-steps-for-obitools)
    + [Preliminary steps for DADA2](#preliminary-steps-for-dada2)
    + [Preliminary steps for SWARM](#preliminary-steps-for-swarm)
    + [Preliminary steps for LULU](#preliminary-steps-for-lulu)
    + [Preliminary steps for VSEARCH](#preliminary-steps-for-vsearch)
  * [III - Pre-processing steps](#step1)
  * [IV - Key processing steps](#step2)
    + [IV - 1 - OBITOOLS processing step (Pipelines A)](#step21)
    + [IV - 2 - DADA2 processing step (Pipelines B)](#step22)
    + [IV - 3 - SWARM processing step (Pipelines C)](#step23)
    + [IV - 4 - SWARM + LULU processing step (Pipelines D)](#step24)
  * [V - Post-processing steps](#step3)
    + [V - 1 - No post-processing step (Pipelines A1/B1/C1/D1)](#step31)
    + [V - 2 - Bimeric sequences removal (Pipelines A2/B2/C2/D2)](#step32)
    + [V - 3 - Chimeric sequences removal (Pipelines A3/B3/C3/D3)](#step33)
  * [VI - Analyse your results](#step4)

_________________________________

<a name="intro"></a>
## Introduction

This project aims to compare twelve bioinformatics pipelines based on five existing metabarcoding programs to make recommendations for data management in intraspecific variability studies using environmental DNA.

Data processing is necessary in metabarcoding studies to eliminate erroneous sequences generated during amplification and sequencing. This matters particularly for intraspecific studies based on eDNA samples, where erroneous sequences in the data can lead to an overestimation of intraspecific genetic variability.
Sequences therefore need to be filtered with bioinformatics pipelines. Several bioinformatics tools have been developed for metabarcoding studies. Here, we compare some of them by building twelve distinct pipelines.

To do so, we use the following programs:

- [OBITOOLS](https://git.metabarcoding.org/obitools/obitools/wikis/home): a set of commands written in Python
- [DADA2](https://benjjneb.github.io/dada2/index.html): an R package
- [SWARM](https://github.com/torognes/swarm): a command written in C++
- [LULU](https://github.com/tobiasgf/lulu): an R package
- [VSEARCH](https://github.com/torognes/vsearch): a set of commands written in C++

In our study, we analyze the results of paired-end sequencing, after extraction and amplification of filtered eDNA from aquarium seawater, to detect intraspecific haplotypic variability in *Mullus surmuletus*.

<a name="install"></a>
## Installation

### Preliminary steps for OBITools

You need to have Anaconda installed. If it is not, download it from this [link](https://www.anaconda.com/products/individual/get-started), run the installer from your shell, then close your shell and reopen it.

Verify that conda is correctly installed. It should be located here:
```
~/anaconda3/bin/conda
```
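The check above can be scripted. The following is a minimal sketch, assuming a POSIX-like shell; `find_conda` is a hypothetical helper name, not something shipped with conda. It first looks for `conda` on your `PATH`, then falls back to the default location shown above.

```shell
# find_conda is a hypothetical helper name, not part of conda itself.
find_conda() {
    # Prefer whatever "conda" the shell already resolves on PATH.
    if command -v conda >/dev/null 2>&1; then
        command -v conda
    # Otherwise fall back to the default Anaconda location shown above.
    elif [ -x "$HOME/anaconda3/bin/conda" ]; then
        printf '%s\n' "$HOME/anaconda3/bin/conda"
    else
        return 1
    fi
}

if conda_path=$(find_conda); then
    echo "conda found: $conda_path"
else
    echo "conda not found; check your Anaconda installation" >&2
fi
```

If Anaconda was installed somewhere other than `~/anaconda3`, adapt the fallback path accordingly.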

Then run the following command, which stops conda from activating the base environment automatically:
```
conda config --set auto_activate_base false
```

Then create the new `obitools` environment from the environment file, adjusting the path to match your setup. For example:
```
ENVYAML=./dada2_and_obitools/obitools_env_conda.yaml
conda env create -f $ENVYAML
```
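If you rerun the installation, `conda env create` will complain when the environment already exists. The sketch below guards against that. It assumes the environment is named `obitools`, as in this README, and `env_exists` is a hypothetical helper that simply reads the first column of `conda env list`.

```shell
# env_exists is a hypothetical helper; it succeeds if an environment with that name is listed.
env_exists() {
    # "conda env list" prints one environment per line, with the name in the first column.
    conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

ENVYAML=./dada2_and_obitools/obitools_env_conda.yaml
if ! command -v conda >/dev/null 2>&1; then
    echo "conda is not available in this shell" >&2
elif env_exists obitools; then
    echo "environment 'obitools' already exists, skipping creation"
else
    conda env create -f "$ENVYAML"
fi
```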

Now you can activate your environment before running OBITOOLS commands:
```
conda activate obitools
```

And deactivate it:
```
conda deactivate
```
### Preliminary steps for DADA2

You need a recent version of R (3.6.2 minimum). If you do not have one, download it from this [link](https://cran.r-project.org/).

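Then, open your IDE (RStudio for example), and install the package. DADA2 is distributed through Bioconductor rather than CRAN, so it is installed with `BiocManager`:
```
install.packages("BiocManager")
BiocManager::install("dada2")
```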

### Preliminary steps for SWARM

Clone the repository from the [author's GitHub](https://github.com/torognes/swarm) into your downloads folder and compile it:
```
git clone https://github.com/torognes/swarm.git
cd swarm/
make
```

### Preliminary steps for LULU

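Open your IDE for R (RStudio for example), and install the package. LULU is hosted on GitHub rather than CRAN, so it is installed with `devtools`:
```
install.packages("devtools")
devtools::install_github("tobiasgf/lulu")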
```
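Once the R packages are installed, you can confirm from the shell that R is recent enough and that `dada2` and `lulu` can be loaded. This is a minimal sketch, assuming `Rscript` is on your `PATH`; `r_has_pkg` is a hypothetical helper name.

```shell
# r_has_pkg is a hypothetical helper: it succeeds if the named R package can be loaded.
r_has_pkg() {
    Rscript -e "if (!requireNamespace('$1', quietly = TRUE)) quit(status = 1)"
}

if command -v Rscript >/dev/null 2>&1; then
    # The DADA2 section asks for R 3.6.2 or newer.
    Rscript -e 'stopifnot(getRversion() >= "3.6.2")' || echo "R is older than 3.6.2" >&2
    for pkg in dada2 lulu; do
        if r_has_pkg "$pkg"; then
            echo "ok:      $pkg"
        else
            echo "missing: $pkg"
        fi
    done
else
    echo "Rscript not found on PATH" >&2
fi
```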

### Preliminary steps for VSEARCH

Clone the repository from the [author's GitHub](https://github.com/torognes/vsearch) into your downloads folder, then compile and install it:
```
git clone https://github.com/torognes/vsearch.git
cd vsearch
./autogen.sh
./configure
make
sudo make install
```
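After these steps, a quick way to see which command-line tools are reachable is to test each one against your `PATH`. The sketch below uses `check_tools`, a hypothetical helper name. Note that the SWARM instructions above stop at `make`, so the compiled `swarm` binary is probably not on your `PATH` yet; we assume here that you add its location yourself if it is reported as missing.

```shell
# check_tools is a hypothetical helper: it reports which commands are reachable on PATH
# and returns a non-zero status if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools conda Rscript swarm vsearch ||
    echo "install the missing tools, or add them to PATH, before continuing" >&2
```

The OBITOOLS commands are not checked here because they are only available once the `obitools` environment is activated.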