Only_obitools pipeline
======================

# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
3. [Reporting bugs](#3-reporting-bugs)
4. [Running the pipeline](#4-running-the-pipeline)
5. [Results](#5-results)

-----------------

# 1. Introduction

Here, we reproduce the bioinformatics pipeline used by [SPYGEN](http://www.spygen.com/) to generate species environmental presence from raw eDNA data. This pipeline is based on [OBItools](https://git.metabarcoding.org/obitools/obitools/wikis/home), a set of Python programs designed to analyse Next Generation Sequencer outputs (Illumina) in the context of DNA metabarcoding.


# 2. Installation

In order to run "only_obitools", you need a couple of programs. Most of
them should be available pre-compiled for your distribution. The
programs and libraries you absolutely need are:

- [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools)

- [GNU Parallel](https://www.gnu.org/software/parallel/)

In addition, you will need a reference database for taxonomic assignment. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
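
Before running anything, it can help to confirm that the required programs are on your `PATH`. The following is a minimal sketch, not part of the pipeline: `missing_tools` is a hypothetical helper, and the OBITools command names in the usage comment are an assumption about which commands `pipeline.sh` calls, so adjust the list to your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of only_obitools): print every command
# given as argument that cannot be found on PATH, one per line.
# Returns non-zero if at least one command is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call. The list below is an assumption: GNU Parallel plus a few
# OBITools commands that an OBITools-based pipeline is likely to use.
# missing_tools parallel illuminapairedend ngsfilter obiuniq ecotag
```

If the function prints nothing and returns 0, all listed commands were found.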


# 3. Reporting bugs

If you're sure you've found a bug, e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/edna/only_obitools/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line you ran and the error output.

# 4. Running the pipeline

* open a shell
* create a folder with a name of your choice (here, `workdir`) and move into it
```
mkdir workdir
cd workdir
```
* clone the project and move into its root folder, which is your working directory
```
git clone http://gitlab.mbb.univ-montp2.fr/edna/only_obitools.git
cd only_obitools
```
* prepare 2 folders:
    - a folder containing the reference database files. You can build a reference database by following the instructions [here](http://gitlab.mbb.univ-montp2.fr/edna/reference_database).
    - a folder containing the paired-end raw read `.fastq.gz` files and the sample description `.dat` files. Raw read files from the same pair must be named `*_R1.fastq.gz` and `*_R2.fastq.gz`, where the wildcard `*` is the name of the sequencing run. The alphanumeric order of the sample description `.dat` file names must be the same as that of the paired-end raw read `.fastq.gz` file names. A sample description file is a text file in which each line describes one sample, with columns separated by space or tab characters. Its format is described [here](https://pythonhosted.org/OBITools/scripts/ngsfilter.html).
* run the pipeline :
```
bash pipeline.sh /path/to/data /path/to/reference_database
```
The order of the arguments matters:
1. absolute path to the folder which contains the paired-end raw read files and the sample description files
2. absolute path to the folder which contains the reference database files
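
The naming rules for the data folder can be checked before launching the pipeline. The sketch below is illustrative only: `is_absolute_path` and `check_data_dir` are hypothetical helpers, not part of `pipeline.sh`, and the rule "one `.dat` file per sequencing run" is an inference from the ordering requirement described above, not something the pipeline is known to enforce.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of only_obitools).

# Succeeds if the given path starts with "/", i.e. is absolute.
is_absolute_path() {
    case $1 in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Check a data folder: every *_R1.fastq.gz must have a matching
# *_R2.fastq.gz, and the number of .dat files must equal the number
# of sequencing runs (assumption: one .dat file per run).
check_data_dir() {
    local data_dir=$1 rc=0 n_runs=0 n_dat=0 r1 run f
    for r1 in "$data_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        n_runs=$((n_runs + 1))
        run=${r1%_R1.fastq.gz}              # strip the suffix to get the run name
        if [ ! -e "${run}_R2.fastq.gz" ]; then
            echo "missing R2 file for $(basename "$r1")" >&2
            rc=1
        fi
    done
    for f in "$data_dir"/*.dat; do
        [ -e "$f" ] || continue
        n_dat=$((n_dat + 1))
    done
    if [ "$n_runs" -eq 0 ]; then
        echo "no *_R1.fastq.gz files found in $data_dir" >&2
        rc=1
    fi
    if [ "$n_runs" -ne "$n_dat" ]; then
        echo "found $n_runs run(s) but $n_dat .dat file(s)" >&2
        rc=1
    fi
    return "$rc"
}
```

A possible use is `is_absolute_path /path/to/data && check_data_dir /path/to/data` before calling `bash pipeline.sh`; any problem is reported on standard error and the check returns a non-zero status.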


# 5. Results

* [main](main) : contains intermediate files

* [final](main) : contains the species/sample matrices, one for each sequencing run