Only_obitools pipeline using SNAKEMAKE
======================================

# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
3. [Reporting bugs](#3-reporting-bugs)
4. [Running the pipeline](#4-running-the-pipeline)

-----------------

# 1. Introduction

Here, we reproduce the bioinformatics pipeline used by [SPYGEN](http://www.spygen.com/) to infer species presence from raw environmental DNA (eDNA) sequencing data. This pipeline is based on [OBItools](https://git.metabarcoding.org/obitools/obitools/wikis/home), a set of Python programs designed to analyse next-generation sequencing output (Illumina) in the context of DNA metabarcoding.


# 2. Installation

To run "snakemake_only_obitools" you need a couple of programs. Most of them should be available pre-compiled for your distribution. The programs and libraries you absolutely need are listed below; a possible installation sketch follows the list.

- [python3](https://www.python.org/download/releases/3.0/)

- [OBItools](https://pythonhosted.org/OBITools/welcome.html#installing-the-obitools)

- [snakemake](https://bitbucket.org/snakemake/snakemake)
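
One possible way to install the two Python dependencies is sketched below. This is only a suggestion, not the project's official procedure: `obitools` and `snakemake` are both distributed through Bioconda and PyPI, but check each project's own documentation (linked above) for the Python version it requires.

```
# Installation sketch only (assumes conda with the bioconda/conda-forge channels, or pip, is available)
conda install -c bioconda -c conda-forge obitools snakemake
# ...or, through pip:
pip install OBITools snakemake
```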


# 3. Reporting bugs

If you're sure you've found a bug, e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/edna/only_obitools/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible: include at least the exact command line you ran and the full error message.

# 4. Running the pipeline

Quickstart

1. open a shell
2. make a working folder; name it whatever you like, here it is called `workdir`

```
mkdir workdir
cd workdir
```

3. clone the project and move into the main folder; this is your working directory

```
git clone http://gitlab.mbb.univ-montp2.fr/edna/snakemake_only_obitools.git
cd snakemake_only_obitools
```

4. define two folders:
    - a folder which contains the reference database files. You can build a reference database by following the instructions [here](projet_builtdatabase).
    - a folder which contains the paired-end raw reads `.fastq.gz` files and the sample description `.dat` files. Raw read files from the same pair must be named `*_R1.fastq.gz` and `*_R2.fastq.gz`, where the wildcard `*` is the name of the sequencing run. The alphanumeric order of the sample description `.dat` file names must match that of the paired-end raw reads `.fastq.gz` files. A sample description file is a text file in which each line describes one sample, with columns separated by space or tab characters; its format is described [here](https://pythonhosted.org/OBITools/scripts/ngsfilter.html). An example layout of this folder is sketched below.
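
As an illustration only, with a single sequencing run named `run01` (a hypothetical name), the reads/description folder could look like this; the reference database folder simply holds the files produced when building the database:

```
fastq_dat_files/                # hypothetical name, first argument of main.sh
    run01_R1.fastq.gz           # forward reads of run "run01"
    run01_R2.fastq.gz           # reverse reads of run "run01"
    run01.dat                   # sample description file for run "run01"
reference_database_folder/      # hypothetical name, second argument of main.sh
    ...                         # reference database files built beforehand
```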
5. run the pipeline:

```
bash main.sh /path/to/fastq_dat_files /path/to/reference_database_folder 16
```
The order of the arguments is important: 1) the path to the folder which contains the paired-end raw reads files and the sample description files, 2) the path to the folder which contains the reference database files, 3) the number of available cores (here, for instance, 16 cores).
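
For example, with hypothetical folder names and 8 available cores, the call could look like this:

```
# hypothetical paths, to be replaced by your own folders
bash main.sh ~/edna/fastq_dat_files ~/edna/reference_database_folder 8
```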

6. run the pipeline step by step:
open the file `main.sh` to see the details of each step; the individual workflow steps can be launched one by one as sketched below
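
If you prefer to launch a step manually, the usual Snakemake pattern is shown here. The Snakefile name `some_step.smk` and the number of cores are placeholders; the real file names, order and configuration are those used inside `main.sh`.

```
# dry run: list the jobs that would be executed, without running them (placeholder Snakefile name)
snakemake --snakefile some_step.smk --cores 16 -n
# actual run of that step
snakemake --snakefile some_step.smk --cores 16
```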

That's it! The pipeline is now running and crunching your data. Look for the log folder and the output folder once the pipeline has finished.