# STACKS2 using SNAKEMAKE Workflow

RADseq workflow using [STACKS2](http://creskolab.uoregon.edu/stacks/), designed to process RADseq data from the [RESERVEBENEFIT](https://www.biodiversa.org/1023) project.



# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
   1. [Prerequisite](#21-prerequisite)
   2. [Data Files](#22-data-files)
   3. [Set up](#23-set-up)
3. [Reporting bugs](#3-reporting-bugs)
4. [Running the pipeline](#4-running-the-pipeline)
   1. [Filter raw data](#41-filter-raw-data)
   2. [Georeferenced sequences alignments by species](#42-georeferenced-sequences-alignments-by-species)
   3. [Species sequence pairwise comparison](#43-species-sequence-pairwise-comparison)
   4. [Genetic Diversity calculation](#44-genetic-diversity-calculation)
   5. [Statistical analysis](#45-statistical-analysis)



# 1. Introduction

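This workflow uses [Snakemake](https://snakemake.readthedocs.io/) to process RADseq reads with the STACKS2 programs `process_radtags`, `clone_filter`, `gstacks` and `populations`, relying on BWA and SAMtools to align the reads.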


# 2. Installation


## 2.1 Prerequisite
You must install the following software and packages:

- [SNAKEMAKE 5.3.0](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html)
Check that it is correctly installed and print its version by typing:
```
snakemake --version
## should give you the output
5.3.0
```

- [STACKS 2.0b](http://catchenlab.life.illinois.edu/stacks/)
Check that the programs are correctly installed and print their version by typing:
```
process_radtags --version
clone_filter --version
gstacks --version
populations --version
## should give you the output
2.0b
```

- [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial)
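Download `bwa` from http://sourceforge.net/projects/bio-bwa/files/ and build it. BWA ships with a plain Makefile (no `configure` script and no `install` target), so copy the compiled binary to a directory on your `PATH` yourself:
```
tar -xvf bwa-x.x.x.tar.bz2
cd bwa-x.x.x
make
## copy the compiled binary to a directory on your PATH
cp bwa /where/to/install/bin/
```
Check that it is correctly installed and print its version by typing: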
```
bwa
## should give you the output
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
...
```

- [SAMTOOLS 1.9](http://www.htslib.org/)
Download `htslib` and `samtools` from http://www.htslib.org/download/ and build each package from source:
```
cd htslib-1.x
./configure --prefix=/where/to/install
make
make install
cd ..
## and similarly for samtools :
cd samtools-1.x
./configure --prefix=/where/to/install
make
make install
```
Check that the programs are correctly installed and print their version by typing:
```
samtools --version
## should give you the output
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.
```
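
To check all prerequisites in one go, the POSIX shell sketch below looks up each program on your `PATH` with `command -v`. The `check_tools` function is a hypothetical helper (it is not part of this repository), and it only confirms that the programs can be found, not that their versions match the ones listed above.

```shell
# check_tools (hypothetical helper): report whether each program is on the PATH.
# Returns the number of missing programs, so 0 means everything was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "MISSING: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Programs required by the pipeline (see the list above)
check_tools snakemake process_radtags clone_filter gstacks populations bwa samtools \
  || echo "Install the missing programs (or add them to your PATH) before running the pipeline."
```

Returning the count of missing programs (rather than stopping at the first one) lets you see every gap in a single run.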

## 2.2 Data Files
The included data files (in the `01-info_files` folder) are:

* [config.yaml](01-info_files/config.yaml)
* [barcodes.txt](01-info_files/barcodes.txt)
* [infos.csv](01-info_files)
* [populations_map.txt](01-info_files)
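
STACKS expects a population map to be a tab-separated text file with one sample per line: the sample name in the first column and its population in the second. Assuming `populations_map.txt` follows that standard STACKS format, the sketch below checks it before you launch the pipeline. The `check_popmap` function and the sample and population names in the toy file are hypothetical, for illustration only.

```shell
# check_popmap FILE (hypothetical helper): flag lines that lack a
# <sample><TAB><population> pair. Returns non-zero if any line is malformed.
check_popmap() {
  awk -F'\t' '
    NF == 0 { next }                              # skip empty lines
    NF < 2 || $1 == "" || $2 == "" {
      printf "%s, line %d: expected <sample><TAB><population>\n", FILENAME, NR
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Toy population map with made-up names, for illustration only
printf 'sample_01\tpop_A\nsample_02\tpop_A\nsample_03\tpop_B\n' > popmap_example.txt
check_popmap popmap_example.txt && echo "popmap_example.txt looks well-formed"
```

Assuming the real file sits in `01-info_files` as the link above suggests, run `check_popmap 01-info_files/populations_map.txt`. A common mistake it catches is columns separated by spaces instead of a tab.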

## 2.3 Set Up

Clone the project and switch to its main folder, which is your working directory:
```
git clone worldmap_fish_genetic_diversity.git
cd snakemake_stacks2
```

# 3. Reporting bugs

If you're sure you've found a bug (e.g. one of the programs crashes
with an obscure error message, or a resulting file is missing part
of the original data), then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/reservebenefit/worldmap_fish_genetic_diversity/issues)
as my bug database, so please submit your bug reports there. Be as
verbose as possible: include the command line you ran, the error message, etc.


# 4. Running the pipeline

Quickstart

* Open a shell.
* Create a folder (name it as you like; here it is called `workdir`) and move into it:

```
mkdir workdir
cd workdir
```
* Clone the project and switch to its main folder, which is your working directory:

```
git clone 
cd snakemake_stacks2
```
WORK IN PROGRESS!

That's it! The pipeline is running and crunching your data. Once it has finished, look in the log folder and the output folder.