Analysis scripts for "Global patterns of fish genetic diversity increase with “current” temperature"
================================================
2017
This folder contains all the scripts needed to reproduce the analyses.

# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
  1. [Prerequisite](#21-prerequisite)
  2. [Data Files](#22-data-files)
  3. [Set up](#23-set-up)
3. [Scripts Code Source](#3-scripts-code-source)
4. [Reporting bugs](#4-reporting-bugs)
5. [Running the pipeline](#5-running-the-pipeline)
  1. [Filter raw data](#51-filter-raw-data)
  2. [Georeferenced sequences alignments by species](#52-georeferenced-sequences-alignments-by-species)
  3. [Species sequence pairwise comparison](#53-species-sequence-pairwise-comparison)
  4. [Genetic Diversity calculation](#54-genetic-diversity-calculation)
  5. [Statistical analysis](#55-statistical-analysis)

# 1. Introduction

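This repository contains the scripts used for the study "Global patterns of fish genetic diversity increase with “current” temperature". The pipeline filters georeferenced CO1 sequences of actinopterygii downloaded from BOLD, aligns them by species, computes genetic diversity per cell of an equal-area grid, and runs the statistical analysis.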

# 2. Installation


## 2.1 Prerequisite
You must install the following software and packages:

- [JULIA Version 0.5.2](https://julialang.org/)
- [R Version 3.2.3](https://cran.r-project.org/)
    - [R-package]ggplot2 `install.packages("ggplot2")`
    - [R-package]rgeos `install.packages("rgeos")`
    - [R-package]rgdal: download the source archive from https://cran.r-project.org/web/packages/rgdal/index.html, then run `install.packages("~/Downloads/rgdal_1.1-10.tar.gz", repos = NULL, type = "source")`
    - [R-package]nodiv `install.packages("nodiv")`
    - [R-package]raster `install.packages("raster")`
    - [R-package]lme4 `install.packages("lme4")`
    - [R-package]sp `install.packages("sp")`
    - [R-package]sjPlot `install.packages("sjPlot")`
    - [R-package]FactoMineR `install.packages("FactoMineR")`
    - [R-package]factoextra `install.packages("factoextra")`
    - [R-package]spdep `install.packages("spdep")`
    - [R-package]countrycode `install.packages("countrycode")`
- [Python Version 2.7.12](https://www.python.org/)
- [MUSCLE Version 3.8.31](https://www.drive5.com/muscle/)

## 2.2 Data Files
The included data files are:

* `02-raw_data/seqbold_data.txt`                            : Georeferenced sequences of "actinopterygii" individuals, downloaded from [http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii]
* `01-infos/grid_equalarea200km`                            : Shapefile of the world map in equal-area projection (epsg:4326) with nested equal-area grids (cell size of 200 km)
* `01-infos/ne_110m_land`                                   : Shapefile of the world coastline from [http://www.naturalearthdata.com]
* `01-infos/equalarea_id_coords.tsv`                        : ID and left/right/top/bottom coordinates of each equal-area cell in the shapefile grid_equalarea200km.
* `01-infos/marine_actinopterygii_species.txt`              : List of "actinopterygii" saltwater species according to [http://www.fishbase.org/]
* `01-infos/freshwater_behrman_worldcoast_data_object.RData`: R spatial object (from the `sp` package): an equal-area grid in Behrmann projection with the world coastline shape and the presence/absence of freshwater fish species from [https://www.iucn.org/]
* `01-infos/marine_behrman_worldcoast_data_object.RData`    : Same kind of R spatial object, for marine fish species.
* Some files are still missing from this list...

## 2.3 Set Up
Clone the project and switch to the main folder, which is your working directory:

```
git clone http://gitlab.mbb.univ-montp2.fr/reservebenefit/worldmap_fish_genetic_diversity.git
cd worldmap_fish_genetic_diversity
```

Then download the georeferenced sequences of actinopterygii individuals (the "combined TSV file") from [http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii].
Store the file in the folder `02-raw_data` and rename it `seqbold_data.txt`.

You're ready to run the analysis. Now follow the instructions at [Running the pipeline](#5-running-the-pipeline)


# 3. Scripts Code Source
## 3.1 [00-scripts/step1](00-scripts/step1) : filter raw data

- BASH scripts
    * [filter_raw_data.sh](00-scripts/step1/filter_raw_data.sh)       : Keeps only the CO1 sequences with latitude/longitude information.
    * [get_geonames_coordinates.sh](00-scripts/step1/get_geonames_coordinates.sh)   : Uses [http://www.geonames.org/] to find the missing coordinates of individual sequences from the textual description of their location.

- PYTHON scripts
    * [lat_long_DMS_DD_converter.py](00-scripts/step1/lat_long_DMS_DD_converter.py)     : Converts the given coordinates from DMS (degrees, minutes, seconds) format to DD (decimal degrees) format.
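
As an illustration of the DMS to DD conversion mentioned above, here is a minimal Python sketch. It is not the code of `lat_long_DMS_DD_converter.py`: the function name `dms_to_dd` and its arguments are assumptions made for this example.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    Hypothetical helper, written only to illustrate the conversion.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    # One degree is 60 minutes and 3600 seconds.
    dd = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South and West are negative in the decimal-degree convention.
    return -dd if hemisphere in ("S", "W") else dd


latitude = dms_to_dd(43, 36, 39, "N")
longitude = dms_to_dd(3, 30, 0, "W")
print(latitude, longitude)
```

Southern and western hemispheres are mapped to negative values, which is the usual decimal-degree convention.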

## 3.2 [00-scripts/step2](00-scripts/step2) : georeferenced sequences alignments by species

- BASH scripts
    * [seq_alnt_filtered_data.sh](00-scripts/step2/seq_alnt_filtered_data.sh)     : aligns sequences from the same species with MUSCLE and creates a coordinates file for each sequence.
    * [cluster_freshwater_vs_marine.sh](00-scripts/step2/cluster_freshwater_vs_marine.sh) : according to a list of marine species, moves the fasta and coords files into the marine and freshwater directories.

- PYTHON scripts
    * [fasta_coords_files_species_generator.py](00-scripts/step2/fasta_coords_files_species_generator.py) : extracts sequences and associated coordinates from the filtered data.

- R scripts
    * [equalareacoords.R](00-scripts/step2/equalareacoords.R)    : assigns to each sequence, from its coordinates, the ID of the corresponding cell in the equal-area world map shapefile.
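
The cell assignment can be pictured as a bounding-box lookup. The sketch below is only an illustration in Python (the pipeline itself relies on the R script and the shapefile). It assumes a tab-separated table with hypothetical column names `id`, `left`, `right`, `top` and `bottom`, similar in spirit to `01-infos/equalarea_id_coords.tsv`, and it assumes the point coordinates are expressed in the same coordinate system as the cell bounds.

```python
import csv


def load_cells(lines):
    """Parse tab-separated lines into (id, left, right, top, bottom) tuples.

    The column names are assumptions made for this example.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        (row["id"], float(row["left"]), float(row["right"]),
         float(row["top"]), float(row["bottom"]))
        for row in reader
    ]


def find_cell(cells, x, y):
    """Return the ID of the first cell whose bounding box contains (x, y), else None."""
    for cell_id, left, right, top, bottom in cells:
        # Half-open intervals so that a point on a shared edge gets a single cell.
        if left <= x < right and bottom <= y < top:
            return cell_id
    return None


# Toy grid of two adjacent cells (made-up values).
toy_table = [
    "id\tleft\tright\ttop\tbottom",
    "1\t0\t10\t10\t0",
    "2\t10\t20\t10\t0",
]
cells = load_cells(toy_table)
print(find_cell(cells, 12.5, 3.0))  # prints 2
```

A linear scan is enough for a toy grid. For a full world grid, a spatial index or a direct computation of the row and column from the cell size would likely be faster.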

## 3.3 [00-scripts/step3](00-scripts/step3) : species sequence pairwise comparison

- JULIA scripts
    * [Lib_Compare_Pairwise.jl](00-scripts/step3/Lib_Compare_Pairwise.jl)      : functions to compute the Genetic Diversity value from a set of sequences.
    * [Lib_Create_Master_Matrices.jl](00-scripts/step3/Lib_Create_Master_Matrices.jl) : functions to create master data matrices that are used to compute genetic diversity.
    * [master_matrices.jl](00-scripts/step3/master_matrices.jl)            : generates master data matrices from species sequence alignments.
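
To make the idea of a genetic diversity value computed from a set of sequences more concrete, the sketch below computes one common measure: the mean proportion of differing sites over all pairs of aligned sequences. This is an assumption about the kind of metric involved, not a transcription of `Lib_Compare_Pairwise.jl`. The function names and the choice to skip gapped sites are hypothetical.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    # With no comparable site, report 0.0 rather than dividing by zero.
    return differences / float(compared) if compared else 0.0


def genetic_diversity(sequences):
    """Mean pairwise difference over all pairs of sequences (at least two needed)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Toy alignment of three sequences (made-up data).
alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(genetic_diversity(alignment))
```

In this toy alignment the three pairs differ at 1, 1 and 2 sites out of 8, so the result is the average of 0.125, 0.125 and 0.25.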

## 3.4 [00-scripts/step4](00-scripts/step4) : genetic diversity calculation

- BASH scripts
    * [gdval_by_site.sh](00-scripts/step4/gdval_by_site.sh) : generates CSV files with 2 columns: cell ID and mean genetic diversity per species in the cell.

- JULIA scripts
    * [equalarea_numbers.jl](00-scripts/step4/equalarea_numbers.jl)         : assigns a mean genetic diversity to each equal-area grid cell. Genetic diversity is calculated from the master data matrices.
    * [metrics_by_area_and_species.jl](00-scripts/step4/metrics_by_area_and_species.jl): generates files for statistical analysis at next step : mean genetic diversity per cell, genetic diversity per species per cell, number of individuals per species, number of species per cell, cell coordinates, cell ID...
    * [Lib_GD_summary_functions.jl](00-scripts/step4/Lib_GD_summary_functions.jl)   : functions to calculate genetic diversity at species level and cell level
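
The per-cell summary can be sketched as follows, assuming that the mean genetic diversity of a cell is the plain average of the per-species values found in that cell. The record layout `(cell_id, species, gd)` and the function name are hypothetical and only serve this Python illustration; the Julia scripts remain the reference.

```python
from collections import defaultdict


def mean_diversity_per_cell(records):
    """Average per-species genetic diversity within each cell.

    records: iterable of (cell_id, species, gd) tuples (hypothetical layout).
    Returns a dict mapping cell_id to the mean gd over the species in that cell.
    """
    by_cell = defaultdict(list)
    for cell_id, _species, gd in records:
        by_cell[cell_id].append(gd)
    return {cell: sum(values) / float(len(values))
            for cell, values in by_cell.items()}


# Made-up values: two species in cell "A", one species in cell "B".
records = [
    ("A", "species 1", 0.25),
    ("A", "species 2", 0.75),
    ("B", "species 1", 0.125),
]
means = mean_diversity_per_cell(records)
print(means["A"], means["B"])  # prints 0.5 0.125
```

Each species counts once per cell in this sketch, whatever its number of sequences. A weighting by sample size would be a different design choice.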


## 3.5 [00-scripts/step5](00-scripts/step5) : statistical analysis
- R scripts
    * `descripteurs.R`
    * `figures.R` (work in progress)


# 4. Reporting bugs

If you're sure you've found a bug, e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/reservebenefit/worldmap_fish_genetic_diversity/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line you ran.

# 5. Running the pipeline

## 5.1 Filter raw data
```
bash ./00-scripts/step1/filter_raw_data.sh
```

## 5.2 Georeferenced sequences alignments by species
```
bash ./00-scripts/step2/seq_alnt_filtered_data.sh
mkdir ./06-species_alnt_cluster/total
mkdir ./06-species_alnt_cluster/freshwater
mkdir ./06-species_alnt_cluster/marine
bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh
Rscript ./00-scripts/step2/equalareacoords.R
```
## 5.3 Species sequence pairwise comparison
```
julia ./00-scripts/step3/master_matrices.jl
```

## 5.4 Genetic Diversity calculation
```
julia ./00-scripts/step4/equalarea_numbers.jl
bash ./00-scripts/step4/gdval_by_site.sh
julia ./00-scripts/step4/metrics_by_area_and_species.jl
```
## 5.5 Statistical analysis
```
Rscript ./00-scripts/step5/descripteurs.R
Rscript ./00-scripts/step5/figures.R
```