Analysis scripts for "Global patterns of fish genetic diversity increase with “current” temperature"
================================================
2017
This folder contains all the scripts needed to reproduce the analyses.

# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
  1. [Prerequisite](#21-prerequisite)
  2. [Data Files](#22-data-files)
  3. [Set up](#23-set-up)
3. [Scripts Code Source](#3-scripts-code-source)
4. [Reporting bugs](#4-reporting-bugs)
5. [Running the pipeline](#5-running-the-pipeline)
  1. [Filter raw data](#51-filter-raw-data)
  2. [Georeferenced sequences alignments by species](#52-georeferenced-sequences-alignments-by-species)
  3. [Species sequence pairwise comparison](#53-species-sequence-pairwise-comparison)
  4. [Genetic Diversity calculation](#54-genetic-diversity-calculation)
  5. [Statistical analysis](#55-statistical-analysis)

# 1. Introduction

blablabla

# 2. Installation


## 2.1 Prerequisite
You must install the following software and packages:

- [JULIA Version 0.5.2](https://julialang.org/)
- [R Version 3.2.3](https://cran.r-project.org/)
    - R package ggplot2: `install.packages("ggplot2")`
    - R package rgeos: `install.packages("rgeos")`
    - R package rgdal: download the source archive from [https://cran.r-project.org/web/packages/rgdal/index.html], then run `install.packages("~/Downloads/rgdal_1.1-10.tar.gz", repos = NULL, type = "source")`
    - R package nodiv: `install.packages("nodiv")`
    - R package raster: `install.packages("raster")`
    - R package lme4: `install.packages("lme4")`
    - R package sp: `install.packages("sp")`
    - R package sjPlot: `install.packages("sjPlot")`
    - R package FactoMineR: `install.packages("FactoMineR")`
    - R package factoextra: `install.packages("factoextra")`
    - R package spdep: `install.packages("spdep")`
    - R package countrycode: `install.packages("countrycode")`
- [Python Version 2.7.12](https://www.python.org/)
- [MUSCLE Version 3.8.31](https://www.drive5.com/muscle/)

## 2.2 Data Files
The included data files are:

* `02-raw_data/seqbold_data.txt`                            : Georeferenced sequences of individuals from the supergroup "actinopterygii", downloaded from [http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii]
* `01-infos/grid_equalarea200km`                            : Shapefile of the world map in equal-area projection (epsg:4326) with nested equal-area grids (cell size of 200 km)
* `01-infos/ne_110m_land`                                   : Shapefile of the world coastline from [http://www.naturalearthdata.com]
* `01-infos/equalarea_id_coords.tsv`                        : ID and left/right/top/bottom coordinates of each equal-area cell in the shapefile grid_equalarea200km.
* `01-infos/marine_actinopterygii_species.txt`              : List of "actinopterygii" saltwater species according to [http://www.fishbase.org/]
* `01-infos/freshwater_behrman_worldcoast_data_object.RData`: R spatial object from the "sp" package: an equal-area grid in Behrmann projection with the world coastline shape and presence/absence of freshwater fish species from [https://www.iucn.org/]
* `01-infos/marine_behrman_worldcoast_data_object.RData`    : Same as above, for marine fish species.
* Some files are missing from this list...

## 2.3 Set Up
Clone the project and switch to its main folder, which is your working directory:

```
git clone http://gitlab.mbb.univ-montp2.fr/reservebenefit/worldmap_fish_genetic_diversity.git
cd worldmap_fish_genetic_diversity
```

Then download the georeferenced sequences of actinopterygii individuals as a "combined TSV file" from [http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii].
Store the file in the folder `02-raw_data` and rename it `seqbold_data.txt`.

You're ready to run the analysis. Now follow the instructions at [Running the pipeline](#5-running-the-pipeline)


# 3. Scripts Code Source
## 3.1 [00-scripts/step1](00-scripts/step1) : filter raw data

- BASH scripts
    * [filter_raw_data.sh](00-scripts/step1/filter_raw_data.sh)       : Keeps only the CO1 sequences that have lat/lon information.
    * [get_geonames_coordinates.sh](00-scripts/step1/get_geonames_coordinates.sh)   : Uses [http://www.geonames.org/] to recover the missing coordinates of individual sequences from their textual location information.
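
As a rough illustration of the filtering step, the Python sketch below keeps only the records whose marker is CO1 and whose latitude and longitude are both present. It is not the code of `filter_raw_data.sh`: the column names (`markercode`, `lat`, `lon`) and the marker label `COI-5P` are assumptions about the layout of the BOLD TSV export and may need adjusting.

```python
import csv
import io

# Assumed column names and marker label; adjust them to the actual export.
MARKER_COLUMN = "markercode"
LAT_COLUMN = "lat"
LON_COLUMN = "lon"
CO1_MARKER = "COI-5P"


def keep_record(row):
    """Return True for a CO1 record that carries both latitude and longitude."""
    if (row.get(MARKER_COLUMN) or "").strip() != CO1_MARKER:
        return False
    lat = (row.get(LAT_COLUMN) or "").strip()
    lon = (row.get(LON_COLUMN) or "").strip()
    return lat != "" and lon != ""


def filter_records(handle):
    """Yield the rows of a tab-separated file that pass keep_record."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if keep_record(row):
            yield row


if __name__ == "__main__":
    # Tiny made-up sample standing in for 02-raw_data/seqbold_data.txt.
    sample = io.StringIO(
        "processid\tmarkercode\tlat\tlon\n"
        "A1\tCOI-5P\t43.6\t3.9\n"
        "A2\tCOI-5P\t\t\n"
        "A3\t16S\t10.0\t20.0\n"
    )
    for row in filter_records(sample):
        print(row["processid"])
```

Records without coordinates are simply dropped here; in the pipeline, `get_geonames_coordinates.sh` is the step that tries to recover them.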

- PYTHON scripts
    * [lat_long_DMS_DD_converter.py](00-scripts/step1/lat_long_DMS_DD_converter.py)     : Converts the given coordinates from DMS (degrees, minutes, seconds) format to DD (decimal degrees) format.
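
The DMS to DD conversion itself is standard arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The sketch below shows that formula only; the function name and its arguments are illustrative and are not taken from `lat_long_DMS_DD_converter.py`.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    letter = hemisphere.upper()
    if letter not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative in DD notation.
    if letter in ("S", "W"):
        value = -value
    return value


if __name__ == "__main__":
    print(dms_to_dd(43, 36, 0, "N"))
    print(dms_to_dd(3, 52, 30, "W"))
```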

## 3.2 [00-scripts/step2](00-scripts/step2) : georeferenced sequences alignments by species

- BASH scripts
    * [seq_alnt_filtered_data.sh](00-scripts/step2/seq_alnt_filtered_data.sh)     : aligns sequences from the same species with MUSCLE and creates a coordinates file for each sequence.
    * [cluster_freshwater_vs_marine.sh](00-scripts/step2/cluster_freshwater_vs_marine.sh) : according to a list of marine species, moves the fasta and coords files into the marine and freshwater directories.
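
The marine/freshwater split can be pictured as a lookup against the marine species list. The Python sketch below only classifies file names and moves nothing. It rests on two assumptions that are not confirmed by the scripts: files are named `<Genus>_<species>.<extension>`, and any species absent from the marine list counts as freshwater.

```python
import os


def classify_files(filenames, marine_species):
    """Split file names into (marine, freshwater) lists by species name."""
    marine, freshwater = [], []
    for name in filenames:
        # Assumed naming scheme: "<Genus>_<species>.<extension>".
        stem = os.path.splitext(os.path.basename(name))[0]
        species = stem.replace("_", " ")
        if species in marine_species:
            marine.append(name)
        else:
            # Assumption: not listed as marine means freshwater.
            freshwater.append(name)
    return marine, freshwater


if __name__ == "__main__":
    # Made-up file names and species list, for illustration only.
    files = ["Thunnus_thynnus.fasta", "Thunnus_thynnus.coords", "Salmo_trutta.fasta"]
    marine, freshwater = classify_files(files, {"Thunnus thynnus"})
    print(marine)
    print(freshwater)
```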

- PYTHON scripts
    * [fasta_coords_files_species_generator.py](00-scripts/step2/fasta_coords_files_species_generator.py) : extracts sequences and associated coordinates from the filtered data.

- R scripts
    * [equalareacoords.R](00-scripts/step2/equalareacoords.R)    : assigns to each sequence, from its coordinates, the ID of the matching cell in the equal-area world map shapefile.
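
One simple way to picture the cell assignment is a bounding-box lookup against the left/right/top/bottom coordinates stored in `01-infos/equalarea_id_coords.tsv`. The Python sketch below is not a port of `equalareacoords.R`, which may well rely on a spatial overlay instead. It assumes axis-aligned rectangular cells expressed in the same coordinate system as the point, and the dictionary keys are hypothetical.

```python
def find_cell_id(x, y, cells):
    """Return the ID of the first cell whose bounding box contains (x, y), or None."""
    for cell in cells:
        # Half-open intervals: a point on a shared edge belongs to a single cell.
        inside_x = cell["left"] <= x < cell["right"]
        inside_y = cell["bottom"] <= y < cell["top"]
        if inside_x and inside_y:
            return cell["id"]
    return None


if __name__ == "__main__":
    # Two made-up adjacent cells, for illustration only.
    cells = [
        {"id": 1, "left": 0, "right": 10, "bottom": 0, "top": 10},
        {"id": 2, "left": 10, "right": 20, "bottom": 0, "top": 10},
    ]
    print(find_cell_id(3, 3, cells))
    print(find_cell_id(10, 5, cells))
    print(find_cell_id(-1, 3, cells))
```

A linear scan is enough for a sketch; with many sequences and cells, a spatial index or a direct computation from the grid origin and cell size would scale better.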

## 3.3 [00-scripts/step3](00-scripts/step3) : species sequence pairwise comparison

- JULIA scripts
    * [Lib_Compare_Pairwise.jl](00-scripts/step3/Lib_Compare_Pairwise.jl)      : functions to compute the Genetic Diversity value from a set of sequences.
    * [Lib_Create_Master_Matrices.jl](00-scripts/step3/Lib_Create_Master_Matrices.jl) : functions to create master data matrices that are used to compute genetic diversity.
    * [master_matrices.jl](00-scripts/step3/master_matrices.jl)            : generates master data matrices from species sequences alignments.
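
To make the idea of a pairwise comparison concrete, here is a minimal Python sketch (not a port of the Julia library). It assumes that genetic diversity is the mean proportion of differing sites over all pairs of aligned sequences, and that positions holding a gap or an ambiguous base in either sequence are skipped. Both assumptions and all function names are illustrative; the Julia scripts remain the reference.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites among positions where both sequences have a base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: gaps and ambiguous bases are ignored.
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                different += 1
    if compared == 0:
        return None
    return different / float(compared)


def genetic_diversity(sequences):
    """Mean pairwise difference over all pairs, or None if no pair is comparable."""
    values = [pairwise_difference(a, b) for a, b in combinations(sequences, 2)]
    values = [v for v in values if v is not None]
    if not values:
        return None
    return sum(values) / float(len(values))


if __name__ == "__main__":
    # Made-up aligned sequences, for illustration only.
    print(genetic_diversity(["ACGT", "ACGA", "ACGT"]))
```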

## 3.4 [00-scripts/step4](00-scripts/step4) : genetic diversity calculation

- R scripts
    * [gdval_by_site.R](00-scripts/step4/gdval_by_site.R) : removes the cells that fall outside the range of every marine/freshwater species according to the IUCN shapefiles, and creates the data files used to generate Figure 1.

- JULIA scripts
    * [equalarea_numbers.jl](00-scripts/step4/equalarea_numbers.jl)         : assigns a mean genetic diversity to each equal-area grid cell. Genetic diversity is calculated from the master data matrices.
    * [metrics_by_area_and_species.jl](00-scripts/step4/metrics_by_area_and_species.jl): generates the files for the statistical analysis in the next step: mean genetic diversity per cell, genetic diversity per species per cell, number of individuals per species, number of species per cell, cell coordinates, cell ID...
    * [Lib_GD_summary_functions.jl](00-scripts/step4/Lib_GD_summary_functions.jl)   : functions to calculate genetic diversity at species level and cell level
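
The per-cell summary can be pictured as a simple aggregation of per-species values. The Python sketch below assumes that the mean genetic diversity of a cell is the unweighted average of the genetic diversity of the species found in it; the actual weighting used by the Julia scripts is not stated here, and the record layout `(cell_id, species, gd)` is hypothetical.

```python
from collections import defaultdict


def summarise_cells(records):
    """Map each cell ID to (mean genetic diversity, number of species).

    records is an iterable of (cell_id, species, gd) tuples, with one
    entry per species per cell.
    """
    per_cell = defaultdict(dict)
    for cell_id, species, gd in records:
        per_cell[cell_id][species] = gd
    summary = {}
    for cell_id, by_species in per_cell.items():
        values = list(by_species.values())
        # Assumption: unweighted mean across the species of the cell.
        summary[cell_id] = (sum(values) / float(len(values)), len(values))
    return summary


if __name__ == "__main__":
    # Made-up values, for illustration only.
    records = [(1, "sp_a", 0.2), (1, "sp_b", 0.4), (2, "sp_a", 0.1)]
    for cell_id, (mean_gd, n_species) in sorted(summarise_cells(records).items()):
        print(cell_id, mean_gd, n_species)
```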


## 3.5 [00-scripts/step5](00-scripts/step5) : statistical analysis
- R scripts
    * `descripteurs.R`
    * `figures.R` (work in progress)


# 4. Reporting bugs

If you're sure you've found a bug, e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/reservebenefit/worldmap_fish_genetic_diversity/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line, etc.

# 5. Running the pipeline

## 5.1 Filter raw data
```
bash ./00-scripts/step1/filter_raw_data.sh
```

## 5.2 Georeferenced sequences alignments by species
```
bash ./00-scripts/step2/seq_alnt_filtered_data.sh
mkdir ./06-species_alnt_cluster/total
mkdir ./06-species_alnt_cluster/freshwater
mkdir ./06-species_alnt_cluster/marine
bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh
Rscript ./00-scripts/step2/equalareacoords.R
```
## 5.3 Species sequence pairwise comparison
```
julia ./00-scripts/step3/master_matrices.jl
```

## 5.4 Genetic Diversity calculation
```
julia ./00-scripts/step4/equalarea_numbers.jl
Rscript ./00-scripts/step4/gdval_by_site.R
julia ./00-scripts/step4/metrics_by_area_and_species.jl
```
## 5.5 Statistical analysis
```
Rscript ./00-scripts/step5/descripteurs.R
Rscript ./00-scripts/step5/figures.R
```