Analysis scripts for "Global patterns of fish genetic diversity increase with “current” temperature"
================================================
2017
This folder contains all the scripts needed to reproduce the analyses.

# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
    1. [Prerequisite](#21-prerequisite)
    2. [Data Files](#22-data-files)
3. [Scripts Code Source](#3-scripts-code-source)
4. [Reporting bugs](#4-reporting-bugs)
5. [Running the pipeline](#5-running-the-pipeline)
    1. [Filter raw data](#51-filter-raw-data)
    2. [Georeferenced sequences alignments by species](#52-georeferenced-sequences-alignments-by-species)
    3. [Species sequence pairwise comparison](#53-species-sequence-pairwise-comparison)
    4. [Genetic Diversity calculation](#54-genetic-diversity-calculation)
    5. [Statistical analysis](#55-statistical-analysis)

# 1. Introduction

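This repository contains the scripts used for the analyses of "Global patterns of fish genetic diversity increase with “current” temperature". The pipeline filters georeferenced CO1 sequences of "actinopterygii" downloaded from BOLD, aligns them by species, compares the sequences pairwise within each species, computes genetic diversity per equal-area grid cell, and runs the statistical analysis.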

# 2. Installation


## 2.1 Prerequisite
You must install the following software and packages:

- [JULIA Version 0.5.2](https://julialang.org/)
- [R Version 3.2.3](https://cran.r-project.org/)
    - R package ggplot2: `install.packages("ggplot2")`
    - R package rgeos: `install.packages("rgeos")`
    - R package rgdal: download the source archive from [CRAN](https://cran.r-project.org/web/packages/rgdal/index.html), then run `install.packages("~/Downloads/rgdal_1.1-10.tar.gz", repos = NULL, type = "source")`
    - R package nodiv: `install.packages("nodiv")`
    - R package raster: `install.packages("raster")`
    - R package lme4: `install.packages("lme4")`
    - R package sp: `install.packages("sp")`
    - R package sjPlot: `install.packages("sjPlot")`
    - R package FactoMineR: `install.packages("FactoMineR")`
    - R package factoextra: `install.packages("factoextra")`
    - R package spdep: `install.packages("spdep")`
    - R package countrycode: `install.packages("countrycode")`
- [Python Version 2.7.12](https://www.python.org/)
- [MUSCLE Version 3.8.31](https://www.drive5.com/muscle/)

## 2.2 Data Files
The included data files are:

* `02-raw_data/seqbold_data.txt`                            : Georeferenced sequences of individuals from the supergroup "actinopterygii", downloaded from <http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii>
* `01-infos/grid_equalarea200km`                            : Shapefile of the world map in equal-area projection (epsg:4326) with nested equal-area grids (cell size of 200 km)
* `01-infos/ne_110m_land`                                   : Shapefile of the world coastline from <http://www.naturalearthdata.com>
* `01-infos/equalarea_id_coords.tsv`                        : ID and left/right/top/bottom coordinates of each equal-area cell in the shapefile `grid_equalarea200km`.
* `01-infos/marine_actinopterygii_species.txt`              : List of "actinopterygii" saltwater species according to <http://www.fishbase.org/>
* `01-infos/freshwater_behrman_worldcoast_data_object.RData`: R spatial object (from the `sp` package): an equal-area grid in Behrmann projection with the world coastline shape and the presence/absence of freshwater fish species from <https://www.iucn.org/>
* `01-infos/marine_behrman_worldcoast_data_object.RData`    : Same as the previous file, for marine fish species.
* Some files are still missing from this list...

# 3. Scripts Code Source
## 3.1 `00-scripts/step1` : filter raw data

- BASH scripts
    * `filter_raw_data.sh`       : Keep only the CO1 sequences with lat/lon information
    * `get_geo_coordinates.sh`   : Uses <http://www.geonames.org/> to find the missing coordinates of individual sequences from the textual description of their location.

- PYTHON scripts
    * `lat_long_DMS_DD_converter.py`     : Converts the given coordinates from DMS (degrees, minutes, seconds) format to DD (decimal degrees) format.
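
The DMS-to-DD conversion described above can be sketched as follows. This is a minimal illustration in Python 3, not the code of `lat_long_DMS_DD_converter.py`: the function name `dms_to_dd` and the accepted input format (one to three numbers followed by a trailing hemisphere letter) are assumptions.

```python
import re


def dms_to_dd(dms):
    """Convert a DMS string such as 43°36'39.6"N to decimal degrees.

    Assumed input format (hypothetical): degrees, optional minutes and
    seconds, then an optional trailing hemisphere letter (N, S, E or W).
    """
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", dms)]
    if not 1 <= len(numbers) <= 3:
        raise ValueError("cannot parse DMS coordinate: %r" % dms)
    # Missing minutes and/or seconds are treated as zero.
    numbers += [0.0] * (3 - len(numbers))
    degrees, minutes, seconds = numbers
    dd = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative in decimal degrees.
    hemisphere = re.search(r"([NSEW])\s*$", dms.strip().upper())
    if hemisphere and hemisphere.group(1) in "SW":
        dd = -dd
    return dd
```

For instance, `dms_to_dd("30°30'00\"S")` evaluates to `-30.5`.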

## 3.2 `00-scripts/step2` : georeferenced sequences alignments by species

- BASH scripts
    * `seq_alnt_filtered_data.sh`       : aligns sequences from the same species with MUSCLE and creates a coordinates file for each sequence.
    * `cluster_freshwater_vs_marine.sh` : according to a list of marine species, moves the fasta and coords files into the marine and freshwater directories.

- PYTHON scripts
    * `fasta_coords_files_species_generator.py` : extracts sequences and associated coordinates from the filtered data.

- R scripts
    * `equalareacoords.R`      : assigns to each sequence, from its coordinates, the ID of the matching cell in the equal-area world map shapefile.
    * `equalarea_worldcoast.R` : (description to be written)
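
Assigning a cell ID to a sequence from its coordinates amounts to a bounding-box lookup. The sketch below is in Python 3 and is not a port of `equalareacoords.R`. It assumes a tab-separated table with the hypothetical column names `id`, `left`, `right`, `top` and `bottom` (the real header of `equalarea_id_coords.tsv` may differ), and it assumes the point is already expressed in the same projection as the grid.

```python
import csv
import io


def load_cells(tsv_text):
    """Parse a TSV table of cell bounds (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "id": row["id"],
            "left": float(row["left"]),
            "right": float(row["right"]),
            "top": float(row["top"]),
            "bottom": float(row["bottom"]),
        }
        for row in reader
    ]


def find_cell(x, y, cells):
    """Return the ID of the cell containing (x, y), or None if no cell does."""
    for cell in cells:
        # Half-open intervals, so a point on a shared edge matches one cell only.
        if cell["left"] <= x < cell["right"] and cell["bottom"] <= y < cell["top"]:
            return cell["id"]
    return None


# Toy grid of two adjacent cells, for illustration only.
SAMPLE_TSV = "id\tleft\tright\ttop\tbottom\nA\t0\t10\t10\t0\nB\t10\t20\t10\t0\n"
cells = load_cells(SAMPLE_TSV)
print(find_cell(5.0, 5.0, cells))  # prints "A"
```

A linear scan is enough for a sketch; with many sequences and cells, a spatial index or a direct computation from the cell size would be faster.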

## 3.3 `00-scripts/step3` : species sequence pairwise comparison

- JULIA scripts
    * `Lib_Compare_Pairwise.jl`       : functions to compute the Genetic Diversity value from a set of sequences.
    * `Lib_Create_Master_Matrices.jl` : functions to create master data matrices that are used to compute genetic diversity.
    * `master_matrices.jl`            : generates master data matrices from species sequences alignments.
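
As an illustration of what a pairwise comparison of aligned sequences can look like, the sketch below computes the mean proportion of differing sites over all pairs of sequences. It is written in Python 3 rather than Julia, and it is only one common definition of genetic diversity: whether `Lib_Compare_Pairwise.jl` uses exactly this formula, and how it treats gaps and ambiguous bases, is an assumption. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b, ignored="-N?"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption). Returns None if no site can be compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in ignored or b in ignored:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / float(compared) if compared else None


def genetic_diversity(sequences):
    """Mean pairwise difference over all pairs, or None with fewer than two sequences."""
    values = []
    for seq_a, seq_b in combinations(sequences, 2):
        value = pairwise_difference(seq_a, seq_b)
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None
```

With the three toy sequences `["ACGT", "ACGA", "ACGT"]`, two of the three pairs differ at one site out of four, so `genetic_diversity` returns (0.25 + 0 + 0.25) / 3, about 0.167.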

## 3.4 `00-scripts/step4` : Genetic Diversity calculation

- R scripts
    * `gdval_by_site.R` : removes the cells that fall outside the range of every marine/freshwater species according to the IUCN shapefiles, and creates the data files used to generate Figure 1.

- JULIA scripts
    * `equalarea_numbers.jl`          : assigns the mean genetic diversity to each equal-area grid cell. Genetic diversity is calculated from the master data matrices.
    * `metrics_by_area_and_species.jl`: generates the files for the statistical analysis in the next step: mean genetic diversity per cell, genetic diversity per species per cell, number of individuals per species, number of species per cell, cell coordinates, cell ID...
    * `Lib_GD_summary_functions.jl`   : functions to calculate genetic diversity at species level and cell level.
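
The per-cell summaries listed above can be pictured as a simple grouping step. The following Python 3 sketch is not the Julia implementation. It assumes records of the hypothetical form `(cell_id, species, gd)` and it assumes the cell mean is taken across species; the scripts may weight or filter differently.

```python
from collections import defaultdict


def summarize_by_cell(records):
    """Group (cell_id, species, gd) records into per-cell summaries.

    Returns a dict mapping each cell ID to its mean genetic diversity
    across species and its number of species.
    """
    per_cell = defaultdict(dict)
    for cell_id, species, gd in records:
        per_cell[cell_id][species] = gd
    summary = {}
    for cell_id, gd_by_species in per_cell.items():
        values = list(gd_by_species.values())
        summary[cell_id] = {
            "mean_gd": sum(values) / len(values),
            "n_species": len(values),
        }
    return summary


# Toy records for illustration only; these are not results from the study.
toy_records = [
    ("cell_1", "species_a", 0.01),
    ("cell_1", "species_b", 0.03),
    ("cell_2", "species_a", 0.02),
]
toy_summary = summarize_by_cell(toy_records)
```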


## 3.5 `00-scripts/step5` : Statistical analysis
- R scripts
    * `descripteurs.R`
    * `figures.R`


# 4. Reporting bugs

If you're sure you've found a bug (e.g. one of my programs crashes
with an obscure error message, or the resulting file is missing part
of the original data), then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/reservebenefit/worldmap_fish_genetic_diversity/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line, etc.

# 5. Running the pipeline

## 5.1 Filter raw data
* `bash ./00-scripts/step1/filter_raw_data.sh`

## 5.2 Georeferenced sequences alignments by species
* `bash ./00-scripts/step2/seq_alnt_filtered_data.sh`
* `mkdir -p ./06-species_alnt_cluster/total`
* `mkdir -p ./06-species_alnt_cluster/freshwater`
* `mkdir -p ./06-species_alnt_cluster/marine`
* `bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh`
* `Rscript ./00-scripts/step2/equalareacoords.R`

## 5.3 Species sequence pairwise comparison
* `julia ./00-scripts/step3/master_matrices.jl`

## 5.4 Genetic Diversity calculation
* `julia ./00-scripts/step4/equalarea_numbers.jl`
* `bash ./00-scripts/step4/gdval_by_site.sh`
* `julia ./00-scripts/step4/metrics_by_area_and_species.jl`

## 5.5 Statistical analysis
* `Rscript ./00-scripts/step5/descripteurs.R`
* `Rscript ./00-scripts/step5/figures.R`
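
If you prefer to run all the steps in one go, a small driver such as the Python 3 sketch below may help. It is not part of the repository: the names `PIPELINE` and `run_pipeline` are hypothetical, the commands are copied from the sections above (with `-p` added to `mkdir` so that re-runs do not fail on existing directories), and it assumes it is launched from the repository root.

```python
import shlex
import subprocess

# Commands from sections 5.1 to 5.5, in order.
PIPELINE = [
    "bash ./00-scripts/step1/filter_raw_data.sh",
    "bash ./00-scripts/step2/seq_alnt_filtered_data.sh",
    "mkdir -p ./06-species_alnt_cluster/total",
    "mkdir -p ./06-species_alnt_cluster/freshwater",
    "mkdir -p ./06-species_alnt_cluster/marine",
    "bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh",
    "Rscript ./00-scripts/step2/equalareacoords.R",
    "julia ./00-scripts/step3/master_matrices.jl",
    "julia ./00-scripts/step4/equalarea_numbers.jl",
    "bash ./00-scripts/step4/gdval_by_site.sh",
    "julia ./00-scripts/step4/metrics_by_area_and_species.jl",
    "Rscript ./00-scripts/step5/descripteurs.R",
    "Rscript ./00-scripts/step5/figures.R",
]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Execution stops at the first failing step, because check=True makes
    subprocess.run raise CalledProcessError on a non-zero exit status.
    Returns the list of argument vectors that were processed.
    """
    processed = []
    for command in commands:
        argv = shlex.split(command)
        print("+ " + command)
        if not dry_run:
            subprocess.run(argv, check=True)
        processed.append(argv)
    return processed
```

Calling `run_pipeline(PIPELINE)` only prints the commands; pass `dry_run=False` to actually execute them.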