Analysis scripts for mapping marine genetic diversity
=====================================================

Developed by [Pierre-Edouard Guerin](https://gitlab.com/peguerin)

2017

This folder contains all the scripts needed to reproduce the analyses.

## Prerequisites
You must install the following software:

### JULIA Version 0.5.2
[https://julialang.org/]
### R Version 3.2.3
[https://cran.r-project.org/]

Install the required packages from an R session:

    install.packages("ggplot2")
    install.packages("rgeos")

Please download rgdal from [https://cran.r-project.org/web/packages/rgdal/index.html], then install it from the downloaded archive along with the remaining packages:

    install.packages("~/Downloads/rgdal_1.1-10.tar.gz")
    install.packages("nodiv")
    install.packages("Cairo")

### Python Version 2.7.12
[https://www.python.org/]
### MUSCLE Version 3.8.31
[https://www.drive5.com/muscle/downloads.html]


## Data Files
The included data files are :

* `02-raw_data/seqbold_data.txt`                            : Georeferenced sequences of individuals from the supergroup "actinopterygii", downloaded from [http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii]
* `01-infos/grid_equalarea200km`                            : Shapefile of the world map in equal area projection epsg:4326 with nested equal area grids (cell size of 200 km)
* `01-infos/ne_110m_land`                                   : Shapefile of the world coastline from [http://www.naturalearthdata.com]
* `01-infos/equalarea_id_coords.tsv`                        : ID and left/right/top/bottom coordinates of each equal area cell in the shapefile grid_equalarea200km.
* `01-infos/marine_actinopterygii_species.txt`              : List of "actinopterygii" saltwater species according to [http://www.fishbase.org/]
* `01-infos/freshwater_behrman_worldcoast_data_object.RData`: R spatial object (from the "sp" package) holding an equal area grid in Behrmann projection with the world coastline shape and the presence/absence of freshwater fish species from [https://www.iucn.org/]
* `01-infos/marine_behrman_worldcoast_data_object.RData`    : Same as above, for marine fish species.


## Script source code
### 00-scripts/step1 : filter raw data

#### BASH scripts
* `filter_raw_data.sh`       : Keeps only the CO1 sequences that have lat/lon information.
* `get_geo_coordinates.sh`   : Uses [http://www.geonames.org/] to find the missing coordinates of individual sequences from their textual location information.
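
The filtering rule described for `filter_raw_data.sh` can be illustrated with a short Python sketch. It is not the content of the BASH script: the tab-separated layout and the column names `marker`, `lat` and `lon` are assumptions made for the example, and the real raw file may use different headers and marker labels.

```python
import csv
import io

# Hypothetical column names; adapt them to the actual header of the raw file.
MARKER_COL, LAT_COL, LON_COL = "marker", "lat", "lon"


def has_coordinates(row):
    """Return True when latitude and longitude both parse as valid numbers."""
    try:
        lat = float(row[LAT_COL])
        lon = float(row[LON_COL])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def filter_records(lines, marker="CO1"):
    """Keep the records of the given marker that carry usable lat/lon values."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader
            if row.get(MARKER_COL) == marker and has_coordinates(row)]


sample = io.StringIO(
    "id\tmarker\tlat\tlon\n"
    "s1\tCO1\t43.5\t3.75\n"
    "s2\tCO1\t\t\n"            # missing coordinates: dropped
    "s3\t16S\t10.0\t20.0\n"    # other marker: dropped
)
kept = filter_records(sample)
print([row["id"] for row in kept])
```

Records dropped only because their coordinates are missing are the ones that a geocoding step such as `get_geo_coordinates.sh` would try to recover.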

#### PYTHON scripts
* `lat_long_DMS_DD_converter.py`     : Converts the given coordinates from DMS (degrees, minutes, seconds) format to DD (decimal degrees) format.
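
For reference, the DMS to DD conversion is DD = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The sketch below shows the arithmetic only; the function name and its arguments are hypothetical and do not describe the interface of `lat_long_DMS_DD_converter.py`.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative in DD notation.
    return -value if hemisphere in ("S", "W") else value


print(dms_to_dd(43, 30, 0, "N"))  # 43.5
print(dms_to_dd(3, 45, 0, "W"))   # -3.75
```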

### 00-scripts/step2 : georeferenced sequences alignments by species

#### BASH scripts
* `seq_alnt_filtered_data.sh`       : aligns the sequences of each species with MUSCLE and creates a coordinates file for each sequence.
* `cluster_freshwater_vs_marine.sh` : according to a list of marine species, moves the fasta and coords files into the marine and freshwater directories.

#### PYTHON scripts
* `fasta_coords_files_species_generator.py` : extracts sequences and associated coordinates from the filtered data.

#### R scripts
* `equalareacoords.R`      : assigns to each sequence, from its coordinates, the ID of the matching cell in the equal area world map shapefile.
* `equalarea_worldcoast.R` : tedious (no further description given)
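
The cell assignment performed on sequence coordinates can be pictured as a lookup against the left/right/top/bottom bounds listed in `01-infos/equalarea_id_coords.tsv`. The Python sketch below is an illustration under assumptions: the `Cell` layout and the `find_cell` helper are hypothetical, the point is assumed to be expressed in the same projection and units as the grid, and the R script may rely on a spatial overlay instead.

```python
from typing import NamedTuple, Optional, Sequence


class Cell(NamedTuple):
    """One grid cell with its rectangular bounds (hypothetical layout)."""
    cell_id: int
    left: float
    right: float
    bottom: float
    top: float


def find_cell(x: float, y: float, cells: Sequence[Cell]) -> Optional[int]:
    """Return the ID of the cell containing (x, y), or None if no cell does."""
    for cell in cells:
        # Half-open intervals so that a point on a shared edge belongs to one cell only.
        if cell.left <= x < cell.right and cell.bottom <= y < cell.top:
            return cell.cell_id
    return None


grid = [
    Cell(1, 0.0, 200.0, 0.0, 200.0),
    Cell(2, 200.0, 400.0, 0.0, 200.0),
]
print(find_cell(150.0, 50.0, grid))  # 1
print(find_cell(200.0, 50.0, grid))  # 2
print(find_cell(500.0, 50.0, grid))  # None
```

A linear scan is enough for a sketch; for a regular grid the cell index could also be computed directly from the coordinates and the cell size.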

### 00-scripts/step3 : species sequence pairwise comparison

#### JULIA scripts
* `Lib_Compare_Pairwise.jl`       : functions to compute the Genetic Diversity value from a set of sequences.
* `Lib_Create_Master_Matrices.jl` : functions to create master data matrices that are used to compute genetic diversity.
* `master_matrices.jl`            : generates master data matrices from the species sequence alignments.
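
One common way to compute a genetic diversity value from a set of aligned sequences is nucleotide diversity: the mean, over all pairs of sequences, of the proportion of sites that differ. The Python sketch below illustrates that measure only. Whether `Lib_Compare_Pairwise.jl` uses exactly this definition, and how it treats gaps or ambiguous bases, is an assumption; here a site is compared only when both sequences carry A, C, G or T.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites among positions where both bases are A/C/G/T."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    return differing / compared if compared else None


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise difference over all comparable pairs; None if there is none."""
    values = []
    for seq_a, seq_b in combinations(aligned_seqs, 2):
        diff = pairwise_difference(seq_a, seq_b)
        if diff is not None:
            values.append(diff)
    return sum(values) / len(values) if values else None


print(nucleotide_diversity(["ACGT", "ACGA"]))  # 0.25
print(nucleotide_diversity(["ACGT"]))          # None
```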

### 00-scripts/step4 : Genetic Diversity calculation

#### R scripts
* `gdval_by_site.R` : removes the cells that do not fall within the range of at least one marine/freshwater species according to the IUCN shapefiles, and creates the data files used to generate Figure 1.

#### JULIA scripts
* `equalarea_numbers.jl`          : assigns a mean genetic diversity value to each equal area grid cell. Genetic diversity is calculated from the master data matrices.
* `metrics_by_area_and_species.jl`: generates the files for the statistical analysis of the next step: mean genetic diversity per cell, genetic diversity per species per cell, number of individuals per species, number of species per cell, cell coordinates, cell ID...
* `Lib_GD_summary_functions.jl`   : functions to calculate genetic diversity at species level and cell level
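
The cell-level summary can be sketched in Python as an aggregation of per-species values. This is an illustration under assumptions, not the Julia implementation: the input tuples and the `summarise_by_cell` helper are hypothetical, and the cell value is taken here as the unweighted mean over the species present in the cell.

```python
from collections import defaultdict


def summarise_by_cell(records):
    """Summarise (cell_id, species, gd) tuples, one per species per cell.

    Returns {cell_id: {"n_species": int, "mean_gd": float}}.
    """
    per_cell = defaultdict(dict)
    for cell_id, species, gd in records:
        per_cell[cell_id][species] = gd
    return {
        cell_id: {
            "n_species": len(by_species),
            "mean_gd": sum(by_species.values()) / len(by_species),
        }
        for cell_id, by_species in per_cell.items()
    }


records = [
    (1, "species_a", 0.25),
    (1, "species_b", 0.75),
    (2, "species_a", 0.5),
]
summary = summarise_by_cell(records)
print(summary[1])  # {'n_species': 2, 'mean_gd': 0.5}
print(summary[2])  # {'n_species': 1, 'mean_gd': 0.5}
```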


### 00-scripts/step5 : Statistical analysis

#### R scripts
* `model_area_GDval.R`
* `model_species-area_GDval.R`

## Instructions
### step1 : filter raw data

    bash ./00-scripts/step1/filter_raw_data.sh

### step2 : georeferenced sequences alignments by species

    bash ./00-scripts/step2/seq_alnt_filtered_data.sh
    mkdir ./06-species_alnt_cluster/total
    mkdir ./06-species_alnt_cluster/freshwater
    mkdir ./06-species_alnt_cluster/marine
    bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh
    Rscript ./00-scripts/step2/equalareacoords.R

### step3 : species sequence pairwise comparison

    julia ./00-scripts/step3/master_matrices.jl

### step4 : Genetic Diversity calculation

    julia ./00-scripts/step4/equalarea_numbers.jl
    bash ./00-scripts/step4/gdval_by_site.sh
    julia ./00-scripts/step4/metrics_by_area_and_species.jl

### step5 : Statistical analysis

    Rscript ./00-scripts/step5/model_area_GDval.R
    Rscript ./00-scripts/step5/model_species-area_GDval.R
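
If it is convenient to run the five steps in one go, the commands listed in these instructions can be chained from a small Python driver. This is a sketch, not part of the repository: `PIPELINE` and `run_pipeline` are hypothetical names, and the script is assumed to be launched from the repository root. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Commands copied from the instructions, in execution order.
PIPELINE = [
    ["bash", "./00-scripts/step1/filter_raw_data.sh"],
    ["bash", "./00-scripts/step2/seq_alnt_filtered_data.sh"],
    ["mkdir", "./06-species_alnt_cluster/total"],
    ["mkdir", "./06-species_alnt_cluster/freshwater"],
    ["mkdir", "./06-species_alnt_cluster/marine"],
    ["bash", "./00-scripts/step2/cluster_freshwater_vs_marine.sh"],
    ["Rscript", "./00-scripts/step2/equalareacoords.R"],
    ["julia", "./00-scripts/step3/master_matrices.jl"],
    ["julia", "./00-scripts/step4/equalarea_numbers.jl"],
    ["bash", "./00-scripts/step4/gdval_by_site.sh"],
    ["julia", "./00-scripts/step4/metrics_by_area_and_species.jl"],
    ["Rscript", "./00-scripts/step5/model_area_GDval.R"],
    ["Rscript", "./00-scripts/step5/model_species-area_GDval.R"],
]


def run_pipeline(commands=PIPELINE, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(command, check=True)
    return len(commands)


run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` executes the steps for real and stops at the first one that fails, which avoids running later steps on incomplete outputs.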