Analysis scripts for "Global patterns of fish genetic diversity increase with “current” temperature"
================================================


Developed by [Pierre-Edouard Guerin](https://gitlab.com/peguerin)
2017

This folder contains all the scripts needed to reproduce the analyses.

# Table of contents

1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
   1. [Prerequisite](#21-prerequisite)
   2. [Data Files](#22-data-files)
3. [Scripts Code Source](#3-scripts-code-source)
4. [Reporting bugs](#4-reporting-bugs)
5. [Running the pipeline](#5-running-the-pipeline)
   1. [Filter raw data](#51-filter-raw-data)
   2. [Georeferenced sequences alignments by species](#52-georeferenced-sequences-alignments-by-species)
   3. [Species sequence pairwise comparison](#53-species-sequence-pairwise-comparison)
   4. [Genetic Diversity calculation](#54-genetic-diversity-calculation)
   5. [Statistical analysis](#55-statistical-analysis)

# 1. Introduction

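This repository holds the analysis pipeline for the study named in the title. Starting from georeferenced CO1 sequences of Actinopterygii, the scripts filter the raw records, align the sequences by species, compare sequences pairwise within each species, compute genetic diversity per equal-area grid cell, and run the statistical analysis.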

# 2. Installation


## 2.1 Prerequisite
Install the following software and packages:

- [Julia version 0.5.2](https://julialang.org/)
- [R version 3.2.3](https://cran.r-project.org/), with these R packages:
    - ggplot2: `install.packages("ggplot2")`
    - rgeos: `install.packages("rgeos")`
    - rgdal: download the source archive from [CRAN](https://cran.r-project.org/web/packages/rgdal/index.html), then run `install.packages("~/Downloads/rgdal_1.1-10.tar.gz", repos = NULL, type = "source")`
    - nodiv: `install.packages("nodiv")`
    - raster: `install.packages("raster")`
    - lme4: `install.packages("lme4")`
    - sp: `install.packages("sp")`
    - sjPlot: `install.packages("sjPlot")`
    - FactoMineR: `install.packages("FactoMineR")`
    - factoextra: `install.packages("factoextra")`
    - spdep: `install.packages("spdep")`
    - countrycode: `install.packages("countrycode")`
- [Python version 2.7.12](https://www.python.org/)
- [MUSCLE version 3.8.31](https://www.drive5.com/muscle/downloads.html)

## 2.2 Data Files
The included data files are:

* `02-raw_data/seqbold_data.txt` : Georeferenced sequences of individuals from the supergroup "actinopterygii", downloaded from [BOLD Systems](http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii)
* `01-infos/grid_equalarea200km` : Shapefile of the world map (equal-area projection, EPSG:4326) with nested equal-area grids (cell size of 200 km)
* `01-infos/ne_110m_land` : Shapefile of the world coastline from [Natural Earth](http://www.naturalearthdata.com)
* `01-infos/equalarea_id_coords.tsv` : ID and left/right/top/bottom coordinates of each equal-area cell in the shapefile `grid_equalarea200km`
* `01-infos/marine_actinopterygii_species.txt` : List of "actinopterygii" saltwater species according to [FishBase](http://www.fishbase.org/)
* `01-infos/freshwater_behrman_worldcoast_data_object.RData` : R spatial object (from the `sp` package) holding an equal-area grid in Behrmann projection with the world coastline shape and the presence/absence of freshwater fish species from [IUCN](https://www.iucn.org/)
* `01-infos/marine_behrman_worldcoast_data_object.RData` : Same kind of R spatial object, for marine fish species
* Some files are still missing from this list...

# 3. Scripts Code Source
## 3.1 `00-scripts/step1` : filter raw data

- BASH scripts
    * `filter_raw_data.sh` : Keeps only the CO1 sequences that have lat/lon information.
    * `get_geo_coordinates.sh` : Uses [GeoNames](http://www.geonames.org/) to look up missing coordinates for individual sequences from their textual location description.
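
To illustrate the filtering rule described above (keep CO1 records that carry both a latitude and a longitude), here is a minimal Python sketch. It is not the code of `filter_raw_data.sh`. The column names `processid`, `markercode`, `lat` and `lon`, and the marker label `COI-5P`, are assumptions about the layout of the tab-separated input and may need adjusting to the real file.

```python
import csv
import io


def keep_co1_with_coords(rows, marker="COI-5P"):
    """Yield only rows for the given marker that have numeric lat/lon.

    The column names used here are hypothetical.
    """
    for row in rows:
        if row.get("markercode") != marker:
            continue
        try:
            float(row["lat"])
            float(row["lon"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        yield row


# Tiny in-memory table standing in for the tab-separated raw data file.
sample = (
    "processid\tmarkercode\tlat\tlon\n"
    "A1\tCOI-5P\t43.6\t3.9\n"
    "A2\tCOI-5P\t\t\n"
    "A3\t16S\t10.0\t20.0\n"
)
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
kept = list(keep_co1_with_coords(reader))
print([row["processid"] for row in kept])
```

Records with a missing coordinate (such as `A2` here) are the ones a later geocoding step would need to complete.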

- PYTHON scripts
    * `lat_long_DMS_DD_converter.py` : Converts the given coordinates from DMS (degrees, minutes, seconds) format to DD (decimal degrees) format.
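
The DMS to DD conversion itself is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The sketch below shows that formula in Python. The function names `dms_to_dd` and `parse_dms` and the accepted input strings are illustrative assumptions, not the interface of `lat_long_DMS_DD_converter.py`.

```python
import re


def dms_to_dd(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    dd = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South and West are negative by convention.
    return -dd if hemisphere.upper() in ("S", "W") else dd


def parse_dms(text):
    """Parse a DMS string such as '43 36 36 N' into decimal degrees."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers or len(numbers) > 3:
        raise ValueError("cannot parse DMS value: %r" % text)
    numbers += [0.0] * (3 - len(numbers))  # pad missing minutes/seconds
    # The hemisphere letter is expected at the end; default to positive.
    hemi = re.search(r"([NSEW])\s*$", text.upper())
    return dms_to_dd(numbers[0], numbers[1], numbers[2],
                     hemi.group(1) if hemi else "N")


print(parse_dms("43 36 36 N"))
print(parse_dms("3d30'0\"W"))
```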

## 3.2 `00-scripts/step2` : georeferenced sequence alignments by species

#### BASH scripts
* `seq_alnt_filtered_data.sh`       : aligns sequences from the same species with MUSCLE and creates a coordinates file for each sequence.
* `cluster_freshwater_vs_marine.sh` : moves the FASTA and coordinates files into the marine or freshwater directories, according to a list of marine species.

#### PYTHON scripts
* `fasta_coords_files_species_generator.py` : extracts sequences and associated coordinates from the filtered data.

#### R scripts
* `equalareacoords.R`      : assigns each sequence, from its coordinates, the ID of the cell that contains it in the equal-area world map shapefile.
* `equalarea_worldcoast.R` : description pending.
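
Assigning a cell ID to a sequence amounts to finding the grid cell whose bounds contain the sequence coordinates. The Python sketch below shows the idea with the left/right/top/bottom bounds that `equalarea_id_coords.tsv` is described as holding. It is an illustration, not a port of `equalareacoords.R`: the tuple layout, the function name `find_cell` and the example cells are hypothetical, and it assumes the point and the cell bounds are expressed in the same coordinate system.

```python
def find_cell(x, y, cells):
    """Return the ID of the first cell whose bounding box contains (x, y).

    `cells` is a list of (cell_id, left, right, top, bottom) tuples.
    Returns None when the point falls outside every cell.
    """
    for cell_id, left, right, top, bottom in cells:
        # Half-open intervals so a point on a shared edge gets one cell only.
        if left <= x < right and bottom <= y < top:
            return cell_id
    return None


# Two made-up adjacent cells.
cells = [
    (1, -10.0, 0.0, 10.0, 0.0),
    (2, 0.0, 10.0, 10.0, 0.0),
]
print(find_cell(3.9, 4.2, cells))
print(find_cell(50.0, 50.0, cells))
```

A linear scan is enough for a sketch; with many sequences and cells, a spatial index or a direct computation from the grid origin and cell size would be faster.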

## 3.3 `00-scripts/step3` : species sequence pairwise comparison

#### JULIA scripts
* `Lib_Compare_Pairwise.jl`       : functions to compute the Genetic Diversity value from a set of sequences.
* `Lib_Create_Master_Matrices.jl` : functions to create master data matrices that are used to compute genetic diversity.
* `master_matrices.jl`            : generates master data matrices from species sequences alignments.
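
The Julia libraries above compute a genetic diversity value from a set of aligned sequences. As a rough illustration only, the Python sketch below assumes that this value is the mean proportion of differing sites over all pairs of sequences (nucleotide diversity), ignoring sites where either sequence has a gap. That definition, the gap handling and the function names are assumptions, not a description of `Lib_Compare_Pairwise.jl`.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b, gap="-"):
    """Proportion of differing sites among sites with no gap in either sequence."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        return None  # nothing comparable between the two sequences
    return float(differing) / compared


def genetic_diversity(sequences):
    """Mean pairwise difference over all pairs of aligned sequences."""
    values = []
    for a, b in combinations(sequences, 2):
        d = pairwise_difference(a, b)
        if d is not None:
            values.append(d)
    return sum(values) / len(values) if values else None


alignment = ["ACGT", "ACGA", "ACGT"]
print(genetic_diversity(alignment))
```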

## 3.4 `00-scripts/step4` : genetic diversity calculation

#### R scripts
* `gdval_by_site.R` : removes the cells that fall outside the range of every marine/freshwater species according to IUCN shapefiles, and creates the data files used to generate Figure 1.

#### JULIA scripts
* `equalarea_numbers.jl`          : assigns a mean genetic diversity value to each equal-area grid cell, calculated from the master data matrices.
* `metrics_by_area_and_species.jl`: generates the files for the statistical analysis in the next step: mean genetic diversity per cell, genetic diversity per species per cell, number of individuals per species, number of species per cell, cell coordinates, cell ID...
* `Lib_GD_summary_functions.jl`   : functions to calculate genetic diversity at species level and cell level
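
The per-cell metrics listed above can be pictured as a group-by over (cell, species, genetic diversity) records. The Python sketch below computes two of them, mean genetic diversity per cell and number of species per cell. It assumes an unweighted mean across species within a cell; the record layout, the function names and the example values are hypothetical and are not taken from the Julia scripts.

```python
from collections import defaultdict


def mean_gd_per_cell(records):
    """Map each cell ID to the unweighted mean of its per-species GD values."""
    by_cell = defaultdict(list)
    for cell_id, _species, gd in records:
        by_cell[cell_id].append(gd)
    return {cell: sum(vals) / len(vals) for cell, vals in by_cell.items()}


def species_count_per_cell(records):
    """Map each cell ID to the number of distinct species recorded in it."""
    by_cell = defaultdict(set)
    for cell_id, species, _gd in records:
        by_cell[cell_id].add(species)
    return {cell: len(names) for cell, names in by_cell.items()}


# Made-up (cell ID, species, genetic diversity) records.
records = [
    (1, "species_a", 0.002),
    (1, "species_b", 0.004),
    (2, "species_a", 0.010),
]
print(mean_gd_per_cell(records))
print(species_count_per_cell(records))
```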


## 3.5 `00-scripts/step5` : statistical analysis

#### R scripts
* `model_area_GDval.R`
* `model_species-area_GDval.R`

# 4. Reporting bugs

If you're sure you've found a bug, for example if one of my programs crashes
with an obscure error message or if the resulting file is missing part
of the original data, then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/reservebenefit/worldmap_fish_genetic_diversity/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the exact command line you ran.

# 5. Running the pipeline

## 5.1 Filter raw data
`bash ./00-scripts/step1/filter_raw_data.sh`

## 5.2 Georeferenced sequences alignments by species
1. `bash ./00-scripts/step2/seq_alnt_filtered_data.sh`
2. `mkdir ./06-species_alnt_cluster/total`
3. `mkdir ./06-species_alnt_cluster/freshwater`
4. `mkdir ./06-species_alnt_cluster/marine`
5. `bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh`
6. `Rscript ./00-scripts/step2/equalareacoords.R`

## 5.3 Species sequence pairwise comparison
`julia ./00-scripts/step3/master_matrices.jl`

## 5.4 Genetic Diversity calculation
1. `julia ./00-scripts/step4/equalarea_numbers.jl`
2. `bash ./00-scripts/step4/gdval_by_site.sh`
3. `julia ./00-scripts/step4/metrics_by_area_and_species.jl`

## 5.5 Statistical analysis
1. `Rscript ./00-scripts/step5/model_area_GDval.R`
2. `Rscript ./00-scripts/step5/model_species-area_GDval.R`
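
The commands of sections 5.1 to 5.5 have to be run in order from the repository root. As a convenience sketch only, the Python driver below lists them in that order and stops at the first failure. The driver itself (the names `PIPELINE` and `run_pipeline`, and the `dry_run` switch) is an assumption and not part of the repository; the commands are copied from the sections above.

```python
import subprocess

# Commands from sections 5.1 to 5.5, in execution order.
PIPELINE = [
    ["bash", "./00-scripts/step1/filter_raw_data.sh"],
    ["bash", "./00-scripts/step2/seq_alnt_filtered_data.sh"],
    ["mkdir", "./06-species_alnt_cluster/total"],
    ["mkdir", "./06-species_alnt_cluster/freshwater"],
    ["mkdir", "./06-species_alnt_cluster/marine"],
    ["bash", "./00-scripts/step2/cluster_freshwater_vs_marine.sh"],
    ["Rscript", "./00-scripts/step2/equalareacoords.R"],
    ["julia", "./00-scripts/step3/master_matrices.jl"],
    ["julia", "./00-scripts/step4/equalarea_numbers.jl"],
    ["bash", "./00-scripts/step4/gdval_by_site.sh"],
    ["julia", "./00-scripts/step4/metrics_by_area_and_species.jl"],
    ["Rscript", "./00-scripts/step5/model_area_GDval.R"],
    ["Rscript", "./00-scripts/step5/model_species-area_GDval.R"],
]


def run_pipeline(commands, dry_run=True):
    """Print each command; when dry_run is False, also execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises CalledProcessError on a non-zero exit status,
            # which stops the pipeline at the first failing step.
            subprocess.check_call(cmd)


# Dry run: only prints the commands. Pass dry_run=False to execute them.
run_pipeline(PIPELINE, dry_run=True)
```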