Commit 5905dce4 authored by peguerin

Update README.md

parent b453f3e1
Analysis scripts for "Global patterns of fish genetic diversity increase with “current” temperature"
================================================
Developed by [Pierre-Edouard Guerin](https://gitlab.com/peguerin)

2017

This folder contains all the scripts needed to reproduce the analyses.

# Table of contents
1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
    1. [Prerequisite](#21-prerequisite)
    2. [Data Files](#22-data-files)
3. [Scripts Code Source](#3-scripts-code-source)
4. [Reporting bugs](#4-reporting-bugs)
5. [Running the pipeline](#5-running-the-pipeline)
    1. [Filter raw data](#51-filter-raw-data)
    2. [Georeferenced sequences alignments by species](#52-georeferenced-sequences-alignments-by-species)
    3. [Species sequence pairwise comparison](#53-species-sequence-pairwise-comparison)
    4. [Genetic Diversity calculation](#54-genetic-diversity-calculation)
    5. [Statistical analysis](#55-statistical-analysis)

# 1. Introduction
blablabla
# 2. Installation
## 2.1 Prerequisite
You must install the following software and packages:
- [JULIA Version 0.5.2](https://julialang.org/)
- [R Version 3.2.3](https://cran.r-project.org/)
- [R-package] ggplot2: `install.packages("ggplot2")`
- [R-package] rgeos: `install.packages("rgeos")`
- [R-package] rgdal: download the source archive from [https://cran.r-project.org/web/packages/rgdal/index.html], then run `install.packages("~/Downloads/rgdal_1.1-10.tar.gz")`
- [R-package] nodiv: `install.packages("nodiv")`
- [R-package] raster: `install.packages("raster")`
- [R-package] lme4: `install.packages("lme4")`
- [R-package] sp: `install.packages("sp")`
- [R-package] sjPlot: `install.packages("sjPlot")`
- [R-package] FactoMineR: `install.packages("FactoMineR")`
- [R-package] factoextra: `install.packages("factoextra")`
- [R-package] spdep: `install.packages("spdep")`
- [R-package] countrycode: `install.packages("countrycode")`
- [Python Version 2.7.12](https://www.python.org/)
- [MUSCLE Version 3.8.31](https://www.drive5.com/muscle/downloads.html)
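
Before running anything, it can help to confirm that the required programs are reachable from the command line. The following is a minimal sketch, not part of the repository: the `check_tool` helper is hypothetical, and the binary names `python` and `muscle` are assumptions (a MUSCLE download may install under a different file name). It only tests for presence on the `PATH` and does not verify the versions listed above.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
# Returns non-zero when the command is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Binary names are assumptions; adjust them to match your installation.
for tool in julia Rscript python muscle; do
    check_tool "$tool" || true   # keep going so every missing tool is reported
done
```

Any line starting with `missing:` points to a program that still has to be installed or added to the `PATH`.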
## 2.2 Data Files
The included data files are:
* `02-raw_data/seqbold_data.txt` : Georeferenced sequences of individuals from the supergroup "actinopterygii", downloaded from [http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii]
* `01-infos/grid_equalarea200km` : Shapefile of the world map in equal area projection epsg:4326 with nested equal area grids (cell size of 200 km)
* `01-infos/ne_110m_land` : Shapefile of the world coastline from [http://www.naturalearthdata.com]
* `01-infos/equalarea_id_coords.tsv` : ID and left/right/top/bottom coordinates of each equal area cell in the shapefile grid_equalarea200km.
* `01-infos/marine_actinopterygii_species.txt` : List of "actinopterygii" saltwater species according to [http://www.fishbase.org/]
* `01-infos/freshwater_behrman_worldcoast_data_object.RData` : R spatial object from the "sp" package: an equal area grid in Behrmann projection with the world coastline shape and presence/absence of freshwater fish species from [https://www.iucn.org/]
* `01-infos/marine_behrman_worldcoast_data_object.RData` : ... of marine fish species.
* Some files are still missing from this list...
# 3. Scripts Code Source
## 3.1 00-scripts/step1 : filter raw data
### 3.1.1 BASH scripts
* `filter_raw_data.sh` : Keeps only the CO1 sequences with lat/lon information.
* `get_geo_coordinates.sh` : Uses [http://www.geonames.org/] to find the missing coordinates of individual sequences from their textual location information.
### 3.1.2 PYTHON scripts
* `lat_long_DMS_DD_converter.py` : Converts the given coordinates from DMS format to DD format.
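
For readers unfamiliar with the two formats, the DMS (degrees, minutes, seconds) to DD (decimal degrees) conversion follows the standard formula DD = D + M/60 + S/3600, negated for the southern and western hemispheres. The sketch below illustrates that formula only; it is not the code of `lat_long_DMS_DD_converter.py`, and the `dms_to_dd` function with its argument order is a hypothetical example.

```shell
# Hypothetical helper illustrating the standard DMS -> DD formula.
# Usage: dms_to_dd DEGREES MINUTES SECONDS HEMISPHERE   (hemisphere is N, S, E or W)
dms_to_dd() {
    LC_ALL=C awk -v d="$1" -v m="$2" -v s="$3" -v h="$4" 'BEGIN {
        dd = d + m / 60 + s / 3600
        if (h == "S" || h == "W") dd = -dd   # south and west are negative
        printf "%.6f\n", dd
    }'
}

dms_to_dd 30 30 0 S    # prints -30.500000
dms_to_dd 10 15 36 E   # prints 10.260000
```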
## 3.2 00-scripts/step2 : georeferenced sequences alignments by species
#### BASH scripts
* `seq_alnt_filtered_data.sh` : Aligns sequences from the same species with MUSCLE and creates a coordinates file for each sequence.
@@ -90,32 +111,39 @@ The included data files are :
* `model_area_GDval.R`
* `model_species-area_GDval.R`

# 4. Reporting bugs

If you're sure you've found a bug (e.g. one of my programs crashes
with an obscure error message, or the resulting file is missing part
of the original data), then by all means submit a bug report.

I use [GitLab's issue system](https://gitlab.com/reservebenefit/worldmap_fish_genetic_diversity/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line you ran.

# 5. Running the pipeline

## 5.1 Filter raw data
`bash ./00-scripts/step1/filter_raw_data.sh`

## 5.2 Georeferenced sequences alignments by species
`bash ./00-scripts/step2/seq_alnt_filtered_data.sh`
`mkdir ./06-species_alnt_cluster/total`
`mkdir ./06-species_alnt_cluster/freshwater`
`mkdir ./06-species_alnt_cluster/marine`
`bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh`
`Rscript ./00-scripts/step2/equalareacoords.R`

## 5.3 Species sequence pairwise comparison
`julia ./00-scripts/step3/master_matrices.jl`

## 5.4 Genetic Diversity calculation
`julia ./00-scripts/step4/equalarea_numbers.jl`
`bash ./00-scripts/step4/gdval_by_site.sh`
`julia ./00-scripts/step4/metrics_by_area_and_species.jl`

## 5.5 Statistical analysis
`Rscript ./00-scripts/step5/model_area_GDval.R`
`Rscript ./00-scripts/step5/model_species-area_GDval.R`
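
The pipeline commands listed in this section can also be chained in a single wrapper script, so that a failing step stops the run. The following is a rough sketch under stated assumptions rather than a script shipped with the repository: the `run` and `run_pipeline` functions and the `DRY_RUN` switch are hypothetical, and the three `mkdir` calls are merged into one `mkdir -p` so that a rerun does not fail on existing folders. It is meant to be started from the repository root.

```shell
#!/bin/sh
set -eu   # stop at the first failing step or unset variable

# DRY_RUN is a hypothetical switch, not part of the repository:
# 1 (the default in this sketch) only prints each command, 0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_pipeline() {
    # 5.1 Filter raw data
    run bash ./00-scripts/step1/filter_raw_data.sh
    # 5.2 Georeferenced sequences alignments by species
    run bash ./00-scripts/step2/seq_alnt_filtered_data.sh
    run mkdir -p ./06-species_alnt_cluster/total ./06-species_alnt_cluster/freshwater ./06-species_alnt_cluster/marine
    run bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh
    run Rscript ./00-scripts/step2/equalareacoords.R
    # 5.3 Species sequence pairwise comparison
    run julia ./00-scripts/step3/master_matrices.jl
    # 5.4 Genetic Diversity calculation
    run julia ./00-scripts/step4/equalarea_numbers.jl
    run bash ./00-scripts/step4/gdval_by_site.sh
    run julia ./00-scripts/step4/metrics_by_area_and_species.jl
    # 5.5 Statistical analysis
    run Rscript ./00-scripts/step5/model_area_GDval.R
    run Rscript ./00-scripts/step5/model_species-area_GDval.R
}

run_pipeline
```

With the default `DRY_RUN=1` the script only lists the commands in order, which is a cheap way to review the sequence. Setting `DRY_RUN=0` in the environment would execute them for real.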