Commit 5905dce4 authored by peguerin

Update README.md

parent b453f3e1
Analyse scripts for map marine genetic diversity
Analyse scripts for "Global patterns of fish genetic diversity increase with “current” temperature"
================================================
Developed by [Pierre-Edouard Guerin](https://gitlab.com/peguerin)
2017
This folder contains all the scripts needed to reproduce the analyses.
## Prerequisite
You must install the following software:
### JULIA Version 0.5.2
[https://julialang.org/]
### R Version 3.2.3
https://cran.r-project.org/
install.packages("ggplot2")
install.packages("rgeos")
please download rgdal at [https://cran.r-project.org/web/packages/rgdal/index.html]
install.packages("~/Downloads/rgdal_1.1-10.tar.gz")
install.packages("nodiv")
install.packages("Cairo")
### Python Version 2.7.12
[https://www.python.org/]
### MUSCLE Version 3.8.31
[https://www.drive5.com/muscle/downloads.html]
## Data Files
# Table of contents
1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
    1. [Prerequisite](#21-prerequisite)
    2. [Data Files](#22-data-files)
3. [Scripts Code Source](#3-scripts-code-source)
4. [Reporting bugs](#4-reporting-bugs)
5. [Running the pipeline](#5-running-the-pipeline)
    1. [Filter raw data](#51-filter-raw-data)
    2. [Georeferenced sequences alignments by species](#52-georeferenced-sequences-alignments-by-species)
    3. [Species sequence pairwise comparison](#53-species-sequence-pairwise-comparison)
    4. [Genetic Diversity calculation](#54-genetic-diversity-calculation)
    5. [Statistical analysis](#55-statistical-analysis)
# 1. Introduction
blablabla
# 2. Installation
## 2.1 Prerequisite
You must install the following software and packages:
- [JULIA Version 0.5.2](https://julialang.org/)
- [R Version 3.2.3](https://cran.r-project.org/)
- [R-package]ggplot2 `install.packages("ggplot2")`
- [R-package]rgeos `install.packages("rgeos")`
- [R-package]rgdal: download the source archive from [https://cran.r-project.org/web/packages/rgdal/index.html], then run `install.packages("~/Downloads/rgdal_1.1-10.tar.gz")`
- [R-package]nodiv `install.packages("nodiv")`
- [R-package]raster `install.packages("raster")`
- [R-package]lme4 `install.packages("lme4")`
- [R-package]sp `install.packages("sp")`
- [R-package]sjPlot `install.packages("sjPlot")`
- [R-package]FactoMineR `install.packages("FactoMineR")`
- [R-package]factoextra `install.packages("factoextra")`
- [R-package]spdep `install.packages("spdep")`
- [R-package]countrycode `install.packages("countrycode")`
- [Python Version 2.7.12](https://www.python.org/)
- [MUSCLE Version 3.8.31](https://www.drive5.com/muscle/downloads.html)
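
Before running anything, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch and not part of the repository. The executable names `julia`, `Rscript`, `python` and `muscle` are assumptions about how the tools are installed (MUSCLE binaries in particular are often named differently), and the sketch is written for Python 3 rather than the Python 2.7.12 listed above.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ["julia", "Rscript", "python", "muscle"]


def find_missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report_missing_tools(tools=REQUIRED_TOOLS):
    """Print one status line per tool and return the list of missing ones."""
    missing = find_missing_tools(tools)
    for tool in tools:
        status = "MISSING" if tool in missing else "found"
        print("{}: {}".format(tool, status))
    return missing
```

Calling `report_missing_tools()` only checks that each executable exists; it does not verify the version numbers given in the list.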
## 2.2 Data Files
The included data files are :
* `02-raw_data/seqbold_data.txt` : Georeferenced sequences of individuals from the supergroup "actinopterygii" have been downloaded from [http://www.boldsystems.org/index.php/Public_SearchTerms?taxon=&searchMenu=records&query=actinopterygii]
* `01-infos/grid_equalarea200km` : Shapefile of worldmap equal area projection epsg:4326 with nested equal area grids (cell sizes of 200km)
* `01-infos/ne_110m_land` : Shapefile of worldcoast from [http://www.naturalearthdata.com]
* `01-infos/equalarea_id_coords.tsv` : ID and left/right/top/bottom coordinates of each equal-area cell in the shapefile grid_equalarea200km.
* `01-infos/marine_actinopterygii_species.txt` : List of "actinopterygii" saltwater species according to [http://www.fishbase.org/]
* `01-infos/freshwater_behrman_worldcoast_data_object.RData` : R spatial object from the "sp" package: an equal-area grid in Behrmann projection with the worldcoast shape and presence/absence of freshwater fish species from [https://www.iucn.org/]
* `01-infos/marine_behrman_worldcoast_data_object.RData` : The same kind of R spatial object, with presence/absence of marine fish species.
* Some files are still missing from this list...
# 3. Scripts Code Source
## 3.1 00-scripts/step1 : filter raw data
## Scripts code sources
### 00-scripts/step1 : filter raw data
======================================
#### BASH scripts
### 3.1.1 BASH scripts
* `filter_raw_data.sh` : Keep only the CO1 sequences with lat/lon information
* `get_geo_coordinates.sh` : Uses [http://www.geonames.org/] to look up missing coordinates of individual sequences from their textual location information.
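
To illustrate the kind of filtering described for `filter_raw_data.sh`, here is a small sketch in Python 3. It is not the script itself: the column names `markercode`, `lat` and `lon`, the marker label `COI-5P`, and both function names are assumptions made for the example, and the real script may select records differently.

```python
import csv
import io


def keep_marker_with_coordinates(rows, marker="COI-5P"):
    """Yield rows for the chosen marker that have both latitude and longitude.

    `rows` is an iterable of dicts. The keys "markercode", "lat" and "lon"
    and the default marker label are assumptions, not taken from the script.
    """
    for row in rows:
        lat = (row.get("lat") or "").strip()
        lon = (row.get("lon") or "").strip()
        if row.get("markercode") == marker and lat and lon:
            yield row


def filter_tsv(handle, marker="COI-5P"):
    """Read a tab-separated file object and return the rows that pass the filter."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(keep_marker_with_coordinates(reader, marker))


# Example with an in-memory table instead of the real data file.
sample = io.StringIO(
    "markercode\tlat\tlon\n"
    "COI-5P\t43.6\t3.9\n"   # kept: right marker, both coordinates present
    "COI-5P\t\t\n"          # dropped: coordinates missing
    "16S\t10.0\t20.0\n"     # dropped: different marker
)
kept = filter_tsv(sample)
print(len(kept))  # prints 1
```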
#### PYTHON scripts
### 3.1.2 PYTHON scripts
* `lat_long_DMS_DD_converter.py` : Converts the given coordinates from DMS (degrees, minutes, seconds) format to DD (decimal degrees) format.
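
The DMS to DD conversion is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, with a negative sign for southern and western hemispheres. The sketch below shows that arithmetic in Python 3. The function name `dms_to_dd` and its signature are hypothetical and are not taken from `lat_long_DMS_DD_converter.py`, whose input format is not documented here.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    South and west coordinates are returned as negative values.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    decimal = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -decimal if hemisphere in ("S", "W") else decimal


print(dms_to_dd(10, 30, 0, "S"))  # prints -10.5
```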
### 00-scripts/step2 : georeferenced sequences alignments by species
====================================================================
## 3.2 00-scripts/step2 : georeferenced sequences alignments by species
#### BASH scripts
* `seq_alnt_filtered_data.sh` : Aligns sequences from the same species with MUSCLE and creates a coordinates file for each sequence.
......@@ -90,32 +111,39 @@ The included data files are :
* `model_area_GDval.R`
* `model_species-area_GDval.R`
## Instructions
### step1 : filter raw data
===========================
bash ./00-scripts/step1/filter_raw_data.sh
### step2 : georeferenced sequences alignments by species
=========================================================
bash ./00-scripts/step2/seq_alnt_filtered_data.sh
mkdir ./06-species_alnt_cluster/total
mkdir ./06-species_alnt_cluster/freshwater
mkdir ./06-species_alnt_cluster/marine
bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh
Rscript ./00-scripts/step2/equalareacoords.R
### step3 : species sequence pairwise comparison
================================================
julia ./00-scripts/step3/master_matrices.jl
### step4 : Genetic Diversity calculation
=========================================
julia ./00-scripts/step4/equalarea_numbers.jl
bash ./00-scripts/step4/gdval_by_site.sh
julia ./00-scripts/step4/metrics_by_area_and_species.jl
### step5 : Statistical analysis
================================
Rscript ./00-scripts/step5/model_area_GDval.R
Rscript ./00-scripts/step5/model_species-area_GDval.R
# 4. Reporting bugs
If you are sure you have found a bug, e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.
I use [GitLab's issue system](https://gitlab.com/reservebenefit/worldmap_fish_genetic_diversity/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line and any error message.
# 5. Running the pipeline
## 5.1 Filter raw data
`bash ./00-scripts/step1/filter_raw_data.sh`
## 5.2 Georeferenced sequences alignments by species
`bash ./00-scripts/step2/seq_alnt_filtered_data.sh`
`mkdir ./06-species_alnt_cluster/total`
`mkdir ./06-species_alnt_cluster/freshwater`
`mkdir ./06-species_alnt_cluster/marine`
`bash ./00-scripts/step2/cluster_freshwater_vs_marine.sh`
`Rscript ./00-scripts/step2/equalareacoords.R`
## 5.3 Species sequence pairwise comparison
`julia ./00-scripts/step3/master_matrices.jl`
## 5.4 Genetic Diversity calculation
`julia ./00-scripts/step4/equalarea_numbers.jl`
`bash ./00-scripts/step4/gdval_by_site.sh`
`julia ./00-scripts/step4/metrics_by_area_and_species.jl`
## 5.5 Statistical analysis
`Rscript ./00-scripts/step5/model_area_GDval.R`
`Rscript ./00-scripts/step5/model_species-area_GDval.R`
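
The commands from sections 5.1 to 5.5 can also be chained from a single driver. The following Python 3 sketch is not part of the repository: it assumes the commands are meant to run in the order they are listed, from the repository root, and the names `PIPELINE_STEPS` and `run_pipeline` are hypothetical.

```python
import subprocess

# Commands copied from sections 5.1 to 5.5, in the order they are listed.
PIPELINE_STEPS = [
    ["bash", "./00-scripts/step1/filter_raw_data.sh"],
    ["bash", "./00-scripts/step2/seq_alnt_filtered_data.sh"],
    ["mkdir", "./06-species_alnt_cluster/total"],
    ["mkdir", "./06-species_alnt_cluster/freshwater"],
    ["mkdir", "./06-species_alnt_cluster/marine"],
    ["bash", "./00-scripts/step2/cluster_freshwater_vs_marine.sh"],
    ["Rscript", "./00-scripts/step2/equalareacoords.R"],
    ["julia", "./00-scripts/step3/master_matrices.jl"],
    ["julia", "./00-scripts/step4/equalarea_numbers.jl"],
    ["bash", "./00-scripts/step4/gdval_by_site.sh"],
    ["julia", "./00-scripts/step4/metrics_by_area_and_species.jl"],
    ["Rscript", "./00-scripts/step5/model_area_GDval.R"],
    ["Rscript", "./00-scripts/step5/model_species-area_GDval.R"],
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.check_call):
    """Run each command in order and stop at the first one that fails.

    `runner` defaults to subprocess.check_call, which raises
    CalledProcessError when a command exits with a non-zero status.
    """
    for command in steps:
        print("Running: " + " ".join(command))
        runner(command)
```

Call `run_pipeline()` from the repository root to execute every step. Note that the plain `mkdir` commands fail if the directories already exist, so a second run stops at that point unless the directories are removed first.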