<a href="https://www.biodiversa.org/1023"><img align="right" width="100" height="100" src="reservebenefit.jpg"></a>

# Source code for the paper: "Genomic resources for Mediterranean fishes"




#### Pierre-Edouard Guerin*, Katharina Fietz*, Elena Trofimenko*, Véronique Arnal, Montserrat Torres-Oliva, Stéphane Lobréaux, Angel Pérez-Ruzafa, Stephanie Manel, Oscar Puebla

2017-2019

Submitted to Genomics, 2020


_______________________________________________________________________________




# Table of contents

1. [Nuclear Genomes assembly](#1-nuclear-genomes-assembly)
2. [RAD-seq data processing](#2-rad-seq-data-processing)
3. [SNPs statistics](#3-snps-statistics)
4. [Mitochondrial genomes assembly](#4-mitochondrial-genomes-assembly)




# 1. Nuclear Genomes assembly
Nuclear genomes were assembled using the Platanus assembler. Platanus was selected due to its excellent performance with highly heterozygous genomes. The paired-end libraries were used to assemble reads into contigs, and both the paired-end and mate-pair libraries were used for scaffolding and gap closing. 
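
The three Platanus stages described above (contig assembly, scaffolding, gap closing) can be sketched as a small command builder. This is only an illustration, not the pipeline from the repository: the `sample` prefix, the FASTQ file names, and the thread count are hypothetical, and the flags follow the Platanus usage notes as we understand them, so please check them against your installed version.

```python
import shlex
from typing import List


def platanus_commands(prefix: str, pe: List[str], mp: List[str],
                      threads: int = 8) -> List[List[str]]:
    """Build (but do not run) the assemble, scaffold and gap_close commands."""
    # Paired-end reads are passed as an inward-facing library (-IP1),
    # mate-pair reads as an outward-facing library (-OP2).
    libs = ["-IP1", *pe, "-OP2", *mp]
    t = ["-t", str(threads)]
    return [
        # Contig assembly uses the paired-end reads only.
        ["platanus", "assemble", "-o", prefix, "-f", *pe, *t],
        # Scaffolding uses both libraries.
        ["platanus", "scaffold", "-o", prefix, "-c", f"{prefix}_contig.fa",
         "-b", f"{prefix}_contigBubble.fa", *libs, *t],
        # Gap closing also uses both libraries.
        ["platanus", "gap_close", "-o", prefix, "-c", f"{prefix}_scaffold.fa",
         *libs, *t],
    ]


if __name__ == "__main__":
    # Hypothetical input files; print the commands instead of executing them.
    for cmd in platanus_commands("sample", ["pe_1.fq", "pe_2.fq"],
                                 ["mp_1.fq", "mp_2.fq"]):
        print(shlex.join(cmd))
```

Printing the commands rather than running them makes it easy to review the three steps before launching a long assembly.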

## Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection

## Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection.git
```


# 2. RAD-seq data processing
RAD-seq sequences were demultiplexed and filtered using the process_radtags program in STACKS v2.2. Sequences were trimmed to a final length of 139 bp because read quality dropped towards the end of the reads. Taking advantage of the paired-end information, clone_filter was used to remove read pairs that match exactly, as the vast majority of these are expected to be PCR clones. Paired-end reads were subsequently aligned with BWA to the reference genomes of _M. surmuletus_, _D. sargus_, and _S. cabrilla_, thereby improving the reliability of stack building. Aligned reads were sorted using SAMTOOLS 1.9, and loci were built with gstacks, which also provides genotype calls.
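
To make the trimming and clone-filtering ideas concrete, here is a minimal sketch in plain Python. It is not how STACKS implements these steps: the `ReadPair` type and both helper names are hypothetical, reads are reduced to bare sequence strings, and keeping the first copy of each identical pair is an assumption of this sketch.

```python
from typing import Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]  # (forward sequence, reverse sequence)


def trim(pair: ReadPair, length: int = 139) -> ReadPair:
    """Truncate both reads of a pair to at most `length` bases."""
    return pair[0][:length], pair[1][:length]


def drop_clones(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each distinct read pair once, in order of first appearance."""
    seen = set()
    for pair in pairs:
        # Two pairs are treated as clones only if BOTH reads match exactly.
        if pair not in seen:
            seen.add(pair)
            yield pair


if __name__ == "__main__":
    pairs = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC")]
    print(list(drop_clones(pairs)))
```

Comparing the forward and reverse reads together is what makes paired-end data useful here: two fragments that share a forward read but differ in their reverse read are kept as distinct.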

### Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2


### Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2.git
```

# 3. SNPs statistics

To retain only high-quality biallelic SNPs for population genetics, called genotypes were further filtered with the populations program and vcftools v0.1.16. Only one randomly selected SNP was retained per locus, and a locus was retained only if it was present in at least 85% of individuals and had a minor allele frequency (MAF) of at least 1%. To reduce linkage among markers, only one locus was retained from each pair of loci that were closer than 5000 bp or that had an r² value > 0.8. Finally, individuals with > 30% missing data were filtered out.
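
The presence, MAF, and distance thresholds can be illustrated with a short sketch. This is not the populations or vcftools implementation: the function names and the `Locus` type are hypothetical, the distance rule is resolved greedily from left to right (an assumption, since the text does not say which locus of a close pair is kept), and the r² filter and the random choice of one SNP per locus are not covered.

```python
from typing import Dict, List, Tuple

Locus = Tuple[str, int]  # (scaffold name, position in bp)


def passes_thresholds(n_genotyped: int, n_total: int, maf: float,
                      min_presence: float = 0.85, min_maf: float = 0.01) -> bool:
    """True if a locus is genotyped in enough individuals and is not too rare."""
    return n_genotyped / n_total >= min_presence and maf >= min_maf


def thin_by_distance(loci: List[Locus], min_dist: int = 5000) -> List[Locus]:
    """Keep a locus only if it lies at least `min_dist` bp after the
    previously kept locus on the same scaffold."""
    kept: List[Locus] = []
    last_pos: Dict[str, int] = {}
    for scaffold, pos in sorted(loci):
        if scaffold not in last_pos or pos - last_pos[scaffold] >= min_dist:
            kept.append((scaffold, pos))
            last_pos[scaffold] = pos
    return kept


if __name__ == "__main__":
    loci = [("s1", 100), ("s1", 3000), ("s1", 5100), ("s1", 9000), ("s2", 200)]
    print(thin_by_distance(loci))
```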

For each species, we calculated the number of SNPs, the distance between consecutive loci (in bp), and the number of SNPs located in coding regions.
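
As a rough illustration of these statistics, the sketch below computes the distances between consecutive loci on each scaffold and counts SNPs that fall in coding regions. The function names, the input layout, and the use of inclusive interval ends are assumptions made for this example; the scripts in the repository linked below are the reference.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def consecutive_distances(loci: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Distances (bp) between neighbouring loci, computed per scaffold."""
    by_scaffold: Dict[str, List[int]] = defaultdict(list)
    for scaffold, pos in loci:
        by_scaffold[scaffold].append(pos)
    distances: Dict[str, List[int]] = {}
    for scaffold, positions in by_scaffold.items():
        positions.sort()
        distances[scaffold] = [b - a for a, b in zip(positions, positions[1:])]
    return distances


def count_in_coding(snps: List[Tuple[str, int]],
                    coding: List[Tuple[str, int, int]]) -> int:
    """Count SNPs inside any coding interval (scaffold, start, end), ends inclusive."""
    return sum(
        any(s == scaffold and start <= pos <= end for s, start, end in coding)
        for scaffold, pos in snps
    )


if __name__ == "__main__":
    loci = [("s1", 100), ("s1", 5100), ("s2", 200), ("s1", 12100)]
    print(consecutive_distances(loci))
    print(count_in_coding(loci, [("s1", 5000, 6000)]))
```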

### Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics


### Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics.git
```

# 4. Mitochondrial genomes assembly
Mitochondrial genomes were assembled and annotated using MitoZ. Five million sequences were randomly selected as a subset of the full paired-end sequence set. Mitochondrial sequences were then identified from this subset using a ranking method based on a Hidden Markov Model profile of known mitochondrial sequences from 2413 chordate species, and were used to assemble the mitochondrial genome. Finally, mitochondrial assemblies were annotated using BLAST-family alignments against known protein-coding genes, transfer RNA genes, and rRNA genes.
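
The random subsetting step can be pictured with a generic reservoir-sampling sketch, which draws a fixed-size uniform sample in a single pass without loading all reads into memory. This is one possible way to subsample and is not claimed to be what MitoZ does internally; the function name and the seed parameter are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of up to `k` items from a stream."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    # Toy stream standing in for read pairs; the paper used k = 5,000,000.
    reads = (f"read_{n}" for n in range(1000))
    print(reservoir_sample(reads, k=5))
```

For paired-end data, each item in the stream would be a whole read pair so that mates stay together.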


### Source code

All source code is available at https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome

### Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome.git
```

