# Source codes for the paper: "Genomic resources for Mediterranean fishes"

#### Pierre-Edouard Guerin*, Katharina Fietz*, Elena Trofimenko*, Véronique Arnal, Montserrat Torres-Oliva, Stéphane Lobréaux, Angel Pérez-Ruzafa, Stephanie Manel, Oscar Puebla

2017-2019

Submitted to Genomics, 2020

_______________________________________________________________________________

# Table of contents

1. [Nuclear Genomes assembly](#1-nuclear-genomes-assembly)
2. [RAD-seq data processing](#2-rad-seq-data-processing)
3. [SNPs statistics](#3-snps-statistics)
4. [Mitochondrial genomes assembly](#4-mitochondrial-genomes-assembly)

# 1. Nuclear Genomes assembly

Nuclear genomes were assembled using the Platanus assembler. Platanus was selected because it performs well on highly heterozygous genomes. The paired-end libraries were used to assemble reads into contigs, and both the paired-end and mate-pair libraries were used for scaffolding and gap closing.

### Source codes

All source codes are available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection

### Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection.git
```

# 2. RAD-seq data processing

RAD-seq sequences were demultiplexed and filtered using the process_radtags pipeline in STACKS v2.2. Sequences were trimmed to a final length of 139 bp because read quality drops towards the end of the read. Taking advantage of paired-end information, clone_filter was used to remove pairs of paired-end reads that match exactly, as the vast majority of these are expected to be PCR clones. Paired-end read sequences were subsequently aligned with BWA to the reference genomes of _M. surmuletus_, _D. sargus_, and _S. cabrilla_, thereby improving the reliability of stacks building. Aligned reads were sorted using SAMTOOLS 1.9, and loci were built with gstacks, which also provides genotype calls.
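
To make the clone-removal idea above concrete, here is a minimal Python sketch. It is not the STACKS implementation: clone_filter works on FASTQ files, whereas this toy example works on in-memory tuples, and the function name `filter_clones` and the example sequences are hypothetical. It only illustrates the rule that a read pair is discarded when both its forward and reverse sequences exactly match a pair already seen.

```python
from typing import Iterable, Iterator, Tuple

# A read pair is modelled as (forward sequence, reverse sequence).
ReadPair = Tuple[str, str]


def filter_clones(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each distinct (forward, reverse) pair once, in first-seen order."""
    seen = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


# Hypothetical toy data, for illustration only.
pairs = [
    ("ACGTAC", "TTGACA"),
    ("ACGTAC", "TTGACA"),  # exact duplicate of the first pair: treated as a PCR clone
    ("ACGTAC", "GGGACA"),  # same forward read, different reverse read: kept
    ("CCGTAC", "TTGACA"),
]
unique_pairs = list(filter_clones(pairs))
print(len(pairs) - len(unique_pairs), "clone pair(s) removed")
```

A pair that shares only one of its two reads with an earlier pair is kept, which is why paired-end information makes this filter more conservative than deduplicating single reads.
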
### Source codes

All source codes are available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2

### Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2.git
```

# 3. SNPs statistics

To retain only high-quality biallelic SNPs for population genetics, called genotypes were further filtered with the populations pipeline and vcftools v0.1.16. Only one randomly selected SNP was retained per locus, and a locus was retained only if it was present in at least 85% of individuals and had a minor allele frequency (MAF) of at least 1%. To reduce linkage among markers, only one locus was retained from every pair of loci that were closer than 5000 bp or that had an r² value > 0.8. Finally, individuals with more than 30% missing data were removed.

For each species, we calculated the number of SNPs, the distance between consecutive loci (in bp), and the number of SNPs located in coding regions.

### Source codes

All source codes are available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics

### Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics.git
```

# 4. Mitochondrial genomes assembly

Mitochondrial genomes were assembled and annotated using MitoZ. A random subset of five million sequences was drawn from the full paired-end sequence set. Mitochondrial sequences were then identified from this subset using a ranking method based on a Hidden Markov Model profile of known mitochondrial sequences from 2413 chordate species. These mitochondrial sequences were then used to assemble the mitochondrial genome. Finally, mitochondrial assemblies were annotated using BLAST-family alignments against known protein-coding genes, transfer RNA genes and rRNA genes.

### Source codes

All source codes are available at https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome

### Clone repository

```
git clone https://gitlab.mbb.univ-montp2.fr/intrapop/assemble_mitogenome.git
```
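
As an illustration of the random subsampling step described in this section, the sketch below draws a fixed-size random subset from a stream of read pairs using reservoir sampling. This is an assumption made for illustration, not a description of how MitoZ or the repository above performs the subsampling. The function name `reservoir_sample`, the seed, and the toy read identifiers are all hypothetical, and the subset size is kept small here; the text above uses five million.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return k items drawn uniformly at random from a stream of unknown length.

    If the stream holds fewer than k items, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical toy data: each element keeps the two mates of a pair together,
# so a sampled forward read never loses its reverse read.
read_pairs = [(f"read{i}/1", f"read{i}/2") for i in range(1000)]
subset = reservoir_sample(read_pairs, k=50, seed=42)
print(len(subset), "read pairs sampled")
```

Sampling whole pairs rather than individual reads keeps the subset usable for paired-end assembly, and a single pass over the stream avoids loading the full read set into memory.
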