Commit d88113a6 authored by peguerin

readme update

parent 50c0be0b
@@ -20,23 +20,31 @@ _______________________________________________________________________________
# Table of contents
1. [Nuclear Genomes assembly](#1-nuclear-genomes-assembly)
2. [RAD-seq data processing](#2-rad-seq-data-processing)
3. [SNPs statistics](#3-snps-statistics)
4. [Mitochondrial genomes assembly](#4-mitochondrial-genomes-assembly)
-## Nuclear Genomes assembly
+# 1. Nuclear Genomes assembly
Nuclear genomes were assembled using the Platanus assembler. Platanus was selected due to its excellent performance with highly heterozygous genomes. The paired-end libraries were used to assemble reads into contigs, and both the paired-end and mate-pair libraries were used for scaffolding and gap closing.
-### Source codes
+## Source codes
All source codes are available at https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection
-### Clone repository
+## Clone repository
```
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/genome_assemblies_collection.git
```
-## RAD-seq data processing
+# 2. RAD-seq data processing
RAD-seq sequences were demultiplexed and filtered using the process_radtags pipeline in STACKS v2.2. Sequences were trimmed to a final length of 139 bp because read quality dropped towards the end of the reads. Taking advantage of paired-end information, clone_filter was used to remove pairs of paired-end reads that match exactly, as the vast majority of these are expected to be PCR clones. Paired-end reads were subsequently aligned with BWA to the reference genomes of _M. surmuletus_, _D. sargus_, and _S. cabrilla_, which improves the reliability of stack building. Aligned reads were sorted using SAMTOOLS 1.9, and loci were built with gstacks, which also provides genotype calls.
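
The trimming and clone-removal steps described above can be illustrated with a minimal Python sketch. This is not the STACKS implementation: the function names `trim` and `drop_exact_clones` are hypothetical, reads are represented as plain strings rather than FASTQ records, and keeping the first copy of each duplicated pair is an assumption of this sketch.

```python
from typing import Iterable, Iterator, Tuple

# A read pair is modelled as (forward sequence, reverse sequence).
ReadPair = Tuple[str, str]

TRIM_LENGTH = 139  # final read length stated in the text


def trim(seq: str, length: int = TRIM_LENGTH) -> str:
    """Truncate a read to at most `length` bases."""
    return seq[:length]


def drop_exact_clones(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each distinct (forward, reverse) pair once, in input order.

    Pairs whose two sequences both match an earlier pair exactly are
    treated as putative PCR clones and skipped (assumption: the first
    copy is the one that is kept).
    """
    seen = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


if __name__ == "__main__":
    raw = [
        ("ACGT" * 50, "TTGA" * 50),
        ("ACGT" * 50, "TTGA" * 50),  # exact duplicate of the first pair
        ("ACGT" * 50, "CCCC" * 50),
    ]
    trimmed = [(trim(fwd), trim(rev)) for fwd, rev in raw]
    unique = list(drop_exact_clones(trimmed))
    print(len(unique), "unique pairs of length", len(unique[0][0]))
```

A pair is only dropped when both mates match, which is why the third pair survives even though its forward read is identical to the first.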
### Source codes
@@ -50,7 +58,7 @@ All source codes are available at https://gitlab.mbb.univ-montp2.fr/reservebenef
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2.git
```
-## SNPs statistics
+# 3. SNPs statistics
To retain only high-quality biallelic SNPs for population genetics, called genotypes were further filtered with the populations pipeline and vcftools v0.1.16. Only one randomly selected SNP was retained per locus, and a locus was retained only if it was present in at least 85% of individuals and had a minor allele frequency (MAF) of at least 1%. To reduce linkage among markers, only one locus was retained from each pair of loci that were closer than 5000 bp or that had an r² value > 0.8. Finally, individuals with > 30% missing data were also filtered out.
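
To make the thresholds above concrete, here is a minimal Python sketch of the filtering logic. It is an illustration only and does not reproduce the populations or vcftools code: all function names are hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, the distance thinning is a simple greedy pass over one scaffold, and the r² criterion is left out.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1 or 2 alternate alleles; None = missing


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of individuals with a called genotype at this locus."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """MAF computed over called genotypes only (0.0 if none are called)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_locus_filters(genotypes: List[Genotype],
                         min_call_rate: float = 0.85,
                         min_maf: float = 0.01) -> bool:
    """Locus present in >= 85% of individuals and MAF >= 1%."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


def thin_by_distance(positions: List[int], min_distance: int = 5000) -> List[int]:
    """Greedy thinning on one scaffold: keep a position only if it lies
    at least `min_distance` bp after the previously kept position."""
    kept: List[int] = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_distance:
            kept.append(pos)
    return kept


def individuals_to_drop(genotypes_by_individual: Dict[str, List[Genotype]],
                        max_missing: float = 0.30) -> List[str]:
    """Names of individuals whose missing-data fraction exceeds 30%."""
    return [name for name, genos in genotypes_by_individual.items()
            if 1.0 - call_rate(genos) > max_missing]
```

For example, `thin_by_distance([100, 3000, 5100, 9000, 10100])` keeps positions 100, 5100 and 10100, because each is at least 5000 bp from the previously kept one.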
@@ -67,7 +75,7 @@ All source codes are available at https://gitlab.mbb.univ-montp2.fr/reservebenef
git clone https://gitlab.mbb.univ-montp2.fr/reservebenefit/snps_statistics.git
```
-## Mitochondrial genomes assembly
+# 4. Mitochondrial genomes assembly
Mitochondrial genomes were assembled and annotated using MitoZ. Five million sequences were randomly selected as a subset of the full paired-end sequence set. Mitochondrial sequences were then identified from this subset using a ranking method based on a Hidden Markov Model profile of known mitochondrial sequences from 2413 chordate species. These sequences were used to assemble the mitochondrial genome. Finally, mitochondrial assemblies were annotated using BLAST family alignments against known protein-coding genes, transfer RNA genes and rRNA genes.
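
The random subsetting step can be sketched in Python with reservoir sampling, which draws a fixed-size uniform sample from a stream without loading every read into memory. The text does not say which tool or method performed the subsampling, so this is only one plausible approach; the function name `reservoir_sample` and the `seed` parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Return up to `k` items drawn uniformly at random from `items`.

    Classic reservoir sampling (Algorithm R): the first k items fill
    the reservoir, then item i (0-based) replaces a random slot with
    probability k / (i + 1).
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Stand-in for read pairs; the text uses a subset of 5,000,000.
    subset = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(len(subset), "items sampled")
```

To keep mates together, each item passed to the sampler would be a whole read pair rather than a single read.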