Commit 1bd98186 authored by RomainFeron

Quick readme update

parent d2c659db
[![GitHub release (latest by date)](https://img.shields.io/github/v/release/SexGenomicsToolkit/RADSex?color=lightorange)](https://github.com/SexGenomicsToolkit/RADSex/releases)
[![Conda (channel only)](https://img.shields.io/conda/vn/bioconda/radsex?color=lightorange)](https://bioconda.github.io/recipes/radsex/README.html)
[![DOI](https://zenodo.org/badge/86720601.svg)](https://zenodo.org/badge/latestdoi/86720601)
# RADSex
Current pre-release : 0.2.0
The RADSex pipeline is **currently under development** and has not been officially released yet.
Missing features are been implemented, some bugs are to be expected in this current development version, and parameters are subject to change.
Please contact me by email or on Github, or open an issue if you encounter bugs or would like to discuss a feature !
# radsex
## Overview
RADSex is a software package for the analysis of sex-determination using RAD-Sequencing data.
The `process` command generates a data structure summarizing a set of demultiplexed RAD reads,
and other commands use this data structure to:
The `radsex` software is part of RADSex, a computational workflow for the analysis of sex-determination using RAD-Sequencing data. This workflow contains the software `radsex` and the R package `sgtr`; a Snakemake implementation of the workflow is available [here](https://github.com/SexGenomicsToolkit/RADSex-workflow).
The first step of the RADSex workflow is to use `radsex` to generate a summary file for a set of demultiplexed RAD reads and use this file to:
- infer the type of sex-determination system
- identify sex-biased markers
- align markers to a genome and identify genomic regions differentiated between sexes
- compute marker depth statistics
RADSex results can be visualized with the `radsex-vis` R package available here: https://github.com/RomainFeron/RADSex-vis.
Results from `radsex` can be visualized with the `sgtr` R package available [here](https://github.com/SexGenomicsToolkit/sgtr).
Although RADSex has been developed specifically to study sex-determination, it was designed to be flexibla and can be used to compare any two populations.
Although RADSex was developed specifically to study sex-determination, it was designed to be flexible and can be used to compare two groups for any binary trait.
This pipeline was developed in the [LPGP](https://www6.rennes.inra.fr/lpgp/) lab from INRA, Rennes, France for the PhyloSex project, which investigates sex determining factors in a wide range of fish species.
The RADSex computational workflow was developed in the [LPGP](https://www6.rennes.inra.fr/lpgp/) lab from INRA, Rennes, France for the PhyloSex project, which investigates sex determining factors in a wide range of fish species.
## Documentation
This README file includes a basic installation guide and a quick start section. The full documentation for RADSex, including a complete example walktrhough, is available [here](https://romainferon.github.io/RADSex/).
This README file includes a basic installation guide and a quick start section. The full documentation for RADSex, including a complete example walkthrough, is available [here](https://sexgenomicstoolkit.github.io/html/radsex/introduction.html).
## Installation
......@@ -38,26 +34,26 @@ This README file includes a basic installation guide and a quick start section.
### Install the latest official release
- Download the [latest release](https://github.com/RomainFeron/RadSex/releases)
- Download the [latest release](https://github.com/SexGenomicsToolkit/radsex/releases)
- Unzip the archive
- Navigate to the `RADSex` directory
- Navigate to the `radsex` directory
- Run `make`
The compiled `radsex` binary will be located in **RADSex/bin/**.
The compiled `radsex` binary will be located in **radsex/bin/**.
### Install the latest stable development version
```bash
git clone https://github.com/RomainFeron/RADSex.git
cd RADSex
git clone https://github.com/SexGenomicsToolkit/radsex.git
cd radsex
make
```
The compiled `radsex` binary will be located in **RADSex/bin/**.
The compiled `radsex` binary will be located in **radsex/bin/**.
### Install RADSex with Conda
### Install radsex with Conda
RADSex is available in [Bioconda](https://bioconda.github.io/recipes/radsex/README.html?#recipe-Recipe%20'radsex'). To install RADSex with Conda, run the following command:
All versions of radsex are available in [Bioconda](https://bioconda.github.io/recipes/radsex/README.html?#recipe-Recipe%20'radsex'). To install the latest radsex release with Conda, run the following command:
```bash
conda install -c bioconda radsex
......@@ -67,18 +63,16 @@ conda install -c bioconda radsex
### Preparing the data
Before running the pipeline, you should prepare the following elements:
Before running the workflow, you should prepare the following elements:
- A **set of demultiplexed reads**. The current version of RADSex does not implement demultiplexing;
raw sequencing reads can be demultiplexed using [Stacks](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php)
or [pyRAD](http://nbviewer.jupyter.org/gist/dereneaton/af9548ea0e94bff99aa0/pyRAD_v.3.0.ipynb#The-seven-steps-described).
- A **population map** (popmap): a tabulated file with individual IDs in the first column and sex (or group) in the second column. Individual IDs in the popmap must be the same as the names of the demultiplexed reads files (*e.g.* 'individual1' for the reads file 'individual1.fq.gz')
- A **group info file** (popmap): a tab-separated file with individual IDs in the first column and sex (or group) in the second column. Individual IDs in the popmap must match the names of the demultiplexed reads files (*e.g.* 'individual_1' for the reads file 'individual_1.fq.gz').
- To align the markers to a genome: the **genome** sequence in a FASTA file.
Note that when visualizing `map` results with `radsex-vis`, linkage groups / chromosomes are automatically inferred from scaffold names in the reference sequence
if their name starts with *LG*, *CHR*, or *NC* (case unsensitive).
If chromosomes are named differently in the reference genome, you can use a tabulated file with contig ID in the first column and corresponding chromosome name in the second column (see the doc for details)
Note that when visualizing `map` results with `sgtr`, linkage groups / chromosomes are automatically inferred from scaffold names in the genome if their names start with *LG*, *CHR*, or *NC* (case-insensitive). If chromosomes are named differently in the genome, you can use a tab-separated file with the contig ID in the first column and the corresponding chromosome name in the second column (see the documentation for details).
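
Because individual IDs in the popmap have to match the names of the reads files, it can help to check this before running anything. The snippet below is only a sketch: the scratch directory, the file names, and the `missing` variable are hypothetical and exist purely for illustration; it is not part of `radsex`.

```shell
# Hypothetical layout for illustration only: a scratch directory holding
# placeholder read files and a popmap. Replace with your own paths.
workdir=$(mktemp -d)
reads_dir="$workdir/reads"
popmap="$workdir/popmap.tsv"
mkdir -p "$reads_dir"

# Empty files standing in for real demultiplexed reads.
touch "$reads_dir/individual_1.fq.gz" "$reads_dir/individual_2.fq.gz"

# Tab-separated popmap: individual ID, then sex (or group).
printf 'individual_1\tM\nindividual_2\tF\nindividual_3\tF\n' > "$popmap"

# List every popmap ID that has no reads file named "<ID>.<extension>".
missing=$(
    while read -r id group; do
        found=no
        for f in "$reads_dir/$id".*; do
            if [ -e "$f" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then echo "$id"; fi
    done < "$popmap"
)

echo "IDs without a matching reads file: ${missing:-none}"
rm -rf "$workdir"
```

Here `individual_3` is listed in the popmap but has no reads file, so it is the only ID reported.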
### Computing the marker depths table
......@@ -102,8 +96,7 @@ radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv
In this example, `--markers-table` is the table generated with `process` and the distribution of markers between males and females will be saved to **distribution.tsv**. The sex of each individual in the population is given by **popmap.tsv**. Groups of individuals to compare (as defined in the popmap) are specified manually with the parameter `--groups`. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
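
To make the minimum depth rule concrete, the sketch below counts, for each marker, the individuals in which it would be considered present at a threshold of 5. The table is a simplified, hypothetical stand-in (one depth column per individual) and is not necessarily the exact layout written by `process`; the counting itself is done with plain `awk`, not with `radsex`.

```shell
min_depth=5

# Hypothetical, simplified table: marker ID followed by one depth column
# per individual. The real markers table may be laid out differently.
table=$(printf 'marker\tmale_1\tmale_2\tfemale_1\tfemale_2\nm1\t12\t7\t0\t4\nm2\t5\t3\t9\t6\n')

# For each marker, count the individuals whose depth is at least min_depth.
presence=$(printf '%s\n' "$table" | awk -F'\t' -v min="$min_depth" '
    NR > 1 {
        n = 0
        for (i = 2; i <= NF; i++) if ($i >= min) n++
        print $1 "\t" n
    }')

printf '%s\n' "$presence"
```

With these made-up depths, `m1` counts as present in 2 individuals and `m2` in 3: a depth of exactly 5 counts as present, while depths of 3 or 4 do not.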
The resulting distribution can be visualized with the `plot_sex_distribution()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a tile plot of marker counts with number of males on the x-axis and number of females on the y-axis.
The resulting distribution can be visualized with the `radsex_distrib()` function of [sgtr](https://github.com/SexGenomicsToolkit/sgtr), which generates a tile plot of marker counts with number of males on the x-axis and number of females on the y-axis.
### Extracting markers significantly associated with sex
......@@ -113,11 +106,11 @@ Markers significantly associated with sex are obtained with the `signif` command
radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 --groups M,F [ --output-fasta ]
```
In this example, `--markers-table` is the table generated with `process` and markers significantly associated with sex are saved to **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section). Groups of individuals to compare (as defined in popmap) are specified manually with the parameter `--groups`. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
In this example, `--markers-table` is the table generated with `process` and markers significantly associated with sex are saved to **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv**. Groups of individuals to compare (as defined in popmap) are specified manually with the parameter `--groups`. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
By default, the `signif` command generates an output file in the same format as the markers depth table. Markers can also be exported to a FASTA file using the parameter `--output-fasta`.
The markers table generated by `signif` can be visualized with the `plot_depth()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a heatmap showing the depth of each marker in each individual.
The markers table generated by `signif` can be visualized with the `radsex_markers_depth()` function of [sgtr](https://github.com/SexGenomicsToolkit/sgtr), which generates a heatmap showing the depth of each marker in each individual.
### Aligning markers to a genome
......@@ -132,9 +125,9 @@ In this example, `--markers-file` is the markers depth table generated with `pro
The parameter `--min-quality` specifies the minimum mapping quality (as defined in [BWA](http://bio-bwa.sourceforge.net/bwa.shtml)) to consider a marker properly aligned and is set to 20 in this example. The parameter `--min-frequency` specifies the minimum frequency of a marker in the population to retain this marker and is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of the population are aligned to the genome.
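
As a rough illustration of the frequency filter, the sketch below keeps only markers present in at least 10% of individuals. The population size, the marker IDs, and the per-marker counts are all hypothetical, and the filter is written in `awk` to show the arithmetic only; it does not reproduce how `map` is implemented.

```shell
min_frequency=0.1
n_individuals=20

# Hypothetical counts: marker ID, then the number of individuals in which
# the marker is present (depth at or above the minimum depth).
counts=$(printf 'm1\t1\nm2\t4\nm3\t12\n')

# Keep markers whose frequency in the population reaches min_frequency.
retained=$(printf '%s\n' "$counts" | awk -F'\t' -v min="$min_frequency" -v n="$n_individuals" '
    $2 / n >= min { print $1 }')

printf '%s\n' "$retained"
```

In this example `m1` is present in 1 of 20 individuals (5%) and is dropped, while `m2` (20%) and `m3` (60%) are retained.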
Alignment results from `map` can be visualized with the `plot_genome()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a circular plot showing bias and association with sex for each marker aligned to the genome.
Alignment results from `map` can be visualized with the `radsex_map_circos()` function of [sgtr](https://github.com/SexGenomicsToolkit/sgtr), which generates a circular plot showing bias and association with sex for each marker aligned to the genome.
Alignment results for a specific contig can be visualized with the `plot_contig()` function to show the same metrics for a single contig.
Alignment results for a specific contig can be visualized with the `radsex_map_region()` function to show the same metrics for a single contig.
## LICENSE
......