[![DOI](https://zenodo.org/badge/86720601.svg)](https://zenodo.org/badge/latestdoi/86720601)

# RADSex

Current pre-release: 0.2.0

The RADSex pipeline is **currently under development** and has not been officially released yet.
Missing features are being implemented, some bugs are to be expected in this development version, and parameters are subject to change.
Please contact me by email or on GitHub, or open an issue if you encounter bugs or would like to discuss a feature!

## Overview

RADSex is a software package for the analysis of sex-determination using RAD-Sequencing data.

The `process` command generates a data structure summarizing a set of demultiplexed RAD reads,
and other commands use this data structure to:

- infer the type of sex-determination system
- identify sex-biased markers
- align markers to a genome and identify genomic regions differentiated between sexes
- compute marker depth statistics

RADSex results can be visualized with the `radsex-vis` R package available here: https://github.com/RomainFeron/RADSex-vis.

Although RADSex has been developed specifically to study sex-determination, it was designed to be flexible and can be used to compare any two populations.

This pipeline was developed in the [LPGP](https://www6.rennes.inra.fr/lpgp/) lab from INRA, Rennes, France for the PhyloSex project, which investigates sex-determining factors in a wide range of fish species.

## Documentation

This README file includes a basic installation guide and a quick start section. The full documentation for RADSex, including a complete example walkthrough, is available [here](https://romainferon.github.io/RADSex/).

## Installation

### Requirements

- A C++11 compliant compiler (GCC >= 4.8.1, Clang >= 3.3)
- The zlib library (usually installed on Linux by default)

### Install the latest official release

- Download the [latest release](https://github.com/RomainFeron/RadSex/releases)
- Unzip the archive
- Navigate to the `RADSex` directory
- Run `make`

The compiled `radsex` binary will be located in **RADSex/bin/**.

### Install the latest stable development version

```bash
git clone https://github.com/RomainFeron/RADSex.git
cd RADSex
make
```

The compiled `radsex` binary will be located in **RADSex/bin/**.

### Install RADSex with Conda

RADSex is available in [Bioconda](https://bioconda.github.io/recipes/radsex/README.html?#recipe-Recipe%20'radsex'). To install RADSex with Conda, run the following command:

```bash
conda install -c bioconda radsex
```

## Quick start

### Preparing the data

Before running the pipeline, you should prepare the following elements:

- A **set of demultiplexed reads**. The current version of RADSex does not implement demultiplexing;
  raw sequencing reads can be demultiplexed using [Stacks](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php)
  or [pyRAD](http://nbviewer.jupyter.org/gist/dereneaton/af9548ea0e94bff99aa0/pyRAD_v.3.0.ipynb#The-seven-steps-described).

- A **population map** (popmap): a tab-separated file with individual IDs in the first column and sex (or group) in the second column. Individual IDs in the popmap must match the names of the demultiplexed read files without their extension (*e.g.* 'individual1' for the reads file 'individual1.fq.gz').

- To align the markers to a genome: the **genome** sequence in a FASTA file.
  Note that when visualizing `map` results with `radsex-vis`, linkage groups / chromosomes are automatically inferred from scaffold names in the reference sequence
  if their name starts with *LG*, *CHR*, or *NC* (case-insensitive).
  If chromosomes are named differently in the reference genome, you can use a tab-separated file with contig IDs in the first column and the corresponding chromosome names in the second column (see the documentation for details).
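
As an illustration, the shell sketch below writes a small popmap and checks that every individual listed in it has a matching reads file. The individual IDs, the `M`/`F` group labels, the **./samples** directory and the `.fq.gz` extension are assumptions for this example; the empty placeholder files are created only so that the snippet runs on its own.

```bash
# Hypothetical example: four individuals, two per group (M and F).
mkdir -p samples
printf 'individual1\tM\nindividual2\tM\nindividual3\tF\nindividual4\tF\n' > popmap.tsv

# Empty placeholder reads files, created only so that the check below runs standalone.
for id in individual1 individual2 individual3 individual4; do
    touch "samples/${id}.fq.gz"
done

# Check that each ID in the first popmap column has a reads file named <ID>.fq.gz.
tab=$(printf '\t')
missing=0
while IFS="$tab" read -r id group; do
    if [ ! -e "samples/${id}.fq.gz" ]; then
        echo "No reads file found for ${id} (group ${group})" >&2
        missing=$((missing + 1))
    fi
done < popmap.tsv
echo "Individuals without a reads file: ${missing}"
```

Running such a check before `process` catches ID mismatches between the popmap and the reads files early.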

### Computing the marker depths table

The first step of RADSex is to create a table of marker depths for the entire dataset using the `process` command:

```bash
radsex process --input-dir ./samples --output-file markers_table.tsv --threads 16 --min-depth 1
```

In this example, demultiplexed reads are located in **./samples** and the markers table generated by `process` is saved to **markers_table.tsv**. The parameter `--threads` specifies the number of threads to use, and `--min-depth` specifies the minimum depth to consider a marker present in an individual: markers that do not reach this depth in at least one individual are not retained in the markers table.
It is advised to keep the minimum depth to the default value of 1 for this step, as it can be adjusted for each analysis later.
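
To make the meaning of `--min-depth` concrete, here is a minimal awk sketch applied to a toy table. It assumes a tab-separated layout with a header line, the marker ID and sequence in the first two columns, and one depth column per individual; this layout is a hypothetical simplification and not necessarily the exact format written by `process`. A marker is kept only if at least one individual has a depth of at least `min_depth`.

```bash
# Toy markers table with a hypothetical layout: header line, then marker ID,
# sequence, and one depth column per individual (tab-separated).
printf 'id\tsequence\tind1\tind2\tind3\n'  > toy_table.tsv
printf '0\tACGTACGT\t0\t3\t12\n'          >> toy_table.tsv
printf '1\tTTGACCAA\t1\t0\t0\n'           >> toy_table.tsv
printf '2\tGGCATGCA\t0\t0\t0\n'           >> toy_table.tsv

min_depth=1
# Keep the header, then every marker whose depth is >= min_depth in at least one individual.
awk -F'\t' -v min="$min_depth" '
    NR == 1 { print; next }
    {
        for (i = 3; i <= NF; i++) {
            if ($i + 0 >= min + 0) { print; next }
        }
    }
' toy_table.tsv > toy_filtered.tsv
cat toy_filtered.tsv
```

With `min_depth=1`, markers `0` and `1` are kept and marker `2` is dropped. With `min_depth=5`, only marker `0` would be kept: a strict threshold at this stage removes markers for good, whereas each later command can apply its own `--min-depth`.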


### Computing the distribution of markers between sexes

The `distrib` command computes the distribution of markers between males and females from a marker depths table:

```bash
radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap popmap.tsv --min-depth 5 --groups M,F
```

In this example, `--markers-table` is the table generated with `process` and the distribution of markers between males and females will be saved to **distribution.tsv**. The sex of each individual in the population is given by **popmap.tsv**. Groups of individuals to compare (as defined in the popmap) are specified manually with the parameter `--groups`. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.

The resulting distribution can be visualized with the `plot_sex_distribution()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a tile plot of marker counts with number of males on the x-axis and number of females on the y-axis.
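
The tallying behind this distribution can be sketched in a few lines of awk: for each marker, count the males and females in which its depth reaches the minimum depth, then count how many markers share each (males, females) combination. This is an illustrative sketch, not the RADSex implementation; the toy markers table layout, the file names and the hard-coded `M`/`F` labels (mirroring `--groups M,F`) are assumptions.

```bash
# Hypothetical popmap and toy markers table (assumed layout: header line, then
# marker ID, sequence, and one depth column per individual, tab-separated).
printf 'ind1\tM\nind2\tM\nind3\tF\nind4\tF\n' > toy_popmap.tsv
printf 'id\tsequence\tind1\tind2\tind3\tind4\n' > toy_markers.tsv
printf '0\tACGT\t8\t6\t0\t0\n' >> toy_markers.tsv
printf '1\tCCGA\t7\t9\t5\t6\n' >> toy_markers.tsv
printf '2\tTTAG\t9\t0\t0\t2\n' >> toy_markers.tsv
printf '3\tGACT\t5\t5\t0\t1\n' >> toy_markers.tsv

min_depth=5
{
    printf 'males\tfemales\tmarkers\n'
    awk -F'\t' -v min="$min_depth" '
        # First file (popmap): remember the group of each individual.
        NR == FNR { group[$1] = $2; next }
        # Header of the markers table: map each depth column to an individual ID.
        FNR == 1 { for (i = 3; i <= NF; i++) name[i] = $i; next }
        # Each marker: count males and females in which it is present.
        {
            m = 0; f = 0
            for (i = 3; i <= NF; i++) {
                if ($i + 0 >= min + 0) {
                    if (group[name[i]] == "M") m++
                    else if (group[name[i]] == "F") f++
                }
            }
            count[m "\t" f]++
        }
        # Number of markers for each (males, females) combination.
        END { for (k in count) print k "\t" count[k] }
    ' toy_popmap.tsv toy_markers.tsv | sort -k1,1n -k2,2n
} > toy_distribution.tsv
cat toy_distribution.tsv
```

In this toy dataset, markers `0` and `3` both land in the (2 males, 0 females) cell, i.e. they are present in all males and absent from all females.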


### Extracting markers significantly associated with sex

Markers significantly associated with sex are obtained with the `signif` command:

```bash
radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 --groups M,F
# Optionally, add --output-fasta to export the markers in FASTA format
```

In this example, `--markers-table` is the table generated with `process` and markers significantly associated with sex are saved to **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the *Preparing the data* section). Groups of individuals to compare (as defined in the popmap) are specified manually with the parameter `--groups`. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.

By default, the `signif` command generates an output file in the same format as the marker depths table. Markers can also be exported to a FASTA file using the parameter `--output-fasta`.
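
This README does not state which statistical test `signif` applies. As a hedged illustration of the general idea only, the sketch below computes Pearson's chi-squared statistic (without continuity correction) for one marker's presence/absence counts in a 2x2 table and compares it to 3.841, the critical value for one degree of freedom at alpha = 0.05, before any multiple-testing correction. The counts are hypothetical.

```bash
# Hypothetical counts for a single marker: present in 9 of 10 males and 1 of 10 females.
males_present=9
males_total=10
females_present=1
females_total=10

chi2=$(awk -v a="$males_present" -v b="$females_present" \
           -v nm="$males_total" -v nf="$females_total" 'BEGIN {
    # 2x2 contingency table:          present   absent
    #                         males     a        nm - a
    #                         females   b        nf - b
    c = nm - a
    d = nf - b
    n = nm + nf
    # Pearson chi-squared statistic for a 2x2 table, without continuity correction.
    den = nm * nf * (a + b) * (c + d)
    if (den == 0) { print "0.000"; exit }
    printf "%.3f\n", n * (a * d - b * c) ^ 2 / den
}')
echo "chi2 = ${chi2}"

# 3.841 is the critical value of the chi-squared distribution with 1 degree of freedom
# for alpha = 0.05, before any correction for the number of markers tested.
if awk -v x="$chi2" 'BEGIN { exit !(x + 0 > 3.841) }'; then
    echo "Marker associated with group at uncorrected alpha = 0.05"
fi
```

When many markers are tested at once, a multiple-testing correction (for example Bonferroni, which divides alpha by the number of tests) makes this threshold stricter.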

The markers table generated by `signif` can be visualized with the `plot_depth()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a heatmap showing the depth of each marker in each individual.


### Aligning markers to a genome

Markers can be aligned to a genome using the `map` command:

```bash
radsex map --markers-file markers_table.tsv --output-file alignment_results.tsv --popmap popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-depth 5 --groups M,F
```

In this example, `--markers-file` is the markers depth table generated with `process` and the path to the reference genome file is given by `--genome-file`; results are saved to **alignment_results.tsv**. The sex of each individual in the population is given by **popmap.tsv** and the minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual. Groups of individuals to compare (as defined in the popmap) are specified manually with the parameter `--groups`.

The parameter `--min-quality` specifies the minimum mapping quality (as defined in [BWA](http://bio-bwa.sourceforge.net/bwa.shtml)) to consider a marker properly aligned and is set to 20 in this example. The parameter `--min-frequency` specifies the minimum frequency of a marker in the population to retain this marker and is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of the population are aligned to the genome.
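
The `--min-frequency` filter can be sketched as follows, under two assumptions that this README does not confirm: a marker counts as present in an individual when its depth is at least the minimum depth, and frequency is the fraction of all individuals in which the marker is present. The toy table layout is also hypothetical, and a threshold of 0.5 is used instead of 0.1 so that the effect is visible with only four individuals.

```bash
# Toy markers table (hypothetical layout): header line, then marker ID,
# sequence, and one depth column per individual, tab-separated.
printf 'id\tsequence\tind1\tind2\tind3\tind4\n' > toy_markers.tsv
printf '0\tACGT\t8\t6\t0\t0\n' >> toy_markers.tsv
printf '1\tCCGA\t7\t9\t5\t6\n' >> toy_markers.tsv
printf '2\tTTAG\t9\t0\t0\t2\n' >> toy_markers.tsv

min_depth=5
min_freq=0.5
# Print the ID of every marker present (depth >= min_depth) in at least
# min_freq of all individuals.
awk -F'\t' -v min="$min_depth" -v freq="$min_freq" '
    NR == 1 { n_ind = NF - 2; next }
    {
        present = 0
        for (i = 3; i <= NF; i++) if ($i + 0 >= min + 0) present++
        if (present / n_ind >= freq + 0) print $1
    }
' toy_markers.tsv > kept_ids.txt
cat kept_ids.txt
```

Here marker `2` is present in only one of four individuals (frequency 0.25) and is dropped, while markers `0` and `1` are kept.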

Alignment results from `map` can be visualized with the `plot_genome()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a circular plot showing bias and association with sex for each marker aligned to the genome.

Alignment results for a specific contig can be visualized with the `plot_contig()` function to show the same metrics for a single contig.


## LICENSE

Copyright (C) 2018-2020 Romain Feron

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/