## RadSex

### Overview

The RADSex pipeline analyzes RADSeq data with a focus on sex. It was developed for the PhyloSex project, which investigates sex-determining factors in a wide range of fish species.

### Requirements

- A C++11 compliant compiler (GCC >= 4.8.1, Clang >= 3.3)
- The zlib library (usually installed by default on Linux)
- *Optional (for visualization)* : R 3.3 or higher with the following packages:
    + readr
    + ggplot2
    + reshape2
    + ggdendro
    + grid
    + gtable
    + base
    + svglite
    + scales

### Installation

- Clone: `git clone git@github.com:INRA-LPGP/RadSex.git`
- Alternative: Download the archive and unzip it
- Go to the RadSex directory (`cd RadSex`)
- Run `make`
- *Optional* : install R packages for visualization with `Rscript install_packages.R`

### Usage


#### General

`radsex <command> [options]`

**Available commands** :

Command            | Description
------------------ | ------------
`process_reads`    | Compute a coverage matrix from a set of demultiplexed read files
`sex_distribution` | Compute the distribution of sequences between sexes
`subset`           | Extract a subset of the coverage matrix
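
The three commands are typically chained: `process_reads` produces the coverage matrix that the other two take as input with `-f`. The sketch below only assembles the command lines from the options documented in this README; all file and directory names, as well as the threshold values, are hypothetical and chosen for illustration.

```python
import shlex

# Hypothetical paths, chosen only for illustration.
READS_DIR = "reads/"
COVERAGE = "coverage_matrix.tsv"
POPMAP = "popmap.tsv"


def build_commands(min_cov=1, n_threads=1):
    """Return the three radsex invocations as argument lists."""
    return [
        # Step 1: build the coverage matrix from demultiplexed reads.
        ["radsex", "process_reads", "-d", READS_DIR, "-o", COVERAGE,
         "-t", str(n_threads), "-c", str(min_cov)],
        # Step 2: distribution of markers between sexes.
        ["radsex", "sex_distribution", "-f", COVERAGE, "-o", "distribution.tsv",
         "-p", POPMAP, "-c", str(min_cov)],
        # Step 3: extract markers seen in at least 10 males and no female
        # (arbitrary example thresholds).
        ["radsex", "subset", "-f", COVERAGE, "-o", "subset.tsv",
         "-p", POPMAP, "-c", str(min_cov),
         "--min-males", "10", "--max-females", "0"],
    ]


commands = build_commands(min_cov=5, n_threads=4)
for cmd in commands:
    # Only print the command; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping each command as an argument list avoids shell quoting issues if a path contains spaces.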


<br/>

#### Process reads

`radsex process_reads -d input_dir_path -o output_file_path [ -t n_threads -c min_cov ]`

*Generates a coverage matrix for all individuals and all sequences. The output is a tab-separated file in which each line describes one marker: its ID, its sequence, and its coverage in each individual.*

**Options** :

Option | Full name | Description
--- | --- | ---
`-d` | `input_dir_path` | Path to a folder containing demultiplexed reads |
`-o` | `output_file_path` | Path to the output file |
`-t` | `n_threads` | Number of threads to use (default: 1) |
`-c` | `min_cov` | Minimum coverage to consider a marker in an individual (default: 1) |
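
To make the output format and the `min_cov` threshold more concrete, here is a minimal Python sketch that reads a toy coverage table and lists the individuals in which each marker is considered present. The exact column layout (`id`, `sequence`, then one column per individual), the individual names, and the use of `>=` for the threshold are assumptions made for this sketch, not a specification of the file written by `process_reads`.

```python
import csv
import io

# Toy coverage table. The header names and the individual names are
# assumptions made for this sketch.
TOY_TABLE = (
    "id\tsequence\tmale_1\tmale_2\tfemale_1\n"
    "0\tACGTAC\t12\t9\t0\n"
    "1\tTTGACA\t3\t0\t7\n"
)


def read_coverage(handle):
    """Map each marker ID to (sequence, {individual: coverage})."""
    reader = csv.DictReader(handle, delimiter="\t")
    markers = {}
    for row in reader:
        marker_id = row.pop("id")
        sequence = row.pop("sequence")
        # Every remaining column is treated as one individual's coverage.
        markers[marker_id] = (sequence, {ind: int(cov) for ind, cov in row.items()})
    return markers


def present_in(coverage, min_cov=1):
    """Individuals in which a marker reaches the min_cov threshold (assumed >=)."""
    return sorted(ind for ind, cov in coverage.items() if cov >= min_cov)


markers = read_coverage(io.StringIO(TOY_TABLE))
for marker_id, (sequence, coverage) in markers.items():
    print(marker_id, sequence, present_in(coverage, min_cov=5))
```

To read a real file, replace `io.StringIO(TOY_TABLE)` with `open(path, newline="")` and adjust the header names to match the actual output.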

<br/>

#### Sex distribution

`radsex sex_distribution -f input_file_path -o output_file_path -p popmap_file_path [ -c min_cov ]`

*Generates a matrix of dimensions (Number of males) x (Number of females). The value at coordinates **(i, j)** corresponds to the number of haplotypes found in precisely **i** males and **j** females.*

**Options** :

Option | Full name | Description
--- | --- | ---
`-f` | `input_file_path` | Path to an input file (result of process_reads) |
`-o` | `output_file_path` | Path to the output file |
`-p` | `popmap_file_path` | Path to a popmap file (indicating the sex of each individual) |
`-c` | `min_cov` | Minimum coverage to consider a marker in an individual (default: 1) |
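
The counting described above can be illustrated with a short Python sketch. It is not the RadSex implementation: the in-memory data structures, the marker and individual names, and the `"M"`/`"F"` sex labels are assumptions, and the dense matrix here includes a row and a column for zero individuals, which is a choice made for this sketch.

```python
from collections import Counter

# Toy inputs; individual names and the "M"/"F" sex labels are assumptions.
POPMAP = {"m1": "M", "m2": "M", "f1": "F", "f2": "F"}
COVERAGE = {
    "marker_a": {"m1": 10, "m2": 8, "f1": 0, "f2": 0},  # male-only marker
    "marker_b": {"m1": 6, "m2": 7, "f1": 5, "f2": 9},   # found in everyone
    "marker_c": {"m1": 0, "m2": 4, "f1": 0, "f2": 1},
}


def sex_distribution(coverage, popmap, min_cov=1):
    """Count markers by (number of males, number of females) carrying them."""
    counts = Counter()
    for per_individual in coverage.values():
        males = sum(1 for ind, cov in per_individual.items()
                    if cov >= min_cov and popmap[ind] == "M")
        females = sum(1 for ind, cov in per_individual.items()
                      if cov >= min_cov and popmap[ind] == "F")
        counts[(males, females)] += 1
    return counts


def as_matrix(counts, popmap):
    """Dense table with rows i = 0..n_males and columns j = 0..n_females."""
    n_males = sum(1 for sex in popmap.values() if sex == "M")
    n_females = sum(1 for sex in popmap.values() if sex == "F")
    return [[counts[(i, j)] for j in range(n_females + 1)]
            for i in range(n_males + 1)]


counts = sex_distribution(COVERAGE, POPMAP, min_cov=2)
for row in as_matrix(counts, POPMAP):
    print(row)
```

With `min_cov=2`, `marker_c` counts as present in one male and no female, because its coverage of 1 in `f2` falls below the threshold.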

<br/>

#### Subset

`radsex subset -f input_file_path -o output_file_path -p popmap_file_path [ -c min_cov --min-males min_males --min-females min_females --max-males max_males --max-females max_females ]`

*Filters the coverage matrix and exports the markers matching the values of min_males, min_females, max_males, and max_females (i.e. markers found in M males with min_males <= M <= max_males and in F females with min_females <= F <= max_females).*

**Options** :

Option | Full name | Description
--- | --- | ---
`-f` | `input_file_path` | Path to an input file (result of process_reads) |
`-o` | `output_file_path` | Path to the output file |
`-p` | `popmap_file_path` | Path to a popmap file (indicating the sex of each individual) |
`-c` | `min_cov` | Minimum coverage to consider a marker in an individual (default: 1) |
`--min-males` | `min_males` | Minimum number of males with a marker |
`--min-females` | `min_females` | Minimum number of females with a marker |
`--max-males` | `max_males` | Maximum number of males with a marker |
`--max-females` | `max_females` | Maximum number of females with a marker |
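
The filter condition can be sketched in a few lines of Python. This is an illustration of the inequality stated above, not the RadSex code: the toy data, the `"M"`/`"F"` labels, and the default bounds (no lower bound, upper bound equal to the number of individuals of that sex) are assumptions, since this README does not document the defaults.

```python
# Toy inputs; individual names and the "M"/"F" sex labels are assumptions.
POPMAP = {"m1": "M", "m2": "M", "f1": "F", "f2": "F"}
COVERAGE = {
    "marker_a": {"m1": 10, "m2": 8, "f1": 0, "f2": 0},
    "marker_b": {"m1": 6, "m2": 7, "f1": 5, "f2": 9},
    "marker_c": {"m1": 0, "m2": 4, "f1": 0, "f2": 1},
}


def count_by_sex(per_individual, popmap, min_cov=1):
    """Number of males and females in which the marker reaches min_cov."""
    males = females = 0
    for ind, cov in per_individual.items():
        if cov < min_cov:
            continue
        if popmap[ind] == "M":
            males += 1
        elif popmap[ind] == "F":
            females += 1
    return males, females


def subset(coverage, popmap, min_cov=1, min_males=0, min_females=0,
           max_males=None, max_females=None):
    """Keep markers with min_males <= M <= max_males and min_females <= F <= max_females."""
    # Assumed defaults: an unset maximum means "no upper bound".
    if max_males is None:
        max_males = sum(1 for sex in popmap.values() if sex == "M")
    if max_females is None:
        max_females = sum(1 for sex in popmap.values() if sex == "F")
    kept = {}
    for marker, per_individual in coverage.items():
        males, females = count_by_sex(per_individual, popmap, min_cov)
        if min_males <= males <= max_males and min_females <= females <= max_females:
            kept[marker] = per_individual
    return kept


# Markers present in both males and in no female, at coverage >= 2.
male_specific = subset(COVERAGE, POPMAP, min_cov=2, min_males=2, max_females=0)
print(sorted(male_specific))
```

Setting `min_males` high and `max_females` to 0 is one way to look for candidate male-specific markers; swapping the roles of the two sexes gives the female-specific equivalent.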

<br/>

**Example output** :

- heatmap :
![Heatmap](./examples/plots/heatmap.png)
- clustering :
![Presence/Absence](./examples/plots/presence_clustering.png)
![Coverage](./examples/plots/coverage_clustering.png)
- frequencies:
![Frequencies](./examples/plots/frequencies.png)

### LICENSE

Copyright (C) 2018 Romain Feron and INRA LPGP

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/