Commit 79bf1bc2 authored by Romain Feron

Quick README update

parent c79e3a32
# RADSex
The RADSex pipeline is **currently under development** and has not been officially released yet.
Missing features are being implemented, and some bugs are to be expected in this current development version.
Missing features are being implemented, some bugs are to be expected in this current development version, and parameters are subject to change.
Please contact me by email or on GitHub, or open an issue if you encounter bugs or would like to discuss a feature!
## Overview
RADSex is a software package for the analysis of sex-determination using RAD-Sequencing data.
The `process` function generates a data structure summarizing a set of demultiplexed RAD reads,
and other functions use this data structure to infer information about the type of sex-determination system, identify sex-biased sequences, and map the RAD sequences to a reference genome.
The results of RADSex are meant to be visualized with the `radsex-vis` R package, available here: https://github.com/INRA-LPGP/radsex-vis.
The results of RADSex are meant to be visualized with the `radsex-vis` R package, available here: https://github.com/RomainFeron/RADSex-vis.
This pipeline was developed for the PhyloSex project, which investigates sex determining factors in a wide range of fish species.
## Documentation
This README file contains basic documentation, including an installation guide and a quick start section.
The full documentation for RADSex (still under construction) is available [there](https://radsex.readthedocs.io/en/docs).
The full documentation for RADSex (still under construction) is available [there](https://radsex.readthedocs.io/en/latest).
It contains a complete Getting Started section, detailed usage for all functions, and examples on real-life datasets covering many situations.
@@ -58,8 +58,8 @@ The first step of RADSex is to create a table of coverage for the dataset using
In this example, demultiplexed reads are stored in `./samples` and the coverage table generated by `process` will be stored in `coverage_table.tsv`. The parameter `--threads` specifies the number of threads to use.
The parameter `--min-coverage` specifies the minimum coverage value to consider a sequence present in an individual:
sequences which are not present with coverage higher than this value in at least one individual will not be retained in the coverage table.
It is advised to keep the minimum coverage at 1 for this step, as it can be adjusted for each analysis later.
#### Computing the distribution of sequences between sexes
@@ -68,17 +68,11 @@ After generating the coverage table, the `distrib` command is used to compute th
`radsex distrib --input-file coverage_table.tsv --output-file distribution.tsv --popmap-file popmap.tsv --min-coverage 5`
In this example, the input file `--input-file` is the coverage table generated in the [previous step](#computing-the-coverage-table), and the distribution of sequences between sexes will be stored in `distribution.tsv`.
The sex of each individual in the population is given by `popmap.tsv` (see the [popmap section](#population-map) for details).
The minimum coverage to consider a sequence present in an individual is set to 5, meaning that sequences present with coverage (depth) lower than 5 in one individual will not be counted in this individual.
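To make the presence rule concrete, here is a minimal Python sketch of how males and females could be counted for one sequence. It is not RADSex's actual implementation: the function name `count_by_sex`, the dictionary layout, and the `"M"` / `"F"` sex labels are assumptions made for the example.

```python
def count_by_sex(coverages, popmap, min_coverage=5):
    """Count males and females in which one sequence is considered present.

    `coverages` maps a hypothetical individual ID to the depth of the
    sequence in that individual; `popmap` maps the same IDs to "M" or "F".
    """
    males = females = 0
    for individual, depth in coverages.items():
        if depth < min_coverage:
            # Depth below the threshold: not counted in this individual.
            continue
        sex = popmap.get(individual)
        if sex == "M":
            males += 1
        elif sex == "F":
            females += 1
    return males, females


# Hypothetical depths of one sequence in four individuals.
coverages = {"ind_a": 12, "ind_b": 4, "ind_c": 7, "ind_d": 0}
popmap = {"ind_a": "M", "ind_b": "M", "ind_c": "F", "ind_d": "F"}
print(count_by_sex(coverages, popmap, min_coverage=5))
```

With a threshold of 5, `ind_b` (depth 4) is ignored even though the sequence was observed in it; lowering the threshold to 1 would count it.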
The resulting file `distribution.tsv` is a table with four columns:
- **Males** : number of males in which a sequence was present.
- **Females** : number of females in which a sequence was present.
- **Sequences** : number of sequences present in the corresponding number of males and females.
- **P** : p-value of a chi-squared test for association with sex.
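The four columns above can be illustrated with a short Python sketch that tallies sequences by their (males, females) counts and attaches a p-value. This is only a sketch under stated assumptions: the function names are hypothetical, and the test shown is a plain Pearson chi-squared test on the 2x2 table of sex by presence, without continuity correction, which may differ from what RADSex computes.

```python
import math
from collections import Counter


def chi_squared_p(males, females, total_males, total_females):
    """P-value of a Pearson chi-squared test (1 degree of freedom) on the
    2x2 table sex x presence, without continuity correction (assumption)."""
    a, b = males, total_males - males          # males: present, absent
    c, d = females, total_females - females    # females: present, absent
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # An empty margin makes the statistic undefined; report p = 1.
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-squared distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def distribution(counts, total_males, total_females):
    """Tally sequences by (males, females) and attach a p-value per row."""
    tally = Counter(counts)
    return [
        (m, f, n_seq, chi_squared_p(m, f, total_males, total_females))
        for (m, f), n_seq in sorted(tally.items())
    ]


# Hypothetical (males, females) presence counts for four sequences,
# in a population of 10 males and 10 females.
counts = [(10, 0), (10, 0), (5, 5), (0, 1)]
for row in distribution(counts, total_males=10, total_females=10):
    print(*row, sep="\t")
```

Each printed row corresponds to one line of the table: Males, Females, Sequences, P. Sequences found in all males and no females get a very small p-value, while sequences found equally in both sexes get a p-value of 1.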
This distribution can be visualized with the `plot_sex_distribution()` function of `radsex-vis`, which generates a [distribution heatmap](./examples/figures/sex_distribution.png).
This distribution can be visualized with the `plot_sex_distribution()` function of `radsex-vis`, which generates a tile plot.
#### Extracting sequences significantly associated with sex
@@ -86,12 +80,12 @@ Sequences significantly associated with sex can be obtained with the `signif` co
`radsex signif --input-file coverage_table.tsv --output-file sequences.tsv --popmap-file popmap.tsv --min-coverage 5 [ --output-format fasta ]`
In this example, the input file `--input-file` is the coverage table generated in the [first step](#computing-the-coverage-table), and the sequences significantly associated with sex will be stored in `sequences.tsv`.
The sex of each individual in the population is given by `popmap.tsv` (see the [popmap section](#population-map) for details),
and the minimum coverage to consider a sequence present in an individual is set to 5 (see the [previous section](#computing-the-distribution-of-sequences-between-sexes)).
By default, the `signif` function exports a small coverage table; sequences can be exported to fasta using the `--output-format` parameter.
The coverage table generated by `signif` can be visualized with the `plot_coverage()` function of `radsex-vis`, which generates a [coverage heatmap](./examples/figures/coverage.png).
The coverage table generated by `signif` can be visualized with the `plot_coverage()` function of `radsex-vis`, which generates a heatmap of coverage.
#### Mapping sequences to a reference genome
@@ -99,30 +93,23 @@ Sequences can be mapped to a reference genome using the `map` command:
`radsex map --input-file coverage_table.tsv --output-file mapping.tsv --popmap-file popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-coverage 5`
In this example, the input file `--input-file` is the coverage table generated in the [first step](#computing-the-coverage-table), the mapping results will be stored in `mapping.tsv`,
and the path to the reference genome file is given by `--genome-file`. The sex of each individual in the population is given by `popmap.tsv` (see the [popmap section](#population-map) for details),
and the minimum coverage to consider a sequence present in an individual is set to 5 (see the [previous section](#computing-the-distribution-of-sequences-between-sexes)).
The parameter `--min-quality` specifies the minimum mapping quality (as defined in [BWA](http://bio-bwa.sourceforge.net/bwa.shtml)) to consider a sequence mapped, here set to 20.
The parameter `--min-frequency` specifies the minimum frequency of a sequence in at least one sex; it is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of one sex are retained for mapping.
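The frequency filter can be sketched in a few lines of Python. This is an illustration rather than RADSex's code: the function name `passes_min_frequency` and its arguments are hypothetical, and the counts are assumed to be the numbers of males and females in which the sequence is present at the chosen minimum coverage.

```python
def passes_min_frequency(males, females, total_males, total_females,
                         min_frequency=0.1):
    """Return True if the sequence reaches `min_frequency` in at least one sex."""
    male_freq = males / total_males if total_males else 0.0
    female_freq = females / total_females if total_females else 0.0
    return max(male_freq, female_freq) >= min_frequency


# Hypothetical population of 20 males and 20 females.
print(passes_min_frequency(1, 0, 20, 20))  # 5% of males, 0% of females
print(passes_min_frequency(2, 0, 20, 20))  # 10% of males, 0% of females
```

Only the higher of the two frequencies matters, so a sequence present in a single sex can still be retained for mapping.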
The resulting file `mapping.tsv` is a table with five columns:
- **Sequence** : ID of the mapped sequence.
- **Contig** : ID of the contig where the sequence mapped.
- **Position** : position of the mapped sequence on the contig.
- **SexBias** : sex-bias of the mapped sequence, defined as (Males / Total males) - (Females / Total females).
- **P** : p-value of a chi-squared test for association with sex.
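The sex-bias formula given for the **SexBias** column can be written out as a small Python function. The function name and the example counts are hypothetical; only the formula comes from the description above.

```python
def sex_bias(males, females, total_males, total_females):
    """(Males / Total males) - (Females / Total females)."""
    return males / total_males - females / total_females


# Hypothetical population of 10 males and 10 females.
print(sex_bias(10, 0, 10, 10))  # present in all males only
print(sex_bias(0, 10, 10, 10))  # present in all females only
print(sex_bias(5, 5, 10, 10))   # equally frequent in both sexes
```

By construction the value ranges from -1 (present in all females and no males) to 1 (present in all males and no females), with 0 meaning the sequence is equally frequent in both sexes.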
The mapping results generated by `map` can be visualized with the `plot_genome()` function of `radsex-vis`, which generates a [circular plot](./examples/figures/genome.png).
Mapping results for a specific scaffold can be visualized with the `plot_scaffold()` function to generate a [linear plot](./examples/figures/scaffold.png).
The mapping results generated by `map` can be visualized with the `plot_genome()` function of `radsex-vis`, which generates a circos plot for the entire genome.
Mapping results for a specific scaffold can be visualized with the `plot_contig()` function, which generates a linear plot for that scaffold.
## LICENSE
Copyright (C) 2018 Romain Feron and INRA LPGP
Copyright (C) 2018 Romain Feron
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/