Commit c0c18d7c authored by RomainFeron's avatar RomainFeron
Browse files

Finished first major doc update

parent b5ff6582
......@@ -11,103 +11,135 @@ Please contact me by email or on Github, or open an issue if you encounter bugs
## Overview
RADSex is a software package for the analysis of sex-determination using RAD-Sequencing data.
The `process` function generates a data structure summarizing a set of demultiplexed RAD reads,
and other functions use this data structure to infer information about the type of sex-determination system, identify sex-biased sequences, and map the RAD sequences to a reference genome.
The results of RADSex are meant to be visualized with the `radsex-vis` R package, available here: https://github.com/RomainFeron/RADSex-vis.
The `process` command generates a data structure summarizing a set of demultiplexed RAD reads,
and other commands use this data structure to:
- infer the type of sex-determination system
- identify sex-biased markers
- align markers to a genome and identify genomic regions differentiated between sexes
- compute marker depth statistics
RADSex results can be visualized with the `radsex-vis` R package available here: https://github.com/RomainFeron/RADSex-vis.
Although RADSex has been developed specifically to study sex-determination, it was designed to be flexibla and can be used to compare any two populations.
This pipeline was developed in the [LPGP](https://www6.rennes.inra.fr/lpgp/) lab from INRA, Rennes, France for the PhyloSex project, which investigates sex determining factors in a wide range of fish species.
## Documentation
This README file contains a simple documentation, including a basic installation guide as well as a quick start section.
The full documentation for RADSex (still under construction) is available [there](https://radsex.readthedocs.io/en/latest).
It contains a complete Getting Started section, a detailed usage for all functions, and real-life datasets examples covering many situations.
This README file includes a basic installation guide and a quick start section. The full documentation for RADSex, including a complete example walktrhough, is available [here](https://romainferon.github.io/RADSex/).
## Installation
## Requirements
### Requirements
- A C++11 compliant compiler (GCC >= 4.8.1, Clang >= 3.3)
- The zlib library (which should be installed on linux by default)
- The zlib library (usually installed on linux by default)
## Installation
### Install the latest official release
- Clone: `git clone https://github.com/RomainFeron/RadSex.git`
- *Alternative: download the archive and unzip it*
- Go to the RadSex directory (`cd RadSex`)
- Download the [latest release](https://github.com/RomainFeron/RadSex/releases)
- Unzip the archive
- Navigate to the `RADSex` directory
- Run `make`
- The compiled `radsex` binary is located in `RadSex/bin/`
The compiled `radsex` binary will be located in **RADSex/bin/**.
### Install the latest stable development version
```bash
git clone https://github.com/RomainFeron/RADSex.git
cd RADSex
make
```
The compiled `radsex` binary will be located in **RADSex/bin/**.
### Install RADSex with Conda
RADSex is available in [Bioconda](https://bioconda.github.io/recipes/radsex/README.html?#recipe-Recipe%20'radsex'). To install RADSex with Conda, run the following command:
```bash
conda install -c bioconda radsex
```
## Quick start
#### Before starting
### Preparing the data
Before running the pipeline, you should prepare the following elements:
- A **set of demultiplexed reads**. The current version of RADSex does not implement demultiplexing;
raw sequencing reads can be demultiplexed using [Stacks](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php)
or [pyRAD](http://nbviewer.jupyter.org/gist/dereneaton/af9548ea0e94bff99aa0/pyRAD_v.3.0.ipynb#The-seven-steps-described).
- A **population map** (popmap): a tabulated file with individual ID as the first column and sex as the second column.
It is important that the individual IDs in the popmap are the same as the names of the demultiplexed reads files (see the doc for details).
- If you want to map the sequences to a reference genome: a **reference genome** in fasta format.
Note that when visualizing `mapping` results with `radsex-vis`, linkage groups / chromosomes are automatically inferred from scaffold names in the reference sequence
if their name starts with *LG*, *chr*, or *chromosome* (case unsensitive).
If chromosomes are named differently in the reference genome, you should prepare a tabulated file with reference scaffold ID in the first column and corresponding chromosome name in the second column (see the doc for details)
#### Computing the coverage table
- A **population map** (popmap): a tabulated file with individual IDs in the first column and sex (or group) in the second column. Individual IDs in the popmap must be the same as the names of the demultiplexed reads files (*e.g.* 'individual1' for the reads file 'individual1.fq.gz')
- To align the markers to a genome: the **genome** sequence in a FASTA file.
Note that when visualizing `map` results with `radsex-vis`, linkage groups / chromosomes are automatically inferred from scaffold names in the reference sequence
if their name starts with *LG*, *CHR*, or *NC* (case unsensitive).
If chromosomes are named differently in the reference genome, you can use a tabulated file with contig ID in the first column and corresponding chromosome name in the second column (see the doc for details)
### Computing the marker depths table
The first step of RADSex is to create a table of marker depths for the entire dataset using the `process` command:
```bash
radsex process --input-dir ./samples --output-file markers_table.tsv --threads 16 --min-depth 1
```
In this example, demultiplexed reads are located in **./samples** and the markers table generated by `process` will be saved to **markers_table.tsv**. The parameter `--threads` specifies the number of threads to use, and `--min-depth` specifies the minimum depth to consider a marker present in an individual: markers which are not present with depth higher than this value in at least one individual will not be retained in the markers table.
It is advised to keep the minimum depth to the default value of 1 for this step, as it can be adjusted for each analysis later.
### Computing the distribution of markers between sexes
The `distrib` command computes the distribution of markers between males and females from a marker depths table:
The first step of RADSex is to create a table of coverage for the dataset using the `process` command:
```bash
radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap popmap.tsv --min-depth 5 --groups M,F
```
`radsex process --input-dir ./samples --output-file coverage_table.tsv --threads 16 --min-coverage 1`
In this example, `--markers-table` is the table generated with `process` and the distribution of markers between males and females will be saved to **distribution.tsv**. The sex of each individual in the population is given by **popmap.tsv**. Groups of individuals to compare (as defined in the popmap) are specified manually with the parameter `--groups`. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
In this example, demultiplexed reads are stored in `./samples` and the coverage table generated by `process` will be stored in `coverage_table.tsv`. The parameter `--threads` specifies the number of threads to use.
The resulting distribution can be visualized with the `plot_sex_distribution()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a tile plot of marker counts with number of males on the x-axis and number of females on the y-axis.
The parameter `--min-coverage` specifies the minimum coverage value to consider a sequence present in an individual:
sequences which are not present with coverage higher than this value in at least one individual will not be retained in the coverage table.
It is advised to keep the minimum coverage to 1 for this step, as it can be adjusted for each analysis later.
#### Computing the distribution of sequences between sexes
### Extracting markers significantly associated with sex
After generating the coverage table, the `distrib` command is used to compute the distribution of sequences between sexes:
Markers significantly associated with sex are obtained with the `signif` command:
`radsex distrib --input-file coverage_table.tsv --output-file distribution.tsv --popmap-file popmap.tsv --min-coverage 5`
```bash
radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 --groups M,F [ --output-fasta ]
```
In this example, the input file `--input-file` is the coverage table generated in the [previous step](#computing-the-coverage-table), and the distribution of sequences between sexes will be stored in `distribution.tsv`.
The sex of each individual in the population is given by `popmap.tsv` (see the [popmap section](#population-map) for details).
The minimum coverage to consider a sequence present in an individual is set to 5, meaning that sequences present with coverage (depth) lower than 5 in one individual will not be counted in this individual.
In this example, `--markers-table` is the table generated with `process` and markers significantly associated with sex are saved to **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section). Groups of individuals to compare (as defined in popmap) are specified manually with the parameter `--groups`. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
This distribution can be visualized with the `plot_sex_distribution()` function of `radsex-vis`, which generates a tile plot.
By default, the `signif` function generates an output file in the same format as the markers depth table. Markers can also be exported to a fasta file using the parameter `--output-fasta`.
#### Extracting sequences significantly associated with sex
The markers table generated by `signif` can be visualized with the `plot_depth()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a heatmap showing the depth of each marker in each individual.
Sequences significantly associated with sex can be obtained with the `signif` command:
`radsex signif --input-file coverage_table.tsv --output-file sequences.tsv --popmap-file popmap.tsv --min-coverage 5 [ --output-format fasta ]`
### Aligning markers to a genome
In this example, the input file `--input-file` is the coverage table generated in the [first step](#computing-the-coverage-table), and the sequences significantly associated with sex will be stored in `sequences.tsv`.
The sex of each individual in the population is given by `popmap.tsv` (see the [popmap section](#population-map) for details),
and the minimum coverage to consider a sequence present in an individual is set to 5 (see the [previous section](#computing-the-distribution-of-sequences-between-sexes)).
By default, the `signif` function exports a small coverage table; sequences can be exported to fasta using the `--output-format` parameter.
Markers can be aligned to a genome using the `map` command:
The coverage table generated by `signif` can be visualized with the `plot_coverage()` function of `radsex-vis`, which generates a heatmap of coverage.
```bash
radsex map --markers-file markers_table.tsv --output-file alignment_results.tsv --popmap popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-depth 5 --groups M,F
```
#### Mapping sequences to a reference genome
In this example, `--markers-file` is the markers depth table generated with `process` and the path to the reference genome file is given by `--genome-file`; results will are saved to **alignment_results.tsv**. The sex of each individual in the population is given by **popmap.tsv** and the minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual. Groups of individuals to compare (as defined in the popmap) are specified manually with the parameter `--groups`
Sequences can be mapped to a reference genome using the `map` command:
The parameter `--min-quality` specifies the minimum mapping quality (as defined in [BWA](http://bio-bwa.sourceforge.net/bwa.shtml)) to consider a marker properly aligned and is set to 20 in this example. The parameter `--min-frequency` specifies the minimum frequency of a marker in the population to retain this marker and is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of the population are aligned to the genome.
`radsex map --input-file coverage_table.tsv --output-file mapping.tsv --popmap-file popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-coverage 5`
Alignment results from `map` can be visualized with the `plot_genome()` function of [RADSex-vis](https://github.com/RomainFeron/RADSex-vis), which generates a circular plot showing bias and association with sex for each marker aligned to the genome.
In this example, the input file `--input-file` is the coverage table generated in the [first step](#computing-the-coverage-table), the mapping results will be stored in `sequences.tsv`,
and the path to the reference genome file is given by `--genome-file`. The sex of each individual in the population is given by `popmap.tsv` (see the [popmap section](#population-map) for details),
and the minimum coverage to consider a sequence present in an individual is set to 5 (see the [previous section](#computing-the-distribution-of-sequences-between-sexes)).
The parameter `--min-quality` specifies the minimum mapping quality (as defined in [BWA](http://bio-bwa.sourceforge.net/bwa.shtml)) to consider a sequence mapped (`--min-quality`), here set to 20.
The parameter `--min-frequency` specifies the minimum frequency of a sequence in at least one sex; it is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of one sex are retained for mapping.
Alignment results for a specific contig can be visualized with the `plot_contig()` function to show the same metrics for a single contig.
The mapping results generated by `map` can be visualized with the `plot_genome()` function of `radsex-vis`, which generates a circos plot for the entire genome.
Mapping results for a specific scaffold can be visualized with the `plot_contig()` function to generate a linear plot for the specified scaffold.
## LICENSE
Copyright (C) 2018 Romain Feron
Copyright (C) 2018-2020 Romain Feron
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.
......
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 817c8122c81c3f4bd7033a46fe98006f
config: 2367542786f934c1f5c265874076375a
tags: 645f666f9bcd5a90fca523b33c5a78b7
......@@ -15,8 +15,7 @@ Requirements
Installation
~~~~~~~~~~~~
RADSex can be installed from one of the `release packages <https://github.com/RomainFeron/RadSex/releases>`_.
The latest stable development version can be installed directly from the GitHub repository.
There are three ways to install RADSex:
**1. Install the latest release**
......@@ -25,6 +24,8 @@ The latest stable development version can be installed directly from the GitHub
* Navigate to the `RADSex` directory
* Run ``make``
The compiled ``radsex`` binary will be located in **RADSex/bin/**.
**2. Install the latest stable development version**
To install the latest stable version of RADSex directly from the GitHub repository, run the following commands:
......@@ -37,6 +38,14 @@ To install the latest stable version of RADSex directly from the GitHub reposito
The compiled ``radsex`` binary will be located in **RADSex/bin/**.
**3. Install RADSex with conda**
RADSex is available in `Bioconda <https://bioconda.github.io/recipes/radsex/README.html?#recipe-Recipe%20&#x27;radsex&#x27;>`_. To install RADSex with Conda, run the following command:
::
conda install -c bioconda radsex
Update RADSex
~~~~~~~~~~~~~
......@@ -50,6 +59,12 @@ If you installed RADSex directly from the GitHub repository, update RADSex by ru
git pull
make rebuild
If you installed RADSex with Conda, run:
::
conda update -c bioconda radsex
Before starting
---------------
......@@ -92,9 +107,9 @@ The ``distrib`` command computes the distribution of markers between groups from
::
radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap popmap.tsv --min-depth 5``
radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap popmap.tsv --min-depth 5 --groups M,F``
In this example, ``--markers-table`` is the table generated in the :ref:`computing-depth-table` section, and the distribution of markers between groups will be saved to **distribution.tsv**. The group of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section). The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
In this example, ``--markers-table`` is the table generated in the :ref:`computing-depth-table` section, and the distribution of markers between groups will be saved to **distribution.tsv**. The group of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section). Groups of individuals to compare (as defined in the :ref:`population-map`) are specified manually with the parameter ``--groups``. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
The resulting file **distribution.tsv** is a table described in the :ref:`sex-distribution-file` section.
......@@ -108,10 +123,9 @@ Markers significantly associated with sex are obtained with the ``signif`` comma
::
radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 [ --output-fasta ]
radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 --groups M,F [ --output-fasta ]
In this example, ``--markers-table`` is the table generated in the :ref:`computing-depth-table` section, and markers significantly associated with sex are saved to **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section).
The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
In this example, ``--markers-table`` is the table generated in the :ref:`computing-depth-table` section, and markers significantly associated with sex are saved to **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section). Groups of individuals to compare (as defined in the :ref:`population-map`) are specified manually with the parameter ``--groups``. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
By default, the ``signif`` function generates an output file in the same format as the markers depth table. Markers can also be exported to a fasta file using the ``--output-fasta`` parameter (see the :ref:`fasta-file` section).
......@@ -125,9 +139,9 @@ Markers can be aligned to a genome using the ``map`` command:
::
radsex map --markers-file markers_table.tsv --output-file alignment_results.tsv --popmap popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-depth 5
radsex map --markers-file markers_table.tsv --output-file alignment_results.tsv --popmap popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-depth 5 --groups M,F
In this example, ``--markers-file`` is the markers depth table generated in the :ref:`computing-depth-table` step, and the path to the reference genome file is given by ``--genome-file``; results will are saved to **alignment_results.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section), and the minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
In this example, ``--markers-file`` is the markers depth table generated in the :ref:`computing-depth-table` step, and the path to the reference genome file is given by ``--genome-file``; results will are saved to **alignment_results.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section), and the minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual. Groups of individuals to compare (as defined in the :ref:`population-map`) are specified manually with the parameter ``--groups``
The parameter ``--min-quality`` specifies the minimum mapping quality (as defined in `BWA <http://bio-bwa.sourceforge.net/bwa.shtml>`_) to consider a marker properly aligned and is set to 20 in this example. The parameter ``--min-frequency`` specifies the minimum frequency of a marker in the population to retain this marker and is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of the population are aligned to the genome.
......
......@@ -2,7 +2,7 @@ License
=======
This software is released under GPLv3 license.
Copyrights Romain Feron and INRA LPGP.
Copyrights Romain Feron.
::
......
......@@ -3,7 +3,7 @@
RADSex usage
============
The RADSex software presents the general command-line interface:
The RADSex software implements the general command-line interface:
``radsex <command> [options]``
......@@ -22,6 +22,9 @@ subset Extract a subset of markers from a marker depths table
======= ===========
------------------------------------------------------------------------------------------------------------
.. _process-usage:
process
......@@ -64,12 +67,15 @@ A :ref:`markers-depths-table-file`:
.. note:: Input files are automatically dectected from the input directory based on their extensions. Supported extensions are **.fa**, **.fa.gz**, **.fq**, **.fq.gz**, **.fasta**, **.fasta.gz**, **.fastq**, **.fastq.gz**, **.fna**, and **.fna.gz**. Individual IDs in the output table will be inferred from the input files names after removing the extension. For instance, a file named **individual_1.fastq.gz** will be attributed the ID **individual_1**. Make sure to use the same IDs when creating the population map.
------------------------------------------------------------------------------------------------------------
.. _distrib-usage:
distrib
-------
Generate a table containing the distribution of markers between two groups of individuals defined in a :ref:`popmap-file`.
Generate a table containing the distribution of markers between two groups of individuals defined in a :ref:`population-map`.
**Command**
......@@ -115,18 +121,21 @@ A :ref:`sex-distribution-file`:
3 3 9 1 False 0.000
.. _subset-usage:
------------------------------------------------------------------------------------------------------------
subset
.. _signif-usage:
signif
------
Extract a subset of markers from a markers depth table based on presence in individuals from each compared group.
Find markers significantly associated with groups of individuals defined in a :ref:`population-map`.
**Command**
::
radsex distrib --markers-table markers_table_path --output-file output_file_path --popmap popmap_path [ --min-depth min_depth --groups group1,group2 --signif-threshold signif_threshold --disable-correction --min-group1 min_group1 --min-group2 min_group2 --max-group1 max_group1 --max-group2 max_group2 --min-individuals min_individuals --max-individuals max_individuals --output-fasta ]
radsex signif --markers-table markers_table_path --output-file output_file_path --popmap popmap_path [ --min-depth min_depth --groups group1,group2 --signif-threshold signif_threshold --disable-correction --output-fasta ]
**Options**
......@@ -135,18 +144,12 @@ Option Description
============================ ===========
``--markers-table, -t`` Path to a marker depths table generated by "process"
``--popmap, -p`` Path to a tabulated file specifying groups for all individuals
``--output-file, -o`` Path to the output file (subset of markers)
``--output-file, -o`` Path to the output file (distribution of markers between groups)
``--min-depth, -d`` Minimum depth to consider a marker present in an individual
``--groups, -G`` Names of the groups to compare if there are more than two groups in the popmap (--groups group1, group2)
``--signif-threshold, -S`` P-value threshold to consider a marker significantly associated with a phenotypic group
``--disable-correction, -C`` If set, Bonferroni correction will NOT be used when assessing significance of association with phenotypic group
``--output-fasta, -a`` If set, markers are saved to a fasta file instead of a markers depth table
``--min-group1, -m`` Minimum number of individuals from the first group to retain a marker in the subset
``--min-group2, -n`` Minimum number of individuals from the second group to retain a marker in the subset
``--max-group1, -M`` Maximum number of individuals from the first group to retain a marker in the subset
``--max-group2, -N`` Maximum number of individuals from the second group to retain a marker in the subset
``--min-individuals, -i`` Minimum number of individuals to retain a marker in the subset
``--max-individuals, -I`` Maximum number of individuals to retain a marker in the subset
============================ ===========
**Output**
......@@ -178,108 +181,130 @@ Option Description
TGCAG.....CTGGAGAAGAGTAGG
.. _signif-usage:
------------------------------------------------------------------------------------------------------------
signif
------
.. _map-usage:
map
---
Align markers to a genome and compute bias and probability of association with group for each aligned marker.
**Command**
::
radsex signif --input-file input_file_path --output-file output_file_path --popmap-file popmap_file_path [ --output-format output_format --min-depth min_depth ]
radsex map --input-file input_file_path --output-file output_file_path --population-map popmap_file_path --genome-file genome_file_path [ --min-depth min_depth --min-quality min_quality --min-frequency min_frequency ]
The ``signif`` command filters the coverage table to only export markers significantly associated with sex. The probability of association with sex is computed using a chi-squared test with Yate's correction for continuity. Markers are significantly associated with sex when p ≤ (0.05 / total number of markers), to implement Bonferroni correction. Markers significantly associated with sex can be exported either in table format (same as the output of ``process``) or in fasta format, with marker information contained in the sequence IDs.
**Options**
===================== ===========
============================ ===========
Option Description
===================== ===========
``--input-file`` Path to an coverage table obtained with ``process``
``--output-file`` Path to the output file (in tsv or fasta format)
``--popmap-file`` Path to a popmap file indicating the sex of each individual
``--output-format`` Output format, either "table" or "fasta" (default: "table")
``--min-depth`` Minimum coverage to consider a marker present in an individual (default: 1)
===================== ===========
============================ ===========
``--markers-table, -t`` Path to a marker depths table generated by "process"
``--popmap, -p`` Path to a tabulated file specifying groups for all individuals
``--output-file, -o`` Path to the output file (distribution of markers between groups)
``--genome-file, -g`` Path to the genome file
``--min-depth, -d`` Minimum depth to consider a marker present in an individual
``--groups, -G`` Names of the groups to compare if there are more than two groups in the popmap (--groups group1, group2)
``--min-quality, -q`` Minimum mapping quality to retain a marker
``--min-frequency, -Q`` Minimum frequency of individuals in which a marker is present to retain a marker
``--signif-threshold, -S`` P-value threshold to consider a marker significantly associated with a phenotypic group
``--disable-correction, -C`` If set, Bonferroni correction will NOT be used when assessing significance of association with phenotypic group
============================ ===========
**Sample output**
**Output**
* Table format :
A :ref:`mapping-results-file`:
::
ID Sequence individual_1 individual_2 individual_3 individual_4 individual_5
15 TGCA..TATT 0 15 24 17 21
27 TGCA..GACC 20 18 3 26 4
43 TGCA..ATCG 2 1 5 16 0
86 TGCA..CCGA 14 29 23 2 19
* FASTA format :
Contig Position Length Marker_id Bias P Signif
LG03 18366992 36623554 4335174 -0.202 0.073 False
LG05 28289991 33792114 4335919 0 1 False
LG05 29738230 33792114 4336169 0.149 0.356 False
LG22 71119 28810691 4336631 0.159 0.162 False
LG15 20142338 30000224 4336732 0 1 False
LG02 26668964 31118443 4337320 0 1 False
LG03 4463700 36623554 4337383 -0.033 0.973 False
LG13 32240045 33409148 4338936 -0.073 0.704 False
LG13 19113343 33409148 4340342 0.064 0.479 False
LG22 22503191 28810691 4341087 -0.080 0.704 False
LG01 17881236 39973033 8678129 -0.736 3.417e-08 True
LG01 16475480 39973033 8888270 -0.705 1.462e-07 True
LG01 15761951 39973033 8954765 -0.769 8.054e-09 True
LG01 16562550 39973033 8990122 -0.736 3.417e-08 True
In FASTA format, IDs are generated with the following pattern : <marker_ID>_<number_of_males>M_<number_of_females>F_cov:<minimum_coverage>.
::
------------------------------------------------------------------------------------------------------------
>15_5M_0F_cov:5
TGCA..TATT
>27_5M_1F_cov:5
TGCA..GACC
>43_5M_1F_cov:5
TGCA..ATCG
>86_5M_0F_cov:5
TGCA..CCGA
.. _subset-usage:
.. _map-usage:
subset
------
map
---
Extract a subset of markers from a markers depth table based on presence in individuals from each compared group.
**Command**
::
radsex map --input-file input_file_path --output-file output_file_path --popmap-file popmap_file_path --genome-file genome_file_path [ --min-depth min_depth --min-quality min_quality --min-frequency min_frequency ]
The ``map`` command aligns all makers from a coverage table (obtained either from ``process``, ``subset``, or ``signif``) to a reference genome provided in fasta format. The output is a tabulated file where each line gives a marker ID, the contig where the marker mapped, the mapping position of the marker on this contig, the sex-bias of the marker (defined as M / Tm - F / Tf where M and F are the number of males and females in which the marker is present, and Tm and Tf are the total number of males and females in the population), the probability of association with sex for this marker (obtained with a chi-square test with Yate's correction for continuity), and the significativity of the association with sex after Bonferroni correction.
radsex distrib --markers-table markers_table_path --output-file output_file_path --popmap popmap_path [ --min-depth min_depth --groups group1,group2 --signif-threshold signif_threshold --disable-correction --min-group1 min_group1 --min-group2 min_group2 --max-group1 max_group1 --max-group2 max_group2 --min-individuals min_individuals --max-individuals max_individuals --output-fasta ]
**Options**
===================== ===========
============================ ===========
Option Description
===================== ===========
``--input-file`` Path to an coverage table obtained with ``process``, ``subset``, or ``signif``
``--output-file`` Path to the output file (in tsv format)
``--popmap-file`` Path to a popmap file indicating the sex of each individual
``--genome-file`` Path to a reference genome file in fasta format
``--min-depth`` Minimum coverage to consider a marker present in an individual (default: 1)
``--min-quality`` Minimum mapping quality, as defined in BWA, to consider a sequence properly mapped (default: 20)
``--min-frequency`` Minimum frequency in at least one sex for a sequence to be retained (default: 0.25)
===================== ===========
============================ ===========
``--markers-table, -t`` Path to a marker depths table generated by "process"
``--popmap, -p`` Path to a tabulated file specifying groups for all individuals
``--output-file, -o`` Path to the output file (subset of markers)
``--min-depth, -d`` Minimum depth to consider a marker present in an individual
``--groups, -G`` Names of the groups to compare if there are more than two groups in the popmap (--groups group1, group2)
``--signif-threshold, -S`` P-value threshold to consider a marker significantly associated with a phenotypic group
``--disable-correction, -C`` If set, Bonferroni correction will NOT be used when assessing significance of association with phenotypic group
``--output-fasta, -a`` If set, markers are saved to a fasta file instead of a markers depth table
``--min-group1, -m`` Minimum number of individuals from the first group to retain a marker in the subset
``--min-group2, -n`` Minimum number of individuals from the second group to retain a marker in the subset
``--max-group1, -M`` Maximum number of individuals from the first group to retain a marker in the subset
``--max-group2, -N`` Maximum number of individuals from the second group to retain a marker in the subset
``--min-individuals, -i`` Minimum number of individuals to retain a marker in the subset
``--max-individuals, -I`` Maximum number of individuals to retain a marker in the subset
============================ ===========
**Output**
**Sample output**
:ref:`markers-depths-table-file`:
::
Marker Contig Position SexBias P Signif
0 LG09 10052920 0 1 False
1 LG45 4008419 0 1 False
2 LG06 20521435 0 1 False
3 LG24 7643946 0.13 0.44 False
4 LG06 16975491 0 1 False
5 LG27 16580048 0 1 False
6 LG49 7206356 0.03 1 False
7 LG30 5571989 0 1 False
8 LG05 20094761 0 1 False
9 LG14 20088495 0 1 False
10 LG34 11566459 -0.04 1 False
11 LG21 17338149 0 1 False
12 LG05 14652417 0.13 0.55 False
13 LG25 23851527 0.75 0.001 True
#Number of markers : 4
id sequence individual_1 individual_2 individual_3 individual_4 individual_5
0 TGCA..TATT 0 15 24 17 21
1 TGCA..GACC 20 18 3 26 4
2 TGCA..ATCG 2 1 5 16 0
3 TGCA..CCGA 14 29 23 2 19
:ref:`fasta-file`:
::
>4495827_F:0_M:21_p:1.14577e-07_mindepth:10
TGCAG.....GGATGTGTATCCATA
>4609394_F:0_M:23_p:1.08057e-08_mindepth:10
TGCAG.....GGTTCCATCCCAAAT
>4661391_F:0_M:26_p:1.92225e-10_mindepth:10
TGCAG.....GTAGAGTGACCAGTT
>5182569_F:0_M:22_p:3.62243e-08_mindepth:10
TGCAG.....ACATGCTGTAAATGC
>5625957_F:0_M:25_p:7.91633e-10_mindepth:10
TGCAG.....CTGGAGAAGAGTAGG
------------------------------------------------------------------------------------------------------------
.. _freq-usage:
......@@ -287,34 +312,44 @@ Option Description
freq
----
Generate a table containing the distribution of markers in the entire dataset.
**Command**
::
radsex freq --input-file input_file_path --output-file output_file_path [ --min-depth min_depth ]
The ``freq`` command computes the distribution of markers in the population. The output is a tabulated file where the first column gives the number of individuals, and the second column gives the number of markers found in this number of individuals.
radsex freq --markers-table markers_table_path --output-file output_file_path [ --min-depth min_depth ]
**Options**
===================== ===========
============================ ===========
Option Description
===================== ===========
``--input-file`` Path to an coverage table obtained with ``process``
``--output-file`` Path to the output file (in tsv format)
``--min-depth`` Minimum coverage to consider a marker present in an individual (default: 1)
===================== ===========
============================ ===========
``--markers-table, -t`` Path to a marker depths table generated by "process"
``--output-file, -o`` Path to the output file (distribution of markers between groups)
``--min-depth, -d`` Minimum depth to consider a marker present in an individual
============================ ===========
**Output**
**Sample output**
A :ref:`freq-results-file`:
::
Frequency Count
1 15
2 37
3 48
4 43
5 124
1 10389
2 3869
3 2884
4 1824
5 1672
6 1276
7 1261
8 1278
9 1355