Commit 37accc95 authored by RomainFeron

Cleaned up docs, removed sphinx doc that has been moved to sexgenomicstoolkit.github.io

parent e5cade64
......@@ -58,7 +58,7 @@ PROJECT_LOGO =
# entered, it will be relative to the location where doxygen was started. If
# left blank the current directory will be used.
OUTPUT_DIRECTORY = html/doxygen
OUTPUT_DIRECTORY = .
# If the CREATE_SUBDIRS tag is set to YES then doxygen will create 4096 sub-
# directories (in 2 levels) under the output directory of each output format and
......
SPHINXOPTS = -b html -c .
SPHINXBUILD = sphinx-build
SOURCEDIR = src
BUILDDIR = html
DOXYGEN_BUILDDIR = html/doxygen
DOXYGEN_HTMLDIR = html/doxygen
DOXYGEN_SOURCEDIR = ../src
BUILDDIR = html
SOURCEDIR = ../src
.PHONY: all clean
all: $(BUILDDIR) $(DOXYGEN_BUILDDIR)
all: $(BUILDDIR)
clean:
rm -rf $(BUILDDIR)
rm -rf $(DOXYGEN_BUILDDIR)
rebuild: clean $(BUILDDIR) $(DOXYGEN_BUILDDIR)
rebuild: clean $(BUILDDIR)
$(BUILDDIR): $(SOURCEDIR)/*.rst
@$(SPHINXBUILD) "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)
$(DOXYGEN_BUILDDIR): Doxyfile $(DOXYGEN_SOURCEDIR)/*.h $(DOXYGEN_SOURCEDIR)/*.cpp
$(BUILDDIR): Doxyfile $(SOURCEDIR)/*.h $(SOURCEDIR)/*.cpp
doxygen $^
mv $(DOXYGEN_BUILDDIR)/html/* $(DOXYGEN_HTMLDIR)
rm -rf $(DOXYGEN_BUILDDIR)/html
# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import os
# -- Project information -----------------------------------------------------
project = 'RADSex'
copyright = '2018-2020, Romain Feron'
author = 'Romain Feron'
# The short X.Y version
version = '0.1'
# The full version, including alpha/beta/rc tags
release = '0.1.0'
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = []
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'RADSexdoc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'RADSex.tex', 'RADSex Documentation',
'Romain Feron', 'manual'),
]
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'radsex', 'RADSex Documentation',
[author], 1)
]
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'RADSex', 'RADSex Documentation',
author, 'RADSex', 'One line description of project.',
'Miscellaneous'),
]
# -- Options for Epub output -------------------------------------------------
# Bibliographic Dublin Core info.
epub_title = project
# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''
# A unique identification for the text.
#
# epub_uid = ''
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 2367542786f934c1f5c265874076375a
tags: 645f666f9bcd5a90fca523b33c5a78b7
Example walkthrough
===================
In this example, we will run RADSex on a public *Oryzias latipes* RAD-Sequencing dataset. We will detail each step of the process, highlight important details, and show how to use the R package ``radsex-vis`` to generate plots from the output of ``radsex``. This guide assumes that ``radsex`` and the ``radsex-vis`` package have already been installed. For specific instructions about installing ``radsex`` and ``radsex-vis``, check the :ref:`install-release` section. All reported times and resource usage were measured on a desktop computer with an Intel i7-8700K 4.7 GHz processor, 32 GB of memory, and a standard 7200 RPM hard disk drive. Input data, results (with the exception of the markers depth table), and figures are provided in the *example* directory.
Preparing the data
------------------
The RAD-Sequencing datasets used in this example are available on the Sequence Read Archive from NCBI. Reads were demultiplexed before being deposited on NCBI, and samples were grouped in two projects, males and females. The accession number for **female** samples is **SRS662264**, and the accession number for **male** samples is **SRS662265**. For convenience, simple scripts to download male and female samples from the EBI ftp can be found `here <https://github.com/RomainFeron/RadSex/tree/master/example/oryzias_latipes/data/download_female_samples.sh>`__ for female samples and `here <https://github.com/RomainFeron/RadSex/tree/master/example/oryzias_latipes/data/download_male_samples.sh>`__ for male samples. This dataset was published in `Wilson et al. 2014 <http://www.genetics.org/content/early/2014/09/18/genetics.114.169284>`__.
A population map specifying the sex of each sample is provided `here <https://github.com/RomainFeron/RadSex/tree/master/example/oryzias_latipes/data/population_map.tsv>`_. The assembly used to align markers with ``radsex map`` was that of an HSOK strain, NCBI accession number **GCA_002234695.1** (`link <https://www.ncbi.nlm.nih.gov/assembly/GCA_002234695.1>`__). The chromosomes names file used to display chromosomes in genome plots is provided `here <https://github.com/RomainFeron/RadSex/tree/master/example/oryzias_latipes/data/chromosomes_names.tsv>`__.
.. note:: RADSex uses file names to generate individual IDs. Therefore, individual names in the population map must correspond to the file names without their extensions (*e.g.* the ID of an individual whose reads are in **individual_1.fq.gz** will be **individual_1**). Check the file names and population map provided above for an example of how to build the population map from file names. More details about the population map can be found in the :ref:`population-map` section.
From now on, we will assume the following directory structure:
::
.
├─── samples
| ├────── xxx.fastq.gz
| ├────── xxx.fastq.gz
| ├────── ...
| └────── xxx.fastq.gz
├─── chromosomes_names.tsv
├─── genome.fasta
└─── popmap.tsv
Generating a table of marker depths for the entire dataset
----------------------------------------------------------
The first step of RADSex is to create a table containing the depth of each marker in each individual for the entire dataset; a RADSex marker represents a non-polymorphic sequence (no mismatches or SNPs). This step is performed with the ``process`` command:
::
radsex process --input-dir samples --output-file markers_table.tsv --threads 8
**Parameters** (see the :ref:`process-usage` usage section for details):
- ``--input-dir``: location of the demultiplexed reads directory. Supported reads file formats are described in the :ref:`reads-file` section.
- ``--output-file``: path to the markers depth table generated by this command.
- ``--threads``: number of threads to use to process input files in parallel.
The output file *markers_table.tsv* will be used as input for all analyses implemented in ``radsex``, but it is not used for any ``radsex-vis`` plots. For more information about this file, check the :ref:`markers-depths-table-file` section.
.. note:: The parameter ``--min-depth`` specifies the minimum depth in at least one individual to retain a marker in the markers depth table. In most cases, we advise keeping ``--min-depth`` at its default value of **1** in order to retain all the information present in the dataset; this way, the markers depth table is only computed once, which is preferred because this step is by far the most computationally intensive in RADSex and markers can be filtered based on minimum depth in all downstream analyses. If you are certain that all individuals in your dataset were sequenced with high coverage and you do not plan to run analyses with a minimum depth of 1, you can specify a higher value for ``--min-depth``.
With our setup, using 8 threads, this step completed in **9 min 25 seconds** with a peak memory usage of **10.3 GB**. The resulting markers depth table used 5.1 GB of disk space.
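To make the notion of a marker concrete, the sketch below builds a tiny depth table in Python by counting identical read sequences in each individual. It is only an illustration of the idea under simplifying assumptions (all reads held in memory, a marker kept when its depth reaches ``min_depth`` in at least one individual). The function name ``build_depth_table`` and the toy reads are hypothetical, and this is not how ``radsex process`` is implemented.

```python
from collections import Counter


def build_depth_table(reads_by_individual, min_depth=1):
    """Count identical sequences per individual (illustrative only).

    reads_by_individual maps an individual ID to a list of read sequences.
    Returns {sequence: {individual: depth}} for sequences whose depth is
    at least min_depth in at least one individual (assumed retention rule).
    """
    individuals = sorted(reads_by_individual)
    # Exact-match counting: two reads differing by one base are two markers.
    counts = {ind: Counter(reads) for ind, reads in reads_by_individual.items()}
    sequences = set().union(*counts.values()) if counts else set()
    table = {}
    for seq in sorted(sequences):
        depths = {ind: counts[ind][seq] for ind in individuals}
        if max(depths.values()) >= min_depth:
            table[seq] = depths
    return table


# Toy reads, far shorter than real RAD-Sequencing reads.
reads = {
    "male_1": ["ACGT", "ACGT", "ACGA"],
    "female_1": ["ACGT", "TTTT"],
}
table = build_depth_table(reads, min_depth=2)
print(table)
```

Note that ``ACGT`` and ``ACGA`` are counted as two separate markers even though they differ by a single base, which is the behaviour described above for non-polymorphic markers.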
Computing the distribution of markers between sexes
---------------------------------------------------
The ``distrib`` command computes a table summarizing the distribution of all markers between males and females:
::
radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap-file popmap.tsv --min-depth 5
**Parameters** (see the :ref:`distrib-usage` usage section for details):
- ``--markers-table``: path to the markers depth table generated in the previous step.
- ``--output-file``: path to the distribution of markers between males and females generated by this command.
- ``--popmap-file``: path to the :ref:`population-map`.
- ``--min-depth``: minimum depth to consider a marker present in an individual.
The output file *distribution.tsv* is a tabulated file described in the :ref:`sex-distribution-file` section. The distribution can be visualized with ``radsex-vis`` using the ``plot_sex_distribution`` function:
::
radsexvis::plot_sex_distribution("distribution.tsv", output_file_path = "distribution.png")
To generate a basic plot, the only required parameter is the full path to a distribution table (**"distribution.tsv"** in this example). The figure can be directly saved to a file using the parameter ``output_file_path``; if this parameter is not specified, the plot will be generated in the current R graphic device. For a full description of the ``plot_sex_distribution()`` function, including additional parameters, check the TODO_RADSEXVIS_SECTION.
The figure obtained with the previous command is displayed below:
.. image:: ../../example/figures/distribution.png
This figure is a tile plot with number of males on the x-axis and number of females on the y-axis. The color of a tile at coordinates (**x**, **y**) indicates the number of markers that were present in any **x** males and any **y** females. For instance, in this figure, there were between 25 and 99 markers found in 29 males (not necessarily always the same 29 males) and in 0 females. Tiles for which association with sex is significant (chi-squared test, using Bonferroni correction) are highlighted in red. Many markers found predominantly in males are significantly associated with sex, indicating that an XX/XY system determines sex in this species. Interestingly, there are no markers found in more than 29 out of 31 males and absent from all females, *i.e.* no markers found at positions (30, 0) and (31, 0). This suggests there may be male outliers in the dataset.
With our setup, this step completed in **36 seconds** with a peak memory usage of **4 MB**.
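The tile counts described above can be sketched in a few lines of Python: for each marker, count the males and females in which its depth reaches the minimum depth, then tally markers per (males, females) pair. This is a minimal illustration, not the ``distrib`` implementation; the function name ``sex_distribution``, the ``"M"``/``"F"`` group labels, and the toy depth table are assumptions.

```python
from collections import Counter


def sex_distribution(depth_table, popmap, min_depth=5):
    """Tally markers by (number of males, number of females) carrying them.

    depth_table: {marker: {individual: depth}}
    popmap: {individual: "M" or "F"} (group labels are an assumption)
    """
    tiles = Counter()
    for depths in depth_table.values():
        n_males = sum(1 for ind, d in depths.items()
                      if d >= min_depth and popmap.get(ind) == "M")
        n_females = sum(1 for ind, d in depths.items()
                        if d >= min_depth and popmap.get(ind) == "F")
        if n_males or n_females:  # skip markers present in nobody at this depth
            tiles[(n_males, n_females)] += 1
    return tiles


popmap = {"m1": "M", "m2": "M", "f1": "F", "f2": "F"}
depth_table = {
    "marker_a": {"m1": 12, "m2": 9, "f1": 0, "f2": 1},  # male-biased
    "marker_b": {"m1": 7, "m2": 6, "f1": 8, "f2": 5},   # shared by everyone
    "marker_c": {"m1": 2, "m2": 0, "f1": 0, "f2": 0},   # below the minimum depth
}
tiles = sex_distribution(depth_table, popmap, min_depth=5)
for (n_males, n_females), n_markers in sorted(tiles.items()):
    print(n_males, n_females, n_markers)
```

Each key of ``tiles`` corresponds to one tile of the plot, and its value to the number of markers that determines the tile color.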
Finding markers significantly associated with sex
-------------------------------------------------
The ``signif`` command extracts all markers significantly associated with sex from the dataset:
::
radsex signif --markers-table markers_table.tsv --output-file significant_markers.tsv --popmap-file popmap.tsv --min-depth 5
**Parameters** (see the :ref:`signif-usage` usage section for details):
- ``--markers-table``: path to the markers depth table generated with ``process``.
- ``--output-file``: path to the markers depth table generated by this command. Markers can also be exported to a fasta file with the parameter ``--output-fasta`` (see the :ref:`fasta-file` section).
- ``--popmap-file``: path to the :ref:`population-map`.
- ``--min-depth``: minimum depth to consider a marker present in an individual.
.. note:: The probability of association with sex is obtained with a chi-squared test on the number of females and males in which a marker is present. A marker is considered significantly associated with sex if its probability of association with sex is lower than 0.05 (this threshold can be adjusted with ``--signif-threshold``) after Bonferroni correction. Markers significantly associated with sex are the ones in the tiles highlighted in red in the previous figure.
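As a worked illustration of the test described in the note, the Python sketch below computes a chi-squared p-value for a 2x2 table (sex by presence/absence) and applies a Bonferroni correction. It relies only on the standard library, using the identity that the upper tail of a chi-squared distribution with one degree of freedom equals ``erfc(sqrt(x / 2))``. The function names are hypothetical, the continuity (Yates) correction is an assumption that can be switched off, and the group sizes in the usage lines are made up; the exact computation performed by ``radsex`` may differ.

```python
import math


def chi2_p_value(chi2):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def association_p_value(males_with, females_with, n_males, n_females, yates=True):
    """P-value of a chi-squared test on a 2x2 sex-by-presence table."""
    a, b = males_with, n_males - males_with        # males: present / absent
    c, d = females_with, n_females - females_with  # females: present / absent
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 1.0  # marker present in everyone or no one: no information
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)  # continuity correction (assumption)
    return chi2_p_value(n * diff ** 2 / denominator)


def is_significant(p_value, n_markers, threshold=0.05):
    """Bonferroni correction: compare against threshold / number of tests."""
    return p_value < threshold / n_markers


# Hypothetical dataset: 31 males, 30 females, one million markers tested.
male_specific = association_p_value(29, 0, 31, 30)
shared = association_p_value(15, 15, 31, 30)
print(is_significant(male_specific, 1_000_000))
print(is_significant(shared, 1_000_000))
```

Dividing the threshold by the number of markers is equivalent to multiplying each p-value by that number, so either formulation of the Bonferroni correction gives the same set of significant markers.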
The markers depth table generated by ``signif`` can be visualized with ``radsex-vis`` using the ``plot_depth()`` function:
::
radsexvis::plot_depth("significant_markers.tsv", output_file_path = "significant_markers.png", popmap_file_path = "popmap.tsv")
To generate a basic plot, the only required parameter is the full path to the subset of markers depth table (**"significant_markers.tsv"** in this example). The figure can be directly saved to a file using the parameter ``output_file_path``; if this parameter is not specified, the plot will be generated in the current R graphic device. The parameter ``popmap_file_path`` can be specified to color individual IDs by sex. For a full description of the ``plot_depth()`` function, including additional parameters, check the TODO_RADSEXVIS_SECTION.
The resulting figure is displayed below:
.. image:: ../../example/figures/significant_markers.png
This figure is a heatmap with individuals on the x-axis and markers on the y-axis. The color of a tile at coordinates (**x**, **y**) indicates the depth of marker **y** in individual **x**. Both individuals and markers are clustered based on depth and clustering dendrograms are displayed by default. If a popmap is specified, male and female IDs are displayed with different colors. In this example, two males are clustered with the females, confirming the results from ``distrib`` where male-specific markers were always missing from two males. These two males are actually genetic females whose sex was mis-assigned.
With our setup, this step completed in **37 seconds** with a peak memory usage of **6 MB**.
Aligning markers to a genome
----------------------------
When a reference genome is available, markers can be aligned to it in order to locate sex-differentiated regions. This is done using the ``map`` command:
::
radsex map --markers-file markers_table.tsv --output-file map_results.tsv --popmap-file popmap.tsv --genome-file genome.fasta --min-depth 5
**Parameters** (see the :ref:`map-usage` usage section for details):
- ``--markers-file``: path to a markers depth table generated with ``process``, ``distrib``, or ``subset``.
- ``--output-file``: path to the alignment results table generated by this command.
- ``--popmap-file``: path to the :ref:`population-map`.
- ``--genome-file``: path to the genome file.
- ``--min-depth``: minimum depth to consider a marker present in an individual.
The output file *map_results.tsv* is a tabulated file described in the :ref:`mapping-results-file` section. This file can be visualized with ``radsex-vis`` using the ``plot_genome()`` function:
::
radsexvis::plot_genome("map_results.tsv", chromosomes_names_file_path = "chromosomes_names.tsv", output_file_path = "mapping_genome.png")
To generate a basic plot, the only required parameter is the full path to the alignment results table (**"map_results.tsv"** in this example). The figure can be directly saved to a file using the parameter ``output_file_path``; if this parameter is not specified, the plot will be generated in the current R graphic device. The parameter ``chromosomes_names_file_path`` can be used to specify chromosomes names as described in the :ref:`chromosomes-names` section. For a full description of the ``plot_genome()`` function, including additional parameters, check the TODO_RADSEXVIS_SECTION.
The resulting figure is displayed below:
.. image:: ../../example/figures/mapping_genome.png
This figure is a circular plot in which each sector corresponds to a chromosome and all unplaced scaffolds are grouped in an additional sector (not visible in this example as there are no unplaced scaffolds in this assembly). The top track gives the bias of a marker, 1 if the marker is present in all males and no females, and -1 if the marker is present in all females and no males. The bottom track shows the probability of association with sex (chi-squared test with Bonferroni correction).
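The bias shown in the top track can be illustrated with a one-line formula. The sketch below uses the difference between the fraction of males and the fraction of females carrying a marker, which is one definition consistent with the two endpoints given above (1 and -1); the function name ``sex_bias`` and the group sizes are assumptions, and the exact formula used by ``radsex map`` is not restated here.

```python
def sex_bias(males_with, females_with, n_males, n_females):
    """Difference between the fractions of males and females with the marker."""
    return males_with / n_males - females_with / n_females


# Hypothetical dataset with 31 males and 30 females.
print(sex_bias(31, 0, 31, 30))   # marker in all males and no females
print(sex_bias(0, 30, 31, 30))   # marker in all females and no males
print(sex_bias(0, 0, 31, 30))    # marker absent from both sexes
```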
Results for a specific region can be visualized with ``radsex-vis`` using the ``plot_contig()`` function:
::
radsexvis::plot_contig("map_results.tsv", "genome.fasta.lengths", "Chr01", chromosomes_names_file_path = "chromosomes_names.tsv", output_file_path = "mapping_contig.png")
This function uses the same parameters as ``plot_genome()`` with the addition of the contig to plot (*Chr01* here). For a full description of the ``plot_contig()`` function, including additional parameters, check the TODO_RADSEXVIS_SECTION.
The resulting figure is displayed below:
.. image:: ../../example/figures/mapping_contig.png
In this figure, bias and probability of association with sex defined just above are plotted against position on the plotted contig.
With our setup, this step completed in **9 min 36 seconds** with a peak memory usage of **1.3 GB**, most of the time being spent indexing the genome. If the genome is already indexed with BWA, this step completes in **55 seconds**.
Going further
-------------
In this example, we showed the most commonly used functions of ``radsex`` and ``radsex-vis``, mostly using default parameters. In general, it is recommended to run ``distrib`` for several ``--min-depth`` values (for instance 1, 2, 5, and 10) to better understand the distribution of marker depths in the dataset and estimate the robustness of markers significantly associated with sex. Three other commands are implemented in ``radsex``:
- ``subset``: extract a subset of markers based on presence in number of males and females (see the :ref:`subset-usage` usage section)
- ``freq``: compute the distribution of presence of markers in all individuals from the dataset (see the :ref:`freq-usage` usage section)
- ``depth``: compute the minimum, maximum, median, and average marker depth for each individual from the dataset (see the :ref:`depth-usage` usage section)
To get the full usage information for any ``radsex`` command, check the :ref:`full-usage` section.
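The recommendation above to run ``distrib`` for several minimum depth values can be scripted. The Python sketch below only builds and prints the command lines as a dry run, reusing the flags shown earlier in this walkthrough (``--markers-table``, ``--output-file``, ``--popmap-file``, ``--min-depth``). The helper name ``distrib_commands`` and the output file naming scheme are assumptions.

```python
import shlex


def distrib_commands(markers_table, popmap, min_depths=(1, 2, 5, 10)):
    """Build one `radsex distrib` command line per minimum depth value."""
    commands = []
    for depth in min_depths:
        commands.append([
            "radsex", "distrib",
            "--markers-table", markers_table,
            # Hypothetical naming scheme: one output table per depth value.
            "--output-file", f"distribution_min_depth_{depth}.tsv",
            "--popmap-file", popmap,
            "--min-depth", str(depth),
        ])
    return commands


commands = distrib_commands("markers_table.tsv", "popmap.tsv")
for command in commands:
    # Printed as a dry run; pass `command` to subprocess.run(command, check=True)
    # to execute it once radsex is available on the PATH.
    print(shlex.join(command))
```

Because the markers depth table is computed once with ``process``, each of these runs only repeats the much cheaper ``distrib`` step.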
Getting started
===============
Installation
------------
Requirements
~~~~~~~~~~~~
* A C++11 compliant compiler (GCC >= 4.8.1, Clang >= 3.3)
* The zlib library (usually installed on Linux by default)
.. _install-release:
Installation
~~~~~~~~~~~~
There are three ways to install RADSex:
**1. Install the latest release**
* Download the latest release from `GitHub <https://github.com/RomainFeron/RadSex/releases>`_
* Unzip the archive
* Navigate to the **RADSex** directory
* Run ``make``
The compiled ``radsex`` binary will be located in **RADSex/bin/**.
**2. Install the latest stable development version**
To install the latest stable version of RADSex directly from the GitHub repository, run the following commands:
::
git clone https://github.com/RomainFeron/RADSex.git
cd RADSex
make
The compiled ``radsex`` binary will be located in **RADSex/bin/**.
**3. Install RADSex with conda**
RADSex is available in `Bioconda <https://bioconda.github.io/recipes/radsex/README.html?#recipe-Recipe%20&#x27;radsex&#x27;>`_. To install RADSex with Conda, run the following command:
::
conda install -c bioconda radsex
Update RADSex
~~~~~~~~~~~~~
To update RADSex, you can download the latest stable release and install it as described in the :ref:`install-release` section.
If you installed RADSex directly from the GitHub repository, update RADSex by running the following commands from the **RADSex** directory:
::
git pull
make rebuild
If you installed RADSex with Conda, run:
::
conda update -c bioconda radsex
Before starting
---------------
Before running the pipeline, you should prepare the following files:
* A **set of demultiplexed reads**. The current version of RADSex does not implement demultiplexing. Raw sequencing reads can be demultiplexed using `Stacks <http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php>`_ or `pyRAD <http://nbviewer.jupyter.org/gist/dereneaton/af9548ea0e94bff99aa0/pyRAD_v.3.0.ipynb#The-seven-steps-described>`_.
* A **group information file (popmap)**: a tabulated file with individual ID as the first column and group as the second column. It is important that the individual IDs in the popmap are the same as the names of the demultiplexed reads files (see the :ref:`population-map` section).
* To align markers to a genome: the **genome file** in fasta format.
.. note:: When visualizing ``map`` results with ``radsex-vis``, linkage groups / chromosomes are automatically inferred from scaffold names in the reference sequence if their name starts with *LG*, *CHR*, or *NC* (case-insensitive). If chromosomes are named differently in the reference genome, you should prepare a tabulated file with reference contig ID in the first column and corresponding chromosome name in the second column (see the :ref:`chromosomes-names` section).
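To check in advance whether a chromosomes names file is needed, the naming rule in the note can be mimicked with a short Python sketch. It tests whether a contig name starts with *LG*, *CHR*, or *NC* regardless of case, and formats a two-column tabulated line for contigs that need an explicit name. The function names and contig IDs are hypothetical, and the actual matching done by ``radsex-vis`` may differ in details.

```python
CHROMOSOME_PREFIXES = ("LG", "CHR", "NC")


def looks_like_chromosome(contig_name):
    """True if the name starts with LG, CHR, or NC, ignoring case."""
    return contig_name.upper().startswith(CHROMOSOME_PREFIXES)


def chromosomes_names_lines(name_by_contig):
    """Format {contig ID: chromosome name} as tab-separated lines."""
    return [f"{contig}\t{name}" for contig, name in name_by_contig.items()]


# Made-up contig names for illustration.
contigs = ["Chr01", "lg7", "NC_000001.1", "scaffold_12"]
unnamed = [c for c in contigs if not looks_like_chromosome(c)]
print(unnamed)

# A contig with an uninformative ID needs an explicit entry in the names file.
lines = chromosomes_names_lines({"contig_0001": "Chr01"})
print("\n".join(lines))
```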
Running RADSex
--------------
.. _computing-depth-table:
Computing the markers depth table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first step of RADSex is to create a table of marker depths for the entire dataset using the ``process`` command:
::
radsex process --input-dir ./samples --output-file markers_table.tsv --threads 16 --min-depth 1
In this example, demultiplexed reads are located in **./samples** and the markers table generated by ``process`` will be saved to **markers_table.tsv**. The parameter ``--threads`` specifies the number of threads to use, and ``--min-depth`` specifies the minimum depth to consider a marker present in an individual: markers which are not present with depth higher than this value in at least one individual will not be retained in the markers table.
It is advised to keep the minimum depth to the default value of 1 for this step, as it can be adjusted for each analysis later.
The resulting file **markers_table.tsv** is a tabulated file described in the :ref:`markers-depths-table-file` section.
Computing the distribution of markers between groups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``distrib`` command computes the distribution of markers between groups from a markers depth table:
::
radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap popmap.tsv --min-depth 5 --groups M,F
In this example, ``--markers-table`` is the table generated in the :ref:`computing-depth-table` section, and the distribution of markers between groups will be saved to **distribution.tsv**. The group of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section). Groups of individuals to compare (as defined in the :ref:`population-map`) are specified manually with the parameter ``--groups``. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
The resulting file **distribution.tsv** is a table described in the :ref:`sex-distribution-file` section.
This distribution can be visualized with the ``plot_sex_distribution()`` function of `RADSex-vis <https://github.com/RomainFeron/RADSex-vis>`_, which generates a tile plot of marker counts with number of males on the x-axis and number of females on the y-axis.
Extracting markers significantly associated with sex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Markers significantly associated with sex are obtained with the ``signif`` command:
::
radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 --groups M,F [ --output-fasta ]
In this example, ``--markers-table`` is the table generated in the :ref:`computing-depth-table` section, and markers significantly associated with sex are saved to **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section). Groups of individuals to compare (as defined in the :ref:`population-map`) are specified manually with the parameter ``--groups``. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
By default, the ``signif`` function generates an output file in the same format as the markers depth table. Markers can also be exported to a fasta file using the ``--output-fasta`` parameter (see the :ref:`fasta-file` section).
The markers table generated by ``signif`` can be visualized with the ``plot_depth()`` function of `RADSex-vis <https://github.com/RomainFeron/RADSex-vis>`_, which generates a heatmap showing the depth of each marker in each individual.
Aligning markers to a genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Markers can be aligned to a genome using the ``map`` command:
::
radsex map --markers-file markers_table.tsv --output-file alignment_results.tsv --popmap popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-depth 5 --groups M,F
In this example, ``--markers-file`` is the markers depth table generated in the :ref:`computing-depth-table` step, and the path to the reference genome file is given by ``--genome-file``; results are saved to **alignment_results.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the :ref:`population-map` section), and the minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual. Groups of individuals to compare (as defined in the :ref:`population-map`) are specified manually with the parameter ``--groups``.
The parameter ``--min-quality`` specifies the minimum mapping quality (as defined in `BWA <http://bio-bwa.sourceforge.net/bwa.shtml>`_) to consider a marker properly aligned and is set to 20 in this example. The parameter ``--min-frequency`` specifies the minimum frequency of a marker in the population to retain this marker and is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of the population are aligned to the genome.
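The effect of ``--min-frequency`` can be illustrated with a small Python sketch that keeps a marker only when the fraction of individuals carrying it reaches the threshold. Counting an individual as a carrier when its depth reaches ``min_depth`` is an assumption made here for illustration, as are the function name and the toy data; the filtering performed by ``radsex map`` itself is not reproduced.

```python
def passes_min_frequency(depths, min_depth=5, min_frequency=0.1):
    """True if the marker is present in at least min_frequency of individuals.

    depths maps individual IDs to the depth of one marker.
    """
    if not depths:
        return False
    present = sum(1 for depth in depths.values() if depth >= min_depth)
    return present / len(depths) >= min_frequency


# Toy population of 20 individuals.
rare = {f"ind_{i}": (8 if i < 1 else 0) for i in range(20)}     # 1 of 20 carriers
common = {f"ind_{i}": (8 if i < 2 else 0) for i in range(20)}   # 2 of 20 carriers
print(passes_min_frequency(rare))
print(passes_min_frequency(common))
```

With a threshold of 0.1 and 20 individuals, a marker must be present in at least 2 individuals to be aligned.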
The resulting file **alignment_results.tsv** is a table described in the :ref:`mapping-results-file` section.
Alignment results from ``map`` can be visualized with the ``plot_genome()`` function of `RADSex-vis <https://github.com/RomainFeron/RADSex-vis>`_, which generates a circular plot showing bias and association with sex for each marker aligned to the genome.
Alignment results for a specific contig can be visualized with the ``plot_contig()`` function to show the same metrics for a single contig.
.. RADSex documentation master file, created by
sphinx-quickstart on Thu Sep 13 15:17:16 2018.
RADSex documentation
====================
RADSex is a software package to analyze RAD-Sequencing data. It is primarily designed to compare female and male populations in search of a sex signal, but it can be used to compare any two populations.
The core idea of RADSex is to compare presence / absence of non-polymorphic markers between individuals in two populations. RADSex does not allow mismatches when grouping reads into markers. This means that each allele in a polyallelic locus is represented as a separate marker, whereas other RAD-Sequencing analysis software would usually group these alleles in a single polymorphic marker. Separating alleles from polymorphic markers enables RADSex to easily detect sex-specific alleles, using only minimum depth of a marker as a parameter.
The main input of RADSex is a dataset of demultiplexed RAD reads, *i.e.* one reads file per individual. From this dataset, RADSex generates a table containing the depth of each marker in each individual. This table is then used to infer information about the type of sex-determination system, identify sex-biased markers, or align the markers to a reference genome. Several functions are also implemented to assist with general analysis of the dataset, for instance computing the frequencies of markers in individuals from the dataset or the median marker depth in each individual.
Results from RADSex can be visualized with the `radsex-vis <https://github.com/INRA-LPGP/radsex-vis>`_ R package. This R package provides easy-to-use functions to generate visual representations of your data.
RADSex's API documentation generated with Doxygen is available `here <doxygen/index.html>`_.
Documentation summary
---------------------
.. toctree::
:maxdepth: 2
getting_started
example
usage
input_files
output_files
license
Input files
===========
.. _reads-file:
Reads files
-----------
RADSex accepts demultiplexed reads files as input for ``process``. RADSex should work with any demultiplexed RAD-sequencing reads files regardless of technology (single / double digest) or enzyme. RADSex cannot support paired-end reads because insert sizes are variable, and thus the second read in a pair does not cover a consistent region of the genome.
Input files can be in fasta or fastq formats and can be compressed with gzip. RADSex uses file extensions to detect input files and supports the following extensions: **.fa**, **.fa.gz**, **.fq**, **.fq.gz**, **.fasta**, **.fasta.gz**, **.fastq**, **.fastq.gz**, **.fna**, and **.fna.gz**.
Individual IDs are inferred from file names, *e.g.* RADSex will attribute the ID **individual_1** to the reads file **individual_1.fastq.gz**.
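The naming rule can be mimicked in Python to preview the IDs that a set of reads files would produce. The sketch below strips any of the extensions listed above from a file name; the function name ``individual_id`` is hypothetical and case-sensitive matching is an assumption, so this is a convenience for checking file names rather than a copy of the RADSex logic.

```python
# Extensions listed in the text above.
SUPPORTED_EXTENSIONS = (
    ".fasta.gz", ".fastq.gz", ".fa.gz", ".fq.gz", ".fna.gz",
    ".fasta", ".fastq", ".fa", ".fq", ".fna",
)


def individual_id(file_name):
    """Return the file name without its reads extension, or None if unsupported."""
    for extension in SUPPORTED_EXTENSIONS:
        if file_name.endswith(extension):
            return file_name[: -len(extension)]
    return None


for name in ["individual_1.fastq.gz", "individual_2.fq", "notes.txt"]:
    print(name, individual_id(name))
```

Files for which the function returns ``None`` do not carry one of the supported extensions and would not be expected to be picked up as input.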
.. _population-map:
Population map
--------------
A population map file is a headerless TSV file (*i.e.* a tabulated file using "\\t" - the "tab" character - as a separator) with individual ID in the first column and group in the second column. Groups can be any value and there can be more than two groups. However, most radsex analyses perform pairwise comparisons between groups and will require specifying the groups to compare with the ``--groups`` parameter (*e.g.* ``--groups males,females``). If the popmap contains two groups, these groups will be used for pairwise group comparisons in the order they are found in the popmap (this order can be overridden with ``--groups``).
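A population map in this format is straightforward to read programmatically, for instance to check group labels before running an analysis. The Python sketch below parses a headerless two-column TSV and records groups in the order they first appear, mirroring the default ordering described above. The function name ``read_popmap`` and the in-memory example are assumptions, and skipping lines with fewer than two columns is a choice made for this sketch.

```python
import csv
import io


def read_popmap(handle):
    """Parse a headerless two-column TSV into ({ID: group}, groups in file order)."""
    group_of = {}
    groups = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        individual, group = row[0].strip(), row[1].strip()
        group_of[individual] = group
        if group not in groups:
            groups.append(group)
    return group_of, groups


# In-memory stand-in for a popmap file opened with open("popmap.tsv", newline="").
example = io.StringIO("individual_1\tM\nindividual_2\tF\nindividual_3\tM\n")
group_of, groups = read_popmap(example)
print(groups)
print(group_of)
```

When more than two groups are found, the list returned here makes it easy to see which labels are available to pass to ``--groups``.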
An example of a population map is given below:
::