Commit b24883e1 authored by RomainFeron

Started updating docs to match new CLI

parent 8e6e3050
@@ -56,7 +56,7 @@ Before running the pipeline, you should prepare the following elements:
* A **population map**: a tabulated file with individual ID as the first column and sex as the second column. It is important that the individual IDs in the popmap are the same as the names of the demultiplexed reads files (see the [popmap section](#population-map) for details).
* If you want to map the sequences to a reference genome: a **reference genome** in fasta format.
.. note:: When visualizing `mapping` results with `radsex-vis`, linkage groups / chromosomes are automatically inferred from scaffold names in the reference sequence if their name starts with *LG*, *CHR*, or *NC* (case-insensitive). If chromosomes are named differently in the reference genome, you should prepare a tabulated file with reference scaffold ID in the first column and corresponding chromosome name in the second column (see the [chromosomes names section](#chromosomes-names) for details).
.. note:: When visualizing `map` results with `radsex-vis`, linkage groups / chromosomes are automatically inferred from scaffold names in the reference sequence if their name starts with *LG*, *CHR*, or *NC* (case-insensitive). If chromosomes are named differently in the reference genome, you should prepare a tabulated file with reference scaffold ID in the first column and corresponding chromosome name in the second column (see the [chromosomes names section](#chromosomes-names) for details).
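The naming rule described in the note can be illustrated with a minimal Python sketch. It only mirrors the rule as worded above (a name starting with *LG*, *CHR*, or *NC*, in any case); the helper name ``looks_like_chromosome`` and the example scaffold names are hypothetical, and this is not the code used by `radsex-vis`.

```python
import re

# Hypothetical helper: encodes only the documented prefix rule
# (LG, CHR, or NC at the start of the name, case-insensitive).
_CHROM_PREFIX = re.compile(r"^(LG|CHR|NC)", re.IGNORECASE)


def looks_like_chromosome(scaffold_name: str) -> bool:
    """Return True if the scaffold name starts with LG, CHR, or NC."""
    return _CHROM_PREFIX.match(scaffold_name) is not None


# Made-up scaffold names for illustration.
names = ["LG01", "chr2", "NC_012345.1", "scaffold_17"]
flags = {name: looks_like_chromosome(name) for name in names}
```

Scaffolds for which the check fails would be the ones to list in the tabulated chromosome names file.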
Running RADSex
@@ -64,37 +64,37 @@ Running RADSex
.. _computing-cov-table:
Computing the coverage table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computing the markers table
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first step of RADSex is to create a table of coverage for the dataset using the ``process`` command:
The first step of RADSex is to create a table of marker depth for the dataset using the ``process`` command:
::
radsex process --input-dir ./samples --output-file coverage_table.tsv --threads 16 --min-coverage 1
radsex process --input-dir ./samples --output-file markers_table.tsv --threads 16 --min-depth 1
In this example, demultiplexed reads are stored in **./samples** and the coverage table generated by ``process`` will be stored in **coverage_table.tsv**. The parameter ``--threads`` specifies the number of threads to use, and ``--min-coverage`` specifies the minimum coverage to consider a marker present in an individual: markers which are not present with coverage higher than this value in at least one individual will not be retained in the coverage table.
It is advised to keep the minimum coverage to 1 for this step, as it can be adjusted for each analysis later.
In this example, demultiplexed reads are stored in **./samples** and the markers table generated by ``process`` will be stored in **markers_table.tsv**. The parameter ``--threads`` specifies the number of threads to use, and ``--min-depth`` specifies the minimum depth to consider a marker present in an individual: markers which are not present with depth higher than this value in at least one individual will not be retained in the markers table.
It is advised to keep the minimum depth to 1 for this step, as it can be adjusted for each analysis later.
The resulting file **coverage_table.tsv** is a table with N + 2 columns, where *N* is the number of individuals in the dataset :
The resulting file **markers_table.tsv** is a table with N + 2 columns, where *N* is the number of individuals in the dataset :
* **ID** : marker ID.
* **Sequence** : marker sequence.
* For each individual, the coverage of this marker.
* For each individual, the depth of this marker.
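As an illustration of the table layout described above, the following Python sketch reads a table with N + 2 columns into memory. It assumes a tab-separated file whose first row is a header (``ID``, ``Sequence``, then one column per individual) and no comment lines; the real output of ``process`` may differ, and the function name ``read_markers_table`` and the sample data are hypothetical.

```python
import csv
import io


def read_markers_table(handle):
    """Parse a markers table: ID, Sequence, then one depth column per individual.

    Assumes a tab-separated layout with a single header row (an assumption,
    not a guarantee about the actual RADSex output).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    individuals = header[2:]  # every column after ID and Sequence
    markers = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        marker_id, sequence = row[0], row[1]
        depths = dict(zip(individuals, (int(value) for value in row[2:])))
        markers.append((marker_id, sequence, depths))
    return individuals, markers


# Made-up two-individual table for illustration.
example = "ID\tSequence\tind_1\tind_2\n0\tACGT\t12\t0\n1\tTTGA\t3\t7\n"
individuals, markers = read_markers_table(io.StringIO(example))
```

With a real file, the same function could be called on ``open("markers_table.tsv")`` instead of the in-memory example.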
Computing the distribution of markers between sexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After generating the coverage table, the ``distrib`` command is used to compute the distribution of markers between sexes:
After generating the markers table, the ``distrib`` command is used to compute the distribution of markers between sexes:
::
radsex distrib --input-file coverage_table.tsv --output-file distribution.tsv --popmap-file popmap.tsv --min-coverage 5
radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap popmap.tsv --min-depth 5
In this example, the input file ``--input-file`` is the coverage table generated in the :ref:`computing-cov-table` section, and the distribution of markers between sexes will be stored in **distribution.tsv**.
In this example, the value of ``--markers-table`` is the table generated in the :ref:`computing-cov-table` section, and the distribution of markers between sexes will be stored in **distribution.tsv**.
The sex of each individual in the population is given by **popmap.tsv** (see the [popmap section](#population-map) for details).
The minimum coverage to consider a marker present in an individual is set to 5, meaning that markers with coverage lower than 5 in an individual will not be considered present in this individual.
The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
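The presence rule stated above can be sketched in a few lines of Python. This follows the wording of that sentence (a marker with depth lower than the threshold is not present, so depth equal to the threshold counts as present); the function name ``count_presence_by_sex``, the sex labels ``M`` and ``F``, and the depth values are illustrative assumptions, not RADSex internals.

```python
from collections import Counter


def count_presence_by_sex(depths, popmap, min_depth):
    """Count, per sex, the individuals in which one marker is present.

    depths: individual ID -> depth of the marker in that individual
    popmap: individual ID -> sex label
    A marker is treated as present when depth >= min_depth.
    """
    counts = Counter()
    for individual, depth in depths.items():
        if depth >= min_depth:
            counts[popmap[individual]] += 1
    return counts


# Made-up population map and depths for a single marker.
popmap = {"ind_1": "M", "ind_2": "M", "ind_3": "F", "ind_4": "F"}
depths = {"ind_1": 12, "ind_2": 4, "ind_3": 5, "ind_4": 0}
counts = count_presence_by_sex(depths, popmap, min_depth=5)
```

Here the marker counts as present in one male (depth 12) and one female (depth 5), while the individuals with depth 4 and 0 are excluded.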
The resulting file **distribution.tsv** is a table with five columns:
@@ -116,23 +116,23 @@ Markers significantly associated with sex can be obtained with the ``signif`` co
::
radsex signif --input-file coverage_table.tsv --output-file markers.tsv --popmap-file popmap.tsv --min-coverage 5 [ --output-format fasta ]
radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 [ --output-format fasta ]
In this example, the input file ``--input-file`` is the coverage table generated in the :ref:`computing-cov-table` step, and the markers significantly associated with sex are outputed in **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the [popmap section](#population-map) for details), and the minimum coverage to consider a sequence present in an individual is set to 5, meaning that markers with coverage lower than 5 in an individual will not be considered present in this individual.
In this example, the value of ``--markers-table`` is the table generated in the :ref:`computing-cov-table` section, and the markers significantly associated with sex are output in **markers.tsv**. The sex of each individual in the population is given by **popmap.tsv** (see the [popmap section](#population-map) for details), and the minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.
By default, the ``signif`` function generates an output file in the same format as the coverage table. However, the sequences can be exported to fasta using the ``--output-format`` parameter (see TODO SECTION).
By default, the ``signif`` function generates an output file in the same format as the markers table. However, the sequences can be exported to fasta using the ``--output-fasta`` parameter (see TODO SECTION).
The coverage table generated by ``signif`` can be visualized with the ``plot_coverage()`` function of `RADSex-vis <https://github.com/RomainFeron/RADSex-vis>`_, which generates a heatmap showing the coverage of each sequence in each individual.
The markers table generated by ``signif`` can be visualized with the ``plot_coverage()`` function of `RADSex-vis <https://github.com/RomainFeron/RADSex-vis>`_, which generates a heatmap showing the depth of each marker in each individual.
Mapping markers to a reference genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The markers can be mapped to a reference genome using the ``map`` command:
Markers can be aligned to a reference genome using the ``map`` command:
::
radsex map --input-file coverage_table.tsv --output-file mapping.tsv --popmap-file popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-coverage 5
radsex map --input-file markers_table.tsv --output-file mapping.tsv --popmap popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-depth 5
In this example, the input file ``--input-file`` is the coverage table generated in the :ref:`computing-cov-table` step, the mapping results will be stored in **mapping.tsv**, and the path to the reference genome file is given by ``--genome-file``. The sex of each individual in the population is given by **popmap.tsv** (see the [popmap section](#population-map) for details), and the minimum coverage to consider a marker present in an individual is set to 5, meaning that markers with coverage lower than 5 in an individual will not be considered present in this individual. The parameter ``--min-quality`` specifies the minimum mapping quality (as defined in `BWA <http://bio-bwa.sourceforge.net/bwa.shtml>`_) to consider a marker properly mapped, and is set to 20 in this example. The parameter ``--min-frequency`` specifies the minimum frequency of a marker in at least one sex; it is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of one sex are retained for mapping.