Commit bd1ba812 authored by peguerin

README installation update

parent 4021254b
...@@ -9,15 +9,15 @@ This was designed to process RADseq data from [RESERVEBENEFIT](https://www.biodi ...@@ -9,15 +9,15 @@ This was designed to process RADseq data from [RESERVEBENEFIT](https://www.biodi
1. [Introduction](#1-introduction) 1. [Introduction](#1-introduction)
2. [Installation](#2-installation) 2. [Installation](#2-installation)
1. [Prerequisite](#21-prerequisite) 1. [Prerequisite](#21-prerequisite)
2. [Data Files](#22-data-files) 2. [Data Files](#22-data-files)
3. [Set up](#23-set-up) 3. [Set up](#23-set-up)
3. [Reporting bugs](#3-reporting-bugs) 3. [Reporting bugs](#3-reporting-bugs)
4. [Running the pipeline](#4-running-the-pipeline) 4. [Running the pipeline](#4-running-the-pipeline)
1. [Initialisation](#41-initialisation) 1. [Initialisation](#41-initialisation)
2. [Configuration](#42-configuration) 2. [Configuration](#42-configuration)
3. [Run the pipeline into a single command](#43-run-the-pipeline-into-a-single-command) 3. [Run the pipeline into a single command](#43-run-the-pipeline-into-a-single-command)
4. [Run the pipeline step by step](#44-run-the-pipeline-step-by-step) 4. [Run the pipeline step by step](#44-run-the-pipeline-step-by-step)
# 1. Introduction # 1. Introduction
...@@ -40,7 +40,7 @@ You must install the following softwares and packages : ...@@ -40,7 +40,7 @@ You must install the following softwares and packages :
5.3.0 5.3.0
``` ```
- [STACKS 2.0b](http://catchenlab.life.illinois.edu/stacks/) - [STACKS 2.2](http://catchenlab.life.illinois.edu/stacks/)
* Check version and if programs are correctly installed by typing : * Check version and if programs are correctly installed by typing :
``` ```
...@@ -49,7 +49,7 @@ You must install the following softwares and packages : ...@@ -49,7 +49,7 @@ You must install the following softwares and packages :
gstacks --version gstacks --version
populations --version populations --version
## should give you the output ## should give you the output
2.0b 2.2
``` ```
- [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial) - [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial)
...@@ -99,11 +99,15 @@ You must install the following softwares and packages : ...@@ -99,11 +99,15 @@ You must install the following softwares and packages :
## 2.2 Data Files ## 2.2 Data Files
The included data files are : The included data files are :
Let's define some wildcards:
- `{run}` : any run
- `{pool}` : any pool within a run
- `{species}` : any species
* [config.yaml](01-info_files/config.yaml) : * [config.yaml](01-info_files/config.yaml) :
* [barcodes.txt](01-info_files/barcodes.txt) : * [barcodes.txt](01-info_files/barcodes.txt) : file containing the barcodes used for {pool} in {run}
* [infos.csv](01-info_files) : * [{species}_infos.csv](01-info_files) : `.csv` information table for {species}. Each row is a sample and there are 4 columns: run, pool, barcode, ID
* [populations_map.txt](01-info_files) : * [{species}_populations_map.txt](01-info_files) : `.tsv` information table for {species}. Each row is a sample and there are 2 columns: ID, population. This file can be generated by the pipeline (see [Configuration](#42-configuration) section). However, we strongly recommend that you create it manually.
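The two tables described above can be sketched in Python as follows. This is only an illustration, not part of the pipeline: the function names `read_infos` and `write_populations_map` are hypothetical, and the sketch assumes that `{species}_infos.csv` is comma-separated with no header row and that the populations map is tab-separated.

```python
import csv
import io


def read_infos(handle):
    """Parse an infos table with 4 columns: run, pool, barcode, ID.

    Assumption: comma-separated, no header row.
    """
    samples = []
    for line_number, record in enumerate(csv.reader(handle), start=1):
        if not record:
            continue  # skip blank lines
        if len(record) != 4:
            raise ValueError(
                f"line {line_number}: expected 4 columns, got {len(record)}"
            )
        run, pool, barcode, sample_id = (field.strip() for field in record)
        samples.append(
            {"run": run, "pool": pool, "barcode": barcode, "ID": sample_id}
        )
    return samples


def write_populations_map(samples, population_of, handle):
    """Write one 'ID<TAB>population' row per sample.

    `population_of` is a dict supplied by the user (ID -> population).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample in samples:
        writer.writerow([sample["ID"], population_of[sample["ID"]]])


# Usage with made-up sample names and barcodes
infos = io.StringIO("runA,poolA1,ACGTACGT,sampleA1\nrunA,poolA1,TGCATGCA,sampleA2\n")
samples = read_infos(infos)
populations_map = io.StringIO()
write_populations_map(
    samples, {"sampleA1": "pop1", "sampleA2": "pop2"}, populations_map
)
print(populations_map.getvalue(), end="")
```

Writing the map from an explicit ID-to-population dictionary keeps the assignment under manual control, in line with the recommendation above.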
## 2.3 Set Up ## 2.3 Set Up
...@@ -114,7 +118,109 @@ cd snakemake_stacks2 ...@@ -114,7 +118,109 @@ cd snakemake_stacks2
``` ```
You will see the following folders : You will see the following folders :
* [00-scripts](00-scripts): contains all the required scripts to run the whole pipeline
* [01-info_files](01-info_files) : contains all the required data files (see [Data Files](#22-data-files) section above)
* [02-raw](02-raw) : must contain your data from paired-end Illumina sequencing runs. The data must be stored this way :
```
02-raw/
runA/
poolA1/
{poolA1}_R1_001.fastq.gz
{poolA1}_R2_001.fastq.gz
poolA2/
{poolA2}_R1_001.fastq.gz
{poolA2}_R2_001.fastq.gz
...
runB/
poolB1/
{poolB1}_R1_001.fastq.gz
{poolB1}_R2_001.fastq.gz
...
...
```
* [03-samples](03-samples): will store the results generated by demultiplexing with [process_radtags](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php) and clone filtering with [clone_filter](http://catchenlab.life.illinois.edu/stacks/comp/clone_filter.php). The data will be stored this way :
```
03-samples/
runA/
poolA1/
sample_{barcode1}.1.fq.gz
sample_{barcode1}.2.fq.gz
sample_{barcode2}.1.fq.gz
sample_{barcode2}.2.fq.gz
sample_{barcode3}.1.fq.gz
sample_{barcode3}.2.fq.gz
...
poolA1_clone_filtered/
sample_{barcode1}.1.1.fq.gz
sample_{barcode1}.2.2.fq.gz
sample_{barcode2}.1.1.fq.gz
sample_{barcode2}.2.2.fq.gz
sample_{barcode3}.1.1.fq.gz
sample_{barcode3}.2.2.fq.gz
...
poolA2/
sample_{barcode1}.1.fq.gz
sample_{barcode1}.2.fq.gz
...
poolA2_clone_filtered/
sample_{barcode1}.1.1.fq.gz
sample_{barcode1}.2.2.fq.gz
...
...
runB/
poolB1/
sample_{barcode1}.1.fq.gz
sample_{barcode1}.2.fq.gz
...
poolB1_clone_filtered/
sample_{barcode1}.1.1.fq.gz
sample_{barcode1}.2.2.fq.gz
...
...
...
```
* [04-all_samples](04-all_samples): paired-end `fastq.gz` files are renamed according to the information in [{species}_infos.csv](01-info_files). Reads are then aligned onto the reference genome sequences stored in [08-genomes](08-genomes). This folder contains the "named" fastq files and the corresponding `.bam` alignment files. `.sorted.bam` files are sorted alignments and `.sorted.bam.bai` files are the corresponding indexes. The data will be stored this way :
```
04-all_samples/
speciesA/
{sampleA1}.1.fq.gz
{sampleA1}.2.fq.gz
{sampleA1}.bam
{sampleA1}.sorted.bam
{sampleA1}.sorted.bam.bai
{sampleA2}.1.fq.gz
{sampleA2}.2.fq.gz
{sampleA2}.bam
{sampleA2}.sorted.bam
{sampleA2}.sorted.bam.bai
...
speciesB/
{sampleB1}.1.fq.gz
{sampleB1}.2.fq.gz
{sampleB1}.bam
{sampleB1}.sorted.bam
{sampleB1}.sorted.bam.bai
...
...
```
* [05-stacks](05-stacks) : outputs from [gstacks](http://catchenlab.life.illinois.edu/stacks/comp/gstacks.php)
* [06-populations](06-populations) : outputs from [populations](http://catchenlab.life.illinois.edu/stacks/comp/populations.php)
* [08-genomes](08-genomes) : reference genome of each species {species} used for the analysis. The `.fasta` file is mandatory and stores all the scaffold sequences of the {species} genome assembly. `.amb`, `.ann`, `.bwt`, `.pac`, `.sa` are index files required by [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial). They will be automatically generated if absent. The data must be stored this way :
```
08-genomes/
{species}_genome.amb
{species}_genome.ann
{species}_genome.bwt
{species}_genome.fasta
{species}_genome.pac
{species}_genome.sa
```
* [10-logs](10-logs) : log files generated by every command
- process_radtags
- clone_filter
- genome_alignment
- gstacks
- populations
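
Before launching the pipeline, the input layout described above can be checked with a short Python sketch. It is not part of the repository: the helper names are hypothetical, and it assumes that each pool folder holds read files named after the pool (`{pool}_R1_001.fastq.gz`, `{pool}_R2_001.fastq.gz`) and that genome files are named `{species}_genome.*`, as in the trees shown above.

```python
from pathlib import Path

# Index files that BWA needs next to the .fasta (listed in the 08-genomes entry)
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def list_run_pool_pairs(raw_dir):
    """Return sorted (run, pool) pairs found as <raw_dir>/<run>/<pool>/ folders."""
    pairs = []
    for run_dir in sorted(p for p in Path(raw_dir).iterdir() if p.is_dir()):
        for pool_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
            pairs.append((run_dir.name, pool_dir.name))
    return pairs


def missing_read_files(raw_dir, run, pool):
    """Return the expected paired-end read files that are absent for one pool."""
    pool_dir = Path(raw_dir) / run / pool
    expected = [f"{pool}_R1_001.fastq.gz", f"{pool}_R2_001.fastq.gz"]
    return [name for name in expected if not (pool_dir / name).is_file()]


def missing_genome_files(genome_dir, species):
    """Return the genome files (.fasta first, then BWA indexes) that are absent."""
    names = [f"{species}_genome.fasta"]
    names += [f"{species}_genome{suffix}" for suffix in BWA_INDEX_SUFFIXES]
    return [name for name in names if not (Path(genome_dir) / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root; only checks 02-raw if the folder exists.
    if Path("02-raw").is_dir():
        for run, pool in list_run_pool_pairs("02-raw"):
            for name in missing_read_files("02-raw", run, pool):
                print(f"missing: 02-raw/{run}/{pool}/{name}")
```

A missing `.fasta` is a real problem since that file is mandatory, whereas missing index files are only informative because they are generated automatically if absent.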
# 3. Reporting bugs # 3. Reporting bugs
...@@ -122,7 +228,7 @@ If you're sure you've found a bug — e.g. if one of my programs crashes ...@@ -122,7 +228,7 @@ If you're sure you've found a bug — e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report. of the original data, then by all means submit a bug report.
I use [GitLab's issue system](https://gitlab.com/reservebenefit/snakemake_stacks2/issues) I use [GitLab's issue system](http://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2/issues)
as my bug database. You can submit your bug reports there. Please be as as my bug database. You can submit your bug reports there. Please be as
verbose as possible — e.g. include the command line, etc verbose as possible — e.g. include the command line, etc
......