Commit bd1ba812 authored by peguerin

README installation update

parent 4021254b
......@@ -9,15 +9,15 @@ This was designed to process RADseq data from [RESERVEBENEFIT](https://www.biodi
1. [Introduction](#1-introduction)
2. [Installation](#2-installation)
1. [Prerequisite](#21-prerequisite)
2. [Data Files](#22-data-files)
3. [Set up](#23-set-up)
1. [Prerequisite](#21-prerequisite)
2. [Data Files](#22-data-files)
3. [Set up](#23-set-up)
3. [Reporting bugs](#3-reporting-bugs)
4. [Running the pipeline](#4-running-the-pipeline)
1. [Initialisation](#41-initialisation)
2. [Configuration](#42-configuration)
3. [Run the pipeline into a single command](#43-run-the-pipeline-into-a-single-command)
4. [Run the pipeline step by step](#44-run-the-pipeline-step-by-step)
1. [Initialisation](#41-initialisation)
2. [Configuration](#42-configuration)
3. [Run the pipeline into a single command](#43-run-the-pipeline-into-a-single-command)
4. [Run the pipeline step by step](#44-run-the-pipeline-step-by-step)
# 1. Introduction
......@@ -40,7 +40,7 @@ You must install the following softwares and packages :
5.3.0
```
- [STACKS 2.0b](http://catchenlab.life.illinois.edu/stacks/)
- [STACKS 2.2](http://catchenlab.life.illinois.edu/stacks/)
* Check version and if programs are correctly installed by typing :
```
......@@ -49,7 +49,7 @@ You must install the following softwares and packages :
gstacks --version
populations --version
## should give you the output
2.0b
2.2
```
- [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial)
......@@ -99,11 +99,15 @@ You must install the following softwares and packages :
## 2.2 Data Files
The included data files are :
Let's define some wildcards:
- `{run}` : any run
- `{pool}` : any pool within a run
- `{species}` : any species
* [config.yaml](01-info_files/config.yaml) :
* [barcodes.txt](01-info_files/barcodes.txt) :
* [infos.csv](01-info_files) :
* [populations_map.txt](01-info_files) :
* [barcodes.txt](01-info_files/barcodes.txt) : file containing the barcodes used for each {pool} in a {run}
* [{species}_infos.csv](01-info_files) : `.csv` information table for {species}. Each row is a sample and there are 4 columns: run, pool, barcode, ID
* [{species}_populations_map.txt](01-info_files) : `.tsv` information table for {species}. Each row is a sample and there are 2 columns: ID, population. This file can be generated by the pipeline (see [Configuration](#42-configuration) section). However, we strongly recommend that you write it manually.
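Since the population map is best written by hand, it can help to check it against the sample table before running anything. The following is a minimal Python sketch of such a check, not part of the pipeline. It assumes that `{species}_infos.csv` is comma-separated with no header row and that the population map is tab-separated with no header row; the function names and the sample values in the usage note are hypothetical.

```python
import csv
import io

# Column order of {species}_infos.csv as described above.
INFO_COLUMNS = ("run", "pool", "barcode", "ID")


def read_infos(text):
    """Parse the text of a {species}_infos.csv file into a list of dicts.

    Assumption: comma-separated, one sample per row, no header row.
    """
    samples = []
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(INFO_COLUMNS):
            raise ValueError(
                f"line {line_number}: expected {len(INFO_COLUMNS)} columns, got {len(row)}"
            )
        samples.append(dict(zip(INFO_COLUMNS, (field.strip() for field in row))))
    return samples


def read_population_map(text):
    """Parse the text of a {species}_populations_map.txt file into {ID: population}.

    Assumption: tab-separated, one sample per row, no header row.
    """
    population_map = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(row)}")
        sample_id, population = (field.strip() for field in row)
        population_map[sample_id] = population
    return population_map


def missing_from_map(samples, population_map):
    """Return the sorted sample IDs that have no entry in the population map."""
    return sorted(s["ID"] for s in samples if s["ID"] not in population_map)
```

For example, with an infos table listing samples `s1` and `s2` and a population map that only mentions `s1`, `missing_from_map` returns `["s2"]`, which points to the row that still has to be added by hand.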
## 2.3 Set Up
......@@ -114,7 +118,109 @@ cd snakemake_stacks2
```
You will see the following folders :
* [00-scripts](00-scripts): contains all the required scripts to run the whole pipeline
* [01-info_files](01-info_files) : contains all the required data files (see [Data Files](#22-data-files) section above)
* [02-raw](02-raw) : must contain your data from paired-end Illumina sequencing runs. The data must be stored this way :
```
02-raw/
    runA/
        poolA1/
            {poolA1}_R1_001.fastq.gz
            {poolA1}_R2_001.fastq.gz
        poolA2/
            {poolA2}_R1_001.fastq.gz
            {poolA2}_R2_001.fastq.gz
        ...
    runB/
        poolB1/
            {poolB1}_R1_001.fastq.gz
            {poolB1}_R2_001.fastq.gz
        ...
    ...
```
* [03-samples](03-samples): will store the results generated by demultiplexing with [process_radtags](http://catchenlab.life.illinois.edu/stacks/comp/process_radtags.php) and clone filtering with [clone_filter](http://catchenlab.life.illinois.edu/stacks/comp/clone_filter.php). The data are stored this way :
```
03-samples/
    runA/
        poolA1/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            sample_{barcode2}.1.fq.gz
            sample_{barcode2}.2.fq.gz
            sample_{barcode3}.1.fq.gz
            sample_{barcode3}.2.fq.gz
            ...
        poolA1_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            sample_{barcode2}.1.1.fq.gz
            sample_{barcode2}.2.2.fq.gz
            sample_{barcode3}.1.1.fq.gz
            sample_{barcode3}.2.2.fq.gz
            ...
        poolA2/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            ...
        poolA2_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            ...
        ...
    runB/
        poolB1/
            sample_{barcode1}.1.fq.gz
            sample_{barcode1}.2.fq.gz
            ...
        poolB1_clone_filtered/
            sample_{barcode1}.1.1.fq.gz
            sample_{barcode1}.2.2.fq.gz
            ...
        ...
    ...
```
* [04-all_samples](04-all_samples): paired-end `fastq.gz` files are renamed according to the information in [{species}_infos.csv](01-info_files). The reads are then aligned to the reference genome sequences stored in [08-genomes](08-genomes). This folder contains the renamed fastq files and the corresponding `.bam` alignment files. `.sorted.bam` files are sorted alignments and `.sorted.bam.bai` files are their indexes. The data are stored this way :
```
04-all_samples/
    speciesA/
        {sampleA1}.1.fq.gz
        {sampleA1}.2.fq.gz
        {sampleA1}.bam
        {sampleA1}.sorted.bam
        {sampleA1}.sorted.bam.bai
        {sampleA2}.1.fq.gz
        {sampleA2}.2.fq.gz
        {sampleA2}.bam
        {sampleA2}.sorted.bam
        {sampleA2}.sorted.bam.bai
        ...
    speciesB/
        {sampleB1}.1.fq.gz
        {sampleB1}.2.fq.gz
        {sampleB1}.bam
        {sampleB1}.sorted.bam
        {sampleB1}.sorted.bam.bai
        ...
    ...
```
* [05-stacks](05-stacks) : outputs from [gstacks](http://catchenlab.life.illinois.edu/stacks/comp/gstacks.php)
* [06-populations](06-populations) : outputs from [populations](http://catchenlab.life.illinois.edu/stacks/comp/populations.php)
* [08-genomes](08-genomes) : reference genome of each species {species} used for the analysis. The `.fasta` file is mandatory and stores all the scaffold sequences of the {species} genome assembly. `.amb`, `.ann`, `.bwt`, `.pac`, `.sa` are index files required by [BWA 0.7.17](https://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial). They will be automatically generated if absent. The data must be stored this way :
```
08-genomes/
    {species}_genome.amb
    {species}_genome.ann
    {species}_genome.bwt
    {species}_genome.fasta
    {species}_genome.pac
    {species}_genome.sa
```
* [10-logs](10-logs) : log files generated by every command
- process_radtags
- clone_filter
- genome_alignment
- gstacks
- populations
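The input layout described above can be checked before launching the pipeline. The sketch below is a hypothetical helper written in Python, not one of the scripts in [00-scripts](00-scripts). It assumes the `02-raw/{run}/{pool}/` nesting and the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` file suffixes shown in the folder trees, and the `{species}_genome` prefix used in [08-genomes](08-genomes); the function names are illustrative only.

```python
from pathlib import Path

# BWA index suffixes listed in the 08-genomes description.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def unpaired_raw_files(raw_dir):
    """Return the R1 files under raw_dir/{run}/{pool}/ that have no matching R2 file."""
    unpaired = []
    for r1 in sorted(Path(raw_dir).glob("*/*/*_R1_001.fastq.gz")):
        r2_name = r1.name.replace("_R1_001.fastq.gz", "_R2_001.fastq.gz")
        if not r1.with_name(r2_name).exists():
            unpaired.append(r1)
    return unpaired


def missing_bwa_index(genomes_dir, species):
    """Return the BWA index suffixes not yet present for {species}_genome."""
    prefix = Path(genomes_dir) / f"{species}_genome"
    return [
        suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(f"{prefix}{suffix}").exists()
    ]
```

An empty list from `unpaired_raw_files("02-raw")` means every forward read file found has its reverse counterpart. A non-empty list from `missing_bwa_index("08-genomes", "speciesA")` is not an error in itself, since the README states that absent index files are generated automatically.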
# 3. Reporting bugs
......@@ -122,7 +228,7 @@ If you're sure you've found a bug — e.g. if one of my programs crashes
with an obscure error message, or if the resulting file is missing part
of the original data, then by all means submit a bug report.
I use [GitLab's issue system](https://gitlab.com/reservebenefit/snakemake_stacks2/issues)
I use [GitLab's issue system](http://gitlab.mbb.univ-montp2.fr/reservebenefit/snakemake_stacks2/issues)
as my bug database. You can submit your bug reports there. Please be as
verbose as possible, e.g. include the command line, etc.
......