Commit 9c7a1cbf authored by peguerin

readme update

parent 437b1ffb
......@@ -120,6 +120,8 @@ workdir
Useful parameters for each program are stored in the file [config.yaml](config.yaml)
Before running the workflow, set your parameters by editing [config.yaml](config.yaml).
Here is an example of a config.yaml file:
```diff
container:
/workdir/obitools.simg
......@@ -143,8 +145,9 @@ assign_taxon:
fasta : /workdir/reference_database/db_embl_std.fasta
```
Table of parameters in the config.yaml file:
parameters | description | software | rule | default value | expected type
parameters | descriptions | softwares | rules | default values | expected type
---------|------------------|-------|------------------|---------|----
container | absolute path of singularity container file `obitools.simg` | [singularity](https://singularity.lbl.gov/) | every rule needs this container to work | /workdir/obitools.simg | absolute path of `simg` file
fastqFolderPath | absolute path of a folder which contains paired-end raw reads `.fastq.gz` files and the sample description `.dat` files. | [illuminapairedend](https://pythonhosted.org/OBITools/scripts/illuminapairedend.html?highlight=illumina#module-illuminapairedend), [ngsfilter](https://pythonhosted.org/OBITools/scripts/ngsfilter.html) | illuminapairedend, assign_sequences | /workdir/edna_miseq_rawdata/ | absolute path of a folder
......@@ -162,15 +165,65 @@ fasta | absolute path to the fasta file of the reference database | [ecotag](htt
## 3.3 Run the workflow with a single command
```
bash main.sh /path/to/fastq_dat_files /path/to/reference_database 16
CORES=16
bash main.sh $CORES
```
The order of arguments matters: 1) path to the folder that contains the paired-end raw reads files and the sample description file, 2) path to the folder that contains the reference database files, 3) number of available cores (here, 16).
That's it! The workflow is running and crunching your data. Look in the log folder after the workflow has finished. See the [Results](#5-results) section.
where `CORES` is the number of available cores used to parallelize the workflow.
That's it! The workflow is running and crunching your data. Look in the [99-log](99-log) output folder after the workflow has finished. See the [Results](#4-results) section.
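Since `main.sh` reads `CORES` from its first argument, a missing or malformed value is passed straight to `snakemake -j`. The sketch below shows one way to validate it first. `resolve_cores` is a hypothetical helper, not part of the repository, and the fallback to the machine's online core count is an assumption about the desired default.

```shell
# Sketch only: resolve_cores is a hypothetical helper, not part of main.sh.
# It prints the requested core count if it is a positive integer,
# falls back to the number of online processors when no value is given,
# and fails with an error message for anything else.
resolve_cores() {
    case "${1:-}" in
        '')
            # No argument: ask the system, or default to 1 if getconf is unavailable.
            getconf _NPROCESSORS_ONLN 2>/dev/null || printf '1\n'
            ;;
        *[!0-9]*|0*)
            # Contains a non-digit, or starts with 0 (which also rejects "0").
            printf 'error: CORES must be a positive integer, got "%s"\n' "$1" >&2
            return 1
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Possible usage at the top of a script such as main.sh:
#   CORES=$(resolve_cores "$1") || exit 1
```

With this in place, `bash main.sh` without an argument would use every online core instead of handing an empty `-j` value to snakemake.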
## 3.4 Run the workflow step by step
run the workflow step by step : open the file [main.sh](main.sh) to see details
Open the file [main.sh](main.sh) to see details:
### 3.4.1 Merge paired-end sequences and demultiplexing
```
CORES=16
cd 01-assembly
snakemake -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
cd ..
```
Check intermediate files in the [01-assembly](01-assembly) folder.
Demultiplexed files will be generated in the [02-demultiplex/01-raw](02-demultiplex/01-raw) folder.
### 3.4.2 Filter sequences
```
CORES=16
cd 02-demultiplex
snakemake -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
cd ..
```
Check intermediate files in the [02-demultiplex](02-demultiplex) folder.
Check filtered sequences in the [02-demultiplex/03-cleaned](02-demultiplex/03-cleaned) folder.
### 3.4.3 Concatenate sample files into run files
```
for run in `ls 02-demultiplex/03-cleaned/`;
do cat 02-demultiplex/03-cleaned/${run}/*.c.r.l.u.fasta > 03-filtered/01-runs/${run}_run.fasta ;
done
```
Check concatenated `{run}` files in the [03-filtered/01-runs/](03-filtered/01-runs/) folder.
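The concatenation step iterates over the output of `ls`, which splits on whitespace and writes an empty run file when a run folder holds no cleaned sequences. A more defensive variant is sketched below. `concat_runs` is a hypothetical helper, not part of the repository. It assumes the same folder layout and the same `*.c.r.l.u.fasta` suffix as the loop in `main.sh`.

```shell
# Sketch only: concat_runs is a hypothetical helper, not part of main.sh.
# For each run folder under $1, concatenate its *.c.r.l.u.fasta files
# into $2/<run>_run.fasta. Run folders without matching files are skipped.
concat_runs() {
    cleaned_dir=$1
    runs_dir=$2
    mkdir -p "$runs_dir" || return 1
    for run_path in "$cleaned_dir"/*/; do
        [ -d "$run_path" ] || continue   # the glob matched nothing
        run=$(basename "$run_path")
        # Load the matching files into the positional parameters.
        set -- "$run_path"*.c.r.l.u.fasta
        [ -e "$1" ] || continue          # no cleaned files for this run
        cat "$@" > "$runs_dir/${run}_run.fasta"
    done
}

# Possible usage from the repository root:
#   concat_runs 02-demultiplex/03-cleaned 03-filtered/01-runs
```

Using a glob instead of `ls` keeps folder names with spaces intact, and the existence check avoids creating empty `_run.fasta` files that the next step would otherwise try to process.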
### 3.4.4 Taxonomic assignment and format fasta files into tables
```
CORES=16
cd 03-filtered
snakemake -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
cd ..
```
Check intermediate files in the [03-filtered](03-filtered) folder.
Final tables will be generated in the [04-final_tables](04-final_tables) folder.
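The three snakemake steps above differ only in the folder they run in, so they can be wrapped in a single function. The sketch below is one possible way to do that. `run_stage` and the `SNAKEMAKE` override variable are hypothetical and not part of the repository. The snakemake flags are copied unchanged from the commands above.

```shell
# Sketch only: run_stage and the SNAKEMAKE variable are hypothetical,
# not part of main.sh. The flags are the ones used in the steps above.
run_stage() {
    stage_dir=$1
    cores=$2
    # Run in a subshell so the cd does not affect the caller,
    # and so a failing stage returns a non-zero status.
    (
        cd "$stage_dir" || exit 1
        "${SNAKEMAKE:-snakemake}" -s Snakefile -j "$cores" --use-singularity \
            --singularity-args "--bind /media/superdisk:/media/superdisk" \
            --latency-wait 120
    )
}

# Possible usage from the repository root, stopping at the first failure:
#   run_stage 01-assembly "$CORES" && run_stage 02-demultiplex "$CORES"
```

Because each stage runs in a subshell, no `cd ..` is needed afterwards, and chaining the calls with `&&` stops the workflow as soon as one stage fails.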
# 4. Results
......
......@@ -13,21 +13,22 @@
##
###############################################################################
## Usage:
## bash main.sh
## CORES=16
## bash main.sh $CORES
##
##
###############################################################################
CORES=$1
###############################################################################
## assemble & demultiplex
cd 01-assembly
snakemake -s Snakefile -j 8 --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
snakemake -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
cd ..
###############################################################################
## filter sequences
cd 02-demultiplex
snakemake -s Snakefile -j 8 --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
snakemake -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
cd ..
###############################################################################
## concatenate samples into run
......@@ -38,7 +39,7 @@ done
## taxonomic assignation & format
cd 03-filtered
#snakemake -s Snakefile -j 8 --dry-run --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
snakemake -s Snakefile -j 8 --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
snakemake -s Snakefile -j $CORES --use-singularity --singularity-args "--bind /media/superdisk:/media/superdisk" --latency-wait 120
cd ..
###############################################################################
......