... | @@ -38,6 +38,80 @@ Run : R2 .fastq.gz file path |
# Architecture Diagram Software
# Workflow management
We use [Snakemake](https://snakemake.readthedocs.io/en/stable/), a workflow management system, to create scalable and reproducible metabarcoding analyses.

Snakemake uses wildcards and rules. A rule describes a shell command in terms of its input and output variables. Wildcards generalize the values taken by those input and output variables.

Due to limitations of wildcards at the demultiplexing and concatenation steps, the workflow is split into 5 Snakemake workflows:

* **01_settings** produces the table used to define wildcards
* **02_assembly** merges `run` paired-end .fastq files
* **03_demultiplex** generates `projet`/`marker`/`run`/`sample` .fasta files from `run` merged .fastq files
* **04_filter_samples** filters `projet`/`marker`/`run`/`sample` .fasta files
* We concatenate `projet`/`marker`/`run`/`sample` .fasta files into `projet`/`marker`/`run` .fasta files
* **05_assignment** produces `projet`/`marker`/`run` species occurrence table files

Each workflow is stored with the following structure:
```
├── workflow
│   ├── rules
│   │   ├── module1.smk
│   │   └── module2.smk
│   ├── envs
│   │   ├── tool1.yaml
│   │   └── tool2.yaml
│   └── Snakefile
├── config
│   └── config.yaml
└── results
    └── workflow
        ├── module1
        └── module2
```
The workflow code goes into the `workflow` subfolder, while the configuration is stored in the `config` subfolder. Inside the `workflow` subfolder, the central `Snakefile` marks the entry point of the workflow. Results are written into the `results` subfolder, where they follow the same structure as the `workflow` subfolder.
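As a quick sanity check of this layout, the following minimal Python sketch lists which of the expected entries are missing under a given workflow folder. It is only an illustration: the `EXPECTED` list and the `missing_entries` helper are hypothetical names that are not part of the repository, and the module and tool file names from the tree are left out because they are placeholders.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the tree shown above (hypothetical check list).
EXPECTED = [
    "workflow/Snakefile",
    "workflow/rules",
    "workflow/envs",
    "config/config.yaml",
    "results",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Demo on a throwaway folder that only contains config/config.yaml.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "config").mkdir()
        (Path(tmp) / "config" / "config.yaml").touch()
        print(missing_entries(tmp))
```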
# Environment
Software and dependencies can be run directly on the local system, inside containers, or through a package management system.
## Containers
[Singularity Hub collection 2878](https://singularity-hub.org/collections/2878)

We provide a ready-to-run container built with [Singularity](https://www.sylabs.io/). All software required to run the workflow is installed within this container.

You can either download the prebuilt container or build it yourself from the [Singularity.obitools](Singularity.obitools) recipe.

In both cases, you obtain an `obitools.simg` file. Set the `singularity:` field in [config.yaml](config/) to the absolute path of this container file.
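Because the `singularity:` field expects an absolute path, a small check before launching a workflow can catch a relative or missing value early. The sketch below assumes the configuration has already been parsed into a Python mapping; the `check_container_path` helper and the example path are hypothetical and not part of the workflow code.

```python
import os.path


def check_container_path(config):
    """Return the `singularity` entry of a parsed config, or raise if unusable."""
    path = config.get("singularity")
    if not path:
        raise ValueError("config is missing the 'singularity' field")
    if not os.path.isabs(path):
        raise ValueError(f"'singularity' must be an absolute path, got {path!r}")
    return path


if __name__ == "__main__":
    # Hypothetical location of the container file.
    print(check_container_path({"singularity": "/opt/containers/obitools.simg"}))
```

The check only inspects the string; it does not verify that the file exists or that it is a valid container image.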
## Conda
Software can also be installed through a conda environment. Each rule loads its own environment. Environment files are stored at `workflow/envs/obitools_envs.yaml`.
# Wildcards
Inputs and outputs can take any value, so we define them as wildcards.

We use 4 wildcards:

* `projet`
* `marker`
* `run`
* `sample`
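To make the role of these four wildcards concrete, here is a minimal sketch in plain Python (not Snakemake code) that fills a path template once per row of wildcard values and groups samples by `projet`/`marker`/`run`. The template, the row values, and the helper names are all hypothetical; in the real workflow the combinations come from the table produced by **01_settings**.

```python
# Hypothetical path template: one .fasta file per projet/marker/run/sample.
TEMPLATE = "results/{projet}/{marker}/{run}/{sample}.fasta"

# Hypothetical rows standing in for the wildcard table.
ROWS = [
    {"projet": "projA", "marker": "teleo", "run": "run1", "sample": "s01"},
    {"projet": "projA", "marker": "teleo", "run": "run1", "sample": "s02"},
    {"projet": "projB", "marker": "vert", "run": "run2", "sample": "s01"},
]


def sample_paths(rows, template=TEMPLATE):
    """Fill the template once per row of wildcard values."""
    return [template.format(**row) for row in rows]


def run_level_groups(rows):
    """Group sample names by (projet, marker, run), in row order."""
    groups = {}
    for row in rows:
        key = (row["projet"], row["marker"], row["run"])
        groups.setdefault(key, []).append(row["sample"])
    return groups


if __name__ == "__main__":
    for path in sample_paths(ROWS):
        print(path)
```

Grouping by the first three wildcards mirrors the concatenation step, where per-sample files are merged into one file per `projet`/`marker`/`run`.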
# Rules
## Overview
```mermaid
graph TD;
... | @@ -142,71 +216,6 @@ graph TD; |
```
## write demultiplex table