... | @@ -174,7 +174,7 @@ The [config file](config/) defines a dictionary of configuration parameters and |
### 1. Settings

[01_settings/readwrite_rapidrun_demultiplexing.py](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/01_settings/readwrite_rapidrun_demultiplexing.py): writes the demultiplex.csv file that the Snakefiles read to define their wildcards. Users can exclude `run` and `projet` values by blacklisting them in the config.yaml file.
* inputs:
* [sample description .dat files](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/tree/master/resources/test/sample_description): a table with 6 columns (plaque, plaque1, barcode, primer5, primer3, infos) in which each row describes one `plaque` element. Each sample description file belongs to a `marker` wildcard.
* [config.yaml](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/tree/master/config): see configuration step
... | @@ -182,13 +182,7 @@ The [config file](config/) defines a dictionary of configuration parameters and |
* output:
* [results/01_settings/demultiplexing.csv](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/tree/master/results/01_settings): a dataframe with 14 columns (demultiplex, projet, marker, run, plaque, sample, barcode5, barcode3, primer5, primer3, min_f, min_r, lenBarcode5, lenBarcode3) in which each row describes one `projet`/`marker`/`run`/`plaque`==`sample` element.
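
The settings step described above can be sketched roughly as follows. This is a minimal illustration, not the code of `readwrite_rapidrun_demultiplexing.py`: the function names, the shape of the blacklist (a dict with `run` and `projet` lists) and the example rows are all assumptions; only the 14 column names come from the description above.

```python
import csv
import io

# The 14 column names listed above for demultiplexing.csv.
COLUMNS = [
    "demultiplex", "projet", "marker", "run", "plaque", "sample",
    "barcode5", "barcode3", "primer5", "primer3",
    "min_f", "min_r", "lenBarcode5", "lenBarcode3",
]


def drop_blacklisted(rows, blacklist):
    """Keep rows whose run and projet are not blacklisted.

    `blacklist` is a hypothetical dict with "run" and "projet" lists;
    the real config.yaml layout may differ.
    """
    bad_runs = set(blacklist.get("run", []))
    bad_projets = set(blacklist.get("projet", []))
    return [
        row for row in rows
        if row["run"] not in bad_runs and row["projet"] not in bad_projets
    ]


def write_demultiplexing(rows, handle):
    """Write rows to `handle` as CSV; missing columns are left empty."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Made-up example rows; only a few of the 14 columns are filled in.
rows = [
    {"projet": "projA", "marker": "teleo", "run": "run1", "sample": "s1"},
    {"projet": "projB", "marker": "teleo", "run": "run2", "sample": "s2"},
]
kept = drop_blacklisted(rows, {"run": ["run2"], "projet": []})
buffer = io.StringIO()
write_demultiplexing(kept, buffer)
```

Writing to an in-memory buffer keeps the sketch free of file paths; a real script would open the CSV under `results/01_settings/` instead.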
```mermaid

classDiagram

Sample --> Marker : is defined by
... | @@ -197,7 +191,6 @@ Sample --> Projet : is defined by |
Projet --> Run : contains
Marker --> MarkerSampleDescription : refers to

Sample : id_sample
Sample : id_marker
Sample : id_plaque
... | @@ -223,7 +216,27 @@ Run : R2 .fastq.gz file path |
```

### 2. Assembly

The [assembly Snakefile](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/02_assembly/Snakefile) uses the `run` wildcard deduced from [rapidrun.tsv](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/resources/test/all_samples.tsv) and runs the following rules:

#### Align and merge paired-end reads

[illuminapairedend](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/02_assembly/rules/illuminapairedend.smk): paired-end alignment, then keep reads with an alignment score above `s_min`
* inputs:
* `run`.R1.fastq.gz: forward sequence records file
* `run`.R2.fastq.gz: reverse sequence records file
* parameters:
* s_min: minimum alignment score for keeping an alignment. If the score is below this threshold, the two sequences are simply concatenated.
* output:
* results/02_assembly/01_illuminapairedend/`run`.fastq: merged sequences fastq file
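
The role of `s_min` can be illustrated with a small sketch. It does not perform any alignment and is not the rule's implementation: the function name, the example scores and the labels `"alignment"` and `"joined"` are assumptions made for illustration; only the threshold behaviour (pairs scoring below `s_min` are concatenated) comes from the parameter description above.

```python
def tag_pair(score, s_min):
    """Label a read pair from its (precomputed) alignment score.

    Pairs scoring below s_min are labelled "joined" (simply concatenated);
    all others are labelled "alignment" (merged on their overlap).
    """
    return "joined" if score < s_min else "alignment"


# Made-up alignment scores for three read pairs.
scores = {"read_1": 95.0, "read_2": 12.5, "read_3": 40.0}
s_min = 40

modes = {read_id: tag_pair(score, s_min) for read_id, score in scores.items()}
```

Whether a score exactly equal to `s_min` counts as kept is a choice made in this sketch; the description above only covers scores below the threshold.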

#### Remove unaligned sequence records

[remove_unaligned](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/02_assembly/rules/remove_unaligned.smk):

* results/02_assembly/02_remove_unaligned/`run`
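
The diff gives no inputs or command for this rule, so the following is only a sketch of the idea under stated assumptions: each merged record is assumed to carry a `mode` annotation set to `"joined"` when the pair was concatenated instead of aligned, and records are represented as plain dictionaries. The function and field names are hypothetical.

```python
def remove_unaligned(records):
    """Return the records whose pair was aligned rather than concatenated."""
    return [record for record in records if record.get("mode") != "joined"]


# Made-up records; "mode" is assumed to hold "alignment" or "joined".
records = [
    {"id": "read_1", "mode": "alignment"},
    {"id": "read_2", "mode": "joined"},
    {"id": "read_3", "mode": "alignment"},
]
aligned = remove_unaligned(records)
aligned_ids = [record["id"] for record in aligned]
```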

### 3. Demultiplexing
... | @@ -243,11 +256,6 @@ Run : R2 .fastq.gz file path |
### read 'rapidrun' .tsv file
### remove blacklisted runs & projects
### write table projet/run/sample
"demultiplex","projet","marker","run","plaque","sample","barcode5","barcode3","primer5","primer3","min_f","min_r","lenBarcode5","lenBarcode3"

## assemble
### Paired end alignment then keep reads with quality > 40
### Remove unaligned sequence records