#### 5.1. Dereplicate sequences at `run` level

[dereplicate_runs](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/05_assignment/rules/dereplicate_runs.smk): dereplicate the sequences of each `run`, i.e. merge strictly identical sequences into a single record

* input:
    * results/05_assignment/01_runs/`run`.fasta: `sample` sequences concatenated into a single .fasta file per `run`
* output:
    * results/05_assignment/02_dereplicated/`run`.uniq.fasta: dereplicated sequences .fasta file

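The principle of dereplication can be sketched in a few lines of plain Python. This is a minimal illustration, not the code run by `dereplicate_runs`: it assumes the reads of a `run` are available as (sample, sequence) pairs, ignores letter case, and the record fields `count` and `merged_sample` are illustrative names.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse strictly identical sequences into one record each.

    `reads` is an iterable of (sample, sequence) pairs, standing in for the
    concatenated `run`.fasta file. Returns one dict per unique sequence, in
    order of first appearance, with its total count and per-sample counts.
    """
    unique = {}
    for sample, sequence in reads:
        key = sequence.upper()  # assumption: case differences are ignored
        record = unique.setdefault(
            key, {"sequence": key, "count": 0, "merged_sample": Counter()}
        )
        record["count"] += 1
        record["merged_sample"][sample] += 1
    return list(unique.values())


reads = [("sample_A", "ACGTTG"), ("sample_B", "acgttg"), ("sample_A", "ACGTTA")]
for record in dereplicate(reads):
    print(record["sequence"], record["count"], dict(record["merged_sample"]))
```

Keeping per-sample counts while merging is what allows occurrences to be traced back to individual samples after dereplication.
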
#### 5.2. Taxonomic assignment

[assign_taxon](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/05_assignment/rules/assign_taxon.smk): assign each sequence to a taxon by comparing it with the `marker` reference database

* inputs:
    * results/05_assignment/02_dereplicated/`run`.uniq.fasta: dereplicated sequences .fasta file
    * `marker` reference database taxonomy files: EMBL-format reference database with taxonomy information
    * `marker` reference database .fasta file: reference sequences for each taxon
* output:
    * results/05_assignment/03_assigned/`run`.tag.u.fasta: .fasta file of sequences annotated with their assigned taxon

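The general idea of similarity-based taxonomic assignment can be sketched as follows: score every reference sequence against the query, keep the best-scoring references and report the lowest common ancestor (LCA) of their taxa, leaving the sequence unassigned below an identity threshold. This is a toy illustration, not the algorithm used by `assign_taxon`: `difflib.SequenceMatcher.ratio()` is a crude stand-in for a real alignment identity, `min_identity=0.9` is an arbitrary threshold, and the references and lineages are made up.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Crude similarity in [0, 1]; real tools use a proper alignment score.
    return SequenceMatcher(None, a, b).ratio()


def lowest_common_ancestor(lineages):
    """Longest prefix shared by root-first lineage tuples."""
    lca = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            lca.append(ranks[0])
        else:
            break
    return tuple(lca)


def assign(query, references, min_identity=0.9):
    """Assign `query` to the LCA of its best-matching references.

    `references` is a non-empty list of (sequence, lineage) pairs.
    Returns (best_identity, lineage); lineage is () when unassigned.
    """
    scored = [(identity(query, seq), lineage) for seq, lineage in references]
    best = max(score for score, _ in scored)
    if best < min_identity:
        return best, ()
    best_lineages = [lineage for score, lineage in scored if score == best]
    return best, lowest_common_ancestor(best_lineages)


references = [
    ("ACGTACGTAC", ("Chordata", "Actinopteri", "Gadidae", "Gadus", "Gadus morhua")),
    ("ACGTACGTAG", ("Chordata", "Actinopteri", "Gadidae", "Merlangius", "Merlangius merlangus")),
]
print(assign("ACGTACGTAC", references))  # exact hit: species-level lineage
print(assign("ACGTACGTAT", references))  # tie between two genera: family-level LCA
```

Falling back to the LCA means an ambiguous sequence is reported at a coarser but still supported rank rather than at an arbitrary species.
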
#### 5.3. Remove irrelevant attributes

[rm_attributes](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/05_assignment/rules/rm_attributes.smk): remove sequence attributes that are not relevant at this stage

* input:
    * results/05_assignment/03_assigned/`run`.tag.u.fasta: taxon assigned sequences .fasta file
* output:
    * results/05_assignment/04_formated/`run`.a.t.u.fasta: cleaned .fasta file

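Removing attributes amounts to rewriting each sequence header without the unwanted annotations. The sketch below assumes OBITools-style headers of the form `id key=value; key=value;`; the attribute names (`best_match`, `scientific_name`) are only examples, not the list actually removed by `rm_attributes`.

```python
import re

# Assumption: attributes are OBITools-style "key=value;" pairs in the header.
ATTRIBUTE = re.compile(r"(\w+)=([^;]*);")


def drop_attributes(header, unwanted):
    """Return `header` without the attributes whose key is in `unwanted`."""
    def keep_or_drop(match):
        return "" if match.group(1) in unwanted else match.group(0)

    cleaned = ATTRIBUTE.sub(keep_or_drop, header)
    return " ".join(cleaned.split())  # collapse the gaps left behind


header = "seq1 count=12; best_match=REF42; scientific_name=Gadus morhua;"
print(drop_attributes(header, {"best_match"}))
```
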
#### 5.4. Sort sequences

[sort_runs](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/05_assignment/rules/sort_runs.smk): sort the sequences in decreasing order of count

* input:
    * results/05_assignment/04_formated/`run`.a.t.u.fasta: cleaned .fasta file
* output:
    * results/05_assignment/04_formated/`run`.s.a.t.u.fasta: sorted .fasta file

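Sorting only needs the abundance of each record. This sketch assumes the count is stored as a `count=N;` attribute in the header (a hypothetical layout chosen for illustration) and is not the implementation of `sort_runs`.

```python
import re

# Assumption: abundance is stored as a "count=N;" attribute in the header.
# \b prevents matching attributes such as "merged_count=".
COUNT = re.compile(r"\bcount=(\d+);")


def read_count(header):
    match = COUNT.search(header)
    return int(match.group(1)) if match else 0  # no count attribute: sorted last


def sort_by_count(records):
    """Sort (header, sequence) records by decreasing count.

    Python's sort is stable, so records with equal counts keep their input order.
    """
    return sorted(records, key=lambda record: read_count(record[0]), reverse=True)


records = [("a count=2;", "ACGT"), ("b count=10;", "GGCA"), ("c count=2;", "TTAG")]
print([header.split()[0] for header, _ in sort_by_count(records)])
```
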
#### 5.5. Generate species occurrence final tables

[table_runs](https://gitlab.mbb.univ-montp2.fr/edna/snakemake_rapidrun_obitools/-/blob/master/05_assignment/rules/table_runs.smk): convert the sorted sequence .fasta file into a tabular file that can be opened with a spreadsheet program

* input:
    * results/05_assignment/04_formated/`run`.s.a.t.u.fasta: sorted .fasta file
* output:
    * results/06_final_tables/`run`.csv: species occurrence .csv file

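Conceptually, the final table has one row per sequence and one column per attribute. The sketch below shows that conversion with the standard `csv` module, assuming `key=value;` header attributes; it is not the code behind `table_runs`, and the actual columns of `run`.csv depend on the attributes present in the sorted .fasta file.

```python
import csv
import io
import re

# Assumption: OBITools-style "key=value;" attributes in the header.
ATTRIBUTE = re.compile(r"(\w+)=([^;]*);")


def records_to_csv(records):
    """Convert (header, sequence) records into CSV text, one row per sequence."""
    rows, columns = [], []
    for header, sequence in records:
        row = {"id": header.split(maxsplit=1)[0]}
        for key, value in ATTRIBUTE.findall(header):
            row[key] = value.strip()
            if key not in columns:
                columns.append(key)  # columns appear in first-seen order
        row["sequence"] = sequence
        rows.append(row)

    buffer = io.StringIO()
    # restval="" leaves a cell empty when a record lacks an attribute.
    writer = csv.DictWriter(buffer, fieldnames=["id", *columns, "sequence"], restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = [
    ("seq1 count=12; scientific_name=Gadus morhua;", "acgt"),
    ("seq2 count=3;", "ttga"),
]
print(records_to_csv(records))
```
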